In [3]:
import pandas as pd

The two core objects in Pandas are the **DataFrame** and the **Series**.

## DataFrames

A DataFrame is a table. It contains an array of individual entries, each of which has a certain value. Each entry corresponds to a row (or record) and a column.

In [4]:
example = pd.DataFrame({'Apples':[10, 20, 30], 'Banana': [30, 40, 30]}, index=['2001', '2002', '2003'])
print(example)

      Apples  Banana
2001      10      30
2002      20      40
2003      30      30


In [5]:
movies = pd.DataFrame({'James Bond': ['Tomorrow Never Dies', 'Casino Royale'], 'Jason Bourne' : ['Bourne Identity', 'Bourne Supremacy']}, index=['2010', '2020'])
print(movies)

               James Bond      Jason Bourne
2010  Tomorrow Never Dies   Bourne Identity
2020        Casino Royale  Bourne Supremacy


## Series
A Series, by contrast, is a sequence of data values. If a DataFrame is a table, a Series is a list, and in fact you can create one from nothing more than a list.

A Series is, in essence, a single column of a DataFrame, so you can assign row labels to it the same way as before, using the `index` parameter. However, a Series does not have column names; it has only one overall `name`. The two objects are closely related: it is helpful to think of a DataFrame as a bunch of Series "glued together".

In [6]:
basic_series = pd.Series([1, 2, 3, 4, 5])
print(basic_series)

0    1
1    2
2    3
3    4
4    5
dtype: int64
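
To make the relationship between the two objects concrete, here is a minimal sketch. The `sales` labels and the name `'Product A'` are made up for illustration; `example` is the same DataFrame built earlier.

```python
import pandas as pd

# A Series with custom row labels and one overall name (illustrative values).
sales = pd.Series([30, 35, 40],
                  index=['2015 Sales', '2016 Sales', '2017 Sales'],
                  name='Product A')
print(sales)

# Selecting one column of a DataFrame returns a Series named after that column.
example = pd.DataFrame({'Apples': [10, 20, 30], 'Banana': [30, 40, 30]},
                       index=['2001', '2002', '2003'])
apples = example['Apples']
print(type(apples).__name__)  # Series
print(apples.name)            # Apples

# "Gluing" the columns back together side by side gives a DataFrame again.
rebuilt = pd.concat([example['Apples'], example['Banana']], axis=1)
print(list(rebuilt.columns))  # ['Apples', 'Banana']
```

`pd.concat(..., axis=1)` is only one way to combine Series into a table; passing a dict of Series to `pd.DataFrame` would work as well.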


## Reading Data Files

In [7]:
read_reviews = pd.read_csv('movies.csv')
read_reviews.head()

Unnamed: 0,Film,Genre,Lead Studio,Audience score %,Profitability,Rotten Tomatoes %,Worldwide Gross,Year
0,Zack and Miri Make a Porno,Romance,The Weinstein Company,70,1.747542,64,$41.94,2008
1,Youth in Revolt,Comedy,The Weinstein Company,52,1.09,68,$19.62,2010
2,You Will Meet a Tall Dark Stranger,Comedy,Independent,35,1.211818,43,$26.66,2010
3,When in Rome,Comedy,Disney,44,0.0,15,$43.04,2010
4,What Happens in Vegas,Comedy,Fox,72,6.267647,28,$219.37,2008
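
The `Unnamed: 0` column in the output above usually means the file's first column is an unlabeled row index that was saved along with the data. A minimal sketch of how `index_col=0` handles this, using an in-memory stand-in for `movies.csv` (an assumption: the real file is taken to have the same layout as the rows shown above):

```python
import io
import pandas as pd

# Stand-in for movies.csv, copied from two of the rows shown above.
# Assumption: the real file starts each line with an unlabeled index column.
csv_text = (
    ",Film,Genre,Lead Studio,Audience score %,Profitability,"
    "Rotten Tomatoes %,Worldwide Gross,Year\n"
    "1,Youth in Revolt,Comedy,The Weinstein Company,52,1.09,68,$19.62,2010\n"
    "3,When in Rome,Comedy,Disney,44,0.0,15,$43.04,2010\n"
)

# Default read: the unlabeled first column becomes a regular data column.
default = pd.read_csv(io.StringIO(csv_text))
print(default.columns[0])  # Unnamed: 0

# index_col=0 tells pandas to use that first column as the row index instead.
indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(indexed.columns[0])  # Film
print(indexed.shape)       # (2, 8)
```

With a real file the call would simply be `pd.read_csv('movies.csv', index_col=0)`.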


## Writing a DataFrame to CSV

In [8]:
animals = pd.DataFrame({'Cows': [12, 20], 'Goats': [22, 19]}, index=['Year 1', 'Year 2'])
animals
animals.to_csv('cows_and_goats.csv')
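
As a quick check of what `to_csv` actually writes, the sketch below saves the same DataFrame to a temporary directory (so nothing is left on disk), prints the raw file contents, and reads the file back. The temporary path is only for illustration.

```python
import os
import tempfile
import pandas as pd

animals = pd.DataFrame({'Cows': [12, 20], 'Goats': [22, 19]},
                       index=['Year 1', 'Year 2'])

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'cows_and_goats.csv')
    animals.to_csv(path)  # the row index is written as an unlabeled first column

    with open(path) as f:
        raw_text = f.read()

    # index_col=0 restores that first column as the row index.
    restored = pd.read_csv(path, index_col=0)

print(raw_text)
print(restored)
```

If the row labels are not needed in the file, `animals.to_csv(path, index=False)` leaves them out.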

## Indexing in Pandas

In Pandas, `loc` and `iloc` are used for indexing and selecting data from a DataFrame, but they have some differences in how they work.

1. `loc`:
   - `loc` is primarily label-based indexing. It is used to select data based on the labels (row and column names) in the DataFrame.
   - The syntax for using `loc` is `df.loc[row_label, column_label]`, where `row_label` and `column_label` can be single values, lists, slices, or boolean arrays.
   - When slicing with `loc`, both the start and stop labels are included, unlike standard Python slicing, which excludes the stop.
   - The labels you use with `loc` must exist in the DataFrame; otherwise, you'll get a `KeyError`.

Example:
```python
import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 22, 28],
    'City': ['New York', 'San Francisco', 'Los Angeles', 'Chicago']
}

df = pd.DataFrame(data)
print(df.loc[1, 'Name'])  # Output: Bob
print(df.loc[1:2, 'Age'])  # Output: 1    30
                           #         2    22
                           #         Name: Age, dtype: int64
```
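
The other selector types listed for `loc` can be sketched the same way. The string row labels `'a'` to `'d'` below are hypothetical, chosen so that labels are clearly distinct from positions.

```python
import pandas as pd

df = pd.DataFrame(
    {'Name': ['Alice', 'Bob', 'Charlie', 'David'], 'Age': [25, 30, 22, 28]},
    index=['a', 'b', 'c', 'd'],  # hypothetical string row labels
)

# Label slice: both endpoints, 'b' and 'c', are included.
print(df.loc['b':'c', 'Name'].tolist())  # ['Bob', 'Charlie']

# Lists of labels select specific rows and columns.
subset = df.loc[['a', 'd'], ['Name', 'Age']]
print(subset)

# A boolean mask keeps the rows where the condition is True.
over_24 = df.loc[df['Age'] > 24, 'Name']
print(over_24.tolist())  # ['Alice', 'Bob', 'David']

# A row label that does not exist raises KeyError.
try:
    df.loc['z', 'Name']
    missing_label_raised = False
except KeyError:
    missing_label_raised = True
print(missing_label_raised)  # True
```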

2. `iloc`:
   - `iloc` is primarily integer-location based indexing. It is used to select data based on the integer indices of the rows and columns in the DataFrame.
   - The syntax for using `iloc` is `df.iloc[row_index, column_index]`, where `row_index` and `column_index` can be single integers, lists of integers, slices, or boolean arrays.
   - When using `iloc`, the start index is included, but the stop index is excluded, just like standard Python slicing.

Example:
```python
import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 22, 28],
    'City': ['New York', 'San Francisco', 'Los Angeles', 'Chicago']
}

df = pd.DataFrame(data)
print(df.iloc[1, 0])  # Output: Bob
print(df.iloc[1:3, 1])  # Output: 1    30
                        #         2    22
                        #         Name: Age, dtype: int64
```

In summary, `loc` is used when you want to select data based on labels (names), while `iloc` is used when you want to select data based on integer indices. Both are powerful tools for data selection in Pandas and are useful in different scenarios.
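
With the default index (0, 1, 2, ...) labels and positions happen to coincide, which can hide the difference. A minimal sketch with a hypothetical non-default integer index (10, 20, 30, 40) makes it visible:

```python
import pandas as pd

df = pd.DataFrame(
    {'Name': ['Alice', 'Bob', 'Charlie', 'David'], 'Age': [25, 30, 22, 28]},
    index=[10, 20, 30, 40],  # hypothetical labels that differ from positions
)

# loc looks up the row whose label is 20 (the second row).
print(df.loc[20, 'Age'])   # 30

# iloc looks up the row at position 2 (the third row); column 1 is 'Age'.
print(df.iloc[2, 1])       # 22

# Label slice 20:30 includes both endpoints.
print(df.loc[20:30, 'Age'].tolist())   # [30, 22]

# Position slice 1:3 excludes position 3, so it returns the same two rows.
print(df.iloc[1:3, 1].tolist())        # [30, 22]
```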



