# Pandas 


## Data Structures with Pandas

In [7]:
import pandas as pd
import numpy as np

Pandas has two main data structures:
- DataFrame, which is two-dimensional
- Series, which is one-dimensional
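
As a minimal sketch of the difference between the two structures, the snippet below builds a small Series and shows that selecting a single column of a DataFrame yields a Series. The variable names (`ages`, `people`, `age_column`) and the sample values are illustrative only.

```python
import pandas as pd

# A Series is a one-dimensional labeled array (illustrative values).
ages = pd.Series([25, 30, 35], index=['Alice', 'Bob', 'Charlie'], name='Age')
print(ages['Bob'])   # label-based access to a single value
print(ages.ndim)     # number of dimensions of a Series

# A DataFrame is two-dimensional; one of its columns is a Series.
people = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]})
age_column = people['Age']
print(type(age_column).__name__)
print(people.ndim)
```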

### Pandas DataFrame

In [4]:
# Creating a DataFrame from a list of lists (each inner list becomes one row)

myListe = [['Alice', 'Bob', 'Charlie'], [25, 30, 35], ['New York', 'Los Angeles', 'Chicago']]

myDataFrame = pd.DataFrame(myListe)
myDataFrame


,0,1,2
0,Alice,Bob,Charlie
1,25,30,35
2,New York,Los Angeles,Chicago


Each inner list becomes one row of the DataFrame. As we haven't supplied any column labels, Pandas automatically assigns integer labels (0, 1, 2, ...) to the columns.
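
The default labels can be inspected, and replaced after the fact, through the `columns` attribute. The following is a small sketch; the names `frame` and `default_labels` and the labels `'first'`, `'second'`, `'third'` are placeholders chosen for illustration.

```python
import pandas as pd

rows = [['Alice', 'Bob', 'Charlie'], [25, 30, 35]]
frame = pd.DataFrame(rows)

# Without explicit labels, the columns are numbered from 0.
default_labels = list(frame.columns)
print(default_labels)

# A column is then selected by its integer label.
print(frame[0].tolist())

# Labels can also be assigned after construction (placeholder names).
frame.columns = ['first', 'second', 'third']
print(list(frame.columns))
```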

In [6]:
# Let's create another DataFrame, this time with customized column labels.
# Each inner list is one row, so it must hold one person's name, age and city
# for the labels to match the data.
myListe = [['Alice', 25, 'New York'], ['Bob', 30, 'Los Angeles'], ['Charlie', 35, 'Chicago']]

myDataFrame = pd.DataFrame(myListe, columns=['names', 'ages', 'cities'])
myDataFrame

,names,ages,cities
0,Alice,25,New York
1,Bob,30,Los Angeles
2,Charlie,35,Chicago


### Converting a NumPy array into a DataFrame

In [9]:
my_array = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
myDataFrame = pd.DataFrame(my_array,columns=['col1','col2','col3'])
print(myDataFrame)

   col1  col2  col3
0     1     2     3
1     4     5     6
2     7     8     9
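
The conversion also works in the other direction. As a brief sketch (the names `frame` and `round_trip` are illustrative), `to_numpy()` returns the values of a DataFrame as a NumPy array, which here matches the array we started from:

```python
import numpy as np
import pandas as pd

my_array = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
frame = pd.DataFrame(my_array, columns=['col1', 'col2', 'col3'])

# The DataFrame keeps the shape of the source array.
print(frame.shape)

# to_numpy() converts the values back into a NumPy array.
round_trip = frame.to_numpy()
print(np.array_equal(round_trip, my_array))
```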


### Creating a DataFrame using a dictionary


- We can also pass a dictionary to the pandas.DataFrame() function to create a DataFrame. Each key becomes a column label, and each value supplies the data for that column.

In [11]:
names = ['Alice', 'Bob', 'Charlie']
ages = [25, 30, 35]
cities = ['New York', 'Los Angeles', 'Chicago']

data = {
    'Name': names,
    'Age': ages,
    'City': cities
}
myDataFrame = pd.DataFrame(data)
print(myDataFrame)

      Name  Age         City
0    Alice   25     New York
1      Bob   30  Los Angeles
2  Charlie   35      Chicago
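
A closely related input is a list of dictionaries, where each dictionary describes one row instead of one column. The sketch below builds the same table both ways; `by_column`, `records` and `by_row` are names introduced here for illustration.

```python
import pandas as pd

# Column-oriented input: one key per column.
by_column = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago'],
})

# Row-oriented input: one dictionary per row.
records = [
    {'Name': 'Alice', 'Age': 25, 'City': 'New York'},
    {'Name': 'Bob', 'Age': 30, 'City': 'Los Angeles'},
    {'Name': 'Charlie', 'Age': 35, 'City': 'Chicago'},
]
by_row = pd.DataFrame(records)

print(list(by_row.columns))
print(by_row.equals(by_column))  # compares labels, dtypes and values
```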


### Loading a csv file as a DataFrame

In [17]:
# Forward slashes work on every operating system and avoid the invalid '\c' escape sequence
df = pd.read_csv('Data/cereals.csv')
df

,name,calories,protein,vitamins,rating
0,100% Bran,70,4,25,68.402973
1,100% Natural Bran,120,3,0,33.983679
2,All-Bran,70,4,25,59.425505
3,All-Bran with Extra Fiber,50,4,25,93.704912
4,Almond Delight,110,2,25,34.384843
5,Apple Cinnamon Cheerios,110,2,25,29.509541
6,Apple Jacks,110,2,25,33.174094
7,Basic 4,130,3,25,37.038562
8,Bran Chex,90,2,25,49.120253
9,Bran Flakes,90,3,25,53.313813
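
If the file is not at hand, `read_csv` can be tried on in-memory text instead, since it accepts file-like objects as well as paths. The sketch below copies three rows from the table above; the header line is an assumption about how `Data/cereals.csv` is laid out, based on the columns displayed.

```python
import io
import pandas as pd

# Assumed layout of the CSV file, with three rows taken from the output above.
csv_text = """name,calories,protein,vitamins,rating
100% Bran,70,4,25,68.402973
100% Natural Bran,120,3,0,33.983679
All-Bran,70,4,25,59.425505
"""

sample = pd.read_csv(io.StringIO(csv_text))
print(sample.shape)
print(sample.dtypes)  # read_csv infers a type for each column
```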


We can make one of the existing columns the DataFrame's index using the .set_index() method. Passing inplace=True modifies df directly instead of returning a new DataFrame.

In [19]:
df.set_index('name',inplace=True)
df

name,calories,protein,vitamins,rating
100% Bran,70,4,25,68.402973
100% Natural Bran,120,3,0,33.983679
All-Bran,70,4,25,59.425505
All-Bran with Extra Fiber,50,4,25,93.704912
Almond Delight,110,2,25,34.384843
Apple Cinnamon Cheerios,110,2,25,29.509541
Apple Jacks,110,2,25,33.174094
Basic 4,130,3,25,37.038562
Bran Chex,90,2,25,49.120253
Bran Flakes,90,3,25,53.313813
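
Once the names form the index, rows can be looked up by label with `.loc`. The sketch below uses three rows from the output above and calls `set_index()` without `inplace=True`, so the original DataFrame is left unchanged; `cereals`, `indexed` and `restored` are illustrative names.

```python
import pandas as pd

cereals = pd.DataFrame({
    'name': ['100% Bran', '100% Natural Bran', 'All-Bran'],
    'calories': [70, 120, 70],
    'rating': [68.402973, 33.983679, 59.425505],
})

# Without inplace=True, set_index() returns a new DataFrame.
indexed = cereals.set_index('name')

# Look up a single value by row label and column label.
print(indexed.loc['All-Bran', 'calories'])

# reset_index() turns the index back into an ordinary column.
restored = indexed.reset_index()
print(list(restored.columns))
```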


head() returns the first rows of a DataFrame (five by default).

In [20]:
# The first five rows
df.head()

name,calories,protein,vitamins,rating
100% Bran,70,4,25,68.402973
100% Natural Bran,120,3,0,33.983679
All-Bran,70,4,25,59.425505
All-Bran with Extra Fiber,50,4,25,93.704912
Almond Delight,110,2,25,34.384843


tail() returns the last rows of a DataFrame (five by default).

In [21]:
# The last five rows
df.tail()

name,calories,protein,vitamins,rating
Apple Cinnamon Cheerios,110,2,25,29.509541
Apple Jacks,110,2,25,33.174094
Basic 4,130,3,25,37.038562
Bran Chex,90,2,25,49.120253
Bran Flakes,90,3,25,53.313813
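
Both methods accept the number of rows to return as an optional argument. A short sketch, using the calories column from the table above (the names `calories`, `first_three` and `last_two` are illustrative):

```python
import pandas as pd

calories = pd.DataFrame({'calories': [70, 120, 70, 50, 110, 110, 110, 130, 90, 90]})

first_three = calories.head(3)  # first three rows
last_two = calories.tail(2)     # last two rows

print(first_three)
print(last_two)
print(len(calories.head()))     # the default is five rows
```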


### Statistical summary


- We can use the describe() method to obtain a quick statistical summary (count, mean, standard deviation, minimum, quartiles and maximum) of each numeric column in the DataFrame.

In [22]:
df.describe()

,calories,protein,vitamins,rating
count,10.0,10.0,10.0,10.0
mean,95.0,2.9,22.5,49.205817
std,25.495098,0.875595,7.905694,20.315297
min,50.0,2.0,0.0,29.509541
25%,75.0,2.0,25.0,34.08397
50%,100.0,3.0,25.0,43.079408
75%,110.0,3.75,25.0,57.897582
max,130.0,4.0,25.0,93.704912
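
The result of `describe()` is itself a DataFrame, so individual statistics can be read from it with `.loc`, or computed directly with methods such as `mean()` and `median()`. The sketch below rebuilds two of the columns from the cereal table; `cereals` and `summary` are illustrative names.

```python
import pandas as pd

cereals = pd.DataFrame({
    'calories': [70, 120, 70, 50, 110, 110, 110, 130, 90, 90],
    'protein': [4, 3, 4, 4, 2, 2, 2, 3, 2, 3],
})

summary = cereals.describe()

# Read single statistics from the summary table.
print(summary.loc['mean', 'calories'])
print(summary.loc['std', 'calories'])

# The same figures are available as individual methods.
print(cereals['calories'].mean())
print(cereals['protein'].median())  # same as the 50% row of describe()
```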
