# Pandas
Pandas is a Python library used for working with data sets.

It has functions for analyzing, cleaning, exploring, and manipulating data.

The name "Pandas" refers to both "Panel Data" and "Python Data Analysis". The library was created by Wes McKinney in 2008.

## Why Pandas:
Pandas lets us analyze large data sets and draw conclusions based on statistical theory.

Pandas can clean messy data sets, and make them readable and relevant.

Relevant data is very important in data science.

### Importing Pandas

In [1]:
import pandas as pd

In [2]:
# pandas version
print(pd.__version__)

1.3.4


# DataFrames:
A pandas DataFrame is a two-dimensional data structure, like a two-dimensional array or a table with rows and columns.

## Creating DataFrames:

In [3]:
data = {
  "calories": [420, 380, 390],
  "duration": [50, 40, 45]
}

#load data into a DataFrame object:
df = pd.DataFrame(data)

print(df) 

   calories  duration
0       420        50
1       380        40
2       390        45


### Named Indexes

In [4]:
data = {
  "calories": [420, 380, 390],
  "duration": [50, 40, 45]
}

df = pd.DataFrame(data, index = ["day1", "day2", "day3"])

print(df)
print("\n")
print(df.loc["day2"])

      calories  duration
day1       420        50
day2       380        40
day3       390        45


calories    380
duration     40
Name: day2, dtype: int64


### DataFrame from a dict of ndarrays / lists:

In [5]:
# initialise data as a dict of lists.
data = {'Name':
               ['Tom', 'nick', 'krish', 'jack'],
        'Age':
               [20, 21, 19, 18]}
 
# Create DataFrame
df = pd.DataFrame(data)
 
# Print the output.
print(df)

    Name  Age
0    Tom   20
1   nick   21
2  krish   19
3   jack   18


## Adding new column to existing DataFrame in Pandas

### By declaring a new list as a column. 

In [41]:
# Define a dictionary containing Students data
data = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
		'Height': [5.1, 6.2, 5.1, 5.2],
		'Qualification': ['Msc', 'MA', 'Msc', 'Msc']}

# Convert the dictionary into DataFrame
df = pd.DataFrame(data)

# Declare a list that is to be converted into a column
address = ['Delhi', 'Bangalore', 'Chennai', 'Patna']

# Using 'Address' as the column name
# and equating it to the list
df['Address'] = address

# Observe the result
print(df)


     Name  Height Qualification    Address
0     Jai     5.1           Msc      Delhi
1  Princi     6.2            MA  Bangalore
2  Gaurav     5.1           Msc    Chennai
3    Anuj     5.2           Msc      Patna


### By using DataFrame.insert()

In [46]:
# Define a dictionary containing Students data
data = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
		'Height': [5.1, 6.2, 5.1, 5.2],
		'Qualification': ['Msc', 'MA', 'Msc', 'Msc']}

# Convert the dictionary into DataFrame
df = pd.DataFrame(data)
# df = pd.DataFrame(data,index=['w','x','y','z'])

# Using DataFrame.insert() to add the "Age" column at position 2
df.insert(2, "Age", [21, 23, 24, 21], allow_duplicates=True)

# Observe the result
print(df)


     Name  Height  Age Qualification
0     Jai     5.1   21           Msc
1  Princi     6.2   23            MA
2  Gaurav     5.1   24           Msc
3    Anuj     5.2   21           Msc


### By using a dictionary

In [73]:
data = {'Name': ['Jai', 'Pranay', 'Gaurav', 'Anuj'],
        'Height': [5.1, 6.2, 5.1, 5.2],
        'Qualification': ['Msc', 'MCS', 'MBA', 'BDA']}
  
# Define a dictionary whose keys are the values for the new column.
# In the pandas 1.3.4 run shown here, assigning a dict to a column
# uses its keys in insertion order; the name values are not matched
# against the 'Name' column.
address = {'Delhi': 'Jai', 'Bangalore': 'Pranay',
           'Patna': 'Gaurav', 'Chennai': 'Anuj'}
  
# Convert the dictionary into DataFrame
df = pd.DataFrame(data)
  
# Provide 'Address' as the column name
df['Address'] = address
  
# Observe the output
print(df)

     Name  Height Qualification    Address
0     Jai     5.1           Msc      Delhi
1  Pranay     6.2           MCS  Bangalore
2  Gaurav     5.1           MBA      Patna
3    Anuj     5.2           BDA    Chennai
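
The dict assignment above depends on the order of the dict's keys rather than on matching names, and later pandas versions may align a dict by index label instead. A minimal sketch of an alternative, assuming a hypothetical `city_by_name` mapping keyed by the values in `Name` (it is not part of the original notebook), looks up each row's address with `Series.map`:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Jai", "Pranay", "Gaurav", "Anuj"],
    "Height": [5.1, 6.2, 5.1, 5.2],
})

# Hypothetical mapping for illustration: name -> city
city_by_name = {"Jai": "Delhi", "Pranay": "Bangalore",
                "Gaurav": "Patna", "Anuj": "Chennai"}

# map() looks up each Name in the dict; names missing from the dict become NaN
df["Address"] = df["Name"].map(city_by_name)
print(df)
```

Because the lookup is by name, the result does not change if the rows or the dict entries are reordered.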


### Using the DataFrame.assign() method

In [63]:
df2 = df.assign(address=['Delhi', 'Bangalore', 'Chennai', 'Patna'])
  
# Observe the result
print(df2)

     Name  Height Qualification    Address    address
0     Jai     5.1           Msc      Delhi      Delhi
1  Pranay     6.2           MCS  Bangalore  Bangalore
2  Gaurav     5.1           MBA      Patna    Chennai
3    Anuj     5.2           BDA    Chennai      Patna
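
Note that `assign()` returns a new DataFrame instead of modifying the one it is called on, which is why the result above is stored in `df2`. A small sketch with made-up data (the `City` column is a hypothetical name used only for illustration):

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Jai", "Pranay"], "Height": [5.1, 6.2]})

# assign() builds and returns a new DataFrame with the extra column
df2 = df.assign(City=["Delhi", "Bangalore"])

print(df.columns.tolist())   # original columns only
print(df2.columns.tolist())  # original columns plus City
```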


## Delete Columns: drop()

In [72]:
print(df)
# drop() returns a new DataFrame; df itself keeps the 'Address' column
df2 = df.drop(['Address'], axis=1)
df2

     Name  Height Qualification    Address
0     Jai     5.1           Msc      Delhi
1  Pranay     6.2           MCS  Bangalore
2  Gaurav     5.1           MBA      Patna
3    Anuj     5.2           BDA    Chennai


     Name  Height Qualification
0     Jai     5.1           Msc
1  Pranay     6.2           MCS
2  Gaurav     5.1           MBA
3    Anuj     5.2           BDA
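
As a hedged aside, `drop()` also accepts a `columns=` keyword, which reads a little more clearly than `axis=1`. The sketch below uses a small made-up frame and shows that the original DataFrame is left untouched:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Jai", "Pranay"],
    "Height": [5.1, 6.2],
    "Address": ["Delhi", "Bangalore"],
})

# Equivalent to df.drop(["Address"], axis=1)
df2 = df.drop(columns=["Address"])

print(df2)
print("Address" in df.columns)  # the original frame still has the column
```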


### truncate()

In [77]:
print(df2)

# Keep only the rows with index labels 0 through 2 (integer labels, not strings)
result = df.truncate(before=0, after=2)
  
# Print the result
print(result)

     Name  Height Qualification
0     Jai     5.1           Msc
1  Pranay     6.2           MCS
2  Gaurav     5.1           MBA
3    Anuj     5.2           BDA
     Name  Height Qualification    Address
0     Jai     5.1           Msc      Delhi
1  Pranay     6.2           MCS  Bangalore
2  Gaurav     5.1           MBA      Patna


## Dealing with Rows and Columns

### Column selection:

In [39]:

# Define a dictionary containing employee data
data = {'Name':['Jai', 'Pranay', 'Gaurav', 'Anuj'],
        'Age':[27, 24, 22, 32],
        'Address':['Delhi', 'Kanpur', 'Pune', 'Kannauj'],
        'Qualification':['Msc', 'MA', 'MCA', 'Phd']}
 
# Convert the dictionary into DataFrame 
df = pd.DataFrame(data)
 
# select two columns
print(df[['Name', 'Qualification']])

     Name Qualification
0     Jai           Msc
1  Pranay            MA
2  Gaurav           MCA
3    Anuj           Phd


### Row Selection:

In [7]:
data = {
  "calories": [420, 380, 390],
  "duration": [50, 40, 45]
}

#load data into a DataFrame object:
df = pd.DataFrame(data)

print(df)
print("\n")
print(df.loc[[0, 1]])
print("\n")
print(df.loc[2])

   calories  duration
0       420        50
1       380        40
2       390        45


   calories  duration
0       420        50
1       380        40


calories    390
duration     45
Name: 2, dtype: int64


## Indexing and Selecting Data:
Indexing in pandas means simply selecting particular rows and columns of data from a DataFrame. Indexing could mean selecting all the rows and some of the columns, some of the rows and all of the columns, or some of each of the rows and columns. Indexing can also be known as Subset Selection.
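
The three kinds of subset selection described above can be sketched with `loc`, reusing the small calories/duration data from earlier sections:

```python
import pandas as pd

df = pd.DataFrame(
    {"calories": [420, 380, 390], "duration": [50, 40, 45]},
    index=["day1", "day2", "day3"],
)

# All of the rows and some of the columns
some_cols = df.loc[:, ["calories"]]

# Some of the rows and all of the columns
some_rows = df.loc[["day1", "day3"], :]

# Some of each: one row label and one column label give a single value
cell = df.loc["day2", "duration"]

print(some_cols)
print(some_rows)
print(cell)
```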

### Selecting a single column

In [8]:
data = {
  "calories": [420, 380, 390],
  "duration": [50, 40, 45]
}

df = pd.DataFrame(data, index = ["day1", "day2", "day3"])

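# select the column from the DataFrame (not from the original dict)
first = df["calories"]

print(first)

day1    420
day2    380
day3    390
Name: calories, dtype: int64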


### Selecting a single row

In [9]:
data = {
  "calories": [420, 380, 390],
  "duration": [50, 40, 45]
}

df = pd.DataFrame(data, index = ["day1", "day2", "day3"])

first = df.loc["day1"]
second = df.loc["day2"]


print(first,"\n",second) 

calories    420
duration     50
Name: day1, dtype: int64 
 calories    380
duration     40
Name: day2, dtype: int64


### Selecting a row using iloc

In [10]:
row2 = df.iloc[2] 
print(row2)


calories    390
duration     45
Name: day3, dtype: int64
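
To make the difference explicit: `loc` selects by index label, while `iloc` selects by integer position. A short sketch using the same day-labelled data:

```python
import pandas as pd

df = pd.DataFrame(
    {"calories": [420, 380, 390], "duration": [50, 40, 45]},
    index=["day1", "day2", "day3"],
)

by_label = df.loc["day3"]    # row whose label is "day3"
by_position = df.iloc[2]     # third row, whatever its label is

# Slices differ too: iloc excludes the end position, loc includes the end label
first_two_by_position = df.iloc[0:2]
first_two_by_label = df.loc["day1":"day2"]

print(by_label.equals(by_position))
print(first_two_by_position.equals(first_two_by_label))
```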


## Loading a CSV into a DataFrame

In [11]:
df = pd.read_csv('C:/Users/Pranay/Downloads/penguins.csv')

print(df.head())
#print(df.to_string())

      Island  CulmenLength  CulmenDepth  FlipperLength  BodyMass  Species
0  Torgersen          39.1         18.7          181.0    3750.0        0
1  Torgersen          39.5         17.4          186.0    3800.0        0
2  Torgersen          40.3         18.0          195.0    3250.0        0
3  Torgersen           NaN          NaN            NaN       NaN        0
4  Torgersen          36.7         19.3          193.0    3450.0        0
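
For readers without the file at hand, `read_csv` can be tried on in-memory text. The sketch below is an illustration only: the three rows are a hypothetical sample that borrows the column names from the output above, and `io.StringIO` stands in for the file.

```python
import io
import pandas as pd

# Hypothetical inline sample; empty fields play the role of missing values
csv_text = """Island,CulmenLength,CulmenDepth,FlipperLength,BodyMass,Species
Torgersen,39.1,18.7,181,3750,0
Torgersen,,,,,0
Dream,50.8,19.0,210,4100,2
"""

# read_csv accepts a file path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))

print(df.head())
print(df.dtypes)
```

Empty fields are read as `NaN`, so numeric columns that contain them come back as floats.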


## Working with Missing Values

In [12]:
df = pd.read_csv('C:/Users/Pranay/Downloads/penguins.csv')

# using isnull() function  
df.isnull()

     Island  CulmenLength  CulmenDepth  FlipperLength  BodyMass  Species
0     False         False        False          False     False    False
1     False         False        False          False     False    False
2     False         False        False          False     False    False
3     False          True         True           True      True    False
4     False         False        False          False     False    False
..      ...           ...          ...            ...       ...      ...
339   False         False        False          False     False    False
340   False         False        False          False     False    False
341   False         False        False          False     False    False
342   False         False        False          False     False    False


In [13]:
#fill missing values using fillna()
df = pd.read_csv('C:/Users/Pranay/Downloads/penguins.csv')
df.fillna(0)

        Island  CulmenLength  CulmenDepth  FlipperLength  BodyMass  Species
0    Torgersen          39.1         18.7          181.0    3750.0        0
1    Torgersen          39.5         17.4          186.0    3800.0        0
2    Torgersen          40.3         18.0          195.0    3250.0        0
3    Torgersen           0.0          0.0            0.0       0.0        0
4    Torgersen          36.7         19.3          193.0    3450.0        0
..         ...           ...          ...            ...       ...      ...
339      Dream          55.8         19.8          207.0    4000.0        2
340      Dream          43.5         18.1          202.0    3400.0        2
341      Dream          49.6         18.2          193.0    3775.0        2
342      Dream          50.8         19.0          210.0    4100.0        2


In [14]:
#dropping null values using dropna()
df = pd.read_csv('C:/Users/Pranay/Downloads/penguins.csv')

df.dropna()


        Island  CulmenLength  CulmenDepth  FlipperLength  BodyMass  Species
0    Torgersen          39.1         18.7          181.0    3750.0        0
1    Torgersen          39.5         17.4          186.0    3800.0        0
2    Torgersen          40.3         18.0          195.0    3250.0        0
4    Torgersen          36.7         19.3          193.0    3450.0        0
5    Torgersen          39.3         20.6          190.0    3650.0        0
..         ...           ...          ...            ...       ...      ...
339      Dream          55.8         19.8          207.0    4000.0        2
340      Dream          43.5         18.1          202.0    3400.0        2
341      Dream          49.6         18.2          193.0    3775.0        2
342      Dream          50.8         19.0          210.0    4100.0        2
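
Two points are worth spelling out here: `fillna()` and `dropna()` return new DataFrames and leave `df` unchanged, and filling with 0 is not always a sensible default. The sketch below uses a tiny made-up frame (not the penguins file) to count missing values per column, fill one column with its mean, and drop rows based on a single column:

```python
import numpy as np
import pandas as pd

# Made-up sample with one missing measurement
df = pd.DataFrame({
    "Island": ["Torgersen", "Torgersen", "Dream"],
    "BodyMass": [3750.0, np.nan, 4100.0],
})

# Count the missing values in each column
missing_counts = df.isnull().sum()

# Fill missing BodyMass values with the column mean instead of 0
filled = df.fillna({"BodyMass": df["BodyMass"].mean()})

# Drop only the rows where BodyMass is missing
dropped = df.dropna(subset=["BodyMass"])

print(missing_counts)
print(filled)
print(dropped)
```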


# Iterating over rows and columns
Iteration is a general term for taking each item of something, one after another.

A pandas DataFrame consists of rows and columns. Iterating over a DataFrame directly behaves like iterating over a dictionary: it yields the column labels. To visit the rows, use a method such as iterrows().

## Iterating over rows:

In [15]:
df = pd.read_csv('C:/Users/Pranay/Downloads/penguins.csv')
for i, j in df.iterrows():
    print(i, j)
    print()

0 Island           Torgersen
CulmenLength          39.1
CulmenDepth           18.7
FlipperLength        181.0
BodyMass            3750.0
Species                  0
Name: 0, dtype: object

1 Island           Torgersen
CulmenLength          39.5
CulmenDepth           17.4
FlipperLength        186.0
BodyMass            3800.0
Species                  0
Name: 1, dtype: object

2 Island           Torgersen
CulmenLength          40.3
CulmenDepth           18.0
FlipperLength        195.0
BodyMass            3250.0
Species                  0
Name: 2, dtype: object

3 Island           Torgersen
CulmenLength           NaN
CulmenDepth            NaN
FlipperLength          NaN
BodyMass               NaN
Species                  0
Name: 3, dtype: object

4 Island           Torgersen
CulmenLength          36.7
CulmenDepth           19.3
FlipperLength        193.0
BodyMass            3450.0
Species                  0
Name: 4, dtype: object

[... output for the remaining rows truncated ...]
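
`iterrows()` hands back every row as a Series, which is why each row in the output above reports `dtype: object`. As a hedged alternative sketch, `itertuples()` yields one namedtuple per row and is generally the lighter option when only the values are needed. The two-row frame below is made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "Island": ["Torgersen", "Biscoe"],
    "BodyMass": [3750.0, 3700.0],
})

# Each row arrives as a namedtuple: Index first, then one field per column
for row in df.itertuples():
    print(row.Index, row.Island, row.BodyMass)

# index=False leaves the index out of the tuples
rows = list(df.itertuples(index=False))
```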

## Iterating over Columns:

In [16]:
df = pd.read_csv('C:/Users/Pranay/Downloads/penguins.csv')
df.head()


      Island  CulmenLength  CulmenDepth  FlipperLength  BodyMass  Species
0  Torgersen          39.1         18.7          181.0    3750.0        0
1  Torgersen          39.5         17.4          186.0    3800.0        0
2  Torgersen          40.3         18.0          195.0    3250.0        0
3  Torgersen           NaN          NaN            NaN       NaN        0
4  Torgersen          36.7         19.3          193.0    3450.0        0


In [17]:
# creating a list of dataframe columns
columns = list(df)
 
for i in columns:
 
    # printing the fourth element (index label 3) of each column
    print(df[i][3])

Torgersen
nan
nan
nan
nan
0
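
The same loop can also be written with `DataFrame.items()`, which yields each column name together with the column as a Series, much like `dict.items()`. A minimal sketch on the small calories/duration data from earlier:

```python
import pandas as pd

df = pd.DataFrame({"calories": [420, 380, 390], "duration": [50, 40, 45]})

# items() yields (column name, column Series) pairs
for name, column in df.items():
    print(name, column.iloc[0])

# Iterating over the DataFrame itself yields only the column names
names = [name for name in df]
```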


# Series
A pandas Series is a one-dimensional labeled array capable of holding data of any type (integer, string, float, Python objects, etc.).

A Series is like a column in a table.



In [23]:
a = [1, 7, 2]

myvar = pd.Series(a)

print(myvar)

0    1
1    7
2    2
dtype: int64


## Creating Series

In [25]:
import pandas as pd
 
# import numpy as np
import numpy as np
 
# simple array
data = np.array(['p','r','a','n','a','y'])
 
ser = pd.Series(data)
print(ser)

0    p
1    r
2    a
3    n
4    a
5    y
dtype: object


In [26]:
#Creating a series from Lists:
# (named letters so the built-in list is not shadowed)
letters = ['g', 'e', 'e', 'k', 's']
  
# create series from a list
ser = pd.Series(letters)
print(ser)

0    g
1    e
2    e
3    k
4    s
dtype: object


In [34]:
#Creating a series from Dictionary:

# (named data so the built-in dict is not shadowed)
data = {'Geeks': 10,
        'for': 20,
        'geeks': 30}

# create series from dictionary
ser = pd.Series(data)

print(ser)


Geeks    10
for      20
geeks    30
dtype: int64


In [35]:
#Creating a series from Scalar value: 
ser = pd.Series(10, index=[0, 1, 2, 3, 4, 5])
 
print(ser)

0    10
1    10
2    10
3    10
4    10
5    10
dtype: int64


In [36]:
#Creating a Series using range function:
import pandas as pd
ser=pd.Series(range(10))
print(ser)


0    0
1    1
2    2
3    3
4    4
5    5
6    6
7    7
8    8
9    9
dtype: int64


## Create Labels

In [29]:
a = [1, 7, 2]

myvar = pd.Series(a, index = ["x", "y", "z"])

print(myvar)


x    1
y    7
z    2
dtype: int64


In [30]:
print(myvar["y"])

7


### Key/Value Objects as Series

In [51]:
calories = {"day1": 420, "day2": 380, "day3": 390}

myvar = pd.Series(calories)

print(myvar)


day1    420
day2    380
day3    390
dtype: int64


In [52]:
calories = {"day1": 420, "day2": 380, "day3": 390}

myvar = pd.Series(calories, index = ["day1", "day2"])

print(myvar)

day1    420
day2    380
dtype: int64
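
When the `index` argument names a label that is not a key in the dictionary, pandas keeps the label and fills it with `NaN`. A short sketch, where `"day4"` is a hypothetical label added only to show this:

```python
import pandas as pd

calories = {"day1": 420, "day2": 380, "day3": 390}

# "day4" is not a key in the dict, so its value is missing in the result
ser = pd.Series(calories, index=["day1", "day4"])

print(ser)
```

Because `NaN` is a float, the resulting Series has a float dtype even though the dict values are integers.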
