``` Origin story of Pandas ``` :
1. In 2008, Wes McKinney began developing pandas while working at AQR Capital Management.
2. The first version of pandas was released in 2009.

``` Pandas ``` : 
pandas is a Python library for analyzing and manipulating data. It provides labeled data structures (Series and DataFrame) together with tools for data analysis.

``` Use Cases ``` :
1. Data Cleaning
2. Data Analysis
3. Data Transformation
4. Data Visualization (Basic level)
5. Data Aggregation
6. File Handling
7. Data Filtering and Selection
8. Time Series Analysis

``` Series ```:
A Series is a one-dimensional labeled array that can hold any data type (integers, strings, floats, Python objects, etc.). The axis labels are collectively called the index.


In [80]:
## multiple ways to create a Series.
import numpy as np
import pandas as pd


In [81]:
labels = ['a','b','c','d']
my_list = [10,20,30,40]
arr = np.array([10,20,30,40])
d = {'d':10,'b':20,'e':30,'d':40}  ## note: the key 'd' appears twice, so the later value (40) overwrites the first

In [82]:
pd.Series(my_list)

0    10
1    20
2    30
3    40
dtype: int64

In [83]:
pd.Series(data=my_list,index=labels)

a    10
b    20
c    30
d    40
dtype: int64

In [84]:
pd.Series(arr)

0    10
1    20
2    30
3    40
dtype: int32

In [85]:
pd.Series(d)

d    40
b    20
e    30
dtype: int64
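
Because a Series carries its index, values can be looked up by label, and arithmetic between two Series aligns on labels rather than on positions. A minimal sketch, assuming two made-up Series (`s1` and `s2` are illustrative names, not part of the cells above):

```python
import pandas as pd

s1 = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
s2 = pd.Series([1, 2, 3], index=['b', 'c', 'd'])

# Look up a single value by its label
b_value = s1['b']

# Arithmetic aligns on labels; a label present in only one Series gives NaN
total = s1 + s2

print(b_value)
print(total)
```

Labels 'a' and 'd' each exist in only one of the two Series, so their sums are missing values and the result becomes a float Series.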

``` DataFrames ``` :
A DataFrame is a 2-dimensional labeled data structure with columns of potentially different types. It is similar to a spreadsheet or a database table, or to a dict of Series objects that share the same index.

In [86]:
data = {
    "Name" : ["Ajay","Binod","Chotu","Dhruv"],
    "Age" : [28,34,29,42],
    "City" : ["Delhi","Mumbai","Pune","Chennai"],
    "Salary" : [61000,42000,30000,40000],
}

In [87]:
df=pd.DataFrame(data)
df

    Name  Age     City  Salary
0   Ajay   28    Delhi   61000
1  Binod   34   Mumbai   42000
2  Chotu   29     Pune   30000
3  Dhruv   42  Chennai   40000


In [88]:
data_list = [
    ['John', 28, 'New York', 65000],
    ['Anna', 34, 'Paris', 70000],
    ['Peter', 29, 'Berlin', 62000],
    ['Linda', 42, 'London', 85000]
]
df2 = pd.DataFrame(data_list)
df2

       0   1         2      3
0   John  28  New York  65000
1   Anna  34     Paris  70000
2  Peter  29    Berlin  62000
3  Linda  42    London  85000


In [89]:
columns = ['Name','Age','City','Salary']
df3 = pd.DataFrame(data_list,columns=columns)
df3

    Name  Age      City  Salary
0   John   28  New York   65000
1   Anna   34     Paris   70000
2  Peter   29    Berlin   62000
3  Linda   42    London   85000


In [90]:
df3['Name']

0     John
1     Anna
2    Peter
3    Linda
Name: Name, dtype: object

In [91]:
## to select multiple columns, pass a list of column names
df3[['Name','Salary']]

    Name  Salary
0   John   65000
1   Anna   70000
2  Peter   62000
3  Linda   85000


In [92]:
## to add new column
df3['Designation']= ['Manager','Clerk','Engineer','Doctor']
df3

    Name  Age      City  Salary  Designation
0   John   28  New York   65000      Manager
1   Anna   34     Paris   70000        Clerk
2  Peter   29    Berlin   62000     Engineer
3  Linda   42    London   85000       Doctor


In [93]:
## to remove a column (returns a new DataFrame; df3 itself is unchanged)
df3.drop('Designation',axis=1)

    Name  Age      City  Salary
0   John   28  New York   65000
1   Anna   34     Paris   70000
2  Peter   29    Berlin   62000
3  Linda   42    London   85000


In [94]:
## to remove a column in place (inplace=True modifies df3 itself)
df3.drop('Designation',axis=1,inplace=True)
df3

    Name  Age      City  Salary
0   John   28  New York   65000
1   Anna   34     Paris   70000
2  Peter   29    Berlin   62000
3  Linda   42    London   85000
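
Whether `drop` changes the original depends on `inplace`. A minimal sketch, assuming a small made-up frame (`frame` and `trimmed` are illustrative names), of the common alternative of keeping the returned copy instead of passing `inplace=True`. Here `columns=` is the keyword form of `axis=1`:

```python
import pandas as pd

frame = pd.DataFrame({
    'Name': ['John', 'Anna'],
    'Designation': ['Manager', 'Clerk'],
})

# drop returns a new DataFrame; equivalent to frame.drop('Designation', axis=1)
trimmed = frame.drop(columns=['Designation'])

print(list(frame.columns))    # the original still has both columns
print(list(trimmed.columns))  # the copy has only 'Name'
```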


In [95]:
## to select a row by its index label
df3.loc[0] ## loc is label-based: 0 here is the index label

Name          John
Age             28
City      New York
Salary       65000
Name: 0, dtype: object

In [96]:
## iloc is integer-position based: 0 here means the first row
df3.iloc[0]

Name          John
Age             28
City      New York
Salary       65000
Name: 0, dtype: object
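
With the default 0, 1, 2, ... index, `loc[0]` and `iloc[0]` return the same row, which hides the difference between them. A minimal sketch, assuming a made-up frame whose index labels are 10, 20, 30 (`people` is an illustrative name):

```python
import pandas as pd

people = pd.DataFrame(
    {'Name': ['John', 'Anna', 'Peter'], 'Age': [28, 34, 29]},
    index=[10, 20, 30],  # row labels that differ from positions 0, 1, 2
)

by_label = people.loc[10]      # the row whose index label is 10
by_position = people.iloc[0]   # the first row, whatever its label is

print(by_label['Name'], by_position['Name'])

# people.loc[0] would raise a KeyError here, because no row is labelled 0
```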

In [97]:
## subset of rows
df3.iloc[[0,1]] 

   Name  Age      City  Salary
0  John   28  New York   65000
1  Anna   34     Paris   70000


In [98]:
## subset of columns
df3[['Name','Age']]

    Name  Age
0   John   28
1   Anna   34
2  Peter   29
3  Linda   42


In [99]:
## subset of rows and columns
df3.loc[[0,1],['Name','Age']]

   Name  Age
0  John   28
1  Anna   34


In [100]:
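## chaining two selections also works for reading, but a single .loc[rows, columns] call is preferred
df3.loc[[2,3]][['Name','Age']]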

    Name  Age
2  Peter   29
3  Linda   42


In [101]:
## Conditional Selection
df3

    Name  Age      City  Salary
0   John   28  New York   65000
1   Anna   34     Paris   70000
2  Peter   29    Berlin   62000
3  Linda   42    London   85000


In [102]:
df3[df3['Age']>30]

    Name  Age    City  Salary
1   Anna   34   Paris   70000
3  Linda   42  London   85000


In [103]:
df3[(df3['Age']>30) & (df3['City'] == 'Paris')]

   Name  Age   City  Salary
1  Anna   34  Paris   70000
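
Boolean masks can also be combined with `|` (or), inverted with `~` (not), or built with `isin`. A minimal sketch, assuming a made-up frame that mirrors the data above (`staff` and the result names are illustrative). Each condition needs its own parentheses because `&`, `|` and `~` bind more tightly than comparisons:

```python
import pandas as pd

staff = pd.DataFrame({
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [28, 34, 29, 42],
    'City': ['New York', 'Paris', 'Berlin', 'London'],
})

# OR: rows that match either condition
either = staff[(staff['Age'] > 40) | (staff['City'] == 'Paris')]

# NOT: invert a boolean mask
not_paris = staff[~(staff['City'] == 'Paris')]

# isin: match against a list of allowed values
in_cities = staff[staff['City'].isin(['Berlin', 'London'])]

print(either)
print(not_paris)
print(in_cities)
```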


``` Missing Data ``` :

In [104]:
data = {
    'A': [1, 2, np.nan, 4, 5],
    'B': [np.nan, 2, 3, 4, 5],
    'C': [1, 2, 3, np.nan, np.nan],
    'D': [1, np.nan, np.nan, np.nan, 5]
}
df = pd.DataFrame(data)
df

     A    B    C    D
0  1.0  NaN  1.0  1.0
1  2.0  2.0  2.0  NaN
2  NaN  3.0  3.0  NaN
3  4.0  4.0  NaN  NaN
4  5.0  5.0  NaN  5.0


In [105]:
df.isna()

       A      B      C      D
0  False   True  False  False
1  False  False  False   True
2   True  False  False   True
3  False  False   True   True
4  False  False   True  False


In [106]:
df.isna().sum()

A    1
B    1
C    2
D    3
dtype: int64

In [107]:
## to check which columns have null values
df.isna().any()

A    True
B    True
C    True
D    True
dtype: bool

In [109]:
## to remove null values
## dropna works row by row by default: any row containing a null is dropped
df.dropna() ## this drops every row, because every row has at least one null value

Empty DataFrame
Columns: [A, B, C, D]
Index: []


In [None]:
data = {
    'A': [1, 2, np.nan, 4, 5],
    'B': [1, 2, 3, 4, 5],
    'C': [1, 2, 3, np.nan, np.nan],
    'D': [1, np.nan, np.nan, np.nan, 5]
}
df = pd.DataFrame(data)
df
df.dropna() ## here only row 0 has no null values, so it remains and the other rows are removed

     A  B    C    D
0  1.0  1  1.0  1.0


In [111]:
df

     A  B    C    D
0  1.0  1  1.0  1.0
1  2.0  2  2.0  NaN
2  NaN  3  3.0  NaN
3  4.0  4  NaN  NaN
4  5.0  5  NaN  5.0


In [114]:
df.dropna(thresh=3)
## thresh sets the minimum number of non-null values a row needs in order to be kept
## here thresh=3 means a row must contain at least 3 non-null values, otherwise it is removed

     A  B    C    D
0  1.0  1  1.0  1.0
1  2.0  2  2.0  NaN
4  5.0  5  NaN  5.0
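
`dropna` has a few other switches besides `thresh`. A minimal sketch, rebuilding the same data under the illustrative name `frame`, of dropping columns instead of rows (`axis=1`), checking only selected columns (`subset`), and dropping only rows that are entirely null (`how='all'`):

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({
    'A': [1, 2, np.nan, 4, 5],
    'B': [1, 2, 3, 4, 5],
    'C': [1, 2, 3, np.nan, np.nan],
    'D': [1, np.nan, np.nan, np.nan, 5],
})

# Drop every column that contains at least one null value
complete_cols = frame.dropna(axis=1)

# Drop rows only when column 'A' is null; nulls elsewhere are ignored
a_present = frame.dropna(subset=['A'])

# Drop rows only when all of their values are null
not_all_null = frame.dropna(how='all')

print(complete_cols)
print(a_present)
print(not_all_null)
```

Only column B has no nulls, so it is the only column that survives `axis=1`. No row is entirely null, so `how='all'` keeps all five rows.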


In [115]:
df

     A  B    C    D
0  1.0  1  1.0  1.0
1  2.0  2  2.0  NaN
2  NaN  3  3.0  NaN
3  4.0  4  NaN  NaN
4  5.0  5  NaN  5.0


In [118]:
## to fill missing values with a single constant
df.fillna(0)

     A  B    C    D
0  1.0  1  1.0  1.0
1  2.0  2  2.0  0.0
2  0.0  3  3.0  0.0
3  4.0  4  0.0  0.0
4  5.0  5  0.0  5.0


In [None]:
## to fill null values with a different value for each column, pass a dict of column -> fill value
values = {'A':100 ,'B':200 ,'C': 300,'D': 400}
df.fillna(values)

       A  B      C      D
0    1.0  1    1.0    1.0
1    2.0  2    2.0  400.0
2  100.0  3    3.0  400.0
3    4.0  4  300.0  400.0
4    5.0  5  300.0    5.0
