![](logo.png)
## Day Objectives

# Pandas
- Pandas is an open-source Python library for data analysis (it is not part of the Python standard library; install it with `pip install pandas`). You'll use Pandas heavily for data manipulation, visualisation, preparing data for machine learning models, etc.
- Pandas implements a number of powerful data operations familiar to users of both database frameworks and spreadsheet programs.
- Pandas has two main data structures: Series (one-dimensional, labelled) and DataFrame (two-dimensional, with labelled rows and columns). Tabular data is usually stored in a DataFrame, so manipulating DataFrames quickly is probably the most important skill for data analysis.
- Source: https://pandas.pydata.org/pandas-docs/stable/overview.html

In [1]:
import pandas as pd

# Pandas Series
- We can create a Series from a
    - list
    - tuple
    - dict
    - NumPy array
    - date range (`pd.date_range` returns a `DatetimeIndex`, which can be wrapped in `pd.Series`)

In [2]:
pd.__version__

'1.0.5'

In [3]:
# Pandas Series using List
li = [2,3,45,56,56]
s1 = pd.Series(li)
s1

0     2
1     3
2    45
3    56
4    56
dtype: int64

In [4]:
type(s1)

pandas.core.series.Series

In [5]:
# using tuple
t = (2,2.3,4.56,"apssdc")
s2 = pd.Series(t)
s2

0         2
1       2.3
2      4.56
3    apssdc
dtype: object

In [6]:
# using Dict
di = {"Name":"Swathi","Pin":"0006","College":"BVC"}
s3 = pd.Series(di)
s3

Name       Swathi
Pin          0006
College       BVC
dtype: object

In [9]:
# using numpy
import numpy as np
n1 = np.array([12,3,4.5,"SDC"])
print(n1)
s4 = pd.Series(n1)
s4

['12' '3' '4.5' 'SDC']


0     12
1      3
2    4.5
3    SDC
dtype: object
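NumPy arrays hold a single dtype, which is why every element of `n1` above became a string (`'12'`, `'3'`, ...). A minimal sketch contrasting this with building the Series straight from a Python list, which keeps each element's own type in an `object` Series (the names `values`, `from_numpy` and `from_list` are only for illustration):

```python
import numpy as np
import pandas as pd

values = [12, 3, 4.5, "SDC"]

# NumPy picks one common dtype for the whole array; because a string is
# present, every element is converted to a string.
from_numpy = pd.Series(np.array(values))

# Given a plain list, pandas keeps the original Python objects (object dtype).
from_list = pd.Series(values)

print(from_numpy.tolist())  # every element is a string
print(from_list.tolist())   # elements keep their int / float / str types
```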

In [11]:
# re-indexing: assign new index labels (labels may be of mixed types)
s4.index = ["a",23,2.34,"d"]
s4

a        12
23        3
2.34    4.5
d       SDC
dtype: object

## Slicing & Indexing

- Indexing: accessing a particular element
- Slicing: accessing a sub-range of the data

In [14]:
s4.index

Index(['a', 23, 2.34, 'd'], dtype='object')

In [15]:
s4["a"]

'12'

In [16]:
s3

Name       Swathi
Pin          0006
College       BVC
dtype: object

In [17]:
s3["Name"]

'Swathi'

In [18]:
s3["Pin"]

'0006'

In [19]:
s1

0     2
1     3
2    45
3    56
4    56
dtype: int64

In [20]:
s1[0]

2

In [21]:
s1[3]

56

In [22]:
s1

0     2
1     3
2    45
3    56
4    56
dtype: int64

In [23]:
s1[0::2]

0     2
2    45
4    56
dtype: int64

In [24]:
s1[1::2]

1     3
3    56
dtype: int64

In [25]:
s1[1:3] # stop position 3 is exclusive

1     3
2    45
dtype: int64
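Slicing by position excludes the stop position, but slicing by label with `.loc` includes the stop label. A small sketch using a hypothetical copy of `s1` with letter labels:

```python
import pandas as pd

# Same values as s1, but with (hypothetical) letter labels.
s = pd.Series([2, 3, 45, 56, 56], index=["a", "b", "c", "d", "e"])

# Position-based: positions 1 and 2; position 3 is excluded.
by_position = s.iloc[1:3]

# Label-based: labels "b" through "d"; the stop label "d" is included.
by_label = s.loc["b":"d"]

print(by_position)
print(by_label)
```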

In [26]:
s1[3:]

3    56
4    56
dtype: int64

In [27]:
# Fancy indexing: select elements by a list of labels, in any order
s1[[1,3,2]]

1     3
3    56
2    45
dtype: int64

In [28]:
s5 = pd.date_range(start = "2021-06-29", end = "2021-07-17")
s5

DatetimeIndex(['2021-06-29', '2021-06-30', '2021-07-01', '2021-07-02',
               '2021-07-03', '2021-07-04', '2021-07-05', '2021-07-06',
               '2021-07-07', '2021-07-08', '2021-07-09', '2021-07-10',
               '2021-07-11', '2021-07-12', '2021-07-13', '2021-07-14',
               '2021-07-15', '2021-07-16', '2021-07-17'],
              dtype='datetime64[ns]', freq='D')
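As the output shows, `pd.date_range` returns a `DatetimeIndex` rather than a Series. A minimal sketch of the two usual ways to turn it into a Series (the names `date_values` and `day_numbers` are illustrative):

```python
import pandas as pd

# pd.date_range returns a DatetimeIndex (19 daily timestamps here).
dates = pd.date_range(start="2021-06-29", end="2021-07-17")

# Option 1: the dates become the values of a Series (default 0..18 index).
date_values = pd.Series(dates)

# Option 2: the dates become the index of a Series holding other values.
day_numbers = pd.Series(range(1, len(dates) + 1), index=dates)

print(date_values.head())
print(day_numbers.head())
```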

In [31]:
s1.index = ["a","b","c"]

ValueError: Length mismatch: Expected axis has 5 elements, new values have 3 elements

**Note:** the number of labels assigned to `.index` must equal the number of elements in the Series.
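Assigning to `.index` only relabels the existing elements, hence the length check. When the goal is instead to conform a Series to a different set of labels, `Series.reindex` does that: labels already present keep their values, new labels get `NaN`, and the lengths may differ. A minimal sketch:

```python
import pandas as pd

s1 = pd.Series([2, 3, 45, 56, 56])

# Relabelling: the new list must have exactly len(s1) labels.
relabelled = s1.copy()
relabelled.index = ["a", "b", "c", "d", "e"]

# A shorter list raises the same "Length mismatch" ValueError.
error_message = None
too_short = s1.copy()
try:
    too_short.index = ["a", "b", "c"]
except ValueError as err:
    error_message = str(err)

# Reindexing: keep labels 0 and 2, add label 7 (filled with NaN).
reindexed = s1.reindex([0, 2, 7])

print(error_message)
print(reindexed)
```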

## Task
- Create a Pandas Series whose index runs from 1 to 10 and whose values are the squares of the index values:
  1 - 1, 2 - 4, 3 - 9, 4 - 16, 5 - 25, ..., 10 - 100
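One possible solution sketch for the task (not the only one): build the index with `range(1, 11)` and compute each value as the square of its label.

```python
import pandas as pd

# Index labels 1..10 (range's stop value 11 is excluded).
labels = range(1, 11)

# Each value is the square of its index label.
squares = pd.Series([n ** 2 for n in labels], index=labels)

print(squares)
```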

# Pandas DataFrame


In [32]:
# creating pandas df using List
li = [[1,2,3,34,4,5],[2,3,4,5,6,7]]
df1 = pd.DataFrame(li)
df1
# by default, row and column labels are integers starting from 0

|   | 0 | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|---|
| 0 | 1 | 2 | 3 | 34 | 4 | 5 |
| 1 | 2 | 3 | 4 | 5 | 6 | 7 |


In [33]:
df1.shape # rows,columns

(2, 6)

In [35]:
t = ((1,2,3),(4,5,6))
df2 = pd.DataFrame(t, columns = ["A","B","C"])
df2

|   | A | B | C |
|---|---|---|---|
| 0 | 1 | 2 | 3 |
| 1 | 4 | 5 | 6 |


In [36]:
df2.index = ["X","Y"]
df2

|   | A | B | C |
|---|---|---|---|
| X | 1 | 2 | 3 |
| Y | 4 | 5 | 6 |


In [38]:
# using Dict
d1 = {
    "Name" : ["Teja","Swathi","Jahnavi","Chinnu", np.nan],
    "Branch" : ["CSe","ECE","CS","IT", "Civil"],
    "Gender" : ["Male","Female","Female","Female", np.nan]
}
df3 = pd.DataFrame(d1)
df3
# np.nan (NaN, "Not a Number") marks a missing value

|   | Name | Branch | Gender |
|---|---|---|---|
| 0 | Teja | CSe | Male |
| 1 | Swathi | ECE | Female |
| 2 | Jahnavi | CS | Female |
| 3 | Chinnu | IT | Female |
| 4 | NaN | Civil | NaN |


In [39]:
df3.shape

(5, 3)

In [42]:
type(df3)

pandas.core.frame.DataFrame

In [43]:
df3.columns

Index(['Name', 'Branch', 'Gender'], dtype='object')

In [44]:
df3.index

RangeIndex(start=0, stop=5, step=1)

In [46]:
print(type(df3["Name"]))
df3["Name"] # returns a pandas Series

<class 'pandas.core.series.Series'>


0       Teja
1     Swathi
2    Jahnavi
3     Chinnu
4        NaN
Name: Name, dtype: object

In [48]:
df3[["Name","Branch"]] # a list of column names returns a sub-DataFrame

|   | Name | Branch |
|---|---|---|
| 0 | Teja | CSe |
| 1 | Swathi | ECE |
| 2 | Jahnavi | CS |
| 3 | Chinnu | IT |
| 4 | NaN | Civil |


In [49]:
df3[0] # [] with a single key selects a column; there is no column 0, hence KeyError

KeyError: 0

In [51]:
df3[0:1] # a slice inside [] selects rows: here, the first row

|   | Name | Branch | Gender |
|---|---|---|---|
| 0 | Teja | CSe | Male |


In [52]:
df3[2:3]

|   | Name | Branch | Gender |
|---|---|---|---|
| 2 | Jahnavi | CS | Female |


In [53]:
df3[2:]

|   | Name | Branch | Gender |
|---|---|---|---|
| 2 | Jahnavi | CS | Female |
| 3 | Chinnu | IT | Female |
| 4 | NaN | Civil | NaN |


In [56]:
len(df3)

5

In [57]:
df3[3:4]

|   | Name | Branch | Gender |
|---|---|---|---|
| 3 | Chinnu | IT | Female |


- `iloc`: selects rows (and columns) by integer position, starting at 0
- `loc`: selects rows (and columns) by label; labels can be strings or integers, as in `df3.loc[3, "Name"]`
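The difference between `iloc` and `loc` matters once the labels stop matching the positions. A minimal sketch with a hypothetical DataFrame whose integer labels are 10, 20 and 30; it also shows the positional way to get the first row, which `df3[0]` could not do because `[]` with a single key looks up a column:

```python
import pandas as pd

# Hypothetical integer labels that do NOT match the row positions.
df = pd.DataFrame(
    {"Name": ["Teja", "Swathi", "Jahnavi"], "Branch": ["CSe", "ECE", "CS"]},
    index=[10, 20, 30],
)

first_row = df.iloc[0]           # position 0 -> the row labelled 10
row_20 = df.loc[20]              # the row whose label is 20
cell = df.loc[20, "Branch"]      # row label 20, column label "Branch"

# iloc slices exclude the stop position; loc slices include the stop label.
by_position = df.iloc[0:1]       # only the row labelled 10
by_label = df.loc[10:20]         # rows labelled 10 and 20

print(first_row, row_20, cell, by_position, by_label, sep="\n\n")
```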

In [59]:
df3.iloc[0]

Name      Teja
Branch     CSe
Gender    Male
Name: 0, dtype: object

In [60]:
df3.iloc[4]

Name        NaN
Branch    Civil
Gender      NaN
Name: 4, dtype: object

In [61]:
df3.iloc[2:4]

|   | Name | Branch | Gender |
|---|---|---|---|
| 2 | Jahnavi | CS | Female |
| 3 | Chinnu | IT | Female |


In [64]:
df3.loc[3, "Name"]

'Chinnu'

In [68]:
df3.loc[[3,2, 1],"Name"]

3     Chinnu
2    Jahnavi
1     Swathi
Name: Name, dtype: object

In [69]:
df3.loc[[1,4,2,3],["Name","Gender"]]

|   | Name | Gender |
|---|---|---|
| 1 | Swathi | Female |
| 4 | NaN | NaN |
| 2 | Jahnavi | Female |
| 3 | Chinnu | Female |


# Indexing & Re-Indexing


In [70]:
df3.index = ["a","b","c","d","e"]
df3

|   | Name | Branch | Gender |
|---|---|---|---|
| a | Teja | CSe | Male |
| b | Swathi | ECE | Female |
| c | Jahnavi | CS | Female |
| d | Chinnu | IT | Female |
| e | NaN | Civil | NaN |


In [72]:
df3.loc["b"]

Name      Swathi
Branch       ECE
Gender    Female
Name: b, dtype: object

In [75]:
df3.set_index("Name", inplace = True) 

In [76]:
df3

| Name | Branch | Gender |
|---|---|---|
| Teja | CSe | Male |
| Swathi | ECE | Female |
| Jahnavi | CS | Female |
| Chinnu | IT | Female |
| NaN | Civil | NaN |


In [78]:
df3.loc["Teja"]

Branch     CSe
Gender    Male
Name: Teja, dtype: object

In [79]:
df3.loc["Jahnavi"]

Branch        CS
Gender    Female
Name: Jahnavi, dtype: object

In [82]:
df3.reset_index(inplace = True)

In [83]:
df3

|   | Name | Branch | Gender |
|---|---|---|---|
| 0 | Teja | CSe | Male |
| 1 | Swathi | ECE | Female |
| 2 | Jahnavi | CS | Female |
| 3 | Chinnu | IT | Female |
| 4 | NaN | Civil | NaN |


# Merging / Combining

In [84]:
df3

|   | Name | Branch | Gender |
|---|---|---|---|
| 0 | Teja | CSe | Male |
| 1 | Swathi | ECE | Female |
| 2 | Jahnavi | CS | Female |
| 3 | Chinnu | IT | Female |
| 4 | NaN | Civil | NaN |


In [85]:
d1 = {
    "Name" : ["Teja","Swathi","Jahnavi","Chinnu", np.nan],
    "Branch" : ["CSe","ECE","CS","IT", "Civil"],
    "PIN" : [567,654,456,456,np.nan]
}
df4 = pd.DataFrame(d1)
df4

|   | Name | Branch | PIN |
|---|---|---|---|
| 0 | Teja | CSe | 567.0 |
| 1 | Swathi | ECE | 654.0 |
| 2 | Jahnavi | CS | 456.0 |
| 3 | Chinnu | IT | 456.0 |
| 4 | NaN | Civil | NaN |


In [87]:
pd.concat([df3,df4],axis = 0) # stack the DataFrames vertically (one below the other)
# axis = 0: concatenate along rows
# axis = 1: concatenate along columns (side by side)

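|   | Name | Branch | Gender | PIN |
|---|---|---|---|---|
| 0 | Teja | CSe | Male | NaN |
| 1 | Swathi | ECE | Female | NaN |
| 2 | Jahnavi | CS | Female | NaN |
| 3 | Chinnu | IT | Female | NaN |
| 4 | NaN | Civil | NaN | NaN |
| 0 | Teja | CSe | NaN | 567.0 |
| 1 | Swathi | ECE | NaN | 654.0 |
| 2 | Jahnavi | CS | NaN | 456.0 |
| 3 | Chinnu | IT | NaN | 456.0 |
| 4 | NaN | Civil | NaN | NaN |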


In [88]:
pd.concat([df3,df4],axis = 1)

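|   | Name | Branch | Gender | Name | Branch | PIN |
|---|---|---|---|---|---|---|
| 0 | Teja | CSe | Male | Teja | CSe | 567.0 |
| 1 | Swathi | ECE | Female | Swathi | ECE | 654.0 |
| 2 | Jahnavi | CS | Female | Jahnavi | CS | 456.0 |
| 3 | Chinnu | IT | Female | Chinnu | IT | 456.0 |
| 4 | NaN | Civil | NaN | NaN | Civil | NaN |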
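With `axis = 1`, `pd.concat` lines rows up by their index labels rather than by position, and it does not rename column names that occur in both frames. A small sketch with two hypothetical frames of different lengths, showing that a row present in only one frame gets `NaN` in the other frame's columns:

```python
import pandas as pd

names = pd.DataFrame({"Name": ["Teja", "Swathi", "Jahnavi"]})  # labels 0, 1, 2
pins = pd.DataFrame({"PIN": [567, 654]})                       # labels 0, 1

# Rows are matched on index labels; label 2 exists only in `names`,
# so its PIN is filled with NaN.
side_by_side = pd.concat([names, pins], axis=1)
print(side_by_side)
```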


In [89]:
df3.append(df4) # DataFrame.append was removed in pandas 2.0; newer versions use pd.concat([df3, df4])

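|   | Name | Branch | Gender | PIN |
|---|---|---|---|---|
| 0 | Teja | CSe | Male | NaN |
| 1 | Swathi | ECE | Female | NaN |
| 2 | Jahnavi | CS | Female | NaN |
| 3 | Chinnu | IT | Female | NaN |
| 4 | NaN | Civil | NaN | NaN |
| 0 | Teja | CSe | NaN | 567.0 |
| 1 | Swathi | ECE | NaN | 654.0 |
| 2 | Jahnavi | CS | NaN | 456.0 |
| 3 | Chinnu | IT | NaN | 456.0 |
| 4 | NaN | Civil | NaN | NaN |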


In [90]:
df4.append(df3)

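|   | Name | Branch | PIN | Gender |
|---|---|---|---|---|
| 0 | Teja | CSe | 567.0 | NaN |
| 1 | Swathi | ECE | 654.0 | NaN |
| 2 | Jahnavi | CS | 456.0 | NaN |
| 3 | Chinnu | IT | 456.0 | NaN |
| 4 | NaN | Civil | NaN | NaN |
| 0 | Teja | CSe | NaN | Male |
| 1 | Swathi | ECE | NaN | Female |
| 2 | Jahnavi | CS | NaN | Female |
| 3 | Chinnu | IT | NaN | Female |
| 4 | NaN | Civil | NaN | NaN |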
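On pandas 2.0 and later, where `DataFrame.append` no longer exists, the same results come from `pd.concat`. A minimal sketch with two-row stand-ins for `df3` and `df4` (illustrative data): as with `append`, the columns of the first frame come first, and `ignore_index=True` renumbers the rows instead of repeating 0, 1, ...

```python
import pandas as pd

# Two-row stand-ins for df3 and df4 (illustrative data).
df3 = pd.DataFrame({"Name": ["Teja", "Swathi"], "Branch": ["CSe", "ECE"],
                    "Gender": ["Male", "Female"]})
df4 = pd.DataFrame({"Name": ["Teja", "Swathi"], "Branch": ["CSe", "ECE"],
                    "PIN": [567, 654]})

# Equivalent of df3.append(df4): the original labels 0, 1 appear twice.
appended = pd.concat([df3, df4])

# Equivalent of df4.append(df3): df4's column order comes first.
reversed_order = pd.concat([df4, df3])

# Renumber the rows 0..n-1 instead of keeping the original labels.
renumbered = pd.concat([df3, df4], ignore_index=True)

print(appended)
print(renumbered)
```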


In [91]:
pd.merge(df3,df4) # default: inner join on all shared columns (Name, Branch)

|   | Name | Branch | Gender | PIN |
|---|---|---|---|---|
| 0 | Teja | CSe | Male | 567.0 |
| 1 | Swathi | ECE | Female | 654.0 |
| 2 | Jahnavi | CS | Female | 456.0 |
| 3 | Chinnu | IT | Female | 456.0 |
| 4 | NaN | Civil | NaN | NaN |
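Without `on=`, `pd.merge` joins on every column name the two frames share (here `Name` and `Branch`) and uses `how="inner"`. Spelling both out gives the same result and makes the intent explicit. pandas also matches missing (`NaN`) keys with each other, which is why the last row (`NaN`, `Civil`) survives the inner join, unlike a typical SQL join. A sketch rebuilding `df3` and `df4` from the cells above:

```python
import numpy as np
import pandas as pd

df3 = pd.DataFrame({
    "Name": ["Teja", "Swathi", "Jahnavi", "Chinnu", np.nan],
    "Branch": ["CSe", "ECE", "CS", "IT", "Civil"],
    "Gender": ["Male", "Female", "Female", "Female", np.nan],
})
df4 = pd.DataFrame({
    "Name": ["Teja", "Swathi", "Jahnavi", "Chinnu", np.nan],
    "Branch": ["CSe", "ECE", "CS", "IT", "Civil"],
    "PIN": [567, 654, 456, 456, np.nan],
})

# Implicit keys: all shared column names, inner join by default.
implicit = pd.merge(df3, df4)

# Explicit keys and join type produce the same DataFrame.
explicit = pd.merge(df3, df4, on=["Name", "Branch"], how="inner")

print(explicit)
```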


In [93]:
d1 = {
    "Name" : ["Teja","Swathi","Jahnavi"],
    "Branch" : ["CSe","ECE","CS"],
    "PIN" : [567,654,456]
}
df4 = pd.DataFrame(d1)
df4

|   | Name | Branch | PIN |
|---|---|---|---|
| 0 | Teja | CSe | 567 |
| 1 | Swathi | ECE | 654 |
| 2 | Jahnavi | CS | 456 |


In [94]:
df3

|   | Name | Branch | Gender |
|---|---|---|---|
| 0 | Teja | CSe | Male |
| 1 | Swathi | ECE | Female |
| 2 | Jahnavi | CS | Female |
| 3 | Chinnu | IT | Female |
| 4 | NaN | Civil | NaN |


In [95]:
pd.merge(df3,df4)

|   | Name | Branch | Gender | PIN |
|---|---|---|---|---|
| 0 | Teja | CSe | Male | 567 |
| 1 | Swathi | ECE | Female | 654 |
| 2 | Jahnavi | CS | Female | 456 |


In [100]:
pd.merge(df4,df3, how = "inner") # inner join: only keys present in both DataFrames (intersection)

|   | Name | Branch | PIN | Gender |
|---|---|---|---|---|
| 0 | Teja | CSe | 567 | Male |
| 1 | Swathi | ECE | 654 | Female |
| 2 | Jahnavi | CS | 456 | Female |


In [101]:
pd.merge(df4,df3, how = "outer") # outer join: keys from either DataFrame (union); gaps become NaN

|   | Name | Branch | PIN | Gender |
|---|---|---|---|---|
| 0 | Teja | CSe | 567.0 | Male |
| 1 | Swathi | ECE | 654.0 | Female |
| 2 | Jahnavi | CS | 456.0 | Female |
| 3 | Chinnu | IT | NaN | Female |
| 4 | NaN | Civil | NaN | NaN |


In [102]:
help(pd.merge)

Help on function merge in module pandas.core.reshape.merge:

merge(left, right, how: str = 'inner', on=None, left_on=None, right_on=None, left_index: bool = False, right_index: bool = False, sort: bool = False, suffixes=('_x', '_y'), copy: bool = True, indicator: bool = False, validate=None) -> 'DataFrame'
    Merge DataFrame or named Series objects with a database-style join.
    
    The join is done on columns or indexes. If joining columns on
    columns, the DataFrame indexes *will be ignored*. Otherwise if joining indexes
    on indexes or indexes on a column or columns, the index will be passed on.
    
    Parameters
    ----------
    left : DataFrame
    right : DataFrame or named Series
        Object to merge with.
    how : {'left', 'right', 'outer', 'inner'}, default 'inner'
        Type of merge to be performed.
    
        * left: use only keys from left frame, similar to a SQL left outer join;
          preserve key order.
        * right: use only keys from right fra

In [103]:
pd.merge(df4,df3, how = "left") # left join: keep every key from the left DataFrame (df4)

|   | Name | Branch | PIN | Gender |
|---|---|---|---|---|
| 0 | Teja | CSe | 567 | Male |
| 1 | Swathi | ECE | 654 | Female |
| 2 | Jahnavi | CS | 456 | Female |


In [104]:
df4

|   | Name | Branch | PIN |
|---|---|---|---|
| 0 | Teja | CSe | 567 |
| 1 | Swathi | ECE | 654 |
| 2 | Jahnavi | CS | 456 |


In [105]:
pd.merge(df4,df3, how = "right") # right join: keep every key from the right DataFrame (df3)

|   | Name | Branch | PIN | Gender |
|---|---|---|---|---|
| 0 | Teja | CSe | 567.0 | Male |
| 1 | Swathi | ECE | 654.0 | Female |
| 2 | Jahnavi | CS | 456.0 | Female |
| 3 | Chinnu | IT | NaN | Female |
| 4 | NaN | Civil | NaN | NaN |
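Passing `indicator=True` to `pd.merge` adds a `_merge` column recording whether each row came from the left frame only, the right frame only, or both, which makes the four `how` options easy to compare side by side. A sketch using the three-row `df4` and five-row `df3` from the cells above (the dictionaries `row_counts` and `right_only_counts` are just for collecting results):

```python
import numpy as np
import pandas as pd

df4 = pd.DataFrame({"Name": ["Teja", "Swathi", "Jahnavi"],
                    "Branch": ["CSe", "ECE", "CS"],
                    "PIN": [567, 654, 456]})
df3 = pd.DataFrame({
    "Name": ["Teja", "Swathi", "Jahnavi", "Chinnu", np.nan],
    "Branch": ["CSe", "ECE", "CS", "IT", "Civil"],
    "Gender": ["Male", "Female", "Female", "Female", np.nan],
})

row_counts = {}
right_only_counts = {}
for how in ["inner", "left", "right", "outer"]:
    merged = pd.merge(df4, df3, how=how, indicator=True)
    row_counts[how] = len(merged)
    # Rows whose key exists only in df3 (the right frame).
    right_only_counts[how] = int((merged["_merge"] == "right_only").sum())
    print(how, row_counts[how], "rows,", right_only_counts[how], "right_only")
```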


In [106]:
df3

|   | Name | Branch | Gender |
|---|---|---|---|
| 0 | Teja | CSe | Male |
| 1 | Swathi | ECE | Female |
| 2 | Jahnavi | CS | Female |
| 3 | Chinnu | IT | Female |
| 4 | NaN | Civil | NaN |


# File Reading

In [107]:
data = pd.read_csv("https://raw.githubusercontent.com/AP-Skill-Development-Corporation/PublicWorkshop_ML/main/Day3_Pandas/PublicBatch.csv")
data # file read from a GitHub URL

|   | S.No | College Name | Program Name | Student Name | Registrationid | Phone | Email | Payment Status | Registered Date |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Aditya Engineering College | Machine Learning Using Python Online 2021-22 | Mandapaka Anusha | 18A91A0534 | 6300452000.0 | anushamandapaka77@gmail.com | success | 6/19/2021 |
| 1 | 2 | Aditya Engineering College | Machine Learning Using Python Online 2021-22 | Meneti Mounika | 18A91A05F0 | 9704807000.0 | mounikameneti9912@gmail.com | success | 6/19/2021 |
| 2 | 3 | Aditya Engineering College | Machine Learning Using Python Online 2021-22 | Sheik Abdul Hakim | 19A95A0504 | 9676215000.0 | hakeemabd007@gmail.com | success | 6/19/2021 |
| 3 | 4 | Aditya Engineering College | Machine Learning Using Python Online 2021-22 | Sambattula Navya Prasanna | 18A91A0552 | 8328650000.0 | navyasambattula@gmail.com | success | 6/19/2021 |
| 4 | 5 | Aditya Engineering College | Machine Learning Using Python Online 2021-22 | Vijaya Durga Velagala | 18A91A05H8 | 9502362000.0 | vijayadurga.velagala123@gmail.com | success | 6/19/2021 |
| 5 | 6 | Aditya Engineering College | Machine Learning Using Python Online 2021-22 | Budha Bala Atyutha Sri Sai | 18A91A0510 | 8500388000.0 | balasai599@gmail.com | success | 6/19/2021 |
| 6 | 7 | Aditya Engineering College | Machine Learning Using Python Online 2021-22 | Karri Sirish Kumar | 18A91A0529 | 9491691000.0 | karrisirish2000@gmail.com | success | 6/19/2021 |
| 7 | 8 | Aditya Engineering College | Machine Learning Using Python Online 2021-22 | Ramavarapu Mary Ratnam | 18A91A0547 | 7981134000.0 | rpchinnu123@gmail.com | success | 6/19/2021 |
| 8 | 9 | Aditya Engineering College | Machine Learning Using Python Online 2021-22 | M.L.S.Namratha | 18A91A05E7 | 7013641000.0 | namrathamalladi@gmail.com | success | 6/21/2021 |
| 9 | 10 | BVC Institute of Technology and Science | Machine Learning Using Python Online 2021-22 | Naga Swathi Menda | 18H41F0006 | 9533339000.0 | nagaswathimenda97@gmail.com | success | 6/25/2021 |


In [108]:
datafile = pd.read_excel("2020-07-25.xlsx")
datafile

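|   | Unnamed: 0 | Roll Number | 2020-07-25 |
|---|---|---|---|
| 0 | 0 | 17B81A04H1 | P |
| 1 | 1 | 198A5F0019 | P |
| 2 | 2 | 17KD1A0560 | P |
| 3 | 3 | 17KH1A0455 | P |
| 4 | 4 | 1210316262 | P |
| 5 | 5 | 18P31A0555 | P |
| 6 | 6 | 18B01A0211 | P |
| 7 | 7 | Y18IT048 | P |
| 8 | 8 | 17B81A05B2 | P |
| 9 | 9 | 169X1A04E0 | P |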


In [109]:
data = pd.read_csv("iris.csv")
data # file read from the local working directory

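|   | Unnamed: 0 | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) | Target |
|---|---|---|---|---|---|---|
| 0 | 0 | 5.1 | 3.5 | 1.4 | 0.2 | 0 |
| 1 | 1 | 4.9 | 3.0 | 1.4 | 0.2 | 0 |
| 2 | 2 | 4.7 | 3.2 | 1.3 | 0.2 | 0 |
| 3 | 3 | 4.6 | 3.1 | 1.5 | 0.2 | 0 |
| 4 | 4 | 5.0 | 3.6 | 1.4 | 0.2 | 0 |
| ... | ... | ... | ... | ... | ... | ... |
| 145 | 145 | 6.7 | 3.0 | 5.2 | 2.3 | 2 |
| 146 | 146 | 6.3 | 2.5 | 5.0 | 1.9 | 2 |
| 147 | 147 | 6.5 | 3.0 | 5.2 | 2.0 | 2 |
| 148 | 148 | 6.2 | 3.4 | 5.4 | 2.3 | 2 |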
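The `Unnamed: 0` column in the Excel and iris outputs suggests (an assumption, not confirmed by the source) that those files were saved together with their row index, for example by `DataFrame.to_csv` without `index=False`, so pandas reads that index back as an ordinary unnamed column. Passing `index_col=0` uses it as the row index instead. A self-contained sketch with a small in-memory CSV standing in for the real file:

```python
import io

import pandas as pd

# First header field is empty, as when a DataFrame is saved with its index.
csv_text = """,sepal length (cm),Target
0,5.1,0
1,4.9,0
2,4.7,0
"""

# Default: the saved index comes back as a column named "Unnamed: 0".
plain = pd.read_csv(io.StringIO(csv_text))

# index_col=0: use the first column as the row index instead.
indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(plain.columns.tolist())
print(indexed.columns.tolist())
```

Writing files with `to_csv(path, index=False)` avoids creating the extra column in the first place.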
