![](logo.png)
## Day Objectives
# Pandas
- Pandas is an open-source Python library used for data analysis; it is not part of the standard library and has to be installed separately. You'll use Pandas heavily for data manipulation and visualisation, and for preparing data for machine learning models.
- Pandas implements a number of powerful data operations familiar to users of both database frameworks and spreadsheet programs.
- There are two main data structures in Pandas: Series and DataFrames. Most data is stored in DataFrames, so manipulating DataFrames quickly is probably the most important skill for data analysis.
- Source: https://pandas.pydata.org/pandas-docs/stable/overview.html
## Pandas Series
- A Series is similar to a 1-D NumPy array and normally holds values of a single type (numeric, string, datetime, etc.). A DataFrame is simply a table in which each column is a pandas Series.
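
As a minimal sketch of that relationship (the frame `people` and its values are invented for illustration), selecting one column of a DataFrame gives back a Series:

```python
import pandas as pd

# A Series holds values of one dtype plus an index of labels.
ages = pd.Series([21, 34, 29])

# A DataFrame is a table; each column is itself a Series.
people = pd.DataFrame({"name": ["P1", "P2", "P3"], "age": [21, 34, 29]})
age_column = people["age"]

print(type(age_column).__name__)  # Series
print(age_column.tolist())        # [21, 34, 29]
```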

## Creating Series
- List
- Tuple
- Dictionary
- Numpy
- Date_Range
- Series Indexing

## Data Analysis with Pandas
* Pandas DataFrame
* Combining & Merging
* File I/O
* Indexing
* Grouping
* Features
* Filtering
* Sorting
* Statistical
* Plotting
* Saving

|S.No |Name |Gender|
|--|--|--|
|1 | Mercy | Female|
|2 | Cherry | Male |
|3 | Raju | Male |



In [1]:
pip install pandas

Note: you may need to restart the kernel to use updated packages.


In [4]:
import pandas as pd

# Pandas Series 

In [5]:
# creating pandas series using list
li = [12,343,445,656,7]
s1 = pd.Series(li)
s1

0     12
1    343
2    445
3    656
4      7
dtype: int64

In [6]:
type(s1)

pandas.core.series.Series

In [7]:
t = (1,2,3,4,"string",34.45)
s2 = pd.Series(t)
s2

0         1
1         2
2         3
3         4
4    string
5     34.45
dtype: object

In [12]:
import numpy as np
di = {"A":123,"B":456,"C":789,"D": np.nan}
s3 = pd.Series(di)
s3
# NaN ("not a number") is a special float value, so the whole Series is stored as float64

A    123.0
B    456.0
C    789.0
D      NaN
dtype: float64
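
The integers above are printed as `123.0` because NaN is a floating-point value, so a Series that contains it is stored as `float64`. The sketch below repeats that and, as one possible alternative, uses pandas' nullable `Int64` dtype to keep whole numbers next to a missing entry. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

with_nan = pd.Series({"A": 123, "B": 456, "D": np.nan})
print(with_nan.dtype)            # float64

# The nullable integer dtype stores the gap as <NA> instead of NaN.
nullable = pd.Series([123, 456, None], index=["A", "B", "D"], dtype="Int64")
print(nullable.dtype)            # Int64
print(nullable.isna().tolist())  # [False, False, True]
```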

In [15]:
n = np.array(li)
print(n)
s4 = pd.Series(n)
s4

[ 12 343 445 656   7]


0     12
1    343
2    445
3    656
4      7
dtype: int32

In [17]:
s4

0     12
1    343
2    445
3    656
4      7
dtype: int32

# Slicing & Indexing

In [18]:
s4[2]

445

In [20]:
s4[0]

12

In [21]:
s4[:3]

0     12
1    343
2    445
dtype: int32

In [22]:
s4[2:]

2    445
3    656
4      7
dtype: int32

In [23]:
s3

A    123.0
B    456.0
C    789.0
D      NaN
dtype: float64

In [26]:
s3["A"]

123.0

In [27]:
s3["D"]

nan

In [29]:
s3["A":"C"] # label-based slicing: the end label "C" is included

A    123.0
B    456.0
C    789.0
dtype: float64
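
Label slices and positional slices treat the end point differently. A small sketch of the contrast, on a stand-in Series `s` that is not part of the notebook:

```python
import pandas as pd

s = pd.Series([123.0, 456.0, 789.0, 1000.0], index=["A", "B", "C", "D"])

by_label = s["A":"C"]      # label slice: "C" is included
by_position = s.iloc[0:2]  # positional slice: position 2 is excluded

print(by_label.index.tolist())     # ['A', 'B', 'C']
print(by_position.index.tolist())  # ['A', 'B']
```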

In [31]:
s3.iloc[1], s3["B"] # position 1 and label "B" refer to the same element

(456.0, 456.0)

In [32]:
# elements at positions 0 and 2 of s3
s3[0::2]

A    123.0
C    789.0
dtype: float64

In [33]:
# Fancy indexing: pick positions 0, 3 and 1, in that order
s3.iloc[[0,3,1]]

A    123.0
D      NaN
B    456.0
dtype: float64

In [34]:
s3.index

Index(['A', 'B', 'C', 'D'], dtype='object')

In [35]:
s3.values

array([123., 456., 789.,  nan])

In [37]:
s5 = pd.date_range(start= "2021-07-21", end = "2021-08-07")
s5

DatetimeIndex(['2021-07-21', '2021-07-22', '2021-07-23', '2021-07-24',
               '2021-07-25', '2021-07-26', '2021-07-27', '2021-07-28',
               '2021-07-29', '2021-07-30', '2021-07-31', '2021-08-01',
               '2021-08-02', '2021-08-03', '2021-08-04', '2021-08-05',
               '2021-08-06', '2021-08-07'],
              dtype='datetime64[ns]', freq='D')
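
Note that `pd.date_range` returns a `DatetimeIndex`, not a Series. One common use, sketched here with invented values, is to pass it as the index of a Series so that entries can be selected by date:

```python
import pandas as pd

days = pd.date_range(start="2021-07-21", periods=5)       # 5 consecutive days
attendance = pd.Series([30, 28, 31, 27, 29], index=days)  # values are invented

on_23rd = attendance.loc[pd.Timestamp("2021-07-23")]
window = attendance.loc["2021-07-22":"2021-07-24"]        # both end dates included

print(on_23rd)      # 31
print(len(window))  # 3
```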

In [38]:
s3.dtype

dtype('float64')

In [39]:
s3.index

Index(['A', 'B', 'C', 'D'], dtype='object')

In [40]:
s3.index = ["x","y","z","q"]
s3

x    123.0
y    456.0
z    789.0
q      NaN
dtype: float64

# Task:
- Create a Series whose index runs from 0 to 15 and whose values are the index multiplied by 5.

In [42]:
s = pd.Series(np.arange(0,15)*5,index = np.arange(0,16))
s

ValueError: Length of passed values is 15, index implies 16.

In [43]:
s = pd.Series(np.arange(0,16)*5,index = np.arange(0,16))
s

0      0
1      5
2     10
3     15
4     20
5     25
6     30
7     35
8     40
9     45
10    50
11    55
12    60
13    65
14    70
15    75
dtype: int32
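
One way to avoid the length mismatch behind the `ValueError` above is to build the index once and derive the values from it, so both always have the same length. A minimal sketch (the name `idx` is introduced here for illustration):

```python
import numpy as np
import pandas as pd

idx = np.arange(0, 16)             # 0..15 inclusive, 16 labels
s = pd.Series(idx * 5, index=idx)  # values derived from the same array

print(len(s))     # 16
print(s.loc[15])  # 75
```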

# Pandas DataFrames

In [48]:
# creating Df using list
a2 = [[1,2,3],[4,5,6],[6,7,8]]
df1 = pd.DataFrame(a2)
df1
# default row and column labels start from 0

Unnamed: 0,0,1,2
0,1,2,3
1,4,5,6
2,6,7,8


In [50]:
df1.columns = ["s","q","t"]
df1

Unnamed: 0,s,q,t
0,1,2,3
1,4,5,6
2,6,7,8


In [52]:
df1.index = ["a","b","c"]
df1

Unnamed: 0,s,q,t
a,1,2,3
b,4,5,6
c,6,7,8


In [59]:
di = {
    "Name"  : ["Lavanya","Praveen","Anurag","Nagaraju"],
    "Fcolor":["White","Black","Red",np.nan],
    "Gender":["f","m","m","m"]
}
df2 = pd.DataFrame(di)
# df2.index = np.arange(1,5)  # optional: label the 4 rows 1..4
df2

Unnamed: 0,Name,Fcolor,Gender
0,Lavanya,White,f
1,Praveen,Black,m
2,Anurag,Red,m
3,Nagaraju,,m


In [60]:
df2.index

RangeIndex(start=0, stop=4, step=1)

In [61]:
df2.columns

Index(['Name', 'Fcolor', 'Gender'], dtype='object')

In [63]:
type(df2)

pandas.core.frame.DataFrame

In [64]:
df2.shape # rows,columns

(4, 3)

In [65]:
df2.ndim

2

In [67]:
type(df2["Name"])

pandas.core.series.Series

In [68]:
df2["Name"] # series

0     Lavanya
1     Praveen
2      Anurag
3    Nagaraju
Name: Name, dtype: object

In [70]:
df2[["Name","Gender"]] # df

Unnamed: 0,Name,Gender
0,Lavanya,f
1,Praveen,m
2,Anurag,m
3,Nagaraju,m


In [72]:
df2[0]

KeyError: 0

In [73]:
df2[0:1]

Unnamed: 0,Name,Fcolor,Gender
0,Lavanya,White,f


In [75]:
df2[0:3]

Unnamed: 0,Name,Fcolor,Gender
0,Lavanya,White,f
1,Praveen,Black,m
2,Anurag,Red,m


In [76]:
df2[::2]

Unnamed: 0,Name,Fcolor,Gender
0,Lavanya,White,f
2,Anurag,Red,m


In [77]:
df2[1::2]

Unnamed: 0,Name,Fcolor,Gender
1,Praveen,Black,m
3,Nagaraju,,m


In [80]:
# access Praveen's Fcolor: slice the row first, then pick the column
df2[1:2]["Fcolor"]

1    Black
Name: Fcolor, dtype: object

In [81]:
# access fcolor of lavanya & anurag
df2[::2]["Fcolor"]

0    White
2      Red
Name: Fcolor, dtype: object

## iloc -- access rows (and columns) by integer position
## loc -- access rows (and columns) by label; labels may be integers or strings
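
The difference is easiest to see when row labels do not match row positions. A minimal sketch on a stand-in frame whose index labels (10, 20, 30) are chosen purely for illustration:

```python
import pandas as pd

# Row labels deliberately differ from row positions.
df = pd.DataFrame({"Name": ["Lavanya", "Praveen", "Anurag"]}, index=[10, 20, 30])

first_by_position = df.iloc[0]["Name"]  # first row, whatever its label
first_by_label = df.loc[10, "Name"]     # row whose label is 10

print(first_by_position, first_by_label)  # Lavanya Lavanya
print(df.loc[10:20].shape)                # (2, 1): label slices include the end
print(df.iloc[0:1].shape)                 # (1, 1): position slices exclude the end
```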

In [82]:
df2.iloc[0]

Name      Lavanya
Fcolor      White
Gender          f
Name: 0, dtype: object

In [83]:
df2.iloc[-1]

Name      Nagaraju
Fcolor         NaN
Gender           m
Name: 3, dtype: object

In [84]:
df2.iloc[0:2]

Unnamed: 0,Name,Fcolor,Gender
0,Lavanya,White,f
1,Praveen,Black,m


In [86]:
df2.loc[1,"Gender"]

'm'

In [87]:
df2.loc[[0,2,3],"Gender"]

0    f
2    m
3    m
Name: Gender, dtype: object

In [88]:
df2.loc[[0,2,3],["Name","Gender"]]

Unnamed: 0,Name,Gender
0,Lavanya,f
2,Anurag,m
3,Nagaraju,m


In [89]:
df2.set_index("Name")

Name,Fcolor,Gender
Lavanya,White,f
Praveen,Black,m
Anurag,Red,m
Nagaraju,,m


In [90]:
df2

Unnamed: 0,Name,Fcolor,Gender
0,Lavanya,White,f
1,Praveen,Black,m
2,Anurag,Red,m
3,Nagaraju,,m


In [91]:
df2.set_index("Name", inplace = True)
df2

Name,Fcolor,Gender
Lavanya,White,f
Praveen,Black,m
Anurag,Red,m
Nagaraju,,m


In [92]:
df2.loc["Lavanya","Fcolor"]

'White'

In [93]:
df2.loc[["Lavanya","Praveen"],["Gender"]]

Name,Gender
Lavanya,f
Praveen,m


In [96]:
df2.reset_index(inplace= True)

In [97]:
df2

Unnamed: 0,Name,Fcolor,Gender
0,Lavanya,White,f
1,Praveen,Black,m
2,Anurag,Red,m
3,Nagaraju,,m
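
`set_index` and `reset_index` return a new DataFrame by default, which is why `df2` stayed unchanged until `inplace=True` was passed. An equivalent pattern, sketched on a small stand-in frame, is to assign the result to a new name:

```python
import pandas as pd

students = pd.DataFrame({"Name": ["Lavanya", "Praveen"], "Gender": ["f", "m"]})

indexed = students.set_index("Name")  # the original frame is left untouched
restored = indexed.reset_index()      # "Name" becomes a regular column again

print(students.columns.tolist())         # ['Name', 'Gender']
print(indexed.columns.tolist())          # ['Gender']
print(indexed.loc["Praveen", "Gender"])  # m
print(restored.columns.tolist())         # ['Name', 'Gender']
```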


# Merge / concat / Append

In [106]:
d3 = {
    "Name"  : ["Lavanya","Praveen","Anurag","Nagaraju", "Priya"],
    "Branch" : ["Cse","IT","ECE", "CSe", "Mech"]
}
df3 = pd.DataFrame(d3)
df3

Unnamed: 0,Name,Branch
0,Lavanya,Cse
1,Praveen,IT
2,Anurag,ECE
3,Nagaraju,CSe
4,Priya,Mech


In [107]:
pd.concat([df2,df3],axis = 0)
# axis=0 stacks the frames as rows; axis=1 places them side by side as columns

Unnamed: 0,Name,Fcolor,Gender,Branch
0,Lavanya,White,f,
1,Praveen,Black,m,
2,Anurag,Red,m,
3,Nagaraju,,m,
0,Lavanya,,,Cse
1,Praveen,,,IT
2,Anurag,,,ECE
3,Nagaraju,,,CSe
4,Priya,,,Mech
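
In the result above the row labels 0 to 3 appear twice, because `pd.concat` keeps each frame's own index. A minimal sketch, on small stand-in frames, of passing `ignore_index=True` to get a fresh index instead:

```python
import pandas as pd

left = pd.DataFrame({"Name": ["Lavanya", "Praveen"], "Gender": ["f", "m"]})
right = pd.DataFrame({"Name": ["Anurag", "Priya"], "Branch": ["ECE", "Mech"]})

stacked = pd.concat([left, right], axis=0)
renumbered = pd.concat([left, right], axis=0, ignore_index=True)

print(stacked.index.tolist())       # [0, 1, 0, 1]
print(renumbered.index.tolist())    # [0, 1, 2, 3]
print(renumbered.columns.tolist())  # ['Name', 'Gender', 'Branch']
```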


In [108]:
pd.concat([df2,df3],axis = 1) # concat at columns

Unnamed: 0,Name,Fcolor,Gender,Name.1,Branch
0,Lavanya,White,f,Lavanya,Cse
1,Praveen,Black,m,Praveen,IT
2,Anurag,Red,m,Anurag,ECE
3,Nagaraju,,m,Nagaraju,CSe
4,,,,Priya,Mech


In [109]:
pd.concat([df2, df3]) # at rows; replaces df2.append(df3), which was removed in pandas 2.0

Unnamed: 0,Name,Fcolor,Gender,Branch
0,Lavanya,White,f,
1,Praveen,Black,m,
2,Anurag,Red,m,
3,Nagaraju,,m,
0,Lavanya,,,Cse
1,Praveen,,,IT
2,Anurag,,,ECE
3,Nagaraju,,,CSe
4,Priya,,,Mech


In [110]:
pd.concat([df3, df2]) # replaces df3.append(df2), which was removed in pandas 2.0

Unnamed: 0,Name,Branch,Fcolor,Gender
0,Lavanya,Cse,,
1,Praveen,IT,,
2,Anurag,ECE,,
3,Nagaraju,CSe,,
4,Priya,Mech,,
0,Lavanya,,White,f
1,Praveen,,Black,m
2,Anurag,,Red,m
3,Nagaraju,,,m


In [111]:
pd.merge(df2,df3) # default inner join on the shared column "Name"

Unnamed: 0,Name,Fcolor,Gender,Branch
0,Lavanya,White,f,Cse
1,Praveen,Black,m,IT
2,Anurag,Red,m,ECE
3,Nagaraju,,m,CSe


In [113]:
df2

Unnamed: 0,Name,Fcolor,Gender
0,Lavanya,White,f
1,Praveen,Black,m
2,Anurag,Red,m
3,Nagaraju,,m


In [116]:
pd.merge(df2,df3, how = "left") # keep every row of df2 (the left frame)

Unnamed: 0,Name,Fcolor,Gender,Branch
0,Lavanya,White,f,Cse
1,Praveen,Black,m,IT
2,Anurag,Red,m,ECE
3,Nagaraju,,m,CSe


In [115]:
help(pd.merge)

Help on function merge in module pandas.core.reshape.merge:

merge(left, right, how: str = 'inner', on=None, left_on=None, right_on=None, left_index: bool = False, right_index: bool = False, sort: bool = False, suffixes=('_x', '_y'), copy: bool = True, indicator: bool = False, validate=None) -> 'DataFrame'
    Merge DataFrame or named Series objects with a database-style join.
    
    The join is done on columns or indexes. If joining columns on
    columns, the DataFrame indexes *will be ignored*. Otherwise if joining indexes
    on indexes or indexes on a column or columns, the index will be passed on.
    
    Parameters
    ----------
    left : DataFrame
    right : DataFrame or named Series
        Object to merge with.
    how : {'left', 'right', 'outer', 'inner'}, default 'inner'
        Type of merge to be performed.
    
        * left: use only keys from left frame, similar to a SQL left outer join;
          preserve key order.
        * right: use only keys from right fra

In [117]:
pd.merge(df2,df3, how = "right") # keep every row of df3 (the right frame)

Unnamed: 0,Name,Fcolor,Gender,Branch
0,Lavanya,White,f,Cse
1,Praveen,Black,m,IT
2,Anurag,Red,m,ECE
3,Nagaraju,,m,CSe
4,Priya,,,Mech


In [118]:
pd.merge(df2,df3, how = "inner") # common data - use intersection of keys from both frames

Unnamed: 0,Name,Fcolor,Gender,Branch
0,Lavanya,White,f,Cse
1,Praveen,Black,m,IT
2,Anurag,Red,m,ECE
3,Nagaraju,,m,CSe


In [120]:
pd.merge(df2,df3, how = "outer") # all rows: use the union of keys from both frames

Unnamed: 0,Name,Fcolor,Gender,Branch
0,Lavanya,White,f,Cse
1,Praveen,Black,m,IT
2,Anurag,Red,m,ECE
3,Nagaraju,,m,CSe
4,Priya,,,Mech
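
To see which frame each merged row came from, `pd.merge` accepts `indicator=True` (the `indicator` parameter is listed in the `help` signature above), which adds a `_merge` column. A sketch on two small stand-in frames, naming the join column explicitly with `on`:

```python
import pandas as pd

colors = pd.DataFrame({"Name": ["Lavanya", "Praveen"], "Fcolor": ["White", "Black"]})
branches = pd.DataFrame({"Name": ["Praveen", "Priya"], "Branch": ["IT", "Mech"]})

merged = pd.merge(colors, branches, on="Name", how="outer", indicator=True)

# Map each name to the origin recorded in the _merge column.
origin = dict(zip(merged["Name"], merged["_merge"].astype(str)))
print(origin["Lavanya"])  # left_only
print(origin["Praveen"])  # both
print(origin["Priya"])    # right_only
```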


# File Reading

In [121]:
data = pd.read_excel("2020-07-25.xlsx")
data

Unnamed: 0.1,Unnamed: 0,Roll Number,2020-07-25
0,0,17B81A04H1,P
1,1,198A5F0019,P
2,2,17KD1A0560,P
3,3,17KH1A0455,P
4,4,1210316262,P
5,5,18P31A0555,P
6,6,18B01A0211,P
7,7,Y18IT048,P
8,8,17B81A05B2,P
9,9,169X1A04E0,P


In [123]:
data = pd.read_csv("birds.csv")
data

Unnamed: 0.1,Unnamed: 0,huml,humw,ulnal,ulnaw,feml,femw,tibl,tibw,tarl,tarw,len_type,Mean_Features,huml_and_humw
0,count,419.0,419.0,417.0,418.0,418.0,419.0,418.0,419.0,419.0,419.0,420.0,420.0,419.0
1,mean,64.650501,4.370573,69.115372,3.597249,36.872416,3.220883,64.662823,3.182339,39.229976,2.930024,1.580952,29.14716,69.021074
2,std,53.834549,2.854617,58.784775,2.186747,19.979082,2.023581,37.838145,2.080827,23.184313,2.185673,0.493992,18.996025,56.462551
3,min,9.85,1.14,14.09,1.0,11.83,0.93,5.5,0.87,7.77,0.66,1.0,7.782,12.73
4,25%,25.17,2.19,28.05,1.87,21.2975,1.715,36.4175,1.565,23.035,1.425,1.0,14.5285,27.2
5,50%,44.18,3.5,43.71,2.945,31.13,2.52,52.12,2.49,31.74,2.23,2.0,22.3825,48.06
6,75%,90.31,5.81,97.52,4.77,47.12,4.135,82.87,4.255,50.25,3.5,2.0,39.08225,96.19
7,max,420.0,17.84,422.0,12.0,117.07,11.64,240.0,11.03,175.0,14.09,2.0,137.74,437.84


In [124]:
data = pd.read_csv("https://raw.githubusercontent.com/AP-Skill-Development-Corporation/Machine-Learning-Online-Training-Public-20-07-2021-to-07-08-2021-/main/Day3_Pandas/PublicBatch.csv")
data

Unnamed: 0,S.No,College Name,Program Name,Student Name,Registrationid,Phone,Email,Payment Status,Registered Date
0,1,Aditya Engineering College,Machine Learning Using Python Online 2021-22,Mandapaka Anusha,18A91A0534,6300452000.0,anushamandapaka77@gmail.com,success,6/19/2021
1,2,Aditya Engineering College,Machine Learning Using Python Online 2021-22,Meneti Mounika,18A91A05F0,9704807000.0,mounikameneti9912@gmail.com,success,6/19/2021
2,3,Aditya Engineering College,Machine Learning Using Python Online 2021-22,Sheik Abdul Hakim,19A95A0504,9676215000.0,hakeemabd007@gmail.com,success,6/19/2021
3,4,Aditya Engineering College,Machine Learning Using Python Online 2021-22,Sambattula Navya Prasanna,18A91A0552,8328650000.0,navyasambattula@gmail.com,success,6/19/2021
4,5,Aditya Engineering College,Machine Learning Using Python Online 2021-22,Vijaya Durga Velagala,18A91A05H8,9502362000.0,vijayadurga.velagala123@gmail.com,success,6/19/2021
5,6,Aditya Engineering College,Machine Learning Using Python Online 2021-22,Budha Bala Atyutha Sri Sai,18A91A0510,8500388000.0,balasai599@gmail.com,success,6/19/2021
6,7,Aditya Engineering College,Machine Learning Using Python Online 2021-22,Karri Sirish Kumar,18A91A0529,9491691000.0,karrisirish2000@gmail.com,success,6/19/2021
7,8,Aditya Engineering College,Machine Learning Using Python Online 2021-22,Ramavarapu Mary Ratnam,18A91A0547,7981134000.0,rpchinnu123@gmail.com,success,6/19/2021
8,9,Aditya Engineering College,Machine Learning Using Python Online 2021-22,M.L.S.Namratha,18A91A05E7,7013641000.0,namrathamalladi@gmail.com,success,6/21/2021
9,10,BVC Institute of Technology and Science,Machine Learning Using Python Online 2021-22,Naga Swathi Menda,18H41F0006,9533339000.0,nagaswathimenda97@gmail.com,success,6/25/2021


In [126]:
data.head(3)

Unnamed: 0,S.No,College Name,Program Name,Student Name,Registrationid,Phone,Email,Payment Status,Registered Date
0,1,Aditya Engineering College,Machine Learning Using Python Online 2021-22,Mandapaka Anusha,18A91A0534,6300452000.0,anushamandapaka77@gmail.com,success,6/19/2021
1,2,Aditya Engineering College,Machine Learning Using Python Online 2021-22,Meneti Mounika,18A91A05F0,9704807000.0,mounikameneti9912@gmail.com,success,6/19/2021
2,3,Aditya Engineering College,Machine Learning Using Python Online 2021-22,Sheik Abdul Hakim,19A95A0504,9676215000.0,hakeemabd007@gmail.com,success,6/19/2021


In [127]:
data.tail()

Unnamed: 0,S.No,College Name,Program Name,Student Name,Registrationid,Phone,Email,Payment Status,Registered Date
29,30,MVGR College of Engineering,Machine Learning Using Python Online 2021-22,Vasireddy Teja,19335A0415,7095737000.0,teja16052001@gmail.com,success,6/27/2021
30,31,MVGR College of Engineering,Machine Learning Using Python Online 2021-22,Thida Priyanka,18331A04F5,,teedapriyanka@gmail.com,,
31,32,Gayatri Vidya Parishad College of Engineering,Machine Learning Using Python Online 2021-22,Metta Likhitha,19131A04F2,7569078000.0,likhithametta888@gmail.com,,
32,33,Gayatri Vidya Parishad College of Engineering,Machine Learning Using Python Online 2021-22,Pallapothula Venkata Sreechandana,19131A04H4,8688858000.0,19131a04h4@gvpce.ac.in,,
33,34,Gayatri Vidya Parishad College of Engineering,Machine Learning Using Python Online 2021-22,Marada Gowrav Srikalyan,19131A04E4,7396917000.0,gowravsrikalyan1220@gmail.com,,


In [133]:
data.sample(3) # random selection

Unnamed: 0,S.No,College Name,Program Name,Student Name,Registrationid,Phone,Email,Payment Status,Registered Date
0,1,Aditya Engineering College,Machine Learning Using Python Online 2021-22,Mandapaka Anusha,18A91A0534,6300452000.0,anushamandapaka77@gmail.com,success,6/19/2021
31,32,Gayatri Vidya Parishad College of Engineering,Machine Learning Using Python Online 2021-22,Metta Likhitha,19131A04F2,7569078000.0,likhithametta888@gmail.com,,
27,28,Vignan s Institute of Engineering for Women,Machine Learning Using Python Online 2021-22,Divya Naramsetty,18NM1A0311,9346382000.0,divyanaramsetty@gmail.com,success,6/21/2021


In [135]:
type(data)

pandas.core.frame.DataFrame

# Exporting Data

In [136]:
data[:100].to_csv("student.csv")

In [137]:
data.to_excel("Details.xlsx")
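
By default `to_csv` also writes the row index, and reading such a file back produces an extra `Unnamed: 0` column like the one in the files loaded earlier. The sketch below shows this with an in-memory buffer instead of a real file; the frame and its values are stand-ins.

```python
import io
import pandas as pd

df = pd.DataFrame({"Roll Number": ["A1", "B2"], "Status": ["P", "P"]})

with_index = io.StringIO()
df.to_csv(with_index)                  # index written as an unnamed first column
with_index.seek(0)
cols_with_index = pd.read_csv(with_index).columns.tolist()

without_index = io.StringIO()
df.to_csv(without_index, index=False)  # index left out of the file
without_index.seek(0)
cols_without_index = pd.read_csv(without_index).columns.tolist()

print(cols_with_index)     # ['Unnamed: 0', 'Roll Number', 'Status']
print(cols_without_index)  # ['Roll Number', 'Status']
```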

In [139]:
data.to_dict()

{'S.No': {0: 1,
  1: 2,
  2: 3,
  3: 4,
  4: 5,
  5: 6,
  6: 7,
  7: 8,
  8: 9,
  9: 10,
  10: 11,
  11: 12,
  12: 13,
  13: 14,
  14: 15,
  15: 16,
  16: 17,
  17: 18,
  18: 19,
  19: 20,
  20: 21,
  21: 22,
  22: 23,
  23: 24,
  24: 25,
  25: 26,
  26: 27,
  27: 28,
  28: 29,
  29: 30,
  30: 31,
  31: 32,
  32: 33,
  33: 34},
 'College Name': {0: 'Aditya Engineering College',
  1: 'Aditya Engineering College',
  2: 'Aditya Engineering College',
  3: 'Aditya Engineering College',
  4: 'Aditya Engineering College',
  5: 'Aditya Engineering College',
  6: 'Aditya Engineering College',
  7: 'Aditya Engineering College',
  8: 'Aditya Engineering College',
  9: 'BVC Institute of Technology and Science',
  10: 'BVC Institute of Technology and Science',
  11: 'BVC Institute of Technology and Science',
  12: 'BVC Institute of Technology and Science',
  13: 'Eluru College of Engineering and Technology',
  14: 'G Pulla Reddy Engineering College (Autonomous)',
  15: 'Gayatri Vidya Parishad Colle

In [140]:
data.columns

Index(['S.No', 'College Name', 'Program Name', 'Student Name',
       'Registrationid', 'Phone', 'Email', 'Payment Status',
       'Registered Date'],
      dtype='object')

In [141]:
data.index

RangeIndex(start=0, stop=34, step=1)

In [142]:
data.describe()

Unnamed: 0,S.No,Phone
count,34.0,33.0
mean,17.5,8612149000.0
std,9.958246,915948200.0
min,1.0,6300452000.0
25%,9.25,7995139000.0
50%,17.5,8688858000.0
75%,25.75,9381230000.0
max,34.0,9704807000.0
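
`describe()` summarised only `S.No` and `Phone` because, by default, it covers numeric columns when any are present. A sketch on a small invented frame of asking for every column with `include="all"`:

```python
import pandas as pd

df = pd.DataFrame({
    "College": ["AEC", "AEC", "BVC"],  # invented values
    "Score": [70, 85, 90],
})

numeric_summary = df.describe()           # numeric columns only
full_summary = df.describe(include="all") # adds unique/top/freq for text columns

print(numeric_summary.columns.tolist())       # ['Score']
print(full_summary.loc["unique", "College"])  # 2
print(full_summary.loc["top", "College"])     # AEC
```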


In [146]:
data["S.No"].sum()

595

In [147]:
data.count()

S.No               34
College Name       34
Program Name       34
Student Name       34
Registrationid     34
Phone              33
Email              34
Payment Status     30
Registered Date    30
dtype: int64
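
`count()` reports the non-missing values per column, so `Phone` at 33 out of 34 rows means one entry is missing. The complementary view is `isna().sum()`, sketched here on an invented frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Phone": [9.7e9, np.nan, 8.5e9],       # invented values
    "Status": ["success", None, None],
})

present = df.count().to_dict()       # non-missing values per column
missing = df.isna().sum().to_dict()  # missing values per column

print(present)  # {'Phone': 2, 'Status': 1}
print(missing)  # {'Phone': 1, 'Status': 2}
```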

In [149]:
data.max()

S.No                                                             34
College Name      Vignan s LARA Institute of Technology and Science
Program Name           Machine Learning Using Python Online 2021-22
Student Name                            Viwanth Manikanta Kankatala
Registrationid                                        AP19110010123
Phone                                                   9.70481e+09
Email                          yaminisrilakshmipeddireddy@gmail.com
dtype: object

In [150]:
data.min()

S.No                                                         1
College Name                        Aditya Engineering College
Program Name      Machine Learning Using Python Online 2021-22
Student Name                              Bh Tarun Surya Varma
Registrationid                                       170040262
Phone                                              6.30045e+09
Email                                   19131a04h4@gvpce.ac.in
dtype: object

In [154]:
data["S.No"].idxmax()

33
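
`idxmax()` returns the index label of the largest value, not the value itself, so it pairs naturally with `loc` to fetch the matching row. A sketch with invented data and string row labels to make the distinction visible:

```python
import pandas as pd

df = pd.DataFrame(
    {"Student": ["S1", "S2", "S3"], "Marks": [72, 91, 65]},  # invented values
    index=["r1", "r2", "r3"],
)

top_label = df["Marks"].idxmax()           # label of the row with the largest value
top_student = df.loc[top_label, "Student"]

print(top_label)    # r2
print(top_student)  # S2
```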