# Introduction to Pandas

**Reference :** NCERT Informatics Practices Textbook for Std XII Chapters 2,3

Pandas (the name comes from PANel DAta) is a high-level data manipulation 
tool used for analysing data. Its rich set of functions makes it easy to 
import and export data. It is built on NumPy and works together with 
Matplotlib for plotting, which gives us a single, convenient place 
to do most of our data analysis and visualisation work. 
Pandas has two important data structures, namely Series and DataFrame, 
which make the process of analysing data organised, effective and efficient.

In [None]:
import pandas as pd

## Series

A Series is a one-dimensional array containing a 
sequence of values of any data type (int, float, list, 
string, etc.), which by default have numeric data labels 
starting from zero. The data label associated with a 
particular value is called its index. Values of other data 
types, such as strings, can also be used as the index. We can 
imagine a Pandas Series as a column in a spreadsheet.
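
To make the spreadsheet analogy concrete, the sketch below builds a small table and pulls out one column, which pandas hands back as a Series. The names `sheet`, `price` and `labelled`, and the sample values, are made up for illustration and are not from the textbook.

```python
import pandas as pd

# A tiny "spreadsheet" with two columns (illustrative data)
sheet = pd.DataFrame({"item": ["pen", "book", "bag"],
                      "price": [10, 150, 700]})

# Selecting a single column gives back a Series
price = sheet["price"]
print(type(price).__name__)   # Series
print(list(price.index))      # default numeric labels starting from zero

# The same values with text labels as the index
labelled = pd.Series([10, 150, 700], index=["pen", "book", "bag"])
print(labelled["book"])       # look up a value by its label
```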

### Creating a Series

In [None]:
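# Notes on how the dtype is chosen from the values:
#   10, 20, 30     -> int64
#   10.0, 20, 30   -> float64
#   'a', 10, 20    -> str values, stored with dtype object
# arrayName.astype(float), .astype(str) or .astype('int64') converts the dtype
# Converting a float to int drops the fractional part: 43.5 -> 43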
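
The dtype notes above can be checked directly. This is a minimal sketch; the variable names (`ints`, `floats`, `mixed`, `truncated`) are illustrative only.

```python
import pandas as pd

ints = pd.Series([10, 20, 30])        # all integers
floats = pd.Series([10.0, 20, 30])    # one float makes the whole Series float
mixed = pd.Series(["a", 10, 20])      # mixed text and numbers
print(ints.dtype, floats.dtype, mixed.dtype)

# astype() returns a new Series with the requested dtype
as_float = ints.astype(float)
truncated = pd.Series([43.5]).astype(int)   # the fractional part is dropped
print(as_float.dtype, truncated[0])
```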

In [None]:
series1 = pd.Series([10,20,30]) #create a Series
print(series1) #Display the series

0    10
1    20
2    30
dtype: int64


In [None]:
series2 = pd.Series(["Kavi","Shyam","Ravi"], index=[3,5,1])
print(series2) #Display the series

3     Kavi
5    Shyam
1     Ravi
dtype: object


In [None]:
series2 = pd.Series([2,3,4],index=["Feb","Mar","Apr"])
print(series2) #Display the series

Feb    2
Mar    3
Apr    4
dtype: int64


In [None]:
import numpy as np
array1 = np.array([1,2,3,4])
series3 = pd.Series(array1)
print(series3)

0    1
1    2
2    3
3    4
dtype: int64


In [None]:
series4 = pd.Series(array1, index = ["Jan", "Feb", "Mar", "Apr"])
print(series4)

Jan    1
Feb    2
Mar    3
Apr    4
dtype: int64


In [None]:
dict1 = {'India': 'NewDelhi', 'UK': 'London', 'Japan': 'Tokyo'}
print(dict1) #Display the dictionary

{'India': 'NewDelhi', 'UK': 'London', 'Japan': 'Tokyo'}


In [None]:
series8 = pd.Series(dict1) 
print(series8) 

India    NewDelhi
UK         London
Japan       Tokyo
dtype: object


### Accessing Elements of a Series

#### Indexing & Slicing

In [None]:
print(series4)

Jan    1
Feb    2
Mar    3
Apr    4
dtype: int64


In [None]:
series4['Jan']

1

In [None]:
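series4[1]  # the integer is treated as a position here; newer pandas versions deprecate this, so prefer series4.iloc[1]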

2

In [None]:
series4.loc['Feb']

2

In [None]:
series4.iloc[1]

2

In [None]:
series4.loc[['Feb','Apr']]

Feb    2
Apr    4
dtype: int64

In [None]:
series4.iloc[[1,3]]

Feb    2
Apr    4
dtype: int64

In [None]:
series4.iloc[1:3]  # like a Python list slice, the end position (3) is excluded

Feb    2
Mar    3
dtype: int64

In [None]:
series4.iloc[1::2]

Feb    2
Apr    4
dtype: int64

In [None]:
series4.iloc[1:3:2]

Feb    2
dtype: int64

In [None]:
series4.loc['Feb':'Apr']  # label-based slice: both the start and end labels are included

Feb    2
Mar    3
Apr    4
dtype: int64
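
The difference between `loc` (labels) and `iloc` (positions) is easiest to see when the labels are themselves integers. The sketch below reuses the values of `series2` and `series4` from earlier under new, illustrative names (`names`, `months`).

```python
import pandas as pd

names = pd.Series(["Kavi", "Shyam", "Ravi"], index=[3, 5, 1])

by_label = names.loc[1]       # the element whose label is 1
by_position = names.iloc[1]   # the second element, whatever its label
print(by_label, by_position)

months = pd.Series([1, 2, 3, 4], index=["Jan", "Feb", "Mar", "Apr"])
print(len(months.iloc[1:3]))          # positions 1 and 2 only
print(len(months.loc["Feb":"Apr"]))   # Feb, Mar and Apr
```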

### Attributes of Series

* name
* index.name
* values
* size
* empty

In [None]:
series4.name="Test"

In [None]:
print(series4.name)

Test


In [None]:
print(series4)

Jan    1
Feb    2
Mar    3
Apr    4
Name: Test, dtype: int64


In [None]:
series4.index.name="Months"
print(series4)

Months
Jan    1
Feb    2
Mar    3
Apr    4
Name: Test, dtype: int64


In [None]:
series4.values

array([1, 2, 3, 4])

In [None]:
series4.size  # number of elements; NaN values are counted too

4

In [None]:
series4.empty

False

### Methods of Series

* head
* count
* tail

In [None]:
series4.head(2)

Months
Jan    1
Feb    2
Name: Test, dtype: int64

In [None]:
series4.tail(3)

Months
Feb    2
Mar    3
Apr    4
Name: Test, dtype: int64

In [None]:
series4.count()  # number of non-NaN values; NaN is not counted

4
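
Since `series4` has no missing values, `size` and `count()` agree for it. A small sketch with NaN values shows where they differ; the Series `marks` and its values are invented for this example.

```python
import numpy as np
import pandas as pd

marks = pd.Series([35.0, np.nan, 42.0, np.nan], index=["a", "b", "c", "d"])

print(marks.size)      # counts every element, including NaN
print(marks.count())   # counts only the non-NaN values
print(marks.empty)     # False, because the Series has elements
```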

## DataFrame

A DataFrame is a two-dimensional labelled data structure. It contains rows and columns, and therefore 
has both a row index and a column index. Each column can 
hold a different type of value, such as numeric, string 
or boolean, as in the tables of a database.
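
As a quick illustration of columns holding different types, the sketch below builds a small DataFrame and inspects its labels and column dtypes. The frame `students` and its contents are hypothetical.

```python
import pandas as pd

students = pd.DataFrame(
    {"name": ["Asha", "Bala"],      # text column
     "marks": [91.5, 78.0],         # floating-point column
     "passed": [True, True]},       # boolean column
    index=["r1", "r2"],             # row labels
)

print(list(students.index))     # row index
print(list(students.columns))   # column index
print(students.dtypes)          # one dtype per column
```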

### Creating a DataFrame

In [None]:
dFrameEmt = pd.DataFrame()
print(dFrameEmt)

Empty DataFrame
Columns: []
Index: []


In [None]:
array1 = np.array([10,20,30])
array2 = np.array([100,200,300])
array3 = np.array([-10,-20,-30, -40])

In [None]:
dFrame2 = pd.DataFrame(array1)  # a 1-D array becomes a single column; axis=0 refers to rows, axis=1 to columns
dFrame2

Unnamed: 0,0
0,10
1,20
2,30


In [None]:
dFrame5 = pd.DataFrame([array1, array3, array2], columns=[ 'A', 'B', 'C', 'D'])
dFrame5

Unnamed: 0,A,B,C,D
0,10,20,30,
1,-10,-20,-30,-40.0
2,100,200,300,


In [None]:
# Create list of dictionaries
listDict = [{'a':10, 'b':20}, {'a':5, 'b':10, 'c':20}]

In [None]:
dFrameListDict = pd.DataFrame(listDict)
dFrameListDict

Unnamed: 0,a,b,c
0,10,20,
1,5,10,20.0


In [None]:
# Create dictionary of lists
listDict = {'a':[10,5], 'b':[20,10], 'c':[np.nan,20]}  # np.nan is the supported spelling; np.NaN was removed in NumPy 2.0

In [None]:
dFrameListDict = pd.DataFrame(listDict)
dFrameListDict

Unnamed: 0,a,b,c
0,10,20,
1,5,10,20.0


In [None]:
dictForest = {'State': ['Assam', 'Delhi', 'Kerala'], 'GArea': [78438, 1483, 38852] , 'VDF' : [2797, 6.72,1663]}
dFrameForest= pd.DataFrame(dictForest)
dFrameForest

Unnamed: 0,State,GArea,VDF
0,Assam,78438,2797.0
1,Delhi,1483,6.72
2,Kerala,38852,1663.0


In [None]:
dFrameForest1 = pd.DataFrame(dictForest, columns = ['State','VDF', 'GArea'])
dFrameForest1

Unnamed: 0,State,VDF,GArea
0,Assam,2797.0,78438
1,Delhi,6.72,1483
2,Kerala,1663.0,38852


In [None]:
seriesA = pd.Series([1,2,3,4,5], index = ['a', 'b', 'c', 'd', 'e'])
seriesB = pd.Series ([1000,2000,-1000,-5000,1000], index = ['a', 'b', 'c', 'd', 'e'])
seriesC = pd.Series([10,20,-10,-50,100], index = ['z', 'y', 'a', 'c', 'e'])

In [None]:
dFrame7 = pd.DataFrame([seriesA, seriesB])
dFrame7

Unnamed: 0,a,b,c,d,e
0,1,2,3,4,5
1,1000,2000,-1000,-5000,1000


In [None]:
dFrame8 = pd.DataFrame([seriesA, seriesC])
dFrame8

Unnamed: 0,a,b,c,d,e,z,y
0,1.0,2.0,3.0,4.0,5.0,,
1,-10.0,,-50.0,,100.0,10.0,20.0


In [None]:
dFrame8.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 7 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   a       2 non-null      float64
 1   b       1 non-null      float64
 2   c       2 non-null      float64
 3   d       1 non-null      float64
 4   e       2 non-null      float64
 5   z       1 non-null      float64
 6   y       1 non-null      float64
dtypes: float64(7)
memory usage: 240.0 bytes


In [None]:
ResultSheet={
'Arnab': pd.Series([90, 91, 97], index=['Maths','Science','Hindi']),
'Ramit': pd.Series([92, 81, 96], index=['Maths','Science','Hindi']),
'Samridhi': pd.Series([89, 91, 88], index=['Maths','Science','Hindi']),
'Riya': pd.Series([81, 71, 67], index=['Maths','Science','Hindi']),
'Mallika': pd.Series([94, 95, 99], index=['Maths','Science','Hindi'])}

In [None]:
ResultDF = pd.DataFrame(ResultSheet)
ResultDF

Unnamed: 0,Arnab,Ramit,Samridhi,Riya,Mallika
Maths,90,92,89,81,94
Science,91,81,91,71,95
Hindi,97,96,88,67,99


In [None]:
ResultDF.T#Transpose

Unnamed: 0,Maths,Science,Hindi
Arnab,90,91,97
Ramit,92,81,96
Samridhi,89,91,88
Riya,81,71,67
Mallika,94,95,99


In [None]:
ResultDF.info()

<class 'pandas.core.frame.DataFrame'>
Index: 3 entries, Maths to Hindi
Data columns (total 5 columns):
 #   Column    Non-Null Count  Dtype
---  ------    --------------  -----
 0   Arnab     3 non-null      int64
 1   Ramit     3 non-null      int64
 2   Samridhi  3 non-null      int64
 3   Riya      3 non-null      int64
 4   Mallika   3 non-null      int64
dtypes: int64(5)
memory usage: 224.0+ bytes


In [None]:
df = pd.read_csv("sample_data/california_housing_test.csv")
df.head()

Unnamed: 0,longitude,latitude,housing_median_age,total_rooms,total_bedrooms,population,households,median_income,median_house_value
0,-122.05,37.37,27.0,3885.0,661.0,1537.0,606.0,6.6085,344700.0
1,-118.3,34.26,43.0,1510.0,310.0,809.0,277.0,3.599,176500.0
2,-117.81,33.78,27.0,3589.0,507.0,1484.0,495.0,5.7934,270500.0
3,-118.36,33.82,28.0,67.0,15.0,49.0,11.0,6.1359,330000.0
4,-119.67,36.33,19.0,1241.0,244.0,850.0,237.0,2.9375,81700.0


In [None]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3000 entries, 0 to 2999
Data columns (total 9 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   longitude           3000 non-null   float64
 1   latitude            3000 non-null   float64
 2   housing_median_age  3000 non-null   float64
 3   total_rooms         3000 non-null   float64
 4   total_bedrooms      3000 non-null   float64
 5   population          3000 non-null   float64
 6   households          3000 non-null   float64
 7   median_income       3000 non-null   float64
 8   median_house_value  3000 non-null   float64
dtypes: float64(9)
memory usage: 211.1 KB


In [None]:
df.describe()

Unnamed: 0,longitude,latitude,housing_median_age,total_rooms,total_bedrooms,population,households,median_income,median_house_value
count,3000.0,3000.0,3000.0,3000.0,3000.0,3000.0,3000.0,3000.0,3000.0
mean,-119.5892,35.63539,28.845333,2599.578667,529.950667,1402.798667,489.912,3.807272,205846.275
std,1.994936,2.12967,12.555396,2155.593332,415.654368,1030.543012,365.42271,1.854512,113119.68747
min,-124.18,32.56,1.0,6.0,2.0,5.0,2.0,0.4999,22500.0
25%,-121.81,33.93,18.0,1401.0,291.0,780.0,273.0,2.544,121200.0
50%,-118.485,34.27,29.0,2106.0,437.0,1155.0,409.5,3.48715,177650.0
75%,-118.02,37.69,37.0,3129.0,636.0,1742.75,597.25,4.656475,263975.0
max,-114.49,41.92,52.0,30450.0,5419.0,11935.0,4930.0,15.0001,500001.0


In [None]:
df.describe(include='all')

Unnamed: 0,longitude,latitude,housing_median_age,total_rooms,total_bedrooms,population,households,median_income,median_house_value
count,3000.0,3000.0,3000.0,3000.0,3000.0,3000.0,3000.0,3000.0,3000.0
mean,-119.5892,35.63539,28.845333,2599.578667,529.950667,1402.798667,489.912,3.807272,205846.275
std,1.994936,2.12967,12.555396,2155.593332,415.654368,1030.543012,365.42271,1.854512,113119.68747
min,-124.18,32.56,1.0,6.0,2.0,5.0,2.0,0.4999,22500.0
25%,-121.81,33.93,18.0,1401.0,291.0,780.0,273.0,2.544,121200.0
50%,-118.485,34.27,29.0,2106.0,437.0,1155.0,409.5,3.48715,177650.0
75%,-118.02,37.69,37.0,3129.0,636.0,1742.75,597.25,4.656475,263975.0
max,-114.49,41.92,52.0,30450.0,5419.0,11935.0,4930.0,15.0001,500001.0


In [None]:
df = pd.read_csv("sample_data/mnist_test.csv",header=None)
df.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,...,745,746,747,748,749,750,751,752,753,754,755,756,757,758,759,760,761,762,763,764,765,766,767,768,769,770,771,772,773,774,775,776,777,778,779,780,781,782,783,784
0,7,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [None]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Columns: 785 entries, 0 to 784
dtypes: int64(785)
memory usage: 59.9 MB


### Operations on rows and columns in DataFrames

In [None]:
ResultDF

Unnamed: 0,Arnab,Ramit,Samridhi,Riya,Mallika
Maths,90,92,89,81,94
Science,91,81,91,71,95
Hindi,97,96,88,67,99


In [None]:
ResultDF['Preeti']=[89,78,76]
ResultDF

Unnamed: 0,Arnab,Ramit,Samridhi,Riya,Mallika,Preeti
Maths,90,92,89,81,94,89
Science,91,81,91,71,95,78
Hindi,97,96,88,67,99,76


In [None]:
ResultDF['Ramit']=[99, 98, 78]
ResultDF

Unnamed: 0,Arnab,Ramit,Samridhi,Riya,Mallika,Preeti
Maths,90,99,89,81,94,89
Science,91,98,91,71,95,78
Hindi,97,78,88,67,99,76


In [None]:
ResultDF['Arnab']=90
ResultDF

Unnamed: 0,Arnab,Ramit,Samridhi,Riya,Mallika,Preeti
Maths,90,99,89,81,94,89
Science,90,98,91,71,95,78
Hindi,90,78,88,67,99,76


In [None]:
ResultDF.loc['English'] = [85, 86, 83, 80, 90, 89]
ResultDF

Unnamed: 0,Arnab,Ramit,Samridhi,Riya,Mallika,Preeti
Maths,90,99,89,81,94,89
Science,90,98,91,71,95,78
Hindi,90,78,88,67,99,76
English,85,86,83,80,90,89


In [None]:
ResultDF.loc['English'] = [95, 86, 95, 80, 95,99]
ResultDF

Unnamed: 0,Arnab,Ramit,Samridhi,Riya,Mallika,Preeti
Maths,90,99,89,81,94,89
Science,90,98,91,71,95,78
Hindi,90,78,88,67,99,76
English,95,86,95,80,95,99


In [None]:
ResultDF.loc['Maths']=0
ResultDF

Unnamed: 0,Arnab,Ramit,Samridhi,Riya,Mallika,Preeti
Maths,0,0,0,0,0,0
Science,90,98,91,71,95,78
Hindi,90,78,88,67,99,76
English,95,86,95,80,95,99


In [None]:
ResultDF.Ramit=[99, 89, 78,76]  # one value per row, so all four rows must already exist
ResultDF

Unnamed: 0,Arnab,Ramit,Samridhi,Riya,Mallika,Preeti
Maths,0,99,0,0,0,0
Science,90,89,91,71,95,78
Hindi,90,78,88,67,99,76
English,95,76,95,80,95,99


In [None]:
ResultDF.loc[:,'Mallika']=0
ResultDF

Unnamed: 0,Arnab,Ramit,Samridhi,Riya,Mallika,Preeti
Maths,0,99,0,0,0,0
Science,90,89,91,71,0,78
Hindi,90,78,88,67,0,76
English,95,76,95,80,0,99


In [None]:
ResultDF.loc[:,['Arnab','Mallika']]=10
ResultDF

Unnamed: 0,Arnab,Ramit,Samridhi,Riya,Mallika,Preeti
Maths,10,99,0,0,10,0
Science,10,89,91,71,10,78
Hindi,10,78,88,67,10,76
English,10,76,95,80,10,99


In [None]:
ResultDF.iloc[2]=0
ResultDF

Unnamed: 0,Arnab,Ramit,Samridhi,Riya,Mallika,Preeti
Maths,10,99,0,0,10,0
Science,10,89,91,71,10,78
Hindi,0,0,0,0,0,0
English,10,76,95,80,10,99


In [None]:
ResultDF.iloc[:,2]=0
ResultDF

Unnamed: 0,Arnab,Ramit,Samridhi,Riya,Mallika,Preeti
Maths,10,99,0,0,10,0
Science,10,89,0,71,10,78
Hindi,0,0,0,0,0,0
English,10,76,0,80,10,99


In [None]:
ResultDF = ResultDF.drop('Science', axis=0)
ResultDF

Unnamed: 0,Arnab,Ramit,Samridhi,Riya,Mallika,Preeti
Maths,10,99,0,0,10,0
Hindi,0,0,0,0,0,0
English,10,76,0,80,10,99


In [None]:
ResultDF = ResultDF.drop(['Samridhi','Ramit','Riya'], axis=1)
ResultDF

Unnamed: 0,Arnab,Mallika,Preeti
Maths,10,10,0
Hindi,0,0,0
English,10,10,99


### Accessing DataFrame Elements through Slicing

In [None]:
ResultDF.loc['Maths': 'Hindi']  # 'Science' was dropped above, so slice up to a label that still exists

Unnamed: 0,Arnab,Mallika,Preeti
Maths,10,10,0
Hindi,0,0,0


In [None]:
ResultDF.loc['Maths': 'Hindi', 'Arnab':'Mallika']  # both the row and the column labels must still exist

Unnamed: 0,Arnab,Mallika
Maths,10,10
Hindi,0,0


In [None]:
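ResultDF.iloc[0:2]  # first two rows by position; the end position is excluded

In [None]:
ResultDF.iloc[0:2, 0:2]  # first two rows and first two columns by position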
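
The sketch below puts label-based and position-based slicing of a DataFrame side by side. It uses a fresh frame, `scores`, built from a few of the original marks so that it does not depend on the edits made to `ResultDF` above; the variable names are illustrative.

```python
import pandas as pd

scores = pd.DataFrame(
    {"Arnab": [90, 91, 97], "Ramit": [92, 81, 96], "Riya": [81, 71, 67]},
    index=["Maths", "Science", "Hindi"],
)

# loc slices by label and includes both end labels
by_label = scores.loc["Maths":"Science", "Arnab":"Ramit"]

# iloc slices by position and excludes the end position
by_position = scores.iloc[0:2, 0:2]

print(by_label.equals(by_position))   # both select the same 2 x 2 block
single = scores.loc["Hindi", "Riya"]  # one cell: row label, then column label
print(single)
```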

### Joining, Merging and Concatenation of DataFrames

In [None]:
dFrame1=pd.DataFrame([[1, 2, 3], [4, 5], [6]], columns=['C1', 'C2', 'C3'], index=['R1','R2', 'R3'])
dFrame1

Unnamed: 0,C1,C2,C3
R1,1,2.0,3.0
R2,4,5.0,
R3,6,,


In [None]:
dFrame2=pd.DataFrame([[10, 20], [30], [40, 50]], columns=['C2', 'C5'], index=['R4', 'R2', 'R5'])
dFrame2

Unnamed: 0,C2,C5
R4,10,20.0
R2,30,
R5,40,50.0


In [None]:
pd.concat([dFrame1, dFrame2])  # DataFrame.append() was removed in pandas 2.0; pd.concat() gives the same result

Unnamed: 0,C1,C2,C3,C5
R1,1.0,2.0,3.0,
R2,4.0,5.0,,
R3,6.0,,,
R4,,10.0,,20.0
R2,,30.0,,
R5,,40.0,,50.0


In [None]:
dFrame1=pd.DataFrame([[1, 2, 3], [4, 5], [6]], columns=['C1', 'C2', 'C3'], index=['R1','R2', 'R3'])
dFrame1

Unnamed: 0,C1,C2,C3
R1,1,2.0,3.0
R2,4,5.0,
R3,6,,


In [None]:
pd.concat([dFrame2, dFrame1])  # rows of dFrame1 are placed below those of dFrame2

Unnamed: 0,C2,C5,C1,C3
R4,10.0,20.0,,
R2,30.0,,,
R5,40.0,50.0,,
R1,2.0,,1.0,3.0
R2,5.0,,4.0,
R3,,,6.0,


In [None]:
pd.concat([dFrame2, dFrame1], sort=True)  # sort=True orders the column labels

Unnamed: 0,C1,C2,C3,C5
R4,,10.0,,20.0
R2,,30.0,,
R5,,40.0,,50.0
R1,1.0,2.0,3.0,
R2,4.0,5.0,,
R3,6.0,,,


In [None]:
pd.concat([dFrame1, dFrame2], ignore_index=True)  # discard the row labels and renumber from 0

Unnamed: 0,C1,C2,C3,C5
0,1.0,2.0,3.0,
1,4.0,5.0,,
2,6.0,,,
3,,10.0,,20.0
4,,30.0,,
5,,40.0,,50.0


Reference : https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html

In [None]:
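pd.concat([dFrame1, dFrame2], axis=1)  # place the frames side by side, aligning rows on the index

In [None]:
pd.merge(dFrame1, dFrame2, on='C2', how='outer')  # database-style join on the shared column C2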
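
A minimal sketch of the difference between concatenating and merging, using two small hypothetical frames (`left` and `right`) that share a `roll` column. The data is invented for illustration.

```python
import pandas as pd

left = pd.DataFrame({"roll": [1, 2, 3], "name": ["Asha", "Bala", "Chitra"]})
right = pd.DataFrame({"roll": [2, 3, 4], "marks": [78, 91, 66]})

# concat stacks the rows; columns missing from one frame are filled with NaN
stacked = pd.concat([left, right], ignore_index=True)

# merge matches rows on a key column, like a database join
inner = pd.merge(left, right, on="roll")               # keys present in both
outer = pd.merge(left, right, on="roll", how="outer")  # keys present in either

print(stacked.shape, inner.shape, outer.shape)
```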

### Attributes & Methods of DataFrames

* index
* columns
* dtypes
* values
* shape
* size
* T
* head()
* tail()
* empty
* describe
* info
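
The attributes and methods listed above can be tried out on a small frame. This sketch rebuilds two columns of the result sheet under the illustrative name `result`.

```python
import pandas as pd

result = pd.DataFrame(
    {"Arnab": [90, 91, 97], "Ramit": [92, 81, 96]},
    index=["Maths", "Science", "Hindi"],
)

print(result.index.tolist())    # row labels
print(result.columns.tolist())  # column labels
print(result.dtypes)            # dtype of each column
print(result.values)            # the data as a NumPy array
print(result.shape)             # (rows, columns)
print(result.size)              # rows * columns
print(result.T.shape)           # transpose swaps rows and columns
print(result.head(2))           # first two rows
print(result.tail(1))           # last row
print(result.empty)             # False, the frame has data
result.info()                   # prints a summary of index, columns and memory
print(result.describe())        # summary statistics for numeric columns
```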

### Data Aggregations

In [None]:
marksUT= {'Name':['Raman','Raman','Raman','Zuhaire','Zuhaire','Zuhaire', 'Ashravy','Ashravy','Ashravy','Mishti','Mishti','Mishti'],
 'UT':[1,2,3,1,2,3,1,2,3,1,2,3],
 'Maths':[22,21,14,20,23,22,23,24,12,15,18,17],
 'Science':[21,20,19,17,15,18,19,22,25,22,21,18],
 'S.St':[18,17,15,22,21,19,20,24,19,25,25,20],
 'Hindi':[20,22,24,24,25,23,15,17,21,22,24,25],
 'Eng':[21,24,23,19,15,13,22,21,23,22,23,20]
 }
df=pd.DataFrame(marksUT)
print(df)

       Name  UT  Maths  Science  S.St  Hindi  Eng
0     Raman   1     22       21    18     20   21
1     Raman   2     21       20    17     22   24
2     Raman   3     14       19    15     24   23
3   Zuhaire   1     20       17    22     24   19
4   Zuhaire   2     23       15    21     25   15
5   Zuhaire   3     22       18    19     23   13
6   Ashravy   1     23       19    20     15   22
7   Ashravy   2     24       22    24     17   21
8   Ashravy   3     12       25    19     21   23
9    Mishti   1     15       22    25     22   22
10   Mishti   2     18       21    25     24   23
11   Mishti   3     17       18    20     25   20


In [None]:
df.sort_values(by=['UT'])  # returns a sorted copy; df itself keeps its original row order

Unnamed: 0,Name,UT,Maths,Science,S.St,Hindi,Eng
0,Raman,1,22,21,18,20,21
3,Zuhaire,1,20,17,22,24,19
6,Ashravy,1,23,19,20,15,22
9,Mishti,1,15,22,25,22,22
1,Raman,2,21,20,17,22,24
4,Zuhaire,2,23,15,21,25,15
7,Ashravy,2,24,22,24,17,21
10,Mishti,2,18,21,25,24,23
2,Raman,3,14,19,15,24,23
5,Zuhaire,3,22,18,19,23,13


In [None]:
df.aggregate('max')

Name       Zuhaire
UT               3
Maths           24
Science         25
S.St            25
Hindi           25
Eng             24
dtype: object

In [None]:
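marksOnly = df.drop(['Name','UT'], axis=1)  # keep df unchanged; later cells still use Name and UT

In [None]:
marksOnly.aggregate('max', axis=1)  # row-wise maximum across the subject columns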

0     22
1     24
2     24
3     24
4     25
5     23
6     23
7     24
8     25
9     25
10    25
11    25
dtype: int64

In [None]:
df.select_dtypes('number').aggregate(['max','mean'])  # mean is not defined for the text column Name, so keep numeric columns only

Unnamed: 0,UT,Maths,Science,S.St,Hindi,Eng
max,3.0,24.0,25.0,25.0,25.0,24.0
mean,2.0,19.25,19.75,20.416667,21.833333,20.5


In [None]:
df[['Maths','Science']].aggregate(['max','mean'],axis=1)

Unnamed: 0,max,mean
0,22.0,21.5
1,21.0,20.5
2,19.0,16.5
3,20.0,18.5
4,23.0,19.0
5,22.0,20.0
6,23.0,21.0
7,24.0,23.0
8,25.0,18.5
9,22.0,18.5


### Indexing

In [None]:
df.set_index(['Name','UT'], inplace=True)
df

Unnamed: 0_level_0,Unnamed: 1_level_0,Maths,Science,S.St,Hindi,Eng
Name,UT,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Raman,1,22,21,18,20,21
Raman,2,21,20,17,22,24
Raman,3,14,19,15,24,23
Zuhaire,1,20,17,22,24,19
Zuhaire,2,23,15,21,25,15
Zuhaire,3,22,18,19,23,13
Ashravy,1,23,19,20,15,22
Ashravy,2,24,22,24,17,21
Ashravy,3,12,25,19,21,23
Mishti,1,15,22,25,22,22


In [None]:
df.reset_index(inplace=True)
df

Unnamed: 0,Name,UT,Maths,Science,S.St,Hindi,Eng
0,Raman,1,22,21,18,20,21
1,Raman,2,21,20,17,22,24
2,Raman,3,14,19,15,24,23
3,Zuhaire,1,20,17,22,24,19
4,Zuhaire,2,23,15,21,25,15
5,Zuhaire,3,22,18,19,23,13
6,Ashravy,1,23,19,20,15,22
7,Ashravy,2,24,22,24,17,21
8,Ashravy,3,12,25,19,21,23
9,Mishti,1,15,22,25,22,22


In [None]:
df.set_index(['UT','Name'], inplace=True)
df

Unnamed: 0_level_0,Unnamed: 1_level_0,Maths,Science,S.St,Hindi,Eng
UT,Name,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
1,Raman,22,21,18,20,21
2,Raman,21,20,17,22,24
3,Raman,14,19,15,24,23
1,Zuhaire,20,17,22,24,19
2,Zuhaire,23,15,21,25,15
3,Zuhaire,22,18,19,23,13
1,Ashravy,23,19,20,15,22
2,Ashravy,24,22,24,17,21
3,Ashravy,12,25,19,21,23
1,Mishti,15,22,25,22,22


In [None]:
df.reset_index(inplace=True)
df

Unnamed: 0,UT,Name,Maths,Science,S.St,Hindi,Eng
0,1,Raman,22,21,18,20,21
1,2,Raman,21,20,17,22,24
2,3,Raman,14,19,15,24,23
3,1,Zuhaire,20,17,22,24,19
4,2,Zuhaire,23,15,21,25,15
5,3,Zuhaire,22,18,19,23,13
6,1,Ashravy,23,19,20,15,22
7,2,Ashravy,24,22,24,17,21
8,3,Ashravy,12,25,19,21,23
9,1,Mishti,15,22,25,22,22


### Sorting a DataFrame

In [None]:
df.sort_values(by=['UT'],inplace=True)
df

Unnamed: 0,UT,Name,Maths,Science,S.St,Hindi,Eng
0,1,Raman,22,21,18,20,21
3,1,Zuhaire,20,17,22,24,19
6,1,Ashravy,23,19,20,15,22
9,1,Mishti,15,22,25,22,22
1,2,Raman,21,20,17,22,24
4,2,Zuhaire,23,15,21,25,15
7,2,Ashravy,24,22,24,17,21
10,2,Mishti,18,21,25,24,23
2,3,Raman,14,19,15,24,23
5,3,Zuhaire,22,18,19,23,13


In [None]:
df.set_index(['UT','Name'], inplace=True)
df

Unnamed: 0_level_0,Unnamed: 1_level_0,Maths,Science,S.St,Hindi,Eng
UT,Name,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
1,Raman,22,21,18,20,21
1,Zuhaire,20,17,22,24,19
1,Ashravy,23,19,20,15,22
1,Mishti,15,22,25,22,22
2,Raman,21,20,17,22,24
2,Zuhaire,23,15,21,25,15
2,Ashravy,24,22,24,17,21
2,Mishti,18,21,25,24,23
3,Raman,14,19,15,24,23
3,Zuhaire,22,18,19,23,13


In [None]:
df.sort_index(inplace=True)
df

Unnamed: 0_level_0,Unnamed: 1_level_0,Maths,Science,S.St,Hindi,Eng
UT,Name,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
1,Ashravy,23,19,20,15,22
1,Mishti,15,22,25,22,22
1,Raman,22,21,18,20,21
1,Zuhaire,20,17,22,24,19
2,Ashravy,24,22,24,17,21
2,Mishti,18,21,25,24,23
2,Raman,21,20,17,22,24
2,Zuhaire,23,15,21,25,15
3,Ashravy,12,25,19,21,23
3,Mishti,17,18,20,25,20


In [None]:
df.sort_index(axis=1, inplace=True)
df

Unnamed: 0_level_0,Unnamed: 1_level_0,Eng,Hindi,Maths,S.St,Science
UT,Name,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
1,Ashravy,22,15,23,20,19
1,Mishti,22,22,15,25,22
1,Raman,21,20,22,18,21
1,Zuhaire,19,24,20,22,17
2,Ashravy,21,17,24,24,22
2,Mishti,23,24,18,25,21
2,Raman,24,22,21,17,20
2,Zuhaire,15,25,23,21,15
3,Ashravy,23,21,12,19,25
3,Mishti,20,25,17,20,18


### Group by functions

Reference : https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.groupby.html
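
As a short illustration of grouping, the sketch below takes the first two unit tests of Raman and Zuhaire from the marks data above and aggregates them per group. The frame name `marks` and the result names are illustrative.

```python
import pandas as pd

marks = pd.DataFrame({
    "Name": ["Raman", "Raman", "Zuhaire", "Zuhaire"],
    "UT": [1, 2, 1, 2],
    "Maths": [22, 21, 20, 23],
    "Science": [21, 20, 17, 15],
})

# Average marks of each student across their unit tests
mean_by_name = marks.groupby("Name")[["Maths", "Science"]].mean()
print(mean_by_name)

# Highest Maths score in each unit test
best_by_ut = marks.groupby("UT")["Maths"].max()
print(best_by_ut)
```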

### Pivot Tables

Reference : https://pandas.pydata.org/docs/reference/api/pandas.pivot_table.html
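
A pivot table reshapes long data into a grid of one variable against another. The sketch below, again on a small subset of the marks data, is only meant to show the shape of the call; the names `marks`, `wide` and `totals` are illustrative.

```python
import pandas as pd

marks = pd.DataFrame({
    "Name": ["Raman", "Raman", "Zuhaire", "Zuhaire"],
    "UT": [1, 2, 1, 2],
    "Maths": [22, 21, 20, 23],
    "Science": [21, 20, 17, 15],
})

# One row per student, one column per unit test, Maths marks in the cells
# (the default aggregation is the mean, which here is just the single value)
wide = pd.pivot_table(marks, index="Name", columns="UT", values="Maths")
print(wide)

# Total marks per student in each subject
totals = pd.pivot_table(marks, index="Name",
                        values=["Maths", "Science"], aggfunc="sum")
print(totals)
```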

# Thank You

Arun P R