# Pandas stuff

- Pandas is a library for data manipulation and analysis. It is built on top of NumPy and works well with NumPy arrays.

- Have a look at its [documentation](https://pandas.pydata.org/docs/)

- Import both libraries under their conventional aliases, ```np``` and ```pd```

In [1]:
import numpy as np
import pandas as pd

- First generate a random matrix with NumPy, then convert it to a DataFrame with pandas

In [19]:
M1 = np.random.randn(4,3)
df = pd.DataFrame(M1)
M1

array([[ 0.18958499,  2.08857389, -0.81712981],
       [-0.44601169,  0.91602901, -1.14629838],
       [-1.5102345 , -2.98845593,  1.01898122],
       [-1.2425057 , -0.29850487, -0.48765869]])

In [20]:
df

          0         1         2
0  0.189585  2.088574 -0.817130
1 -0.446012  0.916029 -1.146298
2 -1.510235 -2.988456  1.018981
3 -1.242506 -0.298505 -0.487659
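
- The conversion works in both directions. A minimal sketch, assuming a seeded generator so the numbers repeat on every run (the seed, and the names ```rng```, ```M```, ```df_demo``` and ```back```, are illustrative and not part of the notebook above):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)      # seed 0 is an arbitrary choice
M = rng.standard_normal((4, 3))     # standard normal samples, like np.random.randn(4, 3)

df_demo = pd.DataFrame(M)           # default labels are 0, 1, 2, ...

print(df_demo.shape)                # (4, 3)
print(df_demo.dtypes)               # every column is float64
back = df_demo.to_numpy()           # convert back to a NumPy array
print(np.array_equal(back, M))      # True
```

- Without explicit labels, pandas numbers the rows and columns from 0, which is why the table above shows ```0, 1, 2``` as column headers.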


- Read the row and column labels with ```df.index``` and ```df.columns```; assigning to these attributes relabels the DataFrame

In [21]:
df.columns = ["SPI","CPI","Motivation"]              #Giving a meaning to the random table
df.index = ["sem1", "sem2", "sem3", "sem4"]
df

           SPI       CPI  Motivation
sem1  0.189585  2.088574   -0.817130
sem2 -0.446012  0.916029   -1.146298
sem3 -1.510235 -2.988456    1.018981
sem4 -1.242506 -0.298505   -0.487659
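
- Labels can also be supplied when the DataFrame is created, and ```rename``` changes selected labels without touching the rest. A small sketch with placeholder data; the names ```labeled``` and ```renamed``` and the new column name ```"Drive"``` are hypothetical:

```python
import numpy as np
import pandas as pd

values = np.zeros((4, 3))  # placeholder data; only the labels matter here

# Pass the labels directly to the constructor.
labeled = pd.DataFrame(
    values,
    index=["sem1", "sem2", "sem3", "sem4"],
    columns=["SPI", "CPI", "Motivation"],
)

# rename returns a new DataFrame and changes only the labels in the mapping.
renamed = labeled.rename(columns={"Motivation": "Drive"})

print(list(labeled.columns))   # ['SPI', 'CPI', 'Motivation']
print(list(renamed.columns))   # ['SPI', 'CPI', 'Drive']
```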


- Access a single column as an attribute, as below, or with ```df["Motivation"]```; access a single row by its label with ```df.loc["sem1"]```

In [24]:
df.Motivation           #Have a look at My Motivation across semesters

sem1   -0.817130
sem2   -1.146298
sem3    1.018981
sem4   -0.487659
Name: Motivation, dtype: float64
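
- Rows and single cells can be selected in a similar way. A minimal sketch, assuming a stand-in table ```df_demo``` whose values are rounded from the output above (the name and the rounding are only for illustration):

```python
import pandas as pd

# Stand-in for the notebook's df, with values rounded to two decimals.
df_demo = pd.DataFrame(
    {"SPI": [0.19, -0.45, -1.51, -1.24],
     "CPI": [2.09, 0.92, -2.99, -0.30],
     "Motivation": [-0.82, -1.15, 1.02, -0.49]},
    index=["sem1", "sem2", "sem3", "sem4"],
)

col = df_demo["Motivation"]          # same Series as df_demo.Motivation
row = df_demo.loc["sem2"]            # one row, selected by label
first = df_demo.iloc[0]              # one row, selected by position
cell = df_demo.loc["sem3", "CPI"]    # one value: row label, then column label

print(cell)   # -2.99
```

- Bracket access is the safer habit: attribute access only works when the column name is a valid Python identifier and does not clash with an existing DataFrame attribute.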

- Use ```df[condition]``` to keep only the rows where the condition is true

In [25]:
low_motivation = df[df.Motivation < 0]       #Semesters where I had less than zero Motivation
low_motivation

           SPI       CPI  Motivation
sem1  0.189585  2.088574   -0.817130
sem2 -0.446012  0.916029   -1.146298
sem4 -1.242506 -0.298505   -0.487659
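
- Several conditions can be combined with ```&``` (and), ```|``` (or) and ```~``` (not). Each condition needs its own parentheses because these operators bind more tightly than comparisons. A sketch on a stand-in table ```df_demo``` with values rounded from the output above (an assumption for illustration):

```python
import pandas as pd

df_demo = pd.DataFrame(
    {"SPI": [0.19, -0.45, -1.51, -1.24],
     "CPI": [2.09, 0.92, -2.99, -0.30],
     "Motivation": [-0.82, -1.15, 1.02, -0.49]},
    index=["sem1", "sem2", "sem3", "sem4"],
)

# A boolean Series: True for the rows that satisfy both conditions.
mask = (df_demo["Motivation"] < 0) & (df_demo["CPI"] > 0)

both = df_demo[mask]                                                  # and
either = df_demo[(df_demo["Motivation"] < 0) | (df_demo["CPI"] > 0)]  # or
neither = df_demo[~mask]                                              # not

print(list(both.index))     # ['sem1', 'sem2']
print(list(either.index))   # ['sem1', 'sem2', 'sem4']
print(list(neither.index))  # ['sem3', 'sem4']
```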


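- ```apply``` runs a function over the data. Called on a single column (a Series), as below, it applies the function to each value; called on the whole DataFrame, it applies the function to each column or row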

In [28]:
df["Motivation"].apply(np.round)   #rounds off the Motivation column

sem1   -1.0
sem2   -1.0
sem3    1.0
sem4   -0.0
Name: Motivation, dtype: float64
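
- The difference between applying a function to a column and to the whole table can be sketched as follows. ```df_demo``` is a stand-in with values rounded from the output above, and the lambda functions are illustrative choices only:

```python
import pandas as pd

df_demo = pd.DataFrame(
    {"SPI": [0.19, -0.45, -1.51, -1.24],
     "CPI": [2.09, 0.92, -2.99, -0.30],
     "Motivation": [-0.82, -1.15, 1.02, -0.49]},
    index=["sem1", "sem2", "sem3", "sem4"],
)

# Series.apply: the function receives one element at a time.
labels = df_demo["Motivation"].apply(lambda x: "low" if x < 0 else "high")

# DataFrame.apply: the function receives a whole column (axis=0, the default) ...
col_range = df_demo.apply(lambda col: col.max() - col.min())

# ... or a whole row (axis=1).
row_max = df_demo.apply(lambda row: row.max(), axis=1)

# For plain rounding, the built-in vectorized method avoids apply altogether.
rounded = df_demo["Motivation"].round()

print(labels.tolist())    # ['low', 'low', 'high', 'low']
print(row_max.tolist())   # [2.09, 0.92, 1.02, -0.3]
```

- When pandas or NumPy already offers a vectorized method (such as ```round```), it is usually faster than ```apply``` with a Python function.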

- ```df.loc[condition, 'column'] = new_value``` sets the given column to ```new_value``` in every row where the condition is true; the other rows are left unchanged

In [30]:
df.loc[df.CPI > 1, "Motivation"] = 1   #changes motivation to 1 where cpi > 1
df

           SPI       CPI  Motivation
sem1  0.189585  2.088574    1.000000
sem2 -0.446012  0.916029   -1.146298
sem3 -1.510235 -2.988456    1.018981
sem4 -1.242506 -0.298505   -0.487659
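
- Assignment through ```loc``` modifies the DataFrame in place, so it can help to work on a copy while experimenting. A sketch, assuming a stand-in table ```df_demo``` with values rounded from the earlier output; the second assignment (zeroing two columns) is a made-up example, not something the notebook does:

```python
import pandas as pd

df_demo = pd.DataFrame(
    {"SPI": [0.19, -0.45, -1.51, -1.24],
     "CPI": [2.09, 0.92, -2.99, -0.30],
     "Motivation": [-0.82, -1.15, 1.02, -0.49]},
    index=["sem1", "sem2", "sem3", "sem4"],
)

updated = df_demo.copy()            # keep the original untouched

# One column: set Motivation to 1 where CPI > 1.
mask = updated["CPI"] > 1
updated.loc[mask, "Motivation"] = 1

# Several columns at once: pass a list of column labels.
updated.loc[updated["SPI"] < 0, ["SPI", "CPI"]] = 0

print(updated.loc["sem1", "Motivation"])   # 1.0
print(df_demo.loc["sem1", "Motivation"])   # -0.82
```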


- ```df.sort_values``` sorts the rows by the values in one or more columns and returns a new DataFrame; the original is left unchanged

In [33]:
df1 = df.sort_values(by=["SPI"], ascending=False)     #Sorts highest to lowest SPI
df1

           SPI       CPI  Motivation
sem1  0.189585  2.088574    1.000000
sem2 -0.446012  0.916029   -1.146298
sem4 -1.242506 -0.298505   -0.487659
sem3 -1.510235 -2.988456    1.018981
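
- Sorting by more than one column uses the later columns to break ties, and ```sort_index``` restores the label order. A sketch on a stand-in table ```df_demo``` (values rounded from the earlier output); the helper column ```Rounded``` is hypothetical and exists only to create a tie:

```python
import pandas as pd

df_demo = pd.DataFrame(
    {"SPI": [0.19, -0.45, -1.51, -1.24],
     "CPI": [2.09, 0.92, -2.99, -0.30],
     "Motivation": [-0.82, -1.15, 1.02, -0.49]},
    index=["sem1", "sem2", "sem3", "sem4"],
)

# Highest to lowest SPI.
by_spi = df_demo.sort_values(by="SPI", ascending=False)

# Rounded Motivation ties for sem1 and sem2 (both -1.0); SPI breaks the tie.
with_rounded = df_demo.assign(Rounded=df_demo["Motivation"].round())
by_two = with_rounded.sort_values(by=["Rounded", "SPI"])

# Back to the original row order, sorted by index label.
restored = by_spi.sort_index()

print(list(by_spi.index))     # ['sem1', 'sem2', 'sem4', 'sem3']
print(list(by_two.index))     # ['sem2', 'sem1', 'sem4', 'sem3']
print(list(restored.index))   # ['sem1', 'sem2', 'sem3', 'sem4']
```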


- Explore more functions like these in the documentation and try them out in your own notebooks!