# Pandas

What is pandas? Pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.

It is very popular in the data science community. What is data science? Data science is the process of analysing large sets of data to answer questions about that data.

# Installation 

Use the package manager [pip](https://pip.pypa.io/en/stable/) to install [Pandas](https://pypi.org/project/pandas/).

```bash
pip install pandas
```

Check the installation by importing it:

```python
import pandas as pd
```

If this runs without errors, Pandas is installed correctly. If you get an error (typically `ModuleNotFoundError: No module named 'pandas'`), check your Pandas installation again.
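As an extra check, a minimal sketch that prints the installed version. `pd.__version__` is a plain string, so any output here means the import worked.

```python
import pandas as pd

# Print the installed pandas version string, e.g. to compare against a tutorial
print(pd.__version__)
```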

# Basics

## DataFrames

A DataFrame represents data as rows and columns, much like an Excel sheet.

Let's see how we can load one into our program. I will be using the *data1.csv* file for this session:


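```python
df = pd.read_csv('data1.csv')
df
```

|   | day | temperature | windspeed | event |
|---|-----|-------------|-----------|-------|
| 0 | 1/1/2017 | 32 | 6 | Rain |
| 1 | 1/2/2017 | 35 | 7 | Sunny |
| 2 | 1/3/2017 | 28 | 2 | Snow |
| 3 | 1/4/2017 | 24 | 7 | Snow |
| 4 | 1/5/2017 | 32 | 4 | Rain |
| 5 | 1/6/2017 | 31 | 2 | Sunny |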
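If you don't have *data1.csv*, here is a minimal sketch that builds the same data in memory. It assumes the file holds exactly the six rows shown above; `csv_text` is a hypothetical stand-in for the file's contents, and `io.StringIO` lets `read_csv` parse a string as if it were a file on disk.

```python
import io

import pandas as pd

# Hypothetical stand-in for data1.csv (assumed to match the table above)
csv_text = """day,temperature,windspeed,event
1/1/2017,32,6,Rain
1/2/2017,35,7,Sunny
1/3/2017,28,2,Snow
1/4/2017,24,7,Snow
1/5/2017,32,4,Rain
1/6/2017,31,2,Sunny
"""

# read_csv accepts any file-like object, not only a path
df = pd.read_csv(io.StringIO(csv_text))
print(df)
```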


## Viewing



The number of rows and columns can be viewed with the **shape** attribute, which holds a `(rows, columns)` tuple:

```python
rows, cols = df.shape
print(rows, cols)
```

```
6 4
```
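A short sketch of other ways to get the same counts. It rebuilds the sample data inline with `pd.DataFrame`, assuming it matches *data1.csv* as shown above, so it runs on its own.

```python
import pandas as pd

# Sample data rebuilt inline (assumed to match data1.csv as shown above)
df = pd.DataFrame({
    "day": ["1/1/2017", "1/2/2017", "1/3/2017", "1/4/2017", "1/5/2017", "1/6/2017"],
    "temperature": [32, 35, 28, 24, 32, 31],
    "windspeed": [6, 7, 2, 7, 4, 2],
    "event": ["Rain", "Sunny", "Snow", "Snow", "Rain", "Sunny"],
})

# shape is an attribute (no parentheses): a (rows, columns) tuple
print(df.shape)         # (6, 4)
print(df.shape[0])      # number of rows: 6
print(len(df))          # len() also counts rows: 6
print(len(df.columns))  # number of columns: 4
```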


You can view the first or last few rows with the **head** and **tail** methods. Both take an optional number of rows, which defaults to 5:

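```python
df.head()
```

|   | day | temperature | windspeed | event |
|---|-----|-------------|-----------|-------|
| 0 | 1/1/2017 | 32 | 6 | Rain |
| 1 | 1/2/2017 | 35 | 7 | Sunny |
| 2 | 1/3/2017 | 28 | 2 | Snow |
| 3 | 1/4/2017 | 24 | 7 | Snow |
| 4 | 1/5/2017 | 32 | 4 | Rain |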
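A minimal sketch passing an explicit row count to both methods. The sample data is rebuilt inline, assuming it matches *data1.csv* as shown above.

```python
import pandas as pd

# Sample data rebuilt inline (assumed to match data1.csv as shown above)
df = pd.DataFrame({
    "day": ["1/1/2017", "1/2/2017", "1/3/2017", "1/4/2017", "1/5/2017", "1/6/2017"],
    "temperature": [32, 35, 28, 24, 32, 31],
    "windspeed": [6, 7, 2, 7, 4, 2],
    "event": ["Rain", "Sunny", "Snow", "Snow", "Rain", "Sunny"],
})

# head(n) returns the first n rows, tail(n) the last n
first_two = df.head(2)
last_two = df.tail(2)
print(first_two)
print(last_two)
```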


To view the columns present in the DataFrame, use **columns**:

```python
df.columns
```

```
Index(['day', 'temperature', 'windspeed', 'event'], dtype='object')
```
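`df.columns` is a pandas `Index`, not a list. A small sketch of converting it and checking whether a column exists, with the sample data rebuilt inline (assumed to match *data1.csv*):

```python
import pandas as pd

# Sample data rebuilt inline (assumed to match data1.csv as shown above)
df = pd.DataFrame({
    "day": ["1/1/2017", "1/2/2017", "1/3/2017", "1/4/2017", "1/5/2017", "1/6/2017"],
    "temperature": [32, 35, 28, 24, 32, 31],
    "windspeed": [6, 7, 2, 7, 4, 2],
    "event": ["Rain", "Sunny", "Snow", "Snow", "Rain", "Sunny"],
})

# tolist() turns the Index into a plain Python list
column_names = df.columns.tolist()
print(column_names)           # ['day', 'temperature', 'windspeed', 'event']

# "in" checks the column labels
print("event" in df.columns)  # True
```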

## Slicing and Indexing

A very straightforward topic: slice rows the same way you would slice a Python list or array. The end position is excluded, so `df[1:3]` returns the rows at positions 1 and 2:

```python
df[1:3]
```

|   | day | temperature | windspeed | event |
|---|-----|-------------|-----------|-------|
| 1 | 1/2/2017 | 35 | 7 | Sunny |
| 2 | 1/3/2017 | 28 | 2 | Snow |
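The same slice can be written with `iloc`, which makes it explicit that you are selecting by position. A minimal sketch, with the sample data rebuilt inline (assumed to match *data1.csv*):

```python
import pandas as pd

# Sample data rebuilt inline (assumed to match data1.csv as shown above)
df = pd.DataFrame({
    "day": ["1/1/2017", "1/2/2017", "1/3/2017", "1/4/2017", "1/5/2017", "1/6/2017"],
    "temperature": [32, 35, 28, 24, 32, 31],
    "windspeed": [6, 7, 2, 7, 4, 2],
    "event": ["Rain", "Sunny", "Snow", "Snow", "Rain", "Sunny"],
})

# Plain slicing on rows: positions 1 and 2 (stop position 3 is excluded)
by_slice = df[1:3]
# iloc selects by integer position and returns the same rows
by_iloc = df.iloc[1:3]
print(by_iloc)
```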


## More ways of viewing

Select the columns you want by name, as you would look up keys in a Python dictionary. Pass a list of names to get several columns at once:

```python
df[['day', 'event']]
```

|   | day | event |
|---|-----|-------|
| 0 | 1/1/2017 | Rain |
| 1 | 1/2/2017 | Sunny |
| 2 | 1/3/2017 | Snow |
| 3 | 1/4/2017 | Snow |
| 4 | 1/5/2017 | Rain |
| 5 | 1/6/2017 | Sunny |
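Whether you pass one name or a list changes the type of the result. A small sketch showing the difference, with the sample data rebuilt inline (assumed to match *data1.csv*):

```python
import pandas as pd

# Sample data rebuilt inline (assumed to match data1.csv as shown above)
df = pd.DataFrame({
    "day": ["1/1/2017", "1/2/2017", "1/3/2017", "1/4/2017", "1/5/2017", "1/6/2017"],
    "temperature": [32, 35, 28, 24, 32, 31],
    "windspeed": [6, 7, 2, 7, 4, 2],
    "event": ["Rain", "Sunny", "Snow", "Snow", "Rain", "Sunny"],
})

# A single name returns a Series (one column)
temps = df["temperature"]
# A list of names, even with one entry, returns a DataFrame
temps_frame = df[["temperature"]]
# Attribute access works when the column name is a valid Python identifier
same_temps = df.temperature

print(type(temps).__name__)        # Series
print(type(temps_frame).__name__)  # DataFrame
```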


## Max, Min, etc

Try methods such as **max**, **min**, **mean**, **count** or **quantile** (for percentiles):

```python
df['temperature'].max()
```

```
35
```

```python
df['temperature'].min()
```

```
24
```
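A minimal sketch of the other summary methods on the same column, with the sample data rebuilt inline (assumed to match *data1.csv*). `quantile` uses linear interpolation by default, which is also what `describe` uses for its percentile rows.

```python
import pandas as pd

# Sample data rebuilt inline (assumed to match data1.csv as shown above)
df = pd.DataFrame({
    "day": ["1/1/2017", "1/2/2017", "1/3/2017", "1/4/2017", "1/5/2017", "1/6/2017"],
    "temperature": [32, 35, 28, 24, 32, 31],
    "windspeed": [6, 7, 2, 7, 4, 2],
    "event": ["Rain", "Sunny", "Snow", "Snow", "Rain", "Sunny"],
})

temps = df["temperature"]
print(round(temps.mean(), 2))  # 30.33
print(temps.count())           # 6 (number of non-missing values)
print(temps.quantile(0.25))    # 28.75, the 25th percentile
print(temps.median())          # 31.5
```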

___

A very useful way to get all the common statistics at once is **describe**.

Note: by default it only summarizes numeric columns, so `day` and `event` are left out below.


```python
df.describe()
```

|   | temperature | windspeed |
|---|-------------|-----------|
| count | 6.0 | 6.0 |
| mean | 30.333333 | 4.666667 |
| std | 3.829708 | 2.33809 |
| min | 24.0 | 2.0 |
| 25% | 28.75 | 2.5 |
| 50% | 31.5 | 5.0 |
| 75% | 32.0 | 6.75 |
| max | 35.0 | 7.0 |
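To summarize the text columns as well, `describe` accepts `include="all"`, which adds `unique`, `top` and `freq` rows for non-numeric columns. A minimal sketch, with the sample data rebuilt inline (assumed to match *data1.csv*):

```python
import pandas as pd

# Sample data rebuilt inline (assumed to match data1.csv as shown above)
df = pd.DataFrame({
    "day": ["1/1/2017", "1/2/2017", "1/3/2017", "1/4/2017", "1/5/2017", "1/6/2017"],
    "temperature": [32, 35, 28, 24, 32, 31],
    "windspeed": [6, 7, 2, 7, 4, 2],
    "event": ["Rain", "Sunny", "Snow", "Snow", "Rain", "Sunny"],
})

# Default: numeric columns only
numeric_summary = df.describe()
# include="all": every column, with count/unique/top/freq for text columns
full_summary = df.describe(include="all")
print(full_summary)
```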


## Filtering: the SQL of Pandas

You can filter rows with conditions, much like a `WHERE` clause in SQL, and combine them to your heart's content. The syntax is intuitive and simple, and I highly recommend playing around with it.

A few examples below will get you started:

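```python
df[df.temperature >= 30]
```

|   | day | temperature | windspeed | event |
|---|-----|-------------|-----------|-------|
| 0 | 1/1/2017 | 32 | 6 | Rain |
| 1 | 1/2/2017 | 35 | 7 | Sunny |
| 4 | 1/5/2017 | 32 | 4 | Rain |
| 5 | 1/6/2017 | 31 | 2 | Sunny |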
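Conditions can be combined with `&` (and) and `|` (or); each condition needs its own parentheses because of Python's operator precedence. `query` expresses the same filter as a string, which reads closer to SQL. A minimal sketch, with the sample data rebuilt inline (assumed to match *data1.csv*):

```python
import pandas as pd

# Sample data rebuilt inline (assumed to match data1.csv as shown above)
df = pd.DataFrame({
    "day": ["1/1/2017", "1/2/2017", "1/3/2017", "1/4/2017", "1/5/2017", "1/6/2017"],
    "temperature": [32, 35, 28, 24, 32, 31],
    "windspeed": [6, 7, 2, 7, 4, 2],
    "event": ["Rain", "Sunny", "Snow", "Snow", "Rain", "Sunny"],
})

# Warm AND rainy days; wrap each condition in parentheses
warm_and_rainy = df[(df.temperature >= 30) & (df.event == "Rain")]
print(warm_and_rainy)

# The same filter written as a query string
same_rows = df.query('temperature >= 30 and event == "Rain"')
```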


---

```python
df[df.temperature == df.temperature.max()]
```

|   | day | temperature | windspeed | event |
|---|-----|-------------|-----------|-------|
| 1 | 1/2/2017 | 35 | 7 | Sunny |


___

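```python
# Rows where temperature is at its maximum, showing only two columns
df.loc[df['temperature'] == df['temperature'].max(), ['day', 'temperature']]
```

|   | day | temperature |
|---|-----|-------------|
| 1 | 1/2/2017 | 35 |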
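When you only need the row with the highest value, `idxmax` returns its index label directly. Unlike the `==` filter, which keeps every tied row, `idxmax` returns only the first occurrence. A minimal sketch, with the sample data rebuilt inline (assumed to match *data1.csv*):

```python
import pandas as pd

# Sample data rebuilt inline (assumed to match data1.csv as shown above)
df = pd.DataFrame({
    "day": ["1/1/2017", "1/2/2017", "1/3/2017", "1/4/2017", "1/5/2017", "1/6/2017"],
    "temperature": [32, 35, 28, 24, 32, 31],
    "windspeed": [6, 7, 2, 7, 4, 2],
    "event": ["Rain", "Sunny", "Snow", "Snow", "Rain", "Sunny"],
})

# Index label of the first maximum temperature
hottest_label = df["temperature"].idxmax()
print(hottest_label)                 # 1
print(df.loc[hottest_label, "day"])  # 1/2/2017
```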


___

## Indices

You can optionally set one of your columns as the index of your DataFrame using the **set_index** method. By default it returns a new DataFrame; pass the optional argument **inplace=True** to modify the original DataFrame instead.

```python
df.set_index('day')
```

| day | temperature | windspeed | event |
|-----|-------------|-----------|-------|
| 1/1/2017 | 32 | 6 | Rain |
| 1/2/2017 | 35 | 7 | Sunny |
| 1/3/2017 | 28 | 2 | Snow |
| 1/4/2017 | 24 | 7 | Snow |
| 1/5/2017 | 32 | 4 | Rain |
| 1/6/2017 | 31 | 2 | Sunny |


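```python
df
```

|   | day | temperature | windspeed | event |
|---|-----|-------------|-----------|-------|
| 0 | 1/1/2017 | 32 | 6 | Rain |
| 1 | 1/2/2017 | 35 | 7 | Sunny |
| 2 | 1/3/2017 | 28 | 2 | Snow |
| 3 | 1/4/2017 | 24 | 7 | Snow |
| 4 | 1/5/2017 | 32 | 4 | Rain |
| 5 | 1/6/2017 | 31 | 2 | Sunny |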


---

You can see that `set_index` did not change the original DataFrame; it just returned a new one. Let's see what happens with the `inplace` argument.

Hint: it has the same effect as

```python
df = df.set_index('day')
```

```python
df.set_index('day', inplace=True)
```

```python
df
```

| day | temperature | windspeed | event |
|-----|-------------|-----------|-------|
| 1/1/2017 | 32 | 6 | Rain |
| 1/2/2017 | 35 | 7 | Sunny |
| 1/3/2017 | 28 | 2 | Snow |
| 1/4/2017 | 24 | 7 | Snow |
| 1/5/2017 | 32 | 4 | Rain |
| 1/6/2017 | 31 | 2 | Sunny |
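One common pitfall: with `inplace=True` the call modifies `df` and itself returns `None`, so writing `df = df.set_index('day', inplace=True)` would leave `df` as `None`. A minimal sketch contrasting the two forms, with the sample data rebuilt inline (assumed to match *data1.csv*):

```python
import pandas as pd

# Sample data rebuilt inline (assumed to match data1.csv as shown above)
df = pd.DataFrame({
    "day": ["1/1/2017", "1/2/2017", "1/3/2017", "1/4/2017", "1/5/2017", "1/6/2017"],
    "temperature": [32, 35, 28, 24, 32, 31],
    "windspeed": [6, 7, 2, 7, 4, 2],
    "event": ["Rain", "Sunny", "Snow", "Snow", "Rain", "Sunny"],
})

# Without inplace: a new DataFrame is returned, df still has its 'day' column
indexed = df.set_index("day")
print("day" in df.columns)  # True

# With inplace=True: df itself changes and the call returns None
result = df.set_index("day", inplace=True)
print(result)               # None
print(df.index.name)        # day
```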


---
You can now access a row by its index label with **loc**:

```python
df.loc['1/2/2017']
```

```
temperature       35
windspeed          7
event          Sunny
Name: 1/2/2017, dtype: object
```
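`loc` also takes a column label after the row label, and lists of labels for several rows or columns. A minimal sketch, with the sample data rebuilt inline (assumed to match *data1.csv*) and indexed by `day` as above:

```python
import pandas as pd

# Sample data rebuilt inline (assumed to match data1.csv as shown above)
df = pd.DataFrame({
    "day": ["1/1/2017", "1/2/2017", "1/3/2017", "1/4/2017", "1/5/2017", "1/6/2017"],
    "temperature": [32, 35, 28, 24, 32, 31],
    "windspeed": [6, 7, 2, 7, 4, 2],
    "event": ["Rain", "Sunny", "Snow", "Snow", "Rain", "Sunny"],
})
df = df.set_index("day")  # same effect as inplace=True

# Row label plus column label returns a single value
print(df.loc["1/2/2017", "temperature"])  # 35

# Lists of labels select several rows and columns
subset = df.loc[["1/1/2017", "1/3/2017"], ["temperature", "event"]]
print(subset)
```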

___ 
To go back to the original integer index, use **reset_index**. As with `set_index`, don't forget the *inplace=True* argument if you want to change `df` itself.
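A minimal sketch of the round trip, with the sample data rebuilt inline (assumed to match *data1.csv*). `reset_index` moves `day` back into a regular column and restores the default `0..n-1` index.

```python
import pandas as pd

# Sample data rebuilt inline (assumed to match data1.csv as shown above)
df = pd.DataFrame({
    "day": ["1/1/2017", "1/2/2017", "1/3/2017", "1/4/2017", "1/5/2017", "1/6/2017"],
    "temperature": [32, 35, 28, 24, 32, 31],
    "windspeed": [6, 7, 2, 7, 4, 2],
    "event": ["Rain", "Sunny", "Snow", "Snow", "Rain", "Sunny"],
})
df.set_index("day", inplace=True)

# Move 'day' back into a column and restore the default integer index
df.reset_index(inplace=True)
print(df.columns.tolist())  # ['day', 'temperature', 'windspeed', 'event']
print(df.index.tolist())    # [0, 1, 2, 3, 4, 5]
```

If you don't need the old index as a column, `reset_index(drop=True)` discards it instead.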