# Operations on Pandas

This notebook will cover the following topics: 
* Filtering dataframes 
    * Single and multiple conditions
* Creating new columns
* Lambda functions 
* Group by and aggregate functions
* Pivot data
* Merging data frames
    * Joins and concatenations

In [2]:
import os

import pandas as pd

In [11]:
csv_weatherdata_path = os.path.join("assets", "weatherdata.csv")

df_weather = pd.read_csv(csv_weatherdata_path)
df_weather.head()

Unnamed: 0,Date,Location,MinTemp,MaxTemp,Rainfall,Evaporation,Sunshine,WindGustDir,WindGustSpeed
0,2008-12-01,Albury,13.4,22.9,0.6,,,W,44.0
1,2008-12-02,Albury,7.4,25.1,0.0,,,WNW,44.0
2,2008-12-03,Albury,12.9,25.7,0.0,,,WSW,46.0
3,2008-12-04,Albury,9.2,28.0,0.0,,,NE,24.0
4,2008-12-05,Albury,17.5,32.3,1.0,,,W,41.0


**^** The DataFrame above is an example of time-series data: each row holds one day's weather observations for one location.
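The `Unnamed: 0` column suggests that the CSV was written together with its row index (for example by `to_csv()` with the default `index=True`). Under that assumption, here is a minimal sketch of loading the file so the saved index becomes the row index instead of an extra column, and so `Date` is parsed as a datetime while reading. The first three rows of the output above are kept in an in-memory buffer so the sketch runs without `assets/weatherdata.csv`.

```python
import io

import pandas as pd

# First three rows from the head() output above. The empty first header
# field is what pandas otherwise reports as the "Unnamed: 0" column.
csv_text = """,Date,Location,MinTemp,MaxTemp,Rainfall
0,2008-12-01,Albury,13.4,22.9,0.6
1,2008-12-02,Albury,7.4,25.1,0.0
2,2008-12-03,Albury,12.9,25.7,0.0
"""

# Default read: the saved index shows up as an ordinary "Unnamed: 0" column.
df_default = pd.read_csv(io.StringIO(csv_text))

# index_col=0 reuses that column as the row index;
# parse_dates converts Date from strings to datetime64 values while reading.
df = pd.read_csv(io.StringIO(csv_text), index_col=0, parse_dates=["Date"])

print(df_default.columns.tolist())
print(df.dtypes)
```

With `Date` already parsed, the datetime conversions used later in this notebook operate on real datetimes rather than on strings.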

## Filtering
**Question)** Find the days that had more than 4 hours of sunshine. These days are likely to see increased sunscreen sales.

In [15]:
df_weather['Sunshine'] > 4

0         False
1         False
2         False
3         False
4         False
          ...  
142188    False
142189    False
142190    False
142191    False
142192    False
Name: Sunshine, Length: 142193, dtype: bool
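The comparison returns a boolean Series with one entry per row, and that is what makes it usable as a filter. A small sketch with made-up values (not taken from the dataset) shows two details worth knowing: summing the mask counts the `True` rows, and a missing (`NaN`) value compares as `False`. That is why the first rows above, whose Sunshine value is missing, evaluate to `False`.

```python
import numpy as np
import pandas as pd

# Illustrative values only; NaN stands in for a missing Sunshine reading.
sunshine = pd.Series([12.3, 3.5, np.nan, 4.0, 10.6], name="Sunshine")

mask = sunshine > 4  # element-wise comparison -> boolean Series

print(mask.tolist())    # [True, False, False, False, True]
print(int(mask.sum()))  # 2 (True counts as 1, False as 0)
```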

In [None]:
# Filtering all rows where Sunshine > 4

df_weather[df_weather['Sunshine'] > 4]

Unnamed: 0,Date,Location,MinTemp,MaxTemp,Rainfall,Evaporation,Sunshine,WindGustDir,WindGustSpeed
5939,2009-01-01,Cobar,17.9,35.2,0.0,12.0,12.3,SSW,48.0
5940,2009-01-02,Cobar,18.4,28.9,0.0,14.8,13.0,S,37.0
5941,2009-01-03,Cobar,15.5,34.1,0.0,12.6,13.3,SE,30.0
5942,2009-01-04,Cobar,19.4,37.6,0.0,10.8,10.6,NNE,46.0
5943,2009-01-05,Cobar,21.9,38.4,0.0,11.4,12.2,WNW,31.0
...,...,...,...,...,...,...,...,...,...
139108,2017-06-20,Darwin,19.3,33.4,0.0,6.0,11.0,ENE,35.0
139109,2017-06-21,Darwin,21.2,32.6,0.0,7.6,8.6,E,37.0
139110,2017-06-22,Darwin,20.7,32.8,0.0,5.6,11.0,E,33.0
139111,2017-06-23,Darwin,19.5,31.8,0.0,6.2,10.6,ESE,26.0
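Boolean indexing keeps every column. When only a few columns are needed, `.loc` accepts the mask as the row selector and a list of column names as the column selector, so both selections happen in one step. A minimal sketch on a hypothetical four-row frame follows. Note that the original row labels are preserved, which is why the filtered output above starts at 5939 rather than 0.

```python
import pandas as pd

# Hypothetical rows, for illustration only.
df = pd.DataFrame({
    "Location": ["Albury", "Cobar", "Cobar", "Darwin"],
    "Sunshine": [3.0, 12.3, 13.0, 11.0],
    "MaxTemp": [22.9, 35.2, 28.9, 33.4],
})

# Row selector: the boolean mask. Column selector: a list of column names.
sunny = df.loc[df["Sunshine"] > 4, ["Location", "Sunshine"]]
print(sunny)
```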


**Question)** Cold drink sales will most likely increase on days with a lot of sunshine (more than 5 hours) and a high maximum temperature (above 35 °C). Use a filter to select these days.

In [17]:
df_weather[(df_weather['Sunshine'] > 5) & (df_weather['MaxTemp'] > 35)]

Unnamed: 0,Date,Location,MinTemp,MaxTemp,Rainfall,Evaporation,Sunshine,WindGustDir,WindGustSpeed
5939,2009-01-01,Cobar,17.9,35.2,0.0,12.0,12.3,SSW,48.0
5942,2009-01-04,Cobar,19.4,37.6,0.0,10.8,10.6,NNE,46.0
5943,2009-01-05,Cobar,21.9,38.4,0.0,11.4,12.2,WNW,31.0
5944,2009-01-06,Cobar,24.2,41.0,0.0,11.2,8.4,WNW,35.0
5948,2009-01-10,Cobar,19.0,35.5,0.0,12.0,12.3,ENE,48.0
...,...,...,...,...,...,...,...,...,...
138862,2016-10-17,Darwin,25.1,35.2,0.0,7.4,11.5,NNE,39.0
138879,2016-11-03,Darwin,24.4,35.5,0.0,7.8,9.9,NW,35.0
138892,2016-11-16,Darwin,25.7,35.2,0.0,5.4,11.3,NW,26.0
138905,2016-11-29,Darwin,25.8,35.1,0.8,4.8,6.4,SSE,46.0


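**^ Note:** Each condition in the filter is wrapped in its own parentheses, and the conditions are combined with `&` (element-wise *and*). The parentheses are required because Python evaluates `&` before comparison operators such as `>`.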
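The `&` in the filter above combines two boolean masks element-wise. The sketch below uses hypothetical values to show the related operators `|` (or) and `~` (not), and what happens when the parentheses are left out.

```python
import pandas as pd

# Hypothetical values, for illustration only.
df = pd.DataFrame({
    "Sunshine": [12.3, 8.4, 2.0, 11.0],
    "MaxTemp": [35.2, 41.0, 36.5, 33.4],
})

hot_and_sunny = (df["Sunshine"] > 5) & (df["MaxTemp"] > 35)  # both must hold
hot_or_sunny = (df["Sunshine"] > 5) | (df["MaxTemp"] > 35)   # either may hold
not_hot = ~(df["MaxTemp"] > 35)                              # negation

print(hot_and_sunny.tolist())  # [True, True, False, False]
print(hot_or_sunny.tolist())   # [True, True, True, True]
print(not_hot.tolist())        # [False, False, False, True]

# Without parentheses Python applies `&` first, so the line below is parsed as
# df["Sunshine"] > (5 & df["MaxTemp"]) > 35 and raises an exception
# (TypeError or ValueError, depending on the column dtypes).
raised = False
try:
    df["Sunshine"] > 5 & df["MaxTemp"] > 35
except (TypeError, ValueError):
    raised = True
print(raised)  # True
```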

## Datetime Index

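**Syntax:** `pd.DatetimeIndex(series)` converts a Series of date strings into a `DatetimeIndex`, whose `.year`, `.month` and `.day` attributes return the individual date components.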


In [19]:
pd.DatetimeIndex(df_weather['Date'])

DatetimeIndex(['2008-12-01', '2008-12-02', '2008-12-03', '2008-12-04',
               '2008-12-05', '2008-12-06', '2008-12-07', '2008-12-08',
               '2008-12-09', '2008-12-10',
               ...
               '2017-06-15', '2017-06-16', '2017-06-17', '2017-06-18',
               '2017-06-19', '2017-06-20', '2017-06-21', '2017-06-22',
               '2017-06-23', '2017-06-24'],
              dtype='datetime64[ns]', name='Date', length=142193, freq=None)

In [20]:
pd.DatetimeIndex(df_weather['Date']).year

Index([2008, 2008, 2008, 2008, 2008, 2008, 2008, 2008, 2008, 2008,
       ...
       2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017],
      dtype='int32', name='Date', length=142193)

In [21]:
pd.DatetimeIndex(df_weather['Date']).month

Index([12, 12, 12, 12, 12, 12, 12, 12, 12, 12,
       ...
        6,  6,  6,  6,  6,  6,  6,  6,  6,  6],
      dtype='int32', name='Date', length=142193)

In [22]:
pd.DatetimeIndex(df_weather['Date']).day

Index([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10,
       ...
       15, 16, 17, 18, 19, 20, 21, 22, 23, 24],
      dtype='int32', name='Date', length=142193)
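`pd.DatetimeIndex(...)` returns an `Index`, as the outputs above show. An alternative is to convert the column once with `pd.to_datetime` and use the `.dt` accessor. It exposes the same `year`, `month` and `day` components but returns a Series that keeps the original row labels. Here is a minimal sketch on three dates taken from this notebook's outputs.

```python
import pandas as pd

# Dates in the same string format as the Date column.
dates = pd.Series(["2008-12-01", "2010-02-05", "2017-06-24"], name="Date")

parsed = pd.to_datetime(dates)  # Series with a datetime64 dtype

# .dt gives element-wise access to the date components.
print(parsed.dt.year.tolist())   # [2008, 2010, 2017]
print(parsed.dt.month.tolist())  # [12, 2, 6]
print(parsed.dt.day.tolist())    # [1, 5, 24]
```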

## Adding columns to a DataFrame
**Syntax:** `DataFrame_name['new_column_name'] = Series`, where the right-hand side is any operation that returns a Series with one value per row.

In [23]:
df_weather['Year'] = pd.DatetimeIndex(df_weather['Date']).year
df_weather.head()

Unnamed: 0,Date,Location,MinTemp,MaxTemp,Rainfall,Evaporation,Sunshine,WindGustDir,WindGustSpeed,Year
0,2008-12-01,Albury,13.4,22.9,0.6,,,W,44.0,2008
1,2008-12-02,Albury,7.4,25.1,0.0,,,WNW,44.0,2008
2,2008-12-03,Albury,12.9,25.7,0.0,,,WSW,46.0,2008
3,2008-12-04,Albury,9.2,28.0,0.0,,,NE,24.0,2008
4,2008-12-05,Albury,17.5,32.3,1.0,,,W,41.0,2008


In [24]:
df_weather['month'] = pd.DatetimeIndex(df_weather['Date']).month
df_weather['Day of month'] = pd.DatetimeIndex(df_weather['Date']).day

df_weather.head()

Unnamed: 0,Date,Location,MinTemp,MaxTemp,Rainfall,Evaporation,Sunshine,WindGustDir,WindGustSpeed,Year,month,Day of month
0,2008-12-01,Albury,13.4,22.9,0.6,,,W,44.0,2008,12,1
1,2008-12-02,Albury,7.4,25.1,0.0,,,WNW,44.0,2008,12,2
2,2008-12-03,Albury,12.9,25.7,0.0,,,WSW,46.0,2008,12,3
3,2008-12-04,Albury,9.2,28.0,0.0,,,NE,24.0,2008,12,4
4,2008-12-05,Albury,17.5,32.3,1.0,,,W,41.0,2008,12,5
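The three assignments above modify `df_weather` in place. A possible alternative, sketched below on the first three rows shown above, is `DataFrame.assign`. It adds several columns in one call and returns a new DataFrame, leaving the original unchanged. Column names containing spaces, such as `Day of month`, are not valid keyword arguments, so the sketch passes that one through dictionary unpacking.

```python
import pandas as pd

# First three rows of the Date and MaxTemp columns shown above.
df = pd.DataFrame({
    "Date": ["2008-12-01", "2008-12-02", "2008-12-03"],
    "MaxTemp": [22.9, 25.1, 25.7],
})

dates = pd.to_datetime(df["Date"])

# assign() returns a new DataFrame with the extra columns; df itself is unchanged.
df_with_parts = df.assign(
    Year=dates.dt.year,
    month=dates.dt.month,
    **{"Day of month": dates.dt.day},  # name with spaces -> dict unpacking
)
print(df_with_parts.columns.tolist())
```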



**Question)** The temperatures are given in Celsius. Convert the maximum temperature to Fahrenheit and store it in a new column.

In [26]:
df_weather['MaxTemp_F'] = df_weather['MaxTemp'] * 9/5 + 32
df_weather.head()

Unnamed: 0,Date,Location,MinTemp,MaxTemp,Rainfall,Evaporation,Sunshine,WindGustDir,WindGustSpeed,Year,month,Day of month,MaxTemp_F
0,2008-12-01,Albury,13.4,22.9,0.6,,,W,44.0,2008,12,1,73.22
1,2008-12-02,Albury,7.4,25.1,0.0,,,WNW,44.0,2008,12,2,77.18
2,2008-12-03,Albury,12.9,25.7,0.0,,,WSW,46.0,2008,12,3,78.26
3,2008-12-04,Albury,9.2,28.0,0.0,,,NE,24.0,2008,12,4,82.4
4,2008-12-05,Albury,17.5,32.3,1.0,,,W,41.0,2008,12,5,90.14
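The formula F = C × 9/5 + 32 is applied to the whole column at once, with no loop over rows. Because it only uses arithmetic operators, the same expression works on a single number and on a Series. The sketch below wraps it in a hypothetical helper `celsius_to_fahrenheit` and converts both temperature columns for the first two rows shown above. It reproduces the 73.22 and 77.18 values in `MaxTemp_F`.

```python
import pandas as pd


def celsius_to_fahrenheit(celsius):
    """Hypothetical helper: works on scalars and element-wise on Series."""
    return celsius * 9 / 5 + 32


# First two rows of MinTemp and MaxTemp from the output above.
df = pd.DataFrame({"MinTemp": [13.4, 7.4], "MaxTemp": [22.9, 25.1]})

# Loop over columns (not rows): each call converts a whole column.
for col in ["MinTemp", "MaxTemp"]:
    df[f"{col}_F"] = celsius_to_fahrenheit(df[col])

print(df.round(2))
```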


## Using lambda functions

**Exercise:** Create a new column that labels days with more than 50 mm of rainfall as rainy and all other days as not rainy.

In [28]:
df_weather['Rainfall'].head()

0    0.6
1    0.0
2    0.0
3    0.0
4    1.0
Name: Rainfall, dtype: float64

In [None]:
df_weather['Rainfall'].apply(lambda x: 'Rainy' if x > 50 else 'Not Rainy')

# Equivalent, using attribute access instead of square brackets:
# df_weather.Rainfall.apply(lambda x: 'Rainy' if x > 50 else 'Not Rainy')

0         Not Rainy
1         Not Rainy
2         Not Rainy
3         Not Rainy
4         Not Rainy
            ...    
142188    Not Rainy
142189    Not Rainy
142190    Not Rainy
142191    Not Rainy
142192    Not Rainy
Name: Rainfall, Length: 142193, dtype: object
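`apply` calls the lambda once per element. A vectorized alternative is `numpy.where`, which evaluates the condition on the whole column at once and is typically faster on a column of this size (142,193 rows). The sketch below compares the two on a few rainfall values from this notebook plus a missing reading. Both label `NaN` as `'Not Rainy'`, because `NaN > 50` is `False`.

```python
import numpy as np
import pandas as pd

# Rainfall values from the outputs above, plus NaN for a missing reading.
rainfall = pd.Series([0.6, 52.2, 50.8, 0.0, np.nan], name="Rainfall")

# apply(): the lambda runs once for every element.
labels_apply = rainfall.apply(lambda x: "Rainy" if x > 50 else "Not Rainy")

# np.where(): one vectorized pass over the whole boolean mask.
labels_where = pd.Series(
    np.where(rainfall > 50, "Rainy", "Not Rainy"), index=rainfall.index
)

print(labels_apply.tolist())
# ['Not Rainy', 'Rainy', 'Rainy', 'Not Rainy', 'Not Rainy']
```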

In [32]:
df_weather['is_rainy'] = df_weather['Rainfall'].apply(lambda x: 'Rainy' if x > 50 else 'Not Rainy')
df_weather.head()

Unnamed: 0,Date,Location,MinTemp,MaxTemp,Rainfall,Evaporation,Sunshine,WindGustDir,WindGustSpeed,Year,month,Day of month,MaxTemp_F,is_rainy
0,2008-12-01,Albury,13.4,22.9,0.6,,,W,44.0,2008,12,1,73.22,Not Rainy
1,2008-12-02,Albury,7.4,25.1,0.0,,,WNW,44.0,2008,12,2,77.18,Not Rainy
2,2008-12-03,Albury,12.9,25.7,0.0,,,WSW,46.0,2008,12,3,78.26,Not Rainy
3,2008-12-04,Albury,9.2,28.0,0.0,,,NE,24.0,2008,12,4,82.4,Not Rainy
4,2008-12-05,Albury,17.5,32.3,1.0,,,W,41.0,2008,12,5,90.14,Not Rainy


In [None]:
# Extract all rows where it was raining

df_weather[df_weather['is_rainy'] == 'Rainy']

Unnamed: 0,Date,Location,MinTemp,MaxTemp,Rainfall,Evaporation,Sunshine,WindGustDir,WindGustSpeed,Year,month,Day of month,MaxTemp_F,is_rainy
429,2010-02-05,Albury,19.2,26.1,52.2,,,SE,33.0,2010,2,5,78.98,Rainy
455,2010-03-08,Albury,18.1,25.5,66.0,,,NW,56.0,2010,3,8,77.90,Rainy
690,2010-10-31,Albury,13.8,18.7,50.8,,,NNW,52.0,2010,10,31,65.66,Rainy
704,2010-11-14,Albury,19.2,22.6,52.6,,,N,26.0,2010,11,14,72.68,Rainy
787,2011-02-05,Albury,20.4,23.0,99.2,,,NW,28.0,2011,2,5,73.40,Rainy
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
140532,2017-02-03,Katherine,23.4,33.0,62.0,,,NNW,33.0,2017,2,3,91.40,Rainy
140571,2017-03-14,Katherine,23.0,35.0,79.0,31.0,,ESE,22.0,2017,3,14,95.00,Rainy
140578,2017-03-22,Katherine,24.1,34.5,61.4,,,N,31.0,2017,3,22,94.10,Rainy
142013,2016-12-26,Uluru,22.1,27.4,83.8,,,ENE,72.0,2016,12,26,81.32,Rainy
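One possible design alternative is to store the flag as a boolean instead of as the text `'Rainy'`/`'Not Rainy'`. A boolean column can be passed straight into `df[...]` as a mask without comparing it to a string, and summing it counts the rainy days. The sketch below uses a hypothetical four-row frame.

```python
import pandas as pd

# Hypothetical rows, for illustration only.
df = pd.DataFrame({
    "Location": ["Albury", "Albury", "Katherine", "Uluru"],
    "Rainfall": [0.6, 66.0, 62.0, 0.0],
})

# The comparison already yields True/False, so no lambda is needed.
df["is_rainy"] = df["Rainfall"] > 50

rainy_days = df[df["is_rainy"]]     # the boolean column is the mask
print(len(rainy_days))              # 2
print(int(df["is_rainy"].sum()))    # 2
```

The text labels are easier to read in a printed table, while the boolean version is more convenient for filtering and counting.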
