### Review of Pandas

There are two core data structures in Pandas.
- Series: a one-dimensional, homogeneous, indexed array
- DataFrame: a two-dimensional table whose columns are Series that share one index

df.head(n) -- returns the first n rows of a DataFrame (n defaults to 5)
df.tail(n) -- returns the last n rows of a DataFrame (n defaults to 5)
df[col_name] -- returns the specified column as a Series
df.shape -- returns a (rows, columns) tuple
series.isin(values) -- returns a boolean mask marking the elements found in an iterable of values; use it to filter rows
df[df[column_name] condition] -- returns the rows where the condition is met
df.loc[rows, columns] -- selects a subsection by label (or boolean mask)
df.columns -- returns the column labels
df.iloc[rows, columns] -- selects a subsection by integer position
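
The accessors listed above can be tried on a small table. This is a minimal sketch: the frame `df`, its `city` and `no2` columns, and the index labels `a` to `d` are made up for illustration and are not part of the air-quality data used below.

```python
import pandas as pd

# Hypothetical toy data, not taken from the air-quality dataset
df = pd.DataFrame(
    {
        "city": ["Paris", "London", "Antwerp", "Paris"],
        "no2": [25.0, 19.0, 50.5, 27.7],
    },
    index=["a", "b", "c", "d"],
)

first_two = df.head(2)                 # first 2 rows
n_rows, n_cols = df.shape              # shape is a (rows, columns) tuple

# isin builds a boolean mask; indexing with the mask keeps the matching rows
paris_or_london = df[df["city"].isin(["Paris", "London"])]

# Any boolean condition on a column works the same way
high = df[df["no2"] > 26]

by_label = df.loc["b":"c", ["no2"]]    # label slice: both endpoints included
by_position = df.iloc[1:3, 1]          # position slice: stop is excluded
```

Note that `loc` and `iloc` return the same two rows here even though one slice ends at `"c"` and the other at position `3`, because label slices are inclusive and position slices are not.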

In [19]:
import pandas as pd

# parse_dates=True only parses the index; name the column to get datetime64 values
air_qual = pd.read_csv('https://raw.githubusercontent.com/pandas-dev/pandas/main/doc/data/air_quality_no2.csv', parse_dates=['datetime'])
air_qual

Unnamed: 0,datetime,station_antwerp,station_paris,station_london
0,2019-05-07 02:00:00,,,23.0
1,2019-05-07 03:00:00,50.5,25.0,19.0
2,2019-05-07 04:00:00,45.0,27.7,19.0
3,2019-05-07 05:00:00,,50.4,16.0
4,2019-05-07 06:00:00,,61.9,
...,...,...,...,...
1030,2019-06-20 22:00:00,,21.4,
1031,2019-06-20 23:00:00,,24.9,
1032,2019-06-21 00:00:00,,26.5,
1033,2019-06-21 01:00:00,,21.8,


## Creating Derived Columns

In [20]:
# Element-wise: every value in station_london is multiplied by 1.882; NaN stays NaN
air_qual['london_mg_per_cubic'] = air_qual['station_london'] * 1.882
air_qual

Unnamed: 0,datetime,station_antwerp,station_paris,station_london,london_mg_per_cubic
0,2019-05-07 02:00:00,,,23.0,43.286
1,2019-05-07 03:00:00,50.5,25.0,19.0,35.758
2,2019-05-07 04:00:00,45.0,27.7,19.0,35.758
3,2019-05-07 05:00:00,,50.4,16.0,30.112
4,2019-05-07 06:00:00,,61.9,,
...,...,...,...,...,...
1030,2019-06-20 22:00:00,,21.4,,
1031,2019-06-20 23:00:00,,24.9,,
1032,2019-06-21 00:00:00,,26.5,,
1033,2019-06-21 01:00:00,,21.8,,
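
Assigning to `air_qual[...]` changes the DataFrame in place. As an alternative sketch, `DataFrame.assign` returns a new DataFrame with the extra column and leaves the original untouched. The frame `sample` below is a hypothetical stand-in for `air_qual` with only the London column.

```python
import pandas as pd

# Hypothetical stand-in for air_qual; None becomes NaN in a float column
sample = pd.DataFrame({"station_london": [23.0, 19.0, None]})

# assign returns a copy with the derived column appended
converted = sample.assign(
    london_mg_per_cubic=sample["station_london"] * 1.882
)
```

This form is convenient when the original table should stay unchanged, for example when chaining several transformations.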


In [21]:
# Columns are added row by row; a NaN in any station makes that row's average NaN
air_qual['avg_station'] = (air_qual['station_paris'] + air_qual['station_antwerp'] + air_qual['station_london']) / 3
air_qual

Unnamed: 0,datetime,station_antwerp,station_paris,station_london,london_mg_per_cubic,avg_station
0,2019-05-07 02:00:00,,,23.0,43.286,
1,2019-05-07 03:00:00,50.5,25.0,19.0,35.758,31.500000
2,2019-05-07 04:00:00,45.0,27.7,19.0,35.758,30.566667
3,2019-05-07 05:00:00,,50.4,16.0,30.112,
4,2019-05-07 06:00:00,,61.9,,,
...,...,...,...,...,...,...
1030,2019-06-20 22:00:00,,21.4,,,
1031,2019-06-20 23:00:00,,24.9,,,
1032,2019-06-21 00:00:00,,26.5,,,
1033,2019-06-21 01:00:00,,21.8,,,
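
Most rows of `avg_station` above are empty because adding columns propagates missing values. If an average over whichever stations did report is acceptable, a row-wise `mean` is one option, since it skips NaN by default. The sketch below uses a hypothetical three-row frame `sample` shaped like the first rows of the table; whether skipping missing stations is appropriate depends on the analysis.

```python
import numpy as np
import pandas as pd

# Hypothetical rows mirroring the first rows of the air-quality table
sample = pd.DataFrame(
    {
        "station_antwerp": [np.nan, 50.5, 45.0],
        "station_paris": [np.nan, 25.0, 27.7],
        "station_london": [23.0, 19.0, 19.0],
    }
)
stations = ["station_antwerp", "station_paris", "station_london"]

# Strict version: a single missing station makes the whole row NaN
strict_avg = sample[stations].sum(axis=1, skipna=False) / 3

# Lenient version: mean(axis=1) ignores NaN and averages the values present
lenient_avg = sample[stations].mean(axis=1)
```

In the first row only London reported, so the strict average is NaN while the lenient average is simply the London reading.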


## Renaming Columns


In [22]:
# inplace=True modifies air_qual directly instead of returning a renamed copy
air_qual.rename(columns={'station_antwerp' : 'BT3003'}, inplace=True)
air_qual

Unnamed: 0,datetime,BT3003,station_paris,station_london,london_mg_per_cubic,avg_station
0,2019-05-07 02:00:00,,,23.0,43.286,
1,2019-05-07 03:00:00,50.5,25.0,19.0,35.758,31.500000
2,2019-05-07 04:00:00,45.0,27.7,19.0,35.758,30.566667
3,2019-05-07 05:00:00,,50.4,16.0,30.112,
4,2019-05-07 06:00:00,,61.9,,,
...,...,...,...,...,...,...
1030,2019-06-20 22:00:00,,21.4,,,
1031,2019-06-20 23:00:00,,24.9,,,
1032,2019-06-21 00:00:00,,26.5,,,
1033,2019-06-21 01:00:00,,21.8,,,


In [23]:
air_qual.columns = air_qual.columns.str.replace('_', ' ')
air_qual

Unnamed: 0,datetime,BT3003,station paris,station london,london mg per cubic,avg station
0,2019-05-07 02:00:00,,,23.0,43.286,
1,2019-05-07 03:00:00,50.5,25.0,19.0,35.758,31.500000
2,2019-05-07 04:00:00,45.0,27.7,19.0,35.758,30.566667
3,2019-05-07 05:00:00,,50.4,16.0,30.112,
4,2019-05-07 06:00:00,,61.9,,,
...,...,...,...,...,...,...
1030,2019-06-20 22:00:00,,21.4,,,
1031,2019-06-20 23:00:00,,24.9,,,
1032,2019-06-21 00:00:00,,26.5,,,
1033,2019-06-21 01:00:00,,21.8,,,
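
Both renaming styles above change `air_qual` itself. As a sketch of the non-mutating form, `rename` without `inplace=True` returns a new DataFrame, and it also accepts a function that is applied to every column label. The frame `sample` and the new label `paris_no2` are hypothetical names used only for illustration.

```python
import pandas as pd

# Hypothetical two-column frame standing in for air_qual
sample = pd.DataFrame({"station_paris": [25.0], "station_london": [19.0]})

# A dict renames only the listed columns and returns a new DataFrame
renamed = sample.rename(columns={"station_paris": "paris_no2"})

# A callable is applied to every column label
spaced = sample.rename(columns=lambda name: name.replace("_", " "))
```

After both calls `sample` still has its original column names, which makes this form safer when the same table is reused in later cells.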
