# Cleaning methodology for pollutant data

This notebook contains the key steps taken to gather and clean air pollution data. As the data set is quite large and can only be downloaded in small chunks, we are currently investigating different ways of downloading it more efficiently. As we continue to explore the data, more substantial filtering and cleaning will take place.

In [1]:
import pandas as pd
import numpy as np

## Step 1: Identify the monitoring boundary

Having manually sifted through the data, we were able to split several monitoring sites into "inner" and "outer" ranges based on their geographic location. The sites closest to our area of focus (Heathrow Airport) have been categorised as "inner" locations and will be vital to our investigation into emissions. The "outer" locations will be used as a comparison, to gain a deeper understanding of the scale of the impact arising from air pollution.

|CCG | Borough| Inner location - monitoring station| Outer location - monitoring site|
|---:|:-----|:-----------|:------------|
|Hillingdon |Hillingdon| Hillingdon South Ruislip, Hillingdon 2 Hillingdon Hospital, Hillingdon Oxford Avenue, Hillingdon Harmondsworth, Hillingdon Harmondsworth Osiris, Hillingdon Hayes, Heathrow LHR2, Heathrow Bath Road, Hillingdon Sipson, Heathrow Green Gates| |
|East Berkshire | Slough |Slough Town Centre Wellington Street, Slough Brands Hill London Road, Slough Windmill Bath Road, Slough Colnbrook, Slough Town Centre A4, Slough Lakeside 1 Osiris, Slough Colnbrook Osiris, Slough Chalvey, Slough Lakeside 2, Slough Lakeside 2 Osiris, Slough - Dennis Way LP11, Slough - Monksfield Way LP20, Slough - The Hawthorns LP2, Slough - Erica Close LP3, Slough - Hatton Avenue LP13, Slough - St Andrews Way LP12, Slough - The Hawthorns LP10, Slough - Francis Way LP13, Slough - The Hawthorns LP1, Slough - Monksfield Way LP19, Slough - Brighton Spur LP3, Slough - Bower Way LP1, Slough - Hatton Avenue LP3, Slough - Cinder Track LP37| |
|Hounslow| Hounslow | Hounslow Cranford, Hounslow Chiswick, Hounslow Brentford, Hounslow Heston, Hounslow Hatton Cross, Hounslow Feltham, Hounslow Gunnersbury | |
|Ealing | Ealing | Ealing Horn Lane| |
|Buckinghamshire|South Bucks|Iver Thorney Lane North, Iver North Park Road, Iver Primary School| |
|Surrey Heartlands|Richmond|- |Elmbridge|
| - | Spelthorne | Spelthorne Shepperton Squire's Bridge Road, Spelthorne Knowle Green, Spelthorne Sunbury Cross, Heathrow Oaks Road| |
| - |Waverley and Woking | - | H&F Shepherd’s Bush, Godalming Ockford Road 2|
|South West London| Richmond | London Teddington Bushy Park | |
|Hammersmith & Fulham|London Borough of Hammersmith and Fulham|-| H&F Hammersmith Town Centre, H&F Shepherd’s Bush|
|Watford|Hertfordshire and Bedfordshire|-|Watford Town Hall|
|Oxfordshire|Oxfordshire|-|Oxford High St, Oxford St Ebbes (Cal Club), Oxford Center Roadside, Oxford St Ebbes|
| Berkshire West| Reading| - | Reading Caversham Road, Reading Oxford Road, Reading London Road, Reading New Town|
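
The split above was made by hand. As a rough illustration only, a distance-based rule could produce a similar inner/outer labelling. The sketch below assumes a 10 km threshold and uses made-up site coordinates; `haversine_km`, `classify`, `threshold_km` and `example_sites` are hypothetical names that do not appear in the notebook.

```python
import math

# Approximate coordinates of Heathrow Airport (latitude, longitude in degrees).
HEATHROW = (51.4700, -0.4543)

def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def classify(site_coords, threshold_km=10.0):
    """Label a site 'inner' if it lies within threshold_km of Heathrow, else 'outer'."""
    return "inner" if haversine_km(site_coords, HEATHROW) <= threshold_km else "outer"

# Hypothetical coordinates, for illustration only (not real station positions).
example_sites = {"site_near": (51.48, -0.44), "site_far": (51.75, -1.26)}
labels = {name: classify(coords) for name, coords in example_sites.items()}
print(labels)
```

A fixed radius ignores wind direction and flight paths, so a rule like this would at most be a starting point for the manual review described above.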

## Step 2: Format the Data

In [2]:
boroughs = ['hammersmithAndFulham', 'oxford', 'reading', 'watford']

In [3]:
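li = []

for borough in boroughs:
    print(borough)
    # sheet_name=None loads every sheet into a dict of DataFrames;
    # header=[0,1] reads the two header rows (site, pollutant) as a column MultiIndex.
    df = pd.read_excel('./raw_data/{}.xlsx'.format(borough), header=[0,1], sheet_name=None)
    # Stack all sheets of the workbook vertically.
    df = pd.concat(df.values(), axis=0)
    li.append(df)

# Combine the boroughs, keeping the original row labels.
frame = pd.concat(li, axis=0, ignore_index=False)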

hammersmithAndFulham
oxford
reading
watford
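
The loading loop relies on `pd.read_excel(..., sheet_name=None)` returning a dict that maps each sheet name to a DataFrame. A minimal sketch of the concatenation step, using small in-memory frames in place of the Excel files (the sheet names, site name and values below are made up):

```python
import pandas as pd

# Stand-in for pd.read_excel(path, header=[0, 1], sheet_name=None), which
# returns a dict mapping sheet name -> DataFrame. The values here are made up.
cols = pd.MultiIndex.from_tuples([("Site A", "Nitrogen dioxide"), ("Site A", "Ozone")])
sheets = {
    "2004": pd.DataFrame([[10.0, 20.0], [11.0, 21.0]], columns=cols),
    "2005": pd.DataFrame([[12.0, 22.0]], columns=cols),
}

# Stack the sheets vertically; each sheet keeps its own row labels.
combined = pd.concat(sheets.values(), axis=0)
print(combined.shape)
```

Because the row labels are kept, the combined index can contain duplicates, which is worth bearing in mind before any label-based lookup.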


In [4]:
# frame.to_excel("concatinated_pollution_data.xlsx")

In [5]:
# df = pd.read_excel('concatinated_pollution_data.xlsx', header=[0,1], sheet_name=None)
# df = pd.concat(df.values(), axis=0)
# df = pd.DataFrame(df, columns=df.keys())
# df.head()

In [6]:
# pd.set_option('display.max_columns', 100)

## Step 3: Unify missing data

In [7]:
#frame = frame.rename(columns={"index": "Date", "Unnamed: 0_level_1": "Hour"})
#frame = frame.set_index(["Date","Time"])
#frame = frame["Date"].dt.date
#frame = frame.reset_index()
#frame = frame.rename(columns={"Date": "Index", "index": "Date", "Unnamed: 0_level_1": "Hour"})
frame

Date,Broadway,Broadway,Broadway,Brook Green,Brook Green,H&F Hammersmith Town Centre,H&F Hammersmith Town Centre,H&F Hammersmith Town Centre,H&F Hammersmith Town Centre,H&F Shepherd's Bush,...,Reading Oxford Road,Scrubs Lane,Scrubs Lane,Time,Watford Roadside,Watford Roadside,Watford Roadside,Watford Town Hall,Watford Town Hall,Watford Town Hall
Unnamed: 0_level_1,Nitrogen dioxide,PM10 particulate matter (Hourly measured),Sulphur dioxide,Nitrogen dioxide,PM10 particulate matter (Hourly measured),Nitrogen dioxide,Ozone,PM10 particulate matter (Hourly measured),PM2.5 particulate matter (Hourly measured),Nitrogen dioxide,...,PM10 particulate matter (Hourly measured),Nitrogen dioxide,PM10 particulate matter (Hourly measured),Unnamed: 0_level_1,Carbon monoxide,Nitrogen dioxide,Ozone,Nitrogen dioxide,PM10 particulate matter (Hourly measured),PM2.5 particulate matter (Hourly measured)
2004-01-01 00:00:00,No data,No data,No data,No data,No data,No data,No data,No data,No data,No data,...,,No data,No data,01:00:00,,,,,,
2004-01-01 00:00:00,No data,No data,No data,No data,No data,No data,No data,No data,No data,No data,...,,No data,No data,02:00:00,,,,,,
2004-01-01 00:00:00,No data,No data,No data,No data,No data,No data,No data,No data,No data,No data,...,,No data,No data,03:00:00,,,,,,
2004-01-01 00:00:00,No data,No data,No data,No data,No data,No data,No data,No data,No data,No data,...,,No data,No data,04:00:00,,,,,,
2004-01-01 00:00:00,No data,No data,No data,No data,No data,No data,No data,No data,No data,No data,...,,No data,No data,05:00:00,,,,,,
2004-01-01 00:00:00,No data,No data,No data,No data,No data,No data,No data,No data,No data,No data,...,,No data,No data,06:00:00,,,,,,
2004-01-01 00:00:00,No data,No data,No data,No data,No data,No data,No data,No data,No data,No data,...,,No data,No data,07:00:00,,,,,,
2004-01-01 00:00:00,No data,No data,No data,No data,No data,No data,No data,No data,No data,No data,...,,No data,No data,08:00:00,,,,,,
2004-01-01 00:00:00,No data,No data,No data,No data,No data,No data,No data,No data,No data,No data,...,,No data,No data,09:00:00,,,,,,
2004-01-01 00:00:00,No data,No data,No data,No data,No data,No data,No data,No data,No data,No data,...,,No data,No data,10:00:00,,,,,,


In [8]:
frame.columns

MultiIndex(levels=[['Broadway', 'Brook Green', 'H&F Hammersmith Town Centre', 'H&F Shepherd's Bush', 'Oxford High St', 'Oxford St Ebbes (Cal Club)', 'Reading Caversham Road', 'Reading Kings Road', 'Reading Oxford Road', 'Scrubs Lane', 'Time', 'Watford Roadside', 'Watford Town Hall'], ['Carbon monoxide', 'Nitrogen dioxide', 'Ozone', 'PM10 particulate matter (Hourly measured)', 'PM2.5 particulate matter (Hourly measured)', 'Sulphur dioxide', 'Unnamed: 0_level_1']],
           labels=[[0, 0, 0, 1, 1, 2, 2, 2, 2, 3, 3, 4, 4, 5, 6, 6, 7, 7, 8, 8, 9, 9, 10, 11, 11, 11, 12, 12, 12], [1, 3, 5, 1, 3, 1, 2, 3, 4, 1, 3, 1, 3, 2, 1, 3, 1, 3, 1, 3, 1, 3, 6, 0, 1, 2, 1, 3, 4]],
           names=['Date', None],
           sortorder=0)

In [9]:
# Map both spellings of the missing-value placeholder to NaN in one call.
df = frame.replace(['No data', 'No Data'], np.nan)
df.head()

Date,Broadway,Broadway,Broadway,Brook Green,Brook Green,H&F Hammersmith Town Centre,H&F Hammersmith Town Centre,H&F Hammersmith Town Centre,H&F Hammersmith Town Centre,H&F Shepherd's Bush,...,Reading Oxford Road,Scrubs Lane,Scrubs Lane,Time,Watford Roadside,Watford Roadside,Watford Roadside,Watford Town Hall,Watford Town Hall,Watford Town Hall
Unnamed: 0_level_1,Nitrogen dioxide,PM10 particulate matter (Hourly measured),Sulphur dioxide,Nitrogen dioxide,PM10 particulate matter (Hourly measured),Nitrogen dioxide,Ozone,PM10 particulate matter (Hourly measured),PM2.5 particulate matter (Hourly measured),Nitrogen dioxide,...,PM10 particulate matter (Hourly measured),Nitrogen dioxide,PM10 particulate matter (Hourly measured),Unnamed: 0_level_1,Carbon monoxide,Nitrogen dioxide,Ozone,Nitrogen dioxide,PM10 particulate matter (Hourly measured),PM2.5 particulate matter (Hourly measured)
2004-01-01 00:00:00,,,,,,,,,,,...,,,,01:00:00,,,,,,
2004-01-01 00:00:00,,,,,,,,,,,...,,,,02:00:00,,,,,,
2004-01-01 00:00:00,,,,,,,,,,,...,,,,03:00:00,,,,,,
2004-01-01 00:00:00,,,,,,,,,,,...,,,,04:00:00,,,,,,
2004-01-01 00:00:00,,,,,,,,,,,...,,,,05:00:00,,,,,,
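
As a small self-contained illustration of this step, the sketch below replaces both spellings of the placeholder and then coerces the remaining strings to floats. The column names and values are made up, and the extra `pd.to_numeric` pass is an assumption about what the raw cells may need, not something the notebook does.

```python
import numpy as np
import pandas as pd

# Toy frame with both spellings of the missing-value marker (values are made up).
raw = pd.DataFrame({"NO2": ["No data", "41.5", "No Data"],
                    "PM10": ["17.0", "No data", "19.5"]})

# Replace both markers in one call, then coerce the remaining strings to floats.
clean = raw.replace(["No data", "No Data"], np.nan).apply(pd.to_numeric, errors="coerce")
print(clean.dtypes)
```

With `errors="coerce"`, any other non-numeric text would also become NaN, which can hide unexpected markers; checking the distinct non-numeric values first may be safer.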


In [10]:
#df.reset_index()
#df.set_index(['Date', 'Time'], inplace=True)
df

Date,Broadway,Broadway,Broadway,Brook Green,Brook Green,H&F Hammersmith Town Centre,H&F Hammersmith Town Centre,H&F Hammersmith Town Centre,H&F Hammersmith Town Centre,H&F Shepherd's Bush,...,Reading Oxford Road,Scrubs Lane,Scrubs Lane,Time,Watford Roadside,Watford Roadside,Watford Roadside,Watford Town Hall,Watford Town Hall,Watford Town Hall
Unnamed: 0_level_1,Nitrogen dioxide,PM10 particulate matter (Hourly measured),Sulphur dioxide,Nitrogen dioxide,PM10 particulate matter (Hourly measured),Nitrogen dioxide,Ozone,PM10 particulate matter (Hourly measured),PM2.5 particulate matter (Hourly measured),Nitrogen dioxide,...,PM10 particulate matter (Hourly measured),Nitrogen dioxide,PM10 particulate matter (Hourly measured),Unnamed: 0_level_1,Carbon monoxide,Nitrogen dioxide,Ozone,Nitrogen dioxide,PM10 particulate matter (Hourly measured),PM2.5 particulate matter (Hourly measured)
2004-01-01 00:00:00,,,,,,,,,,,...,,,,01:00:00,,,,,,
2004-01-01 00:00:00,,,,,,,,,,,...,,,,02:00:00,,,,,,
2004-01-01 00:00:00,,,,,,,,,,,...,,,,03:00:00,,,,,,
2004-01-01 00:00:00,,,,,,,,,,,...,,,,04:00:00,,,,,,
2004-01-01 00:00:00,,,,,,,,,,,...,,,,05:00:00,,,,,,
2004-01-01 00:00:00,,,,,,,,,,,...,,,,06:00:00,,,,,,
2004-01-01 00:00:00,,,,,,,,,,,...,,,,07:00:00,,,,,,
2004-01-01 00:00:00,,,,,,,,,,,...,,,,08:00:00,,,,,,
2004-01-01 00:00:00,,,,,,,,,,,...,,,,09:00:00,,,,,,
2004-01-01 00:00:00,,,,,,,,,,,...,,,,10:00:00,,,,,,


## Step 4: Identify closed monitoring stations

In [11]:
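# Note: dropna returns a new DataFrame, so without assignment df is left unchanged
# (the all-NaN columns are actually removed in the next cell).
df.dropna(axis=1, how='all')
df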

Date,Broadway,Broadway,Broadway,Brook Green,Brook Green,H&F Hammersmith Town Centre,H&F Hammersmith Town Centre,H&F Hammersmith Town Centre,H&F Hammersmith Town Centre,H&F Shepherd's Bush,...,Reading Oxford Road,Scrubs Lane,Scrubs Lane,Time,Watford Roadside,Watford Roadside,Watford Roadside,Watford Town Hall,Watford Town Hall,Watford Town Hall
Unnamed: 0_level_1,Nitrogen dioxide,PM10 particulate matter (Hourly measured),Sulphur dioxide,Nitrogen dioxide,PM10 particulate matter (Hourly measured),Nitrogen dioxide,Ozone,PM10 particulate matter (Hourly measured),PM2.5 particulate matter (Hourly measured),Nitrogen dioxide,...,PM10 particulate matter (Hourly measured),Nitrogen dioxide,PM10 particulate matter (Hourly measured),Unnamed: 0_level_1,Carbon monoxide,Nitrogen dioxide,Ozone,Nitrogen dioxide,PM10 particulate matter (Hourly measured),PM2.5 particulate matter (Hourly measured)
2004-01-01 00:00:00,,,,,,,,,,,...,,,,01:00:00,,,,,,
2004-01-01 00:00:00,,,,,,,,,,,...,,,,02:00:00,,,,,,
2004-01-01 00:00:00,,,,,,,,,,,...,,,,03:00:00,,,,,,
2004-01-01 00:00:00,,,,,,,,,,,...,,,,04:00:00,,,,,,
2004-01-01 00:00:00,,,,,,,,,,,...,,,,05:00:00,,,,,,
2004-01-01 00:00:00,,,,,,,,,,,...,,,,06:00:00,,,,,,
2004-01-01 00:00:00,,,,,,,,,,,...,,,,07:00:00,,,,,,
2004-01-01 00:00:00,,,,,,,,,,,...,,,,08:00:00,,,,,,
2004-01-01 00:00:00,,,,,,,,,,,...,,,,09:00:00,,,,,,
2004-01-01 00:00:00,,,,,,,,,,,...,,,,10:00:00,,,,,,


In [12]:
non_null_columns = [col for col in df.columns if df.loc[:, col].notna().any()]
open_monitoring_sites = df[non_null_columns]
open_monitoring_sites

Date,Broadway,Broadway,Broadway,Brook Green,Brook Green,H&F Hammersmith Town Centre,H&F Hammersmith Town Centre,H&F Hammersmith Town Centre,H&F Hammersmith Town Centre,H&F Shepherd's Bush,...,Reading Oxford Road,Reading Oxford Road,Scrubs Lane,Scrubs Lane,Time,Watford Roadside,Watford Roadside,Watford Town Hall,Watford Town Hall,Watford Town Hall
Unnamed: 0_level_1,Nitrogen dioxide,PM10 particulate matter (Hourly measured),Sulphur dioxide,Nitrogen dioxide,PM10 particulate matter (Hourly measured),Nitrogen dioxide,Ozone,PM10 particulate matter (Hourly measured),PM2.5 particulate matter (Hourly measured),Nitrogen dioxide,...,Nitrogen dioxide,PM10 particulate matter (Hourly measured),Nitrogen dioxide,PM10 particulate matter (Hourly measured),Unnamed: 0_level_1,Nitrogen dioxide,Ozone,Nitrogen dioxide,PM10 particulate matter (Hourly measured),PM2.5 particulate matter (Hourly measured)
2004-01-01 00:00:00,,,,,,,,,,,...,,,,,01:00:00,,,,,
2004-01-01 00:00:00,,,,,,,,,,,...,,,,,02:00:00,,,,,
2004-01-01 00:00:00,,,,,,,,,,,...,,,,,03:00:00,,,,,
2004-01-01 00:00:00,,,,,,,,,,,...,,,,,04:00:00,,,,,
2004-01-01 00:00:00,,,,,,,,,,,...,,,,,05:00:00,,,,,
2004-01-01 00:00:00,,,,,,,,,,,...,,,,,06:00:00,,,,,
2004-01-01 00:00:00,,,,,,,,,,,...,,,,,07:00:00,,,,,
2004-01-01 00:00:00,,,,,,,,,,,...,,,,,08:00:00,,,,,
2004-01-01 00:00:00,,,,,,,,,,,...,,,,,09:00:00,,,,,
2004-01-01 00:00:00,,,,,,,,,,,...,,,,,10:00:00,,,,,
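
The two approaches in this step should select the same columns, provided the result of `dropna` is assigned. A minimal sketch on made-up data (`df_demo`, `open_site` and `closed_site` are illustrative names):

```python
import numpy as np
import pandas as pd

# Toy frame: "closed_site" never reported a value (made-up data).
df_demo = pd.DataFrame({"open_site": [1.0, np.nan, 3.0],
                        "closed_site": [np.nan, np.nan, np.nan]})

# dropna returns a new frame, so the result has to be assigned.
by_dropna = df_demo.dropna(axis=1, how="all")

# Equivalent column filter, in the style of the list comprehension above.
kept = [col for col in df_demo.columns if df_demo[col].notna().any()]
by_filter = df_demo[kept]

print(list(by_dropna.columns), list(by_filter.columns))
```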


## Step 5: Setting Date and Time columns to Datetime

In [13]:
# open_monitoring_sites[("Time", "Unnamed: 0_level_1")] = [str(x)[-8:] for x in open_monitoring_sites[("Time", "Unnamed: 0_level_1")]]

In [14]:
open_monitoring_sites.columns.get_level_values(0)

Index(['Broadway', 'Broadway', 'Broadway', 'Brook Green', 'Brook Green',
       'H&F Hammersmith Town Centre', 'H&F Hammersmith Town Centre',
       'H&F Hammersmith Town Centre', 'H&F Hammersmith Town Centre',
       'H&F Shepherd's Bush', 'H&F Shepherd's Bush', 'Oxford High St',
       'Oxford High St', 'Oxford St Ebbes (Cal Club)',
       'Reading Caversham Road', 'Reading Caversham Road',
       'Reading Kings Road', 'Reading Kings Road', 'Reading Oxford Road',
       'Reading Oxford Road', 'Scrubs Lane', 'Scrubs Lane', 'Time',
       'Watford Roadside', 'Watford Roadside', 'Watford Town Hall',
       'Watford Town Hall', 'Watford Town Hall'],
      dtype='object', name='Date')

In [15]:
hourlyOuter = open_monitoring_sites.reset_index()
hourlyOuter

Date,index,Broadway,Broadway,Broadway,Brook Green,Brook Green,H&F Hammersmith Town Centre,H&F Hammersmith Town Centre,H&F Hammersmith Town Centre,H&F Hammersmith Town Centre,...,Reading Oxford Road,Reading Oxford Road,Scrubs Lane,Scrubs Lane,Time,Watford Roadside,Watford Roadside,Watford Town Hall,Watford Town Hall,Watford Town Hall
Unnamed: 0_level_1,Unnamed: 1_level_1,Nitrogen dioxide,PM10 particulate matter (Hourly measured),Sulphur dioxide,Nitrogen dioxide,PM10 particulate matter (Hourly measured),Nitrogen dioxide,Ozone,PM10 particulate matter (Hourly measured),PM2.5 particulate matter (Hourly measured),...,Nitrogen dioxide,PM10 particulate matter (Hourly measured),Nitrogen dioxide,PM10 particulate matter (Hourly measured),Unnamed: 0_level_1,Nitrogen dioxide,Ozone,Nitrogen dioxide,PM10 particulate matter (Hourly measured),PM2.5 particulate matter (Hourly measured)
0,2004-01-01 00:00:00,,,,,,,,,,...,,,,,01:00:00,,,,,
1,2004-01-01 00:00:00,,,,,,,,,,...,,,,,02:00:00,,,,,
2,2004-01-01 00:00:00,,,,,,,,,,...,,,,,03:00:00,,,,,
3,2004-01-01 00:00:00,,,,,,,,,,...,,,,,04:00:00,,,,,
4,2004-01-01 00:00:00,,,,,,,,,,...,,,,,05:00:00,,,,,
5,2004-01-01 00:00:00,,,,,,,,,,...,,,,,06:00:00,,,,,
6,2004-01-01 00:00:00,,,,,,,,,,...,,,,,07:00:00,,,,,
7,2004-01-01 00:00:00,,,,,,,,,,...,,,,,08:00:00,,,,,
8,2004-01-01 00:00:00,,,,,,,,,,...,,,,,09:00:00,,,,,
9,2004-01-01 00:00:00,,,,,,,,,,...,,,,,10:00:00,,,,,


In [16]:
hourlyOuter.columns

MultiIndex(levels=[['Broadway', 'Brook Green', 'H&F Hammersmith Town Centre', 'H&F Shepherd's Bush', 'Oxford High St', 'Oxford St Ebbes (Cal Club)', 'Reading Caversham Road', 'Reading Kings Road', 'Reading Oxford Road', 'Scrubs Lane', 'Time', 'Watford Roadside', 'Watford Town Hall', 'index'], ['Carbon monoxide', 'Nitrogen dioxide', 'Ozone', 'PM10 particulate matter (Hourly measured)', 'PM2.5 particulate matter (Hourly measured)', 'Sulphur dioxide', 'Unnamed: 0_level_1', '']],
           labels=[[13, 0, 0, 0, 1, 1, 2, 2, 2, 2, 3, 3, 4, 4, 5, 6, 6, 7, 7, 8, 8, 9, 9, 10, 11, 11, 12, 12, 12], [7, 1, 3, 5, 1, 3, 1, 2, 3, 4, 1, 3, 1, 3, 2, 1, 3, 1, 3, 1, 3, 1, 3, 6, 1, 2, 1, 3, 4]],
           names=['Date', None])

In [41]:
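# This attempt raises the TypeError shown below. Note also that set_index returns None
# when inplace=True, so assigning its result would discard the frame.
hourlyOuter = hourlyOuter.set_index([('index', ''), ("Time", "Unnamed: 0_level_1")], inplace = True, drop = False)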
hourlyOuter

TypeError: 'values' is not ordered, please explicitly specify the categories order by passing in a categories argument.
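
One possible alternative to indexing on the date and time columns separately is to combine them into a single timestamp first. The sketch below assumes flat column names (`Date`, `Time`) and made-up values; in the notebook's frame the corresponding keys would be two-level tuples, and the time cells may not be plain strings.

```python
import pandas as pd

# Toy frame with separate date and time-of-day columns (made-up values).
demo = pd.DataFrame({
    "Date": pd.to_datetime(["2004-01-01", "2004-01-01", "2004-01-02"]),
    "Time": ["01:00:00", "02:00:00", "01:00:00"],
    "NO2": [36.9, 35.2, 59.0],
})

# Date (midnight) + time-of-day offset gives one timestamp per reading.
timestamps = demo["Date"] + pd.to_timedelta(demo["Time"].astype(str))
hourly = demo.drop(columns=["Date", "Time"]).set_index(timestamps.rename("Datetime"))
print(hourly)
```

A single DatetimeIndex also lets `resample` work directly on hourly timestamps rather than on dates alone.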

In [49]:
hourlyOuter = open_monitoring_sites.stack(0, dropna=True).rename_axis(('Date', 'Location'))
hourlyOuter

Date,Broadway,Broadway,Broadway,Brook Green,Brook Green,H&F Hammersmith Town Centre,H&F Hammersmith Town Centre,H&F Hammersmith Town Centre,H&F Hammersmith Town Centre,H&F Shepherd's Bush,H&F Shepherd's Bush,Oxford High St,Oxford High St,Oxford St Ebbes (Cal Club),Reading Caversham Road,Reading Caversham Road,Reading Kings Road,Reading Kings Road,Reading Oxford Road,Reading Oxford Road,Scrubs Lane,Scrubs Lane,Watford Roadside,Watford Roadside,Watford Town Hall,Watford Town Hall,Watford Town Hall
Unnamed: 0_level_1,Nitrogen dioxide,PM10 particulate matter (Hourly measured),Sulphur dioxide,Nitrogen dioxide,PM10 particulate matter (Hourly measured),Nitrogen dioxide,Ozone,PM10 particulate matter (Hourly measured),PM2.5 particulate matter (Hourly measured),Nitrogen dioxide,PM10 particulate matter (Hourly measured),Nitrogen dioxide,PM10 particulate matter (Hourly measured),Ozone,Nitrogen dioxide,PM10 particulate matter (Hourly measured),Nitrogen dioxide,PM10 particulate matter (Hourly measured),Nitrogen dioxide,PM10 particulate matter (Hourly measured),Nitrogen dioxide,PM10 particulate matter (Hourly measured),Nitrogen dioxide,Ozone,Nitrogen dioxide,PM10 particulate matter (Hourly measured),PM2.5 particulate matter (Hourly measured)
index,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2,Unnamed: 8_level_2,Unnamed: 9_level_2,Unnamed: 10_level_2,Unnamed: 11_level_2,Unnamed: 12_level_2,Unnamed: 13_level_2,Unnamed: 14_level_2,Unnamed: 15_level_2,Unnamed: 16_level_2,Unnamed: 17_level_2,Unnamed: 18_level_2,Unnamed: 19_level_2,Unnamed: 20_level_2,Unnamed: 21_level_2,Unnamed: 22_level_2,Unnamed: 23_level_2,Unnamed: 24_level_2,Unnamed: 25_level_2,Unnamed: 26_level_2,Unnamed: 27_level_2
2004-01-05,,,,58.33125,32.1,,,,,,,,,,,,,,,,,,,,,,
2004-01-05,,,,56.80125,17.5,,,,,,,,,,,,,,,,,,,,,,
2004-01-05,,,,64.45125,18.4,,,,,,,,,,,,,,,,,,,,,,
2004-01-05,,,,66.93750,24.8,,,,,,,,,,,,,,,,,,,,,,
2004-01-05,,,,67.32000,19.1,,,,,,,,,,,,,,,,,,,,,,
2004-01-05,,,,64.83375,4.7,,,,,,,,,,,,,,,,,,,,,,
2004-01-05,,,,61.20000,36.3,,,,,,,,,,,,,,,,,,,,,,
2004-01-05,,,,52.21125,26.5,,,,,,,,,,,,,,,,,,,,,,
2004-01-05,,,,39.01500,37.5,,,,,,,,,,,,,,,,,,,,,,
2004-01-05,,,,33.46875,8.6,,,,,,,,,,,,,,,,,,,,,,


In [17]:
hourlyOuter.to_csv("hourlyOuter.csv")

In [18]:
#open_monitoring_sites[("index", "")] = [str(x)[0:10] for x in open_monitoring_sites[("index", "")]]
open_monitoring_sites.info()

<class 'pandas.core.frame.DataFrame'>
Index: 567944 entries, 2004-01-01 00:00:00 to End
Data columns (total 28 columns):
(Broadway, Nitrogen dioxide)                                                 37098 non-null float64
(Broadway, PM10 particulate matter (Hourly measured))                        29386 non-null float64
(Broadway, Sulphur dioxide)                                                  21209 non-null float64
(Brook Green, Nitrogen dioxide)                                              41795 non-null float64
(Brook Green, PM10 particulate matter (Hourly measured))                     41586 non-null float64
(H&F Hammersmith Town Centre, Nitrogen dioxide)                              14201 non-null float64
(H&F Hammersmith Town Centre, Ozone)                                         14952 non-null float64
(H&F Hammersmith Town Centre, PM10 particulate matter (Hourly measured))     14785 non-null float64
(H&F Hammersmith Town Centre, PM2.5 particulate matter (Hourly measured))    14

In [19]:
#open_monitoring_sites = open_monitoring_sites.set_index("index")
# Reassign rather than modify in place, to avoid pandas' SettingWithCopyWarning
# (open_monitoring_sites was created as a selection from df).
open_monitoring_sites = open_monitoring_sites.drop(columns=["Time"])
open_monitoring_sites = open_monitoring_sites.dropna(how='all')
pd.set_option('display.max_columns', None)
open_monitoring_sites.index = pd.to_datetime(open_monitoring_sites.index)
open_monitoring_sites.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 527206 entries, 2004-01-05 to 2020-12-01
Data columns (total 27 columns):
(Broadway, Nitrogen dioxide)                                                 37098 non-null float64
(Broadway, PM10 particulate matter (Hourly measured))                        29386 non-null float64
(Broadway, Sulphur dioxide)                                                  21209 non-null float64
(Brook Green, Nitrogen dioxide)                                              41795 non-null float64
(Brook Green, PM10 particulate matter (Hourly measured))                     41586 non-null float64
(H&F Hammersmith Town Centre, Nitrogen dioxide)                              14201 non-null float64
(H&F Hammersmith Town Centre, Ozone)                                         14952 non-null float64
(H&F Hammersmith Town Centre, PM10 particulate matter (Hourly measured))     14785 non-null float64
(H&F Hammersmith Town Centre, PM2.5 particulate matter (Hourly measured))

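Convert to datetime (earlier attempts, kept commented out for reference; the conversion itself is done in the previous cell):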

In [20]:
#open_monitoring_sites = open_monitoring_sites.reset_index()
#open_monitoring_sites['index'] = pd.to_datetime(open_monitoring_sites['index'], format='%Y-%m-%d',utc=True)
#open_monitoring_sites[("index", "")] = [str(x)[0:10] for x in open_monitoring_sites[("index", "")]]
#open_monitoring_sites = open_monitoring_sites.set_index("index")
open_monitoring_sites

Date,Broadway,Broadway,Broadway,Brook Green,Brook Green,H&F Hammersmith Town Centre,H&F Hammersmith Town Centre,H&F Hammersmith Town Centre,H&F Hammersmith Town Centre,H&F Shepherd's Bush,H&F Shepherd's Bush,Oxford High St,Oxford High St,Oxford St Ebbes (Cal Club),Reading Caversham Road,Reading Caversham Road,Reading Kings Road,Reading Kings Road,Reading Oxford Road,Reading Oxford Road,Scrubs Lane,Scrubs Lane,Watford Roadside,Watford Roadside,Watford Town Hall,Watford Town Hall,Watford Town Hall
Unnamed: 0_level_1,Nitrogen dioxide,PM10 particulate matter (Hourly measured),Sulphur dioxide,Nitrogen dioxide,PM10 particulate matter (Hourly measured),Nitrogen dioxide,Ozone,PM10 particulate matter (Hourly measured),PM2.5 particulate matter (Hourly measured),Nitrogen dioxide,PM10 particulate matter (Hourly measured),Nitrogen dioxide,PM10 particulate matter (Hourly measured),Ozone,Nitrogen dioxide,PM10 particulate matter (Hourly measured),Nitrogen dioxide,PM10 particulate matter (Hourly measured),Nitrogen dioxide,PM10 particulate matter (Hourly measured),Nitrogen dioxide,PM10 particulate matter (Hourly measured),Nitrogen dioxide,Ozone,Nitrogen dioxide,PM10 particulate matter (Hourly measured),PM2.5 particulate matter (Hourly measured)
2004-01-05,,,,58.33125,32.1,,,,,,,,,,,,,,,,,,,,,,
2004-01-05,,,,56.80125,17.5,,,,,,,,,,,,,,,,,,,,,,
2004-01-05,,,,64.45125,18.4,,,,,,,,,,,,,,,,,,,,,,
2004-01-05,,,,66.93750,24.8,,,,,,,,,,,,,,,,,,,,,,
2004-01-05,,,,67.32000,19.1,,,,,,,,,,,,,,,,,,,,,,
2004-01-05,,,,64.83375,4.7,,,,,,,,,,,,,,,,,,,,,,
2004-01-05,,,,61.20000,36.3,,,,,,,,,,,,,,,,,,,,,,
2004-01-05,,,,52.21125,26.5,,,,,,,,,,,,,,,,,,,,,,
2004-01-05,,,,39.01500,37.5,,,,,,,,,,,,,,,,,,,,,,
2004-01-05,,,,33.46875,8.6,,,,,,,,,,,,,,,,,,,,,,


Aggregate to daily, monthly and yearly means:

In [21]:
# Average the hourly readings within each calendar day (NaN values are skipped).
dailyOuter = open_monitoring_sites.resample('D').mean()
#open_monitoring_sites.groupby(pd.Grouper(freq='1D', level=0)).mean()
dailyOuter

Date,Broadway,Broadway,Broadway,Brook Green,Brook Green,H&F Hammersmith Town Centre,H&F Hammersmith Town Centre,H&F Hammersmith Town Centre,H&F Hammersmith Town Centre,H&F Shepherd's Bush,H&F Shepherd's Bush,Oxford High St,Oxford High St,Oxford St Ebbes (Cal Club),Reading Caversham Road,Reading Caversham Road,Reading Kings Road,Reading Kings Road,Reading Oxford Road,Reading Oxford Road,Scrubs Lane,Scrubs Lane,Watford Roadside,Watford Roadside,Watford Town Hall,Watford Town Hall,Watford Town Hall
Unnamed: 0_level_1,Nitrogen dioxide,PM10 particulate matter (Hourly measured),Sulphur dioxide,Nitrogen dioxide,PM10 particulate matter (Hourly measured),Nitrogen dioxide,Ozone,PM10 particulate matter (Hourly measured),PM2.5 particulate matter (Hourly measured),Nitrogen dioxide,PM10 particulate matter (Hourly measured),Nitrogen dioxide,PM10 particulate matter (Hourly measured),Ozone,Nitrogen dioxide,PM10 particulate matter (Hourly measured),Nitrogen dioxide,PM10 particulate matter (Hourly measured),Nitrogen dioxide,PM10 particulate matter (Hourly measured),Nitrogen dioxide,PM10 particulate matter (Hourly measured),Nitrogen dioxide,Ozone,Nitrogen dioxide,PM10 particulate matter (Hourly measured),PM2.5 particulate matter (Hourly measured)
2004-01-01,,,,,,,,,,,,36.913043,12.208333,50.782609,,,,,,,,,25.291667,38.416667,,21.083333,
2004-01-02,,,,,,,,,,,,59.000000,31.500000,22.666667,,,,,,,,,36.083333,17.083333,,29.000000,
2004-01-03,,,,,,,,,,,,53.875000,27.958333,5.750000,,,,,,,,,54.875000,3.416667,,41.416667,
2004-01-04,,,,,,,,,,,,60.875000,27.708333,15.583333,,,,,,,,,38.083333,11.666667,,35.458333,
2004-01-05,,,,51.573750,21.533333,,,,,,,49.875000,14.083333,23.666667,,,,,,,,,43.625000,10.173913,,20.416667,
2004-01-06,,,,44.768438,26.470833,,,,,,,56.083333,21.541667,40.750000,,,,,,,,,40.375000,27.500000,,26.708333,
2004-01-07,,,,34.385156,21.622727,,,,,,,38.125000,20.833333,32.333333,,,,,,,,,36.250000,27.000000,,26.954545,
2004-01-08,,,,27.412500,16.883333,,,,,,,28.409091,11.875000,54.916667,,,,,,,,,26.083333,41.166667,,18.208333,
2004-01-09,,,,44.776406,10.755556,,,,,,,64.166667,24.916667,48.416667,,,,,,,,,40.125000,34.416667,,19.208333,
2004-01-10,,,,47.262656,,,,,,,,39.791667,27.000000,36.416667,,,,,,,,,43.375000,19.500000,,23.083333,
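
To make the daily aggregation concrete, here is a minimal sketch of `resample('D').mean()` on a toy hourly series. The values are made up, and one reading is deliberately missing to show that `mean()` skips NaN.

```python
import numpy as np
import pandas as pd

# 48 hourly readings over two days (made-up values), with one gap.
idx = pd.date_range("2004-01-05", periods=48, freq=pd.Timedelta(hours=1))
hourly = pd.Series(np.arange(48, dtype=float), index=idx, name="NO2")
hourly.iloc[3] = np.nan  # a missing reading is skipped by mean()

# One row per calendar day, averaging whatever readings exist for that day.
daily = hourly.resample("D").mean()
print(daily)
```

The first day is averaged over 23 readings instead of 24, so days with many gaps can still produce a value based on very few hours.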


In [22]:
# Monthly mean of the daily means (each day counts equally, however many hourly readings it had).
monthlyOuter = dailyOuter.resample('M').mean()
monthlyOuter

Date,Broadway,Broadway,Broadway,Brook Green,Brook Green,H&F Hammersmith Town Centre,H&F Hammersmith Town Centre,H&F Hammersmith Town Centre,H&F Hammersmith Town Centre,H&F Shepherd's Bush,H&F Shepherd's Bush,Oxford High St,Oxford High St,Oxford St Ebbes (Cal Club),Reading Caversham Road,Reading Caversham Road,Reading Kings Road,Reading Kings Road,Reading Oxford Road,Reading Oxford Road,Scrubs Lane,Scrubs Lane,Watford Roadside,Watford Roadside,Watford Town Hall,Watford Town Hall,Watford Town Hall
Unnamed: 0_level_1,Nitrogen dioxide,PM10 particulate matter (Hourly measured),Sulphur dioxide,Nitrogen dioxide,PM10 particulate matter (Hourly measured),Nitrogen dioxide,Ozone,PM10 particulate matter (Hourly measured),PM2.5 particulate matter (Hourly measured),Nitrogen dioxide,PM10 particulate matter (Hourly measured),Nitrogen dioxide,PM10 particulate matter (Hourly measured),Ozone,Nitrogen dioxide,PM10 particulate matter (Hourly measured),Nitrogen dioxide,PM10 particulate matter (Hourly measured),Nitrogen dioxide,PM10 particulate matter (Hourly measured),Nitrogen dioxide,PM10 particulate matter (Hourly measured),Nitrogen dioxide,Ozone,Nitrogen dioxide,PM10 particulate matter (Hourly measured),PM2.5 particulate matter (Hourly measured)
2004-01-31,64.905934,30.345425,13.844517,42.480162,17.608884,,,,,,,57.440031,23.602151,38.675783,,,,,,,,,39.654776,28.162948,,21.575339,
2004-02-29,78.543240,30.822677,13.715188,43.149878,25.101131,,,,,,,67.152349,35.455068,37.974138,,,,,,,,,49.100280,37.662430,,29.661726,
2004-03-31,88.475190,38.453748,13.255275,47.358281,30.381672,,,,,,,68.916231,33.135753,45.091164,,,,,,,,,45.520278,43.936554,,33.665160,
2004-04-30,69.550567,18.498929,,46.261747,24.915628,,,,,,,69.260732,27.215217,51.538889,,,,,,,,,44.210266,50.645070,,25.997283,
2004-05-31,,26.007404,,38.657640,23.690325,,,,,,,69.631162,33.325521,46.887097,,,,,,,,,42.678062,44.713184,,26.009759,
2004-06-30,,24.278234,,26.030984,16.278429,,,,,,,58.260200,22.622094,47.200505,,,,,,,,,29.673527,47.137071,,16.887154,
2004-07-31,58.460819,29.527998,5.205597,26.933920,18.725994,,,,,,,59.093720,24.283719,44.371736,,,,,,,,,32.682598,41.682796,,17.913721,
2004-08-31,98.589786,33.225742,6.269515,34.656094,21.595582,,,,,,,55.775995,26.044990,49.487538,,,,,,,,,30.210963,47.688523,,23.123045,
2004-09-30,65.070284,30.790088,4.022235,36.435250,22.291243,,,,,,,59.060402,29.149494,40.230808,,,,,,,,,30.098068,39.487681,,21.870894,
2004-10-31,84.003286,29.096106,2.816355,37.948875,18.431791,,,,,,,56.159865,23.334327,34.350631,,,,,,,,,32.935676,30.039800,,18.428047,


In [23]:
# Yearly mean of the monthly means (each month counts equally).
yearlyOuter = monthlyOuter.resample('Y').mean()
yearlyOuter

Date,Broadway,Broadway,Broadway,Brook Green,Brook Green,H&F Hammersmith Town Centre,H&F Hammersmith Town Centre,H&F Hammersmith Town Centre,H&F Hammersmith Town Centre,H&F Shepherd's Bush,H&F Shepherd's Bush,Oxford High St,Oxford High St,Oxford St Ebbes (Cal Club),Reading Caversham Road,Reading Caversham Road,Reading Kings Road,Reading Kings Road,Reading Oxford Road,Reading Oxford Road,Scrubs Lane,Scrubs Lane,Watford Roadside,Watford Roadside,Watford Town Hall,Watford Town Hall,Watford Town Hall
Unnamed: 0_level_1,Nitrogen dioxide,PM10 particulate matter (Hourly measured),Sulphur dioxide,Nitrogen dioxide,PM10 particulate matter (Hourly measured),Nitrogen dioxide,Ozone,PM10 particulate matter (Hourly measured),PM2.5 particulate matter (Hourly measured),Nitrogen dioxide,PM10 particulate matter (Hourly measured),Nitrogen dioxide,PM10 particulate matter (Hourly measured),Ozone,Nitrogen dioxide,PM10 particulate matter (Hourly measured),Nitrogen dioxide,PM10 particulate matter (Hourly measured),Nitrogen dioxide,PM10 particulate matter (Hourly measured),Nitrogen dioxide,PM10 particulate matter (Hourly measured),Nitrogen dioxide,Ozone,Nitrogen dioxide,PM10 particulate matter (Hourly measured),PM2.5 particulate matter (Hourly measured)
2004-12-31,76.864912,29.886028,9.171105,40.446501,22.430841,,,,,,,63.545897,27.789444,40.376279,,,,,,,,,38.836547,37.226716,,24.200917,
2005-12-31,73.645854,32.511715,8.271812,39.992881,23.61579,,,,,,,49.362863,25.594434,39.652825,,,,,,,43.337274,36.718461,37.958796,35.039582,,24.798499,
2006-12-31,83.225576,30.188318,6.046027,39.172744,22.460462,,,,,,,55.820565,28.368755,45.274464,39.607957,,,,,,,,37.260015,39.207049,,23.530552,
2007-12-31,82.886805,29.188278,4.742114,37.434564,22.075413,,,,,,,58.257026,25.961705,39.945394,41.77794,27.2706,67.692945,,50.04151,36.082138,,,35.79176,43.263053,,23.221057,
2008-12-31,74.42444,29.484994,,38.729231,22.24255,,,,,,,54.460016,24.872508,41.216238,39.537848,24.068738,55.607601,23.009083,35.180101,26.515397,,,,,33.700041,21.123616,
2009-12-31,,,,50.337135,20.222782,,,,,,,54.833902,25.106133,46.349644,45.26161,27.457637,51.048373,22.119648,36.632995,24.910811,,,,,39.265491,21.864279,
2010-12-31,,,,,,,,,,,,60.259746,23.630868,49.67752,48.390188,26.590733,53.390083,25.553894,39.187717,24.514128,,,,,38.791007,22.7182,
2011-12-31,,,,,,,,,,83.648985,32.772381,52.553017,22.913411,38.608096,47.474231,32.145254,52.530364,30.821617,37.864561,25.297234,,,,,38.547865,24.512667,
2012-12-31,,,,,,,,,,92.041562,37.709165,57.90692,20.511619,35.10002,46.647561,32.732919,53.490968,28.627653,38.480783,23.490679,,,,,38.553193,21.794203,
2013-12-31,,,,,,,,,,77.797963,26.497921,52.14528,22.290244,36.761817,43.501107,29.646648,60.544799,23.82819,36.996852,17.398115,,,,,39.230868,23.628673,
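
The monthly and yearly figures above are means of means, which can differ from a direct mean of the hourly readings when coverage is uneven. A small sketch with made-up values shows the effect; it groups by calendar month via `to_period("M")` instead of `resample`, which is an assumption of this example and not the notebook's approach.

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2004-01-01", periods=48, freq=pd.Timedelta(hours=1))
hourly = pd.Series(np.nan, index=idx)
hourly.iloc[:24] = 10.0   # day 1: complete, 24 readings of 10
hourly.iloc[24] = 34.0    # day 2: a single reading of 34

daily = hourly.resample("D").mean()

# Monthly value computed two ways: from daily means, and from the raw hourly data.
mean_of_daily = daily.groupby(daily.index.to_period("M")).mean().iloc[0]
mean_of_hourly = hourly.groupby(hourly.index.to_period("M")).mean().iloc[0]
print(mean_of_daily, mean_of_hourly)
```

Here the single reading on the second day carries as much weight as the 24 readings on the first. Neither figure is wrong, but the choice should be stated when the results are reported.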


## Step 6: Removing Multi-tier Columns

In [24]:
dailyOuterStacked = dailyOuter.stack(0, dropna=True).rename_axis(('Date', 'Location'))

In [25]:
monthlyOuterStacked = monthlyOuter.stack(0, dropna=True).rename_axis(('Date','Location'))

In [26]:
yearlyOuterStacked = yearlyOuter.stack(0, dropna=True).rename_axis(('Date','Location'))
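
To make the reshaping concrete, here is a minimal sketch on a made-up two-site frame, not the real monitoring data. It assumes, as the outputs above suggest, that the wide tables carry a two-level column index with the location on level 0 and the pollutant on level 1. The names `wide` and `stacked` and all values are illustrative. The sketch calls `stack(0)` and then `dropna(how="all")` rather than passing `dropna=True`, because the handling of that argument may differ between pandas versions.

```python
import numpy as np
import pandas as pd

# Toy wide frame: columns are (Location, Pollutant) pairs, rows are dates.
dates = pd.to_datetime(["2004-01-01", "2004-01-02"])
columns = pd.MultiIndex.from_tuples([
    ("Site A", "Nitrogen dioxide"),
    ("Site A", "Ozone"),
    ("Site B", "Nitrogen dioxide"),
])
wide = pd.DataFrame(
    [[36.9, 50.8, np.nan],
     [59.0, 22.7, 36.1]],
    index=dates,
    columns=columns,
)

# Move the location level from the columns into the row index,
# drop rows with no readings at all, and name the two index levels.
stacked = wide.stack(0).dropna(how="all").rename_axis(("Date", "Location"))
print(stacked)
```

Each row of `stacked` now holds one location on one date, with a single level of pollutant columns.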

## Mapping locations to CCGs

In [27]:
dailyOuterStacked.index.get_level_values(1).unique()

Index(['Oxford High St', 'Oxford St Ebbes (Cal Club)', 'Watford Roadside',
       'Watford Town Hall', 'Brook Green', 'Broadway', 'Scrubs Lane',
       'Reading Caversham Road', 'Reading Kings Road', 'Reading Oxford Road',
       'H&F Shepherd's Bush', 'H&F Hammersmith Town Centre'],
      dtype='object', name='Location')

In [28]:
# Lookup table from monitoring site name to CCG
CCG_BY_LOCATION = {
    'Oxford High St': "Oxford",
    'Oxford St Ebbes (Cal Club)': "Oxford",
    'Watford Roadside': "Watford",
    'Watford Town Hall': "Watford",
    'Brook Green': "Hammersmith and Fulham",
    'Broadway': "Hammersmith and Fulham",
    "H&F Shepherd's Bush": "Hammersmith and Fulham",
    'H&F Hammersmith Town Centre': "Hammersmith and Fulham",
    'Scrubs Lane': "Hammersmith and Fulham",
    'Reading Caversham Road': "Reading",
    'Reading Kings Road': "Reading",
    'Reading Oxford Road': "Reading",
}

def toCCG(x):
    # Unrecognised locations are returned unchanged
    return CCG_BY_LOCATION.get(x, x)

In [29]:
dailyOuterStacked["CCG"] = [toCCG(x) for x in dailyOuterStacked.index.get_level_values(1)]
monthlyOuterStacked["CCG"] = [toCCG(x) for x in monthlyOuterStacked.index.get_level_values(1)]
yearlyOuterStacked["CCG"] = [toCCG(x) for x in yearlyOuterStacked.index.get_level_values(1)]

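Group each stacked frame by CCG. The results (dailyOuterCCG, monthlyOuterCCG and yearlyOuterCCG) are GroupBy objects; no aggregation is applied at this stage.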

In [30]:
dailyOuterCCG = dailyOuterStacked.groupby(["CCG"])#.resample('D').mean()
monthlyOuterCCG = monthlyOuterStacked.groupby(["CCG"])#.resample('M').mean()
yearlyOuterCCG = yearlyOuterStacked.groupby(["CCG"])#.resample('Y').mean()

In [31]:
#dailyOuterCCG.set_index("CCG", append=True, inplace=True)
dailyOuterCCG.head()

Unnamed: 0_level_0,Unnamed: 1_level_0,Nitrogen dioxide,Ozone,PM10 particulate matter (Hourly measured),PM2.5 particulate matter (Hourly measured),Sulphur dioxide,CCG
Date,Location,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
2004-01-01,Oxford High St,36.913043,,12.208333,,,Oxford
2004-01-01,Oxford St Ebbes (Cal Club),,50.782609,,,,Oxford
2004-01-01,Watford Roadside,25.291667,38.416667,,,,Watford
2004-01-01,Watford Town Hall,,,21.083333,,,Watford
2004-01-02,Oxford High St,59.0,,31.5,,,Oxford
2004-01-02,Oxford St Ebbes (Cal Club),,22.666667,,,,Oxford
2004-01-02,Watford Roadside,36.083333,17.083333,,,,Watford
2004-01-02,Watford Town Hall,,,29.0,,,Watford
2004-01-03,Oxford High St,53.875,,27.958333,,,Oxford
2004-01-03,Watford Roadside,54.875,3.416667,,,,Watford


Next, we average the daily, monthly and yearly readings across the monitoring sites within each CCG, starting with the daily data.

In [32]:
dailyOuterCCGgrouped = dailyOuterStacked.pivot(columns="CCG")
dailyOuterCCGgrouped.head()

Unnamed: 0_level_0,Unnamed: 1_level_0,Nitrogen dioxide,Nitrogen dioxide,Nitrogen dioxide,Nitrogen dioxide,Ozone,Ozone,Ozone,Ozone,PM10 particulate matter (Hourly measured),PM10 particulate matter (Hourly measured),PM10 particulate matter (Hourly measured),PM10 particulate matter (Hourly measured),PM2.5 particulate matter (Hourly measured),PM2.5 particulate matter (Hourly measured),PM2.5 particulate matter (Hourly measured),PM2.5 particulate matter (Hourly measured),Sulphur dioxide,Sulphur dioxide,Sulphur dioxide,Sulphur dioxide
Unnamed: 0_level_1,CCG,Hammersmith and Fulham,Oxford,Reading,Watford,Hammersmith and Fulham,Oxford,Reading,Watford,Hammersmith and Fulham,Oxford,Reading,Watford,Hammersmith and Fulham,Oxford,Reading,Watford,Hammersmith and Fulham,Oxford,Reading,Watford
Date,Location,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2,Unnamed: 8_level_2,Unnamed: 9_level_2,Unnamed: 10_level_2,Unnamed: 11_level_2,Unnamed: 12_level_2,Unnamed: 13_level_2,Unnamed: 14_level_2,Unnamed: 15_level_2,Unnamed: 16_level_2,Unnamed: 17_level_2,Unnamed: 18_level_2,Unnamed: 19_level_2,Unnamed: 20_level_2,Unnamed: 21_level_2
2004-01-01,Oxford High St,,36.913043,,,,,,,,12.208333,,,,,,,,,,
2004-01-01,Oxford St Ebbes (Cal Club),,,,,,50.782609,,,,,,,,,,,,,,
2004-01-01,Watford Roadside,,,,25.291667,,,,38.416667,,,,,,,,,,,,
2004-01-01,Watford Town Hall,,,,,,,,,,,,21.083333,,,,,,,,
2004-01-02,Oxford High St,,59.0,,,,,,,,31.5,,,,,,,,,,


In [33]:
dailyOuterCCGgrouped = dailyOuterCCGgrouped.reset_index().set_index("Date")
dailyOuterCCGgrouped.head()

Unnamed: 0_level_0,Location,Nitrogen dioxide,Nitrogen dioxide,Nitrogen dioxide,Nitrogen dioxide,Ozone,Ozone,Ozone,Ozone,PM10 particulate matter (Hourly measured),PM10 particulate matter (Hourly measured),PM10 particulate matter (Hourly measured),PM10 particulate matter (Hourly measured),PM2.5 particulate matter (Hourly measured),PM2.5 particulate matter (Hourly measured),PM2.5 particulate matter (Hourly measured),PM2.5 particulate matter (Hourly measured),Sulphur dioxide,Sulphur dioxide,Sulphur dioxide,Sulphur dioxide
CCG,Unnamed: 1_level_1,Hammersmith and Fulham,Oxford,Reading,Watford,Hammersmith and Fulham,Oxford,Reading,Watford,Hammersmith and Fulham,Oxford,Reading,Watford,Hammersmith and Fulham,Oxford,Reading,Watford,Hammersmith and Fulham,Oxford,Reading,Watford
Date,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2,Unnamed: 8_level_2,Unnamed: 9_level_2,Unnamed: 10_level_2,Unnamed: 11_level_2,Unnamed: 12_level_2,Unnamed: 13_level_2,Unnamed: 14_level_2,Unnamed: 15_level_2,Unnamed: 16_level_2,Unnamed: 17_level_2,Unnamed: 18_level_2,Unnamed: 19_level_2,Unnamed: 20_level_2,Unnamed: 21_level_2
2004-01-01,Oxford High St,,36.913043,,,,,,,,12.208333,,,,,,,,,,
2004-01-01,Oxford St Ebbes (Cal Club),,,,,,50.782609,,,,,,,,,,,,,,
2004-01-01,Watford Roadside,,,,25.291667,,,,38.416667,,,,,,,,,,,,
2004-01-01,Watford Town Hall,,,,,,,,,,,,21.083333,,,,,,,,
2004-01-02,Oxford High St,,59.0,,,,,,,,31.5,,,,,,,,,,


In [34]:
dailyOuterCCGgrouped = dailyOuterCCGgrouped.drop(columns = "Location").resample("D").mean()
dailyOuterCCGgrouped.head()

Unnamed: 0_level_0,Nitrogen dioxide,Nitrogen dioxide,Nitrogen dioxide,Nitrogen dioxide,Ozone,Ozone,Ozone,Ozone,PM10 particulate matter (Hourly measured),PM10 particulate matter (Hourly measured),PM10 particulate matter (Hourly measured),PM10 particulate matter (Hourly measured),PM2.5 particulate matter (Hourly measured),PM2.5 particulate matter (Hourly measured),PM2.5 particulate matter (Hourly measured),PM2.5 particulate matter (Hourly measured),Sulphur dioxide,Sulphur dioxide,Sulphur dioxide,Sulphur dioxide
CCG,Hammersmith and Fulham,Oxford,Reading,Watford,Hammersmith and Fulham,Oxford,Reading,Watford,Hammersmith and Fulham,Oxford,Reading,Watford,Hammersmith and Fulham,Oxford,Reading,Watford,Hammersmith and Fulham,Oxford,Reading,Watford
Date,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2,Unnamed: 8_level_2,Unnamed: 9_level_2,Unnamed: 10_level_2,Unnamed: 11_level_2,Unnamed: 12_level_2,Unnamed: 13_level_2,Unnamed: 14_level_2,Unnamed: 15_level_2,Unnamed: 16_level_2,Unnamed: 17_level_2,Unnamed: 18_level_2,Unnamed: 19_level_2,Unnamed: 20_level_2
2004-01-01,,36.913043,,25.291667,,50.782609,,38.416667,,12.208333,,21.083333,,,,,,,,
2004-01-02,,59.0,,36.083333,,22.666667,,17.083333,,31.5,,29.0,,,,,,,,
2004-01-03,,53.875,,54.875,,5.75,,3.416667,,27.958333,,41.416667,,,,,,,,
2004-01-04,,60.875,,38.083333,,15.583333,,11.666667,,27.708333,,35.458333,,,,,,,,
2004-01-05,51.57375,49.875,,43.625,,23.666667,,10.173913,21.533333,14.083333,,20.416667,,,,,,,,


In [35]:
dailyOuterCCGgrouped = dailyOuterCCGgrouped.stack(1, dropna=True).rename_axis(('Date', 'CCG'))
dailyOuterCCGgrouped.head()

Unnamed: 0_level_0,Unnamed: 1_level_0,Nitrogen dioxide,Ozone,PM10 particulate matter (Hourly measured),PM2.5 particulate matter (Hourly measured),Sulphur dioxide
Date,CCG,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
2004-01-01,Oxford,36.913043,50.782609,12.208333,,
2004-01-01,Watford,25.291667,38.416667,21.083333,,
2004-01-02,Oxford,59.0,22.666667,31.5,,
2004-01-02,Watford,36.083333,17.083333,29.0,,
2004-01-03,Oxford,53.875,5.75,27.958333,,
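
The pivot, resample and stack sequence above amounts to averaging, for each date, the readings of all sites that share a CCG. As a cross-check, the same averages can be sketched with a single groupby over the Date index level and the CCG column. The frame below is a toy stand-in with invented values; `stacked` and `by_ccg` are illustrative names. The result should be compared against the output above before relying on it.

```python
import numpy as np
import pandas as pd

# Toy stand-in for a stacked frame: (Date, Location) index plus a CCG column.
index = pd.MultiIndex.from_tuples(
    [
        (pd.Timestamp("2004-01-01"), "Oxford High St"),
        (pd.Timestamp("2004-01-01"), "Oxford St Ebbes (Cal Club)"),
        (pd.Timestamp("2004-01-01"), "Watford Roadside"),
    ],
    names=["Date", "Location"],
)
stacked = pd.DataFrame(
    {
        "Nitrogen dioxide": [36.0, 40.0, 25.0],
        "Ozone": [np.nan, 50.0, 38.0],
        "CCG": ["Oxford", "Oxford", "Watford"],
    },
    index=index,
)

# Average every pollutant over the sites of each CCG on each date.
# Missing readings are skipped, so a CCG with one reporting site keeps that value.
by_ccg = stacked.groupby([pd.Grouper(level="Date"), "CCG"]).mean()
print(by_ccg)
```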


The same steps applied to the monthly data:

In [36]:
monthlyOuterCCGgrouped = monthlyOuterStacked.pivot(columns="CCG")
monthlyOuterCCGgrouped

Unnamed: 0_level_0,Unnamed: 1_level_0,Nitrogen dioxide,Nitrogen dioxide,Nitrogen dioxide,Nitrogen dioxide,Ozone,Ozone,Ozone,Ozone,PM10 particulate matter (Hourly measured),PM10 particulate matter (Hourly measured),PM10 particulate matter (Hourly measured),PM10 particulate matter (Hourly measured),PM2.5 particulate matter (Hourly measured),PM2.5 particulate matter (Hourly measured),PM2.5 particulate matter (Hourly measured),PM2.5 particulate matter (Hourly measured),Sulphur dioxide,Sulphur dioxide,Sulphur dioxide,Sulphur dioxide
Unnamed: 0_level_1,CCG,Hammersmith and Fulham,Oxford,Reading,Watford,Hammersmith and Fulham,Oxford,Reading,Watford,Hammersmith and Fulham,Oxford,Reading,Watford,Hammersmith and Fulham,Oxford,Reading,Watford,Hammersmith and Fulham,Oxford,Reading,Watford
Date,Location,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2,Unnamed: 8_level_2,Unnamed: 9_level_2,Unnamed: 10_level_2,Unnamed: 11_level_2,Unnamed: 12_level_2,Unnamed: 13_level_2,Unnamed: 14_level_2,Unnamed: 15_level_2,Unnamed: 16_level_2,Unnamed: 17_level_2,Unnamed: 18_level_2,Unnamed: 19_level_2,Unnamed: 20_level_2,Unnamed: 21_level_2
2004-01-31,Broadway,64.905934,,,,,,,,30.345425,,,,,,,,13.844517,,,
2004-01-31,Brook Green,42.480162,,,,,,,,17.608884,,,,,,,,,,,
2004-01-31,Oxford High St,,57.440031,,,,,,,,23.602151,,,,,,,,,,
2004-01-31,Oxford St Ebbes (Cal Club),,,,,,38.675783,,,,,,,,,,,,,,
2004-01-31,Watford Roadside,,,,39.654776,,,,28.162948,,,,,,,,,,,,
2004-01-31,Watford Town Hall,,,,,,,,,,,,21.575339,,,,,,,,
2004-02-29,Broadway,78.543240,,,,,,,,30.822677,,,,,,,,13.715188,,,
2004-02-29,Brook Green,43.149878,,,,,,,,25.101131,,,,,,,,,,,
2004-02-29,Oxford High St,,67.152349,,,,,,,,35.455068,,,,,,,,,,
2004-02-29,Oxford St Ebbes (Cal Club),,,,,,37.974138,,,,,,,,,,,,,,


In [37]:
monthlyOuterCCGgrouped = monthlyOuterCCGgrouped.reset_index().set_index("Date")
monthlyOuterCCGgrouped

Unnamed: 0_level_0,Location,Nitrogen dioxide,Nitrogen dioxide,Nitrogen dioxide,Nitrogen dioxide,Ozone,Ozone,Ozone,Ozone,PM10 particulate matter (Hourly measured),PM10 particulate matter (Hourly measured),PM10 particulate matter (Hourly measured),PM10 particulate matter (Hourly measured),PM2.5 particulate matter (Hourly measured),PM2.5 particulate matter (Hourly measured),PM2.5 particulate matter (Hourly measured),PM2.5 particulate matter (Hourly measured),Sulphur dioxide,Sulphur dioxide,Sulphur dioxide,Sulphur dioxide
CCG,Unnamed: 1_level_1,Hammersmith and Fulham,Oxford,Reading,Watford,Hammersmith and Fulham,Oxford,Reading,Watford,Hammersmith and Fulham,Oxford,Reading,Watford,Hammersmith and Fulham,Oxford,Reading,Watford,Hammersmith and Fulham,Oxford,Reading,Watford
Date,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2,Unnamed: 8_level_2,Unnamed: 9_level_2,Unnamed: 10_level_2,Unnamed: 11_level_2,Unnamed: 12_level_2,Unnamed: 13_level_2,Unnamed: 14_level_2,Unnamed: 15_level_2,Unnamed: 16_level_2,Unnamed: 17_level_2,Unnamed: 18_level_2,Unnamed: 19_level_2,Unnamed: 20_level_2,Unnamed: 21_level_2
2004-01-31,Broadway,64.905934,,,,,,,,30.345425,,,,,,,,13.844517,,,
2004-01-31,Brook Green,42.480162,,,,,,,,17.608884,,,,,,,,,,,
2004-01-31,Oxford High St,,57.440031,,,,,,,,23.602151,,,,,,,,,,
2004-01-31,Oxford St Ebbes (Cal Club),,,,,,38.675783,,,,,,,,,,,,,,
2004-01-31,Watford Roadside,,,,39.654776,,,,28.162948,,,,,,,,,,,,
2004-01-31,Watford Town Hall,,,,,,,,,,,,21.575339,,,,,,,,
2004-02-29,Broadway,78.543240,,,,,,,,30.822677,,,,,,,,13.715188,,,
2004-02-29,Brook Green,43.149878,,,,,,,,25.101131,,,,,,,,,,,
2004-02-29,Oxford High St,,67.152349,,,,,,,,35.455068,,,,,,,,,,
2004-02-29,Oxford St Ebbes (Cal Club),,,,,,37.974138,,,,,,,,,,,,,,


In [38]:
monthlyOuterCCGgrouped = monthlyOuterCCGgrouped.drop(columns = "Location").resample("M").mean()
monthlyOuterCCGgrouped

Unnamed: 0_level_0,Nitrogen dioxide,Nitrogen dioxide,Nitrogen dioxide,Nitrogen dioxide,Ozone,Ozone,Ozone,Ozone,PM10 particulate matter (Hourly measured),PM10 particulate matter (Hourly measured),PM10 particulate matter (Hourly measured),PM10 particulate matter (Hourly measured),PM2.5 particulate matter (Hourly measured),PM2.5 particulate matter (Hourly measured),PM2.5 particulate matter (Hourly measured),PM2.5 particulate matter (Hourly measured),Sulphur dioxide,Sulphur dioxide,Sulphur dioxide,Sulphur dioxide
CCG,Hammersmith and Fulham,Oxford,Reading,Watford,Hammersmith and Fulham,Oxford,Reading,Watford,Hammersmith and Fulham,Oxford,Reading,Watford,Hammersmith and Fulham,Oxford,Reading,Watford,Hammersmith and Fulham,Oxford,Reading,Watford
Date,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2,Unnamed: 8_level_2,Unnamed: 9_level_2,Unnamed: 10_level_2,Unnamed: 11_level_2,Unnamed: 12_level_2,Unnamed: 13_level_2,Unnamed: 14_level_2,Unnamed: 15_level_2,Unnamed: 16_level_2,Unnamed: 17_level_2,Unnamed: 18_level_2,Unnamed: 19_level_2,Unnamed: 20_level_2
2004-01-31,53.693048,57.440031,,39.654776,,38.675783,,28.162948,23.977154,23.602151,,21.575339,,,,,13.844517,,,
2004-02-29,60.846559,67.152349,,49.100280,,37.974138,,37.662430,27.961904,35.455068,,29.661726,,,,,13.715188,,,
2004-03-31,67.916736,68.916231,,45.520278,,45.091164,,43.936554,34.417710,33.135753,,33.665160,,,,,13.255275,,,
2004-04-30,57.906157,69.260732,,44.210266,,51.538889,,50.645070,21.707278,27.215217,,25.997283,,,,,,,,
2004-05-31,38.657640,69.631162,,42.678062,,46.887097,,44.713184,24.848865,33.325521,,26.009759,,,,,,,,
2004-06-30,26.030984,58.260200,,29.673527,,47.200505,,47.137071,20.278331,22.622094,,16.887154,,,,,,,,
2004-07-31,42.697370,59.093720,,32.682598,,44.371736,,41.682796,24.126996,24.283719,,17.913721,,,,,5.205597,,,
2004-08-31,66.622940,55.775995,,30.210963,,49.487538,,47.688523,27.410662,26.044990,,23.123045,,,,,6.269515,,,
2004-09-30,50.752767,59.060402,,30.098068,,40.230808,,39.487681,26.540666,29.149494,,21.870894,,,,,4.022235,,,
2004-10-31,60.976081,56.159865,,32.935676,,34.350631,,30.039800,23.763948,23.334327,,18.428047,,,,,2.816355,,,


In [39]:
monthlyOuterCCGgrouped = monthlyOuterCCGgrouped.stack(1, dropna=True).rename_axis(('Date', 'CCG'))
monthlyOuterCCGgrouped

Unnamed: 0_level_0,Unnamed: 1_level_0,Nitrogen dioxide,Ozone,PM10 particulate matter (Hourly measured),PM2.5 particulate matter (Hourly measured),Sulphur dioxide
Date,CCG,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
2004-01-31,Hammersmith and Fulham,53.693048,,23.977154,,13.844517
2004-01-31,Oxford,57.440031,38.675783,23.602151,,
2004-01-31,Watford,39.654776,28.162948,21.575339,,
2004-02-29,Hammersmith and Fulham,60.846559,,27.961904,,13.715188
2004-02-29,Oxford,67.152349,37.974138,35.455068,,
2004-02-29,Watford,49.100280,37.662430,29.661726,,
2004-03-31,Hammersmith and Fulham,67.916736,,34.417710,,13.255275
2004-03-31,Oxford,68.916231,45.091164,33.135753,,
2004-03-31,Watford,45.520278,43.936554,33.665160,,
2004-04-30,Hammersmith and Fulham,57.906157,,21.707278,,


The same steps applied to the yearly data, chained into a single expression:

In [40]:
yearlyOuterCCGgrouped = yearlyOuterStacked.pivot(columns="CCG")
yearlyOuterCCGgrouped = (yearlyOuterCCGgrouped.reset_index().set_index("Date")
                         .drop(columns="Location").resample("Y").mean()
                         .stack(1, dropna=True).rename_axis(('Date', 'CCG')))
yearlyOuterCCGgrouped.head()

Unnamed: 0_level_0,Unnamed: 1_level_0,Nitrogen dioxide,Ozone,PM10 particulate matter (Hourly measured),PM2.5 particulate matter (Hourly measured),Sulphur dioxide
Date,CCG,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
2004-12-31,Hammersmith and Fulham,58.655707,,26.158435,,9.171105
2004-12-31,Oxford,63.545897,40.376279,27.789444,,
2004-12-31,Watford,38.836547,37.226716,24.200917,,
2005-12-31,Hammersmith and Fulham,52.325337,,30.948655,,8.271812
2005-12-31,Oxford,49.362863,39.652825,25.594434,,


Format the date index as year-month or year strings (currently disabled):

In [41]:
# monthlyOuterStacked.index = [str(x)[0:7] for x in monthlyOuterStacked.index]
# yearlyOuterStacked.index = [str(x)[0:4] for x in yearlyOuterStacked.index]

# monthlyOuterCCGgrouped.index = [str(x)[0:7] for x in monthlyOuterCCGgrouped.index]
# yearlyOuterCCGgrouped.index = [str(x)[0:4] for x in yearlyOuterCCGgrouped.index]
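
Note that the stacked and CCG-level tables have a two-level index, so iterating over the index yields (date, label) tuples rather than timestamps, and slicing `str(x)` would not give a clean date string. One way to shorten only the date level is sketched below on a toy frame with made-up values. It assumes the first level is named Date and holds timestamps; `monthly` is a stand-in name, not one of the notebook's tables.

```python
import pandas as pd

# Toy stand-in for a monthly CCG-level table with a (Date, CCG) index.
index = pd.MultiIndex.from_tuples(
    [
        (pd.Timestamp("2004-01-31"), "Oxford"),
        (pd.Timestamp("2004-02-29"), "Oxford"),
    ],
    names=["Date", "CCG"],
)
monthly = pd.DataFrame({"Nitrogen dioxide": [57.4, 67.2]}, index=index)

# Format the date level as "YYYY-MM" strings and rebuild the index,
# leaving the CCG level untouched.
dates = monthly.index.get_level_values("Date").strftime("%Y-%m")
labels = monthly.index.get_level_values("CCG")
monthly.index = pd.MultiIndex.from_arrays([dates, labels], names=["Date", "CCG"])
print(monthly)
```

For yearly tables the same pattern applies with the format string `"%Y"`.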

## Add coordinates

In [42]:
yearlyOuterStacked.index.get_level_values(1).unique()

Index(['Broadway', 'Brook Green', 'Oxford High St',
       'Oxford St Ebbes (Cal Club)', 'Watford Roadside', 'Watford Town Hall',
       'Scrubs Lane', 'Reading Caversham Road', 'Reading Kings Road',
       'Reading Oxford Road', 'H&F Shepherd's Bush',
       'H&F Hammersmith Town Centre'],
      dtype='object', name='Location')

In [43]:
# Latitude of each monitoring site, kept as strings
LATITUDES = {
    'Broadway': "51.492766",
    'Brook Green': "51.496496",
    'Oxford High St': "51.752527",
    'Oxford St Ebbes (Cal Club)': "51.744856",
    'Watford Roadside': "51.658889",
    'Watford Town Hall': "51.659200",
    'Scrubs Lane': "51.530890",
    'Reading Caversham Road': "51.464355",
    'Reading Kings Road': "51.453116",
    'Reading Oxford Road': "51.461957",
    "H&F Shepherd's Bush": "51.504563",
    'H&F Hammersmith Town Centre': "51.492695",
}

def lat(i):
    # Unknown sites fall back to the placeholder "Other"
    return LATITUDES.get(i, "Other")

In [44]:
# Longitude of each monitoring site, kept as strings
LONGITUDES = {
    'Broadway': "-0.223601",
    'Brook Green': "-0.220503",
    'Oxford High St': "-1.250939",
    'Oxford St Ebbes (Cal Club)': "-1.260338",
    'Watford Roadside': "-0.403333",
    'Watford Town Hall': "-0.402863",
    'Scrubs Lane': "-0.237011",
    'Reading Caversham Road': "-0.977094",
    'Reading Kings Road': "-0.949926",
    'Reading Oxford Road': "-1.012459",
    "H&F Shepherd's Bush": "-0.224670",
    'H&F Hammersmith Town Centre': "-0.224787",
}

def long(i):
    # Unknown sites fall back to the placeholder "Other"
    return LONGITUDES.get(i, "Other")

In [45]:
yearlyOuterStacked["Lat"] = [lat(i) for i in yearlyOuterStacked.index.get_level_values(1)]
yearlyOuterStacked["Long"] = [long(i) for i in yearlyOuterStacked.index.get_level_values(1)]

In [46]:
yearlyOuterStacked

Unnamed: 0_level_0,Unnamed: 1_level_0,Nitrogen dioxide,Ozone,PM10 particulate matter (Hourly measured),PM2.5 particulate matter (Hourly measured),Sulphur dioxide,CCG,Lat,Long
Date,Location,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1
2004-12-31,Broadway,76.864912,,29.886028,,9.171105,Hammersmith and Fulham,51.492766,-0.223601
2004-12-31,Brook Green,40.446501,,22.430841,,,Hammersmith and Fulham,51.496496,-0.220503
2004-12-31,Oxford High St,63.545897,,27.789444,,,Oxford,51.752527,-1.250939
2004-12-31,Oxford St Ebbes (Cal Club),,40.376279,,,,Oxford,51.744856,-1.260338
2004-12-31,Watford Roadside,38.836547,37.226716,,,,Watford,51.658889,-0.403333
2004-12-31,Watford Town Hall,,,24.200917,,,Watford,51.659200,-0.402863
2005-12-31,Broadway,73.645854,,32.511715,,8.271812,Hammersmith and Fulham,51.492766,-0.223601
2005-12-31,Brook Green,39.992881,,23.615790,,,Hammersmith and Fulham,51.496496,-0.220503
2005-12-31,Oxford High St,49.362863,,25.594434,,,Oxford,51.752527,-1.250939
2005-12-31,Oxford St Ebbes (Cal Club),,39.652825,,,,Oxford,51.744856,-1.260338
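
The lat and long helpers return strings, and fall back to the text "Other" for unknown sites. If the coordinates are later needed as numbers (for plotting on a map, say), one option is `pd.to_numeric` with `errors="coerce"`, which turns entries that cannot be parsed into NaN. The sketch below uses a small stand-in frame; `coords` is an illustrative name and the last row is an invented unknown site.

```python
import pandas as pd

# Stand-in frame with string coordinates, including one placeholder row.
coords = pd.DataFrame(
    {
        "Lat": ["51.492766", "51.752527", "Other"],
        "Long": ["-0.223601", "-1.250939", "Other"],
    },
    index=["Broadway", "Oxford High St", "Unknown site"],
)

# Convert each coordinate column to floats; "Other" becomes NaN.
for column in ("Lat", "Long"):
    coords[column] = pd.to_numeric(coords[column], errors="coerce")

print(coords.dtypes)
```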


Save the stacked and CCG-level tables to CSV files:

In [47]:
dailyOuterStacked.to_csv("dailyOuterStacked.csv")
monthlyOuterStacked.to_csv("monthlyOuterStacked.csv")
yearlyOuterStacked.to_csv("yearlyOuterStacked.csv")

dailyOuterCCGgrouped.to_csv("dailyOuterCCGgrouped.csv")
monthlyOuterCCGgrouped.to_csv("monthlyOuterCCGgrouped.csv")
yearlyOuterCCGgrouped.to_csv("yearlyOuterCCGgrouped.csv")
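
When these files are read back, the two index levels come in as ordinary columns unless `read_csv` is told otherwise. Below is a hedged sketch of a round trip, using an in-memory buffer and made-up values rather than the real files; `frame` and `restored` are illustrative names.

```python
import io

import pandas as pd

# Toy frame with the same (Date, Location) index layout as the stacked tables.
index = pd.MultiIndex.from_tuples(
    [
        (pd.Timestamp("2004-12-31"), "Broadway"),
        (pd.Timestamp("2005-12-31"), "Broadway"),
    ],
    names=["Date", "Location"],
)
frame = pd.DataFrame({"Nitrogen dioxide": [76.9, 73.6]}, index=index)

# Write to an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
frame.to_csv(buffer)
buffer.seek(0)

# Restore both index levels and parse the Date level back into timestamps.
restored = pd.read_csv(buffer, index_col=["Date", "Location"], parse_dates=["Date"])
print(restored)
```

With a real file, the buffer would be replaced by the file name passed to `to_csv` above.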

## Graphs

In [48]:
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
%matplotlib inline


In [None]:
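# Pollutant columns only (exclude the CCG and coordinate columns added above)
pollutants = [c for c in yearlyOuterStacked.columns if c not in ("CCG", "Lat", "Long")]
for l in yearlyOuterStacked.index.get_level_values(1).unique():
    # Reshape one location to long form: one row per (Date, Pollutant)
    site = (yearlyOuterStacked.xs(l, level=1)[pollutants]
            .reset_index()
            .melt(id_vars='Date', var_name='Pollutant',
                  value_name='Indicator Value (R µg/m3)'))
    plt.figure(figsize=(10, 6))
    sns.lineplot(y = 'Indicator Value (R µg/m3)' ,
                 x ='Date',
                 hue = 'Pollutant',
                 data = site).set_title(l)
    plt.xticks(rotation=90)
    plt.show()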

In [None]:
#sns.lineplot(data=df, x='x', y='y', hue='color')