# Gathering Emission Data

As with our *Gathering Weather Data* notebook, this notebook focuses exclusively on collecting the relevant emissions data for a given latitude and longitude. Unlike that notebook, however, we will not use an API to collect this data. Instead, we use a data set from <a href='https://edgar.jrc.ec.europa.eu/index.php/dataset_ghg60#p2'>EDGAR</a> (Emissions Database for Global Atmospheric Research), which we have aggregated into one large CSV file for each greenhouse gas being measured. Using the data provided by EDGAR, we are able to collect emissions data for $CO_2$ (carbon dioxide), $CH_4$ (methane), and $N_2O$ (nitrous oxide).

To begin gathering the emission data, we need to understand what our objective is. The emissions data is gridded, meaning that emissions are reported at specific latitude and longitude points along a $0.1^{\circ} \times 0.1^{\circ}$ grid. Each wildfire also has a specific latitude and longitude, so we can find the nearest *node* in the emissions data and use its value as an approximation of the emissions at the wildfire's location. Our task is therefore to find the node closest to the latitude and longitude of each wildfire.
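The nearest-node idea described above can be sketched in a few lines of NumPy. This is only an illustration: the function name `nearest_node` and the toy grid are assumptions made for this example, and the sketch measures distance in plain degrees rather than along the Earth's surface.

```python
import numpy as np

def nearest_node(point, nodes):
    """Return the (lat, lon) node closest to `point`.

    Hypothetical helper for illustration only. Distance is plain
    Euclidean distance in degrees, which is a simplification.
    """
    point = np.asarray(point, dtype=float)
    nodes = np.asarray(nodes, dtype=float)
    # Squared distance from the point to every node (sqrt is not needed for argmin)
    sq_dist = ((nodes - point) ** 2).sum(axis=1)
    best = nodes[np.argmin(sq_dist)]
    return tuple(float(v) for v in best)

# Toy grid of four (lat, lon) nodes spaced 0.1 degrees apart
toy_nodes = np.array([
    [43.25, -101.05],
    [43.35, -101.05],
    [43.25, -100.95],
    [43.35, -100.95],
])

# Coordinates of a wildfire falling inside the toy grid
fire = (43.325, -101.0185)
closest = nearest_node(fire, toy_nodes)
print(closest)
```

A brute-force search like this is easy to reason about, but it compares the point against every node, so it becomes slow when repeated for many wildfires over a large grid.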

---

### Imports

In [1]:
import numpy as np
import pandas as pd

In [2]:
# Custom imports
import emissions_utils
from importlib import reload
reload(emissions_utils);

---

### Loading Wildfire Sample and $CH_4$ Data

In [3]:
ch4_data = pd.read_csv('data/EDGAR Emissions/ch4.csv')
wildfires = pd.read_pickle('data/30k_samples_with_weather.pkl')

In [4]:
ch4_data.head()

Unnamed: 0.1,Unnamed: 0,lat,lon,emi_ch4,year
0,12703868,82.849998,306.850006,0.0,1992
1,11054241,37.049999,224.149994,1.641249e-14,1992
2,11054240,37.049999,224.050003,1.154963e-14,1992
3,11054239,37.049999,223.949997,1.350558e-14,1992
4,11054238,37.049999,223.850006,1.25344e-14,1992


In [5]:
wildfires.head(2)

Unnamed: 0,index,DATE,FIRE_YEAR,DISCOVERY_DOY,FIRE_SIZE,FIRE_SIZE_CLASS,LATITUDE,LONGITUDE,STATE,tempmax,...,precip,avg_precip,dew,avg_dew,windspeed,avg_windspeed,winddir,avg_winddir,pressure,avg_pressure
0,46,1992-01-01,1992,1,0.1,A,43.325,-101.0185,SD,"[6.7, 6.7, 1.7, 7.2, 8.4, 1.7, 4.4]",...,"[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]",0.0,"[-6.9, -8.7, -2.4, -5.0, -8.6, -7.5, -3.4]",-6.071429,"[13.0, 22.3, 31.7, 20.5, 11.2, 18.7, 11.2]",18.371429,"[295.9, 179.8, 186.4, 218.7, 263.2, 173.2, 247.3]",223.5,"[1026.9, 1029.2, 1017.7, 1012.2, 1019.1, 1024....",1021.671429
1,0,1992-01-01,1992,1,3.0,B,33.0634,-90.120813,MS,"[16.7, 13.4, 10.7, 11.7, 12.2, 10.7, 15.7]",...,"[nan, nan, nan, nan, nan, nan, nan]",,"[0.2, 6.4, 7.9, 2.3, 3.2, 2.2, 4.2]",3.771429,"[14.8, 18.4, 14.8, 14.8, 18.4, 22.3, 11.2]",16.385714,"[25.9, 46.9, 303.3, 10.4, 29.3, 36.9, 70.2]",74.7,"[1023.4, 1023.0, 1019.5, 1020.8, 1024.5, 1027....",1023.442857


In [6]:
# Get the unique latitude, longitude combinations
# (a single year is enough, assuming the grid is identical across years)
emissions_latlon = ch4_data[ch4_data['year'] == 1992][['lat', 'lon']].copy()

In [7]:
# Look at the data
emissions_latlon.head()

Unnamed: 0,lat,lon
0,82.849998,306.850006
1,37.049999,224.149994
2,37.049999,224.050003
3,37.049999,223.949997
4,37.049999,223.850006


Check the format of the wildfires' latitude and longitude values, as these could use different ranges from those of the emissions data. Latitude values lie between -90 and 90, whilst longitude values are conventionally given between -180 and 180.

In [8]:
wildfires[['LATITUDE', 'LONGITUDE']].describe()

Unnamed: 0,LATITUDE,LONGITUDE
count,30000.0,30000.0
mean,36.762301,-95.846038
std,6.110786,16.756508
min,17.956533,-166.1527
25%,32.816525,-110.44717
50%,35.437674,-92.350744
75%,40.738334,-82.36129
max,67.9833,-65.32


We see that the latitude and longitude values fall within the ranges mentioned above.

In [9]:
emissions_latlon.describe()

Unnamed: 0,lat,lon
count,860000.0,860000.0
mean,48.5,244.4
std,19.86084,36.084401
min,14.15,181.949997
25%,31.325,213.149994
50%,48.5,244.400002
75%,65.675001,275.649994
max,82.849998,306.850006


The latitude falls within the range; however, the longitude values range from roughly 182 to 307, meaning that we need to subtract 360 from these values to get longitudes that coincide with our wildfire data.
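Subtracting 360 is sufficient here because every longitude in this file is above 180. As a hedged aside, a more general conversion for data that mixes both conventions can be written with a modulo. The function name `wrap_longitude` and the sample values below are assumptions made for illustration.

```python
import numpy as np

def wrap_longitude(lon):
    """Map longitudes given in any range onto [-180, 180).

    Hypothetical helper for illustration only.
    """
    lon = np.asarray(lon, dtype=float)
    # Shift by 180, wrap into [0, 360), then shift back
    return (lon + 180.0) % 360.0 - 180.0

# Two values above 180 and two values that are already (or nearly) in range
sample = np.array([181.95, 306.85, 10.0, 359.9])
wrapped = wrap_longitude(sample)
print(wrapped)
```

Values already inside $[-180, 180)$ pass through unchanged, so the same function can be applied regardless of which convention a file uses.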

In [10]:
emissions_latlon['lon'] = emissions_latlon['lon'] - 360

In [11]:
emissions_latlon.describe()

Unnamed: 0,lat,lon
count,860000.0,860000.0
mean,48.5,-115.6
std,19.86084,36.084401
min,14.15,-178.050003
25%,31.325,-146.850006
50%,48.5,-115.599998
75%,65.675001,-84.350006
max,82.849998,-53.149994


In [12]:
# Create numpy array of the values
lat_lon = np.array(emissions_latlon)

In [13]:
# Convert the longitude column of the ch4_data DataFrame
ch4_data['lon'] = ch4_data['lon'] - 360

In [14]:
ch4_data[['lon']].describe()

Unnamed: 0,lon
count,20640000.0
mean,-115.6
std,36.08438
min,-178.05
25%,-146.85
50%,-115.6
75%,-84.35001
max,-53.14999


In [15]:
# Convert the ch4 data into a numpy array, skipping the first (unnamed) column
# so that the array columns are: lat, lon, emi_ch4, year
ch4_array = np.array(ch4_data.iloc[:,1:])

In [17]:
# Create list for the emission
ch4_emissions = []

# Iterate through the rows in the wildfires data frame
for _, row in wildfires.iterrows():
    # Get the coordinates of the wildfire
    coordinates = np.array([row['LATITUDE'], row['LONGITUDE']])
    # Find the emissions node closest to the wildfire
    lat, lon = emissions_utils.GetBestLatLon(coordinates, lat_lon)
    # Locate that node's row for the year of the fire (columns: lat, lon, emi_ch4, year)
    index = np.where((ch4_array[:,0] == lat) & (ch4_array[:,1] == lon) & (ch4_array[:,-1] == row['FIRE_YEAR']))
    # Keep the emission value (column 2) of the first match
    ch4_emissions.append(ch4_array[index][0][2])
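The outputs above suggest that the nodes in this file sit on a regular grid with centres at values ending in `.05` (for example 37.05 and 224.15). If that holds for the whole file, which is an assumption read off the sample output rather than taken from EDGAR documentation, the nearest node could also be computed arithmetically instead of searched for. The helper name `snap_to_grid` is hypothetical.

```python
import numpy as np

def snap_to_grid(value):
    """Return the centre of the 0.1-degree cell containing `value`.

    Hypothetical helper. Assumes cell centres lie at ..., -0.05, 0.05, 0.15, ...
    """
    value = np.asarray(value, dtype=float)
    # Lower edge of the cell, then move half a cell (0.05) to reach the centre
    return np.round(np.floor(value * 10) / 10 + 0.05, 2)

# Coordinates of the first five wildfires in the sample
fire_lats = [43.325, 33.0634, 33.058333, 40.775, 29.79]
fire_lons = [-101.0185, -90.120813, -79.979167, -74.85416, -82.37]

snapped_lats = snap_to_grid(fire_lats)
snapped_lons = snap_to_grid(fire_lons)
print(list(zip(snapped_lats, snapped_lons)))
```

The stored coordinates carry single-precision rounding (82.849998 rather than 82.85), so snapped values should be compared against the file with a tolerance, or after rounding both sides, rather than with exact equality.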

In [18]:
wildfires_with_emissions = wildfires.copy()
wildfires_with_emissions['ch4'] = ch4_emissions

In [19]:
wildfires_with_emissions.head()

Unnamed: 0,index,DATE,FIRE_YEAR,DISCOVERY_DOY,FIRE_SIZE,FIRE_SIZE_CLASS,LATITUDE,LONGITUDE,STATE,tempmax,...,avg_precip,dew,avg_dew,windspeed,avg_windspeed,winddir,avg_winddir,pressure,avg_pressure,ch4
0,46,1992-01-01,1992,1,0.1,A,43.325,-101.0185,SD,"[6.7, 6.7, 1.7, 7.2, 8.4, 1.7, 4.4]",...,0.0,"[-6.9, -8.7, -2.4, -5.0, -8.6, -7.5, -3.4]",-6.071429,"[13.0, 22.3, 31.7, 20.5, 11.2, 18.7, 11.2]",18.371429,"[295.9, 179.8, 186.4, 218.7, 263.2, 173.2, 247.3]",223.5,"[1026.9, 1029.2, 1017.7, 1012.2, 1019.1, 1024....",1021.671429,4.613353e-11
1,0,1992-01-01,1992,1,3.0,B,33.0634,-90.120813,MS,"[16.7, 13.4, 10.7, 11.7, 12.2, 10.7, 15.7]",...,,"[0.2, 6.4, 7.9, 2.3, 3.2, 2.2, 4.2]",3.771429,"[14.8, 18.4, 14.8, 14.8, 18.4, 22.3, 11.2]",16.385714,"[25.9, 46.9, 303.3, 10.4, 29.3, 36.9, 70.2]",74.7,"[1023.4, 1023.0, 1019.5, 1020.8, 1024.5, 1027....",1023.442857,1.679468e-11
2,36,1992-01-01,1992,1,1.0,B,33.058333,-79.979167,SC,"[10.8, 11.2, 16.0, 15.9, 13.9, 12.6, 15.5]",...,4.185714,"[5.1, 9.5, 10.2, 9.3, 4.3, 3.1, 5.2]",6.671429,"[22.4, 24.7, 21.9, 23.9, 19.6, 27.7, 24.4]",23.514286,"[43.5, 23.8, 35.0, 281.4, 335.4, 37.3, 28.6]",112.142857,"[1025.3, 1025.6, 1021.5, 1014.8, 1020.6, 1028....",1023.357143,1.795242e-11
3,132,1992-01-02,1992,2,0.25,A,40.775,-74.85416,NJ,"[7.8, 8.0, 7.5, 6.4, 2.3, 5.0, 7.7]",...,1.394286,"[-4.0, -4.8, 2.2, -3.0, -12.2, -7.9, 0.0]",-4.242857,"[25.0, 12.2, 15.5, 29.2, 14.7, 16.1, 14.4]",18.157143,"[316.5, 260.2, 4.8, 2.4, 28.0, 259.6, 61.9]",133.342857,"[1031.2, 1029.6, 1010.7, 1018.4, 1036.6, 1035....",1027.4,3.996876e-11
4,215,1992-01-03,1992,3,0.5,B,29.79,-82.37,FL,"[18.4, 18.9, 9.5, 15.1, 13.9, 22.2, 17.2]",...,0.0,"[15.8, 13.5, 5.7, 8.0, 10.8, 14.5, 10.7]",11.285714,"[20.1, 22.2, 18.4, 29.5, 22.3, 27.7, 27.7]",23.985714,"[71.4, 295.1, 330.1, 29.3, 39.6, 57.4, 300.6]",160.5,"[1018.6, 1016.6, 1021.1, 1023.5, 1022.3, 1016....",1018.357143,6.256688e-11


### Appending $CO_2$ and $N_2O$ Data

Now that we have seen how this is done for $CH_4$, we can automate the process for the remaining gases and append the resulting values to the wildfire data.
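The implementation of the helper in `emissions_utils` is not shown in this notebook. Purely as a sketch of what such an automated lookup could look like, the hypothetical function below combines the year filter and the nearest-node search from the $CH_4$ walkthrough. The name `get_emissions_sketch`, the `value_col` parameter, the toy data, and the assumption that longitudes are already in the -180 to 180 range are all illustrative. The returned dictionary simply mirrors the `emission` and `coordinates` columns seen in the outputs of this section.

```python
import numpy as np
import pandas as pd

def get_emissions_sketch(fires, emissions, value_col):
    """Look up the nearest-node emission for every fire (hypothetical helper).

    `emissions` needs 'lat', 'lon', 'year' and `value_col` columns;
    `fires` needs 'LATITUDE', 'LONGITUDE' and 'FIRE_YEAR' columns.
    """
    results = {'emission': [], 'coordinates': []}
    # Split the emissions by year once, so each fire only searches its own year
    by_year = {int(year): group for year, group in emissions.groupby('year')}
    for lat, lon, year in zip(fires['LATITUDE'], fires['LONGITUDE'], fires['FIRE_YEAR']):
        grid = by_year[int(year)]
        nodes = grid[['lat', 'lon']].to_numpy(dtype=float)
        # Squared distance in degrees from the fire to every node of that year
        sq_dist = ((nodes - np.array([lat, lon])) ** 2).sum(axis=1)
        nearest = int(np.argmin(sq_dist))
        results['emission'].append(float(grid[value_col].iloc[nearest]))
        results['coordinates'].append(tuple(float(v) for v in nodes[nearest]))
    return results

# Toy emissions: two nodes, two years each (values are made up)
toy_emissions = pd.DataFrame({
    'lat':  [43.35, 43.35, 33.05, 33.05],
    'lon':  [-101.05, -101.05, -90.15, -90.15],
    'emi':  [1.0, 2.0, 3.0, 4.0],
    'year': [1992, 1993, 1992, 1993],
})
# Toy fires: one per year
toy_fires = pd.DataFrame({
    'LATITUDE':  [43.325, 33.0634],
    'LONGITUDE': [-101.0185, -90.120813],
    'FIRE_YEAR': [1992, 1993],
})

toy_result = get_emissions_sketch(toy_fires, toy_emissions, 'emi')
print(pd.DataFrame(toy_result))
```

Grouping by year up front keeps each search limited to a single year's grid instead of scanning every row of the file for every wildfire.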

In [20]:
# Load in the different dataframes
co2_df = pd.read_csv('data/EDGAR Emissions/co2.csv')
n2o_df = pd.read_csv('data/EDGAR Emissions/n2o.csv')

In [21]:
# Drop the unnamed column
for df in [co2_df, n2o_df]:
    df.drop('Unnamed: 0', axis=1, inplace=True)

In [23]:
co2_data = emissions_utils.GetEmissions(wildfires, co2_df)

In [24]:
co2_data_df = pd.DataFrame(co2_data)
co2_data_df.head()

Unnamed: 0,emission,coordinates
0,3.672947e-09,"(43.34999847412109, -101.04998779296875)"
1,3.891437e-09,"(33.04999923706055, -90.14999389648438)"
2,8.400253e-07,"(33.04999923706055, -79.95001220703125)"
3,2.849724e-08,"(40.75, -74.85000610351562)"
4,3.58583e-07,"(29.75, -82.35000610351562)"


In [25]:
co2_data_df.isna().sum()

emission       0
coordinates    0
dtype: int64

In [27]:
n2o_data = emissions_utils.GetEmissions(wildfires, n2o_df)

In [28]:
n2o_data_df = pd.DataFrame(n2o_data)
n2o_data_df.head()

Unnamed: 0,emission,coordinates
0,3.926278e-12,"(43.34999847412109, -101.04998779296875)"
1,1.957145e-12,"(33.04999923706055, -90.14999389648438)"
2,1.743553e-11,"(33.04999923706055, -79.95001220703125)"
3,4.053962e-12,"(40.75, -74.85000610351562)"
4,1.031981e-11,"(29.75, -82.35000610351562)"


In [29]:
n2o_data_df.isna().sum()

emission       0
coordinates    0
dtype: int64

In [30]:
wildfires_with_emissions['co2'] = co2_data['emission']
wildfires_with_emissions['n2o'] = n2o_data['emission']

In [31]:
wildfires_with_emissions.head()

Unnamed: 0,index,DATE,FIRE_YEAR,DISCOVERY_DOY,FIRE_SIZE,FIRE_SIZE_CLASS,LATITUDE,LONGITUDE,STATE,tempmax,...,avg_dew,windspeed,avg_windspeed,winddir,avg_winddir,pressure,avg_pressure,ch4,co2,n2o
0,46,1992-01-01,1992,1,0.1,A,43.325,-101.0185,SD,"[6.7, 6.7, 1.7, 7.2, 8.4, 1.7, 4.4]",...,-6.071429,"[13.0, 22.3, 31.7, 20.5, 11.2, 18.7, 11.2]",18.371429,"[295.9, 179.8, 186.4, 218.7, 263.2, 173.2, 247.3]",223.5,"[1026.9, 1029.2, 1017.7, 1012.2, 1019.1, 1024....",1021.671429,4.613353e-11,3.672947e-09,3.926278e-12
1,0,1992-01-01,1992,1,3.0,B,33.0634,-90.120813,MS,"[16.7, 13.4, 10.7, 11.7, 12.2, 10.7, 15.7]",...,3.771429,"[14.8, 18.4, 14.8, 14.8, 18.4, 22.3, 11.2]",16.385714,"[25.9, 46.9, 303.3, 10.4, 29.3, 36.9, 70.2]",74.7,"[1023.4, 1023.0, 1019.5, 1020.8, 1024.5, 1027....",1023.442857,1.679468e-11,3.891437e-09,1.957145e-12
2,36,1992-01-01,1992,1,1.0,B,33.058333,-79.979167,SC,"[10.8, 11.2, 16.0, 15.9, 13.9, 12.6, 15.5]",...,6.671429,"[22.4, 24.7, 21.9, 23.9, 19.6, 27.7, 24.4]",23.514286,"[43.5, 23.8, 35.0, 281.4, 335.4, 37.3, 28.6]",112.142857,"[1025.3, 1025.6, 1021.5, 1014.8, 1020.6, 1028....",1023.357143,1.795242e-11,8.400253e-07,1.743553e-11
3,132,1992-01-02,1992,2,0.25,A,40.775,-74.85416,NJ,"[7.8, 8.0, 7.5, 6.4, 2.3, 5.0, 7.7]",...,-4.242857,"[25.0, 12.2, 15.5, 29.2, 14.7, 16.1, 14.4]",18.157143,"[316.5, 260.2, 4.8, 2.4, 28.0, 259.6, 61.9]",133.342857,"[1031.2, 1029.6, 1010.7, 1018.4, 1036.6, 1035....",1027.4,3.996876e-11,2.849724e-08,4.053962e-12
4,215,1992-01-03,1992,3,0.5,B,29.79,-82.37,FL,"[18.4, 18.9, 9.5, 15.1, 13.9, 22.2, 17.2]",...,11.285714,"[20.1, 22.2, 18.4, 29.5, 22.3, 27.7, 27.7]",23.985714,"[71.4, 295.1, 330.1, 29.3, 39.6, 57.4, 300.6]",160.5,"[1018.6, 1016.6, 1021.1, 1023.5, 1022.3, 1016....",1018.357143,6.256688e-11,3.58583e-07,1.031981e-11


In [32]:
wildfires_with_emissions.shape

(30000, 28)

Now that we have appended the gas emissions data to the wildfires DataFrame, we can save the data into a `.pkl` file which we will be able to analyse in later stages.

In [None]:
wildfires_with_emissions.to_pickle('data/30k_wildfires_weather_emissions.pkl')