# Historical emissions from Climatewatch

Climatewatch is a secondary source of data. We currently work from a downloaded CSV containing all their data, but they also provide an API.


In [2]:
import pandas as pd
from matplotlib import pyplot as plt
import requests
import os
import numpy as np

In [3]:
# Go to repository root directory
if "_changed_dir" not in locals():
    os.chdir("../")
    _changed_dir = True

The data has already been downloaded to:
`Covid_Hackathon_2\data\raw\historical_emissions\historical_emissions.csv`

In addition, the following file documents the data sources:
`Covid_Hackathon_2\data\raw\historical_emissions\Sources.csv`


In [6]:
file_path = 'data/raw/historical_emissions/historical_emissions.csv'

In [8]:
ghg_all = pd.read_csv(file_path)

## Dataset features

This dataset is a bit tricky because it's so comprehensive.

First let's look at the available columns:

In [11]:
ghg_all.columns[:6]

Index(['Country', 'Data source', 'Sector', 'Gas', 'Unit', '2018'], dtype='object')

Let's define `id_columns` as the columns that describe each data series; the remaining columns hold the yearly values.

In [12]:
id_columns = ghg_all.columns[:5]
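
Selecting the identifier columns by position works as long as the file layout does not change. As a hedged alternative, the sketch below separates year columns from identifier columns by name instead. The `toy` frame and its values are made up for illustration; only the column layout mirrors the real data.

```python
import pandas as pd

# Toy frame with the same layout as the real data (values are made up)
toy = pd.DataFrame({
    "Country": ["A", "B"],
    "Data source": ["X", "X"],
    "Sector": ["Energy", "Energy"],
    "Gas": ["CO2", "CO2"],
    "Unit": ["MtCO₂e", "MtCO₂e"],
    "2018": [1.0, 2.0],
    "2017": [0.5, 1.5],
})

# Year columns are the ones whose name consists of digits only
year_columns = [c for c in toy.columns if c.isdigit()]
# Everything else identifies the data series
id_columns_by_name = [c for c in toy.columns if c not in year_columns]

print(id_columns_by_name)
print(year_columns)
```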

In [30]:
def print_df_info(ghg_all, id_columns, limit_print_len = 12):
    """Print how many unique values each column in `id_columns` has, listing at most `limit_print_len` of them."""
    for col in id_columns:
        u_id = ghg_all[col].unique()
        print(f"'{col}' has {len(u_id)} unique items:")
        if len(u_id) < limit_print_len:
            print(f"\t{u_id}")
        else:
            print(f"\t{u_id[:limit_print_len]} .....")
        print("_________")

In [33]:
print_df_info(ghg_all, id_columns)

'Country' has 217 unique items:
	['Afghanistan' 'Albania' 'Algeria'
 'Alliance of Small Island States (AOSIS)' 'Andorra' 'Angola' 'Anguilla'
 'Annex-I Parties to the Convention' 'Antarctica' 'Antigua and Barbuda'
 'Argentina' 'Armenia'] .....
_________
'Data source' has 5 unique items:
	['PIK' 'UNFCCC_NAI' 'CAIT' 'GCP' 'UNFCCC_AI']
_________
'Sector' has 30 unique items:
	['Industrial Processes and Product Use'
 'Total GHG emissions including LULUCF/LUCF' 'Industrial Processes'
 'Agriculture' 'Solvent and Other Product Use' 'Other Fuel Combustion'
 'Waste' 'Total GHG emissions excluding LULUCF/LUCF'
 'Total excluding LULUCF' 'Oil' 'Energy' 'Fugitive Emissions'] .....
_________
'Gas' has 8 unique items:
	['CO2' 'Aggregate GHGs' 'KYOTOGHG' 'N2O' 'All GHG' 'CH4' 'F-Gas'
 'Aggregate F-gases']
_________
'Unit' has 1 unique items:
	['MtCO₂e']
_________


In [32]:
# Normalise subscript characters in gas names to plain ASCII (avoids chained assignment)
ghg_all["Gas"] = ghg_all["Gas"].replace({"CO₂": "CO2", "N₂O": "N2O", "CH₄": "CH4"})

## Grouping

This dataset is easier to reason about when grouped by Sector and by Gas, the two dimensions that correspond to tangible distinctions.
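
As a compact complement to printing each group, the coverage per (Sector, Gas) pair can be summarised in a single table. The sketch below uses a made-up `toy` frame; `coverage` and `n_countries` are illustrative names, not part of the dataset.

```python
import pandas as pd

# Toy data: which countries report which (Sector, Gas) combination
toy = pd.DataFrame({
    "Country": ["A", "B", "A", "B", "C"],
    "Sector": ["Energy", "Energy", "Waste", "Waste", "Waste"],
    "Gas": ["CO2", "CO2", "CH4", "CH4", "CO2"],
})

# Count distinct countries in each (Sector, Gas) group
coverage = (
    toy.groupby(["Sector", "Gas"])["Country"]
    .nunique()
    .rename("n_countries")
    .reset_index()
)
print(coverage)
```

On the real data the same expression could be applied to `ghg_all`, optionally after filtering on `"Data source"`.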

In [44]:
for i, group in ghg_all[ghg_all["Data source"] == 'UNFCCC_AI'].groupby(by=["Sector", "Gas"]):
    print(i)
    print_df_info(group, ["Country"], limit_print_len = 10)

('Agriculture', 'Aggregate F-gases')
'Country' has 43 unique items:
	['Annex-I Parties to the Convention' 'Australia' 'Austria' 'Belarus'
 'Belgium' 'Bulgaria' 'Canada' 'Croatia' 'Cyprus' 'Czech Republic'] .....
_________
('Agriculture', 'Aggregate GHGs')
'Country' has 44 unique items:
	['Annex-I Parties to the Convention' 'Australia' 'Austria' 'Belarus'
 'Belgium' 'Bulgaria' 'Canada' 'Croatia' 'Cyprus' 'Czech Republic'] .....
_________
('Agriculture', 'CH4')
'Country' has 44 unique items:
	['Annex-I Parties to the Convention' 'Australia' 'Austria' 'Belarus'
 'Belgium' 'Bulgaria' 'Canada' 'Croatia' 'Cyprus' 'Czech Republic'] .....
_________
('Agriculture', 'CO2')
'Country' has 44 unique items:
	['Annex-I Parties to the Convention' 'Australia' 'Austria' 'Belarus'
 'Belgium' 'Bulgaria' 'Canada' 'Croatia' 'Cyprus' 'Czech Republic'] .....
_________
('Agriculture', 'N2O')
'Country' has 44 unique items:
	['Annex-I Parties to the Convention' 'Australia' 'Austria' 'Belarus'
 'Belgium' 'Bulgari

### Looking at the UNFCCC data

This is official data from the UNFCCC.

In [132]:
# Keep only the rows coming from the two UNFCCC data sources
ghg_UNFCCC = ghg_all[ghg_all["Data source"].isin(['UNFCCC_NAI', 'UNFCCC_AI'])]
# Copy so that later edits do not act on a view of ghg_all
ghg_agg_UNFCCC = ghg_UNFCCC[ghg_UNFCCC.Gas == 'Aggregate GHGs'].copy()

In [61]:
for group_id, group in ghg_agg_UNFCCC.groupby("Sector"):
    print(f"group: {group_id}")
    print_df_info(group, ["Country", "Data source"], limit_print_len = 0)

group: Agriculture
'Country' has 192 unique items:
	[] .....
_________
'Data source' has 2 unique items:
	[] .....
_________
group: Energy
'Country' has 192 unique items:
	[] .....
_________
'Data source' has 2 unique items:
	[] .....
_________
group: Industrial Processes
'Country' has 148 unique items:
	[] .....
_________
'Data source' has 1 unique items:
	[] .....
_________
group: Industrial Processes and Product Use
'Country' has 44 unique items:
	[] .....
_________
'Data source' has 1 unique items:
	[] .....
_________
group: Land Use, Land-Use Change and Forestry
'Country' has 44 unique items:
	[] .....
_________
'Data source' has 1 unique items:
	[] .....
_________
group: Land-Use Change and Forestry
'Country' has 148 unique items:
	[] .....
_________
'Data source' has 1 unique items:
	[] .....
_________
group: Other
'Country' has 192 unique items:
	[] .....
_________
'Data source' has 2 unique items:
	[] .....
_________
group: Solvent and Other Product Use
'Country' has 148 uniqu

The grouping above reveals that Sector names do not exactly match between the two data sources:

In [133]:
# Harmonise sector names across the two UNFCCC sources (avoids chained assignment)
sector_renames = {
    'Industrial Processes and Product Use': 'Industrial Processes',
    'Land Use, Land-Use Change and Forestry': 'Land-Use Change and Forestry',
    'Total GHG emissions including LULUCF/LUCF': 'Total GHG emissions with LULUCF',
    'Total GHG emissions excluding LULUCF/LUCF': 'Total GHG emissions without LULUCF',
}
ghg_agg_UNFCCC = ghg_agg_UNFCCC.assign(Sector=ghg_agg_UNFCCC["Sector"].replace(sector_renames))
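
A cross-tabulation is one way to spot sector names that occur in only one data source, and to check after renaming that none remain. The sketch below runs on a made-up `toy` frame; `presence` and `only_in_one_source` are illustrative names.

```python
import pandas as pd

# Toy data: the two sources label the same sector differently
toy = pd.DataFrame({
    "Data source": ["UNFCCC_AI", "UNFCCC_AI", "UNFCCC_NAI", "UNFCCC_NAI"],
    "Sector": [
        "Energy",
        "Industrial Processes and Product Use",
        "Energy",
        "Industrial Processes",
    ],
})

# True where a sector name occurs at least once in a given source
presence = pd.crosstab(toy["Sector"], toy["Data source"]) > 0
# Sector names that are present in exactly one source
only_in_one_source = presence.index[presence.sum(axis=1) == 1].tolist()
print(only_in_one_source)
```

A name showing up here is not necessarily a labelling mismatch: a sector may genuinely be reported by one source only, so each candidate still needs a manual look.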

In [66]:
for group_id, group in ghg_agg_UNFCCC.groupby("Sector"):
    print(group_id)
    print_df_info(group, ["Country"], limit_print_len = 0)

Agriculture
'Country' has 192 unique items:
	[] .....
_________
Energy
'Country' has 192 unique items:
	[] .....
_________
Industrial Processes
'Country' has 192 unique items:
	[] .....
_________
Land-Use Change and Forestry
'Country' has 192 unique items:
	[] .....
_________
Other
'Country' has 192 unique items:
	[] .....
_________
Solvent and Other Product Use
'Country' has 148 unique items:
	[] .....
_________
Total GHG emissions with LULUCF
'Country' has 192 unique items:
	[] .....
_________
Total GHG emissions without LULUCF
'Country' has 190 unique items:
	[] .....
_________
Waste
'Country' has 192 unique items:
	[] .....
_________


## Data rearranging

The dataset is in wide format, with one column per year. Let us reshape it so that dates become rows and sectors become columns.

In [134]:
ghg_agg_UNFCCC.columns

Index(['Country', 'Data source', 'Sector', 'Gas', 'Unit', '2018', '2017',
       '2016', '2015', '2014',
       ...
       '1859', '1858', '1857', '1856', '1855', '1854', '1853', '1852', '1851',
       '1850'],
      dtype='object', length=174)

In [135]:
id_vars = ghg_agg_UNFCCC.columns[:5]
date_vars = ghg_agg_UNFCCC.columns[5:]
print(f"new columns {id_vars}")

new columns Index(['Country', 'Data source', 'Sector', 'Gas', 'Unit'], dtype='object')


In [138]:
def pivot_dates_Sector(ghg_agg_UNFCCC, id_vars, date_vars):
    # Wide to long: one row per (id columns, year)
    ghg_long = ghg_agg_UNFCCC.melt(id_vars=id_vars, value_vars=date_vars, var_name="date", value_name='emissions')

    id_columns = [c for c in ghg_long.columns if c not in ["emissions", "Sector"]]
    # Year labels are strings such as '2018'
    ghg_long["date"] = pd.to_datetime(ghg_long.date, format="%Y")
    # Spread sectors into columns; each row still holds a single sector value
    ghg_wide = pd.concat(
        [
            ghg_long[id_columns],
            ghg_long.pivot(columns="Sector", values="emissions")
        ],
        axis=1, sort=False
    )
    # Collapse to one row per id combination, keeping the first non-null value per sector
    return ghg_wide.groupby(id_columns, as_index=False).first()

In [139]:
# Keep the reshaped result; ghg_agg_UNFCCC itself is left unchanged
ghg_agg_UNFCCC_proc = pivot_dates_Sector(ghg_agg_UNFCCC, id_vars, date_vars)
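
For comparison, the same wide-to-long-to-sector-columns reshape can be sketched with `melt` followed by `pivot_table`. This is an alternative under stated assumptions, not the approach used above: the `toy` frame and its values are made up, and `pivot_table` may treat rows with no data at all differently from the function above.

```python
import pandas as pd

# Toy wide frame: one column per year, one row per (Country, Sector)
toy = pd.DataFrame({
    "Country": ["A", "A", "B", "B"],
    "Sector": ["Energy", "Waste", "Energy", "Waste"],
    "2018": [10.0, 1.0, 20.0, 2.0],
    "2017": [9.0, 0.9, 19.0, 1.9],
})

# Wide to long: one row per (Country, Sector, year)
long = toy.melt(id_vars=["Country", "Sector"], var_name="date", value_name="emissions")
long["date"] = pd.to_datetime(long["date"], format="%Y")

# Long to one row per (Country, date) with one column per sector
by_sector = long.pivot_table(
    index=["Country", "date"], columns="Sector", values="emissions", aggfunc="first"
).reset_index()
by_sector.columns.name = None  # drop the leftover "Sector" axis label
print(by_sector)
```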

In [141]:
ghg_agg_UNFCCC.sample(10)

Unnamed: 0,Country,Data source,Sector,Gas,Unit,2018,2017,2016,2015,2014,...,1859,1858,1857,1856,1855,1854,1853,1852,1851,1850
6500,Ethiopia,UNFCCC_NAI,Energy,Aggregate GHGs,MtCO₂e,,,,,,...,,,,,,,,,,
6168,Eritrea,UNFCCC_NAI,Solvent and Other Product Use,Aggregate GHGs,MtCO₂e,,,,,,...,,,,,,,,,,
5158,Democratic Republic of the Congo,UNFCCC_NAI,Land-Use Change and Forestry,Aggregate GHGs,MtCO₂e,,,,,,...,,,,,,,,,,
14292,Nigeria,UNFCCC_NAI,Agriculture,Aggregate GHGs,MtCO₂e,,,,,,...,,,,,,,,,,
4515,Costa Rica,UNFCCC_NAI,Energy,Aggregate GHGs,MtCO₂e,,,,,,...,,,,,,,,,,
21883,Zimbabwe,UNFCCC_NAI,Energy,Aggregate GHGs,MtCO₂e,,,,,,...,,,,,,,,,,
16062,Republic of Congo,UNFCCC_NAI,Total GHG emissions with LULUCF,Aggregate GHGs,MtCO₂e,,,,,,...,,,,,,,,,,
17491,Sierra Leone,UNFCCC_NAI,Other,Aggregate GHGs,MtCO₂e,,,,,,...,,,,,,,,,,
20394,Uganda,UNFCCC_NAI,Agriculture,Aggregate GHGs,MtCO₂e,,,,,,...,,,,,,,,,,
4353,Comoros,UNFCCC_NAI,Solvent and Other Product Use,Aggregate GHGs,MtCO₂e,,,,,,...,,,,,,,,,,
