# World Resources Institute - CAIT data - compare the data obtained through several retrieval methods

The WRI's ClimateWatch tool is a portal that presents several datasets (CAIT, UNFCCC, GCP, PIK, etc.). It stores the data and provides an API through which web users can retrieve the subsets they request. As a result, the ClimateWatch platform does a lot of piping when presenting the data. For GHG emissions (CAIT and UNFCCC), the data can be retrieved in three ways:
- download all data: this provides a zipped folder with a lot of content. GHG data are provided in Excel files. For CAIT, this is the CW_CAIT_GHG_Emissions_31102017.xlsx file.
- download GHG data only: this provides a zipped folder with the GHG datasets only. For CAIT, this is the CW_CAIT_GHG_Emissions.xlsx file.
- use the WRI API to get the GHG data: this is the typical way to get the data, as intended by the WRI community. The web user selects the data they want through the API (typically GHG, CAIT, and the period). For CAIT, this yields a historical_emissions.csv file, renamed here historical_emissions_cait.csv.

Since WRI already preprocesses the data, the datasets differ slightly depending on how they were retrieved from the ClimateWatch tool. The objective of this notebook is to compare them for the CAIT data, in order to identify the differences and select the best one to be mapped.

The ISO alpha-3 codes are listed in countries_codes_and_coordinates.csv, taken from https://gist.github.com/tadast/8827699#file-countries_codes_and_coordinates-csv



## Import libraries and load datasets

In [1]:
import pandas as pd
import numpy as np
import cufflinks as cf
import plotly
import plotly.offline as py
import plotly.graph_objs as go

# Full option names: the short forms are ambiguous in recent pandas versions
pd.set_option("display.max_columns", 30)
pd.set_option("display.max_rows", 200)

In [2]:
# CAIT emissions from raw data download
all_CAIT = pd.read_excel("../../../data/ghg-emissions/wri/CW_CAIT_GHG_Emissions_31102017.xlsx", header=1, 
                               engine="openpyxl", sheet_name="GHG Emissions")
allCO2_CAIT = pd.read_excel("../../../data/ghg-emissions/wri/CW_CAIT_GHG_Emissions_31102017.xlsx", header=0, 
                               engine="openpyxl", sheet_name="CO2 Total Emissions")

# CAIT emissions from ghg data download
ghg_CAIT = pd.read_excel("../../../data/ghg-emissions/wri/CW_CAIT_GHG_Emissions.xlsx", header=0)

# CAIT emissions from api data download
api_CAIT = pd.read_csv("../../../data/ghg-emissions/wri/historical_emissions_cait.csv", header=0)

In [3]:
all_CAIT.head()

Unnamed: 0,Country,Year,Total GHG Emissions Excluding Land-Use Change and Forestry (MtCO2e),Total GHG Emissions Including Land-Use Change and Forestry (MtCO₂e‍),Total CO2 (excluding Land-Use Change and Forestry) (MtCO2),Total CH4 (MtCO2e),Total N2O (MtCO2e),Total F-Gas (MtCO2e),Total CO2 (including Land-Use Change and Forestry) (MtCO2),Total CH4 (including Land-Use Change and Forestry) (MtCO2e),Total N2O (including Land-Use Change and Forestry) (MtCO2e),Energy (MtCO2e),Industrial Processes (MtCO2e),Agriculture (MtCO2e),Waste (MtCO2e),Land-Use Change and Forestry (MtCO2),Bunker Fuels (MtCO2),Electricity/Heat (MtCO2),Manufacturing/Construction (MtCO2),Transportation (MtCO2),Other Fuel Combustion (MtCO2e),Fugitive Emissions (MtCO2e)
0,Afghanistan,1990,15.212848,15.212848,2.915024,9.311589,2.984055,0.00218,2.915024,9.311589,2.984055,3.774044,0.05714,7.34271,4.038954,0.0,,,,,,1.24222
1,Afghanistan,1991,15.286439,15.286439,2.684445,9.516187,3.082194,0.003613,2.684445,9.516187,3.082194,3.376803,0.058573,7.631027,4.220036,0.0,,,,,,1.020851
2,Afghanistan,1992,14.010531,14.010531,1.392269,9.571483,3.041733,0.005046,1.392269,9.571483,3.041733,1.9143,0.06367,7.631443,4.401118,0.0,,,,,,0.602588
3,Afghanistan,1993,14.028118,14.028118,1.322704,9.609869,3.089066,0.006479,1.322704,9.609869,3.089066,1.678073,0.065103,7.702742,4.5822,0.0,,,,,,0.413993
4,Afghanistan,1994,13.985408,13.985408,1.267744,9.789039,2.920713,0.007912,1.267744,9.789039,2.920713,1.456451,0.066536,7.699139,4.763283,0.0,,,,,,0.247331


In [4]:
ghg_CAIT.head()

Unnamed: 0,Country,Source,Sector,Gas,1990,1991,1992,1993,1994,1995,1996,1997,1998,1999,2000,...,2004,2005,2006,2007,2008,2009,2010,2011,2012,2013,2014,2015,2016,2017,2018
0,AFG,CAIT,Total excluding LUCF,All GHG,15.18285,15.10201,13.63469,13.46399,13.27173,13.47605,14.43757,15.34291,16.09523,16.91458,15.07575,...,16.51581,17.39828,21.03822,24.87871,31.53765,36.95546,44.90616,58.65186,66.74928,74.79611,84.61923,93.72862,95.37284,97.30011,98.92076
1,AFG,CAIT,Total including LUCF,All GHG,12.79404,12.71321,11.24588,11.07519,10.88293,11.08725,12.04877,12.95411,13.70642,14.52578,12.68694,...,16.63772,17.52018,21.16012,25.00061,31.65955,37.07736,45.02807,58.40564,66.50306,74.54989,84.37301,93.4824,95.5275,97.45477,99.07541
2,AFG,CAIT,Energy,All GHG,5.829497,5.334624,3.760858,3.42276,3.102594,2.783429,2.651769,2.509109,2.389449,2.096789,2.01713,...,2.271723,2.759622,6.319421,10.00122,15.27802,20.14082,26.05662,39.52942,47.59223,55.52703,64.67584,74.74164,75.93291,77.71818,79.58044
3,AFG,CAIT,Industrial Processes,All GHG,0.051879,0.0545,0.060111,0.062722,0.065343,0.067964,0.081694,0.095434,0.109174,0.122915,0.109985,...,0.142366,0.142436,0.16246,0.176814,0.202657,0.222971,0.248895,0.313896,0.378967,0.449909,0.53463,0.592081,0.758807,0.911544,1.06428
4,AFG,CAIT,Agriculture,All GHG,8.072853,8.396465,8.409491,8.48648,8.523959,8.957016,9.977472,10.95273,11.75197,12.79124,10.986,...,11.76986,12.07205,12.05455,12.12127,13.39995,13.85702,15.78838,15.90226,15.77779,15.72486,16.22045,15.11257,15.31574,15.22195,14.74454


In [5]:
api_CAIT.head()

Unnamed: 0,Country,Data source,Sector,Gas,Unit,2018,2017,2016,2015,2014,2013,2012,2011,2010,2009,...,2004,2003,2002,2001,2000,1999,1998,1997,1996,1995,1994,1993,1992,1991,1990
0,World,CAIT,Total including LUCF,All GHG,MtCO₂e,48939.71,47990.47,47413.95,46760.47,46647.29,46047.13,45427.61,44891.4,44758.58,43029.01,...,39437.14,37618.01,36727.84,35701.77,35607.73,34948.94,34929.19,35387.89,34068.91,33703.46,32977.47,32766.49,32670.32,32813.46,32645.91
1,China,CAIT,Total including LUCF,All GHG,MtCO₂e,11705.81,11408.26,11207.66,11149.68,11155.76,11144.76,10690.04,10364.83,9872.37,9046.19,...,6135.95,5386.89,4769.03,4459.91,4249.7,4053.37,4103.23,3963.61,3954.75,3918.3,3528.81,3377.79,3154.54,3023.38,2873.71
2,United States,CAIT,Total including LUCF,All GHG,MtCO₂e,5794.35,5613.64,5676.92,5586.69,5711.15,5690.79,5550.87,5796.78,6041.59,5793.66,...,6387.22,6306.12,6246.95,6406.04,6446.2,6291.25,6293.8,6258.47,6010.62,5843.9,5782.2,5691.26,5583.8,5501.92,5543.47
3,India,CAIT,Total including LUCF,All GHG,MtCO₂e,3346.63,3202.82,3073.24,3002.17,2988.34,2816.49,2758.53,2610.32,2576.93,2467.78,...,1905.13,1817.42,1769.79,1747.49,1498.2,1460.24,1382.26,1348.09,1289.63,1240.44,1174.78,1128.44,1096.71,1064.49,1009.44
4,European Union (27),CAIT,Total including LUCF,All GHG,MtCO₂e,3333.16,3401.95,3387.97,3046.38,2990.49,3136.34,3213.21,3274.4,3646.5,3566.88,...,4005.19,4008.81,3921.02,3940.21,3933.88,3934.85,4008.29,4048.58,4126.4,4018.79,3968.32,3984.95,4058.11,4202.13,4279.18


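The three datasets differ in several ways:
- the data are not laid out with the same structure (the raw download has one row per country and year, while the GHG and API downloads have one column per year);
- countries are not referenced the same way (names with different spellings in the raw and API downloads, ISO alpha-3 codes in the GHG download);
- the years extend to 2018 only in the API and GHG downloads;
- the API values are rounded to two decimals, so their precision is lower;
- only the "Total including LUCF" sector has been obtained from the API.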
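Because the raw download is in long format (one row per country and year) while the GHG and API downloads are wide (one column per year), one way to put them on the same footing is to reshape the wide tables with `pandas.melt`. A minimal sketch on a toy frame: only the column layout mirrors `ghg_CAIT`, and the values are copied from the AFG rows printed below; the name `MtCO2e` for the value column is an illustrative choice.

```python
import pandas as pd

# Toy wide table with the ghg_CAIT layout (values copied from the AFG rows)
wide = pd.DataFrame({
    "Country": ["AFG", "AFG"],
    "Sector": ["Total excluding LUCF", "Total including LUCF"],
    "Gas": ["All GHG", "All GHG"],
    1990: [15.18285, 12.79404],
    1991: [15.10201, 12.71321],
})

# One row per (country, sector, gas, year), like the Year column of all_CAIT
long = wide.melt(id_vars=["Country", "Sector", "Gas"], var_name="Year", value_name="MtCO2e")
long = long.sort_values(["Country", "Sector", "Year"]).reset_index(drop=True)
print(long)
```

The same call applied to the full `ghg_CAIT` or `api_CAIT` frames would give a table that can be joined with `all_CAIT` on country and year.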

In [6]:
print(sorted(set(api_CAIT["Country"].unique()) - set(all_CAIT["Country"].unique()) ))
print()
print(set(all_CAIT["Country"].unique()) - set(api_CAIT["Country"].unique()) )

['Antigua and Barbuda', 'Bosnia and Herzegovina', "Côte d'Ivoire", 'Democratic Republic of the Congo', 'Eswatini', 'European Union (27)', 'Macedonia', 'North Korea', 'Republic of Congo', 'Russia', 'Saint Kitts and Nevis', 'Saint Vincent and the Grenadines', 'Sao Tome and Principe', 'South Korea', 'South Sudan', 'Timor-Leste', 'Trinidad and Tobago', 'United States']

{'European Union (28)', 'Trinidad & Tobago', 'Korea (North)', 'Saint Kitts & Nevis', 'Bosnia & Herzegovina', 'Macedonia, FYR', 'Congo, Dem. Republic', 'Congo', "Cote d'Ivoire", 'Sao Tome & Principe', 'Swaziland', 'United States of America', 'Saint Vincent & Grenadines', 'Russian Federation', 'Korea (South)', 'Antigua & Barbuda'}
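The same two-way set difference is computed several times in this notebook (API against raw download, then each dataset against the ISO table), and the second print above comes out unsorted. A small helper, hypothetical and not part of the original notebook, returns both directions as sorted lists; it accepts any iterable, including a pandas Series.

```python
def label_differences(left, right):
    """Return (labels only in left, labels only in right), each as a sorted list."""
    left_set, right_set = set(left), set(right)
    return sorted(left_set - right_set), sorted(right_set - left_set)


# Usage on a few names taken from the output above
only_api, only_all = label_differences(
    ["United States", "Russia", "France"],
    ["United States of America", "Russian Federation", "France"],
)
print(only_api)  # ['Russia', 'United States']
print(only_all)  # ['Russian Federation', 'United States of America']
```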


In [7]:
# Dictionary mapping country names used in api_CAIT to their spelling in all_CAIT
dict_corr = {
    'Antigua and Barbuda': 'Antigua & Barbuda', 
    'Bosnia and Herzegovina': 'Bosnia & Herzegovina',
    "Côte d'Ivoire": "Cote d'Ivoire", 
    'Democratic Republic of the Congo': 'Congo, Dem. Republic', 
    'Eswatini': 'Swaziland', 
    'Macedonia': 'Macedonia, FYR', 
    'North Korea': 'Korea (North)', 
    'Republic of Congo': 'Congo',
    'Russia': 'Russian Federation',
    'Saint Kitts and Nevis': 'Saint Kitts & Nevis', 
    'Saint Vincent and the Grenadines': 'Saint Vincent & Grenadines', 
    'Sao Tome and Principe':  'Sao Tome & Principe', 
    'South Korea': 'Korea (South)', 
    'Trinidad and Tobago': 'Trinidad & Tobago', 
    'United States': 'United States of America' 
}

# Present in api, missing in all: South Sudan, Timor-Leste, European Union (27)
# Present in all, missing in api: European Union (28)
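The notebook goes on to map names to ISO codes instead, but `dict_corr` could also be applied directly to bring the API names in line with the raw download. A minimal sketch with `Series.replace`: the three pairs come from `dict_corr`, while `dict_corr_subset`, `api_names` and `raw_names` are small illustrative stand-ins for the real columns.

```python
import pandas as pd

# Subset of the dict_corr mapping above (API name -> raw-download name)
dict_corr_subset = {
    "Russia": "Russian Federation",
    "United States": "United States of America",
    "South Korea": "Korea (South)",
}

api_names = pd.Series(["World", "China", "United States", "Russia", "South Sudan"])
raw_names = {"World", "China", "United States of America", "Russian Federation", "Korea (South)"}

# Series.replace leaves names that are not keys of the mapping unchanged
aligned = api_names.replace(dict_corr_subset)
still_unmatched = sorted(set(aligned) - raw_names)
print(still_unmatched)  # ['South Sudan']
```

Whatever remains in `still_unmatched` after the replacement is a genuine coverage difference (such as South Sudan) rather than a spelling difference.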

In [8]:
ghg_CAIT["Country"].unique()

array(['AFG', 'AGO', 'ALB', 'AND', 'ARE', 'ARG', 'ARM', 'ATG', 'AUS',
       'AUT', 'AZE', 'BDI', 'BEL', 'BEN', 'BFA', 'BGD', 'BGR', 'BHR',
       'BHS', 'BIH', 'BLR', 'BLZ', 'BOL', 'BRA', 'BRB', 'BRN', 'BTN',
       'BWA', 'CAF', 'CAN', 'CHE', 'CHL', 'CHN', 'CIV', 'CMR', 'COD',
       'COG', 'COK', 'COL', 'COM', 'CPV', 'CRI', 'CUB', 'CYP', 'CZE',
       'DEU', 'DJI', 'DMA', 'DNK', 'DOM', 'DZA', 'ECU', 'EGY', 'ERI',
       'ESP', 'EST', 'ETH', 'EUU', 'FIN', 'FJI', 'FRA', 'FSM', 'GAB',
       'GBR', 'GEO', 'GHA', 'GIN', 'GMB', 'GNB', 'GNQ', 'GRC', 'GRD',
       'GTM', 'GUY', 'HND', 'HRV', 'HTI', 'HUN', 'IDN', 'IND', 'IRL',
       'IRN', 'IRQ', 'ISL', 'ISR', 'ITA', 'JAM', 'JOR', 'JPN', 'KAZ',
       'KEN', 'KGZ', 'KHM', 'KIR', 'KNA', 'KOR', 'KWT', 'LAO', 'LBN',
       'LBR', 'LBY', 'LCA', 'LIE', 'LKA', 'LSO', 'LTU', 'LUX', 'LVA',
       'MAR', 'MDA', 'MDG', 'MDV', 'MEX', 'MHL', 'MKD', 'MLI', 'MLT',
       'MMR', 'MNE', 'MNG', 'MOZ', 'MRT', 'MUS', 'MWI', 'MYS', 'NAM',
       'NER', 'NGA',

In [9]:
# Convert every country reference into its ISO alpha-3 code

# Import the correspondence table between country names and ISO codes
corr_iso_country = pd.read_csv("countries_codes_and_coordinates.csv", sep=';', skipinitialspace=True)
corr_iso_country

Unnamed: 0,Country,Alpha-2 code,Alpha-3 code,Numeric code,Latitude (average),Longitude (average)
0,Afghanistan,AF,AFG,4,33.0000,65.0
1,Albania,AL,ALB,8,41.0000,20.0
2,Algeria,DZ,DZA,12,28.0000,3.0
3,American Samoa,AS,ASM,16,-14.3333,-170.0
4,Andorra,AD,AND,20,42.5000,1.6
...,...,...,...,...,...,...
251,Wallis and Futuna,WF,WLF,876,-13.3000,-176.2
252,Western Sahara,EH,ESH,732,24.5000,-13.0
253,Yemen,YE,YEM,887,15.0000,48.0
254,Zambia,ZM,ZMB,894,-15.0000,30.0


In [10]:
#difference between ghg_CAIT and this table 
print(set(ghg_CAIT["Country"].unique()) - set(corr_iso_country["Alpha-3 code"].unique()))
print(set(corr_iso_country["Alpha-3 code"].unique()) - set(ghg_CAIT["Country"].unique()) )

# Almost enough: only EUU (European Union) and WORLD are missing from the ISO table


{'EUU', 'WORLD'}
{'REU', 'WLF', 'TKL', 'AIA', 'CYM', 'SGS', 'SMR', 'JEY', 'SHN', 'VIR', 'MNP', 'BVT', 'MCO', 'PYF', 'MTQ', 'ESH', 'MYT', 'CXR', 'GLP', 'GIB', 'MSR', 'ATA', 'CCK', 'PRI', 'ASM', 'NFK', 'TCA', 'ATF', 'ANT', 'PCN', 'BMU', 'HMD', 'IMN', 'PSE', 'MAC', 'HKG', 'GRL', 'TWN', 'SPM', 'UMI', 'VAT', 'FLK', 'IOT', 'GGY', 'SJM', 'GUF', 'GUM', 'ABW', 'FRO', 'VGB', 'NCL'}


In [11]:
#difference between api_CAIT and this table 
print(sorted(set(api_CAIT["Country"].unique()) - set(corr_iso_country["Country"].unique())))
print()
print(sorted(set(corr_iso_country["Country"].unique()) - set(api_CAIT["Country"].unique()) ))

# Almost OK: the remaining names are spelling variants or aggregates (European Union, World)

['Democratic Republic of the Congo', 'Eswatini', 'European Union (27)', 'Iran', 'Laos', 'Macedonia', 'Micronesia', 'Moldova', 'North Korea', 'Republic of Congo', 'Syria', 'Tanzania', 'World']

['American Samoa', 'Anguilla', 'Antarctica', 'Aruba', 'Bermuda', 'Bolivia, Plurinational State of', 'Bouvet Island', 'British Indian Ocean Territory', 'Brunei Darussalam', 'Burma', 'Cayman Islands', 'Christmas Island', 'Cocos (Keeling) Islands', 'Congo', 'Congo, the Democratic Republic of the', 'Falkland Islands (Malvinas)', 'Faroe Islands', 'French Guiana', 'French Polynesia', 'French Southern Territories', 'Gibraltar', 'Greenland', 'Guadeloupe', 'Guam', 'Guernsey', 'Heard Island and McDonald Islands', 'Holy See (Vatican City State)', 'Hong Kong', 'Iran, Islamic Republic of', 'Isle of Man', 'Ivory Coast', 'Jersey', "Korea, Democratic People's Republic of", 'Korea, Republic of', "Lao People's Democratic Republic", 'Libyan Arab Jamahiriya', 'Macao', 'Macedonia, the former Yugoslav Republic of', 'M

In [12]:
#difference between all_CAIT and this table 
print(sorted(set(all_CAIT["Country"].unique()) - set(corr_iso_country["Country"].unique())))
print()
print(sorted(set(corr_iso_country["Country"].unique()) - set(all_CAIT["Country"].unique()) ))

# Almost OK: the remaining names are spelling variants or aggregates (European Union, World)

['Antigua & Barbuda', 'Bosnia & Herzegovina', 'Congo, Dem. Republic', "Cote d'Ivoire", 'European Union (28)', 'Iran', 'Korea (North)', 'Korea (South)', 'Laos', 'Macedonia, FYR', 'Micronesia', 'Moldova', 'Saint Kitts & Nevis', 'Saint Vincent & Grenadines', 'Sao Tome & Principe', 'Syria', 'Tanzania', 'Trinidad & Tobago', 'United States of America', 'World']

['American Samoa', 'Anguilla', 'Antarctica', 'Antigua and Barbuda', 'Aruba', 'Bermuda', 'Bolivia, Plurinational State of', 'Bosnia and Herzegovina', 'Bouvet Island', 'British Indian Ocean Territory', 'Brunei Darussalam', 'Burma', 'Cayman Islands', 'Christmas Island', 'Cocos (Keeling) Islands', 'Congo, the Democratic Republic of the', "Côte d'Ivoire", 'Falkland Islands (Malvinas)', 'Faroe Islands', 'French Guiana', 'French Polynesia', 'French Southern Territories', 'Gibraltar', 'Greenland', 'Guadeloupe', 'Guam', 'Guernsey', 'Heard Island and McDonald Islands', 'Holy See (Vatican City State)', 'Hong Kong', 'Iran, Islamic Republic of', 

## Conclusion

All the countries can be converted into ISO alpha-3 codes. For each name missing from the reference table, we add a row with its corresponding ISO code.

In [13]:
# Names used in all_CAIT but missing from the ISO table
to_add = {'Antigua & Barbuda' : 'ATG',
'Bosnia & Herzegovina': 'BIH',
'Congo, Dem. Republic': 'COD',
"Cote d'Ivoire": 'CIV',
'European Union (28)': 'EUU',
'Iran': 'IRN',
'Korea (North)': 'PRK',
'Korea (South)': 'KOR',
'Laos': 'LAO',
'Macedonia, FYR': 'MKD',
'Micronesia': 'FSM',
'Moldova': 'MDA',
'Saint Kitts & Nevis': 'KNA',
'Saint Vincent & Grenadines': 'VCT',
'Sao Tome & Principe': 'STP',
'Syria' : 'SYR',
'Tanzania': 'TZA',
'Trinidad & Tobago': 'TTO',
'United States of America': 'USA',
'World' : 'WORLD',
# Names used in api_CAIT but missing from the ISO table (commented-out names are already mapped above)
'Democratic Republic of the Congo': 'COD',
'Eswatini': 'SWZ',
'European Union (27)': 'EUU',
#'Iran',
#'Laos', 
'Macedonia': 'MKD',
#'Micronesia', 
#'Moldova', 
'North Korea': 'PRK',
'Republic of Congo': 'COG'
#'Syria',
#'Tanzania',
#'World'
}

# Append the missing names to a copy of the ISO table; only the name and the
# alpha-3 code are known, so the other columns are left as NaN
completed_iso = corr_iso_country.copy()
extra_rows = pd.DataFrame({"Country": list(to_add.keys()), "Alpha-3 code": list(to_add.values())})
# DataFrame.append was removed in pandas 2.0; pd.concat replaces it
completed_iso = pd.concat([completed_iso, extra_rows], ignore_index=True)

In [14]:
completed_iso

Unnamed: 0,Country,Alpha-2 code,Alpha-3 code,Numeric code,Latitude (average),Longitude (average)
0,Afghanistan,AF,AFG,4.0,33.0000,65.0
1,Albania,AL,ALB,8.0,41.0000,20.0
2,Algeria,DZ,DZA,12.0,28.0000,3.0
3,American Samoa,AS,ASM,16.0,-14.3333,-170.0
4,Andorra,AD,AND,20.0,42.5000,1.6
...,...,...,...,...,...,...
277,Eswatini,,SWZ,,,
278,European Union (27),,EUU,,,
279,Macedonia,,MKD,,,
280,North Korea,,PRK,,,


Every country in all_CAIT and api_CAIT can now be mapped to an ISO identifier. We add an alpha_3 column to both datasets.
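The list comprehension below filters `completed_iso` once per row. An alternative sketch (not the notebook's code) builds a name-to-code Series once and uses `Series.map`, which also makes unmatched names easy to list. The toy reference table is illustrative and the name `Atlantis` is hypothetical.

```python
import pandas as pd

# Toy reference table with the same two columns as completed_iso
reference = pd.DataFrame({
    "Country": ["Afghanistan", "Russian Federation", "Russia", "World"],
    "Alpha-3 code": ["AFG", "RUS", "RUS", "WORLD"],
})

# Name -> code lookup; keep the first occurrence, as .iloc[0, 2] does
name_to_iso = reference.drop_duplicates("Country").set_index("Country")["Alpha-3 code"]

countries = pd.Series(["Afghanistan", "Russia", "World", "Atlantis"])
codes = countries.map(name_to_iso)

# Unlike the .iloc lookup (which raises IndexError), map() yields NaN for unknown names
unmatched = countries[codes.isna()].tolist()
print(codes.tolist())  # ['AFG', 'RUS', 'WORLD', nan]
print(unmatched)       # ['Atlantis']
```

Checking that `unmatched` is empty gives the same guarantee as the loop failing loudly, with a more readable message.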

In [15]:
all_CAIT["alpha_3"] = [completed_iso[completed_iso["Country"] == country].iloc[0,2] for country in all_CAIT["Country"]]
api_CAIT["alpha_3"] = [completed_iso[completed_iso["Country"] == country].iloc[0,2] for country in api_CAIT["Country"]]

Unnamed: 0,Country,Source,Sector,Gas,1990,1991,1992,1993,1994,1995,1996,1997,1998,1999,2000,...,2004,2005,2006,2007,2008,2009,2010,2011,2012,2013,2014,2015,2016,2017,2018
0,AFG,CAIT,Total excluding LUCF,All GHG,15.182850,15.102010,13.634690,13.463990,13.271730,13.476050,14.437570,15.342910,16.095230,16.914580,15.075750,...,16.515810,17.398280,21.038220,24.878710,31.537650,36.955460,44.906160,58.651860,66.749280,74.796110,84.61923,93.728620,95.372840,97.300110,98.920760
1,AFG,CAIT,Total including LUCF,All GHG,12.794040,12.713210,11.245880,11.075190,10.882930,11.087250,12.048770,12.954110,13.706420,14.525780,12.686940,...,16.637720,17.520180,21.160120,25.000610,31.659550,37.077360,45.028070,58.405640,66.503060,74.549890,84.37301,93.482400,95.527500,97.454770,99.075410
2,AFG,CAIT,Energy,All GHG,5.829497,5.334624,3.760858,3.422760,3.102594,2.783429,2.651769,2.509109,2.389449,2.096789,2.017130,...,2.271723,2.759622,6.319421,10.001220,15.278020,20.140820,26.056620,39.529420,47.592230,55.527030,64.67584,74.741640,75.932910,77.718180,79.580440
3,AFG,CAIT,Industrial Processes,All GHG,0.051879,0.054500,0.060111,0.062722,0.065343,0.067964,0.081694,0.095434,0.109174,0.122915,0.109985,...,0.142366,0.142436,0.162460,0.176814,0.202657,0.222971,0.248895,0.313896,0.378967,0.449909,0.53463,0.592081,0.758807,0.911544,1.064280
4,AFG,CAIT,Agriculture,All GHG,8.072853,8.396465,8.409491,8.486480,8.523959,8.957016,9.977472,10.952730,11.751970,12.791240,10.986000,...,11.769860,12.072050,12.054550,12.121270,13.399950,13.857020,15.788380,15.902260,15.777790,15.724860,16.22045,15.112570,15.315740,15.221950,14.744540
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9077,ZWE,CAIT,Other Fuel Combustion,N2O,0.407812,0.417100,0.426387,0.435675,0.444962,0.587562,0.730026,0.872489,1.014953,1.157417,1.299881,...,0.883322,0.779182,0.803597,0.828012,0.852426,0.876841,0.901256,1.023247,1.145238,1.267229,1.38922,1.511210,1.536870,1.562529,1.588188
9078,ZWE,CAIT,Fugitive Emissions,N2O,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.00000,0.000000,0.000000,0.000000,0.000000
9079,ZWE,CAIT,Total excluding LUCF,F-Gas,0.067604,0.064739,0.061875,0.059011,0.056146,0.053282,0.072341,0.091400,0.110460,0.129519,0.148578,...,0.195809,0.207617,0.230892,0.254168,0.277443,0.300719,0.323994,0.379388,0.434782,0.490176,0.54557,0.600965,0.665562,0.730160,0.794758
9080,ZWE,CAIT,Total including LUCF,F-Gas,0.067604,0.064739,0.061875,0.059011,0.056146,0.053282,0.072341,0.091400,0.110460,0.129519,0.148578,...,0.195809,0.207617,0.230892,0.254168,0.277443,0.300719,0.323994,0.379388,0.434782,0.490176,0.54557,0.600965,0.665562,0.730160,0.794758


In [17]:
# Compare GHG CAIT and API CAIT: keep only the series present in the API download
ghg_CAIT_filtered = ghg_CAIT[(ghg_CAIT["Sector"] == "Total including LUCF") & (ghg_CAIT["Gas"] == "All GHG")]
# Align the column names with the API dataset (rename on the slice returns a new frame,
# which avoids pandas' SettingWithCopyWarning)
ghg_CAIT_filtered = ghg_CAIT_filtered.rename(columns={'Country': 'alpha_3', 'Source': 'Data Source'})
ghg_CAIT_filtered = ghg_CAIT_filtered.sort_values(by="alpha_3").reset_index(drop=True)
ghg_CAIT_filtered

Unnamed: 0,alpha_3,Data Source,Sector,Gas,1990,1991,1992,1993,1994,1995,1996,1997,1998,1999,2000,...,2004,2005,2006,2007,2008,2009,2010,2011,2012,2013,2014,2015,2016,2017,2018
0,AFG,CAIT,Total including LUCF,All GHG,12.79404,12.71321,11.24588,11.07519,10.88293,11.08725,12.04877,12.95411,13.70642,14.52578,12.68694,...,16.63772,17.52018,21.16012,25.00061,31.65955,37.07736,45.02807,58.40564,66.50306,74.54989,84.37301,93.4824,95.5275,97.45477,99.07541
1,AGO,CAIT,Total including LUCF,All GHG,65.17916,65.84554,66.17063,66.958,67.10296,71.46614,73.32947,71.74041,79.62493,77.64527,73.28794,...,114.9187,115.1372,114.9928,121.8567,123.385,128.3418,133.3547,133.5429,133.2065,135.4103,136.4979,139.0857,138.8545,137.3172,124.5919
2,ALB,CAIT,Total including LUCF,All GHG,11.50905,9.147693,7.091174,7.032104,7.871299,7.630235,7.355607,6.645458,6.878919,8.092082,8.484637,...,8.192747,7.978511,7.878905,7.944576,7.66892,7.732635,8.012203,8.792089,8.399308,8.596466,9.082266,9.008917,9.485248,10.05902,9.841821
3,AND,CAIT,Total including LUCF,All GHG,0.430703,0.432506,0.434309,0.439776,0.437915,0.458038,0.493621,0.510884,0.542803,0.571058,0.588321,...,0.643489,0.662777,0.638078,0.635363,0.639976,0.622605,0.627218,0.602272,0.59931,0.58902,0.575066,0.579432,0.586052,0.585343,0.587181
4,ARE,CAIT,Total including LUCF,All GHG,69.71052,76.25309,74.82384,77.75755,84.36647,90.78096,96.08856,103.8601,108.3913,112.6903,113.3438,...,148.0248,156.7044,162.9525,174.2744,199.8905,202.0764,208.5793,214.6342,226.5671,237.6311,240.0206,252.8525,259.6457,269.822,263.2394
5,ARG,CAIT,Total including LUCF,All GHG,288.3737,293.3617,297.7556,300.2689,305.3495,305.9336,319.8092,322.3359,326.1481,332.245,331.1449,...,406.4439,410.9204,425.4164,438.8241,442.9756,424.071,429.8086,410.6397,414.8384,424.7025,421.7583,428.3749,396.2695,397.6978,395.4986
6,ARM,CAIT,Total including LUCF,All GHG,24.44254,25.06766,14.95227,8.765833,6.238958,6.84631,5.778699,6.388588,6.312973,5.80162,6.153152,...,6.748061,7.468687,7.899751,8.564243,9.114554,8.020746,7.875077,8.300317,9.286578,9.222494,9.184014,9.177997,9.541392,9.31884,9.39727
7,ATG,CAIT,Total including LUCF,All GHG,0.35941,0.372226,0.450396,0.427165,0.433083,0.459654,0.485055,0.520347,0.56834,0.588586,0.609162,...,0.757573,0.788464,0.859508,0.917346,0.970969,1.926584,1.039144,1.06638,1.286581,1.056556,1.070591,1.095618,1.134099,1.160076,1.212433
8,AUS,CAIT,Total including LUCF,All GHG,558.2095,558.1337,560.2965,559.9836,563.0564,571.1905,571.0068,582.2795,608.0825,643.6008,661.9706,...,635.5026,598.4392,646.9057,646.4625,617.089,633.8043,601.1399,644.7351,643.1166,555.2612,566.2115,566.4693,577.3178,623.0799,619.2639
9,AUT,CAIT,Total including LUCF,All GHG,62.78817,66.99382,61.20293,61.12233,61.21579,63.69363,67.41246,66.39826,66.38068,64.5572,64.76489,...,83.58893,83.62986,81.30703,78.22495,77.78849,70.8345,76.5547,73.72466,70.50441,71.0535,67.28015,68.4078,68.53422,70.70918,67.85463


In [18]:
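api_CAIT_filtered = api_CAIT.copy()
# Country names are replaced by alpha_3, and Unit is not needed for the comparison
api_CAIT_filtered.drop(["Country", "Unit"], axis=1, inplace=True)
# Use the same labels as ghg_CAIT_filtered (the API lists the years from 2018 down to 1990)
api_CAIT_filtered.columns = ["Data Source", "Sector", "Gas"] + [2018 - i for i in range(29)] + ["alpha_3"]
# Reorder the columns like ghg_CAIT_filtered, then sort the rows by country code
api_CAIT_filtered = api_CAIT_filtered[ghg_CAIT_filtered.columns]
api_CAIT_filtered = api_CAIT_filtered.sort_values(by="alpha_3").reset_index(drop=True)
api_CAIT_filtered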

Unnamed: 0,alpha_3,Data Source,Sector,Gas,1990,1991,1992,1993,1994,1995,1996,1997,1998,1999,2000,...,2004,2005,2006,2007,2008,2009,2010,2011,2012,2013,2014,2015,2016,2017,2018
0,AFG,CAIT,Total including LUCF,All GHG,12.79,12.71,11.25,11.08,10.88,11.09,12.05,12.95,13.71,14.53,12.69,...,16.64,17.52,21.16,25.0,31.66,37.08,45.03,58.41,66.5,74.55,84.37,93.48,95.53,97.45,99.08
1,AGO,CAIT,Total including LUCF,All GHG,65.18,65.85,66.17,66.96,67.1,71.47,73.33,71.74,79.62,77.65,73.29,...,114.92,115.14,114.99,121.86,123.39,128.34,133.35,133.54,133.21,135.41,136.5,139.09,138.85,137.32,124.59
2,ALB,CAIT,Total including LUCF,All GHG,11.51,9.15,7.09,7.03,7.87,7.63,7.36,6.65,6.88,8.09,8.48,...,8.19,7.98,7.88,7.94,7.67,7.73,8.01,8.79,8.4,8.6,9.08,9.01,9.49,10.06,9.84
3,AND,CAIT,Total including LUCF,All GHG,0.43,0.43,0.43,0.44,0.44,0.46,0.49,0.51,0.54,0.57,0.59,...,0.64,0.66,0.64,0.64,0.64,0.62,0.63,0.6,0.6,0.59,0.58,0.58,0.59,0.59,0.59
4,ARE,CAIT,Total including LUCF,All GHG,69.71,76.25,74.82,77.76,84.37,90.78,96.09,103.86,108.39,112.69,113.34,...,148.02,156.7,162.95,174.27,199.89,202.08,208.58,214.63,226.57,237.63,240.02,252.85,259.65,269.82,263.24
5,ARG,CAIT,Total including LUCF,All GHG,288.37,293.36,297.76,300.27,305.35,305.93,319.81,322.34,326.15,332.24,331.14,...,406.44,410.92,425.42,438.82,442.98,424.07,429.81,410.64,414.84,424.7,421.76,428.37,396.27,397.7,395.5
6,ARM,CAIT,Total including LUCF,All GHG,24.44,25.07,14.95,8.77,6.24,6.85,5.78,6.39,6.31,5.8,6.15,...,6.75,7.47,7.9,8.56,9.11,8.02,7.88,8.3,9.29,9.22,9.18,9.18,9.54,9.32,9.4
7,ATG,CAIT,Total including LUCF,All GHG,0.36,0.37,0.45,0.43,0.43,0.46,0.49,0.52,0.57,0.59,0.61,...,0.76,0.79,0.86,0.92,0.97,1.93,1.04,1.07,1.29,1.06,1.07,1.1,1.13,1.16,1.21
8,AUS,CAIT,Total including LUCF,All GHG,558.21,558.13,560.3,559.98,563.06,571.19,571.01,582.28,608.08,643.6,661.97,...,635.5,598.44,646.91,646.46,617.09,633.8,601.14,644.74,643.12,555.26,566.21,566.47,577.32,623.08,619.26
9,AUT,CAIT,Total including LUCF,All GHG,62.79,66.99,61.2,61.12,61.22,63.69,67.41,66.4,66.38,64.56,64.76,...,83.59,83.63,81.31,78.22,77.79,70.83,76.55,73.72,70.5,71.05,67.28,68.41,68.53,70.71,67.85
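`DataFrame.compare` only accepts identically labelled frames: same index and same columns, in the same order. That is why the API columns are reordered and both frames are sorted by `alpha_3` before comparing. A toy sketch of that alignment step, with values copied from the AFG and AUT rows above and deliberately shuffled row and column orders.

```python
import pandas as pd

# Toy frames mimicking the two layouts: API years descending, GHG years ascending
api = pd.DataFrame({"alpha_3": ["AUT", "AFG"], 2018: [67.85, 99.08], 1990: [62.79, 12.79]})
ghg = pd.DataFrame({"alpha_3": ["AFG", "AUT"], 1990: [12.79404, 62.78817], 2018: [99.07541, 67.85463]})

# Same column order and same row order, as DataFrame.compare requires
api = api[ghg.columns].sort_values("alpha_3").reset_index(drop=True)
ghg = ghg.sort_values("alpha_3").reset_index(drop=True)

assert api.columns.equals(ghg.columns)
assert api["alpha_3"].equals(ghg["alpha_3"])

# Only the year columns differ; alpha_3 is identical and is dropped from the result
print(api.compare(ghg))
```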


In [19]:
# self = API values, other = GHG-download values; only the differing cells are shown
api_CAIT_filtered.compare(ghg_CAIT_filtered)

Unnamed: 0_level_0,1990,1990,1991,1991,1992,1992,1993,1993,1994,1994,1995,1995,1996,1996,1997,...,2011,2012,2012,2013,2013,2014,2014,2015,2015,2016,2016,2017,2017,2018,2018
Unnamed: 0_level_1,self,other,self,other,self,other,self,other,self,other,self,other,self,other,self,...,other,self,other,self,other,self,other,self,other,self,other,self,other,self,other
0,12.79,12.79404,12.71,12.71321,11.25,11.24588,11.08,11.07519,10.88,10.88293,11.09,11.08725,12.05,12.04877,12.95,...,58.40564,66.5,66.50306,74.55,74.54989,84.37,84.37301,93.48,93.4824,95.53,95.5275,97.45,97.45477,99.08,99.07541
1,65.18,65.17916,65.85,65.84554,66.17,66.17063,66.96,66.958,67.1,67.10296,71.47,71.46614,73.33,73.32947,71.74,...,133.5429,133.21,133.2065,135.41,135.4103,136.5,136.4979,139.09,139.0857,138.85,138.8545,137.32,137.3172,124.59,124.5919
2,11.51,11.50905,9.15,9.147693,7.09,7.091174,7.03,7.032104,7.87,7.871299,7.63,7.630235,7.36,7.355607,6.65,...,8.792089,8.4,8.399308,8.6,8.596466,9.08,9.082266,9.01,9.008917,9.49,9.485248,10.06,10.05902,9.84,9.841821
3,0.43,0.430703,0.43,0.432506,0.43,0.434309,0.44,0.439776,0.44,0.437915,0.46,0.458038,0.49,0.493621,0.51,...,0.602272,0.6,0.59931,0.59,0.58902,0.58,0.575066,0.58,0.579432,0.59,0.586052,0.59,0.585343,0.59,0.587181
4,69.71,69.71052,76.25,76.25309,74.82,74.82384,77.76,77.75755,84.37,84.36647,90.78,90.78096,96.09,96.08856,103.86,...,214.6342,226.57,226.5671,237.63,237.6311,240.02,240.0206,252.85,252.8525,259.65,259.6457,269.82,269.822,263.24,263.2394
5,288.37,288.3737,293.36,293.3617,297.76,297.7556,300.27,300.2689,305.35,305.3495,305.93,305.9336,319.81,319.8092,322.34,...,410.6397,414.84,414.8384,424.7,424.7025,421.76,421.7583,428.37,428.3749,396.27,396.2695,397.7,397.6978,395.5,395.4986
6,24.44,24.44254,25.07,25.06766,14.95,14.95227,8.77,8.765833,6.24,6.238958,6.85,6.84631,5.78,5.778699,6.39,...,8.300317,9.29,9.286578,9.22,9.222494,9.18,9.184014,9.18,9.177997,9.54,9.541392,9.32,9.31884,9.4,9.39727
7,0.36,0.35941,0.37,0.372226,0.45,0.450396,0.43,0.427165,0.43,0.433083,0.46,0.459654,0.49,0.485055,0.52,...,1.06638,1.29,1.286581,1.06,1.056556,1.07,1.070591,1.1,1.095618,1.13,1.134099,1.16,1.160076,1.21,1.212433
8,558.21,558.2095,558.13,558.1337,560.3,560.2965,559.98,559.9836,563.06,563.0564,571.19,571.1905,571.01,571.0068,582.28,...,644.7351,643.12,643.1166,555.26,555.2612,566.21,566.2115,566.47,566.4693,577.32,577.3178,623.08,623.0799,619.26,619.2639
9,62.79,62.78817,66.99,66.99382,61.2,61.20293,61.12,61.12233,61.22,61.21579,63.69,63.69363,67.41,67.41246,66.4,...,73.72466,70.5,70.50441,71.05,71.0535,67.28,67.28015,68.41,68.4078,68.53,68.53422,70.71,70.70918,67.85,67.85463


In [20]:
# The values agree up to rounding: the API delivers values rounded to two decimals,
# so precision is lost compared with the GHG-only download.
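To check that the differences listed by `compare` are only due to rounding, one can bound the gap between the two sources. A minimal sketch on values copied from the AFG and ALB rows above (1990 and 2018 only). It assumes the API simply rounds the same underlying values to two decimals, in which case every gap is at most 0.005.

```python
import numpy as np
import pandas as pd

# Values copied from the AFG and ALB rows printed above (1990 and 2018 only)
api = pd.DataFrame({1990: [12.79, 11.51], 2018: [99.08, 9.84]}, index=["AFG", "ALB"])
ghg = pd.DataFrame({1990: [12.79404, 11.50905], 2018: [99.07541, 9.841821]}, index=["AFG", "ALB"])

# Largest absolute gap between the two sources
max_gap = (api - ghg).abs().to_numpy().max()
print(round(max_gap, 5))  # 0.00459

# Element-wise check; the tiny slack absorbs floating-point error
assert np.allclose(api, ghg, rtol=0, atol=0.005 + 1e-9)
# Stronger check: rounding the GHG values reproduces the API values exactly
assert ghg.round(2).equals(api)
```

On the full tables the same two checks could be run on the year columns of `api_CAIT_filtered` and `ghg_CAIT_filtered`; if the stronger one fails for a few cells, the API may be rounding from a more precise source than the Excel file.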