# Joins | CO2 Emissions Meets Democracy 💥💥

#### This file joins the CO2 Emissions to the Democracy Index Data.

## Import Libraries

In [50]:
import pandas as pd
import numpy as np

## Import Data

In [51]:
democracy_data = pd.read_csv('democracy_data_clean.csv', index_col=0)
CO2_Emissions = pd.read_csv('CO2_Emissions_clean.csv', index_col=0)

## View Data

In [52]:
democracy_data

Unnamed: 0,geo,name,time,Democracy index (EIU)
0,AFG,Afghanistan,2006-01-01,30.6
1,AFG,Afghanistan,2007-01-01,30.4
2,AFG,Afghanistan,2008-01-01,30.2
3,AFG,Afghanistan,2009-01-01,27.5
4,AFG,Afghanistan,2010-01-01,24.8
...,...,...,...,...
2496,ZWE,Zimbabwe,2016-01-01,30.5
2497,ZWE,Zimbabwe,2017-01-01,31.6
2498,ZWE,Zimbabwe,2018-01-01,31.6
2499,ZWE,Zimbabwe,2019-01-01,31.6


In [53]:
CO2_Emissions

Unnamed: 0,Country Name,Country Code,Year,CO2 emissions (metric tons per capita)
2,Afghanistan,AFG,1960-01-01,0.046057
267,Afghanistan,AFG,1961-01-01,0.053589
532,Afghanistan,AFG,1962-01-01,0.073721
797,Afghanistan,AFG,1963-01-01,0.074161
1062,Afghanistan,AFG,1964-01-01,0.086174
...,...,...,...,...
14574,Zimbabwe,ZWE,2014-01-01,0.894256
14839,Zimbabwe,ZWE,2015-01-01,0.897598
15104,Zimbabwe,ZWE,2016-01-01,0.783303
15369,Zimbabwe,ZWE,2017-01-01,0.718570


Check that the columns we will join on have compatible data types in both frames.

In [54]:
CO2_Emissions.dtypes

Country Name                               object
Country Code                               object
Year                                       object
CO2 emissions (metric tons per capita)    float64
dtype: object

In [55]:
democracy_data.dtypes

geo                       object
name                      object
time                      object
Democracy index (EIU)    float64
dtype: object
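The comparison above is done by eye. A minimal sketch of the same check in code follows, using small toy frames whose values are copied from the previews (the names `co2`, `dem`, `key_pairs` and `dtypes_match` are hypothetical, not part of the notebook):

```python
import pandas as pd

# Toy stand-ins for the two frames; only the join-key columns are included.
co2 = pd.DataFrame({
    "Country Code": ["AFG", "AFG"],
    "Year": ["1960-01-01", "1961-01-01"],
})
dem = pd.DataFrame({
    "geo": ["AFG", "AFG"],
    "time": ["2006-01-01", "2007-01-01"],
})

# Pair each left-hand key with its right-hand counterpart and compare dtypes.
key_pairs = [("Country Code", "geo"), ("Year", "time")]
dtypes_match = {
    (left, right): co2[left].dtype == dem[right].dtype
    for left, right in key_pairs
}
print(dtypes_match)
```

If any pair reported `False`, the keys would need converting to a common type before merging.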

CO2_Emissions covers many more years than democracy_data (in the previews above its records start in 1960, while democracy_data starts in 2006), but democracy_data extends to more recent years. For now we keep every year from both, joining on country and year.
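To see how the year coverage differs, one option is to compare the sets of dates in each frame. This is a rough sketch on toy values (the variable names are hypothetical, and the lists are a tiny sample rather than the real columns):

```python
import pandas as pd

# Toy samples standing in for CO2_Emissions['Year'] and democracy_data['time'].
co2_years = pd.Series(["1960-01-01", "1961-01-01", "2017-01-01"])
dem_years = pd.Series(["2006-01-01", "2017-01-01", "2020-01-01"])

co2_set, dem_set = set(co2_years), set(dem_years)

# ISO-formatted date strings sort chronologically, so sorted() is enough here.
only_co2 = sorted(co2_set - dem_set)   # years only the emissions data has
only_dem = sorted(dem_set - co2_set)   # years only the democracy data has
shared = sorted(co2_set & dem_set)     # years present in both

print(only_co2, only_dem, shared)
```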

### Join
A left join from either side would drop the years that appear only in the other table. Since we haven't yet decided which data to keep, we'll keep it all, which is what an outer join does.

In [56]:
democracy_emissions = pd.merge(CO2_Emissions, democracy_data, how='outer', left_on=['Country Code', 'Year'], right_on=['geo', 'time'])
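A quick way to see what the outer join kept from each side is the `indicator=True` option of `pd.merge`, which adds a `_merge` column. The sketch below uses a three-year toy slice for Zimbabwe (values taken from the outputs in this notebook; the frame names `co2` and `dem` are hypothetical):

```python
import pandas as pd

co2 = pd.DataFrame({
    "Country Code": ["ZWE", "ZWE"],
    "Year": ["2017-01-01", "2018-01-01"],
    "CO2 emissions (metric tons per capita)": [0.718570, 0.849793],
})
dem = pd.DataFrame({
    "geo": ["ZWE", "ZWE"],
    "time": ["2018-01-01", "2019-01-01"],
    "Democracy index (EIU)": [31.6, 31.6],
})

# Same join as in the notebook, plus a '_merge' column that says whether each
# row came from the left frame only, the right frame only, or both.
merged = pd.merge(
    co2, dem, how="outer",
    left_on=["Country Code", "Year"],
    right_on=["geo", "time"],
    indicator=True,
)
counts = merged["_merge"].value_counts()
print(counts)
```

Rows marked `left_only` or `right_only` are the ones where one of the two measures will be missing.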

In [57]:
democracy_emissions
# Looks good.

Unnamed: 0,Country Name,Country Code,Year,CO2 emissions (metric tons per capita),geo,name,time,Democracy index (EIU)
0,Afghanistan,AFG,1960-01-01,0.046057,,,,
1,Afghanistan,AFG,1961-01-01,0.053589,,,,
2,Afghanistan,AFG,1962-01-01,0.073721,,,,
3,Afghanistan,AFG,1963-01-01,0.074161,,,,
4,Afghanistan,AFG,1964-01-01,0.086174,,,,
...,...,...,...,...,...,...,...,...
15973,,,,,YEM,Yemen,2020-01-01,19.5
15974,,,,,ZMB,Zambia,2019-01-01,50.9
15975,,,,,ZMB,Zambia,2020-01-01,48.6
15976,,,,,ZWE,Zimbabwe,2019-01-01,31.6


## More cleaning. 🧹

We don't need the duplicated key columns, but we can't drop them until their values have been copied into the columns we're keeping.

In [58]:
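# Where a kept column is null, take the value from its counterpart in the same row.
# Assign the result back: fillna(inplace=True) on a selected column may not update the frame in newer pandas.
democracy_emissions['Country Name'] = democracy_emissions['Country Name'].fillna(democracy_emissions['name'])
democracy_emissions['Country Code'] = democracy_emissions['Country Code'].fillna(democracy_emissions['geo'])
democracy_emissions['Year'] = democracy_emissions['Year'].fillna(democracy_emissions['time'])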
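The fill step relies on `fillna` accepting another Series and matching values row by row on the index. Here is a minimal sketch of that behaviour on a two-row toy frame (the frame `df` is hypothetical; it assigns the result back to the column, which is one common way to write it):

```python
import numpy as np
import pandas as pd

# Row 0 came from the emissions side, row 1 from the democracy side.
df = pd.DataFrame({
    "Country Code": ["AFG", np.nan],
    "geo": [np.nan, "ZMB"],
})

# Nulls in 'Country Code' are filled from 'geo' in the same row;
# existing values are left untouched.
df["Country Code"] = df["Country Code"].fillna(df["geo"])
print(df)
```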

In [59]:
democracy_emissions  # Checking it worked.

Unnamed: 0,Country Name,Country Code,Year,CO2 emissions (metric tons per capita),geo,name,time,Democracy index (EIU)
0,Afghanistan,AFG,1960-01-01,0.046057,,,,
1,Afghanistan,AFG,1961-01-01,0.053589,,,,
2,Afghanistan,AFG,1962-01-01,0.073721,,,,
3,Afghanistan,AFG,1963-01-01,0.074161,,,,
4,Afghanistan,AFG,1964-01-01,0.086174,,,,
...,...,...,...,...,...,...,...,...
15973,Yemen,YEM,2020-01-01,,YEM,Yemen,2020-01-01,19.5
15974,Zambia,ZMB,2019-01-01,,ZMB,Zambia,2019-01-01,50.9
15975,Zambia,ZMB,2020-01-01,,ZMB,Zambia,2020-01-01,48.6
15976,Zimbabwe,ZWE,2019-01-01,,ZWE,Zimbabwe,2019-01-01,31.6


In [61]:
# Drop the now-redundant key columns from the democracy side.
democracy_emissions = democracy_emissions.drop(columns=['geo', 'name', 'time'])
democracy_emissions  # Checking it worked.

Unnamed: 0,Country Name,Country Code,Year,CO2 emissions (metric tons per capita),Democracy index (EIU)
0,Afghanistan,AFG,1960-01-01,0.046057,
1,Afghanistan,AFG,1961-01-01,0.053589,
2,Afghanistan,AFG,1962-01-01,0.073721,
3,Afghanistan,AFG,1963-01-01,0.074161,
4,Afghanistan,AFG,1964-01-01,0.086174,
...,...,...,...,...,...
15973,Yemen,YEM,2020-01-01,,19.5
15974,Zambia,ZMB,2019-01-01,,50.9
15975,Zambia,ZMB,2020-01-01,,48.6
15976,Zimbabwe,ZWE,2019-01-01,,31.6
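Beyond eyeballing the preview, two cheap sanity checks are that no key is still missing and that each country and year pair appears only once. A sketch on a toy frame with values from the output above (the names `df`, `missing_keys` and `duplicate_keys` are hypothetical):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Country Code": ["AFG", "ZMB", "ZMB"],
    "Year": ["1960-01-01", "2019-01-01", "2020-01-01"],
    "CO2 emissions (metric tons per capita)": [0.046057, np.nan, np.nan],
    "Democracy index (EIU)": [np.nan, 50.9, 48.6],
})

keys = ["Country Code", "Year"]

# Total number of nulls left in the key columns after the fill step.
missing_keys = int(df[keys].isna().sum().sum())

# Number of rows whose (country, year) pair repeats an earlier row.
duplicate_keys = int(df.duplicated(subset=keys).sum())

print(missing_keys, duplicate_keys)
```

Nulls in the two measure columns are expected here, since the outer join keeps years that only one source covers.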


In [62]:
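# Re-sort, as the 2019 and 2020 rows are out of place (they sit at the end of the frame).
# sort_values returns a new frame, so assign it back to keep the new order.
democracy_emissions = democracy_emissions.sort_values(by=['Country Name', 'Year'])
democracy_emissions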

Unnamed: 0,Country Name,Country Code,Year,CO2 emissions (metric tons per capita),Democracy index (EIU)
0,Afghanistan,AFG,1960-01-01,0.046057,
1,Afghanistan,AFG,1961-01-01,0.053589,
2,Afghanistan,AFG,1962-01-01,0.073721,
3,Afghanistan,AFG,1963-01-01,0.074161,
4,Afghanistan,AFG,1964-01-01,0.086174,
...,...,...,...,...,...
15632,Zimbabwe,ZWE,2016-01-01,0.783303,30.5
15633,Zimbabwe,ZWE,2017-01-01,0.718570,31.6
15634,Zimbabwe,ZWE,2018-01-01,0.849793,31.6
15976,Zimbabwe,ZWE,2019-01-01,,31.6
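Note that `sort_values` returns a new DataFrame and leaves the original untouched, and that the old index labels travel with the rows (hence 15976 at the end of the output above). A small sketch on toy rows, where resetting the index is an optional extra rather than something the notebook does:

```python
import pandas as pd

df = pd.DataFrame({
    "Country Name": ["Zimbabwe", "Afghanistan", "Zimbabwe"],
    "Year": ["2019-01-01", "1960-01-01", "2018-01-01"],
})

# Sort by country then year, and renumber the rows 0..n-1.
sorted_df = df.sort_values(by=["Country Name", "Year"]).reset_index(drop=True)

print(sorted_df)
```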
