In [1]:
import pandas as pd

In [2]:
#import clean emission data created in ML_model.ipynb
emissions = pd.read_csv('./Emissions_data.csv')
emissions

Unnamed: 0,ISO_Code,Country,Year,CO2,Temperature,Population,GDP,GDP_per_Capita,CO2_per_Capita,CO2_per_Unit_GDP,Emission_Ratio
0,AFG,Afghanistan,1950,0.084,11.619864,7752000.0,1.949480e+10,2514.808999,0.108359,3.340214,Low
1,AFG,Afghanistan,1951,0.092,12.647118,7840000.0,2.006385e+10,2559.164343,0.117347,3.594923,Low
2,AFG,Afghanistan,1952,0.092,12.805239,7936000.0,2.074235e+10,2613.703484,0.115927,3.519910,Low
3,AFG,Afghanistan,1953,0.106,13.272074,8040000.0,2.201546e+10,2738.241719,0.131841,3.871097,Low
4,AFG,Afghanistan,1954,0.106,12.419927,8151000.0,2.248333e+10,2758.352230,0.130045,3.842874,Low
...,...,...,...,...,...,...,...,...,...,...,...
11213,ZWE,Zimbabwe,2012,7.695,21.910075,13115000.0,2.048226e+10,1561.743118,5.867327,492.718675,High
11214,ZWE,Zimbabwe,2013,11.632,21.624350,13350000.0,2.374258e+10,1778.470621,8.713109,654.045103,High
11215,ZWE,Zimbabwe,2014,11.962,21.710483,13587000.0,2.474828e+10,1821.467867,8.804004,656.723087,High
11216,ZWE,Zimbabwe,2015,12.163,22.327625,13815000.0,2.503057e+10,1811.840028,8.804198,671.306507,High


## Analyze High and Low Emission Ratio Countries
The countries in this dataset have been split into two groups, High and Low, based on the ratio between each country's GDP per Capita and its CO2 emissions.  A High ranking means the ratio is over 300, indicating that the country's emissions outstrip its economic output relative to its population.  In other words, the economies of these countries produce more emissions than economic benefit for their citizens.  An analysis of these two groups can help determine whether a country can both grow its economy and keep emissions below this threshold.  The changes over time in key metrics are examined for each country to show trends.
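
As a rough illustration of the labelling rule, the sketch below applies a cut-off of 300 to a ratio column. It assumes the ratio in question is the `CO2_per_Unit_GDP` column and reuses three values from the table above; the actual labels were created in ML_model.ipynb and may have been derived differently.

```python
import pandas as pd

# Three rows with CO2_per_Unit_GDP values copied from the table above
sample = pd.DataFrame({
    'Country': ['Afghanistan', 'Zimbabwe', 'Zimbabwe'],
    'Year': [1950, 2012, 2015],
    'CO2_per_Unit_GDP': [3.340214, 492.718675, 671.306507],
})

# cut-off quoted in the text; ASSUMED to apply to CO2_per_Unit_GDP
THRESHOLD = 300

# label a row 'High' when its ratio exceeds the threshold, otherwise 'Low'
sample['Emission_Ratio'] = sample['CO2_per_Unit_GDP'].apply(
    lambda ratio: 'High' if ratio > THRESHOLD else 'Low'
)
print(sample)
```

Under this assumption the three rows come out as Low, High, High, which matches the labels shown for them in the dataset.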

In [3]:
# separate into high and low emission groups
high_emission = emissions.loc[emissions['Emission_Ratio'] == 'High']
low_emission = emissions.loc[emissions['Emission_Ratio'] == 'Low']

In [4]:
high_emission['Country'].unique().tolist()

['Afghanistan',
 'Algeria',
 'Angola',
 'Argentina',
 'Armenia',
 'Australia',
 'Austria',
 'Azerbaijan',
 'Bangladesh',
 'Belarus',
 'Belgium',
 'Bolivia',
 'Bosnia and Herzegovina',
 'Brazil',
 'Bulgaria',
 'Cameroon',
 'Canada',
 'Chile',
 'China',
 'Colombia',
 "Cote d'Ivoire",
 'Cuba',
 'Denmark',
 'Ecuador',
 'Egypt',
 'Estonia',
 'Ethiopia',
 'Finland',
 'France',
 'Georgia',
 'Germany',
 'Ghana',
 'Greece',
 'Hungary',
 'India',
 'Indonesia',
 'Iran',
 'Iraq',
 'Italy',
 'Japan',
 'Jordan',
 'Kazakhstan',
 'Kenya',
 'Kyrgyzstan',
 'Liberia',
 'Libya',
 'Lithuania',
 'Madagascar',
 'Malaysia',
 'Mexico',
 'Moldova',
 'Mongolia',
 'Morocco',
 'Mozambique',
 'Myanmar',
 'Nepal',
 'Netherlands',
 'Nigeria',
 'North Korea',
 'Pakistan',
 'Peru',
 'Philippines',
 'Poland',
 'Portugal',
 'Romania',
 'Russia',
 'Saudi Arabia',
 'Serbia',
 'Senegal',
 'Singapore',
 'Slovakia',
 'South Africa',
 'South Korea',
 'Spain',
 'Sudan',
 'Sweden',
 'Syria',
 'Tajikistan',
 'Tanzania',
 'Thailan

In [5]:
low_emission['Country'].unique().tolist()

['Afghanistan',
 'Albania',
 'Algeria',
 'Angola',
 'Argentina',
 'Armenia',
 'Australia',
 'Austria',
 'Azerbaijan',
 'Bahrain',
 'Bangladesh',
 'Barbados',
 'Belgium',
 'Benin',
 'Bolivia',
 'Bosnia and Herzegovina',
 'Botswana',
 'Burundi',
 'Bulgaria',
 'Burkina Faso',
 'Cambodia',
 'Cameroon',
 'Cape Verde',
 'Central African Republic',
 'Chad',
 'Chile',
 'Colombia',
 'Comoros',
 'Costa Rica',
 'Croatia',
 "Cote d'Ivoire",
 'Cuba',
 'Cyprus',
 'Denmark',
 'Dominica',
 'Djibouti',
 'Dominican Republic',
 'Ecuador',
 'Equatorial Guinea',
 'Egypt',
 'El Salvador',
 'Estonia',
 'Ethiopia',
 'Finland',
 'Gabon',
 'Gambia',
 'Georgia',
 'Ghana',
 'Greece',
 'Guatemala',
 'Guinea',
 'Guinea-Bissau',
 'Haiti',
 'Honduras',
 'Hungary',
 'Iceland',
 'Indonesia',
 'Iran',
 'Iraq',
 'Ireland',
 'Israel',
 'Italy',
 'Jamaica',
 'Jordan',
 'Kenya',
 'Kuwait',
 'Kyrgyzstan',
 'Laos',
 'Liberia',
 'Latvia',
 'Lebanon',
 'Lesotho',
 'Libya',
 'Lithuania',
 'Luxembourg',
 'Madagascar',
 'Malawi',


By separating the dataframe into high and low emission groups and listing the countries in each, it is clear that many countries appear in both groups, depending on the year.  Now we can begin to determine how a country's Emission Ratio has changed over time.  First, the range of years in the dataset must be established for each country, as it varies with the availability of data.
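
The overlap between the two groups can also be checked directly instead of by reading the lists. A minimal sketch, using a made-up toy frame (the country names `A`, `B`, `C` and their labels are illustrative, not from the dataset):

```python
import pandas as pd

# Toy data: made-up labels, not taken from the dataset
toy = pd.DataFrame({
    'Country': ['A', 'A', 'B', 'B', 'C'],
    'Year': [2000, 2001, 2000, 2001, 2000],
    'Emission_Ratio': ['High', 'Low', 'Low', 'Low', 'High'],
})

# set of country names carrying each label in at least one year
high_names = set(toy.loc[toy['Emission_Ratio'] == 'High', 'Country'])
low_names = set(toy.loc[toy['Emission_Ratio'] == 'Low', 'Country'])

# countries that carry both labels in different years
in_both = sorted(high_names & low_names)
print(in_both)
```

Here only country `A` is labelled both High and Low, so it is the only name in the intersection.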

In [6]:
# find the first and last year in the dataset for each country
date_range = emissions.groupby('Country').agg({'Year':['min','max']})['Year']
date_range

Country,min,max
Afghanistan,1950,2016
Albania,1950,2016
Algeria,1970,2016
Angola,1975,2016
Argentina,1901,2016
...,...,...
Venezuela,1904,2016
Vietnam,1913,2016
Yemen,1950,2016
Zambia,1959,2016


A for loop then goes through each country in the dataset and looks up the GDP per Capita, Emission Ratio, CO2 per Capita, and Temperature for the first and last year available.  The change over time is calculated for each of these indicators.  Countries are separated into four groups according to how their Emission Ratio changed: Low to Low, High to Low, Low to High, and High to High.  The results also show how much GDP per Capita has increased or decreased, the change in CO2 per Capita, and how the temperature has changed over the time span available for each country.  This allows us to find the countries that have done the best and the worst over time.

In [7]:
# create a list of countries in the dataset
countries = emissions['Country'].unique().tolist()

# create empty dictionaries to hold the new values
stay_low = {}
improved = {}
got_worse = {}
stay_high = {}

#loop through the list of countries
for country in countries:
    
    #create a variable to hold the first and last year in the dataset for each country
    first = date_range.loc[country,'min']
    last = date_range.loc[country,'max']
    
    #create a variable to hold the value for gdp per capita for the first and last year for each country
    gdp_start = emissions[(emissions['Country'] == country) & (emissions['Year'] == first)]['GDP_per_Capita'].values[0]
    gdp_end = emissions[(emissions['Country'] == country) & (emissions['Year'] == last)]['GDP_per_Capita'].values[0]
    
    #create a variable to hold the 'high' or 'low' ranking for each country's emissions in the first and last year
    start_ratio = emissions[(emissions['Country'] == country) & (emissions['Year'] == first)]['Emission_Ratio'].values[0]
    end_ratio = emissions[(emissions['Country'] == country) & (emissions['Year'] == last)]['Emission_Ratio'].values[0]
    
    # create variable to hold co2_per_capita for the first and last year for each country
    start_co2 = emissions[(emissions['Country'] == country) & (emissions['Year'] == first)]['CO2_per_Capita'].values[0]
    end_co2 = emissions[(emissions['Country'] == country) & (emissions['Year'] == last)]['CO2_per_Capita'].values[0]
    
    # create variable to hold temperature for the first and last year for each country
    start_temp = emissions[(emissions['Country'] == country) & (emissions['Year'] == first)]['Temperature'].values[0]
    end_temp = emissions[(emissions['Country'] == country) & (emissions['Year'] == last)]['Temperature'].values[0]
    
    
    # calculate the change in gdp, co2, and temp for countries that went from LOW TO LOW 
    if start_ratio == 'Low' and end_ratio == 'Low':
        gdp_change = gdp_end - gdp_start
        co2_change = end_co2 - start_co2
        temp_change = end_temp - start_temp
        stay_low[country] = {'Country' : country,
                            'GDP_Change': gdp_change, 
                            'CO2_Change' : co2_change,
                            'Temperature_Change' : temp_change,
                            'Year_Start' : first,
                            'Year_End' : last}
       
    # calculate the change in gdp, co2, and temp for countries that changed from HIGH TO LOW 
    elif start_ratio == 'High' and end_ratio == 'Low':
        gdp_change = gdp_end - gdp_start
        co2_change = end_co2 - start_co2
        temp_change = end_temp - start_temp
        improved[country] = {'Country' : country,
                            'GDP_Change': gdp_change, 
                            'CO2_Change' : co2_change,
                            'Temperature_Change' : temp_change,
                            'Year_Start' : first,
                            'Year_End' : last}
        
    # calculate the change in gdp, co2, and temp for countries that changed from LOW TO HIGH 
    elif start_ratio == 'Low' and end_ratio == 'High':
        gdp_change = gdp_end - gdp_start
        co2_change = end_co2 - start_co2
        temp_change = end_temp - start_temp
        got_worse[country] = {'Country' : country,
                            'GDP_Change': gdp_change, 
                            'CO2_Change' : co2_change,
                            'Temperature_Change' : temp_change,
                            'Year_Start' : first,
                            'Year_End' : last}
    
    # calculate the change in gdp, co2, and temp for countries that changed from HIGH TO HIGH 
    elif start_ratio == 'High' and end_ratio == 'High':
        gdp_change = gdp_end - gdp_start
        co2_change = end_co2 - start_co2
        temp_change = end_temp - start_temp
        stay_high[country] = {'Country' : country,
                            'GDP_Change': gdp_change, 
                            'CO2_Change' : co2_change,
                            'Temperature_Change' : temp_change,
                            'Year_Start' : first,
                            'Year_End' : last}
        
        
# convert calculations to dataframes
stay_low_df = pd.DataFrame.from_dict(stay_low, orient = 'index')
improved_df = pd.DataFrame.from_dict(improved, orient = 'index')
got_worse_df = pd.DataFrame.from_dict(got_worse, orient = 'index')
stay_high_df = pd.DataFrame.from_dict(stay_high, orient = 'index')

The result is four dataframes which show the countries that have Emissions Ratios that have stayed low, improved, gotten worse, and stayed high over time. Each dataframe shows the changes to GDP, CO2 and Temperature.  The range of years is also included for context.
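
The same first-year versus last-year comparison can also be sketched without an explicit loop. The version below is an alternative formulation, not the code used in this notebook: it sorts by year, takes each country's first and last row, and subtracts. The toy frame and the `Transition` column name are made up for illustration.

```python
import pandas as pd

# Toy data with made-up values (hypothetical countries 'A' and 'B')
toy = pd.DataFrame({
    'Country': ['A', 'A', 'A', 'B', 'B'],
    'Year': [1950, 1980, 2016, 1970, 2016],
    'GDP_per_Capita': [1000.0, 2000.0, 5000.0, 3000.0, 2500.0],
    'CO2_per_Capita': [1.0, 2.0, 0.5, 4.0, 6.0],
    'Temperature': [10.0, 10.5, 11.0, 20.0, 21.5],
    'Emission_Ratio': ['High', 'High', 'Low', 'Low', 'High'],
})

# order rows so the earliest year comes first within each country
ordered = toy.sort_values(['Country', 'Year'])

# first and last available row per country, indexed by country name
first = ordered.drop_duplicates('Country', keep='first').set_index('Country')
last = ordered.drop_duplicates('Country', keep='last').set_index('Country')

# subtraction aligns on the shared Country index
summary = pd.DataFrame({
    'GDP_Change': last['GDP_per_Capita'] - first['GDP_per_Capita'],
    'CO2_Change': last['CO2_per_Capita'] - first['CO2_per_Capita'],
    'Temperature_Change': last['Temperature'] - first['Temperature'],
    'Year_Start': first['Year'],
    'Year_End': last['Year'],
    # hypothetical label column, e.g. 'High to Low'
    'Transition': first['Emission_Ratio'] + ' to ' + last['Emission_Ratio'],
})
print(summary)
```

Filtering `summary` on the `Transition` column would then give the equivalent of the four groups, for example `summary[summary['Transition'] == 'High to Low']` for the improved countries.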

In [8]:
#clean up index on stay_low
stay_low_df = stay_low_df.reset_index()
stay_low= stay_low_df.drop(['index'], axis = 1)
stay_low

Unnamed: 0,Country,GDP_Change,CO2_Change,Temperature_Change,Year_Start,Year_End
0,Albania,10466.948818,13.351852,0.294388,1950,2016
1,Armenia,3052.483332,-33.025833,1.675355,1973,2016
2,Bahrain,24700.354145,99.020867,1.452817,1970,2016
3,Barbados,5762.621968,40.806019,1.388825,1950,2016
4,Benin,1150.179153,5.240824,0.909750,1958,2016
...,...,...,...,...,...,...
79,Tunisia,8859.303351,22.277524,1.517540,1950,2016
80,Uganda,677.448240,1.156193,2.069242,1950,2016
81,United Arab Emirates,-16791.720827,-84.682674,1.217733,1993,2016
82,Uruguay,15667.216267,19.084885,-0.407207,1932,2016


In [9]:
#clean up index on improved
improved_df = improved_df.reset_index()
improved= improved_df.drop(['index'], axis = 1)

In [10]:
#clean up index on got_worse
got_worse_df = got_worse_df.reset_index()
got_worse = got_worse_df.drop(['index'], axis = 1)

In [11]:
#clean up index on stay_high
stay_high_df = stay_high_df.reset_index()
stay_high = stay_high_df.drop(['index'], axis = 1)

The stay_low and improved dataframes are then merged, as all these countries have earned the distinction of having done a good job by maintaining or improving their emissions/GDP ratio.
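
One detail of `pd.concat` worth keeping in mind: by default it preserves each input's index, so row labels can repeat in the combined frame. Passing `ignore_index=True` renumbers the rows instead. A minimal sketch with made-up values (the `_toy` frames are illustrative stand-ins, not the notebook's data):

```python
import pandas as pd

# Two toy frames with made-up values, each with its own 0-based index
stay_low_toy = pd.DataFrame({'Country': ['A', 'B'], 'CO2_Change': [1.0, -2.0]})
improved_toy = pd.DataFrame({'Country': ['C', 'D'], 'CO2_Change': [-3.0, 4.0]})

# default behaviour: original row labels are kept
kept = pd.concat([stay_low_toy, improved_toy])

# ignore_index=True assigns a fresh 0..n-1 index
renumbered = pd.concat([stay_low_toy, improved_toy], ignore_index=True)

print(kept.index.tolist())        # [0, 1, 0, 1]
print(renumbered.index.tolist())  # [0, 1, 2, 3]
```

Repeated labels are harmless for the filtering and sorting done here, but they matter if rows are later selected by label with `.loc`.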

In [12]:
#merge two dataframes into one
good_job = pd.concat([stay_low, improved])
good_job

Unnamed: 0,Country,GDP_Change,CO2_Change,Temperature_Change,Year_Start,Year_End
0,Albania,10466.948818,13.351852,0.294388,1950,2016
1,Armenia,3052.483332,-33.025833,1.675355,1973,2016
2,Bahrain,24700.354145,99.020867,1.452817,1970,2016
3,Barbados,5762.621968,40.806019,1.388825,1950,2016
4,Benin,1150.179153,5.240824,0.909750,1958,2016
...,...,...,...,...,...,...
2,Belgium,33515.851932,18.411939,1.949663,1901,2016
3,Bosnia and Herzegovina,11467.093915,49.003204,1.523910,1959,2016
4,Hungary,22748.557213,33.343283,1.054459,1910,2016
5,Moldova,1356.411970,-44.224353,1.766712,1973,2016


In [23]:
#export new dataframe as csv
#good_job.to_csv("Low_Emissions_Countries.csv", index=False)

The got_worse and stay_high dataframes are then merged, as all these countries have earned the distinction of having done a bad job by showing a deteriorating ratio or by never improving at all.

In [13]:
#merge two dataframes into one
bad_job = pd.concat([got_worse, stay_high])
bad_job

Unnamed: 0,Country,GDP_Change,CO2_Change,Temperature_Change,Year_Start,Year_End
0,Afghanistan,-697.624844,2.337166,2.893701,1950,2016
1,Algeria,7067.994899,25.982784,1.402083,1970,2016
2,Angola,2411.771512,5.549800,1.195092,1975,2016
3,Argentina,13969.802471,40.922390,-0.258302,1901,2016
4,Australia,38349.499603,140.168874,0.905133,1901,2016
...,...,...,...,...,...,...
17,Turkmenistan,12992.667217,-6.401300,1.527772,1973,2016
18,Ukraine,867.219122,-53.319318,1.649833,1973,2016
19,United Kingdom,32897.107390,-46.687916,1.172031,1901,2016
20,United States,46549.610062,73.264825,2.339558,1901,2016


In [24]:
#export new dataframe as csv
#bad_job.to_csv("High_Emissions_Countries.csv", index=False)

The bad_job dataframe is then sorted to show the countries that have performed the worst in terms of CO2 emissions over time.  Only countries with a CO2 change greater than 1 are kept, and they are listed from the largest increase to the smallest.  This identifies the worst polluters.

In [21]:
polluters = bad_job.loc[bad_job['CO2_Change'] > 1 ].sort_values(by= 'CO2_Change', ascending=False)
polluters                         

Unnamed: 0,Country,GDP_Change,CO2_Change,Temperature_Change,Year_Start,Year_End
30,Saudi Arabia,40951.397625,178.191896,1.901842,1950,2016
4,Australia,38349.499603,140.168874,0.905133,1901,2016
32,South Korea,35069.1147,120.072182,1.209148,1950,2016
2,Canada,38321.364404,112.685433,2.241518,1901,2016
8,Japan,34397.393581,89.209059,2.114275,1901,2016
15,Iran,14467.129601,81.27872,2.286862,1913,2016
20,Malaysia,21010.269775,80.404758,1.01515,1911,2016
18,Libya,5616.188412,75.817613,1.876042,1950,2016
20,United States,46549.610062,73.264825,2.339558,1901,2016
10,Netherlands,44835.674452,70.640688,1.928488,1901,2016


Sorting the countries that increased CO2 emissions by GDP growth, smallest first, shows those that have done the least for their citizens and the least for the environment.

In [22]:
the_worst = bad_job.loc[(bad_job['CO2_Change'] > 1)].sort_values(by= 'GDP_Change', ascending=True)
the_worst

Unnamed: 0,Country,GDP_Change,CO2_Change,Temperature_Change,Year_Start,Year_End
0,Afghanistan,-697.624844,2.337166,2.893701,1950,2016
31,Senegal,202.125252,5.756238,1.033083,1958,2016
23,Mozambique,681.504627,1.264142,0.937417,1950,2016
12,Ethiopia,984.488681,1.296688,1.735325,1950,2016
13,Ghana,1068.647525,4.425489,1.3242,1950,2016
27,North Korea,1146.038639,9.037039,1.162251,1950,2015
36,Tanzania,1152.934566,1.482801,0.719617,1959,2016
35,Syria,1195.177993,15.108988,1.917621,1950,2016
41,Yemen,1227.418667,3.643662,1.545908,1950,2016
17,Kenya,1676.045454,2.000894,2.017792,1950,2016


The good_job dataframe is filtered to include only countries that have increased GDP per Capita AND decreased CO2 output.  In the code this is done by keeping countries with a GDP change greater than 1 and a CO2 change less than 1, then sorting by CO2 change so the largest decreases come first.  This gives us the countries that have shown it's possible to grow an economy and decrease emissions over time.
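
For comparison, a variant with zero as the cut-off would admit only strictly negative CO2 changes and strictly positive GDP changes. A minimal sketch on made-up numbers (the rows are illustrative, not from the dataset), showing where cut-offs of 1 and 0 disagree:

```python
import pandas as pd

# Made-up rows chosen to sit on either side of the two cut-offs
toy = pd.DataFrame({
    'Country': ['A', 'B', 'C'],
    'GDP_Change': [500.0, 800.0, 0.5],
    'CO2_Change': [-4.0, 0.4, -1.0],
})

# cut-offs of 1, as in the notebook cell
loose = toy.loc[(toy['CO2_Change'] < 1) & (toy['GDP_Change'] > 1)]

# cut-offs of 0: CO2 strictly down, GDP strictly up
strict = toy.loc[(toy['CO2_Change'] < 0) & (toy['GDP_Change'] > 0)]

print(loose['Country'].tolist())   # ['A', 'B']
print(strict['Country'].tolist())  # ['A', 'C']
```

Country `B` passes the looser filter despite a small CO2 increase, while `C` passes only the strict one because its GDP gain is below 1. Whether the two filters differ on the real data depends on whether any country falls in those narrow bands.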

In [18]:
#filter for countries who both increased gdp and decreased co2
bravo = good_job.loc[(good_job['CO2_Change'] < 1) & (good_job['GDP_Change'] > 1)].sort_values(by= 'CO2_Change', ascending=True)
bravo

Unnamed: 0,Country,GDP_Change,CO2_Change,Temperature_Change,Year_Start,Year_End
49,Luxembourg,56313.350135,-93.922653,0.877498,1950,2016
6,Slovakia,12161.006564,-50.459153,2.302523,1985,2016
5,Moldova,1356.41197,-44.224353,1.766712,1973,2016
1,Azerbaijan,6051.055928,-33.4322,1.387227,1973,2016
1,Armenia,3052.483332,-33.025833,1.675355,1973,2016
48,Lithuania,15563.76921,-28.426234,1.098658,1973,2016
29,Georgia,3792.849655,-18.826893,1.545461,1973,2016
45,Latvia,12385.671908,-15.628808,1.081291,1973,2016
83,Zambia,1749.116858,-12.372189,0.974292,1959,2016
25,Estonia,17341.933034,-9.993596,1.074581,1973,2016


In [25]:
#export new dataframe as csv
#bravo.to_csv("the_winners.csv", index=False)

In [19]:
countrylist = emissions['Country'].unique().tolist()
len(countrylist)

156

In [20]:
the_good_ones = bravo['Country'].unique().tolist()
len(the_good_ones)

12

## The Results

### The results of this analysis show that Saudi Arabia has increased CO2 emissions the most over time, and Afghanistan alone has both decreased economic output and increased CO2 emissions.

### Also, out of the 156 countries represented in our dataset, 12 have managed to both increase their GDP per Capita and decrease CO2 emissions over time, showing that it is possible for a country to improve economic output while not polluting disproportionately!  Bravo Luxembourg!