##In this project I look at historical temperature changes and try to establish whether they depend on the level of pollution in various countries. First we look at the historical CO_2 emissions data, downloaded from http://www.globalcarbonatlas.org/en/CO2-emissions. The data for 2015 and 2016 are preliminary.

In [1]:
import pandas as pd
import numpy as np
import plotly
import plotly.graph_objs as go
from plotly import tools
plotly.offline.init_notebook_mode(connected=True)
from scipy import stats

In [2]:
co2_data = pd.read_csv('CO2_historical_1.csv')
co2_data = co2_data.rename(columns= {'Unnamed: 0': 'Year'})
co2_data.set_index('Year', inplace=True, drop=True)
co2_data.head()

Year,Afghanistan,Albania,Algeria,Andorra,Angola,Anguilla,Antigua and Barbuda,Argentina,Armenia,Aruba,...,Uruguay,Uzbekistan,Vanuatu,Venezuela,Vietnam,Wallis and Futuna Islands,Western Sahara,Yemen,Zambia,Zimbabwe
1960,0.41403,2.0225,6.1555,,0.5496,,0.03664,48.7752,2.4946,0.61856,...,4.3162,47.8664,,57.0228,7.4856,,,3.631,4.3561,5.9446
1961,0.49098,2.279,6.0603,,0.45434,,0.047632,51.1384,2.5726,0.64555,...,4.1183,49.363,,51.8859,7.9802,,,2.6637,3.7096,5.0623
1962,0.68883,2.4622,5.6645,,1.1798,,0.10259,53.652,2.7,0.70894,...,4.0084,51.8088,0.040304,54.0623,9.3395,,,3.8838,3.5833,4.8899
1963,0.70715,2.0812,5.4227,,1.1505,,0.084272,50.0429,2.8956,0.67909,...,4.3162,55.5615,0.032976,56.1581,9.1124,,,2.9165,3.445,4.7013
1964,0.83906,2.0152,5.6462,,1.2238,,0.0916,55.6818,3.0795,0.66028,...,4.5544,59.0898,0.062288,56.5575,11.7908,,,3.631,3.2756,4.4701


##Next we would like to know which countries contribute the most to the CO_2 emissions. I will look at the data for 2014 and compute which countries account for 95% of the CO_2 emissions. First, countries with no CO_2 emission value for 2014 are removed.
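
As a minimal sketch of this filtering step, the toy table below (made-up numbers, hypothetical country names `A`, `B`, `C`) drops every country whose 2014 value is missing. It uses `DataFrame.dropna` with `axis=1` and `subset=[2014]`, which is one possible alternative to building a boolean mask by hand.

```python
import numpy as np
import pandas as pd

# Toy emissions table (made-up values): rows are years, columns are countries
toy = pd.DataFrame(
    {"A": [1.0, 2.0], "B": [3.0, np.nan], "C": [np.nan, 5.0]},
    index=pd.Index([2013, 2014], name="Year"),
)

# Drop the columns (axis=1) whose value in the 2014 row is missing.
# Missing values in other years (country C in 2013) do not matter here.
toy_clean = toy.dropna(axis=1, subset=[2014])
print(toy_clean.columns.tolist())
```

Only the 2014 row is inspected, so a country with gaps in earlier years is kept as long as its 2014 value is present.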

In [3]:
# Keep only the countries (columns) that have a value for 2014
mask = co2_data.loc[2014].notna()
co2_data_clean = co2_data.loc[:, mask]
co2_data_clean.head()

Year,Afghanistan,Albania,Algeria,Andorra,Angola,Anguilla,Antigua and Barbuda,Argentina,Armenia,Aruba,...,United States of America,Uruguay,Uzbekistan,Vanuatu,Venezuela,Vietnam,Wallis and Futuna Islands,Yemen,Zambia,Zimbabwe
1960,0.41403,2.0225,6.1555,,0.5496,,0.03664,48.7752,2.4946,0.61856,...,2888.3312,4.3162,47.8664,,57.0228,7.4856,,3.631,4.3561,5.9446
1961,0.49098,2.279,6.0603,,0.45434,,0.047632,51.1384,2.5726,0.64555,...,2878.1489,4.1183,49.363,,51.8859,7.9802,,2.6637,3.7096,5.0623
1962,0.68883,2.4622,5.6645,,1.1798,,0.10259,53.652,2.7,0.70894,...,2984.764,4.0084,51.8088,0.040304,54.0623,9.3395,,3.8838,3.5833,4.8899
1963,0.70715,2.0812,5.4227,,1.1505,,0.084272,50.0429,2.8956,0.67909,...,3116.679,4.3162,55.5615,0.032976,56.1581,9.1124,,2.9165,3.445,4.7013
1964,0.83906,2.0152,5.6462,,1.2238,,0.0916,55.6818,3.0795,0.66028,...,3253.3316,4.5544,59.0898,0.062288,56.5575,11.7908,,3.631,3.2756,4.4701


##Next, we want to create a table showing what percentage of the global $CO_2$ emissions comes from the highest-emitting countries. First we sort the $CO_2$ emissions for 2014 in descending order. The table has two columns: the number of countries considered, and the percentage of the global $CO_2$ emissions they contribute together.
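
One way to build a table of this kind without calling a function once per row is a cumulative sum. The sketch below uses made-up emissions for four hypothetical countries; it illustrates the idea and is not the code used in this notebook.

```python
import pandas as pd

# Made-up emissions for four hypothetical countries
emissions = pd.Series({"W": 10.0, "X": 50.0, "Y": 30.0, "Z": 10.0})

# Sort from largest to smallest, then accumulate the shares
sorted_emissions = emissions.sort_values(ascending=False)
cumulative_pct = 100 * sorted_emissions.cumsum() / sorted_emissions.sum()

table = pd.DataFrame({
    "number of countries": range(1, len(cumulative_pct) + 1),
    "percentage_co2": cumulative_pct.values,
})
print(table)
```

Row `k` of `table` gives the combined share of the `k` largest emitters, so the last row is always 100%.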

In [4]:
sorted_2014 = co2_data_clean.loc[[2014],:].sort_values(axis=1,by=[2014], ascending=False)
world_emission_2014 = sorted_2014.iloc[0,:].values.sum()
def percentage_emissions(n):
    '''Takes the number n of highest contributors and returns the percentage of the global CO_2 emissions they contribute'''
    big_n = sorted_2014.iloc[0,0:n].values.sum()
    return (big_n/world_emission_2014)*100

In [5]:
number_highes_emission = np.arange(1,sorted_2014.shape[1]+1)

percentege_emissions_2014=pd.DataFrame({
       'number of countries': number_highes_emission
   }
)

percentege_emissions_2014['percentage_co2'] = percentege_emissions_2014['number of countries'].apply(percentage_emissions)
percentege_emissions_2014.head()

Unnamed: 0,number of countries,percentage_co2
0,1,29.572802
1,2,45.577778
2,3,52.009526
3,4,56.810887
4,5,60.453317


##Below is a plot of the percentage of CO_2 emissions as a function of the number of highest-emitting countries. For example, from the plot we can see that the biggest 40 contributors make up 92% of the total CO_2 emissions. The biggest 80 contributors make up 98% of the world's CO_2 emissions.
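
Values like these can also be read off numerically instead of from the plot. The sketch below assumes an array of cumulative percentages (made-up numbers here) and a hypothetical helper `countries_needed` that returns how many of the top emitters are needed to reach a given share.

```python
import numpy as np

# Made-up cumulative percentages for five hypothetical countries,
# already sorted from the largest emitter to the smallest
cumulative_pct = np.array([40.0, 65.0, 85.0, 96.0, 100.0])

def countries_needed(threshold):
    """Smallest number of top emitters whose combined share reaches threshold (%)."""
    # searchsorted returns the first index where cumulative_pct >= threshold
    return int(np.searchsorted(cumulative_pct, threshold, side="left")) + 1

print(countries_needed(95))
```

The helper only makes sense for thresholds up to 100; for a larger value it would return one more than the number of countries.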

In [6]:
trace0 = go.Scatter(
    x = percentege_emissions_2014['number of countries'],
    y = percentege_emissions_2014['percentage_co2'],
    mode = 'markers',
    name = 'markers'
)

data = [trace0]

layout= go.Layout(
    title= 'Percentage of the total world emission for the biggest CO_2 contributors',
    xaxis= dict(
        title= 'Number of countries considered',
        ticklen= 5,
        gridwidth= 2,
    ),
    yaxis=dict(
        title= 'Percentage of the total emission',
        ticklen= 5,
        gridwidth= 2,
    ),
    showlegend= False
)
fig= go.Figure(data=data, layout=layout)
plotly.offline.iplot(fig)

##The plot below shows the top ten contributing countries and the percentage each of them contributes. Hovering over a slice shows the name of the country and the corresponding percentage.
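
The grouping behind such a pie chart, the largest emitters kept separately and all remaining countries lumped together, can be sketched as follows. The numbers are made up and `top_n_with_rest` is a hypothetical helper, not part of the notebook.

```python
import pandas as pd

# Made-up emissions for five hypothetical countries
emissions = pd.Series({"A": 5.0, "B": 40.0, "C": 25.0, "D": 20.0, "E": 10.0})

def top_n_with_rest(series, n):
    """Percentage share of the n largest entries plus one 'Rest of the world' entry."""
    shares = 100 * series.sort_values(ascending=False) / series.sum()
    top = shares.iloc[:n]
    rest = pd.Series({"Rest of the world": shares.iloc[n:].sum()})
    return pd.concat([top, rest])

print(top_n_with_rest(emissions, 2))
```

The returned shares always add up to 100, which makes them suitable as the `values` of a pie trace.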

In [7]:
def percentage_country(emission_country):
    return (emission_country/world_emission_2014)*100
    

# Build a table of the 2014 emissions per country, sorted from largest to smallest
df_index = np.arange(0,sorted_2014.loc[[2014],:].values.size).tolist()
CO_2_emissions_2014 = pd.DataFrame({'emissions': sorted_2014.loc[[2014],:].values[0,:],
                                    'countries':sorted_2014.columns.tolist()
                                   }, 
                                   index = df_index)

CO_2_emissions_2014['percentage_country'] = CO_2_emissions_2014['emissions'].apply(percentage_country)

top_ten_percentage = CO_2_emissions_2014['percentage_country'][0:10].values.tolist()
top_ten_percentage.append((sorted_2014.loc[[2014],:].values[0,10:].sum()/world_emission_2014)*100)
top_ten_countries = CO_2_emissions_2014['countries'][0:10].values.tolist()
top_ten_countries.append('Rest of the world')


fig = {
    'data': [
        {
            'values': top_ten_percentage,
            'labels': top_ten_countries,
            'text':'CO2',
            'textposition':'inside',
            'type': 'pie',
            'name': 'CO2 Emissions',
            'hoverinfo':'label+percent+name'
        }
    ],
    'layout':{
        'title':'Global CO2 Emissions 2014'
        
    }
}


plotly.offline.iplot(fig)

##Next we look at historical data for the average temperature in different countries. The data is from http://www.globalcarbonatlas.org/en/CO2-emissions. The function defined below creates a table containing the temperatures for a given country. These are monthly temperatures for the period 1900 - 2016.


In [8]:
months = np.arange(1,13)
years = np.arange(1905,2016,5)

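def create_country_table(country_name, years):
    '''Reads <country_name>_temp.csv and returns a table with one row per month and one column per requested year'''
    country_table_yearly=pd.DataFrame({
       'Month':months
    })
    
    country_table = pd.read_csv(country_name+'_temp.csv')
    # The raw header contains stray whitespace, so normalise the column names
    country_table.rename(columns={'\tYear': 'Year', ' Month': 'Month'}, inplace=True)
    for i in years:
        # Select the monthly temperatures ('tas') for year i and name the column after the year
        monthly_temps = country_table.loc[(country_table['Year']==i), ['tas','Month']].reset_index(drop=True)
        monthly_temps.rename(columns={'tas': i}, inplace=True)
        # Attach the year as a new column, matching rows on the month
        country_table_yearly = pd.merge(country_table_yearly, monthly_temps, on=['Month'])
    return country_table_yearly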
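
The same month-by-year layout can also be obtained with a single reshape instead of repeated merges. The sketch below assumes a long-format table with columns `Year`, `Month` and `tas` (the names used after the rename above) and uses made-up temperatures.

```python
import pandas as pd

# Toy long-format table (made-up temperatures): one row per (Year, Month)
long_table = pd.DataFrame({
    "Year":  [1905, 1905, 1910, 1910],
    "Month": [1, 2, 1, 2],
    "tas":   [-7.5, -7.0, -9.0, -6.5],
})

# One row per month, one column per year
wide_table = long_table.pivot(index="Month", columns="Year", values="tas").reset_index()
wide_table.columns.name = None  # drop the leftover 'Year' label on the column axis
print(wide_table)
```

To restrict the result to selected years, the long table could be filtered with `long_table[long_table["Year"].isin(years)]` before pivoting.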

We decide to look at four different countries: two with very high $CO_2$ emissions, China and the USA, and two with very low emissions, the Central African Republic and Andorra. Below we create a list of tables for the countries we are interested in and make a dictionary out of it. The dictionary has the countries as keys and the tables as values.

In [9]:
countries = ['china','usa','caf','andorra']
list_tables = [create_country_table(country,years) for country in countries ]
countries_dict= dict(zip(countries,list_tables))

For example, the data for China looks as follows:

In [10]:
countries_dict['china'].head(5)

Unnamed: 0,Month,1905,1910,1915,1920,1925,1930,1935,1940,1945,...,1970,1975,1980,1985,1990,1995,2000,2005,2010,2015
0,1,-7.4643,-9.0046,-9.5875,-7.7541,-8.793,-10.57,-8.9896,-8.2708,-10.416,...,-9.4959,-7.8528,-8.7824,-8.9087,-7.91,-8.235,-8.7686,-7.919,-7.267,-6.3235
1,2,-6.9613,-6.4983,-6.8353,-7.9435,-7.3851,-5.4137,-3.8035,-5.3263,-9.2913,...,-5.3782,-5.4021,-6.0804,-5.6334,-4.7373,-4.4737,-6.1966,-6.6477,-4.8713,-3.2716
2,3,-1.3929,-0.9292,-0.7152,0.02547,-0.3312,0.67571,1.74691,0.71292,-0.4578,...,-2.2786,1.43131,-0.1602,-0.8486,1.97607,1.00944,1.32536,1.38762,0.72168,2.58015
3,4,5.83789,6.68162,6.48666,7.24842,6.95375,6.99702,6.8066,7.07243,8.94731,...,7.06017,7.88079,6.92622,8.28341,7.05575,7.18657,8.30946,8.95801,6.56534,8.70064
4,5,12.3659,12.8168,12.624,12.9627,12.3582,13.2169,12.9796,12.9391,13.4058,...,13.2448,12.2695,13.2314,13.306,13.1484,13.4273,14.593,13.3799,13.5868,13.914


##Below we plot the average monthly temperatures for several years.

In [11]:
  countries[0]

'china'

In [12]:
Years_to_plot = [1910, 1945, 1990, 2015]
trace_dict={}
#colors = ['rgba(67,670,290,15)', 'rgba(200,110,115,1)', 'rgb(205, 12, 24)', 'rgb(22, 96, 167)']
colors = ['yellow', 'green', 'red', 'blue']


for country in countries:
    table=countries_dict[country]    
    x_data = table['Month'].values.tolist()
    
    
    trace = []
    for index,year in enumerate(Years_to_plot):
        trace.append(go.Scatter(
            x=x_data,
            y=table[year].values.tolist(),
            name = str(year),
            mode = 'lines',
           line=dict(color=colors[index])
        )
    )
    trace_dict[country] = trace 

    layout = go.Layout(
       xaxis=dict(
           showline=True
       )
    )  
fig = tools.make_subplots(rows=2, cols=2, subplot_titles = ('China', 'United States of America', 'Central African Republic', 'Andorra' ) )

for i in np.arange(len(trace_dict['china'])):
    fig.append_trace(trace_dict['china'][i], 1, 1)
for i in np.arange(len(trace_dict['usa'])):
    fig.append_trace(trace_dict['usa'][i], 1, 2)   
for i in np.arange(len(trace_dict['caf'])):
    fig.append_trace(trace_dict['caf'][i], 2, 1)   
for i in np.arange(len(trace_dict['andorra'])):
    fig.append_trace(trace_dict['andorra'][i], 2, 2)   

fig['layout'].update(showlegend=False, title='Monthly temperatures')

plotly.offline.iplot(fig, filename='make-subplots-multiple-with-titles')


This is the format of your plot grid:
[ (1,1) x1,y1 ]  [ (1,2) x2,y2 ]
[ (2,1) x3,y3 ]  [ (2,2) x4,y4 ]



##There is no legend because the plot gets too busy, but the year appears when hovering over a curve. It can be seen from these plots that the average temperature increases slightly with time: the blue curve is from 2015 and it is slightly higher on average than the others (red: 1990, green: 1945, yellow: 1910).
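
The visual impression can be checked with a quick number: the mean over the months for each plotted year. The sketch below uses a toy table with made-up temperatures for three months only; with the real tables the same call would average all twelve months.

```python
import pandas as pd

# Toy Month x Year table (made-up temperatures for three months only)
table = pd.DataFrame({
    "Month": [1, 2, 3],
    1910: [-9.0, -6.0, -1.0],
    2015: [-6.0, -3.0, 2.0],
})

# Mean over the months for each year column
annual_mean = table.drop(columns="Month").mean()

# Positive difference means the later year was warmer on average
difference = annual_mean[2015] - annual_mean[1910]
print(difference)
```

A single pair of years is a weak basis for a trend, which is why the later cells fit a line through many years instead.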

Now we would like to look at the yearly temperature change for a given country for certain months. We are interested to see if there is a significant difference between the countries with high and low levels of $CO_2$ emissions.

In [13]:
years = np.arange(1905,2015,5)
months_to_plot=['1','8']
months_names = {
    '1': 'January',
    '8':  'August'
    
}

In [14]:
line_country={}
x_data = years
data_trace = {}
for index, country in enumerate(countries):
    
    month = int(months_to_plot[0])
    country_table=countries_dict[country]     
    trace_0 =[]
    line_month = {}
    corr_month = country_table[country_table['Month']==month]
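    # Show legend entries only for the first country to avoid duplicates.
    # The trailing comma makes groupindex a 1-tuple; groupindex[0] is used below.
    if index ==0:
        groupindex = 'group1',
        showlegend_group = True
    else:
        groupindex = 'group2',
        showlegend_group = False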
    
    trace_0=go.Scatter(
        x = x_data,
        y = corr_month.loc[:,(corr_month.columns.isin(years))].values.tolist()[0],
        name=months_names[str(month)],
        legendgroup = groupindex[0],
        showlegend = showlegend_group,
        line=dict(color='red')
    )
    slope, intercept, r_value, p_value, std_err = stats.linregress(x_data, trace_0['y'])
    line_month[ month ]=[slope,intercept]
    
    month = int(months_to_plot[1])
    trace_1 =[]
    corr_month = country_table[country_table['Month']==month]
    trace_1 = go.Scatter(
         x = x_data,
         y = corr_month.loc[:,(corr_month.columns.isin(years))].values.tolist()[0],
         name=months_names[str(month)],
         legendgroup=groupindex[0],
         showlegend = showlegend_group,
         line=dict(color='blue')
        )
    slope, intercept, r_value, p_value, std_err = stats.linregress(x_data, trace_1['y'])
    line_month[ month ]=[slope,intercept]
    line_country[country]=line_month 
    data_trace[country] = [trace_0, trace_1]

#print(line_country)
fig = tools.make_subplots(rows=2, cols=2, subplot_titles=(countries[0][0].upper()+countries[0][1:], countries[1][0].upper()+countries[1][1:],
                                                          countries[2][0].upper()+countries[2][1:], countries[3][0].upper()+countries[3][1:]))

fig.append_trace(data_trace[countries[0]][0], 1, 1)
fig.append_trace(data_trace[countries[0]][1], 1, 1)
fig.append_trace(data_trace[countries[1]][0], 1, 2)
fig.append_trace(data_trace[countries[1]][1], 1, 2)
fig.append_trace(data_trace[countries[2]][0], 2, 1)
fig.append_trace(data_trace[countries[2]][1], 2, 1)
fig.append_trace(data_trace[countries[3]][0], 2, 2)
fig.append_trace(data_trace[countries[3]][1], 2, 2)
layout=go.Layout(title="First Plot", xaxis={'title':'x1'}, yaxis={'title':'x2'})
fig.layout.xaxis3.update({'title':'Year'})
fig.layout.yaxis3.update({'title':'Temperature'})

plotly.offline.iplot(fig, filename='Temperature vs. time')


This is the format of your plot grid:
[ (1,1) x1,y1 ]  [ (1,2) x2,y2 ]
[ (2,1) x3,y3 ]  [ (2,2) x4,y4 ]



Above we also fit the data for different years to a straight line. The slope and intercept of the lines for the different countries are as follows:
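
`stats.linregress` performs an ordinary least-squares fit. As a minimal sketch of what the slope and intercept mean, the hypothetical helper `fit_line` below computes them from the closed-form formulas, slope $= \sum (x-\bar{x})(y-\bar{y}) / \sum (x-\bar{x})^2$ and intercept $= \bar{y} - \text{slope}\cdot\bar{x}$, on made-up data that lie exactly on a line.

```python
import numpy as np

def fit_line(x, y):
    """Ordinary least-squares slope and intercept of y against x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_centered = x - x.mean()
    slope = np.sum(x_centered * (y - y.mean())) / np.sum(x_centered ** 2)
    intercept = y.mean() - slope * x.mean()
    return slope, intercept

# Made-up data lying exactly on y = 0.01 * x - 20
years = np.arange(1905, 2015, 5)
temps = 0.01 * years - 20.0
slope, intercept = fit_line(years, temps)
print(slope, intercept)
```

Because the x values are calendar years, the intercept is the extrapolated temperature at year 0 and has no physical meaning; only the slope (temperature change per year) is of interest here.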

In [15]:
for country in countries:
    month = 1
    print (country,',', 'January',':', '   y =', line_country[country][int(months_to_plot[0])][0],'*x + ', line_country[country][int(months_to_plot[0])][1] )
    month = 8
    print (country,',','August',':', '   y =', line_country[country][int(months_to_plot[1])][0],'*x + ', line_country[country][int(months_to_plot[1])][1] )


china , January :    y = 0.00787095426313 *x +  -24.078652061
china , August :    y = 0.00523258046302 *x +  8.36021919819
usa , January :    y = 0.0189559457933 *x +  -43.0670957086
usa , August :    y = 0.0116138339921 *x +  -3.7569027668
caf , January :    y = 0.00678528514963 *x +  11.2235815923
caf , August :    y = 0.00789151891587 *x +  8.36049717674
andorra , January :    y = 0.0135011778656 *x +  -23.693341581
andorra , August :    y = 0.0194368944099 *x +  -18.6066889893


All the countries have a positive slope, showing that the temperature is increasing. The USA has the highest rate of increase in January (slope 0.0189559457933), while Andorra has the highest in August (0.0194368944099, against 0.0116138339921 for the USA). The country with the highest emissions, China, has the lowest rate of increase in August (0.00523258046302); in January its slope (0.00787095426313) is close to the lowest, that of the Central African Republic (0.00678528514963). The presented data do not show a direct relation between the $CO_2$ emission levels and the rate of temperature increase. Of course, a much more detailed analysis, comparing more countries, would be needed to draw firm conclusions.
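
The ranking of the slopes can be checked directly from the printed fit results. The sketch below copies the eight slopes from the output above into a dictionary; `extremes` is a hypothetical helper that returns the countries with the steepest and the flattest trend for a given month.

```python
# Slopes copied from the printed fit results
slopes = {
    "china":   {"January": 0.00787095426313, "August": 0.00523258046302},
    "usa":     {"January": 0.0189559457933,  "August": 0.0116138339921},
    "caf":     {"January": 0.00678528514963, "August": 0.00789151891587},
    "andorra": {"January": 0.0135011778656,  "August": 0.0194368944099},
}

def extremes(month):
    """Return (country with the largest slope, country with the smallest slope)."""
    by_slope = sorted(slopes, key=lambda country: slopes[country][month])
    return by_slope[-1], by_slope[0]

for month in ("January", "August"):
    steepest, flattest = extremes(month)
    print(month, "steepest:", steepest, "flattest:", flattest)
```

With only four countries and two months this is a sanity check on the comparison, not a statistical test.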