Insights on CO2 emissions /
An analysis of emissions by country over time (1990-2021) and their correlations with the environment and human development

The first part of this project is to analyse the nature of the emissions (in metric tons of CO2 per capita):
- how they increased or decreased over time
- which sectors of the economy are responsible for the emissions

The second part of this project is to analyse possible correlations between the emissions and:

- human development: human development index, gross national income per capita and life expectancy

- environment: country/city temperature and forest area

In [1]:
import pandas as pd
import numpy as np

In [2]:
co2_production_country = pd.read_csv('Data/co2_production.csv')

co2_production_country.head()

Unnamed: 0,ISO3,Country,hdicode,region,hdi_rank_2021,co2_prod_1990,co2_prod_1991,co2_prod_1992,co2_prod_1993,co2_prod_1994,...,co2_prod_2012,co2_prod_2013,co2_prod_2014,co2_prod_2015,co2_prod_2016,co2_prod_2017,co2_prod_2018,co2_prod_2019,co2_prod_2020,co2_prod_2021
0,AFG,Afghanistan,Low,SA,180.0,0.209727,0.182525,0.095233,0.084285,0.075054,...,0.327922,0.261571,0.232967,0.22968,0.190617,0.188995,0.224492,0.319299,0.312376,0.312376
1,AGO,Angola,Medium,SSA,148.0,0.429586,0.413433,0.408015,0.439647,0.28618,...,1.346212,1.277248,1.235861,1.205736,1.088803,0.953168,0.791171,0.737992,0.67541,0.67541
2,ALB,Albania,High,ECA,67.0,1.656902,1.288961,0.768727,0.724712,0.607846,...,1.601835,1.697127,1.940611,1.555329,1.556278,1.838242,1.642153,1.688178,1.575754,1.575754
3,AND,Andorra,Very High,,40.0,7.461153,7.17651,6.906331,6.730577,6.488824,...,5.912019,5.896947,5.828084,5.964928,6.067376,6.043168,6.423396,6.505535,6.034945,6.034945
4,ARE,United Arab Emirates,Very High,AS,26.0,28.277672,29.256027,28.134519,30.170919,31.644558,...,22.047365,22.330116,21.914832,23.381781,22.932086,17.795688,16.01124,15.780701,15.193336,15.193336


In [3]:
co2_production_country.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 195 entries, 0 to 194
Data columns (total 37 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   ISO3           195 non-null    object 
 1   Country        195 non-null    object 
 2   hdicode        191 non-null    object 
 3   region         151 non-null    object 
 4   hdi_rank_2021  191 non-null    float64
 5   co2_prod_1990  187 non-null    float64
 6   co2_prod_1991  188 non-null    float64
 7   co2_prod_1992  191 non-null    float64
 8   co2_prod_1993  191 non-null    float64
 9   co2_prod_1994  192 non-null    float64
 10  co2_prod_1995  192 non-null    float64
 11  co2_prod_1996  192 non-null    float64
 12  co2_prod_1997  192 non-null    float64
 13  co2_prod_1998  192 non-null    float64
 14  co2_prod_1999  192 non-null    float64
 15  co2_prod_2000  192 non-null    float64
 16  co2_prod_2001  192 non-null    float64
 17  co2_prod_2002  193 non-null    float64
 18  co2_prod_2

In [4]:
co2_production_country.duplicated().any()

False

In [5]:
co2_production_country.isna().sum()

ISO3              0
Country           0
hdicode           4
region           44
hdi_rank_2021     4
co2_prod_1990     8
co2_prod_1991     7
co2_prod_1992     4
co2_prod_1993     4
co2_prod_1994     3
co2_prod_1995     3
co2_prod_1996     3
co2_prod_1997     3
co2_prod_1998     3
co2_prod_1999     3
co2_prod_2000     3
co2_prod_2001     3
co2_prod_2002     2
co2_prod_2003     2
co2_prod_2004     2
co2_prod_2005     2
co2_prod_2006     2
co2_prod_2007     2
co2_prod_2008     2
co2_prod_2009     2
co2_prod_2010     2
co2_prod_2011     2
co2_prod_2012     2
co2_prod_2013     2
co2_prod_2014     2
co2_prod_2015     2
co2_prod_2016     2
co2_prod_2017     2
co2_prod_2018     2
co2_prod_2019     2
co2_prod_2020     2
co2_prod_2021     2
dtype: int64

In [6]:
co2_production_country[co2_production_country.isna().any(axis=1)].head(35)

Unnamed: 0,ISO3,Country,hdicode,region,hdi_rank_2021,co2_prod_1990,co2_prod_1991,co2_prod_1992,co2_prod_1993,co2_prod_1994,...,co2_prod_2012,co2_prod_2013,co2_prod_2014,co2_prod_2015,co2_prod_2016,co2_prod_2017,co2_prod_2018,co2_prod_2019,co2_prod_2020,co2_prod_2021
3,AND,Andorra,Very High,,40.0,7.461153,7.17651,6.906331,6.730577,6.488824,...,5.912019,5.896947,5.828084,5.964928,6.067376,6.043168,6.423396,6.505535,6.034945,6.034945
8,AUS,Australia,Very High,,5.0,16.471401,16.398136,16.453819,16.549526,16.860062,...,17.580628,16.906991,16.920093,16.8507,17.143393,16.870352,16.708116,16.446992,15.36838,15.36838
9,AUT,Austria,Very High,,25.0,8.04512,8.457575,7.692123,7.686282,7.679979,...,7.911257,7.91925,7.447395,7.645374,7.684142,7.89112,7.486503,7.589238,6.732425,6.732425
12,BEL,Belgium,Very High,,13.0,12.022922,12.289365,12.155554,12.001454,12.289169,...,9.236138,9.203193,8.632538,8.952519,8.795879,8.708432,8.728936,8.643987,7.226206,7.226206
16,BGR,Bulgaria,High,,68.0,8.674893,7.016917,6.611654,6.812691,6.655464,...,6.591312,5.85458,6.240562,6.705186,6.362234,6.699426,6.188305,6.031683,5.388846,5.388846
29,CAN,Canada,Very High,,15.0,16.629828,16.124718,16.4227,16.253112,16.57155,...,16.300962,16.253686,15.99815,15.906585,15.379926,15.500352,15.629865,15.567316,14.196937,14.196937
30,CHE,Switzerland,Very High,,1.0,6.636657,6.866108,6.773066,6.347552,6.150655,...,5.276541,5.325731,4.780793,4.668199,4.676956,4.515537,4.324335,4.276608,3.731913,3.731913
42,CYP,Cyprus,Very High,,29.0,6.067844,6.564988,6.889613,7.034294,7.168234,...,6.397714,5.753733,6.031234,6.003867,6.300702,6.377249,6.172132,6.123855,5.380504,5.380504
43,CZE,Czechia,Very High,,32.0,15.878995,14.390672,13.968561,13.383381,12.775871,...,10.517562,10.062388,9.828518,9.891583,10.046788,10.113002,9.950761,9.431222,8.215039,8.215039
44,DEU,Germany,Very High,,9.0,13.313394,12.759012,12.079229,11.887968,11.626629,...,10.052591,10.242812,9.730928,9.727783,9.741451,9.507598,9.072083,8.518355,7.690142,7.690142


Next, I want to fill in the NaN regions with the correct values. I found two files online that help: CLASS.csv maps each economy to a region name, and data-verbose.csv maps region names to region codes.

In [7]:
country_regions = pd.read_csv('Data/CLASS.csv')
regions_codes = pd.read_csv('Data/data-verbose.csv')

In [8]:
country_regions.head()

Unnamed: 0,Economy,Code,Region,Income group,Lending category
0,Aruba,ABW,Latin America & Caribbean,High income,
1,Afghanistan,AFG,South Asia,Low income,IDA
2,Angola,AGO,Sub-Saharan Africa,Lower middle income,IBRD
3,Albania,ALB,Europe & Central Asia,Upper middle income,IBRD
4,Andorra,AND,Europe & Central Asia,High income,


In [9]:
regions_codes.head()

Unnamed: 0,WORLDBANKREGION,Code
0,East Asia & Pacific,EAP
1,Europe & Central Asia,ECA
2,High income countries,HIC
3,Latin America & Caribbean,LAC
4,Middle East & North Africa,MENA


In [10]:
region_to_code = regions_codes.set_index('WORLDBANKREGION')['Code'].to_dict()

country_regions['Region code'] = country_regions['Region'].map(region_to_code)

country_regions.head()

Unnamed: 0,Economy,Code,Region,Income group,Lending category,Region code
0,Aruba,ABW,Latin America & Caribbean,High income,,LAC
1,Afghanistan,AFG,South Asia,Low income,IDA,SA
2,Angola,AGO,Sub-Saharan Africa,Lower middle income,IBRD,SSA
3,Albania,ALB,Europe & Central Asia,Upper middle income,IBRD,ECA
4,Andorra,AND,Europe & Central Asia,High income,,ECA
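A name-based map like the one above silently leaves NaN wherever a region name has no entry in the lookup. The sketch below shows one way to list such names. It runs on a made-up toy frame, and the name `unmapped` is only illustrative; none of the values come from the real files.

```python
import pandas as pd

# Toy data (made up for illustration, not taken from CLASS.csv).
regions = pd.DataFrame({
    "Economy": ["A", "B", "C"],
    "Region": ["South Asia", "North America", None],
})

# Hypothetical lookup that deliberately misses one region name.
region_to_code = {"South Asia": "SA"}

regions["Region code"] = regions["Region"].map(region_to_code)

# Region names that exist in the table but have no code in the lookup.
has_name = regions["Region"].notna()
no_code = regions["Region code"].isna()
unmapped = regions.loc[has_name & no_code, "Region"].unique().tolist()

print(unmapped)
```

An empty list means every region name was covered by the lookup.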


In [11]:
countrycode_to_regioncode = dict(zip(country_regions['Code'], country_regions['Region code']))

co2_production_country['region'] = co2_production_country['ISO3'].map(countrycode_to_regioncode).combine_first(co2_production_country['region'])

co2_production_country['region'].isna().sum()


0

In [12]:
co2_production_country[co2_production_country['region'].isna()]

Unnamed: 0,ISO3,Country,hdicode,region,hdi_rank_2021,co2_prod_1990,co2_prod_1991,co2_prod_1992,co2_prod_1993,co2_prod_1994,...,co2_prod_2012,co2_prod_2013,co2_prod_2014,co2_prod_2015,co2_prod_2016,co2_prod_2017,co2_prod_2018,co2_prod_2019,co2_prod_2020,co2_prod_2021
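
One detail of the fill step above: `a.combine_first(b)` keeps the values of `a` and only falls back to `b` where `a` is missing, so the mapped code takes precedence over any region already present in the table. If the intent is only to fill the gaps, `fillna` with the mapped Series keeps the existing values. A small sketch on made-up codes (the frame, the lookup and the variable names are assumptions for illustration):

```python
import pandas as pd

# Toy data (made up for illustration, not taken from the real files).
df = pd.DataFrame({
    "ISO3": ["AAA", "BBB", "CCC"],
    "region": ["X1", None, None],
})
lookup = {"AAA": "Y1", "BBB": "Y2"}  # "CCC" is deliberately missing

mapped = df["ISO3"].map(lookup)

# Lookup first, existing value as fallback (same order as the cell above).
lookup_first = mapped.combine_first(df["region"])

# Existing value first, lookup only where the region is missing.
existing_first = df["region"].fillna(mapped)

print(lookup_first.tolist())
print(existing_first.tolist())
```

In the toy frame, "AAA" ends up with the lookup code in the first variant and keeps its original code in the second; "CCC" stays missing in both because neither source has a value for it.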


In [13]:
nan_counts = co2_production_country.isna().sum(axis=1)

co2_production_country[nan_counts >= 5]

Unnamed: 0,ISO3,Country,hdicode,region,hdi_rank_2021,co2_prod_1990,co2_prod_1991,co2_prod_1992,co2_prod_1993,co2_prod_1994,...,co2_prod_2012,co2_prod_2013,co2_prod_2014,co2_prod_2015,co2_prod_2016,co2_prod_2017,co2_prod_2018,co2_prod_2019,co2_prod_2020,co2_prod_2021
108,MCO,Monaco,,ECA,,,,,,,...,,,,,,,,,,
157,SMR,San Marino,Very High,ECA,44.0,,,,,,...,,,,,,,,,,
174,TLS,Timor-Leste,Medium,EAP,140.0,,,,,,...,0.258713,0.470192,0.517929,0.502298,0.405679,0.400802,0.401664,0.405972,0.398727,0.398727


In [14]:
countries_to_drop = ['Monaco', 'San Marino']

co2_production_country = co2_production_country[~co2_production_country['Country'].isin(countries_to_drop)]

# Recompute the NaN counts so the mask matches the filtered DataFrame's index
nan_counts = co2_production_country.isna().sum(axis=1)

co2_production_country[nan_counts >= 5]


Unnamed: 0,ISO3,Country,hdicode,region,hdi_rank_2021,co2_prod_1990,co2_prod_1991,co2_prod_1992,co2_prod_1993,co2_prod_1994,...,co2_prod_2012,co2_prod_2013,co2_prod_2014,co2_prod_2015,co2_prod_2016,co2_prod_2017,co2_prod_2018,co2_prod_2019,co2_prod_2020,co2_prod_2021
174,TLS,Timor-Leste,Medium,EAP,140.0,,,,,,...,0.258713,0.470192,0.517929,0.502298,0.405679,0.400802,0.401664,0.405972,0.398727,0.398727


In [15]:
# Build the mask from the current DataFrame so its index always matches
co2_production_country[co2_production_country.isna().sum(axis=1) >= 2]


Unnamed: 0,ISO3,Country,hdicode,region,hdi_rank_2021,co2_prod_1990,co2_prod_1991,co2_prod_1992,co2_prod_1993,co2_prod_1994,...,co2_prod_2012,co2_prod_2013,co2_prod_2014,co2_prod_2015,co2_prod_2016,co2_prod_2017,co2_prod_2018,co2_prod_2019,co2_prod_2020,co2_prod_2021
52,ERI,Eritrea,Low,SSA,176.0,,,,,0.335296,...,0.186013,0.189188,0.190672,0.189789,0.191147,0.20635,0.233047,0.238728,0.203679,0.203679
59,FSM,Micronesia (Federated States of),Medium,EAP,134.0,,,1.011845,0.987249,1.002169,...,1.192085,1.279856,1.26175,1.312273,1.296557,1.282086,1.268643,1.294118,1.282352,1.282352
113,MHL,Marshall Islands,Medium,EAP,131.0,,,1.562473,1.691156,1.678224,...,2.390289,2.445291,2.499169,2.487857,2.475102,2.524372,2.509031,2.569611,2.555837,2.555837
132,NRU,Nauru,,EAP,,13.106786,12.759115,12.054835,11.04112,10.470566,...,4.336095,4.665687,4.985519,5.293268,4.896993,5.198146,5.15089,5.266921,5.241223,5.241223
139,PLW,Palau,High,EAP,80.0,,,12.478557,12.146102,11.835856,...,12.670068,12.902988,12.472484,11.615058,11.989281,12.139488,11.867426,12.164038,12.123356,12.123356
142,PRK,Korea (Democratic People's Rep. of),,EAP,,6.031635,5.688114,4.917658,4.461095,4.018907,...,1.514682,1.0836,1.219002,0.977667,1.162193,0.856034,0.691745,1.144671,1.137023,1.137023
158,SOM,Somalia,,SSA,,0.101161,0.097276,0.093311,0.087539,0.085373,...,0.047833,0.048241,0.046948,0.045677,0.044943,0.0437,0.042479,0.043075,0.03537,0.03537
174,TLS,Timor-Leste,Medium,EAP,140.0,,,,,,...,0.258713,0.470192,0.517929,0.502298,0.405679,0.400802,0.401664,0.405972,0.398727,0.398727


In [16]:
id_vars = ['ISO3', 'Country', 'hdicode', 'region', 'hdi_rank_2021']

melted_co2_production_country = pd.melt(co2_production_country, id_vars=id_vars, var_name='co2_prod_year', value_name='co2_prod').sort_values('Country')

melted_co2_production_country


Unnamed: 0,ISO3,Country,hdicode,region,hdi_rank_2021,co2_prod_year,co2_prod
0,AFG,Afghanistan,Low,SA,180.0,co2_prod_1990,0.209727
4053,AFG,Afghanistan,Low,SA,180.0,co2_prod_2011,0.401954
1930,AFG,Afghanistan,Low,SA,180.0,co2_prod_2000,0.036462
5597,AFG,Afghanistan,Low,SA,180.0,co2_prod_2019,0.319299
1158,AFG,Afghanistan,Low,SA,180.0,co2_prod_1996,0.061787
...,...,...,...,...,...,...,...
5210,ZWE,Zimbabwe,Medium,SSA,146.0,co2_prod_2016,0.765313
5403,ZWE,Zimbabwe,Medium,SSA,146.0,co2_prod_2017,0.673026
5596,ZWE,Zimbabwe,Medium,SSA,146.0,co2_prod_2018,0.821010
2894,ZWE,Zimbabwe,Medium,SSA,146.0,co2_prod_2004,0.784415


In [17]:
melted_co2_production_country['co2_prod_year'] = melted_co2_production_country['co2_prod_year'].str.replace('co2_prod_', '')


melted_co2_production_country['co2_prod_year'] = pd.to_numeric(melted_co2_production_country['co2_prod_year'], errors='coerce')

melted_co2_production_country

Unnamed: 0,ISO3,Country,hdicode,region,hdi_rank_2021,co2_prod_year,co2_prod
0,AFG,Afghanistan,Low,SA,180.0,1990,0.209727
4053,AFG,Afghanistan,Low,SA,180.0,2011,0.401954
1930,AFG,Afghanistan,Low,SA,180.0,2000,0.036462
5597,AFG,Afghanistan,Low,SA,180.0,2019,0.319299
1158,AFG,Afghanistan,Low,SA,180.0,1996,0.061787
...,...,...,...,...,...,...,...
5210,ZWE,Zimbabwe,Medium,SSA,146.0,2016,0.765313
5403,ZWE,Zimbabwe,Medium,SSA,146.0,2017,0.673026
5596,ZWE,Zimbabwe,Medium,SSA,146.0,2018,0.821010
2894,ZWE,Zimbabwe,Medium,SSA,146.0,2004,0.784415


In [22]:
melted_co2_production_country = melted_co2_production_country.sort_values(by=['Country', 'co2_prod_year']).reset_index(drop=True)

melted_co2_production_country.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6176 entries, 0 to 6175
Data columns (total 7 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   ISO3           6176 non-null   object 
 1   Country        6176 non-null   object 
 2   hdicode        6080 non-null   object 
 3   region         6176 non-null   object 
 4   hdi_rank_2021  6080 non-null   float64
 5   co2_prod_year  6176 non-null   int64  
 6   co2_prod       6153 non-null   float64
dtypes: float64(2), int64(1), object(4)
memory usage: 337.9+ KB


Now I want to fill the NaN values in the co2_prod column by following the trend of the other years for the same country.

In [25]:
#melted_co2_production_country['co2_prod_year'] = melted_co2_production_country['co2_prod_year'].astype(float)

#g = melted_co2_production_country['co2_prod'].mask(melted_co2_production_country['co2_prod'].notnull(), melted_co2_production_country['co2_prod'].isnull().cumsum()).bfill()

#melted_co2_production_country['co2_prod'] = (melted_co2_production_country.groupby(['ISO3', g])['co2_prod'].apply(lambda x: x.interpolate(method="spline", order=1)))

In [26]:
#melted_co2_production_country['co2_prod'] = melted_co2_production_country.groupby('ISO3')['co2_prod'].fillna(method='ffill')

In [28]:
# Convert 'co2_prod_year' to float
melted_co2_production_country['co2_prod_year'] = melted_co2_production_country['co2_prod_year'].astype(float)

# Interpolate within each country; transform keeps the original index, so the result aligns on assignment
melted_co2_production_country['co2_prod'] = melted_co2_production_country.groupby('ISO3')['co2_prod'].transform(
    lambda group: group.interpolate(method='spline', order=1, limit_direction='backward', limit_area=None)
)

# Fill any remaining NaN values from the nearest non-NaN value in the same country (backward first, then forward)
melted_co2_production_country['co2_prod'] = melted_co2_production_country.groupby('ISO3')['co2_prod'].transform(lambda x: x.bfill().ffill())



In [30]:
melted_co2_production_country[melted_co2_production_country['ISO3'] == 'FSM']

Unnamed: 0,ISO3,Country,hdicode,region,hdi_rank_2021,co2_prod_year,co2_prod
3616,FSM,Micronesia (Federated States of),Medium,EAP,134.0,1990.0,1.060279
3617,FSM,Micronesia (Federated States of),Medium,EAP,134.0,1991.0,1.068577
3618,FSM,Micronesia (Federated States of),Medium,EAP,134.0,1992.0,1.011845
3619,FSM,Micronesia (Federated States of),Medium,EAP,134.0,1993.0,0.987249
3620,FSM,Micronesia (Federated States of),Medium,EAP,134.0,1994.0,1.002169
3621,FSM,Micronesia (Federated States of),Medium,EAP,134.0,1995.0,1.056273
3622,FSM,Micronesia (Federated States of),Medium,EAP,134.0,1996.0,1.04855
3623,FSM,Micronesia (Federated States of),Medium,EAP,134.0,1997.0,1.114583
3624,FSM,Micronesia (Federated States of),Medium,EAP,134.0,1998.0,1.117385
3625,FSM,Micronesia (Federated States of),Medium,EAP,134.0,1999.0,1.155862
