EPA112A Programming for Data Science (2024/25 Q1)
Group Project
### Relation of CO2 emissions to the GDP
Group 1:
Helena Angerla, 5300401;
Derk de Brauw, 4726731;
name, number

# Introduction
''' when we present the problem we could use the UN's sustainable development goals and the point 8 (Decent work and wconomic growth) (link: https://www.un.org/sustainabledevelopment/economic-growth/)'''

In developing countries, rapid economic growth is often associated with increased CO2 emissions, raising concerns about environmental sustainability. This study aims to analyze the relationship between economic growth, measured by gross domestic product (GDP), and CO2 emissions as an indicator of environmental sustainability. The hypothesis is that at lower levels of GDP, economic growth contributes to higher emissions, but after reaching a certain threshold, further growth could lead to a reduction in emissions—a concept known as the Environmental Kuznets Curve (EKC). The research seeks to identify this tipping point, using global datasets such as World Development Indicators and CO2 emissions databases.

# Datasets
presenting the data we use.

Present the data that you are going to use. Your data must come from at least two sources (one additional source aside from the World Bank data).

GDP: 
The data used in this report is drawed from UNECE: annual growth rate of real GDP per capita. The data coved 56 countries over the year 200 untill the year 2023. (source: https://w3.unece.org/PXWeb2015/pxweb/en/STAT/STAT__92-SDG__01-sdgover/008_en_sdGoal8_r.px/)

CO2:
Other data that will be used is from European Commission the emissions database for global atmospheric reasearch (EDGAR). where the CO2 emissions of all world countries can be found from year 1970 to 2022. (source: https://edgar.jrc.ec.europa.eu/report_2022#intro)

Third source Kaggle, a AI and ML community. where we could find the CO2 emissions for countries and the data is self updating. (Source: https://www.kaggle.com/datasets/ulrikthygepedersen/co2-emissions-by-country)

# Preprocessing
Explain how you clean, format and do any pre-processing work that you find useful, making the data useful for your goals. Implement the steps that you describe in python. Basically, how we will manage the data.


- Process data with Pandas dataframe --> we get the headers for easier navigation.
- get from files (1) CO2 per country (2) GDP per country (3) CO2 per GDP 
- from the other file get the GDP for country
- combine the two csv to one data frame

In [1]:
# Importing libraries
import numpy as np
import pandas as pd

In [2]:
# Importing the data files
GDP = pd.read_csv('c0001047_20241018-094743.csv', delimiter=',', header=1) # this files gives a growth rate how the GDP has changed in comparisson to last year or so and not the actual value of the GDP.
CO2 = pd.read_excel('EDGARv7.0_FT2021_fossil_CO2_booklet_2022.xlsx', sheet_name='fossil_CO2_totals_by_country')
CO2_GDP = pd.read_excel('EDGARv7.0_FT2021_fossil_CO2_booklet_2022.xlsx', sheet_name='fossil_CO2_per_GDP_by_country')

In [7]:
# Cleaning the data
def drop_NaN(df): # doesn't work at the moment as wished
    '''Function that removes rows with no value.'''
    df.dropna(how='all')
    return df

drop_NaN(GDP)
drop_NaN(CO2)
drop_NaN(CO2_GDP)

# Cut the data into a specific years that can be found in both (2000 - 2021)

Unnamed: 0,Substance,EDGAR Country Code,Country,1990,1991,1992,1993,1994,1995,1996,...,2012,2013,2014,2015,2016,2017,2018,2019,2020,2021
0,CO2,ABW,Aruba,0.259123,0.283727,0.311033,0.278271,0.281836,0.295495,0.195168,...,0.468715,0.432303,0.439750,0.422637,0.406352,0.321007,0.324734,0.341388,0.362543,0.345605
1,CO2,AFG,Afghanistan,0.095005,0.101843,0.061179,0.078183,0.096176,0.056922,0.058905,...,0.158300,0.124224,0.114260,0.117480,0.104910,0.109022,0.120462,0.102889,0.106683,0.108820
2,CO2,AGO,Angola,0.165580,0.171977,0.187962,0.247615,0.231432,0.226841,0.241661,...,0.119849,0.131361,0.136966,0.145447,0.140584,0.119869,0.111793,0.118561,0.114215,0.120280
3,CO2,AIA,Anguilla,0.024681,0.039792,0.031377,0.038098,0.048945,0.060947,0.062495,...,0.073785,0.092105,0.096760,0.112214,0.125609,0.148555,0.187722,0.122208,0.107731,0.120166
4,CO2,AIR,International Aviation,,,,,,,,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
208,CO2,ZMB,Zambia,0.171360,0.178279,0.180054,0.150191,0.140581,0.135330,0.107216,...,0.090119,0.089875,0.094024,0.097191,0.101910,0.123707,0.135315,0.114296,0.109831,0.112333
209,CO2,ZWE,Zimbabwe,0.395968,0.413123,0.456798,0.414472,0.357302,0.349903,0.305460,...,0.258495,0.259519,0.251318,0.251387,0.219208,0.195057,0.224744,0.227378,0.229596,0.233016
210,,,,,,,,,,,...,,,,,,,,,,
211,CO2,EU27,EU27,0.315017,0.308199,0.294692,0.290933,0.282437,0.279206,0.280934,...,0.187363,0.182810,0.171779,0.170988,0.167944,0.164366,0.157469,0.147167,0.139526,0.141027


# Exploration
Exploratory data analysis – get to know your data.

- top 10 countries - highest and lowest
- average and mean
- how many countries are above the cutoff value
- shape and look of the files

In [6]:
CO2_GDP

Unnamed: 0,Substance,EDGAR Country Code,Country,1990,1991,1992,1993,1994,1995,1996,...,2012,2013,2014,2015,2016,2017,2018,2019,2020,2021
0,CO2,ABW,Aruba,0.259123,0.283727,0.311033,0.278271,0.281836,0.295495,0.195168,...,0.468715,0.432303,0.439750,0.422637,0.406352,0.321007,0.324734,0.341388,0.362543,0.345605
1,CO2,AFG,Afghanistan,0.095005,0.101843,0.061179,0.078183,0.096176,0.056922,0.058905,...,0.158300,0.124224,0.114260,0.117480,0.104910,0.109022,0.120462,0.102889,0.106683,0.108820
2,CO2,AGO,Angola,0.165580,0.171977,0.187962,0.247615,0.231432,0.226841,0.241661,...,0.119849,0.131361,0.136966,0.145447,0.140584,0.119869,0.111793,0.118561,0.114215,0.120280
3,CO2,AIA,Anguilla,0.024681,0.039792,0.031377,0.038098,0.048945,0.060947,0.062495,...,0.073785,0.092105,0.096760,0.112214,0.125609,0.148555,0.187722,0.122208,0.107731,0.120166
4,CO2,AIR,International Aviation,,,,,,,,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
208,CO2,ZMB,Zambia,0.171360,0.178279,0.180054,0.150191,0.140581,0.135330,0.107216,...,0.090119,0.089875,0.094024,0.097191,0.101910,0.123707,0.135315,0.114296,0.109831,0.112333
209,CO2,ZWE,Zimbabwe,0.395968,0.413123,0.456798,0.414472,0.357302,0.349903,0.305460,...,0.258495,0.259519,0.251318,0.251387,0.219208,0.195057,0.224744,0.227378,0.229596,0.233016
210,,,,,,,,,,,...,,,,,,,,,,
211,CO2,EU27,EU27,0.315017,0.308199,0.294692,0.290933,0.282437,0.279206,0.280934,...,0.187363,0.182810,0.171779,0.170988,0.167944,0.164366,0.157469,0.147167,0.139526,0.141027


In [5]:
GDP

Unnamed: 0,Indicator,Country,2000,2001,2002,2003,2004,2005,2006,2007,...,2014,2015,2016,2017,2018,2019,2020,2021,2022,2023
0,8.1.1 - Annual growth rate of real GDP per capita,Albania,7.8,9.3,5.5,6.6,6.6,6.6,6.9,7.0,...,1.9,2.3,3.4,3.9,4.1,2.2,-3.2,9.6,5.3,..
1,8.1.1 - Annual growth rate of real GDP per capita,Andorra,0.5,5.4,0.1,4.2,3.9,1.6,4.3,4.2,...,2.1,1.3,2.6,-1.4,0.0,0.2,-12.7,6.5,8.5,..
2,8.1.1 - Annual growth rate of real GDP per capita,Armenia,7.1,10.8,16.1,14.8,11.1,14.6,14.0,14.6,...,4.1,3.6,0.6,8.1,5.8,8.2,-6.7,6.4,13.1,..
3,8.1.1 - Annual growth rate of real GDP per capita,Austria,3.1,0.9,1.2,0.4,2.1,1.6,2.9,3.4,...,-0.1,-0.1,0.9,1.5,1.9,1.0,-6.9,4.1,4.6,..
4,8.1.1 - Annual growth rate of real GDP per capita,Azerbaijan,9.9,8.7,9.4,10.0,8.9,25.0,32.8,23.4,...,1.4,-0.1,-4.2,-0.9,0.7,1.7,-4.7,5.3,4.1,..
5,8.1.1 - Annual growth rate of real GDP per capita,Belarus,6.3,5.3,5.7,7.7,12.2,10.2,10.7,9.1,...,1.7,-3.9,-2.5,2.5,3.3,1.7,-0.3,3.0,-4.3,..
6,8.1.1 - Annual growth rate of real GDP per capita,Belgium,3.3,0.7,1.3,0.6,3.0,1.7,1.9,3.0,...,0.9,1.4,0.7,1.0,1.2,1.7,-5.7,6.4,2.6,..
7,8.1.1 - Annual growth rate of real GDP per capita,Bosnia and Herzegovina,4.8,2.0,4.9,4.2,7.4,5.1,6.4,7.2,...,2.5,5.7,4.5,4.5,5.0,4.1,-1.8,8.9,5.3,..
8,8.1.1 - Annual growth rate of real GDP per capita,Bulgaria,5.4,4.6,6.7,6.0,7.2,7.7,7.5,7.3,...,1.8,4.3,3.9,3.7,3.6,5.0,-3.0,9.1,5.5,..
9,8.1.1 - Annual growth rate of real GDP per capita,Canada,4.2,0.7,2.0,0.8,2.1,2.2,1.6,1.0,...,1.9,-0.3,-0.1,1.8,1.4,0.6,-6.0,4.3,2.6,..


# Visualization
Use multiple types of visualization on the data that make sense for your goal.

3 charts
(1) x-axis: year, y-axis: CO2, y-axis2: GDP - linear
(2) x-axis: year, y-axis: CO2 per GDP - linear 
(3) x-axis: GDP, y-axis:CO2 (window average of 2 or 5 years) - scatter plot
expected to see a linear growth or a hill like figure
search for the cutoff value when the economy is good and the tipping point when the CO2 values are decreasing

# Machine Learning
Use at least one machine learning technique to make meaningful predictions over (part of) the data.

# Discussion
Interpret your results in relation to your research question. Were there any surprises in your research? How reliable do you think your results are? Are there any
limitations to your analysis?

# Conclusion
What did you learn?

# Sources

Crippa, M., Guizzardi, D., Banja, M., Solazzo, E., Muntean, M., Schaaf, E., Pagani, F., Monforti-Ferrario, F., Olivier, J., Quadrelli, R., Risquez Martin, A., Taghavi-Moharamli, P., Grassi, G., Rossi, S., Jacome Felix Oom, D., Branco, A., San-Miguel-Ayanz, J. and Vignati, E., CO2 emissions of all world countries - JRC/IEA/PBL 2022 Report, EUR 31182 EN, Publications Office of the European Union, Luxembourg, 2022, doi:10.2760/730164, JRC130363.