- Name: Charlene Fasuyi
- Class: MSc Data Science with Python
- Topic: Python_Final_Project
- Date: 30/11/2024


Link: https://drive.google.com/file/d/1D26DlHJ4jt7Lgh1-KxDEo6lcUl5Uiwd0/view?usp=drive_link

# Background

The 1992 United Nations Earth Summit in Rio de Janeiro marked a pivotal moment in global climate change awareness, where 154 nations first formally acknowledged the urgent need for environmental action. By selecting the period from 1993 to 2023, this research captures the immediate aftermath and long-term trajectory of global greenhouse gas (GHG) emissions following this landmark international conference. The timeframe allows for a comprehensive analysis of how nations responded to the initial global climate dialogue, tracking emission patterns during a critical three-decade period of increasing environmental consciousness and technological transformation. This approach provides a longitudinal perspective on whether international commitments and growing climate awareness translated into meaningful changes in global emission behaviors.



---
## Hypotheses

1. Due to increased global climate change awareness, most nations will have reduced their GHG emissions from 1993-2023. (H1)

2. Industry-centric nations and emerging economic powers will demonstrate disproportionately high GHG emissions relative to their population size. (H2)

3. Highly industrialized regions with smaller populations have higher per capita emissions compared to more populous but less industrialized regions. (H3)





---


## Data Wrangling

### Data cleaning for World GHG Emission Trends.

In [None]:
#Packages used
import numpy as np
import pandas as pd
import plotly.express as px

In [None]:
#Imported total emissions per country data file.
total_ghg = pd.read_csv("https://ourworldindata.org/grapher/total-ghg-emissions.csv?v=1&csvType=filtered&useColumnShortNames=false&tab=table&time=1993..latest",
                        storage_options = {'User-Agent': 'Our World In Data data fetch/1.0'})

In [None]:
#Exploratory data analysis (EDA).
total_ghg.info()
total_ghg.duplicated().sum()
total_ghg.head()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6665 entries, 0 to 6664
Data columns (total 4 columns):
 #   Column                                              Non-Null Count  Dtype  
---  ------                                              --------------  -----  
 0   Entity                                              6665 non-null   object 
 1   Code                                                6107 non-null   object 
 2   Year                                                6665 non-null   int64  
 3   Annual greenhouse gas emissions in CO₂ equivalents  6665 non-null   float64
dtypes: float64(1), int64(1), object(2)
memory usage: 208.4+ KB


Unnamed: 0,Entity,Code,Year,Annual greenhouse gas emissions in CO₂ equivalents
0,Afghanistan,AFG,1993,11803789.0
1,Afghanistan,AFG,1994,12282003.0
2,Afghanistan,AFG,1995,13318770.0
3,Afghanistan,AFG,1996,15723104.0
4,Afghanistan,AFG,1997,18583560.0


In [None]:
#Checking which rows have nulls and whether they would be useful.
total_ghg[total_ghg['Code'].isna()]

Unnamed: 0,Entity,Code,Year,Annual greenhouse gas emissions in CO₂ equivalents
31,Africa,,1993,3.123851e+09
32,Africa,,1994,3.395308e+09
33,Africa,,1995,3.460717e+09
34,Africa,,1996,3.541379e+09
35,Africa,,1997,3.588715e+09
...,...,...,...,...
6381,Upper-middle-income countries,,2019,2.420739e+10
6382,Upper-middle-income countries,,2020,2.370056e+10
6383,Upper-middle-income countries,,2021,2.464715e+10
6384,Upper-middle-income countries,,2022,2.470495e+10


In [None]:
#Shortened long column names and renamed column appropriately.
total_ghg = total_ghg.rename(columns = {'Annual greenhouse gas emissions in CO₂ equivalents' : 'AnnualGhgEmmiss(tonnes)',
                                        'Entity' : 'Country'})

### Continental GHG Emission (code)

In [None]:
#Created a pivot dataframe of continents to check continental overall trends of Ghg emissions between (1993-2023).
grouped_ghg = total_ghg[total_ghg['Code'].isna() &
                        total_ghg['Country'].isin(['Africa',
                                                  'Asia',
                                                  'Europe',
                                                  'Oceania',
                                                  'South America',
                                                  'North America'])].groupby(['Country', 'Year'])['AnnualGhgEmmiss(tonnes)'].sum().reset_index()

#Initial check of data before plotting continental trends.
pivoted_ghg = grouped_ghg.pivot(index='Country', columns='Year', values='AnnualGhgEmmiss(tonnes)')
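
Once the data is in this wide shape, the overall change per continent can be read off by comparing the first and last year columns. A minimal sketch of that idea, using made-up numbers rather than the OWID figures; the frame `toy` and the variable `pct_change` are assumptions introduced only for illustration:

```python
import pandas as pd

# Hypothetical toy values, NOT the OWID figures.
toy = pd.DataFrame({
    'Country': ['Asia', 'Asia', 'Europe', 'Europe'],
    'Year': [1993, 2023, 1993, 2023],
    'AnnualGhgEmmiss(tonnes)': [100.0, 250.0, 80.0, 60.0],
})

# One row per continent, one column per year (same shape as pivoted_ghg).
toy_pivot = toy.pivot(index='Country', columns='Year', values='AnnualGhgEmmiss(tonnes)')

# Percentage change between the first and last year columns.
first_year, last_year = toy_pivot.columns.min(), toy_pivot.columns.max()
pct_change = (toy_pivot[last_year] - toy_pivot[first_year]) / toy_pivot[first_year] * 100
print(pct_change.round(1))
```

A positive value means emissions rose over the period and a negative value means they fell, which gives a single number per continent to set against H1.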

### Global Emissions over time (code)

In [None]:
#Dropping null and World rows as they will affect map graph plotting.
total_ghg_countries = total_ghg[(total_ghg['Country'] != 'World') & (~total_ghg['Code'].isna())]
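
H1 is phrased in terms of "most nations", so one way to check it directly would be to count how many countries emit less in their latest year than in their earliest. The sketch below shows that comparison on made-up data with the same column names as `total_ghg_countries`; the values and the names `endpoints`, `reduced` and `share_reduced` are assumptions for illustration, not results from this project:

```python
import pandas as pd

# Hypothetical toy values, NOT the OWID figures.
toy_countries = pd.DataFrame({
    'Country': ['A', 'A', 'B', 'B', 'C', 'C'],
    'Year': [1993, 2023, 1993, 2023, 1993, 2023],
    'AnnualGhgEmmiss(tonnes)': [10.0, 8.0, 5.0, 9.0, 7.0, 12.0],
})

# Emissions in each country's earliest and latest available year.
ordered = toy_countries.sort_values('Year')
endpoints = ordered.groupby('Country')['AnnualGhgEmmiss(tonnes)'].agg(['first', 'last'])

# A country counts as having reduced emissions if the latest value is lower.
reduced = endpoints['last'] < endpoints['first']
share_reduced = reduced.mean() * 100
print(f"{reduced.sum()} of {len(reduced)} countries reduced emissions ({share_reduced:.1f}%)")
```

Comparing only the two endpoint years is a simplification; it ignores what happens in between, so it complements the graphs rather than replacing them.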

### US_China vs World Emission Comparison (code)

In [None]:
#Filtered out USA and China.
not_us_china = total_ghg_countries[~total_ghg_countries['Country'].isin(['United States', 'China'])].drop(columns=['Code'])

In [None]:
#Filtered out all countries but USA and China.
us_china = total_ghg_countries[total_ghg_countries['Country'].isin(['United States', 'China'])].drop(columns=['Code'])

In [None]:
#To aggregate all other countries by continent, imported per_capita table that contains continent info.
#Will be used for further analysis.
per_capita = pd.read_csv("https://ourworldindata.org/grapher/co2-per-capita-vs-renewable-electricity.csv?v=1&csvType=filtered&useColumnShortNames=false&tab=table&time=1993..latest",
                         storage_options = {'User-Agent': 'Our World In Data data fetch/1.0'})

In [None]:
#EDA
per_capita.info()
per_capita.duplicated().sum()
per_capita.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8141 entries, 0 to 8140
Data columns (total 6 columns):
 #   Column                             Non-Null Count  Dtype  
---  ------                             --------------  -----  
 0   Entity                             8141 non-null   object 
 1   Code                               6964 non-null   object 
 2   Year                               8141 non-null   int64  
 3   Annual CO₂ emissions (per capita)  7158 non-null   float64
 4   Renewables - % electricity         6549 non-null   float64
 5   World regions according to OWID    271 non-null    object 
dtypes: float64(2), int64(1), object(3)
memory usage: 381.7+ KB


Unnamed: 0,Entity,Code,Year,Annual CO₂ emissions (per capita),Renewables - % electricity,World regions according to OWID
0,ASEAN (Ember),,2000,,19.347086,
1,ASEAN (Ember),,2001,,19.06632,
2,ASEAN (Ember),,2002,,17.664303,
3,ASEAN (Ember),,2003,,16.672487,
4,ASEAN (Ember),,2004,,15.700016,


In [None]:
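#Shortened and renamed columns.
#Note: the source column is CO₂ per capita; it is used here as a stand-in for per capita GHG emissions.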
per_capita = per_capita.rename(columns = {'Annual CO₂ emissions (per capita)' : 'PerCapGhgEmiss',
                                          'Entity' : 'Country',
                                          'World regions according to OWID' : 'Continent'})

In [None]:
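#Left joined to match not_us_china countries with correct continents.
#Continent is filled on only 271 of per_capita's 8141 rows, so a one-row-per-country
#lookup is built first; this avoids a many-to-many join on Country.
continent_lookup = (per_capita.dropna(subset=['Continent'])[['Country', 'Continent']]
                              .drop_duplicates('Country'))
not_us_china = not_us_china.merge(continent_lookup,
                        on=['Country'],
                        how='left')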
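
A join keyed only on `Country` can silently multiply rows when the right-hand table holds several rows per country. The sketch below illustrates the effect on made-up tables and shows how the `validate` argument of `DataFrame.merge` can guard against it; the frames `emissions`, `regions` and `lookup` are assumptions for illustration only:

```python
import pandas as pd

# Hypothetical toy tables, NOT the OWID data.
emissions = pd.DataFrame({
    'Country': ['France', 'France', 'Kenya'],
    'Year': [2022, 2023, 2023],
    'Emissions': [1.0, 2.0, 3.0],
})
# The region is filled in on only one row per country, as in per_capita.
regions = pd.DataFrame({
    'Country': ['France', 'France', 'Kenya'],
    'Year': [2022, 2023, 2023],
    'Continent': [None, 'Europe', 'Africa'],
})

# Joining on Country alone pairs every left row with every right row of that country.
naive = emissions.merge(regions[['Country', 'Continent']], on='Country', how='left')
print(len(emissions), len(naive))

# Reduce the right-hand side to one row per country first.
lookup = regions.dropna(subset=['Continent'])[['Country', 'Continent']].drop_duplicates('Country')

# validate='many_to_one' raises a MergeError if lookup ever has duplicate countries.
checked = emissions.merge(lookup, on='Country', how='left', validate='many_to_one')
print(len(checked))
```

The checked merge keeps the row count of the left table, so any later sum over emissions cannot be inflated by duplicated rows.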

In [None]:
#Summed rest of world data and grouped by continent.
not_us_china_continents = not_us_china.groupby(['Continent', 'Year'])['AnnualGhgEmmiss(tonnes)'].sum().reset_index()

In [None]:
#Renamed columns for easier union with us_china dataframe.
not_us_china_continents = not_us_china_continents.rename(columns ={'Continent' : 'Country'})

In [None]:
#Combined the us_china table and the rest-of-world table row-wise.
trend = pd.concat([not_us_china_continents, us_china], ignore_index=True)

### Population against Per Capita Emission (code)


In [None]:
#Filtered for per capita emissions per continent.
per_capita_continents = per_capita[per_capita['Country'].isin(['Africa',
                                       'Asia',
                                       'Europe',
                                       'Oceania',
                                       'South America',
                                       'North America'])].sort_values('PerCapGhgEmiss', ascending = False)

In [None]:
#Dropped unneeded columns
per_capita_graph = per_capita_continents.drop(columns =['Continent',
                                                        'Renewables - % electricity',
                                                        'Code']).rename(columns={'Country': 'Continent'})

In [None]:
#Imported data on population per country.
pop = pd.read_csv("https://ourworldindata.org/grapher/population.csv?v=1&csvType=filtered&useColumnShortNames=true&tab=table&time=latest",
                  storage_options = {'User-Agent': 'Our World In Data data fetch/1.0'})

In [None]:
#EDA
pop.info()
pop.duplicated().sum()
pop.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 260 entries, 0 to 259
Data columns (total 4 columns):
 #   Column                 Non-Null Count  Dtype 
---  ------                 --------------  ----- 
 0   Entity                 260 non-null    object
 1   Code                   239 non-null    object
 2   Year                   260 non-null    int64 
 3   population_historical  260 non-null    int64 
dtypes: int64(2), object(2)
memory usage: 8.2+ KB


Unnamed: 0,Entity,Code,Year,population_historical
0,Afghanistan,AFG,2023,41454762
1,Africa,,2023,1480770740
2,Africa (UN),,2023,1480770527
3,Albania,ALB,2023,2811660
4,Algeria,DZA,2023,46164221


In [None]:
#Renamed columns to streamline the merge with the per capita data below.
pop = pop.rename(columns = {'population_historical' : 'Population', 'Entity' : 'Continent'}).drop(columns ='Code')

In [None]:
#Merged population and per capita emission data for plotting.
per_capita_pop_graph = per_capita_graph.merge(pop,
                       on= ['Continent', 'Year'],
                       how='left')
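
Because the population file was requested for the latest year only, a left join on both `Continent` and `Year` will leave the population empty for every earlier year. A small sketch, on made-up tables, of how the `indicator` argument of `DataFrame.merge` can make those unmatched rows visible; the frames `per_capita_toy`, `pop_toy` and `plottable` are assumptions for illustration:

```python
import pandas as pd

# Hypothetical toy tables, NOT the OWID data.
per_capita_toy = pd.DataFrame({
    'Continent': ['Africa', 'Africa', 'Europe'],
    'Year': [2022, 2023, 2023],
    'PerCapGhgEmiss': [1.0, 1.1, 7.0],
})
# Population is available for the latest year only.
pop_toy = pd.DataFrame({
    'Continent': ['Africa', 'Europe'],
    'Year': [2023, 2023],
    'Population': [1_400, 740],
})

# indicator=True adds a _merge column saying where each row found a match.
merged = per_capita_toy.merge(pop_toy, on=['Continent', 'Year'], how='left', indicator=True)
print(merged['_merge'].value_counts())

# Keep only rows that have a population value before plotting.
plottable = merged[merged['_merge'] == 'both'].drop(columns='_merge')
```

Filtering on `_merge` makes it explicit that only the latest year reaches the scatter plot, instead of relying on the plotting library to skip missing values.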

### Data Calculations

In [None]:
#Subset trend table to collect values for 2023
trend_2023 = trend[trend['Year'] == 2023]

In [None]:
#Subset for only Asia's and China's values
asia = trend_2023[trend_2023['Country'] == 'Asia']['AnnualGhgEmmiss(tonnes)']
china = trend_2023[trend_2023['Country'] == 'China']['AnnualGhgEmmiss(tonnes)']

In [None]:
#Added China's total GHG back to the rest of Asia and calculated China's percentage.
asia_total = np.array(asia) + np.array(china)
(china/asia_total) * 100

Unnamed: 0,AnnualGhgEmmiss(tonnes)
216,46.583235
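
The percentage above is a part-over-whole share: one country's emissions divided by the sum of that country and the rest of its continent. A minimal sketch of the same arithmetic on made-up numbers; `share_of_total` is a hypothetical helper and is not part of this notebook:

```python
import pandas as pd

def share_of_total(part: float, rest: float) -> float:
    """Return part as a percentage of (part + rest)."""
    return part / (part + rest) * 100

# Hypothetical one-row Series, shaped like the 2023 subsets (NOT the OWID figures).
country = pd.Series([30.0], index=[216])
rest_of_continent = pd.Series([10.0], index=[12])

# .item() pulls out the single scalar, so differing index labels cannot misalign.
country_share = share_of_total(country.item(), rest_of_continent.item())
print(country_share)
```

Working with scalars also returns a plain number instead of a one-row Series, which is easier to quote in the written analysis.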


In [None]:
#Subset for only North America's and US's values
north_america = (trend_2023[trend_2023['Country'] == 'North America']['AnnualGhgEmmiss(tonnes)'])
us = (trend_2023[trend_2023['Country'] == 'United States']['AnnualGhgEmmiss(tonnes)'])

In [None]:
#Add both above and calculate US's percentage.
north_america_total = np.array(north_america) + np.array(us)
(us/north_america_total)* 100

Unnamed: 0,AnnualGhgEmmiss(tonnes)
247,75.15111


In [None]:
#Calculate India's population percentage
(np.array(pop[pop['Continent'] == 'India']['Population'])/
 np.array(pop[pop['Continent'] == 'World']['Population'])) *100

array([17.77207989])

In [None]:
#Calculate China's population percentage
(np.array(pop[pop['Continent'] == 'China']['Population'])/
 np.array(pop[pop['Continent'] == 'World']['Population'])) *100

array([17.58071598])
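
The two population shares can be added to give the combined share of world population held by India and China. A short check of that arithmetic, using the two values printed in the cells above; the variable names are introduced here only for illustration:

```python
# Population shares printed by the cells above (percent of world population).
india_share = 17.77207989
china_share = 17.58071598

# Combined share of world population for the two countries.
combined_share = india_share + china_share
print(round(combined_share, 2))
```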

# Data Analysis




---


Due to the massive volume of data, visualizations will be the primary tool for analysis, as they are interactive and provide an effective way to summarize the data and make meaningful observations.

### Continental GHG Emission (graph)




---


Contrary to H1, the data reveals that increased climate change awareness has not led to significant emission reductions. Most continents have maintained similar emission levels, directly challenging the first hypothesis. The graph shows no substantial decline in emissions, suggesting that awareness alone has not translated into meaningful action. A possible silver lining is that awareness may have prevented further emission increases, even though efforts to reduce emissions have been slow or ineffective.

Asia’s high emissions are noteworthy; however, due to its vast and varied population, larger countries could be overshadowing smaller emitters, skewing the totals and necessitating a deeper dive into the data.


In [None]:
# @title
#Assigns line label to the 2022 interval.
grouped_ghg['TextLabel'] = grouped_ghg.apply(lambda row: row['Country'] if row['Year'] == 2022 else None, axis=1)

grouped_ghg = grouped_ghg.sort_values('AnnualGhgEmmiss(tonnes)', ascending=True)     #Sorts rows by GHG emissions, lowest first.

color_discrete_map = {        #Configures colours.
    'Asia': 'firebrick',
    'Africa': 'lightslategrey',
    'Europe': 'orange',
    'Oceania': 'lightslategrey',
    'South America' : 'grey',
    'North America' :'red'

}

fig = px.area(             #Plots an area graph.
    grouped_ghg,           #Data being plotted
    x='Year',
    y='AnnualGhgEmmiss(tonnes)',
    color='Country',
    title='Annual GHG Emissions by Continent (1993 - 2023)',
    labels={'AnnualGhgEmmiss(tonnes)': 'Annual GHG Emissions (tonnes)', 'Year': 'Year'},
    color_discrete_map= color_discrete_map,
    text ='TextLabel'
)
fig.update_traces(line=dict(width=2),
                  textposition='top left', #Label positioning
                  textfont=dict(size=14),   #Label font configuration
                  mode='lines+text',         #Removes dots at every year interval
                  hovertemplate='<b>Year:</b> %{x}<br>' +
                  '<b>Annual GHG Emissions (tonnes):</b> %{y}<br>' +
                  '<extra></extra>')         #Ensures only the year and emission data is shown in the pop-up.


fig.update_layout(
    showlegend = False,
    template='plotly_white',
    title= dict(
        text= 'Annual GHG Emissions by Continent (1993 - 2023)',
        x= 0.5) # Center the title horizontally
)
fig.show()



### Global Emissions over time (graph)




---
Gaining a more nuanced understanding of country-level emissions is crucial. Thus, a color-coded map was employed to illustrate world trends across the 30-year period.

The data confirms that a few countries disproportionately impact their continent's total emissions:

1. In North America, the United States leads as the only country on the continent producing emissions in the billions of tonnes. Furthermore, the continent's three most populous countries contribute disproportionately to the region’s emissions.

2. A similar situation is seen in South America, with Brazil producing GHG output in the billions of tonnes compared with the millions produced by the rest of the continent.

3. In Asia, China dominates emissions, producing nearly half (about 47% in 2023) of the continent's total GHG output.

The map reveals that populous, industrial and emerging economic powers dominate global emissions, supporting H2 about disproportionate GHG production relative to population size.
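
One way to put a number on "disproportionate relative to population size" is to compare each country's share of emissions with its share of population. The sketch below does this on made-up data; the frame `toy` and the column `ShareRatio` are assumptions for illustration and do not come from this project's tables:

```python
import pandas as pd

# Hypothetical toy values, NOT the OWID figures.
toy = pd.DataFrame({
    'Country': ['A', 'B', 'C'],
    'Emissions': [60.0, 30.0, 10.0],
    'Population': [20, 30, 50],
})

# Each country's percentage share of the totals.
toy['EmissionShare'] = toy['Emissions'] / toy['Emissions'].sum() * 100
toy['PopulationShare'] = toy['Population'] / toy['Population'].sum() * 100

# Ratio above 1: the country emits more than its share of population would suggest.
toy['ShareRatio'] = toy['EmissionShare'] / toy['PopulationShare']
print(toy[['Country', 'ShareRatio']])
```

A ratio well above 1 would support H2 for that country, while a ratio below 1 would indicate emissions that are low for its population.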

In [None]:
# @title
#Map graph to show GHG emissions per country across the world.

fig = px.choropleth(
    total_ghg_countries,
    locations='Country',  # Country names or codes
    locationmode='country names',  # Ensure proper matching of country names to my data table
    color='AnnualGhgEmmiss(tonnes)',
    hover_name='Country',
    animation_frame='Year',  # Adds animation over years
    title='GHG Emissions Trends Over Time (1993-2023)',
    labels={'AnnualGhgEmmiss(tonnes)': 'Annual GHG Emissions (tonnes)'},
    color_continuous_scale='OrRd'  # Color scale to signify dangerous impact
)

#Configure map colours
fig.update_layout(
    template='ggplot2',
    title= {
        'text': 'GHG Emissions Trends Over Time (1993-2023)',
        'x': 0.47,  # Center the title horizontally
        'xanchor': 'center',  # Align the center of the title with the center of the plot
        'y': 0.95,  # Position slightly below the top
        'yanchor': 'top'  # Anchor the title at the top of the plot
    },
    coloraxis_colorbar=dict(
        title = 'Emissions (tonnes)',
        title_font = dict(size=14)
  )
)


# Adjust transparency
fig.update_traces(marker=dict(opacity=0.8))
fig.show()

### US_China vs World Emission Comparison (graph)



---
A closer analysis reveals that a few countries emit far more than their regional counterparts. In North America, the United States dominates, responsible for about 75% of the continent's GHG emissions in 2023. Similarly, China’s emissions are so significant that they skew Asia’s total, raising the continent to the top emitter globally.

When removing the United States and China from their respective continental totals:

1. North America’s emissions drop from 7.8 billion tonnes to 1.9 billion tonnes, lowering its global ranking from 2nd to 5th place.

2. Remarkably, China's emissions exceed the combined total of all continents, excluding Asia.

However, even when China's emissions are excluded, Asia remains the highest emitting continent. Interestingly, China and India together contain about 35% of Earth's population, yet China alone was responsible for nearly half of Asia’s emissions. This stark disparity underscores the significant role of large industrial nations, particularly those in Asia and North America, in driving global emissions.


Europe also emerges as the 2nd-highest emitting continent, despite having a smaller population than Africa and South America combined. This highlights that high emissions are concentrated in industrialized regions, often led by so-called “first-world” nations, which, while championing climate action, remain key contributors to the problem.

These findings illustrate how a few industrial powerhouses can dramatically influence their continent's total emissions, directly supporting H2.

In [None]:
# @title
#Assigns line label to the 2023 interval.
trend['TextLabel'] = trend.apply(lambda row: row['Country'] if row['Year'] == 2023 else None, axis=1)

fig = px.line(
    trend,
    x='Year',
    y='AnnualGhgEmmiss(tonnes)',
    text ='TextLabel',
    color='Country',
    title='Trends in Total GHG Emissions for 2023 Top Emitters (1993-2023)',
    labels={'AnnualGhgEmmiss(tonnes)': 'Annual GHG Emissions (tonnes)', 'Year': 'Year'}
)

# Graph customisation.
fig.update_traces(line=dict(width=1.5),    #Thin lines
                  textposition='top center', #Label positioning
                  textfont=dict(size=10),   #Label font configuration
                  mode='lines+text',         #Removes dots at every year interval
                  hovertemplate='<b>Year:</b> %{x}<br>' +
                  '<b>Annual GHG Emissions (tonnes):</b> %{y}<br>' +
                  '<extra></extra>')    #Ensures only the year and emission data is shown in the pop-up.

fig.update_layout(
    showlegend= False,
    title=dict(
        text='China and USA Stand out in Total GHG Emissions (1993-2023)',
        x=0.5,  # Centers the title horizontally
        xanchor='center',  # Aligns the center of the title with the center of the plot
        font=dict(size=20)
    ),
    template='plotly_white',  #Customizes the graph itself
    font=dict(size=14),
    height=600,
    width=1300
)

fig.show()


### Population against Per Capita Emission (graph)



---

The per capita emissions graph provides compelling evidence for H3:

1. North America leads in per capita emissions.
2. Oceania ranks 2nd in per capita emissions despite being the least populous continent, highlighting industrialized regions' high individual impact.
3. Africa, the 2nd most populous continent, has the lowest emissions per person, suggesting an inverse relationship between population and per capita emissions.

This analysis helps adjust for the skew caused by population size in total emissions data. It illustrates that high emissions are not solely tied to large populations but also to industrial practices, affluence, and lifestyle factors.
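
The adjustment described here is a simple division: per capita emissions are total emissions divided by population. A minimal sketch on made-up numbers shows how a ranking can flip once population is taken into account; the frame `toy` and its values are assumptions for illustration, not figures from this project:

```python
import pandas as pd

# Hypothetical toy values, NOT the OWID figures.
toy = pd.DataFrame({
    'Continent': ['Large', 'Small'],
    'TotalEmissions': [4_000.0, 1_500.0],  # tonnes
    'Population': [2_000, 100],
})

# Per capita emissions = total emissions / population.
toy['PerCapita'] = toy['TotalEmissions'] / toy['Population']

# Rank the same continents by total and by per capita emissions.
by_total = toy.sort_values('TotalEmissions', ascending=False)['Continent'].tolist()
by_per_capita = toy.sort_values('PerCapita', ascending=False)['Continent'].tolist()
print(by_total)
print(by_per_capita)
```

In this toy case the larger region emits more in total, while the smaller region emits far more per person, which is the pattern H3 describes.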

In [None]:
# @title
custom_color_scale = [
    [0, 'rgba(255, 182, 182, 1)'],  # Light red with full opacity
    [0.5, 'rgba(215, 58, 58, 1)'],  # Medium red
    [1, 'rgba(138, 3, 3, 1)']       # Dark red (burnt auburn)
]

fig = px.scatter(
    per_capita_pop_graph,
    x= 'Population',
    y= 'PerCapGhgEmiss',
    text= 'Continent',
    size='PerCapGhgEmiss',
    title='Continental Per Capita Emissions (2023)',
    labels={'PerCapGhgEmiss': 'Per Capita CO₂ Emissions (tonnes)'},
    color = 'PerCapGhgEmiss',
    color_continuous_scale= custom_color_scale,
    template='plotly_white'
)


fig.update_traces(textposition='top center', #Label positioning
                  textfont=dict(size=10),
                  marker=dict(           #Puts borders around points to make them visible on the background
        line=dict(
            color='black',  #Border color
            width=1  #Border width
        )
    )
)


fig.update_layout(
    title=dict(
        text='Continental Per Capita Emissions (2023)',
        x=0.5,             #Centers the title horizontally
        xanchor='center',  #Aligns the center of the title with the center of the plot
        font=dict(size=20)
    ),
    template='plotly_white',  #Customizes the graph itself
    font=dict(size=14),
    height = 680,
    width = 900
)


fig.show()

# Future research
---

Future research could investigate how these high-emitting continents are working to mitigate their emissions. Are their efforts supported by their citizens and governments? Additionally, it would be worth studying which countries are most affected by climate change’s consequences, especially those contributing the least to GHG emissions, to identify how they can be better supported.


# Conclusion
---


The data highlights key insights:

- While some regions and countries contribute significantly more than others, no continent shows a substantial reduction in emissions over the period.
- China and the United States stand out as disproportionate contributors, overshadowing their respective continents.
- Per capita emissions provide a more nuanced view, revealing stark contrasts in emissions based on industrialization and lifestyle.
- This analysis serves as a warning: without collective global action, the consequences of climate inaction will impact all nations, big or small, rich or poor. While no country is “innocent,” the focus must remain on equitable and effective measures to protect our planet for future generations.