# CO2EconomyInsights

## Introduction
Welcome to CO2EconomyInsights! This project is an in-depth analysis aimed at uncovering the intricate relationships between CO2 emissions, economic activity, population growth, and temperature variations. Climate change, marked significantly by increasing CO2 emissions, is one of the most pressing issues humanity faces today. Our focus is on data spanning from 1990 to 2020, delving into the contributing factors, the varying roles of different countries and regions, and the overarching trends that define this critical period in environmental change.

We believe that understanding the dynamics of CO2 emissions is essential to formulating effective policies, raising awareness, and driving action to mitigate the impacts of climate change. This project strives to bring clarity to these dynamics, offering insights that are accessible, comprehensible, and actionable.

## Objective
We analyze key trends and correlations across countries and regions, focusing on CO2 emissions in relation to GDP, population, and temperature. Our goal is to identify leading emission contributors, study the impact of economic and demographic factors, and understand temperature-related changes.

## Data Cleaning
The data used in this analysis was preprocessed for accuracy and consistency in a dedicated module, `clean_data.py`. Preprocessing includes handling missing values, normalizing formats, and checking the relevance and reliability of each dataset (CO2 emissions, GDP, population, and temperature). See the module itself for the detailed steps.
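
The kind of preprocessing described above might look roughly like the sketch below. It is not the actual content of `clean_data.py`: the function name `clean_emissions`, the specific rules (strip whitespace, coerce to numeric, drop incomplete rows, keep 1990-2020), and the toy input values are all assumptions made for illustration. Only the column names come from the cells in this document.

```python
import pandas as pd


def clean_emissions(raw: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical cleaning step; the real logic lives in clean_data.py."""
    df = raw.copy()
    # Normalize formats: trim country names and coerce numeric columns
    df["Country Name"] = df["Country Name"].str.strip()
    df["Year"] = pd.to_numeric(df["Year"], errors="coerce")
    df["CO2 Emissions"] = pd.to_numeric(df["CO2 Emissions"], errors="coerce")
    # Handle missing values: drop rows without a year or an emissions figure
    df = df.dropna(subset=["Year", "CO2 Emissions"])
    # Keep only the study period (1990-2020)
    df = df[df["Year"].between(1990, 2020)]
    return df.astype({"Year": int}).reset_index(drop=True)


# Toy input with made-up values, only to exercise the function
raw = pd.DataFrame({
    "Country Name": [" Albania", "Albania", "Algeria ", "Algeria"],
    "Year": ["1989", "1990", "1990", "1991"],
    "CO2 Emissions": ["1.0", "2.0", "n/a", "3.0"],
})
cleaned = clean_emissions(raw)
print(cleaned)
```

In this toy run the 1989 row falls outside the study period and the `"n/a"` row has no usable emissions value, so two rows remain.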

## Global CO2 Emissions Trends (1990-2020)
This section examines the evolution of global CO2 emissions over three decades. We aim to visualize and understand the global trends and significant changes during this period.

### Data Loading and Preparation

In [1]:
# Import pandas module for data manipulation
import pandas as pd

# Load cleaned CO2 data
co2_data = pd.read_csv("data/clean_data/cleaned_co2_data.csv")

# Sum the CO2 emissions for each year 
annual_global_emissions = co2_data.groupby("Year")["CO2 Emissions"].sum().reset_index()
print(annual_global_emissions.head(10))

   Year  CO2 Emissions
0  1990   2.034678e+07
1  1991   2.048748e+07
2  1992   2.051612e+07
3  1993   2.065387e+07
4  1994   2.077113e+07
5  1995   2.135832e+07
6  1996   2.181230e+07
7  1997   2.219552e+07
8  1998   2.231532e+07
9  1999   2.242669e+07


### Line Chart

In [2]:
# Import plotly for interactive visualizations
import plotly.express as px

# Create a line chart showing the global yearly CO2 emissions over time
fig = px.line(annual_global_emissions, 
              x="Year", 
              y="CO2 Emissions", 
              title="Global CO2 Emissions Trend (1990-2020)",
              labels={"CO2 Emissions": "Total CO2 Emissions (metric tons)",
                      "Year": "Year"},
              line_shape="spline",
              hover_data={"CO2 Emissions": ':.2s'})

# Customizing line color and adding markers
fig.update_traces(line=dict(color='#b30000', width=2), mode='lines+markers')

# Show line chart
fig.show()

Global CO2 emissions increased over the period, most steeply from the early 2000s until around 2010. After 2010 the growth rate appears to plateau, with a slight decrease late in the decade, which could reflect global efforts to reduce emissions or economic factors that influenced these trends.

The decreases in global CO2 emissions observed in 2009 and 2020 coincide with the global financial crisis and the COVID-19 pandemic, respectively, each leading to reduced industrial activity and economic downturns.
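
One way to make such dips explicit is to compute the year-over-year percentage change and list the years where it is negative. The sketch below does this for the ten rows printed in the data-loading output above (1990-1999). The column name `YoY Change (%)` is introduced here for illustration. Applying the same two lines to the full `annual_global_emissions` frame would be the natural next step.

```python
import pandas as pd

# First ten rows of annual_global_emissions, copied from the printed output
annual = pd.DataFrame({
    "Year": range(1990, 2000),
    "CO2 Emissions": [2.034678e7, 2.048748e7, 2.051612e7, 2.065387e7,
                      2.077113e7, 2.135832e7, 2.181230e7, 2.219552e7,
                      2.231532e7, 2.242669e7],
})

# Percentage change relative to the previous year (first year is NaN)
annual["YoY Change (%)"] = annual["CO2 Emissions"].pct_change() * 100

# Years in which emissions fell compared with the year before
declines = annual[annual["YoY Change (%)"] < 0]

print(annual.round(2))
print("Years with a decline:", declines["Year"].tolist())
```

In this 1990-1999 excerpt emissions rise every year, so the list of declining years is empty.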

## CO2 Emissions Geographical Distribution
Here, we explore CO2 emissions from a geographical standpoint, visualizing total emissions by country. 
This provides insights into the global distribution of emissions and identifies the top contributing countries.

### Data Processing for Map Visualization

In [3]:
# Sum CO2 emissions for each country across all years
country_total_emissions = co2_data.groupby("Country Name")["CO2 Emissions"].sum().reset_index()
print(country_total_emissions.head(10))

          Country Name  CO2 Emissions
0          Afghanistan   1.494529e+05
1              Albania   1.203406e+05
2              Algeria   3.278598e+06
3              Andorra   1.503160e+04
4               Angola   5.723800e+05
5  Antigua and Barbuda   1.154570e+04
6            Argentina   4.528824e+06
7              Armenia   1.771098e+05
8            Australia   1.087684e+07
9              Austria   2.031100e+06


### World Map Visualization


In [4]:
# Create an interactive world map of CO2 emissions by country
fig = px.choropleth(country_total_emissions,
                    locations="Country Name",
                    locationmode="country names",
                    color="CO2 Emissions",
                    hover_name="Country Name",
                    color_continuous_scale=px.colors.diverging.RdYlGn_r,
                    title="Total CO2 Emissions by Country (1990-2020)",
                    hover_data={"CO2 Emissions": ':.2s'})

# Show the visualization
fig.show()

The map presents total CO2 emissions by country from 1990 to 2020, with color indicating the level of emissions. Because the reversed `RdYlGn` scale is used, countries shaded toward the red end are the highest emitters. The map shows a clear distribution of global emissions, with the largest contributions coming from the United States and China.

## Top 10 CO2 Emissions Contributors
We identify the top 10 countries with the highest cumulative CO2 emissions from 1990 to 2020 and chart their total emissions over this time period to illustrate their significant roles in global emissions.

### Data Preparation

In [5]:
# Identify the top 10 emitting countries over the entire period
top_emitters = co2_data.groupby('Country Name')['CO2 Emissions'].sum().nlargest(10).reset_index()
print(top_emitters.head(10))

         Country Name  CO2 Emissions
0               China    191755756.2
1       United States    162638318.1
2  Russian Federation     51646893.1
3               India     42045842.0
4               Japan     36090686.7
5             Germany     24971744.6
6              Canada     15939392.3
7      United Kingdom     15131217.2
8         Korea, Rep.     14948880.4
9  Iran, Islamic Rep.     13552515.5


### Bar Chart

In [6]:
# Create the bar chart visualization for the sum of CO2 emissions from 1990 to 2020
fig = px.bar(top_emitters,
             x='Country Name',
             y='CO2 Emissions',
             color="CO2 Emissions",
             color_continuous_scale=px.colors.diverging.RdYlGn_r,
             title='Total CO2 Emissions from 1990 to 2020 for Top 10 Emitting Countries',
             labels={'CO2 Emissions':'Total CO2 Emissions (metric tons)'},
             text='CO2 Emissions',
             hover_data={"CO2 Emissions": ':.2s'})

# Customize the layout for better readability
fig.update_layout(xaxis_tickangle=-45, 
                  yaxis=dict(title='Total CO2 Emissions (metric tons)'),
                  coloraxis_showscale=False)
fig.update_traces(texttemplate='%{text:.2s}', textposition='outside')

# Show bar chart
fig.show()

The bar chart displays total CO2 emissions for the top 10 emitting countries from 1990 to 2020, highlighting the significant lead in emissions by China and the United States compared to others.

## Correlation Between GDP, Population, and CO2 Emissions
This analysis delves into the interconnected relationship between economic growth (GDP), population size, and CO2 emissions. It aims to unravel the intricate dynamics of how both economic development and demographic trends influence the environmental impact.

### Data Preparation and Analysis
We combined GDP, population, and CO2 emissions data by country to find out how they're related. By calculating their correlations, we understand the link between economic growth, population size, and CO2 emissions. We also split the data using the median population to compare smaller and larger countries.

In [7]:
from functools import reduce

# Load cleaned GDP, Population and Country data
gdp_data = pd.read_csv("data/clean_data/cleaned_gdp_data.csv")
pop_data = pd.read_csv("data/clean_data/cleaned_pop_data.csv")
country_data = pd.read_csv("data/clean_data/cleaned_country_data.csv")

# Average GDP, Population and CO2 values over all years for each 'Country Code'
gdp_mean = gdp_data.groupby("Country Code")["GDP"].mean().reset_index()
pop_mean = pop_data.groupby("Country Code")["Population"].mean().reset_index()
co2_mean = co2_data.groupby("Country Code")["CO2 Emissions"].mean().reset_index()

# List of dataframes to merge
dataframes = [gdp_mean, pop_mean, co2_mean, country_data]

# Merge the GDP, Population, CO2 and Country data with reduce
total_data = reduce(lambda left, right: pd.merge(left, right, on='Country Code', suffixes=('', '_right')), dataframes)

# Calculate GDP vs CO2 Emissions Correlation
gdp_co2_corr = round(total_data["GDP"].corr(total_data["CO2 Emissions"]), 2)
print(f"Correlation coefficient between GDP and CO2 Emissions: {gdp_co2_corr}")

# Calculate Population vs CO2 Emissions Correlation
pop_co2_corr = round(total_data["Population"].corr(total_data["CO2 Emissions"]), 2)
print(f"Correlation coefficient between Population and CO2 Emissions: {pop_co2_corr}")

# Splitting data between small and big countries based on median population for all countries
median_population = round(total_data["Population"].median()) # Using median value will split the countries evenly
small_countries_data = total_data[total_data["Population"] < median_population]
big_countries_data = total_data[total_data["Population"] >= median_population]

print(f"Median population is: {median_population}")


Correlation coefficient between GDP and CO2 Emissions: 0.84
Correlation coefficient between Population and CO2 Emissions: 0.75
Median population is: 7255446


Our analysis revealed strong correlations in the data: a correlation coefficient of **0.84** between **GDP** and **CO2 Emissions** indicates a significant link between economic growth and increased carbon emissions. Additionally, a correlation coefficient of **0.75** between **Population** and **CO2 Emissions** suggests that larger population sizes also contribute notably to higher emissions. The **median population** across the dataset is **7,255,446**, which we used to differentiate between countries with smaller and larger populations for more detailed insights.
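
For readers unfamiliar with the coefficient reported here: `Series.corr` computes the Pearson correlation by default. The sketch below spells the formula out by hand and checks it against pandas on made-up toy values. The helper name `pearson` and the numbers are illustrative only. The last line also computes the correlation on log-transformed values, since a few very large economies can dominate a Pearson coefficient on raw values. That comparison is a suggestion, not something the original analysis performed.

```python
import numpy as np
import pandas as pd


def pearson(x: pd.Series, y: pd.Series) -> float:
    """Pearson correlation: covariance divided by the product of the spreads."""
    dx = x - x.mean()
    dy = y - y.mean()
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))


# Toy values (not real data); the last row is a deliberate outlier
toy = pd.DataFrame({
    "GDP": [1.0, 2.0, 3.0, 4.0, 10.0],
    "CO2 Emissions": [2.0, 1.0, 4.0, 3.0, 12.0],
})

manual = pearson(toy["GDP"], toy["CO2 Emissions"])
builtin = toy["GDP"].corr(toy["CO2 Emissions"])  # pandas default is Pearson
log_corr = np.log10(toy["GDP"]).corr(np.log10(toy["CO2 Emissions"]))

print(f"manual={manual:.4f}, builtin={builtin:.4f}, log-log={log_corr:.4f}")
```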

### Bubble Charts
We use bubble charts to visualize how GDP and CO2 emissions relate to population size. One chart shows countries with smaller populations and the other shows those with larger populations, helping us see how economic size and population impact CO2 emissions.

In [9]:
from helper_functions import create_bubble_chart

# Create bubble chart for small countries data
data_label = "Small Countries"
xaxis_title = "Average GDP (in USD - log scale)"
yaxis_title = "Average CO2 Emissions (in metric tons - log scale)"
size = "Population"
create_bubble_chart(small_countries_data, data_label=data_label, xaxis_title=xaxis_title, yaxis_title=yaxis_title, size=size)

# Create bubble chart for big countries data
data_label = "Big Countries"
create_bubble_chart(big_countries_data, data_label=data_label, xaxis_title=xaxis_title, yaxis_title=yaxis_title, size=size)


The bubble charts show that higher GDP and larger populations correlate with greater CO2 emissions for both big and small countries from 1990 to 2020. Big countries with high incomes emit more CO2, while smaller economies contribute less. Log scales on both axes make it easy to compare across a broad range of values.

## GDP vs CO2 Emissions per Capita
This analysis examines the relationship between a country's wealth (GDP) and its CO2 emissions on a per capita basis, providing insight into how economic prosperity aligns with individual carbon footprints.

### Data Preparation and Analysis

To explore this relationship, we calculated the CO2 emissions per capita by dividing the total emissions by the population size. We then assessed how this figure correlates with GDP to determine if higher wealth translates to higher per-person emissions.

In [13]:
# Create emissions per capita column
total_data["CO2 Emissions per Capita"] = total_data["CO2 Emissions"] / total_data["Population"]

# Calculate GDP vs CO2 Emissions per Capita Correlation
gdp_co2_per_capita_corr = round(total_data["GDP"].corr(total_data["CO2 Emissions per Capita"]), 2)
print(f"Correlation coefficient between GDP and CO2 Emissions per Capita: {gdp_co2_per_capita_corr}")


Correlation coefficient between GDP and CO2 Emissions per Capita: 0.26


### Scatter Plot

A scatter plot was created to visualize this correlation, with GDP on a log scale to accommodate the wide range of economic sizes.

In [14]:
# Create scatter plot
data_label = "All Countries"
xaxis_title = "Average GDP (in USD - log scale)"
yaxis_title = "Average CO2 Emissions per Capita (in metric tons - log scale)"
y = "CO2 Emissions per Capita"
create_bubble_chart(total_data, data_label=data_label, xaxis_title=xaxis_title, yaxis_title=yaxis_title, y=y)

The correlation coefficient of **0.26** and the dispersion of data points in the scatter plot suggest a weak relationship between a country's GDP and its per capita CO2 emissions, indicating that higher wealth does not necessarily lead to higher individual carbon emissions.

## CO2 Emissions and Global Average Temperature Trend

This section explores the trend between global CO2 emissions and average temperatures, assessing their potential connection and how they may jointly influence climate patterns.

### Data Preparation and Analysis
Average yearly temperatures and CO2 emissions data were synchronized from 1990 to 2013 to ensure comparability. We then calculated the correlation between these two variables to quantify their relationship. The analysis yielded a correlation coefficient of **0.72**, suggesting a fairly strong positive association between global average temperature and CO2 emissions over the period studied. A correlation of this kind does not by itself establish causation.


In [15]:
# Load temperature data
temp_data = pd.read_csv("data/clean_data/cleaned_temp_data.csv")

# Get average temperature for each year
annual_average_temperature = temp_data.groupby("Year")["Temperature"].mean().reset_index()

# Filter CO2 data to have the same date range as the temperature data (1990 - 2013)
annual_global_emissions_filtered = annual_global_emissions[annual_global_emissions["Year"] < 2014]

# Calculate CO2 Emissions vs Global Average Temperature Correlation
co2_temp_corr = round(annual_global_emissions_filtered["CO2 Emissions"].corr(annual_average_temperature["Temperature"]), 2)
print(f"Correlation coefficient between CO2 Emissions and Global Average Temperature: {co2_temp_corr}")


Correlation coefficient between CO2 Emissions and Global Average Temperature: 0.72
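
`Series.corr` pairs values by index label, not by year, so the cell above gives the intended result only if both frames list the same years in the same row positions. A more defensive variant, sketched below under that concern, merges the two frames on `Year` first. The function name `correlate_by_year` and the toy values are illustrative assumptions. The column names match the cells above.

```python
import pandas as pd


def correlate_by_year(emissions: pd.DataFrame, temperature: pd.DataFrame):
    """Join on Year so each emissions value is paired with the same year's temperature."""
    merged = emissions.merge(temperature, on="Year", how="inner")
    corr = round(merged["CO2 Emissions"].corr(merged["Temperature"]), 2)
    return merged, corr


# Toy frames (made-up values) whose year ranges only partly overlap
emissions = pd.DataFrame({
    "Year": [1990, 1991, 1992, 1993],
    "CO2 Emissions": [10.0, 11.0, 12.0, 13.0],
})
temperature = pd.DataFrame({
    "Year": [1988, 1989, 1990, 1991, 1992],
    "Temperature": [14.0, 14.2, 14.1, 14.3, 14.5],
})

merged, corr = correlate_by_year(emissions, temperature)
print(merged)
print(f"Correlation on overlapping years: {corr}")
```

The inner join keeps only the years present in both frames (1990 to 1992 in the toy data), so no explicit year filter is needed.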


### Dual-Axis Line Chart
To illustrate these trends, we employ a dual-axis line chart that concurrently tracks the changes in CO2 emissions and average temperatures.

In [17]:
import plotly.graph_objects as go
from plotly.subplots import make_subplots

# Create container for dual axis line chart
fig = make_subplots(specs=[[{"secondary_y": True}]])

# Add Global CO2 Emissions to the primary y-axis
fig.add_trace(
    go.Scatter(x=annual_global_emissions_filtered["Year"], 
               y=annual_global_emissions_filtered["CO2 Emissions"], 
               name='CO2 Emissions', 
               hovertemplate='%{x}<br>CO2 Emissions: %{y:.2s}<extra></extra>'),
    secondary_y=False)

# Add Global Average Temperature to the secondary y-axis
fig.add_trace(
    go.Scatter(x=annual_average_temperature["Year"], 
               y=annual_average_temperature["Temperature"], 
               name='AVG Temperature', 
               hovertemplate='%{x}<br>Temperature: %{y:.1f}°C<extra></extra>'),
    secondary_y=True)

# Add figure title
fig.update_layout(
    title_text="CO2 Emissions and Global Average Temperature Over Time")

# Set axis titles
fig.update_xaxes(title_text="Year")
fig.update_yaxes(title_text="CO2 Emissions (metric tons)", secondary_y=False)
fig.update_yaxes(title_text="Global Average Temperature (°C)", secondary_y=True)

# Show the dual axis line chart 
fig.show()
