In [2]:
import plotly.express as px
import plotly.graph_objects as go
import pandas as pd

Data source: https://github.com/owid/co2-data

In [3]:
url = 'https://raw.githubusercontent.com/owid/co2-data/master/owid-co2-data.csv'
df = pd.read_csv(url)

In [4]:
df[df['country'] == 'Canada'].head()

Unnamed: 0,country,year,iso_code,population,gdp,cement_co2,cement_co2_per_capita,co2,co2_growth_abs,co2_growth_prct,...,share_global_other_co2,share_of_temperature_change_from_ghg,temperature_change_from_ch4,temperature_change_from_co2,temperature_change_from_ghg,temperature_change_from_n2o,total_ghg,total_ghg_excluding_lucf,trade_co2,trade_co2_share
7847,Canada,1785,CAN,,,,,0.004,,,...,,,,,,,,,,
7848,Canada,1786,CAN,,,,,0.004,0.0,0.0,...,,,,,,,,,,
7849,Canada,1787,CAN,,,,,0.004,0.0,0.0,...,,,,,,,,,,
7850,Canada,1788,CAN,,,,,0.004,0.0,0.0,...,,,,,,,,,,
7851,Canada,1789,CAN,,,,,0.004,0.0,0.0,...,,,,,,,,,,


In [5]:
df[df['country'] == 'World'].head()

Unnamed: 0,country,year,iso_code,population,gdp,cement_co2,cement_co2_per_capita,co2,co2_growth_abs,co2_growth_prct,...,share_global_other_co2,share_of_temperature_change_from_ghg,temperature_change_from_ch4,temperature_change_from_co2,temperature_change_from_ghg,temperature_change_from_n2o,total_ghg,total_ghg_excluding_lucf,trade_co2,trade_co2_share
47266,World,1750,,745664133.0,,,,9.306,,,...,,,,,,,,,,
47267,World,1751,,,,,,9.407,0.101,1.088,...,,,,,,,,,,
47268,World,1752,,,,,,9.505,0.098,1.041,...,,,,,,,,,,
47269,World,1753,,,,,,9.61,0.105,1.108,...,,,,,,,,,,
47270,World,1754,,,,,,9.734,0.123,1.281,...,,,,,,,,,,


- Global CO2 Emissions Over Time
- Top Emitting Countries in 2022
- CO2 Emissions Per Capita Over Time
- Cumulative CO2 Emissions by Country
- Energy Consumption Trends

In [21]:
import plotly.express as px

# The dataset provides a population column aligned by year with the CO2 data

# Filter the World rows from 1940 onwards, keeping both CO2 and population
world_data = df[(df['country'] == 'World') & (df['year'] >= 1940)][['year', 'co2', 'population']]

# Create the initial line plot for CO2 emissions
fig1 = px.line(world_data, x='year', y='co2',
               title='Global CO2 Emissions and Population Over Time since 1940',
               labels={'co2': 'Total CO2 Emissions (Mt)'})

# Update layout for the CO2 emissions
fig1.update_layout(xaxis_title='Year', yaxis_title='Total CO2 Emissions (Mt)')

# Add the secondary y-axis for population
fig1.add_scatter(x=world_data['year'], y=world_data['population'], mode='lines', name='Population', yaxis='y2')

# Update the layout to include the secondary y-axis
fig1.update_layout(
    yaxis2=dict(
        title='Population',
        overlaying='y',
        side='right'
    )
)

# Show the plot
fig1.show()


Both CO2 emissions and global population have risen since 1940. The rise in emissions parallels the growth in population, which is consistent with the higher energy consumption, industrial activity, and land-use change that accompany population growth.

In [22]:
# df[(df.country=='North America') & (df.year ==1970)][['year', 'co2']].head()

In [9]:
import plotly.express as px

# Filter the DataFrame for each main continent from 1900 onwards
continents = ['Asia', 'Europe', 'Africa', 'North America', 'South America', 'Oceania']
dataframes = {continent: df[(df['country'] == continent) & (df['year'] >= 1900)][['year', 'co2']] for continent in continents}

# Initialize the figure
fig1 = px.line()

# Add each continent's data to the plot
for continent, data in dataframes.items():
    fig1.add_scatter(x=data['year'], y=data['co2'], mode='lines', name=continent)

# Update layout and titles
fig1.update_layout(
    title='CO2 Emissions Over Time by Continent (from 1900)',
    xaxis_title='Year',
    yaxis_title='Total CO2 Emissions (Mt)',
    legend_title='Continent'
)

# Show the plot
fig1.show()


***Rapid Emissions Growth in Asia:***
Asia shows the highest increase in CO2 emissions over time. This significant rise can be attributed to rapid industrialization, economic development, and urbanization in Asian countries, particularly in emerging economies like China and India. However, this insight must be contextualized with Asia's large and growing population, which means that per capita emissions might still be lower compared to some other regions.


North America and Europe also show substantial increases in CO2 emissions, but at a slower rate than Asia. This could reflect the earlier industrialization and subsequent stabilization or efficiency improvements in these regions. Additionally, the environmental policies, technological advancements, and shift towards renewable energy sources in Europe and North America might have contributed to relatively lower emission growth rates.

***Population Factor in Emission Analysis:***

Considering the high population in Asia, it's important to analyze emissions on a per capita basis. This might reveal that, despite having the highest total emissions, the per capita emissions in Asia could be lower than in North America and Europe, where the population is lower but the lifestyle and consumption patterns are more energy-intensive.
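
The per capita comparison described above comes down to one division plus a unit conversion. A minimal sketch, assuming total emissions are expressed in million tonnes (Mt), as the axis labels in this notebook state. The function name and the input numbers are made up for illustration and are not taken from the dataset:

```python
def co2_per_capita_tonnes(total_co2_mt, population):
    """Convert total emissions in Mt to tonnes per person."""
    if population <= 0:
        raise ValueError("population must be positive")
    return total_co2_mt * 1_000_000 / population

# Hypothetical regions: A emits more in total, B emits more per person
region_a = co2_per_capita_tonnes(total_co2_mt=20_000, population=4_000_000_000)
region_b = co2_per_capita_tonnes(total_co2_mt=6_000, population=500_000_000)
print(region_a, region_b)
```

A region can lead on total emissions and still trail on a per-person basis, which is why both views are plotted in this notebook.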

In [8]:
# Canada's CO2 emissions over time (from 1900 onwards)
canada_co2 = df[(df['country'] == 'Canada') & (df['year'] >= 1900)][['year', 'co2']]
fig1 = px.line(canada_co2, x='year', y='co2',
               title='Canada CO2 Emissions Over Time (from 1900)',
               labels={'co2': 'Total CO2 Emissions'})
fig1.update_layout(xaxis_title='Year', yaxis_title='Total CO2 Emissions (Mt)')
fig1.show()

In [81]:
# Filter the 2022 rows for the income-group aggregates (entities with 'countries' in their name)
top_emitters_2022 = df[(df['year'] == 2022) & df['country'].str.contains('countries')]
top_emitters_co2 = top_emitters_2022.groupby('country')['co2'].sum().sort_values(ascending=False)
top_emitters_pop = top_emitters_2022[top_emitters_2022['country'].isin(top_emitters_co2.index)]

# Align population data with the sorted CO2 emissions data
top_emitters_pop = top_emitters_pop.set_index('country')['population'][top_emitters_co2.index]

# Create the figure
fig2 = go.Figure()

# Add the first plot (CO2 Emissions)
fig2.add_trace(go.Bar(x=top_emitters_co2.index, y=top_emitters_co2, name='CO2 Emissions (Mt)'))

# Add the second plot (Population) on a secondary y-axis
fig2.add_trace(go.Scatter(x=top_emitters_pop.index, y=top_emitters_pop, name='Population', yaxis='y2', mode='lines+markers'))

# Create a secondary y-axis
fig2.update_layout(
    title="CO2 Emissions and Population by Income Group in 2022",
    xaxis_title='Income Group',
    yaxis=dict(title='CO2 Emissions (Mt)'),
    yaxis2=dict(title='Population', overlaying='y', side='right'),
    legend=dict(x=0.01, y=0.99, bordercolor='Black', borderwidth=1)
)

fig2.show()

In [43]:
# 3. Global CO2 Emissions Per Capita Over Time
world_co2_per_capita = df[(df['country'] == 'World') & (df['year'] >= 1900)][['year', 'co2_per_capita']]
fig3 = px.line(world_co2_per_capita, x='year', y='co2_per_capita',
               title='Global CO2 Emissions Per Capita Over Time',
               labels={'co2_per_capita': 'CO2 Emissions Per Capita (t)'})
fig3.update_layout(xaxis_title='Year', yaxis_title='CO2 Emissions Per Capita (t)')
fig3.show()

Overall Trend Analysis:

The general trend shows an increase in CO2 emissions per capita from 1900 to the present: on average, each person accounts for more CO2 emissions over time. This trend aligns with global industrialization, increased reliance on fossil fuels, and growth in consumerism.

Dip in 2020:

A noteworthy feature of this chart is the small dip in CO2 emissions per capita in 2020. This dip likely correlates with the global COVID-19 pandemic, which led to lockdowns, reduced industrial activity, decreased transportation, and a general slowdown in economic activities. This insight demonstrates how significant global events can temporarily influence emission patterns.

The dip in CO2 emissions per capita in 2009 can likely be attributed to the global economic recession during that period. Economic downturns typically lead to reduced industrial activity and lower energy consumption, both of which contribute to a decrease in CO2 emissions.

The quick rebound in CO2 emissions per capita in 2010 and 2021, following dips in 2009 and 2020, respectively, indicates a strong resilience of emissions to economic shocks. This suggests that the underlying drivers of CO2 emissions, such as reliance on fossil fuels and industrial activities, remain robust despite short-term disruptions.
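
Dips and rebounds like these can be located programmatically rather than by eye. A minimal sketch, assuming a frame with `year` and `co2_per_capita` columns like the one plotted above. The helper name `find_dips` and the toy values are hypothetical and only illustrate the mechanics:

```python
import pandas as pd

def find_dips(frame, value_col="co2_per_capita"):
    """Return the rows whose value fell compared with the previous year."""
    ordered = frame.sort_values("year").reset_index(drop=True)
    # Year-over-year change in percent; the first row has no predecessor (NaN)
    ordered["pct_change"] = ordered[value_col].pct_change() * 100
    return ordered[ordered["pct_change"] < 0]

# Toy values for illustration only; not taken from the dataset
toy = pd.DataFrame({
    "year": [2018, 2019, 2020, 2021],
    "co2_per_capita": [4.0, 4.1, 3.9, 4.05],
})
dips = find_dips(toy)
print(dips[["year", "pct_change"]])
```

Applied to the real World series, the same helper would list every year with a decline, which makes it easy to check whether 2009 and 2020 are the only recent ones.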

In [28]:
# 3. CO2 Emissions Per Capita Over Time for Each 'Country' with 'countries' in its Name
countries_co2_per_capita = df[(df['country'].str.contains('countries')) & (df.year > 2000)][['year', 'country', 'co2_per_capita']].dropna()

# Create the figure
fig3 = go.Figure()

# Plot each 'country' separately
for country in countries_co2_per_capita['country'].unique():
    country_data = countries_co2_per_capita[countries_co2_per_capita['country'] == country]
    fig3.add_trace(go.Scatter(x=country_data['year'], y=country_data['co2_per_capita'], mode='lines', name=country))

# Update the layout
fig3.update_layout(
    title="Disparity in CO2 Emissions Based on Income Level",
    xaxis_title='Year',
    yaxis_title='CO2 Emissions Per Capita (t)',
    legend_title='Country'
)

fig3.show()


High-income countries have the highest CO2 emissions per capita at 10.1 tonnes. This is approximately 1.5 times the figure for upper-middle-income countries (6.6 tonnes), about 5.5 times that of lower-middle-income countries (1.85 tonnes), and 35.3 times that of low-income countries (0.286 tonnes). This stark disparity highlights the significant difference in energy consumption and lifestyle patterns between countries based on their economic status.

The considerable difference in emissions between high-income and low-income countries underscores a global inequality in environmental impact. High-income countries, despite having a smaller population compared to some lower-income regions, contribute disproportionately to global CO2 emissions. This insight points to the need for addressing environmental responsibility and sustainable practices more effectively in wealthier nations.
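
The ratios between income groups can be checked directly from the per capita figures quoted above. A short sketch that divides the high-income value by each group's value; the dictionary keys are shorthand labels chosen here, not the dataset's entity names:

```python
# Per capita values in tonnes, as quoted in the text above
per_capita = {
    "High-income": 10.1,
    "Upper-middle-income": 6.6,
    "Lower-middle-income": 1.85,
    "Low-income": 0.286,
}

baseline = per_capita["High-income"]
# Ratio of the high-income figure to each group's figure, rounded to one decimal
ratios = {group: round(baseline / value, 1) for group, value in per_capita.items()}

for group, ratio in ratios.items():
    print(f"High-income is {ratio}x {group}")
```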

In [29]:

# Define the main continents
continents = ['Asia', 'Europe', 'Africa', 'North America', 'South America', 'Oceania']

# Filter the DataFrame for each continent
continents_co2_per_capita = df[df['country'].isin(continents)][['year', 'country', 'co2_per_capita']].dropna()

# Create the figure
fig3 = go.Figure()

# Plot each continent separately
for continent in continents_co2_per_capita['country'].unique():
    continent_data = continents_co2_per_capita[continents_co2_per_capita['country'] == continent]
    fig3.add_trace(go.Scatter(x=continent_data['year'], y=continent_data['co2_per_capita'], mode='lines', name=continent))

# Update the layout
fig3.update_layout(
    title="CO2 Emissions Per Capita Over Time for Each Continent",
    xaxis_title='Year',
    yaxis_title='CO2 Emissions Per Capita (t)',
    legend_title='Continent'
)

fig3.show()


In [33]:
# CO2 emissions per capita over time: Canada vs. the World
countries_co2_per_capita = df[(df['country']=='Canada') |(df['country']=='World')][['year', 'country', 'co2_per_capita']].dropna()

# Create the figure
fig3 = go.Figure()

# Plot each 'country' separately
for country in countries_co2_per_capita['country'].unique():
    country_data = countries_co2_per_capita[countries_co2_per_capita['country'] == country]
    fig3.add_trace(go.Scatter(x=country_data['year'], y=country_data['co2_per_capita'], mode='lines', name=country))

# Update the layout
fig3.update_layout(
    title="Canada's CO2 Emissions Relative to Global Average",
    xaxis_title='Year',
    yaxis_title='CO2 Emissions Per Capita (t)',
    legend_title='Country'
)

fig3.show()


Canada's CO2 Emissions Relative to Global Average:

In 2022, Canada's CO2 emissions per capita were approximately 3.1 times higher than the world average. This significant difference indicates that the average Canadian's carbon footprint is considerably larger than the global average. This disparity can be attributed to factors such as Canada's energy consumption patterns, reliance on carbon-intensive industries, and lifestyle choices.

Impact of Economic and Geographic Factors:

Canada's higher emissions per capita may be influenced by its economic structure, which includes energy-intensive sectors like oil and gas production. Additionally, Canada's vast geography and climate contribute to higher energy demands for heating and transportation, which could explain the higher per capita emissions compared to the global average.

Reflection on Sustainability and Environmental Policy:

The data highlights the need for Canada to focus on sustainability and environmental policies. Given its significantly higher emissions per capita, there is a pressing need for Canada to adopt more aggressive carbon reduction strategies, invest in renewable energy sources, and encourage sustainable practices among its population.
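
A ratio such as the one quoted for 2022 can be computed rather than read off the chart. A minimal sketch, assuming a frame with `country`, `year`, and `co2_per_capita` columns as in this notebook. The helper `per_capita_ratio` is hypothetical, and the toy numbers are round values chosen for illustration, not the dataset's figures:

```python
import pandas as pd

def per_capita_ratio(frame, country, baseline, year):
    """Ratio of one entity's co2_per_capita to another's in a given year."""
    rows = frame[frame["year"] == year].set_index("country")["co2_per_capita"]
    return rows[country] / rows[baseline]

# Toy values for illustration only; not taken from the dataset
toy = pd.DataFrame({
    "country": ["Canada", "World", "Canada", "World"],
    "year": [2021, 2021, 2022, 2022],
    "co2_per_capita": [10.0, 4.0, 12.0, 4.0],
})
print(per_capita_ratio(toy, "Canada", "World", 2022))
```

With the notebook's data loaded, calling `per_capita_ratio(df, 'Canada', 'World', 2022)` would be the equivalent check, assuming both entities have a value for that year.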

In [35]:
# Methane emissions per capita over time: Canada vs. the World
countries_methane_per_capita = df[(df['country']=='Canada') |(df['country']=='World')][['year', 'country', 'methane_per_capita']].dropna()

# Create the figure
fig3 = go.Figure()

# Plot each 'country' separately
for country in countries_methane_per_capita['country'].unique():
    country_data = countries_methane_per_capita[countries_methane_per_capita['country'] == country]
    fig3.add_trace(go.Scatter(x=country_data['year'], y=country_data['methane_per_capita'], mode='lines', name=country))

# Update the layout
fig3.update_layout(
    title="Canada's Methane Emissions Relative to Global Average",
    xaxis_title='Year',
    yaxis_title='Methane Emissions Per Capita (t CO2e)',
    legend_title='Country'
)

fig3.show()


In [75]:
# 5. Energy Consumption Trends
# 'World' has one row per year, so no aggregation is needed; dropna() skips years without data
global_energy_consumption = df[(df['country'] == 'World') & (df['year'] >= 1960)][['year', 'primary_energy_consumption']].dropna()
fig5 = px.line(global_energy_consumption, x='year', y='primary_energy_consumption',
               title='Global Energy Consumption Over Time',
               labels={'primary_energy_consumption': 'Primary Energy Consumption (TWh)'})
fig5.update_layout(xaxis_title='Year', yaxis_title='Primary Energy Consumption (TWh)')
fig5.show()

In [58]:
# Energy use per capita over time for each entity with 'countries' in its name
countries_energy_per_capita = df[df['country'].str.contains('countries')][['year', 'country', 'energy_per_capita']].dropna()

# Create the figure
fig3 = go.Figure()

# Plot each 'country' separately
for country in countries_energy_per_capita['country'].unique():
    country_data = countries_energy_per_capita[countries_energy_per_capita['country'] == country]
    fig3.add_trace(go.Scatter(x=country_data['year'], y=country_data['energy_per_capita'], mode='lines', name=country))

# Update the layout
fig3.update_layout(
    title="Energy Per Capita Over Time for Entities with 'Countries' in the Name",
    xaxis_title='Year',
    yaxis_title='Energy Per Capita (kWh)',
    legend_title='Country'
)

fig3.show()


In [64]:

# Filter the DataFrame to include only necessary columns for CO2 per capita and Energy per capita, then drop NA values
world = df[(df['country'] == 'World') & (df['year'] >= 1960)]
correlation_data = world[['energy_per_capita', 'co2_per_capita']].dropna()

# Create a scatter plot with a trend line using Plotly Express
fig = px.scatter(
    correlation_data,
    x='energy_per_capita',
    y='co2_per_capita',
    trendline='ols',  # Ordinary Least Squares regression line
    labels={'energy_per_capita': 'Energy Consumption Per Capita (kWh)', 'co2_per_capita': 'CO2 Emissions Per Capita (t)'},
    title='Correlation between Energy Consumption Per Capita and CO2 Emissions Per Capita (World)'
)

# Show the figure
fig.show()


In [107]:

# Filter Canada's rows to the columns needed for energy per capita and CO2 per capita, then drop NA values
canada = df[(df['country'] == 'Canada') & (df['year'] >= 1960)]
correlation_data = canada[['energy_per_capita', 'co2_per_capita']].dropna()

# Create a scatter plot with a trend line using Plotly Express
fig = px.scatter(
    correlation_data,
    x='energy_per_capita',
    y='co2_per_capita',
    trendline='ols',  # Ordinary Least Squares regression line
    labels={'energy_per_capita': 'Energy Consumption Per Capita (kWh)', 'co2_per_capita': 'CO2 Emissions Per Capita (t)'},
    title='Correlation between Energy Consumption Per Capita and CO2 Emissions Per Capita (Canada)'
)

# Show the figure
fig.show()


In [129]:

# # Filter the DataFrame to include only necessary columns for CO2 per capita and Energy per capita, then drop NA values
# world = df[(df['country'] == 'Sweden') & (df['year'] >= 1990)]
# correlation_data = world[['energy_per_capita', 'co2_per_capita']].dropna()

# # Create a scatter plot with a trend line using Plotly Express
# fig = px.scatter(
#     correlation_data,
#     x='energy_per_capita',
#     y='co2_per_capita',
#     trendline='ols',  # Ordinary Least Squares regression line
#     labels={'energy_per_capita': 'Energy Consumption Per Capita (TWh)', 'co2_per_capita': 'CO2 Emissions Per Capita (t)'},
#     title='Correlation between Energy Consumption Per Capita and CO2 Emissions Per Capita'
# )

# # Show the figure
# fig.show()


In [130]:

# # Filter the DataFrame to include only necessary columns for CO2 per capita and Energy per capita, then drop NA values
# world = df[(df['country'] == 'United Kingdom') & (df['year'] >= 1990)]
# correlation_data = world[['energy_per_capita', 'co2_per_capita']].dropna()

# # Create a scatter plot with a trend line using Plotly Express
# fig = px.scatter(
#     correlation_data,
#     x='energy_per_capita',
#     y='co2_per_capita',
#     trendline='ols',  # Ordinary Least Squares regression line
#     labels={'energy_per_capita': 'Energy Consumption Per Capita (TWh)', 'co2_per_capita': 'CO2 Emissions Per Capita (t)'},
#     title='Correlation between Energy Consumption Per Capita and CO2 Emissions Per Capita'
# )

# # Show the figure
# fig.show()


In [105]:
# Annual CO2 emissions growth (%) over time: Canada vs. the World
countries_co2_growth_prct = df[(df['country']=='Canada') | (df['country']=='World')][['year', 'country', 'co2_growth_prct']].dropna()

# Create the figure
fig3 = go.Figure()

# Plot each 'country' separately
for country in countries_co2_growth_prct['country'].unique():
    country_data = countries_co2_growth_prct[countries_co2_growth_prct['country'] == country]
    fig3.add_trace(go.Scatter(x=country_data['year'], y=country_data['co2_growth_prct'], mode='lines', name=country))

# Update the layout
fig3.update_layout(
    title="Annual CO2 Emissions Growth: Canada vs. World",
    xaxis_title='Year',
    yaxis_title='CO2 Growth Percent',
    legend_title='Country'
)

fig3.show()


In [114]:
df.columns

Index(['country', 'year', 'iso_code', 'population', 'gdp', 'cement_co2',
       'cement_co2_per_capita', 'co2', 'co2_growth_abs', 'co2_growth_prct',
       'co2_including_luc', 'co2_including_luc_growth_abs',
       'co2_including_luc_growth_prct', 'co2_including_luc_per_capita',
       'co2_including_luc_per_gdp', 'co2_including_luc_per_unit_energy',
       'co2_per_capita', 'co2_per_gdp', 'co2_per_unit_energy', 'coal_co2',
       'coal_co2_per_capita', 'consumption_co2', 'consumption_co2_per_capita',
       'consumption_co2_per_gdp', 'cumulative_cement_co2', 'cumulative_co2',
       'cumulative_co2_including_luc', 'cumulative_coal_co2',
       'cumulative_flaring_co2', 'cumulative_gas_co2', 'cumulative_luc_co2',
       'cumulative_oil_co2', 'cumulative_other_co2', 'energy_per_capita',
       'energy_per_gdp', 'flaring_co2', 'flaring_co2_per_capita', 'gas_co2',
       'gas_co2_per_capita', 'ghg_excluding_lucf_per_capita', 'ghg_per_capita',
       'land_use_change_co2', 'land_use_chang

In [34]:
canada_industry_co2 = df[df['country'] == 'Canada'][['year', 'coal_co2', 'oil_co2', 'gas_co2', 'cement_co2']].dropna()

# Create the figure for line chart
fig = go.Figure()

# Adding a line for each industry
fig.add_trace(go.Scatter(x=canada_industry_co2['year'], y=canada_industry_co2['coal_co2'], mode='lines', name='Coal CO2'))
fig.add_trace(go.Scatter(x=canada_industry_co2['year'], y=canada_industry_co2['oil_co2'], mode='lines', name='Oil CO2'))
fig.add_trace(go.Scatter(x=canada_industry_co2['year'], y=canada_industry_co2['gas_co2'], mode='lines', name='Gas CO2'))
fig.add_trace(go.Scatter(x=canada_industry_co2['year'], y=canada_industry_co2['cement_co2'], mode='lines', name='Cement CO2'))

# Update the layout
fig.update_layout(
    title="Canada's CO2 Emissions by Industry Over Time",
    xaxis_title='Year',
    yaxis_title='CO2 Emissions (Mt)',
    legend_title='Industry'
)

fig.show()

In [40]:
# List every column in the full dataset whose name ends in '_co2'
df.columns[df.columns.str.endswith('_co2')]

Index(['cement_co2', 'coal_co2', 'consumption_co2', 'cumulative_cement_co2',
       'cumulative_co2', 'cumulative_coal_co2', 'cumulative_flaring_co2',
       'cumulative_gas_co2', 'cumulative_luc_co2', 'cumulative_oil_co2',
       'cumulative_other_co2', 'flaring_co2', 'gas_co2', 'land_use_change_co2',
       'oil_co2', 'other_industry_co2', 'share_global_cement_co2',
       'share_global_co2', 'share_global_coal_co2',
       'share_global_cumulative_cement_co2', 'share_global_cumulative_co2',
       'share_global_cumulative_coal_co2',
       'share_global_cumulative_flaring_co2',
       'share_global_cumulative_gas_co2', 'share_global_cumulative_luc_co2',
       'share_global_cumulative_oil_co2', 'share_global_cumulative_other_co2',
       'share_global_flaring_co2', 'share_global_gas_co2',
       'share_global_luc_co2', 'share_global_oil_co2',
       'share_global_other_co2', 'temperature_change_from_co2', 'trade_co2'],
      dtype='object')

In [42]:
canada_industry_co2 = df[df['country'] == 'Canada'][['year', 'coal_co2', 'oil_co2', 'gas_co2', 'cement_co2', 'land_use_change_co2',
                                                     'flaring_co2', 'other_industry_co2']].dropna()

# Create the figure for line chart
fig = go.Figure()

# Adding a line for each industry
fig.add_trace(go.Scatter(x=canada_industry_co2['year'], y=canada_industry_co2['coal_co2'], mode='lines', name='Coal CO2'))
fig.add_trace(go.Scatter(x=canada_industry_co2['year'], y=canada_industry_co2['oil_co2'], mode='lines', name='Oil CO2'))
fig.add_trace(go.Scatter(x=canada_industry_co2['year'], y=canada_industry_co2['gas_co2'], mode='lines', name='Gas CO2'))
fig.add_trace(go.Scatter(x=canada_industry_co2['year'], y=canada_industry_co2['cement_co2'], mode='lines', name='Cement CO2'))
fig.add_trace(go.Scatter(x=canada_industry_co2['year'], y=canada_industry_co2['flaring_co2'], mode='lines', name='Flaring CO2'))
fig.add_trace(go.Scatter(x=canada_industry_co2['year'], y=canada_industry_co2['other_industry_co2'], mode='lines', name='Other Industry CO2'))
fig.add_trace(go.Scatter(x=canada_industry_co2['year'], y=canada_industry_co2['land_use_change_co2'], mode='lines', name='Land Use Change CO2'))
# Update the layout
fig.update_layout(
    title="Canada's CO2 Emissions by Industry Over Time",
    xaxis_title='Year',
    yaxis_title='CO2 Emissions (Mt)',
    legend_title='Industry'
)

fig.show()

In [110]:


# Filter data for Canada and World
canada_data = df[(df['country'] == 'Canada') & (df['co2'].notna())][['year', 'co2']]
world_data = df[(df['country'] == 'World') & (df['co2'].notna())][['year', 'co2']]

# Merge datasets on year
merged_data = pd.merge(canada_data, world_data, on='year', suffixes=('_canada', '_world'))

# Calculate Canada's share of global CO2 emissions
merged_data['canada_share'] = (merged_data['co2_canada'] / merged_data['co2_world']) * 100

# Create the figure
fig = go.Figure()

# Add a line trace for Canada's share
fig.add_trace(go.Scatter(x=merged_data['year'], y=merged_data['canada_share'], mode='lines', name='Canada\'s Share'))

# Update the layout
fig.update_layout(
    title="Canada's Share in Global CO2 Emissions Over Time",
    xaxis_title='Year',
    yaxis_title='Share of Global CO2 Emissions (%)',
    legend_title='Indicator'
)

fig.show()


In [111]:
# Filter data for Canada and World for CO2 emissions
canada_co2 = df[(df['country'] == 'Canada') & (df['co2'].notna())][['year', 'co2']]
world_co2 = df[(df['country'] == 'World') & (df['co2'].notna())][['year', 'co2']]

# Filter data for Canada and World for Population
canada_pop = df[(df['country'] == 'Canada') & (df['population'].notna())][['year', 'population']]
world_pop = df[(df['country'] == 'World') & (df['population'].notna())][['year', 'population']]

# Merge datasets on year
merged_co2 = pd.merge(canada_co2, world_co2, on='year', suffixes=('_canada', '_world'))
merged_pop = pd.merge(canada_pop, world_pop, on='year', suffixes=('_canada', '_world'))

# Calculate Canada's share of global CO2 emissions and population
merged_co2['canada_co2_share'] = (merged_co2['co2_canada'] / merged_co2['co2_world']) * 100
merged_pop['canada_pop_share'] = (merged_pop['population_canada'] / merged_pop['population_world']) * 100

# Create the figure with dual-axis
fig = go.Figure()

# Add traces
fig.add_trace(go.Scatter(x=merged_co2['year'], y=merged_co2['canada_co2_share'], mode='lines', name='Canada CO2 Share', yaxis='y1'))
fig.add_trace(go.Scatter(x=merged_pop['year'], y=merged_pop['canada_pop_share'], mode='lines', name='Canada Population Share', yaxis='y2'))

# Create axis objects
fig.update_layout(
    xaxis=dict(title='Year'),
    yaxis=dict(title='Share of Global CO2 Emissions (%)', side='left', showgrid=False),
    yaxis2=dict(title='Share of Global Population (%)', side='right', overlaying='y', showgrid=False),
    title="Canada's Share in Global CO2 Emissions and Population Over Time"
)

fig.show()

In [113]:


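# Filter Canada's rows; .copy() lets us add a column without touching df
canada_data = df[(df['country'] == 'Canada') & (df['year'] >= 1960)].copy()
# The column list shown earlier has total GDP only, so derive GDP per capita here
canada_data['gdp_per_capita'] = canada_data['gdp'] / canada_data['population']
correlation_data = canada_data[['gdp_per_capita', 'co2_per_capita']].dropna()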

# Create a scatter plot with a trend line using Plotly Express
fig = px.scatter(
    correlation_data,
    x='gdp_per_capita',
    y='co2_per_capita',
    trendline='ols',  # Ordinary Least Squares regression line
    labels={'gdp_per_capita': 'GDP Per Capita (USD)', 'co2_per_capita': 'CO2 Emissions Per Capita (t)'},
    title='Correlation between GDP Per Capita and CO2 Emissions Per Capita in Canada'
)

# Show the figure
fig.show()


In [115]:

# Filter the data for Canada's land use change CO2
canada_land_use_co2 = df[df['country'] == 'Canada'][['year', 'land_use_change_co2']].dropna()

# Create a line chart
fig = px.line(canada_land_use_co2, x='year', y='land_use_change_co2', title='CO2 Emissions from Land Use Change in Canada')

fig.show()


In [118]:
from sklearn.linear_model import LinearRegression
import numpy as np

# Filter and prepare data
recent_data = df[(df['country'] == 'Canada') & (df['year'] >= 2000)][['year', 'co2']]
X = recent_data['year'].values.reshape(-1, 1)
y = recent_data['co2'].values

# Fit the model
model = LinearRegression().fit(X, y)

# Predict for the next 10 years
future_years = np.array(range(X.max() + 1, X.max() + 11)).reshape(-1, 1)
predicted_co2 = model.predict(future_years)

# Visualize
future_data = pd.DataFrame({'year': future_years.flatten(), 'predicted_co2': predicted_co2})
fig = px.line(future_data, x='year', y='predicted_co2', title='Projected CO2 Emissions for Canada')

fig.show()


In [135]:
# Minimum annual CO2 for Canada since 2000, using recent_data from the cell above
recent_data['co2'].min()

522.845

In [132]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
import plotly.express as px

# Assuming 'df' is your DataFrame

# Filter and prepare data
canada_co2_data = df[(df['country'] == 'Canada') & (df['year'] >= 2000)][['year', 'co2']]
X = canada_co2_data['year'].values.reshape(-1, 1)
y = canada_co2_data['co2'].values

# Splitting the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Fit the model with training data
model = LinearRegression().fit(X_train, y_train)

# Predict using the test data
y_pred = model.predict(X_test)

# Calculate metrics
mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
r2 = r2_score(y_test, y_pred)

print(f"Mean Absolute Error: {mae}")
print(f"Mean Squared Error: {mse}")
print(f"Root Mean Squared Error: {rmse}")
print(f"R-squared: {r2}")

# Predict for future years
future_years = np.array(range(X.max() + 1, X.max() + 11)).reshape(-1, 1)
predicted_co2 = model.predict(future_years)

# Visualize future predictions
future_data = pd.DataFrame({'year': future_years.flatten(), 'predicted_co2': predicted_co2})
fig = px.line(future_data, x='year', y='predicted_co2', title='Projected CO2 Emissions for Canada')

fig.show()


Mean Absolute Error: 17.66603854362834
Mean Squared Error: 557.0549252949099
Root Mean Squared Error: 23.602011043445216
R-squared: -0.9373340989278118


In [136]:
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline

# Creating polynomial features
degree = 2  # You can adjust the degree of the polynomial
polyreg = make_pipeline(PolynomialFeatures(degree), LinearRegression())

# Fit the model with training data
polyreg.fit(X_train, y_train)

# Predict using the test data
y_pred = polyreg.predict(X_test)

# Evaluate the model
mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
r2 = r2_score(y_test, y_pred)

print(f"Mean Absolute Error: {mae}")
print(f"Mean Squared Error: {mse}")
print(f"Root Mean Squared Error: {rmse}")
print(f"R-squared: {r2}")


Mean Absolute Error: 17.824569605400416
Mean Squared Error: 477.281705400216
Root Mean Squared Error: 21.84677791804128
R-squared: -0.6598975804347058
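
A negative R-squared means the model predicts the held-out points worse than simply using their mean. Separately, `train_test_split` shuffles the years by default, so the test set above is not a true forecast horizon. A sketch of a chronological split, using only NumPy and a made-up series; `chronological_split` and `r_squared` are hypothetical helpers written for this example, not scikit-learn functions. It changes how the model is evaluated and is not expected to improve the fit itself:

```python
import numpy as np

def chronological_split(years, values, n_test):
    """Hold out the last n_test observations instead of a random sample."""
    order = np.argsort(years)
    years, values = np.asarray(years)[order], np.asarray(values)[order]
    return years[:-n_test], years[-n_test:], values[:-n_test], values[-n_test:]

def r_squared(y_true, y_pred):
    """Coefficient of determination; negative when worse than the mean."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1 - ss_res / ss_tot

# Made-up series: rises, then falls, so a straight line extrapolates poorly
years = np.arange(2000, 2010)
values = np.array([10, 11, 12, 13, 14, 15, 14, 13, 12, 11], dtype=float)

x_train, x_test, y_train, y_test = chronological_split(years, values, n_test=3)
slope, intercept = np.polyfit(x_train, y_train, deg=1)
score = r_squared(y_test, slope * x_test + intercept)
print(x_test, score)
```

On this toy series the score is negative as well, because the line fitted to the rising years keeps rising while the held-out years fall.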


In [119]:
# Filter data around the time of the event (e.g., 2018-2022 for COVID-19)
event_data = df[(df['country'] == 'Canada') & (df['year'] >= 2018) & (df['year'] <= 2022)][['year', 'co2']]

fig = px.line(event_data, x='year', y='co2', title='CO2 Emissions in Canada During COVID-19 Pandemic')

fig.show()


## Impact of Economic Development on Environmental Indicators

In [50]:
# Selecting relevant columns
df_sub = df[['country', 'year', 'gdp', 'co2', 'population', 'energy_per_capita']]

In [51]:
# Calculating GDP per capita
df_sub['gdp_per_capita'] = df_sub['gdp'] / df_sub['population']



A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
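
The warning above appears because `df_sub` was created by selecting columns from `df`, and pandas cannot tell whether the later assignment is meant to affect the original frame. A minimal sketch of the usual fix, taking an explicit `.copy()` before adding the column. The toy frame and its values are hypothetical:

```python
import pandas as pd

# Toy frame for illustration only; not taken from the dataset
toy = pd.DataFrame({
    "country": ["A", "B"],
    "gdp": [1_000.0, 9_000.0],
    "population": [10, 30],
    "co2": [1.0, 2.0],
})

# .copy() makes the subset an independent frame, so the assignment is unambiguous
subset = toy[["country", "gdp", "population"]].copy()
subset["gdp_per_capita"] = subset["gdp"] / subset["population"]

print(subset)
print("gdp_per_capita" in toy.columns)  # the original frame is untouched
```

In this notebook the equivalent change would be to append `.copy()` to the column selection that creates `df_sub`.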



In [54]:
# Scatter plot of GDP per Capita vs CO2 Emissions
fig = px.scatter(df_sub, x='gdp_per_capita', y='co2',  # gdp_per_capita was added to df_sub, not df
                 title='GDP per Capita vs CO2 Emissions',
                 labels={'gdp_per_capita': 'GDP per Capita', 'co2': 'CO2 Emissions'})
fig.show()

In [55]:
# Correlation analysis
correlation_matrix = df_sub[['gdp_per_capita', 'co2', 'energy_per_capita']].corr()  # df_sub holds gdp_per_capita

# Plotting heatmap of correlation
fig = go.Figure(data=go.Heatmap(
                   z=correlation_matrix,
                   x=correlation_matrix.columns,
                   y=correlation_matrix.columns,
                   hoverongaps = False))
fig.update_layout(title='Correlation Matrix', xaxis_title='Variables', yaxis_title='Variables')
fig.show()