Initialize

In [None]:
# import all modules & packages
import pandas as pd
import os
import seaborn as sns
from matplotlib import pyplot as plt

%matplotlib inline

import warnings

warnings.filterwarnings(action="ignore")

sns.set()

In [None]:
# Acquire and clean data
%run -i "get_data.py"
%run "marie-eda.ipynb"
%run "Ideal dataset creation.ipynb"

In [None]:
# import all datasets
ghg_time = pd.read_csv(os.path.join("data", "ghg_country_timeseries_df.csv"))
ghg_country = pd.read_csv(os.path.join("data", "ghg_country_2018_df.csv"))

# TODO remove below in this cell

ghg_time = ghg_time.rename(
    columns={
        "Total greenhouse gas emissions (kt of CO2 equivalent)": "co2e_total",
        "Total greenhouse gas emissions per capita (kt of CO2 equivalent per person)": "co2e_per_cap",
        "Total greenhouse gas emissions as % of Total (kt of CO2 equivalent)": "co2e_percent",
    }
)

# Convert total emissions from kilotons to gigatons and per-capita emissions
# from kilotons to tons for ease of reading
ghg_time.co2e_total /= 1000000
ghg_time.co2e_per_cap *= 1000

ghg_time.columns

Part 1: Characterizing GHG trends from 1990 - 2018

Plot series #1 (a): Are there observable patterns (linear, exponential) of GHG emissions from 1990 - 2018? What do those patterns look like when countries are grouped by region or income?
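One way to put a number on the "linear or exponential" question is to fit both shapes and compare how well each explains the series. The sketch below is only an illustration: `compare_trend_fits` is a hypothetical helper, and the demo series is synthetic rather than taken from the emissions data.

```python
import numpy as np


def compare_trend_fits(years, values):
    """Return R^2 of a linear fit and of an exponential fit (linear in log space)."""
    years = np.asarray(years, dtype=float)
    values = np.asarray(values, dtype=float)
    t = years - years.min()  # shift so the intercept refers to the first year

    def r_squared(y, y_hat):
        ss_res = np.sum((y - y_hat) ** 2)
        ss_tot = np.sum((y - y.mean()) ** 2)
        return 1.0 - ss_res / ss_tot

    # Linear model: values ~ a + b * t
    b, a = np.polyfit(t, values, 1)
    r2_linear = r_squared(values, a + b * t)

    # Exponential model: log(values) ~ c + d * t, scored on the original scale
    d, c = np.polyfit(t, np.log(values), 1)
    r2_exponential = r_squared(values, np.exp(c + d * t))

    return {"linear": r2_linear, "exponential": r2_exponential, "growth_rate": d}


# Synthetic series (not real emissions data): 2% growth per year from 30 units
demo_years = np.arange(1990, 2019)
demo_values = 30.0 * np.exp(0.02 * (demo_years - 1990))
trend_fits = compare_trend_fits(demo_years, demo_values)
print(trend_fits)
```

Over a span as short as 1990 - 2018 the two fits can score very close to each other, so the comparison is a complement to the plots rather than a replacement for them.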

Plot series #1: Totals (i)

In [None]:
gt_world = ghg_time[ghg_time["Country Name"] == "World"]

# Plot world totals (gigatons) and per-capita emissions (tons) side by side
fig, axs = plt.subplots(1, 2, sharex=True, figsize=(15, 7))
fig.suptitle("Global GHG Emissions (CO2e)")
sns.lineplot(ax=axs[0], data=gt_world, x="Year", y="co2e_total")
axs[0].set_title("Total GHG Emissions (CO2e)")
axs[0].set_ylabel("CO2e (gigatons)")

sns.lineplot(ax=axs[1], data=gt_world, x="Year", y="co2e_per_cap")
axs[1].set_title("Per-capita GHG Emissions (CO2e)")
axs[1].set_ylabel("CO2e (tons)")

In [None]:
gi_time = ghg_time.groupby(["Year", "Income Level"]).agg(
    {
        "co2e_total": "sum",
        "co2e_percent": "sum",
        "co2e_per_cap": "mean",
        "Population, total": "sum",
    }
).reset_index()  # expose "Year" and "Income Level" as regular columns for plotting

fig, axs = plt.subplots(1, 3, sharex=False, figsize=(22.5, 7.5))
fig.suptitle("World GHG Emissions by Income Level")

print(gi_time.columns)
sns.lineplot(
    ax=axs[0],
    data=gi_time,
    x="Year",
    y="co2e_total",
    hue="Income Level",
)
axs[0].set_title("Total World GHG Emissions (CO2e)")
axs[0].set_ylabel("CO2e (gigatons)")

sns.lineplot(
    ax=axs[1],
    data=gi_time,
    x="Year",
    y="co2e_per_cap",
    hue="Income Level",
    legend=False,
)
axs[1].set_title("Per Capita World GHG Emissions (CO2e)")
axs[1].set_ylabel("CO2e (tons)")

sns.lineplot(
    ax=axs[2],
    data=gi_time,
    x="Year",
    y="Population, total",
    hue="Income Level",
    legend=False,
)
axs[2].set_title("Total Population")
axs[2].set_ylabel("Population")

In [None]:
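# Sum totals per region; average per-capita values, as in the income-level cell
gr_time = (
    ghg_time.groupby(["Region", "Year"])
    .agg({"co2e_total": "sum", "co2e_per_cap": "mean"})
    .reset_index()
)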

fig, axs = plt.subplots(1, 2, figsize=(15, 7.5))
fig.suptitle("GHG Emissions by World Region (CO2e)")

sns.lineplot(
    ax=axs[0],
    data=gr_time,
    x="Year",
    y="co2e_total",
    hue="Region",
).set_title("Total World GHG Emissions (CO2e)")
axs[0].set_ylabel("CO2e (gigatons)")

sns.lineplot(
    ax=axs[1],
    data=gr_time,
    x="Year",
    y="co2e_per_cap",
    hue="Region",
    legend=False,
).set_title("Per Capita World GHG Emissions (CO2e)")
axs[1].set_ylabel("CO2e (tons)")

In [None]:
wdi_ind_pivot = pd.read_csv("data/wdi_cleaned.csv")


print(wdi_ind_pivot.query('`Country Name` == "Russian Federation"'))

Part 2: Analyzing a snapshot (2018) of GHG emissions by country

Analysis #1: Are there observable differences in GHG emissions for 2018 amongst different country groupings (region and income groups)?

Analysis #1 a: ANOVA test for region and income groupings, where the null hypothesis is that the per capita emissions are equal for the different groupings. 

Analysis #1 a i: ANOVA test for region

Analysis #1 a ii: ANOVA test for income
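Both ANOVA tests could share one small helper built on `scipy.stats.f_oneway`. This is a minimal sketch, not the final analysis: `anova_by_group` is a hypothetical helper, the toy data frame is made up, and the column names used with the real 2018 dataset would need to be checked against `ghg_country`.

```python
import pandas as pd
from scipy import stats


def anova_by_group(df, value_col, group_col):
    """One-way ANOVA of value_col across the levels of group_col."""
    samples = [
        grp[value_col].dropna().to_numpy()
        for _, grp in df.groupby(group_col)
    ]
    return stats.f_oneway(*samples)


# Toy data, not real emissions: three groups with clearly different means
toy = pd.DataFrame(
    {
        "group": ["A"] * 4 + ["B"] * 4 + ["C"] * 4,
        "co2e_per_cap": [1.0, 1.2, 0.9, 1.1, 5.0, 5.3, 4.8, 5.1, 10.2, 9.9, 10.1, 10.4],
    }
)
anova_result = anova_by_group(toy, "co2e_per_cap", "group")
print(f"F = {anova_result.statistic:.1f}, p = {anova_result.pvalue:.3g}")
```

A small p-value rejects the null hypothesis that all group means are equal, but it does not say which groups differ.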

Analysis #1 b: Box plots to understand which groupings are the most different

Analysis #1 c: Perform t-tests for the most interesting differences and draw conclusions
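For the pairwise comparisons, Welch's t-test is a reasonable default because it does not assume the two groups have equal variances. The values below are toy numbers for illustration only, and the group names are assumptions.

```python
from scipy import stats

# Toy per-capita values (tons CO2e), not real data
high_income = [10.2, 9.9, 10.1, 10.4, 11.0, 9.5]
low_income = [1.0, 1.2, 0.9, 1.1, 1.4, 0.8]

# equal_var=False selects Welch's t-test instead of Student's t-test
t_result = stats.ttest_ind(high_income, low_income, equal_var=False)
print(f"t = {t_result.statistic:.2f}, p = {t_result.pvalue:.3g}")

# When several pairs are tested, a Bonferroni correction divides the
# significance level by the number of comparisons
n_comparisons = 3  # hypothetical number of pairs tested
alpha_adjusted = 0.05 / n_comparisons
```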

Analysis #2: Which variables are related to GHG emissions? 

Analysis #2 a i: Check if assumptions are being met for Log(Emissions)

Analysis #2 a i 1: Normality

Analysis #2 a i 2: Homoscedasticity

Analysis #2 a i 3: Linearity

Analysis #2 a i 4: Outliers/influential points

Analysis #2 a ii: Check if assumptions are being met for % of total GHG Emissions

Analysis #2 a ii 1: Normality

Analysis #2 a ii 2: Homoscedasticity

Analysis #2 a ii 3: Linearity

Analysis #2 a ii 4: Outliers/influential points

Analysis #2 a iii: Check if assumptions are being met for GHG Emissions per capita

Analysis #2 a iii 1: Normality

Analysis #2 a iii 2: Homoscedasticity

Analysis #2 a iii 3: Linearity

Analysis #2 a iii 4: Outliers/influential points
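The checks listed above could be run with the same helper for each response variable. The sketch below uses only NumPy and SciPy and is one possible set of diagnostics, not the required one: Shapiro-Wilk for residual normality, a studentized (Koenker) Breusch-Pagan test for homoscedasticity, and Cook's distance with the common 4/n rule of thumb for influential points. Linearity is left to a residuals-versus-fitted plot. `check_ols_assumptions` and the demo data are hypothetical.

```python
import numpy as np
from scipy import stats


def check_ols_assumptions(x, y):
    """Fit y ~ x by least squares and return simple residual diagnostics."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size

    fit = stats.linregress(x, y)
    residuals = y - (fit.intercept + fit.slope * x)

    # Normality: Shapiro-Wilk on the residuals (null: residuals are normal)
    _, shapiro_p = stats.shapiro(residuals)

    # Homoscedasticity: LM = n * R^2 from regressing squared residuals on x
    aux = stats.linregress(x, residuals**2)
    bp_p = stats.chi2.sf(n * aux.rvalue**2, df=1)

    # Influential points: Cook's distance for a model with 2 parameters
    centered = x - x.mean()
    leverage = 1.0 / n + centered**2 / np.sum(centered**2)
    mse = np.sum(residuals**2) / (n - 2)
    cooks_d = residuals**2 / (2 * mse) * leverage / (1.0 - leverage) ** 2

    return {
        "shapiro_p": float(shapiro_p),
        "breusch_pagan_p": float(bp_p),
        "influential_idx": np.flatnonzero(cooks_d > 4.0 / n),
    }


# Synthetic data (not real): a linear trend with one injected outlier at the end
demo_x = np.linspace(0, 10, 30)
demo_y = 2.0 + 0.5 * demo_x + 0.2 * np.sin(7 * np.arange(30))
demo_y[-1] += 5.0
diagnostics = check_ols_assumptions(demo_x, demo_y)
print(diagnostics)
```

Small p-values or flagged points from checks like these are what would motivate robust methods.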

Are robust methods required?

Analysis #2 b: Linear Regression analysis to understand relationships

Analysis #2 b i: Electricity 

Analysis #2 b i 1: Log(GHG Emissions) ~ Electricity 

Analysis #2 b i 2: GHG Emissions per capita ~ Electricity 

Analysis #2 b i 3: GHG Emissions % of total ~ Electricity 

Analysis #2 b ii: Land use (Agricultural, Arable or Permanent crops)

Analysis #2 b ii 1: Log(GHG Emissions) ~ Land use

Analysis #2 b ii 2: GHG Emissions per capita ~ Land use

Analysis #2 b ii 3: GHG Emissions % of total ~ Land use 

Analysis #2 b iii: GDP or GDP Growth

Analysis #2 b iii 1: Log(GHG Emissions) ~ GDP

Analysis #2 b iii 2: GHG Emissions per capita ~ GDP

Analysis #2 b iii 3: GHG Emissions % of total ~ GDP

Analysis #2 b iv: Urban-to-rural ratio or population density

Analysis #2 b iv 1: Log(GHG Emissions) ~ Urban-to-rural ratio

Analysis #2 b iv 2: GHG Emissions per capita ~ Urban-to-rural ratio

Analysis #2 b iv 3: GHG Emissions % of total ~ Urban-to-rural ratio
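Each single-predictor model in #2 b has the same shape, so one helper could cover all of them. This is a minimal sketch using `scipy.stats.linregress`; `fit_simple_ols`, the `electricity` column name, and the toy values are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from scipy import stats


def fit_simple_ols(df, response, predictor, log_response=False):
    """Fit response ~ predictor, optionally on log(response), dropping missing rows."""
    data = df[[response, predictor]].dropna()
    y = np.log(data[response]) if log_response else data[response]
    fit = stats.linregress(data[predictor], y)
    return {
        "slope": fit.slope,
        "intercept": fit.intercept,
        "r_squared": fit.rvalue**2,
        "p_value": fit.pvalue,
        "n": len(data),
    }


# Toy data, not real: roughly log-linear in the predictor, with one missing value
x_vals = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
noise = np.array([1.05, 0.97, 1.02, 0.98, 1.03, 0.96])
toy_models = pd.DataFrame(
    {
        "electricity": np.append(x_vals, np.nan),
        "co2e_total": np.append(np.exp(1.0 + 0.3 * x_vals) * noise, 10.0),
    }
)
log_fit = fit_simple_ols(toy_models, "co2e_total", "electricity", log_response=True)
print(log_fit)
```

With `log_response=True` the slope reads as the approximate proportional change in emissions per unit of the predictor.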

Analysis #2 c: Add stratification by income to all models created in #2 b and perform F-tests.

Analysis #2 c i: Electricity + Income

Analysis #2 c i 1: Log(GHG Emissions) ~ Electricity + Income

Analysis #2 c i 2: GHG Emissions per capita ~ Electricity + Income

Analysis #2 c i 3: GHG Emissions % of total ~ Electricity + Income

Analysis #2 c ii: Land use (Agricultural, Arable or Permanent crops) + Income

Analysis #2 c ii 1: Log(GHG Emissions) ~ Land use + Income

Analysis #2 c ii 2: GHG Emissions per capita ~ Land use + Income

Analysis #2 c ii 3: GHG Emissions % of total ~ Land use + Income

Analysis #2 c iii: GDP or GDP Growth + Income

Analysis #2 c iii 1: Log(GHG Emissions) ~ GDP + Income

Analysis #2 c iii 2: GHG Emissions per capita ~ GDP + Income

Analysis #2 c iii 3: GHG Emissions % of total ~ GDP + Income

Analysis #2 c iv: Urban-to-rural ratio or population density + Income

Analysis #2 c iv 1: Log(GHG Emissions) ~ Urban-to-rural ratio + Income

Analysis #2 c iv 2: GHG Emissions per capita ~ Urban-to-rural ratio + Income

Analysis #2 c iv 3: GHG Emissions % of total ~ Urban-to-rural ratio + Income

Analysis #2 d: Add stratification by region to all models created in #2 b and perform F-tests.

Analysis #2 d i: Electricity + region

Analysis #2 d i 1: Log(GHG Emissions) ~ Electricity + region

Analysis #2 d i 2: GHG Emissions per capita ~ Electricity + region

Analysis #2 d i 3: GHG Emissions % of total ~ Electricity + region

Analysis #2 d ii: Land use (Agricultural, Arable or Permanent crops) + region

Analysis #2 d ii 1: Log(GHG Emissions) ~ Land use + region

Analysis #2 d ii 2: GHG Emissions per capita ~ Land use + region

Analysis #2 d ii 3: GHG Emissions % of total ~ Land use + region

Analysis #2 d iii: GDP or GDP Growth + region

Analysis #2 d iii 1: Log(GHG Emissions) ~ GDP + region

Analysis #2 d iii 2: GHG Emissions per capita ~ GDP + region

Analysis #2 d iii 3: GHG Emissions % of total ~ GDP + region

Analysis #2 d iv: Urban-to-rural ratio or population density + region

Analysis #2 d iv 1: Log(GHG Emissions) ~ Urban-to-rural ratio + region

Analysis #2 d iv 2: GHG Emissions per capita ~ Urban-to-rural ratio + region

Analysis #2 d iv 3: GHG Emissions % of total ~ Urban-to-rural ratio + region
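The stratified models in #2 c and #2 d can each be compared with their #2 b counterpart through a nested-model F-test: does adding the grouping variable reduce the residual sum of squares more than chance would? The sketch below assumes additive stratification (a common slope with group-specific intercepts), as the `+ Income` and `+ region` notation suggests. `nested_f_test` and the toy data are hypothetical.

```python
import numpy as np
import pandas as pd
from scipy import stats


def nested_f_test(df, response, predictor, group_col):
    """F-test of response ~ predictor against response ~ predictor + group dummies."""
    data = df[[response, predictor, group_col]].dropna()
    y = data[response].to_numpy(dtype=float)
    n = len(y)

    intercept = np.ones((n, 1))
    x = data[[predictor]].to_numpy(dtype=float)
    # drop_first=True avoids collinearity with the intercept column
    dummies = pd.get_dummies(data[group_col], drop_first=True).to_numpy(dtype=float)

    def rss(design):
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        return float(np.sum((y - design @ coef) ** 2))

    reduced = np.hstack([intercept, x])
    full = np.hstack([reduced, dummies])
    rss_reduced, rss_full = rss(reduced), rss(full)

    df_num = dummies.shape[1]      # number of parameters added by the grouping
    df_den = n - full.shape[1]     # residual degrees of freedom of the full model
    f_stat = ((rss_reduced - rss_full) / df_num) / (rss_full / df_den)
    return f_stat, float(stats.f.sf(f_stat, df_num, df_den))


# Toy data, not real: same slope in both groups, intercept shifted by 3 for "High"
toy_strata = pd.DataFrame(
    {
        "income": ["Low"] * 5 + ["High"] * 5,
        "gdp": [1.0, 2.0, 3.0, 4.0, 5.0] * 2,
        "co2e_per_cap": [1.6, 1.9, 2.55, 2.95, 3.5, 4.4, 5.1, 5.45, 6.05, 6.5],
    }
)
f_stat, f_p_value = nested_f_test(toy_strata, "co2e_per_cap", "gdp", "income")
print(f"F = {f_stat:.1f}, p = {f_p_value:.3g}")
```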

Part 3: Dive deeper into interesting conclusions, patterns or further questions