<h4>Global CO<sub>2</sub> Emissions Dataset from Maven Analytics</h4>

<p>Objectives:</p>
<ul>
    <li>What is the trend of world CO<sub>2</sub> emissions over time, and can it be related to demographics?</li>
    <li>Which countries contribute the most and the least emissions?</li>
    <li>What are the major sources of these emissions?</li>
    <li>Is temperature related to the amount of CO<sub>2</sub> emissions?</li>
</ul>

In [245]:
#Libraries
import pandas as pd 
import numpy as np
import matplotlib.pyplot as plt 
import seaborn as sns

In [246]:
df = pd.read_csv("data/co2_data.csv")
df.shape

(50598, 79)

In [247]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50598 entries, 0 to 50597
Data columns (total 79 columns):
 #   Column                                     Non-Null Count  Dtype  
---  ------                                     --------------  -----  
 0   country                                    50598 non-null  object 
 1   year                                       50598 non-null  int64  
 2   iso_code                                   42142 non-null  object 
 3   population                                 40008 non-null  float64
 4   gdp                                        14564 non-null  float64
 5   cement_co2                                 24974 non-null  float64
 6   cement_co2_per_capita                      22714 non-null  float64
 7   co2                                        31349 non-null  float64
 8   co2_growth_abs                             29010 non-null  float64
 9   co2_growth_prct                            25032 non-null  float64
 10  co2_including_luc     
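<p>The Non-Null Count column shows how sparse some columns are. For example, <code>gdp</code> is filled in for only 14,564 of the 50,598 rows (about 29 %). The sketch below ranks columns by coverage with <code>notna().mean()</code>. It runs on an invented frame that stands in for the real data.</p>

```python
import numpy as np
import pandas as pd

# Invented frame standing in for the CO2 data (values are made up)
toy = pd.DataFrame({
    "country": ["A", "A", "B", "B"],
    "gdp": [1.0, np.nan, np.nan, np.nan],
    "co2": [2.0, 3.0, np.nan, 4.0],
})

# Fraction of non-missing values per column, lowest coverage first
coverage = toy.notna().mean().sort_values()
print(coverage)  # gdp 0.25, co2 0.75, country 1.00
```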

<p>Filtering the data columns based on the objectives.</p>

In [248]:
# Columns containing demographics and CO2 emissions
df_demographics = df[['country', 'year', 'iso_code', 'population', 'gdp', 'co2',
                      'land_use_change_co2']]

# Columns containing the sources of emissions
# Note: in the Our World in Data codebook, consumption_co2 is consumption-based
# (trade-adjusted) total CO2, not a separate source, so it overlaps the other columns
df_sources = df[['cement_co2', 'coal_co2', 'flaring_co2', 'gas_co2', 'land_use_change_co2',
                 'consumption_co2', 'oil_co2', 'other_industry_co2']]

# Columns containing temperature changes
df_temp = df[[col for col in df.columns if 'temperature' in col]]

# Columns for methane and nitrous oxide (N2O)
df_chem = df[['year', 'methane', 'nitrous_oxide']]
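<p>The list comprehension behind <code>df_temp</code> keeps every column whose name contains <code>'temperature'</code>. pandas' <code>DataFrame.filter(like=...)</code> does the same substring match on column labels. The sketch below checks this on a small frame. Its column names mimic the dataset, but its values are made up.</p>

```python
import pandas as pd

# Toy frame with made-up values; only the column names mirror the dataset
toy = pd.DataFrame({
    "country": ["A", "B"],
    "co2": [1.0, 2.0],
    "temperature_change_from_co2": [0.001, 0.002],
    "share_of_temperature_change_from_ghg": [0.1, 0.2],
})

# Substring match with a list comprehension (as in the notebook)
by_comprehension = toy[[col for col in toy.columns if "temperature" in col]]

# Same selection with DataFrame.filter; `like` keeps column labels containing the substring
by_filter = toy.filter(like="temperature")

print(list(by_filter.columns))
# ['temperature_change_from_co2', 'share_of_temperature_change_from_ghg']
```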

In [249]:
# Unpivot the source columns into long format (one row per original row and source)
df_sources = pd.melt(df_sources.reset_index(), id_vars=['index'], value_vars=df_sources.columns,
                     var_name='emission_sources', value_name='million_tons')

# Total emissions per source across all rows
df_sources = df_sources.groupby('emission_sources')['million_tons'].sum().reset_index()
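<p>The melted frame is only grouped and summed. Each total should therefore equal the plain column sum of the wide frame. <code>groupby</code> also sorts the source names alphabetically, which is why the output of the next cell is in alphabetical order. Here is a small check on invented values:</p>

```python
import numpy as np
import pandas as pd

# Toy wide frame: one column per emission source (values are invented)
wide = pd.DataFrame({
    "coal_co2": [1.0, 2.0, np.nan],
    "gas_co2": [0.5, np.nan, 1.5],
})

# Unpivot to long format: one row per (original row, source) pair
long = pd.melt(wide.reset_index(), id_vars=["index"], value_vars=wide.columns,
               var_name="emission_sources", value_name="million_tons")

# Total per source; the groupby sum skips NaN and sorts the group keys
totals = long.groupby("emission_sources")["million_tons"].sum().reset_index()

# The same totals straight from the wide frame, without melting
direct = wide.sum().sort_index()

print(totals)  # coal_co2 3.0, gas_co2 2.0
```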

In [260]:
# Share of each source in the total, in percent
df_sources['percentage'] = (df_sources['million_tons'] / df_sources['million_tons'].sum()) * 100
# Round to whole numbers above 1 %, to two decimals below
df_sources['percentage'] = df_sources['percentage'].apply(lambda x: round(x, 0) if x > 1 else round(x, 2))
df_sources

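<table>
  <thead>
    <tr><th></th><th>emission_sources</th><th>million_tons</th><th>percentage</th></tr>
  </thead>
  <tbody>
    <tr><th>0</th><td>cement_co2</td><td>209594.066</td><td>1.0</td></tr>
    <tr><th>1</th><td>coal_co2</td><td>3870596.153</td><td>21.0</td></tr>
    <tr><th>2</th><td>consumption_co2</td><td>6073187.268</td><td>33.0</td></tr>
    <tr><th>3</th><td>flaring_co2</td><td>86909.154</td><td>0.47</td></tr>
    <tr><th>4</th><td>gas_co2</td><td>1249162.183</td><td>7.0</td></tr>
    <tr><th>5</th><td>land_use_change_co2</td><td>4333960.833</td><td>23.0</td></tr>
    <tr><th>6</th><td>oil_co2</td><td>2789318.374</td><td>15.0</td></tr>
    <tr><th>7</th><td>other_industry_co2</td><td>43974.095</td><td>0.24</td></tr>
  </tbody>
</table>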
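<p>Two caveats apply to these shares, assuming the file follows the Our World in Data CO<sub>2</sub> dataset whose column names it shares.</p>
<ul>
    <li><code>consumption_co2</code> is a consumption-based (trade-adjusted) total rather than a source, so it overlaps with the production-based columns.</li>
    <li>The sums run over every row. If the file also contains aggregate rows such as "World" or continents, the same emissions are counted several times.</li>
</ul>
<p>The sketch below uses invented rows. For illustration only, it assumes that aggregates are the rows without an <code>iso_code</code>; that rule would need checking on the real file. Because <code>df_sources</code> does not keep <code>iso_code</code>, the filter would have to be applied to <code>df</code> before selecting the source columns.</p>

```python
import numpy as np
import pandas as pd

# Invented data: two countries plus a "World" row that repeats their sum.
# Assumption (hypothetical rule): aggregate rows are those without an iso_code.
toy = pd.DataFrame({
    "country": ["A", "B", "World"],
    "iso_code": ["AAA", "BBB", np.nan],
    "coal_co2": [6.0, 2.0, 8.0],
    "oil_co2": [1.0, 1.0, 2.0],
})
source_cols = ["coal_co2", "oil_co2"]

# Summing every row counts the countries twice (once directly, once via "World")
all_rows_total = toy[source_cols].sum()

# Keeping only rows with an iso_code drops the aggregate
countries = toy[toy["iso_code"].notna()]
country_total = countries[source_cols].sum()

# Share of each source in percent, with the notebook's rounding rule
share = country_total / country_total.sum() * 100
share = share.apply(lambda x: round(x, 0) if x > 1 else round(x, 2))

print(all_rows_total.tolist())  # [16.0, 4.0]
print(country_total.tolist())   # [8.0, 2.0]
print(share.tolist())           # [80.0, 20.0]
```

<p>In this toy case the aggregate is proportional to its members, so only the absolute totals inflate. With several overlapping aggregates (continents, income groups), the shares could shift as well.</p>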


In [252]:
# Combining those selected data columns
filtered_df = pd.concat([df_demographics,df_temp],axis=1)
filtered_df.shape

(50598, 12)
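<p><code>pd.concat(..., axis=1)</code> places the frames side by side and matches rows by index label, not by position. Both subsets here come from the same <code>df</code>, so their indexes are identical and every row lines up. The toy sketch below shows this case and also what happens when the labels differ. The values are invented.</p>

```python
import pandas as pd

# Two column subsets taken from the same frame share its index
base = pd.DataFrame({
    "country": ["A", "B"],
    "co2": [1.0, 2.0],
    "temperature_change_from_co2": [0.1, 0.2],
})
left = base[["country", "co2"]]
right = base[["temperature_change_from_co2"]]

# axis=1 places the frames side by side, matching rows by index label
combined = pd.concat([left, right], axis=1)
print(combined.shape)  # (2, 3)

# If one side has different labels, concat aligns on labels (outer join),
# so unmatched labels produce rows padded with NaN
shifted = right.copy()
shifted.index = [1, 2]
misaligned = pd.concat([left, shifted], axis=1)
print(misaligned.shape)  # (3, 3)
```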

In [253]:
# Add a string key per country ('10', '11', ...) to use as a foreign key
unique_countries = filtered_df['country'].unique()
country_values = pd.Series(np.arange(len(unique_countries)), index=unique_countries)
filtered_df['country_key'] = filtered_df['country'].apply(lambda x: str(10 + country_values[x]))

filtered_df.insert(0, 'country_key', filtered_df.pop('country_key'))  # Insert 'country_key' column at the beginning

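<p>The <code>apply</code> call looks up each country one row at a time. <code>pd.factorize</code> numbers values in order of first appearance, which is the same order <code>unique()</code> returns, so it should produce identical keys in a single vectorised step. The sketch below runs on a few sample names; the list is illustrative and not taken from the data.</p>

```python
import numpy as np
import pandas as pd

countries = pd.Series(["Afghanistan", "Afghanistan", "Albania", "Algeria", "Albania"])

# Notebook approach: position of each country in unique(), offset by 10, as a string
unique_countries = countries.unique()
country_values = pd.Series(np.arange(len(unique_countries)), index=unique_countries)
keys_apply = countries.apply(lambda x: str(10 + country_values[x]))

# Vectorised alternative: factorize assigns codes in order of first appearance
codes, uniques = pd.factorize(countries)
keys_factorize = pd.Series(codes + 10).astype(str)

print(keys_factorize.tolist())  # ['10', '10', '11', '12', '11']
```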

In [254]:
filtered_df.head()

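<table>
  <thead>
    <tr><th></th><th>country_key</th><th>country</th><th>year</th><th>iso_code</th><th>population</th><th>gdp</th><th>co2</th><th>land_use_change_co2</th><th>share_of_temperature_change_from_ghg</th><th>temperature_change_from_ch4</th><th>temperature_change_from_co2</th><th>temperature_change_from_ghg</th><th>temperature_change_from_n2o</th></tr>
  </thead>
  <tbody>
    <tr><th>0</th><td>10</td><td>Afghanistan</td><td>1850</td><td>AFG</td><td>3752993.0</td><td>NaN</td><td>NaN</td><td>2.931</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>
    <tr><th>1</th><td>10</td><td>Afghanistan</td><td>1851</td><td>AFG</td><td>3767956.0</td><td>NaN</td><td>NaN</td><td>2.968</td><td>0.165</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td></tr>
    <tr><th>2</th><td>10</td><td>Afghanistan</td><td>1852</td><td>AFG</td><td>3783940.0</td><td>NaN</td><td>NaN</td><td>2.968</td><td>0.164</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td></tr>
    <tr><th>3</th><td>10</td><td>Afghanistan</td><td>1853</td><td>AFG</td><td>3800954.0</td><td>NaN</td><td>NaN</td><td>3.004</td><td>0.164</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td></tr>
    <tr><th>4</th><td>10</td><td>Afghanistan</td><td>1854</td><td>AFG</td><td>3818038.0</td><td>NaN</td><td>NaN</td><td>3.004</td><td>0.163</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td></tr>
  </tbody>
</table>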


In [255]:
df_temp

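<table>
  <thead>
    <tr><th></th><th>share_of_temperature_change_from_ghg</th><th>temperature_change_from_ch4</th><th>temperature_change_from_co2</th><th>temperature_change_from_ghg</th><th>temperature_change_from_n2o</th></tr>
  </thead>
  <tbody>
    <tr><th>0</th><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>
    <tr><th>1</th><td>0.165</td><td>0.000</td><td>0.000</td><td>0.000</td><td>0.0</td></tr>
    <tr><th>2</th><td>0.164</td><td>0.000</td><td>0.000</td><td>0.000</td><td>0.0</td></tr>
    <tr><th>3</th><td>0.164</td><td>0.000</td><td>0.000</td><td>0.000</td><td>0.0</td></tr>
    <tr><th>4</th><td>0.163</td><td>0.000</td><td>0.000</td><td>0.000</td><td>0.0</td></tr>
    <tr><th>...</th><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td></tr>
    <tr><th>50593</th><td>0.114</td><td>0.001</td><td>0.001</td><td>0.002</td><td>0.0</td></tr>
    <tr><th>50594</th><td>0.114</td><td>0.001</td><td>0.001</td><td>0.002</td><td>0.0</td></tr>
    <tr><th>50595</th><td>0.113</td><td>0.001</td><td>0.001</td><td>0.002</td><td>0.0</td></tr>
    <tr><th>50596</th><td>0.112</td><td>0.001</td><td>0.001</td><td>0.002</td><td>0.0</td></tr>
  </tbody>
</table>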
