<h4><strong>1. Dataset Prep:</strong></h4>

Source: [Our World in Data](https://ourworldindata.org/co2-and-greenhouse-gas-emissions)

Four datasets were downloaded for this project:

*   Annual Total Greenhouse Gas Emissions
*   Annual Greenhouse Gas Emissions - By Carbon Dioxide, Methane and Nitrous Oxide
*   Carbon tax status
*   Contribution to Temperature rise by Greenhouse Gases

The first two are combined into a single file in tidy format, while the other two are plotted separately in the final visualisation. This notebook covers only combining and analysing the first two datasets; the latter two are loaded directly in the main project file.



In [1]:
import pandas as pd

In [2]:
df_total = pd.read_csv(
    "https://ourworldindata.org/grapher/total-ghg-emissions.csv?v=1&csvType=full&useColumnShortNames=true",
    # For HTTP(S) URLs, pandas forwards storage_options as request headers
    storage_options={'User-Agent': 'Our World In Data data fetch/1.0'},
)
df_total.head()

|   | Entity | Code | Year | annual_emissions_ghg_total_co2eq |
|---|---|---|---|---|
| 0 | Afghanistan | AFG | 1850 | 7435743.5 |
| 1 | Afghanistan | AFG | 1851 | 7499858.5 |
| 2 | Afghanistan | AFG | 1852 | 7560495.5 |
| 3 | Afghanistan | AFG | 1853 | 7619898.0 |
| 4 | Afghanistan | AFG | 1854 | 7678120.0 |


In [3]:
df_gas = pd.read_csv(
    "https://ourworldindata.org/grapher/ghg-emissions-by-gas.csv?v=1&csvType=full&useColumnShortNames=true",
    # For HTTP(S) URLs, pandas forwards storage_options as request headers
    storage_options={'User-Agent': 'Our World In Data data fetch/1.0'},
)
df_gas.head()

|   | Entity | Code | Year | annual_emissions_n2o_total_co2eq | annual_emissions_ch4_total_co2eq | annual_emissions_co2_total |
|---|---|---|---|---|---|---|
| 0 | Afghanistan | AFG | 1850 | 224421.52 | 3594926.2 | 3616395.5 |
| 1 | Afghanistan | AFG | 1851 | 229096.17 | 3615134.2 | 3655627.8 |
| 2 | Afghanistan | AFG | 1852 | 233650.48 | 3635346.8 | 3691498.5 |
| 3 | Afghanistan | AFG | 1853 | 238009.98 | 3655563.5 | 3726324.5 |
| 4 | Afghanistan | AFG | 1854 | 242100.23 | 3675785.0 | 3760235.0 |


In [4]:
# Rename OWID's short column names to readable labels (N₂O and methane are already in CO₂-equivalents)
df_gas.rename(columns={
    "annual_emissions_n2o_total_co2eq": "N₂O Emissions",
    "annual_emissions_ch4_total_co2eq": "Methane Emissions",
    "annual_emissions_co2_total": "CO₂ Emissions"
}, inplace=True)

df_total.rename(columns={
    "annual_emissions_ghg_total_co2eq": "Overall Emissions"
}, inplace=True)
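`DataFrame.rename` silently ignores mapping keys that are not present, so a typo in a column name (or an upstream change to OWID's short names) would leave the old column in place without any warning. A minimal sketch, using a small made-up frame rather than the real download, of passing `errors="raise"` so such a mismatch fails loudly:

```python
import pandas as pd

# Toy frame standing in for df_total (made-up values, not OWID data)
toy = pd.DataFrame({
    "Entity": ["A", "B"],
    "Year": [2000, 2000],
    "annual_emissions_ghg_total_co2eq": [1.0, 2.0],
})

# errors="raise" makes rename fail if a mapped column is missing
renamed = toy.rename(
    columns={"annual_emissions_ghg_total_co2eq": "Overall Emissions"},
    errors="raise",
)
print(list(renamed.columns))  # ['Entity', 'Year', 'Overall Emissions']

rename_failed = False
try:
    # A misspelled source column name raises KeyError instead of being skipped
    toy.rename(columns={"annual_emissions_ghg_totl_co2eq": "Overall Emissions"}, errors="raise")
except KeyError as exc:
    rename_failed = True
    print("rename failed:", exc)
```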


In [5]:
# Combine both datasets; the outer join keeps every (Entity, Code, Year) row found in either one
df_combined = pd.merge(df_gas, df_total, on=["Entity", "Code", "Year"], how="outer")
df_combined.head()

|   | Entity | Code | Year | N₂O Emissions | Methane Emissions | CO₂ Emissions | Overall Emissions |
|---|---|---|---|---|---|---|---|
| 0 | Afghanistan | AFG | 1850 | 224421.52 | 3594926.2 | 3616395.5 | 7435743.5 |
| 1 | Afghanistan | AFG | 1851 | 229096.17 | 3615134.2 | 3655627.8 | 7499858.5 |
| 2 | Afghanistan | AFG | 1852 | 233650.48 | 3635346.8 | 3691498.5 | 7560495.5 |
| 3 | Afghanistan | AFG | 1853 | 238009.98 | 3655563.5 | 3726324.5 | 7619898.0 |
| 4 | Afghanistan | AFG | 1854 | 242100.23 | 3675785.0 | 3760235.0 | 7678120.0 |
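In the first rows of the merged frame, `Overall Emissions` equals the sum of the three individual gases to within rounding (for example, 224421.52 + 3594926.2 + 3616395.5 = 7435743.22 against 7435743.5). A sanity check along these lines could be run on the full `df_combined`; that the total is CO₂ + CH₄ + N₂O for every row is an assumption to verify, not something the source states. The sketch below runs it on those five rows copied from the output:

```python
import numpy as np
import pandas as pd

# The five Afghanistan rows (1850-1854) copied from df_combined.head()
rows = pd.DataFrame({
    "N₂O Emissions": [224421.52, 229096.17, 233650.48, 238009.98, 242100.23],
    "Methane Emissions": [3594926.2, 3615134.2, 3635346.8, 3655563.5, 3675785.0],
    "CO₂ Emissions": [3616395.5, 3655627.8, 3691498.5, 3726324.5, 3760235.0],
    "Overall Emissions": [7435743.5, 7499858.5, 7560495.5, 7619898.0, 7678120.0],
})

gases = ["N₂O Emissions", "Methane Emissions", "CO₂ Emissions"]
# sum(axis=1) skips NaN, so on the full data first keep only complete rows,
# e.g. df_combined.dropna(subset=gases + ["Overall Emissions"])
gas_sum = rows[gases].sum(axis=1)

# Allow a small relative tolerance for rounding in the published values
matches = np.isclose(gas_sum, rows["Overall Emissions"], rtol=1e-6)
print(matches.all())  # True
```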


In [6]:
df_combined.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 41238 entries, 0 to 41237
Data columns (total 7 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   Entity             41238 non-null  object 
 1   Code               37758 non-null  object 
 2   Year               41238 non-null  int64  
 3   N₂O Emissions      38280 non-null  float64
 4   Methane Emissions  37410 non-null  float64
 5   CO₂ Emissions      41238 non-null  float64
 6   Overall Emissions  37410 non-null  float64
dtypes: float64(4), int64(1), object(2)
memory usage: 2.2+ MB
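The `info()` output shows fewer non-null values for `Code`, `N₂O Emissions`, `Methane Emissions` and `Overall Emissions` than for `CO₂ Emissions`, so it is worth checking whether the gaps come from rows present in only one source file or from missing values inside a file. Passing `indicator=True` to `pd.merge` adds a `_merge` column recording where each row came from. A sketch on two made-up frames:

```python
import pandas as pd

# Made-up stand-ins for df_gas and df_total (not OWID values)
gas = pd.DataFrame({
    "Entity": ["A", "A", "B"],
    "Year": [2000, 2001, 2000],
    "CO₂ Emissions": [1.0, 2.0, 3.0],
})
total = pd.DataFrame({
    "Entity": ["A", "B"],
    "Year": [2000, 2000],
    "Overall Emissions": [5.0, 6.0],
})

# indicator=True adds a categorical "_merge" column (left_only, right_only, both);
# validate="one_to_one" raises if a key appears more than once in either frame
merged = pd.merge(gas, total, on=["Entity", "Year"], how="outer",
                  indicator=True, validate="one_to_one")
counts = merged["_merge"].value_counts()
print(counts)
```

On the real data, `df_combined[df_combined["_merge"] != "both"]` would list the rows that exist in only one of the two files.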


In [7]:
df_combined.describe()

|       | Year | N₂O Emissions | Methane Emissions | CO₂ Emissions | Overall Emissions |
|---|---|---|---|---|---|
| count | 41238.0 | 38280.0 | 37410.0 | 41238.0 | 37410.0 |
| mean | 1936.5 | 25678950.0 | 101812400.0 | 327032800.0 | 488542200.0 |
| std | 50.229253 | 131139700.0 | 473055100.0 | 1719202000.0 | 2392580000.0 |
| min | 1850.0 | 0.0 | 145.2044 | -17819870.0 | -14961390.0 |
| 25% | 1893.0 | 72425.05 | 464802.1 | 212512.0 | 1835210.0 |
| 50% | 1936.5 | 720941.1 | 3719967.0 | 6115528.0 | 15007530.0 |
| 75% | 1980.0 | 5099800.0 | 20169700.0 | 43148890.0 | 78243130.0 |
| max | 2023.0 | 2996286000.0 | 10529810000.0 | 40305960000.0 | 53816850000.0 |


In [8]:
# Reshape into tidy (long) format, one row per entity, year and gas type, for easier plotting
df_tidy = pd.melt(
    df_combined,
    id_vars=["Entity", "Code", "Year"],
    value_vars=["N₂O Emissions", "Methane Emissions", "CO₂ Emissions", "Overall Emissions"],
    var_name="Gas Type",
    value_name="Emissions"
)

In [9]:
df_tidy.head()

|   | Entity | Code | Year | Gas Type | Emissions |
|---|---|---|---|---|---|
| 0 | Afghanistan | AFG | 1850 | N₂O Emissions | 224421.52 |
| 1 | Afghanistan | AFG | 1851 | N₂O Emissions | 229096.17 |
| 2 | Afghanistan | AFG | 1852 | N₂O Emissions | 233650.48 |
| 3 | Afghanistan | AFG | 1853 | N₂O Emissions | 238009.98 |
| 4 | Afghanistan | AFG | 1854 | N₂O Emissions | 242100.23 |


In [10]:
df_tidy.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 164952 entries, 0 to 164951
Data columns (total 5 columns):
 #   Column     Non-Null Count   Dtype  
---  ------     --------------   -----  
 0   Entity     164952 non-null  object 
 1   Code       151032 non-null  object 
 2   Year       164952 non-null  int64  
 3   Gas Type   164952 non-null  object 
 4   Emissions  154338 non-null  float64
dtypes: float64(1), int64(1), object(3)
memory usage: 6.3+ MB
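These row counts are consistent with the reshape: `pd.melt` stacks the four value columns, so the tidy frame has 41238 × 4 = 164952 rows, and its 154338 non-null `Emissions` values equal the sum of the four per-column non-null counts in `df_combined.info()` (38280 + 37410 + 41238 + 37410). A small sketch of the same checks on a made-up frame, plus a round trip back to wide format with `DataFrame.pivot` to confirm nothing was lost:

```python
import pandas as pd

# Toy wide frame with one missing value (made-up numbers)
wide = pd.DataFrame({
    "Entity": ["A", "A", "B"],
    "Year": [2000, 2001, 2000],
    "CO₂ Emissions": [1.0, 2.0, 3.0],
    "Overall Emissions": [4.0, None, 6.0],
})
value_cols = ["CO₂ Emissions", "Overall Emissions"]

tidy = pd.melt(wide, id_vars=["Entity", "Year"], value_vars=value_cols,
               var_name="Gas Type", value_name="Emissions")

# melt keeps NaN rows, so the long frame has exactly rows x value columns
assert len(tidy) == len(wide) * len(value_cols)
# Non-null values are preserved one-for-one
assert tidy["Emissions"].notna().sum() == wide[value_cols].notna().sum().sum()

# Pivot back to wide format (list index needs pandas >= 1.1) and compare
back = (tidy.pivot(index=["Entity", "Year"], columns="Gas Type", values="Emissions")
            .reset_index())
back.columns.name = None
pd.testing.assert_frame_equal(back[wide.columns], wide)
print("round trip OK")
```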


<h4><strong>2. Analysis</strong></h4>

In [11]:
# Earliest and latest year recorded for each gas type
result = (
    df_tidy.groupby('Gas Type')['Year']
    .agg(['min', 'max'])
    .rename(columns={'min': 'Earliest Year', 'max': 'Latest Year'})
    .reset_index()
)

In [12]:
result

|   | Gas Type | Earliest Year | Latest Year |
|---|---|---|---|
| 0 | CO₂ Emissions | 1850 | 2023 |
| 1 | Methane Emissions | 1850 | 2023 |
| 2 | N₂O Emissions | 1850 | 2023 |
| 3 | Overall Emissions | 1850 | 2023 |


All four gas types span the same years, 1850 to 2023.
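Because the melted frame keeps rows whose `Emissions` value is missing, this range reflects the shared `Year` column rather than the years for which each gas actually has data. A sketch of a stricter check, dropping missing values before taking the minimum and maximum year per gas (toy data, made-up values):

```python
import pandas as pd

# Toy tidy frame: methane has no value for 1850 (made-up numbers)
tidy = pd.DataFrame({
    "Gas Type": ["CO₂ Emissions", "CO₂ Emissions", "Methane Emissions", "Methane Emissions"],
    "Year": [1850, 1851, 1850, 1851],
    "Emissions": [1.0, 2.0, None, 3.0],
})

# Only count years where the gas actually has an emissions value
coverage = (
    tidy.dropna(subset=["Emissions"])
        .groupby("Gas Type")["Year"]
        .agg(["min", "max", "count"])
        .rename(columns={"min": "Earliest Year", "max": "Latest Year", "count": "Records"})
)
print(coverage)
```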

In [14]:
df_tidy['Emissions'].describe()

|       | Emissions |
|---|---|
| count | 154338.0 |
| mean | 236846000.0 |
| std | 1506237000.0 |
| min | -17819870.0 |
| 25% | 274741.6 |
| 50% | 4069672.0 |
| 75% | 29765390.0 |
| max | 53816850000.0 |
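These pooled statistics mix the individual gases with `Overall Emissions`, which is itself a total, so the mean and quartiles are hard to interpret as a single quantity. Summarising per gas keeps the figures comparable; a sketch with `groupby(...).describe()` on made-up values:

```python
import pandas as pd

# Toy tidy frame (made-up values)
tidy = pd.DataFrame({
    "Gas Type": ["CO₂ Emissions"] * 3 + ["Overall Emissions"] * 3,
    "Emissions": [1.0, 2.0, 3.0, 10.0, 20.0, 30.0],
})

# One row of summary statistics per gas type
per_gas = tidy.groupby("Gas Type")["Emissions"].describe()
print(per_gas[["count", "mean", "min", "max"]])
```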


In [15]:
# Find records with negative emissions values
negative_emissions = df_tidy[df_tidy['Emissions'] < 0]

# Show the records with negative emissions
print(negative_emissions)

                 Entity Code  Year           Gas Type   Emissions
83226           Andorra  AND  1904      CO₂ Emissions    -421.360
83227           Andorra  AND  1905      CO₂ Emissions    -476.320
83228           Andorra  AND  1906      CO₂ Emissions    -916.000
83229           Andorra  AND  1907      CO₂ Emissions   -1135.840
83230           Andorra  AND  1908      CO₂ Emissions   -1584.680
...                 ...  ...   ...                ...         ...
157900  Solomon Islands  SLB  1932  Overall Emissions -137530.000
157909  Solomon Islands  SLB  1941  Overall Emissions -118830.670
157913  Solomon Islands  SLB  1945  Overall Emissions -103931.420
157925  Solomon Islands  SLB  1957  Overall Emissions -102100.664
157931  Solomon Islands  SLB  1963  Overall Emissions  -43720.832

[2112 rows x 5 columns]


Some records have negative emission values. These are unlikely to be errors: the figures most likely include emissions from land-use change, and a negative value typically means that an area acted as a net carbon sink rather than a carbon source in that year, for example because:

*   Forests and other vegetation absorbed more CO₂ through photosynthesis than was released.
*   Land-use changes, such as reforestation, increased carbon sequestration.
*   Natural ecosystems such as peatlands and wetlands stored carbon.
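To see where the 2112 negative records are concentrated before deciding how to show them in the final plot, they can be counted per entity and gas type. A sketch on made-up rows with hypothetical entity names; on the real data the same `groupby` would run on `negative_emissions`:

```python
import pandas as pd

# Toy tidy rows with hypothetical entities and made-up values
tidy = pd.DataFrame({
    "Entity": ["Country A", "Country A", "Country A", "Country B", "Country B"],
    "Gas Type": ["CO₂ Emissions", "CO₂ Emissions", "Overall Emissions",
                 "Overall Emissions", "CO₂ Emissions"],
    "Emissions": [-1.0, -2.0, -0.5, -3.0, 4.0],
})

negative = tidy[tidy["Emissions"] < 0]

# Number of negative records per entity and gas type, largest first
neg_counts = (
    negative.groupby(["Entity", "Gas Type"])
            .size()
            .sort_values(ascending=False)
)
print(neg_counts)
```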







In [17]:
# Save the tidy dataset to CSV for use in the main project file
df_tidy.to_csv('tidy_df.csv', index=False)
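Since the main project file loads this CSV directly, a quick read-back check can confirm that the saved file has the expected shape and columns. A self-contained sketch that writes a toy frame to a temporary directory (the file name matches the one above; the data is made up):

```python
import os
import tempfile

import pandas as pd

# Toy tidy frame standing in for df_tidy (made-up values)
tidy = pd.DataFrame({
    "Entity": ["A", "A"],
    "Code": ["AAA", None],
    "Year": [2000, 2000],
    "Gas Type": ["CO₂ Emissions", "Overall Emissions"],
    "Emissions": [1.0, None],
})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "tidy_df.csv")
    tidy.to_csv(path, index=False)  # index=False: no extra unnamed index column
    loaded = pd.read_csv(path)

# Missing values are written as empty fields and read back as NaN
print(loaded.shape, list(loaded.columns))
```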