A greenhouse gas (GHG) is a gas that traps heat in the atmosphere. Greenhouse gases let sunlight pass through the atmosphere but prevent much of the heat from escaping back into space; this trapping of heat is known as the greenhouse effect. Greenhouse gases are essential to keeping the Earth warm: without them, Earth’s average surface temperature would be about 0°F (-18°C). The primary greenhouse gases in Earth’s atmosphere are water vapor, carbon dioxide, methane, nitrous oxide, and ozone.
  
Carbon dioxide (CO2) is a colourless, odourless and non-poisonous gas formed by the combustion of carbon and by the respiration of living organisms; it is considered a greenhouse gas. Emissions are the release of greenhouse gases and/or their precursors into the atmosphere over a specified area and period of time. Carbon dioxide (CO2) emissions are those stemming from the burning of fossil fuels and the manufacture of cement; they include carbon dioxide produced during the consumption of solid, liquid, and gas fuels, as well as from gas flaring.
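To make the definition concrete, the sketch below adds up per-source emissions for two invented country-years. The column names mirror the per-source columns in the dataset loaded below, but the numbers are hypothetical, and treating the total as a plain sum of these five sources is a simplification for illustration; the real data may include further categories (for example `other_industry_co2`).

```python
import pandas as pd

# Hypothetical emissions (million tonnes of CO2) for two made-up country-years.
# Column names mirror per-source columns in the OWID data; the values are invented.
sources = pd.DataFrame(
    {
        "coal_co2": [10.0, 4.0],
        "oil_co2": [6.0, 3.0],
        "gas_co2": [2.0, None],      # missing values are common in the real data
        "cement_co2": [1.0, 0.5],
        "flaring_co2": [0.5, None],
    },
    index=["country_a", "country_b"],
)

# Simplified total: fuel combustion (coal, oil, gas) + cement + flaring.
# min_count=1 makes the total NaN only when every source is missing.
sources["co2_total"] = sources.sum(axis=1, min_count=1)
print(sources["co2_total"])
```

Missing sources are skipped rather than treated as zero-emission facts, so `country_b` is totalled from the three sources that are present.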

In [1]:
import numpy as np
import pandas as pd
from pandas import DataFrame,Series
import matplotlib.pyplot as plt
import seaborn as sns

In [3]:
df = pd.read_csv('owid-co2-data.csv')

In [4]:
df.head()

Unnamed: 0,iso_code,country,year,co2,co2_per_capita,trade_co2,cement_co2,cement_co2_per_capita,coal_co2,coal_co2_per_capita,...,ghg_excluding_lucf_per_capita,methane,methane_per_capita,nitrous_oxide,nitrous_oxide_per_capita,population,gdp,primary_energy_consumption,energy_per_capita,energy_per_gdp
0,AFG,Afghanistan,1949,0.015,0.002,,,,0.015,0.002,...,,,,,,7624058.0,,,,
1,AFG,Afghanistan,1950,0.084,0.011,,,,0.021,0.003,...,,,,,,7752117.0,9421400000.0,,,
2,AFG,Afghanistan,1951,0.092,0.012,,,,0.026,0.003,...,,,,,,7840151.0,9692280000.0,,,
3,AFG,Afghanistan,1952,0.092,0.012,,,,0.032,0.004,...,,,,,,7935996.0,10017320000.0,,,
4,AFG,Afghanistan,1953,0.106,0.013,,,,0.038,0.005,...,,,,,,8039684.0,10630520000.0,,,


In [5]:
df.shape

(25191, 60)

In [6]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 25191 entries, 0 to 25190
Data columns (total 60 columns):
 #   Column                               Non-Null Count  Dtype  
---  ------                               --------------  -----  
 0   iso_code                             21960 non-null  object 
 1   country                              25191 non-null  object 
 2   year                                 25191 non-null  int64  
 3   co2                                  23949 non-null  float64
 4   co2_per_capita                       23307 non-null  float64
 5   trade_co2                            3976 non-null   float64
 6   cement_co2                           12248 non-null  float64
 7   cement_co2_per_capita                12218 non-null  float64
 8   coal_co2                             17188 non-null  float64
 9   coal_co2_per_capita                  16860 non-null  float64
 10  flaring_co2                          4382 non-null   float64
 11  flaring_co2_per_capita      

In [7]:
# Creating a subset of regions (aggregate rows, which have no iso_code) from 2000 onwards
region_names = [
    'Africa', 'Asia', 'Asia (excl. China & India)', 'EU-27', 'EU-28', 'Europe',
    'Europe (excl. EU-27)', 'Europe (excl. EU-28)', 'International transport',
    'North America', 'South America',
]
regions = df[df['country'].isin(region_names)]
regions = regions.loc[regions['year'] >= 2000]
regions.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 231 entries, 188 to 20867
Data columns (total 60 columns):
 #   Column                               Non-Null Count  Dtype  
---  ------                               --------------  -----  
 0   iso_code                             0 non-null      object 
 1   country                              231 non-null    object 
 2   year                                 231 non-null    int64  
 3   co2                                  231 non-null    float64
 4   co2_per_capita                       210 non-null    float64
 5   trade_co2                            200 non-null    float64
 6   cement_co2                           210 non-null    float64
 7   cement_co2_per_capita                210 non-null    float64
 8   coal_co2                             210 non-null    float64
 9   coal_co2_per_capita                  210 non-null    float64
 10  flaring_co2                          210 non-null    float64
 11  flaring_co2_per_capita      

In [8]:
regions.head()

Unnamed: 0,iso_code,country,year,co2,co2_per_capita,trade_co2,cement_co2,cement_co2_per_capita,coal_co2,coal_co2_per_capita,...,ghg_excluding_lucf_per_capita,methane,methane_per_capita,nitrous_oxide,nitrous_oxide_per_capita,population,gdp,primary_energy_consumption,energy_per_capita,energy_per_gdp
188,,Africa,2000,886.562,1.094,-266.512,31.51,0.039,370.247,0.457,...,,,,,,810984230.0,,3192.725,3936.853,
189,,Africa,2001,884.168,1.064,-275.648,32.908,0.04,371.98,0.448,...,,,,,,830902539.0,,3317.609,3992.775,
190,,Africa,2002,892.575,1.049,-292.434,35.231,0.041,361.119,0.424,...,,,,,,851298437.0,,3354.459,3940.405,
191,,Africa,2003,967.22,1.109,-298.849,35.508,0.041,390.893,0.448,...,,,,,,872248336.0,,3513.656,4028.276,
192,,Africa,2004,1036.686,1.16,-320.442,38.17,0.043,417.515,0.467,...,,,,,,893842786.0,,3765.085,4212.245,


In [21]:
df = df.sort_values("iso_code")

In [24]:
df["iso_code"].isnull().sum()

503

In [27]:
# Dropping the regions (rows without an iso_code, which the sort placed at the end) from the main dataframe
df.drop(df.tail(503).index, inplace=True)
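The cell above relies on two things: `sort_values` placing missing `iso_code` values last (its default, `na_position='last'`), and the count of 503 still being current when the cell runs. A boolean filter selects the same rows without depending on either. The miniature frame below is hypothetical and only stands in for the real data.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the dataset: aggregate regions have no ISO code.
toy = pd.DataFrame(
    {
        "iso_code": ["AFG", np.nan, "ABW", np.nan],
        "country": ["Afghanistan", "Africa", "Aruba", "Europe"],
        "year": [2000, 2000, 2000, 2000],
    }
)

# Approach used in the notebook: sort so missing codes go last, then drop the tail.
n_missing = toy["iso_code"].isna().sum()
by_tail = toy.sort_values("iso_code")
by_tail = by_tail.drop(by_tail.tail(n_missing).index)

# Order-independent alternative: keep only rows that have an ISO code.
by_mask = toy.dropna(subset=["iso_code"])

print(sorted(by_tail["country"]))  # ['Afghanistan', 'Aruba']
print(sorted(by_mask["country"]))  # ['Afghanistan', 'Aruba']
```

Both approaches keep the same rows here; the mask version keeps working if rows are added or the frame is re-sorted before the cell runs.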

In [48]:
# Checking that no region names remain in the dataframe (an empty result is expected)
df.loc[(df['country'] == 'Europe')|(df['country'] == 'Africa')|(df['country'] == 'Europe (excl. EU-27)')]

Unnamed: 0,country,year,co2,co2_per_capita,oil_co2,oil_co2_per_capita,co2_growth_prct,co2_growth_abs,co2_per_gdp,co2_per_unit_energy,...,total_ghg_excluding_lucf,ghg_excluding_lucf_per_capita,methane,methane_per_capita,nitrous_oxide,nitrous_oxide_per_capita,population,gdp,primary_energy_consumption,energy_per_capita


In [33]:
# Keeping only rows from the year 2000 onwards
df = df.loc[df['year'] >= 2000]

In [34]:
# Dropping the iso_code column, which is no longer needed
df.drop(['iso_code'], axis=1, inplace=True)

In [35]:
# Checking the data after dropping the column and filtering by year
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 4591 entries, 1052 to 25190
Data columns (total 54 columns):
 #   Column                               Non-Null Count  Dtype  
---  ------                               --------------  -----  
 0   country                              4591 non-null   object 
 1   year                                 4591 non-null   int64  
 2   co2                                  4534 non-null   float64
 3   co2_per_capita                       4513 non-null   float64
 4   cement_co2                           3073 non-null   float64
 5   cement_co2_per_capita                3060 non-null   float64
 6   gas_co2                              2481 non-null   float64
 7   gas_co2_per_capita                   2481 non-null   float64
 8   oil_co2                              4534 non-null   float64
 9   oil_co2_per_capita                   4513 non-null   float64
 10  other_industry_co2                   966 non-null    float64
 11  other_co2_per_capita      

In [36]:
df.isna().sum()

country                                   0
year                                      0
co2                                      57
co2_per_capita                           78
cement_co2                             1518
cement_co2_per_capita                  1531
gas_co2                                2110
gas_co2_per_capita                     2110
oil_co2                                  57
oil_co2_per_capita                       78
other_industry_co2                     3625
other_co2_per_capita                   3625
co2_growth_prct                           4
co2_growth_abs                           59
co2_per_gdp                            1451
co2_per_unit_energy                     912
consumption_co2                        2195
consumption_co2_per_capita             2195
consumption_co2_per_gdp                2349
cumulative_co2                           57
cumulative_cement_co2                  1518
cumulative_coal_co2                    2028
cumulative_flaring_co2          
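Raw counts are easier to compare once expressed as a share of rows: for instance, `other_industry_co2` is missing in 3625 of 4591 rows, about 79%. A minimal sketch of that computation, using a small hypothetical frame in place of `df`:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with uneven gaps, standing in for the real data.
toy = pd.DataFrame(
    {
        "co2": [1.0, 2.0, 3.0, 4.0],
        "gas_co2": [np.nan, np.nan, np.nan, 1.0],
        "co2_per_gdp": [0.1, np.nan, 0.3, 0.4],
    }
)

# isna().mean() is the fraction of missing values per column; scale to percent
# and sort so the sparsest columns come first.
missing_pct = toy.isna().mean().mul(100).sort_values(ascending=False)
print(missing_pct)
```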

In [40]:
# Dropping the cement and gas columns (positions 4 to 7), which have many missing values
df.drop(columns=df.columns[4:8], inplace=True)

In [43]:
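# Dropping further columns by position (indices refer to the column order after the previous drop)
df.drop(df.columns[[6, 7, 12, 13, 14, 16, 17, 18, 19, 21, 22, 24, 25, 26, 27, 29, 31, 32, 33, 34, 36, 49]], axis=1, inplace=True)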
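Dropping by position depends on the exact column order at the moment the cell runs, so re-running cells out of order can remove the wrong columns. One alternative, sketched below, is a missing-data threshold with `dropna(axis=1, thresh=...)`. This is not the selection made above (the hand-picked list may also reflect which columns are relevant), and the 50% cut-off and toy frame are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical frame standing in for df.
toy = pd.DataFrame(
    {
        "country": ["A", "B", "C", "D"],
        "co2": [1.0, 2.0, np.nan, 4.0],
        "gas_co2": [np.nan, np.nan, np.nan, 1.0],
    }
)

# Hypothetical cut-off: keep a column only if at least 50% of its values are present.
min_share_present = 0.5
thresh = int(np.ceil(min_share_present * len(toy)))

# dropna(axis=1, thresh=n) keeps columns with at least n non-missing values.
trimmed = toy.dropna(axis=1, thresh=thresh)
print(list(trimmed.columns))  # ['country', 'co2']
```

Because the rule refers to the data rather than to positions, it gives the same result regardless of column order.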

In [44]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 4591 entries, 1052 to 25190
Data columns (total 28 columns):
 #   Column                           Non-Null Count  Dtype  
---  ------                           --------------  -----  
 0   country                          4591 non-null   object 
 1   year                             4591 non-null   int64  
 2   co2                              4534 non-null   float64
 3   co2_per_capita                   4513 non-null   float64
 4   oil_co2                          4534 non-null   float64
 5   oil_co2_per_capita               4513 non-null   float64
 6   co2_growth_prct                  4587 non-null   float64
 7   co2_growth_abs                   4532 non-null   float64
 8   co2_per_gdp                      3140 non-null   float64
 9   co2_per_unit_energy              3679 non-null   float64
 10  cumulative_co2                   4534 non-null   float64
 11  cumulative_oil_co2               4534 non-null   float64
 12  share_global_co2

In [45]:
df.head()

Unnamed: 0,country,year,co2,co2_per_capita,oil_co2,oil_co2_per_capita,co2_growth_prct,co2_growth_abs,co2_per_gdp,co2_per_unit_energy,...,total_ghg_excluding_lucf,ghg_excluding_lucf_per_capita,methane,methane_per_capita,nitrous_oxide,nitrous_oxide_per_capita,population,gdp,primary_energy_consumption,energy_per_capita
1052,Aruba,2020,0.753,7.055,0.753,7.055,-11.51,-0.098,,,...,,,,,,,106766.0,,,
1034,Aruba,2002,2.437,25.65,2.437,25.65,1.22,0.029,,0.515,...,,,,,,,94992.0,,4.729,49778.552
1035,Aruba,2003,2.561,26.399,2.561,26.399,5.11,0.125,,0.486,...,,,,,,,97016.0,,5.266,54290.27
1036,Aruba,2004,2.616,26.494,2.616,26.494,2.15,0.055,,0.476,...,,,,,,,98744.0,,5.492,55472.046
1037,Aruba,2005,2.719,27.179,2.719,27.179,3.92,0.103,,0.458,...,,,,,,,100028.0,,5.931,59305.478
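In the `head()` output, Aruba's 2020 row appears before 2002 because the frame was last sorted by `iso_code` alone; rows that share a code are not guaranteed to stay in year order, since the default single-column sort (`kind='quicksort'`) is not stable. Sorting on both keys gives a deterministic order. The country/year pairs below are taken from the output above, plus one hypothetical extra row.

```python
import pandas as pd

# Country/year pairs in the order head() shows them, plus one hypothetical row.
toy = pd.DataFrame(
    {
        "country": ["Aruba", "Aruba", "Aruba", "Afghanistan"],
        "year": [2020, 2002, 2003, 2000],
    }
)

# Sort by country, then year; ignore_index=True renumbers the rows 0..n-1.
ordered = toy.sort_values(["country", "year"], ignore_index=True)
print(ordered)
```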
