# 🌍 Global CO₂ Emissions Data Analysis

In this project, we analyze historical CO₂ emissions data by country from 1750 to 2023.

**Dataset Source:** [Our World in Data – CO₂ Data](https://github.com/owid/co2-data)

We'll explore:
- Global emission trends
- Top contributing countries
- Emissions per capita
- Correlation with GDP and population

Let's begin by importing the dataset and understanding its structure.


In [6]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [7]:
# Load a local copy of the OWID CO₂ dataset (comma-separated content saved with a .txt extension)
df = pd.read_csv('C:/Users/berek/owid-co2-data.txt')

In [8]:
df.head()

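| | country | year | iso_code | population | gdp | cement_co2 | cement_co2_per_capita | co2 | co2_growth_abs | co2_growth_prct | ... | share_global_other_co2 | share_of_temperature_change_from_ghg | temperature_change_from_ch4 | temperature_change_from_co2 | temperature_change_from_ghg | temperature_change_from_n2o | total_ghg | total_ghg_excluding_lucf | trade_co2 | trade_co2_share |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Afghanistan | 1750 | AFG | 2802560.0 | NaN | 0.0 | 0.0 | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1 | Afghanistan | 1751 | AFG | NaN | NaN | 0.0 | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2 | Afghanistan | 1752 | AFG | NaN | NaN | 0.0 | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 3 | Afghanistan | 1753 | AFG | NaN | NaN | 0.0 | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 4 | Afghanistan | 1754 | AFG | NaN | NaN | 0.0 | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |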


In [9]:
df.describe()

| | year | population | gdp | cement_co2 | cement_co2_per_capita | co2 | co2_growth_abs | co2_growth_prct | co2_including_luc | co2_including_luc_growth_abs | ... | share_global_other_co2 | share_of_temperature_change_from_ghg | temperature_change_from_ch4 | temperature_change_from_co2 | temperature_change_from_ghg | temperature_change_from_n2o | total_ghg | total_ghg_excluding_lucf | trade_co2 | trade_co2_share |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 50191.0 | 41019.0 | 15251.0 | 28863.0 | 25358.0 | 29137.0 | 26981.0 | 26002.0 | 23585.0 | 23285.0 | ... | 2108.0 | 41001.0 | 38060.0 | 41001.0 | 41001.0 | 38060.0 | 37410.0 | 37236.0 | 4535.0 | 4535.0 |
| mean | 1919.883067 | 56861410.0 | 330049500000.0 | 7.767746 | 0.059036 | 415.698178 | 6.208882 | 43.104462 | 535.581202 | 7.214604 | ... | 7.512655 | 2.269285 | 0.003026 | 0.00767 | 0.011023 | 0.000509 | 488.542225 | 316.133529 | -7.232399 | 20.52444 |
| std | 65.627296 | 319990500.0 | 3086383000000.0 | 62.595292 | 0.120328 | 1945.843973 | 62.322553 | 1729.939596 | 2202.219657 | 99.34798 | ... | 17.671054 | 9.315325 | 0.016519 | 0.043694 | 0.061901 | 0.003043 | 2392.57991 | 1839.602293 | 250.640012 | 52.744956 |
| min | 1750.0 | 215.0 | 49980000.0 | 0.0 | 0.0 | 0.0 | -1977.75 | -100.0 | -99.693 | -2325.5 | ... | 0.0 | -0.81 | -0.001 | 0.0 | -0.001 | 0.0 | -14.961 | 0.0 | -2195.952 | -98.849 |
| 25% | 1875.0 | 327313.0 | 7874038000.0 | 0.0 | 0.0 | 0.374 | -0.005 | -1.1025 | 6.418 | -0.908 | ... | 0.20475 | 0.004 | 0.0 | 0.0 | 0.0 | 0.0 | 1.835 | 0.235 | -3.1795 | -6.168 |
| 50% | 1924.0 | 2289522.0 | 27438610000.0 | 0.0 | 0.001 | 4.99 | 0.044 | 3.8035 | 27.691 | 0.078 | ... | 0.838 | 0.078 | 0.0 | 0.0 | 0.0 | 0.0 | 15.0075 | 2.371 | 1.518 | 8.701 |
| 75% | 1974.0 | 9862459.0 | 121262700000.0 | 0.486 | 0.07575 | 53.273 | 1.002 | 10.89075 | 123.959 | 2.62 | ... | 3.211 | 0.359 | 0.001 | 0.001 | 0.001 | 0.0 | 78.24275 | 29.3375 | 9.1535 | 32.666 |
| max | 2023.0 | 8091735000.0 | 130112600000000.0 | 1696.308 | 2.484 | 37791.57 | 1865.208 | 180870.0 | 41416.48 | 2340.184 | ... | 100.0 | 100.0 | 0.422 | 1.161 | 1.668 | 0.085 | 53816.852 | 44114.785 | 1798.999 | 568.635 |
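As a first look at the correlation question listed in the introduction, the sketch below computes pairwise Pearson correlations between `co2`, `gdp`, and `population` using complete rows only. The `count` row of the summary shows that `gdp` is present in only 15,251 of 50,191 rows, so dropping incomplete rows first makes the effective sample size explicit. The helper name `complete_case_corr` and the small `sample` frame are made up for illustration; with the real data you would pass `df` instead.

```python
import numpy as np
import pandas as pd

def complete_case_corr(frame: pd.DataFrame, cols: list[str]):
    """Return (correlation matrix, number of rows used) for rows with no NaN in cols."""
    complete = frame.dropna(subset=cols)
    # DataFrame.corr() defaults to the Pearson coefficient
    return complete[cols].corr(), len(complete)

# Hypothetical stand-in data, not taken from the OWID file
sample = pd.DataFrame({
    "co2": [2.0, 4.0, 6.0, 8.0, 10.0],
    "gdp": [1.0, 2.0, 3.0, 4.0, np.nan],
    "population": [10.0, 20.0, 15.0, np.nan, 30.0],
})

corr, n_rows = complete_case_corr(sample, ["co2", "gdp", "population"])
print(f"rows used: {n_rows}")
print(corr.round(2))
```

Because both emissions and GDP are heavily right-skewed in the summary above, it may be worth repeating the calculation on log-transformed values or with `method="spearman"` before drawing conclusions.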


## 🔍 Step 8: Exploratory Data Analysis (EDA)

In this section, we’ll explore the structure and quality of the dataset:

- Dimensions of the data
- Missing values and data types
- Column overview
- Simple visualizations to understand trends
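The missing-value and data-type checks in this list can be combined into one table. The following is a minimal sketch, assuming a helper named `summarize_quality` (a hypothetical name, not part of pandas) and a tiny made-up `sample` frame so that it runs on its own; in the notebook you would call it on `df`.

```python
import numpy as np
import pandas as pd

def summarize_quality(frame: pd.DataFrame) -> pd.DataFrame:
    """Return dtype, non-null count, and missing share (%) for every column."""
    summary = pd.DataFrame({
        "dtype": frame.dtypes.astype(str),
        "non_null": frame.notna().sum(),
        "missing_pct": (frame.isna().mean() * 100).round(1),
    })
    # List the most incomplete columns first
    return summary.sort_values("missing_pct", ascending=False)

# Hypothetical stand-in data, not taken from the OWID file
sample = pd.DataFrame({
    "country": ["A", "A", "B", "B"],
    "year": [2000, 2001, 2000, 2001],
    "co2": [1.0, np.nan, 3.0, 4.0],
    "gdp": [np.nan, np.nan, np.nan, 10.0],
})

quality = summarize_quality(sample)
print(quality)
```

Sorting by `missing_pct` puts the sparsest columns at the top, which helps when deciding which of the 79 columns are complete enough to analyze.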


In [10]:
df.shape


(50191, 79)

In [11]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50191 entries, 0 to 50190
Data columns (total 79 columns):
 #   Column                                     Non-Null Count  Dtype  
---  ------                                     --------------  -----  
 0   country                                    50191 non-null  object 
 1   year                                       50191 non-null  int64  
 2   iso_code                                   42262 non-null  object 
 3   population                                 41019 non-null  float64
 4   gdp                                        15251 non-null  float64
 5   cement_co2                                 28863 non-null  float64
 6   cement_co2_per_capita                      25358 non-null  float64
 7   co2                                        29137 non-null  float64
 8   co2_growth_abs                             26981 non-null  float64
 9   co2_growth_prct                            26002 non-null  float64
 10  co2_including_luc     
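The `df.info()` output shows that `iso_code` is non-null in 42,262 of 50,191 rows, leaving 7,929 rows without a code. One plausible reading, which is an assumption here and should be checked against the data, is that those rows are regional or global aggregates (for example a "World" entry) rather than individual countries. If so, they would need to be separated out before ranking top contributing countries. The sketch below uses a hypothetical helper `split_by_iso_code` and made-up sample values.

```python
import numpy as np
import pandas as pd

def split_by_iso_code(frame: pd.DataFrame):
    """Split rows into those that have an iso_code and those that do not."""
    has_code = frame["iso_code"].notna()
    return frame[has_code].copy(), frame[~has_code].copy()

# Hypothetical stand-in data; the entity names and values are illustrative only
sample = pd.DataFrame({
    "country": ["Afghanistan", "World", "France", "Asia"],
    "year": [2020, 2020, 2020, 2020],
    "iso_code": ["AFG", np.nan, "FRA", np.nan],
    "co2": [1.0, 100.0, 30.0, 50.0],
})

with_code, without_code = split_by_iso_code(sample)

# Inspect which entities lack a code before deciding to exclude them
print(sorted(without_code["country"].unique()))

# Largest emitter among rows that carry an ISO code
top = with_code.nlargest(1, "co2")
print(top[["country", "co2"]])
```

Printing the unique names in `without_code` on the real data is a quick way to confirm or reject the aggregate assumption.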

In [12]:
df.columns.tolist()

['country',
 'year',
 'iso_code',
 'population',
 'gdp',
 'cement_co2',
 'cement_co2_per_capita',
 'co2',
 'co2_growth_abs',
 'co2_growth_prct',
 'co2_including_luc',
 'co2_including_luc_growth_abs',
 'co2_including_luc_growth_prct',
 'co2_including_luc_per_capita',
 'co2_including_luc_per_gdp',
 'co2_including_luc_per_unit_energy',
 'co2_per_capita',
 'co2_per_gdp',
 'co2_per_unit_energy',
 'coal_co2',
 'coal_co2_per_capita',
 'consumption_co2',
 'consumption_co2_per_capita',
 'consumption_co2_per_gdp',
 'cumulative_cement_co2',
 'cumulative_co2',
 'cumulative_co2_including_luc',
 'cumulative_coal_co2',
 'cumulative_flaring_co2',
 'cumulative_gas_co2',
 'cumulative_luc_co2',
 'cumulative_oil_co2',
 'cumulative_other_co2',
 'energy_per_capita',
 'energy_per_gdp',
 'flaring_co2',
 'flaring_co2_per_capita',
 'gas_co2',
 'gas_co2_per_capita',
 'ghg_excluding_lucf_per_capita',
 'ghg_per_capita',
 'land_use_change_co2',
 'land_use_change_co2_per_capita',
 'methane',
 'methane_per_capita',
 