# Analyzing industry carbon emissions

## 1. Overview
Product emissions make up more than 75% of global emissions. But which industries are the worst offenders?

## 2. Objective

In this project, we identify the years in which the United States of America had its highest and lowest carbon emissions, and the countries with the largest increase and decrease in emissions per capita between 1975 and 2017.


## 3. Data Collection

The [database](https://www.kaggle.com/datasets/vineethakkinapalli/united-nations-environment-data?select=Carbon+Dioxide+Emission+Estimates.csv) contains carbon emissions per capita for each country as well as total carbon emissions for countries around the world.

<u>Database Columns</u>
* Country(Area)
* Year
* Series
* Value

### 3.1. Import libraries

In [1]:
# Data manipulation
import pandas as pd
# Data visualization
import matplotlib.pyplot as plt
import seaborn as sns
# Database connection
import os
from dotenv import load_dotenv
from sqlalchemy import create_engine
from urllib.parse import quote_plus

### 3.2. Database Connection

In [2]:
load_dotenv()

# MySQL database connection using SQLAlchemy
username = os.getenv('MYSQL_ROOT_USER')
password = os.getenv('MYSQL_ROOT_PASSWORD')
host = "localhost"
port = "3306"
databasename = "PROJECT"

# URL-encode the password
encoded_password = quote_plus(password)

# Construct the connection string with the encoded password
db_uri = f"mysql+pymysql://{username}:{encoded_password}@{host}:{port}/{databasename}"
# echo=False turns off logging of the SQL statements the engine emits
engine = create_engine(db_uri, echo=False)
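The connection cell assumes both environment variables are set. `os.getenv` returns `None` for an unset variable, and `quote_plus(None)` raises a `TypeError`, so a small guard can make the failure easier to read. The sketch below is one possible way to do this; the helper name `build_db_uri` and the sample password are hypothetical, and encoding the username as well is an extra precaution that the original cell does not take.

```python
from urllib.parse import quote_plus


def build_db_uri(username, password, host="localhost", port="3306", database="PROJECT"):
    """Build a SQLAlchemy-style MySQL URI, failing early if credentials are missing."""
    if not username or not password:
        # An unset environment variable arrives here as None
        raise ValueError("MYSQL_ROOT_USER and MYSQL_ROOT_PASSWORD must be set")
    # Percent-encode characters such as '@' or '/' that would otherwise break the URI
    user = quote_plus(username)
    secret = quote_plus(password)
    return f"mysql+pymysql://{user}:{secret}@{host}:{port}/{database}"


# Made-up password containing reserved characters
print(build_db_uri("root", "p@ss/word"))
# mysql+pymysql://root:p%40ss%2Fword@localhost:3306/PROJECT
```

In the notebook, the result could be passed to `create_engine` in place of the hand-built `db_uri`.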

In [3]:
%load_ext sql
%sql engine
%config SqlMagic.displaylimit = 20

### 3.3. Data loading
First, create a DataFrame in Python using pandas and load the dataset.

In [4]:
# Load the dataset from Carbon_Emission.csv
data = pd.read_csv('data/Carbon_Emission.csv')
print('Dataframe shape:', data.shape)
display('Dataframe', data.head())

Dataframe shape: (2132, 4)


'Dataframe'

Unnamed: 0,CO2 emission estimates,Year,Series,Value
0,Albania,1975,Emissions (thousand metric tons of carbon diox...,4338.334
1,Albania,1985,Emissions (thousand metric tons of carbon diox...,6929.926
2,Albania,1995,Emissions (thousand metric tons of carbon diox...,1848.549
3,Albania,2005,Emissions (thousand metric tons of carbon diox...,3825.184
4,Albania,2010,Emissions (thousand metric tons of carbon diox...,3930.295


**OBS:** The dataset has only about 2K rows, so we can load it into the database in one go without specifying a chunk size.
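For a much larger file, the load could be split into batches. pandas exposes this through the `chunksize` parameter of `DataFrame.to_sql`. The sketch below shows the underlying idea with the standard library only. It assumes an in-memory SQLite database as a stand-in for MySQL, and both the helper `insert_in_chunks` and the placeholder rows are hypothetical.

```python
import sqlite3
from itertools import islice


def insert_in_chunks(conn, rows, chunk_size=500):
    """Insert rows in batches of chunk_size and return the number of batches used."""
    iterator = iter(rows)
    batches = 0
    while True:
        batch = list(islice(iterator, chunk_size))
        if not batch:
            break
        conn.executemany(
            "INSERT INTO Carbon_Emission (Country, Year, Series, Value) VALUES (?, ?, ?, ?)",
            batch,
        )
        conn.commit()  # one commit per batch keeps each transaction small
        batches += 1
    return batches


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Carbon_Emission (Country TEXT, Year INTEGER, Series TEXT, Value REAL)")

# 2132 placeholder rows, matching the row count of the dataset (the values are dummies)
rows = [("Placeholder", 1975, "Series", float(i)) for i in range(2132)]
batches = insert_in_chunks(conn, rows, chunk_size=500)
total = conn.execute("SELECT COUNT(*) FROM Carbon_Emission").fetchone()[0]
print(batches, total)  # 5 2132
```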

In [5]:
# Rename column "CO2 emission estimates" before import to MySQL Database
data.rename(columns={"CO2 emission estimates": "Country"}, inplace=True)
data.head()

Unnamed: 0,Country,Year,Series,Value
0,Albania,1975,Emissions (thousand metric tons of carbon diox...,4338.334
1,Albania,1985,Emissions (thousand metric tons of carbon diox...,6929.926
2,Albania,1995,Emissions (thousand metric tons of carbon diox...,1848.549
3,Albania,2005,Emissions (thousand metric tons of carbon diox...,3825.184
4,Albania,2010,Emissions (thousand metric tons of carbon diox...,3930.295


Second, check whether the `Carbon_Emission` table already exists.

In [6]:
%%sql
-- Check whether the 'Carbon_Emission' table already exists:
SHOW TABLES LIKE 'Carbon_Emission';

Tables_in_PROJECT (Carbon_Emission)


In [13]:
%%sql
-- Remove 'Carbon_Emission' table, if exists:
DROP TABLE IF EXISTS Carbon_Emission;

In [14]:
%%sql
-- Create 'Carbon_Emission' table:
CREATE TABLE Carbon_Emission(
    id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    Country VARCHAR(50),
    Year INT,
    Series VARCHAR(255),
    Value DECIMAL(10, 3)
);

Finally, populate the `Carbon_Emission` table from the pandas DataFrame.

In [15]:
data.to_sql(name="Carbon_Emission",
            con=engine,
            if_exists='append',
            index=False)
engine.dispose()

In [16]:
%%sql
SELECT * FROM Carbon_Emission LIMIT 5;

id,Country,Year,Series,Value
1,Albania,1975,Emissions (thousand metric tons of carbon dioxide),4338.334
2,Albania,1985,Emissions (thousand metric tons of carbon dioxide),6929.926
3,Albania,1995,Emissions (thousand metric tons of carbon dioxide),1848.549
4,Albania,2005,Emissions (thousand metric tons of carbon dioxide),3825.184
5,Albania,2010,Emissions (thousand metric tons of carbon dioxide),3930.295


## 4. Exploratory Data Analysis (EDA)

### 4.1. Data Dimensions

In [19]:
%%sql
SELECT COUNT(id) AS Total_rows_info FROM Carbon_Emission;

Total_rows_info
2132


**OBS:** There are 2132 rows.

### 4.2. Data Type

In [20]:
%%sql
DESCRIBE Carbon_Emission;

Field,Type,Null,Key,Default,Extra
id,int,NO,PRI,,auto_increment
Country,varchar(50),YES,,,
Year,int,YES,,,
Series,varchar(255),YES,,,
Value,"decimal(10,3)",YES,,,


**OBS:** The column data types match the table definition.

### 4.3. Missing values
Let's identify missing values to explore the limitations of our database.

In [22]:
%%sql
-- For Numeric, date and time Data Types: missing value = NULL
-- For String: missing value = NULL or ''
SELECT
    COUNT(CASE WHEN Country IS NULL OR Country = '' THEN 1 END) AS missing_country,
    COUNT(CASE WHEN Year  IS NULL THEN 1 END) AS missing_year,
    COUNT(CASE WHEN Series IS NULL OR Series = '' THEN 1 END) AS missing_series,
    COUNT(CASE WHEN Value IS NULL THEN 1 END) AS missing_value
FROM Carbon_Emission;

missing_country,missing_year,missing_series,missing_value
0,0,0,0


**OBS:** There are no missing values.
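The counting pattern used in the query relies on `COUNT` ignoring `NULL`: a `CASE` without an `ELSE` branch yields `NULL` for rows that do not match, so only the missing entries are counted. A minimal sketch on made-up rows, assuming an in-memory SQLite table named `demo` in place of the MySQL table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (Country TEXT, Year INTEGER, Series TEXT, Value REAL)")
conn.executemany(
    "INSERT INTO demo VALUES (?, ?, ?, ?)",
    [
        ("A", 1975, "s", 1.0),    # complete row
        ("", 1985, "s", 2.0),     # empty string counts as missing text
        (None, None, "s", None),  # NULL in Country, Year and Value
    ],
)

missing = conn.execute(
    """
    SELECT
        COUNT(CASE WHEN Country IS NULL OR Country = '' THEN 1 END),
        COUNT(CASE WHEN Year IS NULL THEN 1 END),
        COUNT(CASE WHEN Series IS NULL OR Series = '' THEN 1 END),
        COUNT(CASE WHEN Value IS NULL THEN 1 END)
    FROM demo
    """
).fetchone()
print(missing)  # (2, 1, 0, 1)
```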

### 4.4. Duplicated rows
Let's check for duplicated rows:

In [23]:
%%sql
SELECT
    Country, Year, Series, Value, COUNT(*)
FROM Carbon_Emission
GROUP BY Country, Year, Series, Value
HAVING COUNT(*) > 1;

Country,Year,Series,Value,COUNT(*)


**OBS:** The query returns no rows, so there are no duplicated rows.
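A query of this shape returns one row per duplicated combination and nothing when every row is unique. The sketch below illustrates both cases on made-up rows, assuming an in-memory SQLite table named `demo` in place of the MySQL table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (Country TEXT, Year INTEGER, Series TEXT, Value REAL)")

DUPLICATE_QUERY = """
    SELECT Country, Year, Series, Value, COUNT(*) AS n
    FROM demo
    GROUP BY Country, Year, Series, Value
    HAVING COUNT(*) > 1
"""

# Two distinct rows: the query finds nothing
conn.executemany(
    "INSERT INTO demo VALUES (?, ?, ?, ?)",
    [("A", 1975, "s", 1.0), ("B", 1975, "s", 2.0)],
)
unique_result = conn.execute(DUPLICATE_QUERY).fetchall()

# Insert an exact copy of the first row: the query now reports it with its count
conn.execute("INSERT INTO demo VALUES ('A', 1975, 's', 1.0)")
duplicate_result = conn.execute(DUPLICATE_QUERY).fetchall()

print(unique_result)     # []
print(duplicate_result)  # [('A', 1975, 's', 1.0, 2)]
```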

### 4.5. Distinct Values and Range


Let's check the range of the `Year` column, the distinct values of the `Series` column, and the range of the `Value` column for each series.

In [26]:
%%sql
-- Check distinct values for 'Series' Column
SELECT DISTINCT(Series) AS Distinct_series
FROM Carbon_Emission;

Distinct_series
Emissions (thousand metric tons of carbon dioxide)
Emissions per capita (metric tons of carbon dioxide)


In [27]:
%%sql
-- Check the range (min_value and max_value) in 'Year' Column:
SELECT MIN(Year) AS min_year, MAX(Year) AS max_year
FROM Carbon_Emission;

min_year,max_year
1975,2017


In [28]:
%%sql
-- Check the range (min_value and max_value) in 'Value' Column:
-- Filter by Series = 'Emissions (thousand metric tons of carbon dioxide)'
SELECT MIN(Value) AS min_value, MAX(Value) AS max_value
FROM Carbon_Emission
WHERE Series = 'Emissions (thousand metric tons of carbon dioxide)';

min_value,max_value
62.036,9257933.9


In [29]:
%%sql
-- Check the range (min_value and max_value) in 'Value' Column:
-- Filter by Series = 'Emissions per capita (metric tons of carbon dioxide)'
SELECT
    MIN(Value) AS min_value_per_capita,
    MAX(Value) AS max_value_per_capita
FROM Carbon_Emission
WHERE Series = 'Emissions per capita (metric tons of carbon dioxide)';

min_value_per_capita,max_value_per_capita
0.023,38.395


**OBS:** 
* We found that the `Series` column has 2 distinct values.
    * Emissions (thousand metric tons of carbon dioxide)
    * Emissions per capita (metric tons of carbon dioxide)
* We can split these values into two different tables to make the analysis easier.

## 5. Data Preprocessing
### 5.1. Create emissions table
Let's create a new table called `emissions` for the series `Emissions (thousand metric tons of carbon dioxide)`.

In [30]:
%%sql
-- Create the 'emissions' table, similar to the 'Carbon_Emission' table.
-- For the 'Series' column, we can use the ENUM data type.
DROP TABLE IF EXISTS emissions;
CREATE TABLE emissions(
    Country nvarchar(50),
    Year INT, 
    Series ENUM('Emissions (thousand metric tons of carbon dioxide)'), 
    Value decimal(10,3)
);

Now, let's insert values from the `Carbon_Emission` table where Series is `Emissions (thousand metric tons of carbon dioxide)` into the newly created `emissions` table.

In [32]:
%%sql

INSERT INTO emissions
SELECT Country, Year, Series, Value
FROM Carbon_Emission
WHERE Series = 'Emissions (thousand metric tons of carbon dioxide)';

SELECT * FROM emissions LIMIT 5;

Country,Year,Series,Value
Albania,1975,Emissions (thousand metric tons of carbon dioxide),4338.334
Albania,1985,Emissions (thousand metric tons of carbon dioxide),6929.926
Albania,1995,Emissions (thousand metric tons of carbon dioxide),1848.549
Albania,2005,Emissions (thousand metric tons of carbon dioxide),3825.184
Albania,2010,Emissions (thousand metric tons of carbon dioxide),3930.295


### 5.2. Create perCapital table
Now, let's create a new table called `perCapital` for the series `Emissions per capita (metric tons of carbon dioxide)`.

In [33]:
%%sql
-- Create the 'perCapital' table, similar to the 'Carbon_Emission' table.
-- For the 'Series' column, we can use the ENUM data type.
DROP TABLE IF EXISTS perCapital;
CREATE TABLE perCapital(
    Country nvarchar(50),
    Year int,
    Series ENUM('Emissions per capita (metric tons of carbon dioxide)'),
    Value decimal(10,3)
);

Let's insert values from the `Carbon_Emission` table where Series is `Emissions per capita (metric tons of carbon dioxide)` into the newly created `perCapital` table.

In [34]:
%%sql

INSERT INTO perCapital
SELECT Country, Year, Series, Value
FROM Carbon_Emission
WHERE Series = 'Emissions per capita (metric tons of carbon dioxide)';

SELECT * FROM perCapital LIMIT 5;

Country,Year,Series,Value
Albania,1975,Emissions per capita (metric tons of carbon dioxide),1.804
Albania,1985,Emissions per capita (metric tons of carbon dioxide),2.337
Albania,1995,Emissions per capita (metric tons of carbon dioxide),0.58
Albania,2005,Emissions per capita (metric tons of carbon dioxide),1.27
Albania,2010,Emissions per capita (metric tons of carbon dioxide),1.349


## 6. Data Analysis

Let's retrieve data specifically for the United States. First, list the countries whose names start with the letter 'U' to find the exact name the dataset uses for the USA.

In [35]:
%%sql

SELECT DISTINCT Country
FROM perCapital
WHERE Country LIKE 'U%';

Country
Ukraine
United Arab Emirates
United Kingdom
United Rep. of Tanzania
United States of America
Uruguay
Uzbekistan


Let's find the minimum and maximum values of carbon emissions per capita for the country `United States of America`.

In [36]:
%%sql

SELECT
    MIN(Value) as min_value_USA, 
    MAX(Value) as Max_value_USA
FROM perCapital
WHERE Country = 'United States of America';

min_value_USA,Max_value_USA
14.606,20.168


Let's determine the years corresponding to the minimum value of 14.606 and the maximum value of 20.168 in carbon emissions per capita for the United States of America.

In [37]:
%%sql
SELECT Year, Value
FROM perCapital
WHERE Country = 'United States of America'
AND Value IN (14.606,20.168);

Year,Value
1975,20.168
2017,14.606


Let's calculate the relative change in emissions per capita between 1975 and 2017 for all countries.

Use the following formula to calculate the changes:

<span style="font-size: x-large;">$\text{changes} = \frac{\text{Value}_{2017} - \text{Value}_{1975}}{\text{Value}_{1975}}$</span>
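As a worked example, the formula can be applied to the per capita values found above for the United States of America (20.168 in 1975 and 14.606 in 2017). The function name `relative_change` is only illustrative.

```python
def relative_change(old_value, new_value):
    """Relative change from old_value to new_value, as a fraction of old_value."""
    return (new_value - old_value) / old_value


# Per capita values for the United States of America from the query results above
usa_change = relative_change(old_value=20.168, new_value=14.606)
print(round(usa_change, 2))  # -0.28
```

A negative result means emissions per capita fell: here by about 28% relative to 1975.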

In [38]:
%%sql
-- Use Common Table Expressions (CTEs) value1975 and value2017
WITH value1975 AS (
    SELECT Country, Value AS old_value
    FROM perCapital
    WHERE Year = 1975
),
value2017 AS (
    SELECT Country, Value AS new_value
    FROM perCapital
    WHERE Year = 2017
)

-- Each CTE holds one row per country, so a single join is enough
-- (no extra join back to perCapital and no DISTINCT needed)
SELECT
    V1.Country,
    ROUND((V2.new_value - V1.old_value) / V1.old_value, 2) AS changes
FROM value1975 AS V1
INNER JOIN value2017 AS V2
    ON V1.Country = V2.Country
ORDER BY changes DESC
LIMIT 5;

Country,changes
Oman,16.25
Nepal,13.38
Gibraltar,7.68
Bangladesh,6.66
Thailand,6.08


**OBS:** Oman and Nepal are the countries with the highest relative increase in emissions per capita between 1975 and 2017.
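The objective also asks for the largest decrease, which the same query can return by sorting in ascending order. The sketch below runs both directions on made-up numbers, assuming an in-memory SQLite table in place of the MySQL `perCapital` table. The country names and values are placeholders, not results from the dataset.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE perCapital (Country TEXT, Year INTEGER, Value REAL)")
conn.executemany(
    "INSERT INTO perCapital VALUES (?, ?, ?)",
    [
        ("Country A", 1975, 2.0), ("Country A", 2017, 5.0),   # change = 1.5
        ("Country B", 1975, 10.0), ("Country B", 2017, 4.0),  # change = -0.6
        ("Country C", 1975, 1.0), ("Country C", 2017, 1.5),   # change = 0.5
        ("Country D", 2017, 3.0),  # no 1975 value: dropped by the inner join
    ],
)

CHANGE_QUERY = """
    WITH value1975 AS (
        SELECT Country, Value AS old_value FROM perCapital WHERE Year = 1975
    ),
    value2017 AS (
        SELECT Country, Value AS new_value FROM perCapital WHERE Year = 2017
    )
    SELECT V1.Country,
           ROUND((V2.new_value - V1.old_value) / V1.old_value, 2) AS changes
    FROM value1975 AS V1
    INNER JOIN value2017 AS V2 ON V1.Country = V2.Country
    ORDER BY changes {direction}
    LIMIT 1
"""

largest_increase = conn.execute(CHANGE_QUERY.format(direction="DESC")).fetchone()
largest_decrease = conn.execute(CHANGE_QUERY.format(direction="ASC")).fetchone()
print(largest_increase)  # ('Country A', 1.5)
print(largest_decrease)  # ('Country B', -0.6)
```

Countries that lack a value for either year never appear in the result, because the inner join keeps only countries present in both CTEs.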

Now let's find the minimum and maximum total emissions for the United States of America and the years in which they occurred.

In [40]:
%%sql
-- Filter 'emissions' table by the 'United States of America' country
SELECT * 
FROM emissions
WHERE Country = 'United States of America'
LIMIT 5;

Country,Year,Series,Value
United States of America,1975,Emissions (thousand metric tons of carbon dioxide),4355839.181
United States of America,1985,Emissions (thousand metric tons of carbon dioxide),4514313.221
United States of America,1995,Emissions (thousand metric tons of carbon dioxide),5073896.072
United States of America,2005,Emissions (thousand metric tons of carbon dioxide),5703220.175
United States of America,2010,Emissions (thousand metric tons of carbon dioxide),5352120.423


In [41]:
%%sql
-- Get the max and min Value from 'emissions' table
-- Filter by the 'United States of America' country
SELECT MIN(Value) AS min_value, MAX(Value) AS max_value
FROM emissions
WHERE Country = 'United States of America';

min_value,max_value
4355839.181,5703220.175


In [42]:
%%sql
-- Retrieve the minimum value, maximum value, and their respective year from the 'emissions' table
-- Filter by the country 'United States of America'
-- Combine both results using UNION
SELECT Year, Value
FROM emissions
WHERE Country = 'United States of America'
AND Value IN (
    SELECT MIN(Value) AS min_value
    FROM emissions
    WHERE Country = 'United States of America'
    UNION
    SELECT MAX(Value) AS max_value
    FROM emissions
    WHERE Country = 'United States of America'
);

Year,Value
1975,4355839.181
2005,5703220.175


Finally, let's find the top 5 countries with the highest total carbon emissions, summed over all recorded years.

In [43]:
%%sql
-- Top-5 countries with the highest amount of carbon emissions
-- Order by the sum of the amount of carbon emissions
SELECT Country, SUM(Value) AS sum_value
FROM emissions
GROUP BY Country
ORDER BY sum_value DESC
LIMIT 5;

Country,sum_value
China,46219584.789
United States of America,39527777.706
India,10199848.899
Russian Federation,9141272.145
Japan,8563346.271
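Note that `SUM(Value)` adds up only the years recorded in the table, so this ranking reflects emissions accumulated over those years rather than the current level. A ranking by the most recent year can come out differently. The sketch below shows this on made-up numbers, assuming an in-memory SQLite table in place of the MySQL `emissions` table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emissions (Country TEXT, Year INTEGER, Value REAL)")
conn.executemany(
    "INSERT INTO emissions VALUES (?, ?, ?)",
    [
        ("Country A", 1975, 100.0), ("Country A", 2017, 900.0),  # fast growth
        ("Country B", 1975, 600.0), ("Country B", 2017, 500.0),  # slow decline
    ],
)

# Ranking by the sum over all recorded years
by_sum = conn.execute(
    "SELECT Country, SUM(Value) AS sum_value FROM emissions "
    "GROUP BY Country ORDER BY sum_value DESC"
).fetchall()

# Ranking by the most recent year only
by_latest = conn.execute(
    "SELECT Country, Value FROM emissions "
    "WHERE Year = (SELECT MAX(Year) FROM emissions) ORDER BY Value DESC"
).fetchall()

print(by_sum)     # [('Country B', 1100.0), ('Country A', 1000.0)]
print(by_latest)  # [('Country A', 900.0), ('Country B', 500.0)]
```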


## 7. Conclusion

* We split the original `Carbon_Emission` table into two tables, `emissions` and `perCapital`, to make the analysis easier.
* Per capita emissions in the USA fell from 20.168 metric tons in 1975 to 14.606 metric tons in 2017, while total emissions were lowest in 1975 and highest in 2005.
* `Oman` and `Nepal` are the countries with the highest relative increase in emissions per capita between 1975 and 2017. This could be due to population growth and industrialization.
* The top 5 countries by total carbon emissions, summed over the recorded years, are `China`, `United States of America`, `India`, `Russian Federation` and `Japan`. These countries are large industrial economies with a strong presence in the world market, which is a plausible reason for their higher carbon emissions.

## 8. References
* https://www.datacamp.com/projects/1590
* https://github.com/alissadao/Carbon-Emission-Project-SQL-and-PowerBI-/tree/main
* https://www.kaggle.com/datasets/vineethakkinapalli/united-nations-environment-data?select=Carbon+Dioxide+Emission+Estimates.csv