# EDSA Regression Project

### Project Title: Average Temperatures Regression Model - Training and Testing.
#### Done By: Erich du Plessis

© ExploreAI 2024

---

## Table of Contents

<a id="cont"></a>

<a href=#BC>1. Introduction</a>

<a href=#packages>2. Importing Packages</a>

<a href=#Data>3. Loading Data </a>

<a href=#Cleaning>4. Data Cleaning and Pre-Processing</a>

<a href=#EDA>5. Exploratory Data Analysis (EDA)</a>

<a href=#Feature>6. Feature Engineering</a>

<a href=#Model>7. Model Training and Testing </a>

---
 <a id="BC"></a>
## **Introduction**
<a href=#cont>Back to Table of Contents</a>

---

This regression project analyses and predicts the effect of CO2 emissions from the agri-food sector on climate change. We use a comprehensive dataset compiled by the Food and Agriculture Organization (FAO) and the Intergovernmental Panel on Climate Change (IPCC). From this dataset we will train regression models to predict average temperature variations based on multiple factors and emission sources linked to the agri-food sector.
We will use the insights gained and the ML models to provide recommendations and solutions for climate-change discussions and issues in the agri-food sector.<br>

**<ins>Data Library:</ins><br>**
The dataset includes information on various emission sources related to the agri-food sector and highlights their contribution to climate change.
- Savanna fires: Emissions from fires in savanna ecosystems.
- Forest fires: Emissions from fires in forested areas.
- Crop Residues: Emissions from burning or decomposing leftover plant material after crop harvesting.
- Rice Cultivation: Emissions from methane released during rice cultivation.
- Drained organic soils (CO2): Emissions from carbon dioxide released when draining organic soils.
- Pesticides Manufacturing: Emissions from the production of pesticides.
- Food Transport: Emissions from transporting food products.
- Forestland: Land covered by forests.
- Net Forest conversion: Change in forest area due to deforestation and afforestation.
- Food Household Consumption: Emissions from food consumption at the household level.
- Food Retail: Emissions from the operation of retail establishments selling food.
- On-farm Electricity Use: Electricity consumption on farms.
- Food Packaging: Emissions from the production and disposal of food packaging materials.
- Agrifood Systems Waste Disposal: Emissions from waste disposal in the agrifood system.
- Food Processing: Emissions from processing food products.
- Fertilizers Manufacturing: Emissions from the production of fertilizers.
- IPPU: Emissions from industrial processes and product use.
- Manure applied to Soils: Emissions from applying animal manure to agricultural soils.
- Manure left on Pasture: Emissions from animal manure on pasture or grazing land.
- Manure Management: Emissions from managing and treating animal manure.
- Fires in organic soils: Emissions from fires in organic soils.
- Fires in humid tropical forests: Emissions from fires in humid tropical forests.
- On-farm energy use: Energy consumption on farms.
- Rural population: Number of people living in rural areas.
- Urban population: Number of people living in urban areas.
- Total Population - Male: Total number of male individuals in the population.
- Total Population - Female: Total number of female individuals in the population.
- total_emission: Total greenhouse gas emissions from various sources.
- Average Temperature °C: The average yearly increase in temperature, in degrees Celsius.

CO2 emissions are measured in kilotonnes (kt).<br>
The average temperature is our response (target) variable; it indicates the change in the average yearly temperature.<br>

---
 <a id="packages"></a>
## **Importing Packages**
<a href=#cont>Back to Table of Contents</a>

---

```python
import numpy as np
import pandas as pd
```

---
 <a id="Data"></a>
## **Loading Data**
<a href=#cont>Back to Table of Contents</a>

---

```python
# Load the dataset and store it in a pandas DataFrame
df_raw = pd.read_csv('co2_emissions_from_agri.csv')
```

---
 <a id="Cleaning"></a>
## **Data Cleaning and Pre-Processing**
<a href=#cont>Back to Table of Contents</a>

---

```python
# View the first 4 rows of the dataset to get familiar with the data
df_raw.head(4)
```

| Unnamed: 0 | Area | Year | Savanna fires | Forest fires | Crop Residues | Rice Cultivation | Drained organic soils (CO2) | Pesticides Manufacturing | Food Transport | Forestland | ... | Manure Management | Fires in organic soils | Fires in humid tropical forests | On-farm energy use | Rural population | Urban population | Total Population - Male | Total Population - Female | total_emission | Average Temperature °C |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Afghanistan | 1990 | 14.7237 | 0.0557 | 205.6077 | 686.0 | 0.0 | 11.807483 | 63.1152 | -2388.803 | ... | 319.1763 | 0.0 | 0.0 | NaN | 9655167.0 | 2593947.0 | 5348387.0 | 5346409.0 | 2198.963539 | 0.536167 |
| 1 | Afghanistan | 1991 | 14.7237 | 0.0557 | 209.4971 | 678.16 | 0.0 | 11.712073 | 61.2125 | -2388.803 | ... | 342.3079 | 0.0 | 0.0 | NaN | 10230490.0 | 2763167.0 | 5372959.0 | 5372208.0 | 2323.876629 | 0.020667 |
| 2 | Afghanistan | 1992 | 14.7237 | 0.0557 | 196.5341 | 686.0 | 0.0 | 11.712073 | 53.317 | -2388.803 | ... | 349.1224 | 0.0 | 0.0 | NaN | 10995568.0 | 2985663.0 | 6028494.0 | 6028939.0 | 2356.304229 | -0.259583 |
| 3 | Afghanistan | 1993 | 14.7237 | 0.0557 | 230.8175 | 686.0 | 0.0 | 11.712073 | 54.3617 | -2388.803 | ... | 352.2947 | 0.0 | 0.0 | NaN | 11858090.0 | 3237009.0 | 7003641.0 | 7000119.0 | 2368.470529 | 0.101917 |


**<ins>Missing Data:</ins>**<br>
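
The preview of `df_raw` shows no recorded value for `On-farm energy use` in the Afghanistan rows, so a per-column count of missing values is a natural first check. The sketch below is one possible way to do it and is not the final analysis: `df_toy` is a hypothetical stand-in for `df_raw` that reuses real column names, and the Albania values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df_raw: real column names, illustrative values
df_toy = pd.DataFrame({
    "Area": ["Afghanistan", "Afghanistan", "Albania", "Albania"],
    "Year": [1990, 1991, 1990, 1991],
    "On-farm energy use": [np.nan, np.nan, 12.5, 13.1],
    "Forest fires": [0.0557, 0.0557, np.nan, 1.2],
})

# Count and percentage of missing values per column, largest first
missing = pd.DataFrame({
    "n_missing": df_toy.isna().sum(),
    "pct_missing": df_toy.isna().mean() * 100,
}).sort_values("n_missing", ascending=False)

print(missing)
```

Running the same two expressions on `df_raw` would show which columns need imputation or removal before modelling.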

**<ins>Duplicate Observations:</ins>**<br>
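
Each observation should describe one `Area` in one `Year`, so we can look for both fully identical rows and repeated `Area`/`Year` pairs. The following is a minimal sketch under that assumption; `df_toy` and its duplicated 1991 row are hypothetical and only stand in for `df_raw`.

```python
import pandas as pd

# Hypothetical stand-in for df_raw with one deliberately repeated row
df_toy = pd.DataFrame({
    "Area": ["Afghanistan", "Afghanistan", "Afghanistan", "Albania"],
    "Year": [1990, 1991, 1991, 1990],
    "total_emission": [2198.96, 2323.88, 2323.88, 3000.0],
})

# Rows that are identical across every column
n_exact_dupes = int(df_toy.duplicated().sum())

# Rows that repeat an (Area, Year) pair, assumed here to be the natural key
n_key_dupes = int(df_toy.duplicated(subset=["Area", "Year"]).sum())

# Keep the first occurrence of each (Area, Year) pair
df_dedup = (
    df_toy.drop_duplicates(subset=["Area", "Year"], keep="first")
    .reset_index(drop=True)
)

print(n_exact_dupes, n_key_dupes, len(df_dedup))
```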

**<ins>Outlier Analysis:</ins>**<br>
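
One common way to flag outliers in a numeric column is the 1.5 × IQR rule. The sketch below applies it to a single made-up series named after the `Food Transport` column. The values are hypothetical, and the choice of the IQR rule is an assumption rather than a method prescribed for this project.

```python
import pandas as pd

# Hypothetical emission values with one extreme observation
values = pd.Series(
    [10.0, 12.0, 11.0, 13.0, 12.5, 11.5, 250.0],
    name="Food Transport",
)

# Interquartile range and the usual 1.5 * IQR fences
q1 = values.quantile(0.25)
q3 = values.quantile(0.75)
iqr = q3 - q1
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

# Boolean mask marking observations outside the fences
is_outlier = (values < lower) | (values > upper)

print(values[is_outlier])
```

Emission totals differ by orders of magnitude between countries, so flagged values may be genuine rather than errors and should be inspected before any removal.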

**<ins>Data Formatting and Pre-Processing:</ins>**<br>
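
The preview contains an `Unnamed: 0` column that appears to be a leftover row index, and several column names include spaces, brackets, or symbols. A possible clean-up, sketched on a hypothetical `df_toy` frame rather than on `df_raw` itself, is to drop that column and normalise the names to snake_case. The naming scheme is an assumption made for illustration.

```python
import pandas as pd

# Hypothetical stand-in for df_raw using a few of its real column names
df_toy = pd.DataFrame({
    "Unnamed: 0": [0, 1],
    "Area": ["Afghanistan", "Afghanistan"],
    "Year": [1990, 1991],
    "Drained organic soils (CO2)": [0.0, 0.0],
    "Average Temperature °C": [0.536167, 0.020667],
})

# Drop the leftover index column if it is present
df_clean = df_toy.drop(columns=["Unnamed: 0"], errors="ignore")

# Lower-case the names and collapse runs of non-alphanumerics into "_"
df_clean.columns = (
    df_clean.columns
    .str.strip()
    .str.lower()
    .str.replace(r"[^0-9a-z]+", "_", regex=True)
    .str.strip("_")
)

print(list(df_clean.columns))
```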

---
 <a id="EDA"></a>
## **Exploratory Data Analysis**
<a href=#cont>Back to Table of Contents</a>

---

---
 <a id="Feature"></a>
## **Feature Engineering**
<a href=#cont>Back to Table of Contents</a>

---

---
 <a id="Model"></a>
## **Model Training and Testing**
<a href=#cont>Back to Table of Contents</a>

---
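
As a rough illustration of the train/test workflow this section is meant to cover, the sketch below fits an ordinary least squares baseline with `np.linalg.lstsq` and scores it on a held-out 20% split. Everything in it is an assumption: the data is synthetic, the three features are hypothetical stand-ins for emission columns, and the baseline is not the model chosen for this project.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic stand-in data: 200 rows, 3 hypothetical emission features
n_rows, n_features = 200, 3
X = rng.normal(size=(n_rows, n_features))
true_coefs = np.array([0.5, -0.2, 0.1])
y = 0.8 + X @ true_coefs + rng.normal(scale=0.05, size=n_rows)

# Shuffle the row indices, then hold out 20% of the rows for testing
idx = rng.permutation(n_rows)
n_test = int(0.2 * n_rows)
test_idx, train_idx = idx[:n_test], idx[n_test:]


def add_intercept(features):
    """Prepend a column of ones so the fit includes an intercept term."""
    return np.column_stack([np.ones(len(features)), features])


# Fit ordinary least squares on the training rows only
coefs, *_ = np.linalg.lstsq(add_intercept(X[train_idx]), y[train_idx], rcond=None)

# Evaluate on the unseen test rows
y_pred = add_intercept(X[test_idx]) @ coefs
residuals = y[test_idx] - y_pred
rmse = float(np.sqrt(np.mean(residuals ** 2)))
ss_tot = float(np.sum((y[test_idx] - y[test_idx].mean()) ** 2))
r2 = 1.0 - float(np.sum(residuals ** 2)) / ss_tot

print(f"Test RMSE: {rmse:.4f}, test R^2: {r2:.4f}")
```

With the real dataset, `X` would hold the cleaned feature columns and `y` the `Average Temperature °C` column. Reporting the metrics on the held-out rows rather than the training rows gives a fairer estimate of how the model generalises.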