# [Project Title]

## Table of Contents

- <b> [1. Project Overview](#chapter1)</b>
    - [1.1. Introduction](#section_1_1)
    - [1.2. Objective](#section_1_2)
- <b> [2. Importing Packages](#chapter2)</b>
- <b> [3. Data Loading and Inspection](#chapter3)</b>
- <b> [4. Data Cleaning](#chapter4)</b>
- <b> [5. Exploratory Data Analysis (EDA)](#chapter5)</b>
- <b> [6. Feature Engineering](#chapter6)</b>
- <b> [7. Model Development](#chapter7)</b>
- <b> [8. Model Performance](#chapter8)</b>
- <b> [9. Conclusion and Recommendations](#chapter9)</b>


## 1. Project Overview <a class="anchor" id="chapter1"></a>

### 1.1 Introduction <a class="anchor" id="section_1_1"></a>



*   Brief background on climate change and agricultural activities
*   State project aim
*   State key questions??
*   State dataset origin and describe features
*   Describe notebook structure
*   Brief description of analysis methodology



### 1.2 Objective <a class="anchor" id="section_1_2"></a>


*   Perform exploratory data analysis on the agricultural dataset??
*   Identify the relationship between CO2 emissions and climate change or temperature variations??
*   Identify emission sources that contribute significantly to CO2 emissions and have a major influence on temperature variations??
*   Develop a regression model to predict temperature variations??



## 2. Importing Packages <a class="anchor" id="chapter2"></a>

In [None]:
# Libraries for data manipulation and analysis
import numpy as np
import pandas as pd

# Libraries for visualisation
import seaborn as sns
import matplotlib.pyplot as plt

# Libraries for regression analysis
import statsmodels.api as sm
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

# Display plots inline in the notebook
%matplotlib inline

# Suppress warnings to keep the notebook output readable
import warnings
warnings.filterwarnings('ignore')

## 3. Data Loading and Inspection <a class="anchor" id="chapter3"></a>

In [3]:
import pandas as pd

# Read the dataset into a DataFrame
df = pd.read_csv('co2_emissions_from_agri.csv')
# Display the first five rows of the DataFrame
df.head()


| | Area | Year | Savanna fires | Forest fires | Crop Residues | Rice Cultivation | Drained organic soils (CO2) | Pesticides Manufacturing | Food Transport | Forestland | ... | Manure Management | Fires in organic soils | Fires in humid tropical forests | On-farm energy use | Rural population | Urban population | Total Population - Male | Total Population - Female | total_emission | Average Temperature °C |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | Afghanistan | 1990 | 14.7237 | 0.0557 | 205.6077 | 686.0 | 0.0 | 11.807483 | 63.1152 | -2388.803 | ... | 319.1763 | 0.0 | 0.0 | NaN | 9655167.0 | 2593947.0 | 5348387.0 | 5346409.0 | 2198.963539 | 0.536167 |
| 1 | Afghanistan | 1991 | 14.7237 | 0.0557 | 209.4971 | 678.16 | 0.0 | 11.712073 | 61.2125 | -2388.803 | ... | 342.3079 | 0.0 | 0.0 | NaN | 10230490.0 | 2763167.0 | 5372959.0 | 5372208.0 | 2323.876629 | 0.020667 |
| 2 | Afghanistan | 1992 | 14.7237 | 0.0557 | 196.5341 | 686.0 | 0.0 | 11.712073 | 53.317 | -2388.803 | ... | 349.1224 | 0.0 | 0.0 | NaN | 10995568.0 | 2985663.0 | 6028494.0 | 6028939.0 | 2356.304229 | -0.259583 |
| 3 | Afghanistan | 1993 | 14.7237 | 0.0557 | 230.8175 | 686.0 | 0.0 | 11.712073 | 54.3617 | -2388.803 | ... | 352.2947 | 0.0 | 0.0 | NaN | 11858090.0 | 3237009.0 | 7003641.0 | 7000119.0 | 2368.470529 | 0.101917 |
| 4 | Afghanistan | 1994 | 14.7237 | 0.0557 | 242.0494 | 705.6 | 0.0 | 11.712073 | 53.9874 | -2388.803 | ... | 367.6784 | 0.0 | 0.0 | NaN | 12690115.0 | 3482604.0 | 7733458.0 | 7722096.0 | 2500.768729 | 0.37225 |
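
Beyond the first rows, a quick structural check (shape, data types, non-null counts) is a common next inspection step. The sketch below is only an illustration: `sample` is a hypothetical stand-in for `df`, built by hand from three rows and five columns of the output above. On the real data the same calls would be made on `df`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df: three rows and five columns copied from the output above
sample = pd.DataFrame({
    "Area": ["Afghanistan"] * 3,
    "Year": [1990, 1991, 1992],
    "Savanna fires": [14.7237, 14.7237, 14.7237],
    "On-farm energy use": [np.nan, np.nan, np.nan],
    "Average Temperature °C": [0.536167, 0.020667, -0.259583],
})

print(sample.shape)   # (number of rows, number of columns)
print(sample.dtypes)  # data type of each column
sample.info()         # non-null counts per column and memory usage
```

The non-null counts reported by `info()` give an early hint of which columns will need attention during cleaning.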


* **Savanna fires:** Emissions from fires in savanna ecosystems.
* **Forest fires:** Emissions from fires in forested areas.
* **Crop Residues:** Emissions from burning or decomposing leftover plant material after crop harvesting.
* **Rice Cultivation:** Emissions from methane released during rice cultivation.
* **Drained organic soils (CO2):** Emissions from carbon dioxide released when draining organic soils.
* **Pesticides Manufacturing:** Emissions from the production of pesticides.
* **Food Transport:** Emissions from transporting food products.
* **Forestland:** Land covered by forests.
* **Net Forest conversion:** Change in forest area due to deforestation and afforestation.
* **Food Household Consumption:** Emissions from food consumption at the household level.
* **Food Retail:** Emissions from the operation of retail establishments selling food.
* **On-farm Electricity Use:** Electricity consumption on farms.
* **Food Packaging:** Emissions from the production and disposal of food packaging materials.
* **Agrifood Systems Waste Disposal:** Emissions from waste disposal in the agrifood system.
* **Food Processing:** Emissions from processing food products.
* **Fertilizers Manufacturing:** Emissions from the production of fertilizers.
* **IPPU:** Emissions from industrial processes and product use.
* **Manure applied to Soils:** Emissions from applying animal manure to agricultural soils.
* **Manure left on Pasture:** Emissions from animal manure on pasture or grazing land.
* **Manure Management:** Emissions from managing and treating animal manure.
* **Fires in organic soils:** Emissions from fires in organic soils.
* **Fires in humid tropical forests:** Emissions from fires in humid tropical forests.
* **On-farm energy use:** Energy consumption on farms.
* **Rural population:** Number of people living in rural areas.
* **Urban population:** Number of people living in urban areas.
* **Total Population - Male:** Total number of male individuals in the population.
* **Total Population - Female:** Total number of female individuals in the population.
* **total_emission:** Total greenhouse gas emissions from various sources.
* **Average Temperature °C:** Average yearly temperature increase or decrease, in degrees Celsius.

**CO2 is recorded in kilotonnes (kt); 1 kt represents 1,000 tonnes (1,000,000 kg) of CO2.**
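
As a worked conversion, assuming the metric definition of a kilotonne (1 kt = 1,000 tonnes and 1 tonne = 1,000 kg), the first `total_emission` value shown above can be expressed in tonnes and kilograms. The constant and variable names are illustrative and are not part of the dataset.

```python
# Metric conversion factors (assumed definitions: 1 kt = 1,000 t, 1 t = 1,000 kg)
TONNES_PER_KT = 1_000
KG_PER_TONNE = 1_000

total_emission_kt = 2198.963539  # Afghanistan, 1990 (first row of the output above)
total_emission_t = total_emission_kt * TONNES_PER_KT
total_emission_kg = total_emission_t * KG_PER_TONNE

print(f"{total_emission_kt} kt = {total_emission_t:,.0f} t = {total_emission_kg:,.0f} kg")
```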

## 4. Data Cleaning <a class="anchor" id="chapter4"></a>

In [20]:
# Rename features to snake_case to follow Python naming conventions
df = df.rename(columns={
    'Area': 'area',
    'Year': 'year',
    'Savanna fires': 'savanna_fires',
    'Forest fires': 'forest_fires',
    'Crop Residues': 'crop_residues',
    'Rice Cultivation': 'rice_cultivation',
    'Drained organic soils (CO2)': 'drained_organic_soils',
    'Pesticides Manufacturing': 'pesticides_manufacturing',
    'Food Transport': 'food_transport',
    'Forestland': 'forestland',
    'Net Forest conversion': 'net_forest_conversion',
    'Food Household Consumption': 'food_household_consumption',
    'Food Retail': 'food_retail',
    'On-farm Electricity Use': 'on_farm_electricity_use',
    'Food Packaging': 'food_packaging',
    'Agrifood Systems Waste Disposal': 'agrifood_systems_waste_disposal',
    'Food Processing': 'food_processing',
    'Fertilizers Manufacturing': 'fertilizers_manufacturing',
    'Manure applied to Soils': 'manure_applied_to_soils',
    'Manure left on Pasture': 'manure_left_on_pasture',
    'Manure Management': 'manure_management',
    'Fires in organic soils': 'fires_in_organic_soils',
    'Fires in humid tropical forests': 'fires_in_humid_tropical_forests',
    'On-farm energy use': 'on_farm_energy_use',
    'Rural population': 'rural_population',
    'Urban population': 'urban_population',
    'Total Population - Male': 'male_population',
    'Total Population - Female': 'female_population',
    'Average Temperature °C': 'average_temperature_change',
})


In [27]:
df.head()

| | area | year | savanna_fires | forest_fires | crop_residues | rice_cultivation | drained_organic_soils | pesticides_manufacturing | food_transport | forestland | ... | manure_management | fires_in_organic_soils | fires_in_humid_tropical_forests | on_farm_energy_use | rural_population | urban_population | male_population | female_population | total_emission | average_temperature_change |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | Afghanistan | 1990 | 14.7237 | 0.0557 | 205.6077 | 686.0 | 0.0 | 11.807483 | 63.1152 | -2388.803 | ... | 319.1763 | 0.0 | 0.0 | NaN | 9655167.0 | 2593947.0 | 5348387.0 | 5346409.0 | 2198.963539 | 0.536167 |
| 1 | Afghanistan | 1991 | 14.7237 | 0.0557 | 209.4971 | 678.16 | 0.0 | 11.712073 | 61.2125 | -2388.803 | ... | 342.3079 | 0.0 | 0.0 | NaN | 10230490.0 | 2763167.0 | 5372959.0 | 5372208.0 | 2323.876629 | 0.020667 |
| 2 | Afghanistan | 1992 | 14.7237 | 0.0557 | 196.5341 | 686.0 | 0.0 | 11.712073 | 53.317 | -2388.803 | ... | 349.1224 | 0.0 | 0.0 | NaN | 10995568.0 | 2985663.0 | 6028494.0 | 6028939.0 | 2356.304229 | -0.259583 |
| 3 | Afghanistan | 1993 | 14.7237 | 0.0557 | 230.8175 | 686.0 | 0.0 | 11.712073 | 54.3617 | -2388.803 | ... | 352.2947 | 0.0 | 0.0 | NaN | 11858090.0 | 3237009.0 | 7003641.0 | 7000119.0 | 2368.470529 | 0.101917 |
| 4 | Afghanistan | 1994 | 14.7237 | 0.0557 | 242.0494 | 705.6 | 0.0 | 11.712073 | 53.9874 | -2388.803 | ... | 367.6784 | 0.0 | 0.0 | NaN | 12690115.0 | 3482604.0 | 7733458.0 | 7722096.0 | 2500.768729 | 0.37225 |


In [24]:

#Checking for null values per column
print("Null Values in each column")
print(df.isnull().sum())

Null Values in each column
area                                  0
year                                  0
savanna_fires                        31
forest_fires                         93
crop_residues                      1389
rice_cultivation                      0
drained_organic_soils                 0
pesticides_manufacturing              0
food_transport                        0
forestland                          493
net_forest_conversion               493
food_household_consumption          473
food_retail                           0
on_farm_electricity_use               0
food_packaging                        0
agrifood_systems_waste_disposal       0
food_processing                       0
fertilizers_manufacturing             0
IPPU                                743
manure_applied_to_soils             928
manure_left_on_pasture                0
manure_management                   928
fires_in_organic_soils                0
fires_in_humid_tropical_forests     155
on_farm_energ

In [31]:

# Replace missing values with zero in the columns that contain nulls
cols_with_nulls = [
    'savanna_fires', 'forest_fires', 'crop_residues', 'forestland',
    'net_forest_conversion', 'food_household_consumption', 'IPPU',
    'manure_applied_to_soils', 'manure_management',
    'fires_in_humid_tropical_forests', 'on_farm_energy_use',
]
df[cols_with_nulls] = df[cols_with_nulls].fillna(0)


The missing values are confined to particular areas, hence their replacement with zero.
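
One way to check that the gaps are tied to particular areas is to count missing values per area before filling them. The following is a minimal sketch on a toy frame: `toy`, its two areas and all of its values are made up for illustration. On the real data the same steps would run on `df` before the `fillna` call.

```python
import numpy as np
import pandas as pd

# Toy data with made-up values, used only to illustrate the check
toy = pd.DataFrame({
    "area": ["A", "A", "B", "B"],
    "year": [1990, 1991, 1990, 1991],
    "forest_fires": [np.nan, np.nan, 0.5, 0.7],
    "crop_residues": [10.0, 12.0, 3.0, np.nan],
})

# Count nulls per area for every emission column
nulls_by_area = toy.drop(columns=["area", "year"]).isnull().groupby(toy["area"]).sum()
print(nulls_by_area)

# Flag columns for which an area has no observations at all
rows_per_area = toy.groupby("area").size()
fully_missing = nulls_by_area.eq(rows_per_area, axis=0)
print(fully_missing)
```

A column that is missing for every year of an area is a different situation from one with scattered gaps, so it is worth seeing both counts before settling on zero as the fill value.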

In [37]:
#Checking for duplicates
print("Number of duplicated rows: ")
print(df.duplicated().sum())

Number of duplicated rows: 
0
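
`df.duplicated()` only flags rows that are identical in every column. Since each row appears to describe one area in one year, a complementary check is whether any `(area, year)` pair occurs more than once. This is a sketch on made-up data, assuming that `area` and `year` together identify a row. The source does not state this.

```python
import pandas as pd

# Toy data with made-up values: area "B" has two rows for 1990
toy = pd.DataFrame({
    "area": ["A", "A", "B", "B"],
    "year": [1990, 1991, 1990, 1990],
    "total_emission": [1.0, 2.0, 3.0, 4.0],
})

full_duplicates = toy.duplicated().sum()                         # rows identical in every column
key_duplicates = toy.duplicated(subset=["area", "year"]).sum()   # repeated (area, year) keys

print("Fully duplicated rows:", full_duplicates)
print("Repeated (area, year) keys:", key_duplicates)
```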


In [39]:

#Generate descriptive statistics
df.describe()

| | year | savanna_fires | forest_fires | crop_residues | rice_cultivation | drained_organic_soils | pesticides_manufacturing | food_transport | forestland | net_forest_conversion | ... | manure_management | fires_in_organic_soils | fires_in_humid_tropical_forests | on_farm_energy_use | rural_population | urban_population | male_population | female_population | total_emission | average_temperature_change |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| count | 6965.0 | 6965.0 | 6965.0 | 6965.0 | 6965.0 | 6965.0 | 6965.0 | 6965.0 | 6965.0 | 6965.0 | ... | 6965.0 | 6965.0 | 6965.0 | 6965.0 | 6965.0 | 6965.0 | 6965.0 | 6965.0 | 6965.0 | 6965.0 |
| mean | 2005.12491 | 1183.101572 | 907.027206 | 799.538604 | 4259.666673 | 3503.228636 | 333.418393 | 1939.58176 | -16566.355335 | 16359.47 | ... | 1961.78226 | 1210.315532 | 653.577094 | 2595.976217 | 17857740.0 | 16932300.0 | 17619630.0 | 17324470.0 | 64091.24 | 0.872989 |
| std | 8.894665 | 5235.195713 | 3696.662005 | 3334.783518 | 17613.825187 | 15861.445678 | 1429.159367 | 5616.748808 | 79014.907125 | 97615.71 | ... | 7469.521165 | 22669.84776 | 3229.846962 | 11783.996613 | 89015210.0 | 65743620.0 | 76039930.0 | 72517110.0 | 228313.0 | 0.55593 |
| min | 1990.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0001 | -797183.079 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 250.0 | 270.0 | -391884.1 | -1.415833 |
| 25% | 1997.0 | 0.0 | 0.0 | 0.0284 | 181.2608 | 0.0 | 6.0 | 27.9586 | -2299.3872 | 0.0 | ... | 11.0381 | 0.0 | 0.0 | 5.0136 | 97311.0 | 217386.0 | 201326.0 | 207890.0 | 5221.244 | 0.511333 |
| 50% | 2005.0 | 1.587 | 0.4164 | 43.0048 | 534.8174 | 0.0 | 13.0 | 204.9628 | -30.8531 | 9.029 | ... | 163.4639 | 0.0 | 0.0 | 55.9822 | 1595322.0 | 2357581.0 | 2469660.0 | 2444135.0 | 12147.65 | 0.8343 |
| 75% | 2013.0 | 108.3617 | 61.2372 | 264.718 | 1536.64 | 690.4088 | 116.325487 | 1207.0009 | 0.0 | 3830.905 | ... | 883.1703 | 0.0 | 6.9418 | 845.7131 | 8177340.0 | 8277123.0 | 9075924.0 | 9112588.0 | 35139.73 | 1.20675 |
| max | 2020.0 | 114616.4011 | 52227.6306 | 33490.0741 | 164915.2556 | 241025.0696 | 16459.0 | 67945.765 | 171121.076 | 1605106.0 | ... | 70592.6465 | 991717.5431 | 51771.2568 | 248879.1769 | 900099100.0 | 902077800.0 | 743586600.0 | 713341900.0 | 3115114.0 | 3.558083 |


In [None]:
print("Correlation Matrix:")
# Restrict the correlation to numeric columns; 'area' holds strings
df.corr(numeric_only=True)
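
The full correlation matrix is large, so it can help to look only at how each feature correlates with the target. The sketch below uses synthetic data: `toy`, its values and the 0.8 coefficient are made up, and only the column names are borrowed from the dataset. It also assumes pandas 1.5 or later for the `numeric_only` argument of `corr`.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 50

# Synthetic stand-in for df, including a non-numeric column like 'area'
toy = pd.DataFrame({
    "area": ["A"] * n,
    "food_transport": rng.normal(size=n),
    "rice_cultivation": rng.normal(size=n),
})
# Synthetic target that depends on one feature only
toy["average_temperature_change"] = (
    0.8 * toy["food_transport"] + rng.normal(scale=0.1, size=n)
)

# Correlation of every numeric feature with the target, strongest first
corr_with_target = (
    toy.corr(numeric_only=True)["average_temperature_change"]
    .drop("average_temperature_change")
    .sort_values(key=abs, ascending=False)
)
print(corr_with_target)
```

Sorting by absolute value keeps strongly negative correlations near the top alongside strongly positive ones.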

## 5. Exploratory Data Analysis <a class="anchor" id="chapter5"></a>

*   Brief section introduction
*   Insights/comments

## 6. Feature Engineering <a class="anchor" id="chapter6"></a>

*   Brief section introduction
*   Insights

## 7. Model Development <a class="anchor" id="chapter7"></a>

*   Brief section introduction
*   Insights/comments
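
As a starting point for this section, the following is a minimal sketch of a train/test split followed by a linear regression fit, using the scikit-learn classes imported earlier. It runs on synthetic data: the feature values, the coefficients 0.5 and -0.2, and the 80/20 split are assumptions made for illustration, not choices taken from the project.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 200

# Synthetic features; only the column names are borrowed from the dataset
X = pd.DataFrame({
    "food_transport": rng.normal(size=n),
    "rice_cultivation": rng.normal(size=n),
})
# Synthetic target with known coefficients plus noise
y = 0.5 * X["food_transport"] - 0.2 * X["rice_cultivation"] + rng.normal(scale=0.1, size=n)

# Hold out 20% of the rows for evaluation
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LinearRegression()
model.fit(X_train, y_train)

print(dict(zip(X.columns, model.coef_)))  # fitted coefficient per feature
print(model.intercept_)
```

On the real data, `X` would hold the selected emission features and `y` the `average_temperature_change` column.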

## 8. Model Performance <a class="anchor" id="chapter8"></a>

*   Brief section introduction
*   Insights/comments
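
The metrics imported earlier (`mean_squared_error` and `r2_score`) can be computed on held-out data as sketched below. The data here is synthetic and the model is refitted inside the block so that the example is self-contained. The numbers it prints say nothing about the real dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic features and target (made-up coefficients and noise level)
X = rng.normal(size=(200, 2))
y = 0.5 * X[:, 0] - 0.2 * X[:, 1] + rng.normal(scale=0.1, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

# Root mean squared error is in the same units as the target
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
# R^2 is the share of target variance explained on the test set
r2 = r2_score(y_test, y_pred)

print(f"RMSE: {rmse:.3f}, R^2: {r2:.3f}")
```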

## 9. Conclusion and Recommendations <a class="anchor" id="chapter9"></a>

*   Summarise insights
*   Offer recommendations such as sustainable agricultural practices