# QCTO - Workplace Module

### Project Title: Global Deforestation
#### Done By: Ntembeko Mhlungu

© ExploreAI 2024

---

<a id="cont"></a>
## Table of Contents

<a href=#BC> Background Context</a>

<a href=#one>1. Importing Packages</a>

<a href=#two>2. Data Collection and Description</a>

<a href=#three>3. Loading Data </a>

<a href=#four>4. Data Cleaning and Filtering</a>

<a href=#five>5. Exploratory Data Analysis (EDA)</a>

<a href=#six>6. Modeling </a>

<a href=#seven>7. Evaluation and Validation</a>

<a href=#eight>8. Final Model</a>

<a href=#nine>9. Conclusion and Future Work</a>

<a href=#ten>10. References</a>

---
 <a id="BC"></a>
## **Background Context**
<a href=#cont>Back to Table of Contents</a>


### Project Introduction: Global Deforestation Dataset Analysis

* Deforestation, the large-scale clearing of forests, poses significant threats to biodiversity, climate stability, and global ecosystems. With rapid urbanization, agricultural expansion, and logging activities, the world's forest cover continues to diminish at alarming rates. To address the complexities of this environmental challenge, understanding patterns of deforestation on a global scale is essential for creating effective policies and conservation strategies.

### Project Goals

* Data-Driven Insights: Analyze global deforestation data to uncover key trends, patterns, and factors contributing to forest loss across different regions.

* Identify Hotspots: Map deforestation hotspots and regions most affected by forest depletion, and correlate these findings with socio-economic and environmental factors.

* Predictive Modeling: Develop models that forecast future deforestation trends, leveraging historical data and satellite imagery to inform policy-making and forest conservation efforts.

* Impact Analysis: Assess the impact of deforestation on biodiversity, local communities, and carbon emissions to highlight the broader consequences of forest loss.

### Significance of the Project

* The significance of analyzing the global deforestation dataset lies in its ability to provide actionable insights into one of the most pressing environmental challenges of our time. By understanding deforestation patterns, governments, environmental organizations, and global agencies can implement more targeted conservation efforts, prioritize areas for reforestation, and promote sustainable land-use practices. Additionally, predicting future deforestation trends can help mitigate climate change impacts, preserve biodiversity, and protect the livelihoods of millions of people who depend on forests for survival.
---

---
<a id="one"></a>
## **Importing Packages**
<a href=#cont>Back to Table of Contents</a>

* **Purpose:** Set up the Python environment with necessary libraries and tools.
* **Details:** List and import all the Python packages that will be used throughout the project such as Pandas for data manipulation, Matplotlib/Seaborn for visualization, scikit-learn for modeling, etc.
---

In [2]:
# Data Manipulation
import pandas as pd
import numpy as np

# Data Visualization
import matplotlib.pyplot as plt
import seaborn as sns

---
<a id="two"></a>
## **Data Collection and Description**
<a href=#cont>Back to Table of Contents</a>


The global deforestation dataset was likely collected from a variety of sources, including national and international agencies responsible for monitoring environmental changes. Common sources for such data include:

- Government reports on forest cover.
- Satellite imagery and remote sensing technologies, often analyzed and made available by agencies such as NASA or the European Space Agency (ESA).
- International organizations like the Food and Agriculture Organization (FAO), which compiles global forest data as part of their Forest Resources Assessment (FRA).
- Public repositories like the World Bank, Global Forest Watch, and other open data initiatives focused on environmental conservation.

### Data Collection Methods
The dataset may have been compiled using one or more of the following methods:

- Satellite Remote Sensing: Uses satellite data to estimate forest area and deforestation rates.
- API Services: Some organizations provide APIs to retrieve real-time or historical forest cover data.
- Government Reports and Surveys: National agencies and research bodies publish official reports on forest resources, which may be manually collected and processed.
- Web Scraping: Public data sources or research papers might have been accessed through web scraping, although this is less common for high-quality environmental data.

### Dataset Overview
- Size and Scope: The dataset records forest area for individual countries as well as regional and income-group aggregates (for example, Sub-Saharan Africa and Upper middle income), with one row per entity and one column per year.
- Time Period: The yearly columns span 1990 to 2021, capturing forest changes over three decades.

### Types of Data:
- Numerical Data: Metrics like forest area in square kilometers or hectares, percentage of forest cover, and changes in forest area over time.
- Categorical Data: Country or region names, forest type classifications, and land-use categories.
- Temporal Data: Dates or years indicating the time period for each observation.
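
The three data types above map onto the dataset's wide layout: the identifier columns are categorical, the cell values are numerical, and the year is encoded in the column names. A minimal sketch of reshaping that layout into a long format, where the year becomes an explicit temporal column, is shown below. The `sample` frame is a small stand-in built from values displayed in the Loading Data section, and the `Year` and `Forest Area` column names are assumptions chosen for illustration.

```python
import pandas as pd

# Small stand-in frame with the same wide layout as the dataset
# (values copied from the first rows displayed in the Loading Data section).
sample = pd.DataFrame({
    "Country Name": ["Albania", "Algeria"],
    "Country Code": ["ALB", "DZA"],
    "1990": [7888.0, 16670.0],
    "1991": [7868.5, 16582.0],
    "1992": [7849.0, 16494.0],
})

# Reshape so that each row is one (country, year) observation.
# "Year" and "Forest Area" are illustrative column names, not from the source.
long_df = sample.melt(
    id_vars=["Country Name", "Country Code"],
    var_name="Year",
    value_name="Forest Area",
)

# The year labels are strings in the wide layout; convert them to integers.
long_df["Year"] = long_df["Year"].astype(int)

print(long_df.dtypes)
```

The long format tends to be easier to group, filter by year, and pass to plotting libraries, while the wide format is convenient for row-wise comparisons across years.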

---

---
<a id="three"></a>
## **Loading Data**
<a href=#cont>Back to Table of Contents</a>

* **Purpose:** Load the data into the notebook for manipulation and analysis.
* **Details:** Show the code used to load the data and display the first few rows to give a sense of what the raw data looks like.
---

In [8]:
# Load the dataset
file_path = "forest_area_km.csv"
df = pd.read_csv(file_path)

# Display the first few rows of the dataset
df.head()

|  | Country Name | Country Code | 1990 | 1991 | 1992 | 1993 | 1994 | 1995 | 1996 | 1997 | ... | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 | 2021 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Afghanistan | AFG | 12084.4 | 12084.4 | 12084.4 | 12084.4 | 12084.4 | 12084.4 | 12084.4 | 12084.4 | ... | 12084.4 | 12084.4 | 12084.4 | 12084.4 | 12084.4 | 12084.4 | 12084.4 | 12084.4 | 12084.4 | 12084.4 |
| 1 | Albania | ALB | 7888.0 | 7868.5 | 7849.0 | 7829.5 | 7810.0 | 7790.5 | 7771.0 | 7751.5 | ... | 7849.17 | 7863.405 | 7877.64 | 7891.875 | 7891.8 | 7889.025 | 7889.0 | 7889.0 | 7889.0 | 7889.0 |
| 2 | Algeria | DZA | 16670.0 | 16582.0 | 16494.0 | 16406.0 | 16318.0 | 16230.0 | 16142.0 | 16054.0 | ... | 19332.0 | 19408.0 | 19484.0 | 19560.0 | 19560.0 | 19430.0 | 19300.0 | 19390.0 | 19490.0 | 19583.333 |
| 3 | American Samoa | ASM | 180.7 | 180.36 | 180.02 | 179.68 | 179.34 | 179.0 | 178.66 | 178.32 | ... | 173.7 | 173.4 | 173.1 | 172.8 | 172.5 | 172.2 | 171.9 | 171.6 | 171.3 | 171.0 |
| 4 | Andorra | AND | 160.0 | 160.0 | 160.0 | 160.0 | 160.0 | 160.0 | 160.0 | 160.0 | ... | 160.0 | 160.0 | 160.0 | 160.0 | 160.0 | 160.0 | 160.0 | 160.0 | 160.0 | 160.0 |


In [9]:
# Display basic statistics and data types for the loaded data
df.describe(), df.dtypes


(               1990          1991          1992          1993          1994  \
 count  2.150000e+02  2.190000e+02  2.480000e+02  2.510000e+02  2.510000e+02   
 mean   9.839669e+05  9.632554e+05  1.795314e+06  1.771321e+06  1.767266e+06   
 std    2.363107e+06  2.337849e+06  5.276585e+06  5.239372e+06  5.227734e+06   
 min    0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00   
 25%    3.328300e+03  2.598345e+03  3.836765e+03  3.937705e+03  3.939040e+03   
 50%    5.672000e+04  4.939237e+04  4.680975e+04  4.557822e+04  4.541296e+04   
 75%    2.939515e+05  2.721789e+05  3.651964e+05  3.591394e+05  3.593736e+05   
 max    1.134854e+07  1.135040e+07  4.203424e+07  4.200109e+07  4.192261e+07   
 
                1995          1996          1997          1998          1999  \
 count  2.510000e+02  2.510000e+02  2.510000e+02  2.510000e+02  2.510000e+02   
 mean   1.763211e+06  1.759155e+06  1.755100e+06  1.751045e+06  1.746990e+06   
 std    5.216108e+06  5.204492e+06  5.

---
<a id="four"></a>
## **Data Cleaning and Filtering**
<a href=#cont>Back to Table of Contents</a>

* **Purpose:** Prepare the data for analysis by cleaning and filtering.
* **Details:** Include steps for handling missing values, removing outliers, correcting errors, and possibly reducing the data (filtering based on certain criteria or features).
---
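
One filtering step worth considering follows from the rows visible in this notebook: the data mixes individual countries with regional and income-group aggregates such as Sub-Saharan Africa (`SSF`) and Upper middle income (`UMC`). Summary statistics computed over both kinds of row count the same forest area more than once. The sketch below separates the two groups on a small stand-in frame. `AGGREGATE_CODES` is a hypothetical, partial list drawn only from the codes that appear in this notebook's output; a complete list would have to come from the data provider's metadata.

```python
import pandas as pd

# Stand-in frame using 2020 values displayed elsewhere in this notebook.
sample = pd.DataFrame({
    "Country Name": ["Albania", "Sub-Saharan Africa", "Upper middle income"],
    "Country Code": ["ALB", "SSF", "UMC"],
    "2020": [7889.0, 6271975.7, 20689597.8],
})

# Hypothetical, partial set of aggregate codes (assumption: not exhaustive).
AGGREGATE_CODES = {"SSF", "SSA", "TSS", "UMC"}

# Boolean mask marking rows that represent aggregates rather than countries.
is_aggregate = sample["Country Code"].isin(AGGREGATE_CODES)

countries_only = sample[~is_aggregate].reset_index(drop=True)
aggregates_only = sample[is_aggregate].reset_index(drop=True)

print(countries_only)
print(aggregates_only)
```

Keeping the aggregates in a separate frame, instead of discarding them, leaves them available for regional comparisons later.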

In [10]:
# Checking for missing values
missing_values = df.isnull().sum()

# Display the count of missing values for each column
missing_values


Country Name     0
Country Code     0
1990            44
1991            40
1992            11
1993             8
1994             8
1995             8
1996             8
1997             8
1998             8
1999             8
2000             6
2001             6
2002             6
2003             6
2004             6
2005             6
2006             4
2007             4
2008             4
2009             4
2010             4
2011             1
2012             0
2013             0
2014             0
2015             0
2016             0
2017             0
2018             0
2019             0
2020             0
2021             0
dtype: int64

In [12]:
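# Fill missing values with a forward fill. DataFrame.ffill() replaces the
# deprecated fillna(method='ffill') and behaves the same way: it fills down each
# column, so a gap takes the value from the row above, which belongs to a
# different country or region (see row 257 in the output, whose 1990 and 1991
# values repeat those of row 256).
df_cleaned = df.ffill()
df_cleaned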


|  | Country Name | Country Code | 1990 | 1991 | 1992 | 1993 | 1994 | 1995 | 1996 | 1997 | ... | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 | 2021 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Afghanistan | AFG | 12084.4 | 12084.400 | 1.208440e+04 | 1.208440e+04 | 1.208440e+04 | 12084.40 | 1.208440e+04 | 1.208440e+04 | ... | 12084.40 | 1.208440e+04 | 12084.40 | 1.208440e+04 | 1.208440e+04 | 1.208440e+04 | 12084.40 | 12084.4 | 12084.4 | 1.208440e+04 |
| 1 | Albania | ALB | 7888.0 | 7868.500 | 7.849000e+03 | 7.829500e+03 | 7.810000e+03 | 7790.50 | 7.771000e+03 | 7.751500e+03 | ... | 7849.17 | 7.863405e+03 | 7877.64 | 7.891875e+03 | 7.891800e+03 | 7.889025e+03 | 7889.00 | 7889.0 | 7889.0 | 7.889000e+03 |
| 2 | Algeria | DZA | 16670.0 | 16582.000 | 1.649400e+04 | 1.640600e+04 | 1.631800e+04 | 16230.00 | 1.614200e+04 | 1.605400e+04 | ... | 19332.00 | 1.940800e+04 | 19484.00 | 1.956000e+04 | 1.956000e+04 | 1.943000e+04 | 19300.00 | 19390.0 | 19490.0 | 1.958333e+04 |
| 3 | American Samoa | ASM | 180.7 | 180.360 | 1.800200e+02 | 1.796800e+02 | 1.793400e+02 | 179.00 | 1.786600e+02 | 1.783200e+02 | ... | 173.70 | 1.734000e+02 | 173.10 | 1.728000e+02 | 1.725000e+02 | 1.722000e+02 | 171.90 | 171.6 | 171.3 | 1.710000e+02 |
| 4 | Andorra | AND | 160.0 | 160.000 | 1.600000e+02 | 1.600000e+02 | 1.600000e+02 | 160.00 | 1.600000e+02 | 1.600000e+02 | ... | 160.00 | 1.600000e+02 | 160.00 | 1.600000e+02 | 1.600000e+02 | 1.600000e+02 | 160.00 | 160.0 | 160.0 | 1.600000e+02 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 254 | Sub-Saharan Africa | SSF | 7339644.5 | 7306915.748 | 7.274187e+06 | 7.241458e+06 | 7.208729e+06 | 7176000.74 | 7.143272e+06 | 7.110543e+06 | ... | 6588424.32 | 6.549278e+06 | 6510132.44 | 6.470986e+06 | 6.430837e+06 | 6.391509e+06 | 6352213.38 | 6311896.3 | 6271975.7 | 6.231791e+06 |
| 255 | Sub-Saharan Africa (excluding high income) | SSA | 7339307.5 | 7306578.748 | 7.273850e+06 | 7.241121e+06 | 7.208392e+06 | 7175663.74 | 7.142935e+06 | 7.110206e+06 | ... | 6588087.32 | 6.548941e+06 | 6509795.44 | 6.470650e+06 | 6.430500e+06 | 6.391172e+06 | 6351876.38 | 6311559.3 | 6271638.7 | 6.231454e+06 |
| 256 | Sub-Saharan Africa (IDA & IBRD countries) | TSS | 7339644.5 | 7306915.748 | 7.274187e+06 | 7.241458e+06 | 7.208729e+06 | 7176000.74 | 7.143272e+06 | 7.110543e+06 | ... | 6588424.32 | 6.549278e+06 | 6510132.44 | 6.470986e+06 | 6.430837e+06 | 6.391509e+06 | 6352213.38 | 6311896.3 | 6271975.7 | 6.231791e+06 |
| 257 | Upper middle income | UMC | 7339644.5 | 7306915.748 | 2.141549e+07 | 2.136692e+07 | 2.131835e+07 | 21269786.57 | 2.122122e+07 | 2.117265e+07 | ... | 20792527.81 | 2.077668e+07 | 20760824.67 | 2.074497e+07 | 2.074251e+07 | 2.071789e+07 | 20709726.46 | 20699281.0 | 20689597.8 | 2.067941e+07 |


In [13]:
# Verifying if missing values have been handled
missing_values_cleaned = df_cleaned.isnull().sum()
missing_values_cleaned


Country Name    0
Country Code    0
1990            0
1991            0
1992            0
1993            0
1994            0
1995            0
1996            0
1997            0
1998            0
1999            0
2000            0
2001            0
2002            0
2003            0
2004            0
2005            0
2006            0
2007            0
2008            0
2009            0
2010            0
2011            0
2012            0
2013            0
2014            0
2015            0
2016            0
2017            0
2018            0
2019            0
2020            0
2021            0
dtype: int64
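
A zero count of missing values confirms that every gap was filled, but not that it was filled sensibly. A forward fill applied down the columns copies values between neighbouring rows, so a country can inherit another country's forest area. One possible alternative, sketched below on a hypothetical frame (`Country A` and `Country B` and their values are invented for illustration), fills along the year axis within each row instead. Because the gaps in this dataset sit mostly in the earliest years, the sketch follows the forward fill with a backward fill. Back-filling assumes that forest area before the first observation equals the first observed value, which flattens any early trend; dropping those years or interpolating may suit some analyses better.

```python
import numpy as np
import pandas as pd

# Hypothetical wide-format frame with gaps at the start and end of the series.
sample = pd.DataFrame({
    "Country Name": ["Country A", "Country B"],
    "Country Code": ["AAA", "BBB"],
    "1990": [100.0, np.nan],
    "1991": [99.0, np.nan],
    "1992": [98.0, 5000.0],
    "1993": [np.nan, 4990.0],
})

# Select only the year columns so the identifier columns are left untouched.
year_cols = [col for col in sample.columns if col.isdigit()]

# Column-wise fill (down the rows): Country B's 1990 gap takes Country A's value.
column_wise = sample.ffill()

# Row-wise fill (along the years): each row is filled from its own values only.
row_wise = sample.copy()
row_wise[year_cols] = sample[year_cols].ffill(axis=1).bfill(axis=1)

print(column_wise[["Country Name", "1990"]])
print(row_wise[["Country Name", "1990"]])
```

In the column-wise result, `Country B` reports 100.0 for 1990, a value that belongs to `Country A`. In the row-wise result it reports 5000.0, its own first observed value.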

---
<a id="five"></a>
## **Exploratory Data Analysis (EDA)**
<a href=#cont>Back to Table of Contents</a>

* **Purpose:** Explore and visualize the data to uncover patterns, trends, and relationships.
* **Details:** Use statistics and visualizations to explore the data. This may include histograms, box plots, scatter plots, and correlation matrices. Discuss any significant findings.
---
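
As one possible starting point for the exploration described above, the sketch below computes the net and percentage change in forest area between the first and last year for a few countries. The values are copied from the rows displayed in the Loading Data section; the `Net Change` and `Percent Change` column names are assumptions chosen for illustration. Entities whose 1990 value is zero would produce infinite or undefined percentages and would need separate handling.

```python
import pandas as pd

# Values for 1990 and 2021 copied from the rows displayed in the Loading Data section.
sample = pd.DataFrame({
    "Country Name": ["Albania", "Algeria", "American Samoa"],
    "1990": [7888.0, 16670.0, 180.7],
    "2021": [7889.0, 19583.333, 171.0],
})

change = sample.set_index("Country Name")

# Absolute change in forest area over the full period.
change["Net Change"] = change["2021"] - change["1990"]

# Change relative to the 1990 baseline, in percent.
change["Percent Change"] = 100 * change["Net Change"] / change["1990"]

# Largest relative losses first.
change = change.sort_values("Percent Change")

print(change.round(2))
```

Sorting by relative change instead of absolute change keeps small territories comparable with large countries, at the cost of making tiny baselines look volatile.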


In [None]:
#Please use code cells to code in and do not forget to comment your code.

---
<a id="six"></a>
## **Modeling**
<a href=#cont>Back to Table of Contents</a>

* **Purpose:** Develop and train predictive or statistical models.
* **Details:** Describe the choice of models, feature selection and engineering processes, and show how the models are trained. Include code for setting up the models and explanations of the model parameters.
---
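
A simple baseline for the modeling step is a straight-line trend fitted to one country's yearly values. The sketch below uses `numpy.polyfit` on Algeria's 1990 to 1997 values as displayed in the Loading Data section; the `predict` helper is a hypothetical name introduced for illustration. This is a baseline under the assumption of a constant yearly change, not a recommended final model: the same displayed output shows Algeria's forest area rising again by 2012, which a line fitted to the 1990s cannot capture.

```python
import numpy as np

# Algeria's forest area for 1990-1997, copied from the displayed output.
years = np.arange(1990, 1998)
area = np.array([16670.0, 16582.0, 16494.0, 16406.0,
                 16318.0, 16230.0, 16142.0, 16054.0])

# Measure time from the first year so the intercept is the 1990 level.
elapsed = years - years[0]

# Degree-1 least-squares fit: returns the slope, then the intercept.
slope, intercept = np.polyfit(elapsed, area, deg=1)


def predict(year):
    """Extrapolate the fitted line to a given year (hypothetical helper)."""
    return slope * (year - years[0]) + intercept


forecast_1998 = predict(1998)
print(f"Fitted yearly change: {slope:.1f}")
print(f"Extrapolated 1998 value: {forecast_1998:.1f}")
```

Fitting one line per country keeps the model easy to interpret, and its slope doubles as a per-country estimate of yearly forest change.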


In [None]:
#Please use code cells to code in and do not forget to comment your code.

---
<a id="seven"></a>
## **Evaluation and Validation**
<a href=#cont>Back to Table of Contents</a>

* **Purpose:** Evaluate and validate the effectiveness and accuracy of the models.
* **Details:** Present metrics used to evaluate the models, such as accuracy, precision, recall, F1-score, etc. Discuss validation techniques employed, such as cross-validation or train/test split.
---
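
Because forest area is a continuous quantity, error-based metrics such as mean absolute error (MAE) and root mean squared error (RMSE) are likely a closer fit than classification metrics like accuracy or F1-score. For yearly data, a time-based split (train on early years, test on later years) avoids evaluating a forecast on years it has already seen. The sketch below fits a straight line to American Samoa's 1990 to 1997 values and scores it on the 2012 to 2021 values, both copied from the output displayed in the Loading Data section. The straight-line model and the choice of split are assumptions made for illustration.

```python
import numpy as np

# American Samoa's forest area, copied from the displayed output.
train_years = np.arange(1990, 1998)
train_area = np.array([180.7, 180.36, 180.02, 179.68,
                       179.34, 179.0, 178.66, 178.32])
test_years = np.arange(2012, 2022)
test_area = np.array([173.7, 173.4, 173.1, 172.8, 172.5,
                      172.2, 171.9, 171.6, 171.3, 171.0])

# Fit a straight line on the training years only.
slope, intercept = np.polyfit(train_years - 1990, train_area, deg=1)

# Extrapolate the line to the held-out years.
predicted = slope * (test_years - 1990) + intercept

# Score the extrapolation against the observed values.
errors = test_area - predicted
mae = np.mean(np.abs(errors))
rmse = np.sqrt(np.mean(errors ** 2))

print(f"MAE:  {mae:.2f}")
print(f"RMSE: {rmse:.2f}")
```

Both metrics are in the same unit as the forest area values, so they can be read directly as a typical forecast error for this country.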

In [None]:
#Please use code cells to code in and do not forget to comment your code.

---
<a id="eight"></a>
## **Final Model**
<a href=#cont>Back to Table of Contents</a>

* **Purpose:** Present the final model and its performance.
* **Details:** Highlight the best-performing model and discuss its configuration, performance, and why it was chosen over others.
---


In [None]:
#Please use code cells to code in and do not forget to comment your code.

---
<a id="nine"></a>
## **Conclusion and Future Work**
<a href=#cont>Back to Table of Contents</a>

* **Purpose:** Summarize the findings and discuss future directions.
* **Details:** Conclude with a summary of the results, insights gained, limitations of the study, and suggestions for future projects or improvements in methodology or data collection.
---


In [None]:
#Please use code cells to code in and do not forget to comment your code.

---
<a id="ten"></a>
## **References**
<a href=#cont>Back to Table of Contents</a>

* **Purpose:** Provide citations and sources of external content.
* **Details:** List all the references and sources consulted during the project, including data sources, research papers, and documentation for tools and libraries used.
---

In [None]:
#Please use code cells to code in and do not forget to comment your code.

## Additional Sections to Consider

* ### Appendix: 
For any additional code, detailed tables, or extended data visualizations that are supplementary to the main content.

* ### Contributors: 
If this is a group project, list the contributors and their roles or contributions to the project.
