# Table of Contents
- <b> [1. Project Overview](#chapter1)
    - [1.1. Introduction](#section_1_1)
    - [1.2. Objective](#section_1_2)
- <b> [2. Importing Packages](#chapter2)
- <b> [3. Data Loading](#chapter3)
- <b> [4. Data Cleaning](#chapter4)
- <b> [5. Exploratory Data Analysis (EDA)](#chapter5)
- <b> [6. Conclusion and Insights](#chapter6)</b>

# 1. Project Overview <a class="anchor" id="chapter1"></a>

## 1.1. Introduction<a class="anchor" id="section_1_1"></a>

## 1.2. Objective<a class="anchor" id="section_1_2"></a>

# 2. Importing Packages<a class="anchor" id="chapter2"></a>

Importing notes

In [2]:
# Libraries for data loading, manipulation and analysis

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Displays output inline
%matplotlib inline

# Libraries for Handing Errors
import warnings
warnings.filterwarnings('ignore')

# 3. Data Loading<a class="anchor" id="chapter3"></a>

Loading notes

In [3]:
# Display all columns
pd.set_option("display.max_columns", None)

In [25]:
# loading dataset
original_df = pd.read_csv("forest_area_km.csv", index_col=False)

# Display the first few rows
original_df.head()

Unnamed: 0,Country Name,Country Code,1990,1991,1992,1993,1994,1995,1996,1997,1998,1999,2000,2001,2002,2003,2004,2005,2006,2007,2008,2009,2010,2011,2012,2013,2014,2015,2016,2017,2018,2019,2020,2021
0,Afghanistan,AFG,12084.4,12084.4,12084.4,12084.4,12084.4,12084.4,12084.4,12084.4,12084.4,12084.4,12084.4,12084.4,12084.4,12084.4,12084.4,12084.4,12084.4,12084.4,12084.4,12084.4,12084.4,12084.4,12084.4,12084.4,12084.4,12084.4,12084.4,12084.4,12084.4,12084.4,12084.4,12084.4
1,Albania,ALB,7888.0,7868.5,7849.0,7829.5,7810.0,7790.5,7771.0,7751.5,7732.0,7712.5,7693.0,7705.77,7718.54,7731.31,7744.08,7756.85,7769.62,7782.39,7795.16,7807.93,7820.7,7834.935,7849.17,7863.405,7877.64,7891.875,7891.8,7889.025,7889.0,7889.0,7889.0,7889.0
2,Algeria,DZA,16670.0,16582.0,16494.0,16406.0,16318.0,16230.0,16142.0,16054.0,15966.0,15878.0,15790.0,16129.0,16468.0,16807.0,17146.0,17485.0,17824.0,18163.0,18502.0,18841.0,19180.0,19256.0,19332.0,19408.0,19484.0,19560.0,19560.0,19430.0,19300.0,19390.0,19490.0,19583.333
3,American Samoa,ASM,180.7,180.36,180.02,179.68,179.34,179.0,178.66,178.32,177.98,177.64,177.3,177.0,176.7,176.4,176.1,175.8,175.5,175.2,174.9,174.6,174.3,174.0,173.7,173.4,173.1,172.8,172.5,172.2,171.9,171.6,171.3,171.0
4,Andorra,AND,160.0,160.0,160.0,160.0,160.0,160.0,160.0,160.0,160.0,160.0,160.0,160.0,160.0,160.0,160.0,160.0,160.0,160.0,160.0,160.0,160.0,160.0,160.0,160.0,160.0,160.0,160.0,160.0,160.0,160.0,160.0,160.0


In [19]:
# Create copy of dataset
df = original_df.copy()

In [24]:
# Displays number of rows and columns
df.shape

(259, 34)

**Results**: The dataset consists of 259 rows and 34 columns.

In [21]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 259 entries, 0 to 258
Data columns (total 34 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Country Name  259 non-null    object 
 1   Country Code  259 non-null    object 
 2   1990          215 non-null    float64
 3   1991          219 non-null    float64
 4   1992          248 non-null    float64
 5   1993          251 non-null    float64
 6   1994          251 non-null    float64
 7   1995          251 non-null    float64
 8   1996          251 non-null    float64
 9   1997          251 non-null    float64
 10  1998          251 non-null    float64
 11  1999          251 non-null    float64
 12  2000          253 non-null    float64
 13  2001          253 non-null    float64
 14  2002          253 non-null    float64
 15  2003          253 non-null    float64
 16  2004          253 non-null    float64
 17  2005          253 non-null    float64
 18  2006          255 non-null    

# 4. Data Cleaning<a class="anchor" id="chapter4"></a>

## 4.1. Inspect the Data

Before cleaning, inspect the data for any issues (null values, duplicates, non-essential columns or rows)

In [36]:
def print_null_values(df):
    """
    Prints the count of null (missing) values for each column that has more than 0 null values.
    
    Parameters:
    df (pandas.DataFrame): The input DataFrame

    Returns:
    None
    """
    null_counts = df.isnull().sum()
    
    # Filter out columns with no null values
    null_counts = null_counts[null_counts > 0]
    
    if null_counts.empty:
        print("No columns with null values.")
    else:
        print("Columns with null values:")
        print(null_counts)
        

print_null_values(df)

Columns with null values:
1990    44
1991    40
1992    11
1993     8
1994     8
1995     8
1996     8
1997     8
1998     8
1999     8
2000     6
2001     6
2002     6
2003     6
2004     6
2005     6
2006     4
2007     4
2008     4
2009     4
2010     4
2011     1
dtype: int64


In [35]:
df.duplicated().sum()  # turn into function

np.int64(0)

# 5. Exploratory Data Analysis (EDA)<a class="anchor" id="chapter5"></a>

# 6. Conclusion and Insights <a class="anchor" id="chapter6"></a>