<h1 align="center" style="background-color:black;color:white;border-radius: 8px; padding:15px">Comprehensive Analysis of Ireland's Agriculture and Global Comparisons</h1>

## Table of Contents

- [Introduction](#Introduction)
   - Overview
   - Objective
- [Dataset Overview](#Dataset-Overview)
   - Data Sources
   - Key Variables
- [Preliminary Data Exploration](#Preliminary-Data-Exploration)
   - Import Libraries
   - Load Datasets
   - View First Five Rows of Each Dataframe
- [Data Cleaning and Preprocessing](#Data-Cleaning-and-Preprocessing)
   - Missing Values
   - Data Transformation
- [Exploratory Data Analysis (EDA)](#Exploratory-Data-Analysis)
   - Descriptive Statistics
   - Inferential Statistics
   - Visualization of Trends
- [Machine Learning Analysis](#Machine-Learning-Analysis)
   - Model Selection
   - Predictions
- [Key Findings and Insights](#Key-Findings-and-Insights)
   - Summary
   - Actionable Recommendations
- [Conclusion](#Conclusion)
- [References](#References)

<h2 style="background-color:black;color:white;border-radius: 8px; padding:15px">Introduction</h2>

### **Overview**

Agriculture has been fundamental to human civilization, and it continues to evolve as data analytics, machine learning, and artificial intelligence are applied to improve productivity and sustainability. This exercise investigates Ireland’s agricultural sector, focusing on two key crops (**maize and potatoes**) and three livestock categories (**cattle, chickens, and sheep**), as a baseline for comparison against global trends.

Using data from FAOSTAT, maintained by the Food and Agriculture Organization (FAO), this exercise evaluates production trends, trade dynamics, and livestock statistics. By employing methodologies such as forecasting, it aims to uncover actionable insights and provide recommendations to optimize Ireland's agricultural standing.

### **Objective**

The primary objective of this exercise is to evaluate Ireland’s agricultural performance in the context of global trends and provide evidence-based recommendations for improvement. By focusing on specific crops (maize and potatoes) and livestock (cattle, chickens, and sheep), the exercise seeks to:

1. **Analyze Agricultural Production Trends**:

    - Investigate historical and current production data for the selected crops and livestock.
    - Identify patterns, anomalies, and growth trends to provide a clear understanding of Ireland's agricultural landscape.

2. **Compare Ireland’s Agricultural Sector Globally**:
    - Benchmark Ireland's agricultural performance against other countries using key metrics such as yield, production efficiency, and trade volumes.

3. **Conduct Forecasting and Predictive Analysis**:
    - Utilize machine learning models to predict future production trends and identify potential risks and opportunities.

4. **Provide Evidence-Based Recommendations**:
    - Develop actionable recommendations for policymakers, farmers, and stakeholders to improve productivity, trade balance, and economic outcomes in the sector.

<h2 style="background-color:black;color:white;border-radius: 8px; padding:15px">Dataset Overview</h2>

### **Data Sources**  

The primary dataset for this research was downloaded from the **FAOSTAT** database provided by the Food and Agriculture Organization (FAO) at [https://www.fao.org/faostat/en/#data/QCL](https://www.fao.org/faostat/en/#data/QCL). The dataset covers agricultural data from **1961 to 2023** for all countries, identified using the **M49 coding system**. The selected focus areas include **Cattle, Chickens, Sheep, Maize (corn), and Potatoes**, providing a comprehensive view of these key agricultural products.  

To complement this primary dataset, an additional dataset, **Unit Definitions**, was downloaded from the **FAOSTAT Definitions** section ([https://www.fao.org/faostat/en/#definitions](https://www.fao.org/faostat/en/#definitions)). It supplies unit names and their descriptions, for example "ha/cap" for hectares per capita.  

This additional dataset will be merged with the primary dataset during the data cleaning process to enhance data clarity and usability.

### **Key Variables**  
The primary dataset contains the following key variables:  

- **Domain Code/Domain**: Represents the data category, e.g., "QCL" stands for "Crops and livestock products."  
- **Area Code (M49)/Area**: Identifies countries using the M49 coding system and their corresponding names.  
- **Element Code/Element**: Indicates the type of measurement or data recorded, such as "Stocks" for livestock.  
- **Item Code (CPC)/Item**: Refers to the specific agricultural product, e.g., "Cattle," "Maize," or "Potatoes."  
- **Year Code/Year**: Specifies the year of data collection.  
- **Unit**: Describes the unit of measurement, such as "An" (Animal numbers) or metric tons.  
- **Value**: Records the numerical value for the corresponding year, item, and element.  
- **Flag/Flag Description**: Provides additional metadata about the data, e.g., "A" for official figures or "E" for estimated values.  
- **Note**: Contains supplementary remarks or observations about the data entry.  

These variables, along with the definitions from the additional dataset, will form the basis for data analysis, ensuring consistency and accuracy throughout the study.
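
As a rough illustration of how these variables combine, the sketch below builds a tiny hand-made frame with the same column names and selects a single series (one area, one item, one element). The rows and values are invented for illustration and are not FAOSTAT figures, and `select_series` is a hypothetical helper, not part of the notebook.

```python
import pandas as pd

# Toy rows that mimic the FAOSTAT layout; the values are made up for illustration.
toy = pd.DataFrame({
    "Area":    ["Ireland", "Ireland", "Ireland", "France"],
    "Element": ["Stocks", "Stocks", "Production", "Stocks"],
    "Item":    ["Cattle", "Cattle", "Potatoes", "Cattle"],
    "Year":    [2000, 2001, 2000, 2000],
    "Unit":    ["An", "An", "t", "An"],
    "Value":   [100.0, 110.0, 50.0, 900.0],
})

def select_series(frame, area, item, element):
    """Return the Year/Unit/Value rows for one area, item, and element (hypothetical helper)."""
    mask = (
        (frame["Area"] == area)
        & (frame["Item"] == item)
        & (frame["Element"] == element)
    )
    return (
        frame.loc[mask, ["Year", "Unit", "Value"]]
        .sort_values("Year")
        .reset_index(drop=True)
    )

ireland_cattle = select_series(toy, "Ireland", "Cattle", "Stocks")
print(ireland_cattle)
```

Filtering on all three of `Area`, `Item`, and `Element` matters because the same item can appear under several elements, each with its own unit.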

<h2 style="background-color:black;color:white;border-radius: 8px; padding:15px">Preliminary Data Exploration</h2>

### **Import Libraries**

In [1]:
# !pip install seaborn pandas matplotlib plotly scikit-learn prophet

In [2]:
# Data Manipulation libraries
import numpy as np
import pandas as pd

# Data Visualization libraries
import matplotlib.pyplot as plt
import seaborn as sns

# Machine Learning libraries
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.ensemble import RandomForestRegressor
from sklearn.preprocessing import StandardScaler

# Time Series Forecasting
from statsmodels.tsa.arima.model import ARIMA
from prophet import Prophet

# Optimization
import time
import multiprocessing

### **Load Datasets**

In [3]:
# Primary dataset
df = pd.read_csv('/kaggle/input/cct-project-datasets/datasets/FAOSTAT_data_en_12-24-2024.csv')

# Definitions and Standards
unit_df = pd.read_csv('/kaggle/input/cct-project-datasets/datasets/FAOSTAT_data_units_12-24-2024.csv')

### **View First Five Rows of Each Dataframe**

In [4]:
all_dfs = [df, unit_df]

for dataframe in all_dfs:
    display(dataframe.head())

| | Domain Code | Domain | Area Code (M49) | Area | Element Code | Element | Item Code (CPC) | Item | Year Code | Year | Unit | Value | Flag | Flag Description | Note |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | QCL | Crops and livestock products | 4 | Afghanistan | 5111 | Stocks | 2111 | Cattle | 1961 | 1961 | An | 2900000.0 | A | Official figure | |
| 1 | QCL | Crops and livestock products | 4 | Afghanistan | 5111 | Stocks | 2111 | Cattle | 1962 | 1962 | An | 3200000.0 | E | Estimated value | |
| 2 | QCL | Crops and livestock products | 4 | Afghanistan | 5111 | Stocks | 2111 | Cattle | 1963 | 1963 | An | 3300000.0 | E | Estimated value | |
| 3 | QCL | Crops and livestock products | 4 | Afghanistan | 5111 | Stocks | 2111 | Cattle | 1964 | 1964 | An | 3350000.0 | E | Estimated value | |
| 4 | QCL | Crops and livestock products | 4 | Afghanistan | 5111 | Stocks | 2111 | Cattle | 1965 | 1965 | An | 3400000.0 | E | Estimated value | |


| | Unit Name | Description |
|---|---|---|
| 0 | % | Percent |
| 1 | %LSU | Percent of Total Livestock Units |
| 2 | °c | Degrees celsius |
| 3 | 0.1 g/An | tenth Grams per animal |
| 4 | 100 g | hundred Grams |


### **Use `info` Function to get Insights on Memory Usage and Missing Values**

In [5]:
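for dataframe in all_dfs:
    # info() prints its summary itself and returns None,
    # so display() also echoes "None" after each report
    display(dataframe.info())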

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 90710 entries, 0 to 90709
Data columns (total 15 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   Domain Code       90710 non-null  object 
 1   Domain            90710 non-null  object 
 2   Area Code (M49)   90710 non-null  int64  
 3   Area              90710 non-null  object 
 4   Element Code      90710 non-null  int64  
 5   Element           90710 non-null  object 
 6   Item Code (CPC)   90710 non-null  int64  
 7   Item              90710 non-null  object 
 8   Year Code         90710 non-null  int64  
 9   Year              90710 non-null  int64  
 10  Unit              90710 non-null  object 
 11  Value             89610 non-null  float64
 12  Flag              90710 non-null  object 
 13  Flag Description  90710 non-null  object 
 14  Note              2603 non-null   object 
dtypes: float64(1), int64(5), object(9)
memory usage: 10.4+ MB


None

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 65 entries, 0 to 64
Data columns (total 2 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   Unit Name    65 non-null     object
 1   Description  65 non-null     object
dtypes: object(2)
memory usage: 1.1+ KB


None

**Observation:**

**`df`** appears to contain missing values: `Value` has 89,610 non-null entries out of 90,710, and `Note` has only 2,603. These are explored in depth in the data cleaning section. In addition, some integer columns in **`df`** are identifiers rather than quantities (e.g. **Year** and **Element Code**) and are better stored as `object`. Even though some of these columns might be dropped later, it is usually best to hold data in the correct type before performing analysis.
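
The non-null counts reported by `info` can also be turned into an explicit missing-value table. The following is a minimal sketch on a small stand-in frame (the values are illustrative, not taken from the dataset); `missing_summary` is a hypothetical helper that could be applied to `df` in the same way.

```python
import numpy as np
import pandas as pd

# Small stand-in frame; in the notebook the real `df` would be passed instead.
sample = pd.DataFrame({
    "Value": [1.0, np.nan, 3.0, 4.0],
    "Note":  [None, None, "Unofficial figure", None],
    "Unit":  ["An", "An", "t", "t"],
})

def missing_summary(frame):
    """Count and percentage of missing values per column (hypothetical helper)."""
    counts = frame.isna().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(frame) * 100).round(2),
    })
    # Keep only the columns that actually have gaps, largest first.
    return summary[summary["missing"] > 0].sort_values("missing", ascending=False)

report = missing_summary(sample)
print(report)
```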

<h2 style="background-color:black;color:white;border-radius: 8px; padding:15px">Data Cleaning and Preprocessing</h2>

### **Correct Erroneous Datatypes**

In [6]:
# Convert specified columns to object type in main dataframe
int_to_object_cols = [
    'Area Code (M49)', 'Element Code', 'Item Code (CPC)', 'Year Code', 'Year'
]
for col in int_to_object_cols:
    df[col] = df[col].astype(str)
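
An alternative to converting after loading is to declare the identifier columns as strings when the file is read. The sketch below assumes an in-memory CSV with a subset of the FAOSTAT column names and illustrative values; it is not the notebook's loading code.

```python
import io
import pandas as pd

# Tiny in-memory CSV standing in for the FAOSTAT export (values are illustrative).
csv_text = """Area Code (M49),Element Code,Item Code (CPC),Year Code,Year,Value
4,5111,2111,1961,1961,2900000.0
372,5111,2111,1961,1961,4500000.0
"""

code_cols = ["Area Code (M49)", "Element Code", "Item Code (CPC)", "Year Code", "Year"]

# A dtype mapping makes pandas parse the identifier columns as strings directly,
# while columns not listed (here `Value`) keep their inferred numeric type.
typed = pd.read_csv(io.StringIO(csv_text), dtype={col: str for col in code_cols})
print(typed.dtypes)
```

Reading codes as strings at load time also preserves any leading zeros, which are lost once a column has been parsed as an integer.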

### **Data Merging**

In [7]:
# Merge df with unit_df on Unit and Unit Name
merged_df = df.merge(unit_df,
                    left_on='Unit',
                    right_on='Unit Name',
                    how='left')

In [8]:
# Show first five rows of merged data
merged_df.head()

| | Domain Code | Domain | Area Code (M49) | Area | Element Code | Element | Item Code (CPC) | Item | Year Code | Year | Unit | Value | Flag | Flag Description | Note | Unit Name | Description |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | QCL | Crops and livestock products | 4 | Afghanistan | 5111 | Stocks | 2111 | Cattle | 1961 | 1961 | An | 2900000.0 | A | Official figure | | An | Animals |
| 1 | QCL | Crops and livestock products | 4 | Afghanistan | 5111 | Stocks | 2111 | Cattle | 1962 | 1962 | An | 3200000.0 | E | Estimated value | | An | Animals |
| 2 | QCL | Crops and livestock products | 4 | Afghanistan | 5111 | Stocks | 2111 | Cattle | 1963 | 1963 | An | 3300000.0 | E | Estimated value | | An | Animals |
| 3 | QCL | Crops and livestock products | 4 | Afghanistan | 5111 | Stocks | 2111 | Cattle | 1964 | 1964 | An | 3350000.0 | E | Estimated value | | An | Animals |
| 4 | QCL | Crops and livestock products | 4 | Afghanistan | 5111 | Stocks | 2111 | Cattle | 1965 | 1965 | An | 3400000.0 | E | Estimated value | | An | Animals |
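
A left merge keeps every row of `df` even when a unit has no match in `unit_df`, so unmatched units pass through silently with a missing `Description`. One way to surface them, sketched here on illustrative stand-in frames (the unit `"xx"` and the description `"Tonnes"` are invented for the example), is to use the `indicator` and `validate` parameters of `DataFrame.merge`.

```python
import pandas as pd

# Illustrative stand-ins for `df` and `unit_df`; "xx" is a deliberately unknown unit.
data = pd.DataFrame({
    "Item":  ["Cattle", "Maize", "Sheep"],
    "Unit":  ["An", "t", "xx"],
    "Value": [1.0, 2.0, 3.0],
})
units = pd.DataFrame({
    "Unit Name":   ["An", "t"],
    "Description": ["Animals", "Tonnes"],
})

checked = data.merge(
    units,
    left_on="Unit",
    right_on="Unit Name",
    how="left",
    validate="many_to_one",  # raises MergeError if a unit name is duplicated in `units`
    indicator=True,          # adds a `_merge` column: "both" or "left_only"
)

# Units present in the data but absent from the definitions table.
unmatched = checked.loc[checked["_merge"] == "left_only", "Unit"].unique().tolist()
print(unmatched)
```

The `validate` check also guards against the row count growing, which would happen if the definitions table listed the same unit name more than once.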


### **Drop Irrelevant Columns**

After merging our datasets, we need to clean up the dataframe by removing redundant and unnecessary columns. Here's why we're removing specific columns:

**Redundant Identifier Columns**
* `Domain Code` → Redundant with `Domain` which provides the same information in a more readable format
* `Area Code (M49)` → Redundant with `Area` column which provides location in text format
* `Item Code (CPC)` → Redundant with `Item` column which gives the name directly
* `Year Code` → Redundant with `Year` column (same information)
* `Unit Name` → Redundant with `Unit` column after our merge

**Empty/Unnecessary Columns**
* `Note` → Mostly NaN (2,603 non-null values out of 90,710) and not useful for our analysis
* `Flag` and `Flag Description` → Data quality indicators not needed for our current analysis

**Columns We're Keeping**
* `Domain` - Category of data
* `Area` - Geographic location
* `Element Code` - Code for the type of measurement (not in the drop list, so it stays in the dataframe)
* `Element` - Type of measurement
* `Item` - Specific item being measured
* `Year` - Time period
* `Unit` - Measurement unit
* `Value` - Actual measurement
* `Description` - Unit description added by the merge

These changes will give us a cleaner, more focused dataset while maintaining all essential information for our analysis.
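
The redundancy claims above can be checked before anything is dropped. The sketch below uses a few illustrative rows and a hypothetical helper, `is_one_to_one`, to test whether a code column and its name column carry the same information; in the notebook the same check could be run on `merged_df`.

```python
import pandas as pd

# Illustrative rows only; the real check would use `merged_df` before the drop.
frame = pd.DataFrame({
    "Area Code (M49)": ["4", "4", "372", "372"],
    "Area":            ["Afghanistan", "Afghanistan", "Ireland", "Ireland"],
    "Year Code":       ["1961", "1962", "1961", "1962"],
    "Year":            ["1961", "1962", "1961", "1962"],
})

def is_one_to_one(df, code_col, name_col):
    """True when each code maps to exactly one name and vice versa (hypothetical helper)."""
    pairs = df[[code_col, name_col]].drop_duplicates()
    return bool(pairs[code_col].is_unique and pairs[name_col].is_unique)

for code_col, name_col in [("Area Code (M49)", "Area"), ("Year Code", "Year")]:
    print(f"{code_col} -> {name_col}: {is_one_to_one(frame, code_col, name_col)}")
```

If a pair fails the check, the code column carries information the name column does not, and dropping it would lose detail.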

In [9]:
# Drop redundant and unnecessary columns
merged_df = merged_df.drop(columns=[
    'Domain Code',
    'Area Code (M49)',
    'Item Code (CPC)',
    'Year Code',
    'Unit Name',
    'Note',
    'Flag',
    'Flag Description'
])

In [10]:
merged_df.head()

| | Domain | Area | Element Code | Element | Item | Year | Unit | Value | Description |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Crops and livestock products | Afghanistan | 5111 | Stocks | Cattle | 1961 | An | 2900000.0 | Animals |
| 1 | Crops and livestock products | Afghanistan | 5111 | Stocks | Cattle | 1962 | An | 3200000.0 | Animals |
| 2 | Crops and livestock products | Afghanistan | 5111 | Stocks | Cattle | 1963 | An | 3300000.0 | Animals |
| 3 | Crops and livestock products | Afghanistan | 5111 | Stocks | Cattle | 1964 | An | 3350000.0 | Animals |
| 4 | Crops and livestock products | Afghanistan | 5111 | Stocks | Cattle | 1965 | An | 3400000.0 | Animals |
