## Group_11
### Project 6: Climate-Smart Agriculture Recommender 

**Problem Statement**  
Farmers need guidance on crop selection as climate patterns shift. Build a recommendation system for climate-appropriate crops.

### Data Sources:

- FAO Global Agro-Ecological Zones: Data portal
- Climate.gov historical weather data: NOAA data portal
- Market price data from regional commodity exchanges

### Deliverables:

1. Crop suitability maps under current and projected climate 
2. Recommendation engine for crop selection
3. Profit optimization model including climate risk
4. Farmer decision-support tool prototype
5. Technical report

### Technical Requirements:

- Implement collaborative filtering for recommendations
- Use Random Forest for suitability classification
- Monte Carlo simulation for risk analysis
- Create interactive Streamlit app


## 1- Introduction

This project aims to develop a recommender system that helps farmers make better decisions in the face of climate change. The system should suggest climate-smart agriculture (CSA) practices based on factors such as:
- soil (type, pH, fertility, water retention, texture)
- crop (type, planting and harvest times)
- local weather patterns (rainfall, temperature, drought patterns, seasonal changes)
- water availability (irrigation system)
- farming goals (yield or sustainability)

This tool will promote sustainable agriculture by recommending specific actions or technologies that:
- increase productivity
- help adapt to climate change
- reduce greenhouse gas emissions

In [1]:
# Import modules
import os
import requests
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

## 2- Data collection

In [2]:
# Load the collected data
# Data collected from https://www.fao.org/faostat/en/#data
crop_data_quantity = pd.read_csv("data/Crop_data_quantity.csv")
crop_data_USD = pd.read_csv("data/Crop_data_USD.csv")
temperature_change = pd.read_csv("data/temperature_change.csv")

# Data collected on https://open-meteo.com/en/docs/historical-weather-api?start_date=1970-01-01&latitude=-1.2833&longitude=36.8167&daily=temperature_2m_mean,temperature_2m_max,temperature_2m_min,rain_sum,weather_code&timezone=Africa%2FCairo&hourly=&end_date=1970-01-01
meteo_data = pd.read_csv("data/meteo_data.csv")

## 3- Data cleaning

### 3-1 Cleaning crop_data_quantity

In [3]:
# Crop_data_quantity cleaning

crop_data_quantity.head(3)

Unnamed: 0,Domain Code,Domain,Area Code (M49),Area,Element Code,Element,Item Code (CPC),Item,Year Code,Year,Unit,Value,Flag,Flag Description,Note
0,QCL,Crops and livestock products,404,Kenya,5312,Area harvested,1701.0,"Beans, dry",1961,1961,ha,115000.0,E,Estimated value,
1,QCL,Crops and livestock products,404,Kenya,5412,Yield,1701.0,"Beans, dry",1961,1961,kg/ha,478.3,E,Estimated value,
2,QCL,Crops and livestock products,404,Kenya,5510,Production,1701.0,"Beans, dry",1961,1961,t,55000.0,E,Estimated value,


In [4]:
# Unnecessary column removal

crop_data_quantity = crop_data_quantity.drop(columns=["Domain Code","Area Code (M49)","Area", "Domain", "Note", "Year Code", "Element Code","Item Code (CPC)", "Flag", "Flag Description"])

In [5]:
crop_data_quantity.head(2)

Unnamed: 0,Element,Item,Year,Unit,Value
0,Area harvested,"Beans, dry",1961,ha,115000.0
1,Yield,"Beans, dry",1961,kg/ha,478.3


In [6]:
# Remove the Yield rows because they are not needed

crop_data_quantity = crop_data_quantity[crop_data_quantity["Element"].isin(["Area harvested","Production"])]

In [7]:
# Remove the soya beans and yams data because they have too many missing values

crop_data_quantity = crop_data_quantity[~crop_data_quantity["Item"].isin(["Soya beans","Yams"])]
crop_data_quantity["Item"].unique()

array(['Beans, dry', 'Cassava, fresh', 'Maize (corn)', 'Millet',
       'Potatoes', 'Rice', 'Sorghum', 'Sweet potatoes', 'Tomatoes',
       'Wheat'], dtype=object)

In [8]:
# Check the number of missing values in each column

crop_data_quantity.isnull().sum()

Element    0
Item       0
Year       0
Unit       0
Value      0
dtype: int64

In [9]:
# Extract the area-harvested values into their own column
area = crop_data_quantity[crop_data_quantity["Element"]=="Area harvested"]
area = area.rename(columns={"Value":"Area(ha)"})
area = area.drop(columns=["Element","Unit"])
area.reset_index(inplace=True, drop=True)

In [10]:
area["Year"].unique()

array([1961, 1962, 1963, 1964, 1965, 1966, 1967, 1968, 1969, 1970, 1971,
       1972, 1973, 1974, 1975, 1976, 1977, 1978, 1979, 1980, 1981, 1982,
       1983, 1984, 1985, 1986, 1987, 1988, 1989, 1990, 1991, 1992, 1993,
       1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004,
       2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015,
       2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023])

In [11]:
area.shape

(630, 3)

In [12]:
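# Extract the production values into their own column.
# Item and Year are dropped below because the rows are assumed to be in the same order as in `area`.
production = crop_data_quantity[crop_data_quantity["Element"]=="Production"]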
production = production.rename(columns={"Value":"Production(T)"})
production = production.drop(columns=["Element","Unit", "Item", "Year"])
production.reset_index(inplace=True, drop=True)

In [13]:
production.isnull().sum()

Production(T)    0
dtype: int64

In [14]:
production.shape

(630, 1)

In [15]:
# Concatenate the area-harvested and production data column-wise (rows are matched by position)

crop_quantity_clean = pd.concat([area,production], axis=1)
crop_quantity_clean.sample(3)

Unnamed: 0,Item,Year,Area(ha),Production(T)
393,Sorghum,1976,209000.0,223400.0
426,Sorghum,2009,173172.0,99000.0
21,"Beans, dry",1982,450000.0,280000.0
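The column-wise concatenation above matches rows purely by position, so it is only correct if the "Area harvested" and "Production" rows are in the same order. A minimal sketch of a key-based alternative is to pivot the long table on `Item` and `Year`. The miniature `long_df` below is a hypothetical stand-in for `crop_data_quantity`, filled with values that appear in the outputs above.

```python
import pandas as pd

# Hypothetical miniature of the long-format table (values copied from the outputs above)
long_df = pd.DataFrame({
    "Element": ["Area harvested", "Production", "Area harvested", "Production"],
    "Item": ["Beans, dry", "Beans, dry", "Sorghum", "Sorghum"],
    "Year": [1961, 1961, 1976, 1976],
    "Value": [115000.0, 55000.0, 209000.0, 223400.0],
})

# One row per (Item, Year), one column per Element.
# pivot() raises a ValueError if an (Item, Year, Element) combination is duplicated.
wide_df = (
    long_df.pivot(index=["Item", "Year"], columns="Element", values="Value")
    .rename(columns={"Area harvested": "Area(ha)", "Production": "Production(T)"})
    .reset_index()
)
wide_df.columns.name = None  # drop the leftover "Element" label on the column axis
print(wide_df)
```

Because the reshape is keyed on `Item` and `Year`, a missing or reordered row shows up as a `NaN` or an error instead of silently pairing the wrong values.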


### 3-2 Cleaning crop_data_USD

In [16]:
crop_data_USD.head(2)

Unnamed: 0,Domain Code,Domain,Area Code (M49),Area,Element Code,Element,Item Code (CPC),Item,Year Code,Year,Unit,Value,Flag,Flag Description
0,QV,Value of Agricultural Production,404,Kenya,58,Gross Production Value (constant 2014-2016 tho...,1701.0,"Beans, dry",1961,1961,1000 USD,43914,E,Estimated value
1,QV,Value of Agricultural Production,404,Kenya,58,Gross Production Value (constant 2014-2016 tho...,1701.0,"Beans, dry",1962,1962,1000 USD,43914,E,Estimated value


In [17]:
# Remove unnecessary columns

crop_data_USD = crop_data_USD.drop(columns=["Domain Code","Unit","Element","Area Code (M49)","Area", "Domain", "Year Code", "Element Code","Item Code (CPC)", "Flag", "Flag Description"])

In [18]:
# Column renaming

crop_data_USD = crop_data_USD.rename(columns={"Value":"Amount(1000USD)"})
crop_data_USD.head(2)

Unnamed: 0,Item,Year,Amount(1000USD)
0,"Beans, dry",1961,43914
1,"Beans, dry",1962,43914


In [19]:
# Drop the key columns, which are already present in crop_quantity_clean
crop_data_USD = crop_data_USD.drop(columns=["Item","Year"])

In [20]:
# Concatenation of crop quantity data with the crop value in USD data

crop_data = pd.concat([crop_quantity_clean,crop_data_USD], axis=1)
crop_data.head(2)

Unnamed: 0,Item,Year,Area(ha),Production(T),Amount(1000USD)
0,"Beans, dry",1961,115000.0,55000.0,43914
1,"Beans, dry",1962,115000.0,55000.0,43914
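Note that `crop_data_USD` was not filtered for soya beans and yams the way `crop_data_quantity` was, so a positional concatenation may pair rows from different crops or years if the value table contains them. A hedged sketch of a safer join is a merge on `Item` and `Year`. The frames below are hypothetical miniatures; the soya-bean amount is a placeholder, not a real figure.

```python
import pandas as pd

quantity = pd.DataFrame({
    "Item": ["Beans, dry", "Beans, dry"],
    "Year": [1961, 1962],
    "Area(ha)": [115000.0, 115000.0],
    "Production(T)": [55000.0, 55000.0],
})

# Value table in a different row order and with an extra crop (placeholder amount)
value = pd.DataFrame({
    "Item": ["Soya beans", "Beans, dry", "Beans, dry"],
    "Year": [1961, 1962, 1961],
    "Amount(1000USD)": [1, 43914, 43914],
})

# validate= raises if a key is duplicated on either side;
# indicator= adds a "_merge" column showing which rows found a match.
merged = quantity.merge(
    value, on=["Item", "Year"], how="left", validate="one_to_one", indicator=True
)
print(merged)
```

Rows whose `_merge` value is `left_only` would be crop-years with no production value, which is worth inspecting before modeling.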


### 3-3 Cleaning temperature change

In [21]:
temperature_change.head(2)

Unnamed: 0,Domain Code,Domain,Area Code (M49),Area,Element Code,Element,Months Code,Months,Year Code,Year,Unit,Value,Flag,Flag Description
0,ET,Temperature change on land,404,Kenya,7271,Temperature change,7001,January,1961,1961,°c,0.514,E,Estimated value
1,ET,Temperature change on land,404,Kenya,7271,Temperature change,7001,January,1962,1962,°c,-0.913,E,Estimated value


In [22]:
temperature_change[temperature_change["Year"]==1961].shape

(26, 14)

In [23]:
# Remove unnecessary columns

temperature_change = temperature_change.drop(columns=["Domain Code","Domain", "Area Code (M49)", "Area", "Element Code", "Element", "Months Code", "Year Code", "Unit", "Flag", "Flag Description"])

In [24]:
temperature_change["Months"].unique()

array(['January', 'February', 'March', 'April', 'May', 'June', 'July',
       'August', 'September', 'October', 'November', 'December',
       'Meteorological year'], dtype=object)

In [25]:
# Extract the annual (meteorological year) temperature change data from 1961 to 2023

temperature_change = temperature_change[temperature_change["Months"]=="Meteorological year"]
temperature_change = temperature_change.iloc[0:63]
temperature_change.reset_index(drop=True, inplace=True)
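The 26 rows found for 1961 (13 period labels times 2) suggest that the file holds two elements per period, and `iloc[0:63]` keeps the intended one only if the rows happen to be sorted by element. A sketch of an explicit filter follows. It would have to run before the `Element` column is dropped, and the second element name, "Standard Deviation", is an assumption about the FAOSTAT export; its values here are placeholders.

```python
import pandas as pd

# Hypothetical miniature of the raw table before any column is dropped
temp_raw = pd.DataFrame({
    "Element": ["Temperature change", "Standard Deviation"] * 2,
    "Months": ["Meteorological year"] * 4,
    "Year": [1961, 1961, 1962, 1962],
    "Value": [0.29, 0.5, -0.234, 0.5],  # 0.5 is a placeholder
})

# Select by meaning (element, period, year range) instead of by row position
mask = (
    (temp_raw["Element"] == "Temperature change")
    & (temp_raw["Months"] == "Meteorological year")
    & temp_raw["Year"].between(1961, 2023)
)
annual = (
    temp_raw.loc[mask, ["Year", "Value"]]
    .rename(columns={"Value": "Temp_change(C)"})
    .reset_index(drop=True)
)
print(annual)
```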

In [26]:
temperature_change = temperature_change.drop(columns = ["Months", "Year"])

In [27]:
# Repeat the 63 yearly rows 10 times to match the 10 selected crops

temp_repeated = pd.DataFrame(np.tile(temperature_change.values, (10, 1)), columns=temperature_change.columns)
temp_repeated.shape

(630, 1)

In [28]:
# concatenation of the crop quantity data, USD data and temperature change data
crop_data = pd.concat([crop_data, temp_repeated], axis=1)
crop_data.head(2)

Unnamed: 0,Item,Year,Area(ha),Production(T),Amount(1000USD),Value
0,"Beans, dry",1961,115000.0,55000.0,43914,0.29
1,"Beans, dry",1962,115000.0,55000.0,43914,-0.234


In [29]:
# Renaming the temperature column
crop_data = crop_data.rename(columns={"Value":"Temp_change(C)"})
crop_data.head(2)

Unnamed: 0,Item,Year,Area(ha),Production(T),Amount(1000USD),Temp_change(C)
0,"Beans, dry",1961,115000.0,55000.0,43914,0.29
1,"Beans, dry",1962,115000.0,55000.0,43914,-0.234
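Tiling the yearly series 10 times works only while every crop has exactly the same 63 years in the same order. As an illustrative alternative, assuming the `Year` column is kept on the temperature table, a many-to-one merge attaches the yearly value to each crop row by key. The frames below are hypothetical miniatures.

```python
import pandas as pd

crops = pd.DataFrame({
    "Item": ["Beans, dry", "Beans, dry", "Sorghum", "Sorghum"],
    "Year": [1961, 1962, 1961, 1962],
})

# Year-level table, deliberately out of order to show that order does not matter
yearly = pd.DataFrame({
    "Year": [1962, 1961],
    "Temp_change(C)": [-0.234, 0.29],
})

# Each year may match many crop rows, but must appear only once in `yearly`
joined = crops.merge(yearly, on="Year", how="left", validate="many_to_one")
print(joined)
```

The same pattern would apply to any other year-level table, such as the yearly rain and temperature data built later in the notebook.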


### 3-4 Cleaning meteo_data

In [30]:
meteo_data.head()

Unnamed: 0,latitude,longitude,elevation,utc_offset_seconds,timezone,timezone_abbreviation
0,-1.3005272,36.824646,1677.0,10800,Africa/Cairo,GMT+3
1,time,temperature_2m_mean (°C),temperature_2m_max (°C),temperature_2m_min (°C),rain_sum (mm),weather_code (wmo code)
2,2000-01-01,18.8,24.9,13.9,0.20,51
3,2000-01-02,18.7,23.9,14.2,0.00,3
4,2000-01-03,18.7,24.2,15.1,0.00,3


In [31]:
# Unnecessary columns removal

meteo_data = meteo_data.drop(columns=["elevation","utc_offset_seconds","timezone_abbreviation"])
meteo_data.head(2)

Unnamed: 0,latitude,longitude,timezone
0,-1.3005272,36.824646,Africa/Cairo
1,time,temperature_2m_mean (°C),rain_sum (mm)


In [32]:
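# Rename the columns: the header that pandas picked up belongs to the metadata row,
# so "latitude", "longitude" and "timezone" actually hold the date, mean temperature and rain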

meteo_data = meteo_data.rename(columns={"latitude":"date","longitude":"temperature(C)", "timezone":"rain(mm)"})

In [33]:
# Remove the metadata row (0) and the embedded header row (1)

meteo_data = meteo_data.drop([0, 1]).reset_index(drop=True)
meteo_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9340 entries, 0 to 9339
Data columns (total 3 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   date            9340 non-null   object
 1   temperature(C)  9340 non-null   object
 2   rain(mm)        9340 non-null   object
dtypes: object(3)
memory usage: 219.0+ KB
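The object dtypes above come from the file holding two tables: a one-row metadata block followed by the daily table with its own header. One possible way to avoid the manual renaming and row dropping is to locate the daily header and parse from there. This is a sketch under the assumption that the daily header is the first line whose first field is `time`; the helper name `read_daily_block` and the blank separator line in the sample are hypothetical.

```python
import io
import pandas as pd

def read_daily_block(text: str) -> pd.DataFrame:
    """Parse only the daily table from a two-block CSV export."""
    lines = text.splitlines()
    # Index of the daily header: first line whose first field is "time"
    start = next(i for i, line in enumerate(lines) if line.split(",")[0].strip() == "time")
    return pd.read_csv(io.StringIO("\n".join(lines[start:])), parse_dates=["time"])

# Sample mirroring the rows shown in meteo_data.head()
sample = """latitude,longitude,elevation,utc_offset_seconds,timezone,timezone_abbreviation
-1.3005272,36.824646,1677.0,10800,Africa/Cairo,GMT+3

time,temperature_2m_mean (°C),temperature_2m_max (°C),temperature_2m_min (°C),rain_sum (mm),weather_code (wmo code)
2000-01-01,18.8,24.9,13.9,0.20,51
2000-01-02,18.7,23.9,14.2,0.00,3
"""

daily = read_daily_block(sample).rename(columns={
    "time": "date",
    "temperature_2m_mean (°C)": "temperature(C)",
    "rain_sum (mm)": "rain(mm)",
})[["date", "temperature(C)", "rain(mm)"]]
print(daily.dtypes)
```

With the real file, the text could be obtained with `open("data/meteo_data.csv", encoding="utf-8").read()`; the columns then arrive with numeric and datetime dtypes, so no later `astype` step is needed.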


In [34]:
# Convert the date, temperature and rain columns, which were read as object dtype

meteo_data["date"]=pd.to_datetime(meteo_data["date"])
meteo_data["temperature(C)"] = meteo_data["temperature(C)"].astype(float)
meteo_data["rain(mm)"] = meteo_data["rain(mm)"].astype(float)

In [35]:
meteo_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9340 entries, 0 to 9339
Data columns (total 3 columns):
 #   Column          Non-Null Count  Dtype         
---  ------          --------------  -----         
 0   date            9340 non-null   datetime64[ns]
 1   temperature(C)  9340 non-null   float64       
 2   rain(mm)        9340 non-null   float64       
dtypes: datetime64[ns](1), float64(2)
memory usage: 219.0 KB


In [36]:
# Extract the year from the date column

meteo_data['year'] = meteo_data['date'].dt.year

In [37]:
# Grouping the data to have the mean of temperature per year

mean_temp = meteo_data.groupby('year')['temperature(C)'].mean().reset_index()
mean_temp.head(2)

Unnamed: 0,year,temperature(C)
0,2000,18.778689
1,2001,18.400548


In [38]:
mean_temp = mean_temp.drop(columns="year")

In [39]:
# Grouping data to have the sum of all the rain per year

sum_rain = meteo_data.groupby('year')['rain(mm)'].sum().reset_index()
sum_rain.head(2)

Unnamed: 0,year,rain(mm)
0,2000,236.2
1,2001,630.8


In [40]:
# concatenation of the temperature data and rain data in meteo data

meteo = pd.concat([sum_rain,mean_temp], axis=1)
meteo.head(2)

Unnamed: 0,year,rain(mm),temperature(C)
0,2000,236.2,18.778689
1,2001,630.8,18.400548
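The yearly mean temperature and yearly rain total can also be computed in a single grouped aggregation, which keeps the `year` key attached to both results and avoids the positional concatenation. The sketch below uses a hypothetical three-row `daily` frame; the 2001 row is made up for illustration. The extra `n_days` column is an addition that makes incomplete years visible, since a rain total over a partial year is not comparable to a full-year total.

```python
import pandas as pd

daily = pd.DataFrame({
    "date": pd.to_datetime(["2000-01-01", "2000-01-02", "2001-01-01"]),
    "temperature(C)": [18.8, 18.7, 18.0],  # last value is illustrative
    "rain(mm)": [0.2, 0.0, 1.5],           # last value is illustrative
})

# Named aggregation; the dict is unpacked because the output names are not valid identifiers
yearly = (
    daily.assign(year=daily["date"].dt.year)
    .groupby("year", as_index=False)
    .agg(**{
        "rain(mm)": ("rain(mm)", "sum"),
        "temperature(C)": ("temperature(C)", "mean"),
        "n_days": ("date", "count"),
    })
)
print(yearly)
```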


In [41]:
# Combine the crop data with the meteo data.
# The meteo data covers 2000 to 2025 while the crop data covers 1961 to 2024,
# so the years before 2000 are filled with the 2000 to 2023 mean
# (a constant backfill rather than a true extrapolation).

In [42]:
# filter the data from 2000 to 2023

meteo = meteo[meteo['year'].between(2000, 2023)]

In [43]:
# Compute the mean temperature and mean yearly rain over 2000 to 2023

temperature_mean = meteo["temperature(C)"].mean()
rain_mean = meteo["rain(mm)"].mean()

In [44]:
# Creating a dataframe of missing years
missing_year = pd.DataFrame({"year": range(1961,2000)})

# Fill the missing years with the computed means
missing_year["rain(mm)"] = rain_mean
missing_year["temperature(C)"] = temperature_mean

In [45]:
#combining the missing years with meteo data

meteo_extended = pd.concat([missing_year, meteo], ignore_index=True)
meteo_extended.head(3)

Unnamed: 0,year,rain(mm),temperature(C)
0,1961,675.058333,18.790179
1,1962,675.058333,18.790179
2,1963,675.058333,18.790179
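Every year from 1961 to 1999 now carries the same rain and temperature values, so those rows contain no weather signal of their own. A minimal sketch of the same mean backfill that also records which rows were filled is shown below. The `imputed` column is a hypothetical addition, and the year range is shortened to 1998 to 2001 for readability.

```python
import pandas as pd

# The two measured years shown in the outputs above
meteo = pd.DataFrame({
    "year": [2000, 2001],
    "rain(mm)": [236.2, 630.8],
    "temperature(C)": [18.778689, 18.400548],
})

# Shortened range for illustration; the notebook covers 1961 to 2023
full_years = pd.RangeIndex(1998, 2002, name="year")

extended = meteo.set_index("year").reindex(full_years)
extended["imputed"] = extended["rain(mm)"].isna()  # True where no measurement exists
extended = (
    extended.fillna(meteo[["rain(mm)", "temperature(C)"]].mean())
    .rename_axis("year")
    .reset_index()
)
print(extended)
```

Keeping the flag makes it possible to exclude or down-weight the filled rows later, or to compare results with and without them.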


In [46]:
# Repeat the rows 10 times to match the 10 crops

meteo_extended = pd.DataFrame(np.tile(meteo_extended,(10,1)), columns = meteo_extended.columns)

In [47]:
#Removing the year column to ease the concatenation with crop data

meteo_extended = meteo_extended.drop(columns = "year")

In [48]:
# Concatenation with the crop data

clean_data_ex = pd.concat([crop_data, meteo_extended], axis=1)

In [49]:
# Data containing the extrapolated data
clean_data_ex.head(2)

Unnamed: 0,Item,Year,Area(ha),Production(T),Amount(1000USD),Temp_change(C),rain(mm),temperature(C)
0,"Beans, dry",1961,115000.0,55000.0,43914,0.29,675.058333,18.790179
1,"Beans, dry",1962,115000.0,55000.0,43914,-0.234,675.058333,18.790179


In [50]:
# Subset with measured meteo data only (2000 to 2023)
# .copy() makes the subset independent of clean_data_ex before it is modified in place

clean_data = clean_data_ex[clean_data_ex["Year"]>=2000].copy()

In [52]:
clean_data.reset_index(drop=True, inplace=True)

In [54]:
clean_data.head(2)

Unnamed: 0,Item,Year,Area(ha),Production(T),Amount(1000USD),Temp_change(C),rain(mm),temperature(C)
0,"Beans, dry",2000,770797.0,331426.0,264621,0.553,236.2,18.778689
1,"Beans, dry",2001,870357.0,331426.0,264621,0.603,630.8,18.400548


In [None]:
# Data cleaning is complete.
# We now have two datasets:
# 1- clean_data: measured values for both crop and meteo data, covering 2000 to 2023
# 2- clean_data_ex: covers 1961 to 2023, with the 1961 to 1999 meteo values filled in with the 2000 to 2023 means

# The idea is to analyse both datasets and compare the results

In [56]:
# Save the clean data to disk
clean_data.to_csv("data/Clean_data_for_analysis.csv", index=False)
clean_data_ex.to_csv("data/Clean_data_extrapolated_for_analysis.csv", index=False) 
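Since several steps join tables by row position, a quick structural check on the saved datasets may be worthwhile. With 10 crops, the extended dataset should have 10 × 63 = 630 rows (1961 to 2023) and the measured one 10 × 24 = 240 rows (2000 to 2023), with each crop-year pair appearing once. The helper `panel_problems` below is a hypothetical sketch, demonstrated on a toy panel.

```python
import pandas as pd

def panel_problems(df, n_items, first_year, last_year):
    """Return a list of problems; an empty list means the panel looks complete."""
    problems = []
    expected_rows = n_items * (last_year - first_year + 1)
    if len(df) != expected_rows:
        problems.append(f"expected {expected_rows} rows, found {len(df)}")
    if df.duplicated(["Item", "Year"]).any():
        problems.append("duplicated (Item, Year) pairs")
    if not df["Year"].between(first_year, last_year).all():
        problems.append("years outside the expected range")
    return problems

# Toy panel: 2 crops x 3 years
toy = pd.MultiIndex.from_product(
    [["Beans, dry", "Sorghum"], range(2000, 2003)], names=["Item", "Year"]
).to_frame(index=False)

print(panel_problems(toy, 2, 2000, 2002))           # complete panel
print(panel_problems(toy.iloc[:-1], 2, 2000, 2002)) # one row missing
```

In the notebook this would be called as `panel_problems(clean_data_ex, 10, 1961, 2023)` and `panel_problems(clean_data, 10, 2000, 2023)`. It checks structure only and cannot detect values that were paired with the wrong crop or year.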

## 4- Data analysis

In [58]:
clean_data["Item"].unique()

array(['Beans, dry', 'Cassava, fresh', 'Maize (corn)', 'Millet',
       'Potatoes', 'Rice', 'Sorghum', 'Sweet potatoes', 'Tomatoes',
       'Wheat'], dtype=object)

## 5- Modeling

## 6- Prediction

## 7- Conclusion