# Agriculture Crop Yield Project

**1 – Business Understanding**

*۞Introduction:* <br> 
This project investigates the impact of rainfall and temperature on crop yield using R. By analyzing data on these environmental factors and their correlation with crop productivity, the study aims to identify optimal conditions for maximizing yields. The findings will provide actionable recommendations for farmers and suggest strategies for mitigating the effects of extreme weather conditions, contributing to more sustainable agricultural practices. <br>

*۞Six-Step Problem Solving Process:*<br>

| **Steps**                  | **Action**                                                                                                                                                                                                 |
|----------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| **1- Identifying the Problem**| • For farmers located in the Southern region, using Clay soil, during Sunny weather: <br>     - What is the effect of Rainfall on Crop Yield? <br>     - What is the influence of Temperature on Crop Growth?                                                                                                |
| **2- Gather Information**     | • Amount of Rainfall. <br> • Temperature. <br> • Total crop yield.                                                                                                                                         |
| **3- Potential Solutions**    | • Higher rainfall and moderate temperatures lead to higher crop yields. <br> • There are optimal ranges of rainfall and temperature beyond which crop yield does not increase significantly or may decrease. |
| **4- Evaluate alternatives**  | • Data Analysis: <br>     - Perform a multiple regression analysis to determine the combined effect of rainfall and temperature on crop yield. <br>     - Use scatter plots to visualize the relationships for different crops. <br>     - Calculate correlation coefficients to quantify the strength of the relationships. <br> • Visualization: <br>     - Create scatter plots of Rainfall_mm vs. Yield_tons_per_hectare and Temperature_Celsius vs. Yield_tons_per_hectare for each crop. <br>     - Use plots to visualize the combined effect of rainfall and temperature on yield. |
| **5- Best Solution**| • Identify the optimal ranges of rainfall and temperature for the crops. <br>  |
| **6- Implementation** | • Recommendations: <br>     - Provide guidelines for farmers on the optimal ranges of rainfall and temperature for crops. <br>     - Suggest irrigation and temperature management practices to maintain optimal conditions. <br>  |

**2 – Data Understanding**

- Dataset used: Agriculture Crop Yield
- Dataset URL: https://www.kaggle.com/datasets/samuelotiattakorah/agriculture-crop-yield/data


| Facts: | Dimensions: |
| ----------- | ----------- |
| Rainfall_mm | Region |
| Temperature_Celsius | Soil_Type |
| Days_to_Harvest | Fertilizer_Used |
| Yield_tons_per_hectare | Irrigation_Used |
|  | Weather_Condition |
|  | Crop |
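
Before any renaming or filtering, it can help to confirm that the loaded file actually contains the columns the notebook relies on (the table above plus `Crop`, which is renamed in the preparation step). The following is a minimal sketch of such a check; `crops_raw` and every value in it are hypothetical stand-ins for the data frame returned by `read_csv()`, so only the column names are meaningful.

```r
# Hypothetical one-row stand-in for the loaded CSV; values are made up
crops_raw <- data.frame(
  Region = "South", Soil_Type = "Clay", Crop = "Rice",
  Rainfall_mm = 500.0, Temperature_Celsius = 25.0,
  Fertilizer_Used = TRUE, Irrigation_Used = FALSE,
  Weather_Condition = "Sunny", Days_to_Harvest = 120L,
  Yield_tons_per_hectare = 4.5
)

expected_facts <- c("Rainfall_mm", "Temperature_Celsius",
                    "Days_to_Harvest", "Yield_tons_per_hectare")
expected_dims  <- c("Region", "Soil_Type", "Crop", "Fertilizer_Used",
                    "Irrigation_Used", "Weather_Condition")

# Columns the notebook expects but the data lacks (character(0) if none)
missing_cols <- setdiff(c(expected_facts, expected_dims), names(crops_raw))
stopifnot(length(missing_cols) == 0)

# Facts should be numeric so they can enter the regression later
facts_numeric <- vapply(crops_raw[expected_facts], is.numeric, logical(1))
print(facts_numeric)
```

Failing early here is cheaper than debugging a `rename()` error further down the notebook.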

---

**3 – Data Preparation**

---

In [None]:
#### Loading required libraries ####
# tidyverse attaches dplyr, ggplot2, tidyr and readr (read_csv), so they need no separate library() calls
library(tidyverse)

In [None]:
#### Loading the dataset ####
crops <- read_csv('YOUR/PATH/TO/crop_yield.csv')
glimpse(crops)

In [None]:
#### Renaming variables ####
crops <- crops %>% 
  rename(region=Region,
         soilType=Soil_Type,
         crop=Crop,
         rainfall=Rainfall_mm,
         temperature=Temperature_Celsius,
         fertilizer=Fertilizer_Used,
         irrigation=Irrigation_Used,
         weather=Weather_Condition,
         harvestingDays=Days_to_Harvest,
         totalYield=Yield_tons_per_hectare)
glimpse(crops)

In [None]:
#### Transforming data types ####
crops <- crops %>% 
  mutate(region=as.factor(region),
         soilType=as.factor(soilType),
         crop=as.factor(crop),
         weather=as.factor(weather))
glimpse(crops)

In [None]:
#### Filtering data & Selecting relevant variables ####
crops <- crops %>% 
  filter(region == 'South' & soilType == 'Clay' & weather == 'Sunny') %>% 
  select(-region, -soilType, -weather)

In [None]:
#### Adding residuals & Correlation to the dataset ####
# Residuals and correlations are computed within each crop group.
# pick(everything()) returns the current group's data (dplyr >= 1.1.0) and replaces the deprecated cur_data().
# Bare column names (not crops$...) are needed so that cor() respects the grouping.
crops <- crops %>% 
  group_by(crop) %>% 
  mutate(residuals = residuals(lm(totalYield ~ rainfall + temperature, data = pick(everything()))), 
         rainCor = cor(rainfall, totalYield),
         tempCor = cor(temperature, totalYield)) %>% 
  ungroup()
glimpse(crops)

In [None]:
#### sub-setting the dataset by crops ####
unique(crops$crop)

# Cotton
cotton_yield <- subset(crops, crop == 'Cotton')
cotton_yield
# Rice
rice_yield <- subset(crops, crop == 'Rice')
rice_yield
# Barley
barley_yield <- subset(crops, crop == 'Barley')
barley_yield
# Soybean
soybean_yield <- subset(crops, crop == 'Soybean')
soybean_yield
# Wheat
wheat_yield <- subset(crops, crop == 'Wheat')
wheat_yield
# Maize
maize_yield <- subset(crops, crop == 'Maize')
maize_yield

---
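The preparation step stores each correlation as a column repeated on every row. A more compact view is one row per crop, which also makes it easy to see whether the relationship differs between crops. The sketch below assumes dplyr is installed and uses a synthetic data frame `demo` with invented coefficients in place of the filtered `crops` data; only the column names mirror the notebook.

```r
library(dplyr)

set.seed(42)
# Synthetic stand-in for the filtered `crops` data (all values are made up)
demo <- tibble(
  crop        = rep(c("Rice", "Wheat"), each = 50),
  rainfall    = runif(100, 100, 1000),
  temperature = runif(100, 15, 40)
) %>%
  mutate(totalYield = 0.005 * rainfall + 0.02 * temperature +
           rnorm(100, sd = 0.5))

# One row per crop; bare column names are evaluated within each group,
# whereas demo$rainfall would always refer to the full, ungrouped column
per_crop_cor <- demo %>%
  group_by(crop) %>%
  summarise(
    n       = n(),
    rainCor = cor(rainfall, totalYield),
    tempCor = cor(temperature, totalYield),
    .groups = "drop"
  )
print(per_crop_cor)
```

Including `n` alongside the correlations is a deliberate choice: after filtering to one region, soil type and weather condition, some crops may have few rows, and a correlation from a small group deserves less weight.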

**4 – Data Modeling**

---

In [None]:
# Multiple regression analysis
harvest_model <- lm(harvestingDays ~ temperature+rainfall , data=crops)
summary(harvest_model)
coef(harvest_model)

crops_model <- lm(totalYield ~ rainfall+temperature, data=crops)
summary(crops_model)
coef(crops_model)

In [None]:
# Correlations
cor(crops$rainfall, crops$totalYield) # Total yield is strongly affected by rainfall
cor(crops$temperature, crops$totalYield) # The effect of temperature on total yield is minimal

cor(crops$rainfall, crops$harvestingDays) # Harvesting days are not affected by rainfall
cor(crops$temperature, crops$harvestingDays) # Harvesting days are not affected by temperature

---
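A bare correlation coefficient or coefficient table does not say how uncertain the estimate is. One way to quantify that, sketched here with base R only, is to add confidence intervals via `confint()` and a significance test via `cor.test()`. The data frame `demo` and its coefficients are invented for illustration and are not taken from the Kaggle dataset.

```r
set.seed(1)
# Synthetic data; the coefficients below are assumptions for illustration
n <- 200
rainfall    <- runif(n, 100, 1000)
temperature <- runif(n, 15, 40)
totalYield  <- 1 + 0.005 * rainfall + 0.02 * temperature + rnorm(n, sd = 0.5)
demo <- data.frame(rainfall, temperature, totalYield)

fit <- lm(totalYield ~ rainfall + temperature, data = demo)

# 95% confidence intervals for the intercept and both slopes
ci <- confint(fit)
print(ci)

# Share of yield variance explained by the two predictors together
r2 <- summary(fit)$r.squared
print(r2)

# Pearson correlation with a p-value and a confidence interval
rain_test <- cor.test(demo$rainfall, demo$totalYield)
print(rain_test$estimate)
print(rain_test$conf.int)
```

An interval for a slope that excludes zero supports a statement such as "yield is associated with rainfall" more firmly than the point estimate alone.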

**5 – Predictive Analytics**

---

In [None]:
# Ranges
rainfall_range <- seq(min(crops$rainfall), max(crops$rainfall), length.out = 100)
temperature_range <- seq(min(crops$temperature), max(crops$temperature), length.out = 100)
prediction_grid <- expand.grid(rainfall = rainfall_range, temperature = temperature_range) # every combination of rainfall and temperature (100 x 100 rows)

In [None]:
# Model
predict_model <- lm(totalYield ~ rainfall + temperature, data = crops)

In [None]:
# Predictions
prediction_grid$predicted_yield <- predict(predict_model, newdata = prediction_grid)

In [None]:
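# Optimal ranges
# which.max() returns the row index of the highest predicted yield;
# indexing with max() would use the yield value itself as a row index.
optimal_ranges <- prediction_grid %>%
  summarise(optimal_rainfall = rainfall[which.max(predicted_yield)],
            optimal_temperature = temperature[which.max(predicted_yield)])
print(optimal_ranges)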

---
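The problem statement hypothesises optimal ranges beyond which yield stops increasing or declines. A model that is linear in rainfall and temperature cannot represent such a peak: its highest prediction always sits on an edge of the grid. One possible extension, not part of the original analysis, is to add squared terms. The sketch below uses base R and a synthetic response with a peak placed by assumption near 600 mm and 27 °C, then checks that the grid search recovers it.

```r
set.seed(7)
n <- 500
rainfall    <- runif(n, 100, 1000)
temperature <- runif(n, 15, 40)
# Invented response surface with a peak near 600 mm and 27 degrees C
totalYield <- 6 - ((rainfall - 600) / 300)^2 -
  ((temperature - 27) / 8)^2 + rnorm(n, sd = 0.3)
demo <- data.frame(rainfall, temperature, totalYield)

# Squared terms let the fitted surface curve, so a peak can lie inside the range
quad_fit <- lm(totalYield ~ rainfall + I(rainfall^2) +
                 temperature + I(temperature^2), data = demo)

grid <- expand.grid(
  rainfall    = seq(min(demo$rainfall), max(demo$rainfall), length.out = 100),
  temperature = seq(min(demo$temperature), max(demo$temperature), length.out = 100)
)
grid$predicted_yield <- predict(quad_fit, newdata = grid)

# which.max() gives the row index of the largest prediction
best <- grid[which.max(grid$predicted_yield), ]
print(best)
```

Whether the real data support curvature is an empirical question; comparing the linear and quadratic fits with `anova()` would be one way to check.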

**6 – Visualization**

---

In [None]:
# rainfall Vs. totalYield (for each crop)
ggplot(crops, aes(x = rainfall, y = totalYield, color = crop)) +
  labs(title = 'Rainfall vs. Yield for Each Crop', 
       x = 'Rainfall (mm)', 
       y = 'Total yield (Tons/Hectare)') +
  geom_point() +
  geom_smooth(method = 'lm', col='blue') +
  facet_wrap(~ crop)

In [None]:
# rainfall Vs. totalYield (Fertilizer)
ggplot(crops, aes(x = rainfall, y = totalYield, color = fertilizer)) +
  labs(title='Rainfall Vs. Total Yield (Effect of Fertilizer on Crops)',
       x='Rainfall (mm)', 
       y='Total yield (Tons/Hectare)') +
  geom_point()+
  geom_smooth(method='lm', col='brown') +
  facet_wrap(~ crop)

In [None]:
# rainfall Vs. totalYield (Irrigation)
ggplot(crops, aes(x = rainfall, y = totalYield, color = irrigation)) +
  labs(title='Rainfall Vs. Total Yield (Effect of Irrigation on Crops)',
       x='Rainfall (mm)', 
       y='Total yield (Tons/Hectare)') +
  geom_point()+
  geom_smooth(method='lm', col='brown') +
  facet_wrap(~ crop)

In [None]:
# rainfall Vs. residuals (for each crop)
ggplot(crops, aes(x = rainfall, y = residuals, color = crop)) +
  labs(title='Rainfall Vs. Residuals for Each Crop',
       x='Rainfall (mm)', 
       y='Residuals') +
  geom_point()+
  geom_smooth(method='lm', col='black') +
  facet_wrap(~ crop)

In [None]:
# Prediction of Rainfall on Total Yield
ggplot(prediction_grid, aes(x = rainfall, y = predicted_yield)) +
  labs(title='Prediction: Rainfall Vs. Predicted Yield',
       x='Rainfall (mm)', 
       y='Predicted Yield (Tons/Hectare)') +
  geom_point()+
  geom_smooth(method='lm', col='orange')

In [None]:
# Prediction of Temperature on Total Yield
ggplot(prediction_grid, aes(x = temperature, y = predicted_yield)) +
  labs(title='Prediction: Temperature Vs. Predicted Yield',
       x='Temperature (°C)', 
       y='Predicted Yield (Tons/Hectare)') +
  geom_point()+
  geom_smooth(method='lm', col='orange')

---
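The two prediction plots above each show one predictor at a time, so every rainfall value appears with all 100 temperature values stacked on top of each other. To show the combined effect in a single picture, a heatmap over the prediction grid is one option. This is a sketch assuming ggplot2 is installed; `demo`, its coefficients and the grid limits are invented stand-ins for the notebook's `crops` data and `prediction_grid`.

```r
library(ggplot2)

set.seed(3)
n <- 300
# Synthetic stand-in for the filtered crops data (values are made up)
demo <- data.frame(
  rainfall    = runif(n, 100, 1000),
  temperature = runif(n, 15, 40)
)
demo$totalYield <- 1 + 0.005 * demo$rainfall + 0.02 * demo$temperature +
  rnorm(n, sd = 0.5)

fit <- lm(totalYield ~ rainfall + temperature, data = demo)

# Evenly spaced grid, as geom_raster() expects
grid <- expand.grid(
  rainfall    = seq(100, 1000, length.out = 50),
  temperature = seq(15, 40, length.out = 50)
)
grid$predicted_yield <- predict(fit, newdata = grid)

# Fill colour encodes the predicted yield at each rainfall/temperature pair
p <- ggplot(grid, aes(x = rainfall, y = temperature, fill = predicted_yield)) +
  geom_raster() +
  labs(title = "Predicted Yield by Rainfall and Temperature",
       x = "Rainfall (mm)",
       y = "Temperature (°C)",
       fill = "Predicted yield\n(Tons/Hectare)")
print(p)
```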

**7 – Evaluation**

---

- Total yield is strongly affected by rainfall
- The effect of temperature on total yield is minimal
- Harvesting days are not affected by rainfall
- Harvesting days are not affected by temperature
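- Optimal Rainfall (as reported by the optimal-ranges step): 154.5653 mm
- Optimal Temperature (as reported by the optimal-ranges step): 15 °C
- Because the fitted model is linear in rainfall and temperature, its highest prediction lies on a boundary of the observed ranges, so these values should not be read as an interior optimum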