## Exercises

We are provided with the `Crop_yield` dataset, which contains various factors that could influence the yield of a particular crop across different regions.

We are interested in how some features, namely `Temperature`, `Rainfall`, `Fertilizer_Usage`, and `Pesticide_Usage`, influence the yield of the crop.

### Import libraries and dataset

In [1]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
import pickle

In [2]:
# Load dataset
df = pd.read_csv("https://raw.githubusercontent.com/Explore-AI/Public-Data/master/Data/Python/Crop_yield.csv")
df.head(5)

| Unnamed: 0 | Region | Temperature | Rainfall | Soil_Type | Fertilizer_Usage | Pesticide_Usage | Irrigation | Crop_Variety | Yield |
|---|---|---|---|---|---|---|---|---|---|
| 0 | East | 23.152156 | 803.362573 | Clayey | 204.792011 | 20.76759 | 1 | Variety B | 40.316318 |
| 1 | West | 19.382419 | 571.56767 | Sandy | 256.201737 | 49.290242 | 0 | Variety A | 26.846639 |
| 2 | North | 27.89589 | -8.699637 | Loamy | 222.202626 | 25.316121 | 0 | Variety C | -0.323558 |
| 3 | East | 26.741361 | 897.426194 | Loamy | 187.98409 | 17.115362 | 0 | Variety C | 45.440871 |
| 4 | East | 19.090286 | 649.384694 | Loamy | 110.459549 | 24.068804 | 1 | Variety B | 35.478118 |


### Exercise 1

We begin by training and evaluating a multiple linear regression model that maps the features `Temperature`, `Rainfall`, `Fertilizer_Usage`, and `Pesticide_Usage` to the response variable, `Yield`. This model will let us predict crop yields from the given factors.

In [5]:
# Select the features and the response variable
x = df[['Temperature', 'Rainfall', 'Fertilizer_Usage', 'Pesticide_Usage']]
y = df['Yield']

# Feature scaling (e.g. MinMaxScaler) is omitted: it is not needed for linear regression

# Split into training (80%) and test (20%) data
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)

# Create and train the model
model = LinearRegression()
model.fit(X_train, y_train)

# Make predictions on the test set
y_pred = model.predict(X_test)

# Evaluate model performance
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

# Print results
print("Intercept:", model.intercept_)
print("Coefficients:", model.coef_)
print("Mean Squared Error (MSE):", round(mse, 2))
print("R-squared (R²):", round(r2, 3))

Intercept: 2.1715060145405367
Coefficients: [ 0.10202899  0.05017984 -0.02046369  0.00054639]
Mean Squared Error (MSE): 0.33
R-squared (R²): 0.996
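
Because MSE is expressed in squared units of `Yield`, it can be easier to read as a root mean squared error (RMSE). The short sketch below is an addition to the exercise, not part of the original solution. It takes the rounded MSE printed above as its input, so the result is approximate.

```python
import math

# Test-set MSE as printed above (already rounded to two decimals).
mse = 0.33

# RMSE is the square root of MSE, so it is in the same units as Yield.
rmse = math.sqrt(mse)
print(f"RMSE: {rmse:.2f}")  # prints "RMSE: 0.57"
```

In the notebook itself, the same value could be obtained from the unrounded `mse` variable rather than the hard-coded figure used here.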


### Exercise 2

We want to be able to conveniently retrieve and use the model for predictions in future. We therefore need to persist the trained model by saving it to a file.

Use the `pickle` library to serialise the model and store it in a file named `crop_yield_model.pkl`.

In [8]:
# Define the path where the model will be saved
model_save_path = 'crop_yield_model.pkl'

# Save (serialise) the model
with open(model_save_path, 'wb') as file:
    pickle.dump(model, file)
print("Model saved using pickle!")

Model saved using pickle!
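
A simple way to gain confidence in a pickle file is to load it straight back and compare it with the original object. The sketch below illustrates that round trip on its own, under two assumptions made so that it runs without the dataset: a plain dictionary holding the printed intercept and coefficients stands in for the fitted model, and a hypothetical file name in a temporary directory is used instead of `crop_yield_model.pkl`.

```python
import os
import pickle
import tempfile

# Stand-in for the fitted model (assumption): the printed parameters in a dict.
params = {
    "intercept": 2.1715060145405367,
    "coefficients": [0.10202899, 0.05017984, -0.02046369, 0.00054639],
}

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical file name, used only for this check.
    path = os.path.join(tmp_dir, "round_trip_check.pkl")

    # Serialise the object to disk.
    with open(path, "wb") as file:
        pickle.dump(params, file)

    # Deserialise it again from the same file.
    with open(path, "rb") as file:
        restored = pickle.load(file)

# The restored object is a new copy with the same contents.
round_trip_ok = restored == params
print("Round trip preserved the object:", round_trip_ok)
```

With the real model, a comparable check would be to confirm that the loaded model's `coef_` and `intercept_` match those of `model`.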


### Exercise 3

We have received a new set of conditions for which we need to predict the crop yield. To accomplish this, we'll utilise the model we previously trained and saved.

New conditions:
- Temperature – `25`
- Rainfall – `150`
- Fertilizer usage – `200`
- Pesticide usage – `30`

Prepare the new data, load the saved model, and use it to make predictions on the given feature values.

In [11]:
# A dictionary to store the new set of conditions
new_conditions = {
    'Temperature': [25],  # average temperature in °C
    'Rainfall': [150],  # total rainfall in mm
    'Fertilizer_Usage': [200],  # fertilizer used in kg per hectare
    'Pesticide_Usage': [30]  # pesticide used in litres per hectare
}

# Convert to DataFrame
new_conditions_df = pd.DataFrame(new_conditions)

# Path to the saved model
model_load_path = 'crop_yield_model.pkl'

# Load (deserialise) the model
with open(model_load_path, 'rb') as file:
    loaded_model = pickle.load(file)

# Predict the yield for the new conditions with the loaded model
y_pred = loaded_model.predict(new_conditions_df)

print(f"Predicted Yield for the new conditions: {y_pred[0]} tonnes per hectare")

Predicted Yield for the new conditions: 8.172860333818216 tonnes per hectare
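
As a sanity check, the prediction can be reproduced by hand, since a linear regression prediction is the intercept plus the sum of each coefficient multiplied by its feature value. The sketch below uses the intercept and coefficients printed in Exercise 1. The coefficients there are rounded to eight decimals, so the result agrees with the model's output only to about six decimal places.

```python
# Intercept and coefficients as printed in Exercise 1 (coefficients are rounded).
intercept = 2.1715060145405367
coefficients = [0.10202899, 0.05017984, -0.02046369, 0.00054639]

# New conditions, in the same feature order used for training:
# Temperature, Rainfall, Fertilizer_Usage, Pesticide_Usage
new_values = [25, 150, 200, 30]

# Prediction = intercept + sum of (coefficient * feature value)
manual_prediction = intercept + sum(c * v for c, v in zip(coefficients, new_values))
print(round(manual_prediction, 4))  # prints 8.1729
```

Most of the predicted value comes from the `Rainfall` term (0.05017984 × 150 ≈ 7.53), which is partly offset by the negative `Fertilizer_Usage` term (−0.02046369 × 200 ≈ −4.09).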
