Project Title : Crop Yield Prediction for AgriBoost

Date : 28 June 2025

Description : A basic machine learning model that predicts crop yield for AgriBoost.

Input : CropYield.csv

In [1]:
# 1. Import necessary libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score
from google.colab import drive


In [2]:
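# 2. Load dataset
# Mount Google Drive so that the path under /content/drive resolves in Colab
drive.mount('/content/drive')
file_path = '/content/drive/MyDrive/Colab Notebooks/Crop Yield Prediction for AgriBoost India.csv'
df = pd.read_csv(file_path)
# One-hot encode the string columns; drop_first=True drops one category per column
df_encoded = pd.get_dummies(df, drop_first=True)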

In [3]:
print(df.head())

  crop_type soil_type  rainfall_mm  temperature_c  humidity_percent  soil_ph  \
0     Wheat     Silty        570.6           25.9              41.4     6.27   
1     Maize     Loamy        710.1           29.7              67.9     6.89   
2    Cotton     Sandy        996.8           21.6              68.6     7.67   
3    Cotton     Loamy        890.0           33.0              78.5     7.94   
4     Wheat     Silty        994.4           24.1              59.9     7.04   

   fertilizer_used_kg_per_hectare  pesticide_used_kg_per_hectare  \
0                           108.4                           1.64   
1                           140.1                           1.88   
2                           128.3                           1.27   
3                           163.3                           1.89   
4                           200.1                           1.78   

  irrigation_type  seed_quality_index  satellite_ndvi_index  \
0       Sprinkler                7.28          

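Issue #1 : Encoding the string-typed columns (crop_type, soil_type, irrigation_type) as numbers, which is done with pd.get_dummies when the dataset is loaded.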
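
As a minimal sketch of what one-hot encoding does, the snippet below applies `pd.get_dummies` to a toy frame. Only two columns are used, with values copied from the first rows printed above; the name `toy` is introduced here purely for illustration.

```python
import pandas as pd

# Toy frame: two columns only, values copied from the first rows of df.head().
toy = pd.DataFrame({
    "crop_type": ["Wheat", "Maize", "Cotton"],
    "rainfall_mm": [570.6, 710.1, 996.8],
})

# Each category becomes its own indicator column. With drop_first=True the
# first category in sorted order ("Cotton" here) is dropped and acts as the
# baseline, which avoids a redundant column.
toy_encoded = pd.get_dummies(toy, drop_first=True)
print(list(toy_encoded.columns))
```

Numeric columns pass through unchanged, and a row with zeros in every `crop_type_*` column represents the dropped baseline category.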

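Issue #2 : Selecting the right model.
Selected model : Multiple linear regression


We use linear regression when:

The target (dependent variable) is a continuous value, here predicted_yield_quintal_per_hectare.

We assume an approximately linear relationship between the features and the target.

With a single feature this is simple linear regression. This notebook uses several features (rainfall, temperature, soil type, and so on), so the model is a multiple linear regression.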
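
The linearity assumption can be illustrated with a small sketch on synthetic data. Because the feature matrix in this notebook has several columns, the sketch fits several features at once. All names and numbers here (`X_toy`, `true_coef`, the intercept 4.0) are made up for illustration and are unrelated to the crop dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic, noise-free data: y is an exact linear function of three features.
rng = np.random.default_rng(0)
X_toy = rng.uniform(0, 10, size=(50, 3))
true_coef = np.array([2.0, -1.0, 0.5])
y_toy = X_toy @ true_coef + 4.0

# Ordinary least squares should recover the coefficients and the intercept.
toy_model = LinearRegression().fit(X_toy, y_toy)
print(toy_model.coef_, toy_model.intercept_)
```

On real data the relationship is only approximately linear, so the fitted coefficients are estimates and the residuals are not zero.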



In [4]:
# 3. Split into X and y
X = df_encoded.drop(columns='predicted_yield_quintal_per_hectare')
y = df['predicted_yield_quintal_per_hectare']

# 4. Split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [5]:

# 5. Create Linear Regression model
model = LinearRegression()


In [6]:
# 6. Train the model
model.fit(X_train, y_train)

In [7]:

# 7. Predict using the model

y_pred = model.predict(X_test)

# 8. Evaluate the model
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print("Mean Squared Error:", mse)
print("R² Score:", r2)



Mean Squared Error: 4.521764301560941
R² Score: 0.8644280166054816
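
To make the two metrics above concrete, the following sketch recomputes them by hand and compares the results with scikit-learn. The arrays `y_true` and `y_hat` are made-up numbers, not values from the dataset. The last lines take the square root of the MSE printed above to express the error in the unit of the target.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Made-up values for illustration only; they are not taken from the dataset.
y_true = np.array([20.0, 25.0, 30.0, 35.0])
y_hat = np.array([22.0, 24.0, 29.0, 36.0])

# MSE: mean of the squared residuals.
mse_manual = np.mean((y_true - y_hat) ** 2)

# R²: 1 - (residual sum of squares / total sum of squares around the mean).
ss_res = np.sum((y_true - y_hat) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

# Both values should agree with scikit-learn's implementations.
print(mse_manual, mean_squared_error(y_true, y_hat))
print(r2_manual, r2_score(y_true, y_hat))

# RMSE is in the same unit as the target (quintal per hectare).
reported_rmse = np.sqrt(4.521764301560941)  # MSE printed by the notebook
print(reported_rmse)
```

For the MSE reported above, the square root is roughly 2.13, so a typical test error is around 2 quintal per hectare.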


In [8]:
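# Example input (illustrative values). Some of them, e.g. rainfall_mm=150.0 and
# pesticide_used_kg_per_hectare=5.0, lie outside the range of the sample rows
# printed by df.head(), so the prediction may be an extrapolation.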
new_input = pd.DataFrame([{
    'crop_type': 'Wheat',
    'soil_type': 'Loamy',
    'rainfall_mm': 150.0,
    'temperature_c': 25.0,
    'humidity_percent': 60.0,
    'soil_ph': 6.5,
    'fertilizer_used_kg_per_hectare': 100.0,
    'pesticide_used_kg_per_hectare': 5.0,
    'irrigation_type': 'Drip',
    'seed_quality_index': 0.85,
    'satellite_ndvi_index': 0.72
}])

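# Encode the new row, then align it to the training columns: dummy columns
# that this single row does not produce are filled with 0, extras are dropped.
new_input_encoded = pd.get_dummies(new_input)
new_input_encoded = new_input_encoded.reindex(columns=X.columns, fill_value=0)
predicted_yield = model.predict(new_input_encoded)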

print("Predicted Yield:", predicted_yield[0])


Predicted Yield: 24.60423985207659
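
The encode-then-reindex step used for the new input can be sketched in isolation. The list `train_columns` below is a hypothetical stand-in for `X.columns`, and the row values are made up; the real training columns depend on the categories present in the dataset.

```python
import pandas as pd

# Hypothetical training columns, as they might look after
# pd.get_dummies(..., drop_first=True) with "Cotton" as the dropped baseline.
train_columns = ["rainfall_mm", "crop_type_Maize", "crop_type_Wheat"]

# A single new row; get_dummies only creates a column for the category it sees.
new_row = pd.DataFrame([{"crop_type": "Cotton", "rainfall_mm": 800.0}])
encoded = pd.get_dummies(new_row)  # columns: rainfall_mm, crop_type_Cotton

# Align to the training layout: missing columns are added with 0 and columns
# that were not used in training (crop_type_Cotton) are dropped.
aligned = encoded.reindex(columns=train_columns, fill_value=0)
print(aligned)
```

One caveat of this approach: a category that never appeared in training also ends up as all zeros, so the model silently treats it as the baseline category.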


## 📌 Conclusion
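- The multiple linear regression model explains about 86% of the variance in yield on the held-out test set (R² ≈ 0.864, MSE ≈ 4.52), using all available features rather than rainfall alone.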
- This can be used for yield estimation and decision-making in agriculture.
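- Temperature, soil type, and the other columns are already used as features, so further improvement would more likely come from feature engineering or from comparing against non-linear models.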

