# Yield Prediction using Random Forest Regressor

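This notebook trains a scikit-learn `RandomForestRegressor` to predict the `yield` column, evaluates it with 5-fold cross-validated mean absolute error (MAE), and writes a Kaggle submission file.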

## 1. Imports and Data Loading

In [15]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import cross_val_score

# Load data
train = pd.read_csv("train.csv")
test = pd.read_csv("test.csv")
sample_submission = pd.read_csv("sample_submission.csv")

## 2. Data Preprocessing

In [16]:
def preprocess_data(train, test):
    X = train.drop(columns=["id", "Row#", "yield"])
    y = train["yield"]
    test_ids = test["id"]
    X_test = test.drop(columns=["id", "Row#"])
    return X, y, X_test, test_ids

X, y, X_test, test_ids = preprocess_data(train, test)
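
Because `X` and `X_test` are built by dropping different column lists, it can be worth confirming that both end up with the same feature columns in the same order. The sketch below reuses the notebook's `preprocess_data` on two tiny made-up frames; the feature names `feat_a` and `feat_b` and all values are hypothetical stand-ins for the real CSV contents.

```python
import pandas as pd

# Tiny made-up frames; only "id", "Row#" and "yield" come from the notebook,
# the feature columns and values are illustrative assumptions.
train_demo = pd.DataFrame({
    "id": [0, 1, 2],
    "Row#": [0, 1, 2],
    "feat_a": [0.5, 0.7, 0.9],
    "feat_b": [10.0, 12.0, 11.0],
    "yield": [4000.0, 5200.0, 6100.0],
})
test_demo = pd.DataFrame({
    "id": [3, 4],
    "Row#": [3, 4],
    "feat_a": [0.6, 0.8],
    "feat_b": [11.5, 10.5],
})

def preprocess_data(train, test):
    # Same logic as the notebook cell: drop identifiers and the target.
    X = train.drop(columns=["id", "Row#", "yield"])
    y = train["yield"]
    test_ids = test["id"]
    X_test = test.drop(columns=["id", "Row#"])
    return X, y, X_test, test_ids

X_d, y_d, X_test_d, ids_d = preprocess_data(train_demo, test_demo)

# The model expects identical feature columns at fit and predict time.
same_columns = list(X_d.columns) == list(X_test_d.columns)
print("Feature columns match:", same_columns)
```

If the check fails on real data, the mismatch usually points to an extra or missing column in one of the two files.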

## 3. Model Training

Here, I split the training data into training (80%) and validation (20%) sets. This helps estimate how well the model performs before submitting predictions for the Kaggle test set.

In [17]:
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

rf = RandomForestRegressor(random_state=42)
rf.fit(X_train, y_train)
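
The hold-out split is only informative once the fitted model is scored on it. A minimal sketch of that step follows, using synthetic data in place of `train.csv`; the column names `f1` to `f3`, the target formula, and `n_estimators=50` are illustrative assumptions rather than the notebook's values.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the training data (hypothetical columns and target).
rng = np.random.default_rng(42)
X_demo = pd.DataFrame(rng.normal(size=(200, 3)), columns=["f1", "f2", "f3"])
y_demo = 3.0 * X_demo["f1"] - 2.0 * X_demo["f2"] + rng.normal(scale=0.1, size=200)

# Same 80/20 split settings as the notebook.
X_tr, X_va, y_tr, y_va = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

model = RandomForestRegressor(n_estimators=50, random_state=42)
model.fit(X_tr, y_tr)

# Score on the 20% of rows the model did not see during fitting.
val_mae = mean_absolute_error(y_va, model.predict(X_va))

# Baseline for comparison: always predict the training mean.
baseline_mae = mean_absolute_error(y_va, np.full(len(y_va), y_tr.mean()))

print("Validation MAE:", val_mae)
print("Mean-predictor MAE:", baseline_mae)
```

In the notebook itself the equivalent call would be `mean_absolute_error(y_val, rf.predict(X_val))`, which also puts the otherwise unused `mean_absolute_error` import to work.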

## 4. Evaluation

1. Perform 5-fold cross-validation on the full training data and compute the MAE for each fold.
2. Print the per-fold scores and their average.

In [18]:
mae_scores = -cross_val_score(rf, X, y, scoring='neg_mean_absolute_error', cv=5)
print("MAE scores for 5 folds:", mae_scores)
print("Average MAE:", np.mean(mae_scores))

MAE scores for 5 folds: [248.83124246 257.71970917 255.53638857 252.5840679  256.97197323]
Average MAE: 254.32867626812418
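
The leading minus sign in the `-cross_val_score(...)` call is there because scikit-learn scorers follow a "higher is better" convention, so `neg_mean_absolute_error` returns the MAE negated. The sketch below, on synthetic data with assumed settings (`n_estimators=20`, four random features), reproduces the per-fold values by hand to show what the scorer computes.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score

# Synthetic data; shapes and coefficients are illustrative assumptions.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 4))
y_demo = 2.0 * X_demo[:, 0] + rng.normal(scale=0.5, size=100)

model = RandomForestRegressor(n_estimators=20, random_state=0)

# Raw scores are negative MAE values; flip the sign to get the MAE.
raw_scores = cross_val_score(
    model, X_demo, y_demo, scoring="neg_mean_absolute_error", cv=5
)
mae_per_fold = -raw_scores

# Manual equivalent: for a regressor, cv=5 means an unshuffled 5-fold KFold.
manual_mae = []
for train_idx, val_idx in KFold(n_splits=5).split(X_demo):
    fold_model = RandomForestRegressor(n_estimators=20, random_state=0)
    fold_model.fit(X_demo[train_idx], y_demo[train_idx])
    preds = fold_model.predict(X_demo[val_idx])
    # MAE is the mean of the absolute prediction errors.
    manual_mae.append(np.mean(np.abs(y_demo[val_idx] - preds)))

print("cross_val_score MAE:", mae_per_fold)
print("manual MAE:         ", np.array(manual_mae))
```

`cross_val_score` fits a fresh clone of the estimator on each fold, so it does not change an already fitted model passed to it.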


## 5. Predictions on Test Set

In [19]:
test_preds = rf.predict(X_test)
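
At this point `rf` still holds the fit on the 80% training split, because cross-validation only fits clones. One optional variation, which is not what the notebook does, is to refit on all labelled rows before predicting. The sketch below shows the idea on synthetic data; the array shapes and `n_estimators=30` are assumptions for illustration.

```python
import numpy as np
from sklearn.base import clone
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-ins for the labelled data and the unlabelled test features.
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(150, 3))
y_demo = X_demo[:, 0] - X_demo[:, 1] + rng.normal(scale=0.2, size=150)
X_test_demo = rng.normal(size=(10, 3))

X_tr, X_va, y_tr, y_va = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# Model fitted on the 80% split, as in the notebook.
rf_split = RandomForestRegressor(n_estimators=30, random_state=42).fit(X_tr, y_tr)

# clone() copies hyperparameters only, not the fitted trees,
# so rf_full is trained from scratch on every labelled row.
rf_full = clone(rf_split).fit(X_demo, y_demo)

preds_split = rf_split.predict(X_test_demo)
preds_full = rf_full.predict(X_test_demo)
print(preds_split.shape, preds_full.shape)
```

Whether the extra 20% of rows helps depends on the data; the validation and cross-validation scores no longer describe the refitted model exactly, so this is a trade-off rather than a guaranteed improvement.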

## 6. Submission File

In [20]:
submission = pd.DataFrame({
    "id": test_ids,
    "yield": test_preds
})

submission.to_csv("random_forest_submission.csv", index=False)
submission.head()

|   | id | yield |
|---|---|---|
| 0 | 15000 | 5444.970951 |
| 1 | 15001 | 6313.401481 |
| 2 | 15002 | 5555.596612 |
| 3 | 15003 | 3084.404649 |
| 4 | 15004 | 3339.873876 |
