## 🧠 Step 3: Model Training – Teach BrewBuddy to be an Economist! (Pure Creation!)

**Your Mission:** Use all that clean, numeric data to train a model that **predicts coffee prices**, your AI Economist in action!

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split

In [None]:
df = pd.read_csv("Data/cleaned_CoffeeData.csv")

### Define X (inputs) and y (what to predict)

In [None]:
X = df.drop(columns=['Actual_Price_INR'])
y = df['Actual_Price_INR']

## ✂️ Split the Data

### Why: Give BrewBuddy “study material” (train) and “secret exam questions” (test).

In [None]:
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

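# Swap the raw Sale_Date for a numeric Month feature.
# .assign() and .drop() return new DataFrames, which avoids the
# SettingWithCopyWarning that in-place edits on split data can trigger.
X_train = X_train.assign(
    Month=pd.to_datetime(X_train['Sale_Date']).dt.month
).drop(columns='Sale_Date')
X_test = X_test.assign(
    Month=pd.to_datetime(X_test['Sale_Date']).dt.month
).drop(columns='Sale_Date')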
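
To see what the month extraction does on its own, here is a minimal sketch on three made-up dates. The ISO date format is an assumption; the real `Sale_Date` column may be formatted differently.

```python
import pandas as pd

# Made-up dates, purely illustrative (not rows from the coffee dataset).
toy_sales = pd.DataFrame({"Sale_Date": ["2023-01-15", "2023-07-04", "2023-12-31"]})

# Parse the strings as dates, keep only the month number, drop the raw column.
toy_sales = toy_sales.assign(
    Month=pd.to_datetime(toy_sales["Sale_Date"]).dt.month
).drop(columns="Sale_Date")

print(toy_sales["Month"].tolist())  # [1, 7, 12]
```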

### Check the shapes

In [None]:
print(X_train.shape, X_test.shape)
print(y_train.shape, y_test.shape)
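
As a quick sanity check on what `test_size=0.2` means, here is a minimal sketch on a made-up 10-row table (the column names and values are illustrative, not taken from the coffee data). Eight rows should land in the "study material" and two in the "secret exam".

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy data, purely illustrative (not the real coffee dataset).
toy = pd.DataFrame({
    "feature": range(10),
    "price": [100 + 5 * i for i in range(10)],
})
toy_X = toy.drop(columns=["price"])
toy_y = toy["price"]

toy_X_train, toy_X_test, toy_y_train, toy_y_test = train_test_split(
    toy_X, toy_y, test_size=0.2, random_state=42
)

print(toy_X_train.shape, toy_X_test.shape)  # (8, 1) (2, 1)
```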

## 🤖 Choose Your Brain

### Model: RandomForestRegressor (an “expert committee” of decision trees)

In [None]:
import joblib

# Load the full pipeline saved earlier; the grid search below tunes this object.
pipeline = joblib.load('coffee_price_predictor_pipeline.pkl')

In [None]:
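# A standalone forest, shown for reference only: the grid search below
# tunes the regressor inside `pipeline`, not this object.
model = RandomForestRegressor(
    n_estimators=100,     # number of trees
    random_state=42
)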
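
The "expert committee" idea can be checked directly: a fitted `RandomForestRegressor` predicts the average of its individual trees' predictions. The sketch below uses synthetic data (an assumption for illustration, not the coffee dataset) and compares the two.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic data, purely illustrative (not the coffee dataset).
rng = np.random.default_rng(0)
toy_X = rng.uniform(0, 10, size=(200, 3))
toy_y = 50 + 4 * toy_X[:, 0] - 2 * toy_X[:, 1] + rng.normal(0, 1, size=200)

forest = RandomForestRegressor(n_estimators=25, random_state=42)
forest.fit(toy_X, toy_y)

# Ask every tree in the committee for its own opinion on five rows...
per_tree = np.stack([tree.predict(toy_X[:5]) for tree in forest.estimators_])

# ...then compare the committee average with the forest's own answer.
committee_average = per_tree.mean(axis=0)
forest_prediction = forest.predict(toy_X[:5])
print(np.allclose(committee_average, forest_prediction))
```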

In [None]:
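# Keys follow scikit-learn's <step>__<param> naming, so they must match
# the step names inside the loaded pipeline.
param_grid = {
    'model__regressor__n_estimators': [50, 100, 200],
    'model__regressor__max_depth': [None, 5, 10],
    'model__regressor__min_samples_leaf': [1, 2, 5]
}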
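
Grid keys only work if they match the names the pipeline actually exposes. The structure of the saved pipeline is not shown here, so the sketch below builds a hypothetical stand-in: a single step named `model` that wraps a forest in `TransformedTargetRegressor`. That structure is an assumption, chosen only so the names resemble the ones in the grid.

```python
from sklearn.compose import TransformedTargetRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import Pipeline

# Hypothetical stand-in; the real pipeline's steps may differ.
demo_pipeline = Pipeline([
    ("model", TransformedTargetRegressor(
        regressor=RandomForestRegressor(random_state=42)
    )),
])

# get_params() lists every name that GridSearchCV accepts as a grid key.
tunable = sorted(
    name for name in demo_pipeline.get_params()
    if name.startswith("model__regressor__")
)
print(tunable)
```

Running `pipeline.get_params().keys()` on the loaded pipeline is the way to confirm the real names before launching a long search.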



In [None]:
grid = GridSearchCV(
    pipeline,  # ✅ full pipeline!
    param_grid=param_grid,
    scoring='neg_mean_absolute_error',
    cv=5,
    n_jobs=-1
)

In [None]:
grid.fit(X_train, y_train)
print("Best params:", grid.best_params_)
print("Best CV MAE:", -grid.best_score_)


## 🚀 Train the Brain

In [None]:
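# GridSearchCV (refit=True by default) has already retrained the best
# configuration on all of X_train, so no extra .fit() call is needed.
best_model = grid.best_estimator_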
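
Two details of `GridSearchCV` are easy to miss: the score is a negated MAE, and with the default `refit=True` the winning configuration is retrained on the whole training set automatically. Here is a minimal sketch on synthetic data, with a deliberately tiny grid (both are assumptions for illustration).

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic data, purely illustrative (not the coffee dataset).
rng = np.random.default_rng(0)
toy_X = rng.uniform(0, 10, size=(120, 2))
toy_y = 30 + 3 * toy_X[:, 0] + rng.normal(0, 1, size=120)

toy_grid = GridSearchCV(
    RandomForestRegressor(random_state=42),
    param_grid={"n_estimators": [10, 20], "max_depth": [None, 3]},
    scoring="neg_mean_absolute_error",
    cv=3,
)
toy_grid.fit(toy_X, toy_y)

# The score is a negated MAE, so flip the sign to read it as an error.
toy_cv_mae = -toy_grid.best_score_

# best_estimator_ was already refit on all of toy_X, so it can predict now.
toy_predictions = toy_grid.best_estimator_.predict(toy_X[:3])
print(toy_grid.best_params_, round(toy_cv_mae, 2), toy_predictions.shape)
```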

## 📏 Evaluate – Check the “Mistake Score”

### Make predictions

In [None]:
y_pred = grid.predict(X_test)

In [None]:
residuals = y_test - y_pred
plt.scatter(y_test, residuals, alpha=0.6)
y_min = y_test.min()
y_max = y_test.max()
plt.hlines(
    y=0,
    xmin=y_min,
    xmax=y_max,
    linestyles='dashed',
    colors='red'
)

plt.xlabel("True Price (₹)")
plt.ylabel("Residual (True − Predicted)")
plt.title("Residuals vs. True Price")
plt.show()


### Compute Cross-Validated Mean Absolute Error (MAE)

In [None]:
# Build the same Month feature on the full dataset.
X_processed = X.assign(
    Month=pd.to_datetime(X['Sale_Date']).dt.month
).drop(columns='Sale_Date')

# 👉 Score the whole pipeline, not just the bare model, so any preprocessing
# it contains is re-fit inside every fold. cross_val_score clones and refits
# `pipeline`, so this measures its saved settings, not the tuned `best_model`.
scores = cross_val_score(
    pipeline,
    X_processed,
    y,
    cv=5,
    scoring='neg_mean_absolute_error'
)
print("CV MAE:", -scores.mean(), "±", scores.std())


### Interpret:

MAE is the average amount your model’s price guess is off by, in the same unit as `Actual_Price_INR` (rupees, ₹).

A lower MAE = a smarter BrewBuddy!
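
To make the "mistake score" concrete, here is a small worked example with three made-up prices (illustrative numbers, not model output). The misses are ₹10, ₹10 and ₹5, so the MAE is 25 / 3, roughly ₹8.33.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

# Made-up prices in rupees, purely illustrative.
true_prices = np.array([100.0, 150.0, 200.0])
predicted_prices = np.array([110.0, 140.0, 205.0])

# MAE by hand: the average of the absolute misses (10, 10 and 5).
mae_by_hand = np.mean(np.abs(true_prices - predicted_prices))

# The same number from scikit-learn.
mae_sklearn = mean_absolute_error(true_prices, predicted_prices)

print(round(mae_by_hand, 2), round(mae_sklearn, 2))  # 8.33 8.33
```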