# Model 2: Random Forest (Class Model)

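This notebook trains a random forest regressor, using the `RandomForestModel` class from `src.models`, to capture non-linear relationships between the features and the target. Hyperparameters are tuned with 5-fold cross-validation, and the fitted model is then evaluated on the held-out test set and saved to disk.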

## 1. Imports and Setup

In [1]:
import sys
from pathlib import Path

import pandas as pd

# Add the project root (parent directory) to sys.path so that `src` is importable
project_root = Path('..').resolve()
sys.path.append(str(project_root))

from src.models import RandomForestModel

print("="*70)
print("MODEL 2: RANDOM FOREST (Class Model)")
print("="*70)

# Load the preprocessed train/test splits
data_dir = project_root / 'data' / 'processed'
X_train = pd.read_csv(data_dir / 'multi_X_train.csv')
y_train = pd.read_csv(data_dir / 'multi_y_train.csv').values.ravel()
X_test = pd.read_csv(data_dir / 'multi_X_test.csv')
y_test = pd.read_csv(data_dir / 'multi_y_test.csv').values.ravel()
feature_names = X_train.columns.tolist()
# Train
print("\nTraining Random Forest...")
model = RandomForestModel(random_state=42)
model.train(
    X_train.values,
    y_train,
    feature_names=feature_names,
    tune_hyperparams=True,
    cv_folds=5,
    verbose=1
)
# Evaluate
print("\nEvaluation:")
model.evaluate(X_test.values, y_test, dataset_name="Test")
# Save
model.save(data_dir / 'model_02_random_forest.pkl')

MODEL 2: RANDOM FOREST (Class Model)

Training Random Forest...
[Random Forest] Starting hyperparameter tuning with 5-fold CV...
Fitting 5 folds for each of 216 candidates, totalling 1080 fits


[Random Forest] Best parameters: {'max_depth': 30, 'max_features': 'log2', 'min_samples_leaf': 1, 'min_samples_split': 2, 'n_estimators': 200}
[Random Forest] Best CV R² score: 0.7719
[Random Forest] Training completed in 23.61s

Evaluation:

[Random Forest] Test Set Performance:
  R² Score:  0.8696
  RMSE:      0.1464
  MAE:       0.1016
  MSE:       0.0214
[Random Forest] Model saved to /Users/himanshishrivas/Documents/IntroMLCapstone/data/processed/model_02_random_forest.pkl
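
The training log reports 216 candidates and 1080 fits, which is what an exhaustive grid search with 5-fold cross-validation produces (216 × 5 = 1080). The internals of `RandomForestModel.train` are not shown in this notebook, so the following is only a minimal sketch of how such a search could be written with scikit-learn. The grid `PARAM_GRID` and the helper `tune_random_forest` are hypothetical: the grid was chosen so that it contains the reported best parameters and multiplies out to 216 combinations, and the real grid in `src.models` may differ.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, ParameterGrid

# ASSUMPTION: hypothetical grid. It contains the reported best parameters and
# yields 3 * 4 * 3 * 3 * 2 = 216 combinations, but the actual grid used by
# src.models.RandomForestModel is not shown in the notebook.
PARAM_GRID = {
    "n_estimators": [100, 200, 300],
    "max_depth": [10, 20, 30, None],
    "min_samples_split": [2, 5, 10],
    "min_samples_leaf": [1, 2, 4],
    "max_features": ["sqrt", "log2"],
}


def tune_random_forest(X, y, param_grid, cv_folds=5, random_state=42):
    """Hypothetical helper: exhaustive grid search scored by R²."""
    search = GridSearchCV(
        estimator=RandomForestRegressor(random_state=random_state),
        param_grid=param_grid,
        cv=cv_folds,
        scoring="r2",
        refit=True,  # refit the best configuration on the full training set
    )
    search.fit(X, y)
    return search


# Count the candidates without fitting anything.
n_candidates = len(ParameterGrid(PARAM_GRID))
print(n_candidates, "candidates x 5 folds =", n_candidates * 5, "fits")

# Small demo on synthetic data with a reduced grid so it runs in seconds.
X_demo, y_demo = make_regression(
    n_samples=120, n_features=6, noise=0.1, random_state=0
)
demo_grid = {"n_estimators": [10, 20], "max_depth": [3, None]}
demo_search = tune_random_forest(X_demo, y_demo, demo_grid, cv_folds=3)
print(demo_search.best_params_)
```

With `refit=True`, `demo_search.best_estimator_` is the model retrained on all of the data passed to `fit`, and `demo_search.best_score_` is the mean cross-validated R² of the winning configuration, the same kind of quantity the log reports as "Best CV R² score".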

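The four test-set metrics in the log are related: RMSE is the square root of MSE, and R² compares the residual sum of squares with the variance of the targets. The sketch below shows the standard definitions in plain Python. The function name `regression_metrics` and the sample values are illustrative assumptions, and `RandomForestModel.evaluate` may compute these numbers differently (for example via `sklearn.metrics`).

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE and R² for two equal-length sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)           # residual sum of squares
    mse = ss_res / n
    mae = sum(abs(e) for e in errors) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot                    # undefined if y_true is constant
    return {"mse": mse, "rmse": math.sqrt(mse), "mae": mae, "r2": r2}


# Made-up values for illustration only; not taken from the notebook's data.
metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
print(metrics)

# Consistency check on the reported test metrics: RMSE should equal sqrt(MSE)
# up to the rounding of the printed values.
reported_mse, reported_rmse = 0.0214, 0.1464
print(abs(math.sqrt(reported_mse) - reported_rmse) < 1e-3)
```

The final check confirms that the reported RMSE (0.1464) and MSE (0.0214) agree with each other to within rounding.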