# Model 3: XGBoost (Class)

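This notebook trains and evaluates XGBoost (Extreme Gradient Boosting), a gradient-boosted ensemble of decision trees, on the `multi_*` train/test splits in `data/processed`. Hyperparameters are selected with a 5-fold cross-validated search over 108 candidate settings. The fitted model is then scored on the held-out test set with R², RMSE, MAE and MSE.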
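XGBoost builds its trees sequentially. Each new tree is fit to the errors of the ensemble so far, and its output is scaled by `learning_rate` before being added. The sketch below illustrates that core boosting loop in plain Python for squared-error regression on a single feature, using depth-1 trees (stumps). It is a simplified stand-in, not XGBoost's actual algorithm. XGBoost additionally uses second-order gradient statistics, regularized split gains, deeper trees (`max_depth`) and row/column subsampling (`subsample`, `colsample_bytree`). All function names here are hypothetical. The defaults `n_estimators=200` and `learning_rate=0.3` simply mirror the best parameters found in this notebook.

```python
from typing import List, Tuple

# A stump is (threshold, left_value, right_value): predict left_value when
# x <= threshold, otherwise right_value. (Hypothetical helper type.)
Stump = Tuple[float, float, float]


def fit_stump(x: List[float], r: List[float]) -> Stump:
    """Return the single split on x that minimizes squared error against r."""
    mean_r = sum(r) / len(r)
    best: Stump = (x[0], mean_r, mean_r)  # fallback: no useful split
    best_sse = sum((ri - mean_r) ** 2 for ri in r)
    for t in sorted(set(x))[:-1]:  # the largest value would leave the right side empty
        left = [ri for xi, ri in zip(x, r) if xi <= t]
        right = [ri for xi, ri in zip(x, r) if xi > t]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        sse = sum((ri - lv) ** 2 for ri in left) + sum((ri - rv) ** 2 for ri in right)
        if sse < best_sse:
            best, best_sse = (t, lv, rv), sse
    return best


def predict_stump(s: Stump, xi: float) -> float:
    t, lv, rv = s
    return lv if xi <= t else rv


def gradient_boost(
    x: List[float], y: List[float], n_estimators: int = 200, learning_rate: float = 0.3
) -> Tuple[float, float, List[Stump]]:
    base = sum(y) / len(y)  # initial prediction: the mean target
    pred = [base] * len(y)
    stumps: List[Stump] = []
    for _ in range(n_estimators):
        # For squared-error loss, the negative gradient is the residual y - pred.
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        s = fit_stump(x, residuals)
        stumps.append(s)
        # Shrink each tree's contribution by the learning rate before adding it.
        pred = [pi + learning_rate * predict_stump(s, xi) for pi, xi in zip(pred, x)]
    return base, learning_rate, stumps


def predict(model: Tuple[float, float, List[Stump]], xi: float) -> float:
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, xi) for s in stumps)


# Usage on a toy 1-D dataset with a step between x = 3 and x = 4
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.0, 1.2, 0.9, 3.0, 3.1, 2.9]
model = gradient_boost(x, y, n_estimators=50, learning_rate=0.3)
preds = [predict(model, xi) for xi in x]
```

A smaller `learning_rate` makes each tree's correction more conservative, so more trees (`n_estimators`) are usually needed. This trade-off is why the two parameters are tuned together.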

## 1. Setup, Training, and Evaluation

```python
import sys
from pathlib import Path

import pandas as pd

# Add the project root (parent directory) to sys.path so `src` is importable
sys.path.append(str(Path('..').resolve()))
from src.models import XGBoostModel

print("=" * 70)
print("MODEL 3: XGBOOST (Class Model)")
print("=" * 70)

# Load the preprocessed train/test splits; .values.ravel() flattens each
# single-column target DataFrame into a 1-D array
data_dir = Path('..').resolve() / 'data' / 'processed'
X_train = pd.read_csv(data_dir / 'multi_X_train.csv')
y_train = pd.read_csv(data_dir / 'multi_y_train.csv').values.ravel()
X_test = pd.read_csv(data_dir / 'multi_X_test.csv')
y_test = pd.read_csv(data_dir / 'multi_y_test.csv').values.ravel()
feature_names = X_train.columns.tolist()

# Train, with 5-fold cross-validated hyperparameter tuning
print("\nTraining XGBoost...")
model = XGBoostModel(random_state=42)
model.train(
    X_train.values,
    y_train,
    feature_names=feature_names,
    tune_hyperparams=True,
    cv_folds=5,
    verbose=1,
)

# Evaluate on the held-out test set
print("\nEvaluation:")
model.evaluate(X_test.values, y_test, dataset_name="Test")

# Save the fitted model
model.save(data_dir / 'model_03_xgboost.pkl')
```

```text
======================================================================
MODEL 3: XGBOOST (Class Model)
======================================================================

Training XGBoost...
[XGBoost] Starting hyperparameter tuning with 5-fold CV...
Fitting 5 folds for each of 108 candidates, totalling 540 fits

[XGBoost] Best parameters: {'colsample_bytree': 0.8, 'learning_rate': 0.3, 'max_depth': 3, 'n_estimators': 200, 'subsample': 1.0}
[XGBoost] Best CV R² score: 0.8022
[XGBoost] Training completed in 16.01s

Evaluation:

[XGBoost] Test Set Performance:
  R² Score:  0.9021
  RMSE:      0.1269
  MAE:       0.0763
  MSE:       0.0161
[XGBoost] Model saved to /Users/himanshishrivas/Documents/IntroMLCapstone/data/processed/model_03_xgboost.pkl
```
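The evaluation log reports four regression metrics. The sketch below assumes that `XGBoostModel.evaluate` uses the standard definitions, as in `sklearn.metrics`. The `src.models` code is not shown here, so this is an assumption. The helper `regression_metrics` is hypothetical. The reported numbers are internally consistent: RMSE is the square root of MSE, and √0.0161 ≈ 0.1269.

```python
import math
from typing import Dict, Sequence


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """Standard regression metrics (hypothetical helper, sklearn-style definitions)."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n          # mean squared error
    mae = sum(abs(e) for e in errors) / n         # mean absolute error
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R² is undefined when y_true is constant")
    r2 = 1.0 - (mse * n) / ss_tot                 # 1 - SS_res / SS_tot
    return {"R2": r2, "RMSE": math.sqrt(mse), "MAE": mae, "MSE": mse}


m = regression_metrics([3.0, -0.5, 2.0, 7.0], [2.5, 0.0, 2.0, 8.0])
print(m["MSE"], m["MAE"], round(m["R2"], 4))  # 0.375 0.5 0.9486

# Cross-check the logged test metrics: RMSE should equal sqrt(MSE)
print(round(math.sqrt(0.0161), 4))  # 0.1269
```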
