Model Development, Tuning, and Ensemble Methods

Notebook Purpose
Build, tune, and combine machine learning models to predict Titanic survival.

Input
- `train_features.csv` - Training data with engineered features
- `test_features.csv` - Test data with engineered features  
- `test_passenger_ids.csv` - Passenger IDs for submission

Output
- Trained model files (`.pkl`)
- Model comparison metrics and visualizations
- Kaggle submission file


In [1]:
# Initial Imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')

from sklearn.model_selection import (
    train_test_split, cross_val_score, StratifiedKFold,
    GridSearchCV, RandomizedSearchCV
)
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import (
    RandomForestClassifier, GradientBoostingClassifier,
    VotingClassifier, StackingClassifier
)
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import (
    accuracy_score, precision_score, recall_score, f1_score,
    confusion_matrix, roc_auc_score, roc_curve
)
from xgboost import XGBClassifier
import joblib

In [23]:
# Set up visualization options
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette('husl')

In [24]:
# Set random seed for reproducibility
RANDOM_STATE = 42
np.random.seed(RANDOM_STATE)

Load Feature-Engineered Data

Load the outputs of the previous feature engineering step.

In [25]:
# Load the feature-engineered datasets
train_df = pd.read_csv('../data/processed/train_features.csv')
test_df = pd.read_csv('../data/processed/test_features.csv')
test_ids = pd.read_csv('../data/processed/test_passenger_ids.csv')

print(f"Training set: {train_df.shape}")
print(f"Test set: {test_df.shape}")
print(f"\nTraining columns: {train_df.columns.tolist()}")

Training set: (891, 35)
Test set: (418, 34)

Training columns: ['Survived', 'Pclass', 'Age', 'SibSp', 'Parch', 'Fare', 'Sex_encoded', 'Embarked_C', 'Embarked_Q', 'Embarked_S', 'Deck_A', 'Deck_B', 'Deck_C', 'Deck_D', 'Deck_E', 'Deck_F', 'Deck_G', 'Deck_T', 'Deck_Unknown', 'FamilySize', 'IsAlone', 'Title_Master', 'Title_Miss', 'Title_Mr', 'Title_Mrs', 'Title_Rare', 'Age_Child', 'Age_Teenager', 'Age_Young_Adult', 'Age_Adult', 'Age_Senior', 'Fare_Low', 'Fare_Medium', 'Fare_High', 'Fare_Very_High']


Prepare Features and Target

Why Separate X and y?
Machine learning models expect:
- **X** (features): The input variables used to make predictions
- **y** (target): The outcome we're trying to predict (Survived)

In [26]:
# Separate features and target
X = train_df.drop('Survived', axis=1)
y = train_df['Survived']

# Test set (no target - that's what we predict)
X_test_final = test_df.copy()

print(f"Features shape: {X.shape}")
print(f"Target shape: {y.shape}")
print(f"Target distribution:\n{y.value_counts(normalize=True)}")

Features shape: (891, 34)
Target shape: (891,)
Target distribution:
Survived
0    0.616162
1    0.383838
Name: proportion, dtype: float64


Train-Validation Split

Why Split the Training Data?
We need to evaluate our model on data it hasn't seen during training. This gives us 
an honest estimate of how well it will perform on the actual test set.

- **Training set (80%)**: Used to train the model
- **Validation set (20%)**: Used to evaluate and compare models

Stratified Splitting
We use stratified splitting to keep the same proportion of survivors in both 
the train and validation sets. This matters because the classes are moderately imbalanced (about 38% survived).

In [27]:
# Split into training and validation sets
X_train, X_val, y_train, y_val = train_test_split(
    X, y, 
    test_size=0.2, 
    random_state=RANDOM_STATE,
    stratify=y  # Maintain class proportions
)

print(f"Training set: {X_train.shape}")
print(f"Validation set: {X_val.shape}")
print(f"\nTraining target distribution:\n{y_train.value_counts(normalize=True)}")
print(f"\nValidation target distribution:\n{y_val.value_counts(normalize=True)}")

Training set: (712, 34)
Validation set: (179, 34)

Training target distribution:
Survived
0    0.616573
1    0.383427
Name: proportion, dtype: float64

Validation target distribution:
Survived
0    0.614525
1    0.385475
Name: proportion, dtype: float64


Feature Scaling

Why Scale Features?
Some algorithms (like Logistic Regression and KNN) are sensitive to feature scales, while tree-based models are not:
- Age ranges from 0-80
- Fare ranges from 0-512
- Binary features are just 0 or 1

**StandardScaler** transforms features to have mean=0 and std=1.
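
As a quick check of what StandardScaler computes, the sketch below standardizes a toy column by hand and compares the result with scikit-learn. The fare values are made up for illustration and are not taken from the dataset.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy fare values (illustrative only, not taken from the dataset)
fares = np.array([[7.25], [8.05], [26.0], [71.28], [512.33]])

# Manual standardization: z = (x - mean) / std, using the population std (ddof=0)
manual = (fares - fares.mean(axis=0)) / fares.std(axis=0)

# StandardScaler applies the same formula to each column
scaled = StandardScaler().fit_transform(fares)

print("Manual and StandardScaler agree:", np.allclose(manual, scaled))
print("Mean after scaling:", scaled.mean(axis=0).round(6))
print("Std after scaling: ", scaled.std(axis=0).round(6))
```

Note that the largest fare still ends up far from the others after scaling: standardization changes the units, not the shape of the distribution.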

Important: Fit on Training Only!
We fit the scaler on the training data only, then use it to transform the training, validation, and test sets. 
This prevents "data leakage" - using information from the validation set during training.
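
The same rule applies inside cross-validation: if data is scaled once up front, each fold's held-out part has already influenced the scaler. One way to avoid this, sketched here on synthetic data, is to put the scaler in a `Pipeline` so it is refit on each fold's training part. The names `X_demo`, `y_demo`, `scaled_lr`, and `cv_demo` are illustrative and not part of this notebook.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the training split (not the Titanic data)
X_demo, y_demo = make_classification(n_samples=300, n_features=10, random_state=42)

# The scaler lives inside the pipeline, so every CV fold fits it
# on that fold's training part only
scaled_lr = Pipeline([
    ('scaler', StandardScaler()),
    ('lr', LogisticRegression(max_iter=1000, random_state=42)),
])

cv_demo = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
fold_scores = cross_val_score(scaled_lr, X_demo, y_demo, cv=cv_demo, scoring='accuracy')

print(f"Fold accuracies: {fold_scores.round(3)}")
print(f"Mean accuracy:   {fold_scores.mean():.4f}")
```

On a dataset this small the difference from pre-scaling is usually minor, but the pipeline form keeps the CV estimate free of that particular leak.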

In [28]:
# Initialize scaler
scaler = StandardScaler()

# Fit on training data, transform both
X_train_scaled = scaler.fit_transform(X_train)
X_val_scaled = scaler.transform(X_val)
X_test_scaled = scaler.transform(X_test_final)

# Convert back to DataFrames for easier handling
X_train_scaled = pd.DataFrame(X_train_scaled, columns=X_train.columns, index=X_train.index)
X_val_scaled = pd.DataFrame(X_val_scaled, columns=X_val.columns, index=X_val.index)
X_test_scaled = pd.DataFrame(X_test_scaled, columns=X_test_final.columns)

print("Feature scaling complete!")
print(f"\nScaled training data sample (first 3 rows, first 5 columns):")
print(X_train_scaled.iloc[:3, :5])

Feature scaling complete!

Scaled training data sample (first 3 rows, first 5 columns):
       Pclass       Age     SibSp     Parch      Fare
692  0.829568 -0.322182 -0.465084 -0.466183  0.513812
481 -0.370945  0.053575 -0.465084 -0.466183 -0.662563
527 -1.571457  0.805089 -0.465084 -0.466183  3.955399


Cross-Validation Setup

Why Cross-Validation?
A single train-validation split can give unstable results depending on which samples 
end up in which set. Cross-validation gives us a more reliable estimate by:

1. Splitting data into K folds
2. Training K times, each time using a different fold as validation
3. Averaging the results

We'll use **5-fold stratified cross-validation** for model comparison.
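
The three steps above can be written out as an explicit loop. This is a minimal sketch on synthetic data with a class split similar to ours; `X_demo`, `y_demo`, `skf`, and `fold_acc` are illustrative names, and `cross_val_score` does the equivalent work in one call.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold

# Synthetic data with roughly a 62/38 class split (stand-in, not the Titanic data)
X_demo, y_demo = make_classification(
    n_samples=200, n_features=8, weights=[0.62, 0.38], random_state=42
)

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
fold_acc = []

# Steps 1-2: split into K folds and hold out a different fold each time
for train_idx, val_idx in skf.split(X_demo, y_demo):
    clf = LogisticRegression(max_iter=1000)
    clf.fit(X_demo[train_idx], y_demo[train_idx])
    fold_acc.append(clf.score(X_demo[val_idx], y_demo[val_idx]))

# Step 3: average the per-fold results
print(f"Per-fold accuracy: {np.round(fold_acc, 3)}")
print(f"Mean +/- std:      {np.mean(fold_acc):.4f} +/- {np.std(fold_acc):.4f}")
```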

In [8]:
# Set up cross-validation
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=RANDOM_STATE)

# Helper function to evaluate models
def evaluate_model(model, X_train, y_train, X_val, y_val, model_name):
    """
    Train model and return comprehensive evaluation metrics.
    """
    # Train
    model.fit(X_train, y_train)
    
    # Predict
    y_pred = model.predict(X_val)
    y_pred_proba = model.predict_proba(X_val)[:, 1] if hasattr(model, 'predict_proba') else None
    
    # Calculate metrics
    metrics = {
        'Model': model_name,
        'Accuracy': accuracy_score(y_val, y_pred),
        'Precision': precision_score(y_val, y_pred),
        'Recall': recall_score(y_val, y_pred),
        'F1': f1_score(y_val, y_pred),
        'ROC_AUC': roc_auc_score(y_val, y_pred_proba) if y_pred_proba is not None else None
    }
    
    # Cross-validation score on the training data passed in (the model is cloned for each fold)
    cv_scores = cross_val_score(model, X_train, y_train, cv=cv, scoring='accuracy')
    metrics['CV_Mean'] = cv_scores.mean()
    metrics['CV_Std'] = cv_scores.std()
    
    return metrics, y_pred, y_pred_proba

Model 1: Logistic Regression (Baseline)

Why Start with Logistic Regression?
Logistic Regression is the classic baseline for binary classification:
- **Simple and interpretable**: Coefficients show feature importance
- **Fast to train**: Works well even on larger datasets
- **Sets a benchmark**: If complex models don't beat this, they're not worth the complexity

How It Works
Logistic regression models the probability of survival as:
P(Survived=1) = 1 / (1 + e^-(β₀ + β₁x₁ + ... + βₙxₙ))

Each coefficient (β) tells us how that feature affects survival odds.
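
To make the formula concrete, here is a small worked example. The intercept, coefficients, and feature values are hypothetical numbers chosen for illustration, not fitted values from this notebook.

```python
import numpy as np

def sigmoid(z):
    """Map a linear score z to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical intercept and coefficients (made up, not fitted values)
beta_0 = -0.5
betas = np.array([1.2, -0.8])   # one positive and one negative effect
x = np.array([1.0, 0.5])        # one passenger's (scaled) feature values

z = beta_0 + betas @ x          # linear score: b0 + b1*x1 + b2*x2
p_survive = sigmoid(z)

# exp(beta) is the multiplicative change in the odds of survival
# for a one-unit increase in that feature
odds_ratios = np.exp(betas)

print(f"Linear score z: {z:.2f}")
print(f"P(Survived=1):  {p_survive:.3f}")
print(f"Odds ratios:    {odds_ratios.round(3)}")
```

A positive coefficient gives an odds ratio above 1 (survival more likely as the feature grows), and a negative coefficient gives one below 1.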

In [29]:
print("Training baseline models...")
print("-" * 50)

lr_base = LogisticRegression(random_state=RANDOM_STATE, max_iter=1000)
lr_base_score = cross_val_score(lr_base, X_train_scaled, y_train, cv=cv).mean()
print(f"Logistic Regression: {lr_base_score:.4f}")

rf_base = RandomForestClassifier(n_estimators=100, random_state=RANDOM_STATE)
rf_base_score = cross_val_score(rf_base, X_train, y_train, cv=cv).mean()
print(f"Random Forest:       {rf_base_score:.4f}")

xgb_base = XGBClassifier(random_state=RANDOM_STATE, eval_metric='logloss')
xgb_base_score = cross_val_score(xgb_base, X_train, y_train, cv=cv).mean()
print(f"XGBoost:             {xgb_base_score:.4f}")

baseline = {'Logistic Regression': lr_base_score, 
            'Random Forest': rf_base_score, 
            'XGBoost': xgb_base_score}

Training baseline models...
--------------------------------------------------
Logistic Regression: 0.8132
Random Forest:       0.8132
XGBoost:             0.8132


In [30]:
print("\nTuning Logistic Regression...")
lr_grid = GridSearchCV(
    LogisticRegression(random_state=RANDOM_STATE, max_iter=1000),
    param_grid={
        'C': [0.01, 0.1, 1, 10, 100],
        'penalty': ['l1', 'l2'],
        'solver': ['liblinear']
    },
    cv=cv, scoring='accuracy', n_jobs=-1, verbose=1
)
lr_grid.fit(X_train_scaled, y_train)
lr_tuned = lr_grid.best_estimator_
print(f"Best CV: {lr_grid.best_score_:.4f} | Params: {lr_grid.best_params_}")


Tuning Logistic Regression...
Fitting 5 folds for each of 10 candidates, totalling 50 fits
Best CV: 0.8231 | Params: {'C': 0.1, 'penalty': 'l2', 'solver': 'liblinear'}


In [31]:
print("\nTuning Random Forest...")
rf_search = RandomizedSearchCV(
    RandomForestClassifier(random_state=RANDOM_STATE),
    param_distributions={
        'n_estimators': [100, 200, 300],
        'max_depth': [5, 10, 15, None],
        'min_samples_split': [2, 5, 10],
        'min_samples_leaf': [1, 2, 4],
        'max_features': ['sqrt', 'log2']
    },
    n_iter=40, cv=cv, scoring='accuracy', n_jobs=-1, verbose=1, random_state=RANDOM_STATE
)
rf_search.fit(X_train, y_train)
rf_tuned = rf_search.best_estimator_
print(f"Best CV: {rf_search.best_score_:.4f} | Params: {rf_search.best_params_}")


Tuning Random Forest...
Fitting 5 folds for each of 40 candidates, totalling 200 fits
Best CV: 0.8343 | Params: {'n_estimators': 200, 'min_samples_split': 10, 'min_samples_leaf': 2, 'max_features': 'sqrt', 'max_depth': 5}


In [32]:
print("\nTuning XGBoost...")
xgb_search = RandomizedSearchCV(
    XGBClassifier(random_state=RANDOM_STATE, eval_metric='logloss'),
    param_distributions={
        'n_estimators': [100, 200, 300],
        'max_depth': [3, 5, 7],
        'learning_rate': [0.01, 0.05, 0.1, 0.2],
        'subsample': [0.6, 0.8, 1.0],
        'colsample_bytree': [0.6, 0.8, 1.0]
    },
    n_iter=40, cv=cv, scoring='accuracy', n_jobs=-1, verbose=1, random_state=RANDOM_STATE
)
xgb_search.fit(X_train, y_train)
xgb_tuned = xgb_search.best_estimator_
print(f"Best CV: {xgb_search.best_score_:.4f} | Params: {xgb_search.best_params_}")


Tuning XGBoost...
Fitting 5 folds for each of 40 candidates, totalling 200 fits
Best CV: 0.8371 | Params: {'subsample': 0.6, 'n_estimators': 100, 'max_depth': 7, 'learning_rate': 0.1, 'colsample_bytree': 0.8}


In [35]:
print("\nTuning KNN...")
knn_grid = GridSearchCV(
    KNeighborsClassifier(),
    param_grid={
        'n_neighbors': [3, 5, 7, 9, 11],
        'weights': ['uniform', 'distance'],
        'metric': ['euclidean', 'manhattan']
    },
    cv=cv, scoring='accuracy', n_jobs=-1, verbose=1
)
knn_grid.fit(X_train_scaled, y_train)
knn_tuned = knn_grid.best_estimator_
print(f"Best CV: {knn_grid.best_score_:.4f}")


Tuning KNN...
Fitting 5 folds for each of 20 candidates, totalling 100 fits
Best CV: 0.8105


In [36]:
print("\nTuning Gradient Boosting...")
gb_search = RandomizedSearchCV(
    GradientBoostingClassifier(random_state=RANDOM_STATE),
    param_distributions={
        'n_estimators': [100, 200],
        'max_depth': [3, 5, 7],
        'learning_rate': [0.05, 0.1, 0.2],
        'min_samples_split': [2, 5]
    },
    n_iter=20, cv=cv, scoring='accuracy', n_jobs=-1, verbose=1, random_state=RANDOM_STATE
)
gb_search.fit(X_train, y_train)
gb_tuned = gb_search.best_estimator_
print(f"Best CV: {gb_search.best_score_:.4f}")


Tuning Gradient Boosting...
Fitting 5 folds for each of 20 candidates, totalling 100 fits
Best CV: 0.8329


In [37]:
all_models = {
    'Logistic Regression': (lr_tuned, lr_grid.best_score_, True),
    'Random Forest': (rf_tuned, rf_search.best_score_, False),
    'XGBoost': (xgb_tuned, xgb_search.best_score_, False),
    'KNN': (knn_tuned, knn_grid.best_score_, True),
    'Gradient Boosting': (gb_tuned, gb_search.best_score_, False)
}

print("\n" + "=" * 50)
print("TUNED SINGLE MODELS")
print("=" * 50)
for name, (_, score, _) in sorted(all_models.items(), key=lambda x: x[1][1], reverse=True):
    base = baseline.get(name)
    # Signed format so a tuned score below the baseline prints as a negative difference
    diff = f" ({score - base:+.4f})" if base is not None else ""
    print(f"{name:22}: {score:.4f}{diff}")


TUNED SINGLE MODELS
XGBoost               : 0.8371 (+0.0239)
Random Forest         : 0.8343 (+0.0211)
Gradient Boosting     : 0.8329
Logistic Regression   : 0.8231 (+0.0098)
KNN                   : 0.8105


Ensemble Methods

Voting Classifier
- **Hard Voting**: Majority class wins
- **Soft Voting**: Average probabilities, highest wins
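
Hard and soft voting can disagree when one model is much more confident than the others. The sketch below uses made-up probabilities for three models and four passengers to show this. It mimics the voting rules with NumPy instead of calling `VotingClassifier`.

```python
import numpy as np

# Hypothetical P(Survived=1) from three models for four passengers (made-up numbers)
probas = np.array([
    [0.90, 0.40, 0.45],   # one very confident "yes", two mild "no"
    [0.60, 0.55, 0.20],
    [0.30, 0.35, 0.10],
    [0.51, 0.52, 0.05],   # two borderline "yes", one confident "no"
])

# Hard voting: each model casts a 0/1 vote and the majority class wins
votes = (probas >= 0.5).astype(int)
hard_pred = (votes.sum(axis=1) >= 2).astype(int)

# Soft voting: average the probabilities, then pick the more likely class
# (for two classes this is a 0.5 threshold on the mean)
soft_pred = (probas.mean(axis=1) >= 0.5).astype(int)

print("Hard voting:", hard_pred)
print("Soft voting:", soft_pred)
```

In the first row the single confident model pulls the average above 0.5, so soft voting predicts survival while the majority vote does not.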

Stacking Classifier
- Base models make predictions
- Meta-model learns to combine them
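
The idea behind stacking can be sketched by hand: collect out-of-fold predictions from the base models, then fit a meta-model on them. This is an illustration on synthetic data with two small base models. `StackingClassifier` handles these steps internally, and `X_demo`, `y_demo`, `base_models`, `meta_features`, and `meta_model` are illustrative names.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

# Synthetic stand-in for the training data (not the Titanic data)
X_demo, y_demo = make_classification(n_samples=300, n_features=10, random_state=42)

base_models = [
    RandomForestClassifier(n_estimators=50, random_state=42),
    GradientBoostingClassifier(n_estimators=50, random_state=42),
]

# Out-of-fold probabilities: each row is predicted by a model that never saw it,
# so the meta-model is not trained on overfit base predictions
meta_features = np.column_stack([
    cross_val_predict(m, X_demo, y_demo, cv=5, method='predict_proba')[:, 1]
    for m in base_models
])

# The meta-model learns how much weight to give each base model
meta_model = LogisticRegression()
meta_model.fit(meta_features, y_demo)

print(f"Meta-feature matrix shape: {meta_features.shape}")
print(f"Meta-model coefficients:   {meta_model.coef_.round(3)}")
```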

In [38]:
print("\nBuilding Voting Ensembles...")

# Hard voting (tree-based models)
voting_hard = VotingClassifier(
    estimators=[('rf', rf_tuned), ('xgb', xgb_tuned), ('gb', gb_tuned)],
    voting='hard'
)
hard_score = cross_val_score(voting_hard, X_train, y_train, cv=cv).mean()
print(f"Hard Voting (3 trees):  {hard_score:.4f}")

# Soft voting (tree-based models)
voting_soft = VotingClassifier(
    estimators=[('rf', rf_tuned), ('xgb', xgb_tuned), ('gb', gb_tuned)],
    voting='soft'
)
soft_score = cross_val_score(voting_soft, X_train, y_train, cv=cv).mean()
print(f"Soft Voting (3 trees):  {soft_score:.4f}")

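# Mixed ensemble: tuned LR (scaled inside a Pipeline) plus the three tree models.
# Note: only four estimators are listed here (KNN is not included),
# even though the printed label says 5 models.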
voting_mixed = VotingClassifier(
    estimators=[
        ('lr', Pipeline([('scaler', StandardScaler()), ('lr', lr_tuned)])),
        ('rf', rf_tuned),
        ('xgb', xgb_tuned),
        ('gb', gb_tuned)
    ],
    voting='soft'
)
mixed_score = cross_val_score(voting_mixed, X_train, y_train, cv=cv).mean()
print(f"Soft Voting (5 models): {mixed_score:.4f}")


Building Voting Ensembles...
Hard Voting (3 trees):  0.8470
Soft Voting (3 trees):  0.8442
Soft Voting (5 models): 0.8343


In [39]:
print("\nBuilding Stacking Ensembles...")

# Basic stacking
stacking = StackingClassifier(
    estimators=[('rf', rf_tuned), ('xgb', xgb_tuned), ('gb', gb_tuned)],
    final_estimator=LogisticRegression(random_state=RANDOM_STATE),
    cv=5
)
stack_score = cross_val_score(stacking, X_train, y_train, cv=cv).mean()
print(f"Stacking:               {stack_score:.4f}")

# Stacking with passthrough (meta-learner sees original features too)
stacking_pt = StackingClassifier(
    estimators=[('rf', rf_tuned), ('xgb', xgb_tuned), ('gb', gb_tuned)],
    final_estimator=LogisticRegression(random_state=RANDOM_STATE, max_iter=1000),
    cv=5,
    passthrough=True
)
stack_pt_score = cross_val_score(stacking_pt, X_train, y_train, cv=cv).mean()
print(f"Stacking + Passthrough: {stack_pt_score:.4f}")


Building Stacking Ensembles...
Stacking:               0.8400
Stacking + Passthrough: 0.8315


In [40]:
ensemble_models = {
    'Hard Voting (3)': (voting_hard, hard_score, False),
    'Soft Voting (3)': (voting_soft, soft_score, False),
    'Soft Voting (5)': (voting_mixed, mixed_score, False),
    'Stacking': (stacking, stack_score, False),
    'Stacking + Passthrough': (stacking_pt, stack_pt_score, False)
}

Complete Comparison

In [41]:
all_results = {**all_models, **ensemble_models}

print("\n" + "=" * 60)
print("ALL MODELS RANKED BY CV ACCURACY")
print("=" * 60)
for name, (_, score, _) in sorted(all_results.items(), key=lambda x: x[1][1], reverse=True):
    tag = "[E]" if name in ensemble_models else "[S]"
    print(f"{tag} {name:25}: {score:.4f}")

best_name = max(all_results.items(), key=lambda x: x[1][1])[0]
best_model, best_score, needs_scaling = all_results[best_name]
print(f"\n{'='*60}")
print(f"BEST MODEL: {best_name} ({best_score:.4f})")
print(f"{'='*60}")


ALL MODELS RANKED BY CV ACCURACY
[E] Hard Voting (3)          : 0.8470
[E] Soft Voting (3)          : 0.8442
[E] Stacking                 : 0.8400
[S] XGBoost                  : 0.8371
[E] Soft Voting (5)          : 0.8343
[S] Random Forest            : 0.8343
[S] Gradient Boosting        : 0.8329
[E] Stacking + Passthrough   : 0.8315
[S] Logistic Regression      : 0.8231
[S] KNN                      : 0.8105

BEST MODEL: Hard Voting (3) (0.8470)


In [44]:
# Visualization
fig, ax = plt.subplots(figsize=(14, 6))

names = list(all_results.keys())
scores = [all_results[n][1] for n in names]
colors = ['steelblue' if n in all_models else 'coral' for n in names]

sorted_data = sorted(zip(names, scores, colors), key=lambda x: x[1], reverse=True)
names, scores, colors = zip(*sorted_data)

bars = ax.bar(names, scores, color=colors)
ax.set_ylabel('CV Accuracy')
ax.set_title('Model Comparison: Single [S] vs Ensemble [E]')
ax.set_ylim(0.78, 0.88)
plt.xticks(rotation=45, ha='right')

from matplotlib.patches import Patch
ax.legend(handles=[
    Patch(facecolor='steelblue', label='Single Model'),
    Patch(facecolor='coral', label='Ensemble')
], loc='upper right')

for bar, score in zip(bars, scores):
    ax.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.002,
            f'{score:.4f}', ha='center', va='bottom', fontsize=8)

plt.tight_layout()
plt.savefig('../reports/figures/model_comparison.png')
plt.close()

Model Evaluation Visualizations

In [45]:
# Prepare validation-set predictions for the three best single models by CV accuracy
top_3 = ['Random Forest', 'XGBoost', 'Gradient Boosting']
preds, probs = {}, {}

for name in top_3:
    model, _, scale = all_models[name]
    Xtr, Xv = (X_train_scaled, X_val_scaled) if scale else (X_train, X_val)
    model.fit(Xtr, y_train)
    preds[name] = model.predict(Xv)
    probs[name] = model.predict_proba(Xv)[:, 1]

In [47]:
# Confusion Matrices
fig, axes = plt.subplots(1, 3, figsize=(15, 4))
for ax, name in zip(axes, top_3):
    cm = confusion_matrix(y_val, preds[name])
    sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', ax=ax,
                xticklabels=['Died', 'Survived'], yticklabels=['Died', 'Survived'])
    ax.set_title(name)
    ax.set_xlabel('Predicted')
    ax.set_ylabel('Actual')
plt.tight_layout()
plt.savefig('../reports/figures/confusion_matrix.png')
plt.close()

In [49]:
# ROC Curves
fig, ax = plt.subplots(figsize=(8, 6))
for name in top_3:
    fpr, tpr, _ = roc_curve(y_val, probs[name])
    auc = roc_auc_score(y_val, probs[name])
    ax.plot(fpr, tpr, label=f'{name} (AUC={auc:.3f})')
ax.plot([0, 1], [0, 1], 'k--', label='Random')
ax.set_xlabel('False Positive Rate')
ax.set_ylabel('True Positive Rate')
ax.set_title('ROC Curves')
ax.legend(loc='lower right')
plt.tight_layout()
plt.savefig('../reports/figures/roc_curve.png')
plt.close()

Feature Importance

In [50]:
importance = pd.DataFrame({
    'Feature': X_train.columns,
    'RF': rf_tuned.feature_importances_,
    'XGB': xgb_tuned.feature_importances_,
    'GB': gb_tuned.feature_importances_
})

# Scale each column by its maximum so each model's top feature equals 1
for col in ['RF', 'XGB', 'GB']:
    importance[col] = importance[col] / importance[col].max()

importance['Average'] = importance[['RF', 'XGB', 'GB']].mean(axis=1)
importance = importance.sort_values('Average', ascending=False)

print("Top 15 Features:")
print(importance[['Feature', 'Average']].head(15).to_string(index=False))

Top 15 Features:
       Feature  Average
      Title_Mr 1.000000
   Sex_encoded 0.562941
        Pclass 0.297074
          Fare 0.289457
  Deck_Unknown 0.241130
           Age 0.227870
    FamilySize 0.174610
    Title_Rare 0.172234
     Title_Mrs 0.135595
    Title_Miss 0.133756
        Deck_E 0.103180
        Deck_D 0.102509
         SibSp 0.097624
Fare_Very_High 0.091744
    Embarked_S 0.084625


In [52]:
top_feat = importance.head(15)
fig, ax = plt.subplots(figsize=(10, 8))
x = np.arange(len(top_feat))
width = 0.25

ax.barh(x - width, top_feat['RF'], width, label='Random Forest')
ax.barh(x, top_feat['XGB'], width, label='XGBoost')
ax.barh(x + width, top_feat['GB'], width, label='Gradient Boosting')

ax.set_yticks(x)
ax.set_yticklabels(top_feat['Feature'])
ax.set_xlabel('Normalized Importance')
ax.set_title('Top 15 Feature Importance')
ax.legend()
ax.invert_yaxis()
plt.tight_layout()
plt.savefig('../reports/figures/feature_importance.png')
plt.close()

Generate Predictions

In [53]:
print(f"\nRetraining {best_name} on full training data...")

if needs_scaling:
    scaler_full = StandardScaler()
    X_full = scaler_full.fit_transform(X)
    best_model.fit(X_full, y)
    X_test_pred = scaler_full.transform(X_test_final)
else:
    best_model.fit(X, y)
    X_test_pred = X_test_final

predictions = best_model.predict(X_test_pred)
print(f"Generated {len(predictions)} predictions")
print(f"Predicted survival rate: {predictions.mean():.2%}")
print(f"Training survival rate:  {y.mean():.2%}")


Retraining Hard Voting (3) on full training data...
Generated 418 predictions
Predicted survival rate: 35.17%
Training survival rate:  38.38%


In [55]:
submission = pd.DataFrame({
    'PassengerId': test_ids['PassengerId'],
    'Survived': predictions
})
submission.to_csv('../submissions/submission.csv', index=False)
print("\nSubmission saved!")
print(submission.head())


Submission saved!
   PassengerId  Survived
0          892         0
1          893         0
2          894         0
3          895         0
4          896         1


Save Models

In [57]:
joblib.dump(lr_tuned, '../models/logistic_regression.pkl')
joblib.dump(rf_tuned, '../models/random_forest.pkl')
joblib.dump(xgb_tuned, '../models/xgboost.pkl')
joblib.dump(knn_tuned, '../models/knn.pkl')
joblib.dump(gb_tuned, '../models/gradient_boosting.pkl')
joblib.dump(best_model, '../models/best_model.pkl')
# This scaler was fit on the 80% training split, matching lr_tuned and knn_tuned
joblib.dump(scaler, '../models/scaler.pkl')
print("All models saved!")

All models saved!


Summary

In [58]:
print("\n" + "=" * 70)
print("MODELING COMPLETE")
print("=" * 70)

print("\nBaseline vs Tuned:")
for name in baseline:
    base = baseline[name]
    tuned = all_models[name][1]
    print(f"  {name:22}: {base:.4f} -> {tuned:.4f} ({tuned - base:+.4f})")

print(f"\nBest Model: {best_name}")
print(f"CV Accuracy: {best_score:.4f}")


MODELING COMPLETE

Baseline vs Tuned:
  Logistic Regression   : 0.8132 -> 0.8231 (+0.0098)
  Random Forest         : 0.8132 -> 0.8343 (+0.0211)
  XGBoost               : 0.8132 -> 0.8371 (+0.0239)

Best Model: Hard Voting (3)
CV Accuracy: 0.8470
