## 1. Import Libraries

### Purpose:
Import essential libraries for data processing, hyperparameter tuning, and model persistence.

### Libraries Used:
- **pandas**: Data manipulation
- **sklearn.model_selection**: Train/test split and GridSearchCV for hyperparameter tuning
- **sklearn.ensemble**: Random Forest classifier
- **sklearn.metrics**: Model evaluation metrics
- **joblib**: Efficient model serialization (saving/loading)
- **os**: File system operations

In [1]:
import pandas as pd
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, accuracy_score
import joblib
import os

## 2. Setup Model Directory

### Purpose:
Create a directory to store the trained model file.

### Key Details:
- **exist_ok=True**: Prevents error if directory already exists
- Ensures organized file structure for model artifacts

In [2]:
# Create models directory if it doesn't exist
os.makedirs('../models', exist_ok=True)
print("✓ Models directory ready")

✓ Models directory ready


## 3. Load and Prepare Data

### Purpose:
Load the featured dataset and prepare it for model training.

In [3]:
# Load featured data
df = pd.read_csv('../data/customer_churn_featured.csv')

# Prepare features and target
X = df.drop(['customer_id', 'churned'], axis=1)
X = pd.get_dummies(X, drop_first=True)
y = df['churned']

print(f"Dataset loaded: {df.shape[0]} customers, {X.shape[1]} features")
print(f"Churn rate: {y.mean():.2%}")

Dataset loaded: 100 customers, 57 features
Churn rate: 44.00%


### Interpretation:
- Confirms data is loaded successfully
- Shows total number of features after one-hot encoding
- Displays overall churn rate in the dataset
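
The effect of `drop_first=True` on the feature count can be seen on a small example. This is a minimal sketch on a hypothetical toy frame; the column names `plan` and `monthly_spend` are illustrative assumptions and are not taken from the real dataset.

```python
import pandas as pd

# Hypothetical toy data; these column names are illustrative only.
toy = pd.DataFrame({
    "plan": ["basic", "premium", "basic", "standard"],
    "monthly_spend": [20.0, 55.0, 18.5, 32.0],
})

# Categorical columns are expanded into indicator columns.
# drop_first=True drops the first category ("basic") to avoid a redundant column.
encoded = pd.get_dummies(toy, drop_first=True)

print(list(encoded.columns))
# ['monthly_spend', 'plan_premium', 'plan_standard']
```

A categorical column with k distinct values therefore contributes k - 1 features, which is how a modest number of raw columns can grow into 57 features.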

## 4. Train-Test Split

### Purpose:
Divide data into training (80%) and testing (20%) sets.

In [4]:
# Split the data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(f"Training samples: {len(X_train)}")
print(f"Test samples: {len(X_test)}")
print(f"Number of features: {X_train.shape[1]}")

Training samples: 80
Test samples: 20
Number of features: 57


### Interpretation:
- Training set will be used for grid search and cross-validation
- Test set remains completely unseen until final evaluation
- Ensures unbiased performance assessment
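
With only 100 rows, the class balance of a 20-row test set can drift from the overall churn rate. One option, not used in the cell above, is the `stratify` argument of `train_test_split`. The sketch below uses synthetic stand-in data (an assumption) with the same size and churn rate as the dataset.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 100 rows with a 44% positive rate (not the real data).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 3))
y_demo = np.array([1] * 44 + [0] * 56)

# stratify=y_demo keeps the class proportions close to 44% in both subsets.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

print(f"Train churn rate: {y_tr.mean():.2%}")
print(f"Test churn rate:  {y_te.mean():.2%}")
```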

## 5. Define Parameter Grid

### Purpose:
Specify the hyperparameter values to test in grid search.

### Hyperparameters Explained:

#### **n_estimators** (Number of Trees)
- **50**: Faster training, but predictions are noisier (higher variance)
- **100**: Good balance (scikit-learn default)
- **200**: More stable predictions with diminishing returns, but slower to train

#### **max_depth** (Tree Depth)
- **5**: Shallow trees, prevents overfitting but may underfit
- **10**: Moderate depth, good starting point
- **15**: Deep trees, captures complex patterns but risk of overfitting
- **None** (not tested here): Unlimited depth until leaves are pure

#### **min_samples_split** (Minimum Samples to Split)
- **2**: scikit-learn default; very sensitive, creates detailed trees (more overfitting risk)
- **5**: Balanced approach
- **10**: Conservative, creates simpler trees (more regularization)

In [5]:
# Define parameter grid for hyperparameter tuning
param_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [5, 10, 15],
    'min_samples_split': [2, 5, 10]
}

total_combinations = (len(param_grid['n_estimators']) * 
                     len(param_grid['max_depth']) * 
                     len(param_grid['min_samples_split']))

print("Parameter Grid:")
for param, values in param_grid.items():
    print(f"  {param}: {values}")
print(f"\nTotal combinations to test: {total_combinations}")
print(f"With 5-fold CV: {total_combinations * 5} total model fits")

Parameter Grid:
  n_estimators: [50, 100, 200]
  max_depth: [5, 10, 15]
  min_samples_split: [2, 5, 10]

Total combinations to test: 27
With 5-fold CV: 135 total model fits


### Interpretation:
- Grid search will train and evaluate 27 different parameter combinations
- Each combination is tested 5 times (cross-validation folds)
- Total: 135 model training iterations
- This ensures robust evaluation of each configuration
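
The same count can be obtained without multiplying the list lengths by hand. The sketch below uses scikit-learn's `ParameterGrid`, which enumerates every combination and keeps working if more hyperparameters are added. Note that with its default `refit=True`, `GridSearchCV` performs one additional fit of the best configuration on the full training set after the 135 cross-validation fits.

```python
from sklearn.model_selection import ParameterGrid

grid = {
    "n_estimators": [50, 100, 200],
    "max_depth": [5, 10, 15],
    "min_samples_split": [2, 5, 10],
}

# ParameterGrid expands the dict into every combination of values.
candidates = list(ParameterGrid(grid))
n_candidates = len(candidates)
n_fits = n_candidates * 5  # 5-fold cross-validation

print(n_candidates, n_fits)  # 27 135
print(candidates[0])         # first combination in grid order
```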

## 6. Perform Grid Search with Cross-Validation

### Purpose:
Systematically test all hyperparameter combinations using 5-fold cross-validation to find the optimal configuration.

### How Grid Search Works:
1. **For each parameter combination**:
   - Split training data into 5 folds
   - Train on 4 folds, validate on 1 fold
   - Repeat 5 times (each fold used as validation once)
   - Average the 5 validation scores
2. **Select best combination**: Highest average CV score
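
The two steps above can be sketched as an explicit loop. This is an illustration of the idea on synthetic data with a deliberately small grid (both are assumptions), not a reproduction of `GridSearchCV` internals. For classifiers, an integer `cv` makes scikit-learn use stratified folds.

```python
from itertools import product

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the training data.
X_demo, y_demo = make_classification(n_samples=80, n_features=10, random_state=42)

# Small illustrative grid to keep the loop fast.
small_grid = {"n_estimators": [10, 20], "max_depth": [3, 5]}

results = []
for n_estimators, max_depth in product(
    small_grid["n_estimators"], small_grid["max_depth"]
):
    model = RandomForestClassifier(
        n_estimators=n_estimators, max_depth=max_depth, random_state=42
    )
    # Step 1: train on 4 folds, validate on the 5th, repeat 5 times, then average.
    fold_scores = cross_val_score(model, X_demo, y_demo, cv=5, scoring="accuracy")
    results.append(((n_estimators, max_depth), fold_scores.mean()))

# Step 2: keep the combination with the highest mean CV score.
best_params, best_score = max(results, key=lambda item: item[1])
print(best_params, round(best_score, 4))
```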

### Key Parameters:
- **cv=5**: 5-fold cross-validation
- **scoring='accuracy'**: Optimization metric
- **n_jobs=-1**: Use all CPU cores for parallel processing
- **verbose=1**: Print progress updates

In [6]:
print("=" * 80)
print("HYPERPARAMETER TUNING WITH GRID SEARCH")
print("=" * 80)
print("Starting grid search...\n")

grid_search = GridSearchCV(
    RandomForestClassifier(random_state=42), 
    param_grid, 
    cv=5, 
    scoring='accuracy',
    n_jobs=-1,
    verbose=1
)

# Fit the grid search (this will take some time)
grid_search.fit(X_train, y_train)

print("\n" + "=" * 80)
print("GRID SEARCH COMPLETE")
print("=" * 80)
print(f"✓ Best parameters: {grid_search.best_params_}")
print(f"✓ Best cross-validation score: {grid_search.best_score_:.4f}")

HYPERPARAMETER TUNING WITH GRID SEARCH
Starting grid search...

Fitting 5 folds for each of 27 candidates, totalling 135 fits

GRID SEARCH COMPLETE
✓ Best parameters: {'max_depth': 5, 'min_samples_split': 2, 'n_estimators': 50}
✓ Best cross-validation score: 1.0000


### Interpretation:
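- **Best parameters**: The combination with the highest mean CV score; here `{'max_depth': 5, 'min_samples_split': 2, 'n_estimators': 50}`
- **Best CV score**: Mean accuracy across the 5 folds for that combination; here 1.0000
- **Reading the result**:
  - A score of 1.0000 means every validation sample in every fold was classified correctly
  - On only 80 training rows, a perfect score may point to a feature that leaks the target rather than a genuinely perfect model, so the features are worth inspecting before trusting it
  - For comparison, a CV score of 0.8542 would mean the model achieves ~85.4% accuracy on average across folds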

## 7. Analyze Grid Search Results

### Purpose:
Examine all tested combinations to understand how different hyperparameters affect performance.

In [8]:
# Create DataFrame with all results
cv_results = pd.DataFrame(grid_search.cv_results_)

# Select and sort relevant columns
results_df = cv_results[[
    'param_n_estimators', 
    'param_max_depth', 
    'param_min_samples_split',
    'mean_test_score',
    'std_test_score',
    'rank_test_score'
]].sort_values('rank_test_score')

print("Top 10 Parameter Combinations:")
print("=" * 80)
print(results_df.head(10).to_string(index=False))

Top 10 Parameter Combinations:
 param_n_estimators  param_max_depth  param_min_samples_split  mean_test_score  std_test_score  rank_test_score
                 50                5                        2              1.0             0.0                1
                 50               15                       10              1.0             0.0                1
                200               15                        5              1.0             0.0                1
                100               15                        5              1.0             0.0                1
                 50               15                        5              1.0             0.0                1
                200               15                        2              1.0             0.0                1
                100               15                        2              1.0             0.0                1
                 50               15                        2            

### Interpretation:
- **mean_test_score**: Average accuracy across 5 folds
- **std_test_score**: Standard deviation (lower = more consistent)
- **rank_test_score**: 1 = best combination
- Look for patterns:
  - Do higher n_estimators consistently perform better?
  - Is there an optimal max_depth?
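  - How much does performance vary between top combinations?
  - Here every listed combination ties at 1.0 with zero standard deviation, so the grid cannot separate them; the reported best parameters are the first of the tied candidates in grid order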
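
One way to look for these patterns is to average `mean_test_score` per hyperparameter value and to count how many candidates share rank 1. The sketch below runs a small search on synthetic data (an assumption, so the numbers will differ from the notebook's); the same `groupby` calls should apply to the `cv_results` frame built above.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data and a small grid, for illustration only.
X_demo, y_demo = make_classification(n_samples=80, n_features=10, random_state=0)
search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    {"n_estimators": [10, 20], "max_depth": [3, 5]},
    cv=3,
    scoring="accuracy",
)
search.fit(X_demo, y_demo)

results = pd.DataFrame(search.cv_results_)

# Average score for each value of a single hyperparameter.
by_depth = results.groupby("param_max_depth")["mean_test_score"].mean()
by_trees = results.groupby("param_n_estimators")["mean_test_score"].mean()

# Number of candidates tied for first place.
n_tied = int((results["rank_test_score"] == 1).sum())

print(by_depth)
print(by_trees)
print(f"Candidates tied at rank 1: {n_tied}")
```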

## 8. Extract and Evaluate Best Model

### Purpose:
Retrieve the best model and evaluate its performance on the held-out test set.

### Why Test Set Evaluation?
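- Cross-validation scores are computed on folds of the training data, and the best score is optimistically biased because it was used to pick the hyperparameters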
- Test set provides unbiased estimate of real-world performance
- Checks if the model generalizes beyond data used in hyperparameter selection

In [9]:
# Get the best model
best_model = grid_search.best_estimator_

# Make predictions on test set
y_pred = best_model.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred)

print("=" * 80)
print("TEST SET PERFORMANCE")
print("=" * 80)
print(f"Test Accuracy: {test_accuracy:.4f}")
print(f"\nComparison:")
print(f"  Best CV Score (training): {grid_search.best_score_:.4f}")
print(f"  Test Score (unseen data): {test_accuracy:.4f}")
print(f"  Difference: {abs(grid_search.best_score_ - test_accuracy):.4f}")
print("\n" + "=" * 80)
print("CLASSIFICATION REPORT")
print("=" * 80)
print(classification_report(y_test, y_pred, target_names=['Not Churned', 'Churned']))

TEST SET PERFORMANCE
Test Accuracy: 1.0000

Comparison:
  Best CV Score (training): 1.0000
  Test Score (unseen data): 1.0000
  Difference: 0.0000

CLASSIFICATION REPORT
              precision    recall  f1-score   support

 Not Churned       1.00      1.00      1.00        11
     Churned       1.00      1.00      1.00         9

    accuracy                           1.00        20
   macro avg       1.00      1.00      1.00        20
weighted avg       1.00      1.00      1.00        20



### Interpretation:

#### **CV Score vs Test Score Comparison**:
- **Small difference (< 2%)**: Good generalization, model is robust
- **Test score lower by 3-5%**: Acceptable, slight overfitting
- **Test score lower by > 5%**: Significant overfitting, consider:
  - More regularization (higher min_samples_split, lower max_depth)
  - More training data
  - Feature selection
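- **Test score higher than CV**: Usually a favorable split or noise from a small test set; each CV model also trains on only 4/5 of the training data, which can lower CV scores slightly
- **With 20 test samples**: Each misclassification shifts test accuracy by 5 percentage points, so the thresholds above are only rough guides here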

#### **Classification Report**:
- Check if tuning improved metrics compared to baseline model
- Focus on recall for churned class (critical for business)
- Similar precision and recall indicate the model is not trading false positives for false negatives (this says nothing about probability calibration)
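
The churned-class metrics in the report can also be computed directly. The sketch below uses hypothetical labels (not the notebook's predictions, which are all correct) so that the error types are visible.

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Hypothetical labels for illustration: 1 = churned, 0 = not churned.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_hat = [0, 0, 0, 0, 0, 1, 1, 1, 1, 0]

# For binary labels, ravel() returns the counts in the order tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(y_true, y_hat).ravel()

# Recall: share of actual churners that were caught.
churn_recall = recall_score(y_true, y_hat, pos_label=1)
# Precision: share of predicted churners that really churned.
churn_precision = precision_score(y_true, y_hat, pos_label=1)

print(tn, fp, fn, tp)                 # 5 1 1 3
print(churn_recall, churn_precision)  # 0.75 0.75
```

A missed churner (false negative) usually costs more than an unnecessary retention offer (false positive), which is why recall on the churned class deserves the closest look.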

## 9. Save the Tuned Model

### Purpose:
Persist the optimized model to disk for deployment and future predictions.

### Why joblib?
- More efficient than pickle for objects that carry large numpy arrays
- Widely used for persisting scikit-learn models
- A single file per model is easy to version and ship with a deployment

In [10]:
# Save the tuned model
model_path = '../models/tuned_churn_model.pkl'
joblib.dump(best_model, model_path)

print("=" * 80)
print("MODEL SAVED SUCCESSFULLY")
print("=" * 80)
print(f"✓ Model saved to: {model_path}")
print(f"✓ Best parameters: {grid_search.best_params_}")
print(f"✓ Test accuracy: {test_accuracy:.4f}")

# Show how to load the model later
print("\nTo load this model later:")
print(f"  loaded_model = joblib.load('{model_path}')")
print("  predictions = loaded_model.predict(new_data)")

MODEL SAVED SUCCESSFULLY
✓ Model saved to: ../models/tuned_churn_model.pkl
✓ Best parameters: {'max_depth': 5, 'min_samples_split': 2, 'n_estimators': 50}
✓ Test accuracy: 1.0000

To load this model later:
  loaded_model = joblib.load('../models/tuned_churn_model.pkl')
  predictions = loaded_model.predict(new_data)


### Interpretation:
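- Model is saved and can be loaded in other scripts for making predictions
- New data must go through the same preprocessing and end up with the same columns as `X`
- The file contains the selected hyperparameters and the fitted trees
- Should be versioned and tracked in your model registry, together with the scikit-learn version used for training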
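
A quick round-trip check can confirm that a saved model behaves the same after loading. This is a minimal sketch on synthetic data, writing to a temporary directory; the file name `demo_model.pkl` is a hypothetical placeholder, not the notebook's model path.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in model, for illustration only.
X_demo, y_demo = make_classification(n_samples=60, n_features=8, random_state=0)
model = RandomForestClassifier(n_estimators=10, random_state=42).fit(X_demo, y_demo)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo_model.pkl")  # hypothetical file name
    joblib.dump(model, path)
    restored = joblib.load(path)

# The restored model should reproduce the original predictions exactly.
same_predictions = np.array_equal(model.predict(X_demo), restored.predict(X_demo))
print(f"Predictions identical after reload: {same_predictions}")
```

Like pickle files, joblib files execute code when loaded, so they should only be loaded from trusted sources.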

## 10. Summary and Next Steps

### What We Accomplished:
- ✅ Systematically tested 27 hyperparameter combinations
- ✅ Used 5-fold cross-validation for robust evaluation
- ✅ Identified optimal Random Forest configuration
- ✅ Validated performance on held-out test set
- ✅ Saved production-ready model

### Key Takeaways:
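- Hyperparameter tuning can improve model performance, although in this run the top combinations all tied at 1.0 accuracy
- Grid search is exhaustive over the listed values but cannot find configurations outside the grid
- Cross-validation provides more reliable performance estimates than a single validation split
- Test set evaluation checks that the model generalizes to new data; with 20 test samples the estimate is coarse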

### Next Steps:
1. **Feature Analysis**: Examine feature importance from tuned model
2. **Error Analysis**: Investigate misclassified cases
3. **Model Deployment**: Integrate model into production system
4. **Monitoring**: Track model performance on live data
5. **Retraining**: Periodically retune with new data
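
For the first step, a fitted Random Forest exposes `feature_importances_`. The sketch below ranks features on synthetic data with hypothetical column names (`feature_0` and so on); with the real model, the same pattern should work using `best_model` and `X.columns`. A single feature with an importance close to 1 would be worth checking as a possible target leak.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data with hypothetical feature names.
X_arr, y_demo = make_classification(n_samples=80, n_features=6, random_state=0)
X_demo = pd.DataFrame(X_arr, columns=[f"feature_{i}" for i in range(6)])

model = RandomForestClassifier(n_estimators=20, random_state=42).fit(X_demo, y_demo)

# Impurity-based importances, normalized so they sum to 1.
importances = pd.Series(
    model.feature_importances_, index=X_demo.columns
).sort_values(ascending=False)

print(importances.head())
```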

### Advanced Tuning Options:
- **RandomizedSearchCV**: Sample parameter space (faster for many parameters)
- **Bayesian Optimization**: Intelligent search using prior results
- **Additional hyperparameters**: min_samples_leaf, max_features, class_weight
- **Different scoring metrics**: F1-score, ROC-AUC for imbalanced data
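
As an illustration of the first and last options, the sketch below runs `RandomizedSearchCV` with F1 scoring on synthetic data. The distributions, `n_iter=5`, and `cv=3` are illustrative assumptions chosen for speed, not recommended settings.

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in for the training data.
X_demo, y_demo = make_classification(n_samples=80, n_features=10, random_state=0)

# Distributions are sampled; lists are sampled uniformly.
param_distributions = {
    "n_estimators": randint(10, 50),     # integers from 10 to 49
    "max_depth": [3, 5, None],
    "min_samples_leaf": randint(1, 5),   # integers from 1 to 4
}

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=42),
    param_distributions,
    n_iter=5,          # evaluate only 5 sampled combinations
    cv=3,
    scoring="f1",      # F1 for the positive class instead of accuracy
    random_state=42,
)
search.fit(X_demo, y_demo)

print(search.best_params_)
print(f"Best mean F1: {search.best_score_:.4f}")
```

The cost is fixed by `n_iter` rather than by the size of the grid, which is what makes this approach practical when many hyperparameters are tuned at once.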