# 🏦 Bank Customer Churn Prediction
## Notebook 7: Production Inference Module

**Goal:** Demonstrate the `CustomerChurn` class from `BankChurn_Module.py`, a clean, reusable Python module that encapsulates the entire inference pipeline.

### Why package the inference pipeline into a class?

When deploying a machine learning model in the real world (e.g., as an API, a scheduled job, or an embedded tool), you need a consistent, reproducible way to:
1. Load the model and scaler.
2. Accept raw input data.
3. Apply the *exact same* preprocessing as during training.
4. Return predictions in a human-readable format.

Putting all of this in a well-documented class makes the code:
- **Reusable**: import and use it from any script or application.
- **Maintainable**: one place to update if the preprocessing logic changes.
- **Testable**: each method can be unit-tested independently.
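To make the **Testable** point concrete, here is a minimal sketch of how one preprocessing step could be pulled out into a pure function and unit-tested on its own. The function `drop_unused_columns` and both tests are hypothetical, not taken from `BankChurn_Module.py`. The drop list is an assumption inferred by comparing the raw CSV columns with the model's expected features printed later in this notebook, and `errors="ignore"` mirrors the behaviour described for the `Exited` column.

```python
import pandas as pd

# Hypothetical drop list, inferred from the raw columns that are absent
# from FEATURE_COLUMNS (identifiers plus the label and the leakage column).
COLUMNS_TO_DROP = ["RowNumber", "CustomerId", "Surname", "Exited", "Complain"]


def drop_unused_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy without identifier/leakage columns; absent ones are skipped."""
    return df.drop(columns=COLUMNS_TO_DROP, errors="ignore")


def test_drops_identifiers_and_label():
    raw = pd.DataFrame({"CustomerId": [1], "Surname": ["Hill"],
                        "Age": [41], "Exited": [0]})
    assert list(drop_unused_columns(raw).columns) == ["Age"]


def test_tolerates_missing_label():
    # Production input has no 'Exited' column; dropping must not raise.
    raw = pd.DataFrame({"CustomerId": [2], "Age": [39]})
    assert list(drop_unused_columns(raw).columns) == ["Age"]


if __name__ == "__main__":
    test_drops_identifiers_and_label()
    test_tolerates_missing_label()
    print("all tests passed")
```

Saved in a file such as `test_preprocessing.py`, the two `test_*` functions would also be discovered automatically by `pytest`.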

In [1]:
# Import the module we built (must be in the same directory or on the Python path).
# CustomScaler is imported too, presumably so pickle can resolve the class when the scaler is unpickled.
from BankChurn_Module import CustomerChurn, CustomScaler

print('BankChurn_Module imported successfully ✓')

BankChurn_Module imported successfully ✓


## 1. Module Architecture Overview

```
BankChurn_Module.py
│
├── class CustomScaler
│     ├── __init__()    : store columns list + StandardScaler
│     ├── fit()         : learn mean/std from training data
│     └── transform()   : apply scaling, preserve column order
│
└── class CustomerChurn
      ├── __init__()              : load model_file.pkl + Scaler_file.pkl
      ├── load_and_clean_data()   : read CSV → drop cols → scale → encode → reindex
      └── predict_churn()         : run model → attach predictions → return DataFrame
```

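The design follows the **Single Responsibility Principle**: each class and method owns a single stage of the pipeline (loading, scaling, preprocessing, or prediction), so a change to one stage touches only one place.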
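The `CustomScaler` contract in the diagram (scale a chosen subset of columns, learn statistics in `fit()`, reuse them in `transform()`) can be sketched as follows. This is an illustrative stand-in, not the module's code: the class name `ColumnScaler` is hypothetical, and to stay dependency-light it computes the mean and population standard deviation with pandas instead of wrapping `StandardScaler`, which applies the same formula z = (x - mean) / std.

```python
import pandas as pd


class ColumnScaler:
    """Hypothetical stand-in for CustomScaler: standardize only the listed columns."""

    def __init__(self, columns):
        self.columns = list(columns)
        self.mean_ = None
        self.scale_ = None

    def fit(self, X: pd.DataFrame) -> "ColumnScaler":
        # Learn per-column mean and population std (ddof=0, the formula
        # StandardScaler uses); a zero std becomes 1 to avoid division by zero.
        self.mean_ = X[self.columns].mean()
        self.scale_ = X[self.columns].std(ddof=0).replace(0, 1.0)
        return self

    def transform(self, X: pd.DataFrame) -> pd.DataFrame:
        if self.mean_ is None:
            raise RuntimeError("fit() must be called before transform()")
        out = X.copy()
        for col in self.columns:
            # Reuse the statistics learned in fit(); never re-fit on new data.
            out[col] = (X[col] - self.mean_[col]) / self.scale_[col]
        return out  # same columns, same order as the input


train = pd.DataFrame({"Age": [30, 40, 50], "HasCrCard": [1, 0, 1]})
scaler = ColumnScaler(columns=["Age"]).fit(train)

new_batch = pd.DataFrame({"Age": [40, 50], "HasCrCard": [1, 0]})
scaled = scaler.transform(new_batch)
print(scaled)
```

Because `transform()` only reads the statistics stored by `fit()`, new customers are scaled with the training mean and standard deviation, which is the "transform only, no re-fitting" rule listed in Section 3.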

## 2. Instantiate the Model

Creating a `CustomerChurn` object loads both the model and the scaler into memory.  
This is done once and the object is then reused for all predictions.

In [2]:
# Instantiate: loads model_file.pkl and Scaler_file.pkl
churn_predictor = CustomerChurn(
    model_file  = 'model_file.pkl',
    scaler_file = 'Scaler_file.pkl'
)

print('Model type   :', type(churn_predictor.model_selected))
print('Scaler type  :', type(churn_predictor.scaler_selected))
print('Ready for inference ✓')

Model type   : <class 'sklearn.ensemble._forest.RandomForestClassifier'>
Scaler type  : <class 'BankChurn_Module.CustomScaler'>
Ready for inference ✓
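Loading the artifacts once in `__init__` might look roughly like the sketch below. The class name `ArtifactPredictor` is hypothetical; only the attribute names `model_selected` and `scaler_selected` come from the cell above, and the dictionaries pickled here are placeholders for the real model and scaler.

```python
import os
import pickle
import tempfile


class ArtifactPredictor:
    """Hypothetical sketch: unpickle the model and scaler once, then reuse them."""

    def __init__(self, model_file: str, scaler_file: str):
        # Pickle files are binary, so they are opened in 'rb' mode.
        with open(model_file, "rb") as f:
            self.model_selected = pickle.load(f)
        with open(scaler_file, "rb") as f:
            self.scaler_selected = pickle.load(f)


# Usage with placeholder objects written to a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    model_path = os.path.join(tmp, "model_file.pkl")
    scaler_path = os.path.join(tmp, "Scaler_file.pkl")
    with open(model_path, "wb") as f:
        pickle.dump({"artifact": "model"}, f)
    with open(scaler_path, "wb") as f:
        pickle.dump({"artifact": "scaler"}, f)

    predictor = ArtifactPredictor(model_path, scaler_path)
    print(predictor.model_selected, predictor.scaler_selected)
```

Two consequences of using pickle are worth keeping in mind: unpickling can execute arbitrary code, so only load files you created or trust, and an instance of a custom class such as `CustomScaler` can only be restored if that class is importable under the module path it had when it was pickled.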


## 3. Load and Preprocess New Data

`load_and_clean_data()` mirrors the training pipeline steps:
1. Read raw CSV.
2. Drop identifier/leakage columns.
3. Scale numerical features using the **pre-fitted** scaler (transform only ‚Äî no re-fitting).
4. One-hot encode categorical features.
5. Reindex to the model's expected feature order.
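Steps 4 and 5 are where new batches most often break, because a batch may not contain every category seen during training. One robust approach, sketched below with toy, already-scaled values, is to one-hot encode every level and let `reindex` keep exactly the training columns. `FEATURE_COLUMNS` is copied from the list printed in the cell below; the function name `encode_and_align` is hypothetical, and this is not necessarily how `load_and_clean_data()` implements these steps.

```python
import pandas as pd

# Copied from the FEATURE_COLUMNS printed in this notebook.
FEATURE_COLUMNS = [
    'HasCrCard', 'IsActiveMember', 'CreditScore', 'Age', 'Tenure', 'Balance',
    'NumOfProducts', 'EstimatedSalary', 'Satisfaction Score', 'Point Earned',
    'Geography_Germany', 'Geography_Spain', 'Gender_Male',
    'Card Type_GOLD', 'Card Type_PLATINUM', 'Card Type_SILVER',
]
CATEGORICAL = ['Geography', 'Gender', 'Card Type']


def encode_and_align(df: pd.DataFrame) -> pd.DataFrame:
    """One-hot encode every level, then keep exactly the training columns.

    reindex drops columns the model never saw (e.g. baseline levels such as
    Geography_France or Gender_Female) and adds missing levels as all-zero columns.
    """
    encoded = pd.get_dummies(df, columns=CATEGORICAL, dtype=int)
    return encoded.reindex(columns=FEATURE_COLUMNS, fill_value=0)


# Toy batch (numeric values already scaled): only Spain and GOLD appear.
batch = pd.DataFrame({
    'HasCrCard': [1, 0], 'IsActiveMember': [1, 1],
    'CreditScore': [0.1, -0.2], 'Age': [0.3, 0.2], 'Tenure': [-1.0, 0.5],
    'Balance': [0.0, 1.1], 'NumOfProducts': [-0.9, 0.8],
    'EstimatedSalary': [0.0, 0.2], 'Satisfaction Score': [0.0, 1.4],
    'Point Earned': [-0.6, -1.1],
    'Geography': ['Spain', 'Spain'], 'Gender': ['Female', 'Male'],
    'Card Type': ['GOLD', 'GOLD'],
})

X = encode_and_align(batch)
print(X[['Geography_Germany', 'Geography_Spain', 'Gender_Male', 'Card Type_GOLD']])
```

Encoding this batch with `drop_first=True` instead would be risky: with only Spain present, pandas would drop the Spain column itself, and after reindexing both customers would look like France-based customers. Reindexing against the fixed training column list avoids that.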

In [3]:
# For this demo we use the original CSV (which contains 'Exited').
# In production, the input CSV would be new customers WITHOUT the 'Exited' column.
# The module will silently skip dropping 'Exited' if it's absent (errors='ignore').

preprocessed_data = churn_predictor.load_and_clean_data('Customer-Churn-Records.csv')

print(f'Preprocessed features shape: {preprocessed_data.shape}')
print(f'Expected columns: {churn_predictor.FEATURE_COLUMNS}')
print()
preprocessed_data.head()

Preprocessed features shape: (10000, 16)
Expected columns: ['HasCrCard', 'IsActiveMember', 'CreditScore', 'Age', 'Tenure', 'Balance', 'NumOfProducts', 'EstimatedSalary', 'Satisfaction Score', 'Point Earned', 'Geography_Germany', 'Geography_Spain', 'Gender_Male', 'Card Type_GOLD', 'Card Type_PLATINUM', 'Card Type_SILVER']



| | HasCrCard | IsActiveMember | CreditScore | Age | Tenure | Balance | NumOfProducts | EstimatedSalary | Satisfaction Score | Point Earned | Geography_Germany | Geography_Spain | Gender_Male | Card Type_GOLD | Card Type_PLATINUM | Card Type_SILVER |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | -0.326221 | 0.293517 | -1.04176 | -1.225848 | -0.911583 | 0.021886 | -0.72113 | -0.630839 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 1 | -0.440036 | 0.198164 | -1.387538 | 0.11735 | -0.911583 | 0.216534 | -0.009816 | -0.666251 | 0 | 1 | 0 | 0 | 0 | 0 |
| 2 | 1 | 0 | -1.536794 | 0.293517 | 1.032908 | 1.333053 | 2.527057 | 0.240687 | -0.009816 | -1.015942 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 0 | 0 | 0.501521 | 0.007457 | -1.387538 | -1.225848 | 0.807737 | -0.108918 | 1.412812 | -1.135457 | 0 | 0 | 0 | 1 | 0 | 0 |
| 4 | 1 | 1 | 2.063884 | 0.388871 | -1.04176 | 0.785728 | -0.911583 | -0.365276 | 1.412812 | -0.803472 | 0 | 1 | 0 | 1 | 0 | 0 |


## 4. Generate Predictions

In [4]:
results = churn_predictor.predict_churn()

print(f'Output DataFrame shape: {results.shape}')
print(f'Columns: {results.columns.tolist()}')
print()
# Show a human-readable view: key identifiers + prediction
display_cols = ['CustomerId', 'Surname', 'Geography', 'Gender', 'Age',
                'Balance', 'NumOfProducts', 'IsActiveMember', 'Predicted_Exited']
# Only include columns that exist in results
display_cols = [c for c in display_cols if c in results.columns]
results[display_cols].head(10)

Output DataFrame shape: (10000, 19)
Columns: ['RowNumber', 'CustomerId', 'Surname', 'CreditScore', 'Geography', 'Gender', 'Age', 'Tenure', 'Balance', 'NumOfProducts', 'HasCrCard', 'IsActiveMember', 'EstimatedSalary', 'Exited', 'Complain', 'Satisfaction Score', 'Card Type', 'Point Earned', 'Predicted_Exited']



| | CustomerId | Surname | Geography | Gender | Age | Balance | NumOfProducts | IsActiveMember | Predicted_Exited |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 15634602 | Hargrave | France | Female | 42 | 0.0 | 1 | 1 | 1 |
| 1 | 15647311 | Hill | Spain | Female | 41 | 83807.86 | 1 | 1 | 0 |
| 2 | 15619304 | Onio | France | Female | 42 | 159660.8 | 3 | 0 | 1 |
| 3 | 15701354 | Boni | France | Female | 39 | 0.0 | 2 | 0 | 0 |
| 4 | 15737888 | Mitchell | Spain | Female | 43 | 125510.82 | 1 | 1 | 0 |
| 5 | 15574012 | Chu | Spain | Male | 44 | 113755.78 | 2 | 0 | 1 |
| 6 | 15592531 | Bartlett | France | Male | 50 | 0.0 | 2 | 1 | 0 |
| 7 | 15656148 | Obinna | Germany | Female | 29 | 115046.74 | 4 | 0 | 1 |
| 8 | 15792365 | He | France | Male | 44 | 142051.07 | 2 | 1 | 0 |
| 9 | 15592389 | H? | France | Male | 27 | 134603.88 | 1 | 1 | 0 |


In [5]:
# Prediction distribution
pred_counts = results['Predicted_Exited'].value_counts()
print('Prediction distribution:')
print(f'  Predicted to STAY   : {pred_counts.get(0, 0):,}  ({pred_counts.get(0, 0)/len(results)*100:.1f}%)')
print(f'  Predicted to CHURN  : {pred_counts.get(1, 0):,}  ({pred_counts.get(1, 0)/len(results)*100:.1f}%)')

Prediction distribution:
  Predicted to STAY   : 7,962  (79.6%)
  Predicted to CHURN  : 2,038  (20.4%)
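The "run model → attach predictions → return DataFrame" stage of `predict_churn()` can be sketched as below. A hypothetical stub stands in for the fitted random forest (any object with a scikit-learn-style `.predict(X)` method would do), the toy rows are made up, and the helper name `attach_predictions` is an assumption. The final summary mirrors the distribution cell above.

```python
import pandas as pd


class StubModel:
    """Hypothetical stand-in for the fitted classifier: flags scaled Age > 0.5."""

    def predict(self, X: pd.DataFrame):
        return (X["Age"] > 0.5).astype(int).to_numpy()


def attach_predictions(raw: pd.DataFrame, features: pd.DataFrame, model) -> pd.DataFrame:
    """Append model predictions to the human-readable raw rows.

    Assumes `features` was derived from `raw` without reordering or dropping
    rows, because the prediction array is assigned by position.
    """
    if len(raw) != len(features):
        raise ValueError("raw and features must have the same number of rows")
    out = raw.copy()
    out["Predicted_Exited"] = model.predict(features)
    return out


raw = pd.DataFrame({"CustomerId": [101, 102, 103],
                    "Surname": ["A", "B", "C"],
                    "Age": [55, 30, 44]})
features = pd.DataFrame({"Age": [0.9, -0.4, 0.3]})  # toy scaled values

results = attach_predictions(raw, features, StubModel())
shares = (results["Predicted_Exited"]
          .value_counts(normalize=True)
          .reindex([0, 1], fill_value=0))
print(results)
print(shares)
```

Returning the raw columns alongside the prediction is what makes the output usable by non-technical consumers; the scaled feature matrix is an internal detail.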


## 5. Validation: Compare Predictions to Actual Labels

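Since our demo CSV includes the actual `Exited` column, we can compare the predictions with the true labels. Keep in mind that these are the same customers the training data was built from, so the score is in-sample: a perfect result most likely means the model saw these rows during training. It confirms that the inference pipeline reproduces the training-time preprocessing, but it does not estimate performance on new customers; that requires a held-out test set.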

In [6]:
from sklearn.metrics import accuracy_score, classification_report

if 'Exited' in results.columns:
    acc = accuracy_score(results['Exited'], results['Predicted_Exited'])
    print(f'Validation accuracy on full dataset: {acc:.4f}')
    print()
    print(classification_report(results['Exited'], results['Predicted_Exited'],
                                 target_names=['Stayed', 'Churned']))
else:
    print('No actual labels available for comparison (production mode).')

Validation accuracy on full dataset: 1.0000

              precision    recall  f1-score   support

      Stayed       1.00      1.00      1.00      7962
     Churned       1.00      1.00      1.00      2038

    accuracy                           1.00     10000
   macro avg       1.00      1.00      1.00     10000
weighted avg       1.00      1.00      1.00     10000
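`classification_report` summarizes a 2×2 confusion table. Cross-tabulating actual against predicted labels shows where any errors fall; with the perfect score above, both off-diagonal cells would be zero. The sketch uses made-up labels so the arithmetic is visible: precision for the churn class is TP / (TP + FP) and recall is TP / (TP + FN).

```python
import pandas as pd

# Toy labels (1 = churned); in the notebook these would be
# results['Exited'] and results['Predicted_Exited'].
actual = pd.Series([0, 0, 1, 1, 1], name="Actual")
predicted = pd.Series([0, 1, 1, 0, 1], name="Predicted")

table = pd.crosstab(actual, predicted)
print(table)

tp = table.loc[1, 1]  # churners correctly flagged
fp = table.loc[0, 1]  # stayers wrongly flagged
fn = table.loc[1, 0]  # churners missed
precision = tp / (tp + fp)
recall = tp / (tp + fn)
print(f"churn precision={precision:.2f}, recall={recall:.2f}")
```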



## 6. Export Predictions

The output DataFrame can be saved to a CSV for downstream use: uploading to a CRM, scheduling retention campaigns, or further analysis.

In [7]:
output_path = 'churn_predictions_output.csv'
results.to_csv(output_path, index=False)
print(f'✅ Predictions saved to  {output_path}')
print(f'   Rows: {len(results):,}  |  Columns: {len(results.columns)}')

✅ Predictions saved to  churn_predictions_output.csv
   Rows: 10,000  |  Columns: 19
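For a retention campaign you may only want the customers flagged as likely to churn. A minimal sketch, assuming a hypothetical helper `export_flagged`, toy rows, and `CustomerId`/`Surname` as the identifier columns a CRM would need:

```python
import os
import tempfile

import pandas as pd


def export_flagged(results: pd.DataFrame, path: str,
                   id_cols=("CustomerId", "Surname")) -> pd.DataFrame:
    """Write only rows predicted to churn, keeping identifiers plus the prediction."""
    flagged = results.loc[results["Predicted_Exited"] == 1,
                          [*id_cols, "Predicted_Exited"]]
    flagged.to_csv(path, index=False)
    return flagged


results = pd.DataFrame({"CustomerId": [101, 102, 103],
                        "Surname": ["A", "B", "C"],
                        "Predicted_Exited": [1, 0, 1]})

with tempfile.TemporaryDirectory() as tmp:
    out_path = os.path.join(tmp, "churn_flagged.csv")
    flagged = export_flagged(results, out_path)
    reloaded = pd.read_csv(out_path)

print(reloaded)
```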


---
## ✅ Project Complete: End-to-End Summary

| Notebook | Task | Key Output |
|---|---|---|
| N1 | Data upload & first look | Dataset shape, dtypes, data dictionary |
| N2 | EDA | Distributions, correlations, leakage finding |
| N3 | Data cleaning | `df_cleaned.csv` (14 cols, 10K rows) |
| N4 | Feature engineering | `data_processed.csv`, `Scaler_file.pkl` |
| N5 | Model training & selection | Random Forest selected |
| N6 | Final model saving | `model_file.pkl` |
| **N7** | **Inference module** | **`churn_predictions_output.csv`** |

### Possible Next Steps
- **Hyperparameter tuning**: `GridSearchCV` on `n_estimators`, `max_depth`, `min_samples_leaf`.
- **SHAP values**: explain individual predictions ("why did customer X get flagged?").
- **Flask/FastAPI deployment**: wrap the `CustomerChurn` class in an HTTP endpoint.
- **Monitoring**: track prediction drift over time as new customer data arrives.