# Day 20 – Model Serialization
### Turning a trained ML model into a deployable service

In this notebook, we’ll:
- Load our tuned RandomForest churn model
- Serialize it using Joblib
- Save preprocessing objects (such as the fitted `StandardScaler`)
- Create a minimal FastAPI-ready prediction function

In [3]:
import pandas as pd
import joblib
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report
import numpy as np

## 1. Load and Preprocess Dataset

In [7]:
url = 'https://raw.githubusercontent.com/IBM/telco-customer-churn-on-icp4d/master/data/Telco-Customer-Churn.csv'
df = pd.read_csv(url)

df['TotalCharges'] = pd.to_numeric(df['TotalCharges'], errors='coerce')
df.dropna(inplace=True)
df['Churn'] = (df['Churn'] == 'Yes').astype(int)

# Feature selection (numerical + categorical)
numerical_features = ['tenure', 'MonthlyCharges', 'TotalCharges']
categorical_features = ['Contract', 'InternetService', 'OnlineSecurity', 'TechSupport', 'PaymentMethod']

X_categorical = pd.get_dummies(df[categorical_features], drop_first=True)
X = pd.concat([df[numerical_features], X_categorical], axis=1)
y = df['Churn']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42, stratify=y)

# Standardize all feature columns (the scaler is fit on the numeric and one-hot columns together)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
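
`scaler.fit_transform(X_train)` standardizes every column, including the one-hot dummies. If only the three numeric features should be scaled, one option is scikit-learn's `ColumnTransformer`. The sketch below uses a small made-up frame (`toy`) in place of `X_train`; the variable names are illustrative and not part of the notebook.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler

# Toy frame standing in for X_train: three numeric columns plus one dummy column.
toy = pd.DataFrame({
    "tenure": [1, 12, 24, 60],
    "MonthlyCharges": [20.0, 70.0, 55.5, 99.9],
    "TotalCharges": [20.0, 840.0, 1332.0, 5994.0],
    "Contract_One year": [0, 1, 0, 1],
})
numeric_cols = ["tenure", "MonthlyCharges", "TotalCharges"]

# Scale only the numeric columns; remainder="passthrough" keeps the rest as-is.
numeric_scaler = ColumnTransformer(
    transformers=[("num", StandardScaler(), numeric_cols)],
    remainder="passthrough",
)
scaled = numeric_scaler.fit_transform(toy)

# Output column order: the transformed columns first, then the passthrough ones.
print(scaled[:, :3].mean(axis=0))  # each numeric column is centered near 0
print(scaled[:, 3])                # the dummy column is left untouched
```

A fitted `ColumnTransformer` can be saved with `joblib.dump` in the same way as the plain scaler.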

## 2. Train and Save the Model

In [3]:
model = RandomForestClassifier(n_estimators=100, random_state=42, class_weight='balanced')
model.fit(X_train_scaled, y_train)
y_pred = model.predict(X_test_scaled)
print(classification_report(y_test, y_pred))

# Save model and scaler
joblib.dump(model, 'random_forest_tuned.pkl')
joblib.dump(scaler, 'scaler.pkl')
print('✅ Model and scaler saved successfully!')

              precision    recall  f1-score   support

           0       0.83      0.88      0.85      1291
           1       0.60      0.50      0.55       467

    accuracy                           0.78      1758
   macro avg       0.71      0.69      0.70      1758
weighted avg       0.77      0.78      0.77      1758

✅ Model and scaler saved successfully!
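
A quick way to check a saved artifact is to reload it and compare predictions before and after. The sketch below also stores the feature column list next to the model and scaler, so that inference code does not depend on a notebook variable such as `X`. The data is synthetic, and the bundle layout and the file name `churn_bundle.pkl` are assumptions made for illustration.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler

# Tiny synthetic data standing in for the churn features (not the real dataset).
rng = np.random.default_rng(42)
X_toy = rng.normal(size=(40, 3))
y_toy = (X_toy[:, 0] > 0).astype(int)
feature_columns = ["tenure", "MonthlyCharges", "TotalCharges"]

toy_scaler = StandardScaler().fit(X_toy)
toy_model = RandomForestClassifier(n_estimators=10, random_state=42)
toy_model.fit(toy_scaler.transform(X_toy), y_toy)

# Hypothetical bundle layout: one dict holding everything inference needs.
bundle = {"model": toy_model, "scaler": toy_scaler, "columns": feature_columns}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "churn_bundle.pkl")
    joblib.dump(bundle, path)
    restored = joblib.load(path)

# The reloaded objects should reproduce the original predictions exactly.
before = toy_model.predict(toy_scaler.transform(X_toy))
after = restored["model"].predict(restored["scaler"].transform(X_toy))
round_trip_ok = bool(np.array_equal(before, after))
print("Round trip preserved predictions:", round_trip_ok)
```

Joblib files are pickle-based, so they should only be loaded from trusted sources and ideally with the same scikit-learn version that wrote them.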


## 3. Create a Minimal Prediction Function (FastAPI-ready)

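- The prediction function below loads the model and scaler from Day 18 (`../Day-18/`), where the hyperparameters were tuned.
- To use the model trained in this notebook instead, point the two `joblib.load` calls at the `random_forest_tuned.pkl` and `scaler.pkl` files saved in Section 2.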

In [9]:
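def predict_churn(sample_dict):
    """Return 1 (churn) or 0 (retained) for one customer given as a dict."""
    # Load the serialized model and scaler (the tuned Day-18 artifacts)
    model = joblib.load('../Day-18/random_forest_tuned.pkl')
    scaler = joblib.load('../Day-18/scaler.pkl')

    sample = pd.DataFrame([sample_dict])
    # Encode every level present: on a single row, drop_first=True would drop
    # the only level of each feature and leave no dummy columns at all.
    X_cat = pd.get_dummies(sample[categorical_features])
    X_num = sample[numerical_features]
    X_final = pd.concat([X_num, X_cat], axis=1)

    # Align to the training columns: baseline levels are dropped, absent dummies become 0
    X_final = X_final.reindex(columns=X.columns, fill_value=0)
    X_scaled = scaler.transform(X_final)
    pred = model.predict(X_scaled)[0]
    return int(pred)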

sample_input = {
    'tenure': 12,
    'MonthlyCharges': 70,
    'TotalCharges': 840,
    'Contract': 'Month-to-month',
    'InternetService': 'Fiber optic',
    'OnlineSecurity': 'No',
    'TechSupport': 'No',
    'PaymentMethod': 'Electronic check'
}

print('Prediction (1 = Churn, 0 = Retained):', predict_churn(sample_input))

Prediction (1 = Churn, 0 = Retained): 0
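
One detail to watch when encoding a single request: `pd.get_dummies(..., drop_first=True)` drops the first level *present in the data it is given*. On a one-row frame that is the only level, so no dummy columns remain and a later `reindex` fills every dummy with 0. Encoding all levels and then aligning to the training columns avoids this. The sketch below uses an illustrative subset of columns (`train_columns` is made up for the example, with `Month-to-month` as the dropped baseline).

```python
import pandas as pd

# Columns as they would look after get_dummies(..., drop_first=True) on training data
# (illustrative subset; 'Month-to-month' is the dropped baseline level).
train_columns = ["tenure", "Contract_One year", "Contract_Two year"]

row = pd.DataFrame([{"tenure": 12, "Contract": "Two year"}])

# With a single row, drop_first=True removes the only level: no dummy columns remain.
dropped = pd.get_dummies(row[["Contract"]], drop_first=True)
print(dropped.shape)

# Encoding all levels and then reindexing keeps the information.
encoded = pd.get_dummies(row[["Contract"]], dtype=int)
aligned = pd.concat([row[["tenure"]], encoded], axis=1)
aligned = aligned.reindex(columns=train_columns, fill_value=0)
print(aligned.to_dict("records"))
```

The same alignment step also discards dummy columns for categories the model never saw during training.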


✅ **Output:**
- `random_forest_tuned.pkl`
- `scaler.pkl`
- FastAPI-ready predict function
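
When a prediction function is called once per request, reloading the model from disk on every call adds avoidable latency. A minimal sketch of one way around this is to cache the loaded object with `functools.lru_cache`; the helper name `load_artifact` and the demo file are hypothetical.

```python
import os
import tempfile
from functools import lru_cache

import joblib


@lru_cache(maxsize=None)
def load_artifact(path):
    """Load a joblib file once per path; later calls return the cached object."""
    return joblib.load(path)


# Demonstration with a throwaway object in place of a real model.
tmp_dir = tempfile.mkdtemp()
demo_path = os.path.join(tmp_dir, "demo.pkl")
joblib.dump({"name": "stand-in for a model"}, demo_path)

first = load_artifact(demo_path)
second = load_artifact(demo_path)  # served from the cache, no disk read
print(first is second)
```

The trade-off is that a replaced file on disk is not picked up until `load_artifact.cache_clear()` is called or the process restarts.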

**Deliverable:** `day20_model_serialization.ipynb`

# Day 21 – End-to-End ML Pipeline
### From Raw Data → Prediction → Logging

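We’ll now combine all the parts into a complete, automated churn prediction pipeline. The cells below reuse `categorical_features`, `numerical_features`, `X`, and `sample_input` as defined in the Day 20 notebook.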

In [13]:
import pandas as pd
import joblib, logging
from datetime import datetime
from sklearn.metrics import accuracy_score

logging.basicConfig(filename='pipeline_logs.log', level=logging.INFO)
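
`logging.basicConfig` has no effect if the root logger already has handlers, which can happen when a cell is re-run or another library configured logging first. A dedicated logger with its own file handler is one alternative, and a format string lets the logging module add the timestamp itself. The helper `make_pipeline_logger` and the logger name `churn_pipeline` below are assumptions for illustration.

```python
import logging
import os
import tempfile


def make_pipeline_logger(log_path):
    """Return a logger writing timestamped INFO lines to log_path (hypothetical helper)."""
    logger = logging.getLogger("churn_pipeline")
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid duplicate handlers when a cell is re-run
        handler = logging.FileHandler(log_path)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger


# Write to a temporary location so the example leaves no file behind in the project.
log_path = os.path.join(tempfile.mkdtemp(), "pipeline_logs.log")
logger = make_pipeline_logger(log_path)
logger.info("Run successful with %d rows.", 50)

with open(log_path) as f:
    log_text = f.read()
print(log_text)
```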

## 1. Load Serialized Model & Scaler

In [11]:
model = joblib.load('../Day-18/random_forest_tuned.pkl')
scaler = joblib.load('../Day-18/scaler.pkl')
print('✅ Loaded serialized model and scaler')

✅ Loaded serialized model and scaler


## 2. Define Automated Pipeline Function

In [10]:
def run_pipeline(new_data):
    """Score a DataFrame of raw customer rows and log the run."""
    result = new_data.copy()  # leave the caller's DataFrame unchanged
    # Encode all levels, then align to the training columns (absent dummies become 0)
    X_cat = pd.get_dummies(result[categorical_features])
    X_final = pd.concat([result[numerical_features], X_cat], axis=1)
    X_final = X_final.reindex(columns=X.columns, fill_value=0)
    X_scaled = scaler.transform(X_final)
    result['PredictedChurn'] = model.predict(X_scaled)
    logging.info(f"Run successful at {datetime.now()} with {len(result)} rows.")
    return result
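
A pipeline that receives data from outside the notebook benefits from failing early with a clear message when a column is missing, instead of raising a `KeyError` midway through. The check below is a minimal sketch; the helper name `check_required_columns` is hypothetical, and the column list is the union of `numerical_features` and `categorical_features` from Day 20.

```python
import pandas as pd

# Union of numerical_features and categorical_features from the Day 20 notebook.
REQUIRED_COLUMNS = [
    "tenure", "MonthlyCharges", "TotalCharges",
    "Contract", "InternetService", "OnlineSecurity", "TechSupport", "PaymentMethod",
]


def check_required_columns(frame, required=REQUIRED_COLUMNS):
    """Raise ValueError naming any required column that frame lacks."""
    missing = [col for col in required if col not in frame.columns]
    if missing:
        raise ValueError(f"Missing required columns: {missing}")
    return frame


# A deliberately incomplete input to show the error message.
incomplete = pd.DataFrame([{"tenure": 12, "MonthlyCharges": 70}])
try:
    check_required_columns(incomplete)
except ValueError as err:
    message = str(err)
    print(message)
```

Calling such a check as the first line of a pipeline function keeps the error close to its cause.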

## 3. Run Pipeline on New Data

In [14]:
sample_df = pd.DataFrame([sample_input])
output_df = run_pipeline(sample_df)
output_df

|   | tenure | MonthlyCharges | TotalCharges | Contract | InternetService | OnlineSecurity | TechSupport | PaymentMethod | PredictedChurn |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 12 | 70 | 840 | Month-to-month | Fiber optic | No | No | Electronic check | 0 |


## 4. Batch Predictions & Save Outputs

In [15]:
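# Reuse the original dataset as a stand-in for a new batch of customers
df_test = pd.read_csv('https://raw.githubusercontent.com/IBM/telco-customer-churn-on-icp4d/master/data/Telco-Customer-Churn.csv')
df_test['TotalCharges'] = pd.to_numeric(df_test['TotalCharges'], errors='coerce')
df_test.dropna(inplace=True)
# 50 random rows; these may overlap with rows the model was trained on
df_test = df_test.sample(50, random_state=42)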

predictions = run_pipeline(df_test)
predictions.to_csv('predictions.csv', index=False)
print('✅ Batch predictions saved to predictions.csv')

✅ Batch predictions saved to predictions.csv
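
Because the batch is read from the labelled dataset, the output keeps the original `Churn` column next to `PredictedChurn`, so the two can be compared with `accuracy_score`. The sketch below uses a few made-up rows shaped like `predictions.csv`, not the real file.

```python
import pandas as pd
from sklearn.metrics import accuracy_score

# Made-up rows shaped like predictions.csv: raw 'Churn' labels plus 'PredictedChurn'.
scored = pd.DataFrame({
    "Churn": ["No", "Yes", "No", "Yes", "No"],
    "PredictedChurn": [0, 1, 0, 0, 0],
})

# Map the labels the same way as in training ('Yes' -> 1, anything else -> 0).
actual = (scored["Churn"] == "Yes").astype(int)
batch_accuracy = accuracy_score(actual, scored["PredictedChurn"])
print(f"Batch accuracy: {batch_accuracy:.2f}")
```

On the real batch such a score is likely optimistic, since the 50 rows are sampled from the same file the model was trained on.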


✅ **Outputs:**
- `pipeline_logs.log`
- `predictions.csv`

**Deliverable:** `day21_pipeline_automation.ipynb`