# Customer Churn Prediction Demo

This notebook demonstrates how to use a trained artificial neural network (ANN) model to predict customer churn using new customer data. The notebook loads the pre-trained model and preprocessors, prepares input data, and makes predictions.

## What this notebook covers:
- Loading the trained model and preprocessors
- Preparing input data for prediction
- Encoding categorical variables
- Scaling features
- Making predictions and interpreting results

---

## 1. Import Required Libraries

First, we'll import all the necessary libraries for loading the model and making predictions.

In [6]:
# Import necessary libraries for model loading and prediction
import tensorflow as tf
from tensorflow.keras.models import load_model
import pickle
import pandas as pd
import numpy as np
import os

# Set random seeds for reproducibility
np.random.seed(42)
tf.random.set_seed(42)

print("Libraries imported successfully!")
print(f"TensorFlow version: {tf.__version__}")
print(f"Pandas version: {pd.__version__}")
print(f"NumPy version: {np.__version__}")

Libraries imported successfully!
TensorFlow version: 2.15.0
Pandas version: 2.3.1
NumPy version: 1.26.4


## 2. Load Trained Model and Preprocessors

Now we'll load the pre-trained ANN model and all the preprocessors (encoders and scaler) that were saved during training. These files are expected in the `PickelFiles/` directory, one level above this notebook (`../PickelFiles/`).

In [7]:
# Define the path to the PickelFiles directory
PICKLE_DIR = "../PickelFiles/"

# Try to load the trained model (prefer .keras format, fall back to .h5)
try:
    model_path = os.path.join(PICKLE_DIR, "model.keras")
    if os.path.exists(model_path):
        model = load_model(model_path)
        print("✓ Model loaded successfully from model.keras")
    else:
        model_path = os.path.join(PICKLE_DIR, "model.h5")
        model = load_model(model_path)
        print("✓ Model loaded successfully from model.h5")
except Exception as e:
    print(f"❌ Error loading model: {e}")
    raise

# Load the one-hot encoder for Geography
# Note: the variable is named label_encoder_geo, but it holds the one-hot encoder
geo_encoder_path = os.path.join(PICKLE_DIR, 'onehot_encoder_geo.pkl')
if os.path.exists(geo_encoder_path):
    try:
        with open(geo_encoder_path, 'rb') as file:
            label_encoder_geo = pickle.load(file)
        print("✓ Geography encoder loaded successfully")
    except Exception as e:
        print(f"❌ Error loading geography encoder: {e}")
        raise
else:
    # Fail fast: the geography encoding step cannot run without this encoder
    raise FileNotFoundError(
        f"Geography encoder file not found at: {geo_encoder_path}. "
        "Ensure the encoder was saved during training and the path is correct."
    )

# Load the label encoder for Gender
try:
    with open(os.path.join(PICKLE_DIR, 'label_encoder_gender.pkl'), 'rb') as file:
        label_encoder_gender = pickle.load(file)
    print("✓ Gender encoder loaded successfully")
except Exception as e:
    print(f"❌ Error loading gender encoder: {e}")
    raise

# Load the feature scaler
try:
    with open(os.path.join(PICKLE_DIR, 'scaler.pkl'), 'rb') as file:
        scaler = pickle.load(file)
    print("✓ Scaler loaded successfully")
except Exception as e:
    print(f"❌ Error loading scaler: {e}")
    raise

print("\n🎉 All model components loaded successfully!")

✓ Model loaded successfully from model.keras
✓ Geography encoder loaded successfully
✓ Gender encoder loaded successfully
✓ Scaler loaded successfully

🎉 All model components loaded successfully!
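
The encoder and scaler loads above all follow the same open, load, and report pattern. A minimal sketch of a shared helper is shown below. The name `load_pickle` is hypothetical and not part of the original notebook. It raises `FileNotFoundError` when a file is missing, so a later cell never runs against a missing preprocessor. Only unpickle files from a source you trust, since unpickling can execute arbitrary code.

```python
import pickle
from pathlib import Path


def load_pickle(path):
    """Load one pickled object, raising a clear error if the file is missing.

    `load_pickle` is a hypothetical helper, not part of the original notebook.
    """
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(f"Expected a pickle file at: {path}")
    with path.open("rb") as file:
        return pickle.load(file)
```

With this helper, each load could be written as a single line, for example `scaler = load_pickle(os.path.join(PICKLE_DIR, "scaler.pkl"))`.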


## 3. Prepare Input Data for Prediction

Let's create sample customer data to demonstrate how predictions work. In a real-world scenario, this data would come from your customer database or user input.

In [8]:
# Sample customer data for prediction
# In practice, this would come from user input or database query
input_data = {
    'CreditScore': 600,        # Customer's credit score (300-850)
    'Geography': 'France',     # Customer's country (France, Germany, Spain)
    'Gender': 'Male',          # Customer's gender (Male, Female)
    'Age': 40,                 # Customer's age in years
    'Tenure': 3,               # Number of years as bank customer
    'Balance': 60000,          # Account balance in currency units
    'NumOfProducts': 2,        # Number of bank products used (1-4)
    'HasCrCard': 1,            # Has credit card (1=Yes, 0=No)
    'IsActiveMember': 1,       # Is active member (1=Yes, 0=No)
    'EstimatedSalary': 50000   # Estimated annual salary
}

print("📊 Sample customer data:")
for key, value in input_data.items():
    print(f"  {key}: {value}")
    
# Convert to DataFrame for easier manipulation
input_df = pd.DataFrame([input_data])
print(f"\n📋 Input DataFrame shape: {input_df.shape}")
print("\n", input_df)

📊 Sample customer data:
  CreditScore: 600
  Geography: France
  Gender: Male
  Age: 40
  Tenure: 3
  Balance: 60000
  NumOfProducts: 2
  HasCrCard: 1
  IsActiveMember: 1
  EstimatedSalary: 50000

📋 Input DataFrame shape: (1, 10)

    CreditScore Geography Gender  Age  Tenure  Balance  NumOfProducts  \
0          600    France   Male   40       3    60000              2   

   HasCrCard  IsActiveMember  EstimatedSalary  
0          1               1            50000  
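
Because real input would come from a database or a user, it can help to validate a record before preprocessing it. The sketch below is one possible check. The allowed values and ranges are taken from the comments in the sample input above and are assumptions, not rules read from the model. The name `validate_input` is hypothetical.

```python
# Allowed values and ranges copied from the comments in the sample input.
# They are assumptions for illustration, not constraints stored in the model.
ALLOWED_CATEGORIES = {
    "Geography": {"France", "Germany", "Spain"},
    "Gender": {"Male", "Female"},
}
NUMERIC_RANGES = {
    "CreditScore": (300, 850),
    "NumOfProducts": (1, 4),
    "HasCrCard": (0, 1),
    "IsActiveMember": (0, 1),
}
REQUIRED_FIELDS = [
    "CreditScore", "Geography", "Gender", "Age", "Tenure", "Balance",
    "NumOfProducts", "HasCrCard", "IsActiveMember", "EstimatedSalary",
]


def validate_input(data):
    """Return a list of human-readable problems; an empty list means valid."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in data]
    for name, allowed in ALLOWED_CATEGORIES.items():
        if name in data and data[name] not in allowed:
            problems.append(f"{name}={data[name]!r} not in {sorted(allowed)}")
    for name, (low, high) in NUMERIC_RANGES.items():
        if name in data and not (low <= data[name] <= high):
            problems.append(f"{name}={data[name]} outside [{low}, {high}]")
    return problems


sample = {
    "CreditScore": 600, "Geography": "France", "Gender": "Male", "Age": 40,
    "Tenure": 3, "Balance": 60000, "NumOfProducts": 2, "HasCrCard": 1,
    "IsActiveMember": 1, "EstimatedSalary": 50000,
}
print(validate_input(sample))  # [] (no problems found)
```

Returning a list of problems rather than raising on the first one lets a caller report every issue with a record at once.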


## 4. Data Preprocessing

We need to apply the same preprocessing steps that were used during training:
1. **One-hot encode Geography**: Convert the country name to binary columns
2. **Label encode Gender**: Convert the categorical gender value to an integer using the label encoder
3. **Combine features**: Replace the original `Geography` column with its one-hot encoded columns
4. **Scale features**: Rescale all features using the scaler fitted during training

### 4.1 One-Hot Encode Geography

In [9]:
# One-hot encode the 'Geography' column using the trained encoder
try:
    # Transform the geography value to one-hot encoded format
    geo_encoded = label_encoder_geo.transform([[input_data['Geography']]]).toarray()
    
    # Create DataFrame with proper column names
    geo_encoded_df = pd.DataFrame(
        geo_encoded, 
        columns=label_encoder_geo.get_feature_names_out(['Geography'])
    )
    
    print("🌍 Geography one-hot encoding successful:")
    print(f"  Original: {input_data['Geography']}")
    print(f"  Encoded shape: {geo_encoded_df.shape}")
    print("\n📊 One-hot encoded geography:")
    print(geo_encoded_df)
    
except Exception as e:
    print(f"❌ Error encoding geography: {e}")
    print(f"   Make sure '{input_data['Geography']}' was in the training data")
    raise

🌍 Geography one-hot encoding successful:
  Original: France
  Encoded shape: (1, 3)

📊 One-hot encoded geography:
   Geography_France  Geography_Germany  Geography_Spain
0               1.0                0.0              0.0
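
To make the transformation concrete, here is a plain-Python sketch of what one-hot encoding produces for a single value. It is an illustration only, not the saved encoder's implementation. The function name `one_hot` is hypothetical, and the category order is copied from the encoded columns printed above.

```python
def one_hot(value, categories):
    """Return a one-hot list for `value` over the ordered `categories`."""
    if value not in categories:
        # Mirrors the failure mode the notebook warns about: a value that
        # was not seen during training cannot be encoded.
        raise ValueError(f"Unknown category: {value!r}")
    return [1.0 if value == category else 0.0 for category in categories]


# Category order matches the encoded columns printed above.
GEOGRAPHIES = ["France", "Germany", "Spain"]
print(one_hot("France", GEOGRAPHIES))  # [1.0, 0.0, 0.0]
print(one_hot("Spain", GEOGRAPHIES))   # [0.0, 0.0, 1.0]
```

Exactly one position is set to 1.0, which is why the encoded DataFrame above has three columns for a single `Geography` value.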




### 4.2 Label Encode Gender

In [10]:
# Label encode the 'Gender' column using the trained encoder
# Read the original value from input_data (outside the try block) so the
# error message can always reference it and the cell can be re-run safely
original_gender = input_data['Gender']

try:
    # Transform gender to numerical format (single-row input)
    input_df['Gender'] = label_encoder_gender.transform([original_gender])

    print("👤 Gender label encoding successful:")
    print(f"  Original: {original_gender}")
    print(f"  Encoded: {input_df['Gender'].iloc[0]}")
    print("\n📊 DataFrame after gender encoding:")
    print(input_df)

except Exception as e:
    print(f"❌ Error encoding gender: {e}")
    print(f"   Make sure '{original_gender}' was in the training data")
    raise

👤 Gender label encoding successful:
  Original: Male
  Encoded: 1

📊 DataFrame after gender encoding:
   CreditScore Geography  Gender  Age  Tenure  Balance  NumOfProducts  \
0          600    France       1   40       3    60000              2   

   HasCrCard  IsActiveMember  EstimatedSalary  
0          1               1            50000  
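
Label encoding assigns each distinct label an integer. The sketch below assumes the labels are numbered in sorted order, which is consistent with `Male` being encoded as 1 above. The name `fit_label_mapping` is hypothetical, and this is an illustration rather than the saved encoder's code.

```python
def fit_label_mapping(values):
    """Map each distinct label to an integer, numbering labels in sorted order."""
    return {label: index for index, label in enumerate(sorted(set(values)))}


# Made-up training labels for illustration
mapping = fit_label_mapping(["Male", "Female", "Female", "Male"])
print(mapping)          # {'Female': 0, 'Male': 1}
print(mapping["Male"])  # 1
```

A label that was never seen when the mapping was built has no integer assigned, which is why the notebook's error message asks you to check that the value was in the training data.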


### 4.3 Combine All Features

In [11]:
# Combine the original features (without Geography) with the one-hot encoded geography columns
# This creates the final feature set that matches the training data format
try:
    # Remove the original 'Geography' column and concatenate with one-hot encoded columns
    input_df_final = pd.concat([
        input_df.drop("Geography", axis=1),  # All features except Geography
        geo_encoded_df                       # One-hot encoded Geography columns
    ], axis=1)
    
    print("🔗 Features combined successfully:")
    print(f"  Original features: {input_df.shape[1]} columns")
    print(f"  Geography encoded: {geo_encoded_df.shape[1]} columns")
    print(f"  Final features: {input_df_final.shape[1]} columns")
    print("\n📊 Final feature DataFrame:")
    print(input_df_final)
    print(f"\n📋 Column names: {list(input_df_final.columns)}")
    
except Exception as e:
    print(f"❌ Error combining features: {e}")
    raise

🔗 Features combined successfully:
  Original features: 10 columns
  Geography encoded: 3 columns
  Final features: 12 columns

📊 Final feature DataFrame:
   CreditScore  Gender  Age  Tenure  Balance  NumOfProducts  HasCrCard  \
0          600       1   40       3    60000              2          1   

   IsActiveMember  EstimatedSalary  Geography_France  Geography_Germany  \
0               1            50000               1.0                0.0   

   Geography_Spain  
0              0.0  

📋 Column names: ['CreditScore', 'Gender', 'Age', 'Tenure', 'Balance', 'NumOfProducts', 'HasCrCard', 'IsActiveMember', 'EstimatedSalary', 'Geography_France', 'Geography_Germany', 'Geography_Spain']
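
The scaler and the model expect the same columns, in the same order, as during training. A minimal sketch of an explicit check follows. The helper name `align_columns` is hypothetical. If the scaler was fitted on a DataFrame with a recent scikit-learn version, the expected names may be available as `scaler.feature_names_in_`, but that is an assumption about how the training notebook was written.

```python
import pandas as pd


def align_columns(frame, expected_columns):
    """Reorder `frame` to `expected_columns`, failing on any mismatch."""
    missing = [name for name in expected_columns if name not in frame.columns]
    extra = [name for name in frame.columns if name not in expected_columns]
    if missing or extra:
        raise ValueError(f"Column mismatch. Missing: {missing}, unexpected: {extra}")
    return frame[list(expected_columns)]


# Hypothetical example: the right columns, in the wrong order
expected = ["CreditScore", "Gender", "Geography_France"]
shuffled = pd.DataFrame([{"Geography_France": 1.0, "CreditScore": 600, "Gender": 1}])
print(list(align_columns(shuffled, expected).columns))
# ['CreditScore', 'Gender', 'Geography_France']
```

Failing loudly on a mismatch is safer than scaling silently, because a reordered column would still produce a prediction, just a meaningless one.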


### 4.4 Scale Features

In [12]:
# Scale the input features using the same scaler that was used during training
# This puts the features on the same scale the model saw during training
try:
    # Apply the same scaling transformation used during training
    input_scaled = scaler.transform(input_df_final)
    
    print("📏 Feature scaling successful:")
    print(f"  Original shape: {input_df_final.shape}")
    print(f"  Scaled shape: {input_scaled.shape}")
    print(f"  Data type: {input_scaled.dtype}")
    print(f"\n📊 Scaled features (first 5 values): {input_scaled[0][:5]}")
    print(f"📊 Feature range: [{input_scaled.min():.3f}, {input_scaled.max():.3f}]")
    
except Exception as e:
    print(f"❌ Error scaling features: {e}")
    print("   Make sure the number, names, and order of features match the training data")
    raise

📏 Feature scaling successful:
  Original shape: (1, 12)
  Scaled shape: (1, 12)
  Data type: float64

📊 Scaled features (first 5 values): [-0.52544045  0.90750738  0.10007155 -0.6962018  -0.2629485 ]
📊 Feature range: [-0.867, 1.002]
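
The arithmetic behind this step can be sketched as follows, assuming the saved scaler is a scikit-learn `StandardScaler`. The negative scaled values above are consistent with that, but the notebook does not confirm it. In that case `transform` subtracts the training mean (`scaler.mean_`) and divides by the training standard deviation (`scaler.scale_`). The statistics below are made-up numbers, not the real scaler's values.

```python
import numpy as np

# Hypothetical training statistics for two features (CreditScore, Age).
# These are NOT the values stored in scaler.pkl.
mean = np.array([650.0, 40.0])
scale = np.array([100.0, 10.0])

# One customer row with the same two features
row = np.array([[600.0, 40.0]])

# Standardization: subtract the training mean, divide by the training std
scaled = (row - mean) / scale
print(scaled)
```

A scaled value of 0 means the feature equals the training mean, and a value of -0.5 means it sits half a standard deviation below it.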


## 5. Make Prediction

Now we'll use our trained ANN model to predict whether this customer is likely to churn. The model outputs a probability between 0 and 1, where values closer to 1 indicate higher churn likelihood.

In [13]:
# Use the trained model to predict churn probability
try:
    # Make prediction using the scaled input data
    prediction = model.predict(input_scaled, verbose=0)
    
    # Extract the probability value (model outputs a 2D array)
    prediction_proba = prediction[0][0]
    
    print("🔮 Prediction completed successfully:")
    print(f"  Raw prediction output: {prediction}")
    print(f"  Churn probability: {prediction_proba:.4f} ({prediction_proba*100:.2f}%)")
    
except Exception as e:
    print(f"❌ Error making prediction: {e}")
    raise

🔮 Prediction completed successfully:
  Raw prediction output: [[0.0344639]]
  Churn probability: 0.0345 (3.45%)
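
When scoring several customers at once, the `(n, 1)` prediction array can be flattened and thresholded in one step. The sketch below uses a 0.5 cutoff, which is an assumption. The helper name `to_labels` is hypothetical.

```python
import numpy as np


def to_labels(predictions, threshold=0.5):
    """Convert an (n, 1) array of churn probabilities to 0/1 labels."""
    probabilities = np.asarray(predictions).reshape(-1)
    # Strictly greater than the threshold counts as predicted churn
    return (probabilities > threshold).astype(int)


# First row is the prediction from above; the other two are made-up values.
batch = np.array([[0.0344639], [0.62], [0.5]])
print(to_labels(batch).tolist())  # [0, 1, 0]
```

The right cutoff depends on the relative cost of missing a churner versus contacting a customer who would have stayed, so 0.5 is a starting point rather than a fixed rule.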


## 6. Interpret Results

Let's interpret the prediction results and provide actionable insights based on the churn probability.

In [14]:
# Interpret the prediction results with detailed analysis
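def interpret_churn_prediction(probability, customer_data):
    """
    Print a detailed interpretation of a churn prediction.

    Returns a (risk_level, recommendation) tuple. The thresholds (0.5 and 0.3)
    and the risk-factor rules below are fixed heuristics, not model outputs.
    """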
    print("=" * 60)
    print("🎯 CHURN PREDICTION ANALYSIS")
    print("=" * 60)
    
    # Basic prediction
    if probability > 0.5:
        risk_level = "HIGH RISK"
        emoji = "🚨"
        recommendation = "IMMEDIATE ATTENTION REQUIRED"
    elif probability > 0.3:
        risk_level = "MEDIUM RISK"
        emoji = "⚠️"
        recommendation = "MONITOR CLOSELY"
    else:
        risk_level = "LOW RISK"
        emoji = "✅"
        recommendation = "CUSTOMER LIKELY TO STAY"
    
    print(f"{emoji} Churn Probability: {probability:.4f} ({probability*100:.2f}%)")
    print(f"{emoji} Risk Level: {risk_level}")
    print(f"{emoji} Recommendation: {recommendation}")
    
    print("\n" + "=" * 60)
    print("📊 CUSTOMER PROFILE ANALYSIS")
    print("=" * 60)
    
    # Analyze customer characteristics
    print(f"👤 Customer Demographics:")
    print(f"   Age: {customer_data['Age']} years")
    print(f"   Gender: {customer_data['Gender']}")
    print(f"   Geography: {customer_data['Geography']}")
    
    print(f"\n💳 Banking Relationship:")
    print(f"   Credit Score: {customer_data['CreditScore']}")
    print(f"   Tenure: {customer_data['Tenure']} years")
    print(f"   Products Used: {customer_data['NumOfProducts']}")
    print(f"   Has Credit Card: {'Yes' if customer_data['HasCrCard'] else 'No'}")
    print(f"   Active Member: {'Yes' if customer_data['IsActiveMember'] else 'No'}")
    
    print(f"\n💰 Financial Profile:")
    print(f"   Account Balance: ${customer_data['Balance']:,}")
    print(f"   Estimated Salary: ${customer_data['EstimatedSalary']:,}")
    
    # Risk factors analysis
    print(f"\n🔍 RISK FACTORS ANALYSIS:")
    risk_factors = []
    
    if customer_data['Age'] > 45:
        risk_factors.append("Age above 45 (higher churn risk)")
    if customer_data['NumOfProducts'] == 1:
        risk_factors.append("Only using 1 product (low engagement)")
    if customer_data['IsActiveMember'] == 0:
        risk_factors.append("Inactive member (disengaged)")
    if customer_data['Balance'] == 0:
        risk_factors.append("Zero balance (inactive account)")
    if customer_data['Tenure'] <= 2:
        risk_factors.append("Short tenure (new customer)")
    
    if risk_factors:
        for factor in risk_factors:
            print(f"   ⚠️ {factor}")
    else:
        print("   ✅ No major risk factors identified")
    
    return risk_level, recommendation

# Run the analysis
risk_level, recommendation = interpret_churn_prediction(prediction_proba, input_data)

🎯 CHURN PREDICTION ANALYSIS
✅ Churn Probability: 0.0345 (3.45%)
✅ Risk Level: LOW RISK
✅ Recommendation: CUSTOMER LIKELY TO STAY

📊 CUSTOMER PROFILE ANALYSIS
👤 Customer Demographics:
   Age: 40 years
   Gender: Male
   Geography: France

💳 Banking Relationship:
   Credit Score: 600
   Tenure: 3 years
   Products Used: 2
   Has Credit Card: Yes
   Active Member: Yes

💰 Financial Profile:
   Account Balance: $60,000
   Estimated Salary: $50,000

🔍 RISK FACTORS ANALYSIS:
   ✅ No major risk factors identified
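
The risk thresholds are used in more than one place in this notebook. One way to keep them consistent is to define them once in a small pure function that only classifies and does no printing. This is a sketch with a hypothetical name, `classify_risk`. The threshold values are copied from `interpret_churn_prediction`.

```python
# Thresholds copied from interpret_churn_prediction
HIGH_RISK_THRESHOLD = 0.5
MEDIUM_RISK_THRESHOLD = 0.3


def classify_risk(probability):
    """Return the risk level for a churn probability in [0, 1]."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError(f"Probability must be in [0, 1], got {probability}")
    if probability > HIGH_RISK_THRESHOLD:
        return "HIGH RISK"
    if probability > MEDIUM_RISK_THRESHOLD:
        return "MEDIUM RISK"
    return "LOW RISK"


print(classify_risk(0.0345))  # LOW RISK
print(classify_risk(0.5))     # MEDIUM RISK (0.5 is not strictly greater than 0.5)
```

Separating classification from printing also makes the boundary cases easy to test.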


## 7. Actionable Recommendations

Based on the churn prediction, here are specific actions that can be taken to reduce churn risk:

In [15]:
# Generate specific actionable recommendations based on the prediction
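def generate_recommendations(probability, customer_data, risk_level):
    """
    Print and return a list of retention recommendations.

    Uses the same thresholds (0.5 and 0.3) as interpret_churn_prediction,
    plus fixed rules based on the customer profile.
    """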
    print("=" * 60)
    print("💡 ACTIONABLE RECOMMENDATIONS")
    print("=" * 60)
    
    recommendations = []
    
    if probability > 0.5:
        # High risk customers
        recommendations.extend([
            "🔥 URGENT: Schedule immediate customer outreach call",
            "💰 Offer loyalty rewards or account upgrades",
            "📞 Assign dedicated relationship manager",
            "🎁 Provide exclusive offers or fee waivers"
        ])
    elif probability > 0.3:
        # Medium risk customers
        recommendations.extend([
            "📧 Send targeted retention email campaign",
            "📊 Analyze usage patterns for personalized offers",
            "🎯 Include in upcoming promotional campaigns",
            "💳 Suggest additional products that add value"
        ])
    else:
        # Low risk customers
        recommendations.extend([
            "✅ Continue standard engagement programs",
            "🎉 Consider them for referral programs",
            "📈 Monitor for upselling opportunities",
            "💬 Collect feedback to maintain satisfaction"
        ])
    
    # Specific recommendations based on customer profile
    if customer_data['NumOfProducts'] == 1:
        recommendations.append("🔄 Recommend additional banking products")
    
    if customer_data['IsActiveMember'] == 0:
        recommendations.append("🚀 Launch re-engagement campaign")
    
    if customer_data['Balance'] == 0:
        recommendations.append("💳 Encourage account usage with incentives")
    
    if customer_data['Age'] > 45:
        recommendations.append("👴 Focus on retirement planning services")
    
    # Print recommendations
    print(f"Based on {risk_level} classification:")
    for i, rec in enumerate(recommendations, 1):
        print(f"{i}. {rec}")
    
    print(f"\n📅 Recommended Timeline:")
    if probability > 0.5:
        print("   ⏰ Take action within 24-48 hours")
    elif probability > 0.3:
        print("   ⏰ Take action within 1-2 weeks")
    else:
        print("   ⏰ Include in next quarterly review")
    
    return recommendations

# Generate recommendations for this customer
recommendations = generate_recommendations(prediction_proba, input_data, risk_level)

💡 ACTIONABLE RECOMMENDATIONS
Based on LOW RISK classification:
1. ✅ Continue standard engagement programs
2. 🎉 Consider them for referral programs
3. 📈 Monitor for upselling opportunities
4. 💬 Collect feedback to maintain satisfaction

📅 Recommended Timeline:
   ⏰ Include in next quarterly review


## 8. Conclusion

This notebook demonstrated how to use a trained ANN model for customer churn prediction. The complete workflow includes:

### ✅ What We Accomplished:
1. **Model Loading**: Successfully loaded pre-trained model and preprocessors
2. **Data Preprocessing**: Applied the same transformations used during training
3. **Prediction**: Generated churn probability for new customer data
4. **Analysis**: Provided detailed risk assessment and customer profiling
5. **Recommendations**: Generated actionable retention strategies

### 🔄 Next Steps for Production Use:
1. **Integration**: Integrate this prediction pipeline into your CRM system
2. **Automation**: Automate predictions for batch customer scoring
3. **Monitoring**: Track prediction accuracy and model performance over time
4. **Feedback Loop**: Collect outcomes to retrain and improve the model
5. **A/B Testing**: Test retention strategies based on predictions

### 📊 Key Metrics to Track:
- Prediction accuracy, precision, and recall against actual churn outcomes
- Customer retention rate improvements
- ROI of retention campaigns
- Model drift and performance degradation
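
Once actual churn outcomes are known, predictions can be compared against them. The sketch below computes accuracy, precision, and recall in plain Python. The function name `churn_metrics`, the 0.5 threshold, and the sample values are all assumptions for illustration.

```python
def churn_metrics(predicted_probabilities, actual_churn, threshold=0.5):
    """Compute accuracy, precision, and recall for churn predictions.

    `actual_churn` holds 1 for customers who churned and 0 for those who stayed.
    """
    if len(predicted_probabilities) != len(actual_churn) or not actual_churn:
        raise ValueError("Inputs must be non-empty and of equal length")
    predicted = [1 if p > threshold else 0 for p in predicted_probabilities]
    pairs = list(zip(predicted, actual_churn))
    true_pos = sum(1 for p, a in pairs if p == 1 and a == 1)
    false_pos = sum(1 for p, a in pairs if p == 1 and a == 0)
    false_neg = sum(1 for p, a in pairs if p == 0 and a == 1)
    correct = sum(1 for p, a in pairs if p == a)
    return {
        "accuracy": correct / len(pairs),
        # Of the customers flagged as churners, how many actually churned
        "precision": true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0,
        # Of the customers who churned, how many were flagged
        "recall": true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0,
    }


# Made-up scores and outcomes, for illustration only
scores = [0.9, 0.2, 0.7, 0.1]
outcomes = [1, 0, 0, 1]
print(churn_metrics(scores, outcomes))
# {'accuracy': 0.5, 'precision': 0.5, 'recall': 0.5}
```

Churn data is usually imbalanced, so precision and recall tend to say more than accuracy alone: a model that predicts "stays" for everyone can be highly accurate while catching no churners.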

---

**💡 Remember**: This model is a tool to support decision-making, not replace human judgment. Always consider business context and customer relationships when taking action based on predictions.