# Customer Churn Prediction: Model Inference and Deployment Preparation

This Jupyter Notebook covers the final stage of our customer churn prediction project: loading the trained artificial neural network (ANN) and the fitted preprocessing objects, then using them to predict churn for new, unseen data. It simulates how the model would be used in a real-world deployment, relying on the saved `pickle` files so that new inputs are transformed exactly as the training data was.

## Key Steps Covered:

1.  **Loading Pre-trained Assets**:
    * Loading the saved **ANN model** (`model.h5`).
    * Loading the **`LabelEncoder`** for 'Gender' (`label_encoder_gender.pkl`).
    * Loading the **`OneHotEncoder`** for 'Geography' (`onehot_encoder_geo.pkl`).
    * Loading the **`StandardScaler`** (`scaler.pkl`).
2.  **Simulating New Input Data**: Defining a dictionary to represent a new customer's features, mimicking the structure of the original dataset.
3.  **Data Preprocessing for Inference**:
    * Converting the raw input data into a pandas DataFrame.
    * Applying the **loaded `LabelEncoder`** to transform the 'Gender' feature.
    * Applying the **loaded `OneHotEncoder`** to transform the 'Geography' feature, ensuring consistent encoding with the training phase.
    * Concatenating the one-hot encoded geography features with the rest of the input DataFrame.
    * Applying the **loaded `StandardScaler`** to scale the numerical features of the input data.
4.  **Prediction**: Using the loaded ANN model to predict the churn probability for the preprocessed input data.
5.  **Interpretation of Prediction**: Converting the raw probability output into a human-readable churn prediction (e.g., "Customer is likely to churn" if probability > 0.5).

This notebook walks through the complete flow from raw input, through the saved preprocessing steps, to a prediction from the trained deep learning model; a deployed application would run the same sequence.

In [None]:
# Import necessary libraries
import tensorflow as tf
from tensorflow.keras.models import load_model
import pickle
import pandas as pd
import numpy as np



In [None]:
### Load the trained model and the pickled encoders and scaler
# Load the trained Keras ANN model from the .h5 file
model = load_model('model.h5')

## Load the encoders and scaler
# Open and load the one-hot encoder for 'Geography'
with open('onehot_encoder_geo.pkl', 'rb') as file:
    onehot_encoder_geo = pickle.load(file)

# Open and load the label encoder for 'Gender'
with open('label_encoder_gender.pkl', 'rb') as file:
    label_encoder_gender = pickle.load(file)

# Open and load the standard scaler for numerical features
with open('scaler.pkl', 'rb') as file:
    scaler = pickle.load(file)
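
The reason for loading pickles rather than refitting is that a fitted scikit-learn object stores its learned statistics (means, scales, category lists), and pickling preserves them exactly. The sketch below fits a scaler on made-up values and round-trips it through `pickle.dumps`/`pickle.loads`, which behave like writing `scaler.pkl` with `pickle.dump` and reading it back with `pickle.load`.

```python
import pickle
from sklearn.preprocessing import StandardScaler

# Fit a scaler on made-up credit scores (illustration only, not project data)
scaler = StandardScaler().fit([[600.0], [700.0], [650.0]])

# Serialize to bytes and restore, mirroring pickle.dump / pickle.load with a file
blob = pickle.dumps(scaler)
restored = pickle.loads(blob)

# The restored object keeps the learned mean, so it transforms new data identically
same_output = (restored.transform([[620.0]]) == scaler.transform([[620.0]])).all()
print(restored.mean_[0], same_output)  # 650.0 True
```

Loading the files with the same scikit-learn version used for training avoids version-mismatch warnings, and `pickle.load` should only be used on files from a trusted source, since unpickling can execute arbitrary code.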



In [11]:
# Example input data
input_data = {
    'CreditScore': 600,
    'Geography': 'France',
    'Gender': 'Male',
    'Age': 40,
    'Tenure': 3,
    'Balance': 60000,
    'NumOfProducts': 2,
    'HasCrCard': 1,
    'IsActiveMember': 1,
    'EstimatedSalary': 50000
}
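
In a deployed setting, the input dictionary usually arrives from a form or an API request, so checking its keys before preprocessing gives a clearer error than a failure deep inside an encoder. This is a minimal sketch; `EXPECTED_FEATURES` and `validate_input` are hypothetical names, and the feature list is simply copied from the example dictionary above.

```python
# Hypothetical list of raw feature names, copied from the example input above
EXPECTED_FEATURES = [
    'CreditScore', 'Geography', 'Gender', 'Age', 'Tenure',
    'Balance', 'NumOfProducts', 'HasCrCard', 'IsActiveMember', 'EstimatedSalary',
]


def validate_input(record):
    """Hypothetical helper: raise ValueError listing missing or unexpected keys."""
    missing = [k for k in EXPECTED_FEATURES if k not in record]
    extra = [k for k in record if k not in EXPECTED_FEATURES]
    if missing or extra:
        raise ValueError(f'missing={missing}, unexpected={extra}')
    return record


input_data = {
    'CreditScore': 600, 'Geography': 'France', 'Gender': 'Male', 'Age': 40,
    'Tenure': 3, 'Balance': 60000, 'NumOfProducts': 2, 'HasCrCard': 1,
    'IsActiveMember': 1, 'EstimatedSalary': 50000,
}
ok = validate_input(input_data)  # passes: all ten features present

# An incomplete record is rejected with a message naming the missing features
try:
    validate_input({'CreditScore': 600})
except ValueError as err:
    message = str(err)
print(message)
```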

In [None]:
# Convert the single input data dictionary into a pandas DataFrame.
# This ensures it has the correct structure for preprocessing.
input_df = pd.DataFrame([input_data])
input_df

|   | CreditScore | Geography | Gender | Age | Tenure | Balance | NumOfProducts | HasCrCard | IsActiveMember | EstimatedSalary |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 600 | France | Male | 40 | 3 | 60000 | 2 | 1 | 1 | 50000 |
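
The dictionary is wrapped in a list because pandas treats a list of dicts as a list of rows. Passing a dict whose values are all scalars directly to `pd.DataFrame` fails, since pandas has no index to build rows from. A short check with a shortened, illustrative record:

```python
import pandas as pd

# Shortened version of the example record (illustration only)
input_data = {'CreditScore': 600, 'Geography': 'France', 'Age': 40}

# A list of one dict: keys become columns, the dict becomes a single row
input_df = pd.DataFrame([input_data])
print(input_df.shape)  # (1, 3)

# A bare dict of scalars raises ValueError because no index is given
try:
    pd.DataFrame(input_data)
    raised = False
except ValueError as err:
    raised = True
    print('ValueError:', err)
```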


In [14]:
## Encode categorical variables for the new input
# Apply the loaded LabelEncoder to transform the 'Gender' column
# LabelEncoder assigns integer codes to classes in sorted order, so if it was fitted on
# 'Female' and 'Male', 'Female' maps to 0 and 'Male' maps to 1 (as the output below shows)
input_df['Gender'] = label_encoder_gender.transform(input_df['Gender'])
input_df

|   | CreditScore | Geography | Gender | Age | Tenure | Balance | NumOfProducts | HasCrCard | IsActiveMember | EstimatedSalary |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 600 | France | 1 | 40 | 3 | 60000 | 2 | 1 | 1 | 50000 |
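
Two properties of `LabelEncoder` matter at inference time: codes follow the sorted order of `classes_`, and a value never seen during fitting raises an error rather than receiving a new code. The sketch below fits an encoder on illustrative values to show both; in the notebook the encoder comes from `label_encoder_gender.pkl`.

```python
from sklearn.preprocessing import LabelEncoder

# Fitted here on illustrative values, not loaded from label_encoder_gender.pkl
le = LabelEncoder().fit(['Male', 'Female', 'Female'])
print(le.classes_.tolist())             # ['Female', 'Male']
print(le.transform(['Male']).tolist())  # [1]

# A category the encoder never saw raises ValueError instead of being mapped silently
try:
    le.transform(['Unknown'])
    unseen_rejected = False
except ValueError:
    unseen_rejected = True
print(unseen_rejected)  # True
```

A serving application may therefore want to validate categorical inputs before calling `transform`.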


In [None]:
# One-hot encode the 'Geography' column using the loaded OneHotEncoder
# `.transform()` expects a 2D array, hence `[[input_data['Geography']]]`
# `.toarray()` converts the sparse output to a dense NumPy array
geo_encoded = onehot_encoder_geo.transform([[input_data['Geography']]]).toarray()
# Create a DataFrame from the one-hot encoded array, using the feature names from the encoder
geo_encoded_df = pd.DataFrame(geo_encoded, columns=onehot_encoder_geo.get_feature_names_out(['Geography']))
geo_encoded_df




|   | Geography_France | Geography_Germany | Geography_Spain |
|---|---|---|---|
| 0 | 1.0 | 0.0 | 0.0 |


In [None]:
## Concatenate one-hot encoded columns with the original input DataFrame
# Drop the original 'Geography' column from the input_df as it's now one-hot encoded
# Concatenate the modified input_df with the new geo_encoded_df along columns (axis=1)
input_df = pd.concat([input_df.drop("Geography", axis=1), geo_encoded_df], axis=1)
input_df

|   | CreditScore | Gender | Age | Tenure | Balance | NumOfProducts | HasCrCard | IsActiveMember | EstimatedSalary | Geography_France | Geography_Germany | Geography_Spain |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 600 | 1 | 40 | 3 | 60000 | 2 | 1 | 1 | 50000 | 1.0 | 0.0 | 0.0 |


In [None]:
## Scaling the input data
# Apply the loaded StandardScaler to transform the preprocessed input DataFrame
# This ensures the input features are scaled consistently with the training data;
# the columns must appear in the same order as when the scaler was fitted
input_scaled = scaler.transform(input_df)
input_scaled

array([[-0.53598516,  0.91324755,  0.10479359, -0.69539349, -0.25781119,
         0.80843615,  0.64920267,  0.97481699, -0.87683221,  1.00150113,
        -0.57946723, -0.57638802]])
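
Each value in the scaled array is a z-score, z = (x - mean) / std, computed per column with the mean and standard deviation stored in the fitted scaler (`mean_` and `scale_`). The sketch below, using made-up numbers, reproduces `transform` by hand to show there is nothing more to the step.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up training values for two numeric columns (illustration only)
X_train = np.array([[600.0, 40.0], [700.0, 30.0], [650.0, 50.0]])
scaler = StandardScaler().fit(X_train)

x_new = np.array([[620.0, 45.0]])

# StandardScaler computes (x - mean_) / scale_ column by column
manual = (x_new - scaler.mean_) / scaler.scale_
print(np.allclose(manual, scaler.transform(x_new)))  # True
```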

In [None]:
## Predict churn
# Use the loaded ANN model to make a prediction on the scaled input data
prediction = model.predict(input_scaled)
prediction



array([[0.06141514]], dtype=float32)

In [None]:
# Extract the prediction probability (since it's a binary classification with sigmoid output)
# `prediction[0][0]` gets the first (and only) value from the prediction array
prediction_proba = prediction[0][0]

In [None]:
prediction_proba

0.061415136
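
`prediction[0][0]` is a NumPy `float32` scalar, not a Python `float`. That matters when a deployed service returns the probability as JSON: the standard `json` module cannot serialize `float32`, so converting with `float()` first is a simple fix. The array below is a stand-in for `model.predict` output, with its value copied from the output above; the key name `churn_probability` is hypothetical.

```python
import json
import numpy as np

# Stand-in for model.predict output: shape (n_samples, 1), dtype float32
prediction = np.array([[0.06141514]], dtype=np.float32)
prediction_proba = prediction[0, 0]  # numpy.float32 scalar

# json cannot serialize numpy.float32 directly
try:
    json.dumps({'churn_probability': prediction_proba})
    float32_ok = True
except TypeError:
    float32_ok = False

# Converting to a native Python float makes the value JSON-serializable
proba = float(prediction_proba)
payload = json.dumps({'churn_probability': round(proba, 4)})
print(payload)  # {"churn_probability": 0.0614}
```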

In [None]:
# Interpret the prediction probability
# If the probability is greater than 0.5, classify as 'likely to churn', otherwise 'not likely to churn'
if prediction_proba > 0.5:
    print('The customer is likely to churn.')
else:
    print('The customer is not likely to churn.')

The customer is not likely to churn.
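
The same decision rule extends to a batch of customers, since `model.predict` returns one row per input. The sketch below is a hypothetical helper, `interpret_churn`, that flattens the `(n_samples, 1)` output and applies the threshold; the 0.5 default mirrors the cell above, and exposing it as a parameter is a design choice so it can be adjusted if missed churners and false alarms carry different costs.

```python
import numpy as np


def interpret_churn(probabilities, threshold=0.5):
    """Hypothetical helper: map churn probabilities to readable labels."""
    probs = np.asarray(probabilities).ravel()  # (n_samples, 1) -> (n_samples,)
    return ['likely to churn' if p > threshold else 'not likely to churn' for p in probs]


# Illustrative model.predict-style output for three customers
labels = interpret_churn(np.array([[0.06], [0.72], [0.5]]))
print(labels)
# ['not likely to churn', 'likely to churn', 'not likely to churn']
```

As in the cell above, the comparison is strict, so a probability of exactly 0.5 is classified as not likely to churn.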
