# Customer Churn Prediction Model - Experiments

This notebook contains the complete workflow for building a deep learning model to predict customer churn using an Artificial Neural Network (ANN). 

## Project Overview
- **Objective**: Predict whether a bank customer will churn (leave the bank)
- **Model**: Deep Neural Network with TensorFlow/Keras
- **Features**: Customer demographics, account information, and banking behavior
- **Target**: Binary classification (Churn: 1, Stay: 0)

## Workflow
1. Data Loading and Exploration
2. Data Preprocessing and Feature Engineering
3. Model Architecture Design
4. Model Training with Callbacks
5. Model Evaluation and Saving

## File Structure
- **Input**: `../Data/Churn_Modelling.csv`
- **Output**: Model and preprocessors saved to `../PickelFiles/`

In [1]:
# =============================================================================
# 1. IMPORT NECESSARY LIBRARIES
# =============================================================================

# Data manipulation and analysis
import pandas as pd
import numpy as np
import os

# Machine learning utilities
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, LabelEncoder, OneHotEncoder

# Model persistence
import pickle

# Set random seed for reproducibility
np.random.seed(42)

print("✅ All libraries imported successfully!")
print(f"📊 Pandas version: {pd.__version__}")
print(f"🔢 NumPy version: {np.__version__}")
print("🚀 Ready for data analysis and model training!")

✅ All libraries imported successfully!
📊 Pandas version: 2.3.1
🔢 NumPy version: 1.26.4
🚀 Ready for data analysis and model training!


In [2]:
# =============================================================================
# 2. DATA LOADING AND INITIAL EXPLORATION
# =============================================================================

# Load the customer churn dataset from the correct path
try:
    data = pd.read_csv("../Data/Churn_Modelling.csv")
    print(f"✅ Dataset loaded successfully!")
    print(f"📊 Dataset shape: {data.shape}")
    print(f"📋 Columns: {list(data.columns)}")
except FileNotFoundError:
    print("❌ Error: Dataset not found at '../Data/Churn_Modelling.csv'")
    print("💡 Please ensure the file exists in the Data directory")
    raise

# Display first few rows
print("\n📄 First 5 rows of the dataset:")
data.head()

✅ Dataset loaded successfully!
📊 Dataset shape: (10000, 14)
📋 Columns: ['RowNumber', 'CustomerId', 'Surname', 'CreditScore', 'Geography', 'Gender', 'Age', 'Tenure', 'Balance', 'NumOfProducts', 'HasCrCard', 'IsActiveMember', 'EstimatedSalary', 'Exited']

📄 First 5 rows of the dataset:


Unnamed: 0,RowNumber,CustomerId,Surname,CreditScore,Geography,Gender,Age,Tenure,Balance,NumOfProducts,HasCrCard,IsActiveMember,EstimatedSalary,Exited
0,1,15634602,Hargrave,619,France,Female,42,2,0.0,1,1,1,101348.88,1
1,2,15647311,Hill,608,Spain,Female,41,1,83807.86,1,0,1,112542.58,0
2,3,15619304,Onio,502,France,Female,42,8,159660.8,3,1,0,113931.57,1
3,4,15701354,Boni,699,France,Female,39,1,0.0,2,0,0,93826.63,0
4,5,15737888,Mitchell,850,Spain,Female,43,2,125510.82,1,1,1,79084.1,0


In [3]:
# =============================================================================
# 3. DATA PREPROCESSING AND CLEANING
# =============================================================================

# Display basic information about the dataset
print("📊 Dataset Information:")
print("=" * 50)
data.info()

print("\n📈 Basic Statistics:")
print(data.describe())

print("\n🎯 Target Variable Distribution:")
print(data['Exited'].value_counts())
print(f"Churn Rate: {data['Exited'].mean():.2%}")

# Drop irrelevant columns that don't contribute to prediction
# - RowNumber: Just an index, not a feature
# - CustomerId: Unique identifier, not predictive
# - Surname: Customer name, not relevant for churn prediction
columns_to_drop = ['RowNumber', 'CustomerId', 'Surname']
data = data.drop(columns_to_drop, axis=1)

print(f"\n✅ Dropped columns: {columns_to_drop}")
print(f"📋 Remaining columns: {list(data.columns)}")
print(f"📊 New dataset shape: {data.shape}")

# Display cleaned dataset
data.head()

📊 Dataset Information:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 14 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   RowNumber        10000 non-null  int64  
 1   CustomerId       10000 non-null  int64  
 2   Surname          10000 non-null  object 
 3   CreditScore      10000 non-null  int64  
 4   Geography        10000 non-null  object 
 5   Gender           10000 non-null  object 
 6   Age              10000 non-null  int64  
 7   Tenure           10000 non-null  int64  
 8   Balance          10000 non-null  float64
 9   NumOfProducts    10000 non-null  int64  
 10  HasCrCard        10000 non-null  int64  
 11  IsActiveMember   10000 non-null  int64  
 12  EstimatedSalary  10000 non-null  float64
 13  Exited           10000 non-null  int64  
dtypes: float64(2), int64(9), object(3)
memory usage: 1.1+ MB

📈 Basic Statistics:
         RowNumber    CustomerId   CreditSco

Unnamed: 0,CreditScore,Geography,Gender,Age,Tenure,Balance,NumOfProducts,HasCrCard,IsActiveMember,EstimatedSalary,Exited
0,619,France,Female,42,2,0.0,1,1,1,101348.88,1
1,608,Spain,Female,41,1,83807.86,1,0,1,112542.58,0
2,502,France,Female,42,8,159660.8,3,1,0,113931.57,1
3,699,France,Female,39,1,0.0,2,0,0,93826.63,0
4,850,Spain,Female,43,2,125510.82,1,1,1,79084.1,0


In [4]:
# =============================================================================
# 4. CATEGORICAL VARIABLE ENCODING - GENDER
# =============================================================================

# 4.1 Label Encoding for Gender (Binary categorical variable)
# Gender has only 2 categories (Male/Female), so LabelEncoder is appropriate
print("🔤 Encoding Gender column...")
print(f"Original Gender values: {data['Gender'].unique()}")

label_encoder_gender = LabelEncoder()
data['Gender'] = label_encoder_gender.fit_transform(data['Gender'])

print("✅ Gender encoding completed:")
print(f"📋 Original gender classes: {label_encoder_gender.classes_}")
print(f"🔢 Encoded values: {dict(zip(label_encoder_gender.classes_, range(len(label_encoder_gender.classes_))))}")
print(f"📊 Unique values in Gender column: {sorted(data['Gender'].unique())}")

# Display sample of data with encoded Gender column
print("\n📄 Sample data after Gender encoding:")
data.head(10)

🔤 Encoding Gender column...
Original Gender values: ['Female' 'Male']
✅ Gender encoding completed:
📋 Original gender classes: ['Female' 'Male']
🔢 Encoded values: {'Female': 0, 'Male': 1}
📊 Unique values in Gender column: [0, 1]

📄 Sample data after Gender encoding:


Unnamed: 0,CreditScore,Geography,Gender,Age,Tenure,Balance,NumOfProducts,HasCrCard,IsActiveMember,EstimatedSalary,Exited
0,619,France,0,42,2,0.0,1,1,1,101348.88,1
1,608,Spain,0,41,1,83807.86,1,0,1,112542.58,0
2,502,France,0,42,8,159660.8,3,1,0,113931.57,1
3,699,France,0,39,1,0.0,2,0,0,93826.63,0
4,850,Spain,0,43,2,125510.82,1,1,1,79084.1,0
5,645,Spain,1,44,8,113755.78,2,1,0,149756.71,1
6,822,France,1,50,7,0.0,2,1,1,10062.8,0
7,376,Germany,0,29,4,115046.74,4,1,0,119346.88,1
8,501,France,1,44,4,142051.07,2,0,1,74940.5,0
9,684,France,1,27,2,134603.88,1,1,1,71725.73,0
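
The mapping printed above (`Female` to 0, `Male` to 1) follows from how `LabelEncoder` orders its classes: the unique labels are sorted and each one is assigned its position. A minimal pure-Python sketch of that behaviour is shown below. The helper names `fit_label_map` and `encode` are illustrative only and are not part of scikit-learn.

```python
def fit_label_map(values):
    """Map each distinct label to its index in sorted order."""
    classes = sorted(set(values))
    return {label: index for index, label in enumerate(classes)}


def encode(values, label_map):
    """Replace every label with its integer code; unseen labels raise KeyError."""
    return [label_map[value] for value in values]


# Small made-up sample, not rows from the churn dataset
genders = ["Female", "Female", "Male", "Female", "Male"]
gender_map = fit_label_map(genders)
encoded = encode(genders, gender_map)

print(gender_map)  # {'Female': 0, 'Male': 1}
print(encoded)     # [0, 0, 1, 0, 1]
```

Because the codes depend on sort order, the fitted encoder has to be reused at prediction time rather than refitted on new data.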


In [5]:
# =============================================================================
# 5. CATEGORICAL VARIABLE ENCODING - GEOGRAPHY
# =============================================================================

# 5.1 Check unique values in Geography column before encoding
print("🌍 Geography Analysis:")
geography_values = data['Geography'].unique()
print(f"Unique Geography values: {geography_values}")
print(f"Number of unique geography values: {len(geography_values)}")
print(f"Geography distribution:\n{data['Geography'].value_counts()}")

# 5.2 One-Hot Encoding for Geography (Multi-categorical variable)
# Geography has 3 unordered categories, so One-Hot Encoding avoids implying a false ordering
# This creates one binary column per geography category
print("\n🔄 Applying One-Hot Encoding to Geography...")

onehot_encoder_geo = OneHotEncoder()
geo_encoded = onehot_encoder_geo.fit_transform(data[['Geography']])

print("✅ Geography One-Hot Encoding completed:")
print(f"📊 Original geography categories: {data['Geography'].unique()}")
print(f"📏 Encoded array shape: {geo_encoded.shape}")
print(f"🏷️ Feature names: {list(onehot_encoder_geo.get_feature_names_out(['Geography']))}")

# Convert sparse matrix to dense array for easier handling
geo_encoded_array = geo_encoded.toarray()
print(f"📊 First 5 rows of encoded geography:")
print(geo_encoded_array[:5])

🌍 Geography Analysis:
Unique Geography values: ['France' 'Spain' 'Germany']
Number of unique geography values: 3
Geography distribution:
Geography
France     5014
Germany    2509
Spain      2477
Name: count, dtype: int64

🔄 Applying One-Hot Encoding to Geography...
✅ Geography One-Hot Encoding completed:
📊 Original geography categories: ['France' 'Spain' 'Germany']
📏 Encoded array shape: (10000, 3)
🏷️ Feature names: ['Geography_France', 'Geography_Germany', 'Geography_Spain']
📊 First 5 rows of encoded geography:
[[1. 0. 0.]
 [0. 0. 1.]
 [1. 0. 0.]
 [1. 0. 0.]
 [0. 0. 1.]]


In [6]:
# Display the feature names created by One-Hot Encoder
print("🏷️ Generated feature names from One-Hot Encoding:")
feature_names = onehot_encoder_geo.get_feature_names_out(['Geography'])
for i, name in enumerate(feature_names):
    print(f"{i+1}. {name}")

print(f"\n📊 Total new features created: {len(feature_names)}")
print("💡 Each geography location gets its own binary column")

🏷️ Generated feature names from One-Hot Encoding:
1. Geography_France
2. Geography_Germany
3. Geography_Spain

📊 Total new features created: 3
💡 Each geography location gets its own binary column


In [7]:
# Convert the encoded array to a DataFrame for easier handling
print("🔄 Converting encoded geography to DataFrame...")

geo_encoded_df = pd.DataFrame(
    geo_encoded_array,
    columns=onehot_encoder_geo.get_feature_names_out(['Geography']),
    index=data.index  # Maintain the same index as original data
)

print("✅ Geography encoding DataFrame created")
print(f"📊 Shape: {geo_encoded_df.shape}")
print(f"📋 Columns: {list(geo_encoded_df.columns)}")

print("\n📄 Encoded Geography DataFrame (first 10 rows):")
geo_encoded_df.head(10)

🔄 Converting encoded geography to DataFrame...
✅ Geography encoding DataFrame created
📊 Shape: (10000, 3)
📋 Columns: ['Geography_France', 'Geography_Germany', 'Geography_Spain']

📄 Encoded Geography DataFrame (first 10 rows):


Unnamed: 0,Geography_France,Geography_Germany,Geography_Spain
0,1.0,0.0,0.0
1,0.0,0.0,1.0
2,1.0,0.0,0.0
3,1.0,0.0,0.0
4,0.0,0.0,1.0
5,0.0,0.0,1.0
6,1.0,0.0,0.0
7,0.0,1.0,0.0
8,1.0,0.0,0.0
9,1.0,0.0,0.0
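
The one-hot layout in the table above can be reproduced by hand, which makes the column order easier to reason about. The sketch below is a simplified stand-in, not scikit-learn's implementation: it assumes categories are sorted alphabetically (which matches the `Geography_France`, `Geography_Germany`, `Geography_Spain` order shown) and that unknown values should be rejected. The function names `fit_categories` and `one_hot` are hypothetical.

```python
def fit_categories(values):
    """Learn the category order: distinct values, sorted."""
    return sorted(set(values))


def one_hot(value, categories):
    """Return a row with 1.0 in the matching category's position and 0.0 elsewhere."""
    if value not in categories:
        raise ValueError(f"Unknown category: {value!r}")
    return [1.0 if value == category else 0.0 for category in categories]


# Small made-up sample, not rows from the churn dataset
geographies = ["France", "Spain", "France", "Germany"]
categories = fit_categories(geographies)
columns = [f"Geography_{category}" for category in categories]
rows = [one_hot(value, categories) for value in geographies]

print(columns)  # ['Geography_France', 'Geography_Germany', 'Geography_Spain']
for row in rows:
    print(row)
```

Each row sums to exactly 1.0, so any one of the three columns is fully determined by the other two.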


In [8]:
# =============================================================================
# 6. FEATURE ENGINEERING - COMBINE ENCODED FEATURES
# =============================================================================

print("🔄 Combining original data with encoded geography features...")

# Store original shape for comparison
original_shape = data.shape
print(f"📊 Original data shape: {original_shape}")

# Remove the original categorical Geography column
data = data.drop('Geography', axis=1)
print(f"📊 After dropping Geography: {data.shape}")

# Concatenate the encoded geography features with the main dataset
data = pd.concat([data, geo_encoded_df], axis=1)

print("✅ Feature engineering completed!")
print(f"📊 Final dataset shape: {data.shape}")
print(f"📋 Final columns: {list(data.columns)}")
print(f"➕ Added {data.shape[1] - (original_shape[1] - 1)} new encoded columns")

# Verify no missing values
print(f"\n🔍 Missing values check: {data.isnull().sum().sum()} missing values")

# Display the final preprocessed dataset
print("\n📄 Final Preprocessed Dataset (first 5 rows):")
data.head()

🔄 Combining original data with encoded geography features...
📊 Original data shape: (10000, 11)
📊 After dropping Geography: (10000, 10)
✅ Feature engineering completed!
📊 Final dataset shape: (10000, 13)
📋 Final columns: ['CreditScore', 'Gender', 'Age', 'Tenure', 'Balance', 'NumOfProducts', 'HasCrCard', 'IsActiveMember', 'EstimatedSalary', 'Exited', 'Geography_France', 'Geography_Germany', 'Geography_Spain']
➕ Added 3 new encoded columns

🔍 Missing values check: 0 missing values

📄 Final Preprocessed Dataset (first 5 rows):


Unnamed: 0,CreditScore,Gender,Age,Tenure,Balance,NumOfProducts,HasCrCard,IsActiveMember,EstimatedSalary,Exited,Geography_France,Geography_Germany,Geography_Spain
0,619,0,42,2,0.0,1,1,1,101348.88,1,1.0,0.0,0.0
1,608,0,41,1,83807.86,1,0,1,112542.58,0,0.0,0.0,1.0
2,502,0,42,8,159660.8,3,1,0,113931.57,1,1.0,0.0,0.0
3,699,0,39,1,0.0,2,0,0,93826.63,0,1.0,0.0,0.0
4,850,0,43,2,125510.82,1,1,1,79084.1,0,0.0,0.0,1.0


In [9]:
# =============================================================================
# 7. SAVE PREPROCESSORS FOR FUTURE USE
# =============================================================================

# Create PickelFiles directory if it doesn't exist
os.makedirs('../PickelFiles', exist_ok=True)
print("📁 Created/verified PickelFiles directory")

# Save the fitted encoders for use in prediction (the scaler is saved later, once it has been fitted)
# These will be needed to preprocess new data for predictions
print("💾 Saving preprocessors...")

try:
    # Save Label Encoder for Gender
    with open('../PickelFiles/label_encoder_gender.pkl', 'wb') as file:
        pickle.dump(label_encoder_gender, file)
    print("✅ Gender Label Encoder saved to '../PickelFiles/label_encoder_gender.pkl'")

    # Save One-Hot Encoder for Geography
    with open('../PickelFiles/onehot_encoder_geo.pkl', 'wb') as file:
        pickle.dump(onehot_encoder_geo, file)
    print("✅ Geography One-Hot Encoder saved to '../PickelFiles/onehot_encoder_geo.pkl'")

    print("\n💾 All preprocessors saved successfully!")
    print("🔗 These files will be used by the Streamlit app for consistent preprocessing")

except Exception as e:
    print(f"❌ Error saving preprocessors: {e}")
    raise

📁 Created/verified PickelFiles directory
💾 Saving preprocessors...
✅ Gender Label Encoder saved to '../PickelFiles/label_encoder_gender.pkl'
✅ Geography One-Hot Encoder saved to '../PickelFiles/onehot_encoder_geo.pkl'

💾 All preprocessors saved successfully!
🔗 These files will be used by the Streamlit app for consistent preprocessing
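
The saved files are only useful if they load back to an equivalent object. The sketch below shows the save/load round trip in isolation. It is an illustration under assumptions: a plain dictionary stands in for the fitted encoder, the file is written to a temporary directory rather than `../PickelFiles/`, and the helpers `save_object` and `load_object` are hypothetical names.

```python
import os
import pickle
import tempfile


def save_object(obj, path):
    """Serialize obj to path with pickle."""
    with open(path, "wb") as file:
        pickle.dump(obj, file)


def load_object(path):
    """Load a pickled object; only use this on files from a trusted source."""
    with open(path, "rb") as file:
        return pickle.load(file)


# Stand-in for a fitted preprocessor (assumption: the real object is a scikit-learn encoder)
stand_in_encoder = {"classes": ["Female", "Male"]}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "label_encoder_gender.pkl")
    save_object(stand_in_encoder, path)
    restored = load_object(path)

print(restored == stand_in_encoder)  # True
```

Pickle files can execute arbitrary code when loaded, so a consuming app should only read preprocessors it produced itself.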


In [10]:
# Verify the final processed data structure
print("🔍 Final Data Verification:")
print(f"📊 Shape: {data.shape}")
print(f"📋 Columns: {list(data.columns)}")
print(f"🎯 Target column present: {'Exited' in data.columns}")

# Check data types
print(f"\n📈 Data Types:")
print(data.dtypes)

# Display final dataset
data.head()

🔍 Final Data Verification:
📊 Shape: (10000, 13)
📋 Columns: ['CreditScore', 'Gender', 'Age', 'Tenure', 'Balance', 'NumOfProducts', 'HasCrCard', 'IsActiveMember', 'EstimatedSalary', 'Exited', 'Geography_France', 'Geography_Germany', 'Geography_Spain']
🎯 Target column present: True

📈 Data Types:
CreditScore            int64
Gender                 int32
Age                    int64
Tenure                 int64
Balance              float64
NumOfProducts          int64
HasCrCard              int64
IsActiveMember         int64
EstimatedSalary      float64
Exited                 int64
Geography_France     float64
Geography_Germany    float64
Geography_Spain      float64
dtype: object


Unnamed: 0,CreditScore,Gender,Age,Tenure,Balance,NumOfProducts,HasCrCard,IsActiveMember,EstimatedSalary,Exited,Geography_France,Geography_Germany,Geography_Spain
0,619,0,42,2,0.0,1,1,1,101348.88,1,1.0,0.0,0.0
1,608,0,41,1,83807.86,1,0,1,112542.58,0,0.0,0.0,1.0
2,502,0,42,8,159660.8,3,1,0,113931.57,1,1.0,0.0,0.0
3,699,0,39,1,0.0,2,0,0,93826.63,0,1.0,0.0,0.0
4,850,0,43,2,125510.82,1,1,1,79084.1,0,0.0,0.0,1.0


In [11]:
# =============================================================================
# 8. PREPARE FEATURES AND TARGET VARIABLES
# =============================================================================

# Separate features (X) and target variable (y)
# Features: All columns except 'Exited' (the target we want to predict)
# Target: 'Exited' column (1 = customer churned, 0 = customer stayed)

X = data.drop('Exited', axis=1)  # Features
y = data['Exited']               # Target

print("🎯 Feature-Target Separation:")
print(f"📊 Features (X) shape: {X.shape}")
print(f"🎯 Target (y) shape: {y.shape}")
print(f"📋 Feature columns: {list(X.columns)}")
print(f"🎯 Target distribution: {y.value_counts().to_dict()}")

# =============================================================================
# 9. TRAIN-TEST SPLIT AND FEATURE SCALING
# =============================================================================

# Split the dataset into training and testing sets
# 80% for training, 20% for testing
# random_state=42 ensures reproducible results
# stratify=y maintains the same proportion of target classes in both sets

X_train, X_test, y_train, y_test = train_test_split(
    X, y, 
    test_size=0.2, 
    random_state=42,
    stratify=y
)

print("\n📊 Train-Test Split:")
print(f"📈 Training set: X_train {X_train.shape}, y_train {y_train.shape}")
print(f"📉 Testing set: X_test {X_test.shape}, y_test {y_test.shape}")
print(f"🎯 Train churn rate: {y_train.mean():.2%}")
print(f"🎯 Test churn rate: {y_test.mean():.2%}")

# Feature Scaling using StandardScaler
# Neural networks typically train faster and more stably on standardized features
print("\n⚖️ Feature Scaling:")
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)  # Fit on training data only, then transform it
X_test = scaler.transform(X_test)        # Reuse training statistics on test data (no refitting)

print("✅ Features scaled using StandardScaler")
print(f"📊 Scaled training data shape: {X_train.shape}")
print(f"📊 Scaled testing data shape: {X_test.shape}")

🎯 Feature-Target Separation:
📊 Features (X) shape: (10000, 12)
🎯 Target (y) shape: (10000,)
📋 Feature columns: ['CreditScore', 'Gender', 'Age', 'Tenure', 'Balance', 'NumOfProducts', 'HasCrCard', 'IsActiveMember', 'EstimatedSalary', 'Geography_France', 'Geography_Germany', 'Geography_Spain']
🎯 Target distribution: {0: 7963, 1: 2037}

📊 Train-Test Split:
📈 Training set: X_train (8000, 12), y_train (8000,)
📉 Testing set: X_test (2000, 12), y_test (2000,)
🎯 Train churn rate: 20.38%
🎯 Test churn rate: 20.35%

⚖️ Feature Scaling:
✅ Features scaled using StandardScaler
📊 Scaled training data shape: (8000, 12)
📊 Scaled testing data shape: (2000, 12)
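
The near-identical train and test churn rates above are what `stratify=y` is for. To show the idea, the sketch below splits each class separately so both subsets keep the same class proportions. It is a simplified illustration and not scikit-learn's actual algorithm; the function name `stratified_split` and the toy labels are assumptions.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_size=0.2, seed=42):
    """Return (train_indices, test_indices) that preserve class proportions."""
    rng = random.Random(seed)
    indices_by_class = defaultdict(list)
    for index, label in enumerate(labels):
        indices_by_class[label].append(index)

    train_indices, test_indices = [], []
    for label in sorted(indices_by_class):
        indices = indices_by_class[label]
        rng.shuffle(indices)                      # shuffle within the class
        n_test = round(len(indices) * test_size)  # same fraction from every class
        test_indices.extend(indices[:n_test])
        train_indices.extend(indices[n_test:])
    return sorted(train_indices), sorted(test_indices)


# Toy target with a 20% positive rate, loosely mirroring the churn imbalance
labels = [0] * 80 + [1] * 20
train_indices, test_indices = stratified_split(labels)

train_rate = sum(labels[i] for i in train_indices) / len(train_indices)
test_rate = sum(labels[i] for i in test_indices) / len(test_indices)
print(len(train_indices), len(test_indices))  # 80 20
print(train_rate, test_rate)                  # 0.2 0.2
```

Without stratification, a random 20% sample of an imbalanced target can drift noticeably from the overall churn rate, which makes test metrics harder to compare with training metrics.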


In [12]:
# Preview the scaled training data
print("📊 Scaled Training Data (X_train) - First 5 samples:")
print("Features are now standardized (mean≈0, std≈1)")
print(X_train[:5])

print(f"\n📈 Scaling Verification:")
print(f"Mean of scaled features: {X_train.mean():.6f}")
print(f"Standard deviation of scaled features: {X_train.std():.6f}")
print("💡 Values close to 0 and 1 respectively indicate proper scaling")

📊 Scaled Training Data (X_train) - First 5 samples:
Features are now standardized (mean≈0, std≈1)
[[ 1.058568    0.90750738  1.71508648  0.68472287 -1.22605881 -0.91025649
   0.64104192 -1.030206    1.04208392  1.00175153 -0.57831252 -0.57773517]
 [ 0.91362605  0.90750738 -0.65993547 -0.6962018   0.41328769 -0.91025649
   0.64104192 -1.030206   -0.62355635 -0.99825153  1.72916886 -0.57773517]
 [ 1.07927399 -1.10191942 -0.18493108 -1.73189531  0.60168748  0.80883036
   0.64104192  0.97067965  0.30812779 -0.99825153  1.72916886 -0.57773517]
 [-0.92920731  0.90750738 -0.18493108 -0.00573947 -1.22605881  0.80883036
   0.64104192 -1.030206   -0.29019914  1.00175153 -0.57831252 -0.57773517]
 [ 0.42703522  0.90750738  0.95507945  0.3394917   0.54831832  0.80883036
  -1.55996038  0.97067965  0.13504224 -0.99825153  1.72916886 -0.57773517]]

📈 Scaling Verification:
Mean of scaled features: -0.000000
Standard deviation of scaled features: 1.000000
💡 Values close to 0 and 1 respectively indicate proper scaling
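
The check above averages over the whole array, so it would not reveal a single badly scaled column. A per-feature check is stricter (in the notebook this could be done with `X_train.mean(axis=0)` and `X_train.std(axis=0)`). The pure-Python sketch below shows what standardization computes and why the statistics come from the training rows only. It assumes the population standard deviation, which is what `StandardScaler` uses; the helper names and the toy numbers are illustrative.

```python
from statistics import fmean, pstdev


def fit_scaler(rows):
    """Learn per-column mean and population standard deviation from training rows."""
    columns = list(zip(*rows))
    means = [fmean(column) for column in columns]
    # A constant column has std 0; fall back to 1.0 to avoid dividing by zero
    stds = [pstdev(column) or 1.0 for column in columns]
    return means, stds


def transform(rows, means, stds):
    """Apply z = (x - mean) / std column by column."""
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


# Toy rows (credit score, age); not taken from the churn dataset
train_rows = [[600.0, 30.0], [700.0, 40.0], [800.0, 50.0]]
test_rows = [[650.0, 45.0]]

means, stds = fit_scaler(train_rows)               # fit on training data only
train_scaled = transform(train_rows, means, stds)
test_scaled = transform(test_rows, means, stds)    # reuse the training statistics

column_means = [fmean(column) for column in zip(*train_scaled)]
column_stds = [pstdev(column) for column in zip(*train_scaled)]
print(column_means)  # each value is approximately 0
print(column_stds)   # each value is approximately 1
```

The test row is scaled with the training mean and standard deviation, so its values are not expected to have mean 0 or std 1 themselves.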

In [13]:
# Save the fitted scaler for future use in predictions
# This is crucial for maintaining consistency in feature scaling
try:
    with open('../PickelFiles/scaler.pkl', 'wb') as file:
        pickle.dump(scaler, file)
    
    print("💾 StandardScaler saved successfully to '../PickelFiles/scaler.pkl'!")
    print("⚠️  Important: Use the same scaler for new predictions to maintain consistency")

except Exception as e:
    print(f"❌ Error saving scaler: {e}")
    raise

💾 StandardScaler saved successfully to '../PickelFiles/scaler.pkl'!
⚠️  Important: Use the same scaler for new predictions to maintain consistency


In [14]:
# Final verification before model training
print("✅ Data Preparation Summary:")
print("=" * 50)
print(f"✅ Original dataset loaded from '../Data/Churn_Modelling.csv'")
print(f"✅ Categorical variables encoded (Gender: Label, Geography: One-Hot)")
print(f"✅ Features scaled using StandardScaler")
print(f"✅ Data split into train/test sets (80/20)")
print(f"✅ All preprocessors saved to '../PickelFiles/'")

print(f"\n📊 Final Data Shapes:")
print(f"🔢 Number of input features: {X_train.shape[1]}")
print(f"📈 Training samples: {X_train.shape[0]}")
print(f"📉 Testing samples: {X_test.shape[0]}")

print("\n🚀 Ready for Neural Network Model Development!")

✅ Data Preparation Summary:
✅ Original dataset loaded from '../Data/Churn_Modelling.csv'
✅ Categorical variables encoded (Gender: Label, Geography: One-Hot)
✅ Features scaled using StandardScaler
✅ Data split into train/test sets (80/20)
✅ All preprocessors saved to '../PickelFiles/'

📊 Final Data Shapes:
🔢 Number of input features: 12
📈 Training samples: 8000
📉 Testing samples: 2000

🚀 Ready for Neural Network Model Development!


# 🧠 Artificial Neural Network (ANN) Implementation

## Model Architecture Design
We'll build a deep neural network for binary classification with the following characteristics:

### Architecture Overview
- **Input Layer**: Accepts all 12 preprocessed features
- **Hidden Layer 1**: 64 neurons with ReLU activation, followed by Batch Normalization and Dropout (0.3)
- **Hidden Layer 2**: 32 neurons with ReLU activation, followed by Batch Normalization and Dropout (0.2)
- **Output Layer**: 1 neuron with Sigmoid activation (probability output)

### Key Design Decisions
- **ReLU Activation**: Mitigates the vanishing gradient problem in the hidden layers
- **Sigmoid Output**: Outputs a churn probability between 0 and 1
- **Decreasing Layer Size**: Narrows the representation from 64 to 32 units before the output
- **Binary Classification**: A single output unit matches the two-class target (Churn: 1, Stay: 0)
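
To make the activation choices above concrete, the sketch below implements ReLU and sigmoid together with their derivatives in plain Python. It is an illustration only (Keras provides these activations itself), and the function names are assumptions. Comparing the derivatives at a large input shows why ReLU is less prone to vanishing gradients than a saturating function.

```python
import math


def relu(x):
    """ReLU: pass positive values through, clamp negatives to zero."""
    return max(0.0, x)


def relu_grad(x):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if x > 0 else 0.0


def sigmoid(x):
    """Numerically stable logistic function mapping any real number into (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def sigmoid_grad(x):
    """Derivative of the sigmoid: s * (1 - s), which shrinks toward 0 as |x| grows."""
    s = sigmoid(x)
    return s * (1.0 - s)


print(relu(-2.0), relu(3.0))  # 0.0 3.0
print(sigmoid(0.0))           # 0.5
print(relu_grad(10.0))        # 1.0
print(sigmoid_grad(10.0))     # a very small positive number
```

The sigmoid is still a natural fit for the output layer, because its value can be read directly as the predicted churn probability.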

### Training Strategy
- **Optimizer**: Adam (adaptive learning rate)
- **Loss Function**: Binary Crossentropy
- **Metrics**: Accuracy tracking
- **Callbacks**: Early Stopping, learning-rate reduction on plateau (ReduceLROnPlateau), TensorBoard logging
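
Binary crossentropy, the loss listed above, is the average of $-[y \log p + (1 - y) \log(1 - p)]$ over all samples, where $y$ is the true label and $p$ the predicted churn probability. The sketch below computes it by hand on made-up predictions. The function names are illustrative, and the clipping constant `eps=1e-7` is an assumption chosen to keep the logarithm finite.

```python
import math


def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean binary crossentropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


def accuracy(y_true, y_prob, threshold=0.5):
    """Share of samples whose thresholded prediction matches the label."""
    hits = sum(int(p >= threshold) == y for y, p in zip(y_true, y_prob))
    return hits / len(y_true)


# Made-up labels and predicted probabilities
y_true = [1, 0, 1, 0]
y_prob = [0.9, 0.1, 0.8, 0.3]

loss = binary_crossentropy(y_true, y_prob)
acc = accuracy(y_true, y_prob)
print(f"loss={loss:.4f}, accuracy={acc:.2f}")  # loss=0.1976, accuracy=1.00
```

Accuracy only sees which side of the threshold a prediction falls on, while the loss also rewards confident correct probabilities. With roughly 20% churners in this dataset, the loss is often the more informative of the two.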

In [15]:
# =============================================================================
# 10. DEEP LEARNING MODEL IMPLEMENTATION
# =============================================================================

# Import TensorFlow and Keras components for deep learning
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, BatchNormalization
from tensorflow.keras.callbacks import EarlyStopping, TensorBoard, ReduceLROnPlateau
from tensorflow.keras.optimizers import Adam
import datetime

# Set random seed for reproducibility
tf.random.set_seed(42)

print("🧠 Starting Deep Learning Model Development...")
print(f"🔧 TensorFlow version: {tf.__version__}")
print(f"🖥️ GPU Available: {len(tf.config.list_physical_devices('GPU')) > 0}")
print(f"💾 Physical devices: {[device.name for device in tf.config.list_physical_devices()]}")
print("✅ All TensorFlow components imported successfully!")


🧠 Starting Deep Learning Model Development...
🔧 TensorFlow version: 2.15.0
🖥️ GPU Available: False
💾 Physical devices: ['/physical_device:CPU:0']
✅ All TensorFlow components imported successfully!


In [16]:
# Check the number of input features for the model architecture
input_features = X_train.shape[1]
print(f"🔢 Number of input features: {input_features}")
print(f"📐 Input shape for neural network: {(input_features,)}")
print("💡 This will be the input layer size for our neural network")

print(f"\n📊 Training Data Summary:")
print(f"📈 Training samples: {X_train.shape[0]:,}")
print(f"📉 Testing samples: {X_test.shape[0]:,}")
print(f"🎯 Features per sample: {input_features}")
print(f"📊 First-layer weights: {input_features * 64:,} (plus 64 biases)")

🔢 Number of input features: 12
📐 Input shape for neural network: (12,)
💡 This will be the input layer size for our neural network

📊 Training Data Summary:
📈 Training samples: 8,000
📉 Testing samples: 2,000
🎯 Features per sample: 12
📊 First-layer weights: 768 (plus 64 biases)


In [17]:
# =============================================================================
# 11. MODEL ARCHITECTURE DESIGN
# =============================================================================

# Build an optimized ANN model for churn prediction
# Architecture: Input → Hidden Layer 1 → Hidden Layer 2 → Output

model = Sequential([
    # Input Layer + First Hidden Layer
    # 64 neurons with ReLU activation for non-linearity
    Dense(64, activation='relu', input_shape=(input_features,), name='hidden_layer_1'),
    BatchNormalization(),  # Normalize the activations of the previous layer
    Dropout(0.3),          # Reduce overfitting by randomly zeroing 30% of activations during training

    # Second Hidden Layer
    # 32 neurons (decreasing size for hierarchical feature learning)
    Dense(32, activation='relu', name='hidden_layer_2'),
    BatchNormalization(),
    Dropout(0.2),          # Lower dropout rate for the smaller second layer

    # Output Layer
    # 1 neuron with sigmoid activation for binary classification (0-1 probability)
    Dense(1, activation='sigmoid', name='output_layer')
])

print("🏗️ Model Architecture Created!")
print("📋 Architecture: Input → Dense(64) → BatchNorm → Dropout(0.3) → Dense(32) → BatchNorm → Dropout(0.2) → Dense(1)")
print("🔧 Activation Functions: ReLU (hidden layers), Sigmoid (output layer)")
print("⚡ Regularization: Batch Normalization + Dropout")


🏗️ Model Architecture Created!
📋 Architecture: Input → Dense(64) → BatchNorm → Dropout(0.3) → Dense(32) → BatchNorm → Dropout(0.2) → Dense(1)
🔧 Activation Functions: ReLU (hidden layers), Sigmoid (output layer)
⚡ Regularization: Batch Normalization + Dropout


In [18]:
# Display detailed model architecture
print("📋 Detailed Model Summary:")
print("=" * 60)
model.summary()

# Calculate and display total parameters
total_params = model.count_params()
trainable_params = sum([tf.keras.backend.count_params(w) for w in model.trainable_weights])
non_trainable_params = sum([tf.keras.backend.count_params(w) for w in model.non_trainable_weights])

print(f"\n📊 Parameter Analysis:")
print(f"🔢 Total Parameters: {total_params:,}")
print(f"🎯 Trainable Parameters: {trainable_params:,}")
print(f"🔒 Non-trainable Parameters: {non_trainable_params:,}")
print("💡 More parameters = higher capacity but risk of overfitting")

# Estimate model complexity
print(f"\n🧠 Model Complexity:")
print(f"📏 Model depth: {len(model.layers)} layers")
print(f"🔗 Connections: {trainable_params:,} weights and biases to learn")
print(f"💾 Memory footprint: ~{(total_params * 4) / 1024:.1f} KB (32-bit floats)")

📋 Detailed Model Summary:
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 hidden_layer_1 (Dense)      (None, 64)                832       
                                                                 
 batch_normalization (Batch  (None, 64)                256       
 Normalization)                                                  
                                                                 
 dropout (Dropout)           (None, 64)                0         
                                                                 
 hidden_layer_2 (Dense)      (None, 32)                2080      
                                                                 
 batch_normalization_1 (Bat  (None, 32)                128       
 chNormalization)                                                
                                                                 
___________________________
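
The parameter counts in the summary above can be checked by hand. A Dense layer has one weight per input-output pair plus one bias per output, and a BatchNormalization layer has four values per feature: trainable scale and shift, plus non-trainable moving mean and variance. The sketch below applies these rules to the layer sizes defined in the model cell. The output-layer count and the totals are derived by this arithmetic rather than read from the truncated summary, and the helper names are illustrative.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit; all trainable."""
    return n_inputs * n_units + n_units


def batchnorm_params(n_features):
    """Return (trainable, non_trainable): gamma and beta vs. moving mean and variance."""
    return 2 * n_features, 2 * n_features


input_features = 12  # number of preprocessed input features

hidden_1 = dense_params(input_features, 64)
bn_1_trainable, bn_1_frozen = batchnorm_params(64)
hidden_2 = dense_params(64, 32)
bn_2_trainable, bn_2_frozen = batchnorm_params(32)
output = dense_params(32, 1)
# Dropout layers have no parameters

trainable = hidden_1 + bn_1_trainable + hidden_2 + bn_2_trainable + output
non_trainable = bn_1_frozen + bn_2_frozen
total = trainable + non_trainable

print(hidden_1, bn_1_trainable + bn_1_frozen, hidden_2, bn_2_trainable + bn_2_frozen, output)
# 832 256 2080 128 33
print(total, trainable, non_trainable)
# 3329 3137 192
```

The first four values match the `Param #` column shown for `hidden_layer_1`, `batch_normalization`, `hidden_layer_2`, and `batch_normalization_1`.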

In [19]:
# =============================================================================
# 12. MODEL COMPILATION CONFIGURATION
# =============================================================================

# Configure optimizer and loss function for training
# Using string names for better TensorFlow version compatibility

print("⚙️ Configuring Model Training Parameters...")

# Learning rate selection
# Note: this value is only printed; the 'adam' string below uses the Keras default learning rate, which is also 0.001
learning_rate = 0.001  # Default Adam learning rate, good starting point
print(f"📈 Learning rate: {learning_rate}")

# Select optimizer and loss function by name
# Using string names instead of objects for better compatibility
optimizer_name = 'adam'
loss_function = 'binary_crossentropy'

print(f"🔧 Optimizer: {optimizer_name}")
print(f"📉 Loss function: {loss_function}")
print(f"📊 Metrics: accuracy")
print("💡 Using string names for better TensorFlow version compatibility")

⚙️ Configuring Model Training Parameters...
📈 Learning rate: 0.001
🔧 Optimizer: adam
📉 Loss function: binary_crossentropy
📊 Metrics: accuracy
💡 Using string names for better TensorFlow version compatibility


In [20]:
# Compile the model with the optimizer, loss, and metric selected above
model.compile(
    optimizer=optimizer_name,        # Use string name for compatibility
    loss=loss_function,              # Binary crossentropy for binary classification
    metrics=['accuracy']             # Track accuracy during training
)

print("✅ Model Compiled Successfully!")
print(f"🔧 Optimizer: {optimizer_name}")
print(f"📉 Loss Function: {loss_function}")
print(f"📊 Metrics: accuracy")
print("🚀 Model is ready for training!")


✅ Model Compiled Successfully!
🔧 Optimizer: adam
📉 Loss Function: binary_crossentropy
📊 Metrics: accuracy
🚀 Model is ready for training!


In [21]:
# =============================================================================
# 13. SETUP TRAINING CALLBACKS
# =============================================================================

# Create timestamp for unique TensorBoard log directory
log_dir = "../logs/fit/" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")

# Ensure logs directory exists
os.makedirs("../logs/fit", exist_ok=True)

# Setup TensorBoard callback for training visualization
tensorflow_callback = TensorBoard(
    log_dir=log_dir,
    histogram_freq=1,           # Log weight histograms every epoch
    write_graph=True,           # Log the model graph
    write_images=True,          # Log model weights as images
    profile_batch=0             # Disable profiling for performance
)

print("📊 TensorBoard Configuration:")
print(f"📁 Log directory: {log_dir}")
print("📈 Logging: Loss, Accuracy, Weight Histograms, Model Graph")
print("💡 Use 'tensorboard --logdir ../logs/fit' to view training progress")
print("✅ TensorBoard callback configured!")

📊 TensorBoard Configuration:
📁 Log directory: ../logs/fit/20250801-162303
📈 Logging: Loss, Accuracy, Weight Histograms, Model Graph
💡 Use 'tensorboard --logdir ../logs/fit' to view training progress
✅ TensorBoard callback configured!


In [22]:
# Setup Early Stopping to limit overfitting
early_stopping_callback = EarlyStopping(
    monitor='val_loss',              # Monitor validation loss
    patience=10,                     # Stop after 10 epochs without improvement
    restore_best_weights=True,       # Restore weights from the best epoch when stopping
    verbose=1                        # Print message when stopping
)

# Add Learning Rate Reduction for better convergence
lr_reducer = ReduceLROnPlateau(
    monitor='val_loss',              # Monitor validation loss
    factor=0.5,                      # Halve the learning rate
    patience=5,                      # Reduce after 5 epochs without improvement
    min_lr=0.0001,                   # Lower bound on the learning rate
    verbose=1                        # Print message when reducing
)

print("✅ Training callbacks configured:")
print("🛑 Early Stopping: Prevents overfitting (patience=10)")
print("📉 Learning Rate Reduction: Improves convergence (patience=5)")
print("📊 TensorBoard: Logs training metrics and visualizations")
print("🎯 All callbacks ready for training!")

✅ Training callbacks configured:
🛑 Early Stopping: Prevents overfitting (patience=10)
📉 Learning Rate Reduction: Improves convergence (patience=5)
📊 TensorBoard: Logs training metrics and visualizations
🎯 All callbacks ready for training!

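To make the interplay of the two patience settings concrete, the sketch below replays a list of validation losses through a simplified version of both rules. It is a rough pure-Python model, not the Keras implementation: it ignores `min_delta` and `cooldown`, shares a single "best loss" between the two rules, and the function name `simulate_callbacks` and the loss values are invented for illustration.

```python
def simulate_callbacks(val_losses, lr=0.001, factor=0.5, lr_patience=5,
                       min_lr=0.0001, stop_patience=10):
    """Replay val_losses through simplified plateau-LR and early-stopping rules."""
    best = float("inf")
    best_epoch = 0
    lr_wait = 0        # stagnant epochs since the last improvement or LR cut
    stop_wait = 0      # stagnant epochs since the last improvement
    lr_history = []
    stopped_epoch = None

    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, best_epoch = loss, epoch
            lr_wait = 0
            stop_wait = 0
        else:
            lr_wait += 1
            stop_wait += 1
            if lr_wait >= lr_patience:
                lr = max(lr * factor, min_lr)   # never go below min_lr
                lr_wait = 0
        lr_history.append(lr)
        if stop_wait >= stop_patience:
            stopped_epoch = epoch
            break

    return {"stopped_epoch": stopped_epoch,
            "best_epoch": best_epoch,
            "lr_history": lr_history}

# Made-up losses: improvement for three epochs, then a plateau
val_losses = [0.50, 0.40, 0.35] + [0.36] * 12
result = simulate_callbacks(val_losses)
print(result["stopped_epoch"], result["best_epoch"], result["lr_history"][-1])
```

Under these simplified rules the learning rate is halved after 5 stagnant epochs and training stops after 10, so a single plateau triggers at most two reductions before early stopping ends the run.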

In [23]:
# =============================================================================
# 14. MODEL TRAINING
# =============================================================================

print("🚀 Starting Model Training...")
print("⏱️ This may take a few minutes depending on your hardware.")
print("=" * 60)

# Train the model
# NOTE: the test split doubles as the validation set here, so early stopping and
# LR reduction both see it; test metrics reported later may be optimistic.
history = model.fit(
    X_train, y_train,                           # Training data
    validation_data=(X_test, y_test),           # Validation data (same as test split)
    epochs=100,                                 # Maximum epochs
    batch_size=32,                              # Batch size for training
    callbacks=[                                 # Training callbacks
        tensorflow_callback,                    # TensorBoard logging
        early_stopping_callback,                # Early stopping
        lr_reducer                              # Learning rate reduction
    ],
    verbose=1                                   # Show training progress
)

print("\n✅ Model Training Completed!")
print(f"📊 Total epochs trained: {len(history.history['loss'])}")
print("📈 Check TensorBoard for detailed training metrics visualization")
print("💾 Training history saved in 'history' variable")

# Display metrics from the last epoch that ran. With restore_best_weights=True,
# the restored weights may come from an earlier epoch than these values.
final_train_loss = history.history['loss'][-1]
final_val_loss = history.history['val_loss'][-1]
final_train_acc = history.history['accuracy'][-1]
final_val_acc = history.history['val_accuracy'][-1]

print("\n🎯 Final Training Metrics:")
print(f"📉 Training Loss: {final_train_loss:.4f}")
print(f"📈 Validation Loss: {final_val_loss:.4f}")
print(f"🎯 Training Accuracy: {final_train_acc:.4f} ({final_train_acc*100:.2f}%)")
print(f"🎯 Validation Accuracy: {final_val_acc:.4f} ({final_val_acc*100:.2f}%)")

🚀 Starting Model Training...
⏱️ This may take a few minutes depending on your hardware.
Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 22: ReduceLROnPlateau reducing learning rate to 0.0005000000237487257.
Epoch 23/100
Epoch 24/100
Epoch 25/100
Epoch 26/100
Epoc
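
When early stopping triggers with `restore_best_weights=True`, the model ends up with the weights from the epoch with the lowest `val_loss`, so the last entry of each history list does not necessarily describe the saved model. The sketch below looks up the best epoch in a Keras-style history dictionary; the helper name `best_epoch_metrics` and the sample values are made up for illustration and are not results from this notebook.

```python
def best_epoch_metrics(history_dict, monitor="val_loss"):
    """Return the metrics recorded at the epoch where `monitor` was lowest."""
    values = history_dict[monitor]
    best_idx = min(range(len(values)), key=lambda i: values[i])
    metrics = {name: series[best_idx] for name, series in history_dict.items()}
    metrics["epoch"] = best_idx + 1          # epochs are 1-based in the Keras log
    return metrics

# Hypothetical history values, shaped like `history.history`
sample_history = {
    "loss":         [0.48, 0.41, 0.37, 0.35, 0.34],
    "val_loss":     [0.45, 0.40, 0.36, 0.37, 0.38],
    "accuracy":     [0.80, 0.83, 0.85, 0.86, 0.86],
    "val_accuracy": [0.81, 0.84, 0.86, 0.85, 0.85],
}

best = best_epoch_metrics(sample_history)
print(f"Best epoch: {best['epoch']}, val_loss: {best['val_loss']:.4f}")
```

In this notebook the helper would be called as `best_epoch_metrics(history.history)`.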

In [24]:
# =============================================================================
# 15. MODEL SAVING AND EVALUATION
# =============================================================================

from sklearn.metrics import classification_report

print("💾 Saving Model in Multiple Formats...")

# Make sure the output directory exists before saving
os.makedirs('../PickelFiles', exist_ok=True)

try:
    # Save in newer Keras format (recommended)
    model_keras_path = '../PickelFiles/model.keras'
    model.save(model_keras_path)
    print(f"✅ Model saved in Keras format: {model_keras_path}")

    # Save in H5 format for backward compatibility
    # (the .h5 extension selects the HDF5 format, so save_format is not needed)
    model_h5_path = '../PickelFiles/model.h5'
    model.save(model_h5_path)
    print(f"✅ Model saved in H5 format: {model_h5_path}")

    print("\n💡 Both formats available:")
    print("  • model.keras - Recommended for new deployments")
    print("  • model.h5 - For backward compatibility")

except Exception as e:
    print(f"❌ Error saving model: {e}")
    raise

# Quick evaluation on test set
test_loss, test_accuracy = model.evaluate(X_test, y_test, verbose=0)

print("\n📊 Final Model Performance on Test Set:")
print("=" * 45)
print(f"🔥 Test Loss:      {test_loss:.4f}")
print(f"🎯 Test Accuracy:  {test_accuracy:.4f} ({test_accuracy*100:.2f}%)")

# Calculate per-class metrics using a 0.5 probability cutoff
y_pred = (model.predict(X_test) > 0.5).astype(int)

print("\n📈 Detailed Performance Report:")
print(classification_report(y_test, y_pred))

print("\n🎉 Model Training and Saving Complete!")
print("📁 All files saved to '../PickelFiles/' directory")

💾 Saving Model in Multiple Formats...
✅ Model saved in Keras format: ../PickelFiles/model.keras
✅ Model saved in H5 format: ../PickelFiles/model.h5

💡 Both formats available:
  • model.keras - Recommended for new deployments
  • model.h5 - For backward compatibility

📊 Final Model Performance on Test Set:
🔥 Test Loss:      0.3325
🎯 Test Accuracy:  0.8675 (86.75%)

📈 Detailed Performance Report:
              precision    recall  f1-score   support

           0       0.88      0.96      0.92      1593
           1       0.77      0.50      0.60       407

    accuracy                           0.87      2000
   macro avg       0.83      0.73      0.76      2000
weighted avg       0.86      0.87      0.86      2000


🎉 Model Training and Saving Complete!
📁 All files saved to '../PickelFiles/' directory

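The report above shows a recall of 0.50 for the churn class at the 0.5 cutoff used for `y_pred`. One common way to trade precision for recall is to lower that cutoff. The following pure-Python sketch illustrates the effect on a tiny made-up sample; the helper `precision_recall_at` and the toy labels and probabilities are assumptions for illustration, not notebook results.

```python
def precision_recall_at(y_true, y_prob, threshold):
    """Precision and recall for the positive class at a given probability cutoff."""
    tp = fp = fn = 0
    for truth, prob in zip(y_true, y_prob):
        predicted = 1 if prob > threshold else 0
        if predicted == 1 and truth == 1:
            tp += 1
        elif predicted == 1 and truth == 0:
            fp += 1
        elif predicted == 0 and truth == 1:
            fn += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Toy data (hypothetical): 1 = churned, 0 = stayed
toy_true = [0, 0, 0, 0, 1, 1, 1, 0, 1, 0]
toy_prob = [0.10, 0.20, 0.35, 0.60, 0.40, 0.55, 0.90, 0.05, 0.30, 0.45]

for cutoff in (0.5, 0.3):
    precision, recall = precision_recall_at(toy_true, toy_prob, cutoff)
    print(f"cutoff={cutoff}: precision={precision:.2f}, recall={recall:.2f}")
```

On the toy sample, lowering the cutoff catches more churners at the cost of more false alarms. Whether that trade is worthwhile for the real model would have to be checked on held-out data.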
In [25]:
# =============================================================================
# 16. TENSORBOARD VISUALIZATION
# =============================================================================

# Load TensorBoard extension for Jupyter notebooks
%load_ext tensorboard

print("📊 TensorBoard extension loaded!")
print("🚀 You can now visualize training metrics, model architecture, and more")
print("\n📈 Available visualizations:")
print("  • Training/Validation Loss and Accuracy curves")
print("  • Model architecture graph")
print("  • Weight and bias histograms")
print("  • Learning rate changes")

📊 TensorBoard extension loaded!
🚀 You can now visualize training metrics, model architecture, and more

📈 Available visualizations:
  • Training/Validation Loss and Accuracy curves
  • Model architecture graph
  • Weight and bias histograms
  • Learning rate changes


In [26]:
# Launch TensorBoard to visualize training metrics
print("🚀 Launching TensorBoard...")
print("📈 Interactive visualization of model training")
print("\n" + "="*50)

%tensorboard --logdir ../logs/fit

print("\n💡 TensorBoard Tips:")
print("• Scalars: View loss and accuracy curves")
print("• Graphs: Explore model architecture")
print("• Histograms: Analyze weight distributions")
print("• Images: Visualize weight matrices")
print("• Use the timeline slider to see training progress")

🚀 Launching TensorBoard...
📈 Interactive visualization of model training


💡 TensorBoard Tips:
• Scalars: View loss and accuracy curves
• Graphs: Explore model architecture
• Histograms: Analyze weight distributions
• Images: Visualize weight matrices
• Use the timeline slider to see training progress
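
Pointing `--logdir` at `../logs/fit` shows every run stored there. Because the run folders are named with a `%Y%m%d-%H%M%S` timestamp, the most recent one can be found by sorting the names. The sketch below demonstrates this on throwaway directories; the helper `latest_run_dir` and two of the three run names are hypothetical.

```python
from pathlib import Path
import tempfile

def latest_run_dir(base_dir):
    """Return the most recent run directory under base_dir, or None if there is none.

    Assumes run folders use the %Y%m%d-%H%M%S naming pattern, so that
    lexicographic order matches chronological order.
    """
    runs = sorted(p for p in Path(base_dir).iterdir() if p.is_dir())
    return runs[-1] if runs else None

# Demo with temporary directories standing in for ../logs/fit
with tempfile.TemporaryDirectory() as tmp:
    for name in ("20250801-162303", "20250730-091500", "20250801-101010"):
        (Path(tmp) / name).mkdir()
    latest_name = latest_run_dir(tmp).name
    print(latest_name)
```

Passing the returned path to `%tensorboard --logdir` would restrict the dashboard to that single run.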


In [27]:
# =============================================================================
# 17. EXPERIMENT CONCLUSION
# =============================================================================

print("🎉 CUSTOMER CHURN PREDICTION MODEL - EXPERIMENT COMPLETED!")
print("=" * 60)
print("✅ Data preprocessing completed")
print("✅ Neural network model trained and saved")
print("✅ Encoders and scaler saved for future predictions")
print("✅ TensorBoard logs generated for analysis")

print("\n📁 Generated Files in '../PickelFiles/':")
print("  • model.keras - Trained neural network (recommended)")
print("  • model.h5 - Trained neural network (compatibility)")
print("  • label_encoder_gender.pkl - Gender encoder")
print("  • onehot_encoder_geo.pkl - Geography encoder")
print("  • scaler.pkl - Feature scaler")

print("\n📊 Generated Logs in '../logs/fit/':")
print("  • TensorBoard training logs")
print("  • Model architecture graphs")
print("  • Training metrics history")

print("\n🔄 Next Steps:")
print("  1. Use '../Notebook/prediction.ipynb' for individual predictions")
print("  2. Use '../app.py' to run the Streamlit web application")
print("  3. Analyze TensorBoard visualizations for model insights")
print("  4. Consider hyperparameter tuning for improved performance")
print("  5. Deploy to Streamlit Cloud for public access")

print("\n🚀 Ready for Production Deployment!")
print("💡 All preprocessors and models are saved for consistent predictions")

🎉 CUSTOMER CHURN PREDICTION MODEL - EXPERIMENT COMPLETED!
✅ Data preprocessing completed
✅ Neural network model trained and saved
✅ Encoders and scaler saved for future predictions
✅ TensorBoard logs generated for analysis

📁 Generated Files in '../PickelFiles/':
  • model.keras - Trained neural network (recommended)
  • model.h5 - Trained neural network (compatibility)
  • label_encoder_gender.pkl - Gender encoder
  • onehot_encoder_geo.pkl - Geography encoder
  • scaler.pkl - Feature scaler

📊 Generated Logs in '../logs/fit/':
  • TensorBoard training logs
  • Model architecture graphs
  • Training metrics history

🔄 Next Steps:
  1. Use '../Notebook/prediction.ipynb' for individual predictions
  2. Use '../app.py' to run the Streamlit web application
  3. Analyze TensorBoard visualizations for model insights
  4. Consider hyperparameter tuning for improved performance
  5. Deploy to Streamlit Cloud for public access

🚀 Ready for Production Depl