**ANN Practical**
***Dataset***
- Churn dataset
- Output feature
    - Exited (0 or 1)
        - 0 = Person did not exit the bank
        - 1 = Person exited the bank
- Input features
    - Non-important features: RowNumber, CustomerId, Surname
    - Important features: CreditScore, Geography, Gender, Age, Tenure, Balance, NumOfProducts, HasCrCard, IsActiveMember, EstimatedSalary

***Problem Statement***
- Based on the input features, predict whether a bank customer is going to leave the bank or not.
- Exited = 1 (Leaves the bank)
- Exited = 0 (Does not leave the bank)

***Steps to Follow***
- Step 1) Load the dataset
- Step 2) Preprocessing the data
  - Step 2.1) Drop unnecessary columns
- Step 3) Encode categorical data
  - Step 3.1) Convert Gender from Male, Female to numeric values 0 and 1
  - Step 3.2) Convert Geography from Text to Numeric using OneHotEncoding
    - Step 3.2.1) Combine the encoded Geography columns back to the main data and drop the original Geography column
- Step 4) Save the LabelEncoder and OneHotEncoder using pickle for future use
- Step 5) Divide the data into independent features (x) and the dependent feature (y)
- Step 6) Split the data into training and testing sets
- Step 7) Scaling using StandardScaler
  - Step 7.1) Scale the features using StandardScaler
  - Step 7.2) Save the StandardScaler in a pickle file for future use
- Step 8) Build the ANN model
  - Step 8.1) Create the Input Layers, Hidden Layers and Output Layer in our ANN model
  - Step 8.2) Optimizer
  - Step 8.3) Loss - BinaryCrossEntropy (here)
  - Step 8.4) Pass the Optimizer and Loss to the ann_model
  - Step 8.5) Callback > Setup the Tensorboard
    - Step 8.5.1) Create a directory for storing logs and define log file format
    - Step 8.5.2) Initialize the TensorBoard
  - Step 8.6) Callback > Setup Early Stopping
  - Step 8.7) Train the ANN model
  - Step 8.8) Save the model in H5 format
- Step 9) Bring up the Tensorboard
  - Step 9.1) Load Tensorboard Extension
  - Step 9.2) Launch the Tensorboard
- Step 10) Load the saved encoders, scaler and trained model
- Step 11) Create new input data for prediction
- Step 12) Preprocess the input data (encode Geography and Gender)
- Step 13) Scale the input data using the saved StandardScaler
- Step 14) Make the prediction using the trained model

    

In [1]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, LabelEncoder
import pickle
from sklearn.preprocessing import OneHotEncoder
import datetime

# ANN imports
from tensorflow.keras.models import load_model
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.callbacks import EarlyStopping,TensorBoard


2026-01-31 12:35:49.363744: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.


In [2]:
# - Step 1) Load the dataset
data = pd.read_csv("./resources/data/Churn_Modelling.csv")
data.head()

Unnamed: 0,RowNumber,CustomerId,Surname,CreditScore,Geography,Gender,Age,Tenure,Balance,NumOfProducts,HasCrCard,IsActiveMember,EstimatedSalary,Exited
0,1,15634602,Hargrave,619,France,Female,42,2,0.0,1,1,1,101348.88,1
1,2,15647311,Hill,608,Spain,Female,41,1,83807.86,1,0,1,112542.58,0
2,3,15619304,Onio,502,France,Female,42,8,159660.8,3,1,0,113931.57,1
3,4,15701354,Boni,699,France,Female,39,1,0.0,2,0,0,93826.63,0
4,5,15737888,Mitchell,850,Spain,Female,43,2,125510.82,1,1,1,79084.1,0


In [3]:
# Step 2) Preprocessing the data

# Step 2.1) Drop unnecessary columns
# axis=1 (columnwise)
data = data.drop(columns=['RowNumber', 'CustomerId', 'Surname'],axis=1)
data.head()

Unnamed: 0,CreditScore,Geography,Gender,Age,Tenure,Balance,NumOfProducts,HasCrCard,IsActiveMember,EstimatedSalary,Exited
0,619,France,Female,42,2,0.0,1,1,1,101348.88,1
1,608,Spain,Female,41,1,83807.86,1,0,1,112542.58,0
2,502,France,Female,42,8,159660.8,3,1,0,113931.57,1
3,699,France,Female,39,1,0.0,2,0,0,93826.63,0
4,850,Spain,Female,43,2,125510.82,1,1,1,79084.1,0


In [4]:
# Step 3) Encode categorical data
# LabelEncoder converts categorical data into numeric data

# Step 3.1) Convert Gender from Male, Female to numeric values (Female = 0, Male = 1; LabelEncoder assigns codes in alphabetical order)
labelEncoder = LabelEncoder()
data['Gender']=labelEncoder.fit_transform(data['Gender'])
data.head()

Unnamed: 0,CreditScore,Geography,Gender,Age,Tenure,Balance,NumOfProducts,HasCrCard,IsActiveMember,EstimatedSalary,Exited
0,619,France,0,42,2,0.0,1,1,1,101348.88,1
1,608,Spain,0,41,1,83807.86,1,0,1,112542.58,0
2,502,France,0,42,8,159660.8,3,1,0,113931.57,1
3,699,France,0,39,1,0.0,2,0,0,93826.63,0
4,850,Spain,0,43,2,125510.82,1,1,1,79084.1,0


In [5]:
# Step 3.2) Convert Geography from text to numeric using OneHotEncoding
# LabelEncoder is not used here: it would assign 0, 1, 2 to France, Germany, Spain, which could suggest a false ordering (Spain > Germany > France) and confuse the model
# sparse_output controls how the encoded output matrix is stored in memory:
# - sparse_output=False -> dense NumPy array (full matrix with all zeros and ones)
# - sparse_output=True  -> memory-efficient sparse matrix that stores only the non-zero values
oneHotEncoder = OneHotEncoder(sparse_output=False)
geography_encoded = oneHotEncoder.fit_transform(data[['Geography']])
oneHotEncoder.get_feature_names_out(['Geography'])
geography_encoded = pd.DataFrame(geography_encoded, columns=oneHotEncoder.get_feature_names_out(['Geography']))
geography_encoded.head()


Unnamed: 0,Geography_France,Geography_Germany,Geography_Spain
0,1.0,0.0,0.0
1,0.0,0.0,1.0
2,1.0,0.0,0.0
3,1.0,0.0,0.0
4,0.0,0.0,1.0
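
To make the one-hot idea concrete, here is a minimal pure-Python sketch that reproduces the table above for the first five Geography values. The helper name `one_hot` is hypothetical, and this is a hand-rolled illustration rather than scikit-learn's implementation. It assumes the category order France, Germany, Spain, as the column names above suggest.

```python
# Hand-rolled illustration of one-hot encoding (not scikit-learn's code).
def one_hot(values, categories):
    """Return one row per value, with a single 1.0 in the matching category column."""
    return [[1.0 if value == category else 0.0 for category in categories]
            for value in values]

# Assumed category order, matching the Geography_* columns above
categories = ["France", "Germany", "Spain"]

# First five Geography values from data.head()
geography = ["France", "Spain", "France", "France", "Spain"]

rows = one_hot(geography, categories)
for row in rows:
    print(row)
```

Every row contains exactly one 1.0, so no category is numerically "larger" than another, which is the property that a single integer code per country would not give.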


In [6]:
# Step 3.2.1) Combine the encoded Geography columns back to the main data and drop the original Geography column
data = data.drop(columns=['Geography'], axis=1)
data = pd.concat([data, geography_encoded], axis=1)
data.head()

Unnamed: 0,CreditScore,Gender,Age,Tenure,Balance,NumOfProducts,HasCrCard,IsActiveMember,EstimatedSalary,Exited,Geography_France,Geography_Germany,Geography_Spain
0,619,0,42,2,0.0,1,1,1,101348.88,1,1.0,0.0,0.0
1,608,0,41,1,83807.86,1,0,1,112542.58,0,0.0,0.0,1.0
2,502,0,42,8,159660.8,3,1,0,113931.57,1,1.0,0.0,0.0
3,699,0,39,1,0.0,2,0,0,93826.63,0,1.0,0.0,0.0
4,850,0,43,2,125510.82,1,1,1,79084.1,0,0.0,0.0,1.0


In [7]:
# Step 4 ) Save the LabelEncoder and OneHotEncoder using pickle for future use
# wb = write binary mode
with open('./resources/pickle/label_encoder_gender.pkl', 'wb') as le_file: 
    pickle.dump(labelEncoder, le_file)  
    
with open('./resources/pickle/onehot_encoder_geography.pkl', 'wb') as ohe_file: 
    pickle.dump(oneHotEncoder, ohe_file)
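
The save and reload round trip can be sketched with a stand-in object. In this illustration the dict `fitted_mapping` is a hypothetical placeholder for a fitted encoder, and an in-memory buffer takes the place of the `.pkl` file on disk; only the `pickle.dump` and `pickle.load` calls mirror the notebook.

```python
import io
import pickle

# Hypothetical stand-in for a fitted encoder: a learned category mapping.
fitted_mapping = {"Female": 0, "Male": 1}

buffer = io.BytesIO()                 # plays the role of the file opened with 'wb'
pickle.dump(fitted_mapping, buffer)   # serialize the object, as in Step 4
buffer.seek(0)                        # rewind before reading back
restored_mapping = pickle.load(buffer)

# The restored object is an equal copy, so it encodes new data the same way.
print(restored_mapping == fitted_mapping)
```

Saving the fitted objects matters because new data must be encoded with exactly the mapping learned during training, not with a freshly fitted one.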

In [8]:
# Step 5) Divide the data into features - dependent and independent features
# independent features = x = all columns except 'Exited' column
# dependent feature = y = 'Exited' column
x = data.drop(columns=['Exited'], axis=1)  # independent features
y = data['Exited']  # dependent feature

print('x=',x.head())
print('y=',y.head())


x=    CreditScore  Gender  Age  Tenure    Balance  NumOfProducts  HasCrCard  \
0          619       0   42       2       0.00              1          1   
1          608       0   41       1   83807.86              1          0   
2          502       0   42       8  159660.80              3          1   
3          699       0   39       1       0.00              2          0   
4          850       0   43       2  125510.82              1          1   

   IsActiveMember  EstimatedSalary  Geography_France  Geography_Germany  \
0               1        101348.88               1.0                0.0   
1               1        112542.58               0.0                0.0   
2               0        113931.57               1.0                0.0   
3               0         93826.63               1.0                0.0   
4               1         79084.10               0.0                0.0   

   Geography_Spain  
0              0.0  
1              1.0  
2              0.0  
3    

In [9]:
# Step 6) Split the data into training and testing sets
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)
print('x_train=',x_train.shape)
print('x_test=',x_test.shape)
print('y_train=',y_train.shape)
print('y_test=',y_test.shape)

x_train= (8000, 12)
x_test= (2000, 12)
y_train= (8000,)
y_test= (2000,)
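
The 80/20 split can be illustrated with a small hand-rolled version. The function `simple_train_test_split` below is a hypothetical sketch, not scikit-learn's algorithm, so it selects different rows than `train_test_split` even with the same seed. It only shows the idea: shuffle the row indices reproducibly, then cut them at the test fraction.

```python
import random

def simple_train_test_split(rows, test_size=0.2, seed=42):
    """Shuffle row indices with a fixed seed and split them into train and test parts."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)       # fixed seed -> same split every run
    n_test = round(len(rows) * test_size)      # number of rows held out for testing
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return [rows[i] for i in train_idx], [rows[i] for i in test_idx]

rows = list(range(10000))                      # stand-in for the 10000 customers
train_rows, test_rows = simple_train_test_split(rows)
print(len(train_rows), len(test_rows))
```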


In [10]:
# Step 7) Scaling using StandardScaler

# Step 7.1) Scale the features using StandardScaler
# Scaling is done so that features with different units (e.g. weight in kilograms, distance in meters) and different ranges do not bias the model
# Each feature is rescaled to mean=0 and standard deviation=1
standardScalar = StandardScaler()
x_train = standardScalar.fit_transform(x_train) # fit_transform means learn and transform
x_test = standardScalar.transform(x_test)    # transform only means use the learned parameters (from x_train) to transform x_test
print('Scaled x_train=',x_train[:5])
print('Scaled x_test=',x_test[:5])

Scaled x_train= [[ 0.35649971  0.91324755 -0.6557859   0.34567966 -1.21847056  0.80843615
   0.64920267  0.97481699  1.36766974  1.00150113 -0.57946723 -0.57638802]
 [-0.20389777  0.91324755  0.29493847 -0.3483691   0.69683765  0.80843615
   0.64920267  0.97481699  1.6612541  -0.99850112  1.72572313 -0.57638802]
 [-0.96147213  0.91324755 -1.41636539 -0.69539349  0.61862909 -0.91668767
   0.64920267 -1.02583358 -0.25280688 -0.99850112 -0.57946723  1.73494238]
 [-0.94071667 -1.09499335 -1.13114808  1.38675281  0.95321202 -0.91668767
   0.64920267 -1.02583358  0.91539272  1.00150113 -0.57946723 -0.57638802]
 [-1.39733684  0.91324755  1.62595257  1.38675281  1.05744869 -0.91668767
  -1.54035103 -1.02583358 -1.05960019  1.00150113 -0.57946723 -0.57638802]]
Scaled x_test= [[-0.57749609  0.91324755 -0.6557859  -0.69539349  0.32993735  0.80843615
  -1.54035103 -1.02583358 -1.01960511 -0.99850112  1.72572313 -0.57638802]
 [-0.29729735  0.91324755  0.3900109  -1.38944225 -1.21847056  0.80843615


In [11]:
# Step 7.2) Save the standardScalar in a pickle file for future use
with open('./resources/pickle/standard_scaler_xtrain_xtest.pkl', 'wb') as scaler_file: 
    pickle.dump(standardScalar, scaler_file)    
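
The standardization in Step 7.1 amounts to z = (x - mean) / std per feature, with the mean and standard deviation learned from the training data only. A minimal sketch with made-up numbers for a single feature follows; the values and the helper `standardize` are illustrative assumptions, not taken from the churn data.

```python
from statistics import mean, pstdev

# Toy single-feature columns (made-up numbers).
train_ages = [42.0, 41.0, 39.0, 43.0, 35.0]
test_ages = [40.0, 50.0]

# "fit": learn the mean and population standard deviation from the training data only.
train_mean = mean(train_ages)
train_std = pstdev(train_ages)

def standardize(values):
    """"transform": apply the parameters learned from the training data."""
    return [(value - train_mean) / train_std for value in values]

scaled_train = standardize(train_ages)   # has mean 0 and standard deviation 1
scaled_test = standardize(test_ages)     # reuses the training mean and std

print(scaled_train)
print(scaled_test)
```

The test values are scaled with the training statistics, which is why the notebook calls `fit_transform` on `x_train` but only `transform` on `x_test`.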

***Artificial Neural Network (ANN) implementation***

In [12]:
# No of rows in x_train
print('x_train.shape[0] = ',x_train.shape[0])

# No of columns in x_train
print('x_train.shape[1] = ',x_train.shape[1])

# overall shape of x_train
print('x_train.shape = ',x_train.shape)

x_train.shape[0] =  8000
x_train.shape[1] =  12
x_train.shape =  (8000, 12)


In [13]:
# Step 8) Build the ANN model
# Step 8.1) Create the Input Layers, Hidden Layers and Output Layer in our ANN model

# Input Layer and Hidden Layer 1:
# - ann_model.add(Dense(units=64, activation='relu', input_shape=(x_train.shape[1],)))
# - This statement does the following two things:
# - 1) Input Layer: input_shape=(x_train.shape[1],) creates the input layer with as many inputs as there are features (columns) in x_train, given by x_train.shape[1]
# - 2) Hidden Layer 1: units=64 sets the number of neurons in that layer (Hidden Layer 1)

# The Keras Dense layer is used to create fully connected layers with the specified number of neurons
# - e.g. ann_model.add(Dense(units=32, activation='relu')) # Hidden Layer 2

# Hidden Layer 2:
# units = 32 -> create 32 neurons in Hidden Layer 2
# Activation function = relu (others that can be used in hidden layers: Leaky ReLU, tanh, etc.)

# Output Layer:
# units = 1 -> create 1 output neuron that gives the probability of Exited = 1 (binary output feature)


# Activation function
# Hidden Layer - Sigmoid, Tanh, ReLU, Leaky ReLU, etc.
# Output Layer - Sigmoid (for binary classification, as here), Softmax (for multi-class classification)



ann_model = Sequential()
ann_model.add(Dense(units=64, activation='relu', input_shape=(x_train.shape[1],))) # Input Layer and Hidden Layer 1
ann_model.add(Dense(units=32, activation='relu')) # Hidden Layer 2
ann_model.add(Dense(units=1, activation='sigmoid')) # Output Layer

ann_model.summary()

  super().__init__(activity_regularizer=activity_regularizer, **kwargs)
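
As a rough check on the size of this network, the number of trainable parameters in a Dense layer is inputs * units (weights) plus units (biases). The sketch below applies that formula to the layer sizes from Step 8.1; the helper `dense_params` is a hypothetical name introduced for illustration.

```python
# Layer sizes from Step 8.1: 12 input features -> 64 -> 32 -> 1
layer_sizes = [12, 64, 32, 1]

def dense_params(inputs, units):
    """Trainable parameters of a fully connected layer: weights plus biases."""
    return inputs * units + units

# Pair each layer size with the next one: (12, 64), (64, 32), (32, 1)
params_per_layer = [dense_params(inputs, units)
                    for inputs, units in zip(layer_sizes, layer_sizes[1:])]
total_params = sum(params_per_layer)

print(params_per_layer, total_params)  # [832, 2080, 33] 2945
```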


In [14]:
# Step 8.2) Optimizer
# Learning rate controls how big a step the optimizer takes when updating the weights during backpropagation
# In short, the learning rate determines how fast the model learns
# It is a hyperparameter that we set ourselves, usually to a small value (Adam's default in Keras is 0.001).

# Optimizer classes
# (1) Adadelta (2) Adafactor (3) Adagrad (4) Adam (5) AdamW (6) Adamax
# Adam is a widely used default choice

# Weight update formula (during backpropagation): W(New) = W(Old) - (Learning Rate) * (Derivative of Loss with respect to W(Old))
# - W(New) = New Weight
# - W(Old) = Old Weight
# (This is the plain gradient descent form; Adam additionally adapts the step size for each weight.)
ann_optimizer=tf.keras.optimizers.Adam(learning_rate=0.01)
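
The weight update rule can be tried out on a toy problem. The sketch below is plain gradient descent on a made-up one-weight loss, not the Adam algorithm used above; the functions `loss` and `loss_gradient` are assumptions chosen so the minimum is known to be at w = 3.

```python
def loss(w):
    return (w - 3.0) ** 2          # toy loss with its minimum at w = 3

def loss_gradient(w):
    return 2.0 * (w - 3.0)         # derivative of the loss with respect to w

learning_rate = 0.1
w = 0.0                            # initial weight
for step in range(50):
    # W(New) = W(Old) - (Learning Rate) * dLoss/dW
    w = w - learning_rate * loss_gradient(w)

print(w, loss(w))                  # w ends up close to 3, loss close to 0
```

With a much larger learning rate (for example 1.5 in this toy setup) the updates overshoot and diverge, which is why the learning rate is usually kept small.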

In [15]:

# Step 8.3) Loss - BinaryCrossentropy (here)
# Loss function for binary classification - use BinaryCrossentropy
# Loss function for multi-class classification - use CategoricalCrossentropy (one-hot labels) or SparseCategoricalCrossentropy (integer labels)
binary_cross_entropy_loss = tf.keras.losses.BinaryCrossentropy()
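
Binary cross-entropy averages -(y * log(p) + (1 - y) * log(1 - p)) over the samples, where y is the true label and p the predicted probability of class 1. The sketch below computes it by hand; the function name and the clipping constant `eps` are illustrative assumptions, not the Keras implementation.

```python
import math

def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)

# Same labels, two sets of predictions
confident_right = binary_cross_entropy([1, 0], [0.9, 0.1])
confident_wrong = binary_cross_entropy([1, 0], [0.1, 0.9])

print(confident_right, confident_wrong)
```

Confident correct predictions give a loss near 0, while confident wrong predictions are penalized heavily, which is the behavior the optimizer exploits during training.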


In [16]:
# Step 8.4) Pass the Optimizer and Loss to the ann_model

# Metrics:
# Classification problems: metrics=["accuracy"]
# Regression problems: metrics such as MSE, MAE, etc.
ann_model.compile(optimizer=ann_optimizer,loss=binary_cross_entropy_loss, metrics=["accuracy"])

In [17]:
# Step 8.5) Callback > Setup the TensorBoard
# (EarlyStopping and TensorBoard were already imported in the first cell)

# Step 8.5.1) Create a directory for storing logs and define the log directory name format
# - These logs will be used by TensorBoard for visualization
logs_directory = "./resources/logs/" +datetime.datetime.now().strftime("%Y%m%d-%H%M%S")

# Step 8.5.2) Initialize the TensorBoard
tensorflow_callback = TensorBoard(log_dir=logs_directory, histogram_freq=1)

In [18]:

# Step 8.6) Callback > Setup Early Stopping
# - We can train our model for any number of epochs, e.g. 100
# - Ideally the loss keeps decreasing from epoch to epoch as training moves toward a minimum
# - If the monitored loss (val_loss here) stops improving, there is no need to run all the epochs; training can be stopped early
# - This early stoppage of training is known as Early Stopping
# - patience = 10 means: stop training once val_loss has not improved for 10 consecutive epochs, instead of running all 100 epochs set in Step 8.7

# restore_best_weights = True
# - When training stops, roll back the model to the weights from the epoch where validation loss was lowest
# - Keras keeps track of the best epoch (lowest val_loss)
# - This is the preferred approach

# restore_best_weights = False
# - Model keeps the weights of the last epoch run
early_stopping_callback=EarlyStopping(monitor='val_loss',patience=10,restore_best_weights=True)
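
The patience rule can be sketched in a few lines. This is a simplified illustration, not the Keras implementation: it ignores options such as `min_delta`, the function name `early_stopping_epoch` is hypothetical, and the validation losses are made-up values.

```python
def early_stopping_epoch(val_losses, patience):
    """Return (stop_epoch, best_epoch), both 1-based.

    stop_epoch is None if the patience limit is never reached.
    """
    best_loss = float("inf")
    best_epoch = None
    wait = 0                                   # epochs since the last improvement
    for epoch, val_loss in enumerate(val_losses, start=1):
        if val_loss < best_loss:
            best_loss, best_epoch, wait = val_loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch       # stop training here
    return None, best_epoch

# Made-up validation losses: improvement stops after epoch 4
val_losses = [0.36, 0.35, 0.345, 0.34, 0.341, 0.348, 0.35]

print(early_stopping_epoch(val_losses, patience=2))   # (6, 4)
print(early_stopping_epoch(val_losses, patience=10))  # (None, 4)
```

With `restore_best_weights=True`, the weights kept after stopping would be those from the best epoch (epoch 4 in this toy run) rather than from the last epoch.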




In [19]:
# Step 8.7) Train the ANN model
history = ann_model.fit(x_train, y_train, validation_data=(x_test, y_test), epochs=100,
callbacks=[tensorflow_callback, early_stopping_callback])

Epoch 1/100
250/250 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.8354 - loss: 0.3963 - val_accuracy: 0.8530 - val_loss: 0.3637
Epoch 2/100
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8555 - loss: 0.3581 - val_accuracy: 0.8560 - val_loss: 0.3491
Epoch 3/100
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8561 - loss: 0.3495 - val_accuracy: 0.8620 - val_loss: 0.3453
Epoch 4/100
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8597 - loss: 0.3472 - val_accuracy: 0.8640 - val_loss: 0.3396
Epoch 5/100
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8576 - loss: 0.3438 - val_accuracy: 0.8545 - val_loss: 0.3410
Epoch 6/100
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8619 - loss: 0.3369 - val_accuracy: 0.8535 - val_loss: 0.3479
Epoch 7/100
250/250 ━

In [20]:
# Step 8.8) Save the model in H5 format
ann_model.save('./resources/pickle/ann_model.h5')



In [21]:
# Step 9) Bring up the Tensorboard
# TensorBoard visualizes the logs written during training (e.g. loss and accuracy per epoch).

# Step 9.1) Load Tensorboard Extension
%load_ext tensorboard

In [22]:
# Step 9.2) Launch the Tensorboard
%tensorboard --logdir ./resources/logs/

In [23]:
# Step 10) Load the LabelEncoder, OneHotEncoder and StandardScaler from pickle files, and the trained model from the H5 file
# - labelEncoder (Step 3.1) - used to convert Gender from Male, Female to numeric values 0 and 1
# - oneHotEncoder (Step 3.2) - used to convert Geography from text to numeric using OneHotEncoding
# - standardScalar (Step 7.2) - used to scale the data with StandardScaler so all columns have values in a similar range
# - ann_model (Step 8.8) - model trained using ANN (Artificial Neural Network)

with open('./resources/pickle/label_encoder_gender.pkl', 'rb') as le_file: 
    loaded_label_encoder_gender = pickle.load(le_file)  
with open('./resources/pickle/onehot_encoder_geography.pkl', 'rb') as ohe_file:               
    loaded_onehot_encoder_geography = pickle.load(ohe_file)   
with open('./resources/pickle/standard_scaler_xtrain_xtest.pkl', 'rb') as scaler_file: 
    loaded_scaler_xtrain_xtest = pickle.load(scaler_file)

### Load the trained model
model=load_model('./resources/pickle/ann_model.h5')



In [24]:
# Step 11) Create new input_data for prediction (a Python dict, similar in shape to a JSON object)
# Step 11.1) Convert the input_data to a DataFrame

input_data = {
    'CreditScore': 600,
    'Geography': 'France',
    'Gender': 'Male',
    'Age': 40,
    'Tenure': 3,
    'Balance': 60000,
    'NumOfProducts': 2,
    'HasCrCard': 1,
    'IsActiveMember': 1,
    'EstimatedSalary': 50000
}

# Convert the input_data to a DataFrame
input_df = pd.DataFrame([input_data])
print('Input DataFrame:')
print(input_df.shape)

Input DataFrame:
(1, 10)


In [25]:

# Step 12) Before prediction, preprocess this data with the encoders and scaler saved during training:
# - Convert "Geography" to numeric using OneHotEncoding
# - Convert "Gender" to numeric using LabelEncoder
# - Scale the data using StandardScaler (Step 13)


# Step 12.1) Convert "Geography" to numeric using OneHotEncoding
# Convert the Geography of input_data (i.e. France) to numeric using the loaded OneHotEncoder
geography_ohe_encoded = loaded_onehot_encoder_geography.transform([[input_data['Geography']]])

# Convert to DataFrame
# Use geography_ohe_encoded (1 row for the new customer), not geography_encoded (the 10000 training rows from Step 3.2)
geography_ohe_encoded_df = pd.DataFrame(geography_ohe_encoded, columns=loaded_onehot_encoder_geography.get_feature_names_out(['Geography']))

# OHE Encoded Geography DataFrame
# Output: Geography_France = 1.0, Geography_Germany = 0.0, Geography_Spain = 0.0
geography_ohe_encoded_df



Unnamed: 0,Geography_France,Geography_Germany,Geography_Spain
0,1.0,0.0,0.0


In [26]:
# Step 12.2) Convert "Gender" in input_data to numeric using LabelEncoder
gender_label_encoded = loaded_label_encoder_gender.transform([input_data['Gender']])

# Convert the Gender label encoded to DataFrame
gender_encoded_df = pd.DataFrame(gender_label_encoded, columns=['Gender'])

# Label Encoded Gender DataFrame
# Output: Gender: 1
gender_encoded_df

Unnamed: 0,Gender
0,1


In [27]:
# Step 12.3) Combine the encoded Geography and Gender columns back to the input_df and drop the original Geography and Gender
# Put the Gender encoded column back to input_df
input_df['Gender']=gender_encoded_df


# Drop the original text Geography column (having value France) from input_df
input_df = input_df.drop(columns=['Geography'], axis=1)


# Concatenate the input_df dataframe and geography numeric dataframe
input_df = pd.concat([input_df, geography_ohe_encoded_df], axis=1)

# Print
print(input_df.shape)

(1, 12)


In [28]:
# Step 13) Scaling the input_df data using StandardScaler
input_df_scaled = loaded_scaler_xtrain_xtest.transform(input_df)

input_df_scaled

array([[-0.53598516,  0.91324755,  0.10479359, ...,  1.00150113,
        -0.57946723, -0.57638802]])

In [29]:
# Step 14) Make a prediction using the trained model - ann_model
# Step 14.1) Predict the probability that the customer will exit the bank (Exited = 1) for the input data.
predicted_value = ann_model.predict(input_df_scaled)
print('predicted_value = ',predicted_value)

prediction_probability = predicted_value[0][0]
print('predicted_value probability = ',prediction_probability) # e.g. 0.02 = 2% probability of exiting


predicted_value =  [[0.0217715]]
predicted_value probability =  0.021771498


In [30]:
# Step 14.2) Apply customer messaging based on the output
if prediction_probability > 0.5:
    print("The customer is likely to exit the bank")
else:
    print("The customer is not likely to exit the bank")

The customer is not likely to exit the bank
