# ANN Practical Implementation **(Churn Prediction)**

 
In the previous notebook, we learned about **Vanishing Gradient, Dropout, Optimizers, and Loss Functions**.  

Today, we will **apply those concepts in practice** using an **Artificial Neural Network (ANN)** to predict customer churn.  
Dataset: `Churn_Modelling.csv`  

#  Importing Libraries


In [1]:
# Artificial Neural Network

# Importing the libraries
import numpy as np
import pandas as pd
import tensorflow as tf

print("TensorFlow Version:", tf.__version__)

TensorFlow Version: 2.20.0-rc0


# Data Preprocessing

We will:

1. Import dataset
2. Separate features (X) and target (y)
3. Encode categorical variables (Gender, Geography)
4. Perform Feature Scaling
5. Split into training & test sets

In [4]:
# Import dataset
dataset = pd.read_csv(r'C:\Users\Lenovo\OneDrive\Desktop\Python Everyday work\Class work\Deep_lerning\Day2\Churn_Modelling.csv')
X = dataset.iloc[:, 3:-1].values
y = dataset.iloc[:, -1].values

print("X Sample:\n", X[:3])
print("y Sample:\n", y[:10])

X Sample:
 [[619 'France' 'Female' 42 2 0.0 1 1 1 101348.88]
 [608 'Spain' 'Female' 41 1 83807.86 1 0 1 112542.58]
 [502 'France' 'Female' 42 8 159660.8 3 1 0 113931.57]]
y Sample:
 [1 0 1 0 0 1 0 1 0 0]


##  Encoding Categorical Data

- **Label Encoding** for Gender
- **OneHot Encoding** for Geography
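
To make the two encodings concrete, here is a minimal plain-Python sketch of what they produce. It assumes categories are ordered alphabetically, which is how scikit-learn's `LabelEncoder` and `OneHotEncoder` order them by default. The helper names `label_encode` and `one_hot_encode` and the sample values are hypothetical and only serve the illustration.

```python
def label_encode(values):
    """Map each category to its index in the sorted list of unique categories."""
    categories = sorted(set(values))
    index = {cat: i for i, cat in enumerate(categories)}
    return [index[v] for v in values], categories


def one_hot_encode(values):
    """Return one 0/1 column per category, in sorted category order."""
    categories = sorted(set(values))
    rows = [[1.0 if v == cat else 0.0 for cat in categories] for v in values]
    return rows, categories


# Illustrative values only; the real columns come from Churn_Modelling.csv.
gender = ["Female", "Female", "Male"]
geography = ["France", "Spain", "Germany"]

gender_codes, gender_categories = label_encode(gender)
geo_rows, geo_categories = one_hot_encode(geography)

print(gender_categories, gender_codes)
print(geo_categories)
for row in geo_rows:
    print(row)
```

With this ordering, France maps to `[1, 0, 0]` and Spain to `[0, 0, 1]`, which is the pattern visible in the first three columns of the encoded output in this notebook.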


In [5]:
from sklearn.preprocessing import LabelEncoder
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Label Encoding Gender
le = LabelEncoder()
X[:, 2] = le.fit_transform(X[:, 2])

# One Hot Encoding Geography
ct = ColumnTransformer(transformers=[('encoder', OneHotEncoder(), [1])], remainder='passthrough')
X = np.array(ct.fit_transform(X))

print("After Encoding:\n", X[:3])

After Encoding:
 [[1.0 0.0 0.0 619 0 42 2 0.0 1 1 1 101348.88]
 [0.0 0.0 1.0 608 0 41 1 83807.86 1 0 1 112542.58]
 [1.0 0.0 0.0 502 0 42 8 159660.8 3 1 0 113931.57]]


## Feature Scaling

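ANNs usually converge faster when inputs are standardized to zero mean and unit variance. Without scaling, features with large raw values (here, amounts around 100,000) dominate the 0/1 flags in the early weight updates.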
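
As a rough illustration of what standardization does to a single column, the sketch below computes z-scores in plain Python. It uses the population standard deviation (dividing by n), which is also what `StandardScaler` uses. The function name `standardize` and the sample values are assumptions made for this example.

```python
import statistics


def standardize(column):
    """Rescale a list of numbers to zero mean and unit variance (z-scores)."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)  # population std (divide by n)
    return [(x - mean) / std for x in column]


# Illustrative values on a salary-like scale, not taken from the dataset.
raw = [50_000.0, 100_000.0, 150_000.0]
scaled = standardize(raw)
print(scaled)
```

After scaling, the column has mean 0 and standard deviation 1 regardless of its original units.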


In [6]:
from sklearn.preprocessing import StandardScaler

sc = StandardScaler()
X = sc.fit_transform(X)

print("After Scaling:\n", X[:3])

After Scaling:
 [[ 0.99720391 -0.57873591 -0.57380915 -0.32622142 -1.09598752  0.29351742
  -1.04175968 -1.22584767 -0.91158349  0.64609167  0.97024255  0.02188649]
 [-1.00280393 -0.57873591  1.74273971 -0.44003595 -1.09598752  0.19816383
  -1.38753759  0.11735002 -0.91158349 -1.54776799  0.97024255  0.21653375]
 [ 0.99720391 -0.57873591 -0.57380915 -1.53679418 -1.09598752  0.29351742
   1.03290776  1.33305335  2.52705662  0.64609167 -1.03067011  0.2406869 ]]


##  Splitting Dataset


In [7]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)

print("Train Shape:", X_train.shape)
print("Test Shape:", X_test.shape)

Train Shape: (8000, 12)
Test Shape: (2000, 12)
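
One caveat about the order of steps: the scaler above is fitted on all 10,000 rows before the split, so the test rows contribute to the means and standard deviations used for scaling. A common way to avoid this leakage is to split first and compute the scaling statistics from the training rows only. The sketch below shows the idea in plain Python on made-up numbers; `split_rows` and the sample values are hypothetical. With scikit-learn, the equivalent pattern would be `sc.fit_transform(X_train)` followed by `sc.transform(X_test)`.

```python
import random
import statistics


def split_rows(rows, test_fraction, seed):
    """Shuffle row indices with a fixed seed and split into train/test lists."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = int(round(len(rows) * test_fraction))
    test = [rows[i] for i in indices[:n_test]]
    train = [rows[i] for i in indices[n_test:]]
    return train, test


# Made-up single-feature data, for illustration only.
values = [float(v) for v in range(10, 110, 10)]  # 10 values: 10.0 .. 100.0
train, test = split_rows(values, test_fraction=0.2, seed=0)

# Fit the scaling statistics on the training rows only...
train_mean = statistics.fmean(train)
train_std = statistics.pstdev(train)

# ...then apply the same statistics to both splits.
train_scaled = [(x - train_mean) / train_std for x in train]
test_scaled = [(x - train_mean) / train_std for x in test]

print(len(train), len(test))
print(train_scaled)
print(test_scaled)
```

The results reported in this notebook were produced with the original order (scale, then split), so they would likely shift slightly if the cells were rerun this way.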


#  Building the ANN
We will start simple, then **add more layers** and compare performance.


In [8]:
# Initializing the ANN
ann = tf.keras.models.Sequential()

# Input + First Hidden Layer
ann.add(tf.keras.layers.Dense(units=6, activation='relu'))

# Second Hidden Layer
ann.add(tf.keras.layers.Dense(units=6, activation='relu'))

# Output Layer
ann.add(tf.keras.layers.Dense(units=1, activation='sigmoid'))
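
To get a feel for how small this network is, the sketch below counts its trainable parameters by hand. Each `Dense` layer has one weight per input per unit, plus one bias per unit. The helper `dense_param_count` is a hypothetical name used only here; the 12 inputs correspond to the 12 columns of `X_train`.

```python
def dense_param_count(n_inputs, layer_units):
    """Count weights and biases for a stack of fully connected layers."""
    total = 0
    for units in layer_units:
        total += (n_inputs + 1) * units  # +1 for the bias of each unit
        n_inputs = units
    return total


# 12 input features, then Dense(6), Dense(6), Dense(1).
print(dense_param_count(12, [6, 6, 1]))  # → 127
```

Once the model has been built on 12-column input (for example after the first call to `fit`), `ann.summary()` should report the same total.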

#  Training the ANN

- Optimizer: **Adam**
- Loss Function: **Binary Crossentropy** (since this is binary classification)
- Metric: **Accuracy**
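
For reference, binary cross-entropy averages `-(y*log(p) + (1-y)*log(1-p))` over the samples, where `y` is the 0/1 label and `p` the predicted probability. The following is a minimal plain-Python sketch of that formula, not the Keras implementation; the clipping constant `eps` and the sample numbers are assumptions of this example.

```python
import math


def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


labels = [1, 0, 1, 0]

# A model that always outputs 0.5 is maximally unsure: loss = ln(2).
unsure_loss = binary_crossentropy(labels, [0.5, 0.5, 0.5, 0.5])
print(unsure_loss)

# Confident, correct predictions give a much smaller loss.
confident_loss = binary_crossentropy(labels, [0.9, 0.1, 0.8, 0.2])
print(confident_loss)
```

A loss near ln(2) ≈ 0.693 therefore means the network is doing no better than outputting 0.5 for every customer; the training loss should fall below that as the weights improve.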


In [9]:
# Compile ANN
ann.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])

# Train ANN
history = ann.fit(X_train, y_train, batch_size = 32, epochs = 100, verbose=1)

Epoch 1/100
250/250 ━━━━━━━━━━━━━━━━━━━━ 4s 4ms/step - accuracy: 0.7159 - loss: 0.6086
Epoch 2/100
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.7966 - loss: 0.4713
Epoch 3/100
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.8035 - loss: 0.4367
Epoch 4/100
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.8116 - loss: 0.4275
Epoch 5/100
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.8146 - loss: 0.4242
Epoch 6/100
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.8177 - loss: 0.4214
Epoch 7/100
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.8191 - loss: 0.4190
Epoch 8/100
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.8210 - loss: 0.4165
Epoch 9/100
...

#  Model Evaluation

We will predict on the **Test set** and evaluate using:

- Predictions
- Confusion Matrix
- Accuracy


In [10]:
# Predicting the Test set results
y_pred = ann.predict(X_test)
y_pred = (y_pred > 0.5)

# Compare predictions vs actual
print(np.concatenate((y_pred.reshape(len(y_pred),1), y_test.reshape(len(y_test),1)),1)[:10])

63/63 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step
[[0 0]
 [0 1]
 [0 0]
 [0 0]
 [0 0]
 [1 1]
 [0 0]
 [0 0]
 [0 1]
 [1 1]]
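
The `y_pred > 0.5` step turns the sigmoid outputs (probabilities between 0 and 1) into class labels. A small sketch with hypothetical probabilities shows how the choice of threshold changes the labels; the function `classify` and the numbers are made up for illustration.

```python
def classify(probabilities, threshold=0.5):
    """Turn predicted churn probabilities into 0/1 labels."""
    return [1 if p > threshold else 0 for p in probabilities]


# Hypothetical sigmoid outputs, not taken from the trained model.
probs = [0.08, 0.45, 0.51, 0.93]

print(classify(probs))                 # → [0, 0, 1, 1]
print(classify(probs, threshold=0.4))  # → [0, 1, 1, 1]
```

Lowering the threshold flags more customers as churners, which trades extra false positives for fewer missed churners.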


In [11]:
# Confusion Matrix
from sklearn.metrics import confusion_matrix, accuracy_score

cm = confusion_matrix(y_test, y_pred)
acc = accuracy_score(y_test, y_pred)

print("Confusion Matrix:\n", cm)
print("Accuracy:", acc)

Confusion Matrix:
 [[1492  103]
 [ 186  219]]
Accuracy: 0.8555
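
Accuracy alone can hide how the model treats the minority class, so it helps to read the other standard metrics off the same confusion matrix. The sketch below does the arithmetic by hand, relying on scikit-learn's convention that rows are true labels and columns are predicted labels.

```python
# Confusion matrix from the cell above; with true labels in rows and
# predicted labels in columns, the layout is [[TN, FP], [FN, TP]].
tn, fp = 1492, 103
fn, tp = 186, 219

total = tn + fp + fn + tp
accuracy = (tn + tp) / total
precision = tp / (tp + fp)    # of predicted churners, how many really churned
recall = tp / (tp + fn)       # of actual churners, how many were caught
baseline = (tn + fp) / total  # accuracy of always predicting "stays"

print(f"accuracy={accuracy:.4f} precision={precision:.4f} "
      f"recall={recall:.4f} baseline={baseline:.4f}")
```

On these numbers the model catches roughly 54% of the customers who actually churn, and a classifier that always predicts "stays" would already reach about 80% accuracy on this test set.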


#  Experimenting with More Layers
We now build a deeper network with 3 hidden layers of 8 units each (the first model had 2 hidden layers of 6 units each) and compare performance.


In [12]:
# Build deeper ANN
ann_deep = tf.keras.models.Sequential()

# Input + 3 Hidden Layers
ann_deep.add(tf.keras.layers.Dense(units=8, activation='relu'))
ann_deep.add(tf.keras.layers.Dense(units=8, activation='relu'))
ann_deep.add(tf.keras.layers.Dense(units=8, activation='relu'))

# Output Layer
ann_deep.add(tf.keras.layers.Dense(units=1, activation='sigmoid'))

# Compile
ann_deep.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])

# Train
history_deep = ann_deep.fit(X_train, y_train, batch_size = 32, epochs = 100, verbose=1)


Epoch 1/100
250/250 ━━━━━━━━━━━━━━━━━━━━ 4s 3ms/step - accuracy: 0.6799 - loss: 0.6088
Epoch 2/100
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.7980 - loss: 0.4511
Epoch 3/100
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.8023 - loss: 0.4330
Epoch 4/100
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.8050 - loss: 0.4224
Epoch 5/100
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.8201 - loss: 0.4063
Epoch 6/100
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.8381 - loss: 0.3834
Epoch 7/100
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.8482 - loss: 0.3664
Epoch 8/100
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.8503 - loss: 0.3574
Epoch 9/100
...

In [13]:
# Evaluate deeper model
y_pred_deep = ann_deep.predict(X_test)
y_pred_deep = (y_pred_deep > 0.5)

cm_deep = confusion_matrix(y_test, y_pred_deep)
acc_deep = accuracy_score(y_test, y_pred_deep)

print("Confusion Matrix (Deeper ANN):\n", cm_deep)
print("Accuracy (Deeper ANN):", acc_deep)

63/63 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step
Confusion Matrix (Deeper ANN):
 [[1525   70]
 [ 198  207]]
Accuracy (Deeper ANN): 0.866


Next, we train a network with 3 hidden layers of 6 units each, so that it differs from the first model only in depth, and compare performance.

In [15]:
# Initializing the ANN
ann = tf.keras.models.Sequential()

# Input + Hidden Layers
ann.add(tf.keras.layers.Dense(units=6, activation='relu'))
ann.add(tf.keras.layers.Dense(units=6, activation='relu'))
ann.add(tf.keras.layers.Dense(units=6, activation='relu'))

# Output Layer
ann.add(tf.keras.layers.Dense(units=1, activation='sigmoid'))

# Compile and Train
ann.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
ann.fit(X_train, y_train, batch_size=32, epochs=100)

# Evaluate
y_pred = (ann.predict(X_test) > 0.5)
from sklearn.metrics import confusion_matrix, accuracy_score
cm = confusion_matrix(y_test, y_pred)
acc = accuracy_score(y_test, y_pred)

print("Confusion Matrix (3 Hidden Layers):")
print(cm)
print("Accuracy (3 Hidden Layers):", acc)


Epoch 1/100
250/250 ━━━━━━━━━━━━━━━━━━━━ 4s 3ms/step - accuracy: 0.7691 - loss: 0.5501
Epoch 2/100
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.7962 - loss: 0.4785
Epoch 3/100
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.7989 - loss: 0.4537
Epoch 4/100
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.8030 - loss: 0.4381
Epoch 5/100
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.8096 - loss: 0.4277
Epoch 6/100
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.8161 - loss: 0.4196
Epoch 7/100
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.8209 - loss: 0.4125
Epoch 8/100
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.8265 - loss: 0.4064
Epoch 9/100
...

#  Comparison of Models

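We experimented with **three ANN architectures** (2 hidden layers of 6 units, 3 hidden layers of 8 units, and 3 hidden layers of 6 units) and compared their performance on the test set.



##  Results Summary

| ANN Architecture                   | Confusion Matrix                | Accuracy |
|------------------------------------|---------------------------------|----------|
| **2 Hidden Layers (6 units each)** | [[1492  103] <br> [ 186  219]] | **0.8555** |
| **3 Hidden Layers (8 units each)** | [[1525   70] <br> [ 198  207]] | **0.8660** |
| **3 Hidden Layers (6 units each)** | [[1526   69] <br> [ 210  195]] | **0.8605** |


##  Interpretation

- **2 Hidden Layers (6 units each):**  
  - Accuracy: 85.55%  
  - Performs reasonably well, with the most false positives (103) and the fewest false negatives (186) of the three models.  

- **3 Hidden Layers (8 units each):**  
  - Accuracy: 86.60%  
  - Best overall performance in terms of accuracy.  
  - Fewer false positives than the 2-layer network (70 vs. 103), but more false negatives (198 vs. 186).  

- **3 Hidden Layers (6 units each):**  
  - Accuracy: 86.05%  
  - Slightly better than the 2-layer network and slightly worse than the 8-unit network.  
  - Most false negatives of the three (210 customers who left were predicted as staying).  


##  Conclusion

- Going from **2 hidden layers of 6 units** to **3 hidden layers of 8 units** raised test accuracy from 85.55% to 86.60%. Depth and width changed together, so the gain cannot be attributed to depth alone.  
- With 6 units per layer, adding a **3rd hidden layer** raised accuracy only from 85.55% to 86.05%, and it increased the number of missed churners.  
- **Best Model:** ANN with **3 hidden layers of 8 units**, giving **~86.6% accuracy**.  
- All three results lie within about one percentage point of each other and come from single training runs, so the ranking is tentative. More layers are not always better: they may cause **overfitting** or unnecessary complexity.  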
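
The false-positive and false-negative counts discussed above can be read directly from the three confusion matrices. The sketch below recomputes them, together with recall on the churn class, from the numbers in the results table. Models are labeled by their position in the table, and the matrix layout `[[TN, FP], [FN, TP]]` follows scikit-learn's convention.

```python
# Confusion matrices copied from the results table, in table order.
# Layout per matrix: [[TN, FP], [FN, TP]].
matrices = {
    "first model": [[1492, 103], [186, 219]],
    "second model": [[1525, 70], [198, 207]],
    "third model": [[1526, 69], [210, 195]],
}

recalls = {}
for name, ((tn, fp), (fn, tp)) in matrices.items():
    accuracy = (tn + tp) / (tn + fp + fn + tp)
    recalls[name] = tp / (tp + fn)
    print(f"{name}: accuracy={accuracy:.4f} FP={fp} FN={fn} "
          f"recall={recalls[name]:.4f}")
```

On these figures the model with the lowest accuracy is also the one that catches the most churners, so the preferred model depends on whether missed churners or false alarms are more costly.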
- To further improve performance, we should explore:  
  - **Hyperparameter tuning** (units per layer, learning rate, batch size).  
  - **Regularization techniques** (Dropout, L2).  
  - **Feature engineering** (new features, removing noisy ones).  
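
When comparing models whose accuracies differ by about one percentage point, it is worth checking how much of that gap could be sampling noise from the 2000-row test set. The sketch below uses the normal approximation to a binomial proportion, which is an assumption of this example rather than something computed in the notebook; `accuracy_interval` is a hypothetical helper.

```python
import math


def accuracy_interval(accuracy, n, z=1.96):
    """Approximate 95% interval for an accuracy measured on n test rows."""
    half_width = z * math.sqrt(accuracy * (1.0 - accuracy) / n)
    return accuracy - half_width, accuracy + half_width


# Accuracies from the results table, each measured on 2000 test rows.
intervals = {}
for acc in (0.8555, 0.8660, 0.8605):
    intervals[acc] = accuracy_interval(acc, 2000)
    low, high = intervals[acc]
    print(f"{acc:.4f}: [{low:.4f}, {high:.4f}]")
```

Each interval is roughly ±1.5 percentage points wide and all three overlap, so repeating the runs with different random seeds or using cross-validation would give a firmer basis for picking an architecture.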
