# ASSIGNMENT NO. 3

Given a bank customer, build a neural network-based classifier that can determine whether they will leave or not in the next 6 months.

Dataset Description: The case study is from an open-source dataset from Kaggle. The dataset contains 10,000 sample points with 14 distinct features such as CustomerId, CreditScore, Geography, Gender, Age, Tenure, Balance, etc.

Link to the Kaggle project: https://www.kaggle.com/barelydedicated/bank-customer-churn-modeling

Perform the following steps:

a. Read the dataset.

b. Distinguish the feature and target set and divide the data set into training and test sets.

c. Normalize the train and test data.

d. Initialize and build the model. Identify the points of improvement and implement the same.

Print the accuracy score and confusion matrix (5 points).

## 1. Read the Dataset
First, we'll import pandas and load the Kaggle dataset from a local copy of `Churn_Modelling.csv`.

In [5]:
import pandas as pd

# Load dataset
data = pd.read_csv("Churn_Modelling.csv")

# Display the first few rows of the dataset
data.head()


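| | RowNumber | CustomerId | Surname | CreditScore | Geography | Gender | Age | Tenure | Balance | NumOfProducts | HasCrCard | IsActiveMember | EstimatedSalary | Exited |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 15634602 | Hargrave | 619 | France | Female | 42 | 2 | 0.0 | 1 | 1 | 1 | 101348.88 | 1 |
| 1 | 2 | 15647311 | Hill | 608 | Spain | Female | 41 | 1 | 83807.86 | 1 | 0 | 1 | 112542.58 | 0 |
| 2 | 3 | 15619304 | Onio | 502 | France | Female | 42 | 8 | 159660.8 | 3 | 1 | 0 | 113931.57 | 1 |
| 3 | 4 | 15701354 | Boni | 699 | France | Female | 39 | 1 | 0.0 | 2 | 0 | 0 | 93826.63 | 0 |
| 4 | 5 | 15737888 | Mitchell | 850 | Spain | Female | 43 | 2 | 125510.82 | 1 | 1 | 1 | 79084.1 | 0 |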


## 2. Distinguish the Feature and Target Set
We will distinguish between the feature set (independent variables) and the target set (dependent variable: Exited, indicating churn).

In [6]:
# Drop CustomerId, Surname, and RowNumber as they are not useful for prediction
X = data.drop(['CustomerId', 'Surname', 'RowNumber', 'Exited'], axis=1)
y = data['Exited']


## 3. Convert Categorical Features
Since the dataset contains categorical features like Geography and Gender, we need to one-hot encode these columns.

In [7]:
# One-hot encoding of categorical variables
X = pd.get_dummies(X, columns=['Geography', 'Gender'], drop_first=True)
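
To make the effect of `drop_first=True` concrete, here is a minimal sketch on a toy frame. The three rows are made up for illustration and are not taken from `Churn_Modelling.csv`, and the names `toy` and `encoded` are hypothetical. With `drop_first=True`, pandas drops the first category (in sorted order) of each encoded column, so that level becomes the implicit baseline.

```python
import pandas as pd

# Toy frame for illustration only; these rows are not from Churn_Modelling.csv
toy = pd.DataFrame({
    "CreditScore": [619, 608, 502],
    "Geography": ["France", "Spain", "Germany"],
    "Gender": ["Female", "Male", "Female"],
})

# Same call as in the cell above, applied to the toy frame
encoded = pd.get_dummies(toy, columns=["Geography", "Gender"], drop_first=True)

# "France" and "Female" are dropped and act as the baseline categories
print(list(encoded.columns))
```

Dropping one level per feature avoids carrying a redundant column: a row with zeros in both `Geography_Germany` and `Geography_Spain` can only be France.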

## 4. Split Data into Training and Test Sets
We split the dataset into training and testing sets to train the model and evaluate its performance.

In [8]:
from sklearn.model_selection import train_test_split

# Split the data into train and test sets (80% training, 20% testing)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)


## 5. Normalize the Train and Test Data
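Standardize the features to zero mean and unit variance so that features on large scales (such as Balance and EstimatedSalary) do not dominate training. The scaler is fit on the training set only and then applied unchanged to the test set, which keeps test-set statistics out of training.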

In [9]:
from sklearn.preprocessing import StandardScaler

# Normalize the feature data
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
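
As a rough illustration of what the scaler does for a single column, the sketch below standardizes a handful of values by hand. The training values are the five CreditScore entries shown in `data.head()`; the test value of 700 and the helper names `fit_standardizer` and `standardize` are hypothetical. It uses the population standard deviation, which matches how `StandardScaler` computes its scale.

```python
from statistics import fmean, pstdev


def fit_standardizer(train_values):
    """Return (mean, std) computed from the training values only."""
    mean = fmean(train_values)
    std = pstdev(train_values)  # population std (ddof=0)
    return mean, std


def standardize(values, mean, std):
    """Apply z = (x - mean) / std using previously fitted statistics."""
    return [(v - mean) / std for v in values]


train_scores = [619, 608, 502, 699, 850]  # CreditScore values from data.head()
test_scores = [700]                       # hypothetical unseen value

mean, std = fit_standardizer(train_scores)
train_scaled = standardize(train_scores, mean, std)
test_scaled = standardize(test_scores, mean, std)  # reuses the training statistics

print(train_scaled)
print(test_scaled)
```

The test value is transformed with the training mean and standard deviation, which is the same division of labor as `fit_transform` on the training set and `transform` on the test set.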


## 6. Initialize and Build the Neural Network Model
We'll use TensorFlow and Keras to build a basic neural network model.

In [10]:
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Initialize the neural network model
model = Sequential()

# Add input layer and first hidden layer
model.add(Dense(16, input_dim=X_train.shape[1], activation='relu'))

# Add second hidden layer
model.add(Dense(8, activation='relu'))

# Add output layer (binary classification)
model.add(Dense(1, activation='sigmoid'))

# Compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Train the model
model.fit(X_train, y_train, epochs=50, batch_size=32)


Epoch 1/50
...
Epoch 50/50


<keras.callbacks.History at 0x2115d252290>
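
To get a feel for the size of this network, the sketch below counts the trainable parameters of the three `Dense` layers. It assumes 11 input features after encoding (8 numeric columns plus `Geography_Germany`, `Geography_Spain`, and `Gender_Male`); that count is an assumption and should be checked against `X_train.shape[1]`. The helper name `dense_param_count` is hypothetical.

```python
def dense_param_count(n_inputs, layer_units):
    """Return the parameter count of each fully connected layer in a stack.

    Each layer has n_inputs * units weights plus one bias per unit.
    """
    counts = []
    for units in layer_units:
        counts.append(n_inputs * units + units)
        n_inputs = units  # this layer's outputs feed the next layer
    return counts


n_features = 11  # assumption: verify with X_train.shape[1]
per_layer = dense_param_count(n_features, [16, 8, 1])
total = sum(per_layer)

print(per_layer, total)
```

Under that assumption the layers hold 192, 136, and 9 parameters, 337 in total, which `model.summary()` can confirm in the notebook.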

## 7. Evaluate the Model
We will now evaluate the model on the test set and print the accuracy score and confusion matrix.

In [11]:
from sklearn.metrics import accuracy_score, confusion_matrix

# Predict on the test set
y_pred = (model.predict(X_test) > 0.5).astype("int32")

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

# Confusion matrix
cm = confusion_matrix(y_test, y_pred)
print("Confusion Matrix:\n", cm)


Accuracy: 0.8595
Confusion Matrix:
 [[1533   74]
 [ 207  186]]
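
Accuracy alone hides how the model treats the minority class, so it helps to read a few more numbers off the confusion matrix above. The sketch below uses the matrix exactly as printed, relying on scikit-learn's layout where rows are true labels and columns are predicted labels, i.e. `[[TN, FP], [FN, TP]]`. The variable names are illustrative.

```python
# Confusion matrix as printed above: rows = true class, columns = predicted class
cm = [[1533, 74],
      [207, 186]]
(tn, fp), (fn, tp) = cm

total = tn + fp + fn + tp
accuracy = (tn + tp) / total          # 1719 / 2000 = 0.8595
baseline = (tn + fp) / total          # always predicting "stays": 1607 / 2000
precision = tp / (tp + fp)            # share of predicted churners who churned
recall = tp / (tp + fn)               # share of actual churners that were caught
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.4f} baseline={baseline:.4f}")
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

The model beats the majority-class baseline of 0.8035, but it identifies fewer than half of the customers who actually churned (186 of 393), which is worth keeping in mind when choosing what to improve.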


## 8. Points of Improvement

To improve the model, we can:

- Tune hyperparameters (such as learning rate, number of neurons, and epochs).
- Use techniques like dropout to reduce overfitting.
- Experiment with different optimizers or architectures (e.g., adding more layers or units).

In [12]:
from tensorflow.keras.layers import Dropout

# Revised model with Dropout
model = Sequential()
model.add(Dense(16, input_dim=X_train.shape[1], activation='relu'))
model.add(Dropout(0.3))  # Adding dropout layer
model.add(Dense(8, activation='relu'))
model.add(Dropout(0.3))
model.add(Dense(1, activation='sigmoid'))

# Compile and train the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train, epochs=50, batch_size=32)


Epoch 1/50
...
Epoch 50/50


<keras.callbacks.History at 0x2115d4ec520>
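
The revised model is trained above but not yet scored, while the assignment asks for an accuracy score and confusion matrix. The sketch below is a dependency-free helper that mirrors the 0.5 thresholding used in section 7. `evaluate_binary_classifier` and `StubModel` are hypothetical names introduced here, and the stub only stands in for the Keras model so that the snippet runs on its own. In the notebook, the `accuracy_score` and `confusion_matrix` calls from section 7 could be reused instead.

```python
def evaluate_binary_classifier(model, X_test, y_test, threshold=0.5):
    """Return (accuracy, confusion matrix) for a binary classifier.

    Assumes model.predict(X) yields one row per sample with the predicted
    probability in position 0, as a Keras sigmoid output layer does.
    The matrix is laid out as [[TN, FP], [FN, TP]].
    """
    probabilities = model.predict(X_test)
    cm = [[0, 0], [0, 0]]
    for row, actual in zip(probabilities, y_test):
        predicted = int(float(row[0]) > threshold)
        cm[int(actual)][predicted] += 1
    correct = cm[0][0] + cm[1][1]
    total = sum(sum(r) for r in cm)
    return correct / total, cm


class StubModel:
    """Hypothetical stand-in that echoes its input as a probability."""

    def predict(self, X):
        return [[x[0]] for x in X]


# Made-up inputs and labels, for demonstration only
accuracy, cm = evaluate_binary_classifier(
    StubModel(), [[0.9], [0.2], [0.6], [0.4]], [1, 0, 0, 1]
)
print("Accuracy:", accuracy)
print("Confusion Matrix:", cm)
```

In the notebook, calling `evaluate_binary_classifier(model, X_test, y_test)` after the cell above would score the dropout model on the same test split, so the result can be compared directly with the 0.8595 accuracy of the first model.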