# Artificial Neural Networks for Binary Classification

This notebook will walk you through building an ANN to predict whether customers will churn (indicated by the column `Exited`) in a banking scenario. We will use a dataset of bank customers to train our model, perform binary classification, and evaluate its performance.

## Columns Description
- **RowNumber**: The index of rows in the dataset, starting at 1.
- **CustomerId**: A unique identifier for each customer.
- **Surname**: Last name of the customer.
- **CreditScore**: The credit score, a numerical expression based on a level analysis of a person's credit files.
- **Geography**: The country where the customer resides.
- **Gender**: The gender of the customer.
- **Age**: The age of the customer.
- **Tenure**: The number of years the customer has been with the bank.
- **Balance**: The current balance in the customer's account.
- **NumOfProducts**: Number of products that the customer has purchased from the bank.
- **HasCrCard**: Indicates if the customer possesses a credit card (1 for yes, 0 for no).
- **IsActiveMember**: Indicates if the customer is an active member (1 for yes, 0 for no).
- **EstimatedSalary**: The estimated annual salary of the customer.
- **Exited**: Indicates if the customer has exited/left the bank (1 for yes, 0 for no).

## Dataset Card
- **Source**: [Kaggle](https://www.kaggle.com/datasets/adammaus/predicting-churn-for-bank-customers).
- **Total Entries**: 10,000
- **Total Columns**: 14
- **Variables of Interest**:
  - `Exited`: Target variable indicating whether the customer left the bank (1: Exited, 0: Not Exited).
  - Features include customer demographics, balance information, account details, etc.

## Preprocessing Steps

1. **Dropping irrelevant columns**: Columns such as 'RowNumber', 'CustomerId', and 'Surname' do not contribute to the model's predictive power and are removed.
2. **Encoding categorical variables**: Use OneHotEncoder for categorical variables like 'Geography' and LabelEncoder for binary categories such as 'Gender'.
3. **Data Splitting**: Split the data into training and test sets to evaluate the model's performance on unseen data.

## Model Architecture

- **Input Layer**: 12 input features (the columns that remain after preprocessing).
- **Hidden Layers**: Three dense layers of 6 neurons each, with ReLU activation functions to introduce non-linearity.
- **Output Layer**: Single neuron with a sigmoid activation function for binary classification.

## Training

- **Optimizer**: Adam
- **Loss Function**: Binary cross-entropy
- **Metrics**: Accuracy
- **Epochs**: 100
- **Batch Size**: 10
- **Validation Split**: 20% of the training data

## Evaluation

- **Accuracy**: The share of test-set customers whose churn label is predicted correctly, as reported by `model.evaluate`.
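
Accuracy alone can be misleading here, because only about 20% of customers exited (the mean of `Exited` in the `df.describe()` output is 0.2037). The notebook imports `precision_score`, `recall_score`, `f1_score`, and `confusion_matrix`, which report how well the minority class is captured. The sketch below shows what those metrics measure, computed by hand in plain Python. The labels are toy values chosen for illustration, not predictions from the notebook's model.

```python
# Toy labels for illustration only; they are NOT outputs of the notebook's model.
y_true = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 1, 1, 0, 0, 0]


def confusion_counts(actual, predicted):
    """Return (tn, fp, fn, tp) for binary labels where 1 is the positive class."""
    pairs = list(zip(actual, predicted))
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    return tn, fp, fn, tp


tn, fp, fn, tp = confusion_counts(y_true, y_pred)

# Accuracy: share of all predictions that are correct.
accuracy = (tp + tn) / len(y_true)
# Precision: of the customers predicted to exit, how many actually did.
precision = tp / (tp + fp) if (tp + fp) else 0.0
# Recall: of the customers who actually exited, how many were found.
recall = tp / (tp + fn) if (tp + fn) else 0.0
# F1: harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0

print(accuracy, precision, recall, f1)
```

For the trained model, one plausible route is to threshold the predicted probabilities, for example `(model.predict(X_test) > 0.5).astype(int)`, and pass the result to the scikit-learn functions. The 0.5 threshold is an assumption, not something the notebook fixes.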

# Import libraries

In [1]:
!pip install tensorflow



In [2]:
# !pip install tensorflow
# !pip install pandas
# !pip install numpy
# !pip install scikit-learn
# !pip install matplotlib seaborn

In [3]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import Adam
from sklearn.preprocessing import OneHotEncoder, LabelEncoder
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix

# Load Data and EDA

In [4]:
df = pd.read_csv("C:\\Users\\abo_O\\OneDrive\\سطح المكتب\\Tuwaiq Academy\\Tuwaiq_Academy_T5_Week_4\\Week_4_LAB_2\\Datasets\\Churn_Modelling.csv")

In [5]:
df.head()

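| | RowNumber | CustomerId | Surname | CreditScore | Geography | Gender | Age | Tenure | Balance | NumOfProducts | HasCrCard | IsActiveMember | EstimatedSalary | Exited |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 1 | 15634602 | Hargrave | 619 | France | Female | 42 | 2 | 0.0 | 1 | 1 | 1 | 101348.88 | 1 |
| 1 | 2 | 15647311 | Hill | 608 | Spain | Female | 41 | 1 | 83807.86 | 1 | 0 | 1 | 112542.58 | 0 |
| 2 | 3 | 15619304 | Onio | 502 | France | Female | 42 | 8 | 159660.8 | 3 | 1 | 0 | 113931.57 | 1 |
| 3 | 4 | 15701354 | Boni | 699 | France | Female | 39 | 1 | 0.0 | 2 | 0 | 0 | 93826.63 | 0 |
| 4 | 5 | 15737888 | Mitchell | 850 | Spain | Female | 43 | 2 | 125510.82 | 1 | 1 | 1 | 79084.1 | 0 |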


In [6]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 14 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   RowNumber        10000 non-null  int64  
 1   CustomerId       10000 non-null  int64  
 2   Surname          10000 non-null  object 
 3   CreditScore      10000 non-null  int64  
 4   Geography        10000 non-null  object 
 5   Gender           10000 non-null  object 
 6   Age              10000 non-null  int64  
 7   Tenure           10000 non-null  int64  
 8   Balance          10000 non-null  float64
 9   NumOfProducts    10000 non-null  int64  
 10  HasCrCard        10000 non-null  int64  
 11  IsActiveMember   10000 non-null  int64  
 12  EstimatedSalary  10000 non-null  float64
 13  Exited           10000 non-null  int64  
dtypes: float64(2), int64(9), object(3)
memory usage: 1.1+ MB


In [7]:
df.describe()

| | RowNumber | CustomerId | CreditScore | Age | Tenure | Balance | NumOfProducts | HasCrCard | IsActiveMember | EstimatedSalary | Exited |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| count | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 |
| mean | 5000.5 | 15690940.0 | 650.5288 | 38.9218 | 5.0128 | 76485.889288 | 1.5302 | 0.7055 | 0.5151 | 100090.239881 | 0.2037 |
| std | 2886.89568 | 71936.19 | 96.653299 | 10.487806 | 2.892174 | 62397.405202 | 0.581654 | 0.45584 | 0.499797 | 57510.492818 | 0.402769 |
| min | 1.0 | 15565700.0 | 350.0 | 18.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 11.58 | 0.0 |
| 25% | 2500.75 | 15628530.0 | 584.0 | 32.0 | 3.0 | 0.0 | 1.0 | 0.0 | 0.0 | 51002.11 | 0.0 |
| 50% | 5000.5 | 15690740.0 | 652.0 | 37.0 | 5.0 | 97198.54 | 1.0 | 1.0 | 1.0 | 100193.915 | 0.0 |
| 75% | 7500.25 | 15753230.0 | 718.0 | 44.0 | 7.0 | 127644.24 | 2.0 | 1.0 | 1.0 | 149388.2475 | 0.0 |
| max | 10000.0 | 15815690.0 | 850.0 | 92.0 | 10.0 | 250898.09 | 4.0 | 1.0 | 1.0 | 199992.48 | 1.0 |


In [8]:
df.duplicated().sum()

0

In [9]:
df.isna().sum()

RowNumber          0
CustomerId         0
Surname            0
CreditScore        0
Geography          0
Gender             0
Age                0
Tenure             0
Balance            0
NumOfProducts      0
HasCrCard          0
IsActiveMember     0
EstimatedSalary    0
Exited             0
dtype: int64

## Preprocessing

### Drop unnecessary columns

In [10]:
df.drop(['RowNumber', 'CustomerId', 'Surname'], axis=1, inplace=True)

### One-hot-encoder and Label-encoder

In [11]:
encoding_columns = ['Geography',]

# Initialize OneHotEncoder
ohe_encoder = OneHotEncoder(sparse_output=False)

# Apply one-hot encoding to the categorical columns
one_hot_encoded = ohe_encoder.fit_transform(df[encoding_columns])

# Create a DataFrame with the one-hot encoded columns
# We use get_feature_names_out() to get the column names for the encoded data
one_hot_df = pd.DataFrame(one_hot_encoded, columns=ohe_encoder.get_feature_names_out(encoding_columns))

# Concatenate the one-hot encoded dataframe with the original dataframe
df_encoded = pd.concat([df, one_hot_df], axis=1)

# Drop the original categorical columns
df_encoded = df_encoded.drop(encoding_columns, axis=1)

df = df_encoded
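
To make the effect of the one-hot step concrete, here is a minimal plain-Python sketch of the same idea. The `geography` list is toy data, and the code only mimics the shape of the encoder's output; it is not the scikit-learn implementation.

```python
# Toy Geography values for illustration; the real column comes from the CSV.
geography = ["France", "Spain", "France", "Germany"]

# Sort the distinct categories so the column order is deterministic.
categories = sorted(set(geography))

# One row per sample, one 0/1 indicator column per category.
one_hot = [
    [1.0 if value == category else 0.0 for category in categories]
    for value in geography
]

# Column names in the "<feature>_<category>" style produced by get_feature_names_out.
column_names = [f"Geography_{category}" for category in categories]

print(column_names)
for row in one_hot:
    print(row)
```

Each row contains exactly one `1.0`, so no ordering between countries is implied. The 12 feature columns reported later in the notebook are consistent with `Geography` expanding into three indicator columns.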

In [12]:
# Initialize the LabelEncoder
label_encoder = LabelEncoder()

# Encode the labels in the 'Gender' column as integers
df['Gender'] = label_encoder.fit_transform(df['Gender'])
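
Label encoding maps each distinct category to an integer. `LabelEncoder` sorts the classes first, so with the two values seen in this dataset `Female` becomes 0 and `Male` becomes 1. The sketch below reproduces that mapping by hand on a toy list; it is an illustration, not the library code.

```python
# Toy Gender values for illustration.
gender = ["Female", "Male", "Female", "Male"]

# Sorted distinct classes define the integer codes.
classes = sorted(set(gender))
mapping = {label: index for index, label in enumerate(classes)}

# Replace each label with its integer code.
encoded = [mapping[value] for value in gender]

print(mapping)
print(encoded)
```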

## Selecting Features and Train Test Split

### Selecting Features (Target is `Exited`)

In [13]:
# Selecting the features and the target
X = df.drop('Exited', axis=1)
y = df['Exited']

### Train Test Split

In [14]:
# Split the dataset into a training set and a test set.
# test_size=0.2 holds out 20% of the rows for testing; random_state=42 makes the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
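
Conceptually, the split shuffles the row indices with a fixed seed and holds out a fraction of them. The sketch below shows that idea with the standard library only. `split_indices` is a hypothetical helper, and its shuffle will not reproduce the exact rows that `train_test_split` selects.

```python
import random


def split_indices(n_rows, test_size=0.2, seed=42):
    """Shuffle row indices reproducibly and return (train_indices, test_indices)."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_test = int(round(n_rows * test_size))
    return indices[n_test:], indices[:n_test]


# 10,000 rows, as reported by df.info().
train_idx, test_idx = split_indices(10_000)

print(len(train_idx), len(test_idx))
```

With 10,000 rows and `test_size=0.2`, this gives 8,000 training rows and 2,000 test rows, which matches the `(8000, 12)` training shape printed later in the notebook.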

## Modeling

### Creating the sequential model

In [15]:
# Define the model
model = Sequential()

### Adding the input layer

In [16]:
X_train.shape[1]

12

In [17]:
# Add the first dense layer: 6 neurons with the ReLU activation function
# input_dim declares the input size and must equal the number of features in the dataset (12 here)
model.add(Dense(units=6, input_dim=X_train.shape[1], kernel_initializer='uniform',activation='relu'))



### Adding the hidden layers

In [18]:
# Add two hidden layers, each with 6 neurons and the ReLU activation function
# Dense: the layer type, a fully connected layer that works for most cases
# units: number of neurons in the layer
# activation: activation function to use
model.add(Dense(units=6, activation='relu'))
model.add(Dense(units=6, activation='relu'))

### Adding the output layer

In [19]:
# Adding the output layer
# Units: number of neurons in the layer, in this case 1 as we are predicting a single output (true or false)
# Activation: Activation function to use
# Sigmoid is used for binary classification problems. It squashes the output between 0 and 1, which makes it easy to interpret.
model.add(Dense(units=1, activation='sigmoid'))
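
As a quick illustration of why a sigmoid output suits binary classification, the sketch below applies the function to a few raw scores and converts the results to class labels. The input scores and the 0.5 decision threshold are assumptions for the example, not values taken from the notebook.

```python
import math


def sigmoid(z):
    """Map any real number to the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical raw outputs of the last layer before the activation.
raw_scores = (-2.0, 0.0, 2.0)
probabilities = [sigmoid(z) for z in raw_scores]

# Interpret each probability as P(Exited = 1) and threshold at 0.5.
predicted_class = [1 if p > 0.5 else 0 for p in probabilities]

print(probabilities)
print(predicted_class)
```

Negative scores map below 0.5, positive scores above it, and a score of zero maps to exactly 0.5.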

### Compiling the model

In [20]:

# Compiling the ANN
# Optimizer: Adam, because it's a good default choice
# Loss function: binary_crossentropy, because it's a binary classification problem
# Metrics: accuracy, because it's a classification problem
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
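
Binary cross-entropy averages `-(y * log(p) + (1 - y) * log(1 - p))` over the samples, where `y` is the true label and `p` is the predicted probability of class 1. The sketch below computes it by hand on toy values to show how confident wrong predictions are penalized. The clipping constant `eps` is an assumption added to avoid `log(0)`.

```python
import math


def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy over paired labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        # Clip the probability so that log() never receives 0.
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


# Toy examples: the same two labels with good, bad, and uninformative predictions.
confident_right = binary_cross_entropy([1, 0], [0.9, 0.1])
confident_wrong = binary_cross_entropy([1, 0], [0.1, 0.9])
uninformative = binary_cross_entropy([1, 0], [0.5, 0.5])

print(confident_right, uninformative, confident_wrong)
```

The loss is lowest when predictions are confident and correct, equals `log(2)` (about 0.693) when every prediction is 0.5, and grows quickly for confident mistakes.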

### Display model summary

In [21]:
model.summary()
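
The summary output is not shown here, but the parameter counts it reports can be checked by hand. A dense layer with bias terms (the Keras default) has `inputs * units + units` parameters. The sketch below applies that formula to the layers added in the cells above: 12 inputs, three `Dense` layers of 6 units, and one output unit.

```python
# Layer widths as built in the cells above: 12 inputs -> 6 -> 6 -> 6 -> 1.
layer_sizes = [12, 6, 6, 6, 1]


def dense_params(n_inputs, n_units):
    """Weights plus one bias per unit for a fully connected layer."""
    return n_inputs * n_units + n_units


# Pair each layer width with the next one to get (inputs, units) per Dense layer.
per_layer = [
    dense_params(n_in, n_out)
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
]
total_params = sum(per_layer)

print(per_layer, total_params)
```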

### Fitting the model

In [22]:
X_train.shape, y_train.shape

((8000, 12), (8000,))

In [23]:
# batch_size: Number of samples per gradient update
# epochs: Number of epochs to train the model. An epoch is an iteration over the entire x and y data provided
# validation_split: Fraction of the training data to be used as validation data
model.fit(X_train, y_train, batch_size=10, epochs=100, validation_split=0.2)

Epoch 1/100
640/640 ━━━━━━━━━━━━━━━━━━━━ 3s 2ms/step - accuracy: 0.7166 - loss: 3.6427 - val_accuracy: 0.7987 - val_loss: 0.5737
Epoch 2/100
640/640 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.7992 - loss: 0.5621 - val_accuracy: 0.7987 - val_loss: 0.5257
Epoch 3/100
640/640 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8025 - loss: 0.5169 - val_accuracy: 0.7987 - val_loss: 0.5092
Epoch 4/100
640/640 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8006 - loss: 0.5057 - val_accuracy: 0.7987 - val_loss: 0.5045
Epoch 5/100
640/640 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.7858 - loss: 0.5197 - val_accuracy: 0.7987 - val_loss: 0.5027
Epoch 6/100
640/640 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.7890 - loss: 0.5151 - val_accuracy: 0.7987 - val_loss: 0.5020
Epoch 7/100
...

<keras.src.callbacks.history.History at 0x1de88adb190>
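
The `640/640` step counter in the training log follows from the fit settings. With `validation_split=0.2`, Keras holds out the last 20% of the 8,000 training rows for validation and trains on the rest in batches of 10. The arithmetic can be sketched as follows; the variable names are illustrative.

```python
import math

n_train_rows = 8000       # from X_train.shape
validation_split = 0.2    # fraction held out for validation
batch_size = 10

# Rows held out for validation and rows actually used for gradient updates.
n_val_rows = round(n_train_rows * validation_split)
n_fit_rows = n_train_rows - n_val_rows

# One step per batch; a final partial batch would still count as a step.
steps_per_epoch = math.ceil(n_fit_rows / batch_size)

print(n_fit_rows, n_val_rows, steps_per_epoch)
```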

## Evaluating the model

In [24]:
# Evaluate the model
model.evaluate(X_test, y_test)

63/63 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.8026 - loss: 0.4971


[0.49585214257240295, 0.8034999966621399]
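
To put the test accuracy in context, it helps to compare it with a majority-class baseline, that is, a model that always predicts "not exited". The sketch below uses the overall exit rate from `df.describe()` as an approximation. The class balance of the test set itself is not shown in the notebook, so the comparison is indicative only.

```python
# Values copied from the notebook outputs above.
exit_rate = 0.2037       # mean of Exited over all 10,000 rows
test_accuracy = 0.8035   # second value returned by model.evaluate, rounded

# Always predicting the more common class is right this often.
baseline_accuracy = max(exit_rate, 1.0 - exit_rate)

# How far the trained model is above that trivial baseline.
improvement = test_accuracy - baseline_accuracy

print(round(baseline_accuracy, 4), round(improvement, 4))
```

The model scores less than one percentage point above this baseline. Together with the validation accuracy staying at 0.7987 in the epochs shown, this may suggest that the network mostly predicts the majority class. Checking precision and recall for the `Exited` class would confirm or rule that out.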