# Artificial Neural Network

### Importing the libraries

In [5]:
# Importing the pandas library for data manipulation and analysis
import pandas as pd

# Importing the numpy library for numerical operations and array handling
import numpy as np

# Importing the tensorflow library for building and training deep learning models
import tensorflow as tf


In [6]:
# Checking the version of TensorFlow
print(tf.__version__)

2.17.0


## Part 1 - Data Preprocessing

### Importing the dataset

In [7]:
# Load the dataset from a CSV file into a pandas DataFrame
dataset = pd.read_csv('Churn_Modelling.csv')

# Extract the feature matrix X: all rows, columns from index 3 up to (but not including) the last column
X = dataset.iloc[:, 3:-1].values

# Extracting the target variable y (all rows, last column)
y = dataset.iloc[:, -1].values
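The slice `3:-1` is positional: it starts at the fourth column and stops just before the last one. Assuming the usual column layout of this churn dataset (three identifier columns first, the `Exited` label last; these names are an assumption and are not read from the file here), a one-row toy DataFrame shows which columns end up in `X` and `y`:

```python
import pandas as pd

# Assumed column layout of Churn_Modelling.csv; the single row of values is made up
columns = ["RowNumber", "CustomerId", "Surname", "CreditScore", "Geography",
           "Gender", "Age", "Tenure", "Balance", "NumOfProducts",
           "HasCrCard", "IsActiveMember", "EstimatedSalary", "Exited"]
toy = pd.DataFrame([[1, 1001, "Doe", 600, "France", "Male", 40, 3, 60000.0,
                     2, 1, 1, 50000.0, 0]], columns=columns)

# 3:-1 starts at position 3 and stops just before the last column
feature_columns = toy.columns[3:-1].tolist()
target_column = toy.columns[-1]

print(feature_columns)  # CreditScore ... EstimatedSalary (10 columns)
print(target_column)    # Exited
```

Under this assumption, the ten feature columns become 12 once Geography is one-hot encoded into three dummies, which matches the 12 values passed to `predict` later on.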


In [8]:
# Print the feature matrix X to check the data
print(X)


[[619 'France' 'Female' ... 1 1 101348.88]
 [608 'Spain' 'Female' ... 0 1 112542.58]
 [502 'France' 'Female' ... 1 0 113931.57]
 ...
 [709 'France' 'Female' ... 0 1 42085.58]
 [772 'Germany' 'Male' ... 1 0 92888.52]
 [792 'France' 'Female' ... 1 0 38190.78]]


In [9]:
# Print the target variable y to check the data
print(y)


[1 0 1 ... 1 1 0]


### Encoding categorical data

Label Encoding the "Gender" column

In [10]:
# Import the LabelEncoder from the sklearn.preprocessing module
from sklearn.preprocessing import LabelEncoder

# Create an instance of LabelEncoder
le = LabelEncoder()

# Encode column index 2 of X, which holds 'Gender' (after the 3:-1 slice: CreditScore, Geography, Gender, ...)
# LabelEncoder assigns codes in alphabetical order, so 'Female' -> 0 and 'Male' -> 1
X[:, 2] = le.fit_transform(X[:, 2])

In [11]:
# Print the feature matrix X to check the data after encoding
print(X)


[[619 'France' 0 ... 1 1 101348.88]
 [608 'Spain' 0 ... 0 1 112542.58]
 [502 'France' 0 ... 1 0 113931.57]
 ...
 [709 'France' 0 ... 0 1 42085.58]
 [772 'Germany' 1 ... 1 0 92888.52]
 [792 'France' 0 ... 1 0 38190.78]]
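`LabelEncoder` assigns integer codes in the sorted order of the class names, which is why "Female" became 0 and "Male" became 1 in the output above. A minimal standalone check on a made-up column (`le_demo` is an illustrative name):

```python
from sklearn.preprocessing import LabelEncoder

# A toy "Gender" column; the values are illustrative, not taken from the dataset
genders = ["Female", "Male", "Female", "Male", "Male"]

le_demo = LabelEncoder()
codes = le_demo.fit_transform(genders)

# classes_ holds the sorted unique labels; a label's position is its integer code
print(le_demo.classes_.tolist())  # ['Female', 'Male']
print(codes.tolist())             # [0, 1, 0, 1, 1]
```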


One Hot Encoding the "Geography" column

In [12]:
# Import necessary modules for preprocessing
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder


# Create a ColumnTransformer object
# It applies OneHotEncoder to the column at index 1 ('Geography') and leaves the other columns unchanged;
# the resulting dummy columns are placed first, before the untouched columns
ct = ColumnTransformer(
    transformers=[('encoder', OneHotEncoder(), [1])],  # Apply OneHotEncoder to column index 1
    remainder='passthrough'  # Leave the other columns unchanged
)

# Fit and transform the feature matrix X using the ColumnTransformer
# The transformed X is converted to a NumPy array
X = np.array(ct.fit_transform(X))


In [13]:
# Print the feature matrix X after one-hot encoding
print(X)


[[1.0 0.0 0.0 ... 1 1 101348.88]
 [0.0 0.0 1.0 ... 0 1 112542.58]
 [1.0 0.0 0.0 ... 1 0 113931.57]
 ...
 [1.0 0.0 0.0 ... 0 1 42085.58]
 [0.0 1.0 0.0 ... 1 0 92888.52]
 [1.0 0.0 0.0 ... 1 0 38190.78]]
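Two details of this encoding explain the printed rows: `OneHotEncoder` orders the categories alphabetically (France, Germany, Spain), and `ColumnTransformer` puts the transformed columns before the `remainder='passthrough'` columns. A small sketch on three made-up rows with the same first three columns (`X_toy` and `ct_demo` are illustrative names):

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Toy rows laid out like the start of X: [CreditScore, Geography, Gender] (made-up values)
X_toy = np.array([
    [619, "France", 0],
    [608, "Spain", 0],
    [772, "Germany", 1],
], dtype=object)

ct_demo = ColumnTransformer(
    transformers=[("encoder", OneHotEncoder(), [1])],
    remainder="passthrough",
)
X_toy_encoded = np.array(ct_demo.fit_transform(X_toy))

# Categories are sorted alphabetically, so France -> 1 0 0, Germany -> 0 1 0, Spain -> 0 0 1
categories = ct_demo.named_transformers_["encoder"].categories_[0].tolist()
print(categories)
# The dummy columns come first, followed by the passthrough columns
print(X_toy_encoded.tolist())
```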


### Splitting the dataset into the Training set and Test set

In [14]:
# Import the train_test_split function from sklearn.model_selection
from sklearn.model_selection import train_test_split

# Split the dataset into training and testing sets
# 20% of the data will be used for testing, and the remaining 80% will be used for training
# random_state is set to ensure reproducibility of the split
X_train, X_test, y_train, y_test = train_test_split(
    X,           # Feature matrix
    y,           # Target variable
    test_size=0.2,  # Proportion of the data to include in the test split (20%)
    random_state=0  # Seed for the random number generator to ensure reproducibility
)
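A quick sketch of what `test_size` and `random_state` do, on ten made-up samples (`X_toy` and `y_toy` are illustrative): 20% of the rows go to the test set, and repeating the call with the same seed reproduces the identical split.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Ten toy samples with two features each, plus toy binary labels
X_toy = np.arange(20).reshape(10, 2)
y_toy = np.array([0, 1] * 5)

# Same seed -> identical split on every call
split_a = train_test_split(X_toy, y_toy, test_size=0.2, random_state=0)
split_b = train_test_split(X_toy, y_toy, test_size=0.2, random_state=0)

X_tr, X_te, y_tr, y_te = split_a
print(X_tr.shape, X_te.shape)  # (8, 2) (2, 2)
print(all(np.array_equal(a, b) for a, b in zip(split_a, split_b)))  # True
```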



### Feature Scaling

In [15]:
# Import the StandardScaler from sklearn.preprocessing
from sklearn.preprocessing import StandardScaler

# Create an instance of StandardScaler
sc = StandardScaler()

# Fit the scaler on the training data and transform the training features
# This step standardizes the features in X_train by removing the mean and scaling to unit variance
X_train = sc.fit_transform(X_train)

# Transform the test data using the same scaler (fitted on the training data)
# This ensures that the test data is scaled in the same way as the training data
X_test = sc.transform(X_test)
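Standardization rescales each column to z = (x - μ) / σ, where μ and σ are computed on the training set only and then reused for the test set. The sketch below, on made-up values, checks that `StandardScaler` matches this formula with the population standard deviation (ddof=0); `sc_demo` is an illustrative name:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy single-feature train/test data (made-up values)
train = np.array([[600.0], [650.0], [700.0], [750.0]])
test = np.array([[675.0]])

sc_demo = StandardScaler()
train_scaled = sc_demo.fit_transform(train)  # learns mean_ and scale_ from train
test_scaled = sc_demo.transform(test)        # reuses the training statistics

# Manual version: training mean and population standard deviation (ddof=0)
mu, sigma = train.mean(axis=0), train.std(axis=0)
print(np.allclose(train_scaled, (train - mu) / sigma))  # True
print(test_scaled)  # the test value equals the training mean, so it scales to 0
```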


## Part 2 - Building the ANN

### Initializing the ANN

In [16]:
'''
tensorflow: TensorFlow is an open-source machine learning library. 
The tf.keras submodule provides a high-level API for building and training deep learning models.

Sequential Model: The Sequential class in Keras represents a linear stack of layers. It lets you build a model layer by layer, 
where each layer receives the output of the previous one and has weights that are updated during training.

Initialization: Calling Sequential() with no arguments creates an empty model; layers are then added one at a time with add(). 
The name "sequential" reflects that data flows through the layers strictly in the order they were added.

'''

# Initialize a Sequential model
ann = tf.keras.models.Sequential()



### Adding the input layer and the first hidden layer


In [17]:
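'''
units=6: Specifies that this layer will have 6 neurons. Each neuron in this layer is connected to every neuron in the previous layer (a "fully connected", or dense, layer).
activation='relu': Uses the rectified linear unit, relu(z) = max(0, z), as the activation function for this layer.

No separate input layer is declared here: Keras infers the input size (12 features after encoding) from the data the first time the model is built, for example when fit is called.
'''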
# Adding a Dense layer with 6 neurons and ReLU activation function to the model
ann.add(tf.keras.layers.Dense(units=6, activation='relu'))
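Numerically, a `Dense` layer is an affine map followed by its activation: `relu(x @ W + b)`. A NumPy sketch of that computation, assuming 12 input features (the width of the encoded feature matrix); the random weights are placeholders, not the values Keras would initialize or learn:

```python
import numpy as np

rng = np.random.default_rng(0)

n_inputs, n_units = 12, 6  # 12 encoded features -> 6 hidden neurons

# One weight per (input, neuron) pair plus one bias per neuron: every input reaches every neuron
W = rng.normal(size=(n_inputs, n_units))
b = np.zeros(n_units)

def relu(z):
    # ReLU keeps positive values and replaces negatives with 0: max(0, z)
    return np.maximum(0.0, z)

x = rng.normal(size=(1, n_inputs))  # one made-up scaled sample, shape (batch, features)
h = relu(x @ W + b)                 # layer output, shape (batch, units)

print(h.shape)          # (1, 6)
print(W.size + b.size)  # 78 trainable parameters = 12 * 6 + 6
```

The count of 12 × 6 weights plus 6 biases = 78 is what this first hidden layer should contribute to `ann.summary()` once the model has been built.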


### Adding the second hidden layer

In [18]:
# Adding a Dense layer with 6 neurons and ReLU activation function to the model
ann.add(tf.keras.layers.Dense(units=6, activation='relu'))

### Adding the output layer

In [19]:
# Add the output layer: 1 neuron with a sigmoid activation, which outputs a probability between 0 and 1 that the customer leaves
ann.add(tf.keras.layers.Dense(units=1, activation='sigmoid'))
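Putting the three layers together, the network is a chain of such maps ending in a sigmoid, which is why its output can be read as a probability of leaving the bank. A NumPy forward-pass sketch with the same layer sizes (12 → 6 → 6 → 1) and made-up weights; it illustrates the computation and is not a reimplementation of Keras:

```python
import numpy as np

rng = np.random.default_rng(42)

def relu(z):
    return np.maximum(0.0, z)

def sigmoid(z):
    # Squashes any real number into (0, 1), so the output reads as a probability
    return 1.0 / (1.0 + np.exp(-z))

# Layer sizes mirroring the model: 12 inputs -> 6 -> 6 -> 1 (weights are made up)
sizes = [12, 6, 6, 1]
params = [(rng.normal(size=(m, n)) * 0.1, np.zeros(n))
          for m, n in zip(sizes[:-1], sizes[1:])]

def forward(x):
    h = x
    for W, b in params[:-1]:
        h = relu(h @ W + b)            # hidden layers
    W_out, b_out = params[-1]
    return sigmoid(h @ W_out + b_out)  # output layer

X_batch = rng.normal(size=(4, 12))  # four made-up scaled samples
probs = forward(X_batch)
print(probs.shape)  # (4, 1)
print(sum(W.size + b.size for W, b in params))  # 127
```

The total of 127 trainable parameters (78 + 42 + 7) is the count one would expect `ann.summary()` to report for this architecture.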


## Part 3 - Training the ANN

### Compiling the ANN

optimizer='adam':

Optimizer: The optimizer updates the weights of the network during training so as to reduce the loss function.

Adam: The Adam optimizer is a popular choice because it combines the advantages of two other extensions of stochastic gradient descent, AdaGrad and RMSProp. It adapts the learning rate for each weight individually by maintaining exponentially decaying averages of past gradients and of past squared gradients.
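A textbook sketch of a single Adam update in NumPy, following the formulation in the original Adam paper, with `lr`, `beta1`, `beta2` and `eps` set to Keras's documented defaults. Keras's own implementation may arrange the arithmetic slightly differently (for example, where `eps` enters), so treat this as an illustration of the idea rather than the exact internals; `adam_step` is an illustrative name:

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-7):
    """One Adam update for parameters w given gradient grad at step t (1-based)."""
    m = beta1 * m + (1 - beta1) * grad           # decaying average of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2      # decaying average of squared gradients
    m_hat = m / (1 - beta1 ** t)                 # bias correction for the zero initialization
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)  # per-parameter step size
    return w, m, v

# Usage: minimize f(w) = (w - 3)^2, whose gradient is 2 * (w - 3)
w = np.array([0.0])
m = np.zeros_like(w)
v = np.zeros_like(w)
for t in range(1, 2001):
    w, m, v = adam_step(w, 2 * (w - 3), m, v, t, lr=0.05)
print(w)  # close to 3
```

The division by `sqrt(v_hat)` is what makes the effective step size adapt to each weight's own gradient history.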

loss='binary_crossentropy':

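Loss Function: The loss function measures how well the model’s predictions match the true labels. The optimizer uses it to decide how to update the model’s weights.

Binary Crossentropy: The standard loss for binary classification with a sigmoid output. For a true label $y \in \{0, 1\}$ and a predicted probability $p$, each sample contributes $-[y \log p + (1 - y) \log(1 - p)]$, averaged over the samples, so confident wrong predictions are penalized heavily.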
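The formula is easy to check numerically. A minimal NumPy sketch (the function name `binary_crossentropy` and the clipping constant `eps` are illustrative choices): confident correct predictions give a small loss, confident wrong ones a large loss, and a model that always outputs 0.5 scores ln 2 ≈ 0.693, close to the first-epoch loss in the training log.

```python
import numpy as np

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    # Clip probabilities so that log(0) never occurs
    p = np.clip(y_prob, eps, 1 - eps)
    return float(-np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p)))

y_true = np.array([1, 0, 1, 0])
print(binary_crossentropy(y_true, np.array([0.9, 0.1, 0.8, 0.2])))  # confident and right: low loss
print(binary_crossentropy(y_true, np.array([0.1, 0.9, 0.2, 0.8])))  # confident and wrong: high loss
print(binary_crossentropy(y_true, np.full(4, 0.5)))                 # always 0.5: ln(2)
```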


metrics=['accuracy']:

Metrics: Metrics are used to evaluate the performance of the model during training and testing. Unlike the loss function, metrics are not used for updating the model weights but are used to measure the model’s performance.

Accuracy: The proportion of samples whose predicted class matches the true label. With a single sigmoid output unit, Keras resolves 'accuracy' to binary accuracy, which thresholds each predicted probability at 0.5 before comparing it with the label.
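For intuition, a NumPy sketch of accuracy with a 0.5 threshold on made-up labels and probabilities:

```python
import numpy as np

y_true = np.array([1, 0, 1, 1, 0])
y_prob = np.array([0.92, 0.40, 0.35, 0.51, 0.08])  # made-up sigmoid outputs

y_hat = (y_prob > 0.5).astype(int)   # threshold each probability at 0.5
accuracy = np.mean(y_hat == y_true)  # fraction of predictions that match the labels
print(y_hat.tolist())  # [1, 0, 0, 1, 0]
print(accuracy)        # 0.8
```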

In [20]:
# Compile the model
ann.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])


### Training the ANN on the Training set

X_train:
Training Features: The input data used to train the model. Each row of X_train is the feature vector of one sample.

y_train:
Training Labels: The target values (ground truth) for each sample in X_train. For this binary classification problem, they are 0 or 1.

batch_size=32:
Batch Size: The number of samples processed before the model’s weights are updated. With a batch size of 32, the weights are updated once every 32 samples, so the 8,000 training samples are processed in 250 batches per epoch (the "250/250" counter in the training log). Smaller batches give more frequent but noisier updates, while larger batches give smoother updates but require more memory.

Effect on Training: The batch size affects both training speed and convergence, so it may be worth trying a few values for a given problem.

epochs=100:
Epochs: The number of complete passes over the training dataset. With epochs=100, the model goes through the entire training set 100 times.

Effect on Training: More epochs give the model more opportunities to reduce the training loss, but too many can lead to overfitting. Monitoring performance on a held-out validation set helps to check that the model still generalizes well.
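To make the batch mechanics concrete, here is a simplified sketch of how one epoch could be cut into mini-batches, assuming the sample indices are shuffled each epoch (`Model.fit` shuffles the training data by default). It illustrates the counting only, not Keras's actual data pipeline; `iterate_minibatches` is a hypothetical helper:

```python
import numpy as np

def iterate_minibatches(n_samples, batch_size, rng):
    # Shuffle the sample indices once per epoch, then cut them into consecutive batches
    idx = rng.permutation(n_samples)
    for start in range(0, n_samples, batch_size):
        yield idx[start:start + batch_size]  # the last batch may be smaller

rng = np.random.default_rng(0)
batches = list(iterate_minibatches(8000, 32, rng))
print(len(batches))  # 250 weight updates per epoch

partial = list(iterate_minibatches(8010, 32, rng))
print(len(partial), len(partial[-1]))  # 251 10: a smaller final batch still counts as a step
```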

In [21]:
# Train the model
ann.fit(X_train, y_train, batch_size=32, epochs=100)

Epoch 1/100
250/250 ━━━━━━━━━━━━━━━━━━━━ 2s 1ms/step - accuracy: 0.6097 - loss: 0.6774
Epoch 2/100
250/250 ━━━━━━━━━━━━━━━━━━━━ 0s 890us/step - accuracy: 0.7959 - loss: 0.4702
Epoch 3/100
250/250 ━━━━━━━━━━━━━━━━━━━━ 0s 999us/step - accuracy: 0.8026 - loss: 0.4395
Epoch 4/100
250/250 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.8122 - loss: 0.4250
Epoch 5/100
250/250 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.8148 - loss: 0.4203
Epoch 6/100
250/250 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.8317 - loss: 0.3925
Epoch 7/100
250/250 ━━━━━━━━━━━━━━━━━━━━ 0s 982us/step - accuracy: 0.8397 - loss: 0.3824
Epoch 8/100
250/250 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.8485 - loss: 0.3691
Epoch 9/100
250/250 ...

<keras.src.callbacks.history.History at 0x23905c53ce0>

## Part 4 - Making the predictions and evaluating the model

### Predicting the result of a single observation

**Use our ANN model to predict whether the customer with the following information will leave the bank:**

Geography: France

Credit Score: 600

Gender: Male

Age: 40 years old

Tenure: 3 years

Balance: \$ 60000

Number of Products: 2

Does this customer have a credit card? Yes

Is this customer an active member? Yes

Estimated Salary: \$ 50000

So, should we say goodbye to that customer?

**Solution**

In [22]:
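# Predict for one new customer: encode the row exactly like X (3 Geography dummies first),
# scale it with the scaler fitted on the training set, then pass it to the model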
prediction = ann.predict(sc.transform([[1, 0, 0, 600, 1, 40, 3, 60000, 2, 1, 1, 50000]]))

# Convert the prediction to a binary class based on a threshold of 0.5
binary_prediction = prediction > 0.5

# Print the binary prediction
print(binary_prediction)


[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 117ms/step
[[False]]


Therefore, our ANN model predicts that this customer stays in the bank!

**Important note 1:** Notice that the feature values are wrapped in a double pair of square brackets. That's because the "predict" method (like sc.transform) expects a 2D array of shape (number of samples, number of features), and the double brackets turn our single observation into exactly such an array: one row and 12 columns.
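A quick NumPy check of the shapes involved, using the same 12 values as the prediction above:

```python
import numpy as np

row = [1, 0, 0, 600, 1, 40, 3, 60000, 2, 1, 1, 50000]

one_d = np.array(row)                     # a flat vector, not a batch
batch = np.array([row])                   # double brackets: a batch holding one sample
reshaped = np.array(row).reshape(1, -1)   # an equivalent way to get one row

print(one_d.shape)     # (12,)
print(batch.shape)     # (1, 12)
print(reshaped.shape)  # (1, 12)
```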

**Important note 2:** Notice also that the country "France" was not entered as a string in the second column but as "1, 0, 0" in the first three columns. That's because the model was trained on the one-hot-encoded values of the country, and, as the first row of the matrix of features X shows, "France" was encoded as "1, 0, 0". Be careful to put these values in the first three columns: the ColumnTransformer placed the dummy variables before all the other columns. Likewise, the gender "Male" is entered as 1, because the LabelEncoder mapped "Female" to 0 and "Male" to 1.
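Rather than typing the encoded values by hand, one option is to push a raw row through the same fitted encoders. The sketch below fits hypothetical stand-ins (`le_demo`, `ct_demo`) on three made-up rows so that it runs on its own; in the notebook, the existing `le` and `ct` (followed by `sc`) would play these roles. The column order is assumed to match X before encoding:

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Made-up rows in the column order of X before encoding: CreditScore, Geography, Gender,
# Age, Tenure, Balance, NumOfProducts, HasCrCard, IsActiveMember, EstimatedSalary
X_raw = np.array([
    [650, "France", "Female", 35, 2, 0.0, 1, 1, 0, 40000.0],
    [700, "Spain", "Male", 45, 5, 10000.0, 2, 0, 1, 80000.0],
    [600, "Germany", "Male", 30, 1, 20000.0, 1, 1, 1, 60000.0],
], dtype=object)

# Fit the encoders exactly as in the notebook: label-encode Gender, then one-hot Geography
le_demo = LabelEncoder()
X_raw[:, 2] = le_demo.fit_transform(X_raw[:, 2])
ct_demo = ColumnTransformer([("encoder", OneHotEncoder(), [1])], remainder="passthrough")
ct_demo.fit(X_raw)

# The customer from the question, in raw (unencoded) form
customer = np.array([[600, "France", "Male", 40, 3, 60000, 2, 1, 1, 50000]], dtype=object)
customer[:, 2] = le_demo.transform(customer[:, 2])       # 'Male' -> 1
encoded = np.array(ct_demo.transform(customer), dtype=float)
print(encoded.tolist())
# [[1.0, 0.0, 0.0, 600.0, 1.0, 40.0, 3.0, 60000.0, 2.0, 1.0, 1.0, 50000.0]]
```

The encoded row matches the hand-typed input, and it would still need `sc.transform` before being passed to `predict`.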

### Predicting the Test set results

In [23]:
# Predict the probabilities for the test set
y_pred = ann.predict(X_test)

# Convert the probabilities to binary predictions based on a threshold of 0.5
y_pred = (y_pred > 0.5)

# Concatenate the predicted labels and actual labels side by side for comparison
print(np.concatenate((y_pred.reshape(len(y_pred), 1), y_test.reshape(len(y_test), 1)), 1))


[1m63/63[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 883us/step
[[0 0]
 [0 1]
 [0 0]
 ...
 [0 0]
 [0 0]
 [0 0]]


### Making the Confusion Matrix

In [24]:
from sklearn.metrics import confusion_matrix, accuracy_score

# Compute the confusion matrix (rows: actual class, columns: predicted class)
cm = confusion_matrix(y_test, y_pred)
print(cm)

# Compute the accuracy score
accuracy = accuracy_score(y_test, y_pred)
print(accuracy)


[[1505   90]
 [ 194  211]]
0.858
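Accuracy alone can hide how the model performs on the minority class. Reading the printed matrix with scikit-learn's convention (rows are actual labels, columns are predictions, class 0 first), precision and recall for the "leaves the bank" class follow directly from its four entries:

```python
import numpy as np

# Confusion matrix printed above: rows = actual class, columns = predicted class
cm = np.array([[1505, 90],
               [194, 211]])
tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()
precision = tp / (tp + fp)  # of customers predicted to leave, share who actually left
recall = tp / (tp + fn)     # of customers who actually left, share the model caught

print(round(accuracy, 3))   # 0.858
print(round(precision, 3))  # 0.701
print(round(recall, 3))     # 0.521
```

With these numbers, the model identifies only about half of the customers who actually left, even though overall accuracy is 0.858.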
