### Artificial Neural Network
<small>

- inspired by biological neural networks

<br>


<b>Neural Networks </b> <br>

- Structure:
    - an input layer (takes raw data)
    - hidden layers (process data)
    - an output layer (produces results)
    <br>
    - neurons in each layer are connected by weights

    <br>

- Data Flow (Forward Propagation):
    - inputs are multiplied by weights and summed
    - the result passes through an activation function (e.g., ReLU, Sigmoid) to capture non-linear patterns
    - output passed to the next layer

    <br>

- Learning Process:
    - loss calculation --> measures prediction error
    - backpropagation --> computes the gradient of the loss with respect to each weight
    - gradient descent --> uses those gradients to update the weights and reduce the loss

    <br>

- Key points:
    - activation functions introduce non-linearity (e.g., ReLU)
    - overfitting can be mitigated with regularization

    <br>

</small>
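
The data flow and learning process above can be sketched in plain NumPy. This is a minimal illustration, not the Keras implementation used later: the layer sizes, the single toy sample, the learning rate, and the helper names (`relu`, `sigmoid`, `forward`, `bce`) are all assumptions made for the example.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W1, b1, W2, b2):
    z1 = x @ W1 + b1          # inputs multiplied by weights and summed
    a1 = relu(z1)             # hidden activation
    p = sigmoid(a1 @ W2 + b2) # output probability
    return z1, a1, p

def bce(p, y):
    # binary cross-entropy loss for predicted probability p and label y
    return float(-(y * np.log(p) + (1 - y) * np.log(1 - p)).mean())

rng = np.random.default_rng(0)
x = np.array([[0.5, -1.2, 0.3]])  # one toy sample with 3 features (assumed)
y = np.array([[1.0]])             # its label

# Hypothetical network: 3 inputs -> 4 hidden units -> 1 output
W1 = rng.normal(scale=0.5, size=(3, 4)); b1 = np.zeros((1, 4))
W2 = rng.normal(scale=0.5, size=(4, 1)); b2 = np.zeros((1, 1))

# Forward propagation and loss calculation
z1, a1, p = forward(x, W1, b1, W2, b2)
loss_before = bce(p, y)

# Backpropagation: gradients of the loss with respect to each parameter
dz2 = p - y                 # sigmoid + cross-entropy gradient at the output
dW2 = a1.T @ dz2
db2 = dz2
dz1 = (dz2 @ W2.T) * (z1 > 0)  # ReLU passes gradient only where z1 > 0
dW1 = x.T @ dz1
db1 = dz1

# Gradient descent: one update step
lr = 0.1
W1 -= lr * dW1; b1 -= lr * db1
W2 -= lr * dW2; b2 -= lr * db2

loss_after = bce(forward(x, W1, b1, W2, b2)[2], y)
print(loss_before, loss_after)
```

After one update the loss on this sample should be lower than before, which is the whole point of the loop that `fit` repeats over many batches and epochs.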

Importing the libraries

In [108]:
import numpy as np 
import pandas as pd 
import tensorflow as tf

#### Part 1 - Data Preprocessing

<small>

- check --> are there any missing values in the dataset?

</small>
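
One way to run that check with pandas is sketched below. The `toy` frame is a hypothetical stand-in so the snippet runs on its own; on the real data the same two calls would be applied to `dataset` once it has been loaded.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for the real dataset
toy = pd.DataFrame({
    "CreditScore": [619, 608, np.nan],
    "Geography": ["France", None, "Spain"],
    "Exited": [1, 0, 1],
})

# Number of missing values in each column
missing_per_column = toy.isnull().sum()

# Single yes/no answer for the whole frame
has_missing = bool(toy.isnull().values.any())

print(missing_per_column)
print(has_missing)
```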

Importing the dataset

In [109]:
dataset = pd.read_csv('Churn_Modelling.csv')
X = dataset.iloc[:, 3:-1].values 
y = dataset.iloc[:, -1].values

In [110]:
print(X)

[[619 'France' 'Female' ... 1 1 101348.88]
 [608 'Spain' 'Female' ... 0 1 112542.58]
 [502 'France' 'Female' ... 1 0 113931.57]
 ...
 [709 'France' 'Female' ... 0 1 42085.58]
 [772 'Germany' 'Male' ... 1 0 92888.52]
 [792 'France' 'Female' ... 1 0 38190.78]]


In [111]:
print(y)

[1 0 1 ... 1 1 0]


Encoding categorical data

In [112]:
# Label Encoding the "Gender" column
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
X[:, 2] = le.fit_transform(X[:, 2])

In [113]:
print(X)

[[619 'France' 0 ... 1 1 101348.88]
 [608 'Spain' 0 ... 0 1 112542.58]
 [502 'France' 0 ... 1 0 113931.57]
 ...
 [709 'France' 0 ... 0 1 42085.58]
 [772 'Germany' 1 ... 1 0 92888.52]
 [792 'France' 0 ... 1 0 38190.78]]


In [114]:
# One Hot Encoding the "Geography" Column
# one hot encoding --> dummy variables
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
ct = ColumnTransformer(transformers = [('encoder', OneHotEncoder(), [1])], remainder = 'passthrough')
X = np.array(ct.fit_transform(X))

In [115]:
# after one-hot encoding, the dummy-variable columns are placed first in the matrix
print(X)

[[1.0 0.0 0.0 ... 1 1 101348.88]
 [0.0 0.0 1.0 ... 0 1 112542.58]
 [1.0 0.0 0.0 ... 1 0 113931.57]
 ...
 [1.0 0.0 0.0 ... 0 1 42085.58]
 [0.0 1.0 0.0 ... 1 0 92888.52]
 [1.0 0.0 0.0 ... 1 0 38190.78]]
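
To make the dummy columns less opaque, here is a rough NumPy equivalent of what one-hot encoding does to a single categorical column. The four sample values are made up for illustration; the column order (France, Germany, Spain) follows from sorting the category names, which matches the output printed above.

```python
import numpy as np

# Hypothetical sample of the "Geography" column
geography = np.array(["France", "Spain", "France", "Germany"])

# Sorted unique categories define the column order
categories = np.unique(geography)

# One column per category, 1.0 where the row matches that category
one_hot = (geography[:, None] == categories[None, :]).astype(float)

print(categories)
print(one_hot)
```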


Splitting the dataset into the Training set and Test set

In [116]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

Feature Scaling

In [117]:
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
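
The reason for calling `fit_transform` on the training set but only `transform` on the test set is that the mean and standard deviation must come from the training data alone. A small NumPy sketch of the same standardization, using made-up numbers:

```python
import numpy as np

# Hypothetical training and test rows (two features each)
train = np.array([[600.0, 40.0], [700.0, 50.0], [800.0, 60.0]])
test = np.array([[650.0, 45.0]])

# Statistics are learned from the training set only
mean = train.mean(axis=0)
std = train.std(axis=0)  # population standard deviation (ddof=0)

# The same statistics are applied to both sets
train_scaled = (train - mean) / std
test_scaled = (test - mean) / std

print(train_scaled)
print(test_scaled)
```

Reusing the training statistics keeps information about the test set from leaking into preprocessing.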

#### Part 2 - Building the ANN

Initializing the ANN

In [118]:
ann = tf.keras.models.Sequential()

Adding the hidden layer <br>

<small>

Most commonly used activation function for hidden layers:
- ReLU (Rectified Linear Unit)

formula --> f(x) = max(0, x)

<br>

Dense: fully connected layers <br>
Units: number of neurons

<br>

<b>Hidden Layer: </b>

1. Tabular data: 1-2 hidden layers
2. Unstructured data (images, text): 3+ layers, possibly pre-trained models for better results

- Overfitting: adding layers can lead to overfitting on small datasets, whereas larger datasets allow deeper networks with less risk of overfitting
- More layers also increase computation cost and training time

</small>

In [119]:
# fully connected layer
ann.add(tf.keras.layers.Dense(units=6, activation='relu'))

Adding the output layer <br>

<small>

Which activation function should the output layer use?
- it depends on the problem type
1. regression --> units = 1 --> linear
2. binary classification --> units = 1 --> sigmoid
3. multi-class classification --> units = number of classes --> softmax
4. multi-label classification --> units = number of labels --> sigmoid

</small>
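
The difference between sigmoid and softmax outputs can be seen with a few made-up raw scores (logits). This is an illustrative sketch; the helper names and input values are assumptions, not part of the model built here.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    # subtract the max for numerical stability before exponentiating
    shifted = z - z.max(axis=-1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=-1, keepdims=True)

# Binary classification: one unit, one probability
p_binary = sigmoid(np.array([0.0]))

# Multi-class: one unit per class, probabilities sum to 1
p_classes = softmax(np.array([2.0, 1.0, 0.1]))

# Multi-label: one unit per label, each probability is independent
p_labels = sigmoid(np.array([2.0, -1.0, 0.5]))

print(p_binary, p_classes, p_labels)
```

Softmax forces the classes to compete for a total probability of 1, while per-label sigmoids let several labels be likely at once.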

In [120]:
# sigmoid maps the output to a probability in (0, 1) for the positive class (customer leaves)
ann.add(tf.keras.layers.Dense(units=1, activation='sigmoid'))

<b>Training the ANN</b>

Compiling the ANN <br>
<small>

1. <b>optimizer:</b> controls how the model updates its weights during training to minimize the loss function
    - adam --> Adaptive Moment Estimation
        - combines momentum (a running average of past gradients) with RMSprop-style scaling (a running average of squared gradients)
        - adapts the step size for each parameter individually based on these averages

            <br>

2. <b>loss:</b> measures the difference between the predicted outputs of the model and actual target labels. The goal is to minimize the loss during training
    - binary cross-entropy --> used when the labels are binary (e.g. 0 or 1)
    - it penalizes confident wrong predictions most heavily

        <br>

3. <b>metrics:</b> used to evaluate the model's performance during training and testing
    - accuracy --> measures the percentage of correct predictions

</small>
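
To see why binary cross-entropy punishes confident mistakes, consider three predictions for a sample whose true label is 1. The function below is a hand-written sketch of the per-sample loss, with a hypothetical `eps` clip to avoid `log(0)`; it is not the Keras implementation.

```python
import numpy as np

def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    # clip probabilities so the logarithm stays finite
    y_prob = np.clip(y_prob, eps, 1 - eps)
    return -(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))

y_true = np.array([1.0, 1.0, 1.0])
y_prob = np.array([0.9, 0.5, 0.1])  # confident-right, unsure, confident-wrong

losses = binary_cross_entropy(y_true, y_prob)
print(losses)
```

The loss grows slowly while the prediction is roughly right and sharply as the predicted probability for the true class approaches 0.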

In [121]:
ann.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

Training the ANN on the Training set

In [122]:
ann.fit(X_train, y_train, batch_size=32, epochs=100)

Epoch 1/100
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.5222 - loss: 0.7601
Epoch 2/100
250/250 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.7830 - loss: 0.5079
Epoch 3/100
250/250 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.7959 - loss: 0.4651
Epoch 4/100
250/250 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.8052 - loss: 0.4438
Epoch 5/100
250/250 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.8059 - loss: 0.4384
Epoch 6/100
250/250 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.8136 - loss: 0.4187
Epoch 7/100
250/250 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.8213 - loss: 0.4002
Epoch 8/100
250/250 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.8246 - loss: 0.3947
Epoch 9/100
250/250 ...

<keras.src.callbacks.history.History at 0x202c26486e0>
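
The `250/250` in the log is the number of batches per epoch. A quick arithmetic check, assuming the dataset has 10,000 rows (an assumption inferred from the step counts in this notebook, not stated in it):

```python
import math

total_rows = 10000  # assumed dataset size
test_size = 0.2     # from train_test_split
batch_size = 32     # from ann.fit

n_test = round(total_rows * test_size)
n_train = total_rows - n_test

# One step per batch; the last batch may be smaller
steps_per_epoch = math.ceil(n_train / batch_size)
predict_steps = math.ceil(n_test / batch_size)

print(steps_per_epoch, predict_steps)
```

Each epoch therefore performs one weight update per batch, and `epochs=100` repeats that pass over the training set 100 times.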

#### Part 3 - Making the predictions and evaluating the model

Predicting the result of a single observation <br>

<small>

Use our ANN to predict whether the customer with the following information will leave the bank: <br>

Geography: France <br>
Credit Score: 600 <br>
Gender: Male <br>
Age: 40 years old <br>
Tenure: 3 years <br>
Balance: $60000 <br>
Number of products: 2 <br>
Does this customer have a credit card? Yes <br>
Is this customer an Active Member: Yes <br>
Estimated Salary: $50000 <br>

So, should we say goodbye to that customer?

</small>

In [123]:
x = [[600, 'France', 'Male', 40, 3, 60000, 2, 1, 1, 50000]]
x = np.array(x)

# Label Encoding the "Gender" column
x[:, 2] = le.transform(x[:, 2])

# One Hot Encoding the "Geography" Column
# one hot encoding --> dummy variables
x = np.array(ct.transform(x))

# feature scaling
x = sc.transform(x)

# The encoding could also be written out by hand (before feature scaling):
# x = [[1, 0, 0, 600, 1, 40, 3, 60000, 2, 1, 1, 50000]]

In [124]:
print(x)

[[ 0.98560362 -0.5698444  -0.57369368 -0.52111599  0.91601335  0.10961719
  -0.68538967 -0.2569057   0.8095029   0.64259497  0.9687384  -0.87203322]]


In [125]:
# True: the model predicts the customer will leave (say goodbye)
# False: the model predicts the customer will stay

print(result := ann.predict(x))
print(result > 0.5)

1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 158ms/step
[[0.02999749]]
[[False]]


Predicting the Test set results

In [126]:
y_pred = ann.predict(X_test)

# for i in range(len(y_pred)):
#     y_pred[i] = 1 if y_pred[i] > 0.5 else 0

y_pred = (y_pred > 0.5)

print(y_pred)

63/63 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step
[[False]
 [False]
 [False]
 ...
 [False]
 [False]
 [False]]


In [127]:
print(np.concatenate((y_pred.reshape(len(y_pred), 1), y_test.reshape(len(y_test), 1)), axis=1))

[[0 0]
 [0 1]
 [0 0]
 ...
 [0 0]
 [0 0]
 [0 0]]


Making the Confusion Matrix <br>
<small>
<pre>
rows    = actual class    (negative, positive)
columns = predicted class (negative, positive)

cm = [[TN, FP]
      [FN, TP]]

TN = True Negative
FP = False Positive
FN = False Negative
TP = True Positive

Each cell holds a count of test samples, not a percentage.
</pre>
</small>

In [128]:
from sklearn.metrics import confusion_matrix, accuracy_score
cm = confusion_matrix(y_test, y_pred)
print(cm)

# accuracy_score = fraction of correct predictions; higher is better when comparing classifiers
accuracy_score(y_test, y_pred)

[[1510   85]
 [ 190  215]]


0.8625
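
The accuracy can be recomputed directly from the confusion matrix printed above, along with precision and recall, which say more about how well the churners (the positive class) are caught. A short sketch using those four counts:

```python
import numpy as np

# Counts copied from the printed confusion matrix
cm = np.array([[1510, 85],
               [190, 215]])
tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()   # share of all predictions that are correct
precision = tp / (tp + fp)        # share of predicted leavers who actually left
recall = tp / (tp + fn)           # share of actual leavers the model caught

print(accuracy, precision, recall)
```

The accuracy matches the 0.8625 reported by `accuracy_score`. Recall comes out noticeably lower than accuracy, so accuracy alone may overstate how well the model identifies customers who leave.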