# Artificial Neural Network

### Importing the libraries

In [158]:
import numpy as np
import pandas as pd
import tensorflow as tf

In [159]:
tf.__version__

'2.20.0'

## Part 1 - Data Preprocessing

### Importing the dataset

In [160]:
dataset = pd.read_csv('Churn_Modelling.csv')
X = dataset.iloc[:, 3:-1].values # write down here why we are using 3:-1 why these columns
y = dataset.iloc[:, -1].values

"""
So, the first 3 columns: row number, customer id and username are irrelevant to the prediction model as they have no inherit structure
that the model can learn. 

The first colon (:) indicates that keep all rows - we don't need to delete any rows.
The 3:-1 indicates keep all columns except the columns 0, 1, 2.
"""

print ( dataset.columns )

Index(['RowNumber', 'CustomerId', 'Surname', 'CreditScore', 'Geography',
       'Gender', 'Age', 'Tenure', 'Balance', 'NumOfProducts', 'HasCrCard',
       'IsActiveMember', 'EstimatedSalary', 'Exited'],
      dtype='object')


In [161]:
print(X)
print ( X.shape )

print ( "Geography unique:", np.unique ( X[:, 1] ) )


[[619 'France' 'Female' ... 1 1 101348.88]
 [608 'Spain' 'Female' ... 0 1 112542.58]
 [502 'France' 'Female' ... 1 0 113931.57]
 ...
 [709 'France' 'Female' ... 0 1 42085.58]
 [772 'Germany' 'Male' ... 1 0 92888.52]
 [792 'France' 'Female' ... 1 0 38190.78]]
(10000, 10)
Geography unique: ['France' 'Germany' 'Spain']


In [162]:
print(y)

[1 0 1 ... 1 1 0]


### Encoding categorical data

Label Encoding the "Gender" column

In [163]:
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()

X[:, 2] = le.fit_transform(X[:, 2])

In [164]:
print(X)
print ( X.shape ) # Keeping track of shape after encodings!!

[[619 'France' 0 ... 1 1 101348.88]
 [608 'Spain' 0 ... 0 1 112542.58]
 [502 'France' 0 ... 1 0 113931.57]
 ...
 [709 'France' 0 ... 0 1 42085.58]
 [772 'Germany' 1 ... 1 0 92888.52]
 [792 'France' 0 ... 1 0 38190.78]]
(10000, 10)


One Hot Encoding the "Geography" column

In [165]:
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
ct = ColumnTransformer ( transformers = [ ( 'encoder', OneHotEncoder ( drop = 'first' ),  [ 1 ] ) ], remainder = 'passthrough' )

X = ct.fit_transform(X)

In [166]:
print(X)
print ( X.shape )

[[0.0 0.0 619 ... 1 1 101348.88]
 [0.0 1.0 608 ... 0 1 112542.58]
 [0.0 0.0 502 ... 1 0 113931.57]
 ...
 [0.0 0.0 709 ... 0 1 42085.58]
 [1.0 0.0 772 ... 1 0 92888.52]
 [0.0 0.0 792 ... 1 0 38190.78]]
(10000, 11)


### Splitting the dataset into the Training set and Test set

In [167]:
from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split ( X, y, test_size = 0.2, random_state  = 42 )

### Feature Scaling

In [168]:
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()

x_train = sc.fit_transform ( x_train )
x_test = sc.fit_transform ( x_test )

print ( x_train.shape )

(8000, 11)


## Part 2 - Building the ANN

### Initializing the ANN

In [169]:
from tensorflow.keras import Sequential 
from tensorflow.keras.layers import Dense

ann = tf.keras.Sequential ( [ 
      tf.keras.layers.InputLayer ( input_shape = ( 11, ), activation = 'relu' ),
      tf.keras.layers.Dense ( 20, activation = 'relu', kernel_initializer = 'glorot_uniform' ),
      tf.keras.layers.Dense ( 20, activation = 'relu', kernel_initializer = 'glorot_uniform' ),
      tf.keras.layers.Dense ( 1, activation = 'sigmoid' )
] )

print ( ann.summary() )



None


## Part 3 - Training the ANN

### Compiling the ANN

In [170]:
ann.compile ( optimizer = 'Adam', loss = 'binary_crossentropy' )

### Training the ANN on the Training set

In [171]:
ann.fit ( x_train ,y_train, batch_size = 32, epochs = 100, verbose = 2, validation_split = 0.2 )

Epoch 1/100
200/200 - 2s - 9ms/step - loss: 0.4919 - val_loss: 0.4309
Epoch 2/100
200/200 - 0s - 2ms/step - loss: 0.4300 - val_loss: 0.4158
Epoch 3/100
200/200 - 0s - 2ms/step - loss: 0.4148 - val_loss: 0.4033
Epoch 4/100
200/200 - 0s - 2ms/step - loss: 0.3999 - val_loss: 0.3916
Epoch 5/100
200/200 - 0s - 2ms/step - loss: 0.3849 - val_loss: 0.3799
Epoch 6/100
200/200 - 0s - 2ms/step - loss: 0.3711 - val_loss: 0.3713
Epoch 7/100
200/200 - 0s - 2ms/step - loss: 0.3613 - val_loss: 0.3652
Epoch 8/100
200/200 - 0s - 2ms/step - loss: 0.3534 - val_loss: 0.3628
Epoch 9/100
200/200 - 0s - 2ms/step - loss: 0.3481 - val_loss: 0.3612
Epoch 10/100
200/200 - 0s - 2ms/step - loss: 0.3444 - val_loss: 0.3566
Epoch 11/100
200/200 - 0s - 2ms/step - loss: 0.3414 - val_loss: 0.3540
Epoch 12/100
200/200 - 0s - 2ms/step - loss: 0.3388 - val_loss: 0.3560
Epoch 13/100
200/200 - 0s - 2ms/step - loss: 0.3371 - val_loss: 0.3530
Epoch 14/100
200/200 - 0s - 2ms/step - loss: 0.3365 - val_loss: 0.3509
Epoch 15/100
20

<keras.src.callbacks.history.History at 0x75cdf823b9a0>

## Part 4 - Making the predictions and evaluating the model

### Predicting the result of a single observation

**Extra**

Use our ANN model to predict if the customer with the following informations will leave the bank: 

Geography: France

Credit Score: 600

Gender: Male

Age: 40 years old

Tenure: 3 years

Balance: \$ 60000

Number of Products: 2

Does this customer have a credit card ? Yes

Is this customer an Active Member: Yes

Estimated Salary: \$ 50000

So, should we say goodbye to that customer ?

**Solution**

In [172]:
import numpy as np

row_in = np.array ( [[600,'France','Male',40,3,60000,2,1,1,50000]] )

row_in[:, 2] = le.fit_transform ( row_in[:, 2] )
row_in = ct.transform ( row_in )
row_in = sc.transform ( row_in )

print ( "So, should we say goodbye to that customer ? : ", float ( ann.predict ( row_in ) ) <= 0.5 )

[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 118ms/step
So, should we say goodbye to that customer ? :  True


  print ( "So, should we say goodbye to that customer ? : ", float ( ann.predict ( row_in ) ) <= 0.5 )



### Predicting the Test set results

### Making the Confusion Matrix

In [173]:
from sklearn.metrics import confusion_matrix, accuracy_score, classification_report

y_pred = ann.predict ( x_test )
y_pred = ( y_pred > 0.5 )

print ( "Test Accuracy:", accuracy_score ( y_test, y_pred ) )

con_matr = confusion_matrix ( y_test, y_pred )
print ( "Confusion matrix:\n", con_matr )

print ( "Classification Report:\n", classification_report ( y_test, y_pred ) )

[1m63/63[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step
Test Accuracy: 0.852
Confusion matrix:
 [[1525   82]
 [ 214  179]]
Classification Report:
               precision    recall  f1-score   support

           0       0.88      0.95      0.91      1607
           1       0.69      0.46      0.55       393

    accuracy                           0.85      2000
   macro avg       0.78      0.70      0.73      2000
weighted avg       0.84      0.85      0.84      2000



write down about precision recall f1-score, why is it better than just accuracy, what are some other interesting metrics u can find

Precision = TP / ( TP + FP )
TP - True positive, FP - False positive
Precision basically tells us out of all +ves predicted, how many were actually positive. So a high precision means less false positives.

Recall = TP / ( TP + FN )
FN - False negative
Recall tells us, out of all actual positive test cases how many the model correctly identified. High recall => Less false negatives.

F1 score = harmonic mean ( Precision, Recall ). Its like a balance for pr:ecision and recall. A model cannot cheat by having just high recall/precision.

The problem with accuracy is that if the test cases are highly imbalanced towards one class, say 80% is zero class. Then a model that predicts everything as zero will have 80% accurac, which is not a good metric.

There is also a metric called specificity - how well a model can identify negatives.
specificity = TN / ( TN + FP )
TN - true negative