# Artificial Neural Networks in Python using TensorFlow

In this program we build an Artificial Neural Network (ANN) model. The aim is to build a classification model that predicts whether a given customer will leave the bank within six months.

**Dataset Description**

For this problem we use a dataset of 10,000 instances (rows) and 14 columns. Since the objective is to predict the probability that a client will leave the bank, the last column, Exited, is the response. The other 13 columns are the candidate features (independent variables) used to build and train the model: RowNumber, CustomerId, Surname, CreditScore, Geography, Gender, Age, Tenure, Balance, NumOfProducts, HasCrCard, IsActiveMember and EstimatedSalary.

# Importing Libraries

In [1]:
import numpy as np
import pandas as pd
import tensorflow as tf # We utilize it to construct our ANN
tf.__version__

'2.3.1'

# Importing Dataset
To train this model we use a dataset of client information (the feature matrix): row number, customer ID, surname, credit score, geography, gender, age, tenure, balance, number of bank products, whether the client has a credit card, whether the client is an active member, and estimated salary. The output is the dependent variable Exited, where 0 means the client stayed and 1 means the client left. We do not use the first three columns (row number, customer ID, surname) because they only identify the client and carry no information relevant to the model.

In [2]:
dataset = pd.read_csv('Churn_Modelling.csv')
X = dataset.iloc[:, 3 : -1].values # We take just the relevant features
y = dataset.iloc[:, -1].values
print(X)
print(y)

[[619 'France' 'Female' ... 1 1 101348.88]
 [608 'Spain' 'Female' ... 0 1 112542.58]
 [502 'France' 'Female' ... 1 0 113931.57]
 ...
 [709 'France' 'Female' ... 0 1 42085.58]
 [772 'Germany' 'Male' ... 1 0 92888.52]
 [792 'France' 'Female' ... 1 0 38190.78]]
[1 0 1 ... 1 1 0]
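
Because `iloc` slices by position, `3 : -1` keeps every column from index 3 up to, but not including, the last one. The sketch below applies the same positional slices to a plain Python list of the column names from the dataset description, assuming the CSV stores the columns in that order (the printed rows above are consistent with it). It does not read the CSV; it only shows which features end up in `X` and at which index.

```python
# Column names in the order given in the dataset description (assumed CSV order)
columns = ["RowNumber", "CustomerId", "Surname", "CreditScore", "Geography",
           "Gender", "Age", "Tenure", "Balance", "NumOfProducts",
           "HasCrCard", "IsActiveMember", "EstimatedSalary", "Exited"]

# Same positional slices as dataset.iloc[:, 3 : -1] and dataset.iloc[:, -1]
feature_names = columns[3:-1]
target_name = columns[-1]

print(len(feature_names), feature_names)
print(target_name)
# Positions inside X used by the encoding steps that follow
print(feature_names.index("Geography"), feature_names.index("Gender"))  # 1 2
```

Geography ends up at index 1 and Gender at index 2, which are the column indices used by the encoders in the next cells.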


# Encoding Categorical Data


In [3]:
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
X[:, 2] = le.fit_transform(X[:, 2]) # Gender is encoded in alphabetical order: 0 (Female) and 1 (Male)
print(X)

[[619 'France' 0 ... 1 1 101348.88]
 [608 'Spain' 0 ... 0 1 112542.58]
 [502 'France' 0 ... 1 0 113931.57]
 ...
 [709 'France' 0 ... 0 1 42085.58]
 [772 'Germany' 1 ... 1 0 92888.52]
 [792 'France' 0 ... 1 0 38190.78]]
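
LabelEncoder numbers the distinct labels in sorted (alphabetical) order, which is why 'Female' becomes 0 and 'Male' becomes 1 in the output above. A minimal pure-Python sketch of that rule (the function `label_encode` is hypothetical, not part of scikit-learn):

```python
def label_encode(values):
    """Mimic LabelEncoder: sort the distinct labels and number them from 0."""
    classes = sorted(set(values))
    mapping = {label: index for index, label in enumerate(classes)}
    return [mapping[v] for v in values], classes

encoded, classes = label_encode(["Female", "Female", "Male", "Female"])
print(classes)  # ['Female', 'Male']
print(encoded)  # [0, 0, 1, 0]
```

Label encoding is acceptable for Gender because it has only two values, so the resulting 0/1 column does not impose an artificial ordering between several categories.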


# One Hot Encoding

In [4]:
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
ct = ColumnTransformer(transformers=[('encoder', OneHotEncoder(), [1])], remainder='passthrough') # One-hot encode Geography (column 1) into dummy variables; the other columns pass through unchanged
X = np.array(ct.fit_transform(X))
print(X)

[[1.0 0.0 0.0 ... 1 1 101348.88]
 [0.0 0.0 1.0 ... 0 1 112542.58]
 [1.0 0.0 0.0 ... 1 0 113931.57]
 ...
 [1.0 0.0 0.0 ... 0 1 42085.58]
 [0.0 1.0 0.0 ... 1 0 92888.52]
 [1.0 0.0 0.0 ... 1 0 38190.78]]
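
OneHotEncoder replaces Geography with one 0/1 column per country, with the categories in sorted order, and ColumnTransformer places the encoded columns before the passthrough columns. That is why the first three columns above correspond to France, Germany and Spain. One-hot encoding is used here instead of label encoding because numbering the countries 0, 1, 2 would impose an order that does not exist. A pure-Python sketch of the encoding rule (the function `one_hot` is hypothetical, not the scikit-learn implementation):

```python
def one_hot(values):
    """Sketch of one-hot encoding: one 0/1 column per distinct category, in sorted order."""
    categories = sorted(set(values))
    rows = [[1.0 if v == c else 0.0 for c in categories] for v in values]
    return rows, categories

rows, categories = one_hot(["France", "Spain", "France", "Germany"])
print(categories)  # ['France', 'Germany', 'Spain']
for row in rows:
    print(row)     # France -> [1.0, 0.0, 0.0], Spain -> [0.0, 0.0, 1.0], Germany -> [0.0, 1.0, 0.0]
```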


# Splitting the Dataset into train and test set

In [5]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)

# Feature Scaling
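Feature scaling is essential for almost every ANN model: without it, features with large ranges such as Balance and EstimatedSalary would dominate the weighted sums and slow down training. The scaler is fitted on the training set only and then reused to transform the test set, so no information from the test set leaks into training.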

In [6]:
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
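
StandardScaler standardizes each column as z = (x - mean) / std. `fit_transform` computes the mean and standard deviation on the training set and applies them; `transform` reuses those same statistics on the test set. A NumPy sketch of that arithmetic on small, randomly generated data (the arrays are hypothetical stand-ins for `X_train` and `X_test`):

```python
import numpy as np

rng = np.random.default_rng(0)
X_train_demo = rng.normal(loc=650, scale=100, size=(8, 2))  # hypothetical training data
X_test_demo = rng.normal(loc=650, scale=100, size=(2, 2))   # hypothetical test data

# "fit": the statistics come from the training set only
mean = X_train_demo.mean(axis=0)
std = X_train_demo.std(axis=0)  # population std (ddof=0), as StandardScaler uses

# "transform": the same mean and std are reused for the test set
X_train_scaled = (X_train_demo - mean) / std
X_test_scaled = (X_test_demo - mean) / std

print(X_train_scaled.mean(axis=0))  # approximately 0 for each column
print(X_train_scaled.std(axis=0))   # approximately 1 for each column
```

The test set is not guaranteed to have mean 0 and standard deviation 1 after scaling; that is expected, since it is treated as unseen data.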

# Building the ANN
To build an ANN we treat each input variable as an input neuron. Loosely imitating a brain, these neurons are connected to other neurons grouped in hidden layers. Every connection has a weight: each hidden neuron computes a weighted sum of its inputs (plus a bias) and passes the result through an activation function. The first hidden layer feeds the second hidden layer, and the second hidden layer feeds the output layer, which produces the prediction. During training, the weights are adjusted over many iterations to minimize a loss function that measures the discrepancy between the real results and the predicted results. Stochastic gradient descent is a standard way to search for a minimum of the loss function, and cross-entropy is a good choice of loss for a classification problem like this one.
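
The computation performed by a single neuron fits in a few lines. A minimal NumPy sketch, with hypothetical inputs, weights and bias (in the real model the weights are learned during training, not set by hand):

```python
import numpy as np

def relu(z):
    """Rectifier activation: keep positive values, replace negative ones with 0."""
    return np.maximum(0.0, z)

def neuron(x, w, b, activation):
    """One artificial neuron: weighted sum of the inputs plus a bias, then an activation."""
    z = np.dot(w, x) + b
    return activation(z)

x = np.array([0.5, -1.0, 2.0])  # hypothetical (already scaled) inputs
w = np.array([0.4, -0.3, 0.2])  # hypothetical weights
b = 0.1                         # hypothetical bias

print(neuron(x, w, b, relu))    # about 1.0 (0.2 + 0.3 + 0.4 + 0.1)
print(neuron(x, -w, b, relu))   # 0.0, because the weighted sum is negative
```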

## Initializing the ANN
The first step is to create an object that holds a sequence of layers: the input layer (the features we feed to the ANN), the hidden layers and the output layer. To do this we use TensorFlow (version 2.0 or higher), which includes the Keras API as `tf.keras`.

In [7]:
ann = tf.keras.models.Sequential()

## Input Layer, first hidden layer
In this part we create the input layer, which means all independent variables (each treated as an input neuron), and then a first hidden layer with a chosen number of neurons. Each neuron in the hidden layer needs an activation function; here we choose the rectifier function (ReLU). The weight of each input is not chosen by hand: Keras initializes it and then learns it during training. The number of neurons must be chosen by experimentation, since there is no general rule for it. These layers are fully connected (Dense), meaning every input is connected to every neuron of the layer.

In [8]:
ann.add(tf.keras.layers.Dense(units=6,  activation='relu'))
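
Since no `input_shape` is given, Keras infers the number of inputs (12 after encoding: three Geography dummies plus nine other features) when the model first sees data. A fully connected layer with ReLU amounts to a matrix product plus a bias, followed by an element-wise max with 0. The NumPy sketch below reproduces that computation for a hypothetical batch of 4 customers with random weights; it illustrates the arithmetic, not Keras internals such as weight initialization.

```python
import numpy as np

rng = np.random.default_rng(42)
batch = rng.normal(size=(4, 12))          # hypothetical batch: 4 customers x 12 encoded features
W = rng.normal(scale=0.1, size=(12, 6))   # hypothetical weights: 12 inputs -> 6 units
bias = np.zeros(6)                        # one bias per unit

# Dense(units=6, activation='relu'): every input feeds every unit (fully connected)
hidden = np.maximum(0.0, batch @ W + bias)
print(hidden.shape)  # (4, 12) @ (12, 6) -> (4, 6)
```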

## Second hidden layer
Here we add the second hidden layer. We can add as many hidden layers as we want, but for this problem we use just two. The second hidden layer is identical to the first (6 neurons, ReLU activation), although both its size and its activation can be changed to suit the problem.

In [9]:
ann.add(tf.keras.layers.Dense(units=6,  activation='relu'))

## Output layer
The final step is to create the output layer. We use the same Dense class, with two changes to its parameters. Because this problem has a binary (yes or no) response, the output layer has a single neuron; if the response had more than two classes (0, 1 and 2, for example), we would need one output neuron per class, usually with a softmax activation. The second change is the activation function: here we use the sigmoid activation, because it outputs a value between 0 and 1 that can be read as the probability that the customer leaves the bank. With this, our artificial brain is built; now we must train it.
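
A NumPy sketch of the two output activations mentioned above: sigmoid for a single binary output and softmax for one output per class. The function names and input values are illustrative; Keras applies the equivalent activations internally.

```python
import numpy as np

def sigmoid(z):
    """Squash any real number into (0, 1), read here as P(customer leaves)."""
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    """Multiclass output: one probability per class, summing to 1."""
    e = np.exp(z - np.max(z))  # subtract the max for numerical stability
    return e / e.sum()

print(sigmoid(0.0))                          # 0.5, the decision boundary
print(sigmoid(np.array([-3.0, 3.0])) > 0.5)  # [False  True]
print(softmax(np.array([2.0, 1.0, 0.1])))    # three class probabilities
```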

In [10]:
ann.add(tf.keras.layers.Dense(units=1,  activation='sigmoid'))
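
With 12 inputs, each Dense layer holds one weight per input-unit pair plus one bias per unit. The small calculation below counts the trainable parameters of this network; once the model has been built (for example after training), `ann.summary()` should report the same total.

```python
# Layer sizes of this network: 12 inputs -> 6 -> 6 -> 1
layer_sizes = [12, 6, 6, 1]

total = 0
for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
    params = n_in * n_out + n_out  # weight matrix plus one bias per unit
    print(f"Dense({n_out}): {n_in} x {n_out} weights + {n_out} biases = {params}")
    total += params

print("Total trainable parameters:", total)  # 78 + 42 + 7 = 127
```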

# Training the ANN

## Compiling the ANN
Compiling the ANN is one of the most important steps. First we select the optimizer: Adam, an extension of stochastic gradient descent that adapts the learning rate for each weight. The loss function is also very important, because it is the quantity the optimizer minimizes when updating the weights. Since the response is binary, we choose binary cross-entropy. Finally, we choose accuracy as the metric used to monitor the model's results; unlike the loss, the metric is only reported and is not optimized directly.
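
Binary cross-entropy averages -[y log(p) + (1 - y) log(1 - p)] over the samples, where y is the true label and p the predicted probability. The NumPy sketch below, with hypothetical labels and probabilities, shows why the loss rather than the accuracy is what gets optimized: both sets of predictions have 100% accuracy, but the more confident one has a lower loss. The clipping constant is an assumption added to avoid log(0).

```python
import numpy as np

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over the samples."""
    p = np.clip(y_prob, eps, 1 - eps)  # avoid log(0)
    return float(np.mean(-(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))))

def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of samples whose thresholded prediction matches the label."""
    return float(np.mean((y_prob > threshold) == y_true))

y_true = np.array([1, 0, 1, 0])
confident = np.array([0.9, 0.1, 0.8, 0.2])  # hypothetical predictions
hesitant = np.array([0.6, 0.4, 0.55, 0.45]) # hypothetical predictions

print(binary_crossentropy(y_true, confident), accuracy(y_true, confident))  # about 0.16, 1.0
print(binary_crossentropy(y_true, hesitant), accuracy(y_true, hesitant))    # about 0.55, 1.0
```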

In [11]:
ann.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

## Training the ANN
Now we train the ANN. Here there are two important hyperparameters. The batch size is the number of training samples whose predictions are compared with the real results before the weights are updated once. The number of epochs is the number of complete passes over the training set; within each epoch the weights are updated batch after batch to reduce the loss and improve accuracy.
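
With `test_size = 0.2`, the training set has 8,000 rows, so `batch_size=32` gives 8000 / 32 = 250 weight updates per epoch, and 100 epochs give 25,000 updates. The loop below sketches what batch size and epochs mean on hypothetical toy data, training a single sigmoid neuron with binary cross-entropy and reshuffling the samples every epoch (as `fit` does by default with `shuffle=True`). It uses plain mini-batch gradient descent rather than Adam, so it illustrates the training loop, not Keras's exact optimizer.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 200 samples, 2 features, label 1 when x0 + x1 > 0
X_toy = rng.normal(size=(200, 2))
y_toy = (X_toy[:, 0] + X_toy[:, 1] > 0).astype(float)

def bce_loss(X, y, w, b):
    """Binary cross-entropy of a single sigmoid neuron on the whole dataset."""
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    p = np.clip(p, 1e-7, 1 - 1e-7)
    return float(np.mean(-(y * np.log(p) + (1 - y) * np.log(1 - p))))

w = np.zeros(2)
b = 0.0
batch_size, epochs, lr = 32, 20, 0.5
updates = 0
history = []

for epoch in range(epochs):
    order = rng.permutation(len(X_toy))        # reshuffle the samples each epoch
    for start in range(0, len(X_toy), batch_size):
        idx = order[start:start + batch_size]  # one mini-batch (the last one is smaller)
        p = 1.0 / (1.0 + np.exp(-(X_toy[idx] @ w + b)))
        grad = p - y_toy[idx]                  # dLoss/dz for sigmoid + binary cross-entropy
        w -= lr * X_toy[idx].T @ grad / len(idx)
        b -= lr * grad.mean()
        updates += 1                           # one weight update per batch
    history.append(bce_loss(X_toy, y_toy, w, b))

print("updates per epoch:", updates // epochs)  # ceil(200 / 32) = 7
print("loss after first and last epoch:", history[0], history[-1])
```

A smaller batch size means more (noisier) updates per epoch; more epochs mean more passes over the same data, which can eventually lead to overfitting.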

In [12]:
ann.fit(X_train, y_train, batch_size=32, epochs=100)

Epoch 1/100
Epoch 2/100
...
Epoch 77/100
Epoch 78

<tensorflow.python.keras.callbacks.History at 0x7f1f5801bfd0>

# Making a single prediction
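Here we make a single prediction for one customer: credit score of 600, located in France, male, 40 years old, tenure of 3 years, balance of 60000, 2 bank products, has a credit card, is an active member, and has an estimated salary of 50000. Remember that Geography and Gender were encoded, so the input must use the same encoding: the three Geography dummy columns come first (France = 1, 0, 0) and Male is 1. The row must also be scaled with the same `sc` object, and `predict` expects a 2D array, hence the double brackets. An output of False means the model predicts that the customer will stay.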

In [13]:
print(ann.predict(sc.transform([[1, 0, 0, 600, 1, 40, 3, 60000, 2, 1, 1, 50000]])) > 0.5)

[[False]]
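
Building the encoded row by hand is error-prone. The helper below is a hypothetical sketch that produces the same column order as `X`, hardcoding the category orders shown by the encoders' outputs above (Geography sorted as France, Germany, Spain; Gender as Female = 0, Male = 1). In a real pipeline it would be safer to reuse the fitted `le` and `ct` objects instead of hardcoding these orders.

```python
GEOGRAPHIES = ["France", "Germany", "Spain"]  # OneHotEncoder's sorted category order
GENDERS = ["Female", "Male"]                  # LabelEncoder's sorted class order

def encode_customer(credit_score, geography, gender, age, tenure, balance,
                    num_products, has_cr_card, is_active, salary):
    """Hypothetical helper: build one row in the same column order as X."""
    dummies = [1 if geography == g else 0 for g in GEOGRAPHIES]
    return dummies + [credit_score, GENDERS.index(gender), age, tenure, balance,
                      num_products, has_cr_card, is_active, salary]

row = encode_customer(600, "France", "Male", 40, 3, 60000, 2, 1, 1, 50000)
print(row)  # [1, 0, 0, 600, 1, 40, 3, 60000, 2, 1, 1, 50000]
```

The resulting row can then be passed as `ann.predict(sc.transform([row])) > 0.5`, exactly like the literal list used in the cell above.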


# Predicting the test results

In [14]:
y_pred = ann.predict(X_test)
y_pred = (y_pred > 0.5) # predict returns probabilities; thresholding at 0.5 turns them into a binary (leave / stay) response
print(np.concatenate((y_pred.reshape(len(y_pred),1), y_test.reshape(len(y_test),1)),1))

[[0 0]
 [0 1]
 [0 0]
 ...
 [0 0]
 [0 0]
 [0 0]]


# Making the confusion matrix and accuracy score

In [15]:
from sklearn.metrics import confusion_matrix, accuracy_score
cm = confusion_matrix(y_test, y_pred)
print(cm)
accuracy_score(y_test, y_pred)

[[1522   73]
 [ 203  202]]


0.862
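
In scikit-learn's `confusion_matrix`, entry [i, j] counts the samples whose true label is i and whose predicted label is j, so for labels 0/1 the layout is [[TN, FP], [FN, TP]]. The sketch below recomputes the accuracy from the matrix printed above and adds precision and recall for the "leaves" class. The recall shows that the model identifies only about half of the customers who actually left, something the 86.2% accuracy alone hides, because most test customers (1,595 of 2,000) stayed.

```python
# Confusion matrix printed above: rows = actual class, columns = predicted class
tn, fp = 1522, 73
fn, tp = 203, 202

n_test = tn + fp + fn + tp
acc = (tn + tp) / n_test
precision = tp / (tp + fp)  # of the customers flagged as leaving, how many left
recall = tp / (tp + fn)     # of the customers who left, how many were flagged

print(n_test)               # 2000
print(acc)                  # 0.862
print(round(precision, 3))  # 0.735
print(round(recall, 3))     # 0.499
```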

In [17]:
score_train = ann.evaluate(X_train, y_train)
score_test = ann.evaluate(X_test, y_test)



In [18]:
print(score_train)

[0.3296877145767212, 0.8669999837875366]


In [19]:
print(score_test)

[0.3322369456291199, 0.8619999885559082]


# Conclusion
In this program we showed a simple example of an ANN. Two important design choices are the number of hidden layers and the number of neurons in each hidden layer. The results are satisfactory: accuracy is about 86.7% on the training set and 86.2% on the test set, so the model does not appear to overfit. This program can be a helpful tool for a bank when deciding on client policies.