# Part 1 ANN Binary decision
### NOTE: Part 1 relies mainly on pandas DataFrames, which are convenient for exploratory analysis but fit less naturally into scikit-learn preprocessing pipelines. For an approach closer to a real-world application, please refer to Part 2.

## 1) Importing the packages and data pre-processing

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, OneHotEncoder, StandardScaler

In [2]:
df = pd.read_csv('Churn_Modelling.csv')

See the data

In [3]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 14 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   RowNumber        10000 non-null  int64  
 1   CustomerId       10000 non-null  int64  
 2   Surname          10000 non-null  object 
 3   CreditScore      10000 non-null  int64  
 4   Geography        10000 non-null  object 
 5   Gender           10000 non-null  object 
 6   Age              10000 non-null  int64  
 7   Tenure           10000 non-null  int64  
 8   Balance          10000 non-null  float64
 9   NumOfProducts    10000 non-null  int64  
 10  HasCrCard        10000 non-null  int64  
 11  IsActiveMember   10000 non-null  int64  
 12  EstimatedSalary  10000 non-null  float64
 13  Exited           10000 non-null  int64  
dtypes: float64(2), int64(9), object(3)
memory usage: 1.1+ MB


In [4]:
df.head()

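| | RowNumber | CustomerId | Surname | CreditScore | Geography | Gender | Age | Tenure | Balance | NumOfProducts | HasCrCard | IsActiveMember | EstimatedSalary | Exited |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 15634602 | Hargrave | 619 | France | Female | 42 | 2 | 0.0 | 1 | 1 | 1 | 101348.88 | 1 |
| 1 | 2 | 15647311 | Hill | 608 | Spain | Female | 41 | 1 | 83807.86 | 1 | 0 | 1 | 112542.58 | 0 |
| 2 | 3 | 15619304 | Onio | 502 | France | Female | 42 | 8 | 159660.8 | 3 | 1 | 0 | 113931.57 | 1 |
| 3 | 4 | 15701354 | Boni | 699 | France | Female | 39 | 1 | 0.0 | 2 | 0 | 0 | 93826.63 | 0 |
| 4 | 5 | 15737888 | Mitchell | 850 | Spain | Female | 43 | 2 | 125510.82 | 1 | 1 | 1 | 79084.1 | 0 |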


## 2) Adjust format and keep important information

In [5]:
# RowNumber, CustomerId and Surname only identify the customer, so we drop them.
df.drop(columns=['RowNumber', 'CustomerId', 'Surname'], inplace=True)

# Gender has only two values, so a simple 0/1 mapping is enough.
genders = {'Male': 1, 'Female': 0}
df['Gender'] = df['Gender'].map(genders)

In [6]:
# Since there are only 3 possible countries (Spain, France and Germany), we can do one of 2 things: use a dictionary of numeric codes, which would imply
# that Spain is closer to France than to Germany, or one-hot encode the column (here with pd.get_dummies) and avoid that assumption. I will go with the 2nd option.
geo_dummies = pd.get_dummies(df['Geography'], prefix='Geography')

df = pd.concat([df, geo_dummies], axis = 1)

df.drop('Geography', axis = 1, inplace = True)

In [7]:
df.head()

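| | CreditScore | Gender | Age | Tenure | Balance | NumOfProducts | HasCrCard | IsActiveMember | EstimatedSalary | Exited | Geography_France | Geography_Germany | Geography_Spain |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 619 | 0 | 42 | 2 | 0.0 | 1 | 1 | 1 | 101348.88 | 1 | 1 | 0 | 0 |
| 1 | 608 | 0 | 41 | 1 | 83807.86 | 1 | 0 | 1 | 112542.58 | 0 | 0 | 0 | 1 |
| 2 | 502 | 0 | 42 | 8 | 159660.8 | 3 | 1 | 0 | 113931.57 | 1 | 1 | 0 | 0 |
| 3 | 699 | 0 | 39 | 1 | 0.0 | 2 | 0 | 0 | 93826.63 | 0 | 1 | 0 | 0 |
| 4 | 850 | 0 | 43 | 2 | 125510.82 | 1 | 1 | 1 | 79084.1 | 0 | 0 | 0 | 1 |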
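
To make the argument for one-hot encoding concrete, here is a minimal sketch comparing the two options. The ordinal mapping is hypothetical and chosen only for illustration; it is not used anywhere in this notebook.

```python
import numpy as np

countries = ["France", "Germany", "Spain"]

# Option 1: ordinal codes (hypothetical mapping, for illustration only).
ordinal = {"France": 0.0, "Germany": 1.0, "Spain": 2.0}

# Option 2: one-hot vectors, one column per country.
one_hot = {name: np.eye(len(countries))[i] for i, name in enumerate(countries)}

def distance(encoding, a, b):
    """Euclidean distance between two countries under a given encoding."""
    diff = np.atleast_1d(encoding[a]) - np.atleast_1d(encoding[b])
    return float(np.linalg.norm(diff))

pairs = [("France", "Germany"), ("France", "Spain"), ("Germany", "Spain")]
ordinal_distances = {pair: distance(ordinal, *pair) for pair in pairs}
one_hot_distances = {pair: distance(one_hot, *pair) for pair in pairs}

print(ordinal_distances)  # France-Spain is twice as far as France-Germany
print(one_hot_distances)  # every pair is equally far apart
```

With ordinal codes the model sees an ordering and unequal distances that do not exist in the data. With one-hot vectors all countries are equidistant.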


## 3) Fit the data and variables that the model is going to use

In [8]:
target = df['Exited']
features = df.drop('Exited', axis = 1)

X_train, X_test, y_train, y_test = train_test_split(features, target, train_size=0.8, random_state=42)

scaler = StandardScaler() # The NN trains better when all inputs are on a similar scale, so we standardize the features.

X_train_scaled = scaler.fit_transform(X_train) # Fit the scaler on the training set and transform it
X_test_scaled = scaler.transform(X_test) # Apply the same, already fitted, transformation to the test set

# IMPORTANT: to avoid data leakage, fit the scaler on the training set only and then use it to transform the test set. Something like
# features_scaled = scaler.fit_transform(features) before splitting is wrong, because the test rows would influence the scaling statistics.
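
The difference between the two approaches can be sketched with plain NumPy. The data below is synthetic (a random stand-in for one numeric column, not the churn data), and the manual formula assumes the usual standardization `(x - mean) / std` with the population standard deviation.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for one numeric feature (not the churn data).
feature = rng.normal(loc=50.0, scale=10.0, size=1000)
train, test = feature[:800], feature[800:]

# Correct: the statistics come from the training rows only.
train_mean, train_std = train.mean(), train.std()
train_scaled = (train - train_mean) / train_std
test_scaled = (test - train_mean) / train_std

# Leaky: the statistics are computed on all rows, so the test rows
# influence how the training rows are scaled.
leaky_mean, leaky_std = feature.mean(), feature.std()

print(f"train-only mean: {train_mean:.3f}, all-rows mean: {leaky_mean:.3f}")
print(f"scaled train mean: {train_scaled.mean():.3f}")
print(f"scaled test mean:  {test_scaled.mean():.3f}")
```

The scaled test set is not expected to have mean exactly 0 and standard deviation exactly 1. That is fine: it was scaled with statistics it did not contribute to, just as unseen data would be.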

## 4) Build the Neural Network

In [9]:
import tensorflow as tf

nn = tf.keras.models.Sequential() # Create the neural network, in this case with the Sequential model, which
# lets you build the network layer by layer. It is useful for a linear architecture, that is, when each layer has exactly one input and one output.

nn.add(tf.keras.layers.Dense(units= 12, activation='relu'))  # Add the first hidden layer. The number of neurons is a design choice. Personally,
# I like to start with as many neurons as there are columns in "features" (12 here).

nn.add(tf.keras.layers.Dense(units= 12, activation='relu')) # Add the second hidden layer. We use the Rectified Linear Unit (ReLU) activation.
# ReLU is much less prone to vanishing gradients than sigmoid or tanh, which makes deep networks easier to train.

nn.add(tf.keras.layers.Dense(units= 1, activation='sigmoid')) # Add the output layer. Since we want to predict a single binary variable, one neuron is enough.
# To predict something different, such as several classes, we would need more neurons. The activation function also changes here: with a sigmoid,
# the output is the probability of the variable taking the value 1.
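
As a rough illustration of what this 12-12-1 architecture computes, here is a NumPy-only sketch of the forward pass. The weights are random placeholders, not the weights Keras learns, and the helper names (`relu`, `sigmoid`, `forward`) are made up for this example.

```python
import numpy as np

def relu(z):
    # Rectified Linear Unit: negative values become 0.
    return np.maximum(0.0, z)

def sigmoid(z):
    # Squashes any real number into the range (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n_features, n_hidden = 12, 12

# Small random weights stand in for trained ones (NOT the notebook's weights).
W1, b1 = rng.normal(scale=0.1, size=(n_features, n_hidden)), np.zeros(n_hidden)
W2, b2 = rng.normal(scale=0.1, size=(n_hidden, n_hidden)), np.zeros(n_hidden)
W3, b3 = rng.normal(scale=0.1, size=(n_hidden, 1)), np.zeros(1)

def forward(x):
    h1 = relu(x @ W1 + b1)         # first hidden layer
    h2 = relu(h1 @ W2 + b2)        # second hidden layer
    return sigmoid(h2 @ W3 + b3)   # output layer: one probability per row

batch = rng.normal(size=(5, n_features))  # 5 fake, already scaled customers
probabilities = forward(batch)
print(probabilities.shape)  # one column, one row per customer
```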

## 5) Train the Neural Network

First we need to compile the NN. This means defining "HOW" it is going to learn. To do that we need to define AT LEAST 3 parameters.

a) Optimizer: The algorithm used to update the weights during training. Some examples are SGD, RMSprop and Adam.

b) Loss Function: The quantity the optimizer minimizes during training. It measures how far the predictions are from the true labels.

c) Metrics: A list of metrics that are evaluated and reported during training. They are not used to update the weights.
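
To make the role of the loss function more tangible, here is a small sketch of binary cross-entropy, the loss used in the next cell, written with NumPy. The labels and probabilities are invented for the example, and the clipping constant `eps` is an assumption to keep the logarithm finite.

```python
import numpy as np

def binary_crossentropy(y_true, p_pred, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over all samples."""
    p = np.clip(p_pred, eps, 1.0 - eps)  # avoid log(0)
    losses = -(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))
    return float(np.mean(losses))

y_true = np.array([1, 0, 1, 0])
mostly_right = np.array([0.9, 0.1, 0.8, 0.2])  # made-up probabilities
mostly_wrong = np.array([0.1, 0.9, 0.2, 0.8])  # made-up probabilities

loss_right = binary_crossentropy(y_true, mostly_right)
loss_wrong = binary_crossentropy(y_true, mostly_wrong)
print(round(loss_right, 4), round(loss_wrong, 4))
```

The loss rewards confident correct probabilities and heavily penalizes confident wrong ones, which a thresholded metric such as accuracy cannot express.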

In [10]:
nn.compile(optimizer= 'adam', loss='binary_crossentropy', metrics=['accuracy']) # 1) Optimizer: Adam (Adaptive Moment Estimation).
                                               #2) Loss Function: binary_crossentropy, the standard choice for a binary problem with a sigmoid output
                                               #3) Metrics: for this example we only report accuracy, but we could add more
nn.fit(X_train_scaled, y_train, batch_size = 20, epochs = 200)  # The default is batch_size = 32, but since the data set is fairly small I use 20


Epoch 1/200
400/400 ━━━━━━━━━━━━━━━━━━━━ 3s 1ms/step - accuracy: 0.7116 - loss: 0.5841
Epoch 2/200
400/400 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8122 - loss: 0.4348
Epoch 3/200
400/400 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.8168 - loss: 0.4214
Epoch 4/200
400/400 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.8357 - loss: 0.3969
Epoch 5/200
400/400 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.8506 - loss: 0.3633
Epoch 6/200
400/400 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.8511 - loss: 0.3644
Epoch 7/200
400/400 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.8589 - loss: 0.3483
Epoch 8/200
400/400 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.8560 - loss: 0.3509
Epoch 9/200
...

<keras.src.callbacks.history.History at 0x204c9bd85b0>
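
The `400/400` shown for each epoch in the log is the number of batches per epoch. A quick sanity check of that number, using only values that appear in this notebook, plus one assumption: that `predict` falls back to a batch size of 32 when none is given.

```python
import math

n_rows = 10_000       # rows reported by df.info()
train_fraction = 0.8  # train_size passed to train_test_split
batch_size = 20       # batch_size passed to nn.fit

n_train = int(n_rows * train_fraction)
n_test = n_rows - n_train

# Each epoch visits every training row once, one batch per step.
steps_per_epoch = math.ceil(n_train / batch_size)

# Assumption: predict() uses a batch size of 32 when none is specified.
predict_steps = math.ceil(n_test / 32)

print(n_train, steps_per_epoch)  # 8000 400
print(n_test, predict_steps)     # 2000 63
```

So 200 epochs correspond to 200 x 400 = 80,000 weight updates in total.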

## 6) Test the NN

In [11]:
from sklearn.metrics import confusion_matrix, accuracy_score, recall_score, f1_score, precision_score

y_pred = nn.predict(X_test_scaled)
y_pred = y_pred > 0.5 # If the predicted probability is above 50%, we predict the customer will leave the bank. Otherwise we predict they will stay.
cm = confusion_matrix(y_test, y_pred)
print(cm)
print(f'Accuracy: {round(accuracy_score(y_test, y_pred), 2)}')
print(f'recall: {round(recall_score(y_test, y_pred), 2)}')
print(f'f1_score: {round(f1_score(y_test, y_pred), 2)}')
print(f'precision_score: {round(precision_score(y_test, y_pred), 2)}')

63/63 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step
[[1538   69]
 [ 205  188]]
Accuracy: 0.86
recall: 0.48
f1_score: 0.58
precision_score: 0.73


Even though the accuracy is 0.86, the other metrics are considerably worse: a recall of 0.48 means the model misses more than half of the customers who actually leave. This is probably due to class imbalance. There are far more class 0 examples than class 1; in the test set alone, the confusion matrix shows 1607 versus 393.
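
The reported metrics can be reproduced by hand from the confusion matrix printed above. This is only a sketch of the standard definitions. It relies on the scikit-learn convention that rows are true classes and columns are predicted classes.

```python
import numpy as np

# Confusion matrix printed above: rows = true class, columns = predicted class.
cm = np.array([[1538, 69],
               [205, 188]])
tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()   # share of all predictions that are right
precision = tp / (tp + fp)        # of the predicted leavers, how many left
recall = tp / (tp + fn)           # of the real leavers, how many we caught
f1 = 2 * precision * recall / (precision + recall)

print(round(accuracy, 2), round(recall, 2), round(f1, 2), round(precision, 2))
# 0.86 0.48 0.58 0.73
```

Accuracy is dominated by the 1538 correctly predicted class 0 rows, while recall only looks at the 393 customers who actually left.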

# Part 2 ANN Binary decision (with arrays)

## 1) Importing the packages and data pre-processing

In [11]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, OneHotEncoder, StandardScaler
from sklearn.compose import ColumnTransformer
import tensorflow as tf

df = pd.read_csv('Churn_Modelling.csv')

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 14 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   RowNumber        10000 non-null  int64  
 1   CustomerId       10000 non-null  int64  
 2   Surname          10000 non-null  object 
 3   CreditScore      10000 non-null  int64  
 4   Geography        10000 non-null  object 
 5   Gender           10000 non-null  object 
 6   Age              10000 non-null  int64  
 7   Tenure           10000 non-null  int64  
 8   Balance          10000 non-null  float64
 9   NumOfProducts    10000 non-null  int64  
 10  HasCrCard        10000 non-null  int64  
 11  IsActiveMember   10000 non-null  int64  
 12  EstimatedSalary  10000 non-null  float64
 13  Exited           10000 non-null  int64  
dtypes: float64(2), int64(9), object(3)
memory usage: 1.1+ MB


So there are no null values. RowNumber, CustomerId and Surname carry no useful information for the prediction, and Exited is the target. That means the features are all the remaining columns, from CreditScore to EstimatedSalary.

In [6]:
df.head()

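| | RowNumber | CustomerId | Surname | CreditScore | Geography | Gender | Age | Tenure | Balance | NumOfProducts | HasCrCard | IsActiveMember | EstimatedSalary | Exited |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 15634602 | Hargrave | 619 | France | Female | 42 | 2 | 0.0 | 1 | 1 | 1 | 101348.88 | 1 |
| 1 | 2 | 15647311 | Hill | 608 | Spain | Female | 41 | 1 | 83807.86 | 1 | 0 | 1 | 112542.58 | 0 |
| 2 | 3 | 15619304 | Onio | 502 | France | Female | 42 | 8 | 159660.8 | 3 | 1 | 0 | 113931.57 | 1 |
| 3 | 4 | 15701354 | Boni | 699 | France | Female | 39 | 1 | 0.0 | 2 | 0 | 0 | 93826.63 | 0 |
| 4 | 5 | 15737888 | Mitchell | 850 | Spain | Female | 43 | 2 | 125510.82 | 1 | 1 | 1 | 79084.1 | 0 |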


In [8]:
print(df['Geography'].unique())
print(df['Gender'].unique())

['France' 'Spain' 'Germany']
['Female' 'Male']


So only 2 columns hold categorical values. Gender has 2 categories, so we can simply replace them with 1 and 0. Geography has 3. Since we don't want to assume that, for example, France is closer to Spain than to Germany, we use OneHotEncoder.

## 2) Adjust format and keep important information

In [12]:
X = df.iloc[:,3:-1].values # X = df.iloc[rows, columns]: we want ALL the rows (:), but only the columns from CreditScore (3) up to the one before the last (-1 is excluded)
Y = df.iloc[:,-1].values # All the rows, but only the last column (Exited)

# Now let's label encode the gender.
enco =  LabelEncoder()
X[:, 2] = enco.fit_transform(X[:, 2]) # The 3rd column (index 2) is Gender

# Countries

ct =  ColumnTransformer(transformers=[('OneHotEncoding', OneHotEncoder(), [1])], remainder='passthrough') # transformers is a list of 3-element tuples, one per transformation.
# The first element is just the name of the transformation.
# The second is the transformer we want to apply, in this case OneHotEncoder.
# The third lists the columns to transform, in this case the column with index 1 (the second one, Geography).
# Finally, remainder specifies what to do with the columns that are not transformed. Here we pass them through unchanged. If we don't specify this,
# those columns are dropped.
X = np.array(ct.fit_transform(X))
print(X)
print(X)

[[1.0 0.0 0.0 ... 1 1 101348.88]
 [0.0 0.0 1.0 ... 0 1 112542.58]
 [1.0 0.0 0.0 ... 1 0 113931.57]
 ...
 [1.0 0.0 0.0 ... 0 1 42085.58]
 [0.0 1.0 0.0 ... 1 0 92888.52]
 [1.0 0.0 0.0 ... 1 0 38190.78]]
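
Note that the one-hot columns now come first and the untouched columns follow. The sketch below imitates both encoding steps with NumPy only, to show where each value ends up. It is not how scikit-learn implements them. Only four of the feature columns are kept, and the third row is made up so that all three countries appear.

```python
import numpy as np

# Columns: CreditScore, Geography, Gender, Age (other columns omitted).
# The first two rows come from df.head(); the third row is invented.
X_small = np.array([
    [619, "France", "Female", 42],
    [608, "Spain", "Female", 41],
    [645, "Germany", "Male", 44],
], dtype=object)

# Label encoding: categories are sorted, so Female -> 0 and Male -> 1.
gender_classes, gender_codes = np.unique(X_small[:, 2].astype(str), return_inverse=True)
X_small[:, 2] = gender_codes

# One-hot encoding: one column per sorted category (France, Germany, Spain).
geo_classes, geo_codes = np.unique(X_small[:, 1].astype(str), return_inverse=True)
geo_one_hot = np.eye(len(geo_classes))[geo_codes]

# Encoded columns first, remaining columns after (like remainder='passthrough').
rest = np.delete(X_small, 1, axis=1)
X_encoded = np.hstack([geo_one_hot, rest])

print(geo_classes)
print(X_encoded)
```

This is why the first printed row above starts with `1.0 0.0 0.0`: the customer is from France, the first category in sorted order.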


In [13]:
print(Y)

[1 0 1 ... 1 1 0]


## 3) Fit the data and variables that the model is going to use

In [14]:
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.2, random_state=42)

sc = StandardScaler() # the values have to be scaled
X_train_scaled = sc.fit_transform(X_train)
X_test_scaled = sc.transform(X_test)

## 4) Build the Neural Network

In [16]:
import tensorflow as tf
nn = tf.keras.models.Sequential() # Create the neural network, in this case with the Sequential model, which
# lets you build the network layer by layer. It is useful for a linear architecture, that is, when each layer has exactly one input and one output.

nn.add(tf.keras.layers.Dense(units= 12, activation='relu'))  # Add the first hidden layer. The number of neurons is a design choice. Personally,
# I like to start with as many neurons as there are feature columns (12 here).

nn.add(tf.keras.layers.Dense(units= 12, activation='relu')) # Add the second hidden layer. We use the Rectified Linear Unit (ReLU) activation.
# ReLU is much less prone to vanishing gradients than sigmoid or tanh, which makes deep networks easier to train.

nn.add(tf.keras.layers.Dense(units= 1, activation='sigmoid')) # Add the output layer. Since we want to predict a single binary variable, one neuron is enough.
# To predict something different, such as several classes, we would need more neurons. The activation function also changes here: with a sigmoid,
# the output is the probability of the variable taking the value 1.
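
To get a feeling for the size of this network, the number of trainable parameters can be counted by hand. The sketch assumes 12 input columns (3 one-hot Geography columns plus the 9 remaining features) and one bias per neuron; `dense_params` is a helper made up for this example.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs x n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

n_features = 12            # 3 one-hot Geography columns + 9 other columns
layer_units = [12, 12, 1]  # the three Dense layers added above

params_per_layer = []
n_inputs = n_features
for n_units in layer_units:
    params_per_layer.append(dense_params(n_inputs, n_units))
    n_inputs = n_units     # this layer's output feeds the next layer

total_params = sum(params_per_layer)
print(params_per_layer, total_params)  # [156, 156, 13] 325
```

With 8000 training rows and only 325 parameters, this is a very small network.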

## 5) Train the Neural Network
First we need to compile the NN. This means defining "HOW" it is going to learn. To do that we need to define AT LEAST 3 parameters.

a) Optimizer: The algorithm used to update the weights during training. Some examples are SGD, RMSprop and Adam.

b) Loss Function: The quantity the optimizer minimizes during training. It measures how far the predictions are from the true labels.

c) Metrics: A list of metrics that are evaluated and reported during training. They are not used to update the weights.
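
To illustrate what an optimizer does, here is a minimal sketch of the standard Adam update rule applied to a one-dimensional toy problem, minimizing `f(x) = x**2`. It is not the Keras implementation. The function name `adam_minimize` is made up, and the default hyperparameters are assumed to mirror the usual Adam defaults.

```python
import math

def adam_minimize(grad_fn, x0, steps, lr=0.001, beta_1=0.9, beta_2=0.999, eps=1e-7):
    """Run `steps` Adam updates on a single scalar parameter."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad_fn(x)
        m = beta_1 * m + (1 - beta_1) * g       # running mean of gradients
        v = beta_2 * v + (1 - beta_2) * g * g   # running mean of squared gradients
        m_hat = m / (1 - beta_1 ** t)           # bias correction for early steps
        v_hat = v / (1 - beta_2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x

# Toy problem: f(x) = x**2 has gradient 2*x and its minimum at x = 0.
x_after_one_step = adam_minimize(lambda x: 2 * x, x0=1.0, steps=1, lr=0.1)
print(x_after_one_step)
```

After the bias correction, the very first step has a size close to the learning rate regardless of the gradient's magnitude, which is one reason Adam is fairly insensitive to how the loss is scaled.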

In [17]:
nn.compile(optimizer= 'adam', loss='binary_crossentropy', metrics=['accuracy']) # 1) Optimizer: Adam (Adaptive Moment Estimation).
                                               #2) Loss Function: binary_crossentropy, the standard choice for a binary problem with a sigmoid output
                                               #3) Metrics: for this example we only report accuracy, but we could add more
nn.fit(X_train_scaled, y_train, batch_size = 20, epochs = 200)  # The default is batch_size = 32, but since the data set is fairly small I use 20

Epoch 1/200
400/400 ━━━━━━━━━━━━━━━━━━━━ 2s 1ms/step - accuracy: 0.7374 - loss: 0.5694
Epoch 2/200
400/400 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.7928 - loss: 0.4572
Epoch 3/200
400/400 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.8056 - loss: 0.4354
Epoch 4/200
400/400 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.8077 - loss: 0.4301
Epoch 5/200
400/400 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.8206 - loss: 0.4102
Epoch 6/200
400/400 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.8312 - loss: 0.3992
Epoch 7/200
400/400 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.8343 - loss: 0.3949
Epoch 8/200
400/400 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.8445 - loss: 0.3845
Epoch 9/200
...

<keras.src.callbacks.history.History at 0x21f6f0926b0>

## 6) Test the NN

In [18]:
from sklearn.metrics import confusion_matrix, accuracy_score, recall_score, f1_score, precision_score

y_pred = nn.predict(X_test_scaled)
y_pred = y_pred > 0.5 # If the predicted probability is above 50%, we predict the customer will leave the bank. Otherwise we predict they will stay.
cm = confusion_matrix(y_test, y_pred)
print(cm)
print(f'Accuracy: {round(accuracy_score(y_test, y_pred), 2)}')
print(f'recall: {round(recall_score(y_test, y_pred), 2)}')
print(f'f1_score: {round(f1_score(y_test, y_pred), 2)}')
print(f'precision_score: {round(precision_score(y_test, y_pred), 2)}')

63/63 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step
[[1533   74]
 [ 207  186]]
Accuracy: 0.86
recall: 0.47
f1_score: 0.57
precision_score: 0.72


Even though the accuracy is 0.86, the other metrics are considerably worse: a recall of 0.47 means the model misses more than half of the customers who actually leave. This is probably due to class imbalance. There are far more class 0 examples than class 1; in the test set alone, the confusion matrix shows 1607 versus 393.
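
One way to put the 0.86 accuracy in perspective is to compare it with a trivial baseline that always predicts class 0 ("the customer stays"). The sketch below derives that baseline from the confusion matrix printed above. The baseline model is hypothetical and is not trained anywhere in this notebook.

```python
import numpy as np

# Confusion matrix printed above: rows = true class, columns = predicted class.
cm = np.array([[1533, 74],
               [207, 186]])

n_negative, n_positive = cm.sum(axis=1)  # true class counts in the test set
n_total = cm.sum()

model_accuracy = np.trace(cm) / n_total  # correct predictions on the diagonal
baseline_accuracy = n_negative / n_total # always predict class 0
positive_rate = n_positive / n_total     # share of customers who left

print(f"model accuracy:    {model_accuracy:.4f}")
print(f"baseline accuracy: {baseline_accuracy:.4f}")
print(f"share of class 1:  {positive_rate:.4f}")
```

A model that never flags anyone would already be right about 80% of the time while having a recall of 0, which is why accuracy alone is a weak measure on this data set.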