
The points distribution for this case is as follows:
1. Read the dataset in a new python notebook.
2. Drop the columns which are unique for all users like IDs (2.5 points)
3. Distinguish the feature and target set (2.5 points)
4. Divide the data set into Train and test sets
5. Normalize the train and test data (2.5 points)
6. Initialize & build the model (10 points)
7. Optimize the model (5 points)
8. Predict the results using 0.5 as a threshold (5 points)
9. Print the Accuracy score and confusion matrix (2.5 points)
 


### Given a bank customer, can we build a neural-network classifier that determines whether they will leave the bank?


In [1]:
import tensorflow as tf
print(tf.__version__)

2.0.0




In [2]:
import pandas as pd
import numpy as np

In [3]:
#Read the dataset in a new python notebook
df=pd.read_csv("bank.csv")

In [4]:
df.head()

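| | RowNumber | CustomerId | Surname | CreditScore | Geography | Gender | Age | Tenure | Balance | NumOfProducts | HasCrCard | IsActiveMember | EstimatedSalary | Exited |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 15634602 | Hargrave | 619 | France | Female | 42 | 2 | 0.0 | 1 | 1 | 1 | 101348.88 | 1 |
| 1 | 2 | 15647311 | Hill | 608 | Spain | Female | 41 | 1 | 83807.86 | 1 | 0 | 1 | 112542.58 | 0 |
| 2 | 3 | 15619304 | Onio | 502 | France | Female | 42 | 8 | 159660.8 | 3 | 1 | 0 | 113931.57 | 1 |
| 3 | 4 | 15701354 | Boni | 699 | France | Female | 39 | 1 | 0.0 | 2 | 0 | 0 | 93826.63 | 0 |
| 4 | 5 | 15737888 | Mitchell | 850 | Spain | Female | 43 | 2 | 125510.82 | 1 | 1 | 1 | 79084.1 | 0 |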


In [5]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 14 columns):
RowNumber          10000 non-null int64
CustomerId         10000 non-null int64
Surname            10000 non-null object
CreditScore        10000 non-null int64
Geography          10000 non-null object
Gender             10000 non-null object
Age                10000 non-null int64
Tenure             10000 non-null int64
Balance            10000 non-null float64
NumOfProducts      10000 non-null int64
HasCrCard          10000 non-null int64
IsActiveMember     10000 non-null int64
EstimatedSalary    10000 non-null float64
Exited             10000 non-null int64
dtypes: float64(2), int64(9), object(3)
memory usage: 1.1+ MB


### Drop the columns which are unique for all users like IDs

In [6]:

df=df.drop(columns=['RowNumber','CustomerId','Surname'])

- Row number, customer ID, and surname identify a customer but carry no information about whether that customer leaves the bank, so they are dropped.
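A quick way to spot ID-like columns is to look for those with as many distinct values as there are rows (in pandas, `df.nunique() == len(df)`). A minimal pure-Python sketch on three rows copied from `df.head()`; the helper name `unique_per_row_columns` is hypothetical:

```python
def unique_per_row_columns(rows):
    """Return the column names whose values are distinct on every row."""
    n_rows = len(rows)
    columns = rows[0].keys()
    return [col for col in columns
            if len({row[col] for row in rows}) == n_rows]

# A few fields from the first three rows of df.head()
sample = [
    {"RowNumber": 1, "CustomerId": 15634602, "Surname": "Hargrave",
     "Geography": "France", "EstimatedSalary": 101348.88},
    {"RowNumber": 2, "CustomerId": 15647311, "Surname": "Hill",
     "Geography": "Spain", "EstimatedSalary": 112542.58},
    {"RowNumber": 3, "CustomerId": 15619304, "Surname": "Onio",
     "Geography": "France", "EstimatedSalary": 113931.57},
]
print(unique_per_row_columns(sample))
# ['RowNumber', 'CustomerId', 'Surname', 'EstimatedSalary']
```

Treat the result as a list of candidates, not a rule: a continuous feature such as `EstimatedSalary` is naturally distinct on every row yet may still be informative, while a column like `Surname` can repeat and still carry no signal. Domain judgement decides what to drop.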

In [7]:
df.columns

Index(['CreditScore', 'Geography', 'Gender', 'Age', 'Tenure', 'Balance',
       'NumOfProducts', 'HasCrCard', 'IsActiveMember', 'EstimatedSalary',
       'Exited'],
      dtype='object')

### Distinguish the feature and target set 

In [8]:
X = df.iloc[:, 0:10].values

In [9]:
X

array([[619, 'France', 'Female', ..., 1, 1, 101348.88],
       [608, 'Spain', 'Female', ..., 0, 1, 112542.58],
       [502, 'France', 'Female', ..., 1, 0, 113931.57],
       ...,
       [709, 'France', 'Female', ..., 0, 1, 42085.58],
       [772, 'Germany', 'Male', ..., 1, 0, 92888.52],
       [792, 'France', 'Female', ..., 1, 0, 38190.78]], dtype=object)

In [10]:
Y = df.iloc[:, 10].values

In [11]:
Y

array([1, 0, 1, ..., 1, 1, 0], dtype=int64)
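`df.iloc[:, 0:10]` and `df.iloc[:, 10]` select by position, which assumes `Exited` stays the last of the 11 columns. Selecting by name (in pandas, `df.drop(columns='Exited')` for the features and `df['Exited']` for the target) keeps working if the column order changes. A pure-Python sketch of the same idea on hypothetical records; `split_features_target` is a made-up helper name:

```python
def split_features_target(rows, target="Exited"):
    """Split record dicts into feature rows (all other fields) and a target list."""
    features = [[value for name, value in row.items() if name != target]
                for row in rows]
    target_values = [row[target] for row in rows]
    return features, target_values

records = [
    {"CreditScore": 619, "Geography": "France", "Exited": 1},
    {"CreditScore": 608, "Geography": "Spain", "Exited": 0},
]
features, target = split_features_target(records)
print(features)  # [[619, 'France'], [608, 'Spain']]
print(target)    # [1, 0]
```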

In [12]:
#from sklearn.preprocessing import LabelEncoder
#categorical_classes_list = ['Geography','Gender'] 
#encode features that are categorical classes
#encoding_list = []
#for column in categorical_classes_list:
    #le = LabelEncoder()
    #le.fit(df[column])
    #encoding_list.append(df[column].unique())
    #encoding_list.append(list(le.transform(df[column].unique())))
    #df[column] = le.transform(df[column])
    #test_set[column] = le.transform(test_set[column])

In [13]:
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
labelencoder_X_1 = LabelEncoder()
X[:, 1] = labelencoder_X_1.fit_transform(X[:, 1])
labelencoder_X_2 = LabelEncoder()
X[:, 2] = labelencoder_X_2.fit_transform(X[:, 2])
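`LabelEncoder` maps each category to its index in the sorted list of unique values, so Geography becomes France=0, Germany=1, Spain=2 and Gender becomes Female=0, Male=1. A pure-Python sketch of that mapping (`label_encode` is a hypothetical helper, not the scikit-learn API):

```python
def label_encode(values):
    """Map each value to its index among the sorted unique classes."""
    classes = sorted(set(values))
    index = {cls: i for i, cls in enumerate(classes)}
    return [index[v] for v in values], classes

codes, classes = label_encode(["France", "Spain", "France", "Germany"])
print(classes)  # ['France', 'Germany', 'Spain']
print(codes)    # [0, 2, 0, 1]
```

These integer codes imply an order (Spain > Germany > France) that has no meaning, which is why Geography is one-hot encoded in the next step; Gender has only two values, so a single 0/1 column is enough.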


In [14]:
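from sklearn.compose import ColumnTransformer
# One-hot encode Geography (column 1) and pass the other columns through unchanged;
# the encoded columns come first, followed by the passthrough columns
onehotencoder = ColumnTransformer([('geography', OneHotEncoder(), [1])],
                                  remainder='passthrough', sparse_threshold=0)
X = onehotencoder.fit_transform(X)
# Drop the first dummy column (France) to avoid the dummy-variable trap
X = X[:, 1:]

- `OneHotEncoder(categorical_features=[1])` was deprecated in scikit-learn 0.20 and removed in 0.22; `ColumnTransformer` now selects which columns to encode. Since 0.20, `OneHotEncoder` also accepts string categories directly, so a `LabelEncoder` step is no longer required before it.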
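One-hot encoding turns the Geography code into three 0/1 columns, and `X = X[:, 1:]` then drops the first of them. Because the three dummies always sum to 1, the dropped column is fully determined by the other two (the dummy-variable trap), and France becomes the all-zero reference. This also explains the final width: 10 features - 1 Geography column + 3 dummies - 1 dropped = 11, matching `train_x.shape`. A pure-Python sketch with a hypothetical helper name:

```python
def one_hot_drop_first(codes, n_categories):
    """One-hot encode integer codes, then drop the column for category 0."""
    encoded = []
    for code in codes:
        row = [0] * n_categories
        row[code] = 1
        encoded.append(row[1:])  # drop the first dummy column
    return encoded

# Geography codes from LabelEncoder: France=0, Germany=1, Spain=2
print(one_hot_drop_first([0, 1, 2], 3))  # [[0, 0], [1, 0], [0, 1]]
```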


### Divide the data set into Train and test sets

In [15]:
from sklearn.model_selection import train_test_split
train_x,test_x,train_y,test_y=train_test_split(X,Y,test_size=.30,random_state=2)
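`train_test_split` shuffles the rows and holds out 30% of them for testing, giving the 7000/3000 split seen in the shapes further down. A pure-Python sketch of the same idea using `random.Random(seed)`; `shuffle_split` is a hypothetical name, and because scikit-learn shuffles with NumPy's generator, the rows it picks for `random_state=2` will differ from this sketch:

```python
import math
import random

def shuffle_split(n_samples, test_size=0.30, seed=2):
    """Shuffle row indices with a fixed seed and hold out a test fraction."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = math.ceil(test_size * n_samples)  # round the test count up
    return indices[n_test:], indices[:n_test]  # train, test

train_idx, test_idx = shuffle_split(10000)
print(len(train_idx), len(test_idx))  # 7000 3000
```

The split is not stratified, so the churn rate can differ slightly between the two sets; passing `stratify=Y` to `train_test_split` would keep it equal.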

In [16]:
train_x =np.array(train_x).astype('float32')
test_x = np.array(test_x).astype('float32')
train_y =np.array(train_y).astype('float32')
test_y = np.array(test_y).astype('float32')

### Normalize the train and test data

In [17]:
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
train_x = sc.fit_transform(train_x)
test_x = sc.transform(test_x)
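`StandardScaler` standardizes each feature to z = (x - mean) / std, using the population standard deviation (ddof=0), and the code above correctly learns mean and std on the training set only, then reuses them for the test set so no test statistics leak into preprocessing. A pure-Python sketch for one feature column, with hypothetical values and helper names:

```python
import statistics

def fit_standard_scaler(column):
    """Learn mean and population standard deviation (ddof=0) from training data."""
    mean = statistics.mean(column)
    std = statistics.pstdev(column)
    # A constant column would divide by zero; use a scale of 1 instead
    return mean, (std if std > 0 else 1.0)

def standardize(column, mean, std):
    """Apply z = (x - mean) / std with statistics learned on the training set."""
    return [(x - mean) / std for x in column]

train_col = [1.0, 2.0, 3.0]   # hypothetical training values for one feature
test_col = [2.0, 4.0]         # hypothetical test values for the same feature

mean, std = fit_standard_scaler(train_col)   # fit on the training data only
train_z = standardize(train_col, mean, std)
test_z = standardize(test_col, mean, std)    # reuse the training statistics
```

The standardized training column has mean 0 and standard deviation 1; the test column generally does not, which is expected.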

In [18]:
train_x.shape

(7000, 11)

In [19]:
test_x.shape

(3000, 11)

In [20]:
train_y.shape

(7000,)

In [21]:
train_y=train_y.reshape(-1,1)

In [22]:
train_y.shape

(7000, 1)

### Initialize & build the model

In [23]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras import regularizers
from tensorflow.keras import optimizers
from tensorflow.keras.backend import backend

In [24]:
#Adding the input layer and the first hidden layer…
classifier = Sequential()
classifier.add(Dense(10, activation = 'relu', input_dim = 11))


In [25]:
#Adding the second hidden layer…
classifier.add(Dense(10, activation = 'relu'))

In [26]:
#Adding the output layer…
classifier.add(Dense(1, activation = 'sigmoid'))
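Each `Dense` layer computes activation(W·x + b), so the network maps 11 standardized features through two 10-unit ReLU layers to one sigmoid output, read as the probability of leaving. The pure-Python forward pass below uses the same layer sizes; its weight initialization is a hypothetical stand-in (Keras defaults to `glorot_uniform` kernels and zero biases), so the probability it produces is meaningless, but the parameter count, 11·10+10 + 10·10+10 + 10·1+1 = 241, should match what `classifier.summary()` reports:

```python
import math
import random

def dense(inputs, weights, biases, activation):
    """Fully connected layer: activation(w . x + b) for every unit."""
    return [activation(sum(w * x for w, x in zip(unit_weights, inputs)) + b)
            for unit_weights, b in zip(weights, biases)]

def relu(z):
    return max(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def init_layer(n_in, n_out, rng):
    """Hypothetical uniform init in [-0.5, 0.5]; not Keras' initializer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

rng = random.Random(0)
# Same shapes as the classifier: 11 inputs -> 10 ReLU -> 10 ReLU -> 1 sigmoid
layers = [init_layer(11, 10, rng), init_layer(10, 10, rng), init_layer(10, 1, rng)]
activations = [relu, relu, sigmoid]

x = [rng.gauss(0.0, 1.0) for _ in range(11)]  # a made-up standardized customer row
for (weights, biases), activation in zip(layers, activations):
    x = dense(x, weights, biases, activation)
probability = x[0]

# Each layer has n_in * n_out weights plus n_out biases
n_params = sum(len(w) * len(w[0]) + len(b) for w, b in layers)
print(n_params)  # 241
```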

### Optimize the model 

In [27]:
#Compiling the ANN
classifier.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])
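`binary_crossentropy` is the mean negative log-likelihood of the true labels under the predicted probabilities: -[y·log(p) + (1 - y)·log(1 - p)], averaged over samples. A minimal pure-Python sketch; the clipping constant `eps=1e-7` is an assumption chosen to match Keras' default backend epsilon, and the Adam optimizer itself is not reproduced:

```python
import math

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)], with p clipped away from 0 and 1."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

print(binary_crossentropy([1, 0], [0.5, 0.5]))  # ln 2, about 0.6931
```

Confident correct predictions drive the loss toward 0, while confident wrong ones are penalized heavily, which is what pushes the sigmoid output toward the true label during training.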

In [28]:
classifier.fit(train_x, train_y, validation_data=(test_x, test_y), epochs=30,batch_size = 32)

Train on 7000 samples, validate on 3000 samples
Epoch 1/30
...
Epoch 30/30


<tensorflow.python.keras.callbacks.History at 0x273ce0b10f0>
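With `batch_size = 32`, each epoch splits the 7000 training rows into ceil(7000 / 32) = 219 mini-batches, the last holding only 24 rows, so 30 epochs perform 30 * 219 = 6570 weight updates. The sketch below only enumerates batch boundaries (`minibatches` is a hypothetical helper); Keras additionally shuffles the training data each epoch by default (`shuffle=True`), which this sketch omits:

```python
import math

def minibatches(n_samples, batch_size):
    """Yield (start, stop) index ranges covering n_samples in order."""
    for start in range(0, n_samples, batch_size):
        yield start, min(start + batch_size, n_samples)

batches = list(minibatches(7000, 32))
print(len(batches), math.ceil(7000 / 32))  # 219 219
print(batches[-1])                          # (6976, 7000)
```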

### Predict the results using 0.5 as a threshold

In [29]:
# Predicting the Test set results

y_pred = classifier.predict(test_x)
y_pred = (y_pred > 0.5)

- Predicted probabilities above 0.5 become `True` (1, the customer is predicted to leave); all others become `False` (0).
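Thresholding turns each predicted probability into a class label. The comparison is strict, so a probability of exactly 0.5 is labelled 0. A pure-Python sketch (`apply_threshold` is a hypothetical helper); on the NumPy array, `(y_pred > 0.5)` does the same element-wise and yields booleans, which `.astype(int)` would turn into 0/1:

```python
def apply_threshold(probabilities, threshold=0.5):
    """Return 1 where the probability is strictly above the threshold, else 0."""
    return [1 if p > threshold else 0 for p in probabilities]

print(apply_threshold([0.12, 0.5, 0.51, 0.97]))  # [0, 0, 1, 1]
```

Lowering the threshold flags more customers as likely to leave, catching more actual churners at the cost of more false alarms.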

In [30]:
y_pred

array([[False],
       [False],
       [False],
       ...,
       [False],
       [False],
       [False]])

### Print the Accuracy score and confusion matrix 

In [31]:
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(test_y, y_pred)
print(cm)

[[2318   97]
 [ 332  253]]


- Out of 3000 test customers, 2318 + 253 = 2571 were classified correctly and 97 + 332 = 429 were misclassified (97 predicted to leave who stayed, 332 predicted to stay who left). 2571 / 3000 = 0.857.
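scikit-learn's `confusion_matrix` puts true labels on rows and predicted labels on columns, so for labels 0/1 the matrix reads [[TN, FP], [FN, TP]]. Unpacking it gives precision and recall for the "leaves" class, which accuracy alone hides. A small sketch using the counts printed above (`sklearn.metrics.precision_score` and `recall_score` compute the same quantities from the labels):

```python
# scikit-learn layout for labels 0/1: rows = true label, columns = predicted label
matrix = [[2318, 97],
          [332, 253]]
(tn, fp), (fn, tp) = matrix

total = tn + fp + fn + tp
acc = (tn + tp) / total
precision = tp / (tp + fp)   # of customers predicted to leave, the share who left
recall = tp / (tp + fn)      # of customers who left, the share the model caught

print(total, round(acc, 3), round(precision, 3), round(recall, 3))  # 3000 0.857 0.723 0.432
```

A recall of about 0.43 means the model identifies fewer than half of the customers who actually left.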


In [32]:
from sklearn.metrics import accuracy_score
accuracy = accuracy_score(test_y, y_pred)
print('Accuracy: %f' % accuracy)

Accuracy: 0.857000
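`accuracy_score` is the fraction of predictions that match the true labels. Because only 585 of the 3000 test customers (19.5%) left, a classifier that predicts "stays" for everyone already reaches 0.805, which is the natural reference point for the model's 0.857. A pure-Python sketch; `accuracy_fraction` is a hypothetical helper name:

```python
def accuracy_fraction(y_true, y_pred):
    """Share of positions where the predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Class counts read off the confusion matrix: 2318 + 97 = 2415 stayed (0),
# 332 + 253 = 585 left (1). Only the counts matter for accuracy, not the order.
y_true = [0] * 2415 + [1] * 585
always_stay = [0] * len(y_true)   # naive baseline: predict "stays" for everyone

print(accuracy_fraction(y_true, always_stay))  # 0.805
```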
