**Customer Classification**

We have a dataset of bank customer information, and we build a classifier that predicts whether a customer will exit the bank.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

import os
print(os.listdir("../input"))

We start by encoding the categorical values:

In [None]:
# Multiple Columns Label Encoder
from sklearn.preprocessing import LabelEncoder
class MultiColumnLabelEncoder:
    def __init__(self,columns = None):
        self.columns = columns 

    def fit(self,X,y=None):
        return self

    def transform(self,X):
        output = X.copy()
        if self.columns is not None:
            for col in self.columns:
                output[col] = LabelEncoder().fit_transform(output[col])
        else:
            for colname,col in output.items():
                output[colname] = LabelEncoder().fit_transform(col)
        return output

    def fit_transform(self,X,y=None):
        return self.fit(X,y).transform(X)
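
To illustrate what the encoder does to each column, here is a plain-Python sketch of label encoding. The helper `label_encode` and the sample values are hypothetical; scikit-learn's *LabelEncoder* likewise assigns integer codes in sorted class order.

```python
def label_encode(values):
    """Map each distinct value to an integer code, in sorted order."""
    classes = sorted(set(values))
    code_of = {value: code for code, value in enumerate(classes)}
    return [code_of[value] for value in values], classes

# Hypothetical sample of the 'Geography' column
geography = ["France", "Spain", "Germany", "France"]
codes, classes = label_encode(geography)
print(classes)  # ['France', 'Germany', 'Spain']
print(codes)    # [0, 2, 1, 0]
```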

**Data Preprocessing**
Next comes the preprocessing phase. Here we split the data into training and test sets and scale the features with *MinMaxScaler* rather than *StandardScaler*, because we want every feature value to lie between 0 and 1. The scaler is fitted on the training set only and then applied to the test set.

In [None]:
# Preprocessing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
Churn_Modelling = pd.read_csv("../input/bank-customer-churn-modeling/Churn_Modelling.csv")
X = Churn_Modelling.iloc[:,3:-1]
y = Churn_Modelling.iloc[:,-1]
X = MultiColumnLabelEncoder(columns = ['Geography','Gender']).fit_transform(pd.DataFrame(X))
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)
sc = MinMaxScaler(feature_range=(0,1))
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
X
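
As a rough sketch of the scaling step, the NumPy code below reproduces the min-max formula by hand: the minimum and range are learned from the training rows and then reused for the test rows. The arrays are hypothetical toy data, not values from the dataset.

```python
import numpy as np

# Hypothetical toy data: two features on very different scales
train = np.array([[20.0, 1000.0], [40.0, 3000.0], [60.0, 5000.0]])
test = np.array([[30.0, 6000.0]])

# "fit": learn the per-column minimum and range from the training set only
col_min = train.min(axis=0)
col_range = train.max(axis=0) - col_min

# "transform": apply the same shift and scale to both sets
train_scaled = (train - col_min) / col_range
test_scaled = (test - col_min) / col_range

print(train_scaled)  # every training column now spans exactly [0, 1]
print(test_scaled)   # [[0.25 1.25]]: test values can fall outside [0, 1]
```

Because the test set reuses the training minimum and range, a test value larger than anything seen in training ends up above 1.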

**Making the RNN (LSTM)**

In [None]:
# LSTM Implementation
import time
from keras.models import Sequential
from keras.layers import Dense, Dropout, LSTM

# LSTM layers expect 3-D input: (samples, timesteps, features).
# Each customer is treated as a sequence with a single timestep.
trainX = np.reshape(X_train, (X_train.shape[0], 1, X_train.shape[1]))
testX = np.reshape(X_test, (X_test.shape[0], 1, X_test.shape[1]))
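
The reshape above can be checked on a small array. This is a minimal sketch with a hypothetical stand-in for X_train; it only shows that a timestep axis of length 1 is added and that the values themselves are untouched.

```python
import numpy as np

# Hypothetical stand-in for X_train: 4 customers, 10 scaled features each
X_2d = np.arange(40, dtype=float).reshape(4, 10)

# Add a timestep axis of length 1: (samples, features) -> (samples, 1, features)
X_3d = np.reshape(X_2d, (X_2d.shape[0], 1, X_2d.shape[1]))

print(X_2d.shape)  # (4, 10)
print(X_3d.shape)  # (4, 1, 10)
# The values are unchanged; each row is simply wrapped in a one-step sequence
print(np.array_equal(X_3d[:, 0, :], X_2d))  # True
```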

A heuristic tip is that the number of nodes (dimensions) in a hidden layer should be the average of the number of nodes in the input and output layers. Since we have **10** input features and a single binary output, this gives **(10+1)÷2=5.5**, which we round up to **6**.
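
The averaging rule can be written as a small helper. `hidden_units` is a hypothetical name, and rounding up is an assumption for the case where the average is not a whole number. With the 10 input features that the model code declares (`input_shape=(1,10)`) and one output unit, the average is 5.5, which rounds up to the 6 units used in the first LSTM layer.

```python
import math

def hidden_units(n_inputs, n_outputs):
    """Average of input and output layer sizes, rounded up (assumed rounding)."""
    return math.ceil((n_inputs + n_outputs) / 2)

print(hidden_units(10, 1))  # 6
```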

**The breakdown of the arguments passed to *compile* is as follows:**

**optimizer:** *adam* The algorithm used to find the optimal set of weights in the neural network. Adam is a very efficient variant of stochastic gradient descent.

**loss:** *binary_crossentropy* The loss function that the optimizer minimizes, here the logarithmic loss. If the dependent (output) variable is binary, the loss is binary_crossentropy. If it is categorical, the loss is categorical_crossentropy.

**metrics:** *[accuracy]* The metrics reported during training and evaluation. Accuracy is only monitored to judge model performance; it is not the quantity being optimized.
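
To make the loss concrete, here is a minimal plain-Python sketch of binary cross-entropy (logarithmic loss). The function and the sample predictions are hypothetical, and unlike a library implementation it does no clipping, so a predicted probability of exactly 0 or 1 would raise an error.

```python
import math

def binary_crossentropy(y_true, y_prob):
    """Mean log loss over paired labels (0 or 1) and predicted probabilities."""
    losses = [
        -(y * math.log(p) + (1 - y) * math.log(1 - p))
        for y, p in zip(y_true, y_prob)
    ]
    return sum(losses) / len(losses)

# Hypothetical labels and predictions (1 = customer exited)
y_true = [1, 0, 1, 0]
confident_right = [0.9, 0.1, 0.9, 0.1]
confident_wrong = [0.1, 0.9, 0.1, 0.9]

print(round(binary_crossentropy(y_true, confident_right), 4))  # 0.1054
print(round(binary_crossentropy(y_true, confident_wrong), 4))  # 2.3026
```

Confident correct predictions give a loss near 0, while confident wrong predictions are penalized heavily.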

In [None]:
model = Sequential()

# First LSTM layer: one timestep of 10 features in, 6 units out
model.add(LSTM(input_shape=(1,10),units=6,return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(32,return_sequences=True))
model.add(LSTM(32))
model.add(Dropout(0.1))
# A single sigmoid unit outputs the probability that the customer exits
model.add(Dense(activation="sigmoid", units=1))

start = time.time()
model.compile(loss='binary_crossentropy', optimizer='adam',metrics=['accuracy'])
print('compilation time : ', time.time() - start)

**Fitting the RNN**

This is where we will be fitting the RNN to our training set.

The breakdown of the arguments passed to *fit* is as follows:

**trainX** The independent variables (features) of the training data that the model is fitted on.

**y_train** The target values that the model should learn to produce.

**batch_size:** The number of samples processed before the error is back-propagated and the node weights are adjusted.

**epochs:** The number of times the entire training data is passed through the network to tune the weights. This is like the fuel of the algorithm.

**validation_split:** 0.1 The fraction of the training data held out for validation.
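
As a quick worked example of how these settings interact, the sketch below counts the weight updates implied by the values used in this notebook. It assumes 8000 training rows, which follows from the test set of 2000 rows at `test_size = 0.2`; the variable names are illustrative.

```python
import math

n_train = 8000          # training rows implied by a 2000-row test set at test_size = 0.2
validation_split = 0.1
batch_size = 500
epochs = 1000

n_val = round(n_train * validation_split)   # samples held out for validation
n_fit = n_train - n_val                     # samples used for weight updates
updates_per_epoch = math.ceil(n_fit / batch_size)
total_updates = updates_per_epoch * epochs

print(n_fit, n_val)        # 7200 800
print(updates_per_epoch)   # 15
print(total_updates)       # 15000
```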

In [None]:
history=model.fit(trainX,y_train,batch_size=500,epochs=1000,validation_split=0.1)

The network should converge to an accuracy of around 86%.

In [None]:
trainPredict = model.predict(trainX)
print(trainPredict)
model.summary()

**Testing the RNN**

We first plot the training and validation accuracy recorded during fitting, then predict the test set results. The predictions give the probability of a customer leaving given the test data. Each row in X_test corresponds to a row in y_test.

In [None]:
plt.plot(np.array(history.history['accuracy']) * 100)
plt.plot(np.array(history.history['val_accuracy']) * 100)
plt.ylabel('accuracy (%)')
plt.xlabel('epochs')
plt.legend(['train', 'validation'])
plt.title('Accuracy over epochs')
plt.show()

In [None]:
y_pred = model.predict(testX)
print(y_pred[:5])

To use the confusion matrix, we need to convert the predicted probabilities that a customer will leave the bank into true or false. We use a cutoff value of 0.5 to decide whether a customer is likely to exit or not.

In [None]:
y_pred = (y_pred > 0.5).astype(int)
print(y_pred[:5])

**Making the Confusion Matrix**

In [None]:
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
print(cm)

**Significance of the confusion matrix value:**

The diagonal entries of the matrix count the correct classifications. In this trial we have *(1547+184)=1731* correct classifications out of a total test set size of 2000. The accuracy for this trial is therefore *1731÷2000=0.8655*, which matches the accuracy of around 86% that the network converged to during training.

In [None]:
print((cm[0][0]+cm[1][1])*100/len(y_test), '% of testing data was classified correctly')
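
For reference, the accuracy calculation can be reproduced without scikit-learn. This is a minimal sketch: `confusion_counts` is a hypothetical helper that uses the same layout as *confusion_matrix* (rows are true labels, columns are predictions), and the short label lists are made-up toy data. The last line repeats the arithmetic with the figures quoted in the text.

```python
def confusion_counts(y_true, y_pred):
    """Return [[TN, FP], [FN, TP]]: rows are true labels, columns predictions."""
    cm = [[0, 0], [0, 0]]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm

# Hypothetical labels and thresholded predictions (1 = customer exited)
y_true = [0, 0, 0, 1, 1, 0, 1, 0]
y_hat = [0, 0, 1, 1, 0, 0, 1, 0]

cm = confusion_counts(y_true, y_hat)
correct = cm[0][0] + cm[1][1]          # diagonal = correct classifications
print(cm)                              # [[4, 1], [1, 2]]
print(correct / len(y_true))           # 0.75

# The same arithmetic with the figures quoted in the text
print((1547 + 184) / 2000)             # 0.8655
```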