https://medium.com/@pushkarmandot/build-your-first-deep-learning-neural-network-model-using-keras-in-python-a90b5864116d

Prerequisite: I am using Spyder from Anaconda, so you will need to install it as well. You also need to install the TensorFlow, Theano and Keras libraries; the steps are in my other blog post. You may be wondering why we need TensorFlow and Theano if we are doing the analysis in Keras. The fact of the matter is that Keras is built on top of TensorFlow and Theano, so one of these two powerful libraries runs in the back end whenever you run a program in Keras.

Step 1: Importing data. A pandas DataFrame offers a lot of functionality for working with data, so here we use pandas to import the data.

In [1]:
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn import metrics

from sklearn.linear_model import LogisticRegression

# For saving ML models
import pickle

# Importing the dataset
dataset = pd.read_csv('Datasets/Churn_Modelling.csv')

In [2]:
dataset.head()

Unnamed: 0,RowNumber,CustomerId,Surname,CreditScore,Geography,Gender,Age,Tenure,Balance,NumOfProducts,HasCrCard,IsActiveMember,EstimatedSalary,Exited
0,1,15634602,Hargrave,619,France,Female,42,2,0.0,1,1,1,101348.88,1
1,2,15647311,Hill,608,Spain,Female,41,1,83807.86,1,0,1,112542.58,0
2,3,15619304,Onio,502,France,Female,42,8,159660.8,3,1,0,113931.57,1
3,4,15701354,Boni,699,France,Female,39,1,0.0,2,0,0,93826.63,0
4,5,15737888,Mitchell,850,Spain,Female,43,2,125510.82,1,1,1,79084.1,0


Step 2: Create the matrix of features and the target vector. The slice below keeps the columns at positions 3 to 12 (‘CreditScore’ through ‘EstimatedSalary’), so it excludes ‘RowNumber’, ‘CustomerId’ and ‘Surname’, which are not useful in our analysis. The last column (position 13, the 14th column), ‘Exited’, is our target variable.

# Preprocess

In [3]:
X = dataset.iloc[:, 3:13].values
y = dataset.iloc[:, 13].values

In [4]:
X

array([[619, 'France', 'Female', ..., 1, 1, 101348.88],
       [608, 'Spain', 'Female', ..., 0, 1, 112542.58],
       [502, 'France', 'Female', ..., 1, 0, 113931.57],
       ...,
       [709, 'France', 'Female', ..., 0, 1, 42085.58],
       [772, 'Germany', 'Male', ..., 1, 0, 92888.52],
       [792, 'France', 'Female', ..., 1, 0, 38190.78]], dtype=object)
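Positional slices are easy to get wrong by one column, so it can help to select by column name instead. The sketch below is a minimal illustration on a toy frame built from the first two rows shown by `dataset.head()`; the names `toy`, `X_toy` and `y_toy` are hypothetical and not part of the notebook.

```python
import pandas as pd

# Toy frame with the same column layout as the dataset preview above
toy = pd.DataFrame({
    'RowNumber': [1, 2],
    'CustomerId': [15634602, 15647311],
    'Surname': ['Hargrave', 'Hill'],
    'CreditScore': [619, 608],
    'Geography': ['France', 'Spain'],
    'Gender': ['Female', 'Female'],
    'Age': [42, 41],
    'Tenure': [2, 1],
    'Balance': [0.0, 83807.86],
    'NumOfProducts': [1, 1],
    'HasCrCard': [1, 0],
    'IsActiveMember': [1, 1],
    'EstimatedSalary': [101348.88, 112542.58],
    'Exited': [1, 0],
})

# Label-based slice: with .loc both endpoints are inclusive
X_toy = toy.loc[:, 'CreditScore':'EstimatedSalary'].values
y_toy = toy['Exited'].values

# Same selection as the positional slice iloc[:, 3:13]
assert (X_toy == toy.iloc[:, 3:13].values).all()
print(X_toy.shape)  # (2, 10)
```

Selecting by name keeps working if a column is added or removed at the front of the file, which a positional slice would not.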

Step 3: Let’s make the analysis simpler by encoding the string variables. Geography has string labels such as “France, Spain, Germany”, while Gender has “Male, Female”. We have to encode these strings as numbers. We could do it with pandas, but here I am introducing a new library called scikit-learn, one of the strongest machine learning libraries in Python. We will use ‘LabelEncoder’. As the name suggests, whenever we pass a column to this class, it automatically encodes the different labels in that column with values between 0 and n_classes-1.

In [5]:
# Encode labels and save each fitted encoder to a pickle file

# Column 1: Geography
labelEncoder = LabelEncoder()
X[:, 1] = labelEncoder.fit_transform(X[:, 1])
filename = 'labelEncoder1.pickle'
with open(filename, 'wb') as f:
    pickle.dump(labelEncoder, f)

# Column 2: Gender
labelEncoder = LabelEncoder()
X[:, 2] = labelEncoder.fit_transform(X[:, 2])
filename = 'labelEncoder2.pickle'
with open(filename, 'wb') as f:
    pickle.dump(labelEncoder, f)

In [6]:
X

array([[619, 0, 0, ..., 1, 1, 101348.88],
       [608, 2, 0, ..., 0, 1, 112542.58],
       [502, 0, 0, ..., 1, 0, 113931.57],
       ...,
       [709, 0, 0, ..., 0, 1, 42085.58],
       [772, 1, 1, ..., 1, 0, 92888.52],
       [792, 0, 0, ..., 1, 0, 38190.78]], dtype=object)
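To see which number each label received, you can inspect the fitted encoder. The sketch below uses a small made-up list of countries rather than the real column; `countries`, `encoder` and `mapping` are illustrative names.

```python
from sklearn.preprocessing import LabelEncoder

countries = ['France', 'Spain', 'France', 'Germany']

encoder = LabelEncoder()
codes = encoder.fit_transform(countries)

# classes_ holds the labels in sorted order; each label's index is its code
mapping = {str(label): code for code, label in enumerate(encoder.classes_)}
print(mapping)         # {'France': 0, 'Germany': 1, 'Spain': 2}
print(codes.tolist())  # [0, 2, 0, 1]

# inverse_transform turns codes back into the original strings
print(encoder.inverse_transform([2, 1]).tolist())  # ['Spain', 'Germany']
```

This matches the output above, where France became 0, Germany 1 and Spain 2. Because the codes follow alphabetical order, they carry no real meaning as magnitudes, which is the motivation for dummy variables.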

https://en.wikiversity.org/wiki/Dummy_variable_(statistics)
Step 4: How do we create dummy variables in Python? (Dummy variable details are in the article.) We will use the same scikit-learn library, but this time we will use another class called ‘OneHotEncoder’, and yes, it is seriously hot. We just need to pass the column number and whoosh, your dummy variables are created.
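This notebook does not include a code cell for Step 4, so here is a minimal sketch of what one-hot encoding a single column could look like. It assumes a recent scikit-learn, where the column number is given to a ‘ColumnTransformer’ rather than to ‘OneHotEncoder’ itself. The toy array `X_demo`, the transformer name 'geo' and the choice of `drop='first'` (which drops one dummy column to avoid redundancy) are assumptions for illustration.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Toy rows: CreditScore, Geography, Age (values are illustrative)
X_demo = np.array([
    [619, 'France', 42],
    [608, 'Spain', 41],
    [772, 'Germany', 39],
], dtype=object)

ct = ColumnTransformer(
    transformers=[('geo', OneHotEncoder(drop='first'), [1])],  # encode column 1 only
    remainder='passthrough',  # keep the other columns unchanged
    sparse_threshold=0,       # always return a dense array
)
X_encoded = ct.fit_transform(X_demo)

# Encoded columns come first (Germany, Spain), then the passthrough columns
print(X_encoded.shape)  # (3, 4)
print(X_encoded)
```

France is the dropped category here, so a French customer is represented by zeros in both dummy columns.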

Step 5: We will use scikit-learn’s ‘train_test_split’ function to divide our data into a training set and a test set. Common train/test split ratios are 80:20, 75:25 and 60:40. Here we use 80:20.

In [7]:
# Splitting the dataset into the Training set and Test set

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2)
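The call above draws a different random split on every run, so the scores reported later will vary slightly between runs. As a hedged sketch of one way to make the split repeatable and keep the class balance equal in both parts, the example below adds `random_state` and `stratify` on made-up data; neither argument is used in the original notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up data: 50 rows, 20% of them in the positive class
X_demo = np.arange(100).reshape(50, 2)
y_demo = np.array([0] * 40 + [1] * 10)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo,
    test_size=0.2,     # 80:20 split
    random_state=0,    # fixed seed so the split is the same on every run
    stratify=y_demo,   # keep the class ratio equal in both parts
)

print(X_tr.shape, X_te.shape)  # (40, 2) (10, 2)
print(y_tr.mean(), y_te.mean())
```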

Step 6: ‘StandardScaler’ is available in scikit-learn. In the following code we fit the StandardScaler on the training data and transform that data in one step. The test data must be scaled in exactly the same way, so we use the same fitted scaler to transform it.

In [8]:
# Feature Scaling
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

# Save the fitted scaler to a pickle file
filename = 'scaler.pickle'
with open(filename, 'wb') as f:
    pickle.dump(sc, f)
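Standardizing means computing z = (x - mean) / std for each column, where the mean and standard deviation come from the training data only. The sketch below checks this by hand on made-up numbers; `train`, `test` and `manual` are illustrative names.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

train = np.array([[600.0], [700.0], [800.0]])
test = np.array([[700.0], [900.0]])

scaler = StandardScaler().fit(train)  # learns mean and std from the training data

# Manual version: the test data is scaled with the *training* statistics
manual = (test - train.mean(axis=0)) / train.std(axis=0)

assert np.allclose(scaler.transform(test), manual)
print(scaler.mean_, scaler.scale_)
```

Fitting a second scaler on the test data would use different statistics, so the model would see test features on a different scale than the one it was trained on.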

# Define Model

In [9]:
log_reg = LogisticRegression(solver='lbfgs')
log_reg.fit(X_train, y_train)

# Save the trained model to a pickle file
filename = 'log_reg_model.pickle'
with open(filename, 'wb') as f:
    pickle.dump(log_reg, f)

# Evaluate Model

In [10]:
y_pred = log_reg.predict(X_test)

In [11]:
# scikit-learn metrics expect (y_true, y_pred): matrix rows are true labels, columns are predictions
print(f'confusion matrix: \n{metrics.confusion_matrix(y_test, y_pred)}')
print(f'accuracy: {metrics.accuracy_score(y_test, y_pred)}')
print(f'recall: {metrics.recall_score(y_test, y_pred):.4f}')

confusion matrix: 
[[1569   54]
 [ 317   60]]
accuracy: 0.8145
recall: 0.1592
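scikit-learn’s metric functions take the true labels as the first argument and the predictions as the second. The sketch below uses made-up labels to show how the confusion matrix cells map to recall and precision, and that swapping the two arguments turns recall into precision. The names `y_true` and `y_hat` are illustrative.

```python
from sklearn import metrics

y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_hat  = [1, 0, 0, 0, 0, 0, 0, 1]

# Rows are true labels, columns are predicted labels
cm = metrics.confusion_matrix(y_true, y_hat)
tn, fp, fn, tp = cm.ravel()

recall = tp / (tp + fn)      # share of actual positives that were found
precision = tp / (tp + fp)   # share of predicted positives that were right

assert abs(recall - metrics.recall_score(y_true, y_hat)) < 1e-12
assert abs(precision - metrics.precision_score(y_true, y_hat)) < 1e-12

# With the arguments swapped, recall_score returns the precision instead
assert abs(metrics.recall_score(y_hat, y_true) - precision) < 1e-12
```

Accuracy is unaffected by the argument order, but recall, precision and the orientation of the confusion matrix are not.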


# Test to see if we can load the model and predict again

In [12]:
filename = 'log_reg_model.pickle'
with open(filename, 'rb') as f:
    model = pickle.load(f)

In [13]:
y_pred = model.predict(X_test)
metrics.accuracy_score(y_test, y_pred)

0.8145
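An API will receive raw values such as 'Germany', so it has to load the encoders and the scaler as well as the model, and apply them in the same order as during training. The sketch below illustrates that flow on a made-up three-column dataset, pickling to memory instead of to files. The helper `predict_one` and all variable names are hypothetical.

```python
import pickle
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Toy training data: CreditScore, Geography, Age (labels are made up)
X_raw = np.array([
    [619, 'France', 42], [608, 'Spain', 41], [502, 'France', 42],
    [699, 'France', 39], [850, 'Spain', 43], [772, 'Germany', 50],
], dtype=object)
y_demo = np.array([1, 0, 1, 0, 0, 1])

# Training side: encode, scale, fit
geo_encoder = LabelEncoder()
X_num = X_raw.copy()
X_num[:, 1] = geo_encoder.fit_transform(X_num[:, 1])
X_num = X_num.astype(float)
scaler = StandardScaler()
clf = LogisticRegression(solver='lbfgs').fit(scaler.fit_transform(X_num), y_demo)

# Serialize and restore every artifact, as a separate serving process would
blobs = {'geo': pickle.dumps(geo_encoder),
         'scaler': pickle.dumps(scaler),
         'model': pickle.dumps(clf)}
geo2 = pickle.loads(blobs['geo'])
scaler2 = pickle.loads(blobs['scaler'])
model2 = pickle.loads(blobs['model'])

def predict_one(credit_score, geography, age):
    """Apply the restored encoder, scaler and model to one raw record."""
    code = geo2.transform([geography])[0]
    row = np.array([[credit_score, code, age]], dtype=float)
    return int(model2.predict(scaler2.transform(row))[0])

prediction = predict_one(640, 'Germany', 45)
print(prediction)
```

Only unpickle files that you created yourself or that come from a trusted source, since loading a pickle can execute arbitrary code.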

### Success!

Time to build the Flask API