# APPLIED MACHINE LEARNING
## LAB ACTIVITIES (LAB 6)
### 07/11/19
## DEEP LEARNING WITH KERAS

## 1. Why Keras?

The TensorFlow project has adopted Keras as the high-level API for the TensorFlow 2.0 release. The biggest reasons to use Keras stem from its guiding principles, primarily the one about being user friendly. Beyond ease of learning and ease of model building, Keras offers the advantages of broad adoption, support for a wide range of production deployment options, integration with at least five backend engines (TensorFlow, CNTK, Theano, MXNet, and PlaidML), and strong support for multiple GPUs and distributed training. Plus, Keras is backed by Google, Microsoft, Amazon, Apple, Nvidia, Uber, and others.

Keras is a lightweight API: rather than providing its own implementation of the mathematical operations needed for deep learning, it provides a consistent interface to efficient numerical libraries called backends. Keras does not do its own low-level operations, such as tensor products and convolutions; it relies on a backend engine for that. Even though Keras supports multiple backend engines, its primary (and default) backend is TensorFlow, and its primary supporter is Google. The Keras API comes packaged in TensorFlow as `tf.keras`, which is the primary TensorFlow API as of TensorFlow 2.0.

In the following labs, we will mainly use Keras with TensorFlow. TensorFlow is an open source library for fast numerical computing. It was created and is maintained by Google and released under the Apache 2.0 open source license. The API is nominally for the Python programming language, although there is access to the underlying C++ API. Unlike other numerical libraries intended for use in deep learning, such as Theano, TensorFlow was designed for use both in research and development and in production systems, not least RankBrain in Google search and the fun DeepDream project. It can run on single CPU systems and GPUs, as well as mobile devices and large-scale distributed systems of hundreds of machines.

Your lab PCs have both TensorFlow and Theano installed, so you can configure the backend used by Keras.

## 2. First Deep Learning with Multi-layered Perceptron (MLP)

## Load Data

Whenever we work with machine learning algorithms that use a stochastic process (e.g. random numbers), it is a good idea to initialise the random number generator with a fixed seed value. This is so that you can run the same code again and again and get the same result. This is useful if you need to demonstrate a result, compare algorithms using the same source of randomness, or debug a part of your code. You can initialise the random number generator with any seed you like, for example:


In [1]:
import os
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID" 
os.environ["CUDA_VISIBLE_DEVICES"] = "2"

import tensorflow as tf
print(tf.__version__)

# Set CPU as available physical device
#my_devices = tf.config.experimental.list_physical_devices(device_type='CPU')
#tf.config.experimental.set_visible_devices(devices= my_devices, device_type='CPU')

# To find out which devices your operations and tensors are assigned to
#tf.debugging.set_log_device_placement(True)

2.0.0


In [11]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Dropout
from tensorflow.keras.utils import to_categorical
import numpy
import pandas as pd
# fix random seed for reproducibility
numpy.random.seed(7)
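
To see what a fixed seed buys you, here is a minimal sketch that uses only Python's standard `random` module as a stand-in for NumPy's generator. The helper name `draw` is made up for this illustration. Note that `numpy.random.seed` only seeds NumPy; in TensorFlow 2.x, TensorFlow's own generator is seeded separately with `tf.random.set_seed`.

```python
import random

def draw(seed, n=5):
    """Return n pseudo-random floats from a generator initialised with seed."""
    rng = random.Random(seed)  # independent generator, does not touch global state
    return [rng.random() for _ in range(n)]

first = draw(7)
second = draw(7)
other = draw(8)

print(first == second)  # same seed, same sequence
print(first == other)   # different seed, different sequence
```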

In [12]:
!pwd

/home/bernard/Documents/BBK/CourseWork/ML/Landsat


In [13]:
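# load the Landsat satellite dataset
alldata = pd.read_csv("sat.all.csv")
traindata = alldata[alldata['TrainTest'] == 'train']
testdata = alldata[alldata['TrainTest'] == 'test']
# split into input and output variables
X = traindata.iloc[:,0:36].to_numpy()
Y = traindata.iloc[:,36].to_numpy()

# shift the class labels down by one so that they start at 0
for i in range(1,8):
    Y[Y==i] = i-1
# map the remaining label 6 to 5 so the classes form the contiguous range 0..5
Y[Y==6] = 5

# one-hot encode the 6 classes
Y = to_categorical(Y, 6)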
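
As a rough illustration of what the relabelling and the `to_categorical` call produce, the sketch below re-implements both steps in plain Python. The sample labels are made up, and `relabel` and `one_hot` are hypothetical helpers, not Keras functions. The relabelling here maps the sorted distinct labels onto 0..k-1, which gives the same result as the cell above when the labels are 1 to 5 and 7.

```python
def relabel(labels):
    """Map sorted distinct labels onto 0..k-1 (e.g. 1,2,3,4,5,7 -> 0..5)."""
    mapping = {old: new for new, old in enumerate(sorted(set(labels)))}
    return [mapping[label] for label in labels]

def one_hot(indices, num_classes):
    """Turn class indices into rows with a single 1 at the class position."""
    return [[1 if j == i else 0 for j in range(num_classes)] for i in indices]

raw = [1, 7, 3, 2, 5, 4]          # made-up labels with a gap at 6
indices = relabel(raw)            # [0, 5, 2, 1, 4, 3]
encoded = one_hot(indices, 6)

print(indices)
print(encoded[1])                 # [0, 0, 0, 0, 0, 1]
```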


## Define Model

Models in Keras are defined as a sequence of layers. We create a `Sequential` model and add layers one at a time until we are happy with our network topology. The first thing to get right is to ensure the input layer has the right number of inputs. This can be specified when creating the first layer with the `input_dim` argument, here set to 36 for the 36 input variables.

How do we know the number of layers to use and their types? This is a very hard question. There are heuristics that we can use, and often the best network structure is found through a process of trial-and-error experimentation. Generally, you need a network large enough to capture the structure of the problem. In this example we will use a fully-connected network structure with four hidden layers and an output layer.

Fully connected layers are defined using the `Dense` class. We can specify the number of neurons in the layer as the first argument and specify the activation function using the `activation` argument. We will use the rectifier (`relu`) activation function on the hidden layers and the `softmax` activation function in the output layer. It used to be the case that `sigmoid` and `tanh` activation functions were preferred for all layers. These days, better performance is seen using the `relu` activation function. The `softmax` output layer has one neuron per class and ensures that the 6 outputs are each between 0 and 1 and sum to 1, so they can be read as class probabilities. For a binary problem, such as the Pima Indians examples later in this lab, a single `sigmoid` output neuron plays the same role: its output is between 0 and 1 and is easy to map to either a probability of class 1 or snap to a hard classification of either class with a default threshold of 0.5. We can piece it all together by adding each layer. The first hidden layer has 36 neurons and expects 36 input variables (`input_dim=36`). The following hidden layers have 30, 24 and 18 neurons, with a `Dropout` layer (rate 0.2) after the third, and finally the output layer has 6 neurons, one per class.
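
The two output activations mentioned here can be sketched in a few lines of plain Python. This is only an illustration of the formulas with made-up scores, not how Keras computes them internally.

```python
import math

def sigmoid(z):
    """Squash a single score into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def softmax(scores):
    """Turn a list of scores into probabilities that sum to 1."""
    shifted = [s - max(scores) for s in scores]  # subtract max for numerical stability
    exps = [math.exp(s) for s in shifted]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1, 0.0, -1.0, -2.0])  # six made-up class scores
print(round(sum(probs), 6))        # 1.0
print(probs.index(max(probs)))     # 0, the class with the largest score
print(sigmoid(0.0))                # 0.5, the usual decision threshold
```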

In [15]:
# create model
model = Sequential()
model.add(Dense(36, input_dim=36, activation='relu')) 
model.add(Dense(30, activation='relu')) 
model.add(Dense(24, activation='relu')) 
model.add(Dropout(0.2))
model.add(Dense(18, activation='relu')) 
model.add(Dense(6, activation='softmax'))
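
To get a feel for the size of this network, the trainable parameters of each `Dense` layer can be counted by hand: every neuron has one weight per input plus one bias. A small sketch, with layer sizes copied from the cell above (`Dropout` adds no parameters); the helper `dense_params` is made up for this illustration.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

layer_sizes = [36, 36, 30, 24, 18, 6]  # input width followed by each Dense layer
counts = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]

print(counts)       # [1332, 1110, 744, 450, 114]
print(sum(counts))  # 3750
```

The total should agree with the trainable parameter count reported by `model.summary()`.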

The figure below provides a depiction of the network structure. 

Visualisation of Neural Network Structure.

## Compile Model

Now that the model is defined, we can compile it. Compiling the model uses the efficient numerical libraries under the covers (i.e. the backend) such as TensorFlow. The backend automatically chooses the best way to represent the network for training and making predictions to run on your hardware. When compiling, we must specify some additional properties required when training the network. Remember that training a network means finding the best set of weights to make predictions for this problem.

We must specify the loss function to use to evaluate a set of weights, the optimizer used to search through different weights for the network, and any optional metrics we would like to collect and report during training. In this case we will use logarithmic loss, which for a multi-class classification problem with one-hot encoded labels is defined in Keras as `categorical_crossentropy` (the binary examples later in this lab use `binary_crossentropy`). We will also use the efficient gradient descent algorithm `adam` for no other reason than that it is an efficient default. Learn more about the Adam optimisation algorithm in the paper Adam: A Method for Stochastic Optimization. See below.

https://arxiv.org/abs/1412.6980

Finally, because it is a classification problem, we will collect and report the classification accuracy as the metric.

In [16]:
# compile model
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
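
For intuition about the loss named in this cell, here is a minimal plain-Python sketch of categorical cross-entropy averaged over samples. The targets and predictions are made up, and the `eps` clipping value is an assumption added to avoid `log(0)`; Keras's own implementation may differ in its details.

```python
import math

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean over samples of -sum(true * log(pred))."""
    losses = []
    for true_row, pred_row in zip(y_true, y_pred):
        loss = -sum(t * math.log(max(p, eps)) for t, p in zip(true_row, pred_row))
        losses.append(loss)
    return sum(losses) / len(losses)

y_true = [[0, 1, 0], [1, 0, 0]]              # one-hot targets (made up)
y_pred = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2]]  # predicted probabilities (made up)

print(categorical_crossentropy(y_true, y_pred))  # (-ln 0.7 - ln 0.5) / 2
```

Only the probability assigned to the true class contributes to each sample's loss, so confident correct predictions drive the loss towards 0.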

## Fit Model

We have defined our model and compiled it ready for efficient computation. Now it is time to execute the model on some data. We can train or fit our model on our loaded data by calling the `fit()` function on the model.

The training process will run for a fixed number of iterations through the dataset called epochs, which we must specify using the `epochs` argument. We can also set the number of instances that are evaluated before a weight update in the network is performed, called the batch size and set using the `batch_size` argument. For this problem we will run for 500 epochs and use a batch size of 256. Again, these can be chosen experimentally by trial and error.
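
To make the relationship between epochs, batch size and weight updates concrete, the arithmetic can be sketched as follows. The values are taken from the `fit()` call and the training log in this section.

```python
import math

n_samples = 4435   # from the "Train on 4435 samples" line in the training log
batch_size = 256
epochs = 500

updates_per_epoch = math.ceil(n_samples / batch_size)  # the last batch is smaller
total_updates = updates_per_epoch * epochs

print(updates_per_epoch)  # 18
print(total_updates)      # 9000
```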

In [17]:
# fit the model
model.fit(X,Y, epochs=500, batch_size=256)

Train on 4435 samples
Epoch 1/500
Epoch 2/500
Epoch 3/500
...
Epoch 76/500

<tensorflow.python.keras.callbacks.History at 0x7f057445f908>

## Evaluate Model

We have trained our neural network on the training portion of the dataset and we can evaluate the performance of the network on that same data. This will only give us an idea of how well we have modelled the dataset (e.g. train accuracy), but no idea of how well the algorithm might perform on new data. We have done this for simplicity, but ideally you would evaluate on data held out from training, such as the rows loaded into `testdata` earlier.

You can evaluate your model on your training dataset using the `evaluate()` function on your model and pass it the same input and output used to train the model. This will generate a prediction for each input and output pair and collect scores, including the average loss and any metrics you have configured, such as accuracy.

In [9]:
# evaluate the model
scores = model.evaluate(X, Y)
print("\n%s: %.2f%%" % (model.metrics_names[1], scores[1]*100))


accuracy: 79.95%
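
For one-hot targets, the accuracy metric amounts to comparing the position of the largest predicted probability with the position of the 1 in the target row. A minimal sketch with made-up data; `argmax` and `categorical_accuracy` are helpers written for this illustration, not the Keras functions.

```python
def argmax(row):
    """Index of the largest value in a list."""
    return max(range(len(row)), key=row.__getitem__)

def categorical_accuracy(y_true, y_pred):
    """Fraction of rows where the predicted class equals the true class."""
    hits = sum(argmax(t) == argmax(p) for t, p in zip(y_true, y_pred))
    return hits / len(y_true)

y_true = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]]   # made-up one-hot targets
y_pred = [[0.8, 0.1, 0.1], [0.2, 0.5, 0.3],
          [0.6, 0.3, 0.1], [0.1, 0.7, 0.2]]             # made-up probabilities

print("accuracy: %.2f%%" % (categorical_accuracy(y_true, y_pred) * 100))  # accuracy: 75.00%
```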


## Data Splitting

The large amount of data and the complexity of the models require very long training times. As such, it is typical to use a simple separation of data into training and test datasets or training and validation datasets. Keras provides two convenient ways of evaluating your deep learning algorithms this way:

1. Use an automatic verification dataset.

2. Use a manual verification dataset.

### Automatic Verification Dataset.
Keras can separate a portion of your training data into a validation dataset and evaluate the performance of your model on that validation dataset each epoch. You can do this by setting the `validation_split` argument on the `fit()` function to a fraction of the size of your training dataset. For example, a reasonable value might be 0.2 or 0.33 for 20% or 33% of your training data held back for validation. The code below demonstrates the use of an automatic validation dataset on the Pima Indians onset of diabetes dataset.
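
As the Keras documentation describes it, the held-back rows are taken from the end of the arrays before any shuffling. The sketch below imitates that slicing in plain Python; `tail_split` is a hypothetical helper, not Keras's actual code, and the 768 rows are inferred from the 514 + 254 samples reported in the training log for this example.

```python
def tail_split(samples, validation_split):
    """Hold back the last fraction of samples, with no shuffling."""
    split_at = int(len(samples) * (1.0 - validation_split))
    return samples[:split_at], samples[split_at:]

rows = list(range(768))            # 514 + 254 rows, as in the training log
train, val = tail_split(rows, 0.33)

print(len(train), len(val))        # 514 254
print(val[0])                      # 514, validation rows come from the end
```

Because no shuffling happens before the cut, data that is sorted by class or by time should be shuffled first, otherwise the validation set will not be representative.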

In [10]:
# MLP with automatic validation set
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
import numpy

# fix random seed for reproducibility
numpy.random.seed(7)

# load pima indians dataset
dataset = numpy.loadtxt("../../Data/AML/pima-indians-diabetes.data.csv", delimiter=",")

# split into input (X) and output (Y) variables
X = dataset[:,0:8]
Y = dataset[:,8]

# create model
model = Sequential()
model.add(Dense(12, input_dim=8, activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(1, activation='sigmoid'))

# Compile model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy']) 

# Fit the model
model.fit(X, Y, validation_split=0.33, epochs=150, batch_size=10)

Train on 514 samples, validate on 254 samples
Epoch 1/150
Epoch 2/150
Epoch 3/150
...
Epoch 74/150
...

<tensorflow.python.keras.callbacks.History at 0x7f71c83d2978>

### Manual Verification Dataset.
Keras also allows you to manually specify the dataset to use for validation during training. In this example we use the handy `train_test_split()` function from the Python scikit-learn machine learning library to separate our data into a training and test dataset. We use 67% for training and the remaining 33% of the data for validation. The validation dataset can be specified to the `fit()` function in Keras by the `validation_data` argument. It takes a tuple of the input and output datasets.
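
The idea behind a shuffled hold-out split can be sketched in plain Python as follows. This is a simplified stand-in, not scikit-learn's implementation: `shuffled_split` is a made-up helper, the shuffled order will differ from what scikit-learn produces for the same seed, and the 768 rows are inferred from the sample counts in the training log.

```python
import math
import random

def shuffled_split(n_samples, test_size, seed):
    """Shuffle row indices, then cut off a test portion of the given size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = math.ceil(n_samples * test_size)
    return indices[n_test:], indices[:n_test]

train_idx, test_idx = shuffled_split(768, 0.33, seed=7)

print(len(train_idx), len(test_idx))        # 514 254
print(set(train_idx) & set(test_idx))       # set(), no row is in both parts
```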

In [11]:
# MLP with manual validation set
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from sklearn.model_selection import train_test_split
import numpy

# fix random seed for reproducibility
seed = 7
numpy.random.seed(seed)

# load pima indians dataset
dataset = numpy.loadtxt("../../Data/AML/pima-indians-diabetes.data.csv", delimiter=",")

# split into input (X) and output (Y) variables
X = dataset[:,0:8]
Y = dataset[:,8]

# split into 67% for train and 33% for test
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.33, random_state=seed)

# create model
model = Sequential()
model.add(Dense(12, input_dim=8, activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(1, activation='sigmoid'))

# Compile model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy']) 

# Fit the model
model.fit(X_train, y_train, validation_data=(X_test,y_test), epochs=150, batch_size=10)

Train on 514 samples, validate on 254 samples
Epoch 1/150
Epoch 2/150
Epoch 3/150
...
Epoch 74/150
...

<tensorflow.python.keras.callbacks.History at 0x7f7140349470>

### Manual k-Fold Cross-Validation.

The gold standard for machine learning model evaluation is k-fold cross-validation. It provides a robust estimate of the performance of a model on unseen data. However, cross-validation is often not used for evaluating deep learning models because of the greater computational expense. For example, k-fold cross-validation is often used with 5 or 10 folds. As such, 5 or 10 models must be constructed and evaluated, greatly adding to the evaluation time of a model. Nevertheless, when the problem is small enough or if you have sufficient compute resources, k-fold cross-validation can give you a less biased estimate of the performance of your model.

In the example below we use the handy `StratifiedKFold` class from the scikit-learn Python machine learning library to split up the training dataset into 10 folds. See below.

https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.StratifiedKFold.html

The folds are stratified, meaning that the algorithm attempts to balance the number of instances of each class in each fold. The example creates and evaluates 10 models using the 10 splits of the data and collects all of the scores. The verbose output for each epoch is turned off by passing `verbose=0` to the `fit()` and `evaluate()` functions on the model. The performance of each model is printed and stored. The average and standard deviation of the model performance are then printed at the end of the run to provide a robust estimate of model accuracy.
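
To illustrate what stratification means, the sketch below deals the indices of each class out to the folds in round-robin fashion, so every fold receives roughly the same class mix. It is a simplified illustration with made-up labels and no shuffling; `stratified_folds` is a hypothetical helper and not how scikit-learn implements `StratifiedKFold`.

```python
from collections import Counter, defaultdict

def stratified_folds(labels, n_splits):
    """Assign each sample index to a fold, dealing each class out round-robin."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(n_splits)]
    for indices in by_class.values():
        for position, index in enumerate(indices):
            folds[position % n_splits].append(index)
    return folds

labels = [0] * 20 + [1] * 10       # made-up, imbalanced two-class labels
folds = stratified_folds(labels, n_splits=5)

for fold in folds:
    print(Counter(labels[i] for i in fold))  # each fold: four 0s and two 1s
```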

In [12]:
# MLP for Pima Indians Dataset with 10-fold cross validation
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from sklearn.model_selection import StratifiedKFold
import numpy

# fix random seed for reproducibility
seed = 7
numpy.random.seed(seed)
# load pima indians dataset
dataset = numpy.loadtxt("../../Data/AML/pima-indians-diabetes.data.csv", delimiter=",")

# split into input (X) and output (Y) variables
X = dataset[:,0:8]
Y = dataset[:,8]

# define 10-fold cross validation test harness
kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
cvscores = []
for train, test in kfold.split(X, Y):
    # create model
    model = Sequential()
    model.add(Dense(12, input_dim=8, activation='relu'))
    model.add(Dense(8, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    # compile model
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy']) 
    # fit the model
    model.fit(X[train], Y[train], epochs=150, batch_size=10, verbose=0)
    # evaluate the model
    scores = model.evaluate(X[test],Y[test], verbose=0)
    print("%s: %.2f%%" % (model.metrics_names[1], scores[1]*100)) 
    cvscores.append(scores[1] * 100)
    
print("%.2f%% (+/-%.2f%%)" % (numpy.mean(cvscores), numpy.std(cvscores)))

accuracy: 76.62%
accuracy: 71.43%
accuracy: 74.03%
accuracy: 76.62%
accuracy: 66.23%
accuracy: 72.73%
accuracy: 72.73%
accuracy: 67.53%
accuracy: 69.74%
accuracy: 75.00%
72.27% (+/-3.39%)
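
The summary line can be reproduced from the ten per-fold accuracies using only the standard library. Note that `numpy.std` computes the population standard deviation by default, which corresponds to `statistics.pstdev` rather than `statistics.stdev`.

```python
import statistics

# per-fold accuracies copied from the output above
cvscores = [76.62, 71.43, 74.03, 76.62, 66.23, 72.73, 72.73, 67.53, 69.74, 75.00]

mean = statistics.mean(cvscores)
std = statistics.pstdev(cvscores)   # population std, like numpy.std's default

print("%.2f%% (+/-%.2f%%)" % (mean, std))  # 72.27% (+/-3.39%)
```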


## 3. Use Keras with Scikit-Learn

The scikit-learn library is the most popular library for general machine learning in Python, while Keras is a popular library for deep learning in Python. However, the focus of the Keras library is deep learning, not all of machine learning. In fact it strives for minimalism, focusing on only what you need to quickly and simply define and build deep learning models. The scikit-learn library in Python is built upon the SciPy stack for efficient numerical computation. It is a fully featured library for general purpose machine learning and provides many utilities that are useful in the development of deep learning models. Not least:

• Evaluation of models using resampling methods like k-fold cross-validation.

• Efficient search and evaluation of model hyperparameters.

The Keras library provides a convenient wrapper for deep learning models to be used as classification or regression estimators in scikit-learn. In this section we will work through examples of using the `KerasClassifier` wrapper for a classification neural network created in Keras and used in the scikit-learn library.

### Evaluate Models with Cross-Validation

The `KerasClassifier` and `KerasRegressor` classes in Keras take an argument `build_fn`, which is the name of the function to call to create your model. You must define a function, called whatever you like, that defines your model, compiles it and returns it. In the example below we define a function `create_model()` that creates a simple multilayer neural network for the problem. We pass this function name to the `KerasClassifier` class by the `build_fn` argument. We also pass in additional arguments of `epochs=150` and `batch_size=10`. These are automatically bundled up and passed on to the `fit()` function, which is called internally by the `KerasClassifier` class. In this example we use the scikit-learn `StratifiedKFold` to perform 10-fold stratified cross-validation. This is a resampling technique that can provide a robust estimate of the performance of a machine learning model on unseen data. We use the scikit-learn function `cross_val_score()` to evaluate our model using the cross-validation scheme and print the results.

In [13]:
# MLP for Pima Indians Dataset with 10-fold cross validation via sklearn
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import StratifiedKFold
from sklearn.model_selection import cross_val_score
import numpy

# create a function to build a model, required for KerasClassifier
def create_model():
    # create model
    model = Sequential()
    model.add(Dense(12, input_dim=8, activation='relu')) 
    model.add(Dense(8, activation='relu')) 
    model.add(Dense(1, activation='sigmoid'))
    # compile model
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy']) 
    return model

# fix random seed for reproducibility
seed = 7
numpy.random.seed(seed)

# load pima indians dataset
dataset = numpy.loadtxt("../../Data/AML/pima-indians-diabetes.data.csv", delimiter=",")
# split into input (X) and output (Y) variables
X = dataset[:,0:8]
Y = dataset[:,8]

# create model
model = KerasClassifier(build_fn=create_model, epochs=150, batch_size=10, verbose=0)

# evaluate using 10-fold cross validation
kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
results = cross_val_score(model, X, Y, cv=kfold)
print(results.mean())


0.7226418316364288
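
The way the wrapper bundles up extra arguments and forwards them to `fit()` can be imitated with a toy class. Everything in this sketch is hypothetical: `RecordingModel` stands in for a compiled Keras model and `TinyClassifierWrapper` is a stripped-down imitation of the pattern, not the real `KerasClassifier`.

```python
class RecordingModel:
    """Stand-in for a compiled Keras model; it only records how fit() was called."""
    def __init__(self):
        self.fit_kwargs = None

    def fit(self, X, y, **kwargs):
        self.fit_kwargs = kwargs
        return self

class TinyClassifierWrapper:
    """Hypothetical, stripped-down imitation of the build_fn pattern."""
    def __init__(self, build_fn, **fit_params):
        self.build_fn = build_fn
        self.fit_params = fit_params   # e.g. epochs, batch_size, verbose
        self.model_ = None

    def fit(self, X, y):
        self.model_ = self.build_fn()              # fresh model for every fit
        self.model_.fit(X, y, **self.fit_params)   # stored arguments forwarded
        return self

wrapper = TinyClassifierWrapper(build_fn=RecordingModel, epochs=150, batch_size=10, verbose=0)
wrapper.fit([[0.0], [1.0]], [0, 1])
print(wrapper.model_.fit_kwargs)  # {'epochs': 150, 'batch_size': 10, 'verbose': 0}
```

Building a fresh model inside every `fit()` call is what lets cross-validation train each fold from scratch instead of continuing from the previous fold's weights.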


### Grid Search Deep Learning Model Parameters

The previous example showed how easy it is to wrap your deep learning model from Keras and use it in functions from the scikit-learn library. In this example we go a step further. We already know we can provide arguments to the `fit()` function. The function that we specify to the `build_fn` argument when creating the `KerasClassifier` wrapper can also take arguments. We can use these arguments to further customise the construction of the model.

In this example we use a grid search to evaluate different configurations for our neural network model and report on the combination that provides the best estimated performance. The `create_model()` function is defined to take two arguments, `optimizer` and `init`, both of which must have default values. This will allow us to evaluate the effect of using different optimisation algorithms and weight initialisation schemes for our network. After creating our model, we define arrays of values for the parameters we wish to search, specifically:

• Optimizers for searching different weight values.

• Initializers for preparing the network weights using different schemes.

• Number of epochs for training the model for different numbers of exposures to the training dataset.

• Batches for varying the number of samples before weight updates.

The options are specified in a dictionary and passed to the configuration of the `GridSearchCV` scikit-learn class. This class will evaluate a version of our neural network model for each combination of parameters (2 × 3 × 3 × 3 = 54 combinations of optimizers, initializations, epochs and batches). Each combination is then evaluated using the default cross-validation of `GridSearchCV`, which is 3-fold stratified cross-validation in scikit-learn releases before 0.22 (later releases default to 5-fold).

That is a lot of models and a lot of computation. This is not a scheme that you want to use lightly because of the time it will take to compute. It may be useful for you to design small experiments with a smaller subset of your data that will complete in a reasonable time. This experiment is reasonable in this case because of the small network and the small dataset (fewer than 1,000 instances and 9 attributes). Finally, the performance and combination of configurations for the best model are displayed, followed by the performance of all combinations of parameters.
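
The size of the search can be checked with `itertools.product` before committing to it. The value lists are copied from the cell below; `n_folds = 3` is an assumption that follows the 3-fold default mentioned in the text.

```python
from itertools import product

optimizers = ['rmsprop', 'adam']
inits = ['glorot_uniform', 'normal', 'uniform']
epochs = [50, 100, 150]
batches = [5, 10, 20]

combinations = list(product(optimizers, inits, epochs, batches))
n_folds = 3  # assumption: the 3-fold default mentioned in the text

print(len(combinations))            # 54
print(len(combinations) * n_folds)  # 162 model fits
```

With `refit` left at its default of `True`, `GridSearchCV` additionally trains the best configuration once more on the full dataset.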

In [14]:
# MLP for Pima Indians Dataset with grid search via sklearn
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import GridSearchCV
import numpy

# create a function to build a model, required for KerasClassifier
def create_model(optimizer='rmsprop', init='glorot_uniform'):
    # create model
    model = Sequential()
    model.add(Dense(12, input_dim=8, kernel_initializer=init, activation='relu')) 
    model.add(Dense(8, kernel_initializer=init, activation='relu')) 
    model.add(Dense(1, kernel_initializer=init, activation='sigmoid'))
    # compile model
    model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['accuracy']) 
    return model

# fix random seed for reproducibility
seed = 7
numpy.random.seed(seed)

# load pima indians dataset
dataset = numpy.loadtxt("../../Data/AML/pima-indians-diabetes.data.csv", delimiter=",")
# split into input (X) and output (Y) variables
X = dataset[:,0:8]
Y = dataset[:,8]

# create model
model = KerasClassifier(build_fn=create_model, verbose=0)
# grid search epochs, batch size, optimizer and weight initialisation
optimizers = ['rmsprop', 'adam']
inits = ['glorot_uniform', 'normal', 'uniform']
epochs = [50, 100, 150]
batches = [5, 10, 20]
param_grid = dict(optimizer=optimizers, epochs=epochs, batch_size=batches, init=inits) 
grid = GridSearchCV(estimator=model, param_grid=param_grid)
grid_result = grid.fit(X, Y)

# summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))



Best: 0.752604 using {'batch_size': 20, 'epochs': 150, 'init': 'uniform', 'optimizer': 'adam'}
0.691406 (0.003189) with: {'batch_size': 5, 'epochs': 50, 'init': 'glorot_uniform', 'optimizer': 'rmsprop'}
0.656250 (0.043848) with: {'batch_size': 5, 'epochs': 50, 'init': 'glorot_uniform', 'optimizer': 'adam'}
0.694010 (0.016367) with: {'batch_size': 5, 'epochs': 50, 'init': 'normal', 'optimizer': 'rmsprop'}
0.709635 (0.022628) with: {'batch_size': 5, 'epochs': 50, 'init': 'normal', 'optimizer': 'adam'}
0.692708 (0.018136) with: {'batch_size': 5, 'epochs': 50, 'init': 'uniform', 'optimizer': 'rmsprop'}
0.695312 (0.016573) with: {'batch_size': 5, 'epochs': 50, 'init': 'uniform', 'optimizer': 'adam'}
0.680990 (0.051855) with: {'batch_size': 5, 'epochs': 100, 'init': 'glorot_uniform', 'optimizer': 'rmsprop'}
0.696615 (0.016367) with: {'batch_size': 5, 'epochs': 100, 'init': 'glorot_uniform', 'optimizer': 'adam'}
0.739583 (0.033502) with: {'batch_size': 5, 'epochs': 100, 'init': 'normal', 'opt