<h1>Table of contents</h1>

<div class="alert alert-block alert-info" style="margin-top: 20px">
    <ol>
        <li><a href="#practical_plan">Practical Overview</a></li>
        <li><a href="#read_data">Preparation: importing packages and loading data</a></li>
        <li><a href="#keras_naive">Building a Feedforward Classifier Using Keras</a></li>
        <li><a href="#keras_tuned">Your Task: Using Different Hyperparameters </a></li>
    </ol>
</div>
<br>
<hr>


<h2 id="practical_plan">Practical and Data Overview</h2>

- <b>Aim:</b> use Keras to build a simple neural network for classification.
    - Keras is a powerful, easy-to-use, free and open-source Python library for developing and evaluating deep learning models.
    - Keras is now part of TensorFlow.
    - Keras wraps TensorFlow's efficient numerical computation routines and lets you define and train neural network models in just a few lines of code by specifying the design of the network's layers.
    - Keras was built by François Chollet. Book: https://www.manning.com/books/deep-learning-with-python-second-edition
    - In this tutorial, you will discover how to create your first deep learning neural network model in Python using Keras.
    - We will again use the Pima Indians diabetes dataset in this tutorial.
- <b>Prerequisites: </b>

    - You need TensorFlow to run this tutorial. You should have installed TensorFlow during a previous practical. If you missed that session, please refer to Jeff Heaton's YouTube videos for instructions:

        - Windows 10: https://www.youtube.com/watch?v=RgO8BBNGB8w&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN

        - macOS: https://www.youtube.com/watch?v=MpUvdLD932c&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN

    - Alternatively, you can use Google Colab: https://colab.research.google.com/

- <b> Practical Steps: </b> 
    - Load Data.
    - Define Keras Model.
    - Compile Keras Model.
    - Fit Keras Model.
    - Evaluate Keras Model.
    - Make Predictions.
    - Repeat the same steps with a more tuned model and examine the performance difference. 


<h2 id="read_data">Loading the data into appropriate variables. </h2>

As in the first practical, we can use numpy directly to load the dataset and split it into two arrays: X (a 2D array/matrix of features) and y (a 1D array of labels). We need to import the loadtxt function from numpy.


In [29]:
from numpy import loadtxt
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.metrics import classification_report


dataset = loadtxt('pima-indians-diabetes.csv', delimiter=',')

# split into input (X) and output (y) variables
X = dataset[:,0:8]
y = dataset[:,8]

x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.1)

X.shape


Note that the dataset has 9 columns: the range 0:8 selects columns 0 to 7, stopping before index 8, and column 8 holds the class label. If this is new to you, you can learn more about array slicing and ranges in this blog post:

https://machinelearningmastery.com/index-slice-reshape-numpy-arrays-machine-learning-python/
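
To make the slicing concrete, here is a minimal sketch on a toy array with the same 9-column layout. The array `toy` and its values are made up for illustration; only the two slicing expressions mirror the cell above.

```python
import numpy as np

# Hypothetical stand-in for the loaded dataset: 3 rows, 9 columns (values 0..26).
toy = np.arange(27).reshape(3, 9)

# Columns 0..7 are the inputs; the stop index 8 is excluded.
X_toy = toy[:, 0:8]
# Column 8 on its own is the output, returned as a 1D array.
y_toy = toy[:, 8]

print(X_toy.shape)  # (3, 8)
print(y_toy)        # [ 8 17 26]
```
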

<h2 id="keras_naive">Building a Feed-forward Classifier with Keras</h2>

- We are now ready to define our neural network model.
- Models in Keras are defined as a sequence of layers.

- We create a Sequential model (feed forward network) and add layers one at a time until we are happy with our network architecture.
- The first thing to get right is to ensure the input layer has the right number of input features. This can be specified when creating the first layer with the input_dim argument and setting it to 8 for the 8 input variables.

- How do we know the number of layers and their types?

- This is a very hard question. There are heuristics that we can use and often the best network structure is found through a process of trial and error experimentation. Generally, you need a network large enough to capture the structure of the problem.

- In this example, we will use a fully-connected network structure with three layers.

- Fully connected layers are defined using the Dense class. We can specify the number of neurons or nodes in the layer as the first argument, and specify the activation function using the activation argument.

- We will use the rectified linear unit (ReLU) activation function on the first two layers and the sigmoid function in the output layer.

- It used to be the case that Sigmoid and Tanh activation functions were preferred for all layers. These days, better performance is achieved using the ReLU activation function. We use a sigmoid on the output layer to ensure our network output is between 0 and 1 and easy to map to either a probability of class 1 or snap to a hard classification of either class with a default threshold of 0.5.

- We can piece it all together by adding each layer:
    - The model expects rows of data with 8 variables (the input_dim=8 argument)
    - The first hidden layer has 20 nodes and uses the relu activation function.
    - The second hidden layer has 10 nodes and uses the relu activation function.
    - The output layer has one node and uses the sigmoid activation function.
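
As a rough illustration of what such a stack of layers computes, the sketch below runs a forward pass in plain NumPy. It is not how Keras implements it internally. The random weights, the made-up input rows, the helper names `relu`, `sigmoid` and `dense`, and the hidden sizes 20 and 10 (taken from the model-building cell further down) are all assumptions for illustration.

```python
import numpy as np

def relu(z):
    # Rectified linear unit: negative values become 0.
    return np.maximum(0.0, z)

def sigmoid(z):
    # Squashes any real number into the open interval (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def dense(x, weights, bias, activation):
    # A fully connected layer: weighted sum plus bias, then the activation.
    return activation(x @ weights + bias)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))  # 4 made-up rows with 8 input features

# Untrained, randomly initialised weights (illustrative only).
w1, b1 = rng.normal(scale=0.1, size=(8, 20)), np.zeros(20)
w2, b2 = rng.normal(scale=0.1, size=(20, 10)), np.zeros(10)
w3, b3 = rng.normal(scale=0.1, size=(10, 1)), np.zeros(1)

hidden1 = dense(x, w1, b1, relu)
hidden2 = dense(hidden1, w2, b2, relu)
probs = dense(hidden2, w3, b3, sigmoid)

print(probs.shape)  # one value per input row
```

Each row ends up as a single number strictly between 0 and 1, which is what makes the 0.5 threshold meaningful.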


##### First, we need to import the layers from the keras library

In [38]:
from keras.models import Sequential  ## We are building a sequential (feed-forward) model
from keras.layers import Dense  ## Dense layers (we are adding fully connected layers)
from keras.wrappers.scikit_learn import KerasClassifier  ## scikit-learn wrapper, used later for the grid search

In [18]:
# Create the keras model and add layers one by one, indicating the number of neurons and activation function. 

model = Sequential()
model.add(Dense(20, input_dim=8, activation='relu'))  ## 20 = number of neurons, input_dim = number of features
model.add(Dense(10, activation='relu'))
model.add(Dense(1, activation='sigmoid'))

<b>Note</b>: the most confusing thing here is that the shape of the input to the model is defined as an argument on the first hidden layer. This means that the line of code that adds the first Dense layer is doing two things: defining the input (or visible) layer and the first hidden layer.
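
One way to check your understanding of the architecture is to count its trainable parameters by hand. The sketch below assumes the layer sizes from the cell above (8 inputs, then 20, 10 and 1 neurons); each Dense layer has one weight per input-output pair plus one bias per neuron. The variable names are illustrative.

```python
# Layer widths: 8 input features, then the three Dense layers.
layer_sizes = [8, 20, 10, 1]

# Weights (n_in * n_out) plus biases (n_out) for each consecutive pair of layers.
params_per_layer = [
    n_in * n_out + n_out
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
]
total_params = sum(params_per_layer)

print(params_per_layer)  # [180, 210, 11]
print(total_params)      # 401
```

You can compare this against the totals printed by `model.summary()`.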

####  Compile Keras Model
Now that the model is defined, we can compile it.

Compiling the model uses the efficient numerical libraries under the covers (the so-called backend) such as Theano or TensorFlow. The backend automatically chooses the best way to represent the network for training and making predictions to run on your hardware, such as CPU or GPU or even distributed.

When compiling, we must specify some additional properties required when training the network. Remember that training a network means finding the best set of weights and biases to map inputs to outputs in our dataset.

We must specify the loss function (i.e. the error) used to evaluate a set of weights, the optimizer used to search through different weights for the network, and any optional metrics we would like to collect and report during training.

In this case, we will use cross entropy as the loss argument. This loss is for binary classification problems and is defined in Keras as <b>binary_crossentropy</b>. You can learn more about choosing loss functions based on your problem here:

https://machinelearningmastery.com/how-to-choose-loss-functions-when-training-deep-learning-neural-networks/
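
To see what this loss rewards and punishes, here is a small NumPy sketch of the binary cross-entropy formula, averaged over samples. It is a hand-written illustration rather than the Keras implementation; the function name, the clipping constant `eps` and the example probabilities are assumptions.

```python
import numpy as np

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    # Clip probabilities away from 0 and 1 so the logarithm stays finite.
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.clip(np.asarray(y_prob, dtype=float), eps, 1.0 - eps)
    per_sample = -(y_true * np.log(y_prob) + (1.0 - y_true) * np.log(1.0 - y_prob))
    return float(np.mean(per_sample))

# Two samples with true labels 1 and 0.
confident_right = binary_crossentropy([1, 0], [0.9, 0.1])
confident_wrong = binary_crossentropy([1, 0], [0.1, 0.9])

print(round(confident_right, 4))  # 0.1054
print(round(confident_wrong, 4))  # 2.3026
```

Confident correct predictions give a loss near 0, while confident wrong predictions are penalised heavily.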

We will define the optimizer as the efficient stochastic gradient descent algorithm <b>adam</b>. This is a popular version of gradient descent because it automatically tunes itself and gives good results in a wide range of problems. To learn more about the Adam version of stochastic gradient descent see the post:

https://machinelearningmastery.com/adam-optimization-algorithm-for-deep-learning/

Finally, because it is a classification problem, we will collect and report a classification metric during training, defined via the <b>metrics</b> argument. The cell below tracks precision; <b>'accuracy'</b> is a common alternative.

The code to compile the network becomes:

In [20]:
# compile the keras model
import tensorflow as tf
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=[tf.keras.metrics.Precision()])
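
The `adam` optimizer named in the compile call keeps running averages of the gradient and of its square, and uses them to scale each update. The sketch below applies the standard Adam update rule to a toy one-dimensional problem, minimising (w - 3)^2. It is a simplified illustration, not the Keras optimizer itself; the function name, the toy objective and the learning rate of 0.1 are assumptions.

```python
import math

def adam_minimise(grad, w0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=2000):
    w, m, v = w0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        # Running averages of the gradient and the squared gradient.
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        # Bias correction for the zero initialisation of m and v.
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        # Step size is scaled by the recent gradient magnitude.
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w

# Gradient of (w - 3)^2 is 2 * (w - 3); the minimum is at w = 3.
w_star = adam_minimise(lambda w: 2.0 * (w - 3.0), w0=0.0)
print(w_star)
```

In a real network the same update is applied to every weight and bias, with the gradient coming from the loss on each batch.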



- A good explanation of the difference between the loss and metrics arguments: 

https://stackoverflow.com/questions/48280873/what-is-the-difference-between-loss-function-and-metric-in-keras#:~:text=The%20loss%20function%20is%20used,do%20with%20the%20optimization%20process.

#### Fit Keras Model
We have defined our model and compiled it ready for efficient computation.

Now it is time to execute the model on some data.

We can train or fit our model on our loaded data by calling the fit() function on the model.

Training occurs over epochs and each epoch is split into batches.

- Epoch: One pass through all of the rows in the training dataset.
- Batch: One or more samples considered by the model within an epoch before weights are updated.

An epoch comprises one or more batches, depending on the chosen batch size, and the model is fit for many epochs. For more on the difference between epochs and batches, see the link:

https://machinelearningmastery.com/difference-between-a-batch-and-an-epoch/

The training process will run for a fixed number of iterations through the dataset called epochs, that we must specify using the <b>epochs</b> argument. We must also set the number of dataset rows that are considered before the model weights are updated within each epoch, called the batch size and set using the <b>batch_size</b> argument.

For this problem, we will run for a small number of epochs (150) and use a relatively small batch size of 10.
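
As a quick piece of arithmetic, the sketch below works out how many weight updates these settings imply. It assumes the dataset has 768 rows (the sum of the class counts printed at the end of this notebook) and that the 10% test split rounds up to 77 rows, which matches the support shown in the classification report further down.

```python
import math

n_rows = 768          # assumption: total rows in the dataset
test_fraction = 0.1   # test_size used in train_test_split
batch_size = 10
epochs = 150

test_rows = math.ceil(test_fraction * n_rows)   # 77
train_rows = n_rows - test_rows                 # 691

# The last batch of each epoch is smaller when the rows do not divide evenly.
batches_per_epoch = math.ceil(train_rows / batch_size)  # 70
total_updates = batches_per_epoch * epochs              # 10500

print(train_rows, batches_per_epoch, total_updates)
```
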

These configurations can be chosen experimentally by trial and error. We want to train the model enough so that it learns a good (or good enough) mapping of rows of input data to the output classification. The model will always have some error, but the amount of error will level out after some point for a given model configuration. This is called model convergence.

In [5]:
# fit the keras model on the dataset
model.fit(x_train, y_train, epochs=150, batch_size=10)

Epoch 1/150
Epoch 2/150
...
Epoch 150/150


<keras.callbacks.History at 0x7ff89474fa30>

####  Evaluate Keras Model
We have trained our neural network on the training dataset, so we can now evaluate its performance on the test dataset. Let's first use the model to predict the classes for the test set.

Making predictions is as easy as calling the predict() function on the model. We are using a sigmoid activation function on the output layer, so the predictions will be a probability in the range between 0 and 1. We can easily convert them into a crisp binary prediction for this classification task by rounding them.
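
The sketch below shows two equivalent ways of turning sigmoid outputs into class labels: an explicit 0.5 threshold and Python's built-in `round`. The probabilities are made up for illustration and are shaped like the output of `predict` (one column per row).

```python
import numpy as np

# Hypothetical model outputs, shaped (n_rows, 1).
probs = np.array([[0.12], [0.50], [0.73], [0.49]])

# Option 1: explicit threshold, which makes the cut-off easy to change.
thresholded = (probs[:, 0] > 0.5).astype(int)

# Option 2: rounding each probability, as in the cell below.
rounded = [round(float(p[0])) for p in probs]

print(thresholded.tolist())  # [0, 0, 1, 0]
print(rounded)               # [0, 0, 1, 0]
```

Note that a probability of exactly 0.5 maps to class 0 in both variants here, because Python rounds halves to the nearest even integer.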

You can summarise your predictions using any of the sklearn capabilities. The example below prints the classification report:


In [6]:
# make probability predictions with the model
predictions = model.predict(x_test)
# round predictions 
rounded = [round(x[0]) for x in predictions]
print(classification_report(y_test, rounded))

              precision    recall  f1-score   support

         0.0       0.71      0.92      0.80        48
         1.0       0.73      0.38      0.50        29

    accuracy                           0.71        77
   macro avg       0.72      0.65      0.65        77
weighted avg       0.72      0.71      0.69        77
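
To connect the report to the underlying counts, the sketch below recomputes its figures from a confusion matrix. The four counts are inferred from the rounded values in the report above (for example, a recall of 0.38 on 29 positives implies 11 true positives), so treat them as a reconstruction rather than output from the notebook.

```python
# Counts inferred from the report (assumption, not printed by the notebook).
tn, fp = 44, 4    # the 48 true negatives split into correct / wrong
fn, tp = 18, 11   # the 29 true positives split into wrong / correct

precision_pos = tp / (tp + fp)
recall_pos = tp / (tp + fn)
f1_pos = 2 * tp / (2 * tp + fp + fn)

precision_neg = tn / (tn + fn)
recall_neg = tn / (tn + fp)
accuracy = (tp + tn) / (tp + tn + fp + fn)

print(round(precision_pos, 2), round(recall_pos, 2), round(f1_pos, 2))  # 0.73 0.38 0.5
print(round(precision_neg, 2), round(recall_neg, 2))                    # 0.71 0.92
print(round(accuracy, 2))                                               # 0.71
```

Under these counts the model misses 18 of the 29 positive cases, which is why recall for class 1.0 is so low even though accuracy looks reasonable.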



<h2 id="keras_tuned">Changing the Model's Hyperparameters</h2>

 - The network above performs poorly on the positive class: it finds only 38% of the positive cases (recall of 0.38 in the report above).
 - Can we do better? Let's try different hyperparameters:
     - batch_size
     - epochs
     - metrics (try: metrics=[tf.keras.metrics.Precision()])


In [59]:
### Your solution here ###

def create_model(neuron1, neuron2):
    model = Sequential()
    model.add(Dense(neuron1, input_dim=8, activation='relu'))
    model.add(Dense(neuron2, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    
    return model

In [60]:
param_grid = {
    'neuron1': [10, 20],
    'neuron2': [10, 20],
    'batch_size': [5,10],
}

In [61]:
model = KerasClassifier(create_model)
validate = GridSearchCV(estimator=model, param_grid=param_grid, verbose=10)
validate_fit = validate.fit(x_train, y_train)
validate_fit.best_params_



Fitting 5 folds for each of 8 candidates, totalling 40 fits
[CV 1/5; 1/8] START batch_size=5, neuron1=10, neuron2=10........................
[CV 1/5; 1/8] END batch_size=5, neuron1=10, neuron2=10;, score=0.468 total time=   0.6s
[CV 2/5; 1/8] START batch_size=5, neuron1=10, neuron2=10........................
[CV 2/5; 1/8] END batch_size=5, neuron1=10, neuron2=10;, score=0.572 total time=   0.7s
[CV 3/5; 1/8] START batch_size=5, neuron1=10, neuron2=10........................
[CV 3/5; 1/8] END batch_size=5, neuron1=10, neuron2=10;, score=0.601 total time=   0.6s
[CV 4/5; 1/8] START batch_size=5, neuron1=10, neuron2=10........................
[CV 4/5; 1/8] END batch_size=5, neuron1=10, neuron2=10;, score=0.399 total time=   0.6s
[CV 5/5; 1/8] START batch_size=5, neuron1=10, neuron2=10........................
[CV 5/5; 1/8] END batch_size=5, neuron1=10, neuron2=10;, score=0.551 total time=   0.6s
[CV 1/5; 2/8] START batch_size=5, neuron1=10, neuron2=20........................
[CV 1/5; 2/8] 

[CV 4/5; 5/8] END batch_size=10, neuron1=10, neuron2=10;, score=0.529 total time=   0.6s
[CV 5/5; 5/8] START batch_size=10, neuron1=10, neuron2=10.......................
[CV 5/5; 5/8] END batch_size=10, neuron1=10, neuron2=10;, score=0.551 total time=   0.6s
[CV 1/5; 6/8] START batch_size=10, neuron1=10, neuron2=20.......................
[CV 1/5; 6/8] END batch_size=10, neuron1=10, neuron2=20;, score=0.576 total time=   0.5s
[CV 2/5; 6/8] START batch_size=10, neuron1=10, neuron2=20.......................
[CV 2/5; 6/8] END batch_size=10, neuron1=10, neuron2=20;, score=0.529 total time=   0.5s
[CV 3/5; 6/8] START batch_size=10, neuron1=10, neuron2=20.......................
[CV 3/5; 6/8] END batch_size=10, neuron1=10, neuron2=20;, score=0.594 total time=   0.5s
[CV 4/5; 6/8] START batch_size=10, neuron1=10, neuron2=20.......................
[CV 4/5; 6/8] END batch_size=10, neuron1=10, neuron2=20;, score=0.341 total time=   0.5s
[CV 5/5; 6/8] START batch_size=10, neuron1=10, neuron2=20....

{'batch_size': 5, 'neuron1': 20, 'neuron2': 10}
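
The first line of the log ("5 folds for each of 8 candidates, totalling 40 fits") follows directly from the grid. The sketch below enumerates the combinations with the standard library. It mirrors the `param_grid` from the cell above and assumes the 5 folds reported in the log; sorting the keys is done here only so that the order matches the log.

```python
from itertools import product

param_grid = {
    'neuron1': [10, 20],
    'neuron2': [10, 20],
    'batch_size': [5, 10],
}

# Every combination of one value per hyperparameter, with keys in sorted order.
keys = sorted(param_grid)
candidates = [dict(zip(keys, values)) for values in product(*(param_grid[k] for k in keys))]

n_folds = 5  # as reported in the log above
total_fits = len(candidates) * n_folds

print(len(candidates), total_fits)  # 8 40
print(candidates[0])                # {'batch_size': 5, 'neuron1': 10, 'neuron2': 10}
```

Adding one more value to any list, or one more hyperparameter, multiplies the number of fits, so grids grow expensive quickly.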

In [62]:
## The model still performs badly, especially w.r.t. the positive class. There is a class imbalance, see here:
import numpy as np

values, counts = np.unique(y, return_counts=True)
print(counts)
## There are almost twice as many samples in the negative class as in the positive class.
## NNs are very data hungry: they would require much more data about the positive class to make good predictions!
## We will learn how to deal with class imbalance next week! :)

[500 268]
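
These counts also give a useful reference point: a classifier that always predicts the majority class. The short sketch below computes that baseline from the printed counts; the variable names are illustrative.

```python
# Class counts as printed above: 500 negative, 268 positive.
n_negative, n_positive = 500, 268
n_total = n_negative + n_positive

# Accuracy of always predicting the negative (majority) class.
majority_baseline = n_negative / n_total
imbalance_ratio = n_negative / n_positive

print(round(majority_baseline, 3))  # 0.651
print(round(imbalance_ratio, 2))    # 1.87
```

An accuracy of 0.71 is therefore only modestly better than predicting "negative" for everyone, which is why per-class precision and recall are more informative here.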
