# Welcome
In these labs we will be using the Python programming language and some of the core packages and frameworks most commonly used in scientific computing, such as NumPy, TensorFlow, scikit-learn and Keras.

**If you require a short introduction to Python please make sure to visit the following tutorials.**

Introduction to Python: https://www.w3schools.com/python/python_intro.asp

Introduction to Numpy: http://cs231n.github.io/python-numpy-tutorial/

Once you are familiar with Python and its syntax you can continue with this lab, which introduces Jupyter Notebooks and the Keras framework in more detail.

# Introduction to Jupyter Notebooks

We will be using Jupyter Notebooks because they make it easy to showcase and develop our solutions. As notebooks run in the browser, you will find it easy to save, export and import your own solutions to the exercises.

To help you save some time, here are some keyboard shortcuts that can speed up your work in Jupyter Notebooks. The single-letter shortcuts work in command mode (press **esc** first):
- **shift + enter** run cell, select below
- **ctrl + enter** run cell
- **A** insert cell above
- **B** insert cell below
- **C** copy cell
- **V** paste cell below
- **D, D** (press twice) delete selected cell
- **shift + M** merge selected cells
- **I, I** (press twice) interrupt kernel
- **0, 0** (press twice) restart kernel (with dialog)

## Resource handling

Every time you open an `ipynb` file (an IPython Notebook) a corresponding Python kernel is launched. Over time, and when using more complex models in the future, this can result in issues with the computer's resources.

The more notebooks you open, the more kernels get launched, and you may find yourselves in a situation where you are getting `out of memory` errors in the console. To make sure memory is only used by the active notebooks, `shutdown` the kernels that you are not using.

![Kernels](data/kernels.png)

# Introduction to Keras

In these demonstrations we will be using the Keras framework, a modern AI framework that is built on top of the older TensorFlow. Keras allows us to demonstrate concepts in a more efficient way without losing too much control.

You can read the official documentation and explore the details that we will not have time to cover in these sessions on the following URL: https://keras.io/

## Creating a sequential model

Models can be created in both a sequential and a non-sequential manner. We will begin with the sequential model and how to create one.

One of the first things that we would like you to do is to experience the ease with which you can create a simple sequential model. Following the official documentation explore the code below to familiarise yourselves.

In [None]:
from keras.models import Sequential
from keras.layers import Dense, Activation

model = Sequential()

We import the relevant parts from Keras such as models and layers. We will explain these in future demonstrations.

With a single line we have specified that the new model will be a sequential model. The next thing we need to do is add the first `layer` and specify the shape of the input.

In [None]:
model.add(Dense(32, input_dim=100))

The first layer of the network needs to be told the shape of its input; the following layers can infer theirs automatically. Here `input_dim=100` says that every sample is a vector of 100 values. This is the input tensor that our model will process, layer by layer.
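To make the shapes concrete, here is a minimal NumPy sketch of what a `Dense` layer without an activation computes: a matrix product plus a bias. The random weights and the batch of 4 samples are assumptions for illustration only; Keras creates and trains its own weights.

```python
import numpy as np

rng = np.random.default_rng(0)

# 4 samples with 100 features each, matching input_dim=100
batch = rng.random((4, 100))

# Hypothetical weights: one column per unit of a 32-unit layer
weights = rng.normal(size=(100, 32))
bias = np.zeros(32)

# The layer output has one row per sample and one column per unit
hidden = batch @ weights + bias
print(hidden.shape)
```

Each of the 4 samples goes in as 100 values and comes out as 32 values, which is the shape the next layer receives.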

After adding the first layer, we add the output layer: a single unit with a `sigmoid` activation, which squashes the output into a value between 0 and 1.

In [None]:
model.add(Dense(1, activation='sigmoid'))

Before we can train the model we need to configure the learning process. The method to use is the `compile` method. In here we specify the `optimizer`, the `loss` function and the `metrics`.

In [None]:
model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy'])
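To get a feel for what the `loss` and `metrics` settings measure, the following NumPy sketch computes binary cross-entropy and accuracy by hand. The helper names, the toy predictions and the 0.5 threshold are assumptions of this sketch, not code taken from Keras.

```python
import numpy as np

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy; predictions are clipped to avoid log(0)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.clip(np.asarray(y_pred, dtype=float), eps, 1 - eps)
    losses = -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
    return float(np.mean(losses))

def accuracy(y_true, y_pred, threshold=0.5):
    """Share of samples whose thresholded prediction matches the label."""
    predicted = np.asarray(y_pred) > threshold
    return float(np.mean(predicted == np.asarray(y_true).astype(bool)))

y_true = [1, 0, 1, 0]
confident = [0.9, 0.1, 0.8, 0.2]   # correct and sure of itself
unsure = [0.6, 0.4, 0.6, 0.4]      # correct, but only just

print(accuracy(y_true, confident), binary_crossentropy(y_true, confident))
print(accuracy(y_true, unsure), binary_crossentropy(y_true, unsure))
```

Both sets of predictions are 100% accurate, yet the unsure one has the higher loss. This is why the loss is the better signal to optimise during training.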

For our introductory example we will focus on two-class (**binary**) classification.
The model will train on NumPy arrays, so we will generate some dummy data that our network can "train on".

For this purpose we need to import numpy with the `import <module> as <alias>` syntax.

In [None]:
import numpy as np

data = np.random.random((1000, 100))
labels = np.random.randint(2, size=(1000, 1))

np.max(data)

Now that we have generated random data with two possible labels (remember we are working on binary classification only) we are ready to start the training of our network.

In [None]:
model.fit(data, labels, epochs=10, batch_size=32)

In the output above you can see the verbose output of the training process. 

We can observe:
- How long it took to finish each epoch
- What was the loss function value
- What was the accuracy 

### Exercise

- Experiment with the number of `epochs` and the `batch_size` and examine the training time, loss function and the accuracy of your created model.

In [None]:
# Create a new model just as explained above

## Properties of the Model

We can inspect the properties of the model using the handy `.summary()` method, which we can call on our model as you can see below.

In [None]:
model.summary()
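The `Param #` column of the summary can be checked by hand. As a small sketch (the helper name `dense_params` is ours, not part of Keras): a dense layer has one weight per input-unit pair plus one bias per unit.

```python
def dense_params(inputs, units):
    """Trainable parameters of a dense layer: weights plus biases."""
    return inputs * units + units

first_layer = dense_params(inputs=100, units=32)   # Dense(32, input_dim=100)
output_layer = dense_params(inputs=32, units=1)    # Dense(1, activation='sigmoid')
total = first_layer + output_layer

print(first_layer, output_layer, total)
```

For the two-layer model built above this gives 3232 and 33 parameters, 3265 in total.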

## Creating a more complex Model

It is very easy to add layers to our model in Keras with the simple call of the `add()` method. In the example below we create a more "complex" (for lack of a better word) model that will be used for our binary classification problem.

In [None]:
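model = Sequential()
model.add(Dense(32, input_dim=100))
# An extra hidden layer is what makes this model deeper than the first one
model.add(Dense(16, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
# For a binary classification problem
model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy'])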

In [None]:
model.fit(data, labels, epochs=50, batch_size=32)

## Visualising our Model

To better understand our own or any other model design we can visualise the layers directly in Jupyter Notebooks. This requires additional libraries, as you can see in the `import` below (the plotting helper also relies on `pydot` and Graphviz being installed). These are not relevant to the contents of this module so we will be using them as-is.

In [None]:
from IPython.display import SVG
from keras.utils.vis_utils import model_to_dot

SVG(model_to_dot(model).create(prog='dot', format='svg'))

## Try to make it better

**Exercise**: 

With the knowledge you have acquired try to complete the following tasks:
- Create a new model with multiple layers
- Display the summary of the model properties
- Try to adjust the number of epochs and batch size to improve the model accuracy
- As an extra task you can generate an additional random dataset with more training data
- Visualise the model you have created

In [None]:
# (Optional) Generate more data
# Create new model
# Display the summary of the model
# Train
# Visualise the model

# AI 'Hello World' - Loading MNIST dataset
One of the most important (if not THE most important) things in Machine Learning is the dataset you are using to solve a particular problem. It has become a de facto "hello world" of AI and ML to use the MNIST dataset.

What is MNIST? It is a dataset of handwritten digits in grayscale with a resolution of 28 by 28 pixels. It consists of 60,000 images in the training set and 10,000 images in the test set. The way you split your dataset is very important, as discussed later in this course.

Luckily, because MNIST is such a basic and widely used dataset, Keras has a simple way to load it.

In [None]:
from keras.datasets import mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()

As you can see, the dataset is separated into training and test groups.
`mnist.load_data()` returns two tuples:

- x_train, x_test: uint8 array of grayscale image data with shape (num_samples, 28, 28).
- y_train, y_test: uint8 array of digit labels (integers in range 0-9) with shape (num_samples,).

We have discussed shape before. We can look at the shape of the data we have loaded by reading the `.shape` attribute of a single element of the dataset, or of the whole dataset. This is a good debugging technique when working on more complex solutions.

In [None]:
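# A notebook only displays the last expression of a cell, so print both
print(x_train[0].shape)  # shape of a single image
print(x_train.shape)     # shape of the whole training set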

So what does the "real" data of a single digit look like? Simply print a single element from the training dataset to the screen as follows.

In [None]:
x_train[0]

We can visualise any element from the dataset using `matplotlib`, yet another library that is very handy in python.

In [None]:
from matplotlib import pyplot as plt
import random as rng
%matplotlib inline

# Pick a random image from the whole training set, not just the first ten
plt.imshow(x_train[rng.randrange(len(x_train))], cmap='gray')

**Exercise**:
- Print out the shape of a certain element from the training set and also its array as you did before
- Visualise the digit that you displayed, using `matplotlib`

# Data encoding

## Categorical data -> Numerical data

In many cases, our data is not represented as numerical data, but rather a set of items (categories). 

Example of numerical data:
- `weight, price, score`

Example of categorical data:
- `pet ["dog", "cat", "parrot"]`
- `place ["London", "Ireland", "Guildford"]`

Because neural networks and many other algorithms cannot handle direct text/categorical input, we need a way to represent categorical data as numerical data. 

### Integer Encoding 
We assign each element a number based on its `index` in the `array`.

`place ["London", "Ireland", "France", "Italy", "Guildford"]`  
`London = 0, Ireland = 1, France = 2, Italy = 3, Guildford = 4`

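This method can lead to **issues**, because the numbers suggest an order and a distance between the categories that do not exist, so normal mathematical operations on them are meaningless.  
`4/2 = 2`, but the equivalent equation `Guildford/France = France` does not make any sense. 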
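As a small sketch of integer encoding in plain Python (the variable names are ours), a dictionary built with `enumerate` maps each category to its index:

```python
places = ["London", "Ireland", "France", "Italy", "Guildford"]

# Map every category to its position in the list
place_to_index = {place: index for index, place in enumerate(places)}

visited = ["France", "London", "Guildford"]
encoded = [place_to_index[place] for place in visited]

print(encoded)  # [2, 0, 4]
```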

### One Hot Encoding
In this method we create an array of zeros whose length is the number of possible options, and set the element at the index of the chosen option to 1. 

Example:  
`place ["London", "Ireland", "France", "Italy", "Guildford"]`  
`London    = [1,0,0,0,0]`  
`Ireland   = [0,1,0,0,0]`  
`France    = [0,0,1,0,0]`  
`Italy     = [0,0,0,1,0]`  
`Guildford = [0,0,0,0,1]`

This way we can represent each "category" or "class" as a single 1 in an array of 0s.

Even though many frameworks include methods to do this for us, for the sake of clarification we will implement our own version of one hot encoding.

In [None]:
import numpy as np

def to_one_hot(num_of_options, hot_index):
    """Utils function for creating one hot array (array of zeros with only one value set to 1)
    
    later on, we can use to_categorical function from keras library (from keras.utils import to_categorical)
    instead of this function
    
    Arguments:
        num_of_options {int} -- Number of possible items in the array
        hot_index {int} -- Index of element that should be set to 1
    
    Raises:
        ValueError -- Hot index exceeds the length of overall array
    
    Returns:
        np.ndarray -- One_hot array
    """

    if hot_index >= num_of_options:
        raise ValueError("Hot index exceeds the length of overall array")
    
    array = np.zeros((num_of_options))
    array[hot_index] = 1
    return array


In [None]:
all_possibilities = ["cat", "dog", "elephant", "human"]

selected_element = "elephant"  # try changing this to any element from all_possibilities
selected_index = all_possibilities.index(selected_element)  # find the index of our selected element
print(selected_index)

to_one_hot(len(all_possibilities), selected_index)  # use the function created above to create the one hot representation
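When a whole array of labels has to be encoded at once, one possible shortcut is to index an identity matrix. The helper `to_one_hot_batch` below is a hypothetical name for this sketch and is independent of the function defined above.

```python
import numpy as np

def to_one_hot_batch(num_of_options, hot_indices):
    """One hot encode several indices at once: row i of the identity
    matrix is exactly the one hot array for index i."""
    hot_indices = np.asarray(hot_indices)
    if hot_indices.min() < 0 or hot_indices.max() >= num_of_options:
        raise ValueError("Hot index is outside the range of possible options")
    return np.eye(num_of_options)[hot_indices]

labels = [2, 0, 3]
encoded = to_one_hot_batch(4, labels)
print(encoded)

# argmax recovers the original integer labels
decoded = encoded.argmax(axis=1)
print(decoded)
```

Taking `argmax` along each row is the usual way to turn a one hot (or probability) vector back into a class index.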

# GPU or CPU?

When using TensorFlow or Keras, it is advisable to use a dedicated GPU (unless you have too much free time). For this reason it is good practice to check, before doing any kind of training, that your local installation detects your GPU.

If you do not have a dedicated GPU you may want to consider creating solutions for less complex problems and adjusting your training parameters.

In [None]:
# Checking if we can access GPU
from tensorflow.python.client import device_lib
device_lib.list_local_devices()

# Binary Classifier

We have already created a binary classifier in the earlier section of this demonstration. 
This time we will be re-creating it for the purposes of classifying a number from the MNIST dataset.

First, we need to make sure that our dataset has the correct shape. As discussed earlier, the shape of our pictures is `28 by 28` pixels, so we flatten each picture into a single vector of `28*28 = 784` values.

In [None]:
x_train = x_train.reshape((60000, 28*28))

In [None]:
# Binary targets: True for every image of the digit 1, False for all other digits
y_train_1 = (y_train == 1)
y_test_1 = (y_test == 1)

We implement a **linear classifier** trained with Stochastic Gradient Descent (SGD): *the gradient of the loss is estimated each sample at a time and the model is updated along the way with a decreasing strength schedule (aka learning rate).*

For more information please make sure to read the documentation here: https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.SGDClassifier.html

In [None]:
from sklearn.linear_model import SGDClassifier

# tol=-np.inf disables early stopping, so exactly max_iter passes over the data are made
sgd_clf = SGDClassifier(max_iter=5, tol=-np.inf, random_state=42)
sgd_clf.fit(x_train, y_train_1)
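To show the idea behind SGD learning, here is a minimal NumPy sketch of a linear classifier trained one sample at a time with a decreasing learning rate. It is a sketch under stated assumptions, not how scikit-learn implements `SGDClassifier`: the toy data, the learning rate schedule and the absence of regularisation are all choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy data (an assumption for illustration): two well-separated 2D clusters
negatives = rng.normal(loc=-2.0, scale=0.7, size=(100, 2))
positives = rng.normal(loc=2.0, scale=0.7, size=(100, 2))
X = np.vstack([negatives, positives])
y = np.concatenate([-np.ones(100), np.ones(100)])  # labels are -1 or +1

weights = np.zeros(2)
bias = 0.0
step = 0

for epoch in range(5):                     # five passes over the data
    for i in rng.permutation(len(X)):      # visit the samples in random order
        step += 1
        learning_rate = 1.0 / (1.0 + 0.01 * step)  # hypothetical decreasing schedule
        margin = y[i] * (X[i] @ weights + bias)
        if margin < 1:                     # the hinge loss has a non-zero gradient here
            weights += learning_rate * y[i] * X[i]
            bias += learning_rate * y[i]

predictions = np.where(X @ weights + bias > 0, 1.0, -1.0)
accuracy = float((predictions == y).mean())
print(accuracy)
```

Each update uses the gradient of the loss for a single sample, and later updates are smaller than earlier ones because the learning rate shrinks with every step.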

Once again, visualising the prediction is a good way to get an idea what is actually happening *under the hood* of the trained model. 

With the knowledge you now have your next **Exercise** is to:
- import matplotlib library
- import the random library
- make sure that matplotlib works inline (hint: `%matplotlib inline`)
- select a random image from the training set
- use the model to `predict` whether the selected image shows the digit 1 and display the result using `print`
- display the image using matplotlib (hint: remember to `reshape()` the image)

In [None]:
from matplotlib import pyplot as plt
import random as rng
%matplotlib inline

# randrange excludes the upper bound, so the index is always valid (0 to 59999)
number = rng.randrange(60000)
print(sgd_clf.predict([x_train[number]]))

number_to_display = x_train[number].reshape((28, 28))
plt.imshow(number_to_display, cmap='gray')

**Exercise**:
- Change the binary classifier and train it to detect a different digit

**Bonus Exercise**:
- Can you create a classifier for multiple digits?

# Cross-Validation

**K-fold cross-validation** splits the training set into `k` distinct subsets, also known as **folds**. The model is then trained `k` times, each time on `k - 1` of the folds, and evaluated on the fold that was held out.
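The splitting itself can be sketched in a few lines of NumPy. The function `k_fold_indices` is a hypothetical helper written for this illustration; it shuffles the samples, whereas the splitter scikit-learn uses by default for classifiers is stratified and does not shuffle.

```python
import numpy as np

def k_fold_indices(num_samples, k, seed=42):
    """Yield (train_idx, validation_idx) pairs for k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(num_samples)
    folds = np.array_split(shuffled, k)
    for i in range(k):
        validation_idx = folds[i]
        # Every fold except the i-th one is used for training
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, validation_idx

splits = list(k_fold_indices(num_samples=10, k=5))
for train_idx, validation_idx in splits:
    print("train:", train_idx, "validate:", validation_idx)
```

Every sample ends up in the validation fold exactly once, so each score is computed on data the model did not see during that round of training.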

In [None]:
from sklearn.model_selection import cross_val_score
# Accuracy of the classifier on each of the 10 folds
scores = cross_val_score(sgd_clf, x_train, y_train_1, cv=10, scoring="accuracy")

print("Scores: ", scores)
print("Mean: ", scores.mean())
print("Standard deviation: ", scores.std())

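From the output above we can see that the standard deviation is pretty low, so the model performs consistently across the folds. If the standard deviation were high, the model would be sensitive to which samples it was trained on, which may point to an issue with overfitting. Note also that only about one image in ten shows the digit 1, so accuracy on its own can look flattering here. We will look at overfitting in more detail next week.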

## Confusion Matrix

Another way to measure the performance of our classifier is to use a confusion matrix. A picture is worth a thousand words, so an illustration will be most helpful.  
![Confusion Matrix](data/binary_confusion_matrix.png)

We count how many times instances of each actual class were assigned to each predicted class. This shows both the correct predictions and the kinds of mistakes the classifier makes.

In [None]:
from sklearn.model_selection import cross_val_predict
y_train_pred = cross_val_predict(sgd_clf, x_train, y_train_1, cv=10)

from sklearn.metrics import confusion_matrix

confusion_matrix(y_train_1, y_train_pred)
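The four cells of a binary confusion matrix can also be counted by hand. The sketch below uses a made-up set of eight predictions (not MNIST data) and arranges the counts the way scikit-learn does for boolean labels: rows are actual classes, columns are predicted classes.

```python
import numpy as np

# Made-up labels for illustration
actual = np.array([True, True, True, False, False, False, False, False])
predicted = np.array([True, True, False, True, False, False, False, False])

tp = int(np.sum(actual & predicted))      # positives that were found
fn = int(np.sum(actual & ~predicted))     # positives that were missed
fp = int(np.sum(~actual & predicted))     # negatives flagged as positive
tn = int(np.sum(~actual & ~predicted))    # negatives correctly rejected

matrix = np.array([[tn, fp],
                   [fn, tp]])
print(matrix)
```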

**Exercise**: Based on the output of the confusion matrix and the image illustration provided
- Identify the number of TP, FN, FP and TN

# Precision vs Recall

Precision is measured with the following equation:  
`precision = TP / (TP + FP)`

Precision is usually considered together with recall, which is also known as the True Positive Rate (TPR). Recall is calculated as follows:  
`recall = TP / (TP + FN)`
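As a quick worked example with made-up counts (80 true positives, 20 false positives and 40 false negatives), the two formulas give:

```python
# Made-up counts for illustration
tp, fp, fn = 80, 20, 40

precision = tp / (tp + fp)   # of everything flagged as positive, how much was right
recall = tp / (tp + fn)      # of everything actually positive, how much was found

print(round(precision, 3), round(recall, 3))
```

Here the classifier is right 80% of the time when it says "positive", but it only finds two thirds of the actual positives.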

In [None]:
from sklearn.metrics import precision_score, recall_score

precision_score(y_train_1, y_train_pred)

In [None]:
recall_score(y_train_1, y_train_pred)

We can combine precision and recall to a `F1` score.  
We calculate the F1 score with the following formula:  
![F1 Score](data/f1.png)
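The F1 score is the harmonic mean of precision and recall, `F1 = 2 * precision * recall / (precision + recall)`. A short sketch with made-up values shows why it is preferred over a simple average:

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

balanced = f1(0.8, 0.8)
lopsided = f1(1.0, 0.1)

print(round(balanced, 3), round(lopsided, 3))
```

The lopsided classifier would score 0.55 with a plain average, but its F1 score is much lower, because the harmonic mean is only high when both precision and recall are high.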

Of course Scikit-Learn can help us in calculating this value.

In [None]:
from sklearn.metrics import f1_score
f1_score(y_train_1, y_train_pred)

Always keep in mind the **precision/recall tradeoff** - Increasing precision reduces recall and vice versa.

This can be illustrated using the following plot.

In [None]:
y_scores = cross_val_predict(sgd_clf, x_train, y_train_1, cv=3,
                             method="decision_function")

In [None]:
from sklearn.metrics import precision_recall_curve

precisions, recalls, thresholds = precision_recall_curve(y_train_1, y_scores)

In [None]:
def plot_precision_recall_vs_threshold(precisions, recalls, thresholds):
    plt.plot(thresholds, precisions[:-1], "b--", label="Precision", linewidth=2)
    plt.plot(thresholds, recalls[:-1], "g-", label="Recall", linewidth=2)
    plt.xlabel("Threshold", fontsize=12)
    plt.legend(loc="upper left", fontsize=12)
    plt.ylim([0, 1])

plt.figure(figsize=(8, 4))
plot_precision_recall_vs_threshold(precisions, recalls, thresholds)
plt.xlim([-600000, 600000])
plt.show()

In [None]:
def plot_precision_vs_recall(precisions, recalls):
    plt.plot(recalls, precisions, "b-", linewidth=2)
    plt.xlabel("Recall", fontsize=12)
    plt.ylabel("Precision", fontsize=12)
    plt.axis([0, 1, 0, 1])

plt.figure(figsize=(8, 6))
plot_precision_vs_recall(precisions, recalls)
plt.show()
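The tradeoff plotted above can also be reproduced on a handful of made-up decision scores. In this sketch (the scores, labels and helper name are assumptions for illustration) a sample is predicted positive when its score is above the threshold:

```python
import numpy as np

# Made-up decision scores and true labels
scores = np.array([-3.0, -1.0, -0.5, 0.5, 1.0, 2.0, 3.0, 4.0])
actual = np.array([False, False, True, False, True, True, True, True])

def precision_recall_at(threshold):
    """Precision and recall when predicting positive for scores above threshold."""
    predicted = scores > threshold
    tp = np.sum(predicted & actual)
    fp = np.sum(predicted & ~actual)
    fn = np.sum(~predicted & actual)
    return float(tp / (tp + fp)), float(tp / (tp + fn))

results = {t: precision_recall_at(t) for t in (-2.0, 0.0, 2.5)}
for threshold, (precision, recall) in results.items():
    print(threshold, round(precision, 3), round(recall, 3))
```

Raising the threshold makes the classifier more cautious: precision climbs towards 1 while recall drops.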

**Exercise**:  
- At roughly what precision rate does the recall fall most significantly?

# Receiver Operating Characteristic (ROC) curve

This is yet another tool used with binary classifiers. Unlike the precision vs recall curve, the ROC curve plots the `True Positive Rate (TPR) against the False Positive Rate (FPR)`.

**FPR**: The ratio of negative instances that are incorrectly classified as positive.

We first need to compute the TPR and FPR for various thresholds using the `roc_curve()` function:

In [None]:
from sklearn.metrics import roc_curve
fpr, tpr, thresholds = roc_curve(y_train_1, y_scores)

And now we can plot the TPR against the FPR using our old friend, Matplotlib:

In [None]:
plt.plot(fpr, tpr, linewidth=2, label=None)
plt.plot([0,1], [0,1], 'k--')
plt.axis([0,1,0,1])
plt.xlabel('FPR')
plt.ylabel('TPR')

To compare classifiers we can measure the **area under the curve (AUC)**.  
An ideal classifier will have a ROC AUC == 1, and a completely random one will have the value == 0.5  
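The two reference values can be verified with a small sketch that integrates a curve using the trapezoidal rule. The helper `area_under_curve` is a name chosen for this illustration and the two curves are idealised, not computed from our classifier.

```python
import numpy as np

def area_under_curve(fpr, tpr):
    """Trapezoidal-rule area under a curve given as (fpr, tpr) points."""
    fpr = np.asarray(fpr, dtype=float)
    tpr = np.asarray(tpr, dtype=float)
    widths = np.diff(fpr)
    mean_heights = (tpr[1:] + tpr[:-1]) / 2
    return float(np.sum(widths * mean_heights))

# The diagonal of a purely random classifier
random_auc = area_under_curve([0, 1], [0, 1])

# An ideal classifier reaches TPR = 1 while FPR is still 0
ideal_auc = area_under_curve([0, 0, 1], [0, 1, 1])

print(random_auc, ideal_auc)
```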

We can compute ROC AUC using Scikit-Learn pretty easily, as follows:

In [None]:
from sklearn.metrics import roc_auc_score
roc_auc_score(y_train_1, y_scores)

# Multiclass Classification

If we want to distinguish between more than two classes (that is, go beyond binary classification) we need to train multiclass classifiers (also called multinomial classifiers).

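Since Support Vector Machine and linear classifiers are strictly binary classifiers, multiclass classification has to be performed using multiple binary classifiers. Scikit-Learn does this for us: when `SGDClassifier` is given more than two classes, it trains one binary classifier per class (one versus all).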
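The final step of such a scheme can be sketched as follows: each binary classifier produces a decision score for "its" digit, and the digit with the highest score wins. The scores below are made up for illustration and do not come from a trained model.

```python
import numpy as np

# Hypothetical decision scores of ten "digit d vs. the rest" classifiers for one image
scores = np.array([-5.2, -7.1, -3.3, 1.8, -6.0, 0.4, -8.5, -4.4, -2.9, -3.7])

# The predicted class is the one whose classifier is most confident
predicted_digit = int(np.argmax(scores))
print(predicted_digit)  # 3
```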

In [None]:
from keras.datasets import mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()

x_train = x_train.reshape(60000, 28*28)
sgd_clf.fit(x_train, y_train)
sgd_clf.predict([x_train[1601]])

**Exercise**:  
- Confirm that the prediction is correct by visualising the predicted number using matplotlib

## Multilabel Classification

If we need to recognise multiple labels in one picture, for example three different people, we create a multilabel classification system.

A simple implementation follows:

In [None]:
from sklearn.neighbors import KNeighborsClassifier

y_train_large = (y_train >= 7)
y_train_odd = (y_train % 2 == 1)
y_multilabel = np.c_[y_train_large, y_train_odd]

knn_clf = KNeighborsClassifier()
knn_clf.fit(x_train, y_multilabel)

We have created our `y_multilabel` array, which contains two targets for every digit: large numbers (7 or greater) and odd numbers.
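To see what such a target array looks like, here is the same construction applied to four made-up digits (the `toy_` names are ours):

```python
import numpy as np

toy_digits = np.array([3, 7, 8, 9])

is_large = toy_digits >= 7
is_odd = toy_digits % 2 == 1

# np.c_ places the two label arrays side by side as columns
toy_multilabel = np.c_[is_large, is_odd]
print(toy_multilabel)
```

Each row holds the two labels of one digit, so `3` becomes `[False, True]` (not large, odd) and `8` becomes `[True, False]` (large, not odd).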

### KNeighborsClassifier

The `KNeighborsClassifier` instance created above was trained on the multiple target array, so each prediction should now produce two labels.

In [None]:
knn_clf.predict([x_train[1200]])

**Exercise**:
- Visualise the predicted image to verify the correctness of the prediction

We will take a look at the performance of our multilabel classifier using the known `F1` score.

In [None]:
y_train_knn_pred = cross_val_predict(knn_clf, x_train, y_multilabel, cv=3, n_jobs=-1)
f1_score(y_multilabel, y_train_knn_pred, average="macro")
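With `average="macro"` the F1 score is computed for each label separately and the results are averaged without weighting. A small sketch with made-up counts for our two labels (the helper name and the numbers are assumptions for illustration):

```python
def f1_from_counts(tp, fp, fn):
    """F1 score from true positive, false positive and false negative counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Made-up counts for the "large" and "odd" labels
f1_large = f1_from_counts(tp=8, fp=2, fn=2)
f1_odd = f1_from_counts(tp=3, fp=1, fn=5)

# Macro average: every label counts equally, however many samples it has
macro_f1 = (f1_large + f1_odd) / 2
print(round(f1_large, 3), round(f1_odd, 3), round(macro_f1, 3))
```

Because every label counts equally, a label the classifier handles badly pulls the macro score down even if it applies to few samples.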