
## MNIST
![](https://i.imgur.com/iyArf9S.png)
MNIST is the hello world of machine learning.

Here are two examples of MNIST: one using Keras and one using TensorFlow.
Run the models and see what happens if you change hyperparameters.

# This is a Jupyter notebook. 

It's an interactive way to write Python code. Most of machine learning is done using Python because Python is great for data analysis and the biggest machine learning libraries (TensorFlow and Keras) are used from Python.

Here you can learn how to get started with Jupyter Notebooks.
http://jupyter.org/


In [None]:
#Example for running MNIST in Keras
from __future__ import print_function
import keras
# Imports the MNIST data
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D
from keras import backend as K
# The Keras backend is either TensorFlow, CNTK or Theano. The default (which we are using) is TensorFlow.
# Keras is developed at Google and is officially part of TensorFlow (as tf.keras).
# The Keras API is also becoming a standard way of writing deep learning models, for instance in TensorFlow.js

![](https://s3.amazonaws.com/keras.io/img/keras-logo-2018-large-1200.png)
## Keras: The Python Deep Learning library
Keras is a high-level neural networks API, written in Python and capable of running on top of TensorFlow, CNTK, or Theano. It was developed with a focus on enabling fast experimentation. Being able to go from idea to result with the least possible delay is key to doing good research.


In [8]:
# Keras Cheat Sheet
from IPython.display import IFrame
IFrame("https://s3.amazonaws.com/assets.datacamp.com/blog_assets/Keras_Cheat_Sheet_Python.pdf", width='100%', height=800)

## Batch size
The number of training examples used for updating the weights. The higher the batch size, the more memory space you'll need.
## Epochs
One forward pass and one backward pass of all the training examples.

## Example
Example: if you have 1000 training examples, and your batch size is 500, then it will take 2 iterations to complete 1 epoch.
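
The arithmetic in this example can be checked with a few lines of Python. This is only a sketch: the helper name `iterations_per_epoch` is made up for illustration, and it assumes that a final, smaller batch still counts as one iteration.

```python
import math

def iterations_per_epoch(num_examples, batch_size):
    """Number of weight updates needed to see every example once."""
    # The last batch may be smaller than batch_size, so round up.
    return math.ceil(num_examples / batch_size)

print(iterations_per_epoch(1000, 500))    # the example from the text
print(iterations_per_epoch(60000, 128))   # MNIST training set with batch_size = 128
```

With 12 epochs, the total number of weight updates is 12 times the second number.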

![](https://i.imgur.com/JxXEnRv.jpg)

In [2]:

# the number of training examples in one forward/backward pass. The higher the batch size, the more memory space you'll need.
batch_size = 128

# one forward pass and one backward pass of all the training examples
epochs = 12

# Example: if you have 1000 training examples, and your batch size is 500, then it will take 2 iterations to complete 1 epoch.


![](https://i.imgur.com/58UHDeA.png)
## Classification
MNIST is a classification problem with 10 classes, one for each digit.

## Regression
Machine learning models can also be used for regression. In the case of regression, the output is a continuous variable.

In [None]:
# number of classes in our output (the 10 digits: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9)
num_classes = 10

# input image dimensions: each image is 28 x 28 pixels (784 values when flattened)
img_rows, img_cols = 28, 28

# Train and test data
In the case of MNIST, the data is available through the keras.datasets.mnist module and is already split into a train and a test set.
With real datasets we usually need to divide the data into train and test sets ourselves.
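
For a dataset that does not come pre-split, a manual split could look roughly like the sketch below. The function name `train_test_split`, the 80/20 ratio and the fixed seed are illustrative assumptions, not part of this notebook. Libraries such as scikit-learn ship a more complete function for the same job.

```python
import random

def train_test_split(examples, test_fraction=0.2, seed=0):
    """Shuffle a list of examples and split it into train and test lists."""
    shuffled = list(examples)              # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split repeatable
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

data = list(range(10))   # stand-in for a real dataset
train, test = train_test_split(data, test_fraction=0.2)
print(len(train), len(test))
```

Shuffling before splitting matters when the data is stored in some order (for example sorted by class), otherwise the test set would not be representative.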

## Train
The training data is used to train our model. This is done by updating the model weights using backpropagation.

## Validation
The validation set is used to tune the hyperparameters during training. 

## Test
The test data is used to test our model once training is complete.

### Overfitting
A common problem in machine learning is overfitting. It happens when the model memorises the training dataset instead of learning a transformation between x and y that also works on data it has not seen.

### Dropout
One technique used to help reduce overfitting is dropout. It randomly ignores a fraction of the neurons at each training step, which reduces the likelihood of overfitting.
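
To make the idea concrete, here is a small sketch of the common "inverted dropout" variant written in plain Python. It is an illustration rather than the Keras implementation: the function name and the use of Python lists are assumptions. The kept values are scaled by 1 / (1 - rate) so that the expected sum of the layer output stays the same.

```python
import random

def dropout(activations, rate, rng):
    """Zero each activation with probability `rate` and rescale the survivors."""
    keep_scale = 1.0 / (1.0 - rate)
    return [0.0 if rng.random() < rate else a * keep_scale for a in activations]

rng = random.Random(0)
layer_output = [1.0] * 8
print(dropout(layer_output, rate=0.25, rng=rng))  # a new random mask on every call
print(dropout(layer_output, rate=0.25, rng=rng))
```

Dropout is only applied while training. When the model is evaluated or used for prediction, all neurons are kept.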

### Cross-validation
Techniques such as cross-validation can be used to make our validation results more reliable.
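
One widely used form is k-fold cross-validation: the data is cut into k parts, and each part takes a turn as the validation set while the rest is used for training. The generator below is a minimal sketch of the index bookkeeping only. Its name and the choice to give leftover examples to the last fold are assumptions made for illustration.

```python
def k_fold_indices(num_examples, k):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    indices = list(range(num_examples))
    fold_size = num_examples // k
    for fold in range(k):
        start = fold * fold_size
        # The last fold absorbs any leftover examples.
        stop = num_examples if fold == k - 1 else start + fold_size
        validation = indices[start:stop]
        train = indices[:start] + indices[stop:]
        yield train, validation

for train, validation in k_fold_indices(num_examples=10, k=5):
    print("validate on", validation, "train on", len(train), "examples")
```

Averaging the validation score over the k folds gives a steadier estimate than a single split.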

In [None]:
# the data, shuffled and split between train and test sets
(x_train, y_train), (x_test, y_test) = mnist.load_data()


# Working with tensors
The input to a neural network is an n-dimensional array, also called a tensor. One common problem in machine learning is making sure that the input is in the right format to be fed into the network.

This is usually a delicate process, and it's good to know that many problems when working with neural networks come from not sending data into the model with the right dimensions.

This becomes trickier when working with more complex data such as images and time series.
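
The sketch below shows how the same data can be arranged into differently shaped tensors with NumPy. The batch of 5 random images is made up for illustration and is not MNIST data.

```python
import numpy as np

# A fake batch of 5 grayscale images of 28 x 28 pixels (random data, not MNIST).
images = np.random.rand(5, 28, 28)
print(images.ndim, images.shape)          # 3 dimensions: (samples, rows, cols)

# A Conv2D layer expects a fourth "channels" dimension.
channels_last = images.reshape(5, 28, 28, 1)    # (samples, rows, cols, channels)
channels_first = images.reshape(5, 1, 28, 28)   # (samples, channels, rows, cols)
print(channels_last.shape, channels_first.shape)

# A Dense layer expects one flat vector per image instead.
flat = images.reshape(5, 28 * 28)
print(flat.shape)
```

Printing `.shape` before feeding data to a model is a cheap way to catch most dimension errors early.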

In [5]:


# making sure the tensors are in the correct order
if K.image_data_format() == 'channels_first':
    x_train = x_train.reshape(x_train.shape[0], 1, img_rows, img_cols)
    x_test = x_test.reshape(x_test.shape[0], 1, img_rows, img_cols)
    input_shape = (1, img_rows, img_cols)
else:
    x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, 1)
    x_test = x_test.reshape(x_test.shape[0], img_rows, img_cols, 1)
    input_shape = (img_rows, img_cols, 1)

# Normalisation of data
Normalisation puts all input values on a similar scale, which reduces the problems that extreme values cause during training.
Here we make sure all values are floats (so we can compute with them properly).
Then we divide each pixel value (0 is black, 255 is white) by 255 to get a value between 0 and 1.
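
The same two steps can be tried on a tiny array first. The three pixel values below are made up for illustration.

```python
import numpy as np

# Three made-up pixel values stored as unsigned 8-bit integers, like MNIST.
pixels = np.array([0, 128, 255], dtype=np.uint8)

# Convert to float first: an integer array cannot hold the fractional results.
scaled = pixels.astype('float32')
scaled /= 255

print(scaled.dtype, scaled.min(), scaled.max())
```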

In [6]:

x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255
print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')

# convert class vectors to binary class matrices
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)


x_train shape: (60000, 28, 28, 1)
60000 train samples
10000 test samples
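
The `to_categorical` call in the cell above turns each integer label into a "one-hot" row: all zeros except for a 1 at the position of the class. The function below is a plain Python sketch of that idea, not the Keras implementation, and its name is an assumption.

```python
def one_hot(labels, num_classes):
    """Turn integer class labels into rows with a single 1."""
    return [[1.0 if position == label else 0.0 for position in range(num_classes)]
            for label in labels]

for row in one_hot([5, 0, 4], num_classes=10):
    print(row)
```

This format matches the 10 outputs of the network, so the prediction and the label can be compared position by position.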


# Building a model
Keras has a Sequential model for building neural networks that uses a sandwich-like format: layers are stacked one after the other.
You start by creating a Sequential model and then add your input layer.

## Activation function
![](https://upload.wikimedia.org/wikipedia/commons/thumb/6/6c/Rectifier_and_softplus_functions.svg/495px-Rectifier_and_softplus_functions.svg.png)

## ReLU activation function
In the context of artificial neural networks, the rectifier is an activation function defined as the positive part of its argument:
![](https://wikimedia.org/api/rest_v1/media/math/render/svg/bb2c32931fad595832c8e66f2f73760ebcbc0096)
where x is the input to a neuron. 
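
Written as code, the rectifier is a one-liner. This sketch works on single numbers for clarity. Real libraries apply it to whole tensors at once.

```python
def relu(x):
    """Rectified linear unit: the positive part of x."""
    return max(0.0, x)

for value in [-2.0, -0.5, 0.0, 0.5, 2.0]:
    print(value, "->", relu(value))
```

Negative inputs become 0 and positive inputs pass through unchanged.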

ReLU has strong biological and mathematical justifications.

The rectifier is, as of 2018, the most popular activation function for deep neural networks.

Rectified linear units are used in many areas of deep learning including computer vision and speech recognition using deep neural nets.

# Conv2D
![](./files/convolution.gif)
Conv2D is a convolutional layer used for image analysis. Convolutional layers are inspired by how the brain handles visual stimuli.
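
The core operation of a convolutional layer can be sketched in plain Python: a small kernel slides over the image, and at each position the output is the weighted sum of the pixels under it. This is a simplified illustration with one image, one kernel, no padding and a stride of 1. The function name, the toy image and the kernel values are assumptions, and a real layer learns its kernel values during training.

```python
def conv2d_single(image, kernel):
    """Slide one kernel over one 2D image (no padding, stride 1)."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    output = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            # Weighted sum of the patch whose top-left corner is (r, c).
            total = sum(image[r + i][c + j] * kernel[i][j]
                        for i in range(k_rows) for j in range(k_cols))
            row.append(total)
        output.append(row)
    return output

image = [[float(5 * r + c) for c in range(5)] for r in range(5)]   # toy 5 x 5 "image"
kernel = [[1.0, 0.0, -1.0]] * 3                                     # simple vertical-edge filter
for row in conv2d_single(image, kernel):
    print(row)
```

A 5 x 5 input with a 3 x 3 kernel gives a 3 x 3 output. By the same rule, a 28 x 28 MNIST image with a 3 x 3 kernel gives a 26 x 26 output.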
# Convolutional neural networks
![](https://cdnpythonmachinelearning.azureedge.net/wp-content/uploads/2017/09/lenet-5-825x285.png?x31195)
Convolutional architectures make the explicit assumption that the inputs are images, which allows us to encode certain properties into the architecture. These then make the forward function more efficient to implement and vastly reduce the number of parameters in the network.

## Neurons for regions of the input space
Every entry in the 3D output volume can also be interpreted as an output of a neuron that looks at only a small region in the input and shares parameters with all neurons to the left and right spatially (since these numbers all result from applying the same filter).

## Maxpooling
The max pooling layer downsamples its input: it slides a window over the feature map and outputs the maximum value of each analysed area. Unlike a convolutional layer, it has no weights to learn.
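
A 2 x 2 max pooling step can be sketched as follows. The function name and the example values are made up for illustration, and the sketch handles a single 2D feature map with non-overlapping windows.

```python
def max_pool_2x2(feature_map):
    """Keep the largest value in each non-overlapping 2 x 2 block."""
    return [[max(feature_map[r][c], feature_map[r][c + 1],
                 feature_map[r + 1][c], feature_map[r + 1][c + 1])
             for c in range(0, len(feature_map[0]) - 1, 2)]
            for r in range(0, len(feature_map) - 1, 2)]

feature_map = [[1, 3, 2, 0],
               [4, 2, 1, 1],
               [0, 0, 5, 6],
               [1, 2, 7, 8]]
for row in max_pool_2x2(feature_map):
    print(row)
```

The 4 x 4 input shrinks to 2 x 2, so each pooling layer cuts the number of values by a factor of four.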

## Flatten
The flatten layer converts the max-pooled tensor into a one-dimensional vector that can be used with a fully connected layer.

## Dense
A dense layer is a classic fully connected "vanilla" neural network layer that connects every input neuron to every output neuron in the next layer.

## Dropout
Dropout randomly ignores some of the neurons in a layer during training to reduce the possibility of overfitting.

## Output layer
The output layer gives us a classification into one of the 10 digits (0-9).

### Softmax activation
The softmax activation allows us to interpret the output as a probability where the sum of all probabilities is 1.
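
A small plain Python sketch of softmax is shown below. The three input scores are made up. Subtracting the largest score before exponentiating is a common trick to avoid overflow and does not change the result.

```python
import math

def softmax(scores):
    """Turn a list of raw scores into probabilities that sum to 1."""
    largest = max(scores)                       # subtracting the max avoids overflow in exp
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probabilities = softmax([2.0, 1.0, 0.1])        # made-up scores for three classes
print(probabilities)
print(sum(probabilities))
```

The class with the highest score also gets the highest probability, and that class is taken as the prediction.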


In [None]:
# Our neural network
# Neural networks are usually quite small in terms of code size 
# with networks being around 10 lines of code in keras. 

model = Sequential()
# relu activation function
model.add(Conv2D(32, kernel_size=(3, 3),
                 activation='relu',
                 input_shape=input_shape))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes, activation='softmax'))
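
Although the model is short, it has a lot of weights. The sketch below counts them by hand, assuming the Keras defaults for the layers above (no padding, stride 1) and a 28 x 28 x 1 input. The helper names are made up for illustration. The total should agree with what `model.summary()` reports.

```python
def conv_params(kernel_rows, kernel_cols, in_channels, filters):
    # one weight per kernel cell and input channel, plus one bias per filter
    return kernel_rows * kernel_cols * in_channels * filters + filters

def dense_params(inputs, units):
    # one weight per input/unit pair, plus one bias per unit
    return inputs * units + units

conv1 = conv_params(3, 3, 1, 32)      # 28x28x1  -> 26x26x32
conv2 = conv_params(3, 3, 32, 64)     # 26x26x32 -> 24x24x64
flat = 12 * 12 * 64                   # after 2x2 max pooling: 12x12x64, flattened
dense1 = dense_params(flat, 128)
dense2 = dense_params(128, 10)

print(conv1, conv2, dense1, dense2)
print("total:", conv1 + conv2 + dense1 + dense2)
```

Almost all of the weights sit in the first dense layer. The two convolutional layers are small because each filter is shared across the whole image. Pooling, dropout and flatten layers add no weights.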

## Compiling and running our model
TensorFlow 1.x builds a static computational graph.
First you declare your model.
Then you compile your model (similar to compiling code in a low-level language).
Then you run your model (fit your model).

## Compiling your model

### Cross entropy
Cross entropy measures the difference between two probability distributions and can be used as the loss function. Here the true distribution is the one-hot true label, and the other distribution is the prediction of the current model.
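
For a single example with one-hot label t and predicted probabilities p, the loss is the negative sum of t_i * log(p_i) over the classes. The sketch below computes it in plain Python. The function name mirrors the Keras loss but this is not the Keras code, and the small `eps` guard and the example predictions are assumptions for illustration.

```python
import math

def categorical_crossentropy(true_one_hot, predicted):
    """Cross entropy between a one-hot label and predicted probabilities."""
    eps = 1e-7   # keeps log() away from zero
    return -sum(t * math.log(max(p, eps)) for t, p in zip(true_one_hot, predicted))

label = [0.0, 0.0, 1.0]                                   # the true class is index 2
print(categorical_crossentropy(label, [0.1, 0.1, 0.8]))   # confident and right: small loss
print(categorical_crossentropy(label, [0.7, 0.2, 0.1]))   # confident and wrong: large loss
```

Because the label is one-hot, only the probability given to the true class affects the loss.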

### Optimizer
The optimizer is the algorithm that performs gradient descent: it decides how the model weights move towards a better solution based on the error gradients.

Here you can learn more about optimizers.
https://keras.io/optimizers/
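
The basic idea behind every optimizer can be shown on a toy problem with a single weight. The sketch below uses plain gradient descent with a fixed learning rate, which is simpler than the Adadelta optimizer used in this notebook. The loss function (w - 3)^2, the starting value and the learning rate are all made up for illustration.

```python
def gradient(w):
    """Derivative of the toy loss (w - 3)^2."""
    return 2.0 * (w - 3.0)

w = 0.0                # initial weight
learning_rate = 0.1    # step size (a hyperparameter)
for step in range(50):
    w -= learning_rate * gradient(w)   # move against the gradient

print(w)   # close to 3, the minimum of the loss
```

Optimizers such as Adadelta follow the same loop but adapt the step size for each weight as training goes on.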

In [9]:

model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.Adadelta(),
              metrics=['accuracy'])

model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=epochs,
          verbose=1,
          validation_data=(x_test, y_test))
score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])


Train on 60000 samples, validate on 10000 samples
Epoch 1/12
Epoch 2/12
Epoch 3/12

KeyboardInterrupt: 

## Hyperparameter tuning
Hyperparameter tuning is the process of changing our hyperparameters to improve our model. This is a complex topic and an unsolved problem in machine learning. 

## Change your hyperparameters to see if you can improve your models
Try changing the hyperparameters to see if you can improve the accuracy of your models.

## List of hyperparameters

### Number of layers
### Number of neurons in each layer
### Activation function for each layer
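
One simple way to explore these hyperparameters is a grid search: list a few candidate values for each one and try every combination. The sketch below only builds the list of combinations. The candidate values and dictionary keys are hypothetical, and training a model for each combination is left out.

```python
from itertools import product

# Hypothetical candidate values; pick your own.
search_space = {
    "num_layers": [1, 2, 3],
    "neurons_per_layer": [64, 128],
    "activation": ["relu", "tanh"],
}

names = list(search_space)
combinations = [dict(zip(names, values)) for values in product(*search_space.values())]

print(len(combinations), "combinations to try")
print(combinations[0])
```

The number of combinations grows quickly with every hyperparameter added, which is one reason tuning is expensive.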
