# Attacking MNIST
## Setup

On your local machine, make sure `tensorflow` and `keras` are installed. If they are not, use `pip` to install them:

```
pip install tensorflow
pip install keras
```

On **azure**, `tensorflow` is already available but you do need to run a cell with the following line in it:

```
!pip install keras
```

In [0]:
# On azure, run this cell
!pip install keras

### Importing the building blocks from Keras

Keras is an extremely convenient API for tensorflow with very good [documentation](https://keras.io/applications/). It's **the** tool you should turn to if you want to quickly test a model. 

Like `sklearn`, it comes with lots of useful functions. Let's import all the ones we will need.

In [0]:
from keras.datasets   import mnist             # API to download the MNIST dataset
from keras.models     import Sequential        # class of neural networks with one layer after the other
from keras.layers     import Dense, Activation # types of layers
from keras.optimizers import SGD               # optimisers, here stochastic gradient descent
from keras.utils      import np_utils          # extra tools

### Loading the MNIST dataset

Using the `mnist` functionality and its `load_data` function, load the MNIST data and store it into `(images_train, labels_train), (images_test, labels_test)`.

In [0]:
# your code here...


Have a look at one data point! Pick an image ID and use `plt.imshow` with `cmap="gray"` to visualise the corresponding image. Show the attached label as well. Remember to import `matplotlib.pyplot` as `plt`.

In [0]:
# your code here...


That seems pretty reasonable. 

### Reshaping the dataset

We now need to reshape the dataset, because the architecture we will build in Keras expects flattened vectors as input, not square matrices. It also expects `float32` rather than `uint8` (the type currently used for memory efficiency, since a gray pixel needs no more than 8 bits to be stored). The following cell does all of this; it is not an exercise, as it is not particularly interesting.

We have 60,000 training samples and 10,000 test samples, and each sample (instance) is a 28x28 array. We need to reshape these instances into vectors of 784 = 28x28 components.

In [0]:
images_train = images_train.reshape(60000, 784) 
images_test  = images_test.reshape(10000, 784)

images_train = images_train.astype('float32') 
images_test  = images_test.astype('float32')

images_train /= 255 # rescaling pixel values to [0, 1]
images_test  /= 255 # rescaling pixel values to [0, 1]
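
To see what these three steps do without downloading anything, here is a minimal NumPy sketch on a small synthetic batch. The array `fake_images` and its random contents are assumptions made for illustration; they only mimic the shape and type of the MNIST arrays.

```python
import numpy as np

# Synthetic stand-in for MNIST: 5 random 28x28 grayscale images stored as uint8
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)

# Flatten each 28x28 image into a 784-vector; -1 lets NumPy infer the sample count
flat = fake_images.reshape(-1, 28 * 28)

# Convert to float32, then rescale pixel values from [0, 255] to [0, 1]
flat = flat.astype("float32") / 255

print(flat.shape, flat.dtype)
```

Using `-1` for the first dimension avoids hard-coding the number of samples, which can be handy if you later work with a subset of the data.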

The labels are stored as integer values from 0 to 9. You need to tell Keras that these form the output categories via the function `to_categorical` from `np_utils` (check the documentation `?np_utils.to_categorical` if needed). 

Store the results back in `labels_train` and `labels_test`.
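
If it is unclear what the categorical (one-hot) encoding looks like, the following NumPy sketch may help. The helper `one_hot` is hypothetical and written only for illustration; it is not the Keras function, although it should produce the same kind of array for integer labels.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return a (len(labels), num_classes) float32 array with a single 1 per row."""
    labels = np.asarray(labels, dtype=int)
    encoded = np.zeros((labels.size, num_classes), dtype="float32")
    # Row i gets a 1 in the column given by labels[i]
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

example = one_hot([3, 0, 9], num_classes=10)
print(example[0])  # the 1 sits at index 3
```

Each label becomes a vector of length 10, which matches the 10 outputs of the network declared further down.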

In [0]:
# your code here...


### Declaring the MLP architecture

As you now know, a Multilayer Perceptron is made up of a sequence of layers of artificial neurons. Each layer receives a vector of inputs and converts it into some output. The interconnection pattern is "dense", meaning each layer is fully connected to the next one. Note that the first hidden layer needs to specify the size of the input, which amounts to implicitly having an input layer.

1. declare an instance of `Sequential` and call it `model`
2. add a `Dense` layer with 500 neurons, the input is a vector of 784 components (see `?model.add` and `?Dense`)
3. add an `Activation` layer with `relu` units (see `?Activation`)
4. add another `Dense` layer with 300 neurons 
5. add another `Activation` layer with `relu` units
6. add a final `Dense` layer with 10 neurons (the 10 classes)
7. add a final `Activation` layer with `softmax` units

If you're lost, have a look at [the keras documentation](https://keras.io/getting-started/sequential-model-guide/). 
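
To make the computation behind this architecture concrete, here is a rough NumPy sketch of the forward pass through a 784-500-300-10 network with `relu` and `softmax`. It is not the Keras solution to the exercise: the weights are random and untrained, and the names `forward`, `weights` and `biases` are assumptions chosen for illustration.

```python
import numpy as np

def relu(z):
    # Keep positive values, set negative ones to zero
    return np.maximum(z, 0.0)

def softmax(z):
    # Subtract the row maximum for numerical stability before exponentiating
    shifted = z - z.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
layer_sizes = [784, 500, 300, 10]

# Random (untrained) weights and zero biases for each dense layer
weights = [rng.normal(0.0, 0.05, size=(n_in, n_out))
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n_out) for n_out in layer_sizes[1:]]

def forward(x):
    """Apply dense + relu for the hidden layers, then dense + softmax."""
    hidden = x
    for w, b in zip(weights[:-1], biases[:-1]):
        hidden = relu(hidden @ w + b)
    return softmax(hidden @ weights[-1] + biases[-1])

batch = rng.random((4, 784))  # 4 fake flattened images
probs = forward(batch)
n_params = sum(w.size + b.size for w, b in zip(weights, biases))
print(probs.shape, n_params)
```

Each row of `probs` sums to 1 and can be read as class probabilities. The parameter count (545,810 here) is what you would expect `model.summary()` to report for the same layer sizes.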


In [0]:
# add your code here...


### Declaring the optimiser and fitting the model

Here you will define a standard optimiser: stochastic gradient descent with the Adam stepping scheme. Have a look at `?model.compile` and specify:

* `loss='categorical_crossentropy'`
* `optimizer='adam'`
* `metrics=["accuracy"]`
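
As a reminder of what the loss and the metric measure, here is a small NumPy sketch of the categorical cross-entropy and the accuracy on two made-up predictions. The functions below follow the textbook formulas and are an assumption for illustration; Keras' own implementation may differ in details such as clipping.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean over samples of -sum_k y_true[k] * log(y_pred[k])."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)  # avoid log(0)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))

def accuracy(y_true, y_pred):
    """Fraction of samples whose most probable class matches the label."""
    return float(np.mean(y_true.argmax(axis=1) == y_pred.argmax(axis=1)))

# Two samples, three classes: one-hot labels and predicted probabilities
y_true = np.array([[0, 1, 0], [1, 0, 0]], dtype="float32")
y_pred = np.array([[0.1, 0.8, 0.1], [0.3, 0.6, 0.1]], dtype="float32")

loss = categorical_crossentropy(y_true, y_pred)
acc = accuracy(y_true, y_pred)
print(round(loss, 3), acc)
```

The first sample is classified correctly with high confidence and contributes little to the loss; the second is misclassified and contributes most of it.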

In [0]:
# your code here...


At this stage, you are ready to launch the learning (fit the model). The `model.fit` function takes all the necessary arguments and trains the model. We describe below what these arguments are:

- the training set (images and labels)
- the batch size, which you can set to `100` (number of instances per noisy gradient)
- the number of **epochs**, which you can set to `10`; this measures the computational effort in terms of "full gradients", since each epoch amounts to one complete pass over the data
- whether or not we want to show output during the learning (set `verbose=2`)
- the test set (images and labels), used as validation data

For more information, `?model.fit` as usual!
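
To get a feel for the batch size and the number of epochs, the following arithmetic sketch counts the gradient steps implied by the suggested settings. The variable names are chosen here for illustration.

```python
import math

n_train = 60000     # number of training images
batch_size = 100    # instances per noisy gradient
epochs = 10         # complete passes over the training set

# One gradient step per batch; ceil covers a last, smaller batch if any
steps_per_epoch = math.ceil(n_train / batch_size)
total_steps = steps_per_epoch * epochs
print(steps_per_epoch, total_steps)  # 600 6000
```

So with these settings the optimiser takes 600 steps per epoch and 6,000 steps in total.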