In [None]:
%matplotlib inline
import numpy as np
from matplotlib import pyplot as plt
# 'Sequential' is a linear stack of neural network layers; we use it to build the feed-forward CNN
from keras.models import Sequential 
# import the "core" layers from Keras (these are the most common layers)
from keras.layers import Dense, Dropout, Activation, Flatten
# import the convolutional layers that will help us efficiently train on image data
from keras.layers import Conv2D, MaxPooling2D
# these utilities will help us transform our data
from keras.utils import np_utils
%load_ext autoreload
%autoreload 2

## [Keras tutorial](https://elitedatascience.com/keras-tutorial-deep-learning-in-python) on the MNIST dataset
**N.B. I used TensorFlow, not Theano, as the backend.** The only difference is the shape of each input image: (28, 28, 1) with TensorFlow (channels last) and (1, 28, 28) with Theano (channels first).
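If you ever get image data in the Theano (channels-first) layout, converting it to the channels-last layout used here is a single axis transpose. A minimal NumPy sketch on random data; `batch_cf` is a made-up stand-in for real images, not part of the tutorial. (Keras records which layout it expects in its `image_data_format` setting, `"channels_last"` by default.)

```python
import numpy as np

# A hypothetical batch of 5 grayscale images stored channels-first (Theano style):
# shape (n, depth, rows, cols)
batch_cf = np.random.rand(5, 1, 28, 28).astype("float32")

# Move the depth axis to the end to get the channels-last layout (TensorFlow style):
# shape (n, rows, cols, depth)
batch_cl = np.transpose(batch_cf, (0, 2, 3, 1))

print(batch_cf.shape, "->", batch_cl.shape)  # (5, 1, 28, 28) -> (5, 28, 28, 1)

# Same pixel values, only the axis order changed
assert np.array_equal(batch_cf[0, 0], batch_cl[0, :, :, 0])
```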

#### 1. Load data

In [None]:
np.random.seed(123)  # for reproducibility

In [None]:
from keras.datasets import mnist

# Load pre-shuffled MNIST data into train and test sets
(X_train, y_train), (X_test, y_test) = mnist.load_data()

# images are 28x28 pixels
X_train.shape, y_train.shape, X_test.shape, y_test.shape

In [None]:
# display the first training image
plt.imshow(X_train[0], cmap='Greys_r')

#### 2. Preprocess the input
- You must explicitly declare a dimension for the depth (number of channels) of the input images. For example, a full-color image with all 3 RGB channels has a depth of 3, while our grayscale MNIST images have a depth of 1. We want to transform our dataset from shape (n, width, height) to (n, width, height, depth).

In [None]:
X_train = X_train.reshape(X_train.shape[0], 28, 28, 1)
X_test = X_test.reshape(X_test.shape[0], 28, 28, 1)
X_train.shape, X_test.shape
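The `reshape` above only adds a length-1 depth axis; the pixel values are untouched. As a sketch on random stand-in data (`images` is hypothetical), `np.expand_dims` produces the same array without hard-coding the 28×28 size:

```python
import numpy as np

# Stand-in for raw MNIST-style data: n grayscale images of shape (28, 28), dtype uint8
images = np.random.randint(0, 256, size=(4, 28, 28), dtype=np.uint8)

# reshape with an explicit depth of 1, as in the cell above
a = images.reshape(images.shape[0], 28, 28, 1)
# np.expand_dims appends the same length-1 axis without hard-coding 28x28
b = np.expand_dims(images, axis=-1)

print(a.shape, b.shape)  # (4, 28, 28, 1) (4, 28, 28, 1)
assert np.array_equal(a, b)
```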

- Convert the data type to float32 and normalize the pixel values, which range from 0 to 255, to the range [0, 1] by dividing by 255.

In [None]:
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255

#### 3. Preprocess class labels
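We want 10 different classes, one for each digit. The labels are integers from 0 to 9, so we convert them to one-hot vectors of length 10 (all zeros except for a 1 at the index of the digit).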

In [None]:
Y_train = np_utils.to_categorical(y_train, 10)
Y_test = np_utils.to_categorical(y_test, 10)
Y_train.shape, Y_test.shape

In [None]:
y_train[0], Y_train[0]
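`to_categorical` turns each integer label into a one-hot row, which is what `Y_train[0]` shows above. A minimal NumPy sketch of the same idea, indexing rows of an identity matrix; `one_hot` is a hypothetical helper, not a Keras function, and the labels are made up:

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return a (len(labels), num_classes) float32 matrix with a single 1 per row."""
    labels = np.asarray(labels, dtype=int)
    # Row i of the identity matrix is the one-hot vector for class i
    return np.eye(num_classes, dtype="float32")[labels]

y = np.array([5, 0, 4])        # hypothetical integer digit labels
Y = one_hot(y, 10)
print(Y.shape)                 # (3, 10)
print(Y[0])                    # [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]

# argmax recovers the original integer labels
assert (Y.argmax(axis=1) == y).all()
```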

#### 4. Define model architecture
Let's start by declaring a `Sequential` model. Each layer has an input shape and an output shape. Keras sets each layer's input shape to the output shape of the previous layer, so we only need to declare the input shape of the first layer; the output shape of the last layer is set by its number of units (here 10, one per class).
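Because each layer's input shape is the previous layer's output shape, the shapes (and parameter counts) of the model defined below can be worked out by hand. A plain-Python sketch, assuming the Conv2D defaults of stride 1 and no padding (`padding='valid'`); the helper names are hypothetical, and the two Dropout layers are skipped because they change neither the shape nor the parameter count:

```python
def conv_out(size, kernel, stride=1):
    """Output length of a 'valid' (unpadded) convolution along one axis."""
    return (size - kernel) // stride + 1

def pool_out(size, pool):
    """Output length of non-overlapping pooling (stride = pool size) along one axis."""
    return size // pool

h = w = 28   # input rows/cols
c = 1        # input depth
total = 0

# Two Conv2D(32, (3, 3)) layers: 3*3*input_depth weights per filter, plus one bias per filter
for filters in (32, 32):
    params = (3 * 3 * c) * filters + filters
    h, w, c = conv_out(h, 3), conv_out(w, 3), filters
    total += params
    print(f"Conv2D  -> ({h}, {w}, {c})  params={params}")

# MaxPooling2D(pool_size=(2, 2)) has no weights
h, w = pool_out(h, 2), pool_out(w, 2)
print(f"MaxPool -> ({h}, {w}, {c})  params=0")

# Flatten turns the 3-D feature maps into one vector
units_in = h * w * c
print(f"Flatten -> ({units_in},)")

# Dense layers: one weight per (input, unit) pair, plus one bias per unit
for units in (128, 10):
    params = units_in * units + units
    total += params
    print(f"Dense   -> ({units},)  params={params}")
    units_in = units

print("total params:", total)  # total params: 600810
```

If the arithmetic is right, `model.summary()` on the model below should report the same shapes (with a leading `None` batch dimension) and the same total.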

In [None]:
model = Sequential()
# 32 convolution filters
# 3 rows in convolution kernel
# 3 columns in convolution kernel
# (28, 28, 1) is the shape of one input (rows, cols, depth) with the TensorFlow backend
# strides=(1, 1) by default: the kernel moves one pixel at a time
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(28,28,1)))
model.add(Conv2D(32, (3, 3), activation='relu'))

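# MaxPooling2D downsamples each feature map by sliding a 2x2 window across it and
# keeping the max of the 4 values in the window. With the default stride (equal to
# the pool size) this halves each spatial dimension, which reduces the number of
# parameters in the dense layer that follows.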
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dropout(0.25)) # a method for regularizing our model in order to prevent overfitting.

# a convolutional neural network typically ends with one or more fully connected
# layers followed by the output layer
# first flatten the 3-D feature maps from the convolutional layers into a 1-D vector
model.add(Flatten())
# 128 = output size of the dense layer
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
# 10 = output size of the output dense layer (we have 10 classes!)
model.add(Dense(10, activation='softmax'))

model.output_shape
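The pooling step described in the comments above can be sketched in NumPy for a single feature map; `max_pool_2x2` is a hypothetical helper, while `MaxPooling2D` applies the same operation to each of the 32 feature maps independently (so 24×24×32 becomes 12×12×32):

```python
import numpy as np

def max_pool_2x2(fmap):
    """Non-overlapping 2x2 max pooling of a 2-D feature map (rows and cols must be even)."""
    rows, cols = fmap.shape
    # Split into 2x2 blocks: element [i, a, j, b] is fmap[2*i + a, 2*j + b]
    blocks = fmap.reshape(rows // 2, 2, cols // 2, 2)
    # Keep the maximum inside each block
    return blocks.max(axis=(1, 3))

fmap = np.array([[1, 3, 2, 0],
                 [4, 2, 1, 5],
                 [0, 1, 7, 2],
                 [6, 2, 3, 1]])
print(max_pool_2x2(fmap))
# [[4 5]
#  [6 7]]
```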

**More about 32, (3, 3)**: 32 is the number of filters, each of which scans the image with a 3×3-pixel window. Why 32 filters? Stacking 32 filters gives the model more capacity: each filter learns to detect a different pattern during training.
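To make the "filters scanning a 3×3 window" idea concrete, here is a NumPy sketch of one Conv2D-style layer on a random 28×28 image with 32 random filters. It is a simplified illustration, not Keras's implementation: it computes the sliding-window weighted sum (cross-correlation, which is what deep-learning "convolution" layers compute) but omits the bias and the ReLU activation, and the filter values are random rather than learned.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` with stride 1 and no padding (cross-correlation)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # weighted sum of the window currently under the kernel
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

rng = np.random.default_rng(0)
image = rng.random((28, 28))                 # stand-in for one 28x28 grayscale digit
filters = rng.standard_normal((32, 3, 3))    # 32 random 3x3 filters (a real layer learns these)

# Each filter yields one 26x26 feature map; stacking them gives the layer's output depth
feature_maps = np.stack([conv2d_valid(image, k) for k in filters], axis=-1)
print(feature_maps.shape)  # (26, 26, 32)
```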
 
Read more about [dropout](https://www.quora.com/How-does-the-dropout-method-work-in-deep-learning-And-why-is-it-claimed-to-be-an-effective-trick-to-improve-your-network). `Dropout(0.25)` is a layer that randomly sets 25% of its inputs to zero at each training step; it is inactive at test time.
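A NumPy sketch of dropout in the "inverted" form Keras documents: during training a random fraction `rate` of the inputs is zeroed and the rest are scaled by 1 / (1 - rate), so the expected value of each input is unchanged; at inference the input passes through untouched. The `dropout` function below is a hypothetical helper for illustration only:

```python
import numpy as np

def dropout(x, rate, training, rng):
    """Inverted dropout: zero a random `rate` fraction of inputs during training and
    scale the survivors by 1 / (1 - rate) so the expected value is unchanged."""
    if not training or rate == 0.0:
        return x  # at inference time the inputs pass through unchanged
    keep = rng.random(x.shape) >= rate          # True with probability 1 - rate
    return np.where(keep, x / (1.0 - rate), 0.0)

rng = np.random.default_rng(42)
x = np.ones(10_000)
y = dropout(x, rate=0.25, training=True, rng=rng)
print((y == 0).mean())   # close to 0.25
print(y.mean())          # close to 1.0
```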

#### 5. Compile the model
When we compile the model, we declare the loss function, the optimizer (SGD, Adam, etc.) and the metrics to report. `categorical_crossentropy` is the standard loss for multi-class classification with one-hot labels.

In [None]:
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

Read more about the [loss functions](https://keras.io/losses/) and [optimizers](https://keras.io/optimizers/) available in Keras.

#### 6. Fit the model on the training data

In [None]:
# we have to declare the batch size and the number of epochs to train for
# (with 60,000 training images and batch_size=32, each epoch performs 1,875 weight updates)
model.fit(X_train, Y_train,
          batch_size=32, epochs=2, verbose=1)

You can also use a variety of [callbacks](https://keras.io/callbacks/) to set early-stopping rules, save model weights along the way, or log the history of each training epoch.

#### 7. Evaluate the model on the test data

In [None]:
score = model.evaluate(X_test, Y_test, verbose=0)
score
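Because the model was compiled with `metrics=['accuracy']`, `score` is a list `[test loss, test accuracy]`. A sketch of the two formulas in NumPy on made-up predictions; the clipping epsilon is an assumption to avoid `log(0)`, and this is not Keras's exact implementation:

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean over samples of -sum_k y_true[k] * log(y_pred[k]) for one-hot y_true."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)   # avoid log(0)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))

def accuracy(y_true, y_pred):
    """Fraction of samples whose highest-probability class is the true class."""
    return float(np.mean(np.argmax(y_pred, axis=1) == np.argmax(y_true, axis=1)))

# Hypothetical one-hot labels and softmax outputs for 3 samples and 3 classes
y_true = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [0, 0, 1]], dtype=float)
y_pred = np.array([[0.8, 0.1, 0.1],
                   [0.3, 0.6, 0.1],
                   [0.5, 0.2, 0.3]])   # the third sample is misclassified

print(categorical_crossentropy(y_true, y_pred))  # (-ln 0.8 - ln 0.6 - ln 0.3) / 3
print(accuracy(y_true, y_pred))                  # 0.6666666666666666
```

The loss only looks at the probability assigned to the true class, so it still penalizes the two correct predictions for not being fully confident, while accuracy counts them as simply right.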

In [None]:
prediction = model.predict(X_test[:1])
print("Prediction: ", np.argmax(prediction))
plt.imshow(X_test[0].reshape((28, 28)), cmap='Greys_r')
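`model.predict` returns the output of the final `softmax` layer: for each image, 10 probabilities that sum to 1, and `np.argmax` picks the index of the largest one as the predicted digit. A NumPy sketch of softmax on hypothetical scores; since softmax preserves the ordering of its inputs, the argmax of the probabilities equals the argmax of the raw scores:

```python
import numpy as np

def softmax(z):
    """Turn a vector of raw scores into probabilities that sum to 1."""
    z = z - np.max(z)          # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

scores = np.array([1.0, 2.0, 0.5, 4.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])  # hypothetical scores
probs = softmax(scores)
print(round(float(probs.sum()), 6))  # 1.0
print(np.argmax(probs))              # 3: the class with the largest score
```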

Look at some [example models in Keras](https://github.com/fchollet/keras/tree/master/examples).