# Introduction to Deep Learning \#1

# 1. Initialization

In this class we briefly introduce the fundamentals of modern Deep Learning techniques by means of the [**Keras Library**](https://keras.io/).

Deep Learning methods require **many** matrix multiplications, which can be greatly accelerated using GPUs. There are four major Deep Learning libraries that are optimized to implement these techniques while leveraging GPU computing capabilities:

* **TensorFlow**: https://www.tensorflow.org/
* **Theano**: http://deeplearning.net/software/theano/
* **Caffe**: http://caffe.berkeleyvision.org/
* **Torch**: http://torch.ch/

**Keras** is a thin wrapper that works on top of either Theano or TensorFlow. As TF does not work easily on Windows systems, we will be using Theano as the backend for Keras in these two sessions.

Keras was developed and is maintained by François Chollet from Google Inc. It is written entirely in Python, leaving the GPU computing part to Theano/TF (that part is written in CUDA, a C++ extension for NVIDIA graphics cards). With Keras, we do not need to know how to program the GPU to develop and optimize new Deep Learning models. The focus is on rapid **prototyping**, easy **extensibility**, **modularity** and **minimalism**.

Let us quickly verify that we are running the Theano backend. As these computers do not have GPUs, we will be running our examples on the CPU (**much slower**). You can get cheap access to GPU computing capabilities on Amazon Web Services (AWS).

In [None]:
import theano, keras
print(theano.config.device)
print(keras.backend._BACKEND)

# 2. Multilayer Perceptrons

A standard neural network is composed of a bunch of **neurons**, **weights** and **activation functions**. They are all combined to build a complex non-linear function that can be fit to our **input** data by an efficient training method called **backpropagation**.

<img src="im_1.png" width="400">
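
To make the picture concrete, here is a minimal NumPy sketch of the forward pass of a tiny network with one hidden layer. The layer sizes, the random weights and the helper names `sigmoid` and `forward` are illustrative assumptions, not part of Keras.

```python
import numpy as np

def sigmoid(z):
    """Element-wise logistic activation function."""
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W1, b1, W2, b2):
    """Forward pass: input -> hidden layer -> output layer."""
    hidden = sigmoid(x.dot(W1) + b1)      # activations of the hidden neurons
    return sigmoid(hidden.dot(W2) + b2)   # activations of the output neurons

rng = np.random.RandomState(0)
# Hypothetical sizes: 4 inputs, 3 hidden neurons, 2 outputs
W1, b1 = rng.uniform(-0.05, 0.05, (4, 3)), np.zeros(3)
W2, b2 = rng.uniform(-0.05, 0.05, (3, 2)), np.zeros(2)

x = rng.rand(5, 4)                        # a batch of 5 input vectors
out = forward(x, W1, b1, W2, b2)
print(out.shape)
```

Training by backpropagation then amounts to adjusting `W1`, `b1`, `W2` and `b2` so that the output gets closer to the desired targets.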

# 3. Implementing a Neural Network classifier with Keras

Let us load the MNIST dataset, a standard toy dataset for image classification, and take a quick look at one image.

In [None]:
%pylab inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from keras.datasets import mnist
(X_train, y_train), (X_test, y_test) = mnist.load_data()

# Some visual inspection
plt.imshow(X_train[0].reshape(28,28), cmap='Greys_r')
plt.axis('off')
plt.title('This number label: ' + str(y_train[0]))
plt.show()

To feed a neural net, we need to flatten the images, convert the class labels, and scale the data.

In [None]:
from keras.utils import np_utils

X_train = X_train.reshape(60000, 28**2).astype('float32') / 255
X_test = X_test.reshape(10000, 28**2).astype('float32') / 255
Y_train = np_utils.to_categorical(y_train, 10)
Y_test = np_utils.to_categorical(y_test, 10)
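
As a rough illustration of what these three steps do, the following NumPy sketch builds a one-hot label matrix by hand and flattens and scales a fake image. The helper `one_hot` is a hypothetical stand-in written for this example; it is not the Keras implementation.

```python
import numpy as np

def one_hot(labels, nb_classes):
    """Return a (n_samples, nb_classes) matrix with a single 1 per row."""
    labels = np.asarray(labels, dtype=int)
    encoded = np.zeros((len(labels), nb_classes), dtype='float32')
    encoded[np.arange(len(labels)), labels] = 1.0
    return encoded

# Three made-up labels instead of the real MNIST ones
Y = one_hot([5, 0, 4], 10)
print(Y[0])   # the 1 sits at index 5

# Flattening and scaling: a fake 28x28 image with 8-bit pixel values
image = np.full((1, 28, 28), 255, dtype='uint8')
flat = image.reshape(1, 28 ** 2).astype('float32') / 255
print(flat.shape, flat.max())
```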

In Keras, the core data structure is called a **model**, and it dictates how layers in your Deep Neural Network are organized. The most common model is a `Sequential`, standing for a sequential (linear) stack of layers. 

To give content to a model, you create a **`Sequential`** and add layers to it that determine the computations that will be done. When finished, you compile your model (specifying which loss function and optimizer you want to employ), and voilà: you have a scikit-learn-style object that can be fit to your data with a one-liner, **`model.fit()`**. Once the model is fit to your data, you can make predictions on new unseen data using **`model.predict()`**, and assess the quality of your model with **`model.evaluate()`**.

In [None]:
from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(64, input_shape=(28 * 28,) , init = 'uniform', activation = 'sigmoid'))
model.add(Dense(10, init = 'uniform', activation = 'sigmoid'))
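
It helps to count how many trainable parameters this small network has. The arithmetic below assumes, as is standard for a dense layer, one weight per input-output pair plus one bias per output neuron; `dense_params` is a helper defined just for this sketch.

```python
def dense_params(n_inputs, n_outputs):
    """Weights (n_inputs * n_outputs) plus one bias per output neuron."""
    return n_inputs * n_outputs + n_outputs

hidden = dense_params(28 * 28, 64)   # 784 * 64 + 64 = 50240
output = dense_params(64, 10)        # 64 * 10 + 10 = 650
print(hidden, output, hidden + output)   # 50240 650 50890
```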

In [None]:
model.compile(loss='mse', optimizer='adam', metrics=['accuracy'])

In [None]:
history = model.fit(X_train, Y_train, batch_size=64, nb_epoch=10,
          verbose=1, validation_split=0.1)

In [None]:
print("Test classification rate %0.05f " % model.evaluate(X_test, Y_test)[1])

### Some result visualizations

In [None]:
plt.plot(history.history['val_acc'])
plt.ylabel('accuracy')
plt.xlabel('epochs')
plt.legend(['validation accuracy'], loc = 'upper left')
plt.show()

Predict classes on the test set:

In [None]:
import pandas as pd

y_hat = model.predict_classes(X_test)
pd.crosstab(y_hat, y_test)
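
The cross-tabulation above is a confusion matrix. A minimal NumPy sketch of the same computation, run on made-up predictions rather than the real model output, could look like this (`count_confusions` is a helper written for this example):

```python
import numpy as np

def count_confusions(y_pred, y_true, nb_classes):
    """Rows index the predicted class, columns the true class."""
    table = np.zeros((nb_classes, nb_classes), dtype=int)
    for p, t in zip(y_pred, y_true):
        table[p, t] += 1
    return table

# Made-up labels for a 3-class problem
y_true = [0, 1, 2, 2, 1, 0]
y_pred = [0, 2, 2, 2, 1, 0]
table = count_confusions(y_pred, y_true, 3)
print(table)

# Correct predictions lie on the diagonal
accuracy = np.trace(table) / float(len(y_true))
print(accuracy)
```

Off-diagonal entries show which classes get confused with which: here one true `1` was predicted as `2`.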

In [None]:
test_wrong = [im for im in zip(X_test,y_hat,y_test) if im[1] != im[2]]

plt.figure(figsize=(15, 15))
for ind, val in enumerate(test_wrong[:20]):
    plt.subplot(10, 10, ind + 1)
    im = 1 - val[0].reshape((28,28))
    plt.axis("off")
    plt.text(0, 0, val[2], fontsize=14, color='green') # correct 
    plt.text(8, 0, val[1], fontsize=14, color='red')  # predicted
    plt.imshow(im, cmap='gray')
plt.show()

# 4. Keras and sklearn

DNNs come with a number of hyperparameters: 
* Which optimizer do I use? Learning rate? Batch size?
* How many neurons in each layer?
* How many layers?
* Which kind of regularization, and with which parameters?
* Which initialization?

Most of your time will probably be invested in tweaking these :(

Fortunately, we can import some functionalities of sklearn to tune hyperparameters. To that end, we will be employing the `KerasClassifier`:

In [None]:
from keras.wrappers.scikit_learn import KerasClassifier

This class takes as its argument a function that creates and compiles a model with a fixed hyperparameter configuration. We thus need to define that model-generating function.

In [None]:
def create_new_model():
    model = Sequential()
    model.add(Dense(64, input_shape=(28 * 28,) , init = 'uniform', activation = 'sigmoid'))
    model.add(Dense(10, init = 'uniform', activation = 'sigmoid'))
    
    model.compile(loss='mse', optimizer='adam', metrics=['accuracy'])
    return model

Now we can feed the `create_new_model` function to `KerasClassifier`:

In [None]:
model = KerasClassifier(build_fn = create_new_model, nb_epoch = 10, batch_size=32, verbose = 1)

Note the parameters `nb_epoch` and `batch_size`: they will be passed on to the `fit()` function inside the `KerasClassifier` class.

With this, we may for instance evaluate a model using k-fold cross-validation from `scikit-learn`.

In [None]:
from sklearn.cross_validation import StratifiedKFold, cross_val_score

(X_train, y_train), (X_test, y_test) = mnist.load_data()

# Keep only the first 10000 training samples for speed
X_train = X_train[:10000]
y_train = y_train[:10000]

X_train = X_train.reshape(10000, 28**2).astype('float32') / 255
Y_train = np_utils.to_categorical(y_train, 10)

# Stratify the folds on the integer class labels
kfold = StratifiedKFold(y_train, n_folds = 2, shuffle = True, random_state=1)

results = cross_val_score(model, X_train, Y_train, cv = kfold)

print("\n%f +/- %f" % (results.mean(), results.std()))
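
For intuition, stratified k-fold cross-validation splits the sample indices into folds while keeping the class proportions roughly equal in each fold. The sketch below is a simplified illustration in NumPy, not the scikit-learn implementation; `stratified_folds` is a hypothetical helper.

```python
import numpy as np

def stratified_folds(labels, n_folds, seed=1):
    """Yield (train_idx, test_idx) pairs with class-balanced test folds."""
    labels = np.asarray(labels)
    rng = np.random.RandomState(seed)
    fold_of = np.empty(len(labels), dtype=int)
    for cls in np.unique(labels):
        idx = np.where(labels == cls)[0]
        rng.shuffle(idx)
        # Deal the shuffled samples of this class round-robin to the folds
        fold_of[idx] = np.arange(len(idx)) % n_folds
    for k in range(n_folds):
        yield np.where(fold_of != k)[0], np.where(fold_of == k)[0]

labels = [0] * 6 + [1] * 4          # toy labels: 6 zeros and 4 ones
for train_idx, test_idx in stratified_folds(labels, n_folds=2):
    print(len(train_idx), len(test_idx), np.bincount(np.take(labels, test_idx)))
```

Each sample ends up in exactly one test fold, and the model is trained once per fold on the remaining samples.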

We can also perform hyperparameter tuning with grid search from `scikit-learn`:

In [None]:
def create_new_model_2(init = 'uniform'):
    model = Sequential()
    model.add(Dense(64, input_shape=(28 * 28,) , init = init, activation = 'sigmoid'))
    model.add(Dense(10, init = init, activation = 'sigmoid'))
    
    model.compile(loss='mse', optimizer='adam', metrics=['accuracy'])
    return model

Now we can feed the `create_new_model_2` function to `KerasClassifier`:

In [None]:
model = KerasClassifier(build_fn = create_new_model_2, nb_epoch = 8, batch_size=32, verbose = 1)

In [None]:
from sklearn.grid_search import GridSearchCV

init = ['uniform', 'normal']
#batches = np.array([32, 48])

# hyper_parameters_grid = dict(optimizer=optimizers, nb_epochs = epochs, batch_size = batches, init = init)

hyper_parameters_grid = dict(init = init) #Turn off cv for speed

grid = GridSearchCV(estimator = model, param_grid = hyper_parameters_grid, cv=2)

In [None]:
grid_result = grid.fit(X_train, Y_train)

Now we can inspect the results:

In [None]:
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
for params, mean_score, scores in grid_result.grid_scores_:
    print("%f (%f) with: %r" % (scores.mean(), scores.std(), params))
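
Conceptually, a grid search just enumerates every combination of the candidate values and keeps the best-scoring one. A minimal pure-Python sketch follows; the grid values and the `toy_score` function are made up for illustration and stand in for the cross-validated accuracy of a real model.

```python
from itertools import product

def expand_grid(grid):
    """Yield one dict per combination of hyperparameter values."""
    names = sorted(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))

def toy_score(params):
    """Made-up score, standing in for a cross-validated accuracy."""
    base = 0.9 if params['init'] == 'normal' else 0.8
    return base - 0.001 * params['batch_size']

grid = {'init': ['uniform', 'normal'], 'batch_size': [32, 48]}
candidates = list(expand_grid(grid))
best = max(candidates, key=toy_score)
print(len(candidates), best)
```

The number of candidates is the product of the list lengths, which is why large grids quickly become expensive when every candidate requires training a network.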

## 4.1 Other things: Learning rate, regularization: Dropout, L², etc.

We need to skip this, but we comment on it briefly.

# 5. Convolutional Neural Networks in Keras

Recall this step in dataset loading above: 

`X_train = X_train.reshape(60000, 28**2).astype('float32') / 255`

Things to reflect on:
* This loses all the two-dimensional image structure!
* Images are not usually as small as $28\times28$.
* Many parameters connect far-apart areas of the image.
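
The last two points can be made concrete with a little arithmetic: the number of weights in a fully-connected first layer grows with the number of pixels. The sketch assumes a hidden layer of 64 neurons, as in the model above, and a few illustrative image sizes.

```python
def first_layer_weights(height, width, channels, n_hidden=64):
    """Weights of a dense layer fed with a flattened image (biases ignored)."""
    return height * width * channels * n_hidden

for shape in [(28, 28, 1), (32, 32, 3), (224, 224, 3)]:
    print(shape, first_layer_weights(*shape))
# (28, 28, 1) 50176
# (32, 32, 3) 196608
# (224, 224, 3) 9633792
```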

The solution to this is to build **Convolutional Neural Networks**, in which each group of neurons only looks at a localized region of the image and is not connected to the rest.

There are three main kinds of layers that compose a typical Convolutional Neural Network:
1. Convolutional layers
2. Pooling Layers
3. Fully-connected Layers

### Convolutional Layers

<img src="im_2.png" width="400">
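
As a minimal sketch of what a single convolutional filter computes, the NumPy code below slides a small kernel over an image and takes a weighted sum at each position ('valid' mode, no padding, stride 1). For simplicity it does not flip the kernel, so strictly speaking it computes a cross-correlation. The function name `conv2d_valid` and the toy inputs are assumptions made for this example.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image`; each output pixel sees only a local patch."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)   # toy 5x5 "image"
kernel = np.ones((3, 3)) / 9.0                     # 3x3 averaging filter
feature_map = conv2d_valid(image, kernel)
print(feature_map.shape)    # (3, 3)
```

The same nine kernel weights are reused at every position, which is why a convolutional layer needs far fewer parameters than a fully-connected one.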

### Pooling Layers

<img src="im_3.png" width="500">
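
A pooling layer downsamples a feature map by summarizing small blocks of it. As a minimal sketch, the NumPy function below keeps the maximum of each non-overlapping 2x2 block; the name `max_pool` and the toy feature map are assumptions of this example.

```python
import numpy as np

def max_pool(feature_map, size=2):
    """Non-overlapping max pooling; rows/columns that do not fit are dropped."""
    h = feature_map.shape[0] // size
    w = feature_map.shape[1] // size
    trimmed = feature_map[:h * size, :w * size]
    # Group pixels into (size x size) blocks and keep the maximum of each block
    return trimmed.reshape(h, size, w, size).max(axis=(1, 3))

feature_map = np.array([[1, 3, 2, 0],
                        [4, 2, 1, 1],
                        [0, 0, 5, 6],
                        [1, 2, 7, 8]])
pooled = max_pool(feature_map)
print(pooled)
# [[4 2]
#  [2 8]]
```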

### Fully-connected Layers

<img src="im_1.png" width="400">

And, the same as above, they are stacked in a sequential manner:

<img src="im_4.png" width="1000">

These and many more are implemented in Keras and ready to use. Read the docs:

 [**Keras Layers**](https://keras.io/layers/core/).

### Classifying MNIST with a CNN

Nothing really interesting here, just setting things up again.

In [None]:
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Convolution2D, MaxPooling2D
from keras import backend as K

In [None]:
batch_size = 64
nb_classes = 10

# input image dimensions
img_rows, img_cols = 28, 28

# the data, shuffled and split between train and test sets
(X_train, y_train), (X_test, y_test) = mnist.load_data()

if K.image_dim_ordering() == 'th':
    X_train = X_train.reshape(X_train.shape[0], 1, img_rows, img_cols)
    X_test = X_test.reshape(X_test.shape[0], 1, img_rows, img_cols)
    input_shape = (1, img_rows, img_cols)
else:
    X_train = X_train.reshape(X_train.shape[0], img_rows, img_cols, 1)
    X_test = X_test.reshape(X_test.shape[0], img_rows, img_cols, 1)
    input_shape = (img_rows, img_cols, 1)

# Reduce dataset size for convenience here
X_train = X_train[:10000]
y_train = y_train[:10000]
X_test = X_test[:2000]
y_test = y_test[:2000]
       
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255

# convert class vectors to binary class matrices
Y_train = np_utils.to_categorical(y_train, nb_classes)
Y_test = np_utils.to_categorical(y_test, nb_classes)

Now, let us define a convolutional net model:

In [None]:
# number of convolutional filters to use
nb_filters = 32
# size of pooling area for max pooling
pool_size = (2, 2)
# convolution kernel size
kernel_size = (3, 3)

model = Sequential()

model.add(Convolution2D(nb_filters, kernel_size[0], kernel_size[1],
                        border_mode='valid',
                        input_shape=input_shape))
model.add(Activation('relu'))
model.add(Convolution2D(nb_filters, kernel_size[0], kernel_size[1]))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=pool_size))
model.add(Dropout(0.25))

model.add(Flatten())
model.add(Dense(128))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(nb_classes))
model.add(Activation('softmax'))

model.compile(loss='categorical_crossentropy',
              optimizer='adadelta',
              metrics=['accuracy'])
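
To see where the weights of this network live, the following sketch traces the feature-map size through the layers and counts the parameters by hand. It assumes 'valid' convolutions with stride 1 and non-overlapping 2x2 pooling, which is what the model above requests; the helper names are made up for this example.

```python
def conv_out(size, kernel):
    """Output side length of a 'valid' convolution with stride 1."""
    return size - kernel + 1

def conv_params(in_channels, n_filters, kernel):
    """One kernel x kernel filter per (input channel, filter) pair, plus biases."""
    return kernel * kernel * in_channels * n_filters + n_filters

def dense_params(n_inputs, n_outputs):
    """Weights plus one bias per output neuron."""
    return n_inputs * n_outputs + n_outputs

side = conv_out(conv_out(28, 3), 3) // 2     # 28 -> 26 -> 24 -> 12 after pooling
flat = side * side * 32                      # 12 * 12 * 32 = 4608 inputs to Dense(128)

counts = [conv_params(1, 32, 3),             # 320
          conv_params(32, 32, 3),            # 9248
          dense_params(flat, 128),           # 589952
          dense_params(128, 10)]             # 1290
print(side, flat, counts, sum(counts))       # total: 600810
```

Almost all of the weights sit in the first fully-connected layer; the two convolutional layers together account for fewer than 10000.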

In [None]:
nb_epoch = 4

model.fit(X_train, Y_train, batch_size=batch_size, nb_epoch=nb_epoch,
          verbose=1, validation_data=(X_test, Y_test))
score = model.evaluate(X_test, Y_test, verbose=1)

print('Test accuracy:', score[1])

## Dataset Augmentation

Deep CNNs have tons of weights to be tuned; as a consequence these models are really data-hungry. 

In order to avoid overfitting your data, a popular strategy consists of "generating" new data from what you have. This is really easy in Keras.

Keras provides the `ImageDataGenerator` class that defines the configuration for image data preparation and augmentation. This includes capabilities such as:
* Sample-wise standardization.
* Feature-wise standardization.
* ZCA whitening.
* Random rotation, shifts, shear and flips.
* Dimension reordering.
* Save augmented images to disk.

An augmented image generator can be created as follows:

`from keras.preprocessing.image import ImageDataGenerator`

`datagen = ImageDataGenerator()`

Rather than performing the operations on your entire image dataset in memory, the API is designed to be iterated by the deep learning model fitting process, creating augmented image data on the fly. This reduces memory overhead, but adds some additional time cost during model training.

After you have created and configured your `ImageDataGenerator`, you must fit it on your data. This calculates any statistics required to actually perform the transforms on your image data. You can do this by calling the `fit()` function on the data generator and passing it your training dataset.

`datagen.fit(train)`

Calling the `flow()` function on the data generator returns an iterator that yields batches of image samples when requested. We can configure the batch size there and get a batch of images by asking the iterator for its next element.

`X_batch, y_batch = next(datagen.flow(train, train, batch_size=32))`

Finally, we can make use of the data generator. Instead of calling the `fit()` function on our model, we call the `fit_generator()` function and pass in the iterator returned by `flow()`, the desired length of an epoch, and the total number of epochs on which to train.

`model.fit_generator(datagen.flow(train, train, batch_size=32), samples_per_epoch=len(train), nb_epoch=10)`

[**More info**](https://keras.io/preprocessing/image/).
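
To illustrate the idea of generating augmented batches on the fly, here is a simplified NumPy generator that applies a random horizontal shift to each image in a batch. It is only a sketch of the principle: the function `augmented_batches` and the wrap-around shift via `np.roll` are assumptions of this example and not how `ImageDataGenerator` is implemented.

```python
import numpy as np

def augmented_batches(X, y, batch_size=32, max_shift=2, seed=0):
    """Endlessly yield shuffled batches, shifting each image by a random offset."""
    rng = np.random.RandomState(seed)
    while True:
        order = rng.permutation(len(X))
        for start in range(0, len(X), batch_size):
            idx = order[start:start + batch_size]
            shifts = rng.randint(-max_shift, max_shift + 1, size=len(idx))
            # np.roll wraps pixels around the border (a simplification)
            batch = np.stack([np.roll(X[i], s, axis=1)
                              for i, s in zip(idx, shifts)])
            yield batch, y[idx]

# Fake data: 100 random 28x28 "images" with random labels
X = np.random.RandomState(1).rand(100, 28, 28).astype('float32')
y = np.random.RandomState(2).randint(0, 10, size=100)

X_batch, y_batch = next(augmented_batches(X, y, batch_size=32))
print(X_batch.shape, y_batch.shape)   # (32, 28, 28) (32,)
```

Because every batch is transformed at the moment it is requested, the augmented dataset never has to be stored in memory as a whole.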

In [None]:
# Random Rotations
from keras.preprocessing.image import ImageDataGenerator

# load data
(X_train, y_train), (X_test, y_test) = mnist.load_data()
# reshape to be [samples][channels][width][height]
X_train = X_train.reshape(X_train.shape[0], 1, 28, 28)
X_test = X_test.reshape(X_test.shape[0], 1, 28, 28)
# convert from int to float
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
# define data preparation
datagen = ImageDataGenerator(rotation_range=45)
# fit parameters from data
datagen.fit(X_train)

In [None]:
f, axarr = plt.subplots(1, 5, figsize=(15, 15))
# Draw one batch of augmented images and show the first five
for X_batch, y_batch in datagen.flow(X_train, y_train, batch_size=10):
    for ind in range(5):
        axarr[ind].imshow(X_batch[ind].reshape(28, 28), cmap='gray')
        axarr[ind].axis("off")

    plt.show()
    break

A nice example of taking advantage of dataset augmentation to train a dog-cat classifier with few examples can be found [**here**](https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html).