## Overfitting and validation homework

In this homework, we will cover how to implement the regularization techniques we learned in class:
- Early Stopping
- Weight Decay (L2 Regularization)
- Dropout
- Data Augmentation

In [1]:
import keras
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten, Conv2D, MaxPooling2D
from keras.datasets import mnist, cifar10
from keras import regularizers
import matplotlib.pyplot as plt

from numpy.random import seed
seed(1)


In [2]:
def create_model():
    """ Creates a new convolutional neural network and returns it """
    seed(1)
    input_shape = (32, 32, 3)
    model = Sequential()
    model.add(Conv2D(32, (3, 3), padding='same', input_shape=input_shape))
    model.add(Activation('relu'))
    model.add(Conv2D(32, (3, 3)))
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))

    model.add(Conv2D(64, (3, 3), padding='same'))
    model.add(Activation('relu'))
    model.add(Conv2D(64, (3, 3)))
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))

    model.add(Flatten())
    model.add(Dense(512))
    model.add(Activation('relu'))
    model.add(Dense(num_classes))
    model.add(Activation('softmax'))

    # initiate RMSprop optimizer
    opt = keras.optimizers.rmsprop(lr=0.0001, decay=1e-6)
    # Let's train the model using RMSprop
    model.compile(loss='categorical_crossentropy', optimizer=opt, metrics=['accuracy'])
    return model

### Image preprocessing

This code block loads our dataset and does some initial preprocessing. We are using the `CIFAR-10` dataset of 32x32 color images in 10 classes, loaded below with `cifar10.load_data()`. You can read about CIFAR-10 here: https://www.cs.toronto.edu/~kriz/cifar.html

In [None]:
(x_train, y_train), (x_val, y_val) = cifar10.load_data()
batch_size = 128
num_classes = 10
epochs = 150

# input image dimensions
img_rows, img_cols = 32, 32

num_train = 1000
num_val = 500  # We choose such a large validation set to illustrate the effects of overfitting without noise

x_train = x_train[:num_train]
y_train = y_train[:num_train]
x_val = x_val[:num_val]
y_val = y_val[:num_val]

x_train = x_train.astype('float32')
x_val = x_val.astype('float32')
x_train /= 255
x_val /= 255

print('x_train shape:', x_train.shape)
print('x_val shape:', x_val.shape)
print(x_train.shape[0], 'train samples')
print(x_val.shape[0], 'validation samples')

# convert class vectors to binary class matrices
y_train = keras.utils.to_categorical(y_train, num_classes)
y_val = keras.utils.to_categorical(y_val, num_classes)

# This function prints the loss and accuracy on the training and validation sets
def evaluate_acc(model):
    train_score = model.evaluate(x_train, y_train, verbose=0)
    print('Train loss:', train_score[0])
    print('Train accuracy:', train_score[1])
    val_score = model.evaluate(x_val, y_val, verbose=0)
    print('Validation loss:', val_score[0])
    print('Validation accuracy:', val_score[1])

### No regularization
First, let's see how the model performs with no regularization. We are going to train for 150 epochs on the 1,000-image training subset selected above.

This should take around 3-5 minutes to train. Go take a break and grab a snack!

In [None]:
model = create_model()
model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=epochs,
          validation_data=(x_val, y_val),
          shuffle=True)
evaluate_acc(model)

### Early Stopping
With no regularization, you should get a training accuracy of around 0.783 and a validation accuracy of around 0.37. This is a pretty big gap, and one that more training alone will not close. Let's first try adding early stopping! Add code here that runs `model.fit` with early stopping.

Hints:
Check out how EarlyStopping is implemented in Keras and how you would add it to your model: https://keras.io/callbacks/.
You may actually get worse performance at first after adding EarlyStopping. Try tuning the `patience` argument (read up on it in the Keras docs). Early stopping makes training faster, but you likely won't see much improvement in validation accuracy (I get around a 2% increase). Note that training accuracy may be lower even though validation accuracy is slightly higher. This is because training stops before the model has heavily overfit the training data, while it has already learned some useful features.
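
To make the role of `patience` concrete, here is a minimal pure-Python sketch of the idea behind patience-based early stopping. It is an illustration only, not Keras's actual implementation: the function name `stop_epoch` and the validation losses are made up for this example.

```python
def stop_epoch(val_losses, patience):
    """Return the index of the epoch after which training would stop.

    Hypothetical helper: training halts once the validation loss has
    failed to improve on its best value for `patience` epochs in a row.
    """
    best = float("inf")
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    # Never triggered: training runs through the final epoch
    return len(val_losses) - 1


# Made-up validation losses: improving at first, then drifting upward
val_losses = [2.0, 1.6, 1.5, 1.55, 1.52, 1.6, 1.7]

for patience in (1, 3, 10):
    print("patience =", patience, "-> stops after epoch", stop_epoch(val_losses, patience))
```

A small `patience` stops at the first bump in validation loss, which can be too early when the curve is noisy. A large one behaves almost like no early stopping at all.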

In [None]:
from keras.callbacks import EarlyStopping
'''
Your code goes here.
'''

### Weight Decay

Early stopping has helped us avoid training longer than we need to, but in this case it doesn't help much with validation accuracy. This means the gap between training and validation accuracy stays large throughout training, with no "peaky" moments where it narrows. Next, let's try adding L2 weight decay to our model.

Hint: As an example, the following code declares a Dense layer with 64 hidden neurons and L2 regularization with weight decay strength 0.01. You can add this to *all* of the *learnable* layers in the model in `create_model`. Check out the documentation on regularizers here: https://keras.io/regularizers/

```
from keras import regularizers

Dense(64, input_dim=64, kernel_regularizer=regularizers.l2(0.01))
```

Add it now to the model in `create_model`, and retrain it here to see your new results. Note that you can tweak the weight decay strength as a hyperparameter. I have tried this myself and found the best results with `0.01`. Can you outperform this? I see about a 1% improvement, and a much smaller gap between training accuracy and validation accuracy. This is because, while we are less overfit to our training data, we are still learning valuable features of the images.
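
As a rough illustration of what the regularizer does, an L2 penalty adds the weight decay strength times the sum of squared weights to the training loss. The NumPy sketch below computes that term by hand. The helper `l2_penalty`, the toy weights, and the data loss value are assumptions made up for this example and are not part of the homework code.

```python
import numpy as np


def l2_penalty(weights, strength):
    """Hypothetical helper: strength * sum of squared entries over all weight arrays."""
    return strength * sum(float(np.sum(w ** 2)) for w in weights)


# Toy weights standing in for the kernels of two layers
weights = [np.array([[1.0, -2.0], [0.5, 0.0]]), np.array([3.0])]

data_loss = 0.8  # made-up cross-entropy value
total_loss = data_loss + l2_penalty(weights, 0.01)

print("penalty:", l2_penalty(weights, 0.01))
print("total loss:", total_loss)
```

Because the penalty grows with the square of each weight, large weights are pushed toward zero during training, which is why a larger strength narrows the gap between training and validation accuracy at the cost of a worse fit to the training data.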

In [None]:
def create_model_weight_decay():
    """ Creates a new convolutional neural network with weight decay and returns it """
    seed(1)
    input_shape = (32, 32, 3)
    ### Beginning of your code
    ''' Add a deep CNN model using weight decay here '''
    ### End of your code
    
    # initiate RMSprop optimizer
    opt = keras.optimizers.rmsprop(lr=0.0001, decay=1e-6)
    # Let's train the model using RMSprop
    model.compile(loss='categorical_crossentropy',
                  optimizer=opt,
                  metrics=['accuracy'])
    return model

Now run your model here and see how well you do!

In [None]:
early_stopping = EarlyStopping(patience=)#TODO: Add your favorite patience value here)
model = create_model_weight_decay()
model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=epochs,
          validation_data=(x_val, y_val),
          shuffle=True,
          callbacks=[early_stopping])
evaluate_acc(model)

### Dropout
We next add dropout layers to our model.
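
For intuition, here is a minimal NumPy sketch of "inverted" dropout, the scheme in which a fraction `rate` of the activations is zeroed during training and the survivors are scaled by `1 / (1 - rate)`, while the layer does nothing at inference time. The function name `dropout` and the toy input are assumptions for this example. In the homework itself you would use the `Dropout` layer already imported from `keras.layers`.

```python
import numpy as np


def dropout(x, rate, rng, training=True):
    """Hypothetical helper: zero a random fraction `rate` of x and rescale the rest."""
    if not training or rate == 0.0:
        return x  # identity at inference time
    keep = rng.random(x.shape) >= rate  # True where the unit survives
    return x * keep / (1.0 - rate)


rng = np.random.default_rng(0)
x = np.ones((4, 5))  # toy activations

y_train_mode = dropout(x, 0.5, rng)
y_eval_mode = dropout(x, 0.5, rng, training=False)

print(y_train_mode)  # entries are either 0.0 (dropped) or 2.0 (kept and rescaled)
print(y_eval_mode)   # unchanged
```

The rescaling keeps the expected value of each activation the same in training and evaluation, so no extra correction is needed when the model is evaluated.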

In [None]:
def create_model_dropout():
    """ Creates a new convolutional neural network with dropout and returns it """
    seed(1)
    input_shape = (32, 32, 3)
    
    ### Beginning of your code
    ''' Add a deep CNN model using dropout here '''
    ### End of your code

    # initiate RMSprop optimizer
    opt = keras.optimizers.rmsprop(lr=0.0001, decay=1e-6)
    # Let's train the model using RMSprop
    model.compile(loss='categorical_crossentropy', optimizer=opt, metrics=['accuracy'])
    return model

Next, train your model!

In [None]:
early_stopping = EarlyStopping(patience=)#TODO: Add your favorite patience value here)
model = create_model_dropout()
model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=epochs,
          validation_data=(x_val, y_val),
          shuffle=True,
          callbacks=[early_stopping])
evaluate_acc(model)

I get around 41% validation accuracy using dropout.