### Another simple notebook example.

This notebook uses the CIFAR-10 dataset, which its authors describe as follows (https://www.cs.toronto.edu/~kriz/cifar.html):

*"The CIFAR-10 dataset consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images.*

*The dataset is divided into five training batches and one test batch, each with 10000 images. The test batch contains exactly 1000 randomly-selected images from each class. The training batches contain the remaining images in random order, but some training batches may contain more images from one class than another. Between them, the training batches contain exactly 5000 images from each class."*

We will train both a conventional CNN and a residual-network-style architecture, then compare their learning curves.

In [None]:
%matplotlib inline

#As usual we start by importing the necessary modules
from keras.models import Sequential, Model
from keras.datasets import cifar10
from keras.layers import Dense, Activation, Flatten, Input, Add, Conv2D, MaxPooling2D
from keras.utils import np_utils
from keras.callbacks import EarlyStopping
from keras.utils.vis_utils import model_to_dot, plot_model
from keras.optimizers import Adam
import numpy as np
from IPython.display import SVG
from matplotlib import pyplot as plt


### Load Data

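Note that the data is subsampled in the cell below to speed up training: every 10th training image and every 5th test image is kept. Try different sampling strategies, or train on the entire dataset.

Ask yourself: how does the amount and choice of training data affect performance?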
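
One alternative sampling strategy is to draw the same number of images from every class instead of taking every 10th image. The sketch below assumes only NumPy; `stratified_subsample` is a hypothetical helper name, and the labels are a synthetic stand-in for `y_train` so that the example runs without downloading the dataset.

```python
import numpy as np

def stratified_subsample(y, per_class, seed=0):
    """Return sorted indices that keep `per_class` random examples of every label."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y).ravel()  # cifar10.load_data() returns labels with shape (n, 1)
    picked = []
    for label in np.unique(y):
        candidates = np.flatnonzero(y == label)
        picked.append(rng.choice(candidates, size=per_class, replace=False))
    return np.sort(np.concatenate(picked))

# Synthetic stand-in for y_train: 50000 labels, 5000 per class, shuffled
rng = np.random.default_rng(42)
y_fake = rng.permutation(np.repeat(np.arange(10), 5000)).reshape(-1, 1)

strided = y_fake[0::10].ravel()                                   # strategy used in this notebook
stratified = y_fake[stratified_subsample(y_fake, per_class=500)].ravel()

print("strided counts:   ", np.bincount(strided, minlength=10))
print("stratified counts:", np.bincount(stratified, minlength=10))
```

Both strategies keep 5000 labels here, but only the stratified one guarantees exactly 500 per class. On the real data the same indices would be applied to images and labels together, e.g. `idx = stratified_subsample(y_train, 500)` followed by `X_train[idx]` and `y_train[idx]`.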

In [None]:
batch_size = 256
nb_classes = 10
nb_epoch = 20
nb_filter = 10

img_rows, img_cols = 32, 32
img_channels = 3

(X_train, y_train), (X_test, y_test) = cifar10.load_data()

#subsample to speed up training (remove the following four lines to train on everything)
X_train = X_train[0::10,:,:,:] #every 10th example for training
y_train = y_train[0::10]
X_test = X_test[0::5,:,:,:] #every 5th example for testing (used here as validation data)
y_test = y_test[0::5]

print('X_train shape:', X_train.shape)
print(X_train.shape[0], 'train samples')
print(X_test.shape[0], 'test samples')
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255

Y_train = np_utils.to_categorical(y_train, nb_classes)
Y_test = np_utils.to_categorical(y_test, nb_classes)

### Define basic CNN architecture

In [None]:
model = Sequential()
model.add(Conv2D(nb_filter, (3, 3), input_shape=(img_rows, img_cols, img_channels),
                 padding="same", activation="relu"))
model.add(Conv2D(nb_filter, (3, 3), padding="same", activation="relu"))
model.add(Conv2D(nb_filter, (3, 3), padding="same", activation="relu"))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(nb_filter, (3, 3), padding="same", activation="relu"))
model.add(Conv2D(nb_filter, (3, 3), padding="same", activation="relu"))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(512, activation="relu"))
model.add(Dense(nb_classes, activation="softmax"))

model.compile(loss='categorical_crossentropy',
              optimizer=Adam(learning_rate=1e-3),  # `lr` is a deprecated alias in recent Keras
              metrics=['accuracy'])
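
To get a feel for where the capacity of this network sits, the layer sizes can be worked out by hand. The following is a rough sketch in plain Python, assuming the layer stack defined above (3x3 kernels, `padding="same"`, two 2x2 max-pooling steps); `conv_params` and `dense_params` are helper names introduced here for illustration.

```python
def conv_params(in_channels, filters, kernel=3):
    """Weights plus one bias per filter for a square-kernel Conv2D layer."""
    return kernel * kernel * in_channels * filters + filters

def dense_params(in_units, out_units):
    """Weights plus one bias per output unit for a Dense layer."""
    return in_units * out_units + out_units

nb_filter, nb_classes = 10, 10
side, channels = 32, 3

total = 0
# Number of conv layers before each pooling step, mirroring the Sequential model
for convs_in_stage in (3, 2):
    for _ in range(convs_in_stage):
        total += conv_params(channels, nb_filter)
        channels = nb_filter      # padding="same" keeps height and width unchanged
    side //= 2                    # each 2x2 max-pool halves height and width

flat = side * side * channels     # 8 * 8 * 10 = 640 features after Flatten
total += dense_params(flat, 512)
total += dense_params(512, nb_classes)
print(flat, total)                # 640 337242
```

Roughly 97% of the parameters (328192 of 337242) sit in the first `Dense` layer, so changing `nb_filter` affects the total mainly through the size of the flattened feature vector. The result can be checked against `model.summary()`.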

### Define resnet type architecture

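Note the addition of the line below. It adds the block input `x` to the output `y` of the block's two convolutions, which is the skip connection that makes the block residual:
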
```python
x = Add()([x, y])
```
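
What the addition buys can be illustrated outside Keras. The NumPy sketch below is a simplified stand-in, not the notebook's model: `residual_block` and the lambda branches are hypothetical, with the branch playing the role of the two convolutions that produce `y`.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def residual_block(x, branch):
    """relu(x + branch(x)): the skip connection adds the block input back in."""
    return relu(x + branch(x))

x = np.array([0.5, 1.0, 2.0])

# If the branch outputs zeros, the block passes non-negative inputs through unchanged
identity_out = residual_block(x, lambda v: np.zeros_like(v))

# A hypothetical branch that only makes a small correction to x
corrected_out = residual_block(x, lambda v: -0.1 * v)

print(identity_out, corrected_out)
```

Because the input is carried forward unchanged, the branch only has to learn a correction to `x` rather than a full transformation, and gradients have a direct path back through the addition.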

In [None]:
in_img = Input(shape=(img_rows, img_cols, img_channels))
x = Conv2D(nb_filter, (3, 3), padding="same", activation="relu")(in_img)
for _ in range(2):
    y = Conv2D(nb_filter, (3, 3), padding="same", activation="relu")(x)
    y = Conv2D(nb_filter, (3, 3), padding="same")(y)
    x = Add()([x, y])
    x = Activation("relu")(x)
    x = MaxPooling2D(pool_size=(2, 2))(x)

x = Flatten()(x)
x = Dense(512, activation="relu")(x)
x = Dense(nb_classes, activation="softmax")(x)


residual = Model(inputs=in_img, outputs=x)

residual.compile(loss='categorical_crossentropy',
                 optimizer=Adam(learning_rate=1e-3),  # `lr` is a deprecated alias in recent Keras
                 metrics=['accuracy'])

### Train the basic CNN model

In [None]:
cnn = model.fit(X_train, Y_train, batch_size=batch_size,
                epochs=nb_epoch, validation_data=(X_test, Y_test))

### Train the resnet type model

In [None]:

resi = residual.fit(X_train, Y_train, batch_size=batch_size,
                    epochs=nb_epoch, validation_data=(X_test, Y_test))

### Plot the learning curves for both models

In [None]:

x = range(nb_epoch)
plt.plot(x, cnn.history['accuracy'], label="cnn train")
plt.plot(x, cnn.history['val_accuracy'], label="cnn val")
plt.plot(x, resi.history['accuracy'], label="resi train")
plt.plot(x, resi.history['val_accuracy'], label="resi val")
plt.title("accuracy")
plt.legend(loc='center left', bbox_to_anchor=(1, 0.5))
plt.show()

plt.plot(x, cnn.history['loss'], label="cnn train")
plt.plot(x, cnn.history['val_loss'], label="cnn val")
plt.plot(x, resi.history['loss'], label="resi train")
plt.plot(x, resi.history['val_loss'], label="resi val")
plt.title("loss")
plt.legend(loc='center left', bbox_to_anchor=(1, 0.5))
plt.show()
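
Alongside the plots, it can help to reduce each curve to a couple of numbers: the epoch with the best validation accuracy, and the gap between training and validation accuracy at the end (a large gap suggests overfitting). This is a minimal sketch; `summarize` is a hypothetical helper and the history values are made up for illustration, not results from this notebook.

```python
def summarize(history):
    """Return the best validation epoch and the final train/val accuracy gap."""
    val_acc = history["val_accuracy"]
    best_epoch = max(range(len(val_acc)), key=val_acc.__getitem__)
    return {
        "best_epoch": best_epoch + 1,              # 1-based, as Keras prints epochs
        "best_val_accuracy": val_acc[best_epoch],
        "final_gap": history["accuracy"][-1] - val_acc[-1],
    }

# Made-up numbers for illustration only
fake_history = {
    "accuracy":     [0.30, 0.45, 0.55, 0.65, 0.75],
    "val_accuracy": [0.32, 0.44, 0.50, 0.52, 0.51],
}
summary = summarize(fake_history)
print(summary)
```

With the trained models, the same function could be called as `summarize(cnn.history)` and `summarize(resi.history)`.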

Now try visualizing the test set images and the classifications that the network made (similar to what we did in the MNIST example).
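
A possible starting point is to collect the wrongly classified images first. The sketch below assumes NumPy only; `misclassified` is a hypothetical helper, and the probabilities are toy values standing in for the output of `model.predict(X_test)`. The class names follow the CIFAR-10 label order.

```python
import numpy as np

# Class names in CIFAR-10 label order (0 through 9)
CLASS_NAMES = ["airplane", "automobile", "bird", "cat", "deer",
               "dog", "frog", "horse", "ship", "truck"]

def misclassified(probabilities, true_labels):
    """Return (index, true name, predicted name) for every wrong prediction."""
    predicted = np.argmax(probabilities, axis=1)
    true_labels = np.asarray(true_labels).ravel()   # accepts shape (n, 1) or (n,)
    wrong = np.flatnonzero(predicted != true_labels)
    return [(int(i), CLASS_NAMES[true_labels[i]], CLASS_NAMES[predicted[i]])
            for i in wrong]

# Toy softmax outputs for three images; all other entries are zero
probs = np.zeros((3, 10))
probs[0, 3] = 0.9   # predicted cat
probs[1, 5] = 0.6   # predicted dog
probs[2, 8] = 0.7   # predicted ship
errors = misclassified(probs, [[3], [3], [8]])
print(errors)       # [(1, 'cat', 'dog')]
```

On the real data, `misclassified(model.predict(X_test), y_test)` would give the indices to display, for example with `plt.imshow(X_test[i])` and the two class names as the title.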

Other things to try:

- Train the networks longer
- Increase the network learning rate
- Increase the number of filters in each convolution layer
- Add regularization (e.g., dropout)
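
Several of these suggestions can be combined in one variant of the basic CNN. The following is a sketch under stated assumptions rather than a tuned model: the filter count of 32, the dropout rates, and the patience of 5 are arbitrary starting values, and `regularized` and `early_stop` are names introduced here. It uses the `EarlyStopping` callback so that training longer does not require guessing the number of epochs.

```python
from keras.callbacks import EarlyStopping
from keras.layers import Conv2D, Dense, Dropout, Flatten, Input, MaxPooling2D
from keras.models import Model

nb_filter, nb_classes = 32, 10   # wider than the 10 filters used above (arbitrary choice)

in_img = Input(shape=(32, 32, 3))
x = in_img
for _ in range(2):
    x = Conv2D(nb_filter, (3, 3), padding="same", activation="relu")(x)
    x = Conv2D(nb_filter, (3, 3), padding="same", activation="relu")(x)
    x = MaxPooling2D(pool_size=(2, 2))(x)
    x = Dropout(0.25)(x)         # drop rate is a value to tune, not a recommendation
x = Flatten()(x)
x = Dense(512, activation="relu")(x)
x = Dropout(0.5)(x)
out = Dense(nb_classes, activation="softmax")(x)

regularized = Model(inputs=in_img, outputs=out)
regularized.compile(loss="categorical_crossentropy",
                    optimizer="adam",
                    metrics=["accuracy"])

# Stop once val_loss has not improved for 5 epochs and restore the best weights
early_stop = EarlyStopping(monitor="val_loss", patience=5, restore_best_weights=True)
```

Training would then pass the callback to `fit`, e.g. `regularized.fit(X_train, Y_train, batch_size=batch_size, epochs=100, validation_data=(X_test, Y_test), callbacks=[early_stop])`. With early stopping the history can be shorter than the requested number of epochs, so the plotting code should use `range(len(history.history['loss']))` rather than `range(nb_epoch)`.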