### 01 - Fitting a Convolutional Neural Network
#### Working from example at https://github.com/fchollet/keras/blob/master/examples/cifar10_cnn.py

In this notebook I will fit a convolutional neural network (CNN) on the CIFAR-10 images.

In [1]:
%run __init__.py

Using TensorFlow backend.


X_train: (50000, 32, 32, 3), y_train: (50000,)
X_test: (10000, 32, 32, 3), y_test: (10000,)
Class labels: ['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']


This will be my first try at building a CNN using Keras. I will rely heavily on work done by others, but I will experiment with many different combinations in the hope that the experimentation teaches me how these models work.

Specifically, for this notebook I will work on implementing the neural network given as an example here: https://github.com/fchollet/keras/blob/master/examples/cifar10_cnn.py

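Currently, the labels vector y is a single vector of integer class indices ranging from 0 to 9. From reading other models, I see that the target should instead be a one-hot encoded matrix: one row per image and one column per class, with a 1 in the column of the true class and 0 everywhere else.

The following code uses built-in Keras functionality to transform the y vector into that one-hot matrix. Note that the result is an ordinary dense array, not a sparse matrix.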

In [5]:
y_train[0:5]

array([6, 9, 9, 4, 1])

In [6]:
y_test[0:5]

array([3, 8, 8, 0, 6])

In [7]:
y_train = keras.utils.to_categorical(y_train, 10)
y_test = keras.utils.to_categorical(y_test, 10)

In [8]:
y_train.shape

(50000, 10)

In [9]:
y_train[0:5]

array([[ 0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.],
       [ 0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.]])

In [10]:
y_test[0:5]

array([[ 0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.],
       [ 1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.]])
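To make the transformation less of a black box, here is a minimal NumPy sketch of one-hot encoding. It is not the Keras implementation; the helper name `one_hot` is my own, and it only assumes that the labels are integers in the range 0 to num_classes - 1.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return a (len(labels), num_classes) float array with a 1.0 in each label's column."""
    labels = np.asarray(labels, dtype=int)
    encoded = np.zeros((labels.shape[0], num_classes), dtype="float32")
    # Row i gets a 1.0 in column labels[i]; every other entry stays 0.
    encoded[np.arange(labels.shape[0]), labels] = 1.0
    return encoded

# The first five training labels shown earlier in this notebook.
sample = one_hot([6, 9, 9, 4, 1], 10)
print(sample.shape)
# argmax along each row recovers the original integer labels.
print(sample.argmax(axis=1))
```

Taking the argmax of each row reverses the encoding, which is handy later when turning predicted probabilities back into class labels.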

Keras offers two ways to build models: the Sequential model and the functional API. Sequential is the simpler of the two and stacks layers in a linear fashion. The functional API is more flexible and lets the user "build arbitrary graphs of layers." (Keras documentation, https://keras.io/)

For this notebook, I will build a simple Sequential model and experiment by adding layers in a somewhat unstructured manner, with the goal of blindly training better models as I go. 

In [11]:
model = Sequential()

Adding layers to the neural network is accomplished by using model.add. 

The first layer I will add is a 2D convolutional layer (Conv2D), which is the variant meant for two-dimensional images.

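filters=32 defines the dimensionality of the output space, that is, the number of convolution filters and therefore the number of channels in the layer's output.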

kernel_size=(3,3) specifies the width and height of the 2D convolution window. 

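padding='same' pads the borders of the input with zeros so that, with the default stride of 1, the output has the same height and width as the input. The alternative, padding='valid', adds no padding, so the output shrinks.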

input_shape=(32,32,3) specifies the shape of a single input sample. I am working with 32x32 RGB images, so the input shape is (32,32,3).

In [12]:
model.add(Conv2D(filters=32, kernel_size=(3,3), padding='same', input_shape=X_train.shape[1:]))
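To see what the padding and filters arguments do to this layer, the following sketch computes the output size along one spatial dimension and the number of trainable parameters. The helper names are my own, and the formulas assume a square kernel and the TensorFlow-style definitions of 'same' and 'valid' padding.

```python
import math

def conv_output_size(input_size, kernel_size, padding, stride=1):
    """Output length along one spatial dimension of a convolution."""
    if padding == "same":
        # Zero-padding is added so only the stride shrinks the output.
        return math.ceil(input_size / stride)
    if padding == "valid":
        # No padding: the kernel must fit entirely inside the input.
        return math.ceil((input_size - kernel_size + 1) / stride)
    raise ValueError("padding must be 'same' or 'valid'")

def conv_param_count(kernel_size, in_channels, filters):
    """Weights plus one bias per filter for a square kernel."""
    return (kernel_size * kernel_size * in_channels + 1) * filters

same_size = conv_output_size(32, 3, "same")    # 32
valid_size = conv_output_size(32, 3, "valid")  # 30
params = conv_param_count(3, 3, 32)            # 896
print(same_size, valid_size, params)
```

So with padding='same' the first layer maps each (32, 32, 3) image to a (32, 32, 32) output, while 'valid' would give (30, 30, 32).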

Next I will add an activation layer that applies a rectified linear unit, or 'ReLU'.

The 'ReLU' function is defined as:

f(x) = max(0,x)

In other words, the output is zero whenever x is zero or negative, and positive values of x pass through unchanged.

In [13]:
model.add(Activation('relu'))
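As a quick illustration of the definition, here is a NumPy sketch of the same function applied element-wise. This is a stand-in for what the Keras activation computes, not the Keras code itself, and the input values are arbitrary.

```python
import numpy as np

def relu(x):
    """Element-wise max(0, x)."""
    return np.maximum(0, x)

values = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
activated = relu(values)
print(activated.tolist())  # [0.0, 0.0, 0.0, 0.5, 2.0]
```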

So far I have a single-layer neural network. Since I am implementing neural networks here for the first time, I will compile this "network" and use its performance as a baseline. Afterward I will add more layers to the network and refit it to study how performance changes with complexity.
Compiling the model requires an optimizer object. As in the example, I will use an RMSprop optimizer. However, I will leave all of its arguments at their defaults since I am using this as a baseline model.

Note: As my model was not running and I was receiving errors that were beyond my ability to debug, I opted to copy and paste most of the code below, with the primary goal of at least getting a model to run and fit. This will not be my final model; I treat implementing someone else's code as a learning experience and a first, blind stab at fitting a neural network.

In [14]:
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

model.add(Conv2D(64, (3, 3), padding='same'))
model.add(Activation('relu'))
model.add(Conv2D(64, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

model.add(Flatten())
model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(10))
model.add(Activation('softmax'))
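To keep track of what this stack does to the data, the sketch below traces the output shape through the convolution and pooling layers. It is plain Python rather than Keras; the `LAYERS` list and `trace_shapes` helper are my own, and they assume stride-1 convolutions and pooling with a stride equal to the pool size. Activation and Dropout layers are left out because they do not change the shape.

```python
# (kind, filters, kernel, padding) for convolutions, (kind, pool size) for pooling.
LAYERS = [
    ("conv", 32, 3, "same"),   # first Conv2D
    ("pool", 2),
    ("conv", 64, 3, "same"),
    ("conv", 64, 3, "valid"),  # Conv2D without a padding argument defaults to 'valid'
    ("pool", 2),
]

def trace_shapes(input_shape, layers):
    """Return the (height, width, channels) shape after each layer."""
    h, w, c = input_shape
    trace = [input_shape]
    for layer in layers:
        if layer[0] == "conv":
            _, filters, k, padding = layer
            if padding == "valid":
                h, w = h - k + 1, w - k + 1
            c = filters
        elif layer[0] == "pool":
            _, p = layer
            h, w = h // p, w // p
        trace.append((h, w, c))
    return trace

trace = trace_shapes((32, 32, 3), LAYERS)
for shape in trace:
    print(shape)

h, w, c = trace[-1]
flattened = h * w * c                # length of the vector produced by Flatten
dense_params = flattened * 512 + 512  # weights and biases of Dense(512)
print(flattened, dense_params)
```

Under these assumptions Flatten produces a vector of length 3136, so the Dense(512) layer alone holds about 1.6 million parameters, far more than the convolutional layers.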

In [15]:
opt = keras.optimizers.RMSprop()
model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])
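To get a feel for the loss being minimized, here is a NumPy sketch of a softmax output followed by categorical cross-entropy. It is a simplified stand-in for the Keras loss; the function names mirror the Keras ones, but the clipping constant and the averaging over samples are my assumptions.

```python
import math
import numpy as np

def softmax(logits):
    """Row-wise softmax, shifted by the row max for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean over samples of -sum(y_true * log(y_pred))."""
    y_pred = np.clip(y_pred, eps, 1.0)
    return float(-(y_true * np.log(y_pred)).sum(axis=1).mean())

# One sample whose true class is 3, scored by a model with no preference at all.
y_true = np.zeros((1, 10))
y_true[0, 3] = 1.0
uniform_probs = softmax(np.zeros((1, 10)))
uniform_loss = categorical_crossentropy(y_true, uniform_probs)
print(uniform_loss, math.log(10))
```

A model that guesses uniformly across the 10 classes has a loss of ln(10), roughly 2.30, which makes a useful reference point when reading the training log.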

In [16]:
model.fit(X_train, y_train,
              batch_size=32,
              epochs=25,
              validation_data=(X_test, y_test),
              shuffle=True)

Train on 50000 samples, validate on 10000 samples
Epoch 1/25
Epoch 2/25
Epoch 3/25
Epoch 4/25
Epoch 5/25
Epoch 6/25
Epoch 7/25
Epoch 8/25
Epoch 9/25
Epoch 10/25
Epoch 11/25
Epoch 12/25
Epoch 13/25
Epoch 14/25
Epoch 15/25
Epoch 16/25
Epoch 17/25
Epoch 18/25
Epoch 19/25
Epoch 20/25
Epoch 21/25
Epoch 22/25
Epoch 23/25
Epoch 24/25
Epoch 25/25


<keras.callbacks.History at 0x7f8509715eb8>

In the end, to get the model to function, I copied most of the code from the example. I tried to implement a more bare-bones solution first, but I wasn't able to get it to run, so I opted to at least get a solution that performs.

Even then, the model only fit to an accuracy score of about .10, which is what random guessing across 10 classes would give. In the first model, which is no longer part of this notebook, I left the pixel values in the numpy array encoded as integers from 0 to 255. Changing the type to float32 and dividing by 255 to rescale them to values from 0.0 to 1.0 boosted the model performance from .10 to .46 (.638 early on in the training epochs). Clearly, scaling the input features is paramount to a CNN's ability to successfully classify images.
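The rescaling step described above can be sketched as follows. The helper name `scale_pixels` is hypothetical, and the random array is only a stand-in for the CIFAR-10 images with the same dtype and per-image shape.

```python
import numpy as np

def scale_pixels(images):
    """Cast uint8 pixel values in [0, 255] to float32 values in [0.0, 1.0]."""
    return images.astype("float32") / 255.0

# Stand-in data: four random 32x32 RGB "images" stored as uint8.
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(4, 32, 32, 3), dtype=np.uint8)

scaled = scale_pixels(fake_images)
print(scaled.dtype, scaled.min(), scaled.max())
```

The same function would be applied to both X_train and X_test so that the training and validation inputs share one scale.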


However, looking at the accuracy scores above, I notice that accuracy increases over the first 5 epochs of training, but surprisingly begins to decrease afterward. Could this be a vanishing gradient? 
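One way to pin down where the accuracy peaks is to inspect the per-epoch values that Keras records in the `.history` dictionary of the History object returned by model.fit. The sketch below works on a plain dictionary of that form. The `best_epoch` helper is my own, the key name 'val_acc' is an assumption (it varies between Keras versions), and the numbers are invented purely to exercise the function; they are not results from this notebook.

```python
def best_epoch(history, key="val_acc"):
    """Return (1-based epoch number, value) of the highest recorded metric."""
    values = history[key]
    best_index = max(range(len(values)), key=values.__getitem__)
    return best_index + 1, values[best_index]

# Invented values for illustration only, NOT output from the model above.
made_up_history = {"val_acc": [0.30, 0.45, 0.52, 0.50, 0.47]}

epoch, value = best_epoch(made_up_history)
print(epoch, value)  # 3 0.52
```

Comparing the training and validation curves in this way may help narrow down the cause: if training accuracy keeps rising while validation accuracy falls, overfitting is a likely suspect, whereas both falling together points more toward an unstable optimization.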

My next objective will be to gain a better understanding of the basics of neural networks. What are the different layers? How do they work? What, really, is backpropagation? How do I begin to tune this model and boost performance?