### 01 - Fitting a Convolutional Neural Network
#### Working from the example at https://github.com/fchollet/keras/blob/master/examples/cifar10_cnn.py

In this notebook I will fit a convolutional neural network (CNN) on the CIFAR-10 images.

In [4]:
!pip install keras cython h5py

Collecting keras
  Downloading Keras-2.1.2-py2.py3-none-any.whl (304kB)
    100% |################################| 307kB 2.5MB/s eta 0:00:01
Collecting cython
  Downloading Cython-0.27.3-cp35-cp35m-manylinux1_x86_64.whl (3.0MB)
    100% |################################| 3.0MB 515kB/s eta 0:00:01
Collecting pyyaml (from keras)
  Downloading PyYAML-3.12.tar.gz (253kB)
    100% |################################| 256kB 4.8MB/s eta 0:00:01
Building wheels for collected packages: pyyaml
  Running setup.py bdist_wheel for pyyaml ... done
  Stored in directory: /root/.cache/pip/wheels/2c/f7/79/13f3a12cd723892437c0cfbde1230ab4d82947ff7b3839a4fc
Successfully built pyyaml
Installing collected packages: pyyaml, keras, cython
Successfully installed cython-0.27.3 keras-2.1.2 pyyaml-3.12


In [5]:
%run __initremote__.py

Using TensorFlow backend.


Downloading data from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
x_train shape: (50000, 32, 32, 3)
50000 train samples
10000 test samples


In [7]:
from keras.callbacks import TensorBoard, EarlyStopping
es = EarlyStopping(patience=3)
tb = TensorBoard(log_dir='./logs/')

In [12]:
X_train = x_train
X_test = x_test

In [1]:
%run __init__.py

Using TensorFlow backend.


FileNotFoundError: [Errno 2] No such file or directory: 'cifar-10-batches-py/data_batch_1'

This will be my first try at building a CNN using Keras. I will rely heavily on work done by others, but I will experiment with many different combinations in the hope of learning how these models work.

Specifically, for this notebook I will work on implementing the neural network given as an example here: https://github.com/fchollet/keras/blob/master/examples/cifar10_cnn.py

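As loaded, the label vector y holds a single integer class label from 0 to 9 for each image. The example models I have read one-hot encode the target instead, which is the format the categorical cross-entropy loss used later in this notebook expects.

The code below uses built-in Keras functionality (keras.utils.to_categorical) to transform the y vector into a one-hot matrix with one row per image and one column per class.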

In [6]:
y_train[0:5]

array([[ 0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.],
       [ 0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.]])

In [7]:
y_test[0:5]

array([[ 0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.],
       [ 1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.]])

In [4]:
y_train = keras.utils.to_categorical(y_train, 10)
y_test = keras.utils.to_categorical(y_test, 10)

In [5]:
y_train.shape

(50000, 10)
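
To make the encoding concrete, here is a minimal NumPy sketch of what a one-hot transform does. It is an illustration, not the Keras implementation; the helper name `one_hot` is my own, and the five labels (6, 9, 9, 4, 1) are the classes of the first five training rows as read off the `y_train[0:5]` output in this notebook.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return a (n_samples, num_classes) matrix with a single 1.0 per row."""
    labels = np.asarray(labels, dtype=int).ravel()
    encoded = np.zeros((labels.size, num_classes), dtype="float32")
    # For row i, set the column given by labels[i] to 1.0.
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

labels = [6, 9, 9, 4, 1]        # classes of the first five training images
encoded = one_hot(labels, 10)
print(encoded.shape)            # (5, 10)
print(encoded.argmax(axis=1))   # [6 9 9 4 1]
```

Taking `argmax` along each row recovers the original integer label, which is a handy way to turn predictions back into class indices.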

In [8]:
y_train[0:5]

array([[ 0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.],
       [ 0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.]])

In [9]:
y_test[0:5]

array([[ 0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.],
       [ 1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.]])

Keras offers two ways to define a model: the Sequential model and the functional API. Sequential is the simpler of the two and stacks layers linearly, one after another. The functional API is more flexible and lets the user "build arbitrary graphs of layers" (Keras documentation, https://keras.io/).

For this notebook, I will build a simple Sequential model and experiment by adding layers in a somewhat unstructured manner, with the goal of blindly training better models as I go.

In [10]:
model = Sequential()

Layers are added to the network with model.add.

The first layer I will add is a 2D convolutional layer (Conv2D), which suits two-dimensional images.

filters=32 sets the number of convolution filters, and therefore the number of channels (the depth) of the layer's output.

kernel_size=(3,3) specifies the width and height of the 2D convolution window.

padding='same' zero-pads the input so that, at the default stride of 1, the output keeps the input's width and height. The default, 'valid', applies no padding.

input_shape=(32,32,3) specifies the shape of each input sample. I am working with 32x32 RGB images, so the input shape is (32,32,3); in the code it is read from X_train.shape[1:].

In [13]:
model.add(Conv2D(filters=32, kernel_size=(3,3), padding='same', input_shape=X_train.shape[1:]))
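
The effect of the padding argument on the output size can be checked with a little arithmetic. The sketch below is plain Python, not a Keras call; the helper name `conv_output_size` is hypothetical, and the formulas are the usual ones for 'same' and 'valid' padding as I understand them.

```python
import math

def conv_output_size(input_size, kernel_size, padding, stride=1):
    """Spatial output size (width or height) of a convolution."""
    if padding == "same":
        # Zero-padding is added so only the stride shrinks the output.
        return math.ceil(input_size / stride)
    if padding == "valid":
        # No padding: the kernel must fit entirely inside the input.
        return math.ceil((input_size - kernel_size + 1) / stride)
    raise ValueError("padding must be 'same' or 'valid'")

same_size = conv_output_size(32, 3, "same")
valid_size = conv_output_size(32, 3, "valid")
print(same_size)    # 32
print(valid_size)   # 30
```

These two numbers agree with the model summary in this notebook, where the padded first convolution outputs 32x32 and the unpadded second one outputs 30x30.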

Next I will add an activation layer with a rectified linear unit, or 'ReLU'.

The ReLU function is defined as:

f(x) = max(0,x)

The output is zero for any negative input and equal to x for any positive input, so a unit only "activates" when x is positive.

In [14]:
model.add(Activation('relu'))
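
As a quick check of the definition, ReLU can be written in one line of NumPy. This is only a sketch of the function itself, not of how Keras implements the Activation layer, and the input values are made up for illustration.

```python
import numpy as np

def relu(x):
    """Element-wise max(0, x)."""
    return np.maximum(0, x)

sample = np.array([-2.0, -0.5, 0.0, 1.5])   # made-up inputs
activated = relu(sample)
print(activated)   # [0.  0.  0.  1.5]
```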

So far I have a single-layer neural network. Since I am implementing neural networks here for the first time, I will compile this "network" and use its performance as a baseline. Afterwards I will add more layers to the network and refit it to study how performance changes with complexity.
Compiling the model requires an optimizer object. As in the example, I will use an RMSprop optimizer. I had intended to leave all of its arguments at their defaults for the baseline, but the compile cell below ends up using the example's lr=0.0001 and decay=1e-6.

Note: My model was not running and I was receiving errors beyond my ability to debug, so I opted to copy and paste most of the code below, with the primary goal of at least getting a model to run and fit. This will not be my final model; I see implementing someone else's code as a learning experience and a first, blind stab at fitting a neural network.

In [15]:
model.add(Conv2D(32, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))


model.add(Conv2D(64, (3, 3), padding='same'))
model.add(Activation('relu'))
model.add(Conv2D(64, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

model.add(Flatten())
model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(10))
model.add(Activation('softmax'))

In [16]:
opt = keras.optimizers.RMSprop(lr=0.0001, decay=1e-6)
model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])
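
To get a feel for what the optimizer does with lr and decay, here is a minimal sketch of an RMSprop-style update in plain Python on a one-dimensional toy problem (minimizing w squared, chosen purely for illustration). The function name, the rho and eps values, and the time-based decay rule lr / (1 + decay * t) are my assumptions about the standard formulation; none of this is copied from the Keras source.

```python
import math

def rmsprop_minimize(grad_fn, w0, lr=0.0001, rho=0.9, decay=1e-6,
                     eps=1e-7, steps=1000):
    """Run `steps` RMSprop-style updates on a single scalar weight."""
    w = w0
    avg_sq_grad = 0.0
    for t in range(steps):
        grad = grad_fn(w)
        # Assumed time-based decay of the learning rate.
        lr_t = lr / (1.0 + decay * t)
        # Running average of squared gradients.
        avg_sq_grad = rho * avg_sq_grad + (1.0 - rho) * grad * grad
        # Scale the step by the root of that running average.
        w -= lr_t * grad / (math.sqrt(avg_sq_grad) + eps)
    return w

# Toy objective f(w) = w**2, whose gradient is 2 * w.
w_final = rmsprop_minimize(lambda w: 2.0 * w, w0=1.0)
print(w_final)
```

Because the gradient is divided by its own running magnitude, each update moves the weight by roughly lr regardless of how large the gradient is, which is why a small lr such as 0.0001 makes for slow, steady progress.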

In [17]:
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_1 (Conv2D)            (None, 32, 32, 32)        896       
_________________________________________________________________
activation_1 (Activation)    (None, 32, 32, 32)        0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 30, 30, 32)        9248      
_________________________________________________________________
activation_2 (Activation)    (None, 30, 30, 32)        0         
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 15, 15, 32)        0         
_________________________________________________________________
dropout_1 (Dropout)          (None, 15, 15, 32)        0         
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 15, 15, 64)        18496     
__________
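
The Param # column for the convolutional layers listed in the summary can be reproduced by hand: each filter has one weight per kernel cell per input channel, plus one bias. The helper name `conv2d_params` is hypothetical; the layer sizes come from the model definition in this notebook.

```python
def conv2d_params(kernel_size, in_channels, filters):
    """Trainable parameters in a 2D convolution with a bias per filter."""
    kernel_h, kernel_w = kernel_size
    weights = kernel_h * kernel_w * in_channels * filters
    biases = filters
    return weights + biases

conv2d_1 = conv2d_params((3, 3), in_channels=3, filters=32)    # RGB input
conv2d_2 = conv2d_params((3, 3), in_channels=32, filters=32)
conv2d_3 = conv2d_params((3, 3), in_channels=32, filters=64)
print(conv2d_1, conv2d_2, conv2d_3)   # 896 9248 18496
```

Activation, pooling, and dropout layers show 0 parameters in the summary because they have nothing to learn.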

In [19]:
model.fit(X_train, y_train,
              batch_size=32,
              epochs=25,
              validation_data=(X_test, y_test),
              shuffle=True,
              callbacks=[tb])

Train on 50000 samples, validate on 10000 samples
Epoch 1/25
Epoch 2/25
Epoch 3/25
Epoch 4/25
Epoch 5/25
Epoch 6/25
Epoch 7/25
Epoch 8/25
Epoch 9/25
Epoch 10/25
Epoch 11/25
Epoch 12/25
Epoch 13/25
Epoch 14/25
Epoch 19/25
Epoch 20/25
Epoch 21/25
Epoch 22/25
Epoch 23/25
Epoch 24/25
Epoch 25/25


<keras.callbacks.History at 0x7f0275774710>

In the end, to get the model to function, I copied most of the code from the example. I tried to implement a more barebones solution, but I could not get it to run, so I opted for a solution that at least performs.

Even then, my first model, which is no longer part of this notebook, only reached an accuracy of about .10. In that model I left the image arrays encoded as values from 0 to 255. Changing the type to float32 and dividing by 255, so that the values range from 0.0 to 1.0, boosted accuracy from .10 to .46 (.638 early on in the training epochs). Clearly, how the input features are scaled matters a great deal to a CNN's ability to classify images.
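
For context on the .10 figure: with 10 equally represented classes, as in the CIFAR-10 test set, an accuracy of .10 is what guessing would achieve. The short sketch below works out the chance-level accuracy and the categorical cross-entropy of a model that predicts a uniform probability for every class. It is plain arithmetic under that balanced-classes assumption, not output from this notebook.

```python
import math

num_classes = 10
uniform_probability = 1.0 / num_classes

# Expected accuracy when every class is equally likely to be guessed.
chance_accuracy = uniform_probability
# Cross-entropy when the true class is assigned probability 1/10.
chance_loss = -math.log(uniform_probability)

print(chance_accuracy)          # 0.1
print(round(chance_loss, 4))    # 2.3026
```

A training loss that hovers near 2.3 is therefore a sign that the network has not yet learned anything beyond guessing.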


However, looking at the accuracy scores above, I notice that accuracy increases over the first 5 epochs of training, but surprisingly begins to decrease afterward. Could this be a vanishing gradient? 
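
One practical response to a metric that peaks and then degrades is early stopping. The EarlyStopping(patience=3) callback created near the top of this notebook was never passed to model.fit (only tb was), so training ran for all 25 epochs. The sketch below shows a simplified version of the patience rule in plain Python. The function name and the validation losses are hypothetical, and the exact bookkeeping inside Keras may differ slightly.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return the zero-based epoch at which training would stop, or None."""
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            # New best value: reset the counter.
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None

# Hypothetical validation losses: improving, then getting worse.
hypothetical_val_losses = [1.5, 1.2, 1.0, 0.9, 0.95, 1.0, 1.1, 1.2]
stop_epoch = early_stopping_epoch(hypothetical_val_losses, patience=3)
print(stop_epoch)   # 6
```

In the notebook itself, the equivalent change would be to pass callbacks=[tb, es] to model.fit.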

My next objective will be to gain a better understanding of the basics of neural networks. What are the different layers? How do they work? What, really, is backpropagation? How do I begin to tune this model and boost performance?

In [20]:
model.evaluate(X_test, y_test)



[0.7084161211967468, 0.75509999999999999]

In [17]:
model.save('example_cnn.h5')

In [1]:
from __future__ import print_function
import keras
from keras.datasets import cifar10
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Conv2D, MaxPooling2D

import os

ImportError: No module named 'keras'

In [3]:
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')

Downloading data from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
x_train shape: (50000, 32, 32, 3)
50000 train samples
10000 test samples


In [4]:
y_train = keras.utils.to_categorical(y_train, 10)
y_test = keras.utils.to_categorical(y_test, 10)

In [5]:
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255

In [10]:
X_train = x_train
X_test = x_test