#Lecture 4.3: Classifying CIFAR with CNN
This lecture focuses on adapting the CNN we've been using for MNIST to process color images.  The class will use the CIFAR-10 images as an example.

##Learning Objectives
1.  Become familiar and comfortable with reshaping tensors
2.  Use Theano to build an image classifier

##Order of topics covered
1.  Review storage of vectors and matrices (and tensors)
2.  Work some numpy examples
3.  Modify the CNN code to train on CIFAR-10 data
4.  Start the training process on CIFAR-10

##Pre-reading
https://en.wikipedia.org/wiki/Row-major_order


##How vectors and matrices are stored
Why should you learn how vectors and matrices are stored?  In the CNN code for classifying MNIST digits, the Theano reshape function was used several times.  Each example image was stored as a one-dimensional array in the data file.  That was fine for a fully connected input layer, but a convolutional layer needs each image converted to a 28x28 image-shaped matrix.  Later, the final 128x3x3 tensor had to be converted back to a vector for input to the first fully connected layer.  To get the pixels where you want them, you need to understand how matrices and tensors get reshaped.  The easiest way to understand reshaping is to understand how matrices and tensors are laid out as vectors, which is how they are stored in memory.  Visualizing a reshape as two steps, first from the original shape to a vector and then from that vector to the new shape, will help you keep things straight.
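
The two conversions described above can be sketched with numpy. This is only an illustration of the shapes involved: the batch size of 2 and the array contents are made up, and the variable names are hypothetical rather than taken from the lecture code.

```python
import numpy as np

# Two made-up "images", each stored as a flat vector of 28 * 28 = 784 values.
batch = np.arange(2 * 784, dtype=np.float32).reshape(2, 784)

# Vector -> image: (examples, channels, rows, cols), as a convolutional layer expects.
# The -1 asks numpy to infer the number of examples.
images = batch.reshape(-1, 1, 28, 28)

# Image-shaped features -> vector: collapse everything except the example axis,
# as is done before the first fully connected layer.
features = np.zeros((2, 128, 3, 3), dtype=np.float32)
flat = features.reshape(features.shape[0], -1)

print(images.shape)  # (2, 1, 28, 28)
print(flat.shape)    # (2, 1152)
```

Note that 128 * 3 * 3 = 1152, which matches the first dimension of the weight matrix feeding the fully connected layer.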


It's easy to understand how a vector might be stored.  The first element is written into a memory location, the second element is written next to the first, and each successive element is written next to the previous one.  A matrix is written into memory the same way: after the first element is written, each successive element is written next to the previous one.  It is natural to start with the 1,1 element, but which element comes next?  It could be either 1,2 or 2,1.  Storing the 1,2 element next serializes the matrix row-by-row.  Storing 2,1 next serializes it column-by-column.  Another way to look at it is that the matrix has two indices (j, k).  Does the first index count faster, or does the second?  Taking 1,2 as the second element (storing row-by-row) means counting through the second index fastest.  Different languages take different approaches; see https://en.wikipedia.org/wiki/Row-major_order.  As that Wikipedia article explains, Fortran uses column-major storage, where the first index runs fastest, whereas C uses row-major storage, where the second index runs fastest.  How can you determine the order used by numpy arrays in Python?
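
The two storage orders can be written down as index arithmetic. The sketch below is plain Python with hypothetical helper names, and it uses zero-based indices (so the text's 1,1 element is (0, 0) here). It does not say which order numpy uses; that is left for the exercises.

```python
def row_major_offset(j, k, n_rows, n_cols):
    """Memory offset of element (j, k) when the second index runs fastest."""
    return j * n_cols + k

def col_major_offset(j, k, n_rows, n_cols):
    """Memory offset of element (j, k) when the first index runs fastest."""
    return k * n_rows + j

n_rows, n_cols = 2, 3
indices = [(j, k) for j in range(n_rows) for k in range(n_cols)]

# Sort the index pairs by their memory offset to see the order in which
# the elements would be written out.
row_major_order = sorted(indices, key=lambda jk: row_major_offset(jk[0], jk[1], n_rows, n_cols))
col_major_order = sorted(indices, key=lambda jk: col_major_offset(jk[0], jk[1], n_rows, n_cols))

print(row_major_order)  # [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)]
print(col_major_order)  # [(0, 0), (1, 0), (0, 1), (1, 1), (0, 2), (1, 2)]
```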

##In-class Coding Exercises
1.  Form a one-dimensional numpy array that is filled with counting numbers.  Reshape the resulting one-dimensional array into a matrix to observe whether numpy stores arrays in row-major or column-major order by default.

2.  Confirm that reshaping operations can always be understood by reference to the way the original and the reshaped matrices (tensors) would be stored as vectors.  Do this by starting with a vector that you can reshape into more than one matrix shape.  Reshape it into one of these shapes.  Reshape the result into the second and confirm that you'd have gotten the same result by reshaping the starting vector into the second shape.  

3.  Go through the same exercise as 2 where one of the two shapes is a 3-dimensional tensor.

4.  Adapt the convolutional neural net that you saw last lecture to process the CIFAR-10 image collection.  The code is repeated below.  The function used to read in the MNIST data has been changed to read CIFAR-10 data.  You'll need to make the other required changes.
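
One way to plan the changes for exercise 4 is to trace how the spatial size of an image shrinks through the three convolution and pooling stages. The sketch below is plain Python, not Theano, and the helper names are hypothetical. It assumes 3x3 filters with stride 1, `'full'` border mode on the first convolution only, and 2x2 pooling that rounds partial windows up (which is what the `128 * 3 * 3` in the MNIST code implies). CIFAR-10 images are 32x32 color images, but check the arrays actually returned by `cifar` before relying on that here.

```python
import math

def conv_out(size, kernel, border_mode="valid"):
    """Spatial size after a stride-1 convolution."""
    if border_mode == "full":
        return size + kernel - 1
    return size - kernel + 1

def pool_out(size, pool):
    """Spatial size after non-overlapping pooling, rounding partial windows up."""
    return int(math.ceil(size / float(pool)))

def trace_sizes(size):
    """Sizes after each conv and pool step of the three-stage model."""
    sizes = [size]
    for border_mode in ("full", "valid", "valid"):
        size = conv_out(size, 3, border_mode)
        sizes.append(size)
        size = pool_out(size, 2)
        sizes.append(size)
    return sizes

mnist_sizes = trace_sizes(28)
cifar_sizes = trace_sizes(32)
print(mnist_sizes)  # [28, 30, 15, 13, 7, 5, 3]
print(cifar_sizes)  # [32, 34, 17, 15, 8, 6, 3]
```

Under these assumptions both input sizes end at 3x3. The trace covers only rows and columns; the number of input channels has to be handled separately.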

In [None]:
import theano
from theano import tensor as T
from theano.sandbox.rng_mrg import MRG_RandomStreams as RandomStreams
import numpy as np
from cifarHandler import cifar
from theano.tensor.nnet.conv import conv2d
from theano.tensor.signal.downsample import max_pool_2d

srng = RandomStreams()

def floatX(X):
    return np.asarray(X, dtype=theano.config.floatX)

def init_weights(shape):
    return theano.shared(floatX(np.random.randn(*shape) * 0.01))

def rectify(X):
    return T.maximum(X, 0.)
    #return T.maximum(X, 0.01*X)  #leaky rectifier

def softmax(X):
    e_x = T.exp(X - X.max(axis=1).dimshuffle(0, 'x'))
    return e_x / e_x.sum(axis=1).dimshuffle(0, 'x')

def dropout(X, p=0.0):
    if p > 0:
        retain_prob = 1 - p
        X *= srng.binomial(X.shape, p=retain_prob, dtype=theano.config.floatX)
        X /= retain_prob
    return X

def RMSprop(cost, params, lr=0.001, rho=0.9, epsilon=1e-6):
    grads = T.grad(cost=cost, wrt=params)
    updates = []
    for p, g in zip(params, grads):
        acc = theano.shared(p.get_value() * 0.)
        acc_new = rho * acc + (1 - rho) * g ** 2
        gradient_scaling = T.sqrt(acc_new + epsilon)
        g = g / gradient_scaling
        updates.append((acc, acc_new))
        updates.append((p, p - lr * g))
    return updates

def model(X, w, w2, w3, w4, p_drop_conv, p_drop_hidden):
    l1a = rectify(conv2d(X, w, border_mode='full'))
    l1 = max_pool_2d(l1a, (2, 2))
    l1 = dropout(l1, p_drop_conv)

    l2a = rectify(conv2d(l1, w2))
    l2 = max_pool_2d(l2a, (2, 2))
    l2 = dropout(l2, p_drop_conv)

    l3a = rectify(conv2d(l2, w3))
    l3b = max_pool_2d(l3a, (2, 2))
    l3 = T.flatten(l3b, outdim=2)
    l3 = dropout(l3, p_drop_conv)

    l4 = rectify(T.dot(l3, w4))
    l4 = dropout(l4, p_drop_hidden)

    # Note: w_o is not passed in; it is read from the module-level variable defined below.
    pyx = softmax(T.dot(l4, w_o))
    return l1, l2, l3, l4, pyx


xTrain, yTrain, xTest, yTest = cifar(nData=2, Normalize=True)
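# These shapes are carried over from the MNIST version (1 channel, 28x28 pixels).
# Updating them for CIFAR-10 is part of exercise 4.
xTrain = xTrain.reshape(-1, 1, 28, 28)
xTest = xTest.reshape(-1, 1, 28, 28)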

X = T.ftensor4()
Y = T.fmatrix()

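# Filter shape is (filters, input channels, rows, cols); the input-channel count is still the MNIST value.
w = init_weights((32, 1, 3, 3))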
w2 = init_weights((64, 32, 3, 3))
w3 = init_weights((128, 64, 3, 3))
w4 = init_weights((128 * 3 * 3, 625))
w_o = init_weights((625, 10))

noise_l1, noise_l2, noise_l3, noise_l4, noise_py_x = model(X, w, w2, w3, w4, 0.2, 0.5)
l1, l2, l3, l4, py_x = model(X, w, w2, w3, w4, 0., 0.)
y_x = T.argmax(py_x, axis=1)


cost = T.mean(T.nnet.categorical_crossentropy(noise_py_x, Y))
params = [w, w2, w3, w4, w_o]
updates = RMSprop(cost, params, lr=0.001)

train = theano.function(inputs=[X, Y], outputs=cost, updates=updates, allow_input_downcast=True)
predict = theano.function(inputs=[X], outputs=y_x, allow_input_downcast=True)

for i in range(100):
    for start, end in zip(range(0, len(xTrain), 128), range(128, len(xTrain), 128)):
        cost = train(xTrain[start:end], yTrain[start:end])
        #a, b, c, d, e = model(floatX(xTrain[start:end]), w, w2, w3, w4, 0., 0.)
        #print(a.eval().shape, b.eval().shape, c.eval().shape, d.eval().shape)
    # Report test-set accuracy after each pass through the training data.
    print(np.mean(np.argmax(yTest, axis=1) == predict(xTest)))
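
Two pieces of the training loop above can be checked in isolation with plain Python and numpy: how the minibatch boundaries are generated, and how accuracy is computed from one-hot labels. The function names and the toy data below are hypothetical; only the expressions inside them mirror the loop.

```python
import numpy as np

def batch_bounds(n_examples, batch_size):
    """(start, end) pairs produced by the zip-of-ranges pattern in the training loop."""
    return list(zip(range(0, n_examples, batch_size),
                    range(batch_size, n_examples, batch_size)))

def accuracy(one_hot_labels, predicted_classes):
    """Fraction of examples whose predicted class matches the one-hot label."""
    return np.mean(np.argmax(one_hot_labels, axis=1) == predicted_classes)

bounds_300 = batch_bounds(300, 128)
bounds_256 = batch_bounds(256, 128)
print(bounds_300)  # [(0, 128), (128, 256)]
print(bounds_256)  # [(0, 128)]

labels = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]])
predictions = np.array([0, 1, 1, 1])
acc = accuracy(labels, predictions)
print(acc)  # 0.75
```

The zip stops at the shorter range, so examples past the last generated `end` are never used for training. With 300 examples the final 44 are skipped, and with exactly 256 examples the second full batch is skipped as well.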