## LeNet-5
LeNet-5 is a convolutional neural network architecture published in 1998 and used primarily for handwritten digit recognition. Systems based on LeNet were deployed commercially to read the handwriting on checks in the U.S.

### Architecture
LeNet consists of 8 layers: the input layer, 3 convolutional layers, 2 average pooling layers, and 2 fully connected layers. By today's standards, this is a relatively simple and straightforward architecture. Our implementation of LeNet's architecture, written against the TensorFlow 1.x API, is below.
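
As a quick check on the layer sizes, the sketch below walks a 32 x 32 input through the five convolution and pooling layers using the usual no-padding size formula, `(size - kernel) // stride + 1`. The kernel sizes, strides, and filter counts are the ones used in the implementation in this section; the helper name `valid_output_size` and the `layers` list are ours, for illustration only.

```python
def valid_output_size(size, kernel, stride=1):
    """Spatial size after a convolution or pooling layer with no padding."""
    return (size - kernel) // stride + 1

# (name, kernel size, stride, output channels) for each layer in this section
layers = [
    ("conv1", 5, 1, 6),
    ("avg_pool1", 2, 2, 6),
    ("conv2", 5, 1, 16),
    ("avg_pool2", 2, 2, 16),
    ("conv3", 5, 1, 120),
]

size = 32  # 28x28 input padded by 2 pixels on each side
shapes = []
for name, kernel, stride, channels in layers:
    size = valid_output_size(size, kernel, stride)
    shapes.append((name, size, size, channels))
    print(f"{name}: {size} x {size} x {channels}")
```

The spatial size shrinks from 32 to 28, 14, 10, 5, and finally 1, which is why the last convolutional layer can be flattened directly into a vector of 120 values.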

In [None]:
import tensorflow as tf
import numpy as np

#First layer is the input. The images were originally 28x28 but are padded by 2 pixels on each side.
#This is the only layer to use any padding, so the spatial size of the layers shrinks at every later stage
X_inp = tf.placeholder(tf.float32, shape=(None, 32, 32, 1))
#Integer class labels, one per example
y_inp = tf.placeholder(tf.int32, shape=(None,))

#First convolutional layer. The arguments to TensorFlow's tf.layers.conv2d function that we use are:
#inputs: the input layer
#filters: the number of output maps generated
#kernel_size: the width and height of the kernel. In this network, the width and height are always the same,
#so we just specify one number
#strides: the vertical and horizontal stride, generally a tuple, but a single number means the same stride in both directions
#padding: "valid" = no padding, may ignore some rows and columns at the bottom and right of the image
#         "same" = zero padding, adds rows and columns as needed so that the output size is ceil(input size / stride)
#activation: the activation function used by the layer

#Conv1 input size: 32 x 32 x 1, output size: 28 x 28 x 6
conv1 = tf.layers.conv2d(X_inp, filters=6, kernel_size=5, strides=1, padding="valid", activation=tf.nn.relu)


### Average Pooling
LeNet uses average pooling as opposed to max pooling. The idea is the same, except that each output takes the average of the values within the pool window instead of the maximum. This is equivalent to applying, to each channel separately, a kernel whose entries all equal 1 / (pool_width * pool_height).
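
To make the comparison with a kernel concrete, here is a small NumPy sketch that pools a single-channel 4 x 4 array in two ways: by averaging each 2 x 2 window directly, and by sliding a uniform kernel over it with stride 2. The helper names `average_pool2d` and `strided_filter` are illustrative and not part of TensorFlow.

```python
import numpy as np

def average_pool2d(image, pool=2, stride=2):
    """Average pooling over a 2-D array with no padding."""
    out_h = (image.shape[0] - pool) // stride + 1
    out_w = (image.shape[1] - pool) // stride + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = image[i * stride:i * stride + pool, j * stride:j * stride + pool]
            out[i, j] = window.mean()
    return out

def strided_filter(image, kernel, stride=2):
    """Slide a square kernel over the image and sum the elementwise products."""
    k = kernel.shape[0]
    out_h = (image.shape[0] - k) // stride + 1
    out_w = (image.shape[1] - k) // stride + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = image[i * stride:i * stride + k, j * stride:j * stride + k]
            out[i, j] = np.sum(window * kernel)
    return out

image = np.arange(16, dtype=float).reshape(4, 4)
# Every entry is 1 / (pool_width * pool_height)
uniform_kernel = np.full((2, 2), 1.0 / (2 * 2))

pooled = average_pool2d(image)
filtered = strided_filter(image, uniform_kernel)
print(pooled)
print(np.allclose(pooled, filtered))
```

Both routes give the same 2 x 2 result. In the network, the pooling layer does this independently for each of the feature maps, so the number of channels is unchanged.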

In [None]:
#average_pooling2d args:
#inputs: the input layer
#pool_size: height and width of the pool window, a single number means the same height and width
#strides: vertical and horizontal stride, a single number means the same stride in both directions

#avg_pool input size: 28 x 28 x 6, output size: 14 x 14 x 6
#pool size of (2,2) and stride size of (2,2) halves the dimensions of the previous layer
avg_pool = tf.layers.average_pooling2d(conv1, pool_size=2, strides=2)

#Conv2 input size: 14 x 14 x 6, output size: 10 x 10 x 16
#with "valid" padding and a 5x5 kernel at stride 1, the kernel fits in 14 - 5 + 1 = 10 positions
#along each row and column, so every input pixel is used but the output shrinks to 10 x 10
conv2 = tf.layers.conv2d(avg_pool, filters=16, kernel_size=5, strides=1,
                         padding="valid", activation=tf.nn.relu)
#avg_pool2 input size: 10 x 10 x 16, output size: 5 x 5 x 16
avg_pool2 = tf.layers.average_pooling2d(conv2, pool_size=2, strides=2)

#Conv3 input size: 5 x 5 x 16, output size: 1 x 1 x 120
#Each of the 120 filters spans the entire 5 x 5 x 16 input, so it produces a single 1 x 1 output
conv3 = tf.layers.conv2d(avg_pool2, filters=120, kernel_size=5, strides=1,
                         padding="valid", activation=tf.nn.relu)

#Flatten the conv3 output from (batch, 1, 1, 120) to (batch, 120) for use in the fully connected layers
flat = tf.reshape(conv3, [-1, 120])

#Fully connected layer. Weight matrix size: 120 x 84
dense = tf.layers.dense(inputs=flat, units=84, activation=tf.nn.relu)

#Output layer: produces the logits fed to softmax. Weight matrix size: 84 x 10
logits = tf.layers.dense(dense, units=10)

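#class probabilities and the predicted class (index of the largest probability)
softmax = tf.nn.softmax(logits)
predict = tf.argmax(softmax, axis=1)

#convert the integer labels to one-hot vectors of length 10
y_labels = tf.one_hot(y_inp, 10)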
cross_entropy = tf.nn.softmax_cross_entropy_with_logits_v2(logits=logits, labels=y_labels)
#loss function is the mean cross entropy for softmax applied to the output layer
loss = tf.reduce_mean(cross_entropy)
#use adam optimizer
opt = tf.train.AdamOptimizer(learning_rate=0.001)
train_op = opt.minimize(loss)
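
For readers who want to see what the loss computes, the following NumPy sketch reproduces the arithmetic behind the softmax, argmax prediction, and mean cross-entropy steps above. It is an illustration that assumes integer labels and 10 classes, not TensorFlow's actual implementation; the function names and the example logits are made up.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def mean_softmax_cross_entropy(logits, labels, num_classes=10):
    """Mean over the batch of -log(softmax probability of the true class)."""
    one_hot = np.eye(num_classes)[labels]
    log_probs = np.log(softmax(logits))
    per_example = -(one_hot * log_probs).sum(axis=1)
    return per_example.mean()

# Made-up logits for a batch of 2 examples; not taken from a trained model.
logits = np.zeros((2, 10))
logits[0, 3] = 5.0           # first example strongly favors class 3
labels = np.array([3, 7])    # second example has uniform logits, true class 7

probs = softmax(logits)
predictions = probs.argmax(axis=1)
loss = mean_softmax_cross_entropy(logits, labels)
print(predictions[0], loss)
```

The confident, correct first example contributes a small loss, while the second example, with uniform probabilities of 0.1, contributes -log(0.1). Minimizing the mean pushes the probability of the true class toward 1.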

### LeNet class
We made a class that wraps LeNet as a scikit-learn estimator. An example of using it in a grid search is below.

In [None]:
import tensorflow as tf
import numpy as np
from cnn_wrapper import CNNClassifier
from sklearn.model_selection import GridSearchCV

X = np.load("x_mnist1000.npy")
X = X.reshape((-1, 28, 28, 1))
y = np.load("y_mnist1000.npy")

np.random.seed(1)

indices = np.random.permutation(len(X))
train_indices = indices[:800]
valid_ind = indices[800:900]
test_ind = indices[900:]

X_train = X[train_indices]
y_train = y[train_indices]

X_validation = X[valid_ind]
y_validation = y[valid_ind]

X_test = X[test_ind]
y_test = y[test_ind]

# pad each 28x28 image with 2 pixels of zeros on every side to get the 32x32 input LeNet expects
pad_dims = ((0, 0), (2, 2), (2, 2), (0, 0))
X_train = np.pad(X_train, pad_dims, "constant")
X_validation = np.pad(X_validation, pad_dims, "constant")
X_test = np.pad(X_test, pad_dims, "constant")

N = len(X_train)
BATCH_SIZE = N // 10 

clf = CNNClassifier(batch_size=BATCH_SIZE)

param_grid = {
    "verbose":  [2],
    "activation": [tf.nn.tanh, tf.nn.relu, tf.nn.elu],
}

gs = GridSearchCV(clf, param_grid=param_grid, n_jobs=-1)
gs.fit(X_train, y_train, n_epochs=10, X_valid=X_validation, y_valid=y_validation)
print("best score, params:", gs.best_score_, ",", gs.best_params_)
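
The source of `CNNClassifier` is not shown here. As a rough sketch of the interface a grid search relies on (`get_params`, `set_params`, `fit`, `predict`, `score`), the hypothetical class below follows the same conventions with a trivial majority-label model in place of the network. The class name, its constructor arguments, and the demo data are all assumptions made for illustration. In practice, inheriting from scikit-learn's `BaseEstimator` and `ClassifierMixin` supplies most of these methods.

```python
import numpy as np

class MajorityClassifier:
    """Hypothetical stand-in for a wrapper class: predicts the most common training label."""

    def __init__(self, batch_size=32, activation=None, verbose=0):
        # Hyperparameters are stored unmodified under the constructor argument
        # names, so get_params and set_params can round-trip them.
        self.batch_size = batch_size
        self.activation = activation
        self.verbose = verbose

    def get_params(self, deep=True):
        return {"batch_size": self.batch_size,
                "activation": self.activation,
                "verbose": self.verbose}

    def set_params(self, **params):
        for name, value in params.items():
            if name not in self.get_params():
                raise ValueError(f"Unknown parameter: {name}")
            setattr(self, name, value)
        return self

    def fit(self, X, y, **fit_params):
        # A real wrapper would build the graph and run the training loop here.
        values, counts = np.unique(y, return_counts=True)
        self.majority_ = values[np.argmax(counts)]
        return self

    def predict(self, X):
        return np.full(len(X), self.majority_)

    def score(self, X, y):
        # Mean accuracy, the default metric a grid search uses for classifiers.
        return float(np.mean(self.predict(X) == y))

# Made-up data with the same input shape as the padded images above.
X_demo = np.zeros((6, 32, 32, 1))
y_demo = np.array([1, 1, 1, 0, 2, 1])
clf_demo = MajorityClassifier().set_params(verbose=2)
accuracy = clf_demo.fit(X_demo, y_demo).score(X_demo, y_demo)
print(accuracy)
```

Keeping every hyperparameter as a plain constructor argument is what lets a grid such as `param_grid` above swap in a different `activation` for each candidate.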