# Neural nets: learning features, then using them to classify

## Fully Connected Neural Nets

### Structure:
* Loosely modeled on biological neurons
![](imgs/neuron.png)

* We can pass features into a "neuron" with weights and an activation function like so:

![](imgs/perceptron_schematic.jpg)

* We can connect these into layers, and connect layers to other layers

![](imgs/nn.jpeg)

* The idea: each successive layer is a set of new features learned from the previous layer. Each feature is weighted.

* Weights get tuned by a cost minimizing algorithm, where the cost is some function that compares predicted labels with actual labels (same as other supervised learning techniques)

* Tuned weights and learned features are then used to classify new examples
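
As a minimal sketch of the single "neuron" described above, the snippet below computes a weighted sum of the input features and passes it through a sigmoid activation. The feature values, weights, and bias are made-up numbers chosen only for illustration, and the function names are not taken from any library.

```python
import numpy as np

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def neuron(features, weights, bias):
    """Weighted sum of the inputs, followed by an activation function."""
    z = np.dot(features, weights) + bias
    return sigmoid(z)

# Illustrative values only: three input features with hand-picked weights.
features = np.array([0.5, -1.0, 2.0])
weights = np.array([0.4, 0.3, -0.2])
bias = 0.1

activation = neuron(features, weights, bias)
print(activation)
```

A layer is just many of these neurons sharing the same inputs, each with its own weights.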



### Forward propagation:
* The net starts off with real features (e.g., pixels from an image, unraveled into a single array)

* The net passes these features (weighted by their respective weights) to a set of activation nodes (the first layer). Each activation node has an activation function that transforms its input into a single number.
   - Two common activation functions: Sigmoid, ReLU

* These new numbers are now new features that the rest of the neural net will use. They are passed into the next layer and the process is repeated until the output layer is reached.

* The output layer has one activation node for each class being predicted. The node with the highest number indicates the winning class.

* The process of pushing data forward through the net to make a classification is called *forward propagation*
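
The steps above can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not a trained model: the layer sizes (784 inputs, 16 hidden nodes, 10 outputs), the random weights, and the names `forward` and `layers` are all hypothetical.

```python
import numpy as np

def relu(z):
    """ReLU: keep positive values, replace negative values with 0."""
    return np.maximum(0.0, z)

def sigmoid(z):
    """Sigmoid: squash values into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, layers):
    """Push one feature vector through every layer in turn.

    `layers` is a list of (weights, biases, activation) tuples.
    """
    a = x
    for W, b, activation in layers:
        a = activation(a @ W + b)  # the outputs become the next layer's features
    return a

rng = np.random.default_rng(0)
layers = [
    (rng.normal(scale=0.1, size=(784, 16)), np.zeros(16), relu),     # hidden layer
    (rng.normal(scale=0.1, size=(16, 10)), np.zeros(10), sigmoid),   # output layer
]

x = rng.random(784)  # stand-in for a 28x28 image unraveled into a single array
scores = forward(x, layers)
predicted_class = int(np.argmax(scores))  # node with the highest number wins
print(predicted_class)
```

Because the weights here are random, the predicted class is meaningless; the point is only the flow of data from layer to layer.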

### Back propagation:
* The training examples are each pushed through the net, and the resulting values are compared with the label for each example, which is represented with *one-hot encoding*. The recorded error is then passed back through the net using the *back propagation* algorithm, which adjusts the weights at each layer to reduce the error. This is repeated many times until the weights are tuned.

![](imgs/one-hot.png)
One-hot encoding
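
To make the comparison between outputs and labels concrete, here is a small sketch of one-hot encoding together with a softmax and a cross-entropy cost (the same kind of cost used in the coding example at the end of these notes). The labels and scores are invented for illustration, and the helper names are assumptions rather than library functions.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Turn integer class labels into rows with a single 1 in the label's column."""
    encoded = np.zeros((len(labels), num_classes))
    encoded[np.arange(len(labels)), labels] = 1.0
    return encoded

def softmax(scores):
    """Convert each row of raw scores into probabilities that sum to 1."""
    shifted = scores - scores.max(axis=1, keepdims=True)  # for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def cross_entropy(probs, targets):
    """Mean cross-entropy between predicted probabilities and one-hot targets."""
    return float(-np.mean(np.sum(targets * np.log(probs + 1e-12), axis=1)))

labels = np.array([2, 0, 1])            # made-up labels for three examples
targets = one_hot(labels, num_classes=3)

scores = np.array([[0.1, 0.2, 3.0],     # made-up output-layer values
                   [2.0, 0.1, 0.1],
                   [0.3, 0.2, 0.1]])
probs = softmax(scores)
cost = cross_entropy(probs, targets)

print(targets)
print(cost)
```

The cost is small when the largest probability in each row lines up with the 1 in the one-hot label, and grows as they disagree.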

### Predictions:
The net, with tuned weights, can now be used to make predictions for new examples.
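
The train-then-predict cycle can be sketched on a toy problem. To keep it short, this example assumes a net with a single softmax layer, so passing the error back is a single step; a deeper net applies the same idea layer by layer using the chain rule. The data, learning rate, and number of steps are hypothetical choices.

```python
import numpy as np

def softmax(scores):
    shifted = scores - scores.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)

# Toy data (hypothetical): two features; class 1 when the features sum to more than 0.
X = rng.normal(size=(200, 2))
labels = (X.sum(axis=1) > 0).astype(int)
Y = np.eye(2)[labels]  # one-hot targets

W = np.zeros((2, 2))
b = np.zeros(2)
learning_rate = 0.5

for step in range(200):
    probs = softmax(X @ W + b)           # forward propagation
    error = (probs - Y) / len(X)         # gradient of the mean cross-entropy w.r.t. the scores
    W -= learning_rate * (X.T @ error)   # pass the error back to the weights
    b -= learning_rate * error.sum(axis=0)

train_accuracy = np.mean(np.argmax(softmax(X @ W + b), axis=1) == labels)

# Use the tuned weights on examples the net has not seen.
new_examples = np.array([[2.0, 1.0], [-1.5, -0.5]])
predictions = np.argmax(softmax(new_examples @ W + b), axis=1)
print(train_accuracy, predictions)
```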

## Convolutional Neural Networks
### Convolution
* These networks are more complex than simple fully connected networks, though they still use forward and backward propagation. They are used mainly on images, because the 2D relationships between features are retained (we do not unravel the pixel information into a single array).

* Layers are not fully connected. (In a fully connected layer, each node connects to every node in the next layer.)

* The process: When an image (for instance) is passed into the net, the first layer passes a filter (a small matrix of numbers) over the image, usually in strides of a pixel or two. The numbers in the filter are multiplied by the numbers in the image underneath (like the Sobel filters used in edge detection).

* An empty matrix the same size as the image (or slightly smaller, depending on how we handle the edges) is created to hold the output.

* Take the filter and put it over the upper left corner of original image.

* Multiply each number in the filter by the image number underneath it and add up the resulting products. Find the spot in the original image that the center of the filter sits over, and put the number we just calculated in the corresponding spot of the new matrix.

* Slide the filter over and do this again, until the image is completely covered.

![](imgs/filter.jpg)
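
The sliding-filter procedure above can be written directly as two loops. This sketch assumes no padding at the edges (so the output is slightly smaller than the input) and, like most deep learning libraries, does not flip the filter. The tiny image and the function name `convolve2d` are illustrative assumptions; the filter is a Sobel kernel for vertical edges.

```python
import numpy as np

def convolve2d(image, kernel, stride=1):
    """Slide `kernel` over `image` (no padding) and sum the element-wise products."""
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    output = np.zeros((out_h, out_w))       # the "empty matrix" for the results
    for i in range(out_h):
        for j in range(out_w):
            top, left = i * stride, j * stride
            patch = image[top:top + kh, left:left + kw]  # image region under the filter
            output[i, j] = np.sum(patch * kernel)        # multiply, then add
    return output

# A 6x6 image that is dark on the left half and bright on the right half.
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# Sobel filter that responds to vertical edges.
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)

feature_map = convolve2d(image, sobel_x)
print(feature_map)
```

The output is large only where the filter straddles the dark-to-bright boundary, which is what makes it a new "feature" of the image. In a convolutional layer the numbers in the filter are learned rather than fixed.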

* Usually there are multiple filters; these are then stacked into a 3D structure.
* Note: these filters are, at first, just given random small numbers. These numbers will be adjusted in a process similar to adjusting the weights in a fully-connected neural net, that is, they will be learned!

* The process of doing this constitutes a *convolutional layer*
* A convolutional neural net may have many convolutional layers, each with a different number of filters, and different size outputs.
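
Output sizes follow from the input size, filter size, stride, and edge handling. The helper below is a sketch that assumes TensorFlow's two padding conventions (`'SAME'` and `'VALID'`); the function name is hypothetical. It traces the image width through two 5x5 convolutions (stride 1) and two 2x2 max pools (stride 2), the layer shapes used in the coding example later in these notes.

```python
import math

def conv_output_size(input_size, filter_size, stride, padding):
    """Width (or height) of a convolution or pooling output."""
    if padding == "SAME":
        # Edges are padded so that only the stride shrinks the output.
        return math.ceil(input_size / stride)
    if padding == "VALID":
        # No padding: the filter must fit entirely inside the input.
        return math.ceil((input_size - filter_size + 1) / stride)
    raise ValueError("padding must be 'SAME' or 'VALID'")

size = 28                                        # MNIST images are 28x28
size = conv_output_size(size, 5, 1, "SAME")      # first convolution
size = conv_output_size(size, 2, 2, "SAME")      # first 2x2 max pool
size = conv_output_size(size, 5, 1, "SAME")      # second convolution
size = conv_output_size(size, 2, 2, "SAME")      # second 2x2 max pool
print(size)
```

With 64 filters in the second convolutional layer, this is where the `7 * 7 * 64` inputs to the fully connected layer in the coding example come from.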

### Max Pooling

Experience has shown that it is good to downsample the images produced by convolutional layers by taking the max pixel value of a selected region, and replacing that region with that pixel, as illustrated in the image below.

![](imgs/max_pool.png)
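
A minimal NumPy sketch of max pooling, assuming a square window and no padding. The 4x4 input values are made up for illustration, and `max_pool` is a hypothetical helper, not a library call.

```python
import numpy as np

def max_pool(feature_map, size=2, stride=2):
    """Replace each `size` x `size` region with its maximum value."""
    out_h = (feature_map.shape[0] - size) // stride + 1
    out_w = (feature_map.shape[1] - size) // stride + 1
    pooled = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            top, left = i * stride, j * stride
            window = feature_map[top:top + size, left:left + size]
            pooled[i, j] = window.max()
    return pooled

feature_map = np.array([[1, 3, 2, 4],
                        [5, 6, 1, 2],
                        [7, 2, 9, 1],
                        [3, 4, 5, 8]], dtype=float)

pooled = max_pool(feature_map)
print(pooled)
# [[6. 4.]
#  [7. 9.]]
```

With a 2x2 window and a stride of 2, the output has half the width and half the height of the input.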

### Activation functions
#### Sigmoid
#### ReLU
#### Softmax


### Dropout
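
As a rough sketch of the idea: during training, dropout randomly zeroes a fraction of a layer's outputs so the net cannot rely too heavily on any single node. The version below assumes the same convention as `tf.nn.dropout` with `keep_prob` in the coding example that follows, where each value is kept with probability `keep_prob` and the kept values are scaled up by `1 / keep_prob`. The function itself is a hypothetical NumPy stand-in, not the TensorFlow implementation.

```python
import numpy as np

def dropout(activations, keep_prob, rng):
    """Keep each value with probability `keep_prob`; scale survivors by 1 / keep_prob."""
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob

rng = np.random.default_rng(0)
activations = np.ones((1000, 100))

dropped = dropout(activations, keep_prob=0.5, rng=rng)    # training: about half are zeroed
unchanged = dropout(activations, keep_prob=1.0, rng=rng)  # evaluation: nothing is dropped

print(dropped.mean())
```

The scaling keeps the average output roughly the same whether or not dropout is active, which is why the coding example can simply set `keep_prob` to 1.0 when measuring accuracy.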



## Coding example: MNIST digit recognition



In [14]:
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)

Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz


In [15]:
x = tf.placeholder(tf.float32, shape=[None, 784])
y_ = tf.placeholder(tf.float32, shape=[None, 10])

In [16]:
W = tf.Variable(tf.zeros([784,10]))
b = tf.Variable(tf.zeros([10]))

![](imgs/mnist_deep.png)

In [17]:
def weight_variable(shape):
  initial = tf.truncated_normal(shape, stddev=0.1)
  return tf.Variable(initial)

def bias_variable(shape):
  initial = tf.constant(0.1, shape=shape)
  return tf.Variable(initial)

In [18]:
def conv2d(x, W):
  return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
  return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                        strides=[1, 2, 2, 1], padding='SAME')

In [19]:
W_conv1 = weight_variable([5, 5, 1, 32])
b_conv1 = bias_variable([32])

In [20]:
x_image = tf.reshape(x, [-1, 28, 28, 1])

In [21]:
h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1)

In [22]:
W_conv2 = weight_variable([5, 5, 32, 64])
b_conv2 = bias_variable([64])

h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2)

In [23]:
W_fc1 = weight_variable([7 * 7 * 64, 1024])
b_fc1 = bias_variable([1024])

h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)

In [24]:
keep_prob = tf.placeholder(tf.float32)
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)

In [26]:
W_fc2 = weight_variable([1024, 10])
b_fc2 = bias_variable([10])

y_conv = tf.matmul(h_fc1_drop, W_fc2) + b_fc2

In [28]:
cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y_conv))
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

with tf.Session() as sess:
  sess.run(tf.global_variables_initializer())
  for i in range(20000):
    batch = mnist.train.next_batch(50)
    if i % 100 == 0:
      train_accuracy = accuracy.eval(feed_dict={
          x: batch[0], y_: batch[1], keep_prob: 1.0})
      print('step %d, training accuracy %g' % (i, train_accuracy))
    train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})

  print('test accuracy %g' % accuracy.eval(feed_dict={
      x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0}))

step 0, training accuracy 0.14
step 100, training accuracy 0.82
step 200, training accuracy 0.98
step 300, training accuracy 0.84
step 400, training accuracy 0.98
step 500, training accuracy 0.9
step 600, training accuracy 1
step 700, training accuracy 0.96
step 800, training accuracy 0.9
step 900, training accuracy 1
step 1000, training accuracy 0.96
step 1100, training accuracy 0.96
step 1200, training accuracy 0.94
step 1300, training accuracy 0.96
step 1400, training accuracy 0.94
step 1500, training accuracy 0.98
step 1600, training accuracy 0.98
step 1700, training accuracy 0.88
step 1800, training accuracy 0.98
step 1900, training accuracy 1
step 2000, training accuracy 0.94
step 2100, training accuracy 0.94
step 2200, training accuracy 0.98
step 2300, training accuracy 1
step 2400, training accuracy 0.94
step 2500, training accuracy 1
step 2600, training accuracy 1
step 2700, training accuracy 1
step 2800, training accuracy 0.96
step 2900, training accuracy 1
step 3000, trainin