# Using Neural Nets to Recognize Handwritten Digits

Humans are naturally very good at processing images. Our experiences allow us to recognize and classify many things, very quickly.

How could we write a program that reads handwritten numbers? How would you encode the rule that a nine is a loop with a stroke coming down from one side, and how would you account for all of the different variations in handwriting?

Neural networks approach the problem from a different side: the idea is to take a large number of labeled training examples and have the computer teach itself what is correct.\
The NN infers its own rules for telling the digits apart.

### What is a Neural Network

**Perceptron** - The most basic form of a neuron. It takes several binary inputs and produces a single binary output.

* Can have more or fewer inputs
* Each input is multiplied by its weight before the total is summed
* If the summed value is greater than the threshold value, the output is 1, else 0
* Basically makes a decision by weighing up evidence
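The bullets above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `perceptron`, the weights, the threshold, and the "go to a festival" scenario are all assumptions made up for the example.

```python
def perceptron(inputs, weights, threshold):
    """Return 1 if the weighted sum of the binary inputs exceeds the threshold, else 0."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return 1 if weighted_sum > threshold else 0

# Hypothetical decision: should I go to a festival?
# Inputs (1 = yes, 0 = no): good weather, a friend is coming, venue is near transit.
weights = [6, 2, 2]   # assumed weights: weather matters most
threshold = 5         # assumed threshold

decision_good_weather = perceptron([1, 0, 0], weights, threshold)  # 6 > 5
decision_bad_weather = perceptron([0, 1, 1], weights, threshold)   # 4 is not > 5

print(decision_good_weather, decision_bad_weather)
```

Raising the threshold makes the perceptron more reluctant to output 1, and lowering it makes the perceptron more willing.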


![image.png](attachment:image.png)

Perceptrons can be used in multiple layers to make more sophisticated decisions.

![image.png](attachment:image.png)

A bias is added to the dot product of the inputs (x) and the weights (w). The bias takes the place of the threshold (bias = -threshold) and measures how easy it is for the perceptron to fire, that is, to output a 1.

Basic perceptrons can be used to build elementary logic functions such as AND, OR, and NAND.

![image.png](attachment:image.png)
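As a rough sketch of how weights and a bias turn a perceptron into a logic gate, the snippet below builds AND, OR, and NAND from two-input perceptrons. The specific weights and biases are one possible choice picked for illustration (many others work), and the names `perceptron_with_bias`, `GATES`, and `truth_table` are hypothetical.

```python
from itertools import product

def perceptron_with_bias(inputs, weights, bias):
    """Output 1 when w . x + b is positive, else 0."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if z > 0 else 0

# Assumed (weights, bias) pairs; other values give the same gates.
GATES = {
    "AND": ([1, 1], -1.5),
    "OR": ([1, 1], -0.5),
    "NAND": ([-2, -2], 3),
}

def truth_table(name):
    """Outputs for the inputs (0,0), (0,1), (1,0), (1,1), in that order."""
    weights, bias = GATES[name]
    return [perceptron_with_bias(x, weights, bias) for x in product([0, 1], repeat=2)]

for name in GATES:
    print(name, truth_table(name))
```

A large positive bias makes the gate fire easily (NAND fires unless both inputs are 1), while a negative bias makes it harder to fire (AND fires only when both inputs are 1).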

A big problem with perceptrons is that their only outputs are 1 and 0. When trying to build learning models, this means that minute changes to the weights and bias cannot make small differences in the output. A small change can flip the output from 0 to 1, which may not get us closer to the end result.

### Sigmoid Neuron

Sigmoid neurons are similar to perceptrons, except that small changes in their weights and bias cause only a small change in their output.
This lets a network of sigmoid neurons adjust its weights and biases in small steps and "learn" its way toward the outcome we are looking for (recognizing handwritten digits).

**σ** - Sigmoid Function - Very common activation function for neural networks.
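For reference, this is the standard definition of the sigmoid function as applied to a neuron's weighted input. The symbol $z$ is introduced here only for illustration and is not used elsewhere in these notes.

$$
\sigma(z) = \frac{1}{1 + e^{-z}}, \qquad z = \sum_j w_j x_j + b
$$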

A sigmoid neuron's inputs can take any value between 0 and 1 (e.g. 0.531), not just 0 or 1, and its output is likewise a value between 0 and 1.


The sigmoid function approaches the same outputs as a perceptron (0 or 1) at the extremes, but differs from a perceptron for intermediate values.

![image.png](attachment:image.png)

The smoothness of this transition from 0 to 1 is what allows small changes in weights and bias to produce a small change in output.
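To make the contrast concrete, here is a small sketch comparing a perceptron-style step output with a sigmoid output when a single weight is nudged slightly. The input, bias, and weight values are arbitrary assumptions chosen so the weighted sum sits right at the perceptron's tipping point.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function: 1 / (1 + e^(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def step(z):
    """Perceptron-style output: 1 if z is positive, else 0."""
    return 1 if z > 0 else 0

x, bias = 1.0, -1.0            # hypothetical input and bias
results = []
for w in (0.99, 1.01):         # nudge the weight by 0.02
    z = w * x + bias
    results.append((w, step(z), sigmoid(z)))

for w, hard, soft in results:
    print(f"w={w:.2f}  perceptron={hard}  sigmoid={soft:.4f}")
```

The perceptron output jumps from 0 to 1, while the sigmoid output moves only slightly around 0.5. At the extremes the two agree: `sigmoid(20)` is very close to 1 and `sigmoid(-20)` is very close to 0.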

So for the example of trying to determine whether a character is a 9 or not, it would seem like we want an output that is exactly 1 or 0. Instead, with a sigmoid function, we can interpret any output greater than 0.5 as "a 9" and any output less than 0.5 as "not a 9".
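That interpretation rule is a one-line comparison. In this sketch the function name `is_nine` and the sample outputs are hypothetical, and treating an output of exactly 0.5 as "not a 9" is an arbitrary choice the notes do not specify.

```python
def is_nine(output, cutoff=0.5):
    """Interpret a sigmoid output as a yes/no answer to 'is this a 9?'."""
    return output > cutoff

print(is_nine(0.87))  # confident "9"
print(is_nine(0.12))  # confident "not a 9"
```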

### Neural Networks

**Hidden Layers** - Any layers which aren't the input or output layers 

![image.png](attachment:image.png)

**Feedforward Networks** - Like the network above, information is always fed forward from one layer to the next; there are no loops.
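A feedforward pass can be sketched as a loop over layers, where each layer's outputs become the next layer's inputs. This is a minimal pure-Python illustration: the layer sizes `[3, 4, 2]`, the random Gaussian initialization, and the names `feedforward` and `random_layer` are assumptions for the example.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def random_layer(n_in, n_out, rng):
    """One layer: a weight row and a bias for each of its n_out neurons."""
    weights = [[rng.gauss(0, 1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [rng.gauss(0, 1) for _ in range(n_out)]
    return weights, biases

def feedforward(activations, layers):
    """Pass the activations through every layer in order; nothing flows backward."""
    for weights, biases in layers:
        activations = [
            sigmoid(sum(w * a for w, a in zip(neuron_weights, activations)) + b)
            for neuron_weights, b in zip(weights, biases)
        ]
    return activations

rng = random.Random(0)
sizes = [3, 4, 2]  # 3 inputs, one hidden layer of 4 neurons, 2 outputs
layers = [random_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes[:-1], sizes[1:])]

output = feedforward([0.2, 0.7, 0.1], layers)
print(output)
```

Because the weights are random and untrained, the printed values are meaningless; the point is only the one-directional flow of data from layer to layer.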

**Recurrent Neural Networks** - Artificial neural networks in which feedback loops are possible.

### Building a Digit Classifier

The first thing we need to think about is how our inputs and outputs are going to be organized.
1. Our inputs all need to be the same size so the model has a consistent format to learn from (28 x 28 = 784 pixels).
2. The model then has one input value per pixel.
3. Each pixel can be white, black, or some shade of gray in between, which determines that pixel's input value.
4. The pixel values are sent through a hidden layer, whose adjustable weights determine the likelihood of the image being any given digit.
5. On the output side, we have 10 outputs, one for each digit 0-9. Each input character produces a value at every output, and the output with the highest value is the most likely digit.

![image.png](attachment:image.png)
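The input and output layout described above can be sketched as three small helpers. The 0-255 grayscale range, the scaling to [0, 1], and the names `flatten_image`, `one_hot`, and `predict_digit` are assumptions for illustration; the notes do not say how pixel values are stored.

```python
def flatten_image(image):
    """Turn a 28x28 grid of grayscale values (assumed 0-255) into 784 inputs in [0, 1]."""
    return [pixel / 255.0 for row in image for pixel in row]

def one_hot(digit, num_classes=10):
    """Desired output for a labeled digit: 1.0 at the digit's position, 0.0 elsewhere."""
    target = [0.0] * num_classes
    target[digit] = 1.0
    return target

def predict_digit(outputs):
    """Pick the output neuron with the highest value."""
    return max(range(len(outputs)), key=lambda i: outputs[i])

# A made-up image: all zeros except one fully "on" pixel in the middle.
image = [[0] * 28 for _ in range(28)]
image[14][14] = 255
inputs = flatten_image(image)

# Made-up network outputs where neuron 3 is the most active.
example_outputs = [0.10, 0.05, 0.02, 0.90, 0.01, 0.03, 0.04, 0.02, 0.20, 0.06]

print(len(inputs), one_hot(3), predict_digit(example_outputs))
```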

Now that we know how we want the model to perform, we need data.

We need both data to train the model and data the model will not see during training, which we use to test the model.

Both sets of data must be correctly labeled, so the model's outputs can be compared against the right answers as it learns over the iterations and again when it is tested.

When we think about training data vs test data, we usually want different data for the test data to see if the models actually work. For example if we used characters written by the same person in the training data and the test data, we don't know if the model actually learned a pattern or just memorized that specific person's handwriting.
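One way to keep the same person's handwriting out of both sets is to split by writer rather than by individual sample. The sketch below assumes each sample records who wrote it; the `"writer"` and `"label"` keys, the 20% test fraction, and the function name `split_by_writer` are all hypothetical.

```python
import random

def split_by_writer(samples, test_fraction=0.2, seed=0):
    """Hold out whole writers, so no writer appears in both train and test."""
    writers = sorted({s["writer"] for s in samples})
    rng = random.Random(seed)
    rng.shuffle(writers)
    n_test = max(1, int(len(writers) * test_fraction))
    test_writers = set(writers[:n_test])
    train = [s for s in samples if s["writer"] not in test_writers]
    test = [s for s in samples if s["writer"] in test_writers]
    return train, test

# Made-up data set: 10 writers, each contributing one sample of every digit.
samples = [{"writer": w, "label": d} for w in range(10) for d in range(10)]
train, test = split_by_writer(samples)

print(len(train), len(test))
```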

**Cost Function or Loss Function** - quantifies how far the network's outputs are from the desired outputs. The lower the value, the closer we are to achieving our goal.

The goal of our algorithm is to minimize the cost or loss.
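As one concrete example, here is a sketch of the quadratic (mean squared error) cost. The notes do not say which cost function is used, so this choice is an assumption, as are the function name `quadratic_cost` and the sample values.

```python
def quadratic_cost(outputs, targets):
    """C = 1/(2n) * sum over examples of the squared distance between output and target."""
    n = len(outputs)
    total = 0.0
    for output, target in zip(outputs, targets):
        total += sum((a - y) ** 2 for a, y in zip(output, target))
    return total / (2 * n)

# Two-output toy examples: the target says "second class".
perfect = quadratic_cost([[0.0, 1.0]], [[0.0, 1.0]])
wrong = quadratic_cost([[1.0, 0.0]], [[0.0, 1.0]])

print(perfect, wrong)
```

The cost is zero only when every output matches its target exactly, and it grows as the outputs drift away from the targets.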

We minimize loss using a **gradient descent algorithm**.

The gradient descent algorithm works by repeatedly computing the gradient ∇C and then taking a small step in the opposite direction.\
This goes on until the hypothetical ball rolls down to a minimum at the bottom of the valley (ideally the global minimum).

![image.png](attachment:image.png)
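The "ball rolling downhill" picture can be sketched on a toy cost with a known minimum. Here the cost C(v1, v2) = v1² + v2², the starting point, the learning rate of 0.1, and the step count are all assumptions for illustration; a real network's cost depends on thousands of weights and biases.

```python
def cost(v):
    """Toy bowl-shaped cost with its minimum at (0, 0)."""
    return v[0] ** 2 + v[1] ** 2

def gradient(v):
    """Gradient of the toy cost: (2*v1, 2*v2)."""
    return [2 * v[0], 2 * v[1]]

def gradient_descent(start, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient."""
    v = list(start)
    for _ in range(steps):
        grad = gradient(v)
        v = [vi - learning_rate * gi for vi, gi in zip(v, grad)]
    return v

start = [3.0, -4.0]
minimum = gradient_descent(start)

print(cost(start), cost(minimum))
```

With this learning rate each step shrinks both coordinates by a factor of 0.8, so the point settles very close to (0, 0). Too large a learning rate would overshoot the valley instead.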

Gradient descent is used to find the weights and biases which minimize the cost or loss.

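Gradient descent can be very slow when we have a very large data set of inputs, because the gradient has to be computed for every training input and then averaged before each step.

In this case we would use **Stochastic Gradient Descent**.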

Stochastic Gradient Descent lets us estimate the gradient ∇C by computing ∇C(x) for a small sample of randomly chosen training inputs. By averaging over this small mini-batch, we can get a good estimate of the true gradient.
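The estimate can be checked on a toy problem. This sketch fits a single weight `w` so that `w * x` matches `y = 3 * x`, with per-example cost C(x) = 0.5 * (w*x - y)²; the model, the data, and the mini-batch size of 100 are all assumptions made up for the example.

```python
import random

def grad_single(w, x, y):
    """Gradient of C(x) = 0.5 * (w*x - y)**2 with respect to w."""
    return (w * x - y) * x

def average_gradient(w, examples):
    """Average the per-example gradients over a set of (x, y) pairs."""
    return sum(grad_single(w, x, y) for x, y in examples) / len(examples)

rng = random.Random(0)
xs = [rng.uniform(-1, 1) for _ in range(10_000)]
data = [(x, 3.0 * x) for x in xs]

w = 0.0
full_gradient = average_gradient(w, data)            # uses all 10,000 examples
mini_batch = rng.sample(data, 100)                   # 100 randomly chosen examples
estimated_gradient = average_gradient(w, mini_batch)

print(full_gradient, estimated_gradient)
```

The two numbers are not identical, but the mini-batch estimate points in the same direction as the full gradient while needing only 1% of the computation.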

In training a NN, an **epoch** is one full pass through the training data: we feed the data, mini-batch by mini-batch, into stochastic gradient descent until every training input has been used once.
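A training loop organized into epochs might look like the sketch below. It uses a made-up one-weight model (`w * x` fitted to `y = 3 * x`) so it stays short; the names `mini_batches` and `train`, the batch size, the learning rate, and the number of epochs are all assumptions.

```python
import random

def mini_batches(data, batch_size, rng):
    """Shuffle a copy of the data and yield it in consecutive chunks."""
    shuffled = data[:]
    rng.shuffle(shuffled)
    for start in range(0, len(shuffled), batch_size):
        yield shuffled[start:start + batch_size]

def train(data, epochs, batch_size, learning_rate, seed=0):
    """Stochastic gradient descent for the toy model y = w * x."""
    rng = random.Random(seed)
    w = 0.0
    for _ in range(epochs):                       # one epoch = one pass over all data
        for batch in mini_batches(data, batch_size, rng):
            grad = sum((w * x - y) * x for x, y in batch) / len(batch)
            w -= learning_rate * grad             # one update per mini-batch
    return w

xs = [i / 50.0 for i in range(-50, 51)]           # 101 points from -1 to 1
data = [(x, 3.0 * x) for x in xs]

learned_w = train(data, epochs=30, batch_size=10, learning_rate=0.5)
print(learned_w)
```

With 101 examples and a batch size of 10, each epoch consists of 11 updates (the last mini-batch holds the single leftover example), and the data is reshuffled at the start of every epoch.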

### Deep Neural Networks

Deep neural networks are NNs with many layers. The additional layers let a big problem be broken down into progressively simpler sub-problems, which allows more complex problems to be solved.

Think of identifying a face:
1. First Layer
    * Is there hair on top?
    * Is there a left eye?
    * Is there a right eye?
    * Is there a nose?
    * Is there a mouth?
2. Second layer - Individual Eye
    * Is there an eyebrow?
    * Are there eyelashes?
    * Shape of the eye
3. More Layers 
    * Breaks it down all the way into individual shapes and even pixels