# Building a Neural Network From Scratch with NumPy

The goal of this Notebook is to give a hands-on explanation of how Artificial Neural Networks work. I intentionally avoid frameworks like PyTorch or TensorFlow because I wanted a better understanding of what Machine Learning models actually are, what Neural Networks actually are, and how they can be built. This Notebook is a collection of the information I wish I'd had when I began this journey. It touches on a little bit of the math, but it doesn't re-explain it; I link out to more thorough explanations where I think they're valuable.

Note: I am not a Machine Learning engineer, nor am I a Data Scientist. I'm a Software Engineer who turned into a political operative (lol), and I wrote this for an audience of Software Engineers. Also: I don't have a GPU, and I don't want to spend a bunch of money renting one from Amazon. This model can be trained and deployed on a modern CPU in a matter of minutes.

## What We'll Be Doing

We're going to build a Neural Network for multi-class classification. All that means is that we're going to make a model that takes in images and attempts to label each one from a fixed set of options. In our case, we're going to create a Neural Network that works with the [MNIST database of handwritten digits](https://webcache.googleusercontent.com/search?q=cache:yann.lecun.com/exdb/mnist/). This database contains 70,000 grayscale images of handwritten digits, 0 - 9, each 28 × 28 pixels, along with a label saying which digit each image shows. We'll use 60,000 of the images to train our Neural Network and the other 10,000 to test its accuracy. I've included the data with this Notebook in the `data/` directory.
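The original MNIST distribution stores images and labels in the IDX binary format: a 4-byte magic number (two zero bytes, a data-type code where `0x08` means unsigned byte, and the number of dimensions), then each dimension's size as a big-endian 32-bit integer, then the raw data. As a hedged sketch, the helper below parses that format with NumPy alone and flattens each 28 × 28 image into 784 values. It assumes the files in `data/` are uncompressed IDX files (they may be stored differently), and `parse_idx` and `flatten_images` are hypothetical names, not functions from this Notebook.

```python
import struct

import numpy as np


def parse_idx(raw: bytes) -> np.ndarray:
    """Hypothetical helper: parse an uncompressed IDX buffer of unsigned bytes."""
    # Magic number: two zero bytes, a data-type code, and the number of dimensions.
    zero, dtype_code, ndim = struct.unpack(">HBB", raw[:4])
    if zero != 0 or dtype_code != 0x08:
        raise ValueError("expected an IDX file of unsigned bytes")
    # Each dimension size is a big-endian unsigned 32-bit integer.
    header_size = 4 + 4 * ndim
    shape = struct.unpack(f">{ndim}I", raw[4:header_size])
    # Everything after the header is the raw pixel (or label) data.
    data = np.frombuffer(raw, dtype=np.uint8, offset=header_size)
    return data.reshape(shape)


def flatten_images(images: np.ndarray) -> np.ndarray:
    """Hypothetical helper: turn (n, 28, 28) uint8 images into (n, 784) floats in [0, 1]."""
    return images.reshape(images.shape[0], -1).astype(np.float64) / 255.0
```

To use it, pass the bytes of one of the files, for example `parse_idx(path.read_bytes())` with a `pathlib.Path`. Scaling pixels from 0-255 down to 0-1 is a common convenience that keeps the numbers the network works with small.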

Neural Networks are particularly handy for image classification tasks. There are many other types of Machine Learning out there, but we won't cover them here.

## Background Concepts

### Shape of a Neural Network

First of all, let's demystify one thing: Neural Networks are just graphs. Just nodes and edges. If you've studied any Computer Science or have a background in Software Engineering, this is probably a familiar concept to you. The exact shape of any given Neural Network depends on how you build it, and that's something we get to decide. The graph has an input layer, usually with one node per input feature; in our case, each pixel of an image is a feature. Next come one or more hidden layers. Hidden layers are what put the "deep" in Deep Learning. There's no standard rule for how big a hidden layer should be or how many you should have. Finally, there's an output layer, where each node corresponds to one label. For example, if "cat" is a possible label for an image, then one node in the output layer represents "cat". We're going to make a Neural Network with 784 input nodes (one per pixel of a 28 × 28 MNIST image), a single hidden layer with ten nodes, and an output layer with ten nodes, one for each digit 0 - 9.
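In NumPy, the edges between two layers can be stored as a weight matrix and the nodes of a layer as a bias vector. Here is a minimal sketch of the parameters for the 784-10-10 network described above. The column-per-image layout, the small random starting values, and the names `init_params`, `W1`, `b1`, `W2`, `b2` are illustrative assumptions, not necessarily how this Notebook builds its network.

```python
import numpy as np


def init_params(n_input=784, n_hidden=10, n_output=10, seed=0):
    """Create randomly initialized weights and zero biases for a one-hidden-layer network."""
    rng = np.random.default_rng(seed)
    # One row per hidden node, one column per input node: W1 @ x maps 784 pixels to 10 hidden values.
    W1 = rng.standard_normal((n_hidden, n_input)) * 0.01
    b1 = np.zeros((n_hidden, 1))
    # One row per output node (digit 0-9), one column per hidden node.
    W2 = rng.standard_normal((n_output, n_hidden)) * 0.01
    b2 = np.zeros((n_output, 1))
    return W1, b1, W2, b2


W1, b1, W2, b2 = init_params()
print(W1.shape, b1.shape, W2.shape, b2.shape)
```

Every edge in the graph is one entry in a weight matrix, so this network has 784 × 10 + 10 × 10 = 7,940 weights, plus one bias per hidden and output node (20 in total).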

Here's a drawing of a Neural Network with three input nodes, a hidden layer with four nodes, and an output layer with two nodes. This might be how you would construct a Neural Network that does binary classification: a model that tries to assign each input one of two possible labels.

| ![Artificial Neural Network](./img/artificial_neural_network.svg)|
|:--:|
|en:User:Cburnett, [CC BY-SA 3.0](http://creativecommons.org/licenses/by-sa/3.0/), via Wikimedia Commons|

If you're looking for more explanation of the structure of Neural Networks, [But what is a Neural Network?](https://www.3blue1brown.com/lessons/neural-networks) by 3Blue1Brown is excellent.

### How the Neural Network Learns

Neural Networks start out very stupid. As we'll see, they begin with no more "intelligence" than random guessing. Our goal is to iteratively adjust the network's Weights and Biases to make it smarter. We do this in two steps: **Forward Propagation** and **Back Propagation**.

#### Forward Propagation

Think of this step as showing the Neural Network some input and asking it to classify that input. At the beginning, it's very likely to get it wrong. But, like people, it has to get things wrong before it can learn to get them right. In Forward Propagation, we simply push all our features (pixels) through the Neural Network and ask, "what did you see?" The output layer holds its answer: one score for each possible label.
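A forward pass is just a couple of matrix multiplications with an activation function after each one. The text hasn't yet said which activation functions this Notebook uses, so this sketch assumes ReLU on the hidden layer and softmax on the output layer (common choices for a classifier like this, but assumptions here). It also assumes each column of `X` is one flattened image and uses the hypothetical parameter names `W1`, `b1`, `W2`, `b2`.

```python
import numpy as np


def relu(z):
    """Rectified linear unit: negative values become 0."""
    return np.maximum(z, 0)


def softmax(z):
    """Column-wise softmax: turns each column of scores into probabilities that sum to 1."""
    # Subtracting the column max avoids overflow in exp without changing the result.
    exp = np.exp(z - z.max(axis=0, keepdims=True))
    return exp / exp.sum(axis=0, keepdims=True)


def forward(X, W1, b1, W2, b2):
    """Push a batch X of shape (784, m) through the network, keeping intermediate values."""
    Z1 = W1 @ X + b1   # (10, m) weighted sums at the hidden layer
    A1 = relu(Z1)      # (10, m) hidden activations
    Z2 = W2 @ A1 + b2  # (10, m) weighted sums at the output layer
    A2 = softmax(Z2)   # (10, m) one probability per digit, per image
    return Z1, A1, Z2, A2
```

The network's guess for each image is the row with the highest probability, `A2.argmax(axis=0)`. The intermediate values `Z1` and `A1` are returned because Back Propagation needs them.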

#### Back Propagation

Think of this step as showing the Neural Network how right or wrong its answers were. We take all its answers to "what did you see?" and come up with a measure of how wrong they were. We'll see below that we can assign a single numeric value to that wrongness. From that numeric value, we can work backwards through all the neurons (nodes in the graph) and tell the network, "you were X wrong, and this specific neuron contributed Y of that error; adjust this neuron's Weights and Biases by Z and try again."
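One common way to turn "how wrong" into a single number is the cross-entropy cost. The sketch below assumes that cost, a ReLU hidden layer, a softmax output layer, one column per image, and one-hot encoded labels; `one_hot`, `cross_entropy`, and `backward` are hypothetical names. Under those assumptions, the chain rule gives compact formulas for how much each Weight and Bias contributed to the error.

```python
import numpy as np


def one_hot(labels, n_classes=10):
    """Turn integer labels like [3, 0] into columns with a 1 in the correct row."""
    Y = np.zeros((n_classes, labels.size))
    Y[labels, np.arange(labels.size)] = 1.0
    return Y


def cross_entropy(A2, Y):
    """Average cost over the batch: large when the correct digit got a low probability."""
    # Clipping avoids log(0) if a probability underflows to zero.
    return -np.sum(Y * np.log(np.clip(A2, 1e-12, 1.0))) / Y.shape[1]


def backward(X, Y, Z1, A1, A2, W2):
    """Gradients of the cross-entropy cost for a ReLU hidden layer and a softmax output layer."""
    m = X.shape[1]
    dZ2 = (A2 - Y) / m                 # error at the output layer (softmax + cross-entropy)
    dW2 = dZ2 @ A1.T                   # each output weight's share of the error
    db2 = dZ2.sum(axis=1, keepdims=True)
    dZ1 = (W2.T @ dZ2) * (Z1 > 0)      # push the error back through W2, then through ReLU
    dW1 = dZ1 @ X.T                    # each input-to-hidden weight's share of the error
    db1 = dZ1.sum(axis=1, keepdims=True)
    return dW1, db1, dW2, db2
```

A finite-difference check (nudge one weight by a tiny amount, recompute the cost, and compare the change with the computed gradient) is a handy way to confirm formulas like these before trusting them.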

3Blue1Brown has another excellent video on the concepts behind Back Propagation: [What is backpropagation really doing?](https://www.3blue1brown.com/lessons/backpropagation)

#### Training

And that's it! Our Neural Network learns by repeatedly making guesses, seeing how wrong they were, and adjusting its Weights and Biases. We repeat this over and over until it is good at the task. This is a lot like how people learn: show a small child pictures of various farm animals over and over and ask them to name the animals. At first they're very bad at it, and over time they get very good at it. Artificial Neural Networks were originally inspired by the way neurons in the human brain connect and fire, although the resemblance is loose.
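Putting the steps together gives the training loop: guess (forward), measure the error (cost), assign blame (backward), and adjust. This self-contained sketch assumes a ReLU hidden layer, a softmax output, a cross-entropy cost, small random starting weights, a fixed `learning_rate`, and full-batch updates over every example at once. Those are illustrative choices, and `train` is a hypothetical name; the update line is the Gradient Descent step described in the next section.

```python
import numpy as np


def train(X, labels, n_hidden=10, n_classes=10, learning_rate=0.1, iterations=200, seed=0):
    """Plain full-batch gradient descent on a one-hidden-layer network (a sketch)."""
    rng = np.random.default_rng(seed)
    n_input, m = X.shape
    W1 = rng.standard_normal((n_hidden, n_input)) * 0.01
    b1 = np.zeros((n_hidden, 1))
    W2 = rng.standard_normal((n_classes, n_hidden)) * 0.01
    b2 = np.zeros((n_classes, 1))
    Y = np.zeros((n_classes, m))
    Y[labels, np.arange(m)] = 1.0  # one-hot encode the correct answers
    costs = []
    for _ in range(iterations):
        # Forward Propagation: make a guess for every example.
        Z1 = W1 @ X + b1
        A1 = np.maximum(Z1, 0)
        Z2 = W2 @ A1 + b2
        exp = np.exp(Z2 - Z2.max(axis=0, keepdims=True))
        A2 = exp / exp.sum(axis=0, keepdims=True)
        # Measure how wrong the guesses were (cross-entropy cost).
        costs.append(-np.sum(Y * np.log(np.clip(A2, 1e-12, 1.0))) / m)
        # Back Propagation: how much each weight and bias contributed to the error.
        dZ2 = (A2 - Y) / m
        dW2 = dZ2 @ A1.T
        db2 = dZ2.sum(axis=1, keepdims=True)
        dZ1 = (W2.T @ dZ2) * (Z1 > 0)
        dW1 = dZ1 @ X.T
        db1 = dZ1.sum(axis=1, keepdims=True)
        # Adjust every parameter a small step against its gradient.
        W1 -= learning_rate * dW1
        b1 -= learning_rate * db1
        W2 -= learning_rate * dW2
        b2 -= learning_rate * db2
    return (W1, b1, W2, b2), costs
```

For MNIST, `X` would hold the training pixels with one flattened 784-pixel image per column, and `labels` the matching digits. Watching `costs` shrink over the iterations is the "getting better at the task" part in numeric form.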

### Gradient Descent

Gradient Descent is the most math-y piece of all this. Again, 3Blue1Brown has a great video: [Gradient descent, how neural networks learn](https://www.3blue1brown.com/lessons/gradient-descent). How deeply you want to understand this piece is very much a choose-your-own-adventure, but I recommend diving in at least a little bit.

Imagine you're standing at a point on a graph and want to figure out which steps to take to reach the minimum value. If you've taken any calculus, you know that the slope at your current point tells you which way is downhill and how steep it is, which tells you roughly how big a step to take. If you do this over and over, with small steps, you will approach a local minimum. That's 1-dimensional gradient descent. Our plan is to take lots of repeated steps like this to reach a minimum of our "cost" function, the function telling us how bad our predictions are.

| ![Gradient Descent](./img/GradientDescentGradientStep.svg)|
|:--:|
|Reducing Loss: Gradient Descent, [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/), via [Google Developers](https://developers.google.com/machine-learning/crash-course/reducing-loss/gradient-descent)|
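To make the 1-dimensional case concrete, here is a tiny sketch that minimizes f(x) = (x - 3)², whose slope is f'(x) = 2(x - 3). The function, the starting point, and the step size (`learning_rate`) are arbitrary choices for illustration.

```python
def gradient_descent_1d(slope, x0, learning_rate=0.1, steps=100):
    """Repeatedly step downhill: move opposite the slope, scaled by the learning rate."""
    x = x0
    for _ in range(steps):
        x -= learning_rate * slope(x)
    return x


# f(x) = (x - 3) ** 2 has its minimum at x = 3; its slope is 2 * (x - 3).
x_min = gradient_descent_1d(lambda x: 2 * (x - 3), x0=-5.0)
print(round(x_min, 4))  # → 3.0
```

The step size matters: for this particular function, a `learning_rate` of 1.0 or more makes `x` bounce back and forth or run away instead of settling at the minimum.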

You can do this in two and three dimensions as well. In fact, you can do it in as many dimensions as you need, which is very handy, because we're going to be working in a lot of dimensions.

| ![Gradient Descent](./img/GradientDescent.gif)|
|:--:|
|[CC0 1.0 Universal (CC0 1.0) Public Domain Dedication](https://creativecommons.org/publicdomain/zero/1.0/deed.en), via Wikimedia Commons|

Ultimately, we keep moving downward in our many-dimensional "cost" function to find a minimum value. The lower the cost, the better the prediction.
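The same loop works unchanged in many dimensions; the only difference is that the slope becomes a gradient, a vector holding the slope along each dimension. As a hedged illustration, this sketch uses a made-up 5-dimensional cost (the mean squared error of a linear model on random data) rather than the Neural Network's real cost, and `gradient_descent` and `cost_gradient` are hypothetical names.

```python
import numpy as np


def gradient_descent(grad, w0, learning_rate=0.1, steps=500):
    """Same update as the 1-D case, applied to every coordinate of a vector at once."""
    w = np.array(w0, dtype=float)
    for _ in range(steps):
        w -= learning_rate * grad(w)
    return w


# A 5-dimensional cost: mean squared error of a linear model, cost(w) = mean((X @ w - y) ** 2).
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
true_w = np.array([1.0, -2.0, 0.5, 3.0, -1.0])
y = X @ true_w


def cost_gradient(w):
    # d/dw mean((X @ w - y) ** 2) = (2 / n) * X.T @ (X @ w - y)
    return 2.0 / len(y) * X.T @ (X @ w - y)


w_min = gradient_descent(cost_gradient, np.zeros(5))
print(np.round(w_min, 3))
```

Starting from all zeros, the loop walks downhill in all five dimensions at once and ends up very close to `true_w`, where this cost is lowest. Training the Neural Network does the same thing, just with thousands of dimensions (one per Weight and Bias).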

## Environment Setup

As stated, we're going to build and train a fully functioning Neural Network using only **NumPy**. That said, I'm also going to install **matplotlib** so that we can visualize some of the work as we go; it isn't needed to build or train the network. Both libraries are listed in `requirements.txt`.

It's also worth pointing out that I'm developing this in Python 3.10. Other versions of Python 3 probably work, too.

```
%pip install -r requirements.txt
```

Note: you may need to restart the kernel to use updated packages.
