In [1]:
# Importing packages
using Printf

# Neural Networks From Scratch
The simplest neural network possible can be implemented in a few lines of code:

In [2]:
function neural_network(input, weight)
    prediction = input * weight
    return prediction
end

neural_network (generic function with 1 method)

All a neural network does is take some input data and manipulate it in some way. Really, neural networks are just mathematical functions. In this case, the function is a straight line through the origin: $y = mx$, which is $y = mx + b$ with $b = 0$.

Before we continue, it helps to lay down some context for our neural network. Every neural network has a purpose. Some networks are used to categorize images, predict stock prices, or identify tumors in patients. That being said, it helps to define some metric of utility for our neural network.

Let's try to use our neural network to predict whether a student will pass or fail a test based on the amount of sleep they got the night before.

In [3]:
# The input data
sleep_hours = [6.50, 8.25, 8.00, 5.75, 4.00, 9.00]

# Defining the input and weight
input = sleep_hours[1]
weight = 0.1

# Making a prediction
prediction = neural_network(input, weight)

# Printing the result
@printf "Prediction: %.2f" prediction

Prediction: 0.65

So what is this saying? We can interpret the prediction as "a student who slept 6.5 hours the night before the test has a 65% chance of passing it". We intuitively know that this statement, taken by itself, is inaccurate. The weight was set arbitrarily to `0.1`, so the network simply multiplies any input we give it by 0.1. There must also be other factors that influence the chance of passing a test, such as study time and subject difficulty. While what we have right now is technically a neural network, it's clearly not a very good one.
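To make the effect of the fixed weight concrete, here is a minimal sketch that runs the same single-input network over every value in `sleep_hours`. The function is repeated so the cell runs on its own, and `predictions` is a name introduced here for illustration.

```julia
using Printf

# Same single-input network as above, repeated so this cell is self-contained
neural_network(input, weight) = input * weight

sleep_hours = [6.50, 8.25, 8.00, 5.75, 4.00, 9.00]
weight = 0.1  # still arbitrary, not learned from data

# One prediction per student: each is just hours * 0.1
predictions = [neural_network(hours, weight) for hours in sleep_hours]

for (hours, p) in zip(sleep_hours, predictions)
    @printf "%.2f hours of sleep -> prediction %.2f\n" hours p
end
```

Every prediction is a fixed fraction of the input, so more sleep always means a higher predicted chance of passing, regardless of anything else about the student.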

When building a neural network, it is important to consider not only what inputs you provide, but also how hard they are to measure. For example, measuring time spent sleeping is pretty straightforward: start a timer before you sleep and stop it when you wake up. Of course, you can take a more analytical approach and measure the amount of time spent in each sleep cycle stage (non-REM, REM, etc). But falling asleep and waking up are relatively discrete events, and the time between them is easily quantifiable.

Now, consider another metric that may influence the results of a test, like subject interest. Generally, one might think that a person who is more interested in a subject will do better on a test. But how do you measure interest? It's a pretty complicated problem because interest level is not easily defined. It's a bit like trying to define happiness.

Back to our network. In theory, if we give the network more relevant inputs, its accuracy should increase. We're going to include data for the following:
- The total number of hours studied
- Daily average hours spent playing video games

In [4]:
# Additional input data
study_hours      = [2.25, 7.00, 0.50, 12.0, 10.5, 4.75]
video_game_hours = [3.25, 5.50, 4.50, 2.00, 1.50, 9.25]

6-element Vector{Float64}:
 3.25
 5.5
 4.5
 2.0
 1.5
 9.25

Because we added more metrics, we need to change the shape of our input and the network. Let's `zip` our input data so we get a tuple of every metric for each entry:

In [5]:
# Creating a list of tuples for each entry
input_list = zip(sleep_hours, study_hours, video_game_hours)

zip([6.5, 8.25, 8.0, 5.75, 4.0, 9.0], [2.25, 7.0, 0.5, 12.0, 10.5, 4.75], [3.25, 5.5, 4.5, 2.0, 1.5, 9.25])

And to index the first entry, we do something like this:

In [6]:
# Indexing first entry
collect(input_list)[1]

(6.5, 2.25, 3.25)

An entry represents a single person's data. So the first entry (above) describes a person who slept 6.5 hours the night before, studied a total of 2.25 hours, and plays about 3.25 hours of video games per day.
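As a side note on the indexing step, `zip` returns a lazy iterator, which is why `collect` is needed before indexing by position. The sketch below shows a few alternatives using only Base functions; `first_entry`, `entries`, and `third_entry` are names introduced here for illustration.

```julia
sleep_hours      = [6.50, 8.25, 8.00, 5.75, 4.00, 9.00]
study_hours      = [2.25, 7.00, 0.50, 12.0, 10.5, 4.75]
video_game_hours = [3.25, 5.50, 4.50, 2.00, 1.50, 9.25]

input_list = zip(sleep_hours, study_hours, video_game_hours)

# `first` returns the first tuple without materializing the whole iterator
first_entry = first(input_list)

# Collect once and reuse the vector when indexing by position several times
entries = collect(input_list)
third_entry = entries[3]

# The lazy iterator can also be looped over directly
for (sleep, study, games) in input_list
    println("sleep=", sleep, " study=", study, " games=", games)
end
```

Collecting once and keeping the resulting vector avoids rebuilding it every time an entry is needed.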

Now, let's modify our network:

In [7]:
function neural_network(entry, weights)
    @assert length(entry) == length(weights) "The entry tuple and weight tuple should have the same length."
    
    weighted_sum = 0.0  # start as a Float64 so the accumulator keeps one type
    # x is an input and m is its weight, as in y = mx
    for (x, m) in zip(entry, weights)
        weighted_sum += x * m
    end
    
    return weighted_sum
end

neural_network (generic function with 1 method)

Our network looks very different from the old version. Instead of returning the product of a single weight and a single input, it now returns the sum of the products of each input with its respective weight.

Note that because we increased our input size, we also need to increase the number of weights in our network. Like last time, we're going to assign them arbitrarily:

In [8]:
weights = [0.5, 0.0, -0.2]

3-element Vector{Float64}:
  0.5
  0.0
 -0.2

Now let's pass an entry and the weights into our neural network and see what happens:

In [9]:
entry = collect(input_list)[1]

# Making a prediction
prediction = neural_network(entry, weights)

# Printing the result
@printf "Prediction: %.2f" prediction

Prediction: 2.60

This number is still pretty meaningless to us because of the arbitrary weights. It also falls outside our expected prediction range of 0.0 to 1.0. But that's okay. The important thing at this stage is that our network takes multiple weights and inputs and spits out _some_ number without throwing an error.
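To see how the network behaves beyond the first entry, the sketch below runs it over every entry and flags whether each prediction lands between 0.0 and 1.0. It repeats the data and the function from the cells above so it runs on its own; `predictions` and `in_range` are names introduced here for illustration.

```julia
using Printf

sleep_hours      = [6.50, 8.25, 8.00, 5.75, 4.00, 9.00]
study_hours      = [2.25, 7.00, 0.50, 12.0, 10.5, 4.75]
video_game_hours = [3.25, 5.50, 4.50, 2.00, 1.50, 9.25]
weights = [0.5, 0.0, -0.2]  # arbitrary, as above

# Multi-input network: sum of each input times its weight
function neural_network(entry, weights)
    @assert length(entry) == length(weights) "The entry tuple and weight tuple should have the same length."
    weighted_sum = 0.0
    for (x, m) in zip(entry, weights)
        weighted_sum += x * m
    end
    return weighted_sum
end

# One prediction per entry
predictions = [neural_network(entry, weights)
               for entry in zip(sleep_hours, study_hours, video_game_hours)]

for (i, p) in enumerate(predictions)
    in_range = 0.0 <= p <= 1.0
    @printf "Entry %d: prediction %.2f (within 0-1: %s)\n" i p in_range
end
```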

## The Dot Product
Let's take a moment to think about why the dot product, which is exactly the sum of products our network now computes, makes sense for this neural network. In general terms, the dot product takes two equal-length lists of numbers and returns a single number. This is more or less what we want: we provide our network with a set of inputs and receive a single output. The next question to ask is: why do we multiply the inputs by a set of weights?
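As a quick check before turning to that question, the hand-written weighted sum can be compared with `dot` from Julia's `LinearAlgebra` standard library. This is a minimal sketch using the first entry and the arbitrary weights from above; `manual` and `builtin` are names introduced here for illustration.

```julia
using LinearAlgebra  # standard library; provides `dot`

entry   = [6.5, 2.25, 3.25]   # first entry, written as a vector
weights = [0.5, 0.0, -0.2]    # arbitrary weights from above

# Hand-written weighted sum, equivalent to the loop in `neural_network`
manual = sum(x * m for (x, m) in zip(entry, weights))

# Built-in dot product of the two vectors
builtin = dot(entry, weights)

println(manual ≈ builtin)
```

The comparison uses `≈` rather than `==` because the two computations may differ in floating-point rounding.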

The weights give us a mathematical way of manipulating the inputs. More specifically, they allow us to increase or decrease the emphasis placed on each input. Let's consider the following weights and inputs:

$$
\text{inputs} = 
\begin{bmatrix}
1 & 2 & 3\\
a & b & c
\end{bmatrix}
$$