<a href="https://colab.research.google.com/github/SaraSolace88/CSC386/blob/main/introtoneuralprediction_forwardpropagation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Intro to Neural Prediction and Forward Propagation
Presented by Austin O'Brien with figures and examples taken from *Grokking Deep Learning* by Andrew Trask.

* We'll begin by observing how the most basic parts of a neural network behave.
* Every "node" inside of a neural network is fed input, which is then processed, and then the node gives an output.
* When using a single node, the input is the data from our dataset itself and the output is the "prediction" the neural net makes in the form of a numeric value.

* The value itself may be an unbounded numeric value or it may represent a percentage.
* Before looking at an entire neural network, let's observe a single node.


In [2]:
import numpy as np

weight = 0.1

def neural_network(input, weight):
    prediction = input * weight
    return prediction

num_of_toes = np.array([8.5,9.5,10,9])
input = num_of_toes[2]
pred = neural_network(input, weight)
print(pred)

1.0


* We can see the input is a value from the real world: information about the real world that we use to make a prediction, which is the output.
* We "scale" the input by multiplying it by the weight.
  * Multiplying by anything > 1 will increase the output.
  * Multiplying by a value between 0 and 1 essentially "divides" the input to make the output smaller.
  * What does multiplying by 1 do?
  * What does multiplying by 0 do?
* The output typically isn't correct when we first get started.
  * The weights often start as randomized values.
  * We will help the neural network tweak the weight to more accurate outputs using training data.
* Tweaking the weight up or down to give more accurate predictions is what we mean by "learning".
* As a thought experiment, what if we want the output to be the negative of the input, so that the prediction falls as the input rises? As an example, if we have:
  * Input: The elevation.
  * Output: The probability of spotting a giant squid.
  * How should we manipulate the weight?
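
The scaling cases listed above can be checked directly. The sketch below uses a few hand-picked weights (illustrative values, not learned ones), and the helper name `scale` is a hypothetical stand-in for the single-weight `neural_network` function.

```python
def scale(input_value, weight):
    """Single-node prediction: the input scaled by one weight."""
    return input_value * weight

input_value = 10.0  # e.g. the toes value used in the cell above

# Hand-picked weights, chosen only to illustrate each case.
for weight in [2.0, 1.0, 0.5, 0.0, -1.0]:
    print(weight, scale(input_value, weight))
```

A negative weight flips the sign of the input, so the output goes down as the input goes up.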

# Making a prediction with multiple inputs

* Above, we were only using a single data point to try to make a prediction.
* Often, a single predictor isn't enough to make a good prediction, so we would like to use multiple predictors at once.

* Now, we'll perform a very similar action where we scale each data point by a weight.
  * Each data point will have its own weight associated with it.
* We scale each input, and then add all of the products together, which is the output.
  * The result of adding the products together is called the *weighted sum of the input*, or **weighted sum** for short. Also referred to as the *dot product*.


In [12]:
# Code Together
import numpy as np

weights = np.array([0.1, 0.2, 0])

def w_sum(inputs, weights):
  assert(len(inputs) == len(weights))
  output = 0
  for i in range(len(inputs)):
    output += inputs[i] * weights[i]

  return output

def neural_network(inputs, weights):
  prediction = w_sum(inputs, weights)
  return prediction

toes = np.array([8.5, 9.5, 10, 9])
wlrec = np.array([0.65, .8, .8, .9])
nFans = np.array([1.2, 1.3, 0.5, 1.0]) # scaled from hundreds of thousands

team_num = 2
inputs = np.array([toes[team_num], wlrec[team_num], nFans[team_num]])
pred = neural_network(inputs, weights)
print(pred)

1.1600000000000001


* We're entering the territory of linear algebra, specifically vectors. Don't let yourself be intimidated: a vector is just a list of numbers.
  * Vectors are great for performing mathematics on groups of numbers.
  * Most vector operations require the vectors to be of the same length, though not all do.
* The weighted sum of two vectors is where we multiply the elements in the same position (position 0 with 0, position 1 with 1, and so on) and then add these products together.
* Anytime you perform math operations on two vectors of equal length and pair up the values by position, the operation is called an *elementwise* operation.
  * Elementwise addition sums two vectors.
  * Elementwise multiplication multiplies two vectors.

In [26]:
# Challenge Exercises - Test Your Might!!

vec_a = np.array([1, 2, 3])
vec_b = np.array([4, 5, 6])

def elementwise_multiplication(vec_a, vec_b):
  assert(len(vec_a) == len(vec_b))
  output = [0] * len(vec_a)  # one slot per element, so any vector length works
  for i in range(len(vec_a)):
    output[i] = vec_a[i] * vec_b[i]
  return np.array(output)

def elementwise_addition(vec_a, vec_b):
  assert(len(vec_a) == len(vec_b))
  output = [0] * len(vec_a)
  for i in range(len(vec_a)):
    output[i] = vec_a[i] + vec_b[i]  # add (not multiply) the paired elements
  return np.array(output)

def vector_sum(vec_a):
  output = 0
  for i in range(len(vec_a)):
    output += vec_a[i]
  return output

def vector_average(vec_a):
  output = 0
  for i in range(len(vec_a)):
    output += vec_a[i]
  return output / len(vec_a)

def dot_product(vec_a, vec_b):
  # multiply the paired elements, then add up the products
  return vector_sum(elementwise_multiplication(vec_a, vec_b))

print("elementwise_multiplication")
print(elementwise_multiplication(vec_a, vec_b))
print("elementwise_addition")
print(elementwise_addition(vec_a, vec_b))
print("vector_sum")
print(vector_sum(vec_a))
print(vector_sum(vec_b))
print("vector_average")
print(vector_average(vec_a))
print(vector_average(vec_b))
print("dot_product")
print(dot_product(vec_a, vec_b))

elementwise_multiplication
[ 4 10 18]
elementwise_addition
[5 7 9]
vector_sum
6
15
vector_average
2.0
5.0
dot_product
32
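
As a cross-check, the same five operations are available as NumPy built-ins. This is only a sketch for comparison with the hand-written helpers; the variable names `product`, `total`, `a_sum`, `a_mean`, and `dot` are introduced here for illustration.

```python
import numpy as np

vec_a = np.array([1, 2, 3])
vec_b = np.array([4, 5, 6])

product = vec_a * vec_b      # elementwise multiplication
total = vec_a + vec_b        # elementwise addition
a_sum = np.sum(vec_a)        # sum of all elements
a_mean = np.mean(vec_a)      # average of all elements
dot = np.dot(vec_a, vec_b)   # weighted sum / dot product

print(product, total, a_sum, a_mean, dot)
```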


Use the following vectors as inputs and test your functions.

Make sure you get the same results as the weighted sum examples.

* So, what exactly does a weighted sum tell us? Let's look at some of the interesting properties of the weighted sum of two vectors.

* First, let's observe what particular values in the vectors do to the final result.
  * Values of zero completely nullify any impact of the paired value on the result.
  * Values greater than zero but less than one (decimals) lessen the impact of the paired value on the result.
  * Values of 1 leave the impact of the paired value unchanged.
  * Values greater than 1 increase the impact of the paired value.
  * Negative values reverse the impact.
  
* The author likes to think of the weighted sum as a *notion of similarity* between the two vectors. Let's review the examples above:
  * The largest weighted sum is w_sum(c,c), while the lowest are w_sum(a,b) and w_sum(c,e). Given the notion of similarity, why does this make sense?
  
* The author goes on to explain the similarities between a dot product and a logical AND operator.
  * For 1s and 0s, this is fairly intuitive. If the weight is a zero, then the similarity score is unaffected; otherwise, a 1 increases the score.
  * But for decimal values, we can think of it as a 'partial ANDing', where we get a reduced similarity score.
  * For values larger than one, we can think of it as 'boosted ANDing', where we get an increased similarity score.
  
* Negative values can be thought of as a logical NOT operator.
  * A positive value paired with a negative value causes the similarity score to go down.
  * A double negative will add to the score.
  
* Continuing with the logical equivalences, the author mentions that we can insert ORs between weights to create a "crude language" of sorts. For example:
  * weights = [ 1, 0, 1] => if input[0] OR input[2]
  * weights = [ 1, 0, -1] => if input[0] OR NOT input[2]
  * weights = [ 0.5, 0, 1] => if BIG input[0] OR input[2].  Note how a weight of 0.5 means the corresponding input needs to be larger to compensate.
  
* While a crude language, this helps us give some context to what the weights mean, and how they affect the output given the input.
  * This is to say, a high output score is given when the weights are similar to their corresponding input values.
  
* Given our example with num_toes, wlrec, and nfans, which parameter has the least effect on the score? Which has the most effect?
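
The similarity reading of the weighted sum can be tried out with a few small vectors. The vectors `a` through `e` below are an assumption: they are not reproduced in this notebook and were chosen only to be consistent with the ranking described above (largest for `w_sum(c,c)`, lowest for `w_sum(a,b)` and `w_sum(c,e)`).

```python
import numpy as np

# Assumed example vectors: NOT taken from this notebook, only chosen to be
# consistent with the ranking described in the text.
a = np.array([0, 1, 0, 1])
b = np.array([1, 0, 1, 0])
c = np.array([0, 1, 1, 0])
d = np.array([0.5, 0, 0.5, 0])
e = np.array([0, 1, -1, 0])

def w_sum(vec_a, vec_b):
    """Weighted sum (dot product) of two equal-length vectors."""
    assert len(vec_a) == len(vec_b)
    return sum(x * y for x, y in zip(vec_a, vec_b))

print(w_sum(a, b))  # the 1s never line up
print(w_sum(b, c))  # one position matches
print(w_sum(b, d))  # two partial (0.5) matches
print(w_sum(c, c))  # a vector compared with itself
print(w_sum(d, d))  # partial values matched with partial values
print(w_sum(c, e))  # the -1 cancels the matching 1
```

With these values, a vector scores highest against itself, while vectors whose 1s never line up, or whose match is cancelled by a negative value, score zero.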
  

In [None]:
# For the fun of it, let's simplify the neural_network function so we're using the built in numpy dot product function instead of our w_sum function
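
One possible way to fill in this cell, as a sketch: `np.dot` computes the same weighted sum as the hand-written `w_sum`. The weights and data are the ones from the multiple-input example above.

```python
import numpy as np

weights = np.array([0.1, 0.2, 0])

def neural_network(inputs, weights):
    # np.dot multiplies the paired elements and sums the products
    return np.dot(inputs, weights)

toes = np.array([8.5, 9.5, 10, 9])
wlrec = np.array([0.65, 0.8, 0.8, 0.9])
nfans = np.array([1.2, 1.3, 0.5, 1.0])

team_num = 2
inputs = np.array([toes[team_num], wlrec[team_num], nfans[team_num]])
pred = neural_network(inputs, weights)
print(pred)
```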

# Making a prediction with multiple outputs
* It's also possible to get multiple outputs from a neural network, even from a single input.

* Here we can see the wlrec is used to determine multiple outputs:
  * What percentage of the team is hurt?
  * Will the team win the next game?
  * Are the players happy or sad?

In [None]:
# Code multiple outputs together
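
A possible version of this cell, as a sketch. The three weights are illustrative assumptions (this notebook does not give them), and `ele_mul` is a hypothetical helper name. With a single input, each output is simply the input scaled by that output's own weight.

```python
import numpy as np

# Assumed weights, one per output: [hurt?, win?, sad?]
weights = np.array([0.3, 0.2, 0.9])

def ele_mul(number, vector):
    """Scale every element of `vector` by the single value `number`."""
    output = np.zeros(len(vector))
    for i in range(len(vector)):
        output[i] = number * vector[i]
    return output

def neural_network(input_value, weights):
    return ele_mul(input_value, weights)

wlrec = np.array([0.65, 0.8, 0.8, 0.9])
pred = neural_network(wlrec[0], weights)
print(pred)  # one prediction each for hurt?, win?, sad?
```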

# Predicting with multiple inputs and outputs

* We can combine the two methods to get multiple outputs with multiple inputs as well.
* Each input will have a weight associated with each output.
* The methodology is essentially the same as before, where now each output will be the dot product of its corresponding inputs and weights.

* We're going to combine multiple vectors together to make matrices. Again, don't be intimidated: a matrix is just several vectors placed together.
* And now we can simply perform three different dot products.

In [None]:
# Let's write the code using matrices to get multiple outputs with multiple inputs
import numpy as np

                   #toes %win #fans
weights = np.array([ [0.1,0.1,-0.3],   # hurt?
                     [0.1,0.2, 0.0],   # win?
                     [0.0,1.3, 0.1] ]) # sad?

toes = np.array([8.5,9.5,10,9])
wlrec = np.array([0.65,0.8,0.8,0.9])
nfans = np.array([1.2,1.3,0.5,1.0])

* As you saw, we used the matrix to transport our vectors to our neural network function, but once there, we only used one vector at a time from the matrix to get the dot products.
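
The row-at-a-time idea can be sketched as follows. The helper name `vect_mat_mul` is a hypothetical choice; the weights and data are the ones defined in the cell above, and the input is built from the first team's values.

```python
import numpy as np

# columns: toes, %win, #fans
weights = np.array([[0.1, 0.1, -0.3],   # hurt?
                    [0.1, 0.2,  0.0],   # win?
                    [0.0, 1.3,  0.1]])  # sad?

def vect_mat_mul(vect, matrix):
    """One dot product per matrix row, giving one value per output."""
    assert len(vect) == matrix.shape[1]
    output = np.zeros(matrix.shape[0])
    for i in range(matrix.shape[0]):
        output[i] = np.dot(vect, matrix[i])
    return output

def neural_network(inputs, weights):
    return vect_mat_mul(inputs, weights)

toes = np.array([8.5, 9.5, 10, 9])
wlrec = np.array([0.65, 0.8, 0.8, 0.9])
nfans = np.array([1.2, 1.3, 0.5, 1.0])

inputs = np.array([toes[0], wlrec[0], nfans[0]])
pred = neural_network(inputs, weights)
print(pred)  # one prediction each for hurt?, win?, sad?

# The same result from a single NumPy matrix-vector product.
print(np.allclose(pred, weights.dot(inputs)))
```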

# Stacked Neural Networks

* Now we'll see that we can take the output from one network and use it as input for another network. This creates a 'stacked', or **deep neural network**.
* We refer to a new set of nodes as a layer.
  * We'll need to do one vector-matrix multiplication for each layer after the input.
* I realize it's not apparent why we might want to stack layers like this to make predictions, but we'll study it later on in the book.
  * For now, think of more layers as allowing us to model more complex patterns.

In [None]:
import numpy as np

                   #toes %win #fans
ih_wgt = np.array([ [0.1, 0.2, -0.1],    # hid[0]
                    [-0.1,0.1, 0.9],     # hid[1]
                    [0.1, 0.4, 0.1] ])   # hid[2]

                # hid[0] hid[1] hid[2]
hp_wgt = np.array([ [0.3, 1.1, -0.3],    # hurt?
                    [0.1, 0.2, 0.0],     # win?
                    [0.0, 1.3, 0.1] ])   # sad?


toes = np.array([8.5,9.5,10,9])
wlrec = np.array([0.65,0.8,0.8,0.9])
nfans = np.array([1.2,1.3,0.5,1.0])
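
A sketch of how the two weight matrices above might be chained: the first vector-matrix multiplication produces the hidden layer, and the second turns the hidden values into predictions. It uses NumPy's `dot` and, as an assumption, the first team's values as input.

```python
import numpy as np

# columns: toes, %win, #fans
ih_wgt = np.array([[0.1, 0.2, -0.1],    # hid[0]
                   [-0.1, 0.1, 0.9],    # hid[1]
                   [0.1, 0.4, 0.1]])    # hid[2]

# columns: hid[0], hid[1], hid[2]
hp_wgt = np.array([[0.3, 1.1, -0.3],    # hurt?
                   [0.1, 0.2, 0.0],     # win?
                   [0.0, 1.3, 0.1]])    # sad?

weights = [ih_wgt, hp_wgt]

def neural_network(inputs, weights):
    hid = weights[0].dot(inputs)   # input layer -> hidden layer
    pred = weights[1].dot(hid)     # hidden layer -> predictions
    return pred

toes = np.array([8.5, 9.5, 10, 9])
wlrec = np.array([0.65, 0.8, 0.8, 0.9])
nfans = np.array([1.2, 1.3, 0.5, 1.0])

inputs = np.array([toes[0], wlrec[0], nfans[0]])
pred = neural_network(inputs, weights)
print(pred)  # one prediction each for hurt?, win?, sad?
```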

In [None]:
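import numpy as np

a = np.zeros((2,4))         # matrix with 2 rows and 4 columns
b = np.zeros((4,3))         # matrix with 4 rows and 3 columns
c = a.dot(b)
print(c.shape)              # (2, 3)

e = np.zeros((2,1))         # matrix with 2 rows and 1 column
f = np.zeros((1,3))         # matrix with 1 row and 3 columns
g = e.dot(f)
print(g.shape)              # (2, 3)

h = np.zeros((5,4)).T       # 5x4 matrix, transposed to 4 rows and 5 columns
i = np.zeros((5,6))         # matrix with 5 rows and 6 columns
j = h.dot(i)
print(j.shape)              # (4, 6)

h = np.zeros((5,4))         # matrix with 5 rows and 4 columns (not transposed)
i = np.zeros((5,6))         # matrix with 5 rows and 6 columns
try:
    j = h.dot(i)            # inner dimensions 4 and 5 do not match
    print(j.shape)
except ValueError as err:
    print("shape mismatch:", err)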

* What we've done so far is called *forward propagation*, where a neural network takes input data and makes a prediction.
  * We're propagating activations forward through the network.
  * **Activations** are the numbers in the network that aren't weights: the outputs from each node.
* Our goal now is to set the weights to make accurate predictions.
* *Weight Learning* is how we autotune the weights and is also a series of simple techniques combined many times across the architecture.

~ Fin