## Introduction ##

### Supervised Learning ###
Supervised learning is the direct imitation of a pattern between two datasets.
Given a dataset as input and another as output, the computer modifies its 'internal procedure' to transform the input dataset to the output dataset. <br>

Supervised learning is one of the most popular forms of machine learning. The following examples all use supervised learning:
-  Using the __pixels (input)__ of an image to detect the presence or absence of a **cat (output)**.
-  Using the __liked movies (input)__ to predict **movies you may like (output)**.
-  Using __someone's words (input)__ to predict **happiness or sadness (output)**.<br>

__In general, supervised learning transforms one dataset (what we know) into another dataset (what we want to know).__


### Unsupervised Learning ### 

Unsupervised learning also transforms one dataset into another. The main difference is that the dataset it transforms into __is not previously known or understood__. <br>
Unsupervised learning finds patterns in the data that we don't already know about.


## Chapter 3: Forward Propagation ##

The procedure followed in (supervised) machine learning is __Predict -> Compare -> Learn__. We will first look at the __"Predict"__ part. <br>

Although the first neural network that we will build shortly deals with only one datapoint at a time, this need not be the case. Neural nets can handle multiple datapoints simultaneously, and one question we should always try to answer is __"how many datapoints should I propagate at a time?"__. <br>

The answer is that enough datapoints should be passed for the network to be accurate. For example, a network won't be able to correctly classify whether a photo contains a cat if it is passed one pixel at a time. The general rule of thumb is to provide as much information as a human would need to make the same prediction.

### A simple Neural Net making a prediction ###
Our first neural net will take one input datapoint and output one prediction. Since we only have one input datapoint and one output, our network will have one weight. The network will try to predict the __"win" (output)__ of the team based on one datapoint containing the __average number of toes of the team (input)__.

In [1]:
weight = 0.1

# We first define the network
def neural_network(input, weight):
    prediction = input * weight
    return prediction

# Give it some (input) data points. This will usually be a value recorded in the real world. 
number_of_toes = [8.5, 9.5, 10, 9]

# Pass one datapoint
input = number_of_toes[0]

# Predict the win based on the input using the network
pred = neural_network(input, weight)
print(pred)

0.8500000000000001


The interface of the neural network is quite simple: it accepts an __input__ variable as _information_ and a __weight__ variable as _knowledge_. It combines the two (through multiplication) and outputs a _prediction_. All neural nets work in the same way, regardless of the number of input datapoints and the number of weights. 
Another way to think about the weight is as a measure of _sensitivity_ between the input and its prediction: the larger the weight, the more the prediction changes when the input changes.
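
One way to get a feel for the weight as a sensitivity is to hold the input fixed and vary the weight. The following is a minimal sketch: the values in `candidate_weights` are arbitrary and chosen only for illustration, they do not come from the dataset.

```python
def neural_network(input, weight):
    # Single-weight network: scale the input by the weight
    return input * weight

toes = 8.5  # the first recorded datapoint

# Hypothetical weights, picked only to show the effect of scaling
candidate_weights = [0.0, 0.1, 0.5, -0.1]

# Map each weight to the prediction it produces for the same input
preds = {w: neural_network(toes, w) for w in candidate_weights}

for w, p in preds.items():
    print(w, round(p, 3))
```

A weight of zero ignores the input entirely, a larger weight amplifies it, and a negative weight flips the sign of the prediction.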

Note that although we have managed to make a prediction, this doesn't mean that it was correct. It is through these "mistakes" and __trial & error__ that the network will learn: it sees whether it has predicted too high or too low and adjusts its weights accordingly, in order to predict more accurately the next time it sees the same input.

You might have already noticed that the __number of toes__ alone is not a very good predictor of a win. We can modify our network and give it more information:

In [11]:
# Function definitions

# Define the network
def neural_network2(input, weights):
    pred = w_sum(input, weights)
    return pred

# Define the function used to multiply the inputs by the weights
def w_sum(a, b):
    assert(len(a) == len(b))
    output = 0
    
    for i in range(len(a)):
        output += (a[i] * b[i])
    
    return output

In [12]:
weights = [0.1, 0.2, 0]

# The dataset represents the current status of the team at the beginning of each game
# for the first 4 games in a season
#
# toes := current average number of toes
# wlrec := current fraction of games won (0 to 1)
# nfans := fan count (in millions)

toes = [8.5, 9.5, 10, 9]
wlrec = [0.65, 0.8, 0.8, 0.9]
nfans = [1.2, 1.3, 0.5, 1.0]

# input corresponds to the team state at the beginning of the first game of the season
input = [toes[0], wlrec[0], nfans[0]]

pred = neural_network2(input, weights)
print(pred)

0.9800000000000001


Notice that the only thing that changed from our first neural net is that instead of a single-valued input and weight we now have vectors, and instead of multiplication we now use a dot product (a weighted sum).
Loosely stated, a dot product gives us a _notion of similarity_ between two vectors. Consider the examples:

-  a = [0, 1, 0, 1]
-  b = [1, 0, 1, 0]
-  c = [0, 1, 1, 0]
-  d = [.5, 0, .5, 0]
-  e = [0, 1, -1, 0]
<br><br>
Which give the dot products:

-  w_sum(a,b) = 0
-  w_sum(b,c) = 1
-  w_sum(b,d) = 1
-  w_sum(c,c) = 2
-  w_sum(d,d) = .5
-  w_sum(c,e) = 0
<br><br>
Notice that the highest weighted sum, _w_sum(c,c)_, is between two identical vectors. In contrast, vectors __a__ & __b__, which have no overlapping elements, have a dot product of 0. 
One can loosely equate the __"dot product"__ with a __"logical AND"__: a position contributes to the score only when both vectors are non-zero there. This is evident in _w_sum(a,b)_. 
Luckily, neural nets are also able to model partial __ANDing__ (for example _w_sum(c,d)_ = 0.5).
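
The dot products listed above can be reproduced with a few lines. This sketch uses a compact version of `w_sum` built on `zip`, which is an alternative spelling and not the definition used in the cells above:

```python
def w_sum(a, b):
    # Weighted sum (dot product) of two equal-length vectors
    assert len(a) == len(b)
    return sum(x * y for x, y in zip(a, b))

a = [0, 1, 0, 1]
b = [1, 0, 1, 0]
c = [0, 1, 1, 0]
d = [.5, 0, .5, 0]
e = [0, 1, -1, 0]

# Print each pair next to its dot product
pairs = {"a,b": (a, b), "b,c": (b, c), "b,d": (b, d),
         "c,c": (c, c), "d,d": (d, d), "c,e": (c, e), "c,d": (c, d)}
for name, (x, y) in pairs.items():
    print(name, w_sum(x, y))
```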

Following the same analogy, negative weights tend to imply a __logical NOT__ operator, since a positive input paired with a negative weight causes the score to decrease. If both the input and the weight are negative, the score increases (two negatives make a positive).

We can thus (crudely) read our weights in the following way:
-  weights = [ 1, 0, 1] => if input[0] OR input[2]
-  weights = [ 1, 0, -1] => if input[0] OR NOT input[2] 
-  weights = [ 0.5, 0, 1] => if BIG input[0] or input[2]
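
As a rough illustration of this reading, the sketch below scores three made-up binary inputs against the weights `[1, 0, -1]`. The input vectors and their names are hypothetical and serve only to show how the sign of a weight moves the score.

```python
import numpy as np

weights = np.array([1, 0, -1])  # "input[0] OR NOT input[2]"

# Hypothetical inputs, not taken from the dataset
only_first = np.array([1, 0, 0])  # input[0] present, input[2] absent
both = np.array([1, 0, 1])        # both present
only_third = np.array([0, 0, 1])  # only input[2] present

# Score each input with a dot product
scores = [int(weights.dot(x)) for x in (only_first, both, only_third)]
print(scores)
```

The score is highest when `input[0]` is present and `input[2]` is absent, and lowest in the opposite case, which matches the crude "OR NOT" reading.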

So given these intuitions, what does it mean for a neural net to make a prediction? Roughly speaking, our network gives a high score to inputs that are more _similar_ to our weights. In our weights __weights = [0.1, 0.2, 0]__ notice that _nfans_ is completely ignored, while _wlrec_ is the most sensitive predictor.
However, the most dominant force in the high score is _toes_, because its input combined with its weight is by far the highest. (Of the result __0.98__, __0.85__ comes from _toes_ (8.5 x 0.1) and __0.13__ comes from _wlrec_ (0.65 x 0.2).)
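
The share of the prediction that each input is responsible for can be checked directly. A minimal sketch, where the `names` list and the `contributions` dictionary are introduced here only for illustration:

```python
weights = [0.1, 0.2, 0]
input = [8.5, 0.65, 1.2]  # toes, wlrec, nfans for the first game
names = ["toes", "wlrec", "nfans"]

# Contribution of each input = input value * its weight
contributions = {n: x * w for n, x, w in zip(names, input, weights)}
total = sum(contributions.values())

for n, value in contributions.items():
    print(n, round(value, 3))
print("total", round(total, 3))
```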


In [14]:
import numpy as np
# Since we are going to be using vectors, we can use numpy, which contains fast vector/matrix
# operations written in C code
# Using numpy, we don't need our own w_sum method:

# redefine our vectors as numpy arrays:
weights = np.array([0.1, 0.2, 0])

toes = np.array([8.5, 9.5, 10, 9])
wlrec = np.array([0.65, 0.8, 0.8, 0.9])
nfans = np.array([1.2, 1.3, 0.5, 1.0])

def neural_network2(input, weights):
    pred = input.dot(weights)
    return pred

input = np.array([toes[0], wlrec[0], nfans[0]])

pred = neural_network2(input, weights)
print(pred)

0.98


### Predicting multiple Values ###
Instead of predicting only whether the team won or lost, we can use the same single input to also predict whether the players are happy/sad AND the percentage of the team that is hurt. Each output gets its own weight.
<img src="images/0.predicting_multiple_values.PNG">

In [17]:
weights = [0.3, 0.2, 0.9] 

def neural_network(input, weights):
    pred = ele_mul(input,weights)
    return pred

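def ele_mul(number, vector):
    # Multiply every element of the vector by the same number.
    # The output is sized from the vector instead of being hard-coded to 3.
    output = [0] * len(vector)

    for i in range(len(vector)):
        output[i] = number * vector[i]
    return output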

wlrec = [0.65, 0.8, 0.8, 0.9]
input = wlrec[0]
pred = neural_network(input, weights)
print(pred)

[0.195, 0.13, 0.5850000000000001]
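
With numpy the helper function is not needed, because multiplying a scalar by an array is already element-wise. A small sketch under that assumption, reusing the weights and input from the cell above:

```python
import numpy as np

weights = np.array([0.3, 0.2, 0.9])
wlrec = np.array([0.65, 0.8, 0.8, 0.9])

def neural_network(input, weights):
    # Scalar * vector: the scalar is multiplied with every weight
    return input * weights

pred = neural_network(wlrec[0], weights)
print(pred)
```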


### Predicting with multiple Inputs & Outputs ###
Conceptually, each input node is connected to each output node, as you can see from the image below.
For this to be achieved, the input to the network is a vector, and the weights are now a matrix. The i-th row of the weight matrix corresponds to the weights for the i-th neuron in the second (output) layer. 
<img src="images/1.multiple_inputs_outputs_net.PNG">
<br>

This is what happens when we pass the first vector as the input:
<img src="images/2.multiple_inputs_outputs_net_1.PNG">

In [20]:
weights = [ 
    [0.1, 0.1, -0.3],#hurt?
    [0.1, 0.2, 0.0], #win?
    [0.0, 1.3, 0.1] ]#sad?

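def vect_mat_mul(input, matrix):
    # One output per row of the matrix: each row holds the weights of one output node.
    # Sizing by the number of rows also allows non-square weight matrices.
    output = [0 for z in range(len(matrix))]
    
    for i in range(len(matrix)):
        # w_sum asserts that the input and the row have the same length
        output[i] = w_sum(input, matrix[i])
    
    return output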

def w_sum(a, b):
    assert(len(a) == len(b))
    output = 0
    
    for i in range(len(a)):
        output += (a[i] * b[i])
    
    return output   

def neural_network(input, weights):
    pred = vect_mat_mul(input,weights)
    return pred

toes = [8.5, 9.5, 9.9, 9.0]
wlrec = [0.65,0.8, 0.8, 0.9]
nfans = [1.2, 1.3, 0.5, 1.0]

# input corresponds to every entry
# for the first game of the season
input = [toes[0],wlrec[0],nfans[0]]

pred = neural_network(input,weights)
print(pred)

[0.555, 0.9800000000000001, 0.9650000000000001]


There are two ways to visualise the current architecture. The first is to think of it as 3 weights coming out of each input node: each column of the weight matrix holds the weights for one input node.
The second is to think of it as 3 weights going into each output node: each row of the weight matrix holds the weights for one output node (see the image above).
Using the second approach, we can think of the network as three independent dot products between the __same__ input vector and the __weights__ in the i-th row of the weight matrix (i = 1,2,3).
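
Both views give the same numbers, which can be verified with numpy. This is a sketch only: the names `by_rows` and `by_cols` are introduced here for illustration, and the weights and input are copied from the cell above.

```python
import numpy as np

weights = np.array([[0.1, 0.1, -0.3],   # hurt?
                    [0.1, 0.2,  0.0],   # win?
                    [0.0, 1.3,  0.1]])  # sad?
input = np.array([8.5, 0.65, 1.2])      # toes, wlrec, nfans for the first game

# Second view: one dot product per output node, using the rows
by_rows = np.array([weights[i].dot(input) for i in range(3)])

# First view: each input value scales its own column of outgoing weights,
# and the scaled columns are added together
by_cols = sum(input[j] * weights[:, j] for j in range(3))

print(by_rows)
print(by_cols)
```

Either way the result matches `weights.dot(input)`, the matrix-vector product.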

_Note: For those of you experienced with Linear Algebra, the more formal definition would store/process weights as column vectors
instead of row vectors. This will be rectified shortly_

### Predicting On Predictions ###
There is nothing preventing us from taking the output of one network and feeding it as the input to another network. 
Practically, this is nothing more than two consecutive vector-matrix multiplications. Below is an image of such an architecture:
<img src="images/3.predicting_on_predictions.PNG">

In [23]:
# ih : input-to-hidden weight matrix
ih_wgt = [ [0.1, 0.2, -0.1],#hid[0]
           [-0.1,0.1, 0.9], #hid[1]
           [0.1, 0.4, 0.1] ]#hid[2]

# hp : hidden-to-prediction weight matrix
hp_wgt = [ [0.3, 1.1, -0.3],#hurt?
           [0.1, 0.2, 0.0], #win?
           [0.0, 1.3, 0.1] ]#sad?

weights = [ih_wgt, hp_wgt]

toes = [8.5, 9.5, 9.9, 9.0]
wlrec = [0.65,0.8, 0.8, 0.9]
nfans = [1.2, 1.3, 0.5, 1.0]

# input corresponds to every entry
# for the first game of the season
input = [toes[0],wlrec[0],nfans[0]]

def neural_network(input, weights):
    hid = vect_mat_mul(input, weights[0])
    pred = vect_mat_mul(hid, weights[1])
    return pred

pred = neural_network(input, weights)
print(pred)

[0.21350000000000002, 0.14500000000000002, 0.5065]
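
The same stacked network can be sketched with numpy arrays. Because each row holds the weights of one output node, every layer is written here as `matrix.dot(vector)`. That orientation is an assumption that follows the row convention of this notebook.

```python
import numpy as np

# ih : input-to-hidden weight matrix (one row per hidden node)
ih_wgt = np.array([[ 0.1, 0.2, -0.1],
                   [-0.1, 0.1,  0.9],
                   [ 0.1, 0.4,  0.1]])

# hp : hidden-to-prediction weight matrix (one row per output node)
hp_wgt = np.array([[0.3, 1.1, -0.3],   # hurt?
                   [0.1, 0.2,  0.0],   # win?
                   [0.0, 1.3,  0.1]])  # sad?

input = np.array([8.5, 0.65, 1.2])     # toes, wlrec, nfans for the first game

def neural_network(input, weights):
    hid = weights[0].dot(input)   # first vector-matrix multiplication
    pred = weights[1].dot(hid)    # second one, fed with the hidden values
    return pred

pred = neural_network(input, [ih_wgt, hp_wgt])
print(pred)
```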


### Quick Intro to numpy ###
Many of the functions that we have written ourselves (dot product, vector-matrix multiplication, etc.) exist in the numpy package. There is no need to reinvent the wheel: now that we know what happens under the hood, we will use the numpy versions from here on.


In [26]:
import numpy as np
a = np.array([0,1,2,3]) # a vector of length 4
b = np.array([4,5,6,7]) # another vector of length 4
c = np.array([[0,1,2,3],[4,5,6,7]]) # a 2x4 matrix

d = np.zeros((2,3)) # a 2x3 matrix of zeros
e = np.random.rand(2,5) # a 2x5 random matrix (uniform on [0, 1))
print(a)
print(b)
print(c)
print(d)
print(e)

[0 1 2 3]
[4 5 6 7]
[[0 1 2 3]
 [4 5 6 7]]
[[ 0.  0.  0.]
 [ 0.  0.  0.]]
[[ 0.65659491  0.22999505  0.08892333  0.00313538  0.32656931]
 [ 0.90129919  0.76581604  0.87399633  0.16575859  0.06371743]]


In [27]:
# We can do element-wise multiplication between vectors:
a * b

array([ 0,  5, 12, 21])

In [28]:
# We can even do element-wise multiplication between a vector and a matrix.
# This works if the vector has as many elements as the matrix has columns.
# In this case, the vector is repeated n times, where n is the number of rows of the matrix
a * c

array([[ 0,  1,  4,  9],
       [ 0,  5, 12, 21]])

In [30]:
# The general rule of thumb is that for anything elementwise (+, -, *, /) to work, the two
# variables must either have the SAME number of columns OR one of the variables must
# have only 1 column (or be a scalar).

e * 2 # This will work because the second variable is a scalar
# e * a # This would throw an error because e has 5 columns while a has 4 elements.

# It is therefore important when reading 'numpy code' to keep in mind the operators and the
# dimensions (shapes) of the variables. All numpy objects have the convenient .shape attribute
# that returns the variable's shape.

print(a.shape)
print(b.shape)
print(c.shape)
print(d.shape)
print(e.shape)

(4,)
(4,)
(2, 4)
(2, 3)
(2, 5)
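
The shape rule can be checked without crashing the notebook by catching the error. In this sketch `e` is filled with ones as a stand-in for the random matrix above, and the `failed` flag exists only for illustration.

```python
import numpy as np

a = np.array([0, 1, 2, 3])                    # shape (4,)
c = np.array([[0, 1, 2, 3], [4, 5, 6, 7]])    # shape (2, 4)
e = np.ones((2, 5))                           # stand-in for the 2x5 random matrix

ok = a * c  # works: a has as many elements as c has columns
print(ok.shape)

try:
    e * a   # 5 columns against 4 elements: shapes are incompatible
    failed = False
except ValueError as err:
    failed = True
    print("broadcast error:", err)
```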


In [40]:
# One of the most confusing functions is the .dot() function of the numpy library.
# This is because it behaves differently depending on what its arguments are.
# If the arguments are 1-D vectors of equal length, then the dot product is returned:
dp = a.dot(b)
print(dp)

# If (one of) the arguments are matrices, then matrix multiplication is performed. Note 
# that matrix multiplication rules apply (i.e. the number of columns of the 1st matrix must
# equal the number of rows of the second)

mat = c.dot(c.T)
print(mat)

# Notice that in the second case .dot returns the same result as the np.matmul() function:
mat2 = np.matmul(c, c.T)
print(mat2)


38
[[ 14  38]
 [ 38 126]]
[[ 14  38]
 [ 38 126]]
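
Two further cases are worth trying: a matrix with a 1-D vector, and a pair of matrices whose shapes do not line up. The sketch below reuses `a` and `c` from above. The `mismatch` flag is introduced only for illustration, and `@` is Python's matrix-multiplication operator.

```python
import numpy as np

a = np.array([0, 1, 2, 3])                    # shape (4,)
c = np.array([[0, 1, 2, 3], [4, 5, 6, 7]])    # shape (2, 4)

# Matrix . vector: one dot product per row of c, result has shape (2,)
mv = c.dot(a)
print(mv)

# (2, 4) . (2, 4) breaks the rule: 4 columns against 2 rows
try:
    c.dot(c)
    mismatch = False
except ValueError:
    mismatch = True
print(mismatch)

# For 2-D arrays, .dot(), np.matmul() and the @ operator agree
same = np.array_equal(c @ c.T, c.dot(c.T))
print(same)
```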
