### Gradient Descent With Multiple Inputs ###
In the previous chapter, we used gradient descent to update the weight of a simple network with a single input and a single weight. We will now show how the same technique applies to networks with multiple inputs and weights.

In [8]:
def w_sum(a,b):
    assert(len(a) == len(b))
    output = 0
    for i in range(len(a)):
        output += a[i] * b[i]
    return output

def ele_mul(number, vector):
    """Multiply every element of `vector` by `number`."""
    output = [number*elem for elem in vector]
    return output

weights = [0.1, 0.2, -.1]

def neural_network(input, weights):
    pred = w_sum(input, weights)
    return pred

# These are our features
toes = [8.5, 9.5, 9.9, 9.0]
wlrec = [0.65, 0.8, 0.8, 0.9]
nfans = [1.2, 1.3, 0.5, 1.0]

# These are the observed wins/losses that the network will eventually predict
win_or_lose_binary = [1, 1, 0, 1]

# Create the first input vector:
input = [toes[0], wlrec[0], nfans[0]]
print(input)
# Get the first label
true = win_or_lose_binary[0]

pred = neural_network(input, weights)
error = (pred - true) ** 2
delta = pred - true

# We still multiply the (pred - true) * input as before. The only difference
# is that now input is a vector and as a result weight_deltas is also a vector:
weight_deltas = ele_mul(delta, input)

alpha = 0.01

for i in range(len(weights)):
    weights[i] -= alpha * weight_deltas[i]
print("Error: {}, Delta: {}".format(error, delta))
print(weight_deltas)


[8.5, 0.65, 1.2]
Error: 0.01959999999999997, Delta: -0.1399999999999999
[-1.189999999999999, -0.09099999999999994, -0.16799999999999987]


One might wonder why the above update rule works. Why is it OK to multiply a single delta by the input vector to get the weight_deltas vector? <br>
The answer becomes clear once we calculate the partial derivative of the error with respect to each weight: up to a constant factor, which the learning rate absorbs, each partial derivative is nothing more than the (SAME) delta value multiplied by the input value of that weight! <br><br>
Below are the graphs of the error plotted against each weight, with the other two weights kept constant (which is exactly what a partial derivative measures):
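
This can be checked numerically. The sketch below reuses the first observation and the starting weights from the cell above, and compares a central finite-difference estimate of each partial derivative with the analytic value. Differentiating `(pred - true) ** 2` exactly gives `2 * delta * input[i]`; the update rule in this chapter drops the constant factor 2, which only rescales `alpha`. The helper names `squared_error` and `numerical_gradient` and the step size `eps` are illustrative choices, not part of the original notebook.

```python
def w_sum(a, b):
    """Weighted sum (dot product) of two equal-length lists."""
    assert len(a) == len(b)
    return sum(x * y for x, y in zip(a, b))

def squared_error(inputs, weights, true):
    """Squared error of the single-output network for one observation."""
    return (w_sum(inputs, weights) - true) ** 2

def numerical_gradient(inputs, weights, true, eps=1e-6):
    """Central finite-difference estimate of d(error)/d(weight_i)."""
    grads = []
    for i in range(len(weights)):
        up = list(weights)
        down = list(weights)
        up[i] += eps      # nudge only weight i, keep the others constant
        down[i] -= eps
        slope = (squared_error(inputs, up, true)
                 - squared_error(inputs, down, true)) / (2 * eps)
        grads.append(slope)
    return grads

inputs = [8.5, 0.65, 1.2]     # first observation: toes, wlrec, nfans
weights = [0.1, 0.2, -0.1]    # starting weights
true = 1

delta = w_sum(inputs, weights) - true
analytic = [2 * delta * x for x in inputs]   # exact derivative, factor 2 included
numeric = numerical_gradient(inputs, weights, true)

for i, (a, n) in enumerate(zip(analytic, numeric)):
    print("weight {}: analytic {:.6f}, numeric {:.6f}".format(i, a, n))
```

The two columns should agree to several decimal places, and every analytic value is the same `delta` scaled by a different input.
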
<img src="images/7.Gradient_Descent_Multiple_weights.PNG">
Notice that the error curve for the first weight is much steeper than the other two. All three weights share the same delta, but the first weight has a much larger input value (8.5, versus 0.65 and 1.2). This is what forced us to lower alpha from 0.1 to 0.01; with the larger value the first weight overshoots and does not converge. Large input values also mean that a few inputs/weights do most of the learning while the rest are largely ignored, since they contribute little to the final error. These are some of the reasons why all inputs should be normalised before we run them through the network.
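
To see why the learning rate matters here, note that one update changes the prediction by `-alpha * delta * S`, where `S` is the sum of the squared inputs, so every step multiplies `delta` by `1 - alpha * S`. For the first observation `S` is about 74.1, which gives a factor of about 0.26 for `alpha = 0.01` (delta shrinks) and about -6.4 for `alpha = 0.1` (delta grows and flips sign on every step). The sketch below runs both settings from the same starting weights; `run_steps` is a hypothetical helper written for this illustration.

```python
def run_steps(alpha, steps=5):
    """Repeat the single-observation update and record the error before each step."""
    inputs = [8.5, 0.65, 1.2]
    weights = [0.1, 0.2, -0.1]
    true = 1
    errors = []
    for _ in range(steps):
        pred = sum(x * w for x, w in zip(inputs, weights))
        delta = pred - true
        errors.append(delta ** 2)
        # Same rule as above: weight_i -= alpha * delta * input_i
        weights = [w - alpha * delta * x for w, x in zip(weights, inputs)]
    return errors

small = run_steps(alpha=0.01)
large = run_steps(alpha=0.1)

print("alpha=0.01 errors:", small)   # shrinking sequence
print("alpha=0.1  errors:", large)   # growing sequence
```

Scaling the inputs to a similar, smaller range reduces `S`, which is one way to read the advice about normalising inputs: it leaves room for a larger `alpha` without overshooting.
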

<br>
Now, let's modify the code above to watch several steps of learning:

In [13]:
alpha = 0.01

for i in range(5):
    pred = neural_network(input, weights)
    error = (pred - true) ** 2
    delta = (pred - true)
    
    weight_deltas = ele_mul(delta, input)
    
    for j in range(len(weights)):
        weights[j] -= alpha * weight_deltas[j]
        
    print("At iteration {}:\n".format(i))
    print("Prediction: {}, Error: {}".format(pred, error))
    print("Updated Weights: {}".format(weights))
    print("------------------------\n")

At iteration 0:

Prediction: 0.9338317682295959, Error: 0.004378234895621914
Updated Weights: [0.11507515561204433, 0.19813606143916426, -0.10338046969783081]
------------------------

At iteration 1:

Prediction: 0.9828706990004368, Error: 0.00029341295273363764
Updated Weights: [0.1165311461970072, 0.19824740189566142, -0.10317491808583605]
------------------------

At iteration 2:

Prediction: 0.9955656522037379, Error: 1.9663440378214202e-05
Updated Weights: [0.11690806575968948, 0.19827622515633714, -0.10312170591228091]
------------------------

At iteration 3:

Prediction: 0.9988520582142426, Error: 1.3177703434878639e-06
Updated Weights: [0.11700564081147886, 0.19828368677794456, -0.10310793061085183]
------------------------

At iteration 4:

Prediction: 0.9997028265702121, Error: 8.831204737187771e-08
Updated Weights: [0.11703090055301084, 0.19828561840523817, -0.10310436452969438]
------------------------



### Gradient Descent Learning with Multiple Outputs ###
We will now consider the case where multiple predictions are made from a single input value.<br>
Consider the network:
<img src="images/8.Gradient_Descent_Multiple_predictions.PNG">

The main difference is that now, we have 3 errors and 3 deltas instead of 1:

In [16]:
def neural_network(input, weights):
    # Now we use the ele_mul function because we want the result
    # to be a vector instead of a single value:
    pred = ele_mul(input, weights)
    return pred

# Define the weights
weights = [0.3, 0.2, 0.9] 
alpha = 0.1

# input-vector:
wlrec = [0.9, 1.0, 1.0, 0.9]

# now we have 3 label-vectors:
hurt = [0.1, 0.0, 0.0, 0.1]
win = [ 1, 1, 0, 1]
sad = [0.1, 0.0, 0.1, 0.2]

# First observation is now a value instead of a vector:
input = wlrec[0]

# This is now a vector instead of a single value
true = [hurt[0], win[0], sad[0]]


error = [0, 0, 0]
delta = [0, 0, 0]

weight_deltas = [0, 0, 0]

for iter in range(5):
    
    pred = neural_network(input,weights)
    for i in range(len(pred)):
        error[i] = (pred[i] - true[i]) ** 2
        delta[i] = pred[i] - true[i]
        weight_deltas[i] = delta[i] * input

    for i in range(len(weights)):
        weights[i] -= alpha*weight_deltas[i]
    
    print("At iteration {}:".format(iter))
    print("Prediction: {}".format(pred))
    print("Deltas: {}".format(delta))
    print("Error: {}".format(error))

    print("Weights: {}".format(weights))
    print("---------------------\n")

At iteration 0:
Prediction: [0.27, 0.18000000000000002, 0.81]
Deltas: [0.17, -0.82, 0.7100000000000001]
Error: [0.028900000000000006, 0.6723999999999999, 0.5041000000000001]
Weights: [0.2847, 0.27380000000000004, 0.8361000000000001]
---------------------

At iteration 1:
Prediction: [0.25623, 0.24642000000000006, 0.7524900000000001]
Deltas: [0.15623, -0.7535799999999999, 0.6524900000000001]
Error: [0.024407812900000003, 0.5678828163999998, 0.42574320010000016]
Weights: [0.2706393, 0.34162220000000004, 0.7773759]
---------------------

At iteration 2:
Prediction: [0.24357537000000004, 0.30745998, 0.69963831]
Deltas: [0.14357537000000004, -0.69254002, 0.59963831]
Error: [0.02061388687063691, 0.4796116793016004, 0.3595661028196561]
Weights: [0.25771751670000004, 0.40395080180000004, 0.7234084521]
---------------------

At iteration 3:
Prediction: [0.23194576503000003, 0.36355572162000005, 0.6510676068900001]
Deltas: [0.13194576503000002, -0.63644427838, 0.5510676068900001]
Error: [0.01740
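
Because each output has its own weight, the three updates are independent: output `i` only ever uses `delta[i]` and `weights[i]`. A compact sketch of one learning step under that reading of the loop above is shown below; the function name `multi_output_step` is illustrative and does not appear in the original notebook.

```python
def multi_output_step(inp, weights, true, alpha):
    """One gradient-descent step for a single input feeding several outputs."""
    preds = [inp * w for w in weights]              # one prediction per output
    deltas = [p - t for p, t in zip(preds, true)]   # one delta per output
    # Each weight moves against its own delta, scaled by the shared input
    new_weights = [w - alpha * d * inp for w, d in zip(weights, deltas)]
    return preds, deltas, new_weights

preds, deltas, new_weights = multi_output_step(
    inp=0.9,                    # wlrec[0]
    weights=[0.3, 0.2, 0.9],
    true=[0.1, 1, 0.1],         # hurt[0], win[0], sad[0]
    alpha=0.1,
)
print("Prediction:", preds)
print("Deltas:", deltas)
print("Weights:", new_weights)
```

Up to floating-point rounding, the printed weights should match those shown for iteration 0 above, roughly `[0.2847, 0.2738, 0.8361]`.
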

### Gradient Descent with Multiple Inputs & Outputs ###
Combining the previous two examples, we can now create the following architecture:
<img src="images/9.Gradient_Descent_Multiple_Input_Output.PNG">

Now, both the input to the network and the output are vectors. This means that the weights of the network must form a matrix. The matrix is constructed so that each row holds the 3 weights that connect the 3 input values to one output node.


In [21]:
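def vect_mat_mul(vect, matrix):
    # One output per matrix row: each row holds the weights of one output node.
    # w_sum (defined earlier) asserts that the row and the vector have equal length.
    output = [0 for row in matrix]
    for i in range(len(matrix)):
        output[i] = w_sum(vect, matrix[i])

    return output
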
toes = [8.5, 9.5, 9.9, 9.0]
wlrec = [0.9, 1.0, 1.0, 0.9]
nfans = [1.2, 1.3, 0.5, 1.0]
hurt = [0.1, 0.0, 0.0, 0.1]
win = [ 1, 1, 0, 1]
sad = [0.1, 0.0, 0.1, 0.2]

weights = [  [0.1, 0.1, -0.3],#hurt?
             [0.1, 0.2, 0.0], #win?
             [0.0, 1.3, 0.1] ]#sad?

input = [toes[0],wlrec[0],nfans[0]]

print(vect_mat_mul(input, weights))
print(input)

[0.5800000000000001, 1.03, 1.29]
[8.5, 0.9, 1.2]
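
The cell above only runs the forward pass. Below is a hedged sketch of how one learning step could look when the two earlier update rules are combined: each output node gets its own delta, and the weight in row `i`, column `j` is adjusted by `delta[i] * input[j]`. The helper `outer_prod`, the label vector built from `hurt[0]`, `win[0]` and `sad[0]`, and the choice `alpha = 0.01` are assumptions made for illustration; the input is the one printed above.

```python
def w_sum(a, b):
    """Weighted sum (dot product) of two equal-length lists."""
    assert len(a) == len(b)
    return sum(x * y for x, y in zip(a, b))

def vect_mat_mul(vect, matrix):
    """One prediction per matrix row."""
    return [w_sum(vect, row) for row in matrix]

def outer_prod(deltas, inputs):
    """Matrix whose entry [i][j] is deltas[i] * inputs[j]."""
    return [[d * x for x in inputs] for d in deltas]

inputs = [8.5, 0.9, 1.2]            # input printed above
true = [0.1, 1, 0.1]                # assumed labels: hurt[0], win[0], sad[0]
weights = [[0.1, 0.1, -0.3],        # hurt?
           [0.1, 0.2, 0.0],         # win?
           [0.0, 1.3, 0.1]]         # sad?
alpha = 0.01                        # assumed learning rate

pred = vect_mat_mul(inputs, weights)
delta = [p - t for p, t in zip(pred, true)]
errors_before = [d ** 2 for d in delta]

# Every weight moves against delta of its output times its own input
weight_deltas = outer_prod(delta, inputs)
for i in range(len(weights)):
    for j in range(len(weights[i])):
        weights[i][j] -= alpha * weight_deltas[i][j]

pred_after = vect_mat_mul(inputs, weights)
errors_after = [(p - t) ** 2 for p, t in zip(pred_after, true)]

print("Errors before:", errors_before)
print("Errors after: ", errors_after)
```

With these values all three errors should drop after a single step, since each row of the matrix is updated exactly like the multiple-input network from the first section.
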
