In [2]:
"""
The order of "pred - true" vs "true - pred" when calculating the "error" and "delta" can technically be changed, but it will affect downstream calculations, particularly the sign of the gradients during backpropagation. Here's how:

1. Error
For squared error loss (pred−true)^2, the order doesn't matter because squaring removes any negative sign. You could also use (true−pred)^2 and the outcome for the error would be the same.

2. Delta
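The "delta" is the derivative of the squared error with respect to the predicted value, up to a constant factor of 2: the derivative of (pred−true)^2 with respect to pred is 2*(pred−true). In simple terms, it's the signed difference between your predicted and true values. The order matters here: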

If you use delta=pred−true, a positive delta will indicate your model's prediction is too high, and a negative delta will indicate the prediction is too low.

If you use delta=true−pred, the meaning of positive and negative deltas flips. In this case, a positive delta will indicate that the model's prediction is too low, and a negative delta would indicate the prediction is too high.

So you could change the order, but if you do, be sure to also change how you update your weights. Currently, your weights are updated as follows:

weights[i] -= alpha * weight_deltas[i]

If you switched the order to delta=true−pred, you'd also want to change the weight update rule to:

weights[i] += alpha * weight_deltas[i]  # Notice the '+=' instead of '-='

This way, the weight updates would have the appropriate sign to correct the error.
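
To check that the two conventions really are interchangeable, here is a minimal sketch that applies one update under each and compares the results. The helper name update and its convention argument are made up for this illustration and are not part of the original code; the weights, inputs and alpha match the first cell.

```python
def update(weights, inputs, true, alpha, convention):
    # Hypothetical helper: one gradient step under either sign convention.
    pred = sum(w * x for w, x in zip(weights, inputs))
    if convention == "pred-true":
        delta = pred - true
        # delta = pred - true pairs with subtraction
        return [w - alpha * delta * x for w, x in zip(weights, inputs)]
    delta = true - pred
    # delta = true - pred pairs with addition
    return [w + alpha * delta * x for w, x in zip(weights, inputs)]

weights = [0.1, 0.2, -0.1]
inputs = [8.5, 0.65, 1.2]

a = update(weights, inputs, 1, 0.01, "pred-true")
b = update(weights, inputs, 1, 0.01, "true-pred")
print(a)
print(b)
```

Both calls should print the same updated weights, because flipping the sign of delta and flipping the sign of the update cancel out.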

Q: Why does "delta=pred−true" require "-=" in "weights[i] -= alpha * weight_deltas[i]"?

The choice of using -= or += in updating the weights depends on how you've defined your loss function and its gradient, which ultimately directs how the optimization step should be carried out.

In the case of δ=pred−true:
A positive delta means the prediction is too high and needs to be reduced.
A negative delta means the prediction is too low and needs to be increased.

To make the adjustments (recall weight_deltas[i] = delta * input[i], and the inputs here are all positive):

For a positive delta, the weight_deltas are positive, so -= decreases the weights slightly and reduces the prediction.
For a negative delta, the weight_deltas are negative, so -= actually increases the weights and raises the prediction.

In the case of δ=true−pred:
A positive delta means the prediction is too low.
A negative delta means the prediction is too high.
If you switched to this definition of delta, the weight_deltas would flip sign as well, so you would use += to move the weights in the same corrective direction as before.

In summary, the choice of -= or += in the weight update rule depends on how you've defined your error term and which direction you want to adjust your weights in. The goal is to make adjustments that minimize the error.

Q: So by convention we write 'delta = pred - true', but before executing the code, how do we know whether to use '-=' or '+=' in 'weights[i] -= alpha * weight_deltas[i]'?
If we assume pred is too big, the weight_deltas values should be positive, but what if pred is actually too small and the weight_deltas values are negative?

The sign in the weight update step weights[i] -= alpha * weight_deltas[i] is determined by how you define your loss function and how you calculate the gradients (weight deltas) with respect to that loss function. This is a part of the backpropagation algorithm.

In the example you provided, the loss is the squared error for a single example, calculated as (pred - true)**2. The gradient of this loss with respect to pred is 2*(pred - true), which we simplify to pred - true because the constant factor 2 can be absorbed into the learning rate alpha.

The gradient tells us the direction in which the function's output (the error) increases the fastest. To minimize the error, we want to go in the opposite direction, which is why we subtract the gradient from the weights: weights[i] -= alpha * weight_deltas[i].
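
One way to convince yourself that delta * input really is the gradient (up to the factor of 2) is to compare it with a finite-difference estimate. The sketch below is only an illustration: the helper error_for and the step size eps are assumptions introduced here, and the numbers are the weights and inputs from the first cell.

```python
def error_for(weights, inputs, true):
    # Squared error of the weighted-sum prediction.
    pred = sum(w * x for w, x in zip(weights, inputs))
    return (pred - true) ** 2

weights = [0.1, 0.2, -0.1]
inputs = [8.5, 0.65, 1.2]
true = 1
eps = 1e-6  # small nudge for the central difference (assumed value)

# Analytic gradient: d(error)/d(weights[i]) = 2 * (pred - true) * inputs[i]
pred = sum(w * x for w, x in zip(weights, inputs))
analytic = [2 * (pred - true) * x for x in inputs]

# Numerical gradient: nudge one weight at a time and measure the change in error.
numeric = []
for i in range(len(weights)):
    up = list(weights)
    down = list(weights)
    up[i] += eps
    down[i] -= eps
    numeric.append((error_for(up, inputs, true) - error_for(down, inputs, true)) / (2 * eps))

print("analytic:", analytic)
print("numeric: ", numeric)
```

The two lists should agree closely, and each entry is exactly twice the corresponding weight_deltas value used in the code cells.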

So, even if pred is too small (making delta = pred - true negative), the weight update rule remains the same. With positive inputs, the negative delta produces negative weight_deltas, and subtracting a negative number moves each weight in the positive direction, which is what you want when pred is smaller than true.

In summary, the convention takes care of these details automatically, and you don't have to manually check the signs or change the update rule.
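
A small sketch of that claim, using a single weight and made-up numbers (the function step, the starting weights and the inputs are all assumptions chosen for illustration). The same -= rule is applied when the prediction is too low, too high, and too low with a negative input:

```python
def step(weight, x, true, alpha=0.1):
    # One update with the convention used here: delta = pred - true, then subtract.
    pred = weight * x
    delta = pred - true
    weight_delta = delta * x
    return weight - alpha * weight_delta

true = 1.0
# (weight, input) pairs: prediction too low, too high, and too low with a negative input
cases = [(0.1, 2.0), (0.9, 2.0), (-0.1, -2.0)]

results = []
for w, x in cases:
    new_w = step(w, x, true)
    results.append((w * x, new_w * x))
    print("pred before:", w * x, "pred after:", new_w * x)
```

In every case the prediction after the update is closer to true than before, without any change to the update rule.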
"""

weights = [0.1, 0.2, -.1] 

toes =  [8.5, 9.5, 9.9, 9.0]
wlrec = [0.65, 0.8, 0.8, 0.9]
nfans = [1.2, 1.3, 0.5, 1.0]

win_or_lose_binary = [1, 1, 0, 1]

true = win_or_lose_binary[0]

alpha = 0.01

input = [toes[0],wlrec[0],nfans[0]]

def w_sum(a,b):
    assert(len(a) == len(b))
    output = 0

    for i in range(len(a)):
        output += (a[i] * b[i])

    return output

def neural_network(input,weights):
    pred = w_sum(input,weights)
    return pred

# Input corresponds to every entry
# for the first game of the season.

pred = neural_network(input,weights)
error = (pred - true) ** 2
delta = pred - true

def ele_mul(number,vector):
    output = [0,0,0]

    assert(len(output) == len(vector))

    for i in range(len(vector)):
        output[i] = number * vector[i]

    return output

weight_deltas = ele_mul(delta,input)

for i in range(len(weights)):
    weights[i] -= alpha * weight_deltas[i]
    
print("Weights:" + str(weights))
print("Weight Deltas:" + str(weight_deltas))

Weights:[0.1119, 0.20091, -0.09832]
Weight Deltas:[-1.189999999999999, -0.09099999999999994, -0.16799999999999987]


In [12]:
# Let's watch several steps of learning
def neural_network(input, weights):
  out = 0
  for i in range(len(input)):
    out += (input[i] * weights[i])
  return out

def ele_mul(scalar, vector):
  out = [0,0,0]
  for i in range(len(out)):
    out[i] = vector[i] * scalar
  return out

toes =  [8.5, 9.5, 9.9, 9.0]
wlrec = [0.65, 0.8, 0.8, 0.9]
nfans = [1.2, 1.3, 0.5, 1.0]

win_or_lose_binary = [1, 1, 0, 1]
true = win_or_lose_binary[0]

alpha = 0.01
weights = [0.1, 0.2, -.1]
input = [toes[0],wlrec[0],nfans[0]]

for iter in range (3):
    pred = neural_network(input,weights)
    error = (pred - true)**2
    delta = pred - true

    weight_deltas = ele_mul(delta,input)

    print("Iteration: " + str(iter+1))
    print("Pred: " + str(pred))
    print("Error: " + str(error))
    print("Delta: " + str(delta))
    print("Weights:" + str(weights))
    print("Weights deltas: " + str(weight_deltas))
    print()

    for i in range(len(weights)):
        weights[i] -= alpha * weight_deltas[i]

Iteration: 1
Pred: 0.8600000000000001
Error: 0.01959999999999997
Delta: -0.1399999999999999
Weights:[0.1, 0.2, -0.1]
Weights deltas: [-1.189999999999999, -0.09099999999999994, -0.16799999999999987]

Iteration: 2
Pred: 0.9637574999999999
Error: 0.0013135188062500048
Delta: -0.036242500000000066
Weights:[0.1119, 0.20091, -0.09832]
Weights deltas: [-0.30806125000000056, -0.023557625000000044, -0.04349100000000008]

Iteration: 3
Pred: 0.9906177228125002
Error: 8.802712522307997e-05
Delta: -0.009382277187499843
Weights:[0.11498061250000001, 0.20114557625, -0.09788509000000001]
Weights deltas: [-0.07974935609374867, -0.006098480171874899, -0.011258732624999811]



In [14]:
#Freezing One Weight - What Does It Do?
def neural_network(input, weights):
  out = 0
  for i in range(len(input)):
    out += (input[i] * weights[i])
  return out

def ele_mul(scalar, vector):
  out = [0,0,0]
  for i in range(len(out)):
    out[i] = vector[i] * scalar
  return out

toes =  [8.5, 9.5, 9.9, 9.0]
wlrec = [0.65, 0.8, 0.8, 0.9]
nfans = [1.2, 1.3, 0.5, 1.0]

win_or_lose_binary = [1, 1, 0, 1]
true = win_or_lose_binary[0]

alpha = 0.3
weights = [0.1, 0.2, -.1]
input = [toes[0],wlrec[0],nfans[0]]

for iter in range (3):
    pred = neural_network(input,weights)
    error = (pred - true)**2
    delta = pred - true

    weight_deltas = ele_mul(delta,input)
    weight_deltas[0] = 0  # freeze weights[0] (toes): its update is always zero
    
    print("Iteration: " + str(iter+1))
    print("Pred: " + str(pred))
    print("Error: " + str(error))
    print("Delta: " + str(delta))
    print("Weights:" + str(weights))
    print("Weights deltas: " + str(weight_deltas))
    print()

    for i in range(len(weights)):
        weights[i] -= alpha * weight_deltas[i]

Iteration: 1
Pred: 0.8600000000000001
Error: 0.01959999999999997
Delta: -0.1399999999999999
Weights:[0.1, 0.2, -0.1]
Weights deltas: [0, -0.09099999999999994, -0.16799999999999987]

Iteration: 2
Pred: 0.9382250000000001
Error: 0.003816150624999989
Delta: -0.06177499999999991
Weights:[0.1, 0.2273, -0.04960000000000005]
Weights deltas: [0, -0.040153749999999946, -0.07412999999999989]

Iteration: 3
Pred: 0.97274178125
Error: 0.000743010489422852
Delta: -0.027258218750000007
Weights:[0.1, 0.239346125, -0.02736100000000008]
Weights deltas: [0, -0.017717842187500006, -0.032709862500000006]



In [17]:
# Gradient Descent Learning with Multiple Outputs
# Instead of predicting just 
# whether the team won or lost, 
# now we're also predicting whether
# they are happy/sad AND the
# percentage of the team that is
# hurt. We are making this
# prediction using only
# the current win/loss record.

weights = [0.3, 0.2, 0.9] 

wlrec = [0.65, 1.0, 1.0, 0.9] # win/loss record

hurt  = [0.1, 0.0, 0.0, 0.1]
win   = [  1,   1,   0,   1]
sad   = [0.1, 0.0, 0.1, 0.2]

true = [hurt[0],win[0],sad[0]]
input = wlrec[0]

alpha = 0.1

error = [0, 0, 0] 
delta = [0, 0, 0]

def ele_mul(scalar,vector):
    output = [0,0,0]
    for i in range(len(vector)):
        output[i] = scalar * vector[i]
    return output

def neural_network(input,weights):
    prediction = ele_mul(input,weights)
    return prediction

pred = neural_network(input,weights)

for i in range(len(true)): 
    error[i] = (pred[i] - true[i])**2
    delta[i] = pred[i] - true[i]

def scalar_ele_mul(scalar,vector):
    output = [0,0,0]
    assert(len(output) == len(vector))
    for i in range(len(vector)):
        output[i] = scalar * vector[i]
    return output

weight_deltas = scalar_ele_mul(input,delta)

for i in range(len(weights)):
    weights[i] -= alpha * weight_deltas[i]

print("Weights: ", str(weights))
print("Weight_deltas: ", str(weight_deltas))

    


Weights:  [0.293825, 0.25655, 0.868475]
Weight_deltas:  [0.061750000000000006, -0.5655, 0.3152500000000001]


In [24]:
#Gradient Descent with Multiple Inputs & Outputs
            #toes %win #fans
weights = [ [0.1, 0.1, -0.3],#hurt?
            [0.1, 0.2, 0.0], #win?
            [0.0, 1.3, 0.1] ]#sad?

toes  = [8.5, 9.5, 9.9, 9.0]
wlrec = [0.65,0.8, 0.8, 0.9]
nfans = [1.2, 1.3, 0.5, 1.0]

hurt  = [0.1, 0.0, 0.0, 0.1]
win   = [  1,   1,   0,   1]
sad   = [0.1, 0.0, 0.1, 0.2]

alpha = 0.01
error = [0, 0, 0] 
delta = [0, 0, 0]

input = [toes[0],wlrec[0],nfans[0]]
true = [hurt[0],win[0],sad[0]]

def w_sum(vectora,vectorb):
    assert(len(vectora) == len(vectorb))
    output = 0
    for i in range(len(vectora)):
        output += (vectora[i] * vectorb[i])
    return output

def vect_mul_matrix(vector,matrix):
    output = [0,0,0]
    assert(len(vector) == len(matrix))
    for i in range(len(matrix)):
        output[i] = w_sum(vector,matrix[i])
    return output

def neural_network(input,weights):
    pred = vect_mul_matrix(input,weights)
    return pred

pred = neural_network(input,weights)

print("Pred: " + str(pred))


Pred: [0.555, 0.9800000000000001, 0.9650000000000001]


In [33]:
import numpy as np
# NumPy version of gradient descent with multiple inputs & outputs
            #toes %win #fans
weights = [ [0.1, 0.1, -0.3],#hurt?
            [0.1, 0.2, 0.0], #win?
            [0.0, 1.3, 0.1] ]#sad?

toes  = [8.5, 9.5, 9.9, 9.0]
wlrec = [0.65,0.8, 0.8, 0.9]
nfans = [1.2, 1.3, 0.5, 1.0]

hurt  = [0.1, 0.0, 0.0, 0.1]
win   = [  1,   1,   0,   1]
sad   = [0.1, 0.0, 0.1, 0.2]

alpha = 0.01
error = [0, 0, 0] 
delta = [0, 0, 0]

input = [toes[0],wlrec[0],nfans[0]]
true = [hurt[0],win[0],sad[0]]
"""
def w_sum(vectora,vectorb):
    output = 0
    assert(len(vectora) == len(vectorb))
    for i in range(len(vectora)):
        output += (vectora[i] * vectorb[i])
    return output

def vect_mul_matrix(vector,matrix):
    output = [0,0,0]
    assert(len(vector) == len(matrix))
    for i in range(len(matrix)):
        output[i] = w_sum(vector,matrix[i]) 
    return output
"""

def neural_network(input,weights):
    # Equivalent to np.array(weights).dot(input), or to the commented-out vect_mul_matrix(input, weights) above.
    pred = np.dot(input, np.array(weights).T)
    return pred

pred = neural_network(input,weights)

for i in range(len(true)):
    error[i] = (pred[i] - true[i])**2
    delta[i] = pred[i] - true[i] 

def outer_prod(a, b):
    # Outer product: out[i][j] = a[i] * b[j] for every pair (i, j).
    # Start from a len(a) x len(b) matrix of zeros.
    out = np.zeros((len(a), len(b)))

    for i in range(len(a)):
        for j in range(len(b)):
            out[i][j] = a[i] * b[j]
    return out

weight_deltas = outer_prod(delta,input)

for i in range(len(weights)):
    for j in range(len(weights[0])):
        weights[i][j] -= alpha * weight_deltas[i][j]

print("Weights: " + str(weights))
print("Weight_deltas: " + str(weight_deltas))


Weights: [[0.061325, 0.0970425, -0.30546], [0.1017, 0.20013, 0.00023999999999999887], [-0.07352500000000001, 1.2943775, 0.08962]]
Weight_deltas: [[ 3.8675   0.29575  0.546  ]
 [-0.17    -0.013   -0.024  ]
 [ 7.3525   0.56225  1.038  ]]
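
The explicit loops in outer_prod and in the element-wise weight update can also be written as array operations. The following is a minimal sketch rather than the notebook's own method: it assumes NumPy's np.outer and in-place array subtraction, and it uses the name inputs instead of input (an assumption of this sketch, to avoid shadowing the built-in).

```python
import numpy as np

# Rows are outputs (hurt, win, sad); columns are inputs (toes, %win, #fans).
weights = np.array([[0.1, 0.1, -0.3],
                    [0.1, 0.2, 0.0],
                    [0.0, 1.3, 0.1]])
inputs = np.array([8.5, 0.65, 1.2])
true = np.array([0.1, 1, 0.1])
alpha = 0.01

pred = weights.dot(inputs)               # one prediction per output row
delta = pred - true                      # same sign convention as above
weight_deltas = np.outer(delta, inputs)  # weight_deltas[i][j] = delta[i] * inputs[j]
weights -= alpha * weight_deltas         # whole-matrix update in one statement

print("Weights:", weights)
print("Weight_deltas:", weight_deltas)
```

Up to floating-point rounding, this should reproduce the weights and weight deltas printed by the loop-based cell.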
