import numpy as np
import matplotlib.pyplot as plt

%matplotlib inline
plt.rcParams["figure.figsize"] = (8, 5)
plt.rcParams["axes.grid"] = True

print("NumPy version:", np.__version__)


NumPy version: 2.3.5



# Chapter 05 – Learning Multiple Weights (Generalized GD)

> Work for this chapter of *Grokking Deep Learning*.

## 1. Quick notes (5–10 bullets, in my own words)

- Gradient descent is a flexible learning algorithm
- It works for networks with multiple inputs, multiple outputs, or both, not just one input mapped to one output
- A neural network can make multiple predictions from a single input
- If inputs have very different ranges (say, the number of bathrooms vs. the square footage of a house), the larger input dominates the updates and can force a slower learning rate to prevent divergence. Normalizing the data keeps any single input from dominating the calculation.
- You can freeze the weight of a single input and still minimize error by adjusting the other weights. The frozen weight's position on its error curve doesn't change; instead, the curve itself shifts as the other weights learn.
- `delta` is a measure of how much higher or lower you want a node's value to be (which direction to shift to minimize error)
- `weight_delta` is a derivative-based estimate of the direction and amount to move a weight in order to reduce the node's `delta`, accounting for scaling, negative reversal, and stopping
- The dot product of two vectors is a rough measure of how similar they are (e.g. "Is this a 2? A 1? A 9?")
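A small illustration of the scaling point, using made-up house data: the bathroom and square-footage values, the prices, and both `alpha` values are my assumptions, not numbers from the book. With raw inputs, the square-footage column makes every `weight_delta` huge, so `alpha=0.01` diverges. Dividing each column by its maximum (one simple choice; standardizing to zero mean and unit variance is another) lets a much larger `alpha` converge.

```python
# HYPOTHETICAL data (not from the book): bathrooms, square footage,
# and price in hundreds of thousands of dollars.
bathrooms = [1.0, 2.0, 2.0, 3.0]
sqft = [800.0, 1500.0, 1700.0, 2400.0]
price = [1.5, 2.8, 3.1, 4.4]

def total_error(weights, rows, targets):
    """Sum of squared errors over the whole dataset."""
    err = 0.0
    for x, y in zip(rows, targets):
        pred = sum(w * xi for w, xi in zip(weights, x))
        err += (pred - y) * (pred - y)
    return err

def train(rows, targets, alpha, epochs):
    """Online gradient descent for one output node with several inputs."""
    weights = [0.0] * len(rows[0])
    for _ in range(epochs):
        for x, y in zip(rows, targets):
            pred = sum(w * xi for w, xi in zip(weights, x))
            delta = pred - y
            # Each weight_delta is delta scaled by its input, so a large
            # input produces a large step for its weight.
            weights = [w - alpha * delta * xi for w, xi in zip(weights, x)]
    return weights

raw = [[b, s] for b, s in zip(bathrooms, sqft)]

# Assumed normalization: divide each column by its max so inputs lie in (0, 1]
max_b, max_s = max(bathrooms), max(sqft)
scaled = [[b / max_b, s / max_s] for b, s in zip(bathrooms, sqft)]

start = total_error([0.0, 0.0], raw, price)
raw_err = total_error(train(raw, price, alpha=0.01, epochs=5), raw, price)
scaled_err = total_error(train(scaled, price, alpha=0.1, epochs=200), scaled, price)

print(f"error with all-zero weights: {start:.2f}")
print(f"raw inputs, alpha=0.01:     {raw_err:.3g}")
print(f"scaled inputs, alpha=0.1:   {scaled_err:.3g}")
```

With the raw inputs, `alpha` would have to shrink by several orders of magnitude before training stopped diverging, and at that size the bathrooms weight would barely move.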

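The freezing idea can be checked directly. This sketch uses the first training example from the data below with a single `win?` output; the starting weights `[0.1, 0.2, -0.1]` and `alpha = 0.3` are my choices for illustration. Zeroing `weight_deltas[0]` keeps the `toes` weight fixed, yet the error still falls because the other two weights do the adjusting.

```python
def w_sum(a, b):
    """Weighted sum (dot product) of two equal-length lists."""
    assert len(a) == len(b)
    return sum(x * y for x, y in zip(a, b))

toes = [8.5, 9.5, 9.9, 9.0]
wlrec = [0.65, 0.8, 0.8, 0.9]
nfans = [1.2, 1.3, 0.5, 1.0]
win = [1, 1, 0, 1]

inputs = [toes[0], wlrec[0], nfans[0]]
true = win[0]
weights = [0.1, 0.2, -0.1]   # assumed starting weights
alpha = 0.3                  # assumed learning rate

errors = []
for iteration in range(3):
    pred = w_sum(inputs, weights)
    error = (pred - true) ** 2
    errors.append(error)
    delta = pred - true                          # how much higher/lower pred should be
    weight_deltas = [delta * x for x in inputs]  # per-weight direction and amount
    weight_deltas[0] = 0                         # freeze the weight attached to toes
    weights = [w - alpha * wd for w, wd in zip(weights, weight_deltas)]
    print(f"iter {iteration}: pred={pred:.4f} error={error:.6f} "
          f"weights={[round(w, 4) for w in weights]}")
```

The first weight never changes, but each iteration's error is lower than the last.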
# 2. Code from the book
#
# Recreate the main code examples from the chapter *by typing*,
# not copy/paste. Keep them as close to the book as is reasonable.

# Gradient descent with multiple inputs and outputs
# 1 - DEFINE NETWORK
# toes, % wins, # of fans
weights = [[0.1, 0.1, -0.3], # hurt?
           [0.1, 0.2, 0.0], # win?
           [0.0, 1.3, 0.1] # sad?
]
# calculate weighted sum for input a, weight b
def w_sum(a, b):
    assert(len(a) == len(b))
    output = 0
    for i in range(len(a)):
        output += (a[i] * b[i])
    return output

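def vect_mat_mul(vect, matrix):
    # One output per row of the matrix: each row holds one output node's weights
    output = [0] * len(matrix)
    for i in range(len(matrix)):
        output[i] = w_sum(vect, matrix[i])  # w_sum asserts len(vect) == len(matrix[i])
    return output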

def neural_network(inputs, weights):
    pred = vect_mat_mul(inputs, weights)
    return pred

# 2 - MAKE A PREDICTION
toes = [8.5, 9.5, 9.9, 9.0]
wlrec = [0.65, 0.8, 0.8, 0.9]
nfans = [1.2, 1.3, 0.5, 1.0]

hurt = [0.1, 0.0, 0.0, 0.1]
win = [1, 1, 0, 1]
sad = [0.1, 0.0, 0.1, 0.2]

alpha = 0.01
inputs = [toes[0], wlrec[0], nfans[0]]
true = [hurt[0], win[0], sad[0]]

pred = neural_network(inputs, weights)
error = [0, 0, 0]
delta = [0, 0, 0]
for i in range(len(true)):
    error[i] = (pred[i] - true[i]) ** 2
    delta[i] = pred[i] - true[i]

# 3 - COMPARE RESULT
def outer_prod(vec_a, vec_b):
    out = np.zeros((len(vec_a), len(vec_b)))

    for i in range(len(vec_a)):
        for j in range(len(vec_b)):
            out[i][j] = vec_a[i] * vec_b[j]
    return out

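# weights[i] holds output node i's weights, so weight_deltas[i][j] must be
# delta[i] * inputs[j]. Calling outer_prod(inputs, delta) instead builds the
# transpose, which pairs each delta with the wrong input.
weight_deltas = outer_prod(delta, inputs)

# 4 - UPDATE WEIGHTS
for i in range(len(weights)):
    for j in range(len(weights[0])):
        weights[i][j] -= alpha * weight_deltas[i][j]

for i in range(len(weights)):
    for j in range(len(weights[0])):
        print(round(weights[i][j], 7))  # round to hide floating-point noise

0.061325
0.0970425
-0.30546
0.1017
0.20013
0.00024
-0.073525
1.2943775
0.08962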
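The nested loops in the cell above can be expressed with NumPy array operations. This is a sketch of the same single update, not the book's code: `weights @ inputs` computes all three weighted sums at once, and `np.outer` builds the full `weight_deltas` matrix. Argument order matters: each row of `weights` belongs to one output node, so entry `[i][j]` must be `delta[i] * inputs[j]`, i.e. `np.outer(delta, inputs)`. `np.outer(inputs, delta)` has the same 3×3 shape but is the transpose, so it would pair each delta with the wrong input.

```python
import numpy as np

# Rows = output nodes (hurt?, win?, sad?); columns = inputs (toes, %wins, #fans)
weights = np.array([[0.1, 0.1, -0.3],
                    [0.1, 0.2, 0.0],
                    [0.0, 1.3, 0.1]])
inputs = np.array([8.5, 0.65, 1.2])   # first game: toes, wlrec, nfans
true = np.array([0.1, 1.0, 0.1])      # first game: hurt, win, sad
alpha = 0.01

pred = weights @ inputs                  # one weighted sum per output node
delta = pred - true                      # one delta per output node
weight_deltas = np.outer(delta, inputs)  # shape (outputs, inputs), same as weights
weights -= alpha * weight_deltas         # update every weight at once

print(pred)
print(weights.round(7))
```

Because `weight_deltas` always has the same shape as `weights`, the update line stays the same no matter how many inputs or outputs the network has.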



## 4. 2–3 sentence wrap-up

- This chapter showed how gradient descent generalizes beyond one input and one output: the same `delta` and `weight_delta` logic learns every weight and minimizes error in networks with multiple inputs, multiple outputs, or both.
- The next chapter dives into practical applications, which should help clarify the methods.
