# Gradient Descent

[Gradient descent](https://en.wikipedia.org/wiki/Gradient_descent) is the fundamental algorithm used to train neural networks. The following example uses gradient descent to find the optimum weights and biases for the simple multilayer perceptron (MLP) shown below. The network contains three layers: an input layer that accepts two values, a hidden layer with three neurons, and an output layer with one neuron. The network can be trained to transform two inputs into an output — for example, to add two inputs or square the difference between two inputs. Values forwarded from the hidden layer to the output layer are transformed using the [ReLU](https://machinelearningmastery.com/rectified-linear-activation-function-for-deep-learning-neural-networks/) activation function, which turns negative numbers into zeros.

![](Images/network.png)

Let's begin by defining a class named `NeuralNetwork` that encapculates a network of this type. The network contains 13 trainable parameters: nine weights and four biases. `w0` and `w1` in the diagram correspond to `weights[0]` and `weights[1]` in the `NeuralNetwork` class, while `b0` and `b1` correspond to `biases[0]` and `biases[1]`.

In [1]:
import numpy as np

class NeuralNetwork():
    def __init__(self):
        self.weights = np.random.uniform(-1, 1, 9)
        self.biases = np.zeros(4)
        
    def show_weights_and_biases(self):
        print(f'h1: weights=[{self.weights[0]}, {self.weights[1]}], bias={self.biases[0]}')
        print(f'h2: weights=[{self.weights[2]}, {self.weights[3]}], bias={self.biases[1]}')
        print(f'h3: weights=[{self.weights[4]}, {self.weights[5]}], bias={self.biases[2]}')
        print(f'y: weights=[{self.weights[6]}, {self.weights[7]}, {self.weights[8]}], bias={self.biases[3]}')
        
    def relu(self, x):
        return max(0, x)
    
    def update_weights_and_biases(self, x1, x2, y, lr=0.01):
        prediction = self.predict(x1, x2)
        delta = prediction - y

        # Compute intermediate values for neurons in the hidden layer
        h1 = (x1 * self.weights[0]) + (x2 * self.weights[1]) + self.biases[0]
        h2 = (x1 * self.weights[2]) + (x2 * self.weights[3]) + self.biases[1]
        h3 = (x1 * self.weights[4]) + (x2 * self.weights[5]) + self.biases[2]

        # Compute deltas for 9 weights
        weight_deltas = np.empty(9)
        weight_deltas[0] = lr * (x1 * delta * self.weights[6])
        weight_deltas[1] = lr * (x2 * delta * self.weights[6])
        weight_deltas[2] = lr * (x1 * delta * self.weights[7])
        weight_deltas[3] = lr * (x2 * delta * self.weights[7])
        weight_deltas[4] = lr * (x1 * delta * self.weights[8])
        weight_deltas[5] = lr * (x2 * delta * self.weights[8])
        weight_deltas[6] = lr * delta * h1
        weight_deltas[7] = lr * delta * h2
        weight_deltas[8] = lr * delta * h3

        # Compute deltas for 4 biases
        bias_deltas = np.empty(4)
        bias_deltas[0] = lr * delta * self.weights[6]
        bias_deltas[1] = lr * delta * self.weights[7]
        bias_deltas[2] = lr * delta * self.weights[8]
        bias_deltas[3] = lr * delta

        # Update weights
        for i in range(len(self.weights)):
            self.weights[i] -= weight_deltas[i]
        
        # Update biases
        for i in range(len(self.biases)):
            self.biases[i] -= bias_deltas[i]

        # Show the results
        prediction = self.predict(x1, x2)
        error = (prediction - y) ** 2
        print(f'Prediction: ({x1}, {x2}) => {prediction}, Error: {error}')          
        return error
    
    def predict(self, x1, x2):
        h1 = (x1 * self.weights[0]) + (x2 * self.weights[1]) + self.biases[0]
        h2 = (x1 * self.weights[2]) + (x2 * self.weights[3]) + self.biases[1]
        h3 = (x1 * self.weights[4]) + (x2 * self.weights[5]) + self.biases[2]
        y = (self.relu(h1) * self.weights[6]) + (self.relu(h2) * self.weights[7]) + (self.relu(h3) * self.weights[8]) + self.biases[3]
        return y

Create an instance of `NeuralNetwork` and show the randomly initialized weights and biases. Note that biases are simply initialized to 0.

In [2]:
model = NeuralNetwork()
model.show_weights_and_biases()

h1: weights=[-0.4924165790056405, -0.38655527171258375], bias=0.0
h2: weights=[0.5751574698860336, 0.7520196646985469], bias=0.0
h3: weights=[-0.02893437180662395, 0.8895173170887947], bias=0.0
y: weights=[-0.9263179866814637, 0.39773121589585303, -0.1935542615208219], bias=0.0


Train the network for 1,000 iterations with samples that teach it how to add numbers together. The following code performs 5,000 forward passes through the network and 5,000 backpropagation passes:

In [3]:
x = np.array([[2, 2], [5, 1], [0, 4], [2, 8], [3, 0]])
y = np.array([4, 6, 4, 10, 3])

for i in range(1000):
    for j in range(len(x)):
        model.update_weights_and_biases(x[j][0], x[j][1], y[j], 0.01)

Prediction: (2, 2) => 1.1480560322538802, Error: 8.133584395163481
Prediction: (5, 1) => 2.824363721381465, Error: 10.084665774078177
Prediction: (0, 4) => 2.4748763254467208, Error: 2.326002222682897
Prediction: (2, 8) => 12.805117121435398, Error: 7.868682064970012
Prediction: (3, 0) => 2.716813885058114, Error: 0.08019437569587903
Prediction: (2, 2) => 4.439745863097399, Error: 0.19337642411127653
Prediction: (5, 1) => 5.764943830811861, Error: 0.05525140267340312
Prediction: (0, 4) => 4.841577212001709, Error: 0.7082522037605689
Prediction: (2, 8) => 8.929014936887851, Error: 1.1470090054093331
Prediction: (3, 0) => 2.341666661005539, Error: 0.43340278523159564
Prediction: (2, 2) => 3.6045289544042847, Error: 0.15639734790456836
Prediction: (5, 1) => 5.4388740505667705, Error: 0.3148623311273432
Prediction: (0, 4) => 4.2164006885032155, Error: 0.046829257984665695
Prediction: (2, 8) => 10.11132814981876, Error: 0.012393956942068357
Prediction: (3, 0) => 2.7840513999470797, Error: 0

Prediction: (0, 4) => 3.9989085673594578, Error: 1.1912252088409926e-06
Prediction: (2, 8) => 9.996982354777574, Error: 9.106182688433021e-06
Prediction: (3, 0) => 3.008201278440997, Error: 6.726096806676301e-05
Prediction: (2, 2) => 3.9957362530078937, Error: 1.817953841269551e-05
Prediction: (5, 1) => 5.998127541922414, Error: 3.5060992523177365e-06
Prediction: (0, 4) => 3.9989033714189954, Error: 1.2025942446760884e-06
Prediction: (2, 8) => 9.996977005653662, Error: 9.138494817992944e-06
Prediction: (3, 0) => 3.00819760949214, Error: 6.720080138562185e-05
Prediction: (2, 2) => 3.9957322842490757, Error: 1.82133977306876e-05
Prediction: (5, 1) => 5.998128745559968, Error: 3.501593179339888e-06
Prediction: (0, 4) => 3.9988982810499016, Error: 1.2137846450058384e-06
Prediction: (2, 8) => 9.996971765208158, Error: 9.170205954519668e-06
Prediction: (3, 0) => 3.0081940151119104, Error: 6.714188365421641e-05
Prediction: (2, 2) => 3.9957283961166854, Error: 1.8246599735948377e-05
Prediction

Prediction: (5, 1) => 5.998165105635619, Error: 3.3668373284383806e-06
Prediction: (0, 4) => 3.9987444699800117, Error: 1.576355631091944e-06
Prediction: (2, 8) => 9.996813416955158, Error: 1.0154311501675118e-05
Prediction: (3, 0) => 3.0080854284033793, Error: 6.537415246617331e-05
Prediction: (2, 2) => 3.9956109030038998, Error: 1.9264172441176143e-05
Prediction: (5, 1) => 5.998165545538867, Error: 3.365223169971806e-06
Prediction: (0, 4) => 3.998742608353019, Error: 1.581033753897365e-06
Prediction: (2, 8) => 9.996811500384382, Error: 1.0166529798795858e-05
Prediction: (3, 0) => 3.0080841146669175, Error: 6.535290994787059e-05
Prediction: (2, 2) => 3.9956094807438713, Error: 1.9276659338436523e-05
Prediction: (5, 1) => 5.998165976495984, Error: 3.3636422132837874e-06
Prediction: (0, 4) => 3.998740784553679, Error: 1.5856235402531557e-06
Prediction: (2, 8) => 9.996809622757048, Error: 1.017850695234326e-05
Prediction: (3, 0) => 3.008082827651123, Error: 6.53321028377604e-05
Predictio

Prediction: (0, 4) => 3.998653118750597, Error: 1.8140890999935903e-06
Prediction: (2, 8) => 9.996719368013864, Error: 1.0762546228461813e-05
Prediction: (3, 0) => 3.008021259665297, Error: 6.434060661812212e-05
Prediction: (2, 2) => 3.995541006389862, Error: 1.9882624015249694e-05
Prediction: (5, 1) => 5.998186618983193, Error: 3.288350712115396e-06
Prediction: (0, 4) => 3.998653112338226, Error: 1.8141063734394488e-06
Prediction: (2, 8) => 9.99671936140755, Error: 1.0762589574273645e-05
Prediction: (3, 0) => 3.008021256435577, Error: 6.434055480528626e-05
Prediction: (2, 2) => 3.9955410010330903, Error: 1.9882671786901867e-05
Prediction: (5, 1) => 5.998186620150856, Error: 3.2883464772830588e-06
Prediction: (0, 4) => 3.9986531060514046, Error: 1.814123308763016e-06
Prediction: (2, 8) => 9.996719354930486, Error: 1.0762632072129279e-05
Prediction: (3, 0) => 3.0080212532945416, Error: 6.434050441519478e-05
Prediction: (2, 2) => 3.9955409957722074, Error: 1.9882718703471845e-05
Predicti

Prediction: (3, 0) => 3.0080214034503436, Error: 6.434291331318475e-05
Prediction: (2, 2) => 3.995540572816147, Error: 1.988649080808664e-05
Prediction: (5, 1) => 5.998186606131865, Error: 3.2883973209890314e-06
Prediction: (0, 4) => 3.9986527407815937, Error: 1.8151074015807074e-06
Prediction: (2, 8) => 9.99671897743148, Error: 1.0765109095142153e-05
Prediction: (3, 0) => 3.008021404576605, Error: 6.434293138157867e-05
Prediction: (2, 2) => 3.995540572170685, Error: 1.98864965648678e-05
Prediction: (5, 1) => 5.99818660587103, Error: 3.288398266981949e-06
Prediction: (0, 4) => 3.998652740537531, Error: 1.8151080592118022e-06
Prediction: (2, 8) => 9.996718977175746, Error: 1.0765110773275102e-05
Prediction: (3, 0) => 3.0080214057029986, Error: 6.434294945209823e-05
Prediction: (2, 2) => 3.995540571525373, Error: 1.9886502320315008e-05
Prediction: (5, 1) => 5.998186605610152, Error: 3.2883992131328437e-06
Prediction: (0, 4) => 3.9986527402936565, Error: 1.8151087163368513e-06
Prediction:

Prediction: (3, 0) => 3.0080214429323955, Error: 6.434354671767745e-05
Prediction: (2, 2) => 3.9955405502963113, Error: 1.9886691659729435e-05
Prediction: (5, 1) => 5.998186596981825, Error: 3.2884305063250214e-06
Prediction: (0, 4) => 3.998652732329741, Error: 1.8151301753246657e-06
Prediction: (2, 8) => 9.996718968574218, Error: 1.0765167216970955e-05
Prediction: (3, 0) => 3.0080214440619417, Error: 6.434356483886005e-05
Prediction: (2, 2) => 3.995540549654593, Error: 1.9886697383150226e-05
Prediction: (5, 1) => 5.9981865967199015, Error: 3.288431456271967e-06
Prediction: (0, 4) => 3.998652732090416, Error: 1.8151308201944046e-06
Prediction: (2, 8) => 9.99671896832339, Error: 1.0765168862912998e-05
Prediction: (3, 0) => 3.0080214451915497, Error: 6.43435829610355e-05
Prediction: (2, 2) => 3.995540549012951, Error: 1.9886703105890583e-05
Prediction: (5, 1) => 5.998186596457958, Error: 3.288432406289917e-06
Prediction: (0, 4) => 3.9986527318511826, Error: 1.8151314648177559e-06
Predict

Show the weights and biases that were computed during training.

In [4]:
model.show_weights_and_biases()

h1: weights=[-1.1415756020629781, -0.7091113852168277], bias=0.010429828071970978
h2: weights=[0.9646399390134428, 0.8975537215954179], bias=-0.0349322410035984
h3: weights=[-0.04950824730916303, 0.8125798983398934], bias=-0.02742162840895683
y: weights=[-1.4539939520296932, 1.0404260668386875, 0.08130781140466084], bias=0.03345626326573472


Ask the model to add 4 and 4.

In [5]:
model.predict(4, 4)

7.992956304119273

This is a *very* simple network with just nine trainable parameters, and it updates its weights and biases after every training sample. Imagine how much longer training would take if the network had 100 million parameters, which isn't uncommon in deep learning. In practice, data scientists run batches of training samples through the network and update the weights and biases after each batch, a technique known as *mini-batch gradient descent*. Still, this network proves the principle that gradient descent can converge on a reasonable set of weights and biases, and it shows in a very limited way how gradient descent is enacted.