# Gradient Descent

[Gradient descent](https://en.wikipedia.org/wiki/Gradient_descent) is the fundamental algorithm used to train neural networks. The following example uses gradient descent to find the optimum weights and biases for the simple multilayer perceptron (MLP) shown below. The network contains three layers: an input layer that accepts two values, a hidden layer with three neurons, and an output layer with one neuron. The network can be trained to transform two inputs into an output — for example, to add two inputs or square the difference. Values forwarded from the hidden layer to the output layer are transformed using the [ReLU](https://machinelearningmastery.com/rectified-linear-activation-function-for-deep-learning-neural-networks/) activation function, which turns negative numbers into zeros.

![](Images/network.png)

Let's begin by defining a class named `NeuralNetwork` that encapculates a network of this type. The network contains 13 trainable parameters: nine weights and four biases. `w0` and `w1` in the diagram correspond to `weights[0]` and `weights[1]` in the `NeuralNetwork` class, while `b0` and `b1` correspond to `biases[0]` and `biases[1]`.

In [1]:
import numpy as np

class NeuralNetwork():
    def __init__(self):
        self.weights = np.random.uniform(-1, 1, 9)
        self.biases = np.zeros(4)
        
    def show_weights_and_biases(self):
        print(f'h1: weights=[{self.weights[0]}, {self.weights[1]}], bias={self.biases[0]}')
        print(f'h2: weights=[{self.weights[2]}, {self.weights[3]}], bias={self.biases[1]}')
        print(f'h3: weights=[{self.weights[4]}, {self.weights[5]}], bias={self.biases[2]}')
        print(f'y: weights=[{self.weights[6]}, {self.weights[7]}, {self.weights[8]}], bias={self.biases[3]}')
        
    def relu(self, x):
        return max(0, x)
    
    def update_weights_and_biases(self, x1, x2, y, lr=0.01):
        prediction = self.predict(x1, x2)
        delta = prediction - y

        # Compute intermediate values for neurons in the hidden layer
        h1 = (x1 * self.weights[0]) + (x2 * self.weights[1]) + self.biases[0]
        h2 = (x1 * self.weights[2]) + (x2 * self.weights[3]) + self.biases[1]
        h3 = (x1 * self.weights[4]) + (x2 * self.weights[5]) + self.biases[2]

        # Compute deltas for 9 weights
        weight_deltas = np.empty(9)
        weight_deltas[0] = lr * (x1 * delta * self.weights[6])
        weight_deltas[1] = lr * (x2 * delta * self.weights[6])
        weight_deltas[2] = lr * (x1 * delta * self.weights[7])
        weight_deltas[3] = lr * (x2 * delta * self.weights[7])
        weight_deltas[4] = lr * (x1 * delta * self.weights[8])
        weight_deltas[5] = lr * (x2 * delta * self.weights[8])
        weight_deltas[6] = lr * delta * h1
        weight_deltas[7] = lr * delta * h2
        weight_deltas[8] = lr * delta * h3

        # Compute deltas for 4 biases
        bias_deltas = np.empty(4)
        bias_deltas[0] = lr * delta * self.weights[6]
        bias_deltas[1] = lr * delta * self.weights[7]
        bias_deltas[2] = lr * delta * self.weights[8]
        bias_deltas[3] = lr * delta

        # Update weights and biases
        self.weights -= weight_deltas
        self.biases -= bias_deltas

        # Show the results
        prediction = self.predict(x1, x2)
        error = (prediction - y) ** 2
        print(f'Prediction: ({x1}, {x2}) => {prediction}, Error: {error}')          
        return error
    
    def predict(self, x1, x2):
        h1 = (x1 * self.weights[0]) + (x2 * self.weights[1]) + self.biases[0]
        h2 = (x1 * self.weights[2]) + (x2 * self.weights[3]) + self.biases[1]
        h3 = (x1 * self.weights[4]) + (x2 * self.weights[5]) + self.biases[2]
        y = (self.relu(h1) * self.weights[6]) + (self.relu(h2) * self.weights[7]) + (self.relu(h3) * self.weights[8]) + self.biases[3]
        return y

Create an instance of `NeuralNetwork` and show the randomly initialized weights and biases. Note that biases are simply initialized to 0.

In [2]:
model = NeuralNetwork()
model.show_weights_and_biases()

h1: weights=[0.17416066024473253, -0.9507834130545751], bias=0.0
h2: weights=[-0.4912708360232585, -0.8934518508895994], bias=0.0
h3: weights=[0.5344938530405638, 0.7728034976279232], bias=0.0
y: weights=[-0.31760013737168746, 0.7920732430571527, 0.591660283931629], bias=0.0


Train the network for 1,000 iterations with samples that teach it how to add numbers together. The following code performs 5,000 forward passes through the network and 5,000 backpropagation passes:

In [3]:
x = np.array([[2, 2], [5, 1], [0, 4], [2, 8], [3, 0]])
y = np.array([4, 6, 4, 10, 3])

for i in range(1000):
    for j in range(len(x)):
        model.update_weights_and_biases(x[j][0], x[j][1], y[j], 0.01)

Prediction: (2, 2) => 1.8248377929471187, Error: 4.731330626991162
Prediction: (5, 1) => 3.418153270513197, Error: 6.665932534561701
Prediction: (0, 4) => 2.9919361690113138, Error: 1.0161926873475866
Prediction: (2, 8) => 10.975300317764704, Error: 0.9512107098319329
Prediction: (3, 0) => 2.564998787690121, Error: 0.1892260547110646
Prediction: (2, 2) => 4.122269617391775, Error: 0.014949859337131102
Prediction: (5, 1) => 5.744376634663677, Error: 0.0653433049058672
Prediction: (0, 4) => 4.584125477123367, Error: 0.34120257302460094
Prediction: (2, 8) => 9.524442930166174, Error: 0.2261545266689345
Prediction: (3, 0) => 2.5303500105450927, Error: 0.22057111259499457
Prediction: (2, 2) => 3.81400304864871, Error: 0.034594865911974144
Prediction: (5, 1) => 5.664654095298633, Error: 0.11245687579997857
Prediction: (0, 4) => 4.2537133298759535, Error: 0.06437045375674441
Prediction: (2, 8) => 9.96303457453011, Error: 0.0013664426801700508
Prediction: (3, 0) => 2.8236507169073986, Error: 0

Prediction: (0, 4) => 4.000295489237604, Error: 8.731388953961387e-08
Prediction: (2, 8) => 10.000178817634165, Error: 3.1975746288306814e-08
Prediction: (3, 0) => 3.00020188878468, Error: 4.075908137964538e-08
Prediction: (2, 2) => 4.000201462085348, Error: 4.058697183289583e-08
Prediction: (5, 1) => 5.999934583393956, Error: 4.279332346327491e-09
Prediction: (0, 4) => 4.000289700861868, Error: 8.39265893668515e-08
Prediction: (2, 8) => 10.00017531497858, Error: 3.0735341714711826e-08
Prediction: (3, 0) => 3.0001979340076472, Error: 3.917787138329248e-08
Prediction: (2, 2) => 4.000197515664509, Error: 3.901243772655742e-08
Prediction: (5, 1) => 5.999935865052364, Error: 4.1132915083033395e-09
Prediction: (0, 4) => 4.000284025860055, Error: 8.067068918007292e-08
Prediction: (2, 8) => 10.000171880918943, Error: 2.954305029670592e-08
Prediction: (3, 0) => 3.0001940566889362, Error: 3.7657998520896376e-08
Prediction: (2, 2) => 4.000193646538417, Error: 3.749898184085332e-08
Prediction: (5

Prediction: (3, 0) => 3.0001115191983447, Error: 1.2436531599452099e-08
Prediction: (2, 2) => 4.000111283467963, Error: 1.2384010241938105e-08
Prediction: (5, 1) => 5.999963868045998, Error: 1.3055181000094616e-09
Prediction: (0, 4) => 4.0001600236995385, Error: 2.560758441398551e-08
Prediction: (2, 8) => 10.000096842514173, Error: 9.378472551384486e-09
Prediction: (3, 0) => 3.00010933451476, Error: 1.195403611781097e-08
Prediction: (2, 2) => 4.0001091034016625, Error: 1.1903552254324517e-08
Prediction: (5, 1) => 5.999964575945314, Error: 1.254863650399524e-09
Prediction: (0, 4) => 4.000156888777441, Error: 2.4614088486799234e-08
Prediction: (2, 8) => 10.000094945400475, Error: 9.014629071368565e-09
Prediction: (3, 0) => 3.0001071926261016, Error: 1.1490259090555959e-08
Prediction: (2, 2) => 4.000106966039873, Error: 1.1441733686048552e-08
Prediction: (5, 1) => 5.999965269975265, Error: 1.206174618107375e-09
Prediction: (0, 4) => 4.000153815265113, Error: 2.3659135781864387e-08
Predict

Prediction: (2, 8) => 10.00000051162234, Error: 2.617574179285958e-13
Prediction: (3, 0) => 3.000000577601402, Error: 3.3362337938308807e-13
Prediction: (2, 2) => 4.00000057638027, Error: 3.322142153392149e-13
Prediction: (5, 1) => 5.999999812876513, Error: 3.501519952619199e-14
Prediction: (0, 4) => 4.0000008288182665, Error: 6.869397188225016e-13
Prediction: (2, 8) => 10.000000501598727, Error: 2.516012829265247e-13
Prediction: (3, 0) => 3.000000566285142, Error: 3.2067886192506247e-13
Prediction: (2, 2) => 4.0000005650879356, Error: 3.1932437491054943e-13
Prediction: (5, 1) => 5.999999816542602, Error: 3.365661677736709e-14
Prediction: (0, 4) => 4.000000812580212, Error: 6.602866009441272e-13
Prediction: (2, 8) => 10.0000004917715, Error: 2.4183920817917916e-13
Prediction: (3, 0) => 3.0000005551905904, Error: 3.082365916575944e-13
Prediction: (2, 2) => 4.000000554016838, Error: 3.0693465649872963e-13
Prediction: (5, 1) => 5.999999820136868, Error: 3.235074639240029e-14
Prediction: (

Prediction: (3, 0) => 3.0000002669921138, Error: 7.128478880524297e-14
Prediction: (2, 2) => 4.000000266427653, Error: 7.098369442838297e-14
Prediction: (5, 1) => 5.999999913503531, Error: 7.481639089650118e-15
Prediction: (0, 4) => 4.0000003831153, Error: 1.4677733337993172e-13
Prediction: (2, 8) => 10.000000231860442, Error: 5.3759264544546464e-14
Prediction: (3, 0) => 3.0000002617612527, Error: 6.851895340115238e-14
Prediction: (2, 2) => 4.000000261207852, Error: 6.822954183817436e-14
Prediction: (5, 1) => 5.9999999151981545, Error: 7.191353007958835e-15
Prediction: (0, 4) => 4.000000375609376, Error: 1.410824030404526e-13
Prediction: (2, 8) => 10.00000022731787, Error: 5.1673413756177634e-14
Prediction: (3, 0) => 3.0000002566328727, Error: 6.586043136589764e-14
Prediction: (2, 2) => 4.000000256090314, Error: 6.558224907205497e-14
Prediction: (5, 1) => 5.9999999168595775, Error: 6.912329861216363e-15
Prediction: (0, 4) => 4.000000368250505, Error: 1.3560843425335646e-13
Prediction: 

Prediction: (2, 2) => 4.0000000007472, Error: 5.583072952882899e-19
Prediction: (5, 1) => 5.999999999757421, Error: 5.88447119313239e-20
Prediction: (0, 4) => 4.000000001074451, Error: 1.1544444154505365e-18
Prediction: (2, 8) => 10.000000000650251, Error: 4.228268939623414e-19
Prediction: (3, 0) => 3.000000000734111, Error: 5.389189518288816e-19
Prediction: (2, 2) => 4.00000000073256, Error: 5.366438468127748e-19
Prediction: (5, 1) => 5.999999999762171, Error: 5.6562781551234e-20
Prediction: (0, 4) => 4.000000001053399, Error: 1.109649760139541e-18
Prediction: (2, 8) => 10.000000000637522, Error: 4.0643434531821287e-19
Prediction: (3, 0) => 3.0000000007197305, Error: 5.180119889203101e-19
Prediction: (2, 2) => 4.000000000718209, Error: 5.158235984294678e-19
Prediction: (5, 1) => 5.999999999766833, Error: 5.436697266312331e-20
Prediction: (0, 4) => 4.000000001032763, Error: 1.0665998443897479e-18
Prediction: (2, 8) => 10.000000000625022, Error: 3.9065226558078517e-19
Prediction: (3, 0)

Prediction: (3, 0) => 3.000000000367286, Error: 1.348991537774384e-19
Prediction: (2, 2) => 4.000000000366509, Error: 1.3432855478736986e-19
Prediction: (5, 1) => 5.999999999881012, Error: 1.4158233315843242e-20
Prediction: (0, 4) => 4.00000000052703, Error: 2.777605947771467e-19
Prediction: (2, 8) => 10.000000000318959, Error: 1.0173489558777561e-19
Prediction: (3, 0) => 3.0000000003600906, Error: 1.2966525748215366e-19
Prediction: (2, 2) => 4.0000000003593295, Error: 1.2911765728784537e-19
Prediction: (5, 1) => 5.999999999883343, Error: 1.3608833800807623e-20
Prediction: (0, 4) => 4.000000000516705, Error: 2.6698395480826216e-19
Prediction: (2, 8) => 10.00000000031271, Error: 9.778745529539609e-20
Prediction: (3, 0) => 3.0000000003530363, Error: 1.2463460573742064e-19
Prediction: (2, 2) => 4.00000000035229, Error: 1.2410807001603982e-19
Prediction: (5, 1) => 5.9999999998856275, Error: 1.3081071378039434e-20
Prediction: (0, 4) => 4.000000000506582, Error: 2.566256587283834e-19
Predict

Show the weights and biases that were computed during training.

In [4]:
model.show_weights_and_biases()

h1: weights=[-0.02909085057622536, -1.0171748385854245], bias=0.004438621829104805
h2: weights=[-0.216675955943991, -0.7082971200437425], bias=0.03749455703589989
h3: weights=[0.9196478933134615, 0.9196478934061337], bias=-0.01845365458987433
y: weights=[-0.49967239785933687, 0.39973842120132097, 1.0873726859367807], bias=0.0200660005627163


Ask the model to add 4 and 4.

In [5]:
model.predict(4, 4)

7.999999999955937

This is a *very* simple network with just nine trainable parameters, and it updates its weights and biases after every training sample. Imagine how much longer training would take if the network had 100 million parameters, which isn't uncommon in deep learning. In practice, data scientists run batches of training samples through the network and update the weights and biases after each batch, a technique known as *mini-batch gradient descent*. Still, this network proves the principle that gradient descent can converge on a reasonable set of weights and biases, and it shows in a very limited way how gradient descent is enacted.