# Gradient Descent

[Gradient descent](https://en.wikipedia.org/wiki/Gradient_descent) is the fundamental algorithm used to train neural networks. The following example uses gradient descent to find the optimal weights and biases for the simple multilayer perceptron (MLP) shown below. The network contains three layers: an input layer that accepts two values, a hidden layer with three neurons, and an output layer with one neuron. The network can be trained to map two inputs to an output, for example to add the two inputs or to square their difference. Values forwarded from the hidden layer to the output layer pass through the [ReLU](https://machinelearningmastery.com/rectified-linear-activation-function-for-deep-learning-neural-networks/) activation function, which turns negative numbers into zeros.

![](Images/network.png)
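
Before applying gradient descent to a network, it may help to see the update rule on a single parameter. The following is a minimal sketch, not part of the network above: the objective `f(w) = (w - 3)**2`, the starting point, the learning rate, and the helper name `gradient_descent_1d` are all illustrative assumptions.

```python
def gradient_descent_1d(grad, w0, lr=0.1, steps=100):
    """Repeatedly step against the gradient, starting from w0."""
    w = w0
    for _ in range(steps):
        w -= lr * grad(w)  # move a small distance downhill
    return w

# Hypothetical objective f(w) = (w - 3)**2, whose gradient is 2 * (w - 3)
w_min = gradient_descent_1d(lambda w: 2 * (w - 3), w0=0.0)
print(w_min)  # approaches the minimum at w = 3
```

The network below applies the same idea to 13 parameters at once: each parameter is nudged in the direction that reduces the prediction error.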

Let's begin by defining a class named `NeuralNetwork` that encapsulates a network of this type. The network contains 13 trainable parameters: nine weights and four biases. `w0` and `w1` in the diagram correspond to `weights[0]` and `weights[1]` in the `NeuralNetwork` class, while `b0` and `b1` correspond to `biases[0]` and `biases[1]`.

In [1]:
import numpy as np

class NeuralNetwork():
    def __init__(self):
        self.weights = np.random.uniform(-1, 1, 9)
        self.biases = np.zeros(4)
        
    def show_weights_and_biases(self):
        print(f'h1: weights=[{self.weights[0]}, {self.weights[1]}], bias={self.biases[0]}')
        print(f'h2: weights=[{self.weights[2]}, {self.weights[3]}], bias={self.biases[1]}')
        print(f'h3: weights=[{self.weights[4]}, {self.weights[5]}], bias={self.biases[2]}')
        print(f'y: weights=[{self.weights[6]}, {self.weights[7]}, {self.weights[8]}], bias={self.biases[3]}')
        
    def relu(self, x):
        return max(0, x)
    
    def update_weights_and_biases(self, x1, x2, y, lr=0.01):
        prediction = self.predict(x1, x2)
        delta = prediction - y

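        # Compute intermediate values for neurons in the hidden layer.
        # Note: the deltas below use these pre-activation values directly and
        # do not apply the ReLU derivative (a simplification of backpropagation).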
        h1 = (x1 * self.weights[0]) + (x2 * self.weights[1]) + self.biases[0]
        h2 = (x1 * self.weights[2]) + (x2 * self.weights[3]) + self.biases[1]
        h3 = (x1 * self.weights[4]) + (x2 * self.weights[5]) + self.biases[2]

        # Compute deltas for 9 weights
        weight_deltas = np.empty(9)
        weight_deltas[0] = lr * (x1 * delta * self.weights[6])
        weight_deltas[1] = lr * (x2 * delta * self.weights[6])
        weight_deltas[2] = lr * (x1 * delta * self.weights[7])
        weight_deltas[3] = lr * (x2 * delta * self.weights[7])
        weight_deltas[4] = lr * (x1 * delta * self.weights[8])
        weight_deltas[5] = lr * (x2 * delta * self.weights[8])
        weight_deltas[6] = lr * delta * h1
        weight_deltas[7] = lr * delta * h2
        weight_deltas[8] = lr * delta * h3

        # Compute deltas for 4 biases
        bias_deltas = np.empty(4)
        bias_deltas[0] = lr * delta * self.weights[6]
        bias_deltas[1] = lr * delta * self.weights[7]
        bias_deltas[2] = lr * delta * self.weights[8]
        bias_deltas[3] = lr * delta

        # Update weights and biases
        self.weights -= weight_deltas
        self.biases -= bias_deltas

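        # Show the results. The prediction is recomputed after the update,
        # while the reported error (delta) was measured before the update.
        prediction = self.predict(x1, x2)
        print(f'Prediction: ({x1}, {x2}) => {prediction}, Error: {delta}')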
        return delta
    
    def predict(self, x1, x2):
        h1 = (x1 * self.weights[0]) + (x2 * self.weights[1]) + self.biases[0]
        h2 = (x1 * self.weights[2]) + (x2 * self.weights[3]) + self.biases[1]
        h3 = (x1 * self.weights[4]) + (x2 * self.weights[5]) + self.biases[2]
        y = (self.relu(h1) * self.weights[6]) + (self.relu(h2) * self.weights[7]) + (self.relu(h3) * self.weights[8]) + self.biases[3]
        return y
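
The update rule in `update_weights_and_biases` treats the hidden neurons as if they were linear when computing deltas. For comparison, here is a sketch of the same gradients with the ReLU derivative included. It assumes the loss is `0.5 * (prediction - y)**2`, which is consistent with the `delta = prediction - y` term above but is not stated in the source. The helpers `forward`, `loss`, and `gradients` are hypothetical names, and the finite-difference check uses arbitrary values.

```python
import numpy as np

def forward(w, b, x1, x2):
    """Same parameter layout as NeuralNetwork.predict: w[0:6] hidden, w[6:9] output."""
    h = w[0:6].reshape(3, 2) @ np.array([x1, x2]) + b[0:3]
    return np.maximum(0, h) @ w[6:9] + b[3], h

def loss(w, b, x1, x2, y):
    """Assumed loss: half the squared error."""
    return 0.5 * (forward(w, b, x1, x2)[0] - y) ** 2

def gradients(w, b, x1, x2, y):
    """Gradients of the assumed loss, with the ReLU derivative applied."""
    prediction, h = forward(w, b, x1, x2)
    delta = prediction - y
    active = (h > 0).astype(float)            # ReLU derivative: 1 if h > 0, else 0
    hidden = delta * w[6:9] * active          # error reaching each hidden neuron
    grad_w = np.empty(9)
    grad_w[0:6] = np.outer(hidden, [x1, x2]).ravel()
    grad_w[6:9] = delta * np.maximum(0, h)    # output weights see relu(h), not h
    grad_b = np.append(hidden, delta)
    return grad_w, grad_b

# Finite-difference check using arbitrary illustrative parameters and inputs
rng = np.random.default_rng(0)
w, b = rng.uniform(-1, 1, 9), rng.uniform(-1, 1, 4)
grad_w, grad_b = gradients(w, b, 2.0, 3.0, 5.0)

eps = 1e-6
numeric = np.empty(9)
for i in range(9):
    step = np.zeros(9)
    step[i] = eps
    numeric[i] = (loss(w + step, b, 2.0, 3.0, 5.0) - loss(w - step, b, 2.0, 3.0, 5.0)) / (2 * eps)

print(np.max(np.abs(grad_w - numeric)))  # should be close to zero
```

With the derivative applied, a hidden neuron whose pre-activation is negative receives no update for that sample, whereas the simplified rule above keeps adjusting it.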

Create an instance of `NeuralNetwork` and show the randomly initialized weights and biases. Weights are drawn uniformly from the range -1 to 1, while biases are simply initialized to 0.

In [2]:
model = NeuralNetwork()
model.show_weights_and_biases()

h1: weights=[0.018851734576560197, 0.13428806584862496], bias=0.0
h2: weights=[0.46465647047090464, 0.11237603922837014], bias=0.0
h3: weights=[0.5913719113023415, -0.3245029613869448], bias=0.0
y: weights=[-0.5449562864005284, -0.20006221362198873, -0.423765639120498], bias=0.0
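
Because the weights come from `np.random.uniform`, the values printed above will differ from run to run. One option for repeatable experiments is a seeded generator. The sketch below assumes NumPy's `default_rng` and an arbitrary seed of 42; neither appears in the original class.

```python
import numpy as np

rng = np.random.default_rng(42)    # arbitrary seed, chosen only for illustration
weights = rng.uniform(-1, 1, 9)    # same range and count as NeuralNetwork.__init__
biases = np.zeros(4)

# Re-creating the generator with the same seed reproduces the same weights
assert np.array_equal(weights, np.random.default_rng(42).uniform(-1, 1, 9))
print(weights)
```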


Train the network for 1,000 epochs with samples that teach it how to add numbers together. Each epoch presents all five samples, so the following code performs 5,000 weight updates, each preceded by a forward pass through the network:

In [3]:
x = np.array([[2, 2], [5, 1], [0, 4], [2, 8], [3, 0]])
y = np.array([4, 6, 4, 10, 3])

for i in range(1000):
    for j in range(len(x)):
        model.update_weights_and_biases(x[j][0], x[j][1], y[j], 0.01)

Prediction: (2, 2) => -0.295667103762581, Error: -4.62397357856886
Prediction: (5, 1) => -0.20115931232205075, Error: -7.24230160389694
Prediction: (0, 4) => 0.16894807087240454, Error: -3.938328886656018
Prediction: (2, 8) => 0.5528998550668718, Error: -9.793322332502857
Prediction: (3, 0) => 0.13332427085577242, Error: -3.0689541320117653
Prediction: (2, 2) => 0.5988876445962781, Error: -3.490887967092468
Prediction: (5, 1) => 1.3570962822327637, Error: -5.388489483040029
Prediction: (0, 4) => 0.718695560419482, Error: -3.41105111782076
Prediction: (2, 8) => 3.4707165588826037, Error: -8.591608700779728
Prediction: (3, 0) => 1.679285868051953, Error: -1.4347961940610212
Prediction: (2, 2) => 2.09192964221619, Error: -2.1108514118224866
Prediction: (5, 1) => 3.8843392884914265, Error: -2.976461677229844
Prediction: (0, 4) => 2.537190900233203, Error: -1.7552887594481068
Prediction: (2, 8) => 10.010657993324607, Error: -4.406191239913491
Prediction: (3, 0) => 3.3140382951123626, Error: ...

Prediction: (5, 1) => 5.99989128444694, Error: -0.00028356749179980767
Prediction: (0, 4) => 4.000477574302595, Error: 0.0007255186724659524
Prediction: (2, 8) => 10.000273309129406, Error: -0.0004264792210655344
Prediction: (3, 0) => 3.000324871802693, Error: 0.00040590939822271466
Prediction: (2, 2) => 4.00033225075027, Error: 0.00043379072930349594
Prediction: (5, 1) => 5.999893667319685, Error: -0.0002773534296869329
Prediction: (0, 4) => 4.0004671092529875, Error: 0.0007096207404462263
Prediction: (2, 8) => 10.000267321294869, Error: -0.00041713355689054765
Prediction: (3, 0) => 3.000317753091453, Error: 0.0003970151308432257
Prediction: (2, 2) => 4.000324970401345, Error: 0.0004242854460043688
Prediction: (5, 1) => 5.99989599796549, Error: -0.00027127550930661215
Prediction: (0, 4) => 4.000456873458871, Error: 0.0006940710677003636
Prediction: (2, 8) => 10.000261464583875, Error: -0.00040799263724444756
Prediction: (3, 0) => 3.00031079032042, Error: 0.0003883156910835517
...

Prediction: (2, 8) => 10.00010310398639, Error: -0.00016086355386235596
Prediction: (3, 0) => 3.000122542405672, Error: 0.00015311183680344342
Prediction: (2, 2) => 4.000125326338405, Error: 0.00016362797482116775
Prediction: (5, 1) => 5.999959898130458, Error: -0.00010461343954037972
Prediction: (0, 4) => 4.000176191162293, Error: 0.0002676684128948281
Prediction: (2, 8) => 10.000100844439011, Error: -0.00015733789645011598
Prediction: (3, 0) => 3.000119856688933, Error: 0.00014975616526502478
Prediction: (2, 2) => 4.000122579614728, Error: 0.0001600418150289329
Prediction: (5, 1) => 5.9999607771201, Error: -0.00010232060141923682
Prediction: (0, 4) => 4.00017232959952, Error: 0.0002618019944780059
Prediction: (2, 8) => 10.000098634401317, Error: -0.0001538895037125343
Prediction: (3, 0) => 3.0001172298272767, Error: 0.00014647402922296493
Prediction: (2, 2) => 4.000119893082721, Error: 0.00015653424210793077
Prediction: (5, 1) => 5.99996163684349, Error: -0.00010007801156763918
...

Prediction: (2, 2) => 4.000000574320733, Error: 7.498428358232445e-07
Prediction: (5, 1) => 5.999999816249124, Error: -4.79386826235384e-07
Prediction: (0, 4) => 4.000000807401944, Error: 1.2266083508905012e-06
Prediction: (2, 8) => 10.00000046215689, Error: -7.209987806788831e-07
Prediction: (3, 0) => 3.0000005492525377, Error: 6.862741166457909e-07
Prediction: (2, 2) => 4.000000561732081, Error: 7.334068801156945e-07
Prediction: (5, 1) => 5.9999998202767975, Error: -4.6887905025272403e-07
Prediction: (0, 4) => 4.0000007897043375, Error: 1.1997220754622617e-06
Prediction: (2, 8) => 10.000000452026782, Error: -7.051950632330772e-07
Prediction: (3, 0) => 3.00000053721336, Error: 6.712315356871557e-07
Prediction: (2, 2) => 4.0000005494193625, Error: 7.173311864505649e-07
Prediction: (5, 1) => 5.999999824216188, Error: -4.586015949215039e-07
Prediction: (0, 4) => 4.000000772394649, Error: 1.1734251241790616e-06
Prediction: (2, 8) => 10.000000442118719, Error: -6.897377478054523e-07
...

Prediction: (3, 0) => 3.0000001244167978, Error: 1.5545495779534235e-07
Prediction: (2, 2) => 4.000000127243668, Error: 1.6613147924005034e-07
Prediction: (5, 1) => 5.999999959289079, Error: -1.0621056922843763e-07
Prediction: (0, 4) => 4.000000178883982, Error: 2.717612872515929e-07
Prediction: (2, 8) => 10.000000102393217, Error: -1.5974091560622128e-07
Prediction: (3, 0) => 3.0000001216896806, Error: 1.5204750836161907e-07
Prediction: (2, 2) => 4.000000124454587, Error: 1.6249000811541237e-07
Prediction: (5, 1) => 5.9999999601814284, Error: -1.0388251769910539e-07
Prediction: (0, 4) => 4.000000174962985, Error: 2.658044930825554e-07
Prediction: (2, 8) => 10.000000100148842, Error: -1.5623952442922473e-07
Prediction: (3, 0) => 3.0000001190223387, Error: 1.487147476275652e-07
Prediction: (2, 2) => 4.000000121726641, Error: 1.5892835669717442e-07
Prediction: (5, 1) => 5.999999961054221, Error: -1.0160549290816334e-07
Prediction: (0, 4) => 4.000000171127937, Error: 2.599782691348196e-07

Prediction: (0, 4) => 4.000000000450555, Error: 6.844862454613576e-10
Prediction: (2, 8) => 10.000000000257899, Error: -4.0233949505363853e-10
Prediction: (3, 0) => 3.0000000003064993, Error: 3.829621064710409e-10
Prediction: (2, 2) => 4.000000000313464, Error: 4.092637340136207e-10
Prediction: (5, 1) => 5.99999999989971, Error: -2.616493688378796e-10
Prediction: (0, 4) => 4.0000000004406795, Error: 6.694822474173634e-10
Prediction: (2, 8) => 10.000000000252244, Error: -3.935198833460163e-10
Prediction: (3, 0) => 3.0000000002997815, Error: 3.74567932226455e-10
Prediction: (2, 2) => 4.000000000306594, Error: 4.0029224379622974e-10
Prediction: (5, 1) => 5.999999999901907, Error: -2.559135126034562e-10
Prediction: (0, 4) => 4.00000000043102, Error: 6.548077635670779e-10
Prediction: (2, 8) => 10.000000000246715, Error: -3.8489389453388867e-10
Prediction: (3, 0) => 3.0000000002932112, Error: 3.6635849909316676e-10
Prediction: (2, 2) => 4.000000000299872, Error: 3.915188173664319e-10
...

Prediction: (2, 8) => 10.000000000085148, Error: -1.3284129352086893e-10
Prediction: (3, 0) => 3.0000000001011964, Error: 1.2644241209613938e-10
Prediction: (2, 2) => 4.000000000103496, Error: 1.3512568841633765e-10
Prediction: (5, 1) => 5.999999999966886, Error: -8.638867399213268e-11
Prediction: (0, 4) => 4.000000000145498, Error: 2.2104273966760957e-10
Prediction: (2, 8) => 10.000000000083283, Error: -1.2992806830425252e-10
Prediction: (3, 0) => 3.000000000098978, Error: 1.2367085133746514e-10
Prediction: (2, 2) => 4.0000000001012275, Error: 1.3216361338663773e-10
Prediction: (5, 1) => 5.999999999967612, Error: -8.449596577975171e-11
Prediction: (0, 4) => 4.000000000142309, Error: 2.1619683820972568e-10
Prediction: (2, 8) => 10.000000000081458, Error: -1.2708056829069392e-10
Prediction: (3, 0) => 3.000000000096809, Error: 1.209596867113305e-10
Prediction: (2, 2) => 4.000000000099009, Error: 1.2926726355999563e-10
Prediction: (5, 1) => 5.9999999999683205, Error: -8.264144923941785e-1...

Show the weights and biases that were computed during training.

In [4]:
model.show_weights_and_biases()

h1: weights=[-1.5035511151898924, -2.2631466584055144], bias=0.3695378545032682
h2: weights=[0.8619324774428804, 0.8619324774779634], bias=-0.23936319213150303
h3: weights=[-0.516936131935046, -1.8950752318179134], bias=0.2840765529677082
y: weights=[-2.6912968186551707, 1.1601836873739109, -1.9287622876007422], bias=0.2777052711124387


Ask the model to add 4 and 4.

In [5]:
model.predict(4, 4)

7.999999999982623
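
As a cross-check, the forward pass can be reproduced outside the class by plugging in the weights and biases printed above. The sketch below rewrites `predict` with matrix operations; the names `W_hidden`, `b_hidden`, `w_out`, and `b_out` are hypothetical and do not appear in the `NeuralNetwork` class.

```python
import numpy as np

# Weights and biases copied from the output of show_weights_and_biases() above
W_hidden = np.array([[-1.5035511151898924, -2.2631466584055144],   # h1
                     [0.8619324774428804, 0.8619324774779634],     # h2
                     [-0.516936131935046, -1.8950752318179134]])   # h3
b_hidden = np.array([0.3695378545032682, -0.23936319213150303, 0.2840765529677082])
w_out = np.array([-2.6912968186551707, 1.1601836873739109, -1.9287622876007422])
b_out = 0.2777052711124387

def predict(x1, x2):
    h = W_hidden @ np.array([x1, x2]) + b_hidden   # hidden pre-activations
    return np.maximum(0, h) @ w_out + b_out        # ReLU, then the output neuron

print(predict(4, 4))  # very close to 8
```

With these particular values, `h1` and `h3` are negative for the input (4, 4), so ReLU zeroes them and only `h2` contributes to the sum.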

This is a *very* simple network with just 13 trainable parameters, and it updates its weights and biases after every training sample. Imagine how much longer training would take if the network had 100 million parameters, which isn't uncommon in deep learning. In practice, data scientists run batches of training samples through the network and update the weights and biases after each batch, a technique known as *mini-batch gradient descent*. Still, this network demonstrates that gradient descent can converge on a reasonable set of weights and biases, and it shows, in a limited way, how gradient descent works in practice.
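
To illustrate the idea, here is a minimal sketch of mini-batch gradient descent. To stay short it fits a plain linear model (`y = X @ w + b`) rather than the MLP above, using the same five addition samples. The function name `minibatch_gd`, the batch size of 2, the epoch count, and the seed are all assumptions chosen for illustration.

```python
import numpy as np

def minibatch_gd(X, y, batch_size=2, lr=0.01, epochs=5000, seed=0):
    """Fit y = X @ w + b, updating once per mini-batch instead of once per sample."""
    rng = np.random.default_rng(seed)
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        order = rng.permutation(len(X))               # reshuffle the samples every epoch
        for start in range(0, len(X), batch_size):
            idx = order[start:start + batch_size]     # the last batch may be smaller
            error = X[idx] @ w + b - y[idx]           # one forward pass for the whole batch
            w -= lr * X[idx].T @ error / len(idx)     # gradient averaged over the batch
            b -= lr * error.mean()
    return w, b

X = np.array([[2, 2], [5, 1], [0, 4], [2, 8], [3, 0]], dtype=float)
y = np.array([4, 6, 4, 10, 3], dtype=float)
w, b = minibatch_gd(X, y)
print(w, b)  # weights near [1, 1] and a bias near 0, i.e. y = x1 + x2
```

Averaging the gradient over a batch means one parameter update covers several samples, which reduces the number of updates per epoch and lets the arithmetic be vectorized.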