# Minibatch Gradient Descent

We load a dataset and fit the model y = p[0]\*x1 + p[1]\*(x2 + p[2]) to it, where p[0], p[1] and p[2] are the parameters to be learned.

In [19]:
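import pandas as pd
# the file is space-delimited with no header row; the indexing below
# treats each row of the file as one variable
data = pd.read_csv("Files/data.csv", delimiter=' ', header=None).values
x1_data = data[0]  # first input
x2_data = data[1]  # second input
y_data = data[2]   # target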

In [33]:
import numpy as np
p = [np.random.normal(), np.random.normal(), np.random.normal()]

Here we set the learning rate and the number of epochs. The loop that follows trains with stochastic gradient descent, updating the parameters after every single data sample.

In [42]:
learning_rate = 0.0001
n_epochs = 500

In [41]:
# gradient of the loss with respect to each parameter
p_gradient = [0, 0, 0]

for epoch_n in range(n_epochs):
    for iteration_n, (x1, x2, y) in enumerate(zip(x1_data, x2_data, y_data)):
        # calculate output using model
        y_predicted = p[0] * x1 + p[1] * (x2 + p[2])
        # calculate squared-error loss for this sample
        loss = (y_predicted - y)**2
        # find gradients of the loss with respect to p[0], p[1], p[2]
        p_gradient[0] = 2*(y_predicted - y) * x1
        p_gradient[1] = 2*(y_predicted - y) * (x2 + p[2])
        p_gradient[2] = 2*(y_predicted - y) * p[1]
        # update parameters after every sample
        p[0] -= learning_rate * p_gradient[0]
        p[1] -= learning_rate * p_gradient[1]
        p[2] -= learning_rate * p_gradient[2]
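
The three gradient expressions in the loop can be sanity-checked against a central finite-difference approximation. The following is a minimal sketch, not part of the original notebook: the helper names (`model`, `loss_fn`, `analytic_gradient`, `numeric_gradient`) and the test point are hypothetical, chosen only for illustration.

```python
import numpy as np

def model(p, x1, x2):
    # same model as the training loop: p[0]*x1 + p[1]*(x2 + p[2])
    return p[0] * x1 + p[1] * (x2 + p[2])

def loss_fn(p, x1, x2, y):
    # squared error for a single sample
    return (model(p, x1, x2) - y) ** 2

def analytic_gradient(p, x1, x2, y):
    # the same three expressions used in the training loop
    residual = model(p, x1, x2) - y
    return np.array([
        2 * residual * x1,
        2 * residual * (x2 + p[2]),
        2 * residual * p[1],
    ])

def numeric_gradient(p, x1, x2, y, eps=1e-6):
    # central differences: perturb one parameter at a time
    grad = np.zeros(3)
    for i in range(3):
        step = np.zeros(3)
        step[i] = eps
        grad[i] = (loss_fn(p + step, x1, x2, y)
                   - loss_fn(p - step, x1, x2, y)) / (2 * eps)
    return grad

# arbitrary test point (assumed values, not taken from the data file)
p_test = np.array([0.5, -1.2, 0.3])
x1_test, x2_test, y_test = 1.5, -0.7, 2.0

grad_analytic = analytic_gradient(p_test, x1_test, x2_test, y_test)
grad_numeric = numeric_gradient(p_test, x1_test, x2_test, y_test)
print("analytic:", grad_analytic)
print("numeric: ", grad_numeric)
print("match:", np.allclose(grad_analytic, grad_numeric, rtol=1e-4))
```

If the two gradients disagree, the usual suspect is a mismatch between the model expression and its hand-derived derivatives.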

Here we set the batch size, reset the parameters, and train using minibatches.

In [43]:
batch_size = 100
p = [np.random.normal(), np.random.normal(), np.random.normal()]
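
The training loop that follows accumulates gradients one sample at a time and divides by `batch_size` before each update. As a rough sketch of the same idea in vectorized form, the mean gradient over a batch can be computed with NumPy array operations. The function `minibatch_step` and the `*_demo` arrays are hypothetical and use made-up synthetic data, not the contents of `Files/data.csv`.

```python
import numpy as np

def minibatch_step(p, x1_batch, x2_batch, y_batch, learning_rate):
    """Return the parameters after one update along the mean batch gradient."""
    p = np.asarray(p, dtype=float)
    # residuals for every sample in the batch at once
    residual = p[0] * x1_batch + p[1] * (x2_batch + p[2]) - y_batch
    # mean of the per-sample gradients, one entry per parameter
    gradient = np.array([
        np.mean(2 * residual * x1_batch),
        np.mean(2 * residual * (x2_batch + p[2])),
        np.mean(2 * residual * p[1]),
    ])
    return p - learning_rate * gradient

# synthetic stand-in data, generated only for this illustration
rng = np.random.default_rng(0)
x1_demo = rng.normal(size=200)
x2_demo = rng.normal(size=200)
y_demo = 3.0 * x1_demo + 2.0 * (x2_demo + 1.0)

p_demo = np.array([0.1, 0.2, 0.3])
demo_batch_size = 100
# one pass over the data, one update per batch
for start in range(0, len(y_demo), demo_batch_size):
    stop = start + demo_batch_size
    p_demo = minibatch_step(p_demo, x1_demo[start:stop], x2_demo[start:stop],
                            y_demo[start:stop], learning_rate=0.01)
print(p_demo)
```

Slicing the arrays also handles a final batch that is shorter than `demo_batch_size`, since `np.mean` divides by the actual number of samples in the slice.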

In [44]:
p_gradient = [0, 0, 0]
loss = 0

for epoch_n in range(n_epochs):
    print("Epoch", epoch_n+1)
    for iteration_n, (x1, x2, y) in enumerate(zip(x1_data, x2_data, y_data)):
        # calculate output using model
        y_predicted = p[0] * x1 + p[1] * (x2 + p[2])
        # accumulate the loss over the current batch
        loss += (y_predicted - y)**2
        # accumulate the gradients over the current batch
        p_gradient[0] += 2*(y_predicted - y) * x1
        p_gradient[1] += 2*(y_predicted - y) * (x2 + p[2])
        p_gradient[2] += 2*(y_predicted - y) * p[1]
        # once a full batch has been accumulated, step along the mean gradient
        if (iteration_n+1)%batch_size == 0:
            p[0] -= learning_rate * p_gradient[0] / batch_size
            p[1] -= learning_rate * p_gradient[1] / batch_size
            p[2] -= learning_rate * p_gradient[2] / batch_size
            # print the mean batch loss and the updated parameters
            print("Iteration", iteration_n+1, "- loss is", loss / batch_size, "- parameters are", p[0], p[1], p[2])
            # reset the accumulators for the next batch
            p_gradient = [0, 0, 0]
            loss = 0

Epoch 1
Iteration 100 - loss is 364224.209643 - parameters are 6.67018104965 6.7836947113 0.0607817021789
Iteration 200 - loss is 34714.7865834 - parameters are 4.70531879887 5.76452645684 -0.110133853597
Iteration 300 - loss is 9485.75743724 - parameters are 4.24900553602 6.0661056621 -0.112731877682
Iteration 400 - loss is 7845.13161321 - parameters are 4.020367644 6.55425174605 -0.0974336963759
Iteration 500 - loss is 6059.06838296 - parameters are 3.8167990201 7.02112508457 -0.0774228789883
Iteration 600 - loss is 3776.30180294 - parameters are 3.37782089262 7.08000145658 -0.094672154214
Iteration 700 - loss is 2669.20839428 - parameters are 3.37365342601 7.46276424346 -0.064500117562
Iteration 800 - loss is 1729.92500023 - parameters are 3.1578363251 7.54843710901 -0.0646481518585
Iteration 900 - loss is 1374.79146222 - parameters are 2.9919333428 7.6668263452 -0.0591136801569
Iteration 1000 - loss is 900.954103218 - parameters are 2.90862030891 7.77922067751 -0.0483208642514
Epoc