# Gradient Descent Without Importing Any Library

### Here, we implement basic gradient descent without using NumPy or PyTorch.

#### For simplicity, we use a single linear function, <i>y</i> = 3<i>x</i>, where <i>x</i> and <i>y</i> are vectors.

#### We create two lists: an input vector and the expected output that this function produces from it.

#### Since we are not using NumPy or PyTorch, we represent the vectors as plain Python lists.

In [1]:
x = [1,2,3,4]  # Input vector
y = [3,6,9,12]  # Expected output vector

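#### Python lists do not support element-wise multiplication, so we define the following helper function. Despite its name, dot_product returns the element-wise product of two lists (a list), not their scalar dot product: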

In [2]:
def dot_product(a, b):
    # Element-wise product: returns [a[0]*b[0], a[1]*b[1], ...], not a scalar
    ans = []
    for i, j in zip(a, b):
        ans.append(i*j)
    return ans
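
The helper above returns one product per pair of elements, which is exactly what the gradient computation needs. Summing those products gives the actual (scalar) dot product. The sketch below contrasts the two; elementwise_product and true_dot_product are illustrative names, not part of the notebook.

```python
def elementwise_product(a, b):
    # Same behaviour as the notebook's dot_product helper: one product per pair
    return [i * j for i, j in zip(a, b)]

def true_dot_product(a, b):
    # A real dot product collapses the element-wise products into one scalar
    return sum(elementwise_product(a, b))

x = [1, 2, 3, 4]
y = [3, 6, 9, 12]
print(elementwise_product(x, y))  # [3, 12, 27, 48]
print(true_dot_product(x, y))     # 90
```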

#### Let's set the initial weight to 0.1. Since the data was generated with <i>y</i> = 3<i>x</i>, training should drive the weight toward 3.

In [3]:
w = 0.1
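
Why is 3 the target? For a one-weight model with no bias, the weight that minimises the mean squared error has a closed form: setting the derivative of the loss to zero gives w* = sum(x*y) / sum(x*x). The sketch below computes it on the training lists as a sanity check; best_weight is a hypothetical helper, not part of the notebook.

```python
def best_weight(x, y):
    # Setting d(MSE)/dw = 0 for y_hat = w * x gives w* = sum(x*y) / sum(x*x)
    num = sum(i * j for i, j in zip(x, y))
    den = sum(i * i for i in x)
    return num / den

print(best_weight([1, 2, 3, 4], [3, 6, 9, 12]))  # 3.0
```

Here sum(x*y) = 90 and sum(x*x) = 30, so gradient descent should approach exactly this value.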

#### A function that performs forward propagation: it takes the input list and returns the predicted output y_hat, computed with the current weight w.

In [4]:
def model(x):
    # Multiply every input element by the current (global) weight w
    y_hat = [w*e for e in x]
    return y_hat

#### A function that calculates and returns the mean squared error (MSE) loss between the predicted output and the target output.

In [5]:
def loss(y_pred, y):
    SE = []  # First, compute the squared error of each element
    for i, j in zip(y_pred, y):
        SE.append((i-j)**2)
    MSE = sum(SE) / len(SE)  # Then average the squared errors
    return MSE

#### The gradient function calculates the partial derivative of the loss with respect to the weight.
#### By the chain rule, this equals the partial derivative of the loss with respect to the predicted output multiplied by the partial derivative of the predicted output with respect to the weight (see the chain rule for partial derivatives: https://tutorial.math.lamar.edu/classes/calciii/chainrule.aspx).
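
Written out for n samples with predictions ŷ<sub>i</sub> = w x<sub>i</sub>, the chain rule gives the formula that the gradient function implements:

$$
\begin{aligned}
L(w) &= \frac{1}{n}\sum_{i=1}^{n} (\hat{y}_i - y_i)^2, \qquad \hat{y}_i = w\,x_i \\
\frac{\partial L}{\partial \hat{y}_i} &= \frac{2}{n}(\hat{y}_i - y_i), \qquad
\frac{\partial \hat{y}_i}{\partial w} = x_i \\
\frac{\partial L}{\partial w} &= \sum_{i=1}^{n} \frac{\partial L}{\partial \hat{y}_i}\,\frac{\partial \hat{y}_i}{\partial w}
= \frac{1}{n}\sum_{i=1}^{n} 2\,x_i\,(\hat{y}_i - y_i)
\end{aligned}
$$

For example, at w = 0.1 the errors ŷ − y are −2.9, −5.8, −8.7 and −11.6, so the gradient is (−5.8 − 23.2 − 52.2 − 92.8) / 4 = −43.5. One step with a learning rate of 0.01 therefore moves the weight from 0.1 to 0.535.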

In [6]:
def gradient(y_pred, y, x):
    # The returned gradient = mean(2 * x * (y_pred - y))
    # We implement the formula above as follows:
    first_part = [2*e for e in x]  # 2 * x, one entry per input element
    diff = []  # (y_pred - y)
    for i, j in zip(y_pred, y):
        diff.append(i-j)
    gr = dot_product(first_part, diff)  # 2 * x * (y_pred - y)
    gr = sum(gr) / len(gr)  # mean(2 * x * (y_pred - y))
    return gr
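
A common way to confirm a hand-derived gradient is to compare it with a central finite-difference estimate of the loss slope. The sketch below is self-contained: mse, analytic_grad and numeric_grad are hypothetical re-implementations of the notebook's loss and gradient functions that take the weight as an argument instead of reading a global, and the step size h = 1e-6 is an arbitrary choice.

```python
def mse(w, x, y):
    # Mean squared error of the model y_hat = w * x
    return sum((w * i - j) ** 2 for i, j in zip(x, y)) / len(x)

def analytic_grad(w, x, y):
    # mean(2 * x * (y_hat - y)), the same formula as gradient()
    return sum(2 * i * (w * i - j) for i, j in zip(x, y)) / len(x)

def numeric_grad(w, x, y, h=1e-6):
    # Central difference: (L(w + h) - L(w - h)) / (2h)
    return (mse(w + h, x, y) - mse(w - h, x, y)) / (2 * h)

x = [1, 2, 3, 4]
y = [3, 6, 9, 12]
w = 0.1
print(analytic_grad(w, x, y))  # approximately -43.5
print(numeric_grad(w, x, y))   # should agree to several decimal places
```

Because the loss is quadratic in w, the central difference is exact up to floating-point rounding, so any large disagreement would point to a bug in the analytic formula.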

#### We set the number of epochs, i.e. how many times the training loop updates the weight.

In [7]:
epochs = 30

#### The learning rate sets the size of each weight update; try different values and compare the results.
#### If the learning rate is too large, the model does not learn: the weight overshoots the target and swings further away every epoch (try lr=1).
#### If the learning rate is too small, the model needs many more epochs to converge (try lr=0.0001).
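
Both effects can be checked by rerunning training with several learning rates. For this dataset the gradient simplifies to 15(w − 3), because y = 3x exactly and mean(x²) = 7.5, so the weight converges only when 0 < lr < 2/15 ≈ 0.133. The sketch below assumes the same data and initial weight as the notebook; train is a hypothetical helper that wraps the training loop with lr and epochs as arguments.

```python
def train(lr, epochs, w=0.1):
    # Hypothetical helper: the notebook's training loop with lr and epochs as arguments
    x = [1, 2, 3, 4]
    y = [3, 6, 9, 12]
    for _ in range(epochs):
        y_pred = [w * e for e in x]
        # mean(2 * x * (y_pred - y))
        dw = sum(2 * i * (p - t) for i, p, t in zip(x, y_pred, y)) / len(x)
        w -= lr * dw
    return w

for lr in (1, 0.1, 0.01, 0.0001):
    print(f"lr={lr}: weight after 30 epochs = {train(lr, 30)}")
```

With lr=1 each update multiplies the distance to 3 by −14, so the weight oscillates with exploding magnitude; with lr=0.0001 the weight is still far from 3 after 30 epochs.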

In [8]:
lr = 0.01

In [9]:
for epoch in range(epochs):
    y_pred = model(x)  # Forward pass: predict the output from the input
    l = loss(y_pred, y)  # Calculate the loss
    dw = gradient(y_pred, y, x)  # Calculate the gradient
    w -= dw * lr  # Update the weight
    print(f"Epoch: {epoch}, loss: {l}, weight: {w}")

Epoch: 0, loss: 63.074999999999996, weight: 0.535
Epoch: 1, loss: 45.571687499999996, weight: 0.9047499999999999
Epoch: 2, loss: 32.925544218750005, weight: 1.2190375
Epoch: 3, loss: 23.788705698046876, weight: 1.486181875
Epoch: 4, loss: 17.18733986683887, weight: 1.7132545937499999
Epoch: 5, loss: 12.417853053791083, weight: 1.9062664046875
Epoch: 6, loss: 8.971898831364058, weight: 2.070326443984375
Epoch: 7, loss: 6.482196905660529, weight: 2.209777477386719
Epoch: 8, loss: 4.683387264339732, weight: 2.328310855778711
Epoch: 9, loss: 3.3837472984854555, weight: 2.4290642274119043
Epoch: 10, loss: 2.4447574231557434, weight: 2.514704593300119
Epoch: 11, loss: 1.7663372382300224, weight: 2.587498904305101
Epoch: 12, loss: 1.2761786546211908, weight: 2.649374068659336
Epoch: 13, loss: 0.9220390779638108, weight: 2.7019679583604357
Epoch: 14, loss: 0.6661732338288526, weight: 2.7466727646063704
Epoch: 15, loss: 0.4813101614413452, weight: 2.784671849915415
Epoch: 16, loss: 0.3477465916

## The model converges: the loss shrinks every epoch and the weight moves steadily toward the target value of 3. Running more epochs brings it even closer.
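
Because every target satisfies y = 3x exactly, the gradient reduces to 2 · mean(x²) · (w − 3) = 15(w − 3), so each update multiplies the gap w − 3 by (1 − 15 · lr) = 0.85. The gap shrinks geometrically but never reaches zero, which is why more epochs keep improving the weight. The sketch below reproduces the printed log in closed form under that assumption (it holds only for this noise-free dataset); weight_after and loss_before are hypothetical helpers.

```python
# Because y = 3x exactly, each update multiplies (w - 3) by (1 - 15 * lr); with lr = 0.01 that is 0.85
w0, target, factor = 0.1, 3.0, 1 - 15 * 0.01

def weight_after(updates):
    # Weight after `updates` gradient-descent steps, in closed form
    return target - (target - w0) * factor ** updates

def loss_before(update):
    # Loss printed at a given epoch: 7.5 * (w - 3)^2, where 7.5 = mean(x^2) for x = [1, 2, 3, 4]
    return 7.5 * (weight_after(update) - target) ** 2

print(weight_after(1))    # close to 0.535, the weight printed for epoch 0
print(weight_after(30))   # about 2.978, the weight after all 30 epochs
print(loss_before(0))     # close to 63.075, the loss printed for epoch 0
```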