In [114]:
import numpy as np

#Defining the vector variable Y
Y = np.array([1,0,1,4,3,2,5,6,9,13,15,16])

# Defining the (3,12) matrix as X
X = np.array([[1,1,1],
          [1,2,1],
          [1,2,2],
          [1,3,2],
          [1,5,4],
          [1,5,6],
          [1,6,5],
          [1,7,4],
          [1,10,8],
          [1,11,7],
          [1,11,9],
          [1,12,10]])


In [115]:
#Defining the linear regression function and calculating the loss.
def linear_regression(b, X, Y):
    y_pred = X @ b
    loss = np.sum((y_pred - Y)**2)
    return loss

#Defining the gradient descent function
def gradient_descent(X, Y, learning_rate, num_iterations):
    n = len(X)      # Number of rows
    p = len(X[0])   # Number of columns
    
    #initializing the beta and best_beta to 0
    b = np.zeros(p)
    best_loss = float("inf")
    best_beta = np.zeros(p)
    
    #calculating the gradient of the loss function at beta
    for i in num_iterations:
        y_pred = X @ b
        gradient = (2 * X.T) @ (y_pred - Y)
        
        #Update Beta
        b = b - learning_rate * gradient
        
        #Keep track of best seen so far loss and parameters.
        current_loss = linear_regression(b, X, Y)
        
        if (current_loss < best_loss):
            best_loss = current_loss
            best_beta = b
        #print beta and loss update within the for loop. only print first 9 and last 9
        #iterations
        if (i<10 or i > 29990):
            print("Iteration:", i, "Beta Values", beta)
            print("\n")
            print("Best Loss: ", current_loss)
            
            
    #return list of final results
    return ({"beta": best_beta, "loss": best_loss})

#set the learning rate and number of iterations. 
learning_rate = 0.0001

In [116]:
#number of iterations 1-30000
num_iterations = range(30000)

#call and run the gradient descent function. 
result = gradient_descent(X, Y, learning_rate, num_iterations)


Iteration: 0 Beta Values [-2.2630379   1.54972927 -0.2385295 ]


Best Loss:  539.0223544
Iteration: 1 Beta Values [-2.2630379   1.54972927 -0.2385295 ]


Best Loss:  360.7318699583712
Iteration: 2 Beta Values [-2.2630379   1.54972927 -0.2385295 ]


Best Loss:  248.78911178512737
Iteration: 3 Beta Values [-2.2630379   1.54972927 -0.2385295 ]


Best Loss:  178.4977897913792
Iteration: 4 Beta Values [-2.2630379   1.54972927 -0.2385295 ]


Best Loss:  134.35420528676462
Iteration: 5 Beta Values [-2.2630379   1.54972927 -0.2385295 ]


Best Loss:  106.62553310358618
Iteration: 6 Beta Values [-2.2630379   1.54972927 -0.2385295 ]


Best Loss:  89.20175472042988
Iteration: 7 Beta Values [-2.2630379   1.54972927 -0.2385295 ]


Best Loss:  78.24715687403072
Iteration: 8 Beta Values [-2.2630379   1.54972927 -0.2385295 ]


Best Loss:  71.35377691342127
Iteration: 9 Beta Values [-2.2630379   1.54972927 -0.2385295 ]


Best Loss:  67.00995746756537
Iteration: 29991 Beta Values [-2.2630379   1.54972927

In [112]:
#Prints the betas and loss
print("Betas and Loss: ", result)

Betas and Loss:  {'beta': array([-2.26303788,  1.54972927, -0.2385295 ]), 'loss': 34.10088344257619}


In [113]:
#Verify that betas are correct values using "OLS Matrix Formulation"
beta = np.linalg.inv(X.T @ X) @ X.T @ Y
print(beta)

[-2.2630379   1.54972927 -0.2385295 ]


The purpose of this code is to create a gradient descent algorithm from scratch. It begins by defining the vector(1x12) as Y, and the matrix(3x12) as X. First the loss function must be defined and calculated using "np.sum((X @ b)- Y)^2", using "np.sum" to compute the total sum of all rows and columns separately. Next the gradient descent function is designed requiring inputs for X, Y, learning_rate, and num_iterations which are defined in the function. Within the gradient descent function, the number of columns and rows are defined. Then the betas are initialized to 0 with the length "p" parameters used to store parameter values. Following this, a for loop is used to calculate the gradient descent and update the betas while keeping track of the best loss and parameters. If the current loss is less than the best loss, the beta and best loss is updated as the values decrease. Next, the number of returned values is defined as the first and last 10 values so that 30,000 values are not printed, but we get an idea if the gradient descent function is working etc. The best betas and best loss are then called outside of the gradient_descent function and the learning rate is defined at 0.0001 ensuring accurate results. The calculated betas are then verified using the OLS Matrix Formulation equation. 