#Generalized simple/multiple linear regressor
This script applies basic ML techniques such as OLS gradient descent to produce a simple/multiple linear regression model to make predictions based on previously unseen data.

My intention was to write this script in such a way that it could, with limited modification, be applied to any simple/multiple linear regression problem/dataset.

# Initial steps
The first steps I took in writing this script was to import a handful of useful modules for importing and manipulating data.

With these imports complete, I set the constant learning rate and number of epochs. I then imported both the training and test data sets.

In [6]:
import pandas as pd
import os
import numpy as np

LEARNING_RATE = 0.01
NUM_EPOCHS = 1000
train = pd.read_csv("./train_data.csv")
test = pd.read_csv("./test_data.csv")

# Gradient descent algorithm
The following two functions implement the weight and bias update schema that serves as the heart of the "learning" that takes place.

updateWeights serves as a high-level function for fetching the weight updates and making the appropriate arithmetic updates.

calculateWeightUpdates actually calculates, via gradient descent and treating the entire training dataset as the batch size, the appropriate weight and bias updates. The gradient calculation was done assuming an OLS loss function.

In [7]:
def updateWeights(weights, b, predictedY, trueY, trainX, learningRate):
  BUpdate, weightUpdates = calculateWeightUpdates(weights,b,predictedY,trueY,trainX)
  b = b - learningRate * BUpdate
  weights = weights -learningRate * weightUpdates
  return b, weights


In [8]:
def calculateWeightUpdates(weights, b, predictedY, trueY, trainX):
  BUpdate = 0
  weightUpdates = np.zeros(shape=weights.shape)
  for exampleIndex in range(0,len(predictedY)):
    BUpdate += -2*(trueY[exampleIndex] - predictedY[exampleIndex])
    for weightIndex in range(0, len(weightUpdates)):
      weightUpdates[weightIndex] += -2*trainX[exampleIndex][weightIndex]*(trueY[exampleIndex] - predictedY[exampleIndex])
  BUpdate = BUpdate / len(predictedY)
  for weight in weightUpdates:
    weight = weight/len(predictedY)
  return BUpdate, weightUpdates
  

# Evaluation
The following function implements a fairly basic method of evaluation by computing the MSE for the given set (training or test)

In [9]:
def calculateMSE(predictedY, trueY):
  N = len(trueY)
  sumSquares = 0
  for example in range(0, len(predictedY)):
    sumSquares += (trueY[example] - predictedY[example])**2
  return sumSquares/N


# Data manipulations and training loop
The next bit of code creates separate data frames for the dependent and independent variables. The independent variables are normalized to avoid any issues with scaling. This can be adjusted as needed given the dataset.

The last section of code forms the training loop which computes the predictions based on the current weights via matrix multiplication, passes these predictions to the graident descent algorithm, and repeats for a specified number of epochs. The MSE is then calculated for both the training and test data sets.

In [10]:
trainY = train["octanenum"].to_numpy()
trainY = trainY.reshape(len(trainY),1)
trainX = train[["material1", "material2", "material3", "condition"]]
trainX = (trainX - trainX.min())/(trainX.max() - trainX.min())
trainX = trainX.to_numpy()

testY = test["octanenum"].to_numpy()
testY = testY.reshape(len(testY),1)
testX = test[["material1", "material2", "material3", "condition"]]
testX = (testX - testX.min())/(testX.max() - testX.min())
testX = testX.to_numpy()

weights = np.random.rand(len(trainX[0]),1)
b = 0

for i in range(0, NUM_EPOCHS):
  predictedY = np.matmul(trainX,weights) + b
  b, weights = updateWeights(weights, b, predictedY, trainY, trainX, LEARNING_RATE)
predictedTrainY = np.matmul(trainX, weights) + b
print("Training MSE: " + str(calculateMSE(predictedTrainY, trainY)[0]))
predictedTestY = np.matmul(testX, weights) + b
print("Test MSE: " + str(calculateMSE(predictedTestY, testY)[0]))


Training MSE: 74.66440813562025
Test MSE: 174.00120226760654
