
Linear-SGD

Linear Regression SGD Optimization Implementation

Overview

The SGD algorithm is a machine learning method used for weight optimization in a given statistical model. The method is based on an iterative process: in each iteration, the model learns from its prediction error in order to obtain better weight values. The code 'sgd.py' is a from-scratch implementation of such an SGD method for a linear model.

The familiar linear model is based on a matrix X and weights B, whose product gives the Y values (in addition to built-in errors):

Y = XB + ε
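For intuition, here is a minimal numpy sketch (not part of 'sgd.py') that generates synthetic data following this model; the sizes and the "true" B values are illustrative:

# illustrative only: generate data that follows Y = XB + e
import numpy as np

rng = np.random.default_rng(0)
n, p = 1000, 3                     # number of observations, number of features
X = rng.normal(size=(n, p))        # design matrix
B = np.array([2.0, -1.0, 0.5])     # "true" weights we hope to recover
e = rng.normal(scale=0.1, size=n)  # built-in errors (noise)
Y = X @ B + e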

Given some X and Y, we would like to use the SGD method to find the B values. To do this, we follow these steps (a sketch is given after the list):

  1. Sample the initial B values randomly.
  2. Stochastically sample a subset of the given data points.
  3. Apply B to the selected subset and calculate the errors between the predictions and Y.
  4. Use the calculated errors to update the B values.
  5. Repeat steps 2-4 for the required number of iterations.
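The following from-scratch sketch shows the loop described above. It is an illustration of the idea, not the exact code of 'sgd.py'; the hyperparameter names and defaults follow the README's parameter list:

# a from-scratch sketch of the loop above (illustrative only)
import numpy as np

def sgd_fit(X, Y, lr=0.001, iters=100, sample_rate=0.5, seed=0):
    rng = np.random.default_rng(seed)
    n, p = X.shape
    B = rng.normal(size=p)                          # step 1: random initial B
    m = max(1, int(sample_rate * n))
    for _ in range(iters):                          # step 5: repeat
        idx = rng.choice(n, size=m, replace=False)  # step 2: sample a subset
        err = X[idx] @ B - Y[idx]                   # step 3: prediction errors
        grad = (2 / m) * X[idx].T @ err             # gradient of the MSE loss
        B -= lr * grad                              # step 4: update B
    return B

Steps 3 and 4 rely on the derivative of the MSE loss, which is developed below.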

At the end of the process, given enough iterations, we obtain values close to the real B values. An illustration of the iterative process can be seen in the following GIF:

[GIF: iterations of the fitting process]

To update the weights in the correct direction, we need to minimize the errors between our predictions and the actual values of Y. To do so, we use the derivative of the model's loss function (which measures the distance between the predictions and the actual values). Our loss is the MSE, which looks like the following equation:

L(B) = (1/n) · Σᵢ (yᵢ - ŷᵢ)²

So, the derivative can be found by the following equation:

∂L/∂B = -(2/n) · Xᵀ(Y - XB)

Now, we can use this derivative to update each of our B values:

B ← B - η · ∂L/∂B
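For concreteness, here is a single update step in numpy (a minimal sketch with illustrative numbers, not the repository's code):

# one gradient-descent update step on a tiny illustrative dataset
import numpy as np

X = np.array([[1.0, 2.0], [3.0, 4.0]])   # 2 observations, 2 features
Y = np.array([5.0, 6.0])
B = np.zeros(2)                          # current weights
lr = 0.01                                # learning rate (eta)

grad = (2 / len(Y)) * X.T @ (X @ B - Y)  # derivative of the MSE loss
B = B - lr * grad                        # move against the gradient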

In the equation above, η describes the learning rate of the change at each step (also called the "step size"). This equation describes the simple method of updating the weights, while there are also other methods for even better optimization. One of them is ADAM, an algorithm designed to find the values of the weights in a particularly efficient and fast way, based on adjusting the learning rate for each weight individually, as can be seen in the following equation:

Bᵢ = Bᵢ₋₁ - η · m̂ᵢ / (√v̂ᵢ + ε)

where i is the iteration number and

m̂ᵢ = mᵢ / (1 - β₁ⁱ),   v̂ᵢ = vᵢ / (1 - β₂ⁱ)
mᵢ = β₁·mᵢ₋₁ + (1 - β₁)·gᵢ,   vᵢ = β₂·vᵢ₋₁ + (1 - β₂)·gᵢ²

with gᵢ the gradient of the loss at iteration i.
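A minimal sketch of one ADAM step follows. This is the standard ADAM update; whether 'sgd.py' uses exactly these β₁, β₂, and ε values is an assumption:

# one ADAM update step (standard ADAM; constants are the usual defaults)
import numpy as np

def adam_step(B, grad, m, v, i, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    m = beta1 * m + (1 - beta1) * grad           # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2      # second-moment estimate
    m_hat = m / (1 - beta1 ** i)                 # bias correction
    v_hat = v / (1 - beta2 ** i)
    B = B - lr * m_hat / (np.sqrt(v_hat) + eps)  # per-weight adjusted step
    return B, m, v

Here m and v start as zero vectors and i counts from 1, so the bias correction is well defined.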

As mentioned, when we compare the two methods, it can be seen that ADAM achieves better performance than the simple method (the lr values selected for this comparison are optimal for each of the methods):

[Figure: comparison of ADAM and the simple update method]

It should be noted that the code was written for the multivariate linear model, but it is of course also suitable for polynomial models, as can be seen here:

[Figure: polynomial fit; the original function is the polynomial shown in the plot]

For more on the math behind the linear-polynomial regression model, see here: Poly Regression
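For example, a polynomial fit can be obtained by building a design matrix whose columns are powers of x and passing it to LinearSGD. The target polynomial below is illustrative, and it is an assumption that the intercept is handled via the column of ones:

# sketch: fit a polynomial using the same linear machinery
import numpy as np
from sgd import LinearSGD

rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 200)
y = 1 - 2 * x + 3 * x ** 2 + rng.normal(scale=0.1, size=x.size)  # illustrative target

degree = 2
X_poly = np.vstack([x ** d for d in range(degree + 1)]).T  # columns: 1, x, x^2

model = LinearSGD().fit(X_poly, y)  # assumes fit accepts any design matrix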

The 'sgd.py' code also includes a built-in plot function, so you can follow the loss as a function of the number of iterations:

# import code
import numpy as np
from sgd import LinearSGD

# load data
x = np.load(...)
y = np.load(...)

# using the code (fit returns the fitted model, so the calls can be chained)
LinearSGD(iters=400).fit(x, y).plot_loss()

[Figure: loss by number of iterations]

Libraries

The code uses the following Python libraries:

matplotlib

numpy

Application

An application of the code is attached to this page under the name:

implementation.py

The example data files are also attached here: data.

Example of using the code

To use this code, you just need to import it as follows:

# import code
import numpy as np
from sgd import LinearSGD

# load data
x = np.load(...)
y = np.load(...)

# define variables
lr = 0.01
iters = 500
sample_rate = .5
adam = True

# using the code
model = LinearSGD(lr=lr, iters=iters, sample_rate=sample_rate, adam=adam)

# fitting the model
model.fit(x, y)

# get prediction for new data
model.predict(x_new)

where the displayed variables are:

lr: float, the learning rate (default = .001)

iters: int, the number of iterations (default = 100)

sample_rate: float, the rate of sampling a subset from the given data points (default = .5)

adam: bool, whether to use the ADAM optimizer or the simple method (default = True)

License

MIT © Etzion Harari
