Linear Regression SGD Optimization Implementation
The SGD algorithm is a machine learning method used to optimize the weights of a given statistical model. The method is based on an iterative process in which, at each iteration, the model learns from its prediction error in order to obtain better weight values. The code in 'sgd.py' is a from-scratch implementation of such an SGD method for a linear model.
The familiar linear model is based on a matrix X and weights B, whose product gives the Y values (in addition to built-in errors):
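$$Y = XB + \varepsilon$$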
Given some X and Y, we would like to use the SGD method to find the B values. To do this, we will follow these steps:
- Randomly sample initial B values.
- Stochastically sample only a subset of the given data points.
- Use B on the selected subset and calculate the errors between the predictions and Y.
- Update the B values using the errors we calculated.
- Repeat steps 2-4 for the required number of iterations.
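These steps can be summarized in the following minimal NumPy sketch (an illustration of the idea only, not the actual 'sgd.py' implementation; it assumes the mean squared error loss and its gradient, which are derived below, and the hypothetical function name `sgd_sketch`):

# illustration only - not the actual sgd.py implementation
import numpy as np

def sgd_sketch(X, Y, lr=0.001, iters=100, sample_rate=0.5):
    n, p = X.shape
    B = np.random.randn(p)                              # step 1: random initial B values
    for _ in range(iters):                              # step 5: repeat for the required iterations
        k = max(1, int(sample_rate * n))
        idx = np.random.choice(n, k, replace=False)     # step 2: sample a random subset
        errors = X[idx] @ B - Y[idx]                    # step 3: prediction errors on the subset
        grad = (2 / k) * X[idx].T @ errors              # derivative of the MSE loss (see below)
        B -= lr * grad                                  # step 4: update the B values
    return B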
At the end of the process, if we used enough iterations, we will get values close to the real B values. An illustration of the iterative process can be seen in the following GIF:
To update the weights in the correct direction, we need to minimize the errors between our predictions and the actual values of Y. To do so, we use the derivative of the model's loss function (which measures the distance between the predictions and the actual values). Taking the loss to be the usual mean squared error, it looks like the following equation:
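$$L(B) = \frac{1}{n}\sum_{i=1}^{n}\left(x_i B - y_i\right)^2$$

where x_i is the i-th row of X and n is the number of data points.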
So, the derivative with respect to B can be found by the following equation:
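$$\frac{\partial L}{\partial B} = \frac{2}{n}\,X^{T}\left(XB - Y\right)$$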
Now, we can use this derivative to update each of our B values:
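$$B \leftarrow B - lr \cdot \frac{\partial L}{\partial B}$$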
where lr describes the learning rate of the change at each step (also called the "step size"). This equation describes the simple method of updating the weights; there are also other methods for even better optimization. One of them is ADAM, an algorithm designed to find the values of the weights in a particularly efficient and fast way, based on adjusting the learning rate for each weight individually. In its standard form, the update can be seen in the following equation:
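$$B_{i+1} = B_i - lr \cdot \frac{\hat{m}_i}{\sqrt{\hat{v}_i} + \epsilon}$$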
where i is the iteration number, g_i is the gradient at iteration i, and the bias-corrected moment estimates are:
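$$m_i = \beta_1 m_{i-1} + (1-\beta_1)\,g_i \qquad v_i = \beta_2 v_{i-1} + (1-\beta_2)\,g_i^2$$

$$\hat{m}_i = \frac{m_i}{1-\beta_1^{\,i}} \qquad \hat{v}_i = \frac{v_i}{1-\beta_2^{\,i}}$$

A single ADAM step in this standard form can be sketched as follows (illustration only, with the usual constant values; not necessarily identical to the 'sgd.py' implementation):

# one ADAM update step (standard form, illustration only)
import numpy as np

def adam_step(B, grad, m, v, i, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * grad                   # first moment estimate
    v = b2 * v + (1 - b2) * grad ** 2              # second moment estimate
    m_hat = m / (1 - b1 ** i)                      # bias correction (i starts from 1)
    v_hat = v / (1 - b2 ** i)
    B = B - lr * m_hat / (np.sqrt(v_hat) + eps)    # per-weight adaptive update
    return B, m, v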
As mentioned, when we compare the two methods, it can be seen that ADAM achieves better performance than the simple method (the lr values selected for this comparison are optimal for each of the methods):
It should be noted that the code was written for the multivariate linear model, but of course it is also suitable for polynomial model cases, as can be seen here:
where the original function is:
For more on the math behind the linear-polynomial regression model, see here: Poly Regression
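Because the model is linear in its weights, a polynomial fit can be obtained simply by stacking powers of x as the columns of X before calling fit. A minimal sketch (illustrative only; the cubic features here are an arbitrary example, not the repo's attached one):

# build polynomial features and fit them with the same linear SGD code (illustration)
import numpy as np
from sgd import LinearSGD

x = np.load(...)                              # 1-D input values
y = np.load(...)                              # target values
X_poly = np.column_stack([x, x**2, x**3])     # each power of x becomes a feature column
model = LinearSGD(iters=500).fit(X_poly, y)   # fit exactly as in the multivariate case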
The 'sgd.py' code also includes a built-in plot function, so you can follow the loss by the number of iterations:
# import code
import numpy as np
from sgd import LinearSGD
# load data
x = np.load(...)
y = np.load(...)
# using the code
LinearSGD(iters=400).fit(x,y).plot_loss()
The code uses the following Python libraries:
- matplotlib
- numpy
An application of the code is attached to this page under the name:
The example data is also attached here: data.
To use this code, you just need to import it as follows:
# import code
import numpy as np
from sgd import LinearSGD
# load data
x = np.load(...)
y = np.load(...)
# define variables
lr = 0.01
iters = 500
sample_rate = .5
adam = True
# using the code
model = LinearSGD(lr=lr, iters=iters, sample_rate=sample_rate, adam=adam)
# fitting the model
model.fit(x,y)
# get prediction
model.predict(x_new)
where the variables are:
- lr: float, the learning rate (default = .001)
- iters: int, number of iterations (default = 100)
- sample_rate: float, rate of sampling a subset from the given data points (default = .5)
- adam: bool, whether to use the ADAM optimizer or the simple method (default = True)
MIT © Etzion Harari