Linear Regression SGD Optimization Implementation
The SGD algorithm is a machine learning method used to optimize the weights of a given statistical model. The method is based on an iterative process in which, at each iteration, the model learns from its prediction error in order to obtain better weight values. The code in 'sgd.py' is a from-scratch implementation of such an SGD method for a linear model.
The familiar linear model is based on a matrix X and weights B, whose product gives the Y values (in addition to built-in errors):
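$$Y = XB + \varepsilon$$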
Given some X and Y, we would like to use the SGD method to find the B values. To do this, we will follow these steps:
- Randomly sample initial B values.
- Stochastically sample only a subset of the given data points.
- Use B on the selected subset and calculate the errors between the predictions and Y.
- Update the B values using the errors we calculated.
- Repeat steps 2-4 for the required number of iterations.
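These steps can be summarized in the following minimal NumPy sketch (an illustration of the idea only, not the actual 'sgd.py' implementation; it assumes the mean squared error loss and its gradient, which are derived below, and the hypothetical function name `sgd_sketch`):

# illustration only - not the actual sgd.py implementation
import numpy as np

def sgd_sketch(X, Y, lr=0.001, iters=100, sample_rate=0.5):
    n, p = X.shape
    B = np.random.randn(p)                              # step 1: random initial B values
    for _ in range(iters):                              # step 5: repeat for the required iterations
        k = max(1, int(sample_rate * n))
        idx = np.random.choice(n, k, replace=False)     # step 2: sample a random subset
        errors = X[idx] @ B - Y[idx]                    # step 3: prediction errors on the subset
        grad = (2 / k) * X[idx].T @ errors              # derivative of the MSE loss (see below)
        B -= lr * grad                                  # step 4: update the B values
    return B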
At the end of the process, if we used enough iterations, we will get values close to the real B values. An illustration of the iterative process can be seen in the following GIF:
To update the weights in the correct direction, we need to minimize the errors between our predictions and the actual values of Y. To do so, we use the derivative of the model's loss function (which measures the distance between the predictions and the actual values). Taking the loss to be the usual mean squared error, it looks like the following equation:
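$$L(B) = \frac{1}{n}\sum_{i=1}^{n}\left(x_i B - y_i\right)^2$$

where x_i is the i-th row of X and n is the number of data points.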
So, the derivative with respect to B can be found by the following equation:
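$$\frac{\partial L}{\partial B} = \frac{2}{n}\,X^{T}\left(XB - Y\right)$$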
Now, we can use this derivative to update each of our B values:
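$$B \leftarrow B - lr \cdot \frac{\partial L}{\partial B}$$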
where lr describes the learning rate of the change at each step (also called the "step size"). This equation describes the simple method of updating the weights; there are also other methods for even better optimization. One of them is ADAM, an algorithm designed to find the values of the weights in a particularly efficient and fast way, based on adjusting the learning rate for each weight individually. In its standard form, the update can be seen in the following equation:
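$$B_{i+1} = B_i - lr \cdot \frac{\hat{m}_i}{\sqrt{\hat{v}_i} + \epsilon}$$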
where i is the iteration number, g_i is the gradient at iteration i, and the bias-corrected moment estimates are:
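$$m_i = \beta_1 m_{i-1} + (1-\beta_1)\,g_i \qquad v_i = \beta_2 v_{i-1} + (1-\beta_2)\,g_i^2$$

$$\hat{m}_i = \frac{m_i}{1-\beta_1^{\,i}} \qquad \hat{v}_i = \frac{v_i}{1-\beta_2^{\,i}}$$

A single ADAM step in this standard form can be sketched as follows (illustration only, with the usual constant values; not necessarily identical to the 'sgd.py' implementation):

# one ADAM update step (standard form, illustration only)
import numpy as np

def adam_step(B, grad, m, v, i, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * grad                   # first moment estimate
    v = b2 * v + (1 - b2) * grad ** 2              # second moment estimate
    m_hat = m / (1 - b1 ** i)                      # bias correction (i starts from 1)
    v_hat = v / (1 - b2 ** i)
    B = B - lr * m_hat / (np.sqrt(v_hat) + eps)    # per-weight adaptive update
    return B, m, v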
As mentioned, when we compare the two methods, it can be seen that ADAM achieves better performance than the simple method (the lr values selected for this comparison are optimal for each of the methods):
It should be noted that the code was written for the multivariate linear model, but of course it is also suitable for polynomial model cases, as can be seen here:
where the original function is:
For more on the math behind the linear-polynomial regression model, see here: Poly Regression
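Because the model is linear in its weights, a polynomial fit can be obtained simply by stacking powers of x as the columns of X before calling fit. A minimal sketch (illustrative only; the cubic features here are an arbitrary example, not the repo's attached one):

# build polynomial features and fit them with the same linear SGD code (illustration)
import numpy as np
from sgd import LinearSGD

x = np.load(...)                              # 1-D input values
y = np.load(...)                              # target values
X_poly = np.column_stack([x, x**2, x**3])     # each power of x becomes a feature column
model = LinearSGD(iters=500).fit(X_poly, y)   # fit exactly as in the multivariate case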
The 'sgd.py' code also includes a built-in plot function, so you can follow the loss by the number of iterations:
# import code
import numpy as np
from sgd import LinearSGD
# load data
x = np.load(...)
y = np.load(...)
# using the code
LinearSGD(iters=400).fit(x,y).plot_loss()
The code uses the following Python libraries:
- matplotlib
- numpy
An application of the code is attached to this page under the name:
The example data is also attached here: data.
To use this code, you just need to import it as follows:
# import code
import numpy as np
from sgd import LinearSGD
# load data
x = np.load(...)
y = np.load(...)
# define variables
lr = 0.01
iters = 500
sample_rate = .5
adam = True
# using the code
model = LinearSGD(lr=lr, iters=iters, sample_rate=sample_rate, adam=adam)
# fitting the model
model.fit(x,y)
# get prediction
model.predict(x_new)
where the variables are:
- lr: float, the learning rate (default = .001)
- iters: int, number of iterations (default = 100)
- sample_rate: float, rate of sampling a subset from the given data points (default = .5)
- adam: bool, whether to use the ADAM optimizer or the simple method (default = True)
MIT © Etzion Harari