# Linear Regression Models

## Implementation in scikit-learn

Version 1.0.3 of the scikit-learn library provides two linear regression models that are covered here: LinearRegression, which uses ordinary least squares (OLS), and SGDRegressor, which uses a variation of gradient descent.

## Ordinary Least Squares

The LinearRegression model uses OLS. For most applications this is a good approach. Even if a data set has hundreds of predictor variables or thousands of observations, your computer will have no problem computing the parameters using OLS. One advantage of OLS is that it is **guaranteed to find the exact optimal parameters for linear regression** (up to floating-point rounding). Another advantage is that you don’t have to choose a learning rate or worry about whether the gradient descent algorithm will converge.

Here’s some code that uses LinearRegression.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression

# Import the data set
X, y = load_diabetes(return_X_y=True)

# Create the OLS linear regression model
ols = LinearRegression()

# Fit the model to the data
ols.fit(X, y)

# Print the coefficients of the model
print(ols.coef_)

# Print R^2
print(ols.score(X, y))
```

```
[ -10.0098663  -239.81564367  519.84592005  324.3846455  -792.17563855
  476.73902101  101.04326794  177.06323767  751.27369956   67.62669218]
0.5177484222203498
```
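
To make the "exact optimal parameters" claim concrete, the sketch below solves the same least-squares problem directly with NumPy and compares the result to LinearRegression. The use of `np.linalg.lstsq` and the names `X_design` and `beta` are choices made for this illustration, not something LinearRegression requires you to do yourself.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression

X, y = load_diabetes(return_X_y=True)

# Prepend a column of ones so the first parameter acts as the intercept
X_design = np.column_stack([np.ones(len(X)), X])

# Solve min ||X_design @ beta - y||^2 directly (no learning rate, no iterations)
beta, *_ = np.linalg.lstsq(X_design, y, rcond=None)

# Fit the scikit-learn model on the same data
ols = LinearRegression().fit(X, y)

# The two sets of parameters should agree up to floating-point error
print(np.isclose(beta[0], ols.intercept_))
print(np.allclose(beta[1:], ols.coef_, atol=1e-6))
```

Because the problem is solved in one step rather than by iterating, there is nothing to tune and no convergence to monitor.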


## Gradient Descent

Scikit-learn’s SGDRegressor model uses a variant of gradient descent called stochastic gradient descent (SGD for short). SGD is very similar to gradient descent, but instead of computing the actual gradient from the whole data set at every step, it uses an approximation of the gradient based on one observation at a time, which is much cheaper to compute. SGDRegressor also adjusts the learning rate as the SGD algorithm iterates, so in many cases you won’t have to worry about setting the learning rate.
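
The difference between the full gradient and its single-observation approximation can be sketched in a few lines of NumPy. This is a simplified illustration under assumed choices (synthetic data, a squared-error loss, and a hypothetical learning-rate decay schedule); it is not how SGDRegressor is actually implemented.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (assumed for illustration): y = 3*x1 - 2*x2 + small noise
X = rng.normal(size=(1000, 2))
true_w = np.array([3.0, -2.0])
y = X @ true_w + rng.normal(scale=0.1, size=1000)

def full_gradient(w):
    # Gradient of the mean squared error over ALL observations
    return 2 * X.T @ (X @ w - y) / len(y)

def sample_gradient(w, i):
    # Cheap approximation of the gradient that uses observation i only
    return 2 * X[i] * (X[i] @ w - y[i])

w = np.zeros(2)
step = 0
for epoch in range(5):
    # Visit the observations in a fresh random order on every pass
    for i in rng.permutation(len(y)):
        step += 1
        learning_rate = 0.02 / step ** 0.25  # shrink the step size over time
        w -= learning_rate * sample_gradient(w, i)

print(w)                                 # estimated parameters
print(np.linalg.norm(full_gradient(w)))  # size of the full gradient at the estimate
```

Each update touches a single row of the data, so its cost does not grow with the number of observations. The price is that the estimate hovers near the optimum instead of landing on it exactly.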

By default, SGDRegressor applies regularization (an L2 penalty), a technique that encourages the model to find smaller parameters. Regularization is beyond the scope of this article, but it’s important to note that it can result in different coefficients than OLS would have found.

**If your data set is simply too large for your computer to handle OLS, you can use SGDRegressor. It will not find the exact optimal parameters, but it will usually get close enough for practical purposes, and it will do so without using too much computing power.**

Here’s an example.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import SGDRegressor

# Import the data set
X, y = load_diabetes(return_X_y=True)

# Create the SGD linear regression model
# max_iter is the maximum number of iterations of SGD to try before halting
sgd = SGDRegressor(max_iter=10000)

# Fit the model to the data
sgd.fit(X, y)

# Print the coefficients of the model
print(sgd.coef_)

# Print R^2
print(sgd.score(X, y))
```

```
[  13.20925913 -174.67575821  460.30248544  289.0409847   -31.96825226
  -92.86256473 -202.42859655  130.2586078   383.84128416  124.59945089]
0.5072418527869915
```

Because SGD shuffles the data randomly, the exact numbers change from run to run. Here the coefficients differ noticeably from the OLS coefficients, but the R² value (0.507) is close to the OLS value (0.518).
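
When the data set really is too large to load at once, one option is to feed it to the model in chunks with SGDRegressor's `partial_fit` method. The sketch below simulates this with synthetic chunks. The chunk size, the data generator, and the coefficients in `true_coef` are assumptions made for illustration; in practice each chunk would be read from disk or a database.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

rng = np.random.default_rng(0)

# Coefficients used to generate the synthetic data (assumed for illustration)
true_coef = np.array([2.0, -1.0, 0.5])
n_samples, chunk_size = 5000, 500

# random_state fixes the shuffling so the run is reproducible
sgd = SGDRegressor(random_state=0)

for start in range(0, n_samples, chunk_size):
    # Stand-in for loading the next chunk of a large data set
    X_chunk = rng.normal(size=(chunk_size, 3))
    y_chunk = X_chunk @ true_coef + rng.normal(scale=0.1, size=chunk_size)

    # Update the model using only this chunk; earlier chunks are not needed again
    sgd.partial_fit(X_chunk, y_chunk)

print(sgd.coef_)
```

Only one chunk is held in memory at a time, which is what makes this approach workable when OLS on the full data set is not.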


### Gradient Descent in Other Machine Learning Algorithms

Gradient descent can be used for much more than just linear regression. In fact, it can be used to train any machine learning model whose loss function is a differentiable function of the model’s parameters. In more intuitive terms, gradient descent can be used whenever the loss function looks like smooth terrain with hills and valleys (even if those hills and valleys live in a space with more than three dimensions).

Gradient descent (or variations of it) can be used to find parameters in **logistic regression models, support vector machines, neural networks, and other ML models**. Gradient descent’s flexibility makes it an essential part of machine learning.
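
As a rough illustration of that flexibility, the sketch below applies plain gradient descent to a logistic regression loss instead of a linear regression loss. The toy data, the fixed learning rate of 0.5, and the 500 iterations are arbitrary assumptions; libraries such as scikit-learn use more refined solvers.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy binary classification data (assumed): the label is 1 when x1 + x2 > 0
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_loss(w):
    # Average logistic loss, a differentiable function of the parameters w
    p = sigmoid(X @ w)
    eps = 1e-12  # guards against log(0)
    return -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

def log_loss_gradient(w):
    # Gradient of the average logistic loss with respect to w
    return X.T @ (sigmoid(X @ w) - y) / len(y)

w = np.zeros(2)
initial_loss = log_loss(w)
learning_rate = 0.5

# Repeatedly step downhill along the negative gradient
for _ in range(500):
    w -= learning_rate * log_loss_gradient(w)

accuracy = np.mean((sigmoid(X @ w) > 0.5) == (y == 1))
print(initial_loss, log_loss(w), accuracy)
```

Only the loss and its gradient changed relative to linear regression; the update rule is the same.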
