# Ridge Regression

A good way to reduce overfitting is to regularize the model, that is, to constrain it: the fewer degrees of freedom it has, the harder it will be for it to overfit the data.

For a linear model, regularization is typically achieved by constraining the weights of
the model. We will now look at Ridge Regression, Lasso Regression, and Elastic Net,
which implement three different ways to constrain the weights.

## Ridge Regression

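##### Ridge Regression (Tikhonov regularization)
Ridge Regression, also called Tikhonov regularization, is a regularized version of Linear Regression: a regularization term equal to ![image.png](attachment:image.png) is added to the cost function.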
 
 ![image-2.png](attachment:image-2.png)
 
This forces the learning algorithm not only to fit the data but also to keep the model weights as small as possible.
Note that the regularization term should only be added to the cost function during training. Once the model is trained, you want to evaluate the model’s performance using the unregularized performance measure.
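A minimal NumPy sketch of the two quantities involved. It assumes the cost function is written as $J(\theta) = \mathrm{MSE}(\theta) + \alpha \frac{1}{2}\sum_{i=1}^{n}\theta_i^2$ (one common convention; some texts scale the penalty differently, for example by $\alpha/m$), with a column of ones prepended to `X` so that `theta[0]` is the bias term $\theta_0$, which is not regularized. The helper names `mse` and `ridge_cost` are hypothetical.

```python
import numpy as np

def mse(theta, X_b, y):
    """Unregularized performance measure: mean squared error."""
    errors = X_b @ theta - y
    return np.mean(errors ** 2)

def ridge_cost(theta, X_b, y, alpha):
    """Training cost (assumed form): MSE + alpha * 1/2 * sum of squared weights.

    theta[0] is the bias term and is excluded from the penalty.
    """
    penalty = alpha * 0.5 * np.sum(theta[1:] ** 2)
    return mse(theta, X_b, y) + penalty

# Tiny example: 3 instances, 1 feature, bias column of ones prepended.
X_b = np.array([[1.0, 0.0],
                [1.0, 1.0],
                [1.0, 2.0]])
y = np.array([1.0, 3.0, 5.0])
theta = np.array([1.0, 2.0])            # fits y = 1 + 2x exactly

print(mse(theta, X_b, y))               # 0.0, the measure used to evaluate the trained model
print(ridge_cost(theta, X_b, y, 1.0))   # 2.0, the cost used only during training
```

Even though this θ fits the data perfectly, its training cost is nonzero because of the weight θ₁ = 2; that penalty is what pushes training toward smaller weights, while evaluation uses `mse` alone.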
 
 ![image-3.png](attachment:image-3.png)
 
The hyperparameter α controls how much you want to regularize the model. If α = 0,
then Ridge Regression is just Linear Regression. If α is very large, then all weights end up very close to zero and the result is a flat line going through the data’s mean.
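To see both limits numerically, here is a small sketch that solves the Ridge problem exactly for several values of α on synthetic data. It assumes the objective $\lVert \mathbf{y} - b - \mathbf{X}\mathbf{w} \rVert^2 + \alpha \lVert \mathbf{w} \rVert^2$ with the intercept $b$ left unpenalized; the helper `fit_ridge` and the data are hypothetical.

```python
import numpy as np

def fit_ridge(X, y, alpha):
    """Return (intercept, weights) minimizing ||y - b - Xw||^2 + alpha * ||w||^2.

    The intercept b is not penalized: center X and y, solve for w,
    then recover b from the means.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    n_features = X.shape[1]
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(n_features), Xc.T @ yc)
    b = y_mean - x_mean @ w
    return b, w

# Hypothetical noisy linear data with one feature.
rng = np.random.default_rng(42)
X = 3 * rng.random((20, 1))
y = 1 + 0.5 * X[:, 0] + 0.3 * rng.standard_normal(20)

for alpha in [0, 1, 100, 1e6]:
    b, w = fit_ridge(X, y, alpha)
    print(f"alpha={alpha:>9}: intercept={b:.3f}, slope={w[0]:.4f}")
```

As α grows, the slope shrinks toward 0 and the intercept approaches `y.mean()`, giving a flat line through the data's mean; with α = 0 the result is the ordinary least-squares fit.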
![image-4.png](attachment:image-4.png)

![image-5.png](attachment:image-5.png)


![image.png](attachment:image.png)

![image-2.png](attachment:image-2.png)
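For reference, the closed-form solution of Ridge Regression is commonly written as below, where $\mathbf{X}$ includes a leading column of ones for the bias term and $\mathbf{A}$ is the $(n+1)\times(n+1)$ identity matrix except with a 0 in the top-left cell, so that the bias term $\theta_0$ is not regularized. This derivation assumes the data term is a sum of squared errors; if the cost uses the MSE instead, α is rescaled accordingly.

$$
\begin{aligned}
J(\boldsymbol{\theta}) &= \lVert \mathbf{X}\boldsymbol{\theta} - \mathbf{y} \rVert^2 + \alpha\, \boldsymbol{\theta}^\top \mathbf{A}\, \boldsymbol{\theta} \\
\nabla_{\boldsymbol{\theta}} J &= 2\mathbf{X}^\top(\mathbf{X}\boldsymbol{\theta} - \mathbf{y}) + 2\alpha \mathbf{A}\boldsymbol{\theta} = \mathbf{0} \\
\Rightarrow\; \hat{\boldsymbol{\theta}} &= \left(\mathbf{X}^\top \mathbf{X} + \alpha \mathbf{A}\right)^{-1} \mathbf{X}^\top \mathbf{y}
\end{aligned}
$$

Here $\boldsymbol{\theta}^\top \mathbf{A}\,\boldsymbol{\theta} = \sum_{i=1}^{n}\theta_i^2$, and its gradient is $2\mathbf{A}\boldsymbol{\theta}$ because $\mathbf{A}$ is symmetric.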

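Here is how to perform Ridge Regression with Scikit-Learn using a closed-form solution, computed with a matrix factorization technique by André-Louis Cholesky (`solver='cholesky'`):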

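```python
from sklearn.linear_model import Ridge, SGDRegressor
import numpy as np

# Synthetic data: one feature, quadratic target without noise.
# No random seed is set, so the exact prediction varies between runs.
X = np.random.rand(100, 1)
y = 0.5 * X**2 + 10 * X + 20

# Closed-form Ridge Regression, solved with a Cholesky factorization.
ridge_reg = Ridge(alpha=1, solver='cholesky')
ridge_reg.fit(X, y)
ridge_reg.predict([[1.5]])
```

```
array([[34.43036421]])
```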
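To make the closed-form computation explicit, here is a NumPy-only sketch that solves the Ridge normal equations with a Cholesky factorization. It assumes the convention scikit-learn's `Ridge` documents (objective $\lVert \mathbf{y} - \mathbf{X}\mathbf{w} \rVert^2 + \alpha \lVert \mathbf{w} \rVert^2$, intercept not penalized). `ridge_cholesky` is a hypothetical helper, not scikit-learn's internal code, and it uses `np.linalg.solve` for the two triangular solves for simplicity, even though that does not exploit the triangular structure.

```python
import numpy as np

def ridge_cholesky(X, y, alpha):
    """Solve (X_b^T X_b + alpha * A) theta = X_b^T y via Cholesky.

    X_b is X with a leading column of ones; A is the identity matrix with a 0
    in the top-left cell, so theta[0] (the bias) is not regularized.
    """
    m = X.shape[0]
    X_b = np.c_[np.ones((m, 1)), X]          # add x0 = 1 to each instance
    A = np.eye(X_b.shape[1])
    A[0, 0] = 0.0                            # leave the bias term unpenalized
    M = X_b.T @ X_b + alpha * A              # symmetric positive definite for alpha > 0
    L = np.linalg.cholesky(M)                # M = L @ L.T, L lower-triangular
    z = np.linalg.solve(L, X_b.T @ y)        # forward substitution: L z = X_b^T y
    return np.linalg.solve(L.T, z)           # back substitution: L^T theta = z

# Hypothetical data with the same target shape as the cell above.
rng = np.random.default_rng(0)
X_demo = rng.random((100, 1))
y_demo = 0.5 * X_demo[:, 0] ** 2 + 10 * X_demo[:, 0] + 20
theta = ridge_cholesky(X_demo, y_demo, alpha=1.0)
print(theta)    # [intercept, slope]
```

Under that assumption, `ridge_cholesky(X, y.ravel(), 1.0)` on the notebook's `X` and `y` should agree with `ridge_reg.intercept_` and `ridge_reg.coef_` up to floating-point error.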

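```python
# Ridge Regression via Stochastic Gradient Descent with an l2 penalty.
sgd_reg = SGDRegressor(penalty='l2')
sgd_reg.fit(X, y.ravel())   # SGDRegressor expects a 1-D target array
sgd_reg.predict([[1.5]])
```

```
array([35.4630891])
```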

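The `penalty` hyperparameter sets the type of regularization term to use. Specifying
`"l2"` indicates that you want SGD to add a regularization term to the cost function equal to half the square of the ℓ2 norm of the weight vector: this is simply Ridge
Regression. (`"l2"` is also `SGDRegressor`'s default penalty; its strength is controlled by the `alpha` hyperparameter.)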
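A minimal NumPy sketch of what Stochastic Gradient Descent with an ℓ2 penalty does. It assumes the per-instance objective $\frac{1}{2}(\mathbf{x}_i^\top\boldsymbol{\theta} - y_i)^2 + \frac{\alpha}{2}\lVert \mathbf{w} \rVert^2$ (half the squared ℓ2 norm of the weights, bias excluded) and a simple decaying learning rate $\eta = t_0 / (t + t_1)$. The function name `sgd_ridge`, the schedule constants, and the epoch count are hypothetical choices for illustration; scikit-learn's `SGDRegressor` uses its own defaults and learning-rate schedules. Averaged over instances, this objective is minimized by $(\mathbf{X}_b^\top\mathbf{X}_b + m\alpha\mathbf{A})^{-1}\mathbf{X}_b^\top\mathbf{y}$, which the code uses as a reference.

```python
import numpy as np

def sgd_ridge(X, y, alpha, n_epochs=200, t0=50.0, t1=500.0, seed=0):
    """SGD on the mean over instances of
    1/2 * (x_i . theta - y_i)^2 + alpha/2 * ||theta[1:]||^2.
    """
    rng = np.random.default_rng(seed)
    m = X.shape[0]
    X_b = np.c_[np.ones((m, 1)), X]              # bias column
    theta = rng.standard_normal(X_b.shape[1])    # random initialization
    mask = np.ones_like(theta)
    mask[0] = 0.0                                # do not penalize the bias
    t = 0
    for _ in range(n_epochs):
        for i in rng.permutation(m):             # one pass over shuffled data
            xi, yi = X_b[i], y[i]
            grad = xi * (xi @ theta - yi) + alpha * mask * theta
            eta = t0 / (t + t1)                  # decaying learning rate
            theta -= eta * grad
            t += 1
    return theta

# Hypothetical noisy linear data.
rng = np.random.default_rng(1)
X = rng.random((100, 1))
y = 4 + 3 * X[:, 0] + 0.1 * rng.standard_normal(100)
alpha = 0.1

theta_sgd = sgd_ridge(X, y, alpha)

# Closed-form minimizer of the same averaged objective, for comparison.
m = len(X)
X_b = np.c_[np.ones((m, 1)), X]
A = np.eye(2)
A[0, 0] = 0.0
theta_exact = np.linalg.solve(X_b.T @ X_b + m * alpha * A, X_b.T @ y)
print(theta_sgd, theta_exact)
```

With these settings the two vectors should be close; the SGD estimate still fluctuates slightly because each step uses a single instance.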