# Ridge Regression

Ridge regression is a type of linear regression that is used to deal with multicollinearity. An additional penalty term is added to the cost function in order to shrink the coefficients of the predictor variables towards zero.

The strength of the penalty term is controlled by a hyperparameter $\lambda$. Ridge regression can be used in situations where there are many predictor variables, but the number of observations is relatively small. It can also be useful when there is multicollinearity among the predictor variables, which can lead to unstable coefficient estimates.

Ridge regression has several advantages over ordinary linear regression, including reduced variance, improved stability of coefficient estimates, and the ability to handle multicollinearity. However, it also has some limitations, including the need to tune the hyperparameter $\lambda$ and the possibility of introducing bias into the coefficient estimates if the penalty term is too strong.

The penalized cost function is

$$J(\beta) = RSS(\beta) + \lambda \sum_{j} \beta_j^2$$

and the coefficients that minimize it are

$$\hat{\beta}_{Ridge} = (X^TX + \lambda I)^{-1} X^T y$$
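
As a sketch of where the closed-form solution comes from, assume $RSS(\beta) = \lVert y - X\beta \rVert^2$ and $\lambda > 0$ (so that $X^TX + \lambda I$ is invertible). Setting the gradient of the penalized cost to zero gives:

$$
\begin{aligned}
J(\beta) &= (y - X\beta)^T (y - X\beta) + \lambda \beta^T \beta \\
\nabla_\beta J &= -2 X^T (y - X\beta) + 2 \lambda \beta = 0 \\
(X^T X + \lambda I)\,\beta &= X^T y \\
\hat{\beta}_{Ridge} &= (X^T X + \lambda I)^{-1} X^T y
\end{aligned}
$$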


In [2]:
import numpy as np 
import pandas as pd 

Let's assume we have a `features` array holding two predictor variables and a `target` array, each with 100 observations

In [17]:
features = np.array([[x for x in range(1 , 101)] ,
                  [x for x in range(1 , 101)]])

target = np.array([x for x in range(2 , 202 , 2)])

In [18]:
features.shape

(2, 100)

In [19]:
target.shape

(100,)

In [20]:
features

array([[  1,   2,   3,   4,   5,   6,   7,   8,   9,  10,  11,  12,  13,
         14,  15,  16,  17,  18,  19,  20,  21,  22,  23,  24,  25,  26,
         27,  28,  29,  30,  31,  32,  33,  34,  35,  36,  37,  38,  39,
         40,  41,  42,  43,  44,  45,  46,  47,  48,  49,  50,  51,  52,
         53,  54,  55,  56,  57,  58,  59,  60,  61,  62,  63,  64,  65,
         66,  67,  68,  69,  70,  71,  72,  73,  74,  75,  76,  77,  78,
         79,  80,  81,  82,  83,  84,  85,  86,  87,  88,  89,  90,  91,
         92,  93,  94,  95,  96,  97,  98,  99, 100],
       [  1,   2,   3,   4,   5,   6,   7,   8,   9,  10,  11,  12,  13,
         14,  15,  16,  17,  18,  19,  20,  21,  22,  23,  24,  25,  26,
         27,  28,  29,  30,  31,  32,  33,  34,  35,  36,  37,  38,  39,
         40,  41,  42,  43,  44,  45,  46,  47,  48,  49,  50,  51,  52,
         53,  54,  55,  56,  57,  58,  59,  60,  61,  62,  63,  64,  65,
         66,  67,  68,  69,  70,  71,  72,  73,  74,  75,  76,  77,  7

In [21]:
target

array([  2,   4,   6,   8,  10,  12,  14,  16,  18,  20,  22,  24,  26,
        28,  30,  32,  34,  36,  38,  40,  42,  44,  46,  48,  50,  52,
        54,  56,  58,  60,  62,  64,  66,  68,  70,  72,  74,  76,  78,
        80,  82,  84,  86,  88,  90,  92,  94,  96,  98, 100, 102, 104,
       106, 108, 110, 112, 114, 116, 118, 120, 122, 124, 126, 128, 130,
       132, 134, 136, 138, 140, 142, 144, 146, 148, 150, 152, 154, 156,
       158, 160, 162, 164, 166, 168, 170, 172, 174, 176, 178, 180, 182,
       184, 186, 188, 190, 192, 194, 196, 198, 200])

Let's view this as a DataFrame for better understanding

In [22]:
pd.DataFrame(features)

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,90,91,92,93,94,95,96,97,98,99
0,1,2,3,4,5,6,7,8,9,10,...,91,92,93,94,95,96,97,98,99,100
1,1,2,3,4,5,6,7,8,9,10,...,91,92,93,94,95,96,97,98,99,100


Here each feature occupies a row, but we need one row per observation and one column per feature. For this we just need to take the transpose of the array

In [23]:
pd.DataFrame(features.T)

Unnamed: 0,0,1
0,1,1
1,2,2
2,3,3
3,4,4
4,5,5
...,...,...
95,96,96
96,97,97
97,98,98
98,99,99
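
As an aside, the same row-per-observation layout can be built directly instead of transposing afterwards. This is a minimal sketch; the names `x1`, `x2`, `X`, and `X_via_transpose` are illustrative and not part of the notebook.

```python
import numpy as np

# One predictor per 1-D array (both identical here, as in the walkthrough)
x1 = np.arange(1, 101)
x2 = np.arange(1, 101)

# column_stack places each 1-D array in its own column: shape (100, 2)
X = np.column_stack([x1, x2])

# Same result as building a (2, 100) array and transposing it
X_via_transpose = np.array([x1, x2]).T
print(X.shape, np.array_equal(X, X_via_transpose))
```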


Now let's create an identity matrix whose size equals the number of features (the number of columns of `x`)

In [24]:
x = pd.DataFrame(features.T)

In [14]:
np.identity(x.shape[1])

array([[1., 0.],
       [0., 1.]])

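Now we need to apply the formula $$(X_{train}^T X_{train} + \alpha I)^{-1} X_{train}^T Y_{train}$$ where $\alpha$ plays the role of the penalty strength $\lambda$ and $I$ is the identity matrix.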
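
A brief aside on why the $\alpha I$ term matters for this data: with two identical feature columns, $X^TX$ has a zero eigenvalue and cannot be inverted on its own. The sketch below (the variable names and `alpha = 1.0` are illustrative choices, not values from the notebook) shows that adding $\alpha I$ shifts every eigenvalue up by $\alpha$.

```python
import numpy as np

# Two identical feature columns, as in the walkthrough
col = np.arange(1, 101, dtype=float)
X = np.column_stack([col, col])

gram = X.T @ X
alpha = 1.0  # illustrative value, chosen larger than 1e-7 for readability
ridge_gram = gram + alpha * np.identity(X.shape[1])

# eigvalsh returns the eigenvalues of a symmetric matrix in ascending order
eig_plain = np.linalg.eigvalsh(gram)
eig_ridge = np.linalg.eigvalsh(ridge_gram)

print("X^T X eigenvalues:          ", eig_plain)
print("X^T X + alpha*I eigenvalues:", eig_ridge)
```

The smallest eigenvalue moves from (numerically) zero to about `alpha`, which is what makes the inverse well defined.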

In [15]:
I = np.identity(x.shape[1])

In [16]:
alpha = 1e-7

In [25]:
result = np.linalg.inv(np.dot(x.T , x) + alpha*I).dot(x.T).dot(target)

In [26]:
result

array([1.00017666, 1.0001755 ])

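So now we have applied the formula. These are the values of the `coefficients`, one per feature. There is no `intercept` yet, because `x` does not contain a column of ones. The two coefficients sum to roughly 2, which matches `target` being twice the (duplicated) feature.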
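
To see how the penalty strength affects the solution, here is a small sketch that solves the same system for several values of `alpha`. The helper `ridge_coefficients` and the chosen `alphas` are hypothetical, and `np.linalg.solve` is used in place of an explicit inverse.

```python
import numpy as np

def ridge_coefficients(X, y, alpha):
    """Hypothetical helper: solve (X^T X + alpha*I) beta = X^T y, no intercept."""
    n_features = X.shape[1]
    A = X.T @ X + alpha * np.identity(n_features)
    # solve() avoids forming the explicit inverse
    return np.linalg.solve(A, X.T @ y)

col = np.arange(1, 101, dtype=float)
X = np.column_stack([col, col])   # two identical predictors
y = 2.0 * col                     # same target as the walkthrough

alphas = [1.0, 1e3, 1e6]          # illustrative penalty strengths
coefs = [ridge_coefficients(X, y, a) for a in alphas]

for a, b in zip(alphas, coefs):
    print(f"alpha={a:g}  coefficients={b}")
```

As `alpha` grows, both coefficients shrink towards zero, which is the bias introduced by a strong penalty.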

Now we just need to put all of this into a class for easier reuse

In [None]:
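class MultipleRidgeRegression():

  def __init__(self , alpha):
    self.coef = None
    self.intercept = None
    self.alpha = alpha

  def fit(self , X_train , Y_train):
    # Prepend a column of ones so the first solved weight is the intercept
    X_train = np.insert(X_train , 0 , 1 , axis = 1)
    I = np.identity(X_train.shape[1])
    # Closed-form ridge solution: (X^T X + alpha*I)^-1 X^T Y
    # Note: the intercept is penalized too, since I covers the column of ones
    result = np.linalg.inv(np.dot(X_train.T , X_train) + self.alpha*I).dot(X_train.T).dot(Y_train)
    self.intercept = result[0]
    self.coef = result[1:]

  def predict(self , X_test):
    # Linear prediction: X_test . coef + intercept
    Y_pred = np.dot(X_test , self.coef) + self.intercept

    return Y_pred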
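
The class above adds the penalty to every diagonal entry, including the one for the intercept. A common variation leaves the intercept unpenalized. The sketch below shows that variation as standalone functions; `fit_ridge`, `predict_ridge`, and `alpha=1.0` are hypothetical names and values chosen for illustration, and this is a swapped-in variant, not the notebook's method.

```python
import numpy as np

def fit_ridge(X, y, alpha):
    """Hypothetical helper: ridge fit that leaves the intercept unpenalized."""
    X_aug = np.insert(X, 0, 1.0, axis=1)      # prepend a column of ones
    penalty = alpha * np.identity(X_aug.shape[1])
    penalty[0, 0] = 0.0                       # no shrinkage on the intercept
    beta = np.linalg.solve(X_aug.T @ X_aug + penalty, X_aug.T @ y)
    return beta[0], beta[1:]                  # (intercept, coefficients)

def predict_ridge(X, intercept, coef):
    """Hypothetical helper: linear prediction X . coef + intercept."""
    return X @ coef + intercept

col = np.arange(1, 101, dtype=float)
X = np.column_stack([col, col])               # same data as the walkthrough
y = 2.0 * col

intercept, coef = fit_ridge(X, y, alpha=1.0)
y_new = predict_ridge(np.array([[101.0, 101.0]]), intercept, coef)
print(intercept, coef, y_new)
```

On this data the intercept stays close to 0, the two coefficients stay close to 1 each, and the prediction for a new observation with both features equal to 101 is close to 202.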