```python
import numpy as np
```

Ridge regression has been invented independently in many different contexts, though Andrey Tikhonov (1906-1993) usually receives most of the credit.

### Generate some data

```python
intercept = 10
coefficient = 2
x = np.random.normal(0, 1, 100)
eps = np.random.normal(0, 1, 100)
y = (coefficient * x) + intercept + eps
# Prepend a column of ones to x so the intercept can be written as part of X @ beta
ones = np.ones(x.shape[0])
X = np.vstack([ones, x]).T
```

### Model

Our objective takes on the familiar *loss and penalty* structure. We seek the parameters $\beta$ that minimize the residual sum of squares (RSS) plus an L2 penalty on the coefficients, whose strength is controlled by $\lambda \geq 0$:

$\beta^{ridge} = \underset{\beta}{\operatorname{argmin}} \bigg\{ \sum\limits_{i=1}^N \Big(y_i - \beta_0 - \sum\limits_{j=1}^p x_{ij}\beta_j\Big)^2 + \lambda \sum\limits_{j=1}^p \beta_j^2 \bigg\}$
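As a minimal sketch, the penalized objective can be evaluated directly from its summation form. The helper name `ridge_objective` and the toy numbers are illustrative assumptions. The predictors are passed without a column of ones, and the intercept is kept separate so it stays outside the penalty, matching the $j = 1, \dots, p$ range of the penalty sum.

```python
import numpy as np

def ridge_objective(X, y, beta0, beta, lam):
    """Penalized RSS in summation form (hypothetical helper).

    X has shape (N, p) and does NOT include a column of ones;
    the intercept beta0 is passed separately and is not penalized.
    """
    residuals = y - beta0 - X @ beta      # y_i - beta_0 - sum_j x_ij beta_j
    rss = np.sum(residuals ** 2)          # loss term
    penalty = lam * np.sum(beta ** 2)     # L2 penalty over j = 1..p only
    return rss + penalty

# Toy data (assumed): one predictor, three observations lying exactly on y = 10 + 2x
X_toy = np.array([[1.0], [2.0], [3.0]])
y_toy = np.array([12.0, 14.0, 16.0])
value = ridge_objective(X_toy, y_toy, beta0=10.0, beta=np.array([2.0]), lam=0.1)
print(value)  # prints 0.4: zero residuals, penalty 0.1 * 2**2
```

Even though the fit is perfect, the objective is not zero, because the penalty still charges for the size of the slope.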

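Equivalently, the problem can be written in constrained form:

$\beta^{ridge} = \underset{\beta}{\operatorname{argmin}} \sum\limits_{i=1}^N \Big(y_i - \beta_0 - \sum\limits_{j=1}^p x_{ij}\beta_j\Big)^2$

*subject to:*

$\sum\limits_{j=1}^p \beta_j^2 \leq t$

There is a one-to-one correspondence between $\lambda$ and the budget $t$: a larger penalty $\lambda$ corresponds to a smaller $t$.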
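The link between the penalized and constrained forms can be seen numerically. The sketch below uses simulated data; the seed, sample size, and grid of $\lambda$ values are arbitrary assumptions. For each $\lambda$, the squared norm of the fitted slopes is the budget $t$ at which that solution would be optimal in the constrained form, and it shrinks as $\lambda$ grows. The data are centered so that the intercept stays unpenalized.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100
x = rng.normal(0, 1, N)
y = 2 * x + 10 + rng.normal(0, 1, N)

# Center x and y so the intercept drops out of the penalized problem
x_c = x - x.mean()
y_c = y - y.mean()
Xc = x_c[:, None]  # shape (N, 1)

lambdas = [0.0, 1.0, 10.0, 100.0, 1000.0]
t_values = []
for lam in lambdas:
    # Closed-form ridge slope on centered data
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(1), Xc.T @ y_c)
    t = float(np.sum(beta ** 2))  # budget t implied by this lambda
    t_values.append(t)
    print(f'lambda={lam:>7}: t = sum(beta_j^2) = {t:.4f}')
```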

### Analytical Solution

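In matrix notation we have:

$RSS(\lambda) = (\mathbf{y} - \mathbf{X}\beta)^T (\mathbf{y} - \mathbf{X}\beta) + \lambda \beta ^T \beta$

Because $\mathbf{X}$ contains the column of ones built above, $\beta^T \beta$ includes $\beta_0^2$. Unlike the summation form, whose penalty starts at $j = 1$, this version also penalizes the intercept. The difference is small when $\lambda$ is small relative to $N$.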
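A quick numerical check shows how the matrix form relates to the summation form when $\mathbf{X}$ has a leading column of ones. The simulated data, seed, and parameter vector below are arbitrary assumptions. The two expressions should differ by exactly $\lambda\beta_0^2$.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(0, 1, 50)
y = 2 * x + 10 + rng.normal(0, 1, 50)
X = np.column_stack([np.ones_like(x), x])  # intercept column + predictor
beta = np.array([9.5, 1.8])                # arbitrary parameter vector
lam = 0.1

# Matrix form: (y - X beta)^T (y - X beta) + lambda * beta^T beta
r = y - X @ beta
rss_matrix = r @ r + lam * (beta @ beta)

# Summation form: penalty only over j = 1..p, so beta_0 is excluded
rss_sum = np.sum((y - beta[0] - x * beta[1]) ** 2) + lam * beta[1] ** 2

print(rss_matrix - rss_sum, lam * beta[0] ** 2)
```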

Differentiating with respect to our parameters $\beta$:

$\frac{\partial RSS}{\partial \beta} = -2\mathbf{X}^T (\mathbf{y} - \mathbf{X} \beta) + 2 \lambda \beta$
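One way to sanity-check this derivative is to compare it against a central finite-difference approximation of $RSS(\lambda)$. The sketch below makes several arbitrary choices: the simulated data, the evaluation point $\beta$, and the step size `h`. Because $RSS(\lambda)$ is quadratic, central differences should agree with the analytical gradient up to rounding error.

```python
import numpy as np

def rss(beta, X, y, lam):
    # Matrix-form RSS(lambda)
    r = y - X @ beta
    return r @ r + lam * (beta @ beta)

def grad_rss(beta, X, y, lam):
    # Analytical gradient: -2 X^T (y - X beta) + 2 lambda beta
    return -2 * X.T @ (y - X @ beta) + 2 * lam * beta

rng = np.random.default_rng(2)
x = rng.normal(0, 1, 30)
y = 2 * x + 10 + rng.normal(0, 1, 30)
X = np.column_stack([np.ones_like(x), x])
beta = np.array([3.0, -1.0])
lam, h = 0.1, 1e-5

# Central differences along each coordinate direction e
numeric = np.array([
    (rss(beta + h * e, X, y, lam) - rss(beta - h * e, X, y, lam)) / (2 * h)
    for e in np.eye(len(beta))
])
analytic = grad_rss(beta, X, y, lam)
print(numeric, analytic)
```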

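Setting the gradient to zero gives the normal equations $(\mathbf{X}^T \mathbf{X} + \lambda \mathbf{I})\beta = \mathbf{X}^T \mathbf{y}$. For $\lambda > 0$ the matrix $\mathbf{X}^T \mathbf{X} + \lambda \mathbf{I}$ is positive definite and therefore invertible. Solving for $\beta$ yields the analytical solution: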

$\beta^{ridge} = (\mathbf{X}^T \mathbf{X} + \lambda \mathbf{I})^{-1} \mathbf{X}^T \mathbf{y}$

```python
lambdav = 0.1
# Closed-form ridge solution: (X^T X + lambda I)^{-1} X^T y
beta = np.linalg.inv((X.T @ X) + np.identity(X.shape[1]) * lambdav) @ X.T @ y
print('intercept: {}'.format(beta[0]))
print('coefficient: {}'.format(beta[1]))
```

```
intercept: 10.100778031650291
coefficient: 1.8578351773690924
```
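Two variations on the cell above may be worth considering. They are sketches, not the notebook's own method.

* `np.linalg.solve` computes the same $\beta$ without forming an explicit inverse, which is generally more numerically stable.
* The matrix form also penalizes the intercept. Zeroing the first diagonal entry of the penalty matrix, or equivalently centering $x$ and $y$ and recovering $\beta_0$ afterwards, leaves $\beta_0$ unpenalized as in the summation form.

The data are regenerated with an assumed fixed seed, so the printed values will not match the output above.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(0, 1, 100)
y = 2 * x + 10 + rng.normal(0, 1, 100)
X = np.column_stack([np.ones_like(x), x])
lambdav = 0.1

# Same closed form, computed with a linear solve instead of an explicit inverse
A = X.T @ X + lambdav * np.identity(X.shape[1])
beta_solve = np.linalg.solve(A, X.T @ y)
beta_inv = np.linalg.inv(A) @ X.T @ y

# Variant that leaves the intercept unpenalized: penalty matrix diag(0, 1)
P = np.identity(X.shape[1])
P[0, 0] = 0.0
beta_unpen = np.linalg.solve(X.T @ X + lambdav * P, X.T @ y)

# Equivalent route: center x and y, fit the slope, then recover the intercept
x_c, y_c = x - x.mean(), y - y.mean()
slope = (x_c @ y_c) / (x_c @ x_c + lambdav)
intercept = y.mean() - slope * x.mean()

print(beta_solve, beta_inv)
print(beta_unpen, (intercept, slope))
```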


### Numerical Solution - Gradient Descent

```python
def partial_rss_wrt_beta(X, y, lambdav, beta):
    # Gradient of RSS(lambda): -2 X^T (y - X beta) + 2 lambda beta
    opr1 = -2 * X.T @ (y - (X @ beta))
    opr2 = 2 * lambdav * beta
    return opr1 + opr2
```

```python
lambdav = 0.1
beta = np.zeros(2)   # start from the origin
alpha = 0.001        # fixed step size (learning rate)
for _ in range(10 ** 5):
    partial = partial_rss_wrt_beta(X, y, lambdav, beta)
    beta = beta - (alpha * partial)

print('intercept: {}'.format(beta[0]))
print('coefficient: {}'.format(beta[1]))
```

```
intercept: 10.100778031650288
coefficient: 1.8578351773690915
```

Gradient descent reaches the same estimates as the closed-form solution, up to floating-point rounding.
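The loop above runs a fixed $10^5$ iterations with a hand-picked `alpha`. One common alternative is sketched below, assuming a fixed step is still wanted. The step size is set to $1/L$, where $L$ is the largest eigenvalue of the constant Hessian $2(\mathbf{X}^T\mathbf{X} + \lambda\mathbf{I})$, and the loop stops once the gradient norm falls below a tolerance. The names `ridge_gd`, `tol`, and `max_iter`, as well as the simulated data, are hypothetical.

```python
import numpy as np

def ridge_gd(X, y, lambdav, tol=1e-8, max_iter=100_000):
    """Gradient descent on RSS(lambda) with a fixed step of 1/L (hypothetical helper)."""
    # The Hessian of RSS(lambda) is constant: 2 (X^T X + lambda I)
    H = 2 * (X.T @ X + lambdav * np.identity(X.shape[1]))
    L = np.linalg.eigvalsh(H).max()  # Lipschitz constant of the gradient
    alpha = 1.0 / L                   # step that guarantees monotone decrease here
    beta = np.zeros(X.shape[1])
    for it in range(max_iter):
        grad = -2 * X.T @ (y - X @ beta) + 2 * lambdav * beta
        if np.linalg.norm(grad) < tol:
            break
        beta = beta - alpha * grad
    return beta, it

rng = np.random.default_rng(4)
x = rng.normal(0, 1, 100)
y = 2 * x + 10 + rng.normal(0, 1, 100)
X = np.column_stack([np.ones_like(x), x])

beta_gd, n_iter = ridge_gd(X, y, 0.1)
beta_exact = np.linalg.solve(X.T @ X + 0.1 * np.identity(2), X.T @ y)
print(beta_gd, beta_exact, n_iter)
```

Because this design matrix is well conditioned, the loop typically stops after a small number of iterations rather than running the full budget.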
