Importing packages

In [1]:
using CSV, DataFrames,LinearAlgebra, Statistics

We are using min-max feature scaling.
$$Normalize(\vec{X}) = \frac{\vec{X} -\vec{X}_{min}}{\vec{X}_{max}-\vec{X}_{min}} $$

In [2]:
Normalize(X)=(X .- findmin(X,dims=1)[1]) ./ (findmax(X,dims=1)[1] - findmin(X, dims=1)[1])

Normalize (generic function with 1 method)

We will use root mean-squared error as our primary performance metric.

$$RMSE(\vec{y},\hat{y}) = \sqrt{\frac{1}{n}\sum{y_i-\hat{y}_i}}$$

where $\vec{y}$ is a vector of actual values, and $\hat{y}$ is a vector of predicted values.

In [6]:
RMSE(y,ŷ) = sqrt.(sum(ŷ-y).^2/ size(y)[1])

RMSE (generic function with 1 method)

We are using a standard linear model.
$$lm(\vec{X},\vec{\beta}) = \vec{X}^{T}\vec{\beta}$$

Wher $\vec{X}$ is a matrix of observations, and $\vec{\beta}$ is a vector of coefficients. Note that, if you wish there to be a bias term, it must be concatenated to $\vec{X}$ as a column of 1's prior to using the function below.

In [5]:
lm(X,β)= X * β

lm (generic function with 1 method)

We are wanting to minimize the residual sum of squares (RSS) function. Which thankfully, has a closed-form solution!

Objective Function: 

$$\vec{\hat{\beta}}=\min_{\vec{\hat{\beta}}} L(D, \vec{\beta}) =\min_{\vec{\hat{\beta}}} \sum_{i=1}^{n}{(\hat{\beta} .\vec{x_i} - y_i)^2}$$
$$L(D,\vec{\beta})=||X\vec{\beta} - Y||^2$$
$$=(X\vec{\beta}-y)^T(X\vec{\beta}-Y)$$
$$=Y^TY-Y^TX\vec{\beta}-\vec{\beta}^TX^TY+\vec{\beta}^TX^TX\vec{\beta}$$

Get gradient w.r.t. $\vec{\beta}$

$$\frac{\partial{L(D,\vec{\beta})}}{\partial{\vec{\beta}}} = \frac{\partial{(Y^TY-Y^TX\vec{\beta}-\vec{\beta}^TX^TY+\vec{\beta}X^TX\vec{\beta}})}{\partial{\vec{\beta}}}$$
$$= -2Y^TX+2\vec{\beta}^TX^TX$$
$$=-2Y^TX+2\vec{\beta}+2\vec{\beta}^TX^TX$$

Set gradient to zero.

$$=-2Y^TX+2\vec{\beta}^TX^TX=0$$
$$Y^TX=\vec{\beta}^TX^TX$$
$$X^TY=X^TX\vec{\beta}$$
$$\vec{\beta}=(X^TX)^{-1}X^TY$$

In [3]:
ols(X,y)= pinv(X' * X) * X' * y

ols (generic function with 1 method)

With ridge regression, we add an L2 regularization term. Regularization terms are meant to prevent overfitting data by penalizing large weights. It works by taking the L2 norm (otherwise known as the euclidean distance) of the weights and adding them to the loss function. We will also have a $\lambda$ coefficient on the l2-norm, to be able to tune the weight of the penalty on the overall loss.

L2 regularization is defined as follows:
$$norm_2(\vec{\beta})=||\vec{\beta}||_2 = \beta^T\beta$$

The resulting objective functions comes out as follows:
$$\vec{\hat{\beta}}=\min_{\vec{\hat{\beta}}} L(D, \vec{\beta}) =\min_{\vec{\hat{\beta}}} \sum_{i=1}^{n}{(\hat{\beta} .\vec{x_i} - y_i)^2} + \lambda |\beta|$$
$$L(D,\vec{\beta})=||X\vec{\beta} - Y||^2$ + \lambda||\beta||_2$$
$$=(\vec{X}\vec{\beta}-\vec{Y})^T(\vec{X}\vec{\beta}-\vec{Y}) + \lambda\beta^T\beta$$

$$=Y^TY-Y^TX\vec{\beta}-\vec{\beta}^TX^TY+\vec{\beta}^TX^TX\vec{\beta}+\lambda\beta^T\beta$$

Get gradient w.r.t. $\vec{\beta}$

$$\frac{\partial{L(D,\vec{\beta})}}{\partial{\vec{\beta}}} = \frac{\partial{(Y^TY-Y^TX\vec{\beta}-\vec{\beta}^TX^TY+\vec{\beta}^TX^TX\vec{\beta}+\lambda\beta^T\beta)}{\partial{\vec{\beta}}}$$
$$= -2Y^TX+2\vec{\beta}^TX^TX$$
$$=-2Y^TX+2\vec{\beta}+2\vec{\beta}^TX^TX$$

In [4]:
lasso_ols(X,y,λ) = pinv(X' * X + λ * I) * X' * y

ridge_ols (generic function with 1 method)

In [7]:
wine = DataFrame(CSV.File("/home/john/Documents/julia_data/winequality-red.csv"));

In [8]:
X=Array(wine[!,names(wine)[1:11]])
y=Array(wine[!,"quality"])
X = Normalize(X)
X=hcat(X,ones(size(X)[1]));

In [9]:
RMSE(y,lm(X,ridge_ols(X,y,1)))

0.13499589484380095

In [10]:
RMSE(y,lm(X,ols(X,y)))

4.013955839618286e-12