# CS229 Assignment 3 - First Coding Question

We will be implementing the $l_1$-regularised least squares algorithm from question 3 of problem set 3 (https://see.stanford.edu/materials/aimlcs229/problemset3.pdf) of Stanford Engineering Everywhere's CS229 Machine Learning course, taught by Andrew Ng in '08 (https://see.stanford.edu/course/cs229).

In this question we will be minimising the cost function $J(\theta)=\frac{1}{2}||X\theta - \vec{y}||_2^2 + \lambda ||\theta||_1$, where $X$ is our matrix of training examples, $\vec{y}$ is the vector of target values and $\theta$ is the vector of parameters for our learning algorithm. The $\lambda$ parameter controls the amount of penalisation we want to apply.
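As a quick check on the definition, the cost can be evaluated directly. This is only a minimal sketch on plain 1-D NumPy arrays; the name `l1_objective` and the toy numbers are illustrative and not part of the assignment.

```python
import numpy as np

def l1_objective(X, y, theta, lam):
    """J(theta) = 0.5 * ||X theta - y||_2^2 + lam * ||theta||_1."""
    residual = X @ theta - y
    return 0.5 * (residual @ residual) + lam * np.sum(np.abs(theta))

# Toy example (made-up numbers, not the assignment data).
X = np.array([[1.0, 2.0], [3.0, 4.0]])
y = np.array([1.0, 2.0])
theta = np.array([1.0, -1.0])
value = l1_objective(X, y, theta, lam=0.5)
print(value)
```

Here $X\theta-\vec{y}=(-2,-3)$, so the squared-error term is $\frac{1}{2}(4+9)=6.5$ and the penalty is $0.5\cdot 2=1$, giving $J=7.5$.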

In [1]:
import os
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

As with assignment $1$ we first load in our data. Note that $X$ is a $20\times 100$ matrix and $\vec{y}$ is a $20\times 1$ vector, which means we have $100$ input features and $20$ training pairs. Hence we have $\theta\in\mathbb{R}^{101}$ once we account for the constant $x_0:=1$ column we will append.

In [49]:
xData=np.asmatrix(np.loadtxt("data/q3/x.dat"))
yData = np.loadtxt("data/q3/y.dat")
m=xData.shape[0] # number of training examples
n=xData.shape[1]+1 # number of parameters: the input features plus the intercept term
converganceConst=0.00001
extraCol=np.ones((m,1))
xData=np.append(extraCol, xData, axis=1)
xData.shape

(20, 101)
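The shapes involved can be checked without the data files. The sketch below uses random stand-in arrays (an assumption, since `x.dat` and `y.dat` are not reproduced here) and plain `ndarray`s rather than `np.matrix`; the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
x_raw = rng.normal(size=(20, 100))   # stand-in for the contents of x.dat
y_raw = rng.normal(size=20)          # stand-in for the contents of y.dat

# Prepend the constant x_0 := 1 column, giving 101 columns in total.
X = np.hstack([np.ones((x_raw.shape[0], 1)), x_raw])
y = y_raw.reshape(-1, 1)             # explicit 20 x 1 column vector
theta = np.zeros((X.shape[1], 1))    # one parameter per column of X

print(X.shape, y.shape, theta.shape, (X @ theta - y).shape)
```

With these shapes the residual $X\theta-\vec{y}$ is $20\times 1$, matching the dimensions stated above.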

To perform coordinate descent on $J$ we need to minimise it over one $\theta_i$ at a time, but this is difficult with the $||\theta||_1=\sum_i|\theta_i|$ term there, as $|\theta_i|$ is not differentiable at $\theta_i=0$. Hence the assignment has us rewrite $J$ cleverly: if we fix the sign $s_i\in\{-1,+1\}$ of $\theta_i$ then $|\theta_i|=s_i\theta_i$, which we can differentiate. Setting the derivative with respect to $\theta_i$ to zero gives the quantity

$$\theta_i = \frac{-X_i^T(X\bar{\theta}-\vec{y})-\lambda s_i}{X_i^TX_i},$$
where $X_i$ is the $i$th column of $X$, $\bar{\theta}$ is $\theta$ with the $i$th entry set to $0$ and $s_i$ is the assumed sign of $\theta_i$. For $s_i=+1$ the update is the max of $0$ and this quantity, and for $s_i=-1$ it is the min of $0$ and this quantity; of the two candidates we keep the one giving the lower $J$. We first write a function to update a single $\theta_i$.

In [54]:
def updateSingleTheta(i, X, y, theta, lambdaConst):
    Xi=X[i]
    thetai=theta[i]
    thetaBar=theta
    thetaBar[i]=0
    numerator=-np.transpose(Xi)*(X*thetaBar-y)-lambdaConst*np.sign(thetai)
    denominator=np.transpose(Xi)*Xi
    return np.maximum(numerator/denominator, 0)
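For comparison, here is a sketch of the single-coordinate update on plain 1-D arrays that tries both signs: with $s_i=+1$ the candidate is clipped below at $0$, with $s_i=-1$ it is clipped above at $0$, and the candidate with the lower cost is kept. The names `objective` and `update_coordinate` and the toy data are illustrative assumptions, not the assignment's reference solution.

```python
import numpy as np

def objective(X, y, theta, lam):
    r = X @ theta - y
    return 0.5 * (r @ r) + lam * np.sum(np.abs(theta))

def update_coordinate(i, X, y, theta, lam):
    """Value of theta[i] minimising J with all other coordinates held fixed."""
    Xi = X[:, i]                       # i-th column of X
    theta_bar = theta.copy()           # copy, so the caller's theta is untouched
    theta_bar[i] = 0.0
    rho = -Xi @ (X @ theta_bar - y)    # -X_i^T (X theta_bar - y)
    denom = Xi @ Xi                    # X_i^T X_i
    candidates = (max(0.0, (rho - lam) / denom),   # s_i = +1
                  min(0.0, (rho + lam) / denom))   # s_i = -1

    def cost(value):
        trial = theta_bar.copy()
        trial[i] = value
        return objective(X, y, trial, lam)

    return min(candidates, key=cost)

# Toy check with an identity design matrix (made-up numbers).
X = np.eye(2)
y = np.array([3.0, -4.0])
theta = np.zeros(2)
first = update_coordinate(0, X, y, theta, lam=1.0)
second = update_coordinate(1, X, y, theta, lam=1.0)
print(first, second)
```

With $X=I$ and $\lambda=1$ the two coordinates move to $2$ and $-3$: each target is shrunk towards zero by $\lambda$.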

We now combine these together to update the whole $\theta$ vector.

In [55]:
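def updateTheta(X, y, theta, lambdaConst):
    diffs=[]
    for i in range(n):
        thetai=theta[i,0] # old value as a scalar copy; theta[i] alone is a view that tracks later writes
        theta[i]=updateSingleTheta(i, X, y, theta, lambdaConst)
        diffs.append(abs(theta[i,0]-thetai)) # change in coordinate i only
    maxDiff=np.amax(diffs) # largest change over the whole pass
    return theta, maxDiff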

We now need to write a function that repeats this update until our $\theta$ converges. The assignment tells us to stop once every coordinate changes by less than $10^{-5}$ in a single pass. We do this as part of the `l1ls(X, y, lambda)` function the assignment asks us to create.

In [56]:
def l1ls(X, y, lambdaConst):
    iterate=True
    theta=np.zeros((n,1))
    while(iterate):
        theta, maxDiff=updateTheta(X, y, theta, lambdaConst)
        if maxDiff < converganceConst:
            iterate=False
    return theta
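The whole procedure can also be sketched compactly on plain arrays. The version below is a hedged alternative, not the assignment's reference code: it writes the coordinate update in soft-thresholding form (which gives the same result as clipping at $0$ for each sign and keeping the better candidate), assumes no column of $X$ is all zeros, and adds an iteration cap `max_passes` that the assignment does not mention. The name `l1ls_sketch` is illustrative.

```python
import numpy as np

def l1ls_sketch(X, y, lam, tol=1e-5, max_passes=10_000):
    """Coordinate descent for 0.5*||X theta - y||_2^2 + lam*||theta||_1."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).ravel()
    theta = np.zeros(X.shape[1])
    col_sq = np.sum(X ** 2, axis=0)           # X_i^T X_i for every column
    for _ in range(max_passes):
        max_diff = 0.0
        for i in range(X.shape[1]):
            old = theta[i]
            theta[i] = 0.0                     # theta now plays the role of theta_bar
            rho = -X[:, i] @ (X @ theta - y)   # -X_i^T (X theta_bar - y)
            # Soft-thresholding: (rho - lam)/d if rho > lam,
            # (rho + lam)/d if rho < -lam, and 0 otherwise.
            theta[i] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[i]
            max_diff = max(max_diff, abs(theta[i] - old))
        if max_diff < tol:                     # every coordinate moved by less than tol
            break
    return theta

# Toy check with an identity design matrix (made-up numbers).
theta_hat = l1ls_sketch(np.eye(3), np.array([3.0, -4.0, 0.5]), lam=1.0)
print(theta_hat)
```

On this toy problem each target is shrunk towards zero by $\lambda$ and the small third target is set exactly to $0$, which is the sparsity that the $l_1$ penalty is meant to encourage.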

In [57]:
l1ls(xData, yData, 0.001)

ValueError: shapes (101,1) and (20,20) not aligned: 1 (dim 1) != 20 (dim 0)
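The traceback can be reproduced with stand-in arrays of the same shapes. A minimal sketch, assuming `y.dat` holds a single column so that `np.loadtxt` returns a 1-D array of shape `(20,)`: subtracting that from the $20\times 1$ matrix `X*theta` broadcasts to $20\times 20$, and indexing a matrix with `X[i]` selects row $i$ (shape `(1, 101)`) rather than column $i$. The arrays of ones are placeholders, not the assignment data.

```python
import numpy as np

X = np.asmatrix(np.ones((20, 101)))   # same shape as xData after the intercept column
y = np.ones(20)                       # 1-D, like the assumed result of loadtxt on y.dat
theta = np.zeros((101, 1))

residual = X * theta - y              # (20, 1) minus (20,) broadcasts
print(residual.shape)                 # (20, 20)
print(X[0].shape)                     # (1, 101): a row, not a column
print(X[:, 0].shape)                  # (20, 1): the first column

try:
    np.transpose(X[0]) * residual     # (101, 1) times (20, 20)
    raised = False
except ValueError as err:
    raised = True
    print(err)

# Reshaping y to a column and selecting a column of X gives matching shapes.
y_col = y.reshape(-1, 1)
fixed = np.transpose(X[:, 0]) * (X * theta - y_col)
print(fixed.shape)                    # (1, 1)
```

With `y` as a column and `X[:, i]` in place of `X[i]`, the product is the $1\times 1$ quantity that the update formula calls for.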