# Linear Classification

In this lab you will implement parts of a linear classification model using the regularized empirical risk minimization principle. By completing this lab and analysing the code, you will gain a deeper understanding of this type of model and of gradient descent.


## Problem Setting

The dataset describes the diagnosis of cardiac Single Photon Emission Computed Tomography (SPECT) images. Each patient is classified into one of two categories: normal (1) or abnormal (0). The training data contains 80 SPECT images from which 22 binary features have been extracted. The goal is to predict the label for an unseen test set of 187 tomography images.

In [1]:
import urllib.request
import pandas as pd
import numpy as np
# for auto-reloading external modules
# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
%load_ext autoreload
%autoreload 2

# download the training and test splits from the UCI repository
urllib.request.urlretrieve("http://archive.ics.uci.edu/ml/machine-learning-databases/spect/SPECT.train", "SPECT.train")
urllib.request.urlretrieve("http://archive.ics.uci.edu/ml/machine-learning-databases/spect/SPECT.test", "SPECT.test")

df_train = pd.read_csv('SPECT.train',header=None)
df_test = pd.read_csv('SPECT.test',header=None)

# convert the data frames to NumPy arrays
train = df_train.to_numpy()
test = df_test.to_numpy()

# column 0 holds the class label, the remaining 22 columns hold the features
y_train = train[:,0]
X_train = train[:,1:]
y_test = test[:,0]
X_test = test[:,1:]



In [2]:
#print(df_train)

### Exercise 1

Analyze the function learn_reg_ERM(X,y,lambda) which for a given $n\times m$ data matrix $\textbf{X}$ and binary class label $\textbf{y}$ learns and returns a linear model $\textbf{w}$.
The binary class label has to be transformed so that its range is $\left \{-1,1 \right \}$. 
The trade-off parameter between the empirical loss and the regularizer is given by $\lambda > 0$. 
Try to understand each step of the learning algorithm and comment each line.
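
For reference, the objective that the code below appears to minimize can be written out explicitly. This is a reading of the implementation, not a statement from the lab authors: the loss terms are summed (not averaged) over the $n$ training points, and $\textbf{x}_i$ denotes the $i$-th row of $\textbf{X}$.

$$\min_{\textbf{w}} \; \sum_{i=1}^{n} \max\left(0, 1 - y_{i}\, \textbf{x}_{i}^{T}\textbf{w}\right) + \frac{\lambda}{2} \textbf{w}^{T}\textbf{w}$$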

In [3]:
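def learn_reg_ERM(X, y, lbda):
    max_iter = 200   # maximum number of gradient steps
    e = 0.001        # stop once the update step is shorter than this
    alpha = 1.       # initial step size

    w = np.random.randn(X.shape[1])   # random initial weight vector
    for k in np.arange(max_iter):
        h = np.dot(X, w)              # model outputs for all training points
        l, lg = loss(h, y)            # hinge loss and its derivative w.r.t. h
        print('loss: {}'.format(np.mean(l)))
        r, rg = reg(w, lbda)          # regularizer value and its gradient
        g = np.dot(X.T, lg) + rg      # gradient of the full objective w.r.t. w
        if (k > 0):
            # adapt the step size from the change between consecutive gradients
            alpha = alpha * (np.dot(g_old.T, g_old)) / (np.dot((g_old - g).T, g_old))
        w = w - alpha * g             # gradient descent step
        if (np.linalg.norm(alpha * g) < e):
            break                     # the step is negligible, so stop early
        g_old = g                     # keep the gradient for the next step-size update
    return w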
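
Written as formulas, one iteration of the loop above does the following, where $\textbf{g}_k$ is the gradient computed in iteration $k$ and $\alpha_0 = 1$. This is only a transcription of the code, not a derivation of why the step-size rule works.

$$\alpha_k = \alpha_{k-1} \, \frac{\textbf{g}_{k-1}^{T}\textbf{g}_{k-1}}{(\textbf{g}_{k-1} - \textbf{g}_{k})^{T}\textbf{g}_{k-1}} \quad (k \geq 1), \qquad \textbf{w}_{k+1} = \textbf{w}_{k} - \alpha_k \textbf{g}_{k}$$

The loop stops early as soon as $\left\| \alpha_k \textbf{g}_k \right\| < e$.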

### Exercise 2

Fill in the code for the function loss(h,y) which computes the hinge loss and its gradient. 
This function takes a vector $\textbf{y}$ of true labels $\in \left \{-1,1\right \}$ and a vector $\textbf{h}$ of the function values of the linear model as inputs. It returns a vector $\textbf{l}$ with the hinge losses $l_{i} = \max(0, 1 - y_{i} h_{i})$ and a vector $\textbf{g}$ with the derivatives of the hinge loss at the points $h_i$: $g_{i} = -y_{i}$ if $l_{i} > 0$, else $g_{i} = 0$. By the chain rule, the gradient of $l_i$ with respect to the weight vector $\textbf{w}$ is then $g_{i} \textbf{x}_{i}$, that is $-y_{i} \textbf{x}_{i}$ if $l_{i} > 0$ and $\textbf{0}$ otherwise. The function learn_reg_ERM computes this product.

In [4]:
def loss(h, y):

    l = np.maximum(0,1 - y * h)
    g = -y * (l > 0)  # derivative w.r.t. h: -y where the margin is violated (l > 0), else 0
    
    return l, g
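
A small worked example may help to check the hinge loss by hand. The sketch below repeats the same logic under the hypothetical name `hinge_loss` so that it runs on its own. The four labels and function values are made up for illustration.

```python
import numpy as np

def hinge_loss(h, y):
    """Hinge loss and its derivative with respect to h (same logic as loss)."""
    l = np.maximum(0, 1 - y * h)
    g = -y * (l > 0)
    return l, g

# made-up labels and model outputs, chosen to cover the interesting cases
y = np.array([1, 1, -1, -1])
h = np.array([2.0, 0.5, -1.0, 0.3])

l, g = hinge_loss(h, y)
print(l)  # losses per point
print(g)  # derivatives per point
```

The first point is classified correctly with margin 2, so its loss and derivative are zero. The second is on the correct side but inside the margin (loss 0.5, derivative -1). The third sits exactly on the margin (loss 0). The fourth is misclassified (loss 1.3, derivative 1).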

### Exercise 3

Fill in the code for the function reg(w,lambda) which computes the $\mathcal{L}_2$-regularizer and the gradient of the regularizer function at point $\textbf{w}$. 


$$r = \frac{\lambda}{2} \textbf{w}^{T}\textbf{w}$$

$$g = \lambda \textbf{w}$$

In [5]:
def reg(w, lbda):
    r = (lbda/2) * w.dot(w.T)
    g = lbda * w
    return r, g
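
One way to gain confidence in a gradient implementation is a finite-difference check. The sketch below is a minimal, self-contained version for the regularizer. The name `l2_reg`, the random test vector, and the step `eps` are assumptions made for this illustration.

```python
import numpy as np

def l2_reg(w, lbda):
    """L2 regularizer and its gradient (same formulas as reg)."""
    r = (lbda / 2) * w.dot(w)
    g = lbda * w
    return r, g

rng = np.random.default_rng(0)
w = rng.standard_normal(5)   # arbitrary test point
lbda = 10.0
eps = 1e-6                   # finite-difference step

_, g_analytic = l2_reg(w, lbda)

# central differences, one coordinate at a time
g_numeric = np.zeros_like(w)
for i in range(w.size):
    step = np.zeros_like(w)
    step[i] = eps
    r_plus, _ = l2_reg(w + step, lbda)
    r_minus, _ = l2_reg(w - step, lbda)
    g_numeric[i] = (r_plus - r_minus) / (2 * eps)

max_abs_diff = np.max(np.abs(g_analytic - g_numeric))
print(max_abs_diff)  # should be close to zero
```

The same kind of check does not carry over directly to the hinge loss at points where $1 - y_i h_i = 0$, because the loss is not differentiable there.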

### Exercise 4

Fill in the code for the function predict(w,x) which predicts the class label $y$ for a data point $\textbf{x}$ or a matrix $X$ of data points (row-wise) for a previously trained linear model $\textbf{w}$. If only a single data point is given, the function should return a scalar value. If a matrix is given, it should return a vector of predictions.

In [6]:
def predict(w, X):

    preds = 2 * (np.dot(X,w) > 0) - 1
    
    return preds
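
The scalar and vector cases described above can be tried on a toy model. The weight vector and data points below are made up for illustration, and `predict` is repeated so that the snippet runs on its own.

```python
import numpy as np

def predict(w, X):
    # sign of the linear score, mapped to the labels -1 and 1
    return 2 * (np.dot(X, w) > 0) - 1

w = np.array([1.0, -1.0])            # made-up weight vector
x = np.array([1.0, 0.0])             # a single data point
X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])           # three data points, one per row

single = predict(w, x)
batch = predict(w, X)
print(single)  # scalar prediction
print(batch)   # one prediction per row
```

Note that a score of exactly zero (the third row) is mapped to -1, because the comparison is strict.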

### Exercise 5

#### 5.1 
Train a linear model on the training data and classify all 187 test instances afterwards using the function predict. 
Please note that the given class labels are in the range $\left \{0,1 \right \}$, however the learning algorithm expects a label in the range of $\left \{-1,1 \right \}$. Then, compute the accuracy of your trained linear model on both the training and the test data. 

In [7]:
# y_train = [-1 if y_i == 0 else 1 for y_i in y_train]
# y_test = [-1 if y_i == 0 else 1 for y_i in y_test]

# convert labels so that they are -1 or 1: 
# 2*0-1 = -1
# 2*1-1 = 1
y_train = 2 * y_train - 1
y_test = 2 * y_test - 1

w = learn_reg_ERM(X_train, y_train, 10)
y_hat_train = predict(w, X_train)
y_hat_test = predict(w, X_test)

loss: 1.4909402278161443
loss: 12.883554496385182
loss: 0.8736583157648209
loss: 1.2042370734046783
loss: 1.110854306512659
loss: 0.7460660314504904
loss: 0.9606878169757309
loss: 0.7169945490661647
loss: 0.720498780137845
loss: 0.7174356529627762
loss: 0.7300285298553355
loss: 0.7274899805915994
loss: 0.7182819258744523
loss: 0.7193391445123106
loss: 0.7189409422208869
loss: 0.7212500000000001
loss: 0.7680808131505948
loss: 0.7207932955282388
loss: 0.7185459609731646
loss: 0.7213820425527337
loss: 0.7193209498221477
loss: 0.7662500000000001
loss: 0.7216549367347408
loss: 0.7192856020498459
loss: 0.7276659617134514
loss: 0.7199270590666378
loss: 0.719581731515466
loss: 0.7562500000000001
loss: 0.7349987544374365
loss: 0.7250555642292088
loss: 0.7281358077352313
loss: 0.7184119721029008
loss: 0.7193887177566441
loss: 0.7185385965430597
loss: 0.7190818758826
loss: 0.7186529966392009
loss: 0.7188231633587823
loss: 0.7184549692957758
loss: 0.7183234236496763
loss: 0.7189446094150627
loss: 
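
The exercise also asks for the accuracy on both data sets, which the cell above does not compute. A minimal sketch of such a computation follows. The helper name `accuracy` and the toy label vectors are assumptions made for this illustration.

```python
import numpy as np

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return np.mean(np.asarray(y_true) == np.asarray(y_pred))

# toy labels in {-1, 1}: three of the five predictions are correct
y_true = np.array([1, -1, 1, 1, -1])
y_pred = np.array([1, -1, -1, 1, 1])

acc = accuracy(y_true, y_pred)
print(acc)
```

In the notebook, the same helper could be applied as `accuracy(y_train, y_hat_train)` and `accuracy(y_test, y_hat_test)`.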

#### 5.2
Compare the accuracy of the linear model with the accuracy of a random forest and a decision tree on the training and test data set.

In [8]:
##################
#INSERT CODE HERE#
##################
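
One possible way to set up this comparison is with scikit-learn, assuming it is installed. The sketch below is not a reference solution. To stay self-contained it uses synthetic binary data with the same shapes as the SPECT splits (80 and 187 rows, 22 features) and a made-up labelling rule, so its numbers say nothing about the real data set. All names ending in `_demo` are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# synthetic stand-in for the SPECT data: binary features, same shapes
X_train_demo = rng.integers(0, 2, size=(80, 22))
X_test_demo = rng.integers(0, 2, size=(187, 22))

def make_labels(X):
    # made-up rule: label 1 if at least two of the first three features are set
    return np.where(X[:, :3].sum(axis=1) >= 2, 1, -1)

y_train_demo = make_labels(X_train_demo)
y_test_demo = make_labels(X_test_demo)

models = {
    "decision tree": DecisionTreeClassifier(random_state=0),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train_demo, y_train_demo)
    # score() returns the mean accuracy on the given data
    scores[name] = (model.score(X_train_demo, y_train_demo),
                    model.score(X_test_demo, y_test_demo))
    print('{}: train {:.3f}, test {:.3f}'.format(name, *scores[name]))
```

In the notebook, `X_train`, `y_train`, `X_test` and `y_test` would take the place of the synthetic arrays, and the resulting accuracies can be set against those of the linear model.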