## Mathematical Principles in Pattern Recognition (2016/2017)
$\newcommand{\bPhi}{\mathbf{\Phi}}$
$\newcommand{\bb}{\mathbf{b}}$
$\newcommand{\bx}{\mathbf{x}}$
$\newcommand{\bw}{\mathbf{w}}$
$\newcommand{\bt}{\mathbf{t}}$
$\newcommand{\by}{\mathbf{y}}$
$\newcommand{\bm}{\mathbf{m}}$
$\newcommand{\bS}{\mathbf{S}}$
$\newcommand{\bI}{\mathbf{I}}$
$\newcommand{\bA}{\mathbf{A}}$
$\newcommand{\bQ}{\mathbf{Q}}$
$\newcommand{\bR}{\mathbf{R}}$
$\newcommand{\bX}{\mathbf{X}}$
$\newcommand{\bsigma}{\boldsymbol{\sigma}}$
$\newcommand{\bmu}{\boldsymbol{\mu}}$
$\newcommand{\bpi}{\boldsymbol{\pi}}$

# Lab 5

In the computer labs we will work with the Python programming language within a Jupyter notebook. Each week a new notebook is made available that contains the exercises that are to be handed in.

* You are expected to work in pairs
* Only one of each pair has to submit on Blackboard. Make sure that you add the student ID of your partner in the submission comments.
* The main notebook file you submit should read "Lab[number]_[last name 1]_[last name 2].ipynb", for example "Lab2_Bongers_Versteeg.ipynb". 
* Please make sure your code will run without problems!

Feel free to ask any questions during the computer lab sessions, or email the TA, Elise (e.e.vanderpol@uva.nl).


**The due date for the labs is Friday, Oct 27 at 23:59**

In [None]:
%pylab inline

In this tutorial we will work on one final project, instead of the step-by-step exercises of previous labs.

## 1. Final project: Overfitting
**[100 points]** Create a project about overfitting in the remainder of this notebook using markdown cells for equations and comments and code cells for code. Make sure to touch upon the following topics:
1. Use the wine data set to show what *overfitting* is in terms of a regression problem. (see: white_data.npy, white_targets.npy, red_data.npy, red_targets.npy)
2. Discuss how low and high *bias* and *variance* come into play here using figure(s), and write down what *model complexity* has to do with it.
3. One way to deal with your overfitted data in a frequentist setting is regularized regression. Use your pick of regularized regression here and apply a cross-validation scheme to determine the regularization parameter $\lambda$. 
4. Finally, briefly explain the Bayesian point of view on what you have done and how this would prevent overfitting. How could you use the Bayesian method to select the best model for your data? Contrast model averaging with model selection and use the latter to select a good model.

For more background information, refer to Bishop 1.1, 1.3, 1.5, 3.1.4, 3.2, 3.4!
 
Notes on implementation:
* Make sure that your hand-in is self-contained and readable from start to end, with an introduction about overfitting and an overall conclusion or outlook.
* This time we emphasize code cleanness and will allocate **[20 points]** to the readability of your code and graphical output.
* Use your own implementations instead of standard Python machine-learning tools, like `scikit-learn`. More standard modules like `numpy` are allowed as always.
* As always: make sure you submit all included data and files necessary to run your notebook out-of-the-box!

In [None]:
# We converted the data files to python 3
def read_input_data():
    white_feat = load('white_data_beter.npy')
    white_tar = load('white_targets_beter.npy')
    red_feat = load('red_data_beter.npy')
    red_tar = load('red_targets_beter.npy')
    return white_feat, white_tar, red_feat, red_tar


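def design_matrix(n, white_feat, red_feat):
    """Build a design matrix with a bias column and polynomial terms up to degree n.

    The white-wine rows come first, followed by the red-wine rows.
    """
    white_size = shape(white_feat)
    red_size = shape(red_feat)
    feat_size = 1 + n*white_size[1]
    # Column 0 stays at one and acts as the bias term
    X = ones((white_size[0]+red_size[0], feat_size))
    if n == 1:
        X[:white_size[0], 1:] = white_feat
        X[white_size[0]:, 1:] = red_feat
        return X
    elif n == 2:
        # Linear terms first, then the element-wise squares
        X[:white_size[0], 1:white_size[1]+1] = white_feat
        X[white_size[0]:, 1:white_size[1]+1] = red_feat
        X[:white_size[0], white_size[1]+1:] = square(white_feat)
        X[white_size[0]:, white_size[1]+1:] = square(red_feat)
        return X
    else:
        raise ValueError("only polynomial degrees 1 and 2 are implemented")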
    
def train_test_split(X, target, ratio):
    """Shuffle the rows, then put a fraction `ratio` in the training set and the rest in the test set."""
    assert len(X) == len(target)
    p = numpy.random.permutation(len(target))
    X, target = X[p], target[p]

    # Index of the first test row
    split = int(floor(len(X)*ratio))
    X_train, X_test = X[:split, :], X[split:, :]
    target_train, target_test = target[:split, :], target[split:, :]

    return X_train, X_test, target_train, target_test


In [None]:

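def gradient_descent(alpha, n, X, y, labda):
    """Run n steps of batch gradient descent on the L2-regularized squared error."""
    w = ones((shape(X)[1], 1))
    for i in range(n):
        # Gradient of (1/2N)*||Xw - y||^2 + labda*w^T w; the penalty shrinks w towards zero
        w -= alpha * ((X.T@(X@w-y))/shape(y)[0] + 2*labda*w)
    return w

def cost(X, w, y, labda):
    """Regularized cost (1/2N)*||Xw - y||^2 + labda*w^T w, the objective of gradient_descent."""
    residual = X@w-y
    temp = (residual.T@residual) / (2*shape(X)[0]) + labda*(w.T@w)
    return temp[0,0]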
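
A way to sanity-check a gradient descent routine like the one above is to compare it with the closed-form ridge solution. The sketch below assumes the objective $\frac{1}{2N}\|\bX\bw - \bt\|^2 + \lambda \bw^T\bw$, whose stationary point satisfies $(\bX^T\bX/N + 2\lambda\bI)\bw = \bX^T\bt/N$. The function names and the synthetic data are hypothetical stand-ins, not part of the lab code, and like the cell above the sketch also penalizes the bias weight.

```python
import numpy as np

def ridge_gradient_descent(X, y, lam, alpha=0.1, n_iter=5000):
    """Minimise (1/2N)*||Xw - y||^2 + lam*||w||^2 by batch gradient descent."""
    n_samples = X.shape[0]
    w = np.ones((X.shape[1], 1))
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y) / n_samples + 2 * lam * w
        w -= alpha * grad
    return w

def ridge_closed_form(X, y, lam):
    """Solve (X^T X / N + 2*lam*I) w = X^T y / N, the stationary point of the same objective."""
    n_samples, n_features = X.shape
    A = X.T @ X / n_samples + 2 * lam * np.eye(n_features)
    return np.linalg.solve(A, X.T @ y / n_samples)

# Synthetic stand-in for the wine data (hypothetical values, already well scaled)
rng = np.random.default_rng(0)
X_demo = np.hstack([np.ones((200, 1)), rng.normal(size=(200, 3))])
true_w = np.array([[5.0], [1.0], [-2.0], [0.5]])
y_demo = X_demo @ true_w + 0.1 * rng.normal(size=(200, 1))

w_gd = ridge_gradient_descent(X_demo, y_demo, lam=0.01)
w_cf = ridge_closed_form(X_demo, y_demo, lam=0.01)
print("largest difference between the two solutions:", np.abs(w_gd - w_cf).max())
```

If the two weight vectors disagree noticeably, either the gradient has a sign or scaling error or the learning rate and iteration count are not sufficient for convergence.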

In [None]:
labda = 0.001 # magic number
alpha = 0.0001 # magic number
iterations = 100000
n = 2


w_f, w_t, r_f, r_t = read_input_data()

for j in range(5):
    alpha *= 0.1
    for i in range(1,n+1):
        X = design_matrix(i, w_f, r_f)
        target = append(w_t, r_t, axis=0)
        X_train, X_test, target_train, target_test =  train_test_split(X, target, 0.90)

        w = gradient_descent(alpha, iterations, X_train, target_train, labda)
        wine_cost = cost(X_test, w, target_test, labda)
        print("The cost of the test set is %s."%(wine_cost))
        print("There are polynomials of degree %d in the design matrix and the learning rate is %s.\n"%(i,alpha) )
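
The loop above lowers the learning rate until gradient descent stops diverging. A common alternative, sketched here under the assumption that the divergence is caused by features on very different scales (squared features in particular), is to standardize every feature with the training mean and standard deviation before adding the bias column. The helper `standardize` and the synthetic data are hypothetical and only illustrate the idea.

```python
import numpy as np

def standardize(X_train, X_test):
    """Scale both sets with the mean and standard deviation of the training set."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std[std == 0] = 1.0  # guard against constant columns
    return (X_train - mean) / std, (X_test - mean) / std

# Synthetic features on very different scales (hypothetical stand-in for the wine data)
rng = np.random.default_rng(1)
scales = np.array([1.0, 50.0, 0.01])
raw_train = rng.normal(size=(100, 3)) * scales + 10.0
raw_test = rng.normal(size=(20, 3)) * scales + 10.0

Z_train, Z_test = standardize(raw_train, raw_test)

# Add the bias column only after scaling, so it stays equal to one
X_train_demo = np.hstack([np.ones((len(Z_train), 1)), Z_train])
print(Z_train.mean(axis=0), Z_train.std(axis=0))
```

The test set is scaled with the training statistics on purpose, so that no information from the test data leaks into the model.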

### Evaluation of test results
The wine dataset has been used to train several linear regression models. The test results above show that the design matrix containing second-degree polynomial terms does not give a useful model: the regression does not converge until the learning rate is lower than 1e-9. In comparison, the model with only linear terms converges to a decent result at a learning rate that is 1e4 times higher. We take this as a sign that adding second-degree polynomials makes the linear regression model overfit, although the learning rate that is needed also depends on the scale of the squared features.
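
Overfitting is usually made visible by comparing the error on the training set with the error on held-out data as the model complexity grows. The following is a minimal sketch on synthetic one-dimensional data (a noisy sine curve, chosen here purely for illustration and not taken from the wine data); `poly_features` and `mse` are hypothetical helper names.

```python
import numpy as np

def poly_features(x, degree):
    """Design matrix with columns 1, x, x^2, ..., x^degree."""
    return np.vander(x, degree + 1, increasing=True)

def mse(Phi, w, t):
    """Mean squared error of the predictions Phi @ w against the targets t."""
    return float(np.mean((Phi @ w - t) ** 2))

def true_function(x):
    return np.sin(np.pi * x)

# Small training set, larger test set, both with Gaussian noise
rng = np.random.default_rng(2)
x_train = rng.uniform(-1, 1, size=15)
x_test = rng.uniform(-1, 1, size=200)
t_train = true_function(x_train) + 0.2 * rng.normal(size=15)
t_test = true_function(x_test) + 0.2 * rng.normal(size=200)

train_err, test_err = [], []
for degree in range(1, 10):
    Phi_train = poly_features(x_train, degree)
    # Unregularized least-squares fit
    w, *_ = np.linalg.lstsq(Phi_train, t_train, rcond=None)
    train_err.append(mse(Phi_train, w, t_train))
    test_err.append(mse(poly_features(x_test, degree), w, t_test))
    print(degree, train_err[-1], test_err[-1])
```

Because the models are nested, the training error cannot increase with the degree. A test error that stops improving or grows while the training error keeps shrinking is the signature of overfitting: low bias, high variance.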

In [None]:
# What happens to the cost when we increase the regularization parameter?
c = []
labda_space = linspace(0.1, 0.4, 15)
for labda_value in labda_space:
    w = gradient_descent(alpha, iterations, X_train, target_train, labda_value)
    c.append(cost(X_test, w, target_test, labda_value))
c = array(c)

plt.plot(labda_space, c)
plt.xlabel("regularization parameter lambda")
plt.ylabel("cost on the test set")
plt.show()
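
The sweep above scores every value of the regularization parameter on a single test split. A cross-validation scheme for choosing $\lambda$ could look like the sketch below, which assumes the ridge objective $\frac{1}{2N}\|\bX\bw - \bt\|^2 + \lambda \bw^T\bw$ and solves it in closed form instead of by gradient descent. The names `ridge_fit` and `k_fold_mse`, the grid of $\lambda$ values, and the synthetic data are hypothetical.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form minimiser of (1/2N)*||Xw - y||^2 + lam*||w||^2."""
    n_samples, n_features = X.shape
    A = X.T @ X / n_samples + 2 * lam * np.eye(n_features)
    return np.linalg.solve(A, X.T @ y / n_samples)

def k_fold_mse(X, y, lam, k=5, seed=0):
    """Average validation MSE over k folds; the fixed seed gives every lam the same folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    errors = []
    for i in range(k):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        w = ridge_fit(X[train_idx], y[train_idx], lam)
        # Validation error is the plain MSE, without the penalty term
        errors.append(np.mean((X[val_idx] @ w - y[val_idx]) ** 2))
    return float(np.mean(errors))

# Synthetic stand-in for the wine data (hypothetical values)
rng = np.random.default_rng(3)
X_demo = np.hstack([np.ones((120, 1)), rng.normal(size=(120, 5))])
y_demo = X_demo @ rng.normal(size=(6, 1)) + 0.5 * rng.normal(size=(120, 1))

lambdas = np.logspace(-4, 1, 11)
cv_errors = [k_fold_mse(X_demo, y_demo, lam) for lam in lambdas]
best_lambda = lambdas[int(np.argmin(cv_errors))]
print("lambda with the lowest cross-validation error:", best_lambda)
```

After $\lambda$ has been chosen this way, the model is normally refitted on all training data and evaluated once on a test set that played no role in the selection.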

In [None]:
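def plot_result(X, w, y, dimension=1):
    """Scatter targets and predictions against one column of the design matrix."""
    plt.scatter(X[:, dimension], y, alpha=0.3, label="target")
    plt.scatter(X[:, dimension], X@w, alpha=0.3, label="prediction")
    plt.xlabel("design matrix column %d" % dimension)
    plt.legend()

# Plot test data and predictions against the first 12 columns of the design matrix
# (column 0 is the constant bias column)
plt.figure(figsize=(15,20))
for dim in range(12):
    plt.subplot(6, 2, dim + 1)
    plot_result(X_test, w, target_test, dimension=dim)
plt.show()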
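
From the Bayesian point of view, models of different complexity can be compared through their evidence (marginal likelihood), which needs no separate validation set. Model selection keeps the single model with the highest evidence, whereas model averaging would weight the predictions of all models by their posterior probabilities. The sketch below evaluates the log evidence of Bayesian linear regression with a Gaussian prior $\mathcal{N}(\bw \mid \mathbf{0}, \alpha^{-1}\bI)$ and noise precision $\beta$, following the expression in Bishop, Section 3.5. The values of `alpha` and `beta`, the helper names, and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def poly_features(x, degree):
    """Design matrix with columns 1, x, ..., x^degree."""
    return np.vander(x, degree + 1, increasing=True)

def log_evidence(Phi, t, alpha, beta):
    """ln p(t | alpha, beta) for prior N(w | 0, I/alpha) and Gaussian noise with precision beta."""
    N, M = Phi.shape
    A = alpha * np.eye(M) + beta * Phi.T @ Phi
    m_N = beta * np.linalg.solve(A, Phi.T @ t)  # posterior mean of the weights
    E_mN = 0.5 * beta * np.sum((t - Phi @ m_N) ** 2) + 0.5 * alpha * np.sum(m_N ** 2)
    _, logdet_A = np.linalg.slogdet(A)
    return (0.5 * M * np.log(alpha) + 0.5 * N * np.log(beta)
            - E_mN - 0.5 * logdet_A - 0.5 * N * np.log(2 * np.pi))

# Synthetic data: noisy sine curve (hypothetical stand-in for the wine data)
rng = np.random.default_rng(4)
noise_std = 0.3
x_demo = rng.uniform(-1, 1, size=30)
t_demo = np.sin(np.pi * x_demo) + noise_std * rng.normal(size=30)

alpha = 5e-3                 # assumed prior precision
beta = 1.0 / noise_std ** 2  # noise precision, assumed known here

degrees = list(range(0, 9))
evidences = [log_evidence(poly_features(x_demo, d), t_demo, alpha, beta) for d in degrees]
best_degree = degrees[int(np.argmax(evidences))]
print("degree with the highest log evidence:", best_degree)
```

The evidence penalizes unnecessary complexity automatically, because a more flexible model spreads its prior probability over more possible data sets.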
