# Learning From Data, Homework 6

## Overfitting and deterministic noise

### Problem 1

Deterministic noise depends on $\mathcal{H}$, as some models approximate $f$ better than others. Assume that $\mathcal{H'} \subset \mathcal{H}$ and that $f$ is fixed. **In general** (but not necessarily in all cases), if we use $\mathcal{H'}$ instead of $\mathcal{H}$, how does deterministic noise behave?

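Since $\mathcal{H'} \subset \mathcal{H}$, the best approximation to $f$ available in $\mathcal{H'}$ can be no better than the best approximation available in $\mathcal{H}$, and in general it is worse. Deterministic noise is the part of $f$ that the hypothesis set cannot capture, so switching to the smaller set $\mathcal{H'}$ will in general increase it.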

Answer: [**b**] In general, deterministic noise will increase.
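The argument can be checked numerically with a small sketch. Everything in it is an illustrative assumption rather than part of the problem: the target $f(x) = \sin(\pi x)$, the nested hypothesis sets (polynomials of degree 1 inside polynomials of degree 5), and the use of a least-squares fit on a dense grid as a stand-in for the best hypothesis in each set.

```python
import numpy as np

def target(x):
    # Hypothetical target function f, chosen only for illustration
    return np.sin(np.pi * x)

def deterministic_noise(degree, num_points=1000):
    """Mean squared gap between f and its best polynomial fit of this degree.

    The least-squares fit on a dense grid approximates the best hypothesis
    in the set of polynomials of the given degree.
    """
    x = np.linspace(-1.0, 1.0, num_points)
    y = target(x)
    coeffs = np.polyfit(x, y, degree)
    best_fit = np.polyval(coeffs, x)
    return np.mean((y - best_fit) ** 2)

# Degree-1 polynomials form a subset of degree-5 polynomials
noise_small = deterministic_noise(degree=1)
noise_large = deterministic_noise(degree=5)
print(noise_small, noise_large)
```

Because the two sets are nested and fitted on the same grid, the residual of the smaller set can never be below that of the larger set, which mirrors the reasoning above.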



## Regularization with Weight Decay

In the following problems use the data provided in the files:

http://work.caltech.edu/data/in.dta

http://work.caltech.edu/data/out.dta

as a training and test set respectively. Each line of the files corresponds to a two-dimensional input $\mathbf{x} = (x_1, x_2)$, so that $\mathcal{X} = \mathbb{R}^2$, followed by the corresponding label from $\mathcal{Y} = \{-1, 1\}$. We are going to apply Linear Regression with a non-linear transformation for classification. The non-linear transformation is given by

$\Phi(x_1, x_2) = (1, x_1, x_2, x_1^2, x_2^2, x_1x_2, \left|x_1 - x_2\right|, \left|x_1 + x_2\right|)$.
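As a minimal sketch, the transformation can be written as a standalone NumPy function. The name `transform` and the choice to take an $N \times 2$ array as input are assumptions made for this illustration.

```python
import numpy as np

def transform(X):
    """Map each row (x1, x2) of X to the 8-dimensional feature vector Phi(x1, x2)."""
    X = np.asarray(X, dtype=float)
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack((
        np.ones_like(x1),   # constant term
        x1,
        x2,
        x1 ** 2,
        x2 ** 2,
        x1 * x2,
        np.abs(x1 - x2),
        np.abs(x1 + x2),
    ))

# Single point (x1, x2) = (1, 2)
Z = transform([[1.0, 2.0]])
print(Z)
```

For the point $(1, 2)$ the features are $(1, 1, 2, 1, 4, 2, 1, 3)$, which is a quick way to confirm the column order matches $\Phi$.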

Recall that the classification error is defined as the fraction of misclassified points.
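The definition can be sketched in a few lines. The function name `classification_error` and the toy numbers are hypothetical; the only thing taken from the text is that the error is the fraction of points whose predicted sign differs from the label.

```python
import numpy as np

def classification_error(w, Z, y):
    """Fraction of rows of Z whose predicted label sign(Z w) differs from y.

    A prediction of exactly 0 matches neither label, so it counts as an error.
    """
    predictions = np.sign(Z @ w)
    return np.mean(predictions != y)

# Toy example: four points, two features each (made-up values)
Z_toy = np.array([[1.0, 2.0],
                  [1.0, -1.0],
                  [1.0, 0.5],
                  [1.0, -3.0]])
w_toy = np.array([0.0, 1.0])
y_toy = np.array([1.0, -1.0, -1.0, -1.0])

error_toy = classification_error(w_toy, Z_toy, y_toy)
print(error_toy)
```

Here only the third point is misclassified, so one point in four is wrong.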

### Problem 2

Run Linear Regression on the training set after performing the non-linear transformation. What values are closest (in Euclidean distance) to the in-sample and out-of-sample classification errors, respectively?
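For reference, here is the closed-form solution in matrix notation. The symbols are assumptions introduced for this note: $Z$ is the matrix whose rows are $\Phi(\mathbf{x}_n)$, $\mathbf{y}$ is the vector of labels, $N$ is the number of training points, and $\lambda \ge 0$ is the weight-decay parameter (`lamd` in the code). Minimizing the augmented error

$$E_{aug}(\mathbf{w}) = \frac{1}{N}\left\|Z\mathbf{w} - \mathbf{y}\right\|^2 + \frac{\lambda}{N}\mathbf{w}^T\mathbf{w}$$

by setting its gradient to zero gives

$$\frac{2}{N}Z^T(Z\mathbf{w} - \mathbf{y}) + \frac{2\lambda}{N}\mathbf{w} = 0 \quad\Longrightarrow\quad \mathbf{w}_{reg} = (Z^TZ + \lambda I)^{-1}Z^T\mathbf{y}.$$

Plain Linear Regression is the special case $\lambda = 0$, assuming $Z^TZ$ is invertible, and the resulting classifier is $\mathrm{sign}(\mathbf{w}_{reg}^T\Phi(\mathbf{x}))$.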


import numpy as np
import pandas as pd

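def linear_regression(data, W=None, lamd=0):
    """Return (classification error, weights) on `data`.

    If W is None, fit W by linear regression with weight decay `lamd`;
    otherwise evaluate the given W on `data`.
    """
    # Extract and transform data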
    X1 = data[:,0]
    X2 = data[:,1]
    Y = data[:,[2]]
    
    Z = np.column_stack((np.ones(X1.shape), 
                         X1, 
                         X2, 
                         X1**2, 
                         X2**2,
                         X1*X2,
                         np.abs(X1 - X2),
                         np.abs(X1 + X2)))
    # Compute W if not given: solve (Z^T Z + lamd * I) W = Z^T Y
    if W is None:
        Z_square = np.dot(Z.T, Z)
        W = np.linalg.solve(Z_square + lamd * np.identity(Z_square.shape[0]), np.dot(Z.T, Y))

    # Classify with sign(Z W) and report the fraction of misclassified points
    Y_hat = np.sign(np.dot(Z, W))
    error = np.mean(Y_hat != Y)
    return error, W

def run_regression(train, test, lamd=0):
    # Compute E_in and E_out
    E_in, W_lin = linear_regression(train, lamd=lamd)
    E_out, _ = linear_regression(test, W=W_lin)
    return E_in, E_out

# Read in data (whitespace-separated columns: x1, x2, label)
train = pd.read_table("http://work.caltech.edu/data/in.dta", sep=r"\s+", header=None).to_numpy()
test = pd.read_table("http://work.caltech.edu/data/out.dta", sep=r"\s+", header=None).to_numpy()

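# Compute E_in and E_out without regularization
print(run_regression(train, test))

# Repeat with weight decay for lambda = 1e-3 and lambda = 1e3
print(run_regression(train, test, lamd=10**(-3)))
print(run_regression(train, test, lamd=10**(3)))

# Sweep lambda = 10^k for k = 2, 1, 0, -1, -2
for i in [2, 1, 0, -1, -2]:
    print(i, ":", run_regression(train, test, lamd=10**(i)))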

(0.028571428571428571, 0.084000000000000005)
(0.028571428571428571, 0.080000000000000002)
(0.37142857142857144, 0.436)
2 : (0.20000000000000001, 0.22800000000000001)
1 : (0.057142857142857141, 0.124)
0 : (0.0, 0.091999999999999998)
-1 : (0.028571428571428571, 0.056000000000000001)
-2 : (0.028571428571428571, 0.084000000000000005)


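Without regularization ($\lambda = 0$), the first line of output gives an in-sample error of about 0.029 and an out-of-sample error of 0.084, which is closest to (0.03, 0.08).

Answer: [**a**] 0.03, 0.08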