In [1]:
from IPython.display import IFrame

In [2]:
IFrame("final.pdf", width=1000, height=1000)

## Problem 1

__[e]__

Since $Q=10$, the transform $\Phi_Q$ contains every monomial $x_1^i x_2^j$ with $1 \leq i + j \leq 10$, so its highest-order terms have order $10$. The book tells us that $\Phi_Q$ maps a two-dimensional vector $\textbf{x}$ to $$ \tilde{d} = \dfrac{Q(Q+3)}{2} $$ dimensions (not counting the constant coordinate). Plugging in $Q=10$, we have $$ \tilde{d} = \dfrac{10 \cdot 13}{2} = 65. $$
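The formula can be sanity-checked by brute force. This sketch counts the monomials $x_1^i x_2^j$ with $1 \leq i + j \leq Q$, assuming (as the formula implies) that the constant coordinate is not counted, and compares the count with $Q(Q+3)/2$ for small $Q$.

```python
def transformed_dimension(Q):
    """Count monomials x1^i * x2^j with 1 <= i + j <= Q (constant term excluded)."""
    return sum(1 for i in range(Q + 1) for j in range(Q + 1) if 1 <= i + j <= Q)

def closed_form(Q):
    """Closed form Q(Q+3)/2; Q(Q+3) is always even, so integer division is exact."""
    return Q * (Q + 3) // 2

# The brute-force count agrees with the closed form for every Q checked
for Q in range(1, 11):
    assert transformed_dimension(Q) == closed_form(Q)

print(transformed_dimension(10))
```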

## Problem 2

__[d]__

We can immediately eliminate choice __[a]__: if $\mathcal{H}$ is a singleton hypothesis set, then every dataset $\mathcal{D}$ yields the same final hypothesis $g^{(\mathcal{D})} = g$, the only element of $\mathcal{H}$, so $\bar{g} = g \in \mathcal{H}$.

Choice __[b]__ is also out: if $\mathcal{H}$ is the set of constant, real-valued hypotheses, then $\bar{g}$ is an average of real constants $g^{(\mathcal{D})}$, which is itself a real constant, so $\bar{g} \in \mathcal{H}$.

Now consider choice __[c]__. Linear regression outputs a hypothesis of the form $h(\textbf{x}) = \textbf{w}^T\textbf{x}$ (for one-dimensional input, a line $ax + b$). Averaging such hypotheses gives $$ \bar{g}(\textbf{x}) = \mathbb{E}_{\mathcal{D}}\left[\textbf{w}_{\mathcal{D}}\right]^T\textbf{x}, $$ which is again linear with real-valued weights, so we can eliminate this choice as well.

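This leaves logistic regression, choice __[d]__. Logistic regression outputs a hypothesis of the form $$ h(\textbf{x}) = \theta(\textbf{w}^T\textbf{x}), $$ where $\theta(s) = \frac{e^s}{1 + e^s}$. Because $\theta$ is nonlinear, an average such as $\frac{1}{2}\left[\theta(\textbf{w}_1^T\textbf{x}) + \theta(\textbf{w}_2^T\textbf{x})\right]$ is in general not equal to $\theta(\textbf{w}^T\textbf{x})$ for any $\textbf{w}$, so $\bar{g}$ need not be in $\mathcal{H}$.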
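A quick numerical contrast between the linear and logistic cases, assuming one-dimensional inputs with a bias term and two arbitrary illustrative weight vectors. Averaging two lines gives exactly the line with averaged weights. For the logistic case, if the average were $\theta(v_0 + v_1 x)$ for some $v$, then its logit would be linear in $x$ and its second difference would vanish; it does not.

```python
import math

def theta(s):
    """Logistic function theta(s) = e^s / (1 + e^s)."""
    return math.exp(s) / (1.0 + math.exp(s))

def logit(p):
    """Inverse of theta: logit(theta(s)) == s."""
    return math.log(p / (1.0 - p))

def linear(w, x):
    """Signal w0 + w1 * x for one-dimensional input x."""
    return w[0] + w[1] * x

# Two illustrative hypotheses (arbitrary weights chosen for this sketch)
w_a = (-5.0, 1.0)
w_b = (5.0, 1.0)

# Linear case: average of the two lines vs. the line with averaged weights
w_bar = ((w_a[0] + w_b[0]) / 2, (w_a[1] + w_b[1]) / 2)
linear_gap = max(
    abs((linear(w_a, x) + linear(w_b, x)) / 2 - linear(w_bar, x))
    for x in [-2.0, -1.0, 0.0, 1.0, 2.0]
)

# Logistic case: average of the two logistic hypotheses
def g_bar(x):
    return (theta(linear(w_a, x)) + theta(linear(w_b, x))) / 2

# Second difference of logit(g_bar) on x = 0, 1, 2; zero iff logit(g_bar) is linear there
second_diff = logit(g_bar(2.0)) - 2 * logit(g_bar(1.0)) + logit(g_bar(0.0))

print("linear gap:", linear_gap)            # averaged lines stay linear
print("logit second difference:", second_diff)  # nonzero: average is not theta(v0 + v1 x)
```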

## Problem 3

__[d]__

Overfitting is judged by comparing hypotheses: we compare the values of $(E_{out} - E_{in})$ across different hypotheses, and that comparison is what tells us whether we are overfitting.

## Problem 4

__[d]__

By definition, deterministic noise depends on both the hypothesis set _and_ the target function: it is the part of the target function that the best hypothesis in $\mathcal{H}$ cannot capture, so it reflects the hypothesis set's inability to fully approximate the target (the set is not powerful enough).

By contrast, stochastic noise depends only on the target distribution: the noise in the labels is generated by the target distribution $P(y \mid \textbf{x})$ and is not affected in any way by the choice of hypothesis set.
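A small illustration of the dependence on $\mathcal{H}$, assuming a hypothetical noiseless target $f(x) = x^2$ on $[-1, 1]$ and approximating expectations over a uniform input distribution with a dense grid. Deterministic noise is measured here as the mean squared gap between $f$ and its best least-squares approximation in $\mathcal{H}$; the target stays fixed while $\mathcal{H}$ changes.

```python
import numpy as np

x = np.linspace(-1.0, 1.0, 200001)  # dense grid standing in for uniform inputs on [-1, 1]
f = x ** 2                          # hypothetical target function (no stochastic noise)

def deterministic_noise(degree):
    """Mean squared gap between f and its least-squares fit by polynomials of this degree."""
    coeffs = np.polyfit(x, f, degree)   # best hypothesis in H_degree
    h_star = np.polyval(coeffs, x)
    return np.mean((f - h_star) ** 2)

noise_const = deterministic_noise(0)  # H0 (constants): analytic value is 4/45
noise_quad = deterministic_noise(2)   # H2 (quadratics) contains f, so this is essentially 0
print(noise_const, noise_quad)
```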

## Problem 5

__[a]__

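Page 130 of LfD states that if $\textbf{w}_{\text{lin}}^T\textbf{w}_{\text{lin}} \leq C$, then $\textbf{w}_{\text{reg}} = \textbf{w}_{\text{lin}}$, because $\textbf{w}_{\text{lin}} \in \mathcal{H}(C)$, where $$ \mathcal{H}(C) = \{ h \ \vert \ h(\textbf{x}) = \textbf{w}^T\textbf{z}, \ \textbf{w}^T\textbf{w} \leq C \}. $$ In other words, $\textbf{w}_{\text{lin}}$ already minimizes $E_{in}$ over all weight vectors and also satisfies the constraint, so it is the minimizer within $\mathcal{H}(C)$ as well.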
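To see the constraint become inactive, here is a minimal sketch that minimizes $E_{in}$ subject to $\textbf{w}^T\textbf{w} \leq C$ with projected gradient descent. This solver is swapped in for illustration and is not the method LfD uses; the synthetic data, step size, and iteration count are assumptions. With $C \geq \textbf{w}_{\text{lin}}^T\textbf{w}_{\text{lin}}$ it returns $\textbf{w}_{\text{lin}}$, while a smaller $C$ pushes the solution onto the boundary $\textbf{w}^T\textbf{w} = C$.

```python
import numpy as np

def constrained_least_squares(X, y, C, iters=2000):
    """Minimize E_in(w) = ||Xw - y||^2 / N subject to w^T w <= C (projected gradient descent)."""
    N, d = X.shape
    # Step size 1/L, with L = 2 * lambda_max(X^T X) / N the Lipschitz constant of grad E_in
    step = N / (2.0 * np.linalg.eigvalsh(X.T @ X).max())
    w = np.zeros(d)  # feasible starting point
    for _ in range(iters):
        grad = 2.0 / N * X.T @ (X @ w - y)
        w = w - step * grad
        norm_sq = w @ w
        if norm_sq > C:  # project back onto the ball w^T w <= C
            w = w * np.sqrt(C / norm_sq)
    return w

# Synthetic data (assumption for this sketch)
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.standard_normal(100)

w_lin = np.linalg.lstsq(X, y, rcond=None)[0]

w_loose = constrained_least_squares(X, y, C=w_lin @ w_lin + 1.0)  # constraint not binding
w_tight = constrained_least_squares(X, y, C=1.0)                  # constraint binding
print(np.allclose(w_loose, w_lin, atol=1e-6), w_tight @ w_tight)
```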

## Problem 6

__[b]__

We discussed how to translate a soft-order constraint, such as the basic regularization constraint $\textbf{w}^T\textbf{w} \leq C$, into an augmented error: $$ E_{aug}(\textbf{w}) = E_{in}(\textbf{w}) + \lambda\textbf{w}^T\textbf{w}, $$ where $\lambda \geq 0$ is now a free parameter that we choose in place of $C$.
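Setting the gradient of $E_{aug}$ to zero gives a closed form. Assuming the linear-regression error $E_{in}(\textbf{w}) = \frac{1}{N}\lVert Z\textbf{w} - \textbf{y}\rVert^2$, we get $\nabla E_{aug} = \frac{2}{N}Z^T(Z\textbf{w} - \textbf{y}) + 2\lambda\textbf{w} = 0$, so $\textbf{w}_{\text{reg}} = (Z^TZ + N\lambda I)^{-1}Z^T\textbf{y}$; if the penalty is written as $\frac{\lambda}{N}\textbf{w}^T\textbf{w}$ instead, the factor $N$ disappears. The sketch below checks this on synthetic data (an assumption), where `w_reg` and `grad_E_aug` are hypothetical helper names.

```python
import numpy as np

def w_reg(Z, y, lam):
    """Minimizer of E_aug(w) = ||Zw - y||^2 / N + lam * w^T w."""
    N, d = Z.shape
    return np.linalg.solve(Z.T @ Z + N * lam * np.eye(d), Z.T @ y)

def grad_E_aug(Z, y, lam, w):
    """Gradient of E_aug at w."""
    N = Z.shape[0]
    return 2.0 / N * Z.T @ (Z @ w - y) + 2.0 * lam * w

# Synthetic data (assumption for this sketch)
rng = np.random.default_rng(1)
Z = rng.standard_normal((50, 4))
y = rng.standard_normal(50)

for lam in [0.0, 0.1, 1.0]:
    w = w_reg(Z, y, lam)
    # The gradient vanishes at w_reg, and a larger lam shrinks w^T w
    print(lam, np.linalg.norm(grad_E_aug(Z, y, lam, w)), w @ w)
```

With `lam = 0.0` this reduces to ordinary least squares, i.e. $\textbf{w}_{\text{lin}}$.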

## Problem 7

In [3]:
train_file = 'features.train'
test_file = 'features.test'

In [4]:
def load_data(filename):    
    with open(filename, "r") as f:
        data = []
        for line in f:
            if line.strip():
                y, x1, x2 = line.split()
                data.append([ int(float(y)), float(x1), float(x2) ])
    return data

In [5]:
data_train = load_data(train_file)
data_test = load_data(test_file)

In [6]:
def k_vs_all(k, data):
    """
    Relabel the data for k-vs-all classification (like 1-vs-all):
    points whose digit equals k get label +1.0, all other points get -1.0.
    Returns (y_new, X), where X holds the two features [x1, x2] of each point.
    """
    y = [point[0] for point in data]
    X = [[point[1], point[2]] for point in data]

    y_new = [1.0 if label == k else -1.0 for label in y]

    return y_new, X

In [9]:
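def linearRegression(X, y):
    """Unregularized linear regression: w_lin = pinv(X) y."""
    # convert to np
    X = np.array(X)
    y = np.array(y)
    X_inv = np.linalg.pinv(X)  # Moore-Penrose pseudo-inverse

    # one-step (closed-form) least-squares solution w = X^+ y
    return X_inv.dot(y)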

In [7]:
def regularizedLinearRegression(samplePoints, l):
    """
    Perform linear regression with weight decay, where l is lambda.
    Each point is [features..., y]: y is assumed to be the last element.
    Minimizes E_in(w) + (l/N) * w^T w, whose closed form is
    w_reg = (X^T X + l*I)^{-1} X^T y.
    """
    points = np.array(samplePoints, dtype=float)
    X = points[:, :-1]  # feature columns
    y = points[:, -1]   # labels in the last column

    # solve (X^T X + l*I) w = X^T y rather than forming the inverse explicitly
    return np.linalg.solve(X.T.dot(X) + l * np.eye(X.shape[1]), X.T.dot(y))

In [22]:
def regularizedLinearRegression2(X, y, l):
    """
    Perform linear regression with weight decay, where l is lambda.
    Minimizes E_in(w) + (l/N) * w^T w, whose closed form is
    w_reg = (X^T X + l*I)^{-1} X^T y.
    X must already include any bias column.
    """
    X = np.array(X, dtype=float)
    y = np.array(y, dtype=float)
    d = X.shape[1]

    # solve (X^T X + l*I) w = X^T y rather than forming the inverse explicitly
    return np.linalg.solve(X.T.dot(X) + l * np.eye(d), X.T.dot(y))

In [11]:
def X_reshape(X):
    """Prepend a column of ones (the bias coordinate x0 = 1) to X."""
    num_ex = X.shape[0]
    X_res = np.c_[np.ones(num_ex), X]

    return X_res

In [12]:
def predict(weights, X):
    real_X = X_reshape(X)
    cur_h = np.matmul(real_X, weights)
    return cur_h

In [14]:
def calc_error(weights, X, Y):
    """Fraction of points in X (raw features, no bias column) misclassified by sign(w^T x)."""
    num_ex = X.shape[0]
    predicted = np.sign(predict(weights, X))
    num_incorrect = np.sum(np.not_equal(predicted, np.sign(Y)))
    prop_incorrect = float(num_incorrect) / float(num_ex)
    return prop_incorrect

In [15]:
def X_reshape_nlt(X):
    num_ex = X.shape[0]
    #nlt = (1, x1, x2, x1x2, x1^2, x2^2)
    X_mult = np.prod(X, axis=1)
    X_res = np.c_[np.ones(num_ex), X, X_mult, np.square(X)]
    return X_res

In [19]:
import numpy as np

In [23]:
for val in range(10):
    y_train, X_train = k_vs_all(val, data_train)
    y_test, X_test = k_vs_all(val, data_test)
    y_train, X_train = np.array(y_train), np.array(X_train)
    y_test, X_test = np.array(y_test), np.array(X_test)

    # no transform: train on z = (1, x1, x2); calc_error adds the bias column itself
    w = regularizedLinearRegression2(X_reshape(X_train), y_train, 1)
    E_in = calc_error(w, X_train, y_train)
    E_out = calc_error(w, X_test, y_test)

    print("___ %d vs all ___" % val)
    print("no transform: e_in = %f, e_out = %f" % (E_in, E_out))

    # transform: z = (1, x1, x2, x1*x2, x1^2, x2^2); X_reshape_nlt already includes the bias
    Xtrain_nlt = X_reshape_nlt(X_train)
    Xtest_nlt = X_reshape_nlt(X_test)
    w_nlt = regularizedLinearRegression2(Xtrain_nlt, y_train, 1)
    Ein_nlt = np.mean(np.sign(Xtrain_nlt.dot(w_nlt)) != y_train)
    Eout_nlt = np.mean(np.sign(Xtest_nlt.dot(w_nlt)) != y_test)

    print("transform: e_in = %f, e_out = %f" % (Ein_nlt, Eout_nlt))

## Problem 8

## Problem 9

## Problem 10

## Problem 11

## Problem 12

## Problem 13

## Problem 14

## Problem 15

## Problem 16

## Problem 17

## Problem 18

## Problem 19

## Problem 20