In this problem we tackle the learning problem for the canonical case of one-dimensional functions. This seemingly simple problem provides a rich test bed for understanding several interesting concepts both theoretically and empirically since almost everything can be visualized in this domain.




In [None]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
from ipywidgets import interactive
import ipywidgets as widgets
from ipywidgets import fixed


## Part a): Generating training data

Run the cells in part a) (i.e., up to the beginning of part b)) and visualize the noisy training data. Play around with the standard deviation of the additive white Gaussian noise. What do you observe?



### Generating $x_i$
Before we can learn a function from its samples we must first generate the training data $(x_i, y_i)_{i=1,\dots,n}$. Here $n$ is the number of training samples. In this problem we consider two ways of sampling $x_i$ for training data. <br>
1. $x_i$ sampled uniform randomly from $[-1,1]$.
2. $x_i$ from an evenly spaced grid on the interval $[-1,1]$.
   For example, for $n = 4$ we have the samples $(-1, -0.5, 0, 0.5)$. Note that the endpoint $1$ is not included in our training set.
   Evenly spaced samples of this kind give rise to interesting properties of the feature matrix when using Fourier features, as we will see later in this problem.
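
As a small illustration of the two sampling schemes, here is a minimal sketch that uses NumPy directly. The variable names and the seed are illustrative assumptions and are not part of the notebook's helpers.

```python
import numpy as np

n = 4

# Evenly spaced grid on [-1, 1): endpoint=False leaves out x = 1.
x_grid = np.linspace(-1, 1, n, endpoint=False)
print(x_grid)  # the samples -1, -0.5, 0, 0.5

# Uniform random samples on [-1, 1], sorted only to make plotting easier.
rng = np.random.default_rng(0)  # illustrative seed
x_rand = np.sort(rng.uniform(-1, 1, n))
print(x_rand)
```

The grid is deterministic, while the random samples change with the seed.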


In [None]:
def generate_x(n, x_type, x_low=-1, x_high=1):
    if x_type == 'grid':
        x = np.linspace(x_low, x_high, n, endpoint = False).astype(np.float64)

    elif x_type == 'uniform_random':
        x = np.sort(np.random.uniform(x_low, x_high, n).astype(np.float64))
        #Note that for making it easy for plotting we sort the randomly sampled x in ascending order
    else:
        raise ValueError


    return x


### Generating $y_i$
Now that we have $x_i$  we can use it to generate $y_i = f(x_i)$ where $f$ is the true function that we are trying to learn.  In this problem we consider the following:
1. $y = x$.
2. $y = \cos(2\pi x)$.
3. $y = sgn(x) =  \begin{cases}
         -1, & x < 0 \\
         1, & x \geq 0
        \end{cases}$



In [None]:
def mysign(x):
    y = np.sign(x)
    y[x == 0] = 1
    return y

def generate_y(x, f_type):
    if f_type == 'x':
        y = x

    elif f_type == 'cos2':
        y = np.cos(2*np.pi * x)

    elif f_type == 'sign':
        y = mysign(x)

    else:
        raise ValueError

    return y


### Visualizing training data


In [None]:
def plot_training_data(x_type):
    n = 64
    x_true = generate_x(x_type = 'grid', n=1000)
    x_train = generate_x(x_type=x_type, n=n)
    # Raw string so that the backslash in \pi is not treated as an escape sequence
    labels = ['y=x', r'y=cos(2$\pi$x)', 'y=sgn(x)']
    for k,f_type in enumerate(['x', 'cos2', 'sign']):
        y_true = generate_y(x=x_true, f_type=f_type)
        y_train = generate_y(x=x_train, f_type=f_type)
        plt.plot(x_true, y_true, linewidth = 0.5)
        plt.ylabel('y')
        plt.xlabel('x')
        plt.scatter(x_train, y_train, marker='o', label = labels[k])
    plt.legend(bbox_to_anchor  = (1.03, 0.97))
    plt.show()



slider = widgets.RadioButtons(
    options=['uniform_random', 'grid'],
    description='x_type:',
    disabled=False
)


interactive_plot = interactive(plot_training_data, x_type=slider)
output = interactive_plot.children[-1]
interactive_plot



### Noise in training data

In practice, training data is often noisy. We model this by assuming that our samples of $y$ are corrupted by additive white Gaussian noise (AWGN), while the true function is still given by $y = f(x)$. The amount of noise is controlled by its standard deviation, which we denote by `awgn_std`. The noiseless case corresponds to `awgn_std = 0`.
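
The noise model can be sketched as follows, assuming the true function $y = x$ and an illustrative seed. The names `rng`, `y_clean` and `y_noisy` are local to this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)   # illustrative seed
x = np.linspace(-1, 1, 10000, endpoint=False)
y_clean = x                      # true function y = f(x) = x
awgn_std = 0.1

# Additive white Gaussian noise: independent zero-mean samples with std awgn_std.
noise = rng.normal(0.0, awgn_std, size=y_clean.shape)
y_noisy = y_clean + noise

# The empirical mean and standard deviation should be close to 0 and awgn_std.
print(noise.mean(), noise.std())
```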


In [None]:
def add_awgn_noise(y, awgn_std=0):
    noise = np.random.normal(0, awgn_std, y.shape)
    y_noisy = y + noise
    return y_noisy


### Visualizing noise in training data


In [None]:
def plot_noisy_training_data(awgn_std):
    n = 64
    np.random.seed(7)
    x_true = generate_x(x_type = 'grid', n=1000)
    x_train = generate_x(x_type='uniform_random', n=n)
    f_type = 'x'
    y_true = generate_y(x=x_true, f_type=f_type)
    y_train_clean = generate_y(x=x_train, f_type=f_type)
    y_train = add_awgn_noise(y_train_clean, awgn_std=awgn_std)
    plt.plot(x_true, y_true, linewidth = 0.5, label = 'True function')
    plt.ylabel('y')
    plt.xlabel('x')
    plt.ylim([-4,4])
    plt.scatter(x_train, y_train, marker='o', label = 'Training samples')
    plt.legend(loc = 'upper right', bbox_to_anchor  = (1.43, 0.97))

    plt.show()


slider= widgets.FloatLogSlider(
    value=1e-3, # initial value itself (not an exponent), here base**min
    base=10,
    min=-3, # min exponent of base
    max=1, # max exponent of base
    step=0.2, # exponent step
    description='awgn_std',
    continuous_update= False
)

interactive_plot = interactive(plot_noisy_training_data, awgn_std=slider)
output = interactive_plot.children[-1]
interactive_plot


## Part b): Featurization- Lifting the training data


Run the cell in part b) and understand visually what the polynomial features and Fourier features look like. Intuitively, which choice of feature type will make it easier to learn the function $y=x$? What about the function $y = \cos(2 \pi x)$?




We would like to learn the function from training samples by performing linear regression. However, if the true function is not affine in $x$, we cannot hope to learn it well by regressing on $x$ directly. The key is to first lift the training data, i.e., featurize it and express it in a richer space in which we then perform linear regression. For instance, we could use the polynomial featurization of $x$ given by $[1, x, x^2]$. With this featurization, if the true function is quadratic (say $y = x^2$), linear regression in the feature space will succeed in learning a good approximation to the function. We will consider two types of featurizations that lift the one-dimensional data into a $d$-dimensional feature space.
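
The quadratic example above can be sketched numerically. This is a minimal illustration using `np.linalg.lstsq` rather than the solvers defined later in the notebook, and all names in it are local assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)            # illustrative seed
x = rng.uniform(-1, 1, 50)
y = x**2                                  # a function that is not affine in x

# Raw features [1, x]: the best affine fit cannot capture the curvature.
phi_affine = np.stack([np.ones_like(x), x], axis=1)
w_affine, *_ = np.linalg.lstsq(phi_affine, y, rcond=None)
mse_affine = np.mean((y - phi_affine @ w_affine) ** 2)

# Lifted features [1, x, x^2]: y is now linear in the features.
phi_quad = np.stack([np.ones_like(x), x, x**2], axis=1)
w_quad, *_ = np.linalg.lstsq(phi_quad, y, rcond=None)

print("affine train MSE:", mse_affine)
print("quadratic weights:", np.round(w_quad, 6))   # close to [0, 0, 1]
```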


### Polynomial features
We consider the d-dimensional features given by the Vandermonde polynomials:
    $\phi(x) = [1, x, x^2, \dots, x^{d-1}]$
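
A short sketch of what these features look like as a matrix, using `polyvander` from NumPy. It also illustrates the optional scaling by $\sqrt{2k+1}$ that the featurizer below applies when `normalize` is set: for $x$ uniform on $[-1,1]$ the mean of $x^{2k}$ is $1/(2k+1)$, so the scaled features have unit mean square. The dense grid used to check this is an assumption of the sketch.

```python
import numpy as np
from numpy.polynomial.polynomial import polyvander

d = 4
x = np.array([-1.0, -0.5, 0.0, 0.5])

# Column k of the Vandermonde matrix holds x**k, for k = 0, ..., d-1.
A = polyvander(x, d - 1)
print(A.shape)   # (4, 4)

# Approximate the average over [-1, 1] with a dense grid and scale column k
# by sqrt(2k+1); every scaled feature then has mean square close to 1.
x_dense = np.linspace(-1, 1, 200001)
A_dense = polyvander(x_dense, d - 1) * np.sqrt(2 * np.arange(d) + 1)
mean_square = np.mean(A_dense ** 2, axis=0)
print(np.round(mean_square, 3))
```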


In [None]:
from numpy.polynomial.polynomial import polyvander
def featurize_vandermonde(x, d, normalize = False):
    A = polyvander(x, d-1)
    for d_ in range(A.shape[1]):
        if normalize:
            A[:,d_] *=  np.sqrt(2*d_+1)
    return A


In [None]:
def plot_poly_features(d):
    n = 128
    d_max = 20
    x_type = 'uniform_random'
    np.random.seed(7)
    x_true = generate_x(x_type = 'grid', n=1000)
    x_train = generate_x(x_type=x_type, n=n)
    phi_train = featurize_vandermonde(x_train, d_max)
    phi_true = featurize_vandermonde(x_true, d_max)

    plt.plot(x_true, phi_true[:,d], linewidth = 0.5)
    plt.scatter(x_train, phi_train[:,d], marker='o')
    plt.ylim([-1.2,1.2])
    plt.xlabel('x')
    plt.ylabel(r'$\phi(x)$')
    plt.show()


slider = widgets.IntSlider(
    value=0,
    min=0,
    max=10,
    step=1,
    description='Feature # k:',
    disabled=False,
    continuous_update=False,
    orientation='horizontal',
    readout=True,
    readout_format='d'
)


interactive_plot = interactive(plot_poly_features, d=slider)
output = interactive_plot.children[-1]
interactive_plot


### Fourier features
We consider the d-dimensional real Fourier features given by:<br>
    $\phi(x) = [1, \sin(\pi x), \cos(\pi x), \sin(2 \pi x), \cos(2\pi x), \dots,  \sin (r \pi x), \cos(r \pi x)]$, <br>
    where $r = \frac{d-1}{2}$.

Note that by this convention we require $d$ to be an odd integer.
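
One property of these features on an evenly spaced grid can be checked with a small sketch. Here `fourier_features` is a hypothetical stand-alone helper written only for this illustration; it follows the column ordering given above. Under the assumption $d \le n$, the columns come out mutually orthogonal.

```python
import numpy as np

def fourier_features(x, d):
    # Hypothetical helper: [1, sin(pi x), cos(pi x), ..., sin(r pi x), cos(r pi x)]
    assert d % 2 == 1, "d must be odd"
    r = (d - 1) // 2
    cols = [np.ones_like(x)]
    for k in range(1, r + 1):
        cols.append(np.sin(k * np.pi * x))
        cols.append(np.cos(k * np.pi * x))
    return np.stack(cols, axis=1)

n, d = 8, 5
x_grid = np.linspace(-1, 1, n, endpoint=False)
phi = fourier_features(x_grid, d)

# On the evenly spaced grid the Gram matrix is diagonal,
# with entries n, n/2, n/2, ...
gram = phi.T @ phi
print(np.round(gram, 10))
```

With uniformly random $x$ the off-diagonal entries are generally nonzero, which is one way the two sampling schemes differ.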


In [None]:
from numpy.polynomial.polynomial import polyvander
def featurize_fourier(x, d, normalize = False):
    assert (d-1) % 2 == 0, "d must be odd"
    max_r = int((d-1)/2)
    n = len(x)
    A = np.zeros((n, d))
    A[:,0] = 1
    for d_ in range(1,max_r+1):
        A[:,2*(d_-1)+1] =  np.sin(d_*x*np.pi)
        A[:,2*(d_-1)+2] =  np.cos(d_*x*np.pi)

    if normalize:
        A[:,0] *= (1/np.sqrt(2))
        A *= np.sqrt(2)
    return A



In [None]:
def plot_fourier_features(x_type,d):
    n = 128
    d_max = 21
    np.random.seed(7)
    x_true = generate_x(x_type = 'grid', n=1000)
    x_train = generate_x(x_type=x_type, n=n)
    phi_train = featurize_fourier(x_train, d_max)
    phi_true = featurize_fourier(x_true, d_max)

    plt.plot(x_true, phi_true[:,d], linewidth = 0.5)
    plt.scatter(x_train, phi_train[:,d], marker='o')
    plt.ylim([-1.2,1.2])
    plt.xlabel('x')
    plt.ylabel(r'$\phi(x)$')
    plt.show()


slider1 = widgets.RadioButtons(
    options=['uniform_random', 'grid'],
    description='x_type:',
    disabled=False
)


slider2 = widgets.IntSlider(
    value=0,
    min=0,
    max=20,
    step=1,
    description='Feature # k:',
    disabled=False,
    continuous_update=False,
    orientation='horizontal',
    readout=True,
    readout_format='d'
)


interactive_plot = interactive(plot_fourier_features, d=slider2, x_type=slider1)
output = interactive_plot.children[-1]
interactive_plot


In [None]:
def featurize(x, d, phi_type, normalize = False):
    function_map = {'polynomial':featurize_vandermonde, 'fourier':featurize_fourier}
    return function_map[phi_type](x,d,normalize)


## Part c): Linear Regression to learn the 1d-function in feature space

Fill in the code in the functions solve_ls and solve_ridge to populate a variable "coeffs" that contains the minimizer $w^*$. We want you to learn how to use sklearn. To check for correctness, you can compare your sklearn solutions with those obtained from the explicit formulas or from NumPy's least-squares function.

Then run cells in part c).  Report the learned coeffs $w^*$ and the training loss value from "CELL LS_LINE".



#### Least squares
To learn the function we will perform linear regression in the lifted feature space, i.e. we learn a set of coefficients $w \in \mathbb{R}^d$ to minimize the least-squares loss:

$\ell(w) = \frac{1}{n} \| y - \phi w \|_2^2$.

Assume that all training samples $x_i$ are unique.
Here $\phi$ has dimensions $n \times d$. Depending on how $n$ and $d$ are related we are in one of the following regimes:
1. $n > d$. This is the underparameterized regime and we have more samples than free parameters and there exists a unique solution that minimizes our loss, <br>
$w^* = (\phi^\top \phi)^{-1} \phi^\top y$.
2. $n=d$. In this case the $\phi$ matrix is square and invertible and the solution is given by,
<br>
$w^* = (\phi)^{-1}  y$.
3. $n < d$. This is the overparameterized regime and there are infinitely many solutions that achieve zero training loss. In this problem we consider the minimum $\ell_2$ norm solution given by,<br>
$w^* =  \phi^\top  ( \phi \phi^\top)^{-1} y$.
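
The three closed-form expressions can be compared against NumPy's least-squares routine. This is a sketch under stated assumptions: `explicit_ls` is a hypothetical helper, and a random Gaussian matrix stands in for the feature matrix because it has full rank with probability one.

```python
import numpy as np

rng = np.random.default_rng(0)   # illustrative seed

def explicit_ls(phi, y):
    # Closed-form solutions for the three regimes (assumes phi has full rank).
    n, d = phi.shape
    if n > d:
        return np.linalg.solve(phi.T @ phi, phi.T @ y)
    if n == d:
        return np.linalg.solve(phi, y)
    return phi.T @ np.linalg.solve(phi @ phi.T, y)   # minimum l2-norm solution

results = []
for n, d in [(10, 3), (5, 5), (3, 10)]:
    phi = rng.normal(size=(n, d))     # random stand-in for a feature matrix
    y = rng.normal(size=n)
    w_explicit = explicit_ls(phi, y)
    # lstsq returns the minimum-norm solution when the system is underdetermined.
    w_lstsq, *_ = np.linalg.lstsq(phi, y, rcond=None)
    results.append(np.allclose(w_explicit, w_lstsq))

print(results)
```

Solving linear systems with `np.linalg.solve` is used here instead of forming explicit inverses, which is usually the more stable choice.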


In [None]:
#LABEL: CELL LR

from sklearn.linear_model import LinearRegression
def solve_ls(phi, y):

    # The `normalize` argument is no longer accepted by recent scikit-learn releases
    LR = LinearRegression(fit_intercept=False)

    ### start c1 ###

    ### end c1 ###

    loss = np.mean((y- phi@coeffs)**2)
    return coeffs, loss


#### Ridge

We can  add a regularizing penalty term to the least squares objective to perform ridge regression where we minimize the loss,

$\ell(w) = \frac{1}{n} (\| y - \phi w \|_2^2+ \lambda \| w \|_2^2)$.

In this case, irrespective of n and d, the minimizer is unique and is given by,

$w^* =  \phi^\top  ( \phi \phi^\top + \lambda I_n)^{-1} y = (\phi^\top \phi + \lambda I_d)^{-1} \phi^\top y.$
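
A quick numerical check that the two expressions for $w^*$ agree, sketched with a random stand-in for $\phi$. The values of `n`, `d` and `lam` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)   # illustrative seed
n, d, lam = 6, 10, 0.1
phi = rng.normal(size=(n, d))
y = rng.normal(size=n)

# First form: solve an n x n system.
w_dual = phi.T @ np.linalg.solve(phi @ phi.T + lam * np.eye(n), y)
# Second form: solve a d x d system.
w_primal = np.linalg.solve(phi.T @ phi + lam * np.eye(d), phi.T @ y)

print(np.allclose(w_primal, w_dual))

# The gradient of ||y - phi w||^2 + lam ||w||^2 vanishes at the minimizer.
grad = -2 * phi.T @ (y - phi @ w_primal) + 2 * lam * w_primal
print(np.max(np.abs(grad)))
```

Which form is cheaper depends on whether $n$ or $d$ is smaller, since that determines the size of the linear system.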



In [None]:
from sklearn.linear_model import Ridge

def solve_ridge(phi, y, lambda_ridge):

    # The `normalize` argument is no longer accepted by recent scikit-learn releases
    Rdg = Ridge(fit_intercept=False, alpha = lambda_ridge)

    ### start c2 ###

    ### end c2 ###

    # Ridge objective as defined above: (||y - phi w||^2 + lambda ||w||^2) / n
    loss = (np.sum((y - phi@coeffs)**2) + lambda_ridge*np.sum(coeffs**2)) / len(y)
    return coeffs, loss


### Learning function using least squares
Let us use least squares and try to learn the function $y = x$ from noisy training samples using polynomial features.








In [None]:
#LABEL: CELL LS_LINE

def get_params1():
    np.random.seed(18)
    n = 64
    d = 2
    awgn_std = 1e-1
    x_type = 'uniform_random'
    phi_type = 'polynomial'
    f_type = 'x'

    return n, d, awgn_std, x_type, phi_type, f_type

n, d, awgn_std, x_type, phi_type, f_type = get_params1()

x_train = generate_x(x_type=x_type, n=n)
phi_train = featurize(x_train, d, phi_type)
y_train = generate_y(x=x_train, f_type=f_type)
y_train = add_awgn_noise(y_train, awgn_std)
w, loss = solve_ls(phi_train , y_train)

# print("w:", w)
# print("loss:", loss)


## Part d)  Visualizing the learned function

Fill in the code in function get_plot_data to populate the variable "y_plot_pred" that contains the function values predicted by our learned function. Then run the cells in part d) to visualize the learned function plotted alongside the true function.


### Visualizing learned function vs true function

Now that we have a set of coefficients that determines our learned function, we would like to plot the learned function alongside the true function to visually assess how good our approximation is.


In [None]:
n, d, awgn_std, x_type, phi_type, f_type = get_params1()

def get_plot_data( f_type, phi_type, d, w, n_plot = 1000):
    x_plot= generate_x(x_type = 'grid', n=n_plot)
    y_plot_true = generate_y(x=x_plot, f_type=f_type)

    #Computing predictions via learned function

    phi_plot = featurize(x_plot, d, phi_type)

    ### start d ###

    ### end d ###

    return x_plot, y_plot_true, y_plot_pred


x_plot, y_plot_true, y_plot_pred = get_plot_data(f_type, phi_type, d, w, n_plot = 1000)
plt.plot(x_plot, y_plot_pred, 'o-', ms=2, label = 'Learned function')
plt.plot(x_plot, y_plot_true, label = 'True function')
plt.scatter(x_train, y_train, marker='o', s=10, label = 'Training samples')
plt.legend()
plt.show()


## Part e)  Calculating goodness of approximation/fit

Fill in the code in function get_fit_mse to populate the variable "fit_error" that contains the fit mean squared error, an empirical approximation to the distance between the true function and the learned function. Run the cells in part e) and report the fit mean squared error.



To quantify the goodness of approximation/fit we would like to measure the "distance" between the learned function and the true function. Since we are interested in the functions in the range (-1,1) one way to measure distance is to compute the integral,
$d(f_{true}, f_{learned}) = \frac{1}{2} \int_{-1}^{1} (f_{true} (x) - f_{learned}(x))^2 dx$

We can approximate this integral by taking an empirical mean of the squared difference between the function values of the true and learned function over a batch. We construct this batch by sampling uniform randomly from the interval $[-1,1]$.
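
To see why the empirical mean approximates the integral, consider a case where the answer is known in closed form. In this sketch `f_other` is a hypothetical stand-in for a learned function, chosen so that $d(f_{true}, f_{other}) = \frac{1}{2}\int_{-1}^{1} x^2 dx = \frac{1}{3}$.

```python
import numpy as np

rng = np.random.default_rng(0)   # illustrative seed

def f_true(x):
    return x

def f_other(x):
    # Hypothetical "learned" function used only for this illustration.
    return np.zeros_like(x)

# Average the squared difference over a uniformly sampled batch on [-1, 1].
x_batch = rng.uniform(-1, 1, 100000)
estimate = np.mean((f_true(x_batch) - f_other(x_batch)) ** 2)
print(estimate)   # close to 1/3
```

The estimate fluctuates with the random batch, and the fluctuation shrinks as the batch size grows.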


In [None]:
#LABEL: CELL FIT MSE
n, d, awgn_std, x_type, phi_type, f_type = get_params1()

def get_fit_mse( f_type, phi_type, d, w, n_fit = 1000):
    x_fit= generate_x(x_type = 'uniform_random', n=n_fit)
    y_fit_true = generate_y(x=x_fit, f_type=f_type)

    #Computing predictions via learned function

    phi_fit = featurize(x_fit, d, phi_type)
    y_fit_pred = phi_fit @ w

    ### start e ###

    ### end e ###
    return fit_error


fit_mse = get_fit_mse(f_type, phi_type, d, w, n_fit = 1000)

# print("Fit MSE: ", fit_mse)


## Part f): Choice of feature type matters
Run the cells in part f) and explore the effect of the number of features $d$ and the choice of feature type in learning the functions $y = x$ and $y = \cos(2\pi x)$. What do you observe?




While we can approximate any function on an interval using either Fourier or polynomial features, the number of features needed depends on the function we are trying to approximate. Some functions are easy to approximate using polynomial features, while others are easier to approximate using Fourier features.
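
As one concrete instance, $y = \cos(2\pi x)$ is itself one of the Fourier features, so with $d = 5$ the least-squares weights are essentially one-hot. The sketch below builds both feature matrices directly with NumPy (not with the notebook's `featurize`) and uses `np.linalg.lstsq` as an assumed stand-in solver.

```python
import numpy as np

rng = np.random.default_rng(0)   # illustrative seed
x = np.sort(rng.uniform(-1, 1, 64))
y = np.cos(2 * np.pi * x)

# Fourier features with d = 5: [1, sin(pi x), cos(pi x), sin(2 pi x), cos(2 pi x)].
phi_fourier = np.stack([np.ones_like(x),
                        np.sin(np.pi * x), np.cos(np.pi * x),
                        np.sin(2 * np.pi * x), np.cos(2 * np.pi * x)], axis=1)
w_fourier, *_ = np.linalg.lstsq(phi_fourier, y, rcond=None)
mse_fourier = np.mean((y - phi_fourier @ w_fourier) ** 2)

# Polynomial features with the same d = 5: [1, x, x^2, x^3, x^4].
phi_poly = np.vander(x, 5, increasing=True)
w_poly, *_ = np.linalg.lstsq(phi_poly, y, rcond=None)
mse_poly = np.mean((y - phi_poly @ w_poly) ** 2)

print(np.round(w_fourier, 6))   # essentially one-hot on the cos(2 pi x) feature
print("train MSE (fourier, poly):", mse_fourier, mse_poly)
```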



In [None]:
def run1(n, d, x_type, f_type, phi_type, seed = 1, awgn_std = 0, lambda_ridge = 0):


    #Set randomness
    np.random.seed(seed)

    x_train = generate_x(x_type=x_type, n=n)
    phi_train = featurize(x_train, d, phi_type)
    y_train = generate_y(x=x_train, f_type=f_type)

    if awgn_std != 0:
        y_train = add_awgn_noise(y_train, awgn_std)

    if lambda_ridge == 0:

        #Write one line of code calling the function to solve least squares that returns w, loss
        ### start f1 ###

        ### end f1 ###
    else:
        #Write one line of code calling the function to solve ridge regression that returns w, loss

        ### start f2 ###

        ### end f2 ###

    return x_train, y_train, w, loss


def visualize1(x_train, y_train, f_type, phi_type, d, w, loss, n_plot = 1000, n_fit = 1000):
    x_plot, y_plot_true, y_plot_pred = get_plot_data(f_type, phi_type, d, w, n_plot)
    plt.plot(x_plot, y_plot_true, label = 'True function')
    plt.scatter(x_train, y_train, marker='o', s=20, label = 'Training samples')
    plt.plot(x_plot, y_plot_pred, 'o-', ms=2, label = 'Learned function')


    fit_mse = get_fit_mse(f_type, phi_type, d, w, n_fit = n_fit)
    plt.title("Train loss:" + str("{:.2e}".format(loss)) + ", Fit MSE: " + str("{:.2e}".format(fit_mse)))
    plt.ylim([-1.5, 1.5])
    plt.xlabel('x')
    plt.ylabel('y')
    plt.legend(bbox_to_anchor = (1.03, 0.97))
    plt.show()

#     plt.plot(w, 'o')
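    # Format strings are passed by keyword; `use_line_collection` is dropped because recent Matplotlib releases no longer accept it
    markerlines, stemlines,  baseline = plt.stem(np.arange(d), w, linefmt='b', markerfmt='o')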
    plt.setp(stemlines, 'color', plt.getp(markerlines,'color'))
    plt.xlabel('feature #(k)')
    plt.ylabel('weight')
    plt.show()


    return fit_mse



In [None]:
def plot_feature_type_matters(n, n_plot, n_fit, x_type, f_type, phi_type, seed, awgn_std, lambda_ridge, d):
    x_train, y_train, w, loss = run1(n, d, x_type, f_type, phi_type, awgn_std = awgn_std, lambda_ridge = lambda_ridge, seed = seed)
    fit_mse = visualize1(x_train, y_train, f_type, phi_type, d, w, loss, n_plot , n_fit)


In [None]:
#CELL FTM 1
def get_params2():
    n = 64
    n_plot = 1000
    n_fit = 10000
    x_type = 'uniform_random'
    seed = 1
    awgn_std = 0
    lambda_ridge = 0
    return n, n_plot, n_fit, x_type, seed, awgn_std, lambda_ridge



In [None]:
n, n_plot, n_fit, x_type, seed, awgn_std, lambda_ridge = get_params2()
slider1 = widgets.IntSlider(
    value=1,
    min=1,
    max=25,
    step=1,
    description='d:',
    disabled=False,
    continuous_update=False,
    orientation='horizontal',
    readout=True,
    readout_format='d'
)

f_type = 'x'
phi_type = 'polynomial'
print("Polynomial features")

interactive_plot =interactive(plot_feature_type_matters,n = fixed(n), n_plot = fixed(n_plot), n_fit = fixed(n_fit),
                              x_type = fixed(x_type),f_type = fixed(f_type), phi_type = fixed(phi_type),seed = fixed(seed),
                              awgn_std= fixed(awgn_std), lambda_ridge = fixed(lambda_ridge), d = slider1)
interactive_plot


In [None]:
n, n_plot, n_fit, x_type, seed, awgn_std, lambda_ridge = get_params2()

slider2 = widgets.IntSlider(
    value=1,
    min=1,
    max=25,
    step=2,
    description='d:',
    disabled=False,
    continuous_update=False,
    orientation='horizontal',
    readout=True,
    readout_format='d'
)

f_type = 'x'
phi_type = 'fourier'
print("Fourier features")

interactive_plot =interactive(plot_feature_type_matters,n = fixed(n), n_plot = fixed(n_plot), n_fit = fixed(n_fit),
                              x_type = fixed(x_type),f_type = fixed(f_type), phi_type = fixed(phi_type),seed = fixed(seed),
                              awgn_std= fixed(awgn_std), lambda_ridge = fixed(lambda_ridge), d = slider2)
interactive_plot


In [None]:
n, n_plot, n_fit, x_type, seed, awgn_std, lambda_ridge = get_params2()

slider1 = widgets.IntSlider(
    value=1,
    min=1,
    max=25,
    step=1,
    description='d:',
    disabled=False,
    continuous_update=False,
    orientation='horizontal',
    readout=True,
    readout_format='d'
)

f_type = 'cos2'
phi_type = 'polynomial'
print("Polynomial features")
interactive_plot =interactive(plot_feature_type_matters,n = fixed(n), n_plot = fixed(n_plot), n_fit = fixed(n_fit),
                              x_type = fixed(x_type),f_type = fixed(f_type), phi_type = fixed(phi_type),seed = fixed(seed),
                              awgn_std= fixed(awgn_std), lambda_ridge = fixed(lambda_ridge), d = slider1)
interactive_plot


In [None]:
n, n_plot, n_fit, x_type, seed, awgn_std, lambda_ridge = get_params2()

slider2 = widgets.IntSlider(
    value=1,
    min=1,
    max=25,
    step=2,
    description='d:',
    disabled=False,
    continuous_update=False,
    orientation='horizontal',
    readout=True,
    readout_format='d'
)

f_type = 'cos2'
phi_type = 'fourier'
print("Fourier features")

interactive_plot =interactive(plot_feature_type_matters,n = fixed(n), n_plot = fixed(n_plot), n_fit = fixed(n_fit),
                              x_type = fixed(x_type),f_type = fixed(f_type), phi_type = fixed(phi_type),seed = fixed(seed),
                              awgn_std= fixed(awgn_std), lambda_ridge = fixed(lambda_ridge), d = slider2)
interactive_plot


In [None]:
### Measuring the approximation error vs d
def plot_errors(n, d_range, n_fit, x_type, seed, awgn_std, lambda_ridge, phi_type, f_type, title = None, ylim = -1):
    train_loss_array = []
    fit_mse_array = []
    for d in d_range:
        x_train, y_train, w, loss = run1(n, d, x_type, f_type, phi_type, awgn_std=awgn_std, lambda_ridge = lambda_ridge, seed=seed)
        fit_mse = get_fit_mse(f_type, phi_type, d, w, n_fit = n_fit)
        train_loss_array.append(loss)
        fit_mse_array.append(fit_mse)

    plt.plot(d_range, train_loss_array, 'o-', label = 'Train loss')
    plt.plot(d_range, fit_mse_array, 'o-', label = 'Fit MSE')
    plt.yscale('log')
    plt.ylabel('Loss/Error')
    plt.xlabel('d')
    if title is not None:
        plt.title(title)
    plt.legend(bbox_to_anchor = (1.33, 0.97))


    if ylim != -1:
        plt.ylim(ylim)
    plt.show()



In [None]:
n = 64
n_fit = 10000
x_type = 'uniform_random'
seed = 1
awgn_std = 0
lambda_ridge = 0

phi_type = 'polynomial'
for f_type in ['x', 'cos2']:
    d_range = np.arange(1,25,1)
    plot_errors(n, d_range, n_fit, x_type, seed, awgn_std, lambda_ridge, phi_type, f_type, title = 'f_type:' + str(f_type) + ', phi_type: '  + str(phi_type))


phi_type = 'fourier'
for f_type in ['x', 'cos2']:
    d_range = np.arange(1,25,2)
    plot_errors(n, d_range, n_fit, x_type, seed, awgn_std, lambda_ridge, phi_type, f_type, title = 'f_type:' + str(f_type) + ', phi_type: '  + str(phi_type))


## Part g): Effect of noise
Run the cells in part g) and explore the effect of noise on our learning process. What do you observe as $d$ grows to $n$? Does one particular combination of input sampling and feature type do better than the others?




Next we will see the effect of noise in our learning method. For this purpose we will consider the three kinds of input data/feature combinations:
1. uniform randomly sampled x, polynomial features
2. uniform randomly sampled x, Fourier features
3. evenly spaced x, Fourier features

Further in this case, we will see what happens when d grows all the way to n.


In [None]:
#CELL EON 1
def get_params3():
    n = 65
    n_plot = 1000
    n_fit = 10000
    seed = 1
    awgn_std = 1e-1
    lambda_ridge = 0
    return n, n_plot, n_fit, seed, awgn_std, lambda_ridge


In [None]:
n, n_plot, n_fit, seed, awgn_std, lambda_ridge = get_params3()
slider1 = widgets.IntSlider(
    value=1,
    min=1,
    max=65,
    step=1,
    description='d:',
    disabled=False,
    continuous_update=False,
    orientation='horizontal',
    readout=True,
    readout_format='d'
)
x_type = 'uniform_random'

f_type = 'cos2'
phi_type = 'polynomial'
print("Uniform random x, Polynomial features")
interactive_plot =interactive(plot_feature_type_matters,n = fixed(n), n_plot = fixed(n_plot), n_fit = fixed(n_fit),
                              x_type = fixed(x_type),f_type = fixed(f_type), phi_type = fixed(phi_type),seed = fixed(seed),
                              awgn_std= fixed(awgn_std), lambda_ridge = fixed(lambda_ridge), d = slider1)
interactive_plot


In [None]:
n, n_plot, n_fit, seed, awgn_std, lambda_ridge = get_params3()
slider2 = widgets.IntSlider(
    value=1,
    min=1,
    max=65,
    step=2,
    description='d:',
    disabled=False,
    continuous_update=False,
    orientation='horizontal',
    readout=True,
    readout_format='d'
)
x_type = 'uniform_random'

f_type = 'cos2'
phi_type = 'fourier'
print("Uniform random x, Fourier features")

interactive_plot =interactive(plot_feature_type_matters,n = fixed(n), n_plot = fixed(n_plot), n_fit = fixed(n_fit),
                              x_type = fixed(x_type),f_type = fixed(f_type), phi_type = fixed(phi_type),seed = fixed(seed),
                              awgn_std= fixed(awgn_std), lambda_ridge = fixed(lambda_ridge), d = slider2)
interactive_plot


In [None]:
n, n_plot, n_fit, seed, awgn_std, lambda_ridge = get_params3()
slider2 = widgets.IntSlider(
    value=1,
    min=1,
    max=65,
    step=2,
    description='d:',
    disabled=False,
    continuous_update=False,
    orientation='horizontal',
    readout=True,
    readout_format='d'
)
x_type = 'grid'

f_type = 'cos2'
phi_type = 'fourier'
print("Evenly spaced x, Fourier features")

interactive_plot =interactive(plot_feature_type_matters,n = fixed(n), n_plot = fixed(n_plot), n_fit = fixed(n_fit),
                              x_type = fixed(x_type),f_type = fixed(f_type), phi_type = fixed(phi_type),seed = fixed(seed),
                              awgn_std= fixed(awgn_std), lambda_ridge = fixed(lambda_ridge), d = slider2)
interactive_plot


In [None]:
n = 65
n_fit = 10000

seed = 1
awgn_std = 1e-1
lambda_ridge = 0
f_type = 'cos2'

x_type = 'uniform_random'
phi_type = 'polynomial'
d_range = np.arange(1,65,1)
plot_errors(n, d_range, n_fit, x_type, seed, awgn_std, lambda_ridge, phi_type, f_type, title = 'x_type: ' + str(x_type) + ', phi_type: ' + str(phi_type))

x_type = 'uniform_random'
phi_type = 'fourier'
d_range = np.arange(1,65,2)
plot_errors(n, d_range, n_fit, x_type, seed, awgn_std, lambda_ridge, phi_type, f_type, title = 'x_type: ' + str(x_type) + ', phi_type: ' + str(phi_type))

x_type = 'grid'
phi_type = 'fourier'
d_range = np.arange(1,65,2)
plot_errors(n, d_range, n_fit, x_type, seed, awgn_std, lambda_ridge, phi_type, f_type, title = 'x_type: ' + str(x_type) + ', phi_type: ' + str(phi_type))


## Part h) conditioning of $\phi^T \phi$
Run the cells in part h) and visualize the eigenvalue spectrum of $\phi^T \phi$. We see that the matrix can become highly ill-conditioned depending on the input sampling and feature type. Comment on the shape of the curve for polynomial features with uniformly random $x$. Do you think that is really what the curve should look like? If not, can you think of a reason why this is happening?
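
Conditioning can also be summarized by a single number, the ratio of the largest to the smallest singular value, which `np.linalg.cond` computes. The sketch below compares two of the sampling/feature combinations for a moderate $d$. The feature matrices are built directly with NumPy, and the choice `d = 15` is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(0)   # illustrative seed
n, d = 65, 15

# Polynomial features at uniformly random x.
x_rand = rng.uniform(-1, 1, n)
phi_poly = np.vander(x_rand, d, increasing=True)

# Fourier features at evenly spaced x.
x_grid = np.linspace(-1, 1, n, endpoint=False)
r = (d - 1) // 2
cols = [np.ones(n)]
for k in range(1, r + 1):
    cols += [np.sin(k * np.pi * x_grid), np.cos(k * np.pi * x_grid)]
phi_fourier = np.stack(cols, axis=1)

cond_poly = np.linalg.cond(phi_poly.T @ phi_poly)
cond_fourier = np.linalg.cond(phi_fourier.T @ phi_fourier)
print(cond_poly, cond_fourier)   # very large vs. about 2
```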



In [None]:
def plot_eig_values(n, d, seed, lambda_ridge, shadow = False):
    np.random.seed(seed)
    x_type_phi_type_pairs = [('uniform_random', 'polynomial'), ('uniform_random', 'fourier'), ('grid', 'fourier')]

    colors = ['blue', 'green', 'orange']

    for k, (x_type, phi_type) in enumerate(x_type_phi_type_pairs):
        x_train = generate_x(x_type=x_type, n=n)
        phi_train = featurize(x_train, d, phi_type)
        eig_vals,_ = np.linalg.eig(phi_train.T @ phi_train + lambda_ridge*np.eye(d))

#         print(eig_vals)

        eig_vals = np.sort(np.abs(eig_vals))[::-1]
        plt.plot(eig_vals, 'o-', c = colors[k], label = 'x_type: ' + str(x_type) + ', phi_type: ' + str(phi_type))
        if shadow is True and lambda_ridge != 0:

            eig_vals_shadow,_ = np.linalg.eig(phi_train.T @ phi_train )
            eig_vals_shadow = np.sort(np.abs(eig_vals_shadow))[::-1]

            plt.plot(eig_vals_shadow, '-', c = colors[k], label = r'$\lambda=0$, x_type: ' + str(x_type) + ', phi_type: ' + str(phi_type), alpha = 0.5)

        #         print(eig_vals)


    plt.legend(bbox_to_anchor = (1.73, 0.97))
    plt.yscale('log')
    plt.ylim(1e-20, 1e4)
    plt.xlim([-1, n+1])

    plt.show()


In [None]:
n = 65
seed = 1
awgn_std = 1e-1
lambda_ridge = 0


slider = widgets.IntSlider(
    value=11,
    min=1,
    max=65,
    step=2,
    description='d:',
    disabled=False,
    continuous_update=False,
    orientation='horizontal',
    readout=True,
    readout_format='d'
)

interactive_plot =interactive(plot_eig_values,n = fixed(n), d = slider, seed = fixed(seed),
                              lambda_ridge = fixed(lambda_ridge), shadow =fixed(False))
interactive_plot


## Part i) Ridge regression to combat noise
Run the cells in part i) and explore the effect of using ridge regression instead of least squares. Can we use ridge regression to combat the effect of noise and allow us to get good approximation even using $d$ close to $n$? Play around. Can you adjust $\lambda$ to combat the effect of noise at d = n?



In [None]:
#CELL EOR 1
def get_params4():
    n = 65
    n_plot = 1000
    n_fit = 10000
    seed = 1
    awgn_std = 1e-1
    return n, n_plot, n_fit, seed, awgn_std


In [None]:
n, n_plot, n_fit, seed, awgn_std = get_params4()

slider1 = widgets.IntSlider(
    value=65,
    min=1,
    max=65,
    step=1,
    description='d:',
    disabled=False,
    continuous_update=False,
    orientation='horizontal',
    readout=True,
    readout_format='d'
)

slider3= widgets.FloatLogSlider(
    value=1e-25, # initial value itself (not an exponent), here base**min
    base=10,
    min=-25, # min exponent of base
    max=2, # max exponent of base
    step=1, # exponent step
    description=r'$\lambda$',
    continuous_update= False
)

x_type = 'uniform_random'

f_type = 'cos2'
phi_type = 'polynomial'
print("Uniform random x, Polynomial features")
interactive_plot =interactive(plot_feature_type_matters,n = fixed(n), n_plot = fixed(n_plot), n_fit = fixed(n_fit),
                              x_type = fixed(x_type),f_type = fixed(f_type), phi_type = fixed(phi_type),seed = fixed(seed),
                              awgn_std= fixed(awgn_std), lambda_ridge = slider3, d = slider1)
interactive_plot


In [None]:
n, n_plot, n_fit, seed, awgn_std = get_params4()
slider2 = widgets.IntSlider(
    value=65,
    min=1,
    max=65,
    step=2,
    description='d:',
    disabled=False,
    continuous_update=False,
    orientation='horizontal',
    readout=True,
    readout_format='d'
)

slider3= widgets.FloatLogSlider(
    value=1e-25, # initial value itself (not an exponent), here base**min
    base=10,
    min=-25, # min exponent of base
    max=2, # max exponent of base
    step=1, # exponent step
    description=r'$\lambda$',
    continuous_update= False
)

x_type = 'uniform_random'

f_type = 'cos2'
phi_type = 'fourier'
print("Uniform random x, Fourier features")

interactive_plot =interactive(plot_feature_type_matters,n = fixed(n), n_plot = fixed(n_plot), n_fit = fixed(n_fit),
                              x_type = fixed(x_type),f_type = fixed(f_type), phi_type = fixed(phi_type),seed = fixed(seed),
                              awgn_std= fixed(awgn_std), lambda_ridge = slider3, d = slider2)
interactive_plot


In [None]:
n, n_plot, n_fit, seed, awgn_std = get_params4()
slider2 = widgets.IntSlider(
    value=65,
    min=1,
    max=65,
    step=2,
    description='d:',
    disabled=False,
    continuous_update=False,
    orientation='horizontal',
    readout=True,
    readout_format='d'
)

slider3= widgets.FloatLogSlider(
    value=1e-25, # initial value itself (not an exponent), here base**min
    base=10,
    min=-25, # min exponent of base
    max=2, # max exponent of base
    step=1, # exponent step
    description=r'$\lambda$',
    continuous_update= False
)

x_type = 'grid'

f_type = 'cos2'
phi_type = 'fourier'
print("Evenly spaced x, Fourier features")

interactive_plot =interactive(plot_feature_type_matters,n = fixed(n), n_plot = fixed(n_plot), n_fit = fixed(n_fit),
                              x_type = fixed(x_type),f_type = fixed(f_type), phi_type = fixed(phi_type),seed = fixed(seed),
                              awgn_std= fixed(awgn_std), lambda_ridge = slider3, d = slider2)
interactive_plot


In [None]:
### Plotting the fit errors
def get_params5():
    n = 65
    n_fit = 10000

    seed = 1
    awgn_std = 1e-1
    # lambda_ridge = 1e-1
    f_type = 'cos2'
    return n, n_fit, seed, awgn_std, f_type


In [None]:
n, n_fit, seed, awgn_std, f_type = get_params5()
x_type = 'uniform_random'
phi_type = 'polynomial'
slider3= widgets.FloatLogSlider(
    value=1e-1,
    base=10,
    min=-6, # min exponent of base
    max=2, # max exponent of base
    step=1, # exponent step
    description=r'$\lambda$',
    continuous_update= False
)
d_range = np.arange(1,65,1)

# plot_errors sweeps over d_range and has no single-d argument, so no d slider is passed
interactive_plot =interactive(plot_errors, ylim = fixed([1e-4, 1e5]), n = fixed(n), n_fit = fixed(n_fit),  d_range = fixed(d_range),
                              x_type = fixed(x_type),f_type = fixed(f_type), phi_type = fixed(phi_type),seed = fixed(seed),
                              awgn_std= fixed(awgn_std), title = fixed('x_type: ' + str(x_type) + ', phi_type: ' + str(phi_type)),  lambda_ridge = slider3)
interactive_plot


In [None]:
n, n_fit, seed, awgn_std, f_type = get_params5()

x_type = 'uniform_random'
phi_type = 'fourier'
slider3= widgets.FloatLogSlider(
    value=1e-1,
    base=10,
    min=-6, # min exponent of base
    max=2, # max exponent of base
    step=1, # exponent step
    description=r'$\lambda$',
    continuous_update= False
)
d_range = np.arange(1,65,2)

# plot_errors sweeps over d_range and has no single-d argument, so no d slider is passed
interactive_plot =interactive(plot_errors, ylim = fixed([1e-4, 1e5]), n = fixed(n), n_fit = fixed(n_fit),  d_range = fixed(d_range),
                              x_type = fixed(x_type),f_type = fixed(f_type), phi_type = fixed(phi_type),seed = fixed(seed),
                              awgn_std= fixed(awgn_std), title = fixed('x_type: ' + str(x_type) + ', phi_type: ' + str(phi_type)),  lambda_ridge = slider3)
interactive_plot


In [None]:
n, n_fit, seed, awgn_std, f_type = get_params5()

x_type = 'grid'
phi_type = 'fourier'
slider3= widgets.FloatLogSlider(
    value=1e-1,
    base=10,
    min=-6, # min exponent of base
    max=2, # max exponent of base
    step=1, # exponent step
    description=r'$\lambda$',
    continuous_update= False
)
d_range = np.arange(1,65,2)

# plot_errors sweeps over d_range and has no single-d argument, so no d slider is passed
interactive_plot =interactive(plot_errors, ylim = fixed([1e-4, 1e5]), n = fixed(n), n_fit = fixed(n_fit),  d_range = fixed(d_range),
                              x_type = fixed(x_type),f_type = fixed(f_type), phi_type = fixed(phi_type),seed = fixed(seed),
                              awgn_std= fixed(awgn_std), title = fixed('x_type: ' + str(x_type) + ', phi_type: ' + str(phi_type)),  lambda_ridge = slider3)
interactive_plot



## Part j) conditioning of $\phi^T \phi  + \lambda I$
Run the cells in part j) and observe the eigenvalue spectrum. What is the effect of $\lambda$ on the shape of the eigenvalue curve?
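
As a reference point for reading the plots, adding $\lambda I$ to a symmetric matrix shifts every eigenvalue by exactly $\lambda$. The sketch below checks this on a small random stand-in for $\phi$. The sizes and the value of `lam` are illustrative assumptions, and `np.linalg.eigvalsh` is used because both matrices are symmetric.

```python
import numpy as np

rng = np.random.default_rng(0)     # illustrative seed
phi = rng.normal(size=(20, 5))     # random stand-in for a feature matrix
gram = phi.T @ phi
lam = 1e-2

# Eigenvalues of the symmetric matrices, returned in ascending order.
eig_plain = np.linalg.eigvalsh(gram)
eig_ridge = np.linalg.eigvalsh(gram + lam * np.eye(5))

print(np.allclose(eig_ridge, eig_plain + lam))

# Ratio of largest to smallest eigenvalue, before and after adding lam * I.
ratio_plain = eig_plain.max() / eig_plain.min()
ratio_ridge = eig_ridge.max() / eig_ridge.min()
print(ratio_plain, ratio_ridge)
```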


In [None]:
n = 65
d = 65
seed = 1
awgn_std = 1e-1
lambda_ridge = 0
slider3= widgets.FloatLogSlider(
    value=1e-6,
    base=10,
    min=-20, # min exponent of base
    max=2, # max exponent of base
    step=1, # exponent step
    description=r'$\lambda$',
    continuous_update= False
)
interactive_plot =interactive(plot_eig_values,n = fixed(n), d = fixed(d), seed = fixed(seed), lambda_ridge = slider3, shadow = fixed(True))
interactive_plot
