# Exercise - Regularized polynomial regression

In [1]:
from scipy.io import loadmat
import matplotlib.pyplot as plt
import numpy as np

In the first part of this exercise, you will implement regularized linear regression to
predict the amount of water flowing out of a dam as a function of the change in water
level in a reservoir.

The provided dataset consists of one feature (change in water level) and one label
(amount of water flowing out of the dam). It is divided into three parts: the training,
validation, and test sets.

In [2]:
# Change the path if needed
data = loadmat('data/poly_regression/ex5data1.mat')

In [None]:
# Extract train, test, validation data
X, y = data['X'], data['y'][:, 0] # training set
Xtest, ytest = data['Xtest'], data['ytest'][:, 0]
Xval, yval = data['Xval'], data['yval'][:, 0]

y = y.reshape(-1,1)
yval = yval.reshape(-1,1)
ytest = ytest.reshape(-1,1)

# Number of samples
N = y.size

# Plot training data
plt.plot(X, y, 'ro', ms=6, mec='k', mew=1)
plt.xlabel('Change in water level (x)')
plt.ylabel('Water flowing out of the dam (y)');
plt.show()

In [4]:
# Add a column of ones (intercept term) to the training and validation features
X_ = np.hstack((np.ones((X.shape[0],1)), X))
Xval_ = np.hstack((np.ones((Xval.shape[0],1)), Xval))

In [5]:
# Auxiliary function for dataset normalization
def featureNormalize(X):
    mu = np.mean(X, axis=0)
    X_norm = X - mu

    sigma = np.std(X_norm, axis=0, ddof=1)
    X_norm /= sigma
    return X_norm, mu, sigma

1. Implement the `RegularizedLinearRegression` class by extending the `LinearRegression`
   class seen in the previous exercises. The cost function should include an L2
   regularization term, and gradient descent should use the gradient of this
   regularized cost.

In [6]:
class RegularizedLinearRegression:
    def __init__(self, num_features=1):
        pass

    # Here lam is the regularization factor
    def fit(self, X, y, learning_rate, epochs, lam=0.):
        pass
    
    def loss(self, prediction, y, lam):
        pass

    def predict(self, X):
        pass
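
The regularized cost and its gradient from step 1 can be sketched as two standalone functions. This is a minimal sketch under stated assumptions, not the reference solution: the names `ridge_cost` and `ridge_gradient` are hypothetical, the $\frac{1}{2N}$ scaling is only one common convention, and leaving the intercept $\theta_0$ out of the penalty is an assumption.

```python
import numpy as np

def ridge_cost(theta, X, y, lam):
    """L2-regularized squared-error cost (assumed 1/(2N) scaling).

    X is (N, d+1) with a leading column of ones, y is (N, 1), theta is (d+1, 1).
    """
    N = y.shape[0]
    residual = X @ theta - y
    data_term = np.sum(residual ** 2) / (2 * N)
    # Assumption: the intercept theta[0] is not penalized
    penalty = lam * np.sum(theta[1:] ** 2) / (2 * N)
    return data_term + penalty

def ridge_gradient(theta, X, y, lam):
    """Gradient of ridge_cost with respect to theta, shape (d+1, 1)."""
    N = y.shape[0]
    grad = X.T @ (X @ theta - y) / N
    reg = lam * theta / N
    reg[0] = 0.0  # no penalty gradient for the intercept
    return grad + reg
```

With these two pieces, one gradient-descent step would be `theta -= learning_rate * ridge_gradient(theta, X, y, lam)`, and setting `lam=0` recovers standard linear regression.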

2. Train the model with zero regularization (standard linear regression) and plot the
   best-fit line along with the training data. Dataset normalization is not needed here.

3. Compute the errors on the training and the validation datasets.
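
For step 3, one possible error measure is the mean squared error. The helper name `mse` below is hypothetical, and the plain $\frac{1}{N}$ average is an assumption; if your `loss` method uses a different scaling, report that instead. The penalty term is usually left out when reporting errors so that values remain comparable across regularization strengths.

```python
import numpy as np

def mse(prediction, target):
    """Mean squared error between two arrays holding the same number of values."""
    # Flatten both inputs so that shapes (N,) and (N, 1) cannot broadcast to (N, N)
    prediction = np.asarray(prediction, dtype=float).ravel()
    target = np.asarray(target, dtype=float).ravel()
    return float(np.mean((prediction - target) ** 2))

# Toy numbers for illustration only (not taken from the dataset)
print(mse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
```

Flattening matters here because the labels in this notebook are stored as column vectors of shape `(N, 1)`, while a `predict` method may return shape `(N,)`.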

## Adding polynomial features

The problem with our linear model was that it was too simple for the data, and resulted in underfitting (high bias). In this part of the exercise, you will address this problem by adding more features. For polynomial regression, our hypothesis has the form:
$$
\begin{aligned}
h_\theta(x) &= \theta_0 + \theta_1 \times (\text{waterLevel}) + \theta_2 \times (\text{waterLevel})^2 + \cdots + \theta_p \times (\text{waterLevel})^p \\
&= \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_p x_p
\end{aligned}
$$

Notice that by defining $x_1 = (\text{waterLevel})$, $x_2 = (\text{waterLevel})^2$,
$\ldots$, $x_p = (\text{waterLevel})^p$, we obtain a linear regression model whose
features are the successive powers of the original value (waterLevel).

Now, you will add more features using the higher powers of the existing feature $x$ in
the dataset. 

1. Complete the code in the function `polyFeatures` in the next cell. The function should map the original training set $X$
of size $N \times 1$ into its higher powers. Specifically, given a training set $X$ of size $N \times 1$, the function should return an $N \times p$ matrix `X_poly`, where column 1 holds the original values of $X$, column 2 holds the values of $X^2$, column 3 holds the values of $X^3$, and so on. Note that you don't have to account for the zeroth power in this function.

In [11]:
def polyFeatures(X, p):
    """
    Maps a single feature column into its powers 1 through p.
    
    Parameters
    ----------
    X : array_like
        The input feature, of shape (N, 1), where N is the number of examples.
    
    p : int
        The highest polynomial power to generate.
    
    Returns 
    -------
    X_poly : array_like
        A matrix of shape (N, p), where N is the number of examples
        and p is the highest polynomial power. That is:
    
        X_poly[i, :] = [X[i], X[i]**2, X[i]**3, ..., X[i]**p]
    """
    # X_poly = ...
    return X_poly
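
One possible way to build such a matrix is NumPy broadcasting: raising an `(N, 1)` column to a length-`p` row of exponents yields an `(N, p)` result. The sketch below uses the hypothetical name `poly_features_sketch` to keep it separate from the exercise function; it is an illustration of the idea, not the expected solution.

```python
import numpy as np

def poly_features_sketch(X, p):
    """Return an (N, p) matrix whose column j (0-based) holds X**(j + 1)."""
    X = np.asarray(X, dtype=float).reshape(-1, 1)  # force shape (N, 1)
    powers = np.arange(1, p + 1)                   # exponents 1, 2, ..., p
    return X ** powers                             # broadcasts to (N, p)

# Toy input for illustration only
print(poly_features_sketch(np.array([[1.0], [2.0], [3.0]]), 3))
```

An explicit loop over the powers that stacks the columns would work equally well; broadcasting is just shorter.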

In [12]:
# Build the polynomial features up to the 8th degree
p = 8

# Standardize all datasets using the training-set mean and standard deviation
X_poly = polyFeatures(X, p)
X_poly, mu_X, sigma_X = featureNormalize(X_poly)
X_poly = np.concatenate([np.ones((N, 1)), X_poly], axis=1)
ynorm, mu_y, sigma_y = featureNormalize(y)

X_poly_test = polyFeatures(Xtest, p)
X_poly_test -= mu_X
X_poly_test /= sigma_X
X_poly_test = np.concatenate([np.ones((ytest.size, 1)), X_poly_test], axis=1)
ytest -= mu_y
ytest /= sigma_y

X_poly_val = polyFeatures(Xval, p)
X_poly_val -= mu_X
X_poly_val /= sigma_X
X_poly_val = np.concatenate([np.ones((yval.size, 1)), X_poly_val], axis=1)
yval -= mu_y
yval /= sigma_y

2. Train the model using the polynomial features. Use a learning rate of 0.01 and
   10000 epochs. Plot the history of the loss.

In [13]:
# lr = ...
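
The training loop behind step 2 can be sketched as batch gradient descent that records the loss at every epoch. This is a hedged illustration on synthetic data, not the notebook's dataset: the function name `gradient_descent`, the $\frac{1}{2N}$ cost scaling, and the unpenalized intercept are all assumptions, and a `fit` method in your class may be organized differently.

```python
import numpy as np

def gradient_descent(X, y, learning_rate, epochs, lam=0.0):
    """Batch gradient descent on an L2-regularized squared-error cost.

    Returns the fitted parameters and the loss recorded at every epoch.
    """
    N, d = X.shape
    theta = np.zeros((d, 1))
    history = []
    for _ in range(epochs):
        residual = X @ theta - y
        # Assumed convention: 1/(2N) scaling, intercept theta[0] not penalized
        loss = (np.sum(residual ** 2) + lam * np.sum(theta[1:] ** 2)) / (2 * N)
        history.append(loss)
        grad = X.T @ residual / N
        grad[1:] += lam * theta[1:] / N
        theta -= learning_rate * grad
    return theta, history

# Synthetic, noise-free data for illustration only
rng = np.random.default_rng(0)
x = rng.standard_normal((50, 1))
X_demo = np.hstack([np.ones((50, 1)), x, x ** 2])
y_demo = 0.5 + 2.0 * x - 1.0 * x ** 2

theta, history = gradient_descent(X_demo, y_demo, learning_rate=0.01, epochs=10000)
print(history[0], history[-1])
```

Calling `plt.plot(history)` on the returned list gives the loss curve; a curve that flattens out suggests the chosen learning rate and number of epochs are sufficient.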

3. Execute the following cell to plot the training set and the fitted polynomial model.

In [None]:
# We plot a range slightly bigger than the min and max values to get
# an idea of how the fit will vary outside the range of the data points
x = np.arange(np.min(X) - 25, np.max(X) + 25, 0.05).reshape(-1, 1)

# Map the X values
x_poly = polyFeatures(x, p)
x_poly -= mu_X
x_poly /= sigma_X

# Add ones
x_poly = np.concatenate([np.ones((x.shape[0], 1)), x_poly], axis=1)

# De-scale predictions
y_pred = sigma_y*lr.predict(x_poly) + mu_y
plt.plot(x, y_pred, '--', lw=2)

plt.plot(X, y, 'ro', ms=6, mec='k', mew=1)

plt.xlabel('Change in water level (x)')
plt.ylabel('Water flowing out of the dam (y)');
plt.show()

4. Compute the MSE on the training and the validation datasets.

5. Find the value of the regularization parameter in the interval $[0,10]$ that minimizes the
   validation MSE. Plot the training and validation MSEs as a function of the regularization parameter.

6. Use the optimal value of the regularization parameter found in the previous step to re-train the
   model. Plot the model predictions and the training data.

7. Evaluate the MSEs on the training, validation and test sets for the re-trained model.
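
The model-selection procedure in steps 5 to 7 can be sketched as a grid search over the regularization parameter. To keep the example short and fast, it swaps in the closed-form ridge solution instead of the gradient-descent `fit` used in this exercise, and it runs on synthetic data. The names `ridge_closed_form`, `mse`, `select_lambda`, and `make_set` are hypothetical, and the unpenalized intercept is an assumption.

```python
import numpy as np

def ridge_closed_form(X, y, lam):
    """Minimizer of (1/(2N)) * (||X theta - y||^2 + lam * ||theta[1:]||^2)."""
    d = X.shape[1]
    penalty = lam * np.eye(d)
    penalty[0, 0] = 0.0  # assumption: the intercept is not penalized
    return np.linalg.solve(X.T @ X + penalty, X.T @ y)

def mse(X, y, theta):
    """Unregularized mean squared error of the linear model X @ theta."""
    return float(np.mean((X @ theta - y) ** 2))

def select_lambda(X_train, y_train, X_val, y_val, lambdas):
    """Return the lambda with the lowest validation MSE, plus both error curves."""
    train_err, val_err = [], []
    for lam in lambdas:
        theta = ridge_closed_form(X_train, y_train, lam)
        # Errors are reported without the penalty term
        train_err.append(mse(X_train, y_train, theta))
        val_err.append(mse(X_val, y_val, theta))
    best = float(lambdas[int(np.argmin(val_err))])
    return best, train_err, val_err

# Synthetic data for illustration only
rng = np.random.default_rng(0)

def make_set(n):
    x = rng.uniform(-1, 1, size=(n, 1))
    X = np.hstack([np.ones((n, 1)), x ** np.arange(1, 9)])  # powers 1..8
    y = np.sin(3 * x) + 0.3 * rng.standard_normal((n, 1))
    return X, y

X_tr, y_tr = make_set(12)
X_va, y_va = make_set(20)
lambdas = np.linspace(0, 10, 41)
best, train_err, val_err = select_lambda(X_tr, y_tr, X_va, y_va, lambdas)
print(best)
```

Plotting `train_err` and `val_err` against `lambdas` gives the two curves requested in step 5. The test set plays no part in choosing the parameter; it is only evaluated once, after re-training with the selected value.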