### Linear Regression with Synthetic Data

In this coding exercise, you will work with synthetic data and fit linear regression models with two different design matrices. For each design matrix, you will compute the beta hat estimates (the fitted coefficients), the y hat estimates (the fitted values), and the residuals. You will then generate additional synthetic data from the same process with fresh Gaussian noise and use it to assess the error of both regression models.
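
For reference, the data-generating process used in the code below and the least-squares quantities the exercise asks for can be written as follows. This summary was written for this notebook, with σ = 0.2 read off the `np.random.normal(0, .2, T)` call. The slides' own notation may differ.

$$
y_t = \beta_0 + \beta_1 x_t + \beta_2 \cos(x_t) + \varepsilon_t, \qquad \varepsilon_t \sim \mathcal{N}(0, \sigma^2), \qquad t = 1, \dots, T
$$

$$
X_1 = \begin{bmatrix} 1 & x_1 \\ \vdots & \vdots \\ 1 & x_T \end{bmatrix}, \qquad
X_2 = \begin{bmatrix} 1 & x_1 & \cos(x_1) \\ \vdots & \vdots & \vdots \\ 1 & x_T & \cos(x_T) \end{bmatrix}
$$

$$
\hat{\beta} = (X^\top X)^{-1} X^\top y, \qquad \hat{y} = X \hat{\beta}, \qquad \hat{\varepsilon} = y - \hat{y}, \qquad \mathrm{MSE} = \frac{1}{T} \sum_{t=1}^{T} (y_t - \hat{y}_t)^2
$$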

Instructions:
1. Generate a synthetic dataset (using a cosine function, a linear trend, and Gaussian noise).
2. Define Design Matrix 1 with only the mean and linear trend.
3. Define Design Matrix 2 by adding the cosine function to Design Matrix 1.
4. Calculate the beta hat estimates for Design Matrix 1 and 2 using the provided formula in the slides.
5. Calculate the y hat estimates for Design Matrix 1 and 2 using the calculated beta hat estimates.
6. Calculate the residuals for Design Matrix 1 and 2 by subtracting the y hat estimates from the observed values (`y_true`).
7. Generate additional synthetic data from the same process (new `x` values with fresh Gaussian noise). Then calculate the y hat estimates of both models on the new data, using the beta hat estimates obtained from the original data.
8. Assess the error by calculating the mean squared error (MSE) of both models on the original data and on the new data.
9. Print the four MSE values.
10. Plot the observed values and the y hat estimates for Design Matrix 1 and 2 to visualize the fitted linear regression models.
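
The steps above can be sketched as follows. This is one possible solution, not the reference one, and it differs from the notebook in a few ways:

- It uses `np.random.default_rng` instead of `np.random.seed`.
- It solves the normal equations with `np.linalg.solve` instead of forming $(X^\top X)^{-1}$ explicitly. The result is the same, and avoiding the explicit inverse is usually more stable numerically.
- It assumes the new data come from the same model at `x_new = np.linspace(7, 9, T // 2)`.
- The helper `ols` is a hypothetical name, and plotting is left out.

```python
import numpy as np

# Assumed setup: same generative model and noise level as the notebook cell below
T = 500
beta_0, beta_1, beta_2 = -0.5, 0.2, 0.5
sigma = 0.2
rng = np.random.default_rng(0)  # assumption: NumPy Generator instead of np.random.seed

x = np.linspace(0, 10, T)
y_true = beta_0 + beta_1 * x + beta_2 * np.cos(x) + rng.normal(0, sigma, T)

# Design matrices: intercept + linear trend (Model 1), plus cosine (Model 2)
X1 = np.column_stack([np.ones(T), x])
X2 = np.column_stack([np.ones(T), x, np.cos(x)])

def ols(X, y):
    """Solve the normal equations (X'X) beta = X'y without forming the inverse."""
    return np.linalg.solve(X.T @ X, X.T @ y)

beta_hat_1 = ols(X1, y_true)
beta_hat_2 = ols(X2, y_true)

# Fitted values, residuals and training MSE
y_hat_1 = X1 @ beta_hat_1
y_hat_2 = X2 @ beta_hat_2
residual_1 = y_true - y_hat_1
residual_2 = y_true - y_hat_2
mse_1 = np.mean(residual_1 ** 2)
mse_2 = np.mean(residual_2 ** 2)

# Assumed "new data": fresh noise at new x values, evaluated with the fitted betas
x_new = np.linspace(7, 9, T // 2)
y_true_new = beta_0 + beta_1 * x_new + beta_2 * np.cos(x_new) + rng.normal(0, sigma, T // 2)
X1_new = np.column_stack([np.ones(T // 2), x_new])
X2_new = np.column_stack([np.ones(T // 2), x_new, np.cos(x_new)])
mse_1_new = np.mean((y_true_new - X1_new @ beta_hat_1) ** 2)
mse_2_new = np.mean((y_true_new - X2_new @ beta_hat_2) ** 2)

print("Beta hat for Model 1:", beta_hat_1)
print("Beta hat for Model 2:", beta_hat_2)
print("MSE on original data (Model 1, Model 2):", mse_1, mse_2)
print("MSE on new data (Model 1, Model 2):", mse_1_new, mse_2_new)
```

Model 2 matches the generating process, so its training MSE should land close to σ² = 0.04. Model 1's residuals also absorb the cosine term it does not model.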

In [None]:
import numpy as np
import matplotlib.pyplot as plt

# Generate synthetic dataset
T = 500
beta_0 = -0.5
beta_1 = 0.2
beta_2 = 0.5

np.random.seed(0)
x = np.linspace(0, 10, T)
y_true = beta_0 + beta_1*x + beta_2*np.cos(x) + np.random.normal(0, .2, T)

# TODO: Visualize the data

# Are we doing supervised or unsupervised learning here?

# TODO: Define the design matrix for Model 1 (mean and linear trend)
X1 = # TODO

# TODO: Define the design matrix for Model 2 (mean, linear trend and cosine)
X2 = # TODO


# TODO: Calculate the beta hat estimates for Model 1 and 2: beta_hat = (X'X)^(-1) X'y
beta_hat_1 = # TODO
beta_hat_2 = # TODO

# Print beta estimates
print("Beta hat for Model 1:", beta_hat_1)
print("Beta hat for Model 2:", beta_hat_2)

# TODO: Calculate the y hat estimates for Model 1 and 2: y_hat = X beta_hat
y_hat_1 = # TODO
y_hat_2 = # TODO

# TODO: Plot the y estimates for both models

# TODO: Calculate residual estimates for Model 1 and 2
residual_1 = # TODO
residual_2 = # TODO

# TODO: Calculate the mean squared error (MSE = E[(y - y_hat)**2]) for Model 1 and 2
mse_1 = # TODO
mse_2 = # TODO

print("MSE for Model 1:", mse_1)
print("MSE for Model 2:", mse_2)


# Generate additional synthetic data from the same model at new x values, with fresh Gaussian noise
x_new = np.linspace(7, 9, T//2)
y_true_new = # TODO

# TODO: Calculate y hat estimates for Model 1 and 2 with the new data
y_hat_1_new = # TODO
y_hat_2_new = # TODO

# TODO: Calculate the mean squared error (MSE = E[(y - y_hat)**2]) for Model 1 and 2 with the new data
mse_1_new = # TODO
mse_2_new = # TODO

# Print MSE values
print("MSE for Model 1 (New Data):", mse_1_new)
print("MSE for Model 2 (New Data):", mse_2_new)

# Which one is the training error? Which one is the testing error?
# What is the link between the MSE and sigma, the standard deviation of the noise (0.2 above)?
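
One way to explore the last question empirically is to refit Model 2 on many independent noise draws and average the training MSE. The sketch below assumes the same model and σ = 0.2 as above. The number of repetitions and the seed are arbitrary choices. For a correctly specified model whose design matrix has $p$ full-rank columns, the expected training MSE is $\sigma^2 (T - p)/T$. This is why $\mathrm{RSS}/(T - p)$ is the usual unbiased estimate of $\sigma^2$.

```python
import numpy as np

# Assumed setup: Model 2 matches the generating process of the notebook
T, sigma = 500, 0.2
beta_true = np.array([-0.5, 0.2, 0.5])
x = np.linspace(0, 10, T)
X2 = np.column_stack([np.ones(T), x, np.cos(x)])
p = X2.shape[1]  # number of estimated coefficients

rng = np.random.default_rng(1)
n_reps = 200
train_mse = np.empty(n_reps)
for r in range(n_reps):
    # Fresh noise in each repetition, same design matrix
    y = X2 @ beta_true + rng.normal(0, sigma, T)
    beta_hat = np.linalg.solve(X2.T @ X2, X2.T @ y)
    train_mse[r] = np.mean((y - X2 @ beta_hat) ** 2)

print("average training MSE:  ", train_mse.mean())
print("sigma**2 * (T - p) / T:", sigma**2 * (T - p) / T)
print("sigma**2:              ", sigma**2)
```

The training MSE therefore sits slightly below σ². The MSE on fresh data also includes the variance of the estimated coefficients, plus a bias term for a misspecified model such as Model 1, so it tends to sit above σ².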

### Optimization Approach for Linear Regression

In this coding exercise, you will fit the same linear regression models with an optimization approach instead of the closed-form formula. Using a cost function (the mean squared error) and its gradient, you will implement the gradient descent algorithm to find the coefficients of each model. By completing this exercise, you will gain experience with an optimization-based approach to linear regression and see why models should be evaluated on both training and testing data.
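
The cost and gradient that the task refers to can be written as below. Here the cost is the mean squared error over the $T$ observations, and $\eta$ is `learning_rate`. Some texts use $\frac{1}{2T}$ instead of $\frac{1}{T}$, which only rescales the gradient. Setting the gradient to zero recovers the normal equations behind the closed-form $\hat{\beta}$ of the previous exercise.

$$
J(\beta) = \frac{1}{T} \lVert y - X\beta \rVert^2, \qquad
\nabla J(\beta) = -\frac{2}{T} X^\top (y - X\beta)
$$

$$
\beta^{(k+1)} = \beta^{(k)} - \eta \, \nabla J\big(\beta^{(k)}\big), \qquad
\nabla J(\hat{\beta}) = 0 \iff X^\top X \hat{\beta} = X^\top y
$$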

Your task:
1. Implement the cost function, which calculates the mean squared error.
2. Implement the gradient function, which computes the gradient of the cost function.
3. Implement the gradient descent algorithm to find the optimal coefficients for both models.
4. Calculate the y hat estimates for Model 1 and 2 using the obtained coefficients.
5. Calculate the residuals for Model 1 and 2.
6. Calculate the mean squared error (MSE) for Model 1 and 2.
7. Generate additional synthetic data and calculate the y hat estimates and MSE for the new data.
8. Interpret and compare the MSE values for both models and discuss the training and testing errors.
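
A minimal sketch of the three functions follows, under these assumptions:

- The cost is the plain mean squared error.
- `beta` starts at zero, as in the skeleton below.
- The data are regenerated from the same model as the first exercise.
- The learning rate and iteration count are illustrative values chosen for this sketch, not the notebook's settings.
- The progress printing is omitted.

The result is compared with the closed-form solution.

```python
import numpy as np

def cost_function(X, y, beta):
    """Mean squared error J(beta) = mean((y - X beta)**2)."""
    assert X.shape[0] == y.shape[0]
    assert X.shape[1] == beta.shape[0]
    residual = y - X @ beta
    return np.mean(residual ** 2)

def gradient_function(X, y, beta):
    """Gradient of the MSE: -(2/T) X^T (y - X beta)."""
    assert X.shape[0] == y.shape[0]
    assert X.shape[1] == beta.shape[0]
    T = X.shape[0]
    return -2.0 / T * X.T @ (y - X @ beta)

def gradient_descent(X, y, learning_rate, num_iterations):
    """Fixed-step gradient descent from beta = 0; records the cost after each step."""
    beta = np.zeros(X.shape[1])
    costs = []
    for _ in range(num_iterations):
        grad = gradient_function(X, y, beta)
        beta = beta - learning_rate * grad
        costs.append(cost_function(X, y, beta))
    return beta, costs

# Assumed data: same generative model as the first exercise
rng = np.random.default_rng(0)
T = 500
x = np.linspace(0, 10, T)
y = -0.5 + 0.2 * x + 0.5 * np.cos(x) + rng.normal(0, 0.2, T)
X2 = np.column_stack([np.ones(T), x, np.cos(x)])

# Illustrative settings for this sketch only
beta_gd, costs = gradient_descent(X2, y, learning_rate=0.01, num_iterations=5000)
beta_ols = np.linalg.solve(X2.T @ X2, X2.T @ y)
print("gradient descent:", beta_gd)
print("closed form:     ", beta_ols)
```

If the two estimates agree to several decimals, the gradient and the update rule are consistent with the normal equations.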

In [None]:
# TODO: Define the cost function (mean squared error)
def cost_function(X, y, beta):
    assert X.shape[0] == y.shape[0]
    assert X.shape[1] == beta.shape[0]
    # TODO

# TODO: Define the gradient function for the cost function
def gradient_function(X, y, beta):
    assert X.shape[0] == y.shape[0]
    assert X.shape[1] == beta.shape[0]
    # TODO

# TODO: Define the optimization algorithm (gradient descent)
def gradient_descent(X, y, learning_rate, num_iterations):
    num_features = X.shape[1]
    beta = np.zeros(num_features)
    costs = []
    
    for i in range(num_iterations):
        if i % 10 == 0: print("Minimizing cost function for linear regression, iteration {}".format(i))
        grad = # TODO
        beta -= # TODO
        cost = # TODO
        costs.append(cost)
    
    return beta, costs
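
Before running gradient descent, a quick sanity check is to compare the analytic gradient with a central finite-difference approximation of the cost. In this sketch:

- `numerical_gradient` is a hypothetical helper.
- `cost_function` and `gradient_function` are filled in with the mean-squared-error forms.
- The random `X`, `y`, and `beta` are arbitrary test inputs, not the notebook's data.

```python
import numpy as np

def cost_function(X, y, beta):
    """Mean squared error of the linear model X @ beta."""
    return np.mean((y - X @ beta) ** 2)

def gradient_function(X, y, beta):
    """Analytic gradient of the mean squared error."""
    return -2.0 / X.shape[0] * X.T @ (y - X @ beta)

def numerical_gradient(f, beta, eps=1e-6):
    """Central-difference approximation of the gradient of f at beta."""
    grad = np.zeros_like(beta)
    for j in range(beta.size):
        step = np.zeros_like(beta)
        step[j] = eps
        grad[j] = (f(beta + step) - f(beta - step)) / (2 * eps)
    return grad

# Arbitrary test inputs (hypothetical)
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = rng.normal(size=50)
beta = rng.normal(size=3)

analytic = gradient_function(X, y, beta)
numeric = numerical_gradient(lambda b: cost_function(X, y, b), beta)
print("max abs difference:", np.max(np.abs(analytic - numeric)))
```

The cost is quadratic, so central differences are exact up to rounding and the printed difference should be tiny. A large difference usually points to a missing factor or a sign error in the gradient.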

In [None]:
# Set the learning rate and number of iterations for gradient descent
learning_rate = 0.0001
num_iterations = 100

# TODO: Perform gradient descent for Model 1 and 2
beta_hat_1, costs_1 = gradient_descent(# TODO)
beta_hat_2, costs_2 = gradient_descent(# TODO)

# Print beta estimates
print("Beta hat for Model 1 (using optimization method):", beta_hat_1)
print("Beta hat for Model 2 (using optimization method):", beta_hat_2)

# TODO: Plot the evolution of the cost function for both models

# What are your conclusions? Did the optimization work? Did the learning stop?
# What parameters can you change to improve the results? What are the advantages and what are the limits of this approach?

# TODO: Calculate y hat estimates for Model 1 and 2
y_hat_1 = # TODO
y_hat_2 = # TODO

# TODO: Plot the data and the estimated y

# TODO: Calculate residual estimates for Model 1 and 2
residual_1 = # TODO
residual_2 = # TODO

# TODO: Calculate the mean squared error (MSE = E[(y - y_hat)**2]) for Model 1 and 2
mse_1 = # TODO
mse_2 = # TODO

print("MSE for Model 1:", mse_1)
print("MSE for Model 2:", mse_2)

# TODO: Calculate y hat estimates for Model 1 and 2 with the new data
y_hat_1_new = # TODO
y_hat_2_new = # TODO

# TODO: Calculate the mean squared error (MSE = E[(y - y_hat)**2]) for Model 1 and 2 with the new data
mse_1_new = # TODO
mse_2_new = # TODO

# Print MSE values
print("MSE for Model 1 (New Data):", mse_1_new)
print("MSE for Model 2 (New Data):", mse_2_new)

# What are your conclusions?
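
When drawing conclusions about the optimization, it can help to look at the curvature of the cost:

- The MSE is quadratic in β, with constant Hessian $H = \frac{2}{T} X^\top X$.
- Fixed-step gradient descent on such a function converges from every starting point only when $0 < \eta < 2/\lambda_{\max}(H)$.
- The condition number, the ratio of the largest to the smallest eigenvalue of $H$, governs how slowly the flattest direction is learned.

The sketch below regenerates Model 2's data from the same model as the first exercise and runs just under and just above that threshold. `run_gd` is a hypothetical helper, and the 0.95 and 1.05 factors are arbitrary.

```python
import numpy as np

def cost(X, y, beta):
    """Mean squared error of the linear model X @ beta."""
    return np.mean((y - X @ beta) ** 2)

def run_gd(X, y, learning_rate, num_iterations):
    """Fixed-step gradient descent from zero; returns the final beta and the cost history."""
    beta = np.zeros(X.shape[1])
    costs = []
    for _ in range(num_iterations):
        grad = -2.0 / X.shape[0] * X.T @ (y - X @ beta)
        beta = beta - learning_rate * grad
        costs.append(cost(X, y, beta))
    return beta, np.array(costs)

# Assumed data: same generative model as the first exercise
rng = np.random.default_rng(0)
T = 500
x = np.linspace(0, 10, T)
y = -0.5 + 0.2 * x + 0.5 * np.cos(x) + rng.normal(0, 0.2, T)
X2 = np.column_stack([np.ones(T), x, np.cos(x)])

# The MSE is quadratic with constant Hessian H = (2/T) X'X
H = 2.0 / T * X2.T @ X2
eigvals = np.linalg.eigvalsh(H)  # eigenvalues in ascending order
lr_max = 2.0 / eigvals[-1]  # fixed-step GD is stable only for 0 < lr < lr_max
print("largest stable learning rate:", lr_max)
print("condition number of H:", eigvals[-1] / eigvals[0])

_, costs_stable = run_gd(X2, y, 0.95 * lr_max, 200)
_, costs_unstable = run_gd(X2, y, 1.05 * lr_max, 200)
print("cost after 200 steps, lr = 0.95 * lr_max:", costs_stable[-1])
print("cost after 200 steps, lr = 1.05 * lr_max:", costs_unstable[-1])
```

To reason about whether the learning stopped, compare the notebook's `learning_rate = 0.0001` with the printed bound and the condition number. This also suggests why the cost curve may still be decreasing after 100 iterations.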