**Instructions:**

- For questions that require coding, you need to write the relevant code and display its output. Your output should either be the direct answer to the question or clearly display the answer in it.
- For questions that require a written answer (sometimes along with the code), you need to put your answer in a Markdown cell. Writing the answer as a comment or as a print line is not acceptable.
- You need to render this file as HTML using Quarto and submit the HTML file. **Please note that this is a requirement and not optional.** A submission cannot be graded until it is properly rendered.

Import all the libraries and tools you need below.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from sklearn.linear_model import LinearRegression, LogisticRegression, Ridge, Lasso
from sklearn.preprocessing import StandardScaler, MinMaxScaler, RobustScaler, Normalizer
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, mean_absolute_error, accuracy_score, recall_score, precision_score
```

In this assignment, you will write two more functions to implement parts of the Gradient Descent algorithm. All functions will come together in the next In-Class Assignment.
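For reference, each Gradient Descent iteration updates all parameters at once by moving them against the gradient of the cost. Writing $\eta$ for the learning rate (the `lr` argument of the `GradientDescent` cell at the end of this notebook) and $t$ for the iteration index, the standard update is:

$$
\mathbf{w}^{(t+1)} = \mathbf{w}^{(t)} - \eta \, \nabla MSE\left(\mathbf{w}^{(t)}\right)
$$

The two functions in this assignment supply the pieces this update needs: `predict` produces the fitted responses $\hat{y}$, and `calc_cost_gradient` produces the cost and $\nabla MSE$.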

### 1)

Define a function called `predict`. It should take two inputs: `model_params`, which are the parameters of a Linear Regression model ($w_0, w_1, w_2, ... , w_N$), and `X_test`, which contains the predictors of the test data.

The function should return the predicted responses of the Linear Regression model. You do not need to check or account for any invalid inputs.
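As a reminder of the model behind `predict`: with $N$ predictors, a single prediction is a weighted sum plus the intercept. Prepending a column of ones to the $n \times N$ predictor matrix $\mathbf{X}$ gives the augmented matrix $\tilde{\mathbf{X}} = [\mathbf{1}, \mathbf{X}]$, so all $n$ predictions become one matrix-vector product, which is what allows a loop-free implementation:

$$
\hat{y}_i = w_0 + w_1 x_{i1} + w_2 x_{i2} + \dots + w_N x_{iN},
\qquad
\hat{\mathbf{y}} = \tilde{\mathbf{X}}\,\mathbf{w}
$$

For example, with the parameters of the test case below and the first toy row $(1, 2)$: $0.42310646 + 0.9807642 \cdot 1 + 0.68482974 \cdot 2 = 2.77353014$.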

**Note:**

- **You are not allowed to use loops/comprehensions.** The implementation should be vectorized.
- Make sure you remember the formula of a Linear Regression model.
- You can use the given test case to check your function.

**(25 points)**

```python
# Toy model parameters (w0, w1, w2)
input_params = np.array([0.42310646, 0.9807642, 0.68482974])
# Toy predictor data (3 observations, 2 predictors)
toy_data = pd.DataFrame([[1,2],[3,4],[5,6]])

# Define the predict function
def predict(model_params, X_test):
    # Prepend a column of ones to X_test so that w0 acts as the intercept
    X_test_with_intercept = np.c_[np.ones(X_test.shape[0]), X_test]
    # Vectorized prediction: dot product of the augmented predictors with the parameters
    prediction = np.dot(X_test_with_intercept, model_params)
    return prediction

predict(input_params, toy_data) # Should produce [2.77353014, 6.10471802, 9.4359059]
```

```
array([2.77353014, 6.10471802, 9.4359059 ])
```
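Prepending a column of ones is one way to fold $w_0$ into the dot product. An equivalent, equally vectorized formulation keeps the intercept separate and computes $w_0 + \mathbf{X}\,[w_1, \dots, w_N]^\top$. The sketch below checks that both forms agree on the toy test case; the helper `predict_split_intercept` is a hypothetical name introduced only for this comparison, not part of the assignment.

```python
import numpy as np
import pandas as pd

def predict(model_params, X_test):
    # Intercept folded in by prepending a column of ones
    X_with_intercept = np.c_[np.ones(X_test.shape[0]), X_test]
    return X_with_intercept @ model_params

def predict_split_intercept(model_params, X_test):
    # Hypothetical alternative: add w0 separately, multiply X by w1..wN
    return model_params[0] + np.asarray(X_test) @ model_params[1:]

params = np.array([0.42310646, 0.9807642, 0.68482974])
X = pd.DataFrame([[1, 2], [3, 4], [5, 6]])

print(predict(params, X))
print(predict_split_intercept(params, X))
print(np.allclose(predict(params, X), predict_split_intercept(params, X)))  # True
```

Either form avoids loops; the augmented-matrix version has the advantage that the same matrix can be reused when computing the gradient.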

### 2)

Define a function called `calc_cost_gradient`. It should take three inputs: (1) `model_params`, which are the (current) parameters of a Linear Regression model ($w_0, w_1, w_2, ... , w_N$), (2) `X_train` and (3) `y_train`, which contain the training predictors and responses, respectively.

The function should return two outputs: (1) `cost`, which is the Mean Squared Error (MSE) cost given the input parameters and the data, and (2) `gradient`, which is the gradient of the MSE cost function ($\nabla MSE$) at the input parameters.

`cost` should be a scalar value. For `gradient`, you have different options, as long as it contains all the partial derivatives: it can be a numpy vector, a list, a dictionary, etc.
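For reference, let $\tilde{\mathbf{X}}$ be the $n \times (N+1)$ matrix of training predictors with a leading column of ones ($\tilde{x}_{i0} = 1$), so that $\hat{\mathbf{y}} = \tilde{\mathbf{X}}\mathbf{w}$. The MSE cost and its gradient then take the standard form below (if the lecture uses different notation, treat the lecture derivation as authoritative):

$$
MSE(\mathbf{w}) = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2 = \frac{1}{n}\left\lVert \mathbf{y} - \tilde{\mathbf{X}}\mathbf{w} \right\rVert^2
$$

$$
\frac{\partial MSE}{\partial w_j} = -\frac{2}{n}\sum_{i=1}^{n}\tilde{x}_{ij}\left(y_i - \hat{y}_i\right),
\qquad
\nabla MSE(\mathbf{w}) = -\frac{2}{n}\,\tilde{\mathbf{X}}^\top\left(\mathbf{y} - \tilde{\mathbf{X}}\mathbf{w}\right)
$$

The matrix form on the right computes all $N+1$ partial derivatives in a single product, which is what a loop-free implementation needs.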

**Note:** 

- **You are not allowed to use loops/comprehensions.** The implementation should be vectorized.
- You may consider using the `predict` function inside `calc_cost_gradient` to make it more organized. (This is just a suggestion.)
- Make sure you follow the derivation of the gradient from the lecture.
- You do not need to check or account for any invalid inputs.
- You can use the given test case to check your function.

**(75 points)**

```python
toy_model_params = np.array([0.42310646, 0.9807642, 0.68482974])
toy_X_train = pd.DataFrame([[1,2],[3,4],[5,6]])
toy_y_train = pd.Series([20,30,40])

# Define the cost-and-gradient function
def calc_cost_gradient(model_params, X_train, y_train):
    # Predicted responses (y hat) under the current parameters
    y_train_prediction = predict(model_params, X_train)
    # Convert y_train to a NumPy array
    y_train = np.array(y_train)
    # Squared residual of each observation, divided by the number of observations n
    mse_cost_array = ((y_train - y_train_prediction) ** 2) / np.size(y_train)
    # Summing these terms gives the mean squared error
    mse_cost = sum(mse_cost_array)
    # Next, calculate the gradient
    # Prepend a column of ones to X_train for the intercept w0
    X_train_with_intercept = np.c_[np.ones(X_train.shape[0]), X_train]
    # Gradient from the in-class derivation: -(2/n) * X^T (y - y_hat)
    mse_gradient = (y_train - y_train_prediction) @ X_train_with_intercept * (-2 / np.size(y_train))
    print(f"cost: {mse_cost}")
    print(f"gradient: w0: {mse_gradient[0]}; w1, w2: {mse_gradient[1], mse_gradient[2]} ")
    return mse_cost, mse_gradient

calc_cost_gradient(toy_model_params, toy_X_train, toy_y_train)

# Should return
    # cost: 600.6332042982852
    # gradient: w0: -47.79056396000001; w1,w2: -161.15519087,-208.94575483
```

```
cost: 600.6332042982854
gradient: w0: -47.79056396; w1, w2: (-161.15519086666666, -208.94575482666664) 

(600.6332042982854, array([ -47.79056396, -161.15519087, -208.94575483]))
```
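A common way to validate an analytical gradient such as the one returned by `calc_cost_gradient` is a finite-difference check: perturb each parameter by a small $h$ and compare $\left(MSE(\mathbf{w} + h\mathbf{e}_j) - MSE(\mathbf{w} - h\mathbf{e}_j)\right)/(2h)$ with the $j$-th partial derivative. The sketch below is self-contained; `mse_cost`, `mse_gradient`, and `numerical_gradient` are hypothetical helper names, and the loop inside `numerical_gradient` is only a testing utility, not a substitute for the loop-free functions required above.

```python
import numpy as np
import pandas as pd

def mse_cost(w, X, y):
    # MSE of a linear model with intercept w[0]
    X1 = np.c_[np.ones(X.shape[0]), X]
    residuals = np.asarray(y) - X1 @ w
    return np.mean(residuals ** 2)

def mse_gradient(w, X, y):
    # Analytical gradient: -(2/n) * X1^T (y - X1 w)
    X1 = np.c_[np.ones(X.shape[0]), X]
    residuals = np.asarray(y) - X1 @ w
    return -2 / len(residuals) * (X1.T @ residuals)

def numerical_gradient(w, X, y, h=1e-6):
    # Central differences, perturbing one parameter at a time
    grad = np.zeros_like(w)
    for j in range(w.size):
        step = np.zeros_like(w)
        step[j] = h
        grad[j] = (mse_cost(w + step, X, y) - mse_cost(w - step, X, y)) / (2 * h)
    return grad

w = np.array([0.42310646, 0.9807642, 0.68482974])
X = pd.DataFrame([[1, 2], [3, 4], [5, 6]])
y = pd.Series([20, 30, 40])

print(mse_gradient(w, X, y))
print(numerical_gradient(w, X, y))
print(np.allclose(mse_gradient(w, X, y), numerical_gradient(w, X, y), rtol=1e-5))  # True
```

Because the MSE is quadratic in $\mathbf{w}$, central differences are exact up to floating-point rounding, so a mismatch here almost always points to a sign, scaling, or transpose error in the analytical gradient.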

```python
# Reserve

def GradientDescent(model_params, X_train, y_train, lr, iters):
    # Histories: parameters (starting from the initial guess), gradients, and MSE costs
    param_list = [model_params]
    gradient_list = []
    mse_cost_list = []
    for i in range(iters): # Repeat the 3rd and 4th steps
        # Cost and full gradient at the most recent parameters (last element of param_list)
        mse_cost, gradient_array = calc_cost_gradient(param_list[-1], X_train, y_train)
        # Store the full gradient vector (all partial derivatives)
        gradient_list.append(gradient_array)
        # Store the cost
        mse_cost_list.append(mse_cost)
        # Fourth step: update the parameters, w_new = w_old - lr * gradient
        new_params = param_list[-1] - lr * gradient_array
        # Store the new parameters
        param_list.append(new_params)
    # Cost at the final parameters (after the last update)
    last_mse_cost, gradient_array = calc_cost_gradient(param_list[-1], X_train, y_train)
    mse_cost_list.append(last_mse_cost)
    # Print the lowest MSE cost observed
    min_mse = min(mse_cost_list)
    print(f"min_mse: {min_mse}.")
    # Plot all iters + 1 recorded MSE costs
    plt.figure(figsize = (10, 6))
    plt.plot(range(len(mse_cost_list)), mse_cost_list)
    plt.title("MSE cost vs. number of iterations")
    plt.xlabel("Number of iterations")
    plt.ylabel("MSE cost")
    plt.grid(True, which = "both", linestyle = "--")
    plt.show()
    # Return the final parameters (the last entry in param_list)
    return param_list[-1]
```
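One way to sanity-check a gradient-descent loop like the one above is to run it on synthetic data and compare the result with the closed-form least-squares solution from `np.linalg.lstsq`. The sketch below is a compact, self-contained variant under stated assumptions: `cost_and_gradient` and `gradient_descent` are hypothetical helpers (the latter returns the cost history instead of plotting it), the data are randomly generated with standard-normal predictors, and `lr=0.1` with 500 iterations is a choice that suits this well-scaled data, not a general recommendation.

```python
import numpy as np

def cost_and_gradient(w, X, y):
    # MSE and its gradient for a linear model with intercept w[0]
    X1 = np.c_[np.ones(X.shape[0]), X]
    residuals = y - X1 @ w
    cost = np.mean(residuals ** 2)
    gradient = -2 / y.size * (X1.T @ residuals)
    return cost, gradient

def gradient_descent(w, X, y, lr, iters):
    # Hypothetical compact variant: returns final parameters and cost history
    costs = []
    for _ in range(iters):
        cost, gradient = cost_and_gradient(w, X, y)
        costs.append(cost)
        w = w - lr * gradient  # w_new = w_old - lr * gradient
    costs.append(cost_and_gradient(w, X, y)[0])  # cost after the final update
    return w, costs

# Synthetic data (assumption): y = 1.5 - 2.0*x1 + 0.5*x2 + small noise
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 1.5 - 2.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=200)

w_gd, costs = gradient_descent(np.zeros(3), X, y, lr=0.1, iters=500)

# Closed-form least-squares fit on the same augmented matrix, for comparison
X1 = np.c_[np.ones(X.shape[0]), X]
w_ls, *_ = np.linalg.lstsq(X1, y, rcond=None)

print(w_gd)
print(w_ls)
print(np.allclose(w_gd, w_ls, atol=1e-6))  # True
```

With unscaled predictors of very different magnitudes, the same learning rate can diverge or converge very slowly, which is one reason features are often standardized (for example with `StandardScaler`) before running gradient descent.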
