## Instructions {-}

1. You may discuss the questions and potential approaches with a friend. However, you must write your own solutions and code separately, not as a group activity.

2. Write your code in the **Code cells** and your answers in the **Markdown cells** of the Jupyter notebook. Ensure that the solution is written neatly enough for the graders to understand and follow.

3. Use [Quarto](https://quarto.org/docs/output-formats/html-basics.html) to render the **.ipynb** file as HTML. You will need to open the command prompt, navigate to the directory containing the file, and use the command: `quarto render filename.ipynb --to html`. Submit the HTML file.

4. The assignment is worth 100 points, and is due on **Thursday, 11th April 2025 at 11:59 pm**. 

5. **Five points are for properly formatting the assignment**. The breakdown is as follows:
    - Must be an HTML file rendered using Quarto **(2 points)**. *If you have a Quarto issue, you must mention the issue & quote the error you get when rendering using Quarto in the comments section of Canvas, and submit the ipynb file.* 
    - There aren’t excessively long outputs of extraneous information (e.g. no printouts of entire data frames without good reason, there aren’t long printouts of which iteration a loop is on, there aren’t long sections of commented-out code, etc.) **(1 point)**
    - Final answers to each question are written in the Markdown cells. **(1 point)**
    - There is no piece of unnecessary / redundant code, and no unnecessary / redundant text. **(1 point)**

## 1) Bias-Variance Trade-off for Regression **(50 points)**

The main goal of this question is to understand and visualize the bias-variance trade-off in a regression model by performing repetitive simulations.

A clear understanding of bias and variance will help you follow the reasoning behind many of the models introduced later in the course.

### a) Define the True Relationship (Signal)

First, you need to implement the underlying true relationship (Signal) you want to sample data from. Assume that the function is the [Bukin function](https://www.sfu.ca/~ssurjano/bukin6.html). Implement it as a user-defined function and run it with the test cases below to make sure it is implemented correctly. **(5 points)**

**Note:** It would be more useful to have only one input to the function. You can treat the input as an array of two elements.

```python
print(Bukin(np.array([1,2]))) # The output should be 141.177
print(Bukin(np.array([6,-4]))) # The output should be 208.966
print(Bukin(np.array([0,1]))) # The output should be 100.1
```
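One possible implementation is sketched below. It assumes the linked page refers to Bukin function N. 6, $f(x) = 100\sqrt{|x_2 - 0.01x_1^2|} + 0.01|x_1 + 10|$, and that the input is a two-element array `[x1, x2]` as suggested in the note above. Treat it as a starting point rather than the expected solution.

```python
import numpy as np

def Bukin(x):
    """Bukin function N. 6 evaluated at a two-element array x = [x1, x2]."""
    x1, x2 = x[0], x[1]
    # 100 * sqrt(|x2 - 0.01 * x1^2|) + 0.01 * |x1 + 10|
    return 100 * np.sqrt(np.abs(x2 - 0.01 * x1**2)) + 0.01 * np.abs(x1 + 10)
```

With this definition, the three test cases above reproduce the stated values to three decimal places.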

### b) Generate Test Set (No Noise)

Generate a **noiseless** test set with **100 observations** sampled from the true underlying function. This test set will be used to evaluate **bias and variance**, so make sure it follows the correct data generation process. 

**(5 points)**

**Instructions:**

- **Do not use loops** for this question.
- `.apply` will be especially helpful (and often simpler).

**Data generation assumptions:**

- Use `np.random.seed(100)` for reproducibility.
- The first predictor, $x_1$, should be drawn from a **uniform distribution** over the interval $[-15, -5]$, i.e., $x_1 \sim U[-15, -5]$.
- The second predictor, $x_2$, should be drawn from a **uniform distribution** over the interval $[-3, 3]$, i.e., $x_2 \sim U[-3, 3]$.
- Compute the response $y$ as the true function value at each sampled $(x_1, x_2)$ point.
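The data generation steps above could be sketched as follows. The column names `x1`, `x2`, `y`, the variable name `test`, and the choice to draw all of $x_1$ before $x_2$ are assumptions; a different sampling order gives different numbers for the same seed. The `Bukin` helper is repeated only so the cell runs on its own.

```python
import numpy as np
import pandas as pd

def Bukin(x):
    # Bukin function N. 6 for a two-element array [x1, x2]
    return 100 * np.sqrt(np.abs(x[1] - 0.01 * x[0]**2)) + 0.01 * np.abs(x[0] + 10)

np.random.seed(100)
n_test = 100  # number of test observations

# Assumed sampling order: all x1 values first, then all x2 values
test = pd.DataFrame({
    "x1": np.random.uniform(-15, -5, n_test),
    "x2": np.random.uniform(-3, 3, n_test),
})

# Row-wise .apply instead of an explicit loop; no noise is added
test["y"] = test[["x1", "x2"]].apply(lambda row: Bukin(row.to_numpy()), axis=1)
```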


### c) Initialize Results DataFrame

Create an empty DataFrame with the following columns: **degree**, **bias_sq**, **var**, and **mse**.  
You will use this DataFrame to store the results of your bias-variance tradeoff analysis, which will be used for generating plots and interpreting model performance.

**(3 points)**
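A minimal way to set this up, assuming the variable name `results` (the name is not prescribed by the assignment):

```python
import pandas as pd

# Empty frame with the four required columns; rows are added later, one per degree
results = pd.DataFrame(columns=["degree", "bias_sq", "var", "mse"])
```

Rows can then be appended with, for example, `results.loc[len(results)] = [degree, bias_sq, var, mse]`.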

### d) Generate Training Sets (With Noise)

To estimate the **bias** and **variance** of a Linear Regression model fitted on noisy data from the underlying Bukin function, follow these steps:


**🔁 Generate 100 Training Sets:**

- Create **100 independent training datasets**, each containing **100 observations** (same size as the test set).
- For each training set:
  - Use `np.random.seed(i)` to ensure reproducibility, where `i` is the dataset index (0 to 99).
  - Draw predictors from the same distributions as in the test set.
  - Add **Gaussian noise** with mean 0 and **standard deviation 10**, i.e., $\mathcal{N}(0, 10^2)$, to the target values.
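As an illustration of the steps just listed, the sketch below builds the 100 noisy training sets. The helper names `bukin_rows` and `make_training_set` are hypothetical, and the order of the random draws ($x_1$, then $x_2$, then noise) is an assumption.

```python
import numpy as np

def bukin_rows(X):
    # Vectorized Bukin N. 6 over the rows of an (n, 2) array
    return 100 * np.sqrt(np.abs(X[:, 1] - 0.01 * X[:, 0]**2)) + 0.01 * np.abs(X[:, 0] + 10)

def make_training_set(i, n=100, noise_sd=10):
    """Return (X, y) for training set i, seeded with np.random.seed(i)."""
    np.random.seed(i)
    x1 = np.random.uniform(-15, -5, n)
    x2 = np.random.uniform(-3, 3, n)
    X = np.column_stack([x1, x2])
    # Signal plus Gaussian noise with mean 0 and standard deviation 10
    y = bukin_rows(X) + np.random.normal(0, noise_sd, n)
    return X, y

train_sets = [make_training_set(i) for i in range(100)]
```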



**📈 Train Polynomial Models (Degrees 1 to 7):**

- For each training set, train polynomial models of degrees **1 through 7**.
- Use polynomial feature transformations that include both:
  - **Higher-order terms** (e.g., $x_1^2$, $x_2^3$)
  - **Interaction terms** (e.g., $x_1 \cdot x_2$)
- Use the **same fixed, noiseless test set** to generate predictions from each model.



**📊 Estimate Bias², Variance, and MSE for Each Degree:**

- For each **test point**, you will obtain **100 predicted values** (one from each trained model).
- Use these predictions to compute the following quantities **per degree** (averaged over all test points):
  - **Bias squared**
  - **Variance**
  - **Mean Squared Error (MSE)**
- Store these values in your results DataFrame for plotting and analysis.

**(25 points)**
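Putting the three stages together, one possible end-to-end sketch is shown below. It assumes scikit-learn's `PolynomialFeatures` (which generates both higher-order and interaction terms) combined with `LinearRegression`, and it redraws the data inline so the cell is self-contained. Function and variable names are illustrative, and the sampling order is an assumption.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

def bukin_rows(X):
    # Vectorized Bukin N. 6 over the rows of an (n, 2) array
    return 100 * np.sqrt(np.abs(X[:, 1] - 0.01 * X[:, 0]**2)) + 0.01 * np.abs(X[:, 0] + 10)

def sample_X(n):
    # x1 ~ U[-15, -5], x2 ~ U[-3, 3]; x1 is drawn first (assumed order)
    return np.column_stack([np.random.uniform(-15, -5, n), np.random.uniform(-3, 3, n)])

np.random.seed(100)
X_test = sample_X(100)
f_test = bukin_rows(X_test)  # noiseless true values

rows = []
for degree in range(1, 8):
    preds = np.empty((100, len(f_test)))  # one row of test predictions per training set
    for i in range(100):
        np.random.seed(i)
        X_train = sample_X(100)
        y_train = bukin_rows(X_train) + np.random.normal(0, 10, 100)
        model = make_pipeline(PolynomialFeatures(degree, include_bias=False), LinearRegression())
        preds[i] = model.fit(X_train, y_train).predict(X_test)
    mean_pred = preds.mean(axis=0)               # average prediction at each test point
    bias_sq = np.mean((mean_pred - f_test)**2)   # squared bias, averaged over test points
    var = np.mean(preds.var(axis=0))             # prediction variance, averaged over test points
    rows.append({"degree": degree, "bias_sq": bias_sq, "var": var, "mse": bias_sq + var})

results = pd.DataFrame(rows)
```

Collecting the rows in a list and building the DataFrame once is a design choice; filling a pre-created empty DataFrame row by row works equally well.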


### 💡 Reminder: Computing MSE on the Noiseless Test Set

When computing the **MSE**, you should **not** include the irreducible error (noise variance). Here's why:

- The **irreducible error** ($\sigma^2 = 100$) comes from the noise **in the training data**.
- The **test set is noiseless**, so your predictions are evaluated against the **true function values**.
- Therefore:

$$
\text{MSE} = \text{Bias}^2 + \text{Variance}
$$

**✅ Conclusion:**  
The MSE you compute is the **expected prediction error** on noiseless test data.  
You should **only sum the estimated bias² and variance** — no additional noise term is needed.
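The identity can be checked numerically. The sketch below uses made-up predictions (not assignment data) to show that the mean squared deviation from the true values equals bias² plus variance when the variance is computed with NumPy's default `ddof=0`.

```python
import numpy as np

rng = np.random.default_rng(0)
f_true = rng.normal(size=5)                        # hypothetical true values at 5 test points
preds = f_true + 0.5 + rng.normal(size=(100, 5))   # 100 hypothetical predictions per point

bias_sq = np.mean((preds.mean(axis=0) - f_true)**2)
var = np.mean(preds.var(axis=0))                   # population variance (ddof=0)
mse_direct = np.mean((preds - f_true)**2)          # MSE against the noiseless truth

# The two ways of computing the MSE agree up to floating-point error
assert np.isclose(mse_direct, bias_sq + var)
```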


### e) Visualize Bias-Variance Tradeoff

Using the results stored in your summary DataFrame, create a line plot showing how the following quantities change with the **degree** of the polynomial model:

- bias squared
- variance
- mse — computed as the sum of bias squared and variance

**(5 points)**


**Plot requirements:**

- The x-axis should represent the **polynomial degree**.
- The y-axis should represent the **error values**.
- Include a **legend** that clearly labels each line: bias², variance, and mse.  
  **(2 points)**
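A plot meeting these requirements might look like the sketch below. It assumes matplotlib and a DataFrame with the columns `degree`, `bias_sq`, `var`, and `mse`; the function name `plot_tradeoff` is hypothetical.

```python
import matplotlib.pyplot as plt

def plot_tradeoff(results):
    """Line plot of bias², variance, and mse against polynomial degree."""
    fig, ax = plt.subplots()
    for col, label in [("bias_sq", "bias²"), ("var", "variance"), ("mse", "mse")]:
        ax.plot(results["degree"], results[col], marker="o", label=label)
    ax.set_xlabel("Polynomial degree")
    ax.set_ylabel("Error")
    ax.legend()
    return ax
```

Calling `plot_tradeoff(results)` in a notebook cell displays the figure. If the high-degree errors dwarf the others, a logarithmic y-axis via `ax.set_yscale("log")` may make the curves easier to compare.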


### f) Identify the Optimal Model

- What is the **optimal polynomial degree** based on the lowest mean squared error (mse)?  
  **(2 points)**

- Report the corresponding values of:  
  - **bias squared**  
  - **variance**  
  - **mean squared error (mse)**  
  for that optimal degree.  
  **(3 points)**
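Assuming the results DataFrame has a numeric `mse` column, the row with the lowest error can be pulled out as sketched here (the helper name `best_row` is hypothetical):

```python
import pandas as pd

def best_row(results):
    """Return the row of `results` with the smallest mse."""
    # astype(float) guards against object-dtype columns from an initially empty DataFrame
    return results.loc[results["mse"].astype(float).idxmin()]
```

The returned Series contains the degree together with its bias squared, variance, and mse.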


## 2) Building a Low-Bias-Low-Variance Model via Regularization **(50 points)**

The main goal of this question is to further reduce the **total error** by applying **regularization**. Specifically, you'll implement a **low-bias, low-variance** model for the underlying function and the noisy data.



### a) Why Regularization?

Explain why the model with the optimal degree (as identified in Question 1) is **not guaranteed** to be the true low-bias-low-variance model.  
Why might **regularization** still be necessary to improve generalization performance?

**(5 points)**


### b) Which Degrees to Exclude?

Before repeating the process from Question 1, carefully examine the plot and results from **1e** and **1f**. Identify which polynomial degrees should be **excluded** from regularization experiments because they are already too simple or too complex.

Explain which degrees you will exclude and **why**, considering how **regularization affects bias and variance**.

**(10 points)**



### c) Apply Ridge Regularization

Repeat the steps from **1c** and **1d**, but this time using **Ridge regression** instead of ordinary least squares.

- **Exclude** the degrees you identified in **2b**, as well as **degree 7** (to avoid extreme overfitting).
- Use **5-fold cross-validation** to tune the regularization hyperparameter.
- Use `neg_root_mean_squared_error` as the scoring metric for cross-validation.
- Try regularization strengths in the range `[1, 100]`.

**(10 points)**
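One way to set up the tuned Ridge model is sketched below using scikit-learn's `RidgeCV`. The 20-point logarithmic grid over $[1, 100]$, the `StandardScaler` step, and the function name `make_ridge_model` are assumptions rather than requirements of the assignment.

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

def make_ridge_model(degree, alphas=np.logspace(0, 2, 20)):
    """Polynomial Ridge pipeline whose alpha is tuned by 5-fold CV on RMSE."""
    return make_pipeline(
        PolynomialFeatures(degree, include_bias=False),
        StandardScaler(),  # assumption: scale features so the penalty treats them comparably
        RidgeCV(alphas=alphas, cv=5, scoring="neg_root_mean_squared_error"),
    )
```

After `model = make_ridge_model(degree).fit(X_train, y_train)`, the selected regularization strength is available as `model[-1].alpha_`. The model can replace the `LinearRegression` pipeline in the loop from Question 1.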



### d) Visualize Regularized Results

Repeat the visualization from **1e**, but using the results obtained from **2c** (Ridge regression).

Plot:
- **bias squared**
- **variance**
- **mse**

against the polynomial **degree**, using regularized models.

**(10 points)**


### e) Evaluate the Regularized Model

- What is the **degree** of the optimal Ridge Regression model based on the lowest total error?  
  **(3 points)**

- What are the corresponding values of:  
  - **bias-squared**  
  - **variance**  
  - **mse**  
  for that optimal Ridge model?  
  **(3 points)**

- Compare these results to those of the optimal **Linear Regression** model from Question 1.  
  How did **regularization** affect the bias, variance, and total error?  
  **(4 points)**


### f) Interpreting the Impact of Regularization

- Was the **regularization successful** in reducing the **total error** of the regression model?  
  **(2 points)**

- Based on the results from **2e**, explain how **bias** and **variance** changed with regularization.  
  How did these changes contribute to the final total error?  
  **(3 points)**
