## Instructions {-}

1. You may talk to a friend, discuss the questions and potential directions for solving them. However, you need to write your own solutions and code separately, and not as a group activity. 

2. Write your code in the **Code cells** and your answers in the **Markdown cells** of the Jupyter notebook. Ensure that the solution is written neatly enough for the graders to understand and follow.

3. Use [Quarto](https://quarto.org/docs/output-formats/html-basics.html) to render the **.ipynb** file as HTML. You will need to open the command prompt, navigate to the directory containing the file, and use the command: `quarto render filename.ipynb --to html`. Submit the HTML file.

4. The assignment is worth 100 points, and is due on **Thursday, 11th April 2025 at 11:59 pm**. 

5. **Five points are for properly formatting the assignment**. The breakdown is as follows:
    - Must be an HTML file rendered using Quarto **(2 points)**. *If you have a Quarto issue, you must mention the issue & quote the error you get when rendering using Quarto in the comments section of Canvas, and submit the ipynb file.* 
    - There aren't excessively long outputs of extraneous information (e.g. no printouts of entire data frames without good reason, there aren't long printouts of which iteration a loop is on, there aren't long sections of commented-out code, etc.) **(1 point)**
    - Final answers to each question are written in the Markdown cells. **(1 point)**
    - There is no piece of unnecessary / redundant code, and no unnecessary / redundant text. **(1 point)**

## 1) Bias-Variance Trade-off for Regression **(50 points)**

The main goal of this question is to understand and visualize the bias-variance trade-off in a regression model by running repeated simulations.

A clear grasp of bias and variance will help you follow the logic behind many of the models that come up later in the course.

### a) Define the True Relationship (Signal)

First, you need to implement the underlying true relationship (Signal) you want to sample data from. Assume that the function is the [Bukin function](https://www.sfu.ca/~ssurjano/bukin6.html). Implement it as a user-defined function and run it with the test cases below to make sure it is implemented correctly. **(5 points)**

**Note:** It would be more useful to have only one input to the function. You can treat the input as an array of two elements.

```python
print(Bukin(np.array([1,2]))) # The output should be 141.177
print(Bukin(np.array([6,-4]))) # The output should be 208.966
print(Bukin(np.array([0,1]))) # The output should be 100.1
```
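As a starting point, here is one possible sketch of a single-argument implementation. It assumes the Bukin N. 6 form from the linked page, $f(x_1, x_2) = 100\sqrt{|x_2 - 0.01 x_1^2|} + 0.01\,|x_1 + 10|$; check it against the test cases above rather than taking it as given.

```python
import numpy as np

def Bukin(x):
    """Bukin function N. 6 at x = [x1, x2] (assumed form, see the note above)."""
    x1, x2 = x[0], x[1]
    return 100 * np.sqrt(np.abs(x2 - 0.01 * x1**2)) + 0.01 * np.abs(x1 + 10)
```

Taking one array-like argument keeps the function easy to use with row-wise tools such as `.apply`.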

### b) Generate Test Set (No Noise)

Generate a **noiseless** test set with **100 observations** sampled from the true underlying function. This test set will be used to evaluate **bias and variance**, so make sure it follows the correct data generation process. 

**(5 points)**

**Instructions:**

- **Do not use loops** for this question.
- `.apply` will be especially helpful (and often simpler).

**Data generation assumptions:**

- Use `np.random.seed(100)` for reproducibility.
- The first predictor, $x_1$, should be drawn from a **uniform distribution** over the interval $[-15, -5]$, i.e., $x_1 \sim U[-15, -5]$.
- The second predictor, $x_2$, should be drawn from a **uniform distribution** over the interval $[-3, 3]$, i.e., $x_2 \sim U[-3, 3]$.
- Compute the true function values using the underlying model as your response $y$
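The data generation assumptions above could be sketched roughly as follows. The column names, the `n_test` variable, and the choice to draw all of $x_1$ before $x_2$ are assumptions made for illustration; a different draw order would give a different sample even with the same seed.

```python
import numpy as np
import pandas as pd

def Bukin(x):
    # Assumed Bukin N. 6 form; x is an array [x1, x2].
    return 100 * np.sqrt(np.abs(x[1] - 0.01 * x[0]**2)) + 0.01 * np.abs(x[0] + 10)

np.random.seed(100)
n_test = 100

# Assumption: x1 is sampled first, then x2.
test = pd.DataFrame({
    "x1": np.random.uniform(-15, -5, n_test),
    "x2": np.random.uniform(-3, 3, n_test),
})

# Row-wise apply instead of an explicit loop; no noise is added to y.
test["y"] = test[["x1", "x2"]].apply(lambda row: Bukin(row.to_numpy()), axis=1)
```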


### c) Initialize Results DataFrame

Create an empty DataFrame with the following columns:

- **degree**: the degree of the polynomial model  
- **bias_sq**: estimated squared bias (averaged over test points)  
- **var**: estimated variance of predictions  
- **bias_var_sum**: sum of bias squared and variance  
- **empirical_mse**: mean squared error calculated using sklearn's `mean_squared_error()` on model predictions vs. true function values

This DataFrame will be used to store the results of your bias-variance tradeoff analysis and for generating comparison plots.

**(3 points)**
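A minimal sketch of such an empty DataFrame is shown below; the variable name `results` is an assumption.

```python
import pandas as pd

# Empty frame with one column per quantity to be recorded for each degree.
results = pd.DataFrame(
    columns=["degree", "bias_sq", "var", "bias_var_sum", "empirical_mse"]
)
```

One common way to fill it later is to assign one row per degree, for example with `results.loc[len(results)] = [...]`.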


### d) Generate Training Sets (With Noise)

To estimate the **bias**, **variance**, and **total error (MSE)** of a Linear Regression model trained on noisy data from the underlying Bukin function, follow the steps below.


**🔁 Step 1: Generate 100 Training Sets**

- Create **100 independent training datasets**, each with **100 observations** (same size as the test set).
- For each training dataset:
  - Use `np.random.seed(i)` to ensure reproducibility, where `i` is the dataset index (0 to 99).
  - Sample predictors from the **same distributions** used to generate the test set.
  - Add **Gaussian noise** with mean 0 and standard deviation 10:  
    $\varepsilon \sim \mathcal{N}(0, 10^2)$
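One way Step 1 might look in code is sketched below. The helper name `make_training_set`, its parameters, and the order of the random draws ($x_1$, then $x_2$, then the noise) are assumptions for illustration. Note that the third argument of `np.random.normal` is the standard deviation, not the variance.

```python
import numpy as np
import pandas as pd

def Bukin(x):
    # Assumed Bukin N. 6 form; x is an array [x1, x2].
    return 100 * np.sqrt(np.abs(x[1] - 0.01 * x[0]**2)) + 0.01 * np.abs(x[0] + 10)

def make_training_set(i, n=100, noise_sd=10):
    """Hypothetical helper: noisy training set number i (seeded with i)."""
    np.random.seed(i)
    x1 = np.random.uniform(-15, -5, n)
    x2 = np.random.uniform(-3, 3, n)
    signal = np.apply_along_axis(Bukin, 1, np.column_stack([x1, x2]))
    noise = np.random.normal(0, noise_sd, n)  # mean 0, standard deviation 10
    return pd.DataFrame({"x1": x1, "x2": x2, "y": signal + noise})

train_sets = [make_training_set(i) for i in range(100)]
```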


**🧠 Step 2: Train Polynomial Models (Degrees 1 to 7)**

- For each training dataset, train polynomial models with degrees **1 through 7**.
- Use polynomial feature transformations that include both:
  - **Higher-order terms** (e.g., $x_1^2$, $x_2^3$)
  - **Interaction terms** (e.g., $x_1 \cdot x_2$)
- Make predictions on the **fixed, noiseless test set** for each trained model.
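The model-fitting part of Step 2 can be sketched with a scikit-learn pipeline. `PolynomialFeatures` generates both the higher-order and the interaction terms by default. The helper name `fit_poly_model` and the small synthetic data set are assumptions used only to check that the sketch behaves as expected; they are not the assignment data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

def fit_poly_model(X_train, y_train, degree):
    """Hypothetical helper: OLS on all polynomial and interaction terms up to `degree`."""
    model = make_pipeline(
        PolynomialFeatures(degree=degree, include_bias=False),
        LinearRegression(),
    )
    return model.fit(X_train, y_train)

# Tiny synthetic check: y is exactly quadratic in (x1, x2), so degree 2 fits it.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(50, 2))
y = 1 + X[:, 0] ** 2 + 3 * X[:, 0] * X[:, 1]
model = fit_poly_model(X, y, degree=2)

# Terms for degree 2 with two inputs: x1, x2, x1^2, x1*x2, x2^2.
n_terms = model.named_steps["polynomialfeatures"].n_output_features_
```

Calling `model.predict(X_test)` on the fixed test predictors then gives one vector of predictions per training set and degree.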


**📊 Step 3: Estimate Bias², Variance, and MSE**

- For each **degree**, and each **test point**, collect the 100 predicted values from the models trained on the different training sets.
- Using these predictions, compute:
  - **Bias squared**: squared difference between the mean prediction and the true value at each test point, averaged over the test points.
  - **Variance**: variance of the 100 predictions at each test point, averaged over the test points.
  - **Theoretical MSE**: sum of bias squared and variance.
  - **Empirical MSE**: compute using `sklearn.metrics.mean_squared_error` between each model's predictions and the true values, then average over the 100 training runs.

- Store all four quantities for each degree in your results DataFrame:
  - `degree`
  - `bias_sq`
  - `var`
  - `bias_var_sum` (bias squared + variance)
  - `empirical_mse`
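For a single degree, the Step 3 computations might look like the sketch below. It assumes the predictions are stacked in an array `preds` with one row per training set and one column per test point; the array here is filled with synthetic numbers (not assignment results) just to show the mechanics.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Synthetic stand-in: 100 models (rows) x 100 test points (columns).
rng = np.random.default_rng(0)
y_true = rng.normal(size=100)
preds = y_true + 0.5 + rng.normal(scale=2.0, size=(100, 100))

mean_pred = preds.mean(axis=0)                 # average prediction per test point
bias_sq = np.mean((mean_pred - y_true) ** 2)   # squared bias, averaged over test points
var = np.mean(preds.var(axis=0))               # NumPy default ddof=0 (divide by 100)
bias_var_sum = bias_sq + var

# MSE of each model against the true values, averaged over the 100 models.
empirical_mse = np.mean([mean_squared_error(y_true, p) for p in preds])
```

With `ddof=0`, `bias_var_sum` and `empirical_mse` agree up to floating-point error; a variance computed with `ddof=1` would leave a small gap.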


**(25 points)**


💡 **Reminder: Comparing Theoretical vs. Empirical MSE**

When evaluating model performance on the **noiseless test set**:

- The **irreducible error** (i.e., noise in training data) does **not** affect the test targets.
- Therefore, the test error (MSE) can be decomposed as:

  $\text{MSE} = \text{Bias}^2 + \text{Variance}$

- The **empirical MSE** (from sklearn) should closely match the **sum of bias² and variance**, since the test data contains no noise.
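A short derivation may help show why the match is exact rather than approximate. The notation is introduced here for illustration only: $K$ is the number of training sets, $\hat f_k(x)$ is the prediction of the $k$-th model at test point $x$, $f(x)$ is the true value, and $\bar f(x) = \frac{1}{K}\sum_{k=1}^{K} \hat f_k(x)$ is the mean prediction.

$$
\frac{1}{K}\sum_{k=1}^{K}\bigl(\hat f_k(x) - f(x)\bigr)^2
= \bigl(\bar f(x) - f(x)\bigr)^2
+ \frac{1}{K}\sum_{k=1}^{K}\bigl(\hat f_k(x) - \bar f(x)\bigr)^2
$$

This follows from writing $\hat f_k - f = (\hat f_k - \bar f) + (\bar f - f)$ and expanding the square: the cross term $2\,(\bar f - f)\cdot\frac{1}{K}\sum_k(\hat f_k - \bar f)$ is zero because deviations from the mean sum to zero. Averaging both sides over the test points gives the decomposition above. The identity relies on the variance being normalized by $K$ (NumPy's default, `ddof=0`); pandas' `.var()` defaults to `ddof=1` and would produce a slightly larger variance.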

### e) Visualize Bias-Variance Decomposition

Using the results stored in your DataFrame, create a plot with **four lines**, each plotted against the polynomial **degree**:

1. **Bias squared**
2. **Variance**
3. **Bias squared + Variance** (i.e., the theoretical decomposition of MSE)
4. **Empirical MSE** calculated using `sklearn.metrics.mean_squared_error()`  
   (computed from the predicted values vs. true function values on the noiseless test set)


**Plot requirements:**
- Use a single line plot with the polynomial degree on the x-axis and error values on the y-axis.
- Include a **legend** to clearly label each line.
- Use different line styles or markers for easy visual comparison.


**Goal:**
- Compare the **empirical MSE** to the **sum of bias squared and variance**.
- If everything is implemented correctly, the two lines should be very close (or even identical, up to numerical precision).
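A plotting sketch along these lines is given below. The numbers in `results` are made-up placeholders used only to exercise the plotting code; they are not results for this problem, and the marker and line-style choices are arbitrary.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Placeholder values only (not real results).
results = pd.DataFrame({
    "degree": [1, 2, 3],
    "bias_sq": [9.0, 4.0, 2.0],
    "var": [1.0, 2.0, 5.0],
})
results["bias_var_sum"] = results["bias_sq"] + results["var"]
results["empirical_mse"] = results["bias_var_sum"]

# One (marker, line style) pair per column so overlapping lines stay visible.
styles = {
    "bias_sq": ("o", "-"),
    "var": ("s", "--"),
    "bias_var_sum": ("^", "-."),
    "empirical_mse": ("x", ":"),
}

fig, ax = plt.subplots()
for col, (marker, linestyle) in styles.items():
    ax.plot(results["degree"], results[col], marker=marker, linestyle=linestyle, label=col)
ax.set_xlabel("Polynomial degree")
ax.set_ylabel("Error")
ax.legend()
```

Distinct markers matter here because the last two lines are expected to lie almost on top of each other.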


### f) Identify the Optimal Model

- What is the **optimal polynomial degree** based on the **lowest empirical MSE** (calculated using sklearn)?  
  **(2 points)**

- Report the corresponding values of:  
  - **Bias squared**  
  - **Variance**  
  - **Bias squared + Variance**  
  - **Empirical MSE**  
  for that degree.  
  **(3 points)**
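Picking out the best row can be done with `idxmin`, as in the sketch below. The values in `results` are made-up placeholders, not results for this problem.

```python
import pandas as pd

# Placeholder values only (not real results).
results = pd.DataFrame({
    "degree": [1, 2, 3],
    "bias_sq": [9.0, 4.0, 2.0],
    "var": [1.0, 2.0, 5.0],
    "bias_var_sum": [10.0, 6.0, 7.0],
    "empirical_mse": [10.0, 6.0, 7.0],
})

# Row with the smallest empirical MSE; all reported quantities come from this row.
best = results.loc[results["empirical_mse"].idxmin()]
```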


## 2) Building a Low-Bias, Low-Variance Model via Regularization **(50 points)**

The main goal of this question is to further reduce the **total prediction error** by applying **regularization**.  
Specifically, you'll use **Ridge regression** to build a **low-bias, low-variance** model for data generated from the underlying Bukin function with noise.



### a) Why Regularization?

Explain why the model with the optimal polynomial degree (as identified in Question 1) is **not guaranteed** to be the true low-bias, low-variance model.

Why might **regularization** still be necessary to improve generalization performance, even after selecting the degree that minimizes MSE?

**(5 points)**



### b) Which Degrees to Exclude?

Based on your plot and results from **1e** and **1f**, identify which polynomial degrees should be **excluded** from regularization experiments because they are already too simple (high bias) or too complex (high variance).

Explain which degrees you will exclude and **why**, using your understanding of how **regularization affects bias and variance**.

**(10 points)**



### c) Apply Ridge Regularization

Repeat the steps from **1c** and **1d**, but this time use **Ridge regression** instead of ordinary least squares.

- Use only the degrees **not excluded** in 2b (and also exclude degree 7 to avoid extreme overfitting).
- Use **5-fold cross-validation** to tune the Ridge regularization strength.
- Use `neg_root_mean_squared_error` as the scoring metric for cross-validation.
- Tune over a range of regularization strengths (e.g., from 1 to 100).
- For each retained degree, compute:
  - **Bias squared**
  - **Variance**
  - **Bias squared + Variance**
  - **Empirical MSE** (from `sklearn.metrics.mean_squared_error`)

Store your results in a new DataFrame with the same structure as in Question 1.
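The tuning step could be sketched with `GridSearchCV` as below. The helper name `fit_ridge_poly`, the log-spaced grid of 20 values between 1 and 100, and the `StandardScaler` step are assumptions (the assignment does not ask for scaling); the small synthetic data set is only there to check that the sketch runs.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

def fit_ridge_poly(X_train, y_train, degree):
    """Hypothetical helper: polynomial Ridge with alpha tuned by 5-fold CV."""
    pipe = Pipeline([
        ("poly", PolynomialFeatures(degree=degree, include_bias=False)),
        ("scale", StandardScaler()),  # assumption: puts all terms on a comparable scale
        ("ridge", Ridge()),
    ])
    search = GridSearchCV(
        pipe,
        param_grid={"ridge__alpha": np.logspace(0, 2, 20)},  # 1 to 100
        cv=5,
        scoring="neg_root_mean_squared_error",
    )
    return search.fit(X_train, y_train)

# Tiny synthetic check (not the assignment data).
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(100, 2))
y = X[:, 0] ** 2 + X[:, 1] + rng.normal(scale=0.1, size=100)
search = fit_ridge_poly(X, y, degree=3)
best_alpha = search.best_params_["ridge__alpha"]
```

By default `GridSearchCV` refits the best pipeline on the full training set, so `search.predict(X_test)` can be used directly for the bias and variance calculations.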

**(10 points)**



### d) Visualize Regularized Results

Repeat the visualization from **1e**, but using the results from **2c** (Ridge regression).

Your plot should include **four lines** plotted against polynomial degree:

1. **Bias squared**
2. **Variance**
3. **Bias squared + Variance**
4. **Empirical MSE** (computed using sklearn)

Include a clear **legend** and label your axes.  
This will help you visually assess how regularization impacts bias, variance, and overall model error.

**(10 points)**


### e) Evaluate the Regularized Model

- What is the **optimal polynomial degree** for the Ridge Regression model, based on the **lowest empirical MSE**?  
  **(3 points)**

- Report the corresponding values of:  
  - **Bias squared**  
  - **Variance**  
  - **Empirical MSE**  
  for that optimal Ridge model.  
  **(3 points)**

- Compare these results to those of the optimal **Linear Regression** model from Question 1.  
  Discuss how **regularization** influenced the **bias**, **variance**, and **overall prediction error (MSE)**.  
  **(4 points)**


### f) Interpreting the Impact of Regularization

- Was **regularization successful** in reducing the **total prediction error (MSE)** compared to the unregularized model?  
  **(2 points)**

- Based on your results from **2e**, explain how **bias** and **variance** changed as a result of regularization.  
  How did these changes affect the final total error?  
  Support your explanation with values or observations from your analysis.  
  **(3 points)**

