## Instructions {-}

1. You may talk to a friend, discuss the questions and potential directions for solving them. However, you need to write your own solutions and code separately, and not as a group activity. 

2. Write your code in the **Code cells** and your answers in the **Markdown cells** of the Jupyter notebook. Ensure that the solution is written neatly enough for the graders to understand and follow.

3. Use [Quarto](https://quarto.org/docs/output-formats/html-basics.html) to render the **.ipynb** file as HTML. You will need to open the command prompt, navigate to the directory containing the file, and use the command: `quarto render filename.ipynb --to html`. Submit the HTML file.

4. The assignment is worth 100 points, and is due on **Thursday, 11th April 2025 at 11:59 pm**. 

5. **Five points are for properly formatting the assignment**. The breakdown is as follows:
    - Must be an HTML file rendered using Quarto **(2 points)**. *If you have a Quarto issue, you must mention the issue & quote the error you get when rendering using Quarto in the comments section of Canvas, and submit the ipynb file.* 
    - There aren’t excessively long outputs of extraneous information (e.g. no printouts of entire data frames without good reason, there aren’t long printouts of which iteration a loop is on, there aren’t long sections of commented-out code, etc.) **(1 point)**
    - Final answers to each question are written in the Markdown cells. **(1 point)**
    - There is no piece of unnecessary / redundant code, and no unnecessary / redundant text. **(1 point)**

## 1) Bias-Variance Trade-off for Regression **(50 points)**

The main goal of this question is to understand and visualize the bias-variance trade-off in a regression model by performing repetitive simulations.

A clear understanding of bias and variance underpins the logic of many models that come up later in the course.
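For reference, the standard decomposition behind this question can be written as follows. The symbols are assumptions introduced here for illustration and do not appear elsewhere in the assignment: $f$ is the true signal, $\hat{f}$ is a model fitted on a random training set, and $y = f(x) + \varepsilon$ with zero-mean noise of variance $\sigma^2$. The expectation is taken over training sets and noise.

$$
\mathbb{E}\big[(y - \hat{f}(x))^2\big] = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{squared bias}} + \underbrace{\mathbb{E}\Big[\big(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\big)^2\Big]}_{\text{variance}} + \underbrace{\sigma^2}_{\text{noise variance}}
$$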

### a) Define the True Relationship (Signal)

First, you need to implement the underlying true relationship (Signal) you want to sample data from. Assume that the function is the [Bukin function](https://www.sfu.ca/~ssurjano/bukin6.html). Implement it as a user-defined function and run it with the test cases below to make sure it is implemented correctly. **(5 points)**

**Note:** It would be more useful to have only one input to the function. You can treat the input as an array of two elements.

```python
print(Bukin(np.array([1,2]))) # The output should be 141.177
print(Bukin(np.array([6,-4]))) # The output should be 208.966
print(Bukin(np.array([0,1]))) # The output should be 100.1
```
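One possible shape for such a function is sketched below. It assumes the Bukin function N. 6 formula from the linked page, $f(x_1, x_2) = 100\sqrt{|x_2 - 0.01x_1^2|} + 0.01|x_1 + 10|$, and takes a single two-element array as suggested in the note. Treat it as a starting point, not as the required implementation.

```python
import numpy as np


def Bukin(x):
    """Bukin function N. 6 for a two-element input [x1, x2]."""
    x1, x2 = x  # unpack the two predictors from the single array argument
    return 100 * np.sqrt(np.abs(x2 - 0.01 * x1**2)) + 0.01 * np.abs(x1 + 10)


# Same calls as the test cases above, rounded to three decimals
print(round(Bukin(np.array([1, 2])), 3))
print(round(Bukin(np.array([6, -4])), 3))
print(round(Bukin(np.array([0, 1])), 3))
```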

### b) Generate Test Set (No Noise)

Create a noiseless test set with 100 observations from the underlying function to isolate bias and variance. Remember how the test dataset is supposed to be sampled for bias-variance calculations. **No loops are allowed for this question - `.apply` should be very useful and actually simpler to use.** **(5 points)**

Assumptions:

- The first predictor, $x_1$, comes from a uniform distribution between -15 and -5. ($U[-15, -5]$)
- The second predictor, $x_2$, comes from a uniform distribution between -3 and 3. ($U[-3, 3]$)
- Use `np.random.seed(100)` for reproducibility.
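A minimal sketch of the sampling step under the assumptions listed above is shown here. The column names `x1`, `x2`, and `y`, the variable name `test`, and the order in which the two predictors are drawn are all choices made for illustration. The `Bukin` helper is repeated so that the block runs on its own.

```python
import numpy as np
import pandas as pd


def Bukin(x):
    """Bukin function N. 6 for a two-element input [x1, x2]."""
    x1, x2 = x
    return 100 * np.sqrt(np.abs(x2 - 0.01 * x1**2)) + 0.01 * np.abs(x1 + 10)


np.random.seed(100)
test = pd.DataFrame({
    "x1": np.random.uniform(-15, -5, 100),  # x1 ~ U[-15, -5]
    "x2": np.random.uniform(-3, 3, 100),    # x2 ~ U[-3, 3]
})

# Row-wise .apply instead of an explicit loop; no noise is added to the response
test["y"] = test.apply(lambda row: Bukin(row.to_numpy()), axis=1)
```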

### c)

Create an empty DataFrame with columns named **degree**, **bias_sq** and **var**. This will be useful to store the analysis results in this question. **(3 points)**
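One way to do this with pandas is sketched below. The variable name `results` is an assumption, not something the assignment prescribes.

```python
import pandas as pd

# Empty frame with the three required columns; rows are added later, one per degree
results = pd.DataFrame(columns=["degree", "bias_sq", "var"])
```

A row can later be added with, for example, `results.loc[len(results)] = [degree, bias_sq, var]`.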

### d) Generate Training Set (With Noise)

Sample 100 training datasets to calculate the bias and variance of a Linear Regression model that predicts data coming from the underlying Bukin function. Repeat this process with polynomial transformations from degree 1 (i.e., the original predictors) to degree 7. For each degree, store the `degree`, `bias-squared` and `variance` values in the DataFrame. **(25 points)**

**Note:**

- For a linear regression model, `bias` refers to squared bias.
- Assume that the noise in the population is a zero-mean Gaussian with a standard deviation of 10, i.e., a noise variance of 100. ($N(0, 10^2)$)
- Keep the training data size the same as the test data size.
- You need both the interactions and the higher-order transformations in your polynomial predictors.
- For $i^{th}$ training dataset, you can consider using `np.random.seed(i)` for reproducibility.
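The simulation described above could be organized roughly as follows. This is a sketch under several assumptions that the assignment does not fix: the training predictors are drawn from the same uniform distributions as the test set, scikit-learn's `PolynomialFeatures` and `LinearRegression` are used, and the helper names (`bukin`, `sample_predictors`) and the row-collection approach are illustrative only.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures


def bukin(x1, x2):
    """Bukin function N. 6, vectorized over arrays (hypothetical helper)."""
    return 100 * np.sqrt(np.abs(x2 - 0.01 * x1**2)) + 0.01 * np.abs(x1 + 10)


def sample_predictors(n):
    """Draw x1 ~ U[-15, -5] and x2 ~ U[-3, 3] as an (n, 2) array."""
    return np.column_stack([np.random.uniform(-15, -5, n),
                            np.random.uniform(-3, 3, n)])


N_OBS, N_SETS, NOISE_SD = 100, 100, 10

np.random.seed(100)
X_test = sample_predictors(N_OBS)
f_test = bukin(X_test[:, 0], X_test[:, 1])  # noiseless signal at the test points

rows = []
for degree in range(1, 8):
    preds = np.empty((N_SETS, N_OBS))  # one row of test predictions per training set
    for i in range(N_SETS):
        np.random.seed(i)
        X_train = sample_predictors(N_OBS)
        noise = np.random.normal(0, NOISE_SD, N_OBS)
        y_train = bukin(X_train[:, 0], X_train[:, 1]) + noise
        # PolynomialFeatures generates both interaction and higher-order terms
        model = make_pipeline(PolynomialFeatures(degree, include_bias=False),
                              LinearRegression())
        preds[i] = model.fit(X_train, y_train).predict(X_test)
    # Squared bias: (average prediction - signal)^2, averaged over test points
    bias_sq = np.mean((preds.mean(axis=0) - f_test) ** 2)
    # Variance: spread of predictions across training sets, averaged over test points
    var = np.mean(preds.var(axis=0))
    rows.append({"degree": degree, "bias_sq": bias_sq, "var": var})

results = pd.DataFrame(rows)
```

Collecting rows in a list and building the DataFrame once at the end is a design choice made here; filling the empty DataFrame from part c row by row is equally valid.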

### e)

Using the results stored in the DataFrame, plot the (1) expected mean squared error, (2) expected squared bias, (3) expected variance, and (4) the expected sum of squared bias, variance and noise variance *(i.e., summation of 2, 3, and noise variance)*, against the `degree` of the predictors in the model. **(5 points)** 

Make sure you add a legend to label each line plot. **(2 points)**
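A plotting helper along these lines may be a useful starting point. It is a sketch that assumes matplotlib and a DataFrame with the columns `degree`, `bias_sq`, and `var`. The function name `plot_tradeoff`, the `noise_var` argument, and the optional `mse` column are hypothetical names introduced here.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_tradeoff(results, noise_var, ax=None):
    """Plot squared bias, variance, and their sum with the noise variance against degree."""
    if ax is None:
        _, ax = plt.subplots()
    total = results["bias_sq"] + results["var"] + noise_var
    ax.plot(results["degree"], results["bias_sq"], marker="o", label="Squared bias")
    ax.plot(results["degree"], results["var"], marker="o", label="Variance")
    ax.plot(results["degree"], total, marker="o", linestyle="--",
            label="Squared bias + variance + noise variance")
    # Only drawn if an "mse" column was stored (an assumption, not required by part c)
    if "mse" in results.columns:
        ax.plot(results["degree"], results["mse"], marker="o",
                label="Mean squared error")
    ax.set_xlabel("degree")
    ax.set_ylabel("Expected error")
    ax.legend()
    return ax
```

With a noise standard deviation of 10, the call would be `plot_tradeoff(results, noise_var=100)`.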

### f)

* What is the `degree` of the optimal model? **(2 points)** 
* What are the squared bias, variance and mean squared error for that degree? **(3 points)**

## 2) Low-Bias-Low-Variance Model via Regularization (50 points)

The main goal of this question is to further reduce the total error by regularization - in other words, to implement the low-bias-low-variance model for the underlying function and the data coming from it.

### a) 

First of all, explain why the optimal model (with the optimal `degree`) in Question 1 is not guaranteed to be the low-bias-low-variance model. **(2 points)** Why would regularization be necessary to achieve that model? **(5 points)**

### b)

Before repeating the process in Question 1, you should see from the figure in 1e and the results in 1f that there is no point in trying some degrees again with regularization. Identify these degrees and explain why you should not use them for this question, **considering how regularization affects the bias and the variance of a model.** **(10 points)**

### c)

Repeat 1c and 1d with Ridge regularization. **Exclude the degrees you found in 2b and also degree 7**. Use 5-fold cross-validation (CV) to tune the model hyperparameter and use `neg_root_mean_squared_error` as the scoring metric. **(10 points)**

Consider hyperparameter values in the range \[1, 100\].
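The tuning step for a single training set could look roughly like the sketch below. It assumes scikit-learn's `GridSearchCV` with a `Ridge` pipeline. The log-spaced grid of 20 values, the `StandardScaler` step, and `degree = 3` are placeholders chosen for illustration and are not prescribed by the assignment; in particular, the degree here says nothing about the answer to 2b.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler


def bukin(x1, x2):
    """Bukin function N. 6, vectorized over arrays (hypothetical helper)."""
    return 100 * np.sqrt(np.abs(x2 - 0.01 * x1**2)) + 0.01 * np.abs(x1 + 10)


# One noisy training set, sampled as in Question 1d
np.random.seed(0)
X_train = np.column_stack([np.random.uniform(-15, -5, 100),
                           np.random.uniform(-3, 3, 100)])
y_train = bukin(X_train[:, 0], X_train[:, 1]) + np.random.normal(0, 10, 100)

degree = 3  # placeholder degree for illustration only
pipe = make_pipeline(PolynomialFeatures(degree, include_bias=False),
                     StandardScaler(),  # assumption: scale features before the penalty
                     Ridge())

alphas = np.logspace(0, 2, 20)  # 20 candidate values spanning [1, 100]
search = GridSearchCV(pipe, {"ridge__alpha": alphas}, cv=5,
                      scoring="neg_root_mean_squared_error")
search.fit(X_train, y_train)

best_alpha = search.best_params_["ridge__alpha"]
best_model = search.best_estimator_  # refit on the full training set by default
```

`best_model.predict(X_test)` can then replace the Linear Regression predictions inside the simulation loop from 1d.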

### d)

Repeat part 1e with Ridge regularization, using the results from 2c. **(10 points)**

### e) 

What is the degree of the optimal Ridge Regression model? **(3 points)** What are the bias-squared, variance and total error values for that degree? **(3 points)** How do they compare to the Linear Regression model results? **(4 points)**

### f)

Is the regularization successful in reducing the total error of the regression model? **(2 points)** Explain the results in 2e in terms of how bias and variance change with regularization. **(3 points)**