# Assignment 1 Extra Credit - Grid Search and ElasticNet

The following section is only for extra credit. It is here if you have extra time and want to look at something a bit more advanced that you could use in practice.


Recall from the syllabus:
> There will sometimes be small extra credit opportunities as well, but these will not make a major impact in course grades. The extra credit can affect your grade by potentially pushing you up to the next grade point if you are very close (e.g. 3.0 to 3.1). They are meant to be fun extensions rather than required parts of the course. Our advice is to complete extra credit for your own learning or review, but it is unlikely to be an efficient use of your time if you are completing it solely to boost your grade.


Fill in the cells provided marked `TODO` with code to answer the questions. Answers should do the computation stated rather than writing in hard-coded values. So for example, if a problem asks you to compute the average age of people in a dataset, you should be writing Python code in this notebook to do the computation instead of plugging it into some calculator and saving the hard-coded answer in the variable. In other words, we should be able to run your code on a smaller/larger dataset and get correct answers for those datasets with your code.
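
As a minimal sketch of the difference between computing a value and hard-coding it, assume a hypothetical toy DataFrame `people` with an `age` column (neither is part of this assignment):

```python
import pandas as pd

# Hypothetical toy dataset; the name `people` and column 'age' are
# assumptions made only for this illustration.
people = pd.DataFrame({'age': [23, 35, 41, 29]})

# Computed from the data, so it stays correct if the dataset changes
average_age = people['age'].mean()

# Avoid: average_age = 32.0  (hard-coded, wrong on any other dataset)
```

The computed version keeps working when the rows in `people` change, which is what we check when we run your code on other datasets.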

It is generally a good idea to restart the kernel and run all cells (especially before turning it in) to make sure your code runs correctly. Answer the questions on Gradescope and make sure to download this file once you've finished the assignment and upload it to Canvas as well.

Note, you are not allowed to share any portions of this notebook outside of this class.

> Copyright ©2023 Emily Fox and Hunter Schafer.  All rights reserved.  Permission is hereby granted to students registered for University of Washington CSE/STAT 416 for use solely during Spring Quarter 2024 for purposes of the course.  No other use, copying, distribution, or modification is permitted without prior written consent. Copyrights for third-party components of this work must be honored.  Instructors interested in reusing these course materials should contact the author.

---

We first need to re-process the data, so we walk you through steps similar to HW1 to load and process it. Your solutions here will be very similar to your HW1 solutions. The one big exception is that there is no explicit validation set, since we will be using cross-validation for model selection (described later).

We start by importing most of the common libraries used.

In [2]:
# Conventionally people rename these common imports for brevity
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Magic command to make the plots appear in-line (it's actually called a "magic command")
%matplotlib inline

We then need to load in the data and compute the relevant features. Note that you still need to fill in part of this code, as you did for the corresponding part of the main assignment.
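
As a rough illustration of deriving new columns from an existing one, here is a sketch on a hypothetical toy DataFrame. The name `toy` and the column `sqft` are assumptions for illustration and are not taken from `home_data.csv`:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, not the assignment's dataset
toy = pd.DataFrame({'sqft': [400.0, 900.0, 1600.0]})

feature = 'sqft'
# Element-wise square of the column, stored under a derived column name
toy[feature + '_square'] = toy[feature] ** 2
# Element-wise square root of the column
toy[feature + '_sqrt'] = np.sqrt(toy[feature])
```

Both operations are vectorized, so no explicit loop over rows is needed.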

In [3]:
from math import sqrt

# Load in data
sales = pd.read_csv('home_data.csv') 
# Selects 1% of the data
sales = sales.sample(frac=0.01, random_state=0) 


# All of the features of interest
selected_inputs = [
    'bedrooms', 
    'bathrooms',
    'sqft_living', 
    'sqft_lot', 
    'floors', 
    'waterfront', 
    'view', 
    'condition', 
    'grade',
    'sqft_above',
    'sqft_basement',
    'yr_built', 
    'yr_renovated'
]

# Compute the square and sqrt of each feature
all_features = []
for data_input in selected_inputs:
    square_feat = data_input + '_square'
    sqrt_feat = data_input + '_sqrt'
    
    # TODO compute the square of the column data_input and add it to sales as a
    # new column named square_feat
    
    
    # TODO compute the sqrt of the column data_input and add it to sales as a
    # new column named sqrt_feat

    all_features.extend([data_input, square_feat, sqrt_feat])
    
# Split the data into features and price
price = sales['price']
sales = sales[all_features]

sales.head()

Because we will be using cross-validation, we do not need to make a validation set when splitting up our data. Below is some pre-written code to do the train/test split, but you need to fill in the right value for `<NUM>` so that 80% of the data is used for training and 20% for testing.
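
To see how `test_size` controls the split, here is a small sketch on made-up arrays. The value `0.25` and the arrays `X` and `y` are arbitrary illustrations, not the values this assignment asks for:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up data: 20 rows with 2 features each
X = np.arange(40).reshape(20, 2)
y = np.arange(20)

# test_size is the fraction of rows held out for testing;
# 0.25 is an arbitrary illustrative value
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

print(X_train.shape, X_test.shape)
```

With 20 rows and `test_size=0.25`, 5 rows go to the test set and the remaining 15 to the training set.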

In [4]:
# TODO Fill in the numbers to make datasets of the right size.
from sklearn.model_selection import train_test_split

train_sales, test_sales, train_price, test_price = \
    train_test_split(sales, price, test_size=<NUM>, random_state=6)

Next, you need to preprocess the data so that it is standardized. Use the same procedure for standardization that you used in HW2 and save your results to `train_sales` and `test_sales`.
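
One common way to standardize is sketched below on made-up arrays, using `StandardScaler` from `sklearn`. Whether this matches the procedure from your earlier homework is an assumption; the key idea shown is that the mean and standard deviation are estimated on the training data only and then reused for the test data:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up training and test arrays for illustration
train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
test = np.array([[4.0, 40.0]])

# Estimate per-column mean and standard deviation from the training data only
scaler = StandardScaler().fit(train)

# Apply the same training statistics to both sets
train_std = scaler.transform(train)
test_std = scaler.transform(test)
```

Reusing the training statistics keeps information about the test set out of the preprocessing step.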

In [5]:
# TODO Standardize the data


# Grid Search and ElasticNet
As we discussed in lecture, there are pros to using Ridge and pros to using LASSO. ElasticNet is a model that lets you use both penalties and tune how much weight you put on one versus the other. The quality metric for ElasticNet is:

$$\hat{w}_{ElasticNet} = \arg\min_w RSS(w) + \lambda_1 \left\lVert w \right\rVert_1 + \lambda_2 \left\lVert w \right\rVert_2^2$$

However, the `sklearn` implementation asks you to specify the parameters slightly differently. Instead of specifying $\lambda_1$ and $\lambda_2$, you specify `alpha` ($\alpha$) and `l1_ratio` ($\rho$), where $\alpha$ is the overall penalty strength and $\rho$ is the fraction of the penalty that goes to the L1 term rather than the L2 term. $\rho$ should be a number between 0 and 1, so $\rho = 1$ gives a pure L1 (LASSO) penalty and $\rho = 0$ a pure L2 (Ridge) penalty. (`sklearn` additionally applies constant scaling factors to the RSS and L2 terms; see its documentation for the exact form.)

$$\hat{w}_{ElasticNet} = \arg\min_w RSS(w) + \alpha \rho \left\lVert w \right\rVert_1 + \alpha (1-\rho) \left\lVert w \right\rVert_2^2$$
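
Matching the two formulas term by term gives $\lambda_1 = \alpha\rho$ and $\lambda_2 = \alpha(1-\rho)$, so $\alpha = \lambda_1 + \lambda_2$ and $\rho = \lambda_1 / (\lambda_1 + \lambda_2)$. The sketch below encodes that conversion under the simplified penalty written above, ignoring any constant scaling `sklearn` applies internally. The helper names are hypothetical and are not part of `sklearn`:

```python
def lambdas_to_sklearn(lambda_1, lambda_2):
    """Convert (lambda_1, lambda_2) to (alpha, l1_ratio).

    Hypothetical helper; assumes lambda_1 + lambda_2 > 0.
    """
    alpha = lambda_1 + lambda_2
    l1_ratio = lambda_1 / alpha
    return alpha, l1_ratio


def sklearn_to_lambdas(alpha, l1_ratio):
    """Convert (alpha, l1_ratio) back to (lambda_1, lambda_2)."""
    return alpha * l1_ratio, alpha * (1 - l1_ratio)


# Example: an L1 weight of 3 and an L2 weight of 1
alpha, l1_ratio = lambdas_to_sklearn(3.0, 1.0)
print(alpha, l1_ratio)  # 4.0 0.75
```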



Grid Search is a process of tuning multiple hyper-parameters at the same time by using cross validation. It is essentially the same as what you did in the main assignment, but uses nested loops to try all possible pairs of settings and uses cross-validation instead of a validation set.
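
The nested-loop idea can be sketched by hand as follows. The synthetic data from `make_regression`, the small grids, and the choice of negative mean squared error as the score are all assumptions made for illustration; they are not the settings for this exercise:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data (assumption), not the housing data
X, y = make_regression(n_samples=80, n_features=5, noise=5.0, random_state=0)

# Small illustrative grids (assumption)
alphas = [0.01, 0.1, 1.0]
l1_ratios = [0.2, 0.5, 0.8]

best_score, best_setting = -np.inf, None
for alpha in alphas:
    for l1_ratio in l1_ratios:
        model = ElasticNet(alpha=alpha, l1_ratio=l1_ratio)
        # 4-fold cross-validation; higher (less negative) scores are better
        scores = cross_val_score(model, X, y, cv=4,
                                 scoring='neg_mean_squared_error')
        if scores.mean() > best_score:
            best_score, best_setting = scores.mean(), (alpha, l1_ratio)

print(best_setting)
```

`GridSearchCV` automates this loop and also keeps the per-setting results.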

For this exercise, look at the documentation for [ElasticNet](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.ElasticNet.html) and [GridSearchCV](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html#sklearn.model_selection.GridSearchCV) to find the optimal settings of the hyper-parameters `alpha` and `l1_ratio`. 

*Some implementation details*
* Use $k$-fold cross validation with $k=4$.
* Store your `GridSearchCV` object in a variable called `search`.
* Use $\alpha$ with values `np.logspace(2,5,4)` and $\rho$ (`l1_ratio`) with values `np.linspace(0,1,5)`.
* Save the result of the best hyperparameters in a variable called `best_params`. It should be a dictionary with keys `'alpha'` and `'l1_ratio'`.
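
The details above can be sketched on synthetic data as follows. This is an illustration of the `GridSearchCV` pattern rather than a reference solution: the data from `make_regression` is a stand-in, and leaving `scoring` at its default is an assumption since the list above does not specify a score. Some settings (for example `l1_ratio=0`) may produce convergence warnings from `ElasticNet`.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data (assumption), not the housing data
X, y = make_regression(n_samples=100, n_features=6, noise=10.0, random_state=0)

param_grid = {
    'alpha': np.logspace(2, 5, 4),     # 1e2, 1e3, 1e4, 1e5
    'l1_ratio': np.linspace(0, 1, 5),  # 0, 0.25, 0.5, 0.75, 1
}

# cv=4 requests 4-fold cross-validation for every (alpha, l1_ratio) pair
search = GridSearchCV(ElasticNet(), param_grid, cv=4)
search.fit(X, y)

# Dictionary with keys 'alpha' and 'l1_ratio'
best_params = search.best_params_
```

With 4 values of `alpha` and 5 values of `l1_ratio`, the search evaluates 20 settings, each with 4 folds.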


In [6]:
### edTest(test_ec_grid_search_elastic_net) ###
