# Polynomial Regression: The Bias-Variance Tradeoff

## Main objective
> __Fit a polynomial regression model to the given data. Consider the order of the polynomial as a hyperparameter and find its best value by applying *grid search*.__  

The data file is named `example_data.csv` and is located in the `./data/` folder. The response variable is denoted by `y` and the explanatory variable by `x`.

### Suggested workflow

* Load the relevant Python modules and libraries.
* Load the data set and inspect the data by plotting it.
* Split the data into a training set and a validation set.
* Define a reasonable model metric (e.g. root mean square error).
* Model building: Build 6 different polynomial regression models, with degrees of $k = 1,2,3,5,9,14$. 
* For each model, calculate the model metric on the training set and on the validation set.
* Plot the data together with the regression line, given by each particular model. 
* Finally, report the best `k` with respect to the model metric evaluated on the validation set.
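
The workflow above can be sketched end to end as follows. This is only an illustration under stated assumptions: the data is a synthetic noisy sine curve rather than `example_data.csv`, and the 70/30 split, the random seeds, and the names `results` and `best_k` are hypothetical choices, not part of the exercise.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic stand-in for the exercise data (assumption: a noisy sine curve).
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.0, 1.0, size=60))
y = np.sin(2.0 * np.pi * x) + rng.normal(scale=0.2, size=x.size)
X = x.reshape(-1, 1)  # scikit-learn expects a 2-D feature matrix

# Hold out 30% of the observations for validation (assumed ratio).
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Grid search over the polynomial degree k.
degrees = [1, 2, 3, 5, 9, 14]
results = {}
for k in degrees:
    model = make_pipeline(PolynomialFeatures(degree=k), LinearRegression())
    model.fit(X_train, y_train)
    rmse_train = float(np.sqrt(mean_squared_error(y_train, model.predict(X_train))))
    rmse_val = float(np.sqrt(mean_squared_error(y_val, model.predict(X_val))))
    results[k] = (rmse_train, rmse_val)

# The best degree is the one with the lowest validation RMSE.
best_k = min(results, key=lambda k: results[k][1])

for k in degrees:
    print(f"k={k:2d}  train RMSE={results[k][0]:.3f}  val RMSE={results[k][1]:.3f}")
print("best k:", best_k)
```

The degree is selected on the validation error rather than the training error, because the training error can only shrink as the model gains flexibility and therefore says little about overfitting.
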
***

#### Import libraries

In [1]:
%matplotlib inline
# your code here

#### Load the data set

In [2]:
# your code here

#### Plot the data set

In [3]:
# your code here

#### Train-Validation Split

In [4]:
# your code here

#### Model building

The Learning Algorithm

[Polynomial regression](https://en.wikipedia.org/wiki/Polynomial_regression) is a special type of linear regression in which the relationship between the predictor variable $x$ and the response variable $y$ is modeled by a k<sup>th</sup>-degree polynomial in $x$. The polynomial terms make the relation between $y$ and $x$ nonlinear, but the model remains linear in the parameters $\beta_i$, which is why it can be fitted like an ordinary linear regression. The model equation can be written as 

$$y = \beta_0+\beta_1x+\beta_2x^2+\dots+\beta_kx^k+\epsilon$$

where $\epsilon$ is the error term; the prediction $\hat y$ is given by the same polynomial without $\epsilon$.

The optimal parameters are found by minimizing the **sum of squared errors (SSE)**, given by the equation

$$SSE = \sum_{i=1}^n e_i^2 = \sum_{i=1}^n (\hat y_i - y_i)^2 $$
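
Because the model is linear in the parameters, minimizing the SSE reduces to an ordinary least-squares problem on a matrix whose columns are the powers of $x$. A minimal sketch with NumPy, assuming noise-free toy data generated from a known quadratic (the data and variable names are illustrative only):

```python
import numpy as np

# Toy data (assumption): exact quadratic y = 1 + 2x + 3x^2, no noise.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 1.0 + 2.0 * x + 3.0 * x**2

k = 2
# Design matrix with columns 1, x, x^2: the model is linear in beta.
A = np.vander(x, N=k + 1, increasing=True)

# Least-squares solution, i.e. the beta that minimizes the SSE.
beta, *_ = np.linalg.lstsq(A, y, rcond=None)

y_hat = A @ beta
sse = float(np.sum((y_hat - y) ** 2))

print("beta:", np.round(beta, 6))
print("SSE:", sse)
```

With noise-free data the recovered coefficients match the generating ones and the SSE is zero up to floating-point error; with real data the SSE stays positive.
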

Fitting a polynomial to observations raises the question of which order $k$ to choose. Choosing $k$ is an instance of an important concept called **model comparison** or [**model selection**](https://en.wikipedia.org/wiki/Model_selection). To keep it simple we use the [**root-mean-square error (RMSE)**](https://en.wikipedia.org/wiki/Root-mean-square_deviation) defined by

$$RMSE = \sqrt{\frac{\sum_{i=1}^n (\hat y_i - y_i)^2}{n}}$$

to evaluate the goodness-of-fit of the model. 

The `scikit-learn` library provides many model metrics in its `sklearn.metrics` module.
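
As one possible way to implement the metric, the RMSE can be computed directly with NumPy. The function name `rmse` and the small example values are assumptions made for illustration; taking `np.sqrt` of `sklearn.metrics.mean_squared_error` would give the same number.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))

# Errors are 0, 0 and 2, so the RMSE is sqrt(4 / 3).
example = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
print(example)
```

The RMSE is expressed in the same units as `y`, which makes it easier to interpret than the SSE.
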


#### Model metric (e.g. root mean square error)

In [5]:
# your code here

#### Hyperparameter: Generate polynomial and interaction features (Feature engineering).


$$\text{e.g. 2nd order, two features:} \qquad (x_1,x_2) \to (x_1,x_2,x_1^2, x_1x_2,x_2^2)$$ 



Generate a new feature matrix consisting of all polynomial combinations of the features with degree less than or equal to the specified degree.

The `scikit-learn` library provides the `PolynomialFeatures` transformer in `sklearn.preprocessing` to create polynomial features.
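
A short sketch of how the feature expansion might look with `PolynomialFeatures`. The input values are made up for illustration, and `include_bias=False` is an assumed choice that drops the constant column of ones, since a subsequent `LinearRegression` fits the intercept itself.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# One feature, degree 3: each x becomes (x, x^2, x^3).
X = np.array([[2.0], [3.0]])
poly = PolynomialFeatures(degree=3, include_bias=False)
X_poly = poly.fit_transform(X)
print(X_poly)   # rows: [2, 4, 8] and [3, 9, 27]

# Two features, degree 2: (a, b) becomes (a, b, a^2, ab, b^2).
X2 = np.array([[2.0, 3.0]])
X2_poly = PolynomialFeatures(degree=2, include_bias=False).fit_transform(X2)
print(X2_poly)  # row: [2, 3, 4, 6, 9]
```

In this exercise there is only the single feature `x`, so no interaction terms appear and the expansion simply yields the powers of `x` up to degree `k`.
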

#### Build 6 different polynomial regression models, with degrees of $k = 1,2,3,5,9,14$.

_Hint: Start building one model and then expand your approach_

In [6]:
# your code here

#### Calculate the model metric on the training set and on the validation set.

In [7]:
# your code here

#### Plot the data together with the regression line, given by each particular model. 

In [8]:
# your code here

#### Report the best `k` with respect to the model metric evaluated on the validation set.

In [9]:
# your code here