# Tutorial: Regularization, Model Selection and Evaluation

Tutorial for the class [Regularization, Model Selection and Evaluation](4_regularization_selection_evaluation.ipynb), based on the same case study as [Tutorial: Supervised Learning Problem and Least Squares](2_tutorial_supervised_learning_problem_ols.ipynb).

<div class="alert alert-block alert-info">
    <b>Tutorial Objectives</b>
    
- Apply regularization methods: ridge and lasso
- Compute and plot validation curves
- Compare $k$-nearest neighbors to linear least squares (OLS)
</div>

## Getting ready

Let us follow the same procedure as in [Tutorial: Supervised Learning Problem and Least Squares](2_tutorial_supervised_learning_problem_ols.ipynb) to import the required modules and read the data.

In [2]:
# Path manipulation module
from pathlib import Path
# Numerical analysis module
import numpy as np
# Tabular (labeled) data analysis module
import pandas as pd
# Plot module
import matplotlib.pyplot as plt

# Set data directory
data_dir = Path('data')

# Set keyword arguments for pd.read_csv
kwargs_read_csv = dict()

# Set first and last years
FIRST_YEAR = 2014
LAST_YEAR = 2019

# Define temperature filepath
temp_filename = 'surface_temperature_merra2_{}-{}.csv'.format(
    FIRST_YEAR, LAST_YEAR)
temp_filepath = Path(data_dir, temp_filename)

# Define electricity demand filepath
dem_filename = 'reseaux_energies_demand_demand.csv'
dem_filepath = Path(data_dir, dem_filename)

# Read hourly temperature and demand data averaged over each region
df_temp_hourly = pd.read_csv(temp_filepath, index_col=0, parse_dates=True, header=0)
df_dem_hourly = pd.read_csv(dem_filepath, index_col=0, header=0, parse_dates=True)

# Get daily-mean temperature and daily demand
df_temp = df_temp_hourly.resample('D').mean()
df_dem = df_dem_hourly.resample('D').sum()

# Select Île-de-France region
region_name = 'Île-de-France'
df_temp_idf = df_temp[region_name]
df_dem_idf = df_dem[region_name]
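
The questions below all need a feature matrix and a target vector built from these two daily series. The following is a minimal sketch of one way to build them. It assumes synthetic series (`temp_idf`, `dem_idf`) that only mimic the shape of `df_temp_idf` and `df_dem_idf`, and it uses a hypothetical helper `make_xy`. Neither is part of the tutorial.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical synthetic stand-ins for df_temp_idf and df_dem_idf: daily
# series indexed by date (in the tutorial they come from the CSV files above)
dates = pd.date_range('2014-01-01', '2019-12-31', freq='D')
rng = np.random.default_rng(0)
day = dates.dayofyear.to_numpy()
temp_idf = pd.Series(12 + 8 * np.sin(2 * np.pi * (day - 110) / 365.25)
                     + rng.normal(0, 2, len(dates)), index=dates)
dem_idf = pd.Series(2e5 + 6e3 * np.maximum(15 - temp_idf.to_numpy(), 0)
                    + rng.normal(0, 1e4, len(dates)), index=dates)
dem_idf.iloc[:3] = np.nan  # pretend the first three demand values are missing


def make_xy(temp_series, dem_series):
    """Hypothetical helper: align both series on their common dates, drop
    missing values, and return X of shape (n, 1) and y of shape (n,)."""
    df = pd.concat([temp_series, dem_series], axis=1, keys=['temp', 'dem'],
                   join='inner').dropna()
    X = df[['temp']].to_numpy()  # scikit-learn expects a 2D feature array
    y = df['dem'].to_numpy()
    return X, y


X, y = make_xy(temp_idf, dem_idf)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)
```

With the tutorial's data, the call would be `X, y = make_xy(df_temp_idf, df_dem_idf)`. Passing `keys` to `pd.concat` avoids clashing column names, since both regional series are named `'Île-de-France'`. `train_test_split` shuffles by default. Because consecutive days are correlated, a chronological split (`shuffle=False`) is a more conservative alternative.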

## Regularization

### Ridge regression

> ***Question***
> - Apply ridge regression using `Ridge` from `sklearn.linear_model` for varying values of the regularization parameter.
> - Plot the resulting predictions on top of the scatter plot of the training data.
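
The following is a minimal sketch of how `Ridge` responds to its regularization parameter `alpha`. It assumes synthetic data in which demand rises linearly below 15 °C, and the `alpha` values are only illustrative. With a single feature, the penalty can only shrink the temperature slope toward zero. The intercept is not penalized, so the curves flatten toward the mean demand.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import Ridge

# Synthetic stand-in for the training data (assumption): X_train holds
# daily-mean temperatures, y_train a demand that rises below 15 °C
rng = np.random.default_rng(0)
X_train = rng.uniform(-5, 30, size=(300, 1))
y_train = 2e5 + 6e3 * np.maximum(15 - X_train[:, 0], 0) + rng.normal(0, 1e4, 300)

# Regularization strengths to compare (illustrative values)
alphas = [1e-2, 1e4, 1e5, 1e6]

# Temperatures at which to draw the fitted lines
x_plot = np.linspace(X_train.min(), X_train.max(), 200).reshape(-1, 1)

fig, ax = plt.subplots()
ax.scatter(X_train[:, 0], y_train, s=5, c='k', alpha=0.4, label='Train data')
coefs, predictions = {}, {}
for alpha in alphas:
    # Ridge minimizes ||y - Xw||^2 + alpha * ||w||^2 (intercept not penalized)
    model = Ridge(alpha=alpha).fit(X_train, y_train)
    coefs[alpha] = model.coef_[0]
    predictions[alpha] = model.predict(x_plot)
    ax.plot(x_plot[:, 0], predictions[alpha], label=f'alpha = {alpha:g}')
ax.set_xlabel('Daily-mean temperature (°C)')
ax.set_ylabel('Daily demand')
ax.legend()
```

In a notebook the figure is displayed at the end of the cell. In a script, add `plt.show()`.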

In [None]:
# Your answer


> ***Question***
> - Compute the corresponding validation curves. To do so:
>   - Compute and plot the train and test error (using cross-validation) for varying values of the regularization parameter.
> - What is the best value of the regularization parameter according to your estimates?
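
The following is a minimal sketch of a validation curve for ridge regression. It assumes synthetic data and 5-fold shuffled `KFold`, neither of which is taken from the tutorial. scikit-learn's `validation_curve` refits the model for each value of `alpha` on each training fold and returns the train and validation scores. The "best" `alpha` is then the one with the lowest mean validation error.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, validation_curve

# Synthetic stand-in data (assumption, not the tutorial's data)
rng = np.random.default_rng(0)
X = rng.uniform(-5, 30, size=(300, 1))
y = 2e5 + 6e3 * np.maximum(15 - X[:, 0], 0) + rng.normal(0, 1e4, 300)

alphas = np.logspace(-2, 7, 19)
cv = KFold(n_splits=5, shuffle=True, random_state=0)

# Scores have shape (n_alphas, n_splits); they are negated RMSE values
train_scores, test_scores = validation_curve(
    Ridge(), X, y, param_name='alpha', param_range=alphas,
    cv=cv, scoring='neg_root_mean_squared_error')
train_err = -train_scores.mean(axis=1)
test_err = -test_scores.mean(axis=1)

# Value of alpha minimizing the mean validation error
best_alpha = alphas[np.argmin(test_err)]

fig, ax = plt.subplots()
ax.semilogx(alphas, train_err, label='Train error')
ax.semilogx(alphas, test_err, label='Test error (CV)')
ax.axvline(best_alpha, ls='--', c='k', label=f'Best alpha = {best_alpha:.3g}')
ax.set_xlabel('Regularization parameter alpha')
ax.set_ylabel('RMSE')
ax.legend()
```

Shuffled folds ignore the day-to-day correlation of the real series, so the validation error may be somewhat optimistic. `TimeSeriesSplit` from `sklearn.model_selection` is one alternative.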

In [None]:
# Your answer


> ***Question***
> - Estimate the test error for the optimal value of the regularization parameter. Make sure that the data used to estimate this error were not also used to select the regularization parameter (for instance, by using nested cross-validation).
> - How does the ridge model perform compared to the linear models analyzed so far?
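
One way to keep the test data out of the choice of `alpha` is nested cross-validation. In this approach, `GridSearchCV` selects `alpha` within each outer training set, and `cross_val_score` scores the tuned model on outer folds that played no part in that selection. The following is a minimal sketch on synthetic data. The fold counts, the grid, and the OLS baseline via `LinearRegression` are all assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score

# Synthetic stand-in data (assumption, not the tutorial's data)
rng = np.random.default_rng(0)
X = rng.uniform(-5, 30, size=(300, 1))
y = 2e5 + 6e3 * np.maximum(15 - X[:, 0], 0) + rng.normal(0, 1e4, 300)

alphas = np.logspace(-2, 7, 19)
inner_cv = KFold(n_splits=5, shuffle=True, random_state=1)
outer_cv = KFold(n_splits=5, shuffle=True, random_state=0)

# Inner loop: choose alpha using only the outer training folds
search = GridSearchCV(Ridge(), param_grid={'alpha': alphas}, cv=inner_cv,
                      scoring='neg_root_mean_squared_error')
# Outer loop: each outer test fold is unseen while alpha is chosen
ridge_rmse = -cross_val_score(search, X, y, cv=outer_cv,
                              scoring='neg_root_mean_squared_error')
# OLS baseline evaluated on the same outer folds
ols_rmse = -cross_val_score(LinearRegression(), X, y, cv=outer_cv,
                            scoring='neg_root_mean_squared_error')

print(f'Ridge (nested CV): {ridge_rmse.mean():.3g} +/- {ridge_rmse.std():.3g}')
print(f'OLS:               {ols_rmse.mean():.3g} +/- {ols_rmse.std():.3g}')
```

With only one feature there is little variance for the penalty to remove, so ridge and OLS are likely to score similarly on such data. The comparison that matters here is the one on the tutorial's own features.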

In [None]:
# Your answer


### Lasso regression

> ***Question***
> - Same questions as for ridge regression, but for the lasso (using `Lasso` from `sklearn.linear_model`).
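
The same validation-curve approach carries over to `Lasso`, which minimizes $\frac{1}{2n}\|y - Xw\|_2^2 + \alpha \|w\|_1$. The following is a minimal sketch on synthetic data (an assumption), with an illustrative grid of `alpha` values. It also traces the fitted coefficient to show that the L1 penalty sets it exactly to zero once `alpha` is large enough.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import Lasso
from sklearn.model_selection import KFold, validation_curve

# Synthetic stand-in data (assumption, not the tutorial's data)
rng = np.random.default_rng(0)
X = rng.uniform(-5, 30, size=(300, 1))
y = 2e5 + 6e3 * np.maximum(15 - X[:, 0], 0) + rng.normal(0, 1e4, 300)

alphas = np.logspace(0, 6, 13)
cv = KFold(n_splits=5, shuffle=True, random_state=0)

# Train and validation RMSE for each alpha (scores are negated RMSE)
train_scores, test_scores = validation_curve(
    Lasso(), X, y, param_name='alpha', param_range=alphas,
    cv=cv, scoring='neg_root_mean_squared_error')
train_err = -train_scores.mean(axis=1)
test_err = -test_scores.mean(axis=1)
best_alpha = alphas[np.argmin(test_err)]

# Coefficient path on the full data set
lasso_coefs = np.array([Lasso(alpha=a).fit(X, y).coef_[0] for a in alphas])

fig, (ax0, ax1) = plt.subplots(1, 2, figsize=(10, 4))
ax0.semilogx(alphas, train_err, label='Train error')
ax0.semilogx(alphas, test_err, label='Test error (CV)')
ax0.set_xlabel('Regularization parameter alpha')
ax0.set_ylabel('RMSE')
ax0.legend()
ax1.semilogx(alphas, lasso_coefs, marker='o')
ax1.set_xlabel('Regularization parameter alpha')
ax1.set_ylabel('Temperature coefficient')
```

With a single feature, the lasso can only shrink the one slope or set it to zero. Its variable-selection behavior becomes visible only with several features. The test-error question can be handled as for ridge by swapping `Lasso` into the nested cross-validation.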

In [None]:
# Your answer


## $k$-nearest neighbors model

> ***Question***
> - Apply the $k$-nearest neighbors model using `KNeighborsRegressor` from `sklearn.neighbors` for varying $k$.
> - Plot the resulting predictions on top of the scatter plot of the training data.
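
The following is a minimal sketch of `KNeighborsRegressor` for a few values of `n_neighbors` ($k$), assuming the same kind of synthetic data as above. The prediction at a given temperature is the mean demand of the $k$ training days with the closest temperatures. The chosen $k$ values are illustrative only.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.neighbors import KNeighborsRegressor

# Synthetic stand-in for the training data (assumption)
rng = np.random.default_rng(0)
X_train = rng.uniform(-5, 30, size=(300, 1))
y_train = 2e5 + 6e3 * np.maximum(15 - X_train[:, 0], 0) + rng.normal(0, 1e4, 300)

# From very flexible (k = 1) to constant (k = number of training points)
k_values = [1, 10, 50, len(X_train)]
x_plot = np.linspace(X_train.min(), X_train.max(), 200).reshape(-1, 1)

fig, ax = plt.subplots()
ax.scatter(X_train[:, 0], y_train, s=5, c='k', alpha=0.4, label='Train data')
predictions = {}
for k in k_values:
    # Uniform weights: average the targets of the k nearest training points
    model = KNeighborsRegressor(n_neighbors=k).fit(X_train, y_train)
    predictions[k] = model.predict(x_plot)
    ax.plot(x_plot[:, 0], predictions[k], label=f'k = {k}')
ax.set_xlabel('Daily-mean temperature (°C)')
ax.set_ylabel('Daily demand')
ax.legend()
```

Small $k$ follows the noise of individual days. At the other extreme, `k = len(X_train)` predicts the training mean everywhere. These two ends illustrate the bias-variance trade-off.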

In [None]:
# Your answer


> ***Question***
> - Compute the corresponding validation curves. To do so:
>   - Compute and plot the train and test error (using cross-validation) for varying $k$.
> - What is the best value of $k$ according to your estimates?
> - How does the best $k$-nearest neighbors model perform compared to the linear models analyzed so far?
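
The following is a minimal sketch of the validation curve over $k$, using `validation_curve` with `param_name='n_neighbors'`. An OLS baseline is scored on the same folds for comparison. The synthetic data, the folds, and the $k$ grid are assumptions; the largest $k$ is kept below the size of a training fold (240 points here).

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score, validation_curve
from sklearn.neighbors import KNeighborsRegressor

# Synthetic stand-in data (assumption, not the tutorial's data)
rng = np.random.default_rng(0)
X = rng.uniform(-5, 30, size=(300, 1))
y = 2e5 + 6e3 * np.maximum(15 - X[:, 0], 0) + rng.normal(0, 1e4, 300)

k_values = [1, 2, 5, 10, 20, 50, 100, 200]
cv = KFold(n_splits=5, shuffle=True, random_state=0)

# Train and validation RMSE for each k (scores are negated RMSE)
train_scores, test_scores = validation_curve(
    KNeighborsRegressor(), X, y, param_name='n_neighbors',
    param_range=k_values, cv=cv, scoring='neg_root_mean_squared_error')
train_err = -train_scores.mean(axis=1)
test_err = -test_scores.mean(axis=1)
best_k = k_values[int(np.argmin(test_err))]

# OLS validation error on the same folds, for comparison
ols_err = -cross_val_score(LinearRegression(), X, y, cv=cv,
                           scoring='neg_root_mean_squared_error').mean()

fig, ax = plt.subplots()
ax.semilogx(k_values, train_err, marker='o', label='Train error')
ax.semilogx(k_values, test_err, marker='o', label='Test error (CV)')
ax.axhline(ols_err, ls=':', c='gray', label='OLS test error (CV)')
ax.axvline(best_k, ls='--', c='k', label=f'Best k = {best_k}')
ax.set_xlabel('Number of neighbors k')
ax.set_ylabel('RMSE')
ax.legend()
```

The train error at `k = 1` is zero because each training day is its own nearest neighbor, so it says nothing about generalization. Only the validation curve should guide the choice of $k$.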

In [None]:
# Your answer


Answer:

***
## Credit

[//]: # "This notebook is part of [E4C Interdisciplinary Center - Education](https://gitlab.in2p3.fr/energy4climate/public/education)."
Contributors include Bruno Deremble and Alexis Tantet.
Several slides and images are taken from the very good [Scikit-learn course](https://inria.github.io/scikit-learn-mooc/).

<br>

<div style="display: flex; height: 70px">
    
<img alt="Logo LMD" src="images/logos/logo_lmd.jpg" style="display: inline-block"/>

<img alt="Logo IPSL" src="images/logos/logo_ipsl.png" style="display: inline-block"/>

<img alt="Logo E4C" src="images/logos/logo_e4c_final.png" style="display: inline-block"/>

<img alt="Logo EP" src="images/logos/logo_ep.png" style="display: inline-block"/>

<img alt="Logo SU" src="images/logos/logo_su.png" style="display: inline-block"/>

<img alt="Logo ENS" src="images/logos/logo_ens.jpg" style="display: inline-block"/>

<img alt="Logo CNRS" src="images/logos/logo_cnrs.png" style="display: inline-block"/>
    
</div>

<hr>

<div style="display: flex">
    <a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/"><img alt="Creative Commons License" style="border-width:0; margin-right: 10px" src="https://i.creativecommons.org/l/by-sa/4.0/88x31.png" /></a>
    <br>This work is licensed under a &nbsp; <a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/">Creative Commons Attribution-ShareAlike 4.0 International License</a>.
</div>