Before you turn this problem in, make sure everything runs as expected. First, **restart the kernel** (in the menubar, select Kernel$\rightarrow$Restart) and then **run all cells** (in the menubar, select Cell$\rightarrow$Run All).

Make sure you fill in any place that says `YOUR CODE HERE` or "YOUR ANSWER HERE", as well as your name and collaborators below:

In [None]:
NAME = ""
COLLABORATORS = ""

---

# CSE204 - Introduction to Machine Learning - Lab Session 2: parametric models

<img src="https://raw.githubusercontent.com/adimajo/polytechnique-cse204-2019-releases/master/logo.jpg" style="float: left; width: 15%" />

[CSE204-2019](https://moodle.polytechnique.fr/course/view.php?id=7862) Lab session #02

Jérémie DECOCK - Adrien EHRHARDT

[![Open in Google Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/adimajo/polytechnique-cse204-2019-releases/blob/master/lab_session_02/lab_session_02.ipynb)

[![My Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/adimajo/polytechnique-cse204-2019-releases/master?filepath=lab_session_02%2Flab_session_02.ipynb)

[![Local](https://img.shields.io/badge/Local-Save%20As...-blue)](https://github.com/adimajo/polytechnique-cse204-2019-releases/raw/master/lab_session_02/lab_session_02.ipynb)

## Objectives

- Introduction to parametric models
- Implement a linear regressor
- Approximate the optimal parameters using a gradient descent algorithm
- Linear regression with Scikit Learn
- Implement a polynomial regressor

## Imports and tool functions

In [None]:
colab_requirements = [
    "matplotlib>=3.1.2",
    "pandas>=0.25.3",
    "numpy>=1.18.1",
    "scikit-learn>=0.22.1",
    "nose>=1.3.7"
]
import sys, subprocess
def run_subprocess_command(cmd):
    # run the command
    process = subprocess.Popen(cmd.split(), stdout=subprocess.PIPE)
    # print the output
    for line in process.stdout:
        print(line.decode().strip())
        
if "google.colab" in sys.modules:
    for i in colab_requirements:
        run_subprocess_command("pip install " + i)

In [None]:
%matplotlib inline

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

import sklearn
import sklearn.linear_model
import sklearn.pipeline
import sklearn.preprocessing
import sklearn.utils          # provides sklearn.utils.shuffle, used by the sample generators below

In [None]:
def gen_1d_linear_regression_samples(n_samples = 20):
    x = np.random.uniform(low=-10., high=10., size=n_samples)
    y = 2. * x + 3. + np.random.normal(scale=2., size=x.shape)
    df = pd.DataFrame(np.array([x, y]).T, columns=['x', 'y'])
    df = sklearn.utils.shuffle(df).reset_index(drop=True)
    return df

In [None]:
def gen_1d_polynomial_regression_samples(n_samples = 15):
    x = np.random.uniform(low=0., high=10., size=n_samples)
    y = 3. - 2. * x + x ** 2 - x ** 3 + np.random.normal(scale=10., size=x.shape)
    df = pd.DataFrame(np.array([x, y]).T, columns=['x', 'y'])
    df = sklearn.utils.shuffle(df).reset_index(drop=True)
    return df

In [None]:
def plot_1d_regression_samples(dataframe, model=None):
    fig, ax = plt.subplots(figsize=(8, 8))
    
    df = dataframe  # make an alias
    
    ERROR_MSG1 = "The `dataframe` parameter should be a Pandas DataFrame having the following columns: ['x', 'y']"
    assert df.columns.values.tolist() == ['x', 'y'], ERROR_MSG1
    
    if model is not None:
        
        # Compute the model's prediction
        
        x_pred = np.linspace(df.x.min(), df.x.max(), 100).reshape(-1, 1)
        y_pred = model.predict(x_pred)
        
        df_pred = pd.DataFrame(np.array([x_pred.flatten(), y_pred.flatten()]).T, columns=['x', 'y'])
        
        df_pred.plot(x='x', y='y', style='r--', ax=ax)

    # Plot also the training points
    
    df.plot.scatter(x='x', y='y', ax=ax)
    
    delta_y = df.y.max() - df.y.min()
    
    plt.ylim((df.y.min() - 0.15 * delta_y,
              df.y.max() + 0.15 * delta_y))

In [None]:
def plot_ex2(X, y, theta_0=None, theta_1=None):
    df = pd.DataFrame(np.array([X, y]).T, columns=['x', 'y'])

    ax = df.plot.scatter(x="x", y="y")

    if theta_0 is not None and theta_1 is not None:
        x = np.array([1, 9])
        y = theta_0 + theta_1 * x

        ax.plot(x, y, "--r")

In [None]:
def plot_ex4(X, y, theta_1=None, theta_2=None):
    df = pd.DataFrame(np.array([X, y]).T, columns=['x', 'y'])

    ax = df.plot.scatter(x="x", y="y")

    if theta_1 is not None and theta_2 is not None:
        x = np.linspace(0, 6, 50)
        y = theta_1 * x + theta_2 * x**2

        ax.plot(x, y, "--r")

In [None]:
import matplotlib.colors as colors
from mpl_toolkits.mplot3d import axes3d

def plot_contour_2d_solution_space(func,
                                   fig=None,
                                   ax=None,
                                   show=True,
                                   theta_min=-np.ones(2),
                                   theta_max=np.ones(2),
                                   theta_star=None,
                                   theta_visited=None,
                                   title=""):
    """Plot points visited during the execution of an optimization algorithm."""
    if (fig is None) or (ax is None):
        fig, ax = plt.subplots(figsize=(12, 8))

    if theta_visited is not None:
        theta_min = np.amin(np.hstack([theta_min.reshape([-1, 1]), theta_visited]), axis=1)
        theta_max = np.amax(np.hstack([theta_max.reshape([-1, 1]), theta_visited]), axis=1)

    x1_space = np.linspace(theta_min[0], theta_max[0], 200)
    x2_space = np.linspace(theta_min[1], theta_max[1], 200)

    x1_mesh, x2_mesh = np.meshgrid(x1_space, x2_space)

    zz = func(np.array([x1_mesh.ravel(), x2_mesh.ravel()])).reshape(x1_mesh.shape)

    ############################

    if theta_star is not None:
        min_value = func(theta_star)
    else:
        min_value = zz.min()
        
    max_value = zz.max()

    levels = np.logspace(0.1, 3., 5)

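    # vmin/vmax are given to the norm itself: recent Matplotlib versions
    # reject vmin/vmax passed together with a `norm` instance
    im = ax.pcolormesh(x1_mesh, x2_mesh, zz,
                       norm=colors.LogNorm(vmin=0.1, vmax=max_value),
                       shading='gouraud',
                       cmap='gnuplot2')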

    plt.colorbar(im, ax=ax)

    cs = plt.contour(x1_mesh, x2_mesh, zz, levels,
                     linewidths=(2, 2, 2, 2, 3),
                     linestyles=('dotted', '-.', 'dashed', 'solid', 'solid'),
                     alpha=0.5,
                     colors='white')
    ax.clabel(cs, inline=False, fontsize=12)

    ############################

    if theta_visited is not None:
        ax.plot(theta_visited[0],
                theta_visited[1],
                '-og',
                alpha=0.5,
                label="$visited$")

    ############################

    if theta_star is not None:
        sc = ax.scatter(theta_star[0],
                   theta_star[1],
                   c='red',
                   label=r"$\theta^*$")
        sc.set_zorder(10)        # put this point above every thing else

    ############################

    ax.set_title(title, fontsize=16)

    ax.set_xlabel(r"$\theta_0$", fontsize=16)
    ax.set_ylabel(r"$\theta_1$", fontsize=16)

    ax.legend(fontsize=16)

    if show:
        plt.show()

    return fig, ax

In [None]:
def plot_2d_solution_space(func,
                           fig=None,
                           ax=None,
                           show=True,
                           theta_min=-np.ones(2),
                           theta_max=np.ones(2),
                           theta_star=None,
                           theta_visited=None,
                           angle_view=None,
                           title=""):
    """Plot points visited during the execution of an optimization algorithm."""
    if fig is None or ax is None:
        fig = plt.figure(figsize=(12, 8))
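        # Axes3D(fig) is no longer added to the figure automatically in recent Matplotlib versions
        ax = fig.add_subplot(projection='3d')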

    if angle_view is not None:
        ax.view_init(angle_view[0], angle_view[1])

    x1_space = np.linspace(theta_min[0], theta_max[0], 100)
    x2_space = np.linspace(theta_min[1], theta_max[1], 100)

    x1_mesh, x2_mesh = np.meshgrid(x1_space, x2_space)

    zz = func(np.array([x1_mesh.ravel(), x2_mesh.ravel()])).reshape(x1_mesh.shape)

    ############################

    surf = ax.plot_surface(x1_mesh,
                           x2_mesh,
                           zz,
                           cmap='gnuplot2',
                           norm=colors.LogNorm(),
                           rstride=1,
                           cstride=1,
                           shade=False)

    ax.set_zlabel(r"$E(\theta)$")

    fig.colorbar(surf, shrink=0.5, aspect=5)

    ############################

    if theta_star is not None:
        ax.scatter(theta_star[0],
                   theta_star[1],
                   func(theta_star),
                   c='red',
                   alpha=1,
                   label=r"$\theta^*$")

    ax.set_title(title, fontsize=16)
    ax.set_xlabel(r"$\theta_0$", fontsize=16)
    ax.set_ylabel(r"$\theta_1$", fontsize=16)

    if show:
        plt.show()

    return fig, ax

## Introduction

Today you will learn to solve regression problems using **parametric models** (the application of parametric models to classification problems will be the subject of another session): you will use a parametric function $f_{\boldsymbol{\theta}}: \boldsymbol{x} \mapsto y$ to infer the link existing between input vectors $\boldsymbol{x} \in \mathbb{R}^p$ and output values $y \in \mathbb{R}$ in a *learning set* $\mathcal{D} = \{(y^{(i)}, \boldsymbol{x^{(i)}})\}_{1 \leq i \leq n}$ of $n$ examples.

The *hypothesis space* $\mathcal{H}$ of $f_{\boldsymbol{\theta}}$ is chosen a priori so that the model fits the data in $\mathcal{D}$ reasonably well. For instance, $\mathcal{H}$ can be the space of linear functions if the data in $\mathcal{D}$ seems to be distributed along a line. Otherwise, the space of polynomial functions of degree $d>1$ may be a good choice.

The parameter $\boldsymbol{\theta}^* = \begin{pmatrix} \theta_0^* & \dots & \theta_p^* \end{pmatrix}^T$ is then searched to obtain the best fit between $f_{\boldsymbol{\theta}}$ and $\mathcal{D}$. This is an optimization problem.

For instance, assume you have chosen the space of linear functions to model the data in $\mathcal{D}$. Your model is then $y = \theta_0 + \theta_1 x$ and the regression problem consists in finding the best parameters (or estimators) $\theta_0$ and $\theta_1$ for it.
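As a minimal sketch of what "parametric" means here, the function below evaluates the linear model for a given parameter vector. The name `f_theta` and the sample values are illustrative assumptions, not part of the lab material. Changing `theta` selects another function from the same hypothesis space.

```python
import numpy as np

def f_theta(theta, x):
    """Linear model y = theta_0 + theta_1 * x (hypothetical helper)."""
    theta_0, theta_1 = theta
    return theta_0 + theta_1 * np.asarray(x, dtype=float)

x = np.array([0., 1., 2.])

# Two parameter vectors give two different lines of the same hypothesis space
y_a = f_theta((1., 2.), x)    # the line 1 + 2x
y_b = f_theta((0., -1.), x)   # the line -x
print(y_a, y_b)
```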

**Note**: the notation differs slightly from the lecture slides: parameters are denoted $w$ in the lectures (for "weights", as in the machine learning community) but $\theta$ here (for "parameters", as in the statistics community).

## Linear regression: an analytic definition of the optimal parameters

We have a *learning set* $\mathcal{D} = \{(y^{(i)}, \boldsymbol{x^{(i)}})\}_{1 \leq i \leq n}$.

We assume:
- Errors (differences between the actual labels $y$ and the predicted labels $f_{\boldsymbol{\theta}}(\boldsymbol{x})$) are Gaussian random variables centered on 0: $y = f_{\boldsymbol{\theta}}(\boldsymbol{x}) + \epsilon$ with $\epsilon \sim \mathcal{N}(0, \sigma^2)$.
- Data is modeled with a linear function: $f_{\boldsymbol{\theta}}(\boldsymbol{x}) = \theta_0 + \sum_{j=1}^p \theta_j x_j$.
- Observations $\boldsymbol{x} \in \mathbb{R}^p$ can be seen as realizations of $p$ random variables $X_1, X_2, \dots, X_p$.
- Labels $y$ are then realizations of a random variable $Y$ such that:

$$Y \sim \mathcal{N}(\underbrace{f_{\boldsymbol{\theta}}(\boldsymbol{x})}_{\mu}, \sigma^2)$$

We want to find the estimator $\boldsymbol{\theta}^* = \begin{pmatrix} \theta_0^* & \dots & \theta_p^* \end{pmatrix}^T$ that gives the best fit between $f_{\boldsymbol{\theta}}$ and $\mathcal{D}$ (optimization problem).

Finding the best $\boldsymbol{\theta}^*$ is a maximum likelihood problem: $\boldsymbol{\theta}^* \leftarrow \arg\max_{\boldsymbol{\theta}} \mathbb{P}(\mathcal{D}|\boldsymbol{\theta})$.
Here, this is equivalent to applying the method of *least squares*, i.e. to minimizing the Mean Square Error (MSE), up to a constant factor that does not change the minimizer.
Using the matrix notation, we define the linear regression problem as:

$$\boldsymbol{\theta}^* \leftarrow \arg\min_{\boldsymbol{\theta}} E(\boldsymbol{\theta}) \quad \text{with} \quad E(\boldsymbol{\theta}) = \frac12 \|\boldsymbol{y} - \boldsymbol{X} \boldsymbol{\theta}\|_2^2$$

and with

$$
\boldsymbol{X} = \begin{pmatrix} 1 & x_1^{(1)} & \dots & x_p^{(1)} \\ \vdots & \vdots & \dots & \vdots \\ 1 & x_1^{(n)} & \dots & x_p^{(n)} \end{pmatrix}
\quad \quad
\boldsymbol{y} = \begin{pmatrix} y^{(1)} \\ \vdots \\ y^{(n)} \end{pmatrix}
\quad \quad
\boldsymbol{\theta} = \begin{pmatrix} \theta_0 \\ \vdots \\ \theta_p \end{pmatrix}
$$
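
As a sketch of why maximum likelihood reduces to least squares under the assumptions above, suppose in addition that the $n$ examples are independent (an assumption not stated explicitly in this notebook) and that $\sigma^2$ is fixed. The log-likelihood then reads:

$$
\log \mathbb{P}(\mathcal{D}|\boldsymbol{\theta})
= \sum_{i=1}^n \log \mathcal{N}\left(y^{(i)} ; f_{\boldsymbol{\theta}}(\boldsymbol{x}^{(i)}), \sigma^2\right)
= -\frac{n}{2} \log(2 \pi \sigma^2) - \frac{1}{2 \sigma^2} \sum_{i=1}^n \left(y^{(i)} - f_{\boldsymbol{\theta}}(\boldsymbol{x}^{(i)})\right)^2
$$

The first term does not depend on $\boldsymbol{\theta}$ and $\frac{1}{2\sigma^2}$ is a positive constant, so maximizing the likelihood amounts to minimizing the sum of squared errors, which is $E(\boldsymbol{\theta})$ up to a positive factor.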

$E(\boldsymbol{\theta})$ is a convex quadratic function of $\boldsymbol{\theta}$. Provided $\boldsymbol{X}^T \boldsymbol{X}$ is invertible, it has a unique global minimum $\boldsymbol{\theta^*}$, characterized by $\nabla_{\boldsymbol{\theta}} E(\boldsymbol{\theta^*}) = \boldsymbol{0}$.
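
The matrix notation above can be sketched in NumPy as follows. The toy data and the helper name `half_sse` are assumptions made for illustration only (this is not the dataset of the exercises). The design matrix gets a leading column of ones so that $\theta_0$ acts as the intercept.

```python
import numpy as np

# Toy data (illustrative values): exactly y = 1 + 2x
x = np.array([0., 1., 2., 3.])
y = np.array([1., 3., 5., 7.])

# Design matrix: a column of ones (for theta_0) next to the inputs
X = np.column_stack([np.ones_like(x), x])

def half_sse(theta, X, y):
    """E(theta) = 1/2 * ||y - X theta||^2 (hypothetical helper)."""
    residuals = y - X @ theta
    return 0.5 * residuals @ residuals

e_perfect = half_sse(np.array([1., 2.]), X, y)   # parameters that generated the data
e_zero = half_sse(np.array([0., 0.]), X, y)      # all-zero parameters
print(e_perfect, e_zero)
```

The error is zero for the parameters that generated the data and grows as $\boldsymbol{\theta}$ moves away from them.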

### Exercise 1

On a sheet of paper:
- Compute the analytic formulation of the gradient $\nabla_{\boldsymbol{\theta}} E(\boldsymbol{\theta})$ of the Mean Square Error $E(\boldsymbol{\theta}) = \frac12 \|\boldsymbol{y} - \boldsymbol{X} \boldsymbol{\theta}\|_2^2$
- Compute the analytic formulation of the optimal parameter $\boldsymbol{\theta^*}$
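
One possible way to check a hand-derived gradient is to compare it with a central finite-difference approximation. The sketch below uses a hypothetical helper, `numerical_gradient`, and tests it on a simple function whose gradient is known. The same helper could then be applied to your own implementation of $E(\boldsymbol{\theta})$.

```python
import numpy as np

def numerical_gradient(func, theta, h=1e-6):
    """Central finite-difference approximation of the gradient of func at theta."""
    theta = np.asarray(theta, dtype=float)
    grad = np.zeros_like(theta)
    for j in range(theta.size):
        step = np.zeros_like(theta)
        step[j] = h
        grad[j] = (func(theta + step) - func(theta - step)) / (2. * h)
    return grad

# Sanity check on g(theta) = theta_0^2 + 3 * theta_1, whose gradient is (2 * theta_0, 3)
g = lambda theta: theta[0] ** 2 + 3. * theta[1]
approx = numerical_gradient(g, [2., -1.])
print(approx)   # close to [4., 3.]
```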

YOUR ANSWER HERE

## Example

### Exercise 2

#### Question 1

Use the previous equations to compute **by hand** (i.e. on a sheet of paper) the optimal parameters $\theta_0$ and $\theta_1$ of the model $y = \theta_0 + \theta_1 x$ to best fit the following dataset (of four observations):

$$\mathcal{D} = \left\{
\begin{pmatrix} 2 \\ 1 \end{pmatrix},
\begin{pmatrix} 5 \\ 2 \end{pmatrix},
\begin{pmatrix} 7 \\ 3 \end{pmatrix},
\begin{pmatrix} 8 \\ 3 \end{pmatrix}
\right\}$$

In [None]:
X = [2, 5, 7, 8]
y = [1, 2, 3, 3]

plot_ex2(X, y)

YOUR ANSWER HERE

#### Question 2

Check graphically that the model you obtained fits well with the data using the following cell (uncomment and complete the first two lines and uncomment the last one).

In [None]:
X = [2, 5, 7, 8]
y = [1, 2, 3, 3]

#theta_0 =                          # <- TO UNCOMMENT AND TO COMPLETE (intercept)
#theta_1 =                          # <- TO UNCOMMENT AND TO COMPLETE (slope)

#plot_ex2(X, y, theta_0, theta_1)   # <- TO UNCOMMENT

# YOUR CODE HERE
raise NotImplementedError()

#### Question 3

Plot the MSE $E(\theta)$ with the following cells.
What is plotted? What are the input space and the output space?

What can you say about these plots?

In [None]:
X = np.array([[1, 1, 1, 1], [2, 5, 7, 8]]).T
y = np.array([1, 2, 3, 3]).reshape(-1, 1)

In [None]:
class MSE:
    def __init__(self, X, y):
        self.X = np.copy(X)
        self.y = np.copy(y)
        
    def __call__(self, theta):
        return 1./2. * ((np.tile(self.y, theta.shape[1]) - np.dot(self.X, theta))**2).sum(axis=0)
    
mse = MSE(X, y)

In [None]:
plot_contour_2d_solution_space(mse,
                               theta_min=np.array([-5, -1]),
                               theta_max=np.array([5, 1]),
                               theta_star=np.array([[theta_0], [theta_1]]));

In [None]:
plot_2d_solution_space(mse,
                       theta_min=np.array([-5, -1]),
                       theta_max=np.array([5, 1]));

YOUR ANSWER HERE

## Linear regression: an approximated solution using a *gradient descent* method

When $(X^TX)^{-1}$ cannot be easily computed (e.g. there is no analytical solution, $\mathcal{D}$ contains a lot of examples, or the dimension of the solution space $\mathcal{X}$ is too large), an approximate solution can be computed using a *gradient descent method*.

$\nabla_{\theta}E(\hat{\theta})$ gives the direction of steepest ascent of $E$ at the point $\hat{\theta}$.
Thus, if we iteratively explore the parameter space by following the opposite direction of this gradient, as described below, we should converge to the parameter $\theta^*$ that minimizes the MSE, i.e. the parameter $\theta^*$ such that $\nabla_{\theta}E(\theta^*) = 0$.

Starting from a random point $\theta$, the gradient descent method proposes a new point 
$\theta \leftarrow \theta - \eta \nabla_{\theta}E(\theta)$ at each iteration until a stopping criterion is reached: e.g. iterations continue as long as $||\nabla_{\theta}E(\theta)||_2^2 > \epsilon_{\delta}$, where $\epsilon_{\delta}$ is a chosen threshold, and stop once the squared norm of the gradient falls below it.

The *learning rate* $\eta \in \mathbb{R}_+^*$ is a hyperparameter to tune for the problem at hand.
- If $\eta$ is too large, the optimization may fail to converge to the minimum (the iterates can oscillate or diverge).
- If $\eta$ is too small, the optimization may require a lot of iterations to converge.
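
The effect of the learning rate can be illustrated on a deliberately simple one-dimensional function, $g(\theta) = (\theta - 3)^2$, which is not the MSE of the exercises. The function `descend` and the three values of $\eta$ below are assumptions chosen for the illustration.

```python
def descend(eta, n_iter=50, theta=0.):
    """Gradient descent on g(theta) = (theta - 3)^2, whose gradient is 2 * (theta - 3)."""
    for _ in range(n_iter):
        theta = theta - eta * 2. * (theta - 3.)
    return theta

theta_ok = descend(eta=0.1)       # ends very close to the minimizer 3
theta_slow = descend(eta=0.001)   # still far from 3 after 50 iterations
theta_bad = descend(eta=1.1)      # each step overshoots more than the previous one
print(theta_ok, theta_slow, theta_bad)
```

On this function each iteration multiplies the distance to the minimizer by $|1 - 2\eta|$, which explains the three behaviours: fast contraction, slow contraction, and divergence when this factor exceeds 1.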

### Exercise 3

#### Question 1

Implement a gradient descent method to solve exercise 2 with an approximated solution.
Use the analytic formulation of $\nabla_{\theta}E(\theta)$ that has been computed in exercise 1.

You can use a very basic stopping criterion: the number of iterations (e.g. 10000).
You can start with $\eta = 0.001$.

YOUR ANSWER HERE

In [None]:
def gradient_descent(gradient, eta=0.001, max_iteration=10000, initial_theta=None):

    if initial_theta is None:
        # The initial solution is selected randomly
        # YOUR CODE HERE
        raise NotImplementedError()
    else:
        # YOUR CODE HERE
        raise NotImplementedError()

    grad_list = []      # Keep the gradient of all iterations
    theta_list = []     # Keep the solution of all iterations

    for i in range(max_iteration):
        # Perform the gradient descent here
        # YOUR CODE HERE
        raise NotImplementedError()

    return grad_list, theta_list

# YOUR CODE HERE
raise NotImplementedError()

#### Question 2

Print or plot the value of $\theta$ and $E(\theta)$ obtained at each iteration.
Check that $E(\theta)$ decreases toward its minimum (a small value close to 0 here) and that $\theta$ converges close to the solution obtained in exercise 2.

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

#### Question 3

Print or plot the norm of the gradient. How do you interpret it?

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

#### Question 4

Restart the optimization using a different *learning rate* $\eta$. What do you observe?

YOUR ANSWER HERE

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

YOUR ANSWER HERE

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

YOUR ANSWER HERE

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

## Linear regression with Scikit Learn

Let's experiment with the Scikit Learn implementation of linear regression.
The official documentation is available here: https://scikit-learn.org/stable/modules/linear_model.html

Use the `gen_1d_linear_regression_samples()` function (defined above) to generate a dataset and `plot_1d_regression_samples()` to plot it.

In [None]:
df = gen_1d_linear_regression_samples()

plot_1d_regression_samples(df)

Once the dataset is ready, let's make the regressor and train it with the following code:

In [None]:
model = sklearn.linear_model.LinearRegression()

model.fit(df[['x']], df[['y']])

The following cell plots the learned model (the red dashed line) and the dataset $\mathcal{D}$ (blue points).

In [None]:
plot_1d_regression_samples(df, model=model)

### Exercise 4

#### Question 1

What are the optimal parameters $\theta_0$ (intercept) and $\theta_1$ (slope) obtained?
(use the `model.intercept_` and `model.coef_` attributes)

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

#### Question 2

Write the mathematical definition of your model.

YOUR ANSWER HERE

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

#### Question 3

Use the `model.predict()` function to predict the value of $y$ for the following points:

$$x_{p1} = -2, \quad x_{p2} = 2, \quad x_{p3} = 6$$

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

## Polynomial regression

It is a common practice to use linear models trained on nonlinear functions of the data in machine learning. This approach maintains the generally fast performance of linear methods, while allowing them to fit a much wider range of data.

For instance, a linear model can be extended by building polynomial features from the input variables. The linear model of exercises 1 and 2 looks like this (one-dimensional data):

$$f_{\theta}(x) = \theta_0 + \theta_1 x$$

If we want to fit a quadratic curve to the data instead of a line, we can combine the features in second-order polynomials, so that the model looks like this:

$$f_{\theta}(x) = \theta_0 + \theta_1 x + \theta_2 x^2$$

This is still a linear model: to illustrate this, imagine creating a new variable

$$z = [x, x^2]$$

With this re-labeling of the data, our problem can be written

$$f_{\theta}(x) = \theta_0 + \theta_1 z_1 + \theta_2 z_2$$

The resulting polynomial regression is in the same class of linear models we'd considered above (i.e. the model is linear in $\theta$) and can be solved by the same techniques. Thus the linear model has the flexibility to fit a much broader range of data.
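
The re-labeling described above can be sketched with plain NumPy. The data below is a noiseless quadratic chosen for illustration (an assumption, not the dataset of the exercises), and the fit uses `np.linalg.lstsq`, an ordinary least-squares solver, on the transformed features.

```python
import numpy as np

x = np.array([-2., -1., 0., 1., 2.])
y = 1. + 2. * x + 3. * x ** 2             # noiseless quadratic (illustrative)

# z = [x, x^2], with a leading column of ones for theta_0
Z = np.column_stack([np.ones_like(x), x, x ** 2])

# Ordinary least squares on the transformed features:
# the problem is linear in theta even though it is quadratic in x
theta, *_ = np.linalg.lstsq(Z, y, rcond=None)
print(theta)   # close to [1., 2., 3.]
```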

### Exercise 5

#### Question 1

Use the previous equations to compute **by hand** (i.e. on a sheet of paper) the optimal parameters $\theta_1$ and $\theta_2$ of the model $y = \theta_1 x + \theta_2 x^2$ to best fit the following dataset (of five examples):

$$\mathcal{D} = \left\{
\begin{pmatrix} 1 \\ 1.8 \end{pmatrix},
\begin{pmatrix} 2 \\ 2.7 \end{pmatrix},
\begin{pmatrix} 3 \\ 3.4 \end{pmatrix},
\begin{pmatrix} 4 \\ 3.8 \end{pmatrix},
\begin{pmatrix} 5 \\ 3.9 \end{pmatrix}
\right\}$$

YOUR ANSWER HERE

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
X = [1, 2, 3, 4, 5]
y = [1.8, 2.7, 3.4, 3.8, 3.9]

plot_ex4(X, y)

#### Question 2

Check graphically that the model you obtained fits well with the data using the following cell (complete the first two lines).

In [None]:
X = [1, 2, 3, 4, 5]
y = [1.8, 2.7, 3.4, 3.8, 3.9]

#theta_1 =                          # <- TO UNCOMMENT AND TO COMPLETE
#theta_2 =                          # <- TO UNCOMMENT AND TO COMPLETE
# YOUR CODE HERE
raise NotImplementedError()

plot_ex4(X, y, theta_1, theta_2)

## Polynomial regression with Scikit Learn

Let's experiment with the Scikit Learn implementation of polynomial regression.
The official documentation is available here: https://scikit-learn.org/stable/modules/linear_model.html#polynomial-regression-extending-linear-models-with-basis-functions

First we make the dataset, plot it, make the regressor and train it with the following code:

In [None]:
df = gen_1d_polynomial_regression_samples(n_samples=20)

plot_1d_regression_samples(df)

polynomial_features = sklearn.preprocessing.PolynomialFeatures(degree=3)  # In Q. 4, try with degree = 1, 4 and 15
linear_regression = sklearn.linear_model.LinearRegression(fit_intercept=False)

model = sklearn.pipeline.Pipeline([("polynomial_features", polynomial_features),
                                   ("linear_regression", linear_regression)])

model.fit(df[['x']], df[['y']])

In `sklearn.preprocessing.PolynomialFeatures()`, `degree` is the degree of the polynomial function.
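
To see what the first step of the pipeline produces, the short sketch below applies `PolynomialFeatures` to two illustrative input values (an assumption, not taken from the dataset). With the default settings, each row contains $1, x, x^2, x^3$.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

x = np.array([[2.], [3.]])                # two samples, one feature
expanded = PolynomialFeatures(degree=3).fit_transform(x)
print(expanded)
# Each row is [1, x, x^2, x^3]
```

The leading column of ones already plays the role of the intercept, which is presumably why the pipeline above uses `LinearRegression(fit_intercept=False)`.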

The following cell plots the learned model (the red dashed line) and the dataset $\mathcal{D}$ (blue points).

In [None]:
plot_1d_regression_samples(df, model=model)

### Exercise 6

#### Question 1

What are the optimal parameters $\theta_0, \theta_1, \theta_2, \theta_3$ obtained?
(use `linear_regression.coef_[0][0]` for the intercept and `linear_regression.coef_[0][1:]` for the other coefficients)

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

#### Question 2

Write the mathematical definition of your model.

YOUR ANSWER HERE

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

#### Question 3

Use the `model.predict()` function to predict the value of $y$ for the following points:

$$x_{p1} = 1, \quad x_{p2} = 2, \quad x_{p3} = 6$$

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

#### Question 4

In `sklearn.preprocessing.PolynomialFeatures()`, change the value of `degree` and describe what happens on the plot (use e.g. 1 and 15).
What are the names of the observed phenomena?

YOUR ANSWER HERE

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

YOUR ANSWER HERE

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

## CO2 Emission Forecast (bonus)

In this exercise, you will forecast 5 years of future CO2 emissions from power generation using natural gas.

This exercise uses a dataset taken from https://www.kaggle.com/berhag/co2-emission-forecast-with-python-seasonal-arima.

This public dataset contains monthly carbon dioxide emissions from electricity generation, from January 1973 to July 2016.

In [None]:
URL = "https://raw.githubusercontent.com/adimajo/polytechnique-cse204-2019-releases/master/data/natural_gas_co2_emissions_for_electric_power_sector.csv"

df = pd.read_csv(URL, parse_dates=[0])
df.head()

In [None]:
df.plot(x='date', y='co2_emissions', figsize=(15,10), title='Natural Gas Electric Power Sector CO2 Emissions');

### Exercise 7 (bonus)

Implement a model to make predictions on this dataset.
Use polynomial basis functions plus two sinusoids to handle the seasonality of this time series: $\sin(\frac{2 \pi}{12} x)$ and $\cos\left(\frac{2 \pi}{12} x \right)$. This signal contains a periodic component of 12 time steps (with one time step equals to one month).

We use both $\sin$ and $\cos$ to avoid unaligned phases with the time series. Eventually we could use only $\sin\left(\frac{2 \pi}{12} (x + \phi)\right)$ or $\cos\left(\frac{2 \pi}{12} (x + \phi)\right)$ as long as $\phi$ is properly set: $\phi = \pi / 2$ in the first case and $\phi = 0$ in the second one.
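
The role of the $\sin$/$\cos$ pair can be checked on a synthetic signal. The amplitude 2.5 and the phase 0.7 below are arbitrary illustrative values (assumptions, unrelated to the CO2 data). A least-squares fit on the two basis functions recovers both, using the identity $A \sin(\omega x + \phi) = A \cos(\phi) \sin(\omega x) + A \sin(\phi) \cos(\omega x)$.

```python
import numpy as np

t = np.arange(48, dtype=float)             # 48 monthly time steps
omega = 2. * np.pi / 12.                   # period of 12 time steps
signal = 2.5 * np.sin(omega * t + 0.7)     # sinusoid with an arbitrary phase

# Linear least squares on the two basis functions sin and cos
B = np.column_stack([np.sin(omega * t), np.cos(omega * t)])
(a, b), *_ = np.linalg.lstsq(B, signal, rcond=None)

amplitude = np.hypot(a, b)                 # A = sqrt(a^2 + b^2)
recovered_phase = np.arctan2(b, a)         # a = A cos(phi), b = A sin(phi)
print(amplitude, recovered_phase)
```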

What are the limitations of this model?

YOUR ANSWER HERE

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

YOUR ANSWER HERE

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

YOUR ANSWER HERE

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

YOUR ANSWER HERE

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

YOUR ANSWER HERE

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

YOUR ANSWER HERE

In [None]:
# YOUR CODE HERE
raise NotImplementedError()