# polynomial regression

the data $X, y$
- $X$, the input, has dimension $n$  
$n$ is the number of features (i.e. the number of columns, not the number of observations)
- $y$, the output, has dimension $1$

instead of a linear regression
- $h^1_\theta(X) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + ... + \theta_n x_n$ 
   
we fit a polynomial regression of degree $p$
- $h_\theta^p(X) = \theta_0 + \sum_{d=1}^p \sum_{i=1}^n\theta_{i,d} x^d_i$ 

which algorithm is used?
- the same linear regression algorithm, but instead of being run on $X$
- it is run on $X$ augmented with the new features (new columns)  
for example $X$ to which we add the columns for every degree $\leq 4$

example
- if $X$ consists of 3 columns $x_1, x_2, x_3$
- for a polynomial regression of degree $3$, we add to $X$ the columns $x^2_1, x^2_2, x^2_3, x^3_1, x^3_2, x^3_3$
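The column augmentation described above can be sketched with NumPy. This is only an illustration: the helper name `add_polynomial_columns` and the small matrix `X_small` are hypothetical and not part of the notebook.

```python
import numpy as np

def add_polynomial_columns(X, p):
    """Return X followed by the element-wise powers 2..p of each column.

    Hypothetical helper, for illustration only.
    """
    X = np.asarray(X, dtype=float)
    # X**1 is X itself, then X**2, ..., X**p are stacked to its right
    return np.hstack([X**d for d in range(1, p + 1)])

# toy data: 2 observations, 3 features x_1, x_2, x_3
X_small = np.array([[1.0, 2.0, 3.0],
                    [2.0, 3.0, 4.0]])

X_aug = add_polynomial_columns(X_small, 3)
print(X_aug.shape)  # (2, 9): 3 original columns + 3 squares + 3 cubes
print(X_aug[0])
```

The columns come out in the order $x_1, x_2, x_3, x^2_1, x^2_2, x^2_3, x^3_1, x^3_2, x^3_3$, matching the example above; no cross terms such as $x_1 x_2$ are created.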
 


# overfitting

   - we **train** a model with polynomial regressions of increasing degree until it **over-fits**
   - we find that the $\theta$ become **very large** compared to the data (the $\theta$ take on a lot of importance)
   - we **use** a `regularization` to reduce the effect of the parameters
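As a rough sketch of the last point, one possible regularization is ridge regression (an L2 penalty on the $\theta$). The notebook does not say which regularization it uses, so `Ridge`, `alpha=1.0`, the degree 15 and the seeded synthetic data below are all assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.preprocessing import StandardScaler

# synthetic noisy sine (assumed data, seeded for reproducibility)
rng = np.random.default_rng(0)
x = np.linspace(0, 2 * np.pi, 30)
y = 6 * np.sin(x) + rng.normal(0, 1, x.size)

# polynomial features x^1 ... x^15, then standardization
degree = 15
X_poly = np.column_stack([x**d for d in range(1, degree + 1)])
X_std = StandardScaler().fit_transform(X_poly)

# same features, without and with an L2 penalty
plain = LinearRegression().fit(X_std, y)
ridge = Ridge(alpha=1.0).fit(X_std, y)

max_plain = np.abs(plain.coef_).max()
max_ridge = np.abs(ridge.coef_).max()
print(f'largest |theta| without regularization: {max_plain:.3g}')
print(f'largest |theta| with ridge (alpha=1.0): {max_ridge:.3g}')
```

The penalized coefficients should come out much smaller than the unpenalized ones; the value of `alpha` controls how strongly they are shrunk and would normally be tuned.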

here is an image to quickly give an idea of *over*, *optimal* and *under* fitting

<img src='over-under-optimal-fitting.png'>

In [None]:
import numpy as np
import pandas as pd
# %matplotlib inline
import matplotlib.pyplot as plt

In [None]:
m = 30
X = np.linspace(0, np.pi*2, m)

radius = 6
y = np.sin(X)*radius

noise = 0.5 - np.random.normal(0, 1, m)
y_noise = y + noise

plt.plot(X, y, 'r-')
plt.plot(X, y_noise, 'b.')

we learn the sine function from $(X, y_{noise})$

In [None]:
data = pd.DataFrame(columns=['X^1','y'])
# we build a pandas.DataFrame with two columns

data['X^1'] = X
data['y'] = y_noise

data.head()

We add columns to obtain a polynomial of degree `p`
   - we will train a polynomial regression

In [None]:
d = 2   # our degree
p = 150 # the greatest degree

we generate the columns and concatenate the columns in a data frame

In [None]:
l = []
for i in range(2, p+1):
    col = f'X^{i}'
    l.append(pd.DataFrame(np.power(data['X^1'].to_numpy(), i), columns=[col]))
    
data = pd.concat([data]+l, axis=1)

data.head(3)

we apply (*polynomial*) regression 

In [None]:
from sklearn.linear_model import LinearRegression

we will compute the polynomial regression for degrees ranging from $d=2$ to $p$
   1. [$X$, $X^2$]
   1. [$X$, $X^2$, $X^3$]   
   1. [$X$, $X^2$, $X^3$, $X^4$]
   1. ...
   
until the model over-fits

we compute the names of the columns $X^1$ to $X^d$

In [None]:
def column_names(d):
    return [f'X^{i}' for i in range(1, d+1)]

In [None]:
print(column_names(4))

we now have to select this list of columns in the dataframe

In [None]:
data[column_names(4)].head()

the algorithm

   1. we **compute the regressions** for increasing values of the degree of the polynomial
   1. we plot the **measured** $y$ in green and the **predicted** $y$ in blue
   1. we compute the **quadratic error**
   1. we look at the **intercept**, the **minimum** and the **maximum** coefficients

In [None]:
from sklearn.preprocessing import StandardScaler

In [None]:
linreg = LinearRegression(fit_intercept=True)

In [None]:
degrees = [1, 2, 3, 4, 8, 10, 15, 30, 36, 38, 40, 45, 50, 51, 52, 53, 54, 55, 60, 65, 70, 100, 120, 150]

In [None]:
for d in degrees:    # the successive degrees
    # the features to train the model
    features = column_names(d)
    # we normalize
    std = StandardScaler()
    X_std = std.fit_transform(data[features])
    # we train the model
    linreg.fit(X_std, data['y'])
    # we predict
    y_pred = linreg.predict(X_std)
    # we compute the error: root of the summed squared residuals, divided by m
    e = np.sqrt(np.sum((y_pred - data['y'])**2))/m

    # we plot the fitted curve against the noisy sine
    plt.plot(data['X^1'], y_pred, 'b-')   # predicted in blue
    plt.plot(data['X^1'], data['y'],'g.') # measured in green
    plt.title(f'degree {d}, error {e:.2f}')
    # we print the parameters
    print(f'     intercept {linreg.intercept_}')
    print(f'     coefficients min {min(linreg.coef_)} and max {max(linreg.coef_)}')
    plt.show();

   - when the degree of the polynomial increases, the model starts to overfit
   - the coefficients become very large compared to the data
   - the gap between the min and the max coefficients widens
   - a large coefficient gives a lot of importance to the feature it corresponds to

# PolynomialFeatures

In [None]:
from sklearn.preprocessing import PolynomialFeatures

use `PolynomialFeatures` to do the same thing with `sklearn`
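A minimal sketch of what this could look like, assuming a single input column and a seeded synthetic sine similar to the one used earlier. The degree 4 and the use of `make_pipeline` are illustrative choices, not the notebook's own solution.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# synthetic noisy sine (assumed data); sklearn expects a 2D input, hence the reshape
rng = np.random.default_rng(0)
x = np.linspace(0, 2 * np.pi, 30).reshape(-1, 1)
y = 6 * np.sin(x).ravel() + rng.normal(0, 1, x.shape[0])

degree = 4

# include_bias=False: no constant column, the intercept is left to LinearRegression
poly = PolynomialFeatures(degree=degree, include_bias=False)
X_poly = poly.fit_transform(x)
print(X_poly.shape)  # (30, 4): the columns x, x^2, x^3, x^4

# the same steps as the manual loop (powers, scaling, regression) chained in one estimator
model = make_pipeline(
    PolynomialFeatures(degree=degree, include_bias=False),
    StandardScaler(),
    LinearRegression(),
)
model.fit(x, y)
y_pred = model.predict(x)
```

With more than one input column, `PolynomialFeatures` also generates interaction terms such as $x_1 x_2$, so its output is then richer than the pure powers built by hand above.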

END