## How to approximate a function with polynomials

* `PolynomialFeatures` generates all monomials up to `degree`.
    * For a single input feature, this gives the so-called Vandermonde matrix with `n_samples` rows and `degree + 1` columns.
* `SplineTransformer` generates B-spline basis functions.
    * A B-spline basis function is a piece-wise polynomial function of degree `degree` that is non-zero only between `degree + 1` consecutive knots.
    * Given `n_knots` knots, this results in a matrix with `n_samples` rows and `n_knots + degree - 1` columns.

These two transformers are well suited to model non-linear effects with a linear model, using a pipeline to add non-linear features. 

Kernel methods extend this idea and can induce very high (even infinite) dimensional feature spaces.

In [1]:
import numpy as np
import matplotlib.pyplot as plt

from sklearn.linear_model import Ridge
from sklearn.preprocessing import PolynomialFeatures, SplineTransformer
from sklearn.pipeline import make_pipeline

### Defining a function that we intend to approximate.

In [2]:
def f(x):
    return x * np.sin(x)

#### Data

In [7]:
# evaluation grid: slightly wider than the training range, so that we can see extrapolation
data = np.linspace(start=-1, stop=11, num=100)
# we train only on the narrower range [0, 10]
train_data = np.linspace(start=0, stop=10, num=100)


#### Create 2D-array versions of these arrays to feed to transformers, and compute target values

In [8]:
x_data = data[:, np.newaxis]
x_train_data = train_data[:, np.newaxis]


y_data = f(data)
y_train = f(train_data)

## Approximate using Polynomial Features


* Higher degree polynomials can fit the data better.

* But a degree that is too high can cause unwanted oscillations, and is particularly dangerous when extrapolating beyond the range of the fitted data.
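
One way to look at this trade-off is to compare the largest absolute error inside the training range with the largest error outside it. The sketch below assumes the same target `x * sin(x)`, the same training range `[0, 10]` and the same evaluation range `[-1, 11]` used in this notebook; the `errors` dictionary and the choice of degrees are illustrative only.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures


def f(x):
    return x * np.sin(x)


x_train = np.linspace(0, 10, 100)
x_all = np.linspace(-1, 11, 100)
# points of the evaluation grid that lie outside the training range
outside = (x_all < 0) | (x_all > 10)

errors = {}
for degree in [3, 5, 7]:
    model = make_pipeline(PolynomialFeatures(degree=degree), Ridge())
    model.fit(x_train[:, np.newaxis], f(x_train))
    abs_err = np.abs(model.predict(x_all[:, np.newaxis]) - f(x_all))
    # (max error inside the training range, max error outside it)
    errors[degree] = (abs_err[~outside].max(), abs_err[outside].max())
    print(f"degree={degree}: inside={errors[degree][0]:.3f}, outside={errors[degree][1]:.3f}")
```

Higher degrees build features with very large values (such as `x**7`), so `Ridge` may warn about an ill-conditioned problem.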




In [60]:
poly_feat = PolynomialFeatures(degree=3)
x_transformed = poly_feat.fit_transform(x_train_data)
print('Shape of transformed data : ',x_transformed.shape)


for i, x in enumerate(x_transformed.T):
    plt.plot(x, label=f'x^{i}')
plt.legend();

Shape of transformed data :  (100, 4)


<img src='./plots/polynomial-features.png'>

In [52]:
def polynomial_approx(degree):
    poly_feat = PolynomialFeatures(degree=degree)
    ridge = Ridge()

    model = make_pipeline(poly_feat, ridge)

    # train on the narrower training range
    model.fit(x_train_data, y_train)

    # predict for entire data
    return model.predict(x_data)

In [40]:
for deg in [3,4,5]:
    plt.plot(polynomial_approx(deg), label=f'polynomial-degree-{deg}')


plt.plot(y_data, label='y-true')
plt.legend()
plt.title('PolynomialFeatures is used to approximate the function')

<img src='./plots/polynomial-features-curve-fitting.png'>

In [41]:
fig, ax = plt.subplots(nrows=2, ncols=3)
ax = ax.ravel()

ax[0].plot(y_data, c='seagreen')
ax[0].set(title="True function")
for deg, frame in zip([3,4,5,6,7], ax[1:]):
    frame.plot(polynomial_approx(deg), c='salmon')
    frame.set(title=f'polynomial-degree-{deg}')
    frame.plot(y_data, label='y-true', c='seagreen')




fig.suptitle('PolynomialFeatures is used to approximate the function')
plt.tight_layout()

<img src='./plots/polynomial-features-curve-fitting-subplots.png'>

## B-splines: SplineTransformer

* B-splines usually fit the data as well as polynomials do.
* Unlike high-degree polynomials, they stay smooth and do not show unwanted oscillations.
* They also offer good options to control extrapolation through the `extrapolation` parameter, which defaults to `'constant'` (continue with the value at the boundary).
* Note that most often, you would rather increase `n_knots` but keep `degree=3`.

In [67]:
spline = SplineTransformer(degree=3, n_knots=4)
x_transformed = spline.fit_transform(x_train_data)
print('Shape of transformed data : ',x_transformed.shape)

for i, x in enumerate(x_transformed.T):
    plt.plot(x, label=f'spline-{i}')


plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left', borderaxespad=0);

Shape of transformed data :  (100, 6)


<img src='./plots/bsplines.png'>

In [45]:
spline = SplineTransformer(degree=3, n_knots=4)
model = make_pipeline(spline, Ridge(alpha=1e-4))

# train only on the narrower training range
model.fit(x_train_data, y_train)

# predict for entire data
y_pred_spline = model.predict(x_data)


In [51]:
plt.plot(y_data, c='seagreen', label='True function')
plt.plot(y_pred_spline, color='salmon', label='Spline approx')
plt.legend();

<img src='./plots/bspline-features-curve-fitting.png'>

## Periodic Splines

* Seasonal effects can be modelled using periodic splines, which have equal function value and equal derivatives at the first and last knot.

* The spline's period is the distance between the first and last knot (which we specify manually, if known).

* Periodic splines provide a better fit both within and outside of the range of training data given the additional information of periodicity. 

* Periodic splines can also be useful for naturally periodic features (such as day of the year), as the smoothness at the boundary knots prevents a jump in the transformed values (e.g. from Dec 31st to Jan 1st). 

* For naturally periodic features, or more generally features where the period is known, it is advised to pass this information explicitly to the `SplineTransformer` by setting the knots manually.


#### Function to be approximated by periodic spline interpolation.

In [69]:
def g(x):
    return np.sin(x) - 0.7 * np.cos(3*x)

#### Data, Features and target

In [71]:
# DATA
data = np.linspace(-1, 21, 200)
y_data = g(data)
# train data
data_train = np.linspace(0,10,100)
y_train = g(data_train)

# features for training
x_data = data[:, np.newaxis]
x_train_data = data_train[:, np.newaxis]

#### Periodic spline

In [116]:
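n_knots = 10
# g has period 2*pi, so the first and last knot are placed one period apart
knots = np.linspace(0, 2 * np.pi, n_knots)[:, np.newaxis]

# `n_knots` is ignored when `knots` is passed explicitly, so it is not set here
spline = SplineTransformer(knots=knots, extrapolation='periodic')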

model = make_pipeline(spline, Ridge(alpha=1e-3))
# train on train-data
model.fit(x_train_data, y_train)
# predict on all data
y_preds = model.predict(x_data)

In [110]:
plt.plot(x_data, y_data, c='r', label='Data')
plt.scatter(x_train_data, y_train, c='k', s=6, label='Train-Data')
plt.plot(x_data, y_preds, c='b', linewidth=5, alpha=0.4, label='prediction')

plt.legend();

<img src='./plots/periodic-splines-interpolation-and-extrapolation.png'>