## Features
Sometimes the best attributes to input into a machine learning algorithm are not the variables measured in the data, but new variables derived from the variables measured in the data. Such derived variables are called *features*, and in this notebook we will introduce different types of features and explore the art and science of good *feature engineering*.
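As a minimal illustration, here is a hypothetical table with two measured columns, `length` and `width`. The column names and values are invented for this sketch. A derived feature such as `area` is added to it as a new column:

```python
import pandas as pd

# Hypothetical measured variables (illustrative values only)
df = pd.DataFrame({'length': [2.0, 3.0, 4.0], 'width': [1.0, 2.0, 0.5]})

# Derive a new feature from the measured variables
df['area'] = df['length'] * df['width']
print(df)
```

If the target depended on area, a linear model given `area` directly would have an easier job than one given only `length` and `width`.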

Let's begin by importing the required packages.

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import PolynomialFeatures
%matplotlib inline

### Fit a *linear* model to *non-linear* data
Let's begin with some two-dimensional data, where we suspect there might be a non-linear relationship between attribute $X$ and target $y$.

Run the following code.

In [None]:
df_nonlinear = pd.read_csv('Data/nonlinear.csv', sep=',', header=0)
df_nonlinear.plot.scatter(x='X', y='y');

Let's now try to fit a linear regression model. Using tools from practical 2, let's execute the following steps. 

In [None]:
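# 1. prepare the data
y = df_nonlinear['y']
# scikit-learn expects a 2D feature array of shape (n_samples, n_features)
X = df_nonlinear['X'].to_numpy()[:, np.newaxis]
# standardise X to zero mean and unit standard deviation
X = (X - X.mean()) / X.std()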

# 2. fit a linear model to the data
lmod = LinearRegression()
lmod.fit(X, y);

# 3. make predictions on the data
y_pred = lmod.predict(X)

# 4. visualise the results.
results = pd.DataFrame({'X':X[:,0],'y':y,'y_pred':y_pred})
ax1 = results.plot.scatter(x='X', y='y');
ax2 = results.plot.scatter(x='X', y='y_pred', ax=ax1, c='k');

Let's also compute the Root Mean Squared Error (RMSE) on the data.

In [None]:
rmse = np.sqrt(mean_squared_error(y, y_pred))
print(f"Linear model: root mean squared error = {rmse}")

Do you think the linear model is a good fit?

One way to answer this question is to fit *other* types of models and check whether they achieve a lower or higher RMSE than the linear model.
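For reference, the RMSE of predictions $\hat{y}_i$ against targets $y_i$ over $n$ points is $\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}$. The small sketch below uses made-up numbers. It checks that computing this by hand with NumPy agrees with `np.sqrt(mean_squared_error(...))` as used above:

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Made-up targets and predictions, for illustration only
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_hat = np.array([1.5, 2.0, 2.0, 4.0])

# RMSE by hand: square the residuals, average them, take the square root
rmse_manual = np.sqrt(np.mean((y_true - y_hat) ** 2))

# The same quantity via scikit-learn
rmse_sklearn = np.sqrt(mean_squared_error(y_true, y_hat))

print(rmse_manual, rmse_sklearn)
```

A lower RMSE means the predictions are closer to the targets on average. The RMSE is measured in the same units as $y$.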

Before we do this, let's review some math.

### Review of linear and nonlinear models 
Given one-dimensional variables $x$ and $y$, the relationship between them is *linear* if we can write it as the equation:

$$ y = a x + b $$

where $a$ and $b$ represent the *gradient* and the *intercept* of the linear model.
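The following sketch makes the roles of $a$ and $b$ concrete. It uses synthetic data generated exactly from $y = 2x + 1$, with the values $a = 2$ and $b = 1$ chosen for illustration. `LinearRegression` stores the gradient in `coef_` and the intercept in `intercept_`, and its predictions are just $ax + b$:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic, noise-free data from y = 2x + 1 (illustrative choice of a=2, b=1)
x = np.linspace(-3, 3, 20)[:, np.newaxis]   # shape (20, 1): one feature column
y_line = 2 * x[:, 0] + 1

model = LinearRegression().fit(x, y_line)
a, b = model.coef_[0], model.intercept_
print(f"gradient a = {a:.3f}, intercept b = {b:.3f}")

# predict() computes a * x + b for each row
manual = a * x[:, 0] + b
print(np.allclose(model.predict(x), manual))
```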

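We can print out the gradient and intercept of our fitted linear model, `lmod`, as follows. Recall that $X$ was standardised, so the gradient is the change in $y$ per standard deviation of $X$.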

In [None]:
print(f"Linear model: gradient = {lmod.coef_[0]}")
print(f"Linear model: intercept = {lmod.intercept_}")

The relationship between $x$ and $y$ is *non-linear* simply if the relationship is not of the form $y = ax + b$. Some examples:
1. $y = ae^{bx}$
2. $y = a_{1} + a_{2}x + a_{3}x^{2} + a_{4}x^{3} + \dots$

The relationship between $x$ and $y$ is *exponential* if we can write the relationship as equation (1), and the relationship between $x$ and $y$ is *polynomial* if we can write the relationship as equation (2). Both are examples of one-dimensional *non-linear* models.
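Equation (1) is also a small feature-engineering example in its own right. Assuming $a > 0$, so that $y > 0$, taking logarithms gives $\ln y = \ln a + bx$, which is linear in $x$. A linear regression of $\ln y$ on $x$ can therefore estimate $b$ as the gradient and $\ln a$ as the intercept. The sketch below applies this to synthetic data with $a = 3$ and $b = 0.5$, both chosen arbitrarily:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic, noise-free data from y = 3 * exp(0.5 x) (illustrative a=3, b=0.5)
x = np.linspace(0, 4, 25)[:, np.newaxis]
y_exp = 3 * np.exp(0.5 * x[:, 0])

# Transform the target: ln y = ln a + b x is linear in x
model = LinearRegression().fit(x, np.log(y_exp))

b_hat = model.coef_[0]            # the gradient estimates b
a_hat = np.exp(model.intercept_)  # the intercept estimates ln a, so exponentiate
print(f"a ≈ {a_hat:.3f}, b ≈ {b_hat:.3f}")
```

With noisy data, this approach minimises squared error in $\ln y$ rather than in $y$. Its estimates can therefore differ from those of a direct non-linear fit of (1).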

Let's try fitting a polynomial model to data.

### Fit a *non-linear* model to *non-linear* data
The first non-linear model we will explore is the polynomial model (2). Looking at equation (2), we see that it is in fact a *linear model with polynomial features*. It is linear in the coefficients $a_1, a_2, a_3, a_4$, with the features $1, x, x^2, x^3$ playing the role of inputs. This means we can use the same linear regression algorithm as before, applied to a non-linear mapping of the original data. This approach is common throughout machine learning.

Let's first compute and visualise some polynomial features up to degree $3$.

In [None]:
# map each standardised x to the features [1, x, x^2, x^3]
X_cubic = PolynomialFeatures(3).fit_transform(X)

feature_names = ['1', 'x', 'x^2', 'x^3']
df_X_cubic = pd.DataFrame(X_cubic, columns=feature_names)

# plot each feature against x
_, axes = plt.subplots(1, 4, figsize=(10, 2))
for name, ax in zip(feature_names, axes):
    df_X_cubic.plot.scatter(x='x', y=name, ax=ax, xticks=[], yticks=[])


Now let's fit the linear model from before, but to the above polynomial (cubic) features rather than the raw values of $X$.

In [None]:
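# fit the same linear regression, now on the cubic features
# (the constant '1' column duplicates the fitted intercept, so its coefficient is effectively zero)
lmod2 = LinearRegression()
lmod2.fit(X_cubic, y);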

Now let's use the regression model to make predictions on the data and plot the results.

In [None]:
# make predictions on the data
y_pred2 = lmod2.predict(X_cubic)

# visualise the results
results = pd.DataFrame({'X':X[:,0],'y':y,'y_pred':y_pred2})
ax1 = results.plot.scatter(x='X', y='y');
ax2 = results.plot.scatter(x='X', y='y_pred', ax=ax1, c='k');

Does the cubic model look like a better fit than the linear model?

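To assess whether the cubic model is a *better fit* to the data than the linear model, we can compare the RMSE of each model on the data. Keep in mind that the cubic model contains the linear model as a special case (set $a_3 = a_4 = 0$ in equation (2)). Its RMSE on the data it was fitted to can therefore be no higher than the linear model's; the question is how much lower it is. Let's compute the RMSE of the cubic model.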

In [None]:
rmse = np.sqrt(mean_squared_error(y, y_pred2))
print(f"Cubic model: root mean squared error = {rmse}")

How does it compare to the RMSE of the linear model?
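To compare several polynomial degrees at once, the feature mapping and the regression can be chained with scikit-learn's `make_pipeline`. The sketch below does not use `Data/nonlinear.csv`. Instead it uses synthetic data, a noisy cubic invented for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic stand-in data: a cubic trend plus Gaussian noise (illustrative only)
rng = np.random.default_rng(0)
x = np.linspace(-2, 2, 50)[:, np.newaxis]
y_syn = 0.5 * x[:, 0] ** 3 - x[:, 0] + rng.normal(scale=0.3, size=50)

rmse_by_degree = {}
for degree in (1, 3):
    # map x to [1, x, ..., x^degree], then fit ordinary least squares
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(x, y_syn)
    y_fit = model.predict(x)
    rmse_by_degree[degree] = np.sqrt(mean_squared_error(y_syn, y_fit))
    print(f"degree {degree}: RMSE = {rmse_by_degree[degree]:.3f}")
```

Both RMSEs here are computed on the same data used for fitting. Judging whether the extra flexibility of the cubic model carries over to new data would require evaluating on held-out data.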