# Polynomial Regression

Regression is the method of fitting a function to a collection of observations in order to identify patterns in them and thereby make predictions. The accuracy of the predictions varies with how well the function fits the observations.

In the previous section we studied linear regression, where we fit a linear function to data. There we assumed that the relationship between the predictors (Xs) and the predicted variable(s) (Ys) can be represented by a simple 1st degree expression. In this lesson, we will learn about:

* Polynomial regression
* The theoretical significance of degree of a polynomial function
* Implementation of polynomial regression model

## What is a polynomial regression model?

While learning linear regression, we tried to fit a straight line to the observations. One of the biggest assumptions we make in linear regression is that the relationship between the predicted and predictor variables is linear in nature.

Let's take a scenario where we are, say, conducting an experiment with highly sophisticated gastric and neural sensors fitted to human subjects. The aim is to see how hungry a person feels and how much of that hunger is satisfied by each unit of food. Let's say the unit of food considered for this experiment is a slice of pepperoni pizza and there are a hundred voluntary participants. The sensors measure each participant's hunger as a hunger index, and another feature records the number of slices of pizza consumed.

We know that hunger goes down as we eat food. If we try to model the hunger index in terms of number of slices consumed, using a simple linear function, that would mean we are assuming that the more we eat, the less hungry we would feel and that <b>each unit of food consumed makes us equally less hungry</b>. 

In such a case, if
* $H$ is the hunger index
* $F$ is the number of units of food consumed
* $\beta_0$ is the measure of how hungry we feel (on average) when we have not yet consumed any food
* $\beta_1$ is the coefficient determining how much the hunger index goes down (again, on average) with each unit of food consumed
* $\epsilon$ is the error in the observations

then the relationship between hunger index and food consumed can be given by,

$H = \beta_0 + (\beta_1F) + \epsilon$

In general terms, this refers to the simple linear regression expression of,

$y = \beta_0 + (\beta_1x) + \epsilon$

The systematic error that results from assuming a linear relationship when the true one is not linear is called "bias"; we will study it in detail in the forthcoming notebooks.

However, experience and intuition suggest that once we have consumed enough food to fill our stomach, the marginal contribution of each extra slice of pizza to quelling hunger goes down significantly. This is similar to the concept of 'marginal utility' in economics, which says that with increasing units of consumption, the amount of utility derived from each additional unit keeps falling. This would mean that the relationship between hunger index and food consumed may in fact be non-linear in nature. In such a case, the dependency of hunger index $H$ on units of food consumed $F$ could be better modeled using a polynomial function of $F$, which may look like the following:

$H = \beta_0 + (\beta_1F) + (\beta_2F^2) + (\beta_3F^3) + \dots + (\beta_nF^n) + \epsilon$

Here, $H$ is represented by an $n^{th}$ degree function of $F$, where $n$ is said to be the <b>order</b> or <b>degree of the polynomial.</b>
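As a rough illustration of diminishing returns, the sketch below evaluates a 2nd degree hunger polynomial for a few slice counts. The coefficient values are hypothetical, chosen only so that hunger falls by less with every extra slice; they are not estimated from any experiment.

```python
import numpy as np
from numpy.polynomial import polynomial as P

# Hypothetical coefficients (beta_0, beta_1, beta_2), lowest degree first.
# These are illustrative values, not results from the experiment described above.
betas = np.array([10.0, -3.0, 0.25])

slices = np.arange(0, 6)           # F = 0, 1, ..., 5 slices eaten
hunger = P.polyval(slices, betas)  # H = beta_0 + beta_1*F + beta_2*F^2

# Change in hunger index caused by each additional slice
marginal = np.diff(hunger)

print("hunger index:", hunger)     # 10, 7.25, 5, 3.25, 2, 1.25
print("marginal change:", marginal)  # -2.75, -2.25, -1.75, -1.25, -0.75
```

With a purely linear model the marginal change would be the same constant for every slice; here it shrinks toward zero as more slices are eaten.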

A generalized equation of the polynomial would look like the following:

$y = \beta_0 + (\beta_1x) + (\beta_2x^2) + (\beta_3x^3) + \dots + (\beta_nx^n) + \epsilon$

This form represents the prediction of a scalar-valued variable from a scalar-valued predictor. If there are $m$ observations, i.e. pairs $(x_1, y_1), (x_2, y_2), ..., (x_m, y_m)$, the same model can be written for all of them at once in matrix form:

$\begin{bmatrix}y_1\\y_2\\y_3\\\vdots\\y_m\end{bmatrix} = \begin{bmatrix}1 & x_1 & x_1^2 & \cdots & x_1^n\\1 & x_2 & x_2^2 & \cdots & x_2^n\\1 & x_3 & x_3^2 & \cdots & x_3^n\\\vdots & \vdots & \vdots & \ddots & \vdots\\1 & x_m & x_m^2 & \cdots & x_m^n\end{bmatrix}\begin{bmatrix}\beta_0\\\beta_1\\\beta_2\\\vdots\\\beta_n\end{bmatrix} + \begin{bmatrix}\epsilon_1\\\epsilon_2\\\epsilon_3\\\vdots\\\epsilon_m\end{bmatrix}$

Reference: https://en.wikipedia.org/wiki/Polynomial_regression

Note that though $y$ is represented as a polynomial function of $x$ of $n^{th}$ degree, it is linear in the weights $\beta_0,\beta_1,\beta_2...\beta_n$, i.e. each weight appears only to the first power. Hence, as far as statistical estimation is concerned, this is still a linear model.
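One way to see this linearity is to build the matrix of powers of $x$ by hand and solve for the weights with ordinary least squares. The sketch below assumes a noise-free 2nd degree polynomial with hypothetical weights; `np.vander` and `np.linalg.lstsq` are used here only for illustration and are not the approach taken in the rest of this lesson.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical weights beta_0, beta_1, beta_2 of a noise-free quadratic
true_betas = np.array([1.0, -2.0, 0.5])
x_demo = rng.uniform(-3, 3, size=50)

# Matrix with columns 1, x, x^2: one row per observation
X_demo = np.vander(x_demo, N=3, increasing=True)
y_demo = X_demo @ true_betas

# Ordinary least squares works because y is linear in the weights,
# even though it is not linear in x
est_betas, *_ = np.linalg.lstsq(X_demo, y_demo, rcond=None)

print(est_betas)  # approximately [1.0, -2.0, 0.5]
```

Because no noise was added, the estimated weights should match the hypothetical ones up to floating-point error.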

## Fitting a Polynomial in Python

In Python, we implement polynomial regression using the `PolynomialFeatures` transformer from scikit-learn's `preprocessing` module. Let us take a simple example to showcase polynomial regression.

We can start by creating data - a set of x values and a set of y values, where the y values would be derived from a polynomial function of x.

In [1]:
# Importing numpy to generate data
import numpy as np

# Setting seed to recreate same random sampling
np.random.seed(1111)

# Sampling x from a normal distribution and deriving y from a polynomial function of x
x = 0.73 - np.random.normal(0,1,100)
y = 3.4 - 2.6*x + 4.7*(x**2) - 3.9*(x**3) + 6.1*(x**4) + np.random.normal(-8,12,100)

 ### Solution code
 
 ```python
# Just run above code
```

Let us visualize the data in a simple plot

In [2]:
# Importing Bokeh modules
from bokeh.plotting import figure, show
from bokeh.io import output_notebook

# Render Bokeh plots inline in the notebook
output_notebook()

# Simple scatter plot with x and y values
# (Bokeh 3.x names the size arguments width/height; older releases used plot_width/plot_height)
p = figure(width=800, height=400)
p.circle(x, y, size=10, color="navy", alpha=0.5)
show(p)

 ### Solution code
 
 ```python
# Just run above code
```

### Exercise

Perform linear regression on the above data and show the line of best fit in a plot. (You may use any plotting library; Bokeh is not mandatory.)

### Solution code

```python
# Importing linear regression module
from sklearn.linear_model import LinearRegression

# Converting 1 dimensional arrays, x and y, into 2-D arrays
x_2d = x[:, np.newaxis]
y_2d = y[:, np.newaxis]

# Fitting the model and performing predictions
model = LinearRegression()
model.fit(x_2d, y_2d)
y_pred = model.predict(x_2d)

# Reshaping predicted values array to facilitate easy plotting
y_pred = y_pred.reshape(-1)

# Plotting original data points and line of best fit using y_pred values
p = figure(width=800, height=400)  # Bokeh 3.x uses width/height instead of plot_width/plot_height
p.circle(x,y, size=10, color="navy", alpha=0.5)
p.line(x,y_pred,line_width=2,color="red")
show(p)
```

In order to fit a polynomial function to the same data, we use `PolynomialFeatures` to expand the single feature $x$ into a vector of features ($1, x, x^2, ...$) and then fit the linear model to it. (Remember, as far as statistical estimation is concerned, the function is still linear in its weights, so we can simply use the linear regression model to fit the polynomial features after transformation.)
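To make the transformation concrete, here is a minimal sketch on two hypothetical input values (the names `x_small` and `pf_demo` are ours and are not part of the lesson's data). Each value is expanded into the columns $1, x, x^2, x^3$:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Two hypothetical observations of a single feature, as a 2-D column
x_small = np.array([[2.0], [3.0]])

# degree=3 produces the columns 1, x, x^2, x^3 (the leading 1 is the bias column)
pf_demo = PolynomialFeatures(degree=3)
expanded = pf_demo.fit_transform(x_small)

print(expanded)  # rows: [1, 2, 4, 8] and [1, 3, 9, 27]
```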

In [5]:
# importing polynomial features function
from sklearn.preprocessing import PolynomialFeatures

# Converting 1 dimensional arrays, x and y, into 2-D arrays
x_2d = x[:, np.newaxis]
y_2d = y[:, np.newaxis]

# Expanding x into polynomial features up to degree 3 (columns 1, x, x^2, x^3)
pf = PolynomialFeatures(degree=3)
x_poly = pf.fit_transform(x_2d)

# Fitting linear regression model and predicting values using the polynomial function
model = LinearRegression()
model.fit(x_poly, y_2d)
y_poly_pred = model.predict(x_poly)

# Reshaping predicted values array to facilitate easy plotting
y_poly_pred = y_poly_pred.reshape(-1)

import operator

# Sort the (x, prediction) pairs by x before drawing the line.
# This ensures a clean curve instead of a zig-zag trace through unsorted values
sort_axis = operator.itemgetter(0)
sorted_zip = sorted(zip(x_2d, y_poly_pred), key=sort_axis)
x_sorted, y_poly_sorted = zip(*sorted_zip) # creates 2 tuples

# zip() yields each row of x_2d as a separate 1-element array,
# so flatten them back into a flat sequence of scalars
x_sorted = np.array(x_sorted).reshape(-1)

# Plotting the data and the fitted polynomial
# (Bokeh 3.x uses width/height instead of plot_width/plot_height)
p = figure(width=800, height=400)
p.circle(x, y, size=10, color="navy", alpha=0.5)
p.line(x_sorted, y_poly_sorted, line_width=2, color="red")
show(p)

### Solution code

```python
# Just run the above code
```

The above plot shows the 3rd degree polynomial that we have fit to the given data (see the `degree` parameter of `PolynomialFeatures`).
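The transform-then-fit steps can also be chained into a single estimator with scikit-learn's `make_pipeline`. The sketch below is one possible arrangement, not the approach used in this lesson's cells; it uses a hypothetical noise-free cubic (our own `x_pipe` and `y_pipe`) so that the recovered weights are easy to check.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(42)

# Hypothetical noise-free cubic: y = 1 + 2x - 0.5x^3
x_pipe = rng.normal(0, 1, 200)
y_pipe = 1.0 + 2.0 * x_pipe - 0.5 * x_pipe**3

# The pipeline expands x into polynomial features, then fits the linear model
cubic_model = make_pipeline(PolynomialFeatures(degree=3), LinearRegression())
cubic_model.fit(x_pipe[:, np.newaxis], y_pipe)

# make_pipeline names each step after its lowercased class name
lin = cubic_model.named_steps["linearregression"]
print(lin.intercept_)  # approximately 1.0
print(lin.coef_)       # approximately [0, 2, 0, -0.5]; the first entry belongs to the bias column
```

Calling `cubic_model.predict` on new 2-D input then applies the same polynomial expansion automatically before predicting.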


### Exercise

The code below fits a polynomial of 4th degree to a fresh sample drawn from the same process as the above data. Run it and view the fitted function. Then experiment with the `degree` parameter by changing its value from 4 to higher values, up to 25, and observe how the fitted function changes.

In [11]:
# Drawing a fresh sample from the same process to allow experimentation with varying degree values
# (the seed is not reset here, so the values differ from the first cell)

x = 0.73 - np.random.normal(0,1,100)
y = 3.4 - 2.6*x + 4.7*(x**2) - 3.9*(x**3) + 6.1*(x**4) + np.random.normal(-8,12,100)

# Converting 1 dimensional arrays, x and y, into 2-D arrays
x_2d = x[:, np.newaxis]
y_2d = y[:, np.newaxis]

pf = PolynomialFeatures(degree=4)
x_poly = pf.fit_transform(x_2d)

model = LinearRegression()
model.fit(x_poly, y_2d)
y_poly_pred = model.predict(x_poly)

y_poly_pred = y_poly_pred.reshape(-1)

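# Sort the (x, prediction) pairs by x before drawing the line
# (operator was imported in the earlier cell)
sort_axis = operator.itemgetter(0)
sorted_zip = sorted(zip(x_2d, y_poly_pred), key=sort_axis)
x_sorted, y_poly_sorted = zip(*sorted_zip)

# Flatten the 1-element arrays produced by zip() into a flat sequence of scalars
x_sorted = np.array(x_sorted).reshape(-1)

# Bokeh 3.x uses width/height instead of plot_width/plot_height
p = figure(width=800, height=400)
p.circle(x, y, size=10, color="navy", alpha=0.5)
p.line(x_sorted, y_poly_sorted, line_width=2, color="red")
show(p)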

### Solution code

```python
# Just run above code
```

### Bias vs Variance

Here we can learn an important concept. As we increase the degree of the fitted polynomial, the more complex functions trace the observations much more closely, i.e. the fitted curve tries to pass through every observation. This is what is called <b>"high variance"</b>: the fit depends heavily on the particular sample, noise included. At the lowest degree of the polynomial (i.e., degree=1) there is high bias, because we assume that a linear function fits the data accurately when it does not. As we fit higher degree polynomials, the bias falls and the variance increases. This is the bias vs variance concept that we will study time and again.
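A common way to observe this trade-off numerically is to hold out part of the data and compare the error on the training and held-out portions as the degree grows. The sketch below is an assumption-laden illustration: the train/test split, the chosen degrees (1, 4, 10) and the `_bv` variable names are our additions, and the data is regenerated with a different random generator, so the values will not match the cells above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Same generating process as the lesson's data, drawn with a separate generator
rng = np.random.default_rng(1111)
x_bv = 0.73 - rng.normal(0, 1, 100)
y_bv = (3.4 - 2.6*x_bv + 4.7*x_bv**2 - 3.9*x_bv**3 + 6.1*x_bv**4
        + rng.normal(-8, 12, 100))

# Hold out 30% of the observations to estimate error on unseen data
X_train, X_test, y_train, y_test = train_test_split(
    x_bv[:, np.newaxis], y_bv, test_size=0.3, random_state=0)

train_mse, test_mse = {}, {}
for degree in (1, 4, 10):
    model_bv = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression())
    model_bv.fit(X_train, y_train)
    train_mse[degree] = mean_squared_error(y_train, model_bv.predict(X_train))
    test_mse[degree] = mean_squared_error(y_test, model_bv.predict(X_test))
    print(f"degree={degree:2d}  train MSE={train_mse[degree]:10.2f}  "
          f"test MSE={test_mse[degree]:10.2f}")
```

The training error of the degree 1 fit should be much larger than that of the degree 4 fit (high bias). Whether the held-out error rises again at higher degrees (high variance) depends on the particular sample, so it is worth rerunning with different seeds.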
