# 06 Polynomial Regression

What if your data doesn't look linear at all? Let's look at some more realistic-looking page speed / purchase data:

In [None]:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt

# Simulate 1000 page load times (s) and purchase amounts that shrink as pages get slower
np.random.seed(2)
pageSpeeds = np.random.normal(3.0, 1.0, 1000)
purchaseAmount = np.random.normal(50.0, 20.0, 1000) / pageSpeeds

plt.figure(figsize=(14, 10))
plt.xlabel('Page speed (s)')
plt.ylabel('Amount purchased (US$)')
plt.scatter(pageSpeeds, purchaseAmount, color='r', alpha=0.5)

NumPy has a handy polyfit function that fits an nth-degree polynomial model to our data by minimizing squared error. Let's try it with polynomials of degree 1, 3, 4, and 5:

In [None]:
x = np.array(pageSpeeds)
y = np.array(purchaseAmount)

p1 = np.poly1d(np.polyfit(x, y, 1))

p3 = np.poly1d(np.polyfit(x, y, 3))

p4 = np.poly1d(np.polyfit(x, y, 4))

p5 = np.poly1d(np.polyfit(x, y, 5))
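
To make "minimizes squared error" concrete, here is a minimal sketch that reproduces a polyfit result by solving the least-squares problem directly. It regenerates the same synthetic data under the assumed names x_demo and y_demo (not part of the original notebook) and uses degree 3 purely as an illustration; it is meant to show the idea, not how polyfit is implemented internally.

```python
import numpy as np

# Regenerate the same synthetic data as above (x_demo / y_demo are illustrative names).
np.random.seed(2)
x_demo = np.random.normal(3.0, 1.0, 1000)
y_demo = np.random.normal(50.0, 20.0, 1000) / x_demo

degree = 3  # arbitrary choice for the illustration

# Vandermonde matrix: columns are x**3, x**2, x**1, x**0,
# which matches the highest-power-first coefficient order of polyfit.
A = np.vander(x_demo, degree + 1)

# Find the coefficients c that minimize ||A @ c - y||^2.
coeffs_lstsq, *_ = np.linalg.lstsq(A, y_demo, rcond=None)

# The same fit through polyfit.
coeffs_polyfit = np.polyfit(x_demo, y_demo, degree)

print(coeffs_lstsq)
print(coeffs_polyfit)
```

The two coefficient arrays should agree up to floating-point noise, since both solve the same least-squares problem.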

We'll visualize our original scatter plot, together with a plot of the values predicted by the 5th-degree polynomial for page speeds ranging from 0 to 10 seconds:

In [None]:
import matplotlib.pyplot as plt

plt.figure(figsize =(12, 10))
xp = np.linspace(0, 10, 400)
plt.scatter(x, y, c='r',alpha=0.5)
plt.plot(xp, p5(xp), c='b')
plt.show()

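Looks pretty good! Let's measure the r-squared score of the 5th-degree fit (r-squared is a goodness-of-fit score rather than an error, so higher is better and 1.0 is a perfect fit):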

In [None]:
from sklearn.metrics import r2_score

r2 = r2_score(y, p5(x))

print('r2 = {:.3f}'.format(r2))
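
Since polynomials of degree 1, 3, 4, and 5 were all fitted earlier, it may help to compare their scores side by side. The sketch below computes r-squared by hand as 1 - SS_res / SS_tot, which for a single target like this one is the quantity r2_score reports. The helper r_squared and the names x_demo and y_demo are assumptions introduced for this example, not part of the original notebook.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (hypothetical helper)."""
    ss_res = np.sum((y_true - y_pred) ** 2)           # residual sum of squares
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)  # total sum of squares
    return 1.0 - ss_res / ss_tot

# Regenerate the same synthetic data as above (illustrative names).
np.random.seed(2)
x_demo = np.random.normal(3.0, 1.0, 1000)
y_demo = np.random.normal(50.0, 20.0, 1000) / x_demo

# Fit each degree and score it on the data it was fitted to.
scores = {}
for degree in (1, 3, 4, 5):
    model = np.poly1d(np.polyfit(x_demo, y_demo, degree))
    scores[degree] = r_squared(y_demo, model(x_demo))
    print('degree {}: r2 = {:.3f}'.format(degree, scores[degree]))
```

Because each higher-degree polynomial can reproduce any lower-degree one, the score measured on the fitted data can only stay the same or rise as the degree grows. A higher score here therefore does not by itself show that the higher-degree model will predict new data better.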
