Discussion
So far we have only discussed modeling linear relationships. An example of a
linear relationship would be the number of stories a building has and the
building’s height. In linear regression, we assume the effect of the number of
stories on building height is approximately constant, meaning a 20-story building
will be roughly twice as high as a 10-story building, which will be roughly twice
as high as a 5-story building. Many relationships of interest, however, are not
strictly linear.
Often we want to model a nonlinear relationship, for example the relationship
between the number of hours a student studies and the score she gets on the test.
Intuitively, we can imagine there is a big difference in test scores between
students who studied for one hour and students who did not study at all.
However, there is a much smaller difference in test scores between a student who
studied for 99 hours and a student who studied for 100 hours. The effect one
hour of studying has on a student’s test score decreases as the number of hours
increases.
Polynomial regression is an extension of linear regression that allows us to model
nonlinear relationships.

![](./pics/polynomialFunction.jpg)

where d is the degree of the polynomial. How are we able to use a linear
regression for a nonlinear function? The answer is that we do not change how
the linear regression fits the model, but rather only add polynomial features. That
is, the linear regression does not “know” that x^2 is a quadratic transformation
of x. It just considers it one more variable.
A more practical description might be in order. To model nonlinear relationships,
we can create new features that raise an existing feature, x, up to some power:
x^2, x^3, and so on. The more of these new features we add, the more flexible the
“line” created by our model.
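
To make the idea concrete, here is a minimal sketch that builds the squared feature by hand and passes it to an ordinary linear regression. The data is synthetic and the coefficients (1.0, 2.0, -0.5) are assumptions chosen only for illustration; they do not come from the dataset used in this notebook.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (an assumption for illustration): y depends on x quadratically
rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 50)
y = 1.0 + 2.0 * x - 0.5 * x**2 + rng.normal(scale=0.1, size=x.shape)

# Build the feature matrix by hand: the columns are x and x^2.
# LinearRegression treats them as two unrelated input variables.
X_manual = np.column_stack([x, x**2])

model_manual = LinearRegression().fit(X_manual, y)

# The estimates should land near the values used to generate the data
print(model_manual.intercept_, model_manual.coef_)
```

The estimator is unchanged; only the inputs are different. The fitted curve is nonlinear in x but still linear in the coefficients, which is why ordinary least squares can fit it.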

PolynomialFeatures has two important parameters. First, degree determines
the maximum degree of the polynomial features. For example, degree=3 will
generate x^2 and x^3 in addition to x itself. Second, by default
PolynomialFeatures includes a feature containing only ones (called a bias). We
can remove that by setting include_bias=False.

In [1]:
from sklearn.datasets import load_boston
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

In [4]:
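# Load the data, keeping only the first two features
# (load_boston is deprecated in scikit-learn 1.0 and removed in 1.2)
boston = load_boston()
feature = boston.data[:, 0:2]
target = boston.target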


    The Boston housing prices dataset has an ethical problem. You can refer to
    the documentation of this function for further details.

    The scikit-learn maintainers therefore strongly discourage the use of this
    dataset unless the purpose of the code is to study and educate about
    ethical issues in data science and machine learning.

    In this special case, you can fetch the dataset from the original
    source::

        import pandas as pd
        import numpy as np

        data_url = "http://lib.stat.cmu.edu/datasets/boston"
        raw_df = pd.read_csv(data_url, sep="\s+", skiprows=22, header=None)
        data = np.hstack([raw_df.values[::2, :], raw_df.values[1::2, :2]])
        target = raw_df.values[1::2, 2]

    Alternative datasets include the California housing dataset (i.e.
    :func:`~sklearn.datasets.fetch_california_housing`) and the Ames housing
    dataset. You can load the datasets as follows::

        from sklearn.datasets import fetch_california_ho

In [5]:
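# Create polynomial features up to degree 3 (powers and interaction terms)
polynomial = PolynomialFeatures(degree=3, include_bias=False)
feature_poly = polynomial.fit_transform(feature)
# Fit an ordinary linear regression on the expanded features
regression = LinearRegression()
model = regression.fit(feature_poly, target)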


In [6]:
# Prediction error for the first observation, in dollars (target is in $1000s)
model.predict(feature_poly)[0] * 1000 - target[0] * 1000

243.18225650302338
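
Since load_boston is not available in recent scikit-learn releases, the same workflow can be sketched on synthetic data. Everything about the data below (the generating formula, sample size, and noise level) is an assumption made for illustration, so the numbers it prints are not comparable to the Boston result above. The sketch also chains the two steps with make_pipeline, which is one convenient option rather than the approach used in this notebook.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic stand-in data (hypothetical, not the Boston dataset):
# two features and a target that depends on them nonlinearly
rng = np.random.default_rng(42)
features = rng.uniform(-2, 2, size=(200, 2))
target = (
    3.0
    + features[:, 0] ** 3
    - 2.0 * features[:, 0] * features[:, 1]
    + rng.normal(scale=0.1, size=200)
)

# Plain linear regression on the raw features
linear_model = LinearRegression().fit(features, target)

# Same estimator, fed degree-3 polynomial features through a pipeline
poly_model = make_pipeline(
    PolynomialFeatures(degree=3, include_bias=False),
    LinearRegression(),
).fit(features, target)

# R^2 on the training data for both models
linear_r2 = linear_model.score(features, target)
poly_r2 = poly_model.score(features, target)
print(linear_r2, poly_r2)

# Prediction error for the first observation, mirroring the cell above
print(poly_model.predict(features)[0] - target[0])
```

Both scores are computed on the training data, so they show how well each model fits this sample rather than how well it would generalize. A held-out split would be needed to assess that.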