You have a feature whose effect on the target variable depends on another
feature.

Create an interaction term to capture that dependence using scikit-learn’s
PolynomialFeatures.

Sometimes a feature’s effect on the target variable depends, at least in part,
on another feature. For example, imagine predicting whether a cup of coffee
tastes sweet from two binary features: whether sugar was added (sugar) and
whether the coffee was stirred (stirred).
Just putting sugar in the coffee (sugar=1, stirred=0) won’t make the coffee
taste sweet (all the sugar is at the bottom!), and just stirring the coffee without
adding sugar (sugar=0, stirred=1) won’t make it sweet either. Instead, it is the
combination of putting sugar in the coffee and stirring it (sugar=1,
stirred=1) that makes the coffee taste sweet. The effects of sugar and stirred
on sweetness depend on each other. In this case we say there is an
interaction effect between the features sugar and stirred.
We can account for interaction effects by including a new feature comprising the
product of corresponding values from the interacting features:

![](./pics/PolynomialFeatures.jpg)
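
To make the product feature concrete, here is a minimal sketch of the coffee example. The four rows of `coffee` are hypothetical observations invented for illustration (columns: sugar, stirred); they are not part of any dataset used in this section.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical observations; columns are sugar and stirred
coffee = np.array([[0, 0],
                   [1, 0],
                   [0, 1],
                   [1, 1]])

# interaction_only=True adds products of distinct features (no squares);
# include_bias=False leaves out the constant column of ones
interaction = PolynomialFeatures(degree=2, interaction_only=True, include_bias=False)
coffee_interaction = interaction.fit_transform(coffee)

# Columns are now sugar, stirred, sugar*stirred
print(coffee_interaction)
```

The third column is 1 only in the last row, where sugar was both added and stirred, which is exactly the case the interaction term is meant to isolate.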

In [10]:
from sklearn.datasets import load_boston
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

In [11]:
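# Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2
boston = load_boston()
# Keep only the first two features
feature = boston.data[:, 0:2]
target = boston.target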



    The Boston housing prices dataset has an ethical problem. You can refer to
    the documentation of this function for further details.

    The scikit-learn maintainers therefore strongly discourage the use of this
    dataset unless the purpose of the code is to study and educate about
    ethical issues in data science and machine learning.

    In this special case, you can fetch the dataset from the original
    source::

        import pandas as pd
        import numpy as np

        data_url = "http://lib.stat.cmu.edu/datasets/boston"
        raw_df = pd.read_csv(data_url, sep="\s+", skiprows=22, header=None)
        data = np.hstack([raw_df.values[::2, :], raw_df.values[1::2, :2]])
        target = raw_df.values[1::2, 2]

    Alternative datasets include the California housing dataset (i.e.
    :func:`~sklearn.datasets.fetch_california_housing`) and the Ames housing
    dataset. You can load the datasets as follows::

        from sklearn.datasets import fetch_california_ho

In [15]:
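# Create interaction term (with two input features this adds their product)
polyFeatures = PolynomialFeatures(interaction_only=True, include_bias=False, degree=3)
feature_interative = polyFeatures.fit_transform(feature)

# Fit a linear regression on the original features plus the interaction term
regression = LinearRegression()
model = regression.fit(feature_interative, target)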

In [17]:
# Prediction error for the first observation, in dollars (target is in $1000s)
model.predict(feature_interative)[0] * 1000 - target[0] * 1000

-365.1770742652552
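
As a rough illustration of why the interaction term matters, the following sketch compares a linear model with and without it. The data are synthetic and constructed (as an assumption, purely for illustration) so that the target depends almost entirely on the product of the two features; real datasets will rarely show such a clean gap.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data: the target is driven by the product of the two features
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = X[:, 0] * X[:, 1] + rng.normal(scale=0.1, size=200)

# Linear model on the raw features only
r2_plain = LinearRegression().fit(X, y).score(X, y)

# Linear model on the raw features plus their interaction term
X_inter = PolynomialFeatures(
    degree=2, interaction_only=True, include_bias=False
).fit_transform(X)
r2_inter = LinearRegression().fit(X_inter, y).score(X_inter, y)

# In-sample R^2 for each model
print(r2_plain, r2_inter)
```

Because the second model contains the first as a special case, its in-sample R² cannot be lower; the size of the improvement indicates how much of the signal sits in the interaction.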