# Interaction Effects

### Introduction

So far we have discussed generating different types of features, both numerical and categorical. Sometimes, however, the effect of one feature depends on the value of another. In this lesson, let's explore when and how to capture this.

### A multiple regression model

One way to think about interactions is simply that the whole is greater than the sum of its parts. For example, let's say that we are marketing a food delivery service like Postmates. Postmates is trying to expand its presence in New York City. To promote the product, it decides upon a two-pronged marketing strategy: offering discounts on the service (five to twenty-five dollars off) and increasing advertising spending.

Its model of the effects of advertising and discounts looks like the following:

$$ customers = \theta_2*ad\_spending + \theta_1*discount\_spending + \theta_0 $$ 

Where $\theta_0$ represents the baseline number of customers.

Because Postmates has run similar campaigns to expand its presence in similar markets, it has already gathered data on the effects of different mixes of advertising and discounts. It has found the following parameters for the above model.

* $\theta_2 = .3$
* $\theta_1 = .2$
* $\theta_0 = 35500$

So for every ten dollars spent on advertising, they expect to gain three customers (and two customers for every ten dollars spent on discounts), on top of a baseline of 35,500 customers.

But if the above numbers are correct, why wouldn't a company like Postmates put all of its money into `ad_spending` and none into `discount_spending`?
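To make the question concrete, here is a minimal sketch of the additive model with the parameters above, evaluated over several splits of a fixed budget. The 6,000-dollar budget and the helper name `predict_additive` are hypothetical, chosen only for illustration.

```python
# Parameters of the additive model above (no interaction term)
THETA_0 = 35500  # baseline customers
THETA_1 = 0.2    # customers per dollar of discount spending
THETA_2 = 0.3    # customers per dollar of ad spending

def predict_additive(ad_spending, discount_spending):
    """Hypothetical helper: expected customers under the additive model."""
    return THETA_2 * ad_spending + THETA_1 * discount_spending + THETA_0

# Hypothetical fixed budget, split between ads and discounts in 1,000-dollar steps
budget = 6000
for ad in range(0, budget + 1, 1000):
    discount = budget - ad
    print(ad, discount, predict_additive(ad, discount))
```

Under this model, every dollar moved from discounts to ads adds 0.3 - 0.2 = 0.1 customers, so the prediction rises steadily as the ad share grows and the all-ads split always comes out on top.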

The answer is that our model above likely misses another effect: the multiplicative effect of advertising and discounts together. That is, in addition to the normal boost we get from advertising, we may get an even larger boost if that advertising coincides with discounts. And vice versa, we'd get more from our discounts if we are also advertising. To account for these interaction effects, we update our model from:

$$ \hat{customers} = \hat\theta_2*ad\_spending + \hat\theta_1*discount\_spending + \hat\theta_0 $$ 

to:

$$ \hat{customers} = \hat\theta_1*discount\_spending + \hat\theta_2*ad\_spending  + \hat\theta_3*ad\_spending*discount\_spending + \hat\theta_0 $$ 
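One way to see what the new term does is to take the partial derivative of the updated model with respect to each input. The payoff from an extra dollar of advertising now grows with discount spending, and vice versa; in the model without the interaction term, these derivatives would simply be the constants $\hat\theta_2$ and $\hat\theta_1$.

$$ \frac{\partial \hat{customers}}{\partial ad\_spending} = \hat\theta_2 + \hat\theta_3*discount\_spending $$

$$ \frac{\partial \hat{customers}}{\partial discount\_spending} = \hat\theta_1 + \hat\theta_3*ad\_spending $$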

Let's focus on our updated model and assume the following parameters.

* $\theta_3 = .01$
* $\theta_2 = .3$
* $\theta_1 = .2$
* $\theta_0 = 35500$

Now suppose we spend 4,000 dollars on advertising and 2,000 dollars on discounts. The expected number of customers is then:

$$ \hat{customers} = .2*2000 + .3*4000 + .01*4000*2000 + 35500 $$

$$ \hat{customers} = 400 + 1{,}200 + 80{,}000 + 35{,}500 = 117{,}100 $$

So in our updated model, spending on advertising and discounts together pays off far more than spending on either one separately: the interaction term alone contributes 80,000 customers, compared with 1,600 from the two individual terms combined.
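Here is a minimal sketch of the updated model with the parameters above, including the baseline $\theta_0$. It first reproduces the worked example, then revisits the budget question with the same hypothetical 6,000-dollar budget; the helper name `predict_interaction` is also hypothetical.

```python
# Parameters assumed for the updated model above
THETA_0, THETA_1, THETA_2, THETA_3 = 35500, 0.2, 0.3, 0.01

def predict_interaction(ad_spending, discount_spending):
    """Hypothetical helper: expected customers with an ad x discount interaction term."""
    return (THETA_1 * discount_spending
            + THETA_2 * ad_spending
            + THETA_3 * ad_spending * discount_spending
            + THETA_0)

# The worked example: 4,000 dollars on ads, 2,000 dollars on discounts
print(predict_interaction(4000, 2000))  # about 117,100

# Hypothetical fixed budget, split in 1,000-dollar steps
budget = 6000
splits = {ad: predict_interaction(ad, budget - ad) for ad in range(0, budget + 1, 1000)}
best_ad = max(splits, key=splits.get)
print(best_ad, budget - best_ad)  # 3000 3000
```

With the interaction term, putting the whole budget into either channel zeroes out the product term, so the best split in this grid is an even one rather than all-ads.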

### Modeling with Interaction Effects

Modeling with interaction effects is easy enough: scikit-learn supports it out of the box through its `PolynomialFeatures` transformer. Let's see this by walking through [the documentation](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.PolynomialFeatures.html) on polynomial features.

First, we'll review how `PolynomialFeatures` works with its default settings.

```python
import numpy as np

X = np.arange(6).reshape(3, 2)
X
```

```
array([[0, 1],
       [2, 3],
       [4, 5]])
```

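```python
from sklearn.preprocessing import PolynomialFeatures

# Generate all polynomial features up to degree 2
poly = PolynomialFeatures(2)
poly.fit_transform(X)
```

```
array([[ 1.,  0.,  1.,  0.,  0.,  1.],
       [ 1.,  2.,  3.,  4.,  6.,  9.],
       [ 1.,  4.,  5., 16., 20., 25.]])
```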

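Initializing `PolynomialFeatures` with degree 2 generates every monomial of our two features up to degree 2: a bias column of ones, $x_0$, $x_1$, $x_0^2$, $x_0 x_1$, and $x_1^2$. Note that the fifth column, $x_0 x_1$, is already an interaction term: the default settings include interactions alongside the squared terms.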

If we want only the interaction terms, without the squared terms, we can pass `interaction_only=True`:

```python
from sklearn.preprocessing import PolynomialFeatures
import numpy as np
poly = PolynomialFeatures(interaction_only=True)
poly.fit_transform(X)
```

```
array([[ 1.,  0.,  1.,  0.],
       [ 1.,  2.,  3.,  6.],
       [ 1.,  4.,  5., 20.]])
```

As you can see, the result has a bias column of ones (for the intercept), our two original features, and, in the last column, the product of those two features.
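The sketch below confirms that column layout by hand and also shows the `include_bias=False` option, which drops the column of ones. Dropping it is handy when the downstream model, such as scikit-learn's `LinearRegression`, already fits its own intercept.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.arange(6).reshape(3, 2)
x0, x1 = X[:, 0], X[:, 1]

# interaction_only=True keeps the bias, the original features, and their product
poly = PolynomialFeatures(interaction_only=True)
manual = np.column_stack([np.ones(len(X)), x0, x1, x0 * x1])
print(np.allclose(manual, poly.fit_transform(X)))  # True

# include_bias=False drops the column of ones
no_bias = PolynomialFeatures(interaction_only=True, include_bias=False)
print(no_bias.fit_transform(X))
```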

Now let's apply this to our domain of ad spending and discount spending.

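```python
import numpy as np

# 150 observations of ad and discount spending, each 0 to 2,900 dollars in steps of 100
np.random.seed(3)
ad_spending = (np.random.randint(0, 30, 150) * 100).reshape(-1, 1)
discount_spending = (np.random.randint(0, 30, 150) * 100).reshape(-1, 1)
```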

```python
observations = np.hstack((ad_spending, discount_spending))
interactions_transformer = PolynomialFeatures(interaction_only=True)
interaction_features = interactions_transformer.fit_transform(observations)
interaction_features[:5]
```

```
array([[1.00e+00, 1.00e+03, 2.00e+03, 2.00e+06],
       [1.00e+00, 2.40e+03, 1.00e+02, 2.40e+05],
       [1.00e+00, 2.50e+03, 1.80e+03, 4.50e+06],
       [1.00e+00, 3.00e+02, 2.00e+02, 6.00e+04],
       [1.00e+00, 2.40e+03, 2.30e+03, 5.52e+06]])
```

So the last column is a product of our previous two features.
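To close the loop, here is a sketch that fits a linear regression on these interaction features. The data above has no target, so the `customers` values below are hypothetical: simulated from the parameters assumed earlier plus Gaussian noise with a standard deviation of 50 customers. The point is only to show that the fit recovers coefficients close to those values. It passes `include_bias=False` because `LinearRegression` fits its own intercept.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Same feature generation as above
np.random.seed(3)
ad_spending = (np.random.randint(0, 30, 150) * 100).reshape(-1, 1)
discount_spending = (np.random.randint(0, 30, 150) * 100).reshape(-1, 1)
observations = np.hstack((ad_spending, discount_spending))

# Hypothetical target simulated from the assumed parameters plus noise
ad, disc = ad_spending.ravel(), discount_spending.ravel()
customers = 35500 + 0.3 * ad + 0.2 * disc + 0.01 * ad * disc + np.random.normal(0, 50, 150)

# Columns: ad_spending, discount_spending, ad_spending * discount_spending
interactions_transformer = PolynomialFeatures(interaction_only=True, include_bias=False)
interaction_features = interactions_transformer.fit_transform(observations)

model = LinearRegression().fit(interaction_features, customers)
print(model.intercept_)  # should be close to 35500
print(model.coef_)       # should be close to [0.3, 0.2, 0.01]
```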

### Interactions with Categorical Variables

### Summary