## Multiple Linear Regression
Simple linear regression models how a single explanatory variable affects a response variable. In practice, however, many different explanatory variables often contribute to the response. To model this behavior we use multiple linear regression.

Multiple linear regression (MLR) fits a linear relationship between the response variable and several explanatory variables at once. Adding relevant explanatory variables generally gives the model more information to work with, although, as discussed at the end of this section, adding variables indiscriminately has costs of its own.

For the examples in this section, we will use data reporting how much a company spent on TV, radio, and newspaper advertising (in thousands of dollars) and how much it made in sales (in thousands of units) over some time period. We want to find out to what extent spending on each type of advertising affects sales.


In [10]:
import pandas as pd
import statsmodels.formula.api as sm
advertising = pd.read_csv('data/Advertising.csv', usecols=[1,2,3,4])

### The Naive Approach to Multiple Linear Regression
The naive way of performing a multiple linear regression is to build an independent linear regression for each explanatory variable and then combine the results.

Let's look at the linear regressions of TV, radio, and newspaper advertising on sales individually:

In [11]:
est = sm.ols('sales ~ TV', advertising).fit()
est.summary().tables[1]

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| Intercept | 7.0326 | 0.458 | 15.360 | 0.000 | 6.130 | 7.935 |
| TV | 0.0475 | 0.003 | 17.668 | 0.000 | 0.042 | 0.053 |


In [12]:
est = sm.ols('sales ~ radio', advertising).fit()
est.summary().tables[1]

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| Intercept | 9.3116 | 0.563 | 16.542 | 0.000 | 8.202 | 10.422 |
| radio | 0.2025 | 0.020 | 9.921 | 0.000 | 0.162 | 0.243 |


In [13]:
est = sm.ols('sales ~ newspaper', advertising).fit()
est.summary().tables[1]

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| Intercept | 12.3514 | 0.621 | 19.876 | 0.000 | 11.126 | 13.577 |
| newspaper | 0.0547 | 0.017 | 3.300 | 0.001 | 0.022 | 0.087 |


We can see that each of these simple regressions reports a statistically significant effect of its explanatory variable on sales: every p-value is at most 0.001.

However, now that we have our individual linear regressions, we run into some problems with this naive approach. First, we have no clear-cut way of combining the separate regressions into one model. Second, we have no way of accounting for correlation between explanatory variables, which can overstate or understate the relationship between the response variable and a specific explanatory variable.

### Approaching Multiple Linear Regression
To begin with, in order to perform a multiple linear regression we rewrite our linear regression equation to account for all variables:

$$
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p + \epsilon
$$

In this equation, $p$ is the number of predictors (explanatory variables) in the model, and $\beta_i$ is the average effect on $y$ of a one-unit increase in the predictor $x_i$, holding all other predictors constant. That is to say, if we increase the value of $x_i$ by one unit and leave the other predictors unchanged, we expect the value of $y$ to change by $\beta_i$ on average.

The general approach to multiple linear regression is shown in the following image. For a regression of radio and TV advertising on sales, we arrange the data in n-dimensional space (3 dimensions in this case: two predictors plus the response) and then try to find the hyperplane that best fits the observed data. The values of $\beta_i$ that define this hyperplane are what our regression reports to us.

<img src="images/plane.png">
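To make the "best-fitting hyperplane" idea concrete, here is a minimal sketch that fits a plane by ordinary least squares. It uses synthetic data rather than the Advertising data, and the variable names (`x1`, `x2`, `true_beta`) and the coefficient values are made up for illustration. It calls NumPy's `np.linalg.lstsq` directly and is not meant to reproduce what statsmodels does internally.

```python
import numpy as np

# Synthetic data (NOT the Advertising data): two predictors and a response
# generated from known, made-up coefficients so the fit can be checked.
rng = np.random.default_rng(0)
n = 200
x1 = rng.uniform(0, 300, n)             # stand-in for a TV-like budget
x2 = rng.uniform(0, 50, n)              # stand-in for a radio-like budget
true_beta = np.array([3.0, 0.05, 0.2])  # beta_0, beta_1, beta_2 (assumed values)
noise = rng.normal(0, 1.0, n)           # the epsilon term
y = true_beta[0] + true_beta[1] * x1 + true_beta[2] * x2 + noise

# Design matrix: a column of ones for the intercept, then one column per predictor.
X = np.column_stack([np.ones(n), x1, x2])

# Least squares picks the coefficients that minimize the sum of squared
# residuals, i.e. the plane with the smallest total squared vertical distance
# to the data points.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

residuals = y - X @ beta_hat
print("estimated coefficients:", beta_hat.round(3))
print("residual sum of squares:", round(float(residuals @ residuals), 2))
```

Because the data were generated from a known plane plus noise, the estimated coefficients should land close to the assumed values in `true_beta`, and no other choice of coefficients gives a smaller residual sum of squares on this data.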


### Using Multiple Linear Regression
We can use statsmodels to compute an MLR on our example data. We rewrite our multiple linear regression equation to match the parameters we want to investigate as follows:

$$
sales = \beta_0 ~~+~~ \beta_1 \times \text{TV_budget} ~~+~~ \beta_2 \times \text{radio_budget} ~~+~~\beta_3 \times \text{newspaper_budget} ~~+~~ \epsilon
$$

In [14]:
est = sm.ols('sales ~ TV + radio + newspaper', advertising).fit()
est.summary().tables[1]

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| Intercept | 2.9389 | 0.312 | 9.422 | 0.000 | 2.324 | 3.554 |
| TV | 0.0458 | 0.001 | 32.809 | 0.000 | 0.043 | 0.049 |
| radio | 0.1885 | 0.009 | 21.893 | 0.000 | 0.172 | 0.206 |
| newspaper | -0.0010 | 0.006 | -0.177 | 0.860 | -0.013 | 0.011 |


### Interpreting the Results of the Regression
We have to be able to interpret the results of our regression to get any meaning from it. The 'coef' column contains the value of $\beta_i$ for each row's explanatory variable $x_i$ from our regression equation. Using these values, we can write out our regression equation as follows:

$$
sales = 2.9389 ~~+~~ 0.0458 \times \text{TV_budget} ~~+~~ 0.1885 \times \text{radio_budget} ~~-~~ 0.0010 \times \text{newspaper_budget} ~~+~~ \epsilon
$$

This equation tells us that, holding the other budgets fixed, each additional \\$1000 spent on TV advertising is associated with an average increase in sales of about 46 units, and each additional \\$1000 spent on radio advertising with an average increase of about 189 units. Since the p-values for both of these variables are low (near zero, in fact), we conclude that their effects on sales are statistically significant.

Further, we conclude that newspaper spending is not statistically significant in our model. This is supported by the newspaper coefficient's very high p-value of 0.86 and by its confidence interval, which contains zero. In other words, once TV and radio spending are accounted for, the data give no evidence that newspaper advertising affects sales, so we can reasonably drop it from the model.
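As a small worked example of reading the coefficients, the sketch below plugs the values from the 'coef' column into the regression equation. The function name `predict_sales` and the example budgets (100, 20, 10) are hypothetical and chosen only for illustration. Inputs and outputs are in the same units as the data.

```python
# Coefficients copied from the 'coef' column of the multiple regression output.
COEFS = {"Intercept": 2.9389, "TV": 0.0458, "radio": 0.1885, "newspaper": -0.0010}


def predict_sales(tv, radio, newspaper):
    """Predicted sales for the given budgets (hypothetical helper, noise term omitted)."""
    return (
        COEFS["Intercept"]
        + COEFS["TV"] * tv
        + COEFS["radio"] * radio
        + COEFS["newspaper"] * newspaper
    )


# Made-up budgets, used only to illustrate the "one unit increase" reading.
baseline = predict_sales(tv=100, radio=20, newspaper=10)
more_tv = predict_sales(tv=101, radio=20, newspaper=10)
more_radio = predict_sales(tv=100, radio=21, newspaper=10)

print("baseline prediction:", round(baseline, 4))
print("change from +1 TV:", round(more_tv - baseline, 4))
print("change from +1 radio:", round(more_radio - baseline, 4))
```

Raising one budget by a single unit while holding the others fixed changes the prediction by exactly that variable's coefficient, which is the "holding all other predictors constant" interpretation in action.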

### Interaction
The question then becomes: why was newspaper advertising insignificant in the MLR model when it had a significant effect in the simple linear regression model? Recall that in the simple linear regression, newspaper advertising had an estimated coefficient of 0.0547 with a p-value of 0.001. To explain why this effect dropped to nearly zero in our multiple linear regression, we need to look at the correlations between the variables we used.

In [15]:
advertising.corr()

| | TV | radio | newspaper | sales |
|---|---|---|---|---|
| TV | 1.0 | 0.054809 | 0.056648 | 0.782224 |
| radio | 0.054809 | 1.0 | 0.354104 | 0.576223 |
| newspaper | 0.056648 | 0.354104 | 1.0 | 0.228299 |
| sales | 0.782224 | 0.576223 | 0.228299 | 1.0 |


As we can see, the correlation between radio advertising and newspaper advertising is about 0.35. This positive correlation indicates that higher spending on radio advertising tends to be accompanied by higher spending on newspaper advertising. This leads to a chain of events that explains newspaper's apparent significance in the simple linear regression.

First, radio advertising increases, leading to increased sales. Second, since radio advertising and newspaper advertising are correlated, newspaper advertising tends to be higher as well. Finally, since newspaper advertising and sales are both higher, it looks as though newspaper advertising increased sales, even though radio advertising was the actual cause of the increase.

The simple linear regression sees only newspaper advertising and sales, so it cannot separate radio advertising's contribution to sales from newspaper advertising's own, and it reports a significant relationship between newspaper advertising and sales. The multiple linear regression is able to remove radio advertising's effect on sales from the newspaper estimate, revealing that newspaper ads do not have a significant effect on sales.
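The same effect can be reproduced on synthetic data. The following is a minimal sketch, not an analysis of the Advertising data: the budgets, coefficients, and the helper `ols` are all made up for illustration. Sales are generated from radio spending only, while newspaper spending is constructed to be correlated with radio.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000

# Hypothetical budgets: newspaper spending is partly driven by radio spending,
# so the two are positively correlated.
radio = rng.uniform(0, 50, n)
newspaper = 0.5 * radio + rng.normal(0, 10, n)

# By construction, sales depend on radio only; newspaper has no effect at all.
sales = 5.0 + 0.2 * radio + rng.normal(0, 1.0, n)


def ols(y, *predictors):
    """Least-squares coefficients, intercept first (illustrative helper)."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


simple = ols(sales, newspaper)           # sales ~ newspaper
multiple = ols(sales, radio, newspaper)  # sales ~ radio + newspaper

print("corr(radio, newspaper):", round(float(np.corrcoef(radio, newspaper)[0, 1]), 3))
print("newspaper coef, simple regression:  ", round(float(simple[1]), 4))
print("newspaper coef, multiple regression:", round(float(multiple[2]), 4))
```

In the simple regression, newspaper picks up a clearly positive coefficient because it acts as a proxy for radio. Once radio is included, the newspaper coefficient shrinks to roughly zero, mirroring what happens in the Advertising example.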

### Important Concepts to Explore
Multiple linear regression is a powerful tool, but it has its share of weaknesses. It can account for dozens of predictors (explanatory variables) or more, which lets us build models that capture more of the variation in the response than a simple linear regression can. However, as more variables are added, the model becomes harder to explain, and it becomes harder to account for the correlations among the predictors.

Further, if we introduce explanatory variables that do not actually have a significant effect on the response variable, either because of the nature of the variable or because of its correlation with another variable, we can essentially spend time learning "the wrong thing".

Because of these two concerns and their potential impact on the model, determining the optimal group of explanatory variables is no simple task. There is no single correct way to choose them, even though this choice is arguably the most important aspect of the model.
<br><br>
<center>
"Identifying the best subset among many variables to include in a model – is arguably the hardest part of model building." - <a href="https://link.springer.com/article/10.1057/jt.2009.26">Bruce Ratner</a>
</center>

#### Further Reading on MLR
&nbsp;&nbsp;&nbsp;&nbsp;<a href="https://www.investopedia.com/terms/m/mlr.asp" target="_blank">Multiple Linear Regression – MLR Definition</a>

&nbsp;&nbsp;&nbsp;&nbsp;<a href="https://www-bcf.usc.edu/~gareth/ISL/ISLR%20Seventh%20Printing.pdf#page=85" target="_blank">An Introduction to Statistical Learning with Applications in R</a>
