### Multiple Regression

### Big Andy's Burger Barn

In [1]:
import pandas as pd
import numpy as np
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm
import matplotlib.pyplot as plt

andy = pd.read_sas('andy.sas7bdat')

In [2]:
print(andy.head())

   sales  price  advert
0   73.2   5.69     1.3
1   71.8   6.49     2.9
2   62.4   5.63     0.8
3   67.4   6.22     0.7
4   89.3   5.02     1.5


In [3]:
mdl_andy = smf.ols('sales~price+advert', data=andy).fit()

print(mdl_andy.summary())

                            OLS Regression Results                            
Dep. Variable:                  sales   R-squared:                       0.448
Model:                            OLS   Adj. R-squared:                  0.433
Method:                 Least Squares   F-statistic:                     29.25
Date:                Sat, 03 Apr 2021   Prob (F-statistic):           5.04e-10
Time:                        17:24:13   Log-Likelihood:                -223.87
No. Observations:                  75   AIC:                             453.7
Df Residuals:                      72   BIC:                             460.7
Df Model:                           2                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept    118.9136      6.352     18.722      0.0

### Generating an ANOVA Table with `anova_lm`

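Calling `anova_lm` on a single fitted model reports a sequential sum of squares for each regressor plus a residual row. It does not report the overall model (regression) row or the total sum of squares, so it does not give us everything we want.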

In [4]:
print(anova_lm(mdl_andy))

            df       sum_sq      mean_sq          F        PR(>F)
price      1.0  1219.091030  1219.091030  51.063099  5.945932e-10
advert     1.0   177.447900   177.447900   7.432620  8.038182e-03
Residual  72.0  1718.942937    23.874207        NaN           NaN
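
The entries in this output can be cross-checked by hand. The following is a minimal sketch that only repeats the arithmetic on values copied from the printed table (it does not refit the model), and the variable names are illustrative rather than part of the original notebook. It shows how the residual mean square and the two F statistics are formed, and that the two sequential sums of squares add up to the sum of squares explained by both regressors together.

```python
# Values copied from the anova_lm output above
ss_price = 1219.091030
ss_advert = 177.447900
ss_resid = 1718.942937
df_resid = 72

# Residual mean square = residual sum of squares / residual degrees of freedom
ms_resid = ss_resid / df_resid

# Each regressor has 1 degree of freedom, so its mean square equals its
# sum of squares; its F statistic divides that by the residual mean square
f_price = ss_price / ms_resid
f_advert = ss_advert / ms_resid

# The sequential sums of squares add up to the sum of squares
# explained by price and advert together
ss_model = ss_price + ss_advert

print(f"residual mean square: {ms_resid:.6f}")
print(f"F (price): {f_price:.4f}, F (advert): {f_advert:.4f}")
print(f"explained sum of squares: {ss_model:.6f}")
```

Up to rounding in the copied inputs, the computed values should agree with the `mean_sq` and `F` columns printed above.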


### Making an ANOVA Table Manually

- You don't need to change anything in the following cell. Run it once to define the `anova_table` function. Note that the function relies on `pandas` being imported as `pd`, so make sure `import pandas as pd` has been run first.
- Then call `anova_table` with the fitted regression result as its argument, `mdl_andy` in this case.

In [14]:
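def anova_table(mdl_result):
    """Print an ANOVA table (Model, Error, Total) for a fitted OLS result."""
    anova_dict = {
        'Source':['Model','Error','Total'],
        # Degrees of freedom: model, residual, and their sum (N - 1 when the model has an intercept)
        'DF':[mdl_result.df_model, mdl_result.df_resid, mdl_result.df_model+mdl_result.df_resid],
        # Explained, residual, and total (centered) sums of squares
        'Sum of Squares': [mdl_result.ess, mdl_result.ssr, mdl_result.centered_tss],
        # Mean square = sum of squares / DF; left blank for the Total row
        'Mean Square':[mdl_result.mse_model, mdl_result.mse_resid,'']
    }
    anova_df = pd.DataFrame(anova_dict).set_index('Source')
    # df_model and df_resid are stored as floats; display them as integers
    anova_df['DF'] = anova_df['DF'].astype('int')
    print(anova_df)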

In [15]:
anova_table(mdl_andy)

        DF  Sum of Squares Mean Square
Source                                
Model    2     1396.538930  698.269465
Error   72     1718.942937   23.874207
Total   74     3115.481867            
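
The manual table contains everything needed to reproduce the headline statistics in the regression summary. As a minimal sketch, assuming only the values printed in the table above (copied by hand, with illustrative variable names), the overall F statistic, R-squared, and adjusted R-squared can be recomputed as follows.

```python
# Values copied from the ANOVA table printed above
ss_model, df_model = 1396.538930, 2
ss_error, df_error = 1718.942937, 72
ss_total, df_total = 3115.481867, 74

# Mean squares: sum of squares divided by degrees of freedom
ms_model = ss_model / df_model
ms_error = ss_error / df_error

# Overall F statistic: model mean square over error mean square
f_stat = ms_model / ms_error

# R-squared: share of the total variation explained by the model
r_squared = ss_model / ss_total

# Adjusted R-squared: same idea, using degrees-of-freedom-corrected variances
adj_r_squared = 1 - (ss_error / df_error) / (ss_total / df_total)

print(round(f_stat, 2), round(r_squared, 3), round(adj_r_squared, 3))
```

Rounded this way, the three values match the `F-statistic` (29.25), `R-squared` (0.448), and `Adj. R-squared` (0.433) entries reported by `mdl_andy.summary()`.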
