In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

from ggplot import mtcars

Linear Model Implementation
-------------------------

To aid the analyses below, let's first create a simple `LinearRegressor` that we can use to fit single-variable models.

In [2]:
class LinearRegressor(object):
    """Simple least-squares regression of a fixed outcome y on one predictor."""

    def __init__(self, y):
        self.y = np.array(y)
        self.N = len(self.y)
        self.mu_y = np.mean(self.y)

    def fit(self, x):
        x = np.array(x)
        mu_x = np.mean(x)
        # Slope: cross-product of centered x and y over the sum of squares of x.
        beta1 = np.dot(x - mu_x, self.y - self.mu_y) / np.sum((x - mu_x) ** 2)
        # Intercept: the fitted line passes through (mu_x, mu_y).
        beta0 = self.mu_y - beta1 * mu_x

        yhat = beta0 + beta1 * x
        resid = self.y - yhat

        return {
            'intercept': beta0,
            'slope': beta1,
            'yhat': yhat,
            'residuals': resid,
        }
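
As a quick sanity check on the closed-form estimates used in `fit`, the sketch below recomputes the slope and intercept for a small made-up data set and compares them with `np.polyfit`. The data values are illustrative assumptions, not taken from `mtcars`.

```python
import numpy as np

# Illustrative data (an assumption for this sketch, not from mtcars).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Closed-form simple linear regression estimates.
slope = np.dot(x - x.mean(), y - y.mean()) / np.sum((x - x.mean()) ** 2)
intercept = y.mean() - slope * x.mean()

# np.polyfit returns coefficients from highest degree to lowest.
ref_slope, ref_intercept = np.polyfit(x, y, deg=1)

print(np.allclose([slope, intercept], [ref_slope, ref_intercept]))
```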

Q1
--

Consider the `mtcars` data set. Fit a model with `mpg` as the outcome that includes number of cylinders as a factor variable and weight as a confounder. Give the adjusted estimate for the expected change in mpg comparing 8 cylinders to 4.

* -6.071
* -3.206
* 33.991
* -4.256
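
Because `LinearRegressor` only handles a single predictor, a model with a factor variable plus a covariate needs a design matrix with dummy columns. The following is a minimal sketch using `np.linalg.lstsq`; the helper name `fit_factor_model` and the noise-free data are assumptions made for illustration (the arrays are synthetic, not the `mtcars` columns), and the smallest factor level is treated as the reference, as R does by default.

```python
import numpy as np


def fit_factor_model(y, factor, covariate):
    """Least-squares fit of y ~ factor (dummy-coded) + covariate.

    The smallest factor level is the reference level. Returns a dict
    mapping term names to estimated coefficients.
    """
    y = np.asarray(y, dtype=float)
    factor = np.asarray(factor)
    covariate = np.asarray(covariate, dtype=float)

    levels = np.unique(factor)  # sorted; levels[0] is the reference
    columns = [np.ones_like(y), covariate]
    names = ["intercept", "covariate"]
    for level in levels[1:]:
        columns.append((factor == level).astype(float))
        names.append("level_%s" % level)

    X = np.column_stack(columns)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return dict(zip(names, beta))


# Synthetic, noise-free data (an assumption for illustration only).
cyl = np.array([4, 4, 4, 6, 6, 6, 8, 8, 8])
wt = np.array([1.5, 2.0, 2.5, 2.5, 3.0, 3.5, 3.0, 4.0, 5.0])
mpg = 30.0 - 3.0 * wt - 2.0 * (cyl == 6) - 5.0 * (cyl == 8)

coefs = fit_factor_model(mpg, cyl, wt)
print(coefs["level_8"])  # 8-vs-4 cylinder difference at a fixed weight
```

With the real data, the `level_8` entry would play the role of the adjusted 8-versus-4 estimate asked for above.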

Q2
--

Consider the `mtcars` data set. Fit a model with `mpg` as the outcome that includes number of cylinders as a factor variable and weight as a possible confounding variable. Compare the effect of 8 versus 4 cylinders on mpg for the weight-adjusted and unadjusted models. Here, adjusted means including the weight variable as a term in the regression model and unadjusted means the model without weight included. What can be said about the effect comparing 8 and 4 cylinders after looking at models with and without weight included?

* Holding weight constant, cylinder appears to have less of an impact on mpg than if weight is disregarded.
* Within a given weight, 8 cylinder vehicles have an expected 12 mpg drop in fuel efficiency.
* Including or excluding weight does not appear to change anything regarding the estimated impact of number of cylinders on mpg.
* Holding weight constant, cylinder appears to have more of an impact on mpg than if weight is disregarded.

Q3
--

Consider the `mtcars` data set. Fit a model with `mpg` as the outcome that considers number of cylinders as a factor variable and weight as a confounder. Now fit a second model with `mpg` as the outcome that considers the interaction between number of cylinders (as a factor variable) and weight. Give the P-value for the likelihood ratio test comparing the two models and suggest a model using 0.05 as a type I error rate significance benchmark.

* The P-value is larger than 0.05. So, according to our criterion, we would fail to reject, which suggests that the interaction terms may not be necessary.
* The P-value is small (less than 0.05). Thus it is surely true that there is no interaction term in the true model.
* The P-value is small (less than 0.05). So, according to our criterion, we reject, which suggests that the interaction term is necessary.
* The P-value is larger than 0.05. So, according to our criterion, we would fail to reject, which suggests that the interaction terms are necessary.
* The P-value is small (less than 0.05). Thus it is surely true that there is an interaction term in the true model.
* The P-value is small (less than 0.05). So, according to our criterion, we reject, which suggests that the interaction term is not necessary.
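
One way to compare the two nested models is the F statistic built from their residual sums of squares, which is the test R's `anova` reports for two `lm` fits. The sketch below is a minimal version under stated assumptions: the data are synthetic (generated without a true interaction), and the helper `rss` is a name introduced here for illustration.

```python
import numpy as np


def rss(X, y):
    """Residual sum of squares from a least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


# Synthetic data (an assumption for illustration; no true interaction).
rng = np.random.default_rng(0)
n = 60
group = np.repeat([0, 1, 2], n // 3)
w = rng.uniform(1.5, 5.5, size=n)
y = (30.0 - 3.0 * w - 2.0 * (group == 1) - 5.0 * (group == 2)
     + rng.normal(0.0, 1.0, size=n))

d1 = (group == 1).astype(float)
d2 = (group == 2).astype(float)
X_reduced = np.column_stack([np.ones(n), w, d1, d2])    # main effects only
X_full = np.column_stack([X_reduced, w * d1, w * d2])   # adds interactions

rss_reduced, rss_full = rss(X_reduced, y), rss(X_full, y)
df_num = X_full.shape[1] - X_reduced.shape[1]   # extra parameters
df_den = n - X_full.shape[1]                    # residual df of the full model
f_stat = ((rss_reduced - rss_full) / df_num) / (rss_full / df_den)
print(f_stat, df_num, df_den)
```

The P-value is the upper tail of an F distribution with `(df_num, df_den)` degrees of freedom at `f_stat`; if SciPy is available, `scipy.stats.f.sf(f_stat, df_num, df_den)` computes it.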

Q4
--

Consider the `mtcars` data set. Fit a model with `mpg` as the outcome that includes number of cylinders as a factor variable and weight included in the model as

```r
lm(mpg ~ I(wt * 0.5) + factor(cyl), data = mtcars)
```

How is the wt coefficient interpreted?

* The estimated expected change in MPG per half ton increase in weight.
* The estimated expected change in MPG per one ton increase in weight for a specific number of cylinders (4, 6, 8).
* The estimated expected change in MPG per half ton increase in weight for a specific number of cylinders (4, 6, 8).
* The estimated expected change in MPG per half ton increase in weight for the average number of cylinders.
* The estimated expected change in MPG per one ton increase in weight.
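
Rescaling a predictor rescales its coefficient inversely: one unit of `wt * 0.5` corresponds to two units of `wt`, so the coefficient doubles. Since `wt` in `mtcars` is recorded in units of 1000 lbs, one unit of `wt * 0.5` is 2000 lbs, and because `factor(cyl)` is in the model the coefficient is read with the number of cylinders held fixed. The sketch below demonstrates only the rescaling effect, on synthetic noise-free data (an assumption for illustration); `weight_coefficient` is a hypothetical helper.

```python
import numpy as np

# Synthetic, noise-free data (an assumption for illustration only).
cyl = np.array([4, 4, 4, 6, 6, 6, 8, 8, 8])
wt = np.array([1.5, 2.0, 2.5, 2.5, 3.0, 3.5, 3.0, 4.0, 5.0])
mpg = 30.0 - 3.0 * wt - 2.0 * (cyl == 6) - 5.0 * (cyl == 8)


def weight_coefficient(weight):
    """Coefficient on `weight` in mpg ~ weight + factor(cyl)."""
    X = np.column_stack([np.ones_like(weight), weight, cyl == 6, cyl == 8])
    beta, *_ = np.linalg.lstsq(X.astype(float), mpg, rcond=None)
    return beta[1]


per_unit = weight_coefficient(wt)             # weight in original units
per_two_units = weight_coefficient(wt * 0.5)  # weight rescaled by 0.5
print(per_unit, per_two_units)
```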

Q5
--

Consider the following data set

```r
x <- c(0.586, 0.166, -0.042, -0.614, 11.72)
y <- c(0.549, -0.026, -0.127, -0.751, 1.344)
```

Give the hat diagonal for the most influential point.

* 0.2025
* 0.2287
* 0.9946
* 0.2804
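
For a simple regression with an intercept, the hat diagonals depend only on `x` and have a closed form, so they can be computed directly in NumPy. The sketch below uses the `x` values given above and takes the point with the largest hat value as the candidate for "most influential", which is an interpretive assumption.

```python
import numpy as np

x = np.array([0.586, 0.166, -0.042, -0.614, 11.72])

# For simple linear regression with an intercept, the i-th hat diagonal is
#   h_i = 1/n + (x_i - mean(x))^2 / sum_j (x_j - mean(x))^2
dev = x - x.mean()
hat = 1.0 / len(x) + dev ** 2 / np.sum(dev ** 2)

print(np.round(hat, 4))
print(hat.argmax(), round(float(hat.max()), 4))
```

The last point (`x = 11.72`) sits far from the others, and its hat value comes out close to 0.9946.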

Q6
--

Consider the following data set

```r
x <- c(0.586, 0.166, -0.042, -0.614, 11.72)
y <- c(0.549, -0.026, -0.127, -0.751, 1.344)
```

Give the slope dfbeta for the point with the highest hat value.

* -0.00134
* 0.673
* -0.378
* -134
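
A leave-one-out calculation makes the quantity concrete: refit without the high-leverage point and compare slopes. The sketch below computes both the raw change in slope and a scaled version (divided by the leave-one-out residual standard deviation times the square root of the slope's diagonal entry of the inverse cross-product matrix), which is how R's `dfbetas` is defined. `slope_and_sigma` is a helper name introduced here for illustration.

```python
import numpy as np

x = np.array([0.586, 0.166, -0.042, -0.614, 11.72])
y = np.array([0.549, -0.026, -0.127, -0.751, 1.344])


def slope_and_sigma(x, y):
    """Least-squares slope and residual standard deviation of y ~ x."""
    dx = x - x.mean()
    slope = np.dot(dx, y - y.mean()) / np.sum(dx ** 2)
    intercept = y.mean() - slope * x.mean()
    resid = y - (intercept + slope * x)
    sigma = np.sqrt(np.sum(resid ** 2) / (len(x) - 2))
    return slope, sigma


dx = x - x.mean()
sxx = np.sum(dx ** 2)
i = int(np.argmax(1.0 / len(x) + dx ** 2 / sxx))  # index of highest hat value

slope_all, _ = slope_and_sigma(x, y)
keep = np.arange(len(x)) != i
slope_drop, sigma_drop = slope_and_sigma(x[keep], y[keep])

dfbeta = slope_all - slope_drop                  # raw change in slope
dfbetas = dfbeta / (sigma_drop / np.sqrt(sxx))   # scaled change in slope
print(dfbeta, dfbetas)
```

The raw change is roughly -0.93, while the scaled value is close to -134, which suggests the listed options refer to the scaled quantity.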

Q7
--

Consider a regression relationship between Y and X with and without adjustment for a third variable Z. Which of the following is true about comparing the regression coefficient between Y and X with and without adjustment for Z?

* Adjusting for another variable can only attenuate the coefficient toward zero. It can't materially change sign.
* The coefficient can't change sign after adjustment, except for slight numerical pathological cases.
* For the coefficient to change sign, there must be a significant interaction term.
* **It is possible for the coefficient to reverse sign after adjustment. For example, it can be strongly significant and positive before adjustment and strongly significant and negative after adjustment.**
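
A small constructed example shows how such a reversal can arise. In the sketch below (synthetic, noise-free data chosen for illustration, so it shows the sign flip but not significance), `y` decreases with `x` within each level of `z`, while larger `z` goes with both larger `x` and larger `y`.

```python
import numpy as np

# Synthetic data (an assumption for illustration): y falls with x inside each
# z group, but groups with larger z have both larger x and larger y.
z = np.repeat([0.0, 1.0, 2.0], 3)
x = z + np.tile([0.0, 0.5, 1.0], 3)
y = 3.0 * z - x

ones = np.ones_like(x)
unadjusted, *_ = np.linalg.lstsq(np.column_stack([ones, x]), y, rcond=None)
adjusted, *_ = np.linalg.lstsq(np.column_stack([ones, x, z]), y, rcond=None)

print(unadjusted[1])  # slope on x ignoring z: positive
print(adjusted[1])    # slope on x holding z fixed: negative
```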