# Chapter 3 Linear Regression

In [1]:
import math
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
import ISLP

## Conceptual

### Q1

Describe the null hypotheses to which the $p$-values given in Table 3.4 correspond.
Explain what conclusions you can draw based on these $p$-values.
Your explanation should be phrased in terms of `sales`, `TV`, `radio`, and `newspaper`, rather than in terms of the coefficients of the linear model.

### A1

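The first $p$-value, for `Intercept`, corresponds to the null hypothesis that expected `sales` is zero when the `TV`, `radio`, and `newspaper` advertising budgets are all zero.
The second $p$-value, for `TV`, corresponds to the null hypothesis that there is no relationship between `sales` and `TV` spending when `radio` and `newspaper` spending are held fixed.
The third $p$-value, for `radio`, corresponds to the null hypothesis that there is no relationship between `sales` and `radio` spending when `TV` and `newspaper` spending are held fixed.
The fourth $p$-value, for `newspaper`, corresponds to the null hypothesis that there is no relationship between `sales` and `newspaper` spending when `TV` and `radio` spending are held fixed.
Since the first three $p$-values are all less than 0.0001, we can reject those null hypotheses: expected `sales` is nonzero with no advertising, and both `TV` and `radio` spending are associated with `sales`.
Since the fourth $p$-value is large, we cannot reject the null hypothesis for `newspaper`: there is no evidence that `newspaper` spending is associated with `sales` once `TV` and `radio` spending are accounted for.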

### Q2

Carefully explain the differences between the KNN classifier and KNN regression methods.

### A2

The KNN classifier method takes a prediction point $x_0$, looks at the $K$ nearest points in the training data, and makes a classification based on the most common qualitative output value.
The KNN regression method takes a prediction point $x_0$, looks at the $K$ nearest points in the training data, and makes a prediction based on the average quantitative output value.
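
The contrast can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the helper names `k_nearest`, `knn_classify`, and `knn_regress` and the one-dimensional toy training data are made up for this example, and ties in the vote are broken by whichever label is encountered first.

```python
from collections import Counter
from statistics import mean

def k_nearest(train, x0, k):
    """Return the k training pairs (x, y) whose x is closest to x0."""
    return sorted(train, key=lambda pair: abs(pair[0] - x0))[:k]

def knn_classify(train, x0, k):
    """KNN classifier: majority vote over the neighbours' qualitative labels."""
    labels = [y for _, y in k_nearest(train, x0, k)]
    return Counter(labels).most_common(1)[0][0]

def knn_regress(train, x0, k):
    """KNN regression: average of the neighbours' quantitative responses."""
    return mean(y for _, y in k_nearest(train, x0, k))

# Hypothetical toy data: the same inputs with a qualitative and a quantitative response
train_labels = [(1.0, "A"), (2.0, "A"), (3.0, "B"), (6.0, "B"), (7.0, "B")]
train_values = [(1.0, 10.0), (2.0, 12.0), (3.0, 20.0), (6.0, 30.0), (7.0, 32.0)]

print(knn_classify(train_labels, x0=2.2, k=3))
print(knn_regress(train_values, x0=2.2, k=3))
```

Both functions share the same neighbour search; they differ only in how the neighbours' responses are combined (vote versus average).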

### Q3

Suppose we have a data set with five predictors, $X_1 =$ GPA, $X_2 =$ IQ, $X_3 =$ Level (1 for College and 0 for High School), $X_4 =$ Interaction between GPA and IQ, and $X_5 =$ Interaction between GPA and Level.
The response is starting salary after graduation (in thousands of dollars).
Suppose we use least squares to fit the model, and get $\hat\beta_0 = 50$, $\hat\beta_1 = 20$, $\hat\beta_2 = 0.07$, $\hat\beta_3 = 35$, $\hat\beta_4 = 0.01$, $\hat\beta_5 = -10$.
- (a) Which answer is correct, and why?
    - i. For a fixed value of IQ and GPA, high school graduates earn more, on average, than college graduates.
    - ii. For a fixed value of IQ and GPA, college graduates earn more, on average, than high school graduates.
    - iii. For a fixed value of IQ and GPA, high school graduates earn more, on average, than college graduates provided that the GPA is high enough.
    - iv. For a fixed value of IQ and GPA, college graduates earn more, on average, than high school graduates provided that the GPA is high enough.
- (b) Predict the salary of a college graduate with IQ of 110 and a GPA of 4.0.
- (c) True or false: Since the coefficient for the GPA/IQ interaction term is very small, there is very little evidence of an interaction effect. Justify your answer.

### A3

- $\hat Y = \hat\beta_0 + \hat\beta_1 X_1 + \hat\beta_2 X_2 + \hat\beta_3 X_3 + \hat\beta_4 X_4 + \hat\beta_5 X_5$
- $\hat Y = \hat\beta_0 + \hat\beta_1 (\mathrm{GPA}) + \hat\beta_2 (\mathrm{IQ}) + \hat\beta_3 (\mathrm{Level}) + \hat\beta_4 (\mathrm{GPA}) (\mathrm{IQ}) + \hat\beta_5 (\mathrm{GPA}) (\mathrm{Level})$
- $\hat Y = \hat\beta_0 + \hat\beta_1 (\mathrm{GPA}) + \hat\beta_2 (\mathrm{IQ}) + \hat\beta_4 (\mathrm{GPA}) (\mathrm{IQ}) + \left(\hat\beta_3 + \hat\beta_5 (\mathrm{GPA})\right) \times (\mathrm{Level})$

For part (a), the third answer is correct.

> For a fixed value of IQ and GPA, high school graduates earn more, on average, than college graduates provided that the GPA is high enough.

$\hat\beta_3 + \hat\beta_5 (\mathrm{GPA})$ is the overall coefficient for `Level`, that is, the predicted difference in salary between a college graduate and a high school graduate with the same IQ and GPA.
Since $\hat\beta_3 > 0$ and $\hat\beta_5 < 0$, this coefficient is negative when $\mathrm{GPA} > - \frac{\hat\beta_3}{\hat\beta_5} = 3.5$.
So when GPA is high enough (above 3.5), college graduates earn less on average than high school graduates with the same IQ and GPA.
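
As a quick numerical check of the sign change, the snippet below evaluates the college-minus-high-school difference $\hat\beta_3 + \hat\beta_5 (\mathrm{GPA})$ using the estimates from the question. The function name `college_premium` is introduced only for this sketch.

```python
BETA_3 = 35.0   # coefficient of Level (college = 1)
BETA_5 = -10.0  # coefficient of the GPA x Level interaction

def college_premium(gpa):
    """Predicted college-minus-high-school salary gap (thousands of dollars)
    at a given GPA, holding IQ fixed."""
    return BETA_3 + BETA_5 * gpa

# GPA at which the gap changes sign
threshold = -BETA_3 / BETA_5

print(f"Break-even GPA: {threshold}")
for gpa in (3.0, 3.5, 4.0):
    print(f"GPA {gpa}: college premium = {college_premium(gpa)}")
```

Under these estimates the gap is positive below the break-even GPA and negative above it, which matches answer iii.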

In [3]:
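def starting_salary(gpa, iq, level):
    """Predicted starting salary in thousands of dollars (level: 1 = college, 0 = high school)."""
    # Least squares estimates beta_0 through beta_5 from the question
    beta = np.array([50, 20, 0.07, 35, 0.01, -10])
    # Intercept, GPA, IQ, Level, GPA x IQ, GPA x Level
    x = np.array([1, gpa, iq, level, gpa * iq, gpa * level])
    return np.dot(beta, x)

# Predict the salary of a college graduate with IQ of 110 and GPA of 4.0
print(f"Starting salary in thousands of dollars: {starting_salary(4.0, 110, 1):.1f}")

Starting salary in thousands of dollars: 137.1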


The following statement is false.

> Since the coefficient for the GPA/IQ interaction term is very small, there is very little evidence of an interaction effect.

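Even though the coefficient is small, there could still be an interaction effect.
The size of a coefficient depends on the scale of its predictor: the product of GPA and IQ takes large values (440 for a GPA of 4.0 and an IQ of 110), so a coefficient of 0.01 still contributes 4.4 thousand dollars to that prediction.
Whether there is evidence of an interaction depends on the coefficient relative to its standard error, that is, on its $p$-value or on whether its confidence interval contains 0.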
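
To see why the size of a coefficient says little on its own, here is a small sketch in plain Python that fits a simple least squares line twice: once on a predictor as given and once on the same predictor multiplied by 1000. The data and the helper `slope_and_t` are hypothetical and exist only for this illustration; the point is that rescaling shrinks the slope by a factor of 1000 while leaving its $t$-statistic unchanged.

```python
import math

def slope_and_t(x, y):
    """Fit y = b0 + b1 * x by least squares; return (b1, t-statistic of b1)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    # Residual sum of squares and residual standard error (n - 2 degrees of freedom)
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    sigma_hat = math.sqrt(rss / (n - 2))
    se_b1 = sigma_hat / math.sqrt(sxx)
    return b1, b1 / se_b1

# Hypothetical data with a clear linear trend plus some noise
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]

b1, t1 = slope_and_t(x, y)
# Same predictor expressed in units 1000 times smaller, so its values are 1000 times larger
b1_scaled, t1_scaled = slope_and_t([1000 * xi for xi in x], y)

print(f"original: slope = {b1:.4f}, t = {t1:.2f}")
print(f"rescaled: slope = {b1_scaled:.7f}, t = {t1_scaled:.2f}")
```

The rescaled slope looks negligible, yet the strength of evidence against a zero slope is identical in both fits.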