1. In Table 3.4, the null hypothesis for "TV" is that, in the presence of radio
ads and newspaper ads, TV ads have no effect on sales. Similarly, the null
hypothesis for "radio" is that, in the presence of TV and newspaper ads, radio
ads have no effect on sales. (And there is a similar null hypothesis for
"newspaper".) The low p-values for TV and radio let us reject those null
hypotheses: both TV and radio ads are associated with sales. The high p-value
for newspaper means we fail to reject its null hypothesis: there is no evidence
that newspaper ads affect sales once TV and radio ads are accounted for.
2. The KNN classifier and KNN regression methods are closely related: both find
the K training observations nearest to the prediction point. However, the KNN
classifier produces a qualitative output for Y (the most common class among the
neighbors), whereas KNN regression predicts a quantitative value for f(X) (the
average response among the neighbors).
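The shared structure can be sketched with a few hypothetical numpy helpers (not
from the book; both methods differ only in the final aggregation step):

```python
import numpy as np

def knn_neighbors(X_train, x0, k):
    """Indices of the k training points nearest to x0 (Euclidean distance)."""
    dists = np.sqrt(((X_train - x0) ** 2).sum(axis=1))
    return np.argsort(dists)[:k]

def knn_classify(X_train, y_train, x0, k):
    """KNN classifier: the most common class among the k nearest neighbors."""
    idx = knn_neighbors(X_train, x0, k)
    classes, counts = np.unique(y_train[idx], return_counts=True)
    return classes[np.argmax(counts)]

def knn_regress(X_train, y_train, x0, k):
    """KNN regression: the average response among the k nearest neighbors."""
    idx = knn_neighbors(X_train, x0, k)
    return y_train[idx].mean()
```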
3. Y = 50 + 20(gpa) + 0.07(iq) + 35(gender) + 0.01(gpa * iq) - 10 (gpa * gender)
(a) Y = 50 + 20 k_1 + 0.07 k_2 + 35 gender + 0.01(k_1 * k_2) - 10 (k_1 * gender)
male: (gender = 0) 50 + 20 k_1 + 0.07 k_2 + 0.01(k_1 * k_2)
female: (gender = 1) 50 + 20 k_1 + 0.07 k_2 + 35 + 0.01(k_1 * k_2) - 10 (k_1)
The difference (female - male) is 35 - 10 k_1, which is negative once
GPA exceeds 3.5, so for a high enough GPA males earn more on average. => iii.
(b) Y(Gender = 1, IQ = 110, GPA = 4.0)
= 50 + 20 * 4 + 0.07 * 110 + 35 + 0.01 (4 * 110) - 10 * 4
= 137.1
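The arithmetic in (b) can be checked by coding the fitted model directly (a
small sketch; the function name is mine, the coefficients are from the problem):

```python
# Salary model from Question 3 (salary in thousands of dollars).
def predicted_salary(gpa, iq, gender):
    """gender coded as in the answer above: 0 = male, 1 = female."""
    return (50 + 20 * gpa + 0.07 * iq + 35 * gender
            + 0.01 * gpa * iq - 10 * gpa * gender)

print(round(predicted_salary(gpa=4.0, iq=110, gender=1), 1))  # 137.1
```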
(c) False. We must examine the p-value of the regression coefficient to
determine if the interaction term is statistically significant or not.
4. (a) I would expect the polynomial regression to have a lower training RSS
than the linear regression: its extra flexibility lets it fit the noise
(Var(epsilon)) around the true linear relationship more tightly.
(b) Conversely to (a), I would expect the polynomial regression to have a
higher test RSS: it overfits the training noise, while the linear regression
matches the true relationship.
(c) Polynomial regression has a lower training RSS than the linear fit because
of its higher flexibility: no matter what the underlying true relationship is,
the more flexible model will follow the points more closely and reduce the
training RSS. An example of this behavior is shown in Figure 2.9 of Chapter 2.
(d) There is not enough information to tell which test RSS would be lower
for either regression, since the problem statement says we do not know
"how far it is from linear". If the true relationship is closer to linear
than cubic, the linear regression's test RSS could be lower; if it is closer
to cubic than linear, the cubic regression's test RSS could be lower. This is
due to the bias-variance tradeoff: it is not clear which level of flexibility
will fit unseen data better.
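The flexibility argument in (a) and (c) can be illustrated with a small
simulation on synthetic data (my own sketch, not from the book): because the
linear model is nested inside the cubic one, the cubic fit's training RSS can
never exceed the linear fit's, even when the truth is exactly linear.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=100)
y = 2 + 3 * x + rng.normal(scale=1.0, size=100)  # truly linear + noise

def train_rss(degree):
    """Training RSS of a degree-`degree` polynomial least squares fit."""
    coeffs = np.polyfit(x, y, degree)
    resid = y - np.polyval(coeffs, x)
    return (resid ** 2).sum()

print(train_rss(1) >= train_rss(3))  # True: the cubic never fits train worse
```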
5. See 5.jpg.
6. y = B_0 + B_1 x
From (3.4): B_0 = avg(y) - B_1 avg(x)
The point (avg(x), avg(y)) lies on the line exactly when
0 = B_0 + B_1 avg(x) - avg(y)
Substituting B_0 from (3.4):
0 = (avg(y) - B_1 avg(x)) + B_1 avg(x) - avg(y)
0 = 0
so the least squares line always passes through (avg(x), avg(y)).
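A quick numerical check of this property, using the least squares formulas
from (3.4) on synthetic data (a sketch with made-up coefficients):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=50)
y = 1.5 + 0.8 * x + rng.normal(size=50)

# Simple linear regression coefficients, equations (3.4).
b1 = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
b0 = y.mean() - b1 * x.mean()

# The fitted line evaluated at avg(x) recovers avg(y).
print(np.isclose(b0 + b1 * x.mean(), y.mean()))  # True
```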