ch4conceptual.Rmd

---
title: "Conceptual Exercises of Chapter 4"
output: html_notebook
---

# Question 1

Let $Y=e^{\beta_0 + \beta_1 X}$, we have:
$$
p(X) = \frac{Y}{1 + Y} \\
\therefore p(X) + Yp(X) = Y \\
p(X) = [1 - p(X)] Y \\
\frac{p(X)}{1 - p(X)} = Y = e^{\beta_0 + \beta_1 X}
$$

# Question 2

When proving the same $k$ produce both maximum $p_k(x)$ and maximum of $\delta_k(x)$,
we have assumed that $\sigma_1^2=\dots=\sigma_K^2$, the only variable is $k$ in equation (4.12) and (4.13). So let
$$
\frac { \frac {1} {\sqrt{2 \pi} \sigma} \exp(- \frac {1} {2 \sigma^2} x^2) } {\sum_{l=1}^K { \pi_l \frac {1} {\sqrt{2 \pi} \sigma} \exp(- \frac {1} {2 \sigma^2} (x - \mu_l)^2) }} = C
$$

Take it into equation (4.12) and (4.13), we have:
$$
p_k(x) = \exp (x \frac {\mu_k} {\sigma^2} - \frac {\mu_k^2} {2 \sigma^2}) \pi_k C \\
\therefore
log(p_k(x))
= x\frac{\mu_k}{\sigma^2} - \frac{\mu_k^2}{2\sigma^2} + log(\pi_k) + log(C)
= \delta_k(x) + log(C)
$$

For logarithm function is monotonically increasing, when $\delta_k(x)$ get its maximum,
$p_k(x)$ get its maximum, too.

# Question 3

Like above question, but without assumption that $\sigma_1^2 = \dots = \sigma_K^2$,
let:
$$
\frac {\frac {1} {\sqrt{2 \pi}}} {\sum_{l=1}^K { \pi_l \frac {1} {\sqrt{2 \pi} \sigma} \exp(- \frac {1} {2 \sigma^2} (x - \mu_l)^2) }} = C
$$

Take it into equation (4.12) and (4.13), we have:
$$
p_k(x) = \exp(-\frac{(x - \mu_k)^2}{2 \sigma_k^2}) \frac{\pi_k}{\sigma_k} C \\
\therefore
log(p_k(x)) = \delta_k(x)
= -\frac{(x - \mu_k)^2}{2 \sigma_k^2} + log(\frac{\pi_k}{\sigma_k}) + log(C)
$$

So $\delta_k(x)$ is a quadratic function of $x$.

# Question 4

## 4a ~ 4d
0.1 (10% in other words), 0.01, and $10^{-100}$.
As the increase of $p$, the near points decrease exponentially.

## 4e
The length of each side for $p$ dimensional hypercube is $0.1^{\frac1p}$.

# Question 5

5a: When the Bayes decision boundary is linear, QDA performs better than LDA on training set.
LDA performs better than QDA on test set.

5b: When the Bayes boundary is non-linear, QDA performs better than LDA on both training and test sets.

5c: When $n$ increase, QDA predicts more accurately than LDA, because the bias of QDA decrease faster than LDA.
See the first paragraph of page 150 for reference.

5d: False. The bias of QDA can be smaller than LDA, which produces higher variance than LDA.
The higher variance produces higher error rate in test data. See figure 4.9 for reference.

# Question 6

According to equation (4.7):
$$
p(X) = \frac{e^{\beta_0 + \beta_1X_1 + \beta_2X_2}}{1 + e^{\beta_0 + \beta_1X_1 + \beta_2X_2}} \\
= \frac{e^{-6 + 0.05 \times 40 + 3.5}}{1 + e^{-6 + 0.05 \times 40 + 3.5}} \\
= 0.378
$$

Let the hours needed to study as $h$, we have:
$$
p(X) = \frac{e^{-6 + 0.05h + 3.5}}{1 + e^{-6 + 0.05h + 3.5}} = 0.5 \\
\Rightarrow -6 + 0.05h + 3.5 = 0 \\
h = 50
$$

The student need 50 hours to have a 50% chance of getting an A in the class.

# Question 7

Let 1 denotes "Yes" and 2 denotes "No', with equation (4.15) we have: $\mu_1 = 10, \mu_2 = 0, \pi_1 = 0.8, \pi_2 = 0.2, \sigma^2 = 36$. 
Take them into equation (4.12) with $x = 4$:
```{r}
item1 = 0.8 / sqrt(2*pi) / 6 * exp(-1 / (2 * 36) * (4 - 10)^2)
item2 = 0.2 / sqrt(2*pi) / 6 * exp(-1 / (2 * 36) * (4 - 0)^2)
p1x = item1 / (item1 + item2)
p1x
```

The probability issuing a dividend this year is 75.2%.

# Question 8
For KNN with $k = 1$, the training error rate is 0%, because for any training observation, the response is the nearest predictor itself. So the test error is 36% for KNN, which is higher than that of logistic regression (30%).
So far the latter is better. But I prefer using higer $k$ value to find out better solutions.

# Question 9
## 9a
Let $p$ denotes the default probability, we have:
$$
\frac{p}{1-p} = 0.37 \Rightarrow p = \frac{0.37}{1+0.37} = 0.27
$$

## 9b
The odds of default is:
$$
\frac{0.16}{1-0.16} = 0.19
$$