---
title: "Conceptual Exercises of Chapter 4"
output: html_notebook
---
# Question 1
Let $Y=e^{\beta_0 + \beta_1 X}$. Then:
$$
p(X) = \frac{Y}{1 + Y} \\
\therefore p(X) + Yp(X) = Y \\
p(X) = [1 - p(X)] Y \\
\frac{p(X)}{1 - p(X)} = Y = e^{\beta_0 + \beta_1 X}
$$
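As a quick numerical sanity check of this identity, the chunk below evaluates both sides for a few points. The values of `beta0`, `beta1` and `x` are arbitrary illustrations and are not part of the exercise.

```{r}
# Illustrative values only; any real numbers would work
beta0 <- -2
beta1 <- 0.5
x <- c(-1, 0, 3)

y <- exp(beta0 + beta1 * x)   # Y = e^(beta0 + beta1 * X)
p_logistic <- y / (1 + y)     # logistic function p(X)
odds <- p_logistic / (1 - p_logistic)  # odds computed from p(X)

# The odds should match Y up to floating-point tolerance
all.equal(odds, y)
```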
# Question 2
To prove that the same $k$ maximizes both $p_k(x)$ and $\delta_k(x)$,
we assume that $\sigma_1^2=\dots=\sigma_K^2=\sigma^2$, so for a fixed $x$ the only thing that varies in equations (4.12) and (4.13) is $k$. Collect the factors that do not depend on $k$ into a constant:
$$
\frac { \frac {1} {\sqrt{2 \pi} \sigma} \exp(- \frac {1} {2 \sigma^2} x^2) } {\sum_{l=1}^K { \pi_l \frac {1} {\sqrt{2 \pi} \sigma} \exp(- \frac {1} {2 \sigma^2} (x - \mu_l)^2) }} = C
$$
Substituting this into equation (4.12) and comparing with (4.13), we have:
$$
p_k(x) = \exp (x \frac {\mu_k} {\sigma^2} - \frac {\mu_k^2} {2 \sigma^2}) \pi_k C \\
\therefore
\log(p_k(x))
= x\frac{\mu_k}{\sigma^2} - \frac{\mu_k^2}{2\sigma^2} + \log(\pi_k) + \log(C)
= \delta_k(x) + \log(C)
$$
Because the logarithm is monotonically increasing and $\log(C)$ does not depend on $k$, the class $k$ that maximizes $\delta_k(x)$ also maximizes $p_k(x)$.
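The equivalence can also be checked numerically. The sketch below uses made-up parameters for three classes with a shared variance (`mu`, `prior`, `sigma` and `x_grid` are assumptions chosen for illustration) and compares the class picked by the posterior with the class picked by the discriminant.

```{r}
# Hypothetical parameters for K = 3 classes with a shared variance
mu <- c(-1, 0.5, 2)
prior <- c(0.3, 0.5, 0.2)
sigma <- 1.2
x_grid <- seq(-4, 5, by = 0.1)

# Posterior p_k(x) from equation (4.12), one column per class
num <- sapply(seq_along(mu), function(k) prior[k] * dnorm(x_grid, mu[k], sigma))
posterior <- num / rowSums(num)

# Discriminant delta_k(x) from equation (4.13), one column per class
delta <- sapply(seq_along(mu), function(k) {
  x_grid * mu[k] / sigma^2 - mu[k]^2 / (2 * sigma^2) + log(prior[k])
})

# Both rules should pick the same class at every grid point
same_class <- all(apply(posterior, 1, which.max) == apply(delta, 1, which.max))
same_class
```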
# Question 3
As in the previous question, but without the assumption that $\sigma_1^2 = \dots = \sigma_K^2$,
collect the factors of equation (4.12) that do not depend on $k$:
$$
\frac {\frac {1} {\sqrt{2 \pi}}} {\sum_{l=1}^K { \pi_l \frac {1} {\sqrt{2 \pi} \sigma_l} \exp(- \frac {1} {2 \sigma_l^2} (x - \mu_l)^2) }} = C
$$
Substituting this into equation (4.12), we have:
$$
p_k(x) = \exp(-\frac{(x - \mu_k)^2}{2 \sigma_k^2}) \frac{\pi_k}{\sigma_k} C \\
\therefore
\log(p_k(x)) = \delta_k(x) + \log(C), \quad
\delta_k(x) = -\frac{(x - \mu_k)^2}{2 \sigma_k^2} + \log(\frac{\pi_k}{\sigma_k})
$$
Since $\log(C)$ is the same for every class, maximizing $p_k(x)$ is equivalent to maximizing $\delta_k(x)$.
So $\delta_k(x)$ is a quadratic function of $x$, and the Bayes classifier is not linear in this case.
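A numerical check may help here. Assuming hypothetical class-specific parameters (`mu`, `sigma`, `prior` below are invented for illustration), the gap between the log posterior and the quadratic expression $-\frac{(x - \mu_k)^2}{2 \sigma_k^2} + \log(\frac{\pi_k}{\sigma_k})$ should be identical across classes at each $x$, since it equals $\log(C)$.

```{r}
# Hypothetical parameters for K = 3 classes with class-specific variances
mu <- c(-1, 0.5, 2)
sigma <- c(0.8, 1.5, 1.1)
prior <- c(0.3, 0.5, 0.2)
x_grid <- seq(-4, 5, by = 0.1)

# Log posterior log(p_k(x)), one column per class
num <- sapply(seq_along(mu), function(k) prior[k] * dnorm(x_grid, mu[k], sigma[k]))
log_posterior <- log(num / rowSums(num))

# Quadratic expression: -(x - mu_k)^2 / (2 sigma_k^2) + log(pi_k / sigma_k)
quad <- sapply(seq_along(mu), function(k) {
  -(x_grid - mu[k])^2 / (2 * sigma[k]^2) + log(prior[k] / sigma[k])
})

# The gap varies with x but should not vary with k
gap <- log_posterior - quad
max_gap_spread <- max(abs(gap - gap[, 1]))
max_gap_spread
```

A spread close to zero means the quadratic expression and the posterior rank the classes identically at every grid point.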
# Question 4
## 4a ~ 4d
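On average, the fraction of observations used is 0.1 (10%) for $p = 1$, 0.01 (1%) for $p = 2$, and $10^{-100}$ for $p = 100$.
As $p$ increases, the fraction of observations near a given test point decreases exponentially.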
## 4e
The side length of a $p$-dimensional hypercube that contains 10% of the observations is $0.1^{\frac1p}$.
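The numbers for $p = 1$, $2$ and $100$ can be computed directly. This is a small sketch; the variable names are illustrative.

```{r}
# Dimensions considered in the exercise
dims <- c(1, 2, 100)

# Fraction of observations used when taking a 10% range in every dimension
fraction_used <- 0.1^dims
fraction_used

# Side length of a hypercube that contains 10% of the observations
side_length <- 0.1^(1 / dims)
round(side_length, 3)
```

For $p = 100$ the side length is close to 1, so the "neighborhood" spans almost the whole range of every feature.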
# Question 5
5a: When the Bayes decision boundary is linear, QDA performs better than LDA on the training set because it is more flexible.
LDA performs better than QDA on the test set.
5b: When the Bayes boundary is non-linear, QDA performs better than LDA on both the training and test sets.
5c: As $n$ increases, the test accuracy of QDA improves relative to LDA, because the higher variance of the more flexible QDA matters less with a larger sample.
See the first paragraph of page 150 for reference.
5d: False. QDA is flexible enough to model a linear boundary, but its extra flexibility adds variance without reducing bias.
The higher variance produces a higher error rate on test data. See figure 4.9 for reference.
# Question 6
According to equation (4.7):
$$
p(X) = \frac{e^{\beta_0 + \beta_1X_1 + \beta_2X_2}}{1 + e^{\beta_0 + \beta_1X_1 + \beta_2X_2}} \\
= \frac{e^{-6 + 0.05 \times 40 + 3.5}}{1 + e^{-6 + 0.05 \times 40 + 3.5}} \\
= 0.378
$$
Let $h$ be the number of hours of study needed. Then:
$$
p(X) = \frac{e^{-6 + 0.05h + 3.5}}{1 + e^{-6 + 0.05h + 3.5}} = 0.5 \\
\Rightarrow -6 + 0.05h + 3.5 = 0 \\
h = 50
$$
The student needs to study 50 hours to have a 50% chance of getting an A in the class.
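Both parts can be reproduced in R. The sketch below uses `plogis()`, the logistic function in base R, with the coefficients that appear in the equations above ($\beta_0 = -6$, $\beta_1 = 0.05$, $\beta_2 = 1$); the variable names are illustrative.

```{r}
# Coefficients: intercept, hours studied, GPA
beta <- c(-6, 0.05, 1)

# (a) Probability of an A for 40 hours of study and a GPA of 3.5
p_a <- plogis(sum(beta * c(1, 40, 3.5)))
round(p_a, 3)

# (b) Hours needed for a 50% chance: solve beta0 + beta1 * h + beta2 * 3.5 = 0
h <- -(beta[1] + beta[3] * 3.5) / beta[2]
h
```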
# Question 7
Let 1 denote "Yes" and 2 denote "No". With the notation of equation (4.15) we have: $\mu_1 = 10, \mu_2 = 0, \pi_1 = 0.8, \pi_2 = 0.2, \sigma^2 = 36$.
Substituting these into equation (4.12) with $x = 4$:
```{r}
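# Numerators of equation (4.12): pi_k * f_k(4) with a normal density, sd = 6
item1 <- 0.8 / sqrt(2*pi) / 6 * exp(-1 / (2 * 36) * (4 - 10)^2)  # "Yes"
item2 <- 0.2 / sqrt(2*pi) / 6 * exp(-1 / (2 * 36) * (4 - 0)^2)   # "No"
# Posterior probability of "Yes" given x = 4
p1x <- item1 / (item1 + item2)
p1x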
```
The probability that the company issues a dividend this year is 75.2%.
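As a cross-check, the same posterior can be computed with `dnorm()` instead of writing out the normal density by hand. This is only an alternative sketch of the same calculation; the variable names are illustrative.

```{r}
# Class 1 = "Yes", class 2 = "No"
class_prior <- c(0.8, 0.2)
class_mean <- c(10, 0)
common_sd <- 6
x_obs <- 4

# pi_k * f_k(x) for each class, then normalize as in equation (4.12)
joint <- class_prior * dnorm(x_obs, mean = class_mean, sd = common_sd)
posterior_yes <- joint[1] / sum(joint)
round(posterior_yes, 3)
```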
# Question 8
For KNN with $k = 1$, the training error rate is 0%, because the nearest neighbor of any training observation is the observation itself. The reported 18% is the average of the training and test error rates, so the test error rate of KNN is $2 \times 18\% - 0\% = 36\%$, which is higher than that of logistic regression (30%).
So far logistic regression is the better choice. That said, I would try larger values of $k$ to look for a better KNN solution.
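The arithmetic behind the 36% figure is short enough to spell out. In the sketch below, the 18% average error rate comes from the exercise statement, and the variable names are illustrative.

```{r}
knn_avg_error <- 0.18       # average of training and test error (from the exercise)
knn_train_error <- 0        # 1-NN reproduces every training label
logistic_test_error <- 0.30

# average = (train + test) / 2, so test = 2 * average - train
knn_test_error <- 2 * knn_avg_error - knn_train_error
knn_test_error

# TRUE means logistic regression has the lower test error
knn_test_error > logistic_test_error
```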
# Question 9
## 9a
Let $p$ denote the probability of default. Then:
$$
\frac{p}{1-p} = 0.37 \Rightarrow p = \frac{0.37}{1+0.37} \approx 0.27
$$
## 9b
The odds of default are:
$$
\frac{0.16}{1-0.16} \approx 0.19
$$
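Both conversions can be wrapped in two small helper functions. `odds_to_prob()` and `prob_to_odds()` are hypothetical names introduced here for illustration.

```{r}
# Convert between odds and probability
odds_to_prob <- function(odds) odds / (1 + odds)
prob_to_odds <- function(p) p / (1 - p)

round(odds_to_prob(0.37), 2)   # 9a
round(prob_to_odds(0.16), 2)   # 9b
```

The two functions are inverses of each other, so `prob_to_odds(odds_to_prob(0.37))` recovers 0.37 up to floating-point error.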