\documentclass[12pt]{article}
\usepackage{mathtools}
\title{Bayes's theorem and logistic regression}
\author{Allen B. Downey}
\newcommand{\logit}{\mathrm{logit}}
\renewcommand{\P}{\mathrm{P}}
\renewcommand{\O}{\mathrm{O}}
\newcommand{\LR}{\mathrm{LR}}
\newcommand{\LO}{\mathrm{LO}}
\newcommand{\LLR}{\mathrm{LLR}}
\newcommand{\OR}{\mathrm{OR}}
\newcommand{\LOR}{\mathrm{LOR}}
\newcommand{\IF}{\mathrm{if}}
\newcommand{\notH}{\neg H}
\setlength{\headsep}{3ex}
\setlength{\parindent}{0.0in}
\setlength{\parskip}{1.7ex plus 0.5ex minus 0.5ex}
\begin{document}
\maketitle
\begin{abstract}
My two favorite topics in probability and statistics are
Bayes's theorem and logistic regression. Because there are
similarities between them, I have always assumed that there is
a connection. In this note, I demonstrate the
connection mathematically, and (I hope) shed light on the
motivation for logistic regression and the interpretation of
the results.
\end{abstract}
\section{Bayes's theorem}
I'll start by reviewing Bayes's theorem, using an example that came up
when I was in grad school. I signed up for a class on Theory of
Computation. On the first day of class, I was the first to arrive. A
few minutes later, another student arrived. Because I was expecting
most students in an advanced computer science class to be male, I was
mildly surprised that the other student was female. Another female
student arrived a few minutes later, which was sufficiently
surprising that I started to think I was in the wrong room. When
another female student arrived, I was confident I was in the wrong
place (and it turned out I was).
As each student arrived, I used the observed data to update my
belief that I was in the right place. We can use Bayes's theorem to
quantify the calculation I was doing intuitively.
I'll use $H$ to represent the hypothesis that I was in the right
room, and $F$ to represent the observation that the first other
student was female. Bayes's theorem provides an algorithm for
updating the probability of $H$:
\[ \P(H|F) = \P(H)~\frac{\P(F|H)}{\P(F)}\]
where
\begin{itemize}
\item $\P(H)$ is the prior probability of $H$ before the other
student arrived.
\item $\P(H|F)$ is the posterior probability of $H$, updated based
on the observation $F$.
\item $\P(F|H)$ is the likelihood of the data, $F$, assuming that
the hypothesis is true.
\item $\P(F)$ is the total probability of the data, averaged over both hypotheses.
\end{itemize}
Before I saw the other students, I was confident I was in the right
room, so I might assign $\P(H)$ something like 90\%.
When I was in grad school, most advanced computer science classes were
90\% male, so if I was in the right room, the likelihood of the
first female student was only 10\%. And the likelihood of three
female students was only 0.1\%.
If I was in the wrong room, the likelihood of
the first female student was more like 50\%, so the likelihood
of all three was 12.5\%.
Plugging those numbers into Bayes's theorem yields $\P(H|F) = 0.64$
after one female student, $\P(H|FF) = 0.26$ after the second,
and $\P(H|FFF) = 0.07$ after the third.
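To make the first update concrete, we can expand $\P(F)$ with the
law of total probability and then apply Bayes's theorem:
\[ \P(F) = \P(F|H)~\P(H) + \P(F|\notH)~\P(\notH) =
(0.1)(0.9) + (0.5)(0.1) = 0.14 \]
\[ \P(H|F) = 0.9 \times \frac{0.1}{0.14} \approx 0.64 \]
Repeating the update with 0.64 as the new prior yields the second
and third results.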
\section{Logistic regression}
Logistic regression is based on the following functional form:
\[ \logit(p) = \beta_0 + \beta_1 x_1 + \cdots + \beta_n x_n \]
where the dependent variable, $p$, is a probability,
the $x$s are explanatory variables, and the $\beta$s are
coefficients we want to estimate. The $\logit$ function is the
log-odds, or
\[ \logit(p) = \ln \left( \frac{p}{1-p} \right) \]
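For reference, the inverse of the $\logit$ function, sometimes called
the logistic or sigmoid function, recovers the probability from the
log-odds:
\[ p = \logit^{-1}(y) = \frac{e^y}{1 + e^y} = \frac{1}{1 + e^{-y}} \]
We will need this at the end, to convert predicted log-odds back
to probabilities.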
When you present logistic regression like this, it raises
three questions:
\begin{itemize}
\item Why is $\logit(p)$ the right choice for the dependent
variable?
\item Why should we expect the relationship between $\logit(p)$
and the explanatory variables to be linear?
\item How should we interpret the estimated parameters?
\end{itemize}
The answer to all of these questions turns out to be Bayes's
theorem. To demonstrate that, I'll use a simple example where
there is only one explanatory variable. But the derivation
generalizes to multiple regression.
On notation: I'll use $\P(H)$ for the probability
that some hypothesis, $H$, is true. $\O(H)$ is the odds of the same
hypothesis, defined as
\[ \O(H) = \frac{\P(H)}{1 - \P(H)} \]
I'll use $\LO(H)$ to represent the log-odds of $H$:
\[ \LO(H) = \ln \O(H) \]
I'll also use $\LR$ for a likelihood ratio, and $\OR$ for an odds
ratio. Finally, I'll use $\LLR$ for a log-likelihood ratio, and
$\LOR$ for a log-odds ratio.
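To connect this notation to the example: the prior probability
$\P(H) = 0.9$ corresponds to
\[ \O(H) = \frac{0.9}{1 - 0.9} = 9
\qquad \mathrm{and} \qquad
\LO(H) = \ln 9 \approx 2.20 \]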
\section{Making the connection}
To demonstrate the connection between Bayes's theorem and
logistic regression, I'll start with the odds form
of Bayes's theorem. Continuing the previous example,
I could write
\begin{equation} \label{A}
\O(H|F) = \O(H)~\LR(F|H)
\end{equation}
where
\begin{itemize}
\item $\O(H)$ is the prior odds that I was in the right room,
\item $\O(H|F)$ is the posterior odds after seeing one female student,
\item $\LR(F|H)$ is the likelihood ratio of the data, given
the hypothesis.
\end{itemize}
The likelihood ratio of the data is:
\[ \LR(F|H) = \frac{\P(F|H)}{\P(F|\notH)} \]
where $\notH$ means $H$ is false.
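With the numbers from the example, the odds form reproduces the
earlier result:
\[ \O(H|F) = \O(H)~\LR(F|H) = 9 \times \frac{0.1}{0.5} = 1.8 \]
and converting back to a probability gives
$\P(H|F) = 1.8 / 2.8 \approx 0.64$, as before.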
Noticing that logistic regression is expressed in terms of
log-odds, my next move is to write the log-odds form of
Bayes's theorem by taking the log of Eqn~\ref{A}:
\begin{equation} \label{B}
\LO(H|F) = \LO(H) + \LLR(F|H)
\end{equation}
If the first student to arrive had been male, we would write
\begin{equation} \label{C} \nonumber
\LO(H|M) = \LO(H) + \LLR(M|H)
\end{equation}
Or more generally if we use $X$ as a variable to represent
the sex of the observed student, we would write
\begin{equation} \label{D}
\LO(H|X) = \LO(H) + \LLR(X|H)
\end{equation}
I'll assign $X=0$ if the observed student is female and
$X=1$ if male. Then I can write:
\begin{equation} \label{E} \nonumber
\LLR(X|H) = \begin{cases}
\LLR(F|H) & \IF ~X = 0\\
\LLR(M|H) & \IF ~X = 1
\end{cases}
\end{equation}
Or we can collapse these two expressions into one by using
$X$ as a multiplier:
\begin{equation} \label{F}
\LLR(X|H) = \LLR(F|H) + X [\LLR(M|H) - \LLR(F|H)]
\end{equation}
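Plugging in the likelihoods from the example,
$\LLR(F|H) = \ln(0.1/0.5) \approx -1.61$ and
$\LLR(M|H) = \ln(0.9/0.5) \approx 0.59$, so Eqn~\ref{F} becomes
\[ \LLR(X|H) \approx -1.61 + 2.20~X \]
which already has the linear form that appears in logistic regression.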
\section{Odds ratios}
The next move is to recognize that
the part of Eqn~\ref{F} in brackets is the log-odds ratio
of $H$. To see that, we need to look more closely at odds ratios.
Odds ratios are often used in medicine to describe the association
between a disease and a risk factor. In the example scenario, we
can use an odds ratio to express the odds of the hypothesis
$H$ if we observe a male student, relative to the odds if we
observe a female student:
\[ \OR_X(H) = \frac{\O(H|M)}{\O(H|F)} \]
I'm using the notation $\OR_X$ to represent the odds ratio
associated with the variable $X$.
Applying Bayes's theorem to
the top and bottom of the previous expression yields
\[ \OR_X(H) = \frac{\O(H)~\LR(M|H)}{\O(H)~\LR(F|H)} =
\frac{\LR(M|H)}{\LR(F|H)}\]
Taking the log of both sides yields
\begin{equation} \label{G}
\LOR_X(H) = \LLR(M|H) - \LLR(F|H)
\end{equation}
This result should look familiar, since it appears in
Eqn~\ref{F}.
\section{Conclusion}
Now we have all the pieces we need; we just have to assemble them.
Combining Eqns~\ref{F} and \ref{G} yields
\begin{equation} \label{H}
\LLR(X|H) = \LLR(F|H) + X~\LOR_X(H)
\end{equation}
Combining Eqns~\ref{D} and \ref{H} yields
\begin{equation} \label{I}
\LO(H|X) = \LO(H) + \LLR(F|H) + X~\LOR_X(H)
\end{equation}
Finally, combining Eqns~\ref{B} and \ref{I} yields
\[ \LO(H|X) = \LO(H|F) + X~\LOR_X(H) \]
We can think of this equation as the log-odds form of Bayes's theorem,
with the update term expressed as a log-odds ratio. Let's compare
that to the functional form of logistic regression:
\[ \logit(p) = \beta_0 + X \beta_1 \]
The correspondence between these equations suggests the following
interpretation:
\begin{itemize}
\item The predicted value, $\logit(p)$, is the posterior log-odds
of the hypothesis, given the observed data.
\item The intercept, $\beta_0$, is the log-odds of the
hypothesis if $X=0$.
\item The coefficient of $X$, $\beta_1$, is a log-odds ratio
that represents the odds of $H$ when $X=1$, relative to
the odds when $X=0$.
\end{itemize}
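To make the correspondence concrete, the running example implies
specific values for the coefficients:
$\beta_0 = \LO(H|F) = \ln 1.8 \approx 0.59$ and
$\beta_1 = \LOR_X(H) = \ln 9 \approx 2.20$, so
\[ \logit(p) \approx 0.59 + 2.20~X \]
Setting $X=0$ recovers $\P(H|F) \approx 0.64$; setting $X=1$ gives
$\logit(p) \approx 2.79$, or $\P(H|M) \approx 0.94$, which matches
the direct Bayesian update $\O(H|M) = 9 \times 1.8 = 16.2$.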
This relationship between logistic regression and Bayes's theorem
tells us how to interpret the estimated coefficients. It also
answers the questions I posed at the beginning of this note:
the functional form of logistic regression makes sense because
it corresponds to the way Bayes's theorem uses data to update
probabilities.
\end{document}