---
title: Markov Models
date: 2024-03-07
knitr:
  opts_chunk:
    echo: false
    message: false
    warning: false
    dev: ragg_png
format:
  html:
    mermaid:
      theme: neutral
filters:
  - codeblocklabel
categories:
  - compling
---
```{r}
library(reticulate)
library(tidyverse)
library(gt)
```
I started [the notes on ngram models](04_ngrams.qmd) saying that we can think of them like finite-state automata, but with probabilities on each transition between states. Another word for this is "Markov Model."
## Markov Chains
Any system that has a set of states that you can transition between probabilistically could be called a Markov Chain. They have applications well beyond ngram models, and even beyond linguistics.
### Working out ngram probabilities
Let's say we had a Markov Chain with "just" three states:
```{mermaid}
stateDiagram
direction LR
[*] --> A: 0.5
[*] --> B: 0.25
[*] --> C: 0.25
A --> [*]: 0.1
A --> A: 0.2
A --> B: 0.3
A --> C: 0.4
B --> [*]: 0.9
B --> C: 0.1
C --> [*]: 0.2
C --> C: 0.7
C --> B: 0.1
```
It's going to be easier to represent all of the transition probabilities from one state to the next as a "transition matrix":
```{r}
tibble(
from = c("start", "A", "B", "C", "end"),
start = 0,
A = c(0.5, 0.2, 0, 0, 0),
B = c(0.25, 0.3, 0, 0.1, 0),
C = c(0.25, 0.4, 0.1, 0.7, 0),
end = c(0, 0.1, 0.9, 0.2, 1)
) -> prob_table
prob_table |>
gt() |>
tab_spanner(
label = "to",
columns = start:end
)
```
### 👻 Matrix Multiplication 🤖
- What is the probability that we'll wind up in the "end state" in 3 steps?
Or, if we asked this question about sentences generated by an ngram model:
- What is the probability that a sentence will be 3 words long?
We can get at this with "matrix multiplication".
```{r}
prob_table |>
select(-1) |>
as.matrix()->
foo
```
```{python}
trans_mat = r.foo
```
```{python}
#| echo: true
import numpy as np
trans_mat
```
To get the probability of our state after the first step, we need to get that top row of probabilities. One way to do this is to represent our *current* state (the start state) like this:
```{python}
#| echo: true
start_state = np.array([
1, # start
0, # A
0, # B
0, # C
0, # end
])
start_state
```
The 1 represents the probability we're in the starting position, and the 0s represent that we're in no other position. If we treat this like a "row vector", we can get the probability of the next state with matrix multiplication.
$$
[1, 0, 0, 0, 0] \left[ \begin{array}{ccccc}
0 & 0.5 & 0.25 & 0.25 & 0\\
0 & 0.2 & 0.3 & 0.4 & 0.1 \\
0 & 0 & 0 & 0.1 & 0.9\\
0 & 0 & 0.1 & 0.7 & 0.2 \\
0 & 0 & 0 & 0 & 1 \end{array} \right]
$$
In matrix multiplication, we multiply each *row* from the left-hand side by each *column* on the right-hand side, and sum the results together. In numpy, there's an operator `@` for this.
```{python}
#| echo: true
first_step = start_state @ trans_mat
first_step
```
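For intuition, we can reproduce that first step without `@`, multiplying the row vector into each column by hand and summing. (This is just a sketch: it re-enters `trans_mat` explicitly so the chunk stands on its own, and `by_hand` is my own name.)

```{python}
#| echo: true
import numpy as np

start_state = np.array([1, 0, 0, 0, 0])

# state order: start, A, B, C, end; rows are "from", columns are "to"
trans_mat = np.array([
    [0.0, 0.5, 0.25, 0.25, 0.0],
    [0.0, 0.2, 0.30, 0.40, 0.1],
    [0.0, 0.0, 0.00, 0.10, 0.9],
    [0.0, 0.0, 0.10, 0.70, 0.2],
    [0.0, 0.0, 0.00, 0.00, 1.0],
])

# for each column j: multiply each row entry by the matching
# column entry, then sum — exactly what `@` does for us
by_hand = np.array([
    sum(start_state[i] * trans_mat[i, j] for i in range(5))
    for j in range(5)
])
by_hand
```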
Now, our `first_step` row vector represents the probability that we're in each state. How do we get the probabilities of our location after the second step? Just multiply again!
```{python}
#| echo: true
second_step = first_step @ trans_mat
second_step
```
And for the third step:
```{python}
#| echo: true
third_step = second_step @ trans_mat
third_step
```
::: {.callout-note collapse="true"}
## Full Matrix Multiplication
```{python}
#| echo: true
trans_mat @ trans_mat @ trans_mat
```
:::
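As a sanity check (this simulation isn't in the original notes, and the names are mine), we can also estimate the 3-step answer empirically: sample many random walks through the chain and count how often they've been absorbed into the end state.

```{python}
#| echo: true
import numpy as np

rng = np.random.default_rng(seed=3)

# same transition matrix as above, re-entered so this chunk is self-contained
# state order: start, A, B, C, end
trans_mat = np.array([
    [0.0, 0.5, 0.25, 0.25, 0.0],
    [0.0, 0.2, 0.30, 0.40, 0.1],
    [0.0, 0.0, 0.00, 0.10, 0.9],
    [0.0, 0.0, 0.10, 0.70, 0.2],
    [0.0, 0.0, 0.00, 0.00, 1.0],
])

def walk(n_steps):
    """Start in state 0 and take n_steps random transitions."""
    state = 0
    for _ in range(n_steps):
        state = rng.choice(5, p=trans_mat[state])
    return state

# fraction of walks sitting in the end state (index 4) after 3 steps
end_prob = np.mean([walk(3) == 4 for _ in range(20_000)])
end_prob
```

With this many samples, the estimate should land close to the exact value from the matrix multiplication above (about 0.57).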
This repeated multiplication also has an interesting long-run behavior. For a small two-state matrix like the one below, where every state can reach every other, the rows settle down toward the chain's stationary distribution as we keep multiplying:
```{python}
#| echo: true
net_mat = np.array([
    [0.9, 0.1],
    [0.3, 0.7]
])
# equivalent to chaining net_mat @ net_mat @ ... with 17 copies of net_mat
np.linalg.matrix_power(net_mat, 17)
```
## Hidden Markov Models
"Hidden" Markov Models, or HMMs, have had a lot of applications in acoustics, specifically in Automatic Speech Recognition and Forced Alignment. The core idea is that you can't directly observe the state, but you can observe something *about* the state. One analogy is to use the weather.
Let's say we are, for whatever reason, unable to directly observe the weather outside. But we *are* able to observe whether or not the sidewalk is wet. And we do know the following:
| | Wet Sidewalk | Dry Sidewalk |
|:---:|-------------:|-------------:|
| 🌧️ | 0.9 | 0.1 |
| 🌞 | 0.2 | 0.8 |
And we also know that perhaps the best predictor of today's weather is yesterday's weather:
| | 🌧️ | 🌞 |
|:---:|----:|----:|
| 🌧️ | 0.8 | 0.2 |
| 🌞 | 0.2 | 0.8 |
So we have our general Markov Model:
```{mermaid}
flowchart TB
subgraph hidden
direction LR
sunny --> rainy
sunny --> sunny
rainy --> rainy
rainy --> sunny
end
subgraph observable
sunny --> wet
sunny --> dry
rainy --> wet
rainy --> dry
end
```
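One way to get a feel for this model is to sample from it: pick a hidden weather state, emit a sidewalk observation from it, transition to the next day's weather, and repeat. (A sketch with my own variable names, using the two tables above; starting from a uniformly random weather state is my assumption, not something the tables specify.)

```{python}
#| echo: true
import numpy as np

rng = np.random.default_rng(seed=10)

states = ["rainy", "sunny"]
observations = ["wet", "dry"]

# P(tomorrow's weather | today's weather); rows/columns: rainy, sunny
trans = np.array([
    [0.8, 0.2],
    [0.2, 0.8],
])
# P(sidewalk | weather); rows: rainy, sunny; columns: wet, dry
emit = np.array([
    [0.9, 0.1],
    [0.2, 0.8],
])

state = rng.choice(2)  # assumption: start in a random weather state
weather, sidewalk = [], []
for _ in range(8):
    weather.append(states[state])
    sidewalk.append(observations[rng.choice(2, p=emit[state])])
    state = rng.choice(2, p=trans[state])

print(weather)
print(sidewalk)
```

In the HMM setting, only the second list (the sidewalk) is observable; the first list (the weather) is the hidden sequence we'd like to recover.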
If we get the sequence
```
wet wet wet dry wet dry dry dry
```
we can actually work out the probability of the weather state at each point in time.
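One standard way to do this is the forward algorithm: carry a vector of probabilities over the hidden states, and at each time step push it through the transition matrix and weight it by how well each state explains the new observation. Here's a sketch for the sequence above (assuming, since the notes don't say, that rain and sun start out equally likely):

```{python}
#| echo: true
import numpy as np

# P(tomorrow's weather | today's weather); rows/columns: rainy, sunny
trans = np.array([
    [0.8, 0.2],
    [0.2, 0.8],
])
# P(sidewalk | weather); rows: rainy, sunny; columns: wet, dry
emit = np.array([
    [0.9, 0.1],
    [0.2, 0.8],
])

obs = ["wet", "wet", "wet", "dry", "wet", "dry", "dry", "dry"]
obs_idx = [0 if o == "wet" else 1 for o in obs]

# belief[i] = P(weather today = i, all observations so far)
belief = np.array([0.5, 0.5]) * emit[:, obs_idx[0]]
for o in obs_idx[1:]:
    belief = (belief @ trans) * emit[:, o]

# normalize to get P(weather on the last day | the whole sequence)
posterior = belief / belief.sum()
posterior
```

After three dry days in a row, the sunny state should come out much more probable than the rainy one.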
### Application to ASR