-
Notifications
You must be signed in to change notification settings - Fork 0
/
index.Rmd
112 lines (93 loc) · 3.69 KB
/
index.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
---
title : Multi-armed bandits
subtitle : Algorithm comparison
author : Mojmir Vinkler
job :
framework : io2012 # {io2012, html5slides, shower, dzslides, ...}
highlighter : highlight.js # {highlight.js, prettify, highlight}
hitheme : tomorrow #
widgets : [mathjax] # {mathjax, quiz, bootstrap}
mode : selfcontained # {standalone, draft}
knit : slidify::knit2slides
github :
user: Marigold
repo: slidifyCoursera
---
## Multi-armed-bandit problem
> In probability theory, the **multi-armed bandit problem** is the problem a gambler faces at a row of slot machines, sometimes known as "one-armed bandits", when deciding which machines to play, how many times to play each machine and in which order to play them. When played, each machine provides a random reward from a distribution specific to that machine. The objective of the gambler is to maximize the sum of rewards earned through a sequence of lever pulls.
### Aims of this presentation
> 1. Present two possible solutions to this problem.
> 2. Compare them using real data.
--- .class #id
## General logic
1. Set wins and loses of all arms to 1 (`prior distribution`)
2. Run function which takes prior wins and loses as input and outputs an arm to pull
3. Pull that arm and update its wins and loses (`posterior distribution`)
4. Go back to step 2 and repeat
```{r echo=FALSE}
mab = function(which_arm, arm_probs, game_rounds){
# step 1. - assign priors
priors = rep(list(c(1,1)), length(arm_probs))
score = c()
# step 4. - repeat for game rounds
for(i in 1:game_rounds){
# step 2. - choose arm
chosen_arm = which_arm(priors)
# pull the lever and see if we won
if(arm_probs[chosen_arm] > runif(1)){
q = c(1,0)
} else {
q = c(0,1)
}
score = c(score, q[1])
# step 3. - update posterior
priors[[chosen_arm]] = priors[[chosen_arm]] + q
}
return(score=score)
}
```
We will call logic with `mab` function. Following is an example of algorithm which always pulls first arm.
```{r echo=T}
arm_probs = c(0.05, 0.1, 0.2) # arm true probabilities (unobserved)
game_rounds = 400 # number of rounds to play
which_arm = (function(posterior) sample(1:3,1)) # function which chooses an arm by random
reward_naive = mab(which_arm, arm_probs, game_rounds)
cat('Total reward', sum(reward_naive))
```
---
## Epsilon-greedy strategy
The best lever is selected for a proportion $1 - \epsilon$ of the trials, and a lever is selected at random (with uniform probability) for a proportion $\epsilon$.
```{r echo=T}
epsilon_greedy = function(posterior, epsilon=0.1){
if(epsilon < runif(1)){
return(sample(1:length(posterior), 1))
} else {
return(which.max(lapply(posterior, function(e) e[1] / (e[1] + e[2]))))
}
}
reward_eps = mab(epsilon_greedy, arm_probs, game_rounds)
cat('Total reward from epsilon-greedy', sum(reward_eps))
```
---
## Bayesian bandits
An idea of bayesian bandit is that number of pulls for a given lever should match its actual probability of being the optimal lever.
```{r echo=T}
bayesian_bandit = function(posterior){
p = c()
for(arm in posterior){
p = c(p, rbeta(1, arm[1], arm[2]))
}
return(which.max(p))
}
reward_bayes = mab(bayesian_bandit, arm_probs, game_rounds)
cat('Total reward from bayesian bandits', sum(reward_bayes))
```
---
## Comparison
```{r echo=F, fig.align = 'center'}
library(ggplot2)
df = data.frame(reward=c(cumsum(reward_naive), cumsum(reward_eps), cumsum(reward_bayes)),
algo=rep(c('Naive', 'Epsilon', 'Bayes'), each=game_rounds),
round=rep(1:game_rounds, 3))
print(qplot(round, reward, data=df, color=algo))
```