# Chapter 2: Multi-armed Bandits

## 1. *k*-armed Bandit Problem
- Simplest RL problem, with only a single state
- Set of `k` options (*actions*)
- At each time step $t$, choose an *action* $A_t$, then receive a *reward* $R_t \in \mathbb R$
- Expected reward (true *value*) of action $a$ is $q_*(a)=\mathbb E[R_t \mid A_t=a]$
- The true values and the reward distributions are unknown
- Need to estimate them: $Q_t(a) \approx q_*(a)$ is the estimated value of action $a$ at time step $t$
- The goal is to maximize the expected total reward
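
The setup above can be sketched as a small simulated environment. This is only an illustration under stated assumptions: the class name `GaussianBandit`, the method `pull`, and the choice of unit-variance Gaussian rewards are not part of the problem definition, which allows any reward distribution.

```python
import random


class GaussianBandit:
    """Hypothetical k-armed bandit: arm a pays a reward drawn from N(q_star[a], 1)."""

    def __init__(self, k, seed=None):
        self.rng = random.Random(seed)
        # True action values q_*(a); a learner would not be allowed to read these.
        self.q_star = [self.rng.gauss(0.0, 1.0) for _ in range(k)]

    def pull(self, action):
        # Reward R_t: a noisy sample centred on q_*(action).
        return self.rng.gauss(self.q_star[action], 1.0)


bandit = GaussianBandit(k=10, seed=0)

# Pulling one arm many times: the average reward approaches that arm's true value.
rewards = [bandit.pull(3) for _ in range(10_000)]
empirical_mean = sum(rewards) / len(rewards)
```

Here `empirical_mean` should land close to `bandit.q_star[3]`, which is the sense in which $q_*(a)$ is the expected reward of action $a$.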

## 2. Exploration vs Exploitation
- The *greedy action* at time $t$ is $A_t^* =\arg\max\limits_a Q_t(a)$
- *Exploiting* if $A_t = A_t^*$
- *Exploring* if $A_t \neq A_t^*$
- Exploitation maximizes the expected reward on the current step
- Exploration may produce a greater total reward in the long run
- Can't do both with any single action selection
- Need to balance exploitation and exploration
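
As a small illustration of these definitions, the sketch below picks the greedy action from a list of estimates and labels a chosen action as exploiting or exploring. The function names and the example estimates are made up for illustration; treating every tied maximum as greedy is an assumption of this sketch.

```python
def greedy_action(q_estimates):
    """Index of the largest estimate; Python's max returns the first one on ties."""
    return max(range(len(q_estimates)), key=lambda a: q_estimates[a])


def is_exploiting(action, q_estimates):
    """True if the chosen action has a maximal estimate, i.e. it is a greedy action."""
    return q_estimates[action] == max(q_estimates)


q = [0.2, 1.1, 0.7]             # hypothetical estimates Q_t(a) for k = 3
best = greedy_action(q)         # action 1 has the largest estimate
exploit = is_exploiting(1, q)   # choosing action 1 is exploiting
explore = not is_exploiting(2, q)  # choosing action 2 is exploring
```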

## 3. Action-value Methods
- Estimate the values of actions and use the estimates to make action selection decisions
- *sample-average* method:
$$Q_t(a)=\dfrac{\sum_{i=1}^{t-1}R_i \cdot \mathbb 1_{A_i=a}}{\sum_{i=1}^{t-1}\mathbb 1_{A_i=a}}$$
- $Q_t(a)$ converges to $q_*(a)$ by the law of large numbers as the number of times $a$ has been selected, $N_t(a)=\sum_{i=1}^{t-1}\mathbb 1_{A_i=a}$, grows:
$$\lim\limits_{N_t(a)\rightarrow\infty}Q_t(a)=q_*(a)$$
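
The sample-average formula can be computed directly from a history of actions and rewards. The sketch below is a minimal version; the function name and the `default` returned when an action has never been tried (where the formula's denominator is zero) are assumptions for illustration.

```python
def sample_average(actions, rewards, a, default=0.0):
    """Q_t(a): mean of the rewards received on the steps where action a was taken."""
    n = sum(1 for x in actions if x == a)  # denominator: number of times a was chosen
    if n == 0:
        return default                     # formula is undefined here; use a default
    total = sum(r for x, r in zip(actions, rewards) if x == a)  # numerator
    return total / n


# Hypothetical history A_1..A_5 and R_1..R_5
actions = [0, 1, 0, 2, 0]
rewards = [1.0, 0.5, 3.0, -1.0, 2.0]

q0 = sample_average(actions, rewards, 0)  # (1.0 + 3.0 + 2.0) / 3 = 2.0
q3 = sample_average(actions, rewards, 3)  # never selected, so the default 0.0
```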

## 4. ε-greedy Methods
- Select the greedy action most of the time
- With probability `ε`, instead pick an action uniformly at random from all actions (the greedy one included)
- Because every action keeps being selected, each $Q_t(a)$ can converge to $q_*(a)$
- `ε` can be reduced over time to try to get the best of both settings: a high `ε` explores more early on, a low `ε` exploits more later
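
The algorithm box for this section updates $Q$ with the rule $Q \leftarrow Q + \frac{1}{N}[R-Q]$ instead of storing every reward. A short derivation shows this is the sample average in incremental form. The notation here is an assumption for the derivation only: $R_i$ is the reward from the $i$-th selection of one fixed action, and $Q_n$ is its estimate after $n-1$ selections.

$$
\begin{aligned}
Q_{n+1} & = \dfrac{1}{n}\sum_{i=1}^{n}R_i
\\
& = \dfrac{1}{n}\left(R_n + \sum_{i=1}^{n-1}R_i\right)
\\
& = \dfrac{1}{n}\left(R_n + (n-1)Q_n\right)
\\
& = Q_n + \dfrac{1}{n}\left[R_n - Q_n\right]
\end{aligned}
$$

For $n=1$ the term $(n-1)Q_n$ vanishes, so $Q_2=R_1$ whatever the initial value $Q_1$ is.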

***************************************
Initialize, for $a = 1$ to $k$:
$$
\begin{aligned}
Q(a) & \leftarrow 0
\\
N(a) & \leftarrow 0
\end{aligned}
$$

Loop forever:
$$
\begin{aligned}
A & \leftarrow
    \begin{cases}
        \arg\max_a Q(a) &\text{with probability }(1-\epsilon)
        \\
        \text{a random action} &\text{with probability }\epsilon
    \end{cases}
\\
R & \leftarrow \text{bandit}(A)
\\
N(A) & \leftarrow N(A) + 1
\\
Q(A) & \leftarrow Q(A) + \dfrac{1}{N(A)}[R-Q(A)]
\end{aligned}  
$$
***************************************
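
The algorithm above might be sketched in Python as follows. Several details are assumptions of this sketch rather than part of the algorithm box: the function name, a finite `steps` count standing in for "loop forever", random tie-breaking among equal estimates, and the 3-armed Gaussian test bandit with made-up true values.

```python
import random


def epsilon_greedy_bandit(bandit, k, epsilon, steps, seed=None):
    """Run epsilon-greedy with incremental sample averages on a bandit callable."""
    rng = random.Random(seed)
    Q = [0.0] * k  # value estimates Q(a)
    N = [0] * k    # selection counts N(a)
    total_reward = 0.0
    for _ in range(steps):  # finite stand-in for "loop forever"
        if rng.random() < epsilon:
            A = rng.randrange(k)  # random action, greedy one included
        else:
            best = max(Q)
            # Greedy action; ties are broken at random (a choice of this sketch).
            A = rng.choice([a for a in range(k) if Q[a] == best])
        R = bandit(A)
        N[A] += 1
        Q[A] += (R - Q[A]) / N[A]  # incremental sample-average update
        total_reward += R
    return Q, N, total_reward


# Hypothetical test bandit: three arms, unit-variance Gaussian rewards.
env_rng = random.Random(1)
true_values = [0.0, 2.0, 1.0]


def bandit(a):
    return env_rng.gauss(true_values[a], 1.0)


Q, N, total = epsilon_greedy_bandit(bandit, k=3, epsilon=0.1, steps=5000, seed=0)
```

After the run, the arm with the highest true value (index 1 here) should have by far the largest count in `N`, and `Q[1]` should sit near its true value, while the other arms are still sampled occasionally through the `ε` branch.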
