# Variance Analysis in Multistep Bootstrapping

To understand the $Q(\sigma)$ algorithm better, we analyze its performance and compare it with other multistep bootstrapping methods. Since $Q(\sigma)$ is inherently an off-policy algorithm (due to its ties to the n-step tree backup algorithm), we compare it to the off-policy variants of n-step SARSA and n-step expected SARSA, and to the n-step tree backup algorithm.

## Bootstrapping multistep methods
As [1] notes, bootstrapping methods work best when applied over multiple time steps, so that a significant and recognizable state change can be observed. [1] also notes that multistep methods are usually associated with eligibility traces; in this benchmark we consider only the multistep component of these algorithms.

## Theory
### N-step SARSA and N-step expected SARSA [1]
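The regular SARSA(0) algorithm learns the value of state-action pairs. After each transition from a nonterminal state $S$ (taking action $A$, receiving reward $R$, and reaching the next state $S'$, where action $A'$ is selected), it applies the update equation: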
$$ Q(S,A) \leftarrow Q(S,A) + \alpha [ R + \gamma Q(S', A') - Q(S,A)]$$
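
The one-step update above can be sketched in a few lines of tabular Python. This is a minimal illustration, not the benchmark's implementation: the function name `sarsa_update`, the list-of-lists table `Q`, the `terminal` flag, and the toy numbers are all assumptions made for the example.

```python
def sarsa_update(Q, s, a, r, s_next, a_next, alpha, gamma, terminal=False):
    """Apply one tabular SARSA(0) update in place and return the TD error."""
    # Bootstrap from Q(S', A'); a terminal next state contributes no future value.
    target = r if terminal else r + gamma * Q[s_next][a_next]
    td_error = target - Q[s][a]
    Q[s][a] += alpha * td_error
    return td_error


# Hypothetical table with 3 states and 2 actions; only Q(1, 0) is nonzero.
Q = [[0.0, 0.0], [1.0, 0.0], [0.0, 0.0]]
delta = sarsa_update(Q, s=0, a=1, r=0.5, s_next=1, a_next=0, alpha=0.5, gamma=0.9)
```

Here the target is $R + \gamma Q(S', A') = 0.5 + 0.9 \cdot 1.0$, and $Q(0, 1)$ moves half of the way toward it because $\alpha = 0.5$.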

In off-policy n-step SARSA, we still learn the value of state-action pairs but update our estimate only after $n$ steps, weighting the update by an importance sampling ratio. The ratio $\rho_{t}^{t+n}$ is defined as the relative probability under the two policies (target and behavior) of taking the $n$ actions from $A_t$ to $A_{t+n-1}$. Since $A_t$ is the action whose value is being updated, the ratio used in the update starts one step later, at $A_{t+1}$. We thus use the update equation:

$$ Q_{t+n}(S_t,A_t) = Q_{t+n-1}(S_t,A_t) + \alpha \rho_{t+1}^{t+n} [ G_t^{(n)} - Q_{t+n-1}(S_t,A_t)] $$

with $G_t^{(n)}$ the n-step return, defined in terms of estimated action values as:

$$ G_t^{(n)} = R_{t+1} + \gamma R_{t+2} + \cdots + \gamma^{n-1} R_{t+n} + \gamma^n Q_{t+n-1}(S_{t+n},A_{t+n}), \quad n \geq 1, \quad 0 \leq t \leq T-n $$
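
As a rough sketch of how the ratio, the n-step return, and the update fit together, the following tabular Python code may help. It rests on several assumptions that are not taken from the source: policies `pi` (target) and `mu` (behavior) are stored as per-state probability lists, `rewards[k]` holds $R_{k+1}$, the episode does not end before step $t+n$, and the ratio is taken over the actions $A_{t+1}$ through $A_{t+n}$ (the Sarsa return bootstraps from the sampled $A_{t+n}$; under a convention where the upper index is exclusive, the product would stop at $A_{t+n-1}$). All function names are hypothetical.

```python
def importance_ratio(pi, mu, states, actions, first, last):
    """Product of pi(A_k|S_k) / mu(A_k|S_k) for k = first..last (inclusive)."""
    rho = 1.0
    for k in range(first, last + 1):
        s, a = states[k], actions[k]
        rho *= pi[s][a] / mu[s][a]
    return rho


def n_step_sarsa_return(Q, rewards, states, actions, t, n, gamma):
    """n discounted rewards plus gamma^n * Q(S_{t+n}, A_{t+n})."""
    # rewards[k] stores R_{k+1}, the reward observed after acting at step k.
    g = sum(gamma ** i * rewards[t + i] for i in range(n))
    return g + gamma ** n * Q[states[t + n]][actions[t + n]]


def off_policy_n_step_sarsa_update(Q, pi, mu, rewards, states, actions,
                                   t, n, alpha, gamma):
    """Update Q(S_t, A_t) in place; return (rho, G) for inspection."""
    # ASSUMPTION: the ratio spans A_{t+1}..A_{t+n} and t + n is before episode end.
    rho = importance_ratio(pi, mu, states, actions, t + 1, t + n)
    g = n_step_sarsa_return(Q, rewards, states, actions, t, n, gamma)
    s, a = states[t], actions[t]
    Q[s][a] += alpha * rho * (g - Q[s][a])
    return rho, g


# Toy two-state, two-action trajectory (all values are illustrative).
pi = [[0.25, 0.75], [0.75, 0.25]]   # target policy
mu = [[0.5, 0.5], [0.5, 0.5]]       # behavior policy (uniform)
Q = [[0.0, 4.0], [0.0, 0.0]]
states, actions, rewards = [0, 1, 0], [0, 0, 1], [1.0, 2.0]
rho, g = off_policy_n_step_sarsa_update(Q, pi, mu, rewards, states, actions,
                                        t=0, n=2, alpha=0.5, gamma=0.5)
```

Because `rho` is a product of $n$ per-step ratios, its spread grows with $n$, which is one reason the variance of this update is worth comparing against the other methods.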


For off-policy n-step expected SARSA, we simply change the computation of the n-step return to:
$$ G_t^{(n)} = R_{t+1} + \cdots + \gamma^{n-1} R_{t+n} + \gamma^n \sum_a \pi(a|S_{t+n}) Q_{t+n-1}(S_{t+n},a), \quad n \geq 1, \quad 0 \leq t \leq T-n $$

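
The expected SARSA return differs from the SARSA return only in its last term, which averages the action values at $S_{t+n}$ under the target policy instead of using a single sampled action. A minimal sketch, assuming a tabular `Q` and a target policy `pi` stored as per-state probability lists, with `rewards[k]` holding $R_{k+1}$ (the function name and the toy numbers are hypothetical):

```python
def n_step_expected_sarsa_return(Q, pi, rewards, states, t, n, gamma):
    """n discounted rewards plus gamma^n * sum_a pi(a|S_{t+n}) Q(S_{t+n}, a)."""
    # rewards[k] stores R_{k+1}, the reward observed after acting at step k.
    g = sum(gamma ** i * rewards[t + i] for i in range(n))
    s_last = states[t + n]
    # Expectation of the action values at S_{t+n} under the target policy.
    expected_q = sum(p * q for p, q in zip(pi[s_last], Q[s_last]))
    return g + gamma ** n * expected_q


# Toy two-state, two-action example (all values are illustrative).
pi = [[0.25, 0.75], [0.75, 0.25]]
Q = [[0.0, 4.0], [0.0, 0.0]]
states, rewards = [0, 1, 0], [1.0, 2.0]
g_expected = n_step_expected_sarsa_return(Q, pi, rewards, states, t=0, n=2, gamma=0.5)
```

In this toy case the expectation at the final state is $0.25 \cdot 0 + 0.75 \cdot 4 = 3$, so the return is $1 + 0.5 \cdot 2 + 0.25 \cdot 3$.
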
### The $Q(\sigma)$ algorithm


## Citations and Footnotes
[1] Sutton, Richard S., and Andrew G. Barto. Reinforcement Learning: An Introduction. Cambridge: MIT Press, 1998.
