## DP Algorithms: A Closer Look

### Terminology 

Consider the iterative policy evaluation update, which computes $V_{k+1}$ from $V_k$ for a fixed policy $\pi$:
<br><br>

$$V_{k+1}(s)\leftarrow\sum_{a}\pi(s,a)\sum_{s^{\prime}}\mathcal{P}_{s s^{\prime}}^{a}\left[\mathcal{R}_{s s^{\prime}}^{a}+\gamma V_{k}(s^{\prime})\right]$$

![](CS6140_images/2023-09-03_06-38.png)

### Schematic View

![](CS6140_images/2023-09-03_06-40.png)

Bellman expectation equation (the fixed point):

$$V(s) = \sum_{a}\pi(s,a)\sum_{s^{\prime}}\mathcal{P}_{s s^{\prime}}^{a}\left[\mathcal{R}_{s s^{\prime}}^{a}+\gamma V(s^{\prime})\right]$$

Iterative policy evaluation update:

$$V_{k+1}(s)\leftarrow\sum_{a}\pi(s,a)\sum_{s^{\prime}}\mathcal{P}_{s s^{\prime}}^{a}\left[\mathcal{R}_{s s^{\prime}}^{a}+\gamma V_{k}(s^{\prime})\right]$$
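
The update above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the array layout (`P[a, s, s']`, `R[a, s, s']`, `pi[s, a]`), the function name `policy_evaluation`, the stopping tolerance, and the two-state example MDP are all assumptions made for this sketch.

```python
import numpy as np

def policy_evaluation(P, R, pi, gamma=0.9, tol=1e-8, max_sweeps=10_000):
    """Iterative policy evaluation with full-width backups.

    Assumed layout (hypothetical, chosen for this sketch):
      P[a, s, s2]  transition probability P^a_{ss'}
      R[a, s, s2]  expected reward R^a_{ss'}
      pi[s, a]     probability of taking action a in state s
    """
    n_states = P.shape[1]
    V = np.zeros(n_states)
    for _ in range(max_sweeps):
        # Q[a, s] = sum_{s'} P[a, s, s'] * (R[a, s, s'] + gamma * V[s'])
        Q = np.einsum("ast,ast->as", P, R + gamma * V)
        # V_new[s] = sum_a pi[s, a] * Q[a, s]
        V_new = np.einsum("sa,as->s", pi, Q)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
    return V

# Toy MDP: 2 states, 2 actions (0 = stay, 1 = switch), reward 1 for landing in state 1.
P = np.array([np.eye(2), np.eye(2)[::-1]])  # shape (actions, states, next states)
R = np.zeros((2, 2, 2))
R[:, :, 1] = 1.0
pi = np.full((2, 2), 0.5)                   # uniform random policy

V = policy_evaluation(P, R, pi, gamma=0.9)
print(V)  # both entries are close to 5.0
```

For this toy problem the fixed point can be checked by hand: by symmetry $V = 0.5 + 0.9\,V$, so $V = 5$ in both states.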

### Drawbacks of DP Algorithms

- Requires full prior knowledge of the environment's dynamics (the transition probabilities $\mathcal{P}$ and rewards $\mathcal{R}$)
- Practical only for small or medium-sized discrete state spaces
  - For large problems, DP suffers from Bellman’s curse of dimensionality
- DP uses full-width backups
  - Every action and every successor state is considered in each update
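
To make the cost of full-width backups concrete, here is a rough operation count. It is a sketch under stated assumptions: a tabular MDP with dense transitions, where $\mathcal{S}$ and $\mathcal{A}$ (symbols introduced here, not used elsewhere in these notes) denote the state and action sets.

$$\underbrace{|\mathcal{S}|}_{\text{states } s}\times\underbrace{|\mathcal{A}|}_{\text{actions } a}\times\underbrace{|\mathcal{S}|}_{\text{successors } s^{\prime}} = |\mathcal{S}|^{2}\,|\mathcal{A}| \quad \text{terms per sweep}$$

If a state is described by $n$ variables with $m$ values each, then $|\mathcal{S}| = m^{n}$. For example, $m = 10$ and $n = 6$ give $|\mathcal{S}| = 10^{6}$, so a single sweep with $|\mathcal{A}| = 4$ touches about $4 \times 10^{12}$ terms.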

## Model-Free Prediction: Monte Carlo Methods

### MC: Key Idea

$$\begin{align*}
  V^{\pi}(s)\quad & \stackrel{\mathrm{def}}{=}\quad\mathbb{E}_{\pi}(G_{t}|s_{t}=s)=\mathbb{E}_{\pi}\left(\sum_{k=0}^{\infty}\gamma^{k}r_{t+k+1}|s_{t}=s\right)  \\
  &= \quad \mathbb{E}_{\pi}\left[r_{t+1} + \gamma V^{\pi}(s_{t+1}) \mid s_{t}=s\right]
\end{align*}$$

- To estimate these expectations without a model, use sampled returns

### MC: Policy Evaluation 

- Goal: Evaluate $V^\pi (s)$ using experience (trajectories) generated under policy $\pi$
  $$s_0,a_0,r_1,s_1,a_1,r_2,\cdots,s_T$$
- Recall that 
  $$V^{\pi}(s) =\mathbb{E}_{\pi}(G_{t}|s_{t}=s)=\color{red}\mathbb{E}_{\pi}\left(\sum_{k=0}^{\infty}\gamma^{k}r_{t+k+1}|s_{t}=s\right) $$
- The idea is to replace the expected return with the sample mean of the returns $G_t$ observed from state $s$.
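
Written out, the sample-mean estimate looks as follows. The symbols $N(s)$ (the number of returns recorded from state $s$) and $G^{(i)}(s)$ (the $i$-th such return) are notation assumed for this sketch.

$$V^{\pi}(s)\;\approx\;\frac{1}{N(s)}\sum_{i=1}^{N(s)} G^{(i)}(s)$$

For instance, if three recorded returns from $s$ are $4$, $6$ and $5$, the estimate is $(4+6+5)/3 = 5$. When the recorded returns are independent samples, the law of large numbers suggests the estimate approaches $V^{\pi}(s)$ as $N(s)\to\infty$.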

### MC: Evaluation Schematics

![](CS6140_images/2023-09-03_07-23.png)

- Use $G_1$ to update $V^\pi(s_1)$
- Use $G_2$ to update $V^\pi(s_2)$
- Use $G_3$ to update $V^\pi(s_3)$
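
The returns used in these updates can all be computed in one backward pass over an episode. The helper below is a minimal sketch; the name `discounted_returns`, the 0-based indexing, and the example rewards are assumptions, with `rewards[t]` taken to be the reward received after leaving the state visited at step `t`.

```python
def discounted_returns(rewards, gamma):
    """Return [G_0, G_1, ..., G_{T-1}] for one episode.

    Uses the recursion G_t = r_{t+1} + gamma * G_{t+1}, with the return
    after the final step taken to be 0.
    """
    returns = [0.0] * len(rewards)
    G = 0.0
    for t in reversed(range(len(rewards))):
        G = rewards[t] + gamma * G
        returns[t] = G
    return returns

rewards = [1.0, 0.0, 2.0]  # hypothetical episode with three steps
print(discounted_returns(rewards, gamma=0.5))  # [1.5, 1.0, 2.0]
```

Each entry of the result is then the sample used to update the value estimate of the state visited at that step.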

### First-visit Monte Carlo Policy Evaluation
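
As a rough illustration of the idea named in this heading, the sketch below follows the commonly taught first-visit formulation: within each episode, only the return following the first visit to a state contributes to that state's average. It is not necessarily the exact algorithm presented in the original lecture, and the function name `first_visit_mc`, the episode format (a list of `(state, reward)` pairs, with the reward received after leaving that state), and the toy episodes are assumptions.

```python
from collections import defaultdict

def first_visit_mc(episodes, gamma=1.0):
    """Estimate V(s) as the mean of first-visit returns.

    episodes: list of episodes; each episode is a list of (state, reward)
    pairs (an assumed format for this sketch).
    """
    returns_sum = defaultdict(float)
    returns_count = defaultdict(int)
    for episode in episodes:
        # Record the time step of the first visit to each state.
        first_visit = {}
        for t, (state, _) in enumerate(episode):
            first_visit.setdefault(state, t)
        # Walk backwards so G accumulates the discounted return from step t.
        G = 0.0
        for t in reversed(range(len(episode))):
            state, reward = episode[t]
            G = reward + gamma * G
            if first_visit[state] == t:
                returns_sum[state] += G
                returns_count[state] += 1
    return {s: returns_sum[s] / returns_count[s] for s in returns_sum}

# Two hypothetical episodes over states "A" and "B".
episodes = [
    [("A", 1.0), ("B", 0.0), ("A", 2.0)],
    [("B", 3.0), ("A", 1.0)],
]
V = first_visit_mc(episodes, gamma=1.0)
print(V["A"], V["B"])  # 2.0 3.0
```

In the first episode, state `A` is visited twice, but only the return from its first visit ($1 + 0 + 2 = 3$) is counted; the second episode contributes a return of $1$ for `A`, giving the average $2$.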



<br><br><br>
$\tiny  {\textcolor{#808080}{\boxed{\text{Reference: Dr. Vineeth, IIT Hyderabad }}}}$