# Value functions and Bellman Equations

> In this post, we will learn about value functions and Bellman equations. This is a summary of the "Fundamentals of Reinforcement Learning" course on Coursera.

- toc: true 
- badges: true
- comments: true
- author: Chanseok Kang
- categories: [Python, Coursera, Reinforcement_Learning]
- image: 

## Specifying Policies

A policy maps each state to a probability distribution over the actions available in that state.
A policy may depend only on the current state, not on time or on earlier states.

### Deterministic Policy

$$ \pi(s) = a $$

A deterministic policy maps each state to a single action.

![deterministic](image/deterministic.png)

### Stochastic policy 

$$ \pi(a \vert s) $$

A stochastic policy assigns a probability to every action in each state. These probabilities must satisfy two rules:

* $ \sum_{a \in \mathcal{A}(s)} \pi(a \vert s) = 1 $
* $ \pi(a \vert s) \ge 0 $

![stochastic](image/stochastic.png)
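
To make the two kinds of policy concrete, here is a minimal sketch in plain Python. The state names, action names, and the helpers `is_valid_policy` and `as_stochastic` are made up for illustration; they are not from the course. The check simply tests the two rules above, and the conversion shows that a deterministic policy can be seen as a stochastic one that puts probability 1 on a single action.

```python
import math

# Hypothetical toy problem: three states and two actions (names are assumptions).
ACTIONS = ["left", "right"]

# Deterministic policy: pi(s) = a
deterministic_policy = {"s0": "left", "s1": "right", "s2": "left"}

# Stochastic policy: pi(a | s), one probability per action in each state
stochastic_policy = {
    "s0": {"left": 0.5, "right": 0.5},
    "s1": {"left": 0.9, "right": 0.1},
    "s2": {"left": 0.0, "right": 1.0},
}

def is_valid_policy(pi, tol=1e-9):
    """Check pi(a|s) >= 0 and sum_a pi(a|s) = 1 for every state."""
    for action_probs in pi.values():
        if any(p < 0 for p in action_probs.values()):
            return False
        if not math.isclose(sum(action_probs.values()), 1.0, abs_tol=tol):
            return False
    return True

def as_stochastic(det_pi, actions):
    """Express a deterministic policy as a one-hot stochastic policy."""
    return {
        s: {a: (1.0 if a == chosen else 0.0) for a in actions}
        for s, chosen in det_pi.items()
    }

print(is_valid_policy(stochastic_policy))
print(is_valid_policy(as_stochastic(deterministic_policy, ACTIONS)))
```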

## Value functions

### State-value functions

The **state-value function** gives the future reward an agent can expect to receive when it starts from a particular state and follows policy $\pi$. That is, it is the expected return from a given state.

$$ v_{\pi}(s) \doteq \mathbb{E}_{\pi} [G_t \vert S_t = s] $$

Note that the return $G_t$ is the discounted sum of future rewards:

$$ G_t = \sum_{k=0}^{\infty} \gamma^k R_{t+k+1} $$
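
As a small illustration, the return can be computed for a finite list of rewards. This is only a sketch: the function name `discounted_return` and the sample rewards are assumptions, and the infinite sum is truncated to the rewards provided. Working backwards through the list uses the recursion $G_t = R_{t+1} + \gamma G_{t+1}$, which is also the first step of the Bellman derivation.

```python
def discounted_return(rewards, gamma):
    """Compute G_t = sum_k gamma^k * R_{t+k+1} for a finite reward list.

    rewards[0] plays the role of R_{t+1}, rewards[1] of R_{t+2}, and so on.
    """
    g = 0.0
    # Walk backwards so each step applies G_t = R_{t+1} + gamma * G_{t+1}.
    for r in reversed(rewards):
        g = r + gamma * g
    return g

rewards = [1.0, 0.0, 2.0]  # hypothetical rewards R_{t+1}, R_{t+2}, R_{t+3}
print(discounted_return(rewards, gamma=0.5))  # 1 + 0.5*0 + 0.25*2 = 1.5
```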

### Action-value functions

An action value describes what happens when the agent first selects a particular action. More formally, the action value of a state-action pair $(s, a)$ is the expected return if the agent selects action $a$ in state $s$ and then follows policy $\pi$.

$$ q_{\pi}(s, a) \doteq \mathbb{E}_{\pi} [G_t \vert S_t = s, A_t = a] $$

## Bellman Equation Derivation

### State-value Bellman equation

$$ \begin{aligned} v_{\pi}(s) &\doteq \mathbb{E}_{\pi}[G_t \vert S_t = s] \\
&= \mathbb{E}_{\pi}[R_{t+1} + \gamma G_{t+1} \vert S_t = s] \\
&= \sum_{a} \pi(a \vert s) \sum_{s'} \sum_{r} p(s', r \vert s, a) \big[r + \gamma \mathbb{E}_{\pi}[G_{t+1} \vert S_{t+1} = s']\big] \\
&= \sum_{a} \pi(a \vert s) \sum_{s'} \sum_{r} p(s', r \vert s, a) \Big[ r + \gamma \sum_{a'} \pi(a' \vert s') \sum_{s''} \sum_{r'} p(s'', r' \vert s', a') \big[ r' + \gamma \mathbb{E}_{\pi}[G_{t+2} \vert S_{t+2} = s''] \big] \Big] \\
&= \sum_{a} \pi(a \vert s) \sum_{s'} \sum_{r} p(s', r \vert s, a) [r + \gamma v_{\pi}(s')]\end{aligned} $$
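
The last line of the derivation can be checked numerically on a tiny example. The sketch below assumes a made-up two-state MDP (states `A` and `B`, actions `stay` and `move`, a uniform random policy, and $\gamma = 0.9$); none of these come from the course. It repeatedly applies the right-hand side of the Bellman equation as an update until the values stop changing, a procedure commonly called iterative policy evaluation, which this post does not cover.

```python
# Hypothetical two-state MDP.
# dynamics[(s, a)] is a list of (probability, next_state, reward) triples,
# i.e. the non-zero entries of p(s', r | s, a).
dynamics = {
    ("A", "stay"): [(1.0, "A", 0.0)],
    ("A", "move"): [(1.0, "B", 1.0)],
    ("B", "stay"): [(1.0, "B", 2.0)],
    ("B", "move"): [(1.0, "A", 0.0)],
}

# Uniform random policy pi(a | s) (an assumption for this example).
policy = {
    "A": {"stay": 0.5, "move": 0.5},
    "B": {"stay": 0.5, "move": 0.5},
}

gamma = 0.9

def bellman_backup(v, s):
    """Right-hand side of the state-value Bellman equation for state s."""
    total = 0.0
    for a, pi_a in policy[s].items():
        for prob, s_next, r in dynamics[(s, a)]:
            total += pi_a * prob * (r + gamma * v[s_next])
    return total

def evaluate_policy(tol=1e-10):
    """Apply the Bellman backup to every state until the values converge."""
    v = {s: 0.0 for s in policy}
    while True:
        new_v = {s: bellman_backup(v, s) for s in policy}
        if max(abs(new_v[s] - v[s]) for s in policy) < tol:
            return new_v
        v = new_v

v_pi = evaluate_policy()
print(v_pi)
```

For this particular example the two Bellman equations can also be solved by hand, giving $v_{\pi}(A) = 7.25$ and $v_{\pi}(B) = 7.75$, which the iteration should approach.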

### Action-value Bellman equation

$$ \begin{aligned} q_{\pi}(s, a) &\doteq \mathbb{E}_{\pi}[G_t \vert S_t = s, A_t = a] \\
&= \sum_{s'} \sum_{r} p(s', r \vert s, a) \big[ r + \gamma \mathbb{E}_{\pi} [G_{t+1} \vert S_{t+1} = s'] \big] \\
&= \sum_{s'} \sum_{r} p(s', r \vert s, a) \big[ r + \gamma \sum_{a'} \pi (a' \vert s') \mathbb{E}_{\pi} [G_{t+1} \vert S_{t+1} = s', A_{t+1} = a'] \big] \\
&= \sum_{s'} \sum_{r} p(s', r \vert s, a) \big[ r + \gamma \sum_{a'} \pi (a' \vert s') q_{\pi}(s', a') \big] \end{aligned} $$
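
The action-value equation can be checked the same way. The sketch below assumes a made-up two-state MDP (states `A` and `B`, actions `stay` and `move`, a uniform random policy, $\gamma = 0.9$), all hypothetical and chosen only for illustration. It iterates the right-hand side of the action-value Bellman equation until convergence, then recovers state values through $v_{\pi}(s) = \sum_{a} \pi(a \vert s) q_{\pi}(s, a)$, which follows from the two definitions above.

```python
# Hypothetical two-state MDP.
# dynamics[(s, a)] is a list of (probability, next_state, reward) triples.
dynamics = {
    ("A", "stay"): [(1.0, "A", 0.0)],
    ("A", "move"): [(1.0, "B", 1.0)],
    ("B", "stay"): [(1.0, "B", 2.0)],
    ("B", "move"): [(1.0, "A", 0.0)],
}

# Uniform random policy pi(a | s) (an assumption for this example).
policy = {
    "A": {"stay": 0.5, "move": 0.5},
    "B": {"stay": 0.5, "move": 0.5},
}

gamma = 0.9

def q_backup(q, s, a):
    """Right-hand side of the action-value Bellman equation for (s, a)."""
    total = 0.0
    for prob, s_next, r in dynamics[(s, a)]:
        # Expected action value of the next state under the policy.
        next_value = sum(
            pi_a * q[(s_next, a_next)] for a_next, pi_a in policy[s_next].items()
        )
        total += prob * (r + gamma * next_value)
    return total

def evaluate_action_values(tol=1e-10):
    """Apply the backup to every (s, a) pair until the values converge."""
    q = {sa: 0.0 for sa in dynamics}
    while True:
        new_q = {(s, a): q_backup(q, s, a) for (s, a) in dynamics}
        if max(abs(new_q[sa] - q[sa]) for sa in dynamics) < tol:
            return new_q
        q = new_q

q_pi = evaluate_action_values()

# State values recovered from action values: v(s) = sum_a pi(a|s) * q(s, a)
v_from_q = {
    s: sum(pi_a * q_pi[(s, a)] for a, pi_a in policy[s].items()) for s in policy
}
print(q_pi)
print(v_from_q)
```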