# Simple Regulator

## Problem, State and Action Space

The simple regulator problem is a linear quadratic regulator (LQR) problem with a single state. It is an MDP with a single real-valued state and a single real-valued action.

## Transitions and Rewards

Transitions are linear Gaussian such that a successor state $s'$ is drawn from the Gaussian distribution $\mathcal{N}(s + a, 0.1^2)$. Rewards are quadratic, $R(s, a) = -s^2$, and do not depend on the action. The examples in this text use the initial state distribution $\mathcal{N}(0.3, 0.1^2)$.
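As a concrete illustration, here is a minimal sketch of these dynamics in Python with NumPy. The language, the random number generator seed, and the helper names are choices made here for illustration, not part of the original problem definition.

```python
import numpy as np

rng = np.random.default_rng(0)

def initial_state():
    # Initial state distribution N(0.3, 0.1^2) used in the examples.
    return rng.normal(0.3, 0.1)

def step(s, a):
    # Linear Gaussian transition: s' ~ N(s + a, 0.1^2).
    s_next = rng.normal(s + a, 0.1)
    # Quadratic reward that depends only on the state: R(s, a) = -s^2.
    r = -s**2
    return s_next, r

s = initial_state()
s, r = step(s, -s)  # one step under the optimal policy pi(s) = -s
```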

## Optimal Policies

Optimal finite-horizon policies cannot be derived using the methods from section 7.8 of Algorithms for Decision Making. In this case, $T_s = [1]$, $T_a = [1]$, $R_s = [-1]$, $R_a = [0]$, and $w$ is drawn from $\mathcal{N}(0, 0.1^2)$. Applying the Riccati equation requires that $R_a$ be negative definite, which it is not.
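To make the failure condition concrete, a quick numerical check (again a sketch assuming NumPy) confirms that $R_a = [0]$ has a zero eigenvalue and is therefore not negative definite:

```python
import numpy as np

T_s = np.array([[1.0]])   # state transition matrix
T_a = np.array([[1.0]])   # action transition matrix
R_s = np.array([[-1.0]])  # state reward matrix
R_a = np.array([[0.0]])   # action reward matrix

# Negative definiteness requires strictly negative eigenvalues.
eigenvalues = np.linalg.eigvalsh(R_a)
print(eigenvalues)                    # [0.]
print(bool(np.all(eigenvalues < 0)))  # False: R_a is not negative definite
```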

The optimal policy is $\pi(s) = -s$, resulting in a successor state distribution centered at the origin. In the policy gradient chapters we often learn parameterized policies of the form $\pi_{\theta}(s) = \mathcal{N}(\theta_1 s, \theta_2^2)$. In such cases, the optimal parameterization for the simple regulator problem is $\theta_1 = -1$, with $\theta_2$ asymptotically approaching zero.
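The sketch below evaluates this parameterized Gaussian policy near its optimal parameterization. It assumes NumPy, an illustrative discount factor of 0.9, and a rollout depth of 100, none of which are specified above.

```python
import numpy as np

rng = np.random.default_rng(0)

def policy(s, theta1=-1.0, theta2=1e-3):
    # Stochastic policy pi_theta(s) = N(theta_1 * s, theta_2^2).
    return rng.normal(theta1 * s, theta2)

def rollout_return(theta1, theta2, gamma=0.9, depth=100):
    # Discounted return of one rollout from the initial state distribution.
    s = rng.normal(0.3, 0.1)
    total, discount = 0.0, 1.0
    for _ in range(depth):
        total += discount * (-s**2)   # reward depends only on the state
        a = policy(s, theta1, theta2)
        s = rng.normal(s + a, 0.1)    # linear Gaussian transition
        discount *= gamma
    return total

# Average return with theta_1 = -1 and a small theta_2.
print(np.mean([rollout_return(-1.0, 1e-3) for _ in range(1000)]))
```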

The optimal value function for the simple regulator problem is also centered about the origin, with reward decreasing quadratically:

$$\begin{aligned} \mathcal{U}(s) &= -s^2 + \frac{\gamma}{1 - \gamma} \mathbb{E}_{s \sim \mathcal{N}(0, 0.1^2)} \left[-s^2 \right] \\ &\approx -s^2 - 0.010\frac{\gamma}{1 - \gamma} \end{aligned}$$
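This closed form can be sanity-checked with Monte Carlo rollouts under $\pi(s) = -s$. The sketch below again assumes NumPy and an illustrative $\gamma = 0.9$, for which the value at $s = 0.3$ is approximately $-0.18$.

```python
import numpy as np

rng = np.random.default_rng(0)
gamma, s0 = 0.9, 0.3

def discounted_return(s, depth=200):
    # Rollout under the optimal policy pi(s) = -s.
    total, discount = 0.0, 1.0
    for _ in range(depth):
        total += discount * (-s**2)
        s = rng.normal(s + (-s), 0.1)  # a = -s, so s' ~ N(0, 0.1^2)
        discount *= gamma
    return total

estimate = np.mean([discounted_return(s0) for _ in range(10_000)])
closed_form = -s0**2 - 0.010 * gamma / (1 - gamma)
print(estimate, closed_form)  # both close to -0.18
```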