# Linear Quadratic control
The _Linear Quadratic (LQ)_ problem is a classical control problem in which the system obeys linear dynamics and the cost function to be minimized is quadratic.

## Dynamics
We consider a linear Gaussian dynamical system

\begin{align*}
s_{k+1}=A s_{k}+B u_{k}+ w_{k},
\end{align*}

where $s_{k} \in \mathbb{R}^{n}$ and $u_{k} \in \mathbb{R}^{m}$ are the state and the control input vectors, respectively. The vector $w_{k} \in \mathbb{R}^{n}$ denotes the process noise, drawn i.i.d. from a zero-mean Gaussian distribution $\mathcal{N}(\textbf{0}, W_{w})$ with covariance matrix $W_{w}$.
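To make the dynamics concrete, here is a minimal simulation sketch in NumPy. The function name `simulate_lq`, the 2-state system matrices, the noise covariance, and the feedback gain are all hypothetical values chosen for illustration; they are not taken from the notebooks.

```python
import numpy as np

def simulate_lq(A, B, K, W, s0, n_steps, rng):
    """Roll out s_{k+1} = A s_k + B u_k + w_k under the linear policy u_k = K s_k."""
    n = A.shape[0]
    states = np.zeros((n_steps + 1, n))
    inputs = np.zeros((n_steps, B.shape[1]))
    states[0] = s0
    for k in range(n_steps):
        u = K @ states[k]                             # linear state feedback
        w = rng.multivariate_normal(np.zeros(n), W)   # process noise w_k ~ N(0, W)
        states[k + 1] = A @ states[k] + B @ u + w
        inputs[k] = u
    return states, inputs

# Hypothetical example system (assumed values, for illustration only)
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.0],
              [0.1]])
W = 0.01 * np.eye(2)             # noise covariance W_w
K = np.array([[-1.0, -1.5]])     # an illustrative stabilizing gain

rng = np.random.default_rng(0)
states, inputs = simulate_lq(A, B, K, W, np.array([1.0, 0.0]), 200, rng)
```

The arrays `states` and `inputs` hold one trajectory; with a stabilizing gain the state stays bounded and fluctuates around the origin because of the noise.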



## Cost function
We define the _quadratic running cost_ as

\begin{align*}
c_{k}=s_{k}^{T} Q s_{k}+u_{k}^T R u_{k}
\end{align*}

where $Q \geq 0$ (positive semidefinite) and $R>0$ (positive definite) are the state and the control weighting matrices, respectively. The _average cost associated with the policy_ $\pi =K s_{k}$, that is, the linear state feedback $u_{k}=K s_{k}$, is defined by

\begin{align*}
\lambda=\lim_{\tau \rightarrow \infty} \frac{1}{\tau} \mathbf{E}[ \sum_{t=1}^{\tau} c_{t} ]
\end{align*}

which, for a gain $K$ that stabilizes the closed loop, does not depend on the initial state of the system. For the linear system, we define the _value function_ associated with a given policy $\pi=K s_{k}$ as

\begin{align*}
V(s_{k})=\mathbf{E}[\sum_{t=k}^{+\infty} (c_{t}-\lambda) | s_{k}].
\end{align*}
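For a stabilizing gain $K$, a standard way to evaluate the average cost without simulation is the discrete Lyapunov equation $P = Q + K^{T} R K + (A+BK)^{T} P (A+BK)$, which gives $\lambda = \mathrm{tr}(P W_{w})$; the value function is then quadratic in $s_{k}$, equal to $s_{k}^{T} P s_{k}$ up to an additive constant. The sketch below solves this equation by vectorization with NumPy only. The function name `policy_average_cost` and all numerical values are assumptions made for illustration.

```python
import numpy as np

def policy_average_cost(A, B, Q, R, K, W):
    """Return (lambda, P) for the policy u_k = K s_k.

    P solves P = Q + K'RK + (A+BK)' P (A+BK) and lambda = trace(P W).
    """
    A_cl = A + B @ K
    if np.max(np.abs(np.linalg.eigvals(A_cl))) >= 1.0:
        raise ValueError("K is not stabilizing; the average cost may be infinite.")
    Q_K = Q + K.T @ R @ K
    n = A.shape[0]
    # vec(A_cl' P A_cl) = kron(A_cl', A_cl') vec(P), so the Lyapunov
    # equation becomes a linear system in the entries of P.
    M = np.eye(n * n) - np.kron(A_cl.T, A_cl.T)
    P = np.linalg.solve(M, Q_K.reshape(-1)).reshape(n, n)
    P = 0.5 * (P + P.T)   # remove round-off asymmetry
    return float(np.trace(P @ W)), P

# Hypothetical example system (assumed values, for illustration only)
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.0],
              [0.1]])
Q = np.eye(2)
R = np.array([[1.0]])
W = 0.01 * np.eye(2)
K = np.array([[-1.0, -1.5]])

lam, P = policy_average_cost(A, B, Q, R, K, W)
print("average cost:", lam)
```

An empirical average of the running cost over a long simulated trajectory should approach the same number, which makes this a convenient check for RL implementations.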

## Solvability Criterion
We aim to find the policy $\pi^{*}=K^{*} s_{k}$ that minimizes $V(s_{k})$.
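When the model $(A, B)$ is known, the classical solution is obtained from the discrete algebraic Riccati equation $P = Q + A^{T} P A - A^{T} P B (R + B^{T} P B)^{-1} B^{T} P A$, with $K^{*} = -(R + B^{T} P B)^{-1} B^{T} P A$ under the sign convention $u_{k} = K s_{k}$ used here. The following is a minimal sketch that solves the equation by fixed-point iteration, assuming the usual stabilizability and detectability conditions hold. The function name `solve_lq_gain`, the tolerance, and the example matrices are hypothetical.

```python
import numpy as np

def solve_lq_gain(A, B, Q, R, tol=1e-10, max_iter=10_000):
    """Iterate the Riccati recursion to a fixed point and return (K*, P)."""
    P = Q.copy()
    for _ in range(max_iter):
        BtP = B.T @ P
        G = np.linalg.solve(R + BtP @ B, BtP @ A)   # (R + B'PB)^{-1} B'PA
        P_next = Q + A.T @ P @ A - (BtP @ A).T @ G
        P_next = 0.5 * (P_next + P_next.T)          # keep P symmetric
        converged = np.max(np.abs(P_next - P)) < tol
        P = P_next
        if converged:
            break
    else:
        raise RuntimeError("Riccati iteration did not converge")
    BtP = B.T @ P
    K = -np.linalg.solve(R + BtP @ B, BtP @ A)      # optimal gain for u_k = K s_k
    return K, P

# Hypothetical example system (assumed values, for illustration only)
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.0],
              [0.1]])
Q = np.eye(2)
R = np.array([[1.0]])

K_star, P_star = solve_lq_gain(A, B, Q, R)
print("K* =", K_star)
```

A gain learned by an RL algorithm can be compared entry by entry against `K_star` computed this way.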

## Why is the LQ problem an interesting setup?
Why solve an LQ problem with RL when we could simply estimate the linear model?

* The LQ problem has a celebrated closed-form solution. This makes it an ideal benchmark for RL algorithms: because the exact analytical solution is known, we can measure how close each algorithm gets to it.
* It is theoretically tractable.
* It is practical in various engineering domains.

## Reinforcement Learning for the LQ problem
We apply two basic RL routines, namely _Policy Gradient (PG)_ and $Q$-_learning_, to the LQ problem. Here is a list of related files to study:

* [Prepare a virtual environment.](Preparation.ipynb) If you have already prepared the virtual environment, skip this step, but remember to select it as the kernel when running code in the Jupyter notebooks.
* [Policy Gradient on LQ](pg_on_lq_notebook.ipynb)
* [$Q$-learning on LQ](q_on_lq_notebook.ipynb)