# Linear Quadratic control
The _Linear Quadratic (LQ)_ problem is a classical control problem in which the system obeys linear dynamics and the cost function to be minimized is quadratic.

## Dynamics
We consider a linear Gaussian dynamical system

\begin{align*}
s_{t+1}=A s_{t}+B u_{t}+ w_{t},
\end{align*}

where $s_{t} \in \mathbb{R}^{n}$ and $u_{t} \in \mathbb{R}^{m}$ are the state and the control input vectors respectively. The vector $w_{t} \in \mathbb{R}^{n}$ denotes the process noise drawn i.i.d. from a Gaussian distribution $\mathcal{N}(\textbf{0}, W_{w})$.
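As a minimal sketch of these dynamics, the snippet below advances the state by one step and rolls out a short trajectory. The matrices `A`, `B`, `W_w`, the initial state and the zero input are hypothetical values chosen only for illustration; they do not come from the notebooks.

```python
import numpy as np

def step(A, B, W_w, s, u, rng):
    """One transition s_{t+1} = A s_t + B u_t + w_t with w_t ~ N(0, W_w)."""
    w = rng.multivariate_normal(np.zeros(A.shape[0]), W_w)
    return A @ s + B @ u + w

# Hypothetical system with n = 2 states and m = 1 input (illustration only).
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
W_w = 0.01 * np.eye(2)

rng = np.random.default_rng(0)
T = 20
states = np.zeros((T + 1, 2))
states[0] = np.array([1.0, 0.0])
for t in range(T):
    u = np.zeros(1)  # zero input here; a controller would choose u_t
    states[t + 1] = step(A, B, W_w, states[t], u, rng)
```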



## Cost function
We define the _quadratic running cost_ as

\begin{align*}
c_{t}=s_{t}^{\dagger} Q s_{t}+u_{t}^{\dagger} R u_{t}
\end{align*}

where $Q \geq 0$ (positive semidefinite) and $R>0$ (positive definite) are the state and control weighting matrices, respectively, and $\dagger$ denotes the transpose. In RL terms, the reward is simply $r_t = - c_t$. The LQ literature usually derives its equations in terms of cost rather than reward, so we follow that convention and use $c_t$.
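A small worked example may make the running cost concrete. The function name `running_cost` and the numerical values of `Q`, `R`, `s` and `u` below are assumptions made for illustration.

```python
import numpy as np

def running_cost(s, u, Q, R):
    """Quadratic running cost c_t = s^T Q s + u^T R u."""
    return float(s @ Q @ s + u @ R @ u)

# Hypothetical weights and a single state/input pair (illustration only).
Q = np.eye(2)            # positive semidefinite state weight
R = np.array([[0.5]])    # positive definite control weight
s = np.array([1.0, -2.0])
u = np.array([3.0])

c = running_cost(s, u, Q, R)  # (1 + 4) + 0.5 * 9 = 9.5
r = -c                        # the corresponding reward
```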


## Solvability Criterion
Define the _value function_ associated with a given policy $\pi=K s_{t}$ as

\begin{align*}
V(s_{t})=\mathbf{E}\Big[\sum_{k=t}^{+\infty} (c_{k}-\lambda) \,\Big|\, s_{t}\Big]=\mathbf{E}\big[c_{t}-\lambda + c_{t+1}- \lambda + \cdots \,\big|\, s_{t}\big]
\end{align*}
where $\lambda$ is the _average cost associated with the policy_ $\pi =K s_{t}$:

\begin{align*}
\lambda=\lim_{T \rightarrow \infty} \frac{1}{T}  \sum_{t=1}^{T} c_{t} .
\end{align*}

We aim to find the gain $K$, and hence the policy $\pi=K s_{t}$, that minimizes $V(s_{t})$.
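The average cost $\lambda$ of a fixed linear policy can be approximated by averaging the running cost over one long rollout. The sketch below does this under stated assumptions: the system matrices, the weights and the gain `K` are hypothetical, and `K` was picked so that $A + BK$ is stable (otherwise the average does not converge).

```python
import numpy as np

def estimate_average_cost(A, B, Q, R, W_w, K, T, rng):
    """Approximate lambda = lim (1/T) sum_t c_t with a single rollout of u_t = K s_t."""
    n = A.shape[0]
    s = np.zeros(n)
    total = 0.0
    for _ in range(T):
        u = K @ s                                       # linear policy
        total += s @ Q @ s + u @ R @ u                  # running cost c_t
        w = rng.multivariate_normal(np.zeros(n), W_w)   # process noise
        s = A @ s + B @ u + w
    return total / T

# Hypothetical system, weights and gain (illustration only).
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q, R = np.eye(2), np.array([[1.0]])
W_w = 0.01 * np.eye(2)
K = np.array([[-1.0, -2.0]])

# Spectral radius of the closed loop A + B K; it must be below 1.
rho = np.max(np.abs(np.linalg.eigvals(A + B @ K)))
lam_hat = estimate_average_cost(A, B, Q, R, W_w, K, T=20_000, rng=np.random.default_rng(0))
```

A longer horizon `T` reduces the variance of the estimate at the price of more simulation time.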

## Why is the LQ problem an interesting setup?
But why solve an LQ problem with RL when we could simply estimate the linear model?

* The LQ problem has a celebrated closed-form solution. This makes it an ideal benchmark for RL algorithms: because the exact analytical solution is known, we can measure how close each algorithm gets to it.
* It is theoretically tractable.
* It is practical in various engineering domains.
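To illustrate the analytical baseline mentioned in the first bullet, the sketch below computes the optimal gain from the discrete-time algebraic Riccati equation. It assumes the sign convention $u_t = K s_t$ and solves the equation by plain fixed-point iteration, which is one simple choice among several. The function name `solve_lq` and all numerical values are hypothetical.

```python
import numpy as np

def solve_lq(A, B, Q, R, tol=1e-10, max_iter=10_000):
    """Iterate the Riccati recursion to a fixed point P, then return (P, K) with u = K s."""
    P = Q.copy()
    for _ in range(max_iter):
        G = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P_next = Q + A.T @ P @ A - A.T @ P @ B @ G
        converged = np.max(np.abs(P_next - P)) < tol
        P = P_next
        if converged:
            break
    K = -np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    return P, K

# Hypothetical system and weights (illustration only).
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q, R = np.eye(2), np.array([[1.0]])
W_w = 0.01 * np.eye(2)

P, K = solve_lq(A, B, Q, R)
avg_cost = np.trace(P @ W_w)  # optimal average cost, tr(P W_w)
```

SciPy's `scipy.linalg.solve_discrete_are` solves the same equation directly and could replace the iteration. Either way, the resulting `K` and `avg_cost` give a reference point against which a learned policy can be compared.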

## Reinforcement Learning for the LQ problem
We apply two basic RL routines, namely _Policy Gradient (PG)_ and $Q$-_learning_, to the LQ problem. The related files are:

* [Prepare a virtual environment.](Preparation.ipynb) If you have already prepared the virtual environment, skip this step, but remember to select it as the kernel when running code in the Jupyter notebooks.
* [Policy Gradient on LQ](pg_on_lq_notebook.ipynb)
* [$Q$-learning on LQ](q_on_lq_notebook.ipynb)