# Linear Quadratic control
The _Linear Quadratic (LQ)_ problem is a classical control problem in which the dynamical system is linear and the cost function to be minimized is quadratic.

## Dynamics
We consider a linear Gaussian dynamical system
\begin{align}
x_{k+1}=A x_{k}+B u_k+ w_k,
\label{Eq:x}
\end{align}

where $x_k \in \mathbb{R}^{n}$ and $u_k \in \mathbb{R}^{m}$ are the state and the control input vectors, respectively. The vector $w_k \in \mathbb{R}^{n}$ denotes the process noise, drawn i.i.d. from a Gaussian distribution $\mathcal{N}(\mathbf{0},W_w)$.
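
To make the dynamics concrete, here is a minimal simulation sketch. The function name `simulate`, the `policy` callable, and the numerical values of `A`, `B`, and `W_w` are illustrative assumptions, not part of the original text.

```python
import numpy as np

def simulate(A, B, W_w, policy, x0, steps, rng):
    """Roll out x_{k+1} = A x_k + B u_k + w_k with w_k ~ N(0, W_w)."""
    n = A.shape[0]
    # Draw all process-noise samples up front; result has shape (steps, n).
    noise = rng.multivariate_normal(np.zeros(n), W_w, size=steps)
    xs = [np.asarray(x0, dtype=float)]
    us = []
    for w in noise:
        x = xs[-1]
        u = policy(x)                 # control input u_k, shape (m,)
        us.append(u)
        xs.append(A @ x + B @ u + w)  # next state x_{k+1}
    return np.array(xs), np.array(us)

# Hypothetical 2-state, 1-input system (values chosen only for illustration).
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
W_w = 0.01 * np.eye(2)

rng = np.random.default_rng(0)
zero_input = lambda x: np.zeros(1)    # open-loop policy u_k = 0
xs, us = simulate(A, B, W_w, zero_input, np.zeros(2), steps=50, rng=rng)
print(xs.shape, us.shape)
```

The rollout returns one more state than inputs, because the initial state is stored as well.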



## Reward 
We define the _quadratic running cost_ as
\begin{align}
r(y_k,u_k)=y_k^T R_y y_k+u_k^T R_u u_k
\label{Eq:RunCost}
\end{align}
where $R_y \geq 0$ and $R_u>0$ are the output and the control weighting matrices respectively. The _average cost associated with the policy $u_k=K y_k$_ is defined by
\begin{align}
\lambda(K)=\lim_{\tau \rightarrow \infty} \frac{1}{\tau} \mathbf{E}[ \sum_{t=1}^{\tau} r(y_t,K y_t) ]
\label{Eq:Lam:pi}
\end{align}
which does not depend on the initial state of the system. For the linear system, we define the _value function_ associated with a given policy $u_k=K y_k$ as
\begin{align}
V(y_k,K)=\mathbf{E}[\sum_{t=k}^{+\infty} (r(y_t,K y_t)-\lambda(K)) | y_k].
\label{Eq:V:differntial}
\end{align}
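
As a rough numerical illustration of the average cost, the sketch below evaluates $\lambda(K)$ in two ways: from the stationary state covariance, and as a long-run time average along one simulated trajectory. It assumes full state measurement ($y_k = x_k$, which the text above does not state) and a stable closed loop $A + BK$; the function names and all numerical values are hypothetical.

```python
import numpy as np

def average_cost(A, B, K, R_y, R_u, W_w):
    """lambda(K) for u_k = K x_k, ASSUMING y_k = x_k and a stable A + B K."""
    A_c = A + B @ K
    n = A.shape[0]
    if np.max(np.abs(np.linalg.eigvals(A_c))) >= 1.0:
        raise ValueError("A + B K is not stable; no stationary covariance exists")
    # Stationary covariance solves Sigma = A_c Sigma A_c^T + W_w.
    # With row-major vec: vec(A_c Sigma A_c^T) = kron(A_c, A_c) vec(Sigma).
    vec_sigma = np.linalg.solve(np.eye(n * n) - np.kron(A_c, A_c), W_w.flatten())
    Sigma = vec_sigma.reshape(n, n)
    # E[x^T R_y x + u^T R_u u] with u = K x equals trace((R_y + K^T R_u K) Sigma).
    return float(np.trace((R_y + K.T @ R_u @ K) @ Sigma))

def monte_carlo_average_cost(A, B, K, R_y, R_u, W_w, steps, rng):
    """Time average of the running cost along one simulated trajectory."""
    n = A.shape[0]
    noise = rng.multivariate_normal(np.zeros(n), W_w, size=steps)
    x = np.zeros(n)
    total = 0.0
    for w in noise:
        x = A @ x + B @ (K @ x) + w   # advance the closed-loop system
        u = K @ x
        total += x @ R_y @ x + u @ R_u @ u
    return total / steps

# Hypothetical system and gain (closed-loop eigenvalues are both 0.9).
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
K = np.array([[-1.0, -2.0]])
R_y, R_u, W_w = np.eye(2), np.array([[1.0]]), 0.01 * np.eye(2)

lam_exact = average_cost(A, B, K, R_y, R_u, W_w)
lam_mc = monte_carlo_average_cost(A, B, K, R_y, R_u, W_w, 50_000, np.random.default_rng(0))
print(lam_exact, lam_mc)
```

The two numbers should agree up to sampling error, and the Monte Carlo estimate should tighten as `steps` grows.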

## Solvability Criterion
The problem is solved by a gain $K^*$ for which the policy $u_k = K^* y_k$ minimizes $V(y_k, K)$.

## Why is LQ an interesting setup?
Why solve an LQ problem with RL when we could simply estimate the linear model?

* The LQ problem has a celebrated closed-form solution. Because the exact analytical solution is known, it is an ideal benchmark for RL algorithms: we can measure how close a learned policy comes to the optimum.
* It is theoretically tractable.
* It is practical in various engineering domains.
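
The closed-form solution mentioned in the list can be sketched numerically. The snippet below assumes full state measurement ($y_k = x_k$) and known matrices $A$ and $B$, and obtains the optimal gain by iterating the discrete-time Riccati recursion to a fixed point. The function name `lq_gain` and the example values are hypothetical; a library routine such as `scipy.linalg.solve_discrete_are` could be used in place of the iteration.

```python
import numpy as np

def lq_gain(A, B, R_y, R_u, iters=10_000, tol=1e-12):
    """Return (K, P) with u_k = K x_k, ASSUMING y_k = x_k.

    P is a fixed point of the Riccati recursion
    P = R_y + A^T P A - A^T P B (R_u + B^T P B)^{-1} B^T P A.
    """
    P = np.array(R_y, dtype=float)
    for _ in range(iters):
        G = np.linalg.solve(R_u + B.T @ P @ B, B.T @ P @ A)
        P_next = R_y + A.T @ P @ (A - B @ G)
        converged = np.max(np.abs(P_next - P)) < tol
        P = P_next
        if converged:
            break
    # The minus sign is folded into K because the policy is written as u_k = K x_k.
    K = -np.linalg.solve(R_u + B.T @ P @ B, B.T @ P @ A)
    return K, P

# Hypothetical 2-state, 1-input system (values chosen only for illustration).
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
R_y, R_u, W_w = np.eye(2), np.array([[1.0]]), 0.01 * np.eye(2)

K, P = lq_gain(A, B, R_y, R_u)
print("gain K:", K)
print("closed-loop spectral radius:", np.max(np.abs(np.linalg.eigvals(A + B @ K))))
print("optimal average cost trace(P W_w):", np.trace(P @ W_w))
```

Under these assumptions, `trace(P @ W_w)` is the optimal average cost, which gives a reference value against which a learned gain can be compared.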

## Reinforcement Learning for the LQ problem
We apply two basic RL routines, namely _Policy Gradient (PG)_ and $Q$-_learning_, to the LQ problem.
Here is a list of related files to study:

* [Prepare a virtual environment.](Preparation.ipynb) If you have already prepared the virtual environment, skip this step, but remember to select it as the kernel when running code in a Jupyter notebook.
* [Policy Gradient on LQ](pg_on_lq_notebook.ipynb)
* [$Q$-learning on LQ](q_on_lq_notebook.ipynb)