This notebook provides examples to go along with the [textbook](http://underactuated.csail.mit.edu/lqr.html).  I recommend having both windows open, side-by-side!

[Click here](http://underactuated.csail.mit.edu/drake.html#notebooks) for instructions on how to run the notebook on Deepnote and/or Google Colab.

In [None]:
import numpy as np
from IPython.display import Math, clear_output, display
from pydrake.all import (DiscreteAlgebraicRiccatiEquation,
                         DiscreteTimeLinearQuadraticRegulator,
                         LinearQuadraticRegulator, ToLatex)

from underactuated import running_as_notebook


# Discrete-time vs continuous-time LQR

Let's compare the two solutions in our simplest of systems: the double integrator.

## Continuous time

In [None]:
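def ct_demo():
    # Double integrator: state x = [q, qdot], dynamics qddot = u.
    A = np.array([[0, 1], [0, 0]])
    B = np.array([[0], [1]])
    Q = np.identity(2)
    R = np.identity(1)

    # K is the optimal feedback gain (u = -Kx); S defines the cost-to-go x'Sx.
    K, S = LinearQuadraticRegulator(A, B, Q, R)
    display(Math(f"S = {ToLatex(S)}\n"))

    eigenvalues, eigenvectors = np.linalg.eig(S)
    display(Math(f"eig(S) = {ToLatex(eigenvalues)}\n"))

ct_demo()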

## Discrete time

In [None]:
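def dt_demo(h=1):
    # Euler discretization of the double integrator with timestep h.
    A = np.array([[1, h], [0, 1]])
    B = np.array([[0], [h]])
    # Scale the costs by h so that the sum over timesteps approximates the
    # continuous-time integral cost.
    Q = h * np.identity(2)
    R = h * np.identity(1)

    K, S = DiscreteTimeLinearQuadraticRegulator(A, B, Q, R)
    display(Math(f"S = {ToLatex(S)}\n"))
    eigenvalues, eigenvectors = np.linalg.eig(S)
    display(Math(f"eig(S) = {ToLatex(eigenvalues)}\n"))

dt_demo(h=1)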

There are a few things to notice here.  First, the discrete-time solution converges to the continuous-time solution as the timestep goes to zero (try calling `dt_demo` with smaller values of `h`).  Second, the cost-to-go is always higher in the discrete-time version: you can think of discretizing time as adding a constraint that the control decision can only change once per timestep, and adding constraints can only increase the total cost.
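
If you want to check both observations numerically without Drake, here is a minimal NumPy sketch.  It assumes the same Euler discretization as `dt_demo` and solves the discrete-time Riccati equation by plain fixed-point iteration.  The helper name `dare_by_iteration`, the tolerance, and the list of timesteps are my own choices for illustration; they are not part of the textbook code.

```python
import numpy as np


def dare_by_iteration(A, B, Q, R, tol=1e-12, max_iter=200_000):
    """Iterate the Riccati difference equation until it stops changing."""
    S = Q.copy()
    for _ in range(max_iter):
        K = np.linalg.solve(R + B.T @ S @ B, B.T @ S @ A)
        S_next = Q + A.T @ S @ (A - B @ K)
        if np.max(np.abs(S_next - S)) < tol:
            return S_next
        S = S_next
    return S


# Continuous-time solution for the double integrator with Q = I, R = 1.
S_ct = np.array([[np.sqrt(3), 1.0], [1.0, np.sqrt(3)]])

gaps = {}
for h in [1.0, 0.1, 0.01]:
    A = np.array([[1.0, h], [0.0, 1.0]])
    B = np.array([[0.0], [h]])
    S_dt = dare_by_iteration(A, B, h * np.eye(2), h * np.eye(1))
    diff = S_dt - S_ct
    # Store the size of the gap and the smallest eigenvalue of the difference.
    gaps[h] = (np.linalg.norm(diff), np.linalg.eigvalsh(diff).min())
    print(f"h = {h}: ||S_dt - S_ct|| = {gaps[h][0]:.4f}, "
          f"min eig(S_dt - S_ct) = {gaps[h][1]:.4f}")
```

The gap should shrink as `h` decreases, and a positive smallest eigenvalue of `S_dt - S_ct` means the discrete-time cost-to-go is higher for every nonzero state.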

### Side note: Algebraic solution

In the DP chapter, I was able to give a nice closed-form solution for the continuous-time case: $$S = \begin{bmatrix} \sqrt{3} & 1 \\ 1 & \sqrt{3} \end{bmatrix}.$$  The discrete-time case is not as clean.  Even when $h=1$, the discrete-time Riccati equation, using $$S = \begin{bmatrix} a & b \\ b & c \end{bmatrix},$$ results in three equations: \begin{gather*} b^2 = 1+c \\ 1 + c + bc = a + ac \\ c^2 = 2b + a + ac. \end{gather*}  With a little work, you can reduce this to a quadratic equation in $b$, with a solution, $b = \frac{1}{4}(1 + \sqrt{21} + \sqrt{2(3+\sqrt{21})}).$  Not so nice!
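
Here is a quick numerical sanity check of that closed form.  The back-substitution is my own algebra rather than something from the text: the first equation gives $c = b^2 - 1$, dividing the second by $1 + c = b^2$ gives $a = 1 + b - 1/b$, and eliminating $a$ and $c$ from the third leaves $b^4 - b^3 - 3b^2 - b + 1 = 0$.  That quartic becomes the quadratic $y^2 - y - 5 = 0$ under the substitution $y = b + 1/b$.

```python
import numpy as np

# Closed-form value of b quoted in the text.
b = (1 + np.sqrt(21) + np.sqrt(2 * (3 + np.sqrt(21)))) / 4
# Back-substitution (assumed algebra, see the paragraph above).
c = b**2 - 1
a = 1 + b - 1 / b

# Left-hand side minus right-hand side of each of the three equations.
residuals = np.array([
    b**2 - (1 + c),
    (1 + c + b * c) - (a + a * c),
    c**2 - (2 * b + a + a * c),
])
print("a, b, c =", a, b, c)
print("residuals of the three equations:", residuals)

# Roots of b^4 - b^3 - 3 b^2 - b + 1 = 0; one of them should equal b.
quartic_roots = np.roots([1, -1, -3, -1, 1])
print("roots of the quartic in b:", quartic_roots)
```

The quartic has a second real root, but it gives $c = b^2 - 1 < 0$, so it does not correspond to a positive semidefinite $S$.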

### Side note: Exact integration

One can also use the exact integral of the linear system, $e^{Ah}$, in the discretization, instead of the Euler discretization.  It doesn't change the basic observation.
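
As a rough sketch of what the exact discretization looks like, the snippet below assumes the input is held constant over each timestep (a zero-order hold) and reads $A_d$ and $B_d$ off the exponential of an augmented matrix.  The helpers `expm_taylor` and `zoh_discretize` are hypothetical names of my own.  The truncated Taylor series is only for illustration; it happens to be exact for the double integrator because the augmented matrix is nilpotent.

```python
import numpy as np


def expm_taylor(M, terms=20):
    """Truncated Taylor series for the matrix exponential (illustration only)."""
    result = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for k in range(1, terms):
        term = term @ M / k          # term now holds M^k / k!
        result = result + term
    return result


def zoh_discretize(A, B, h):
    """Exact discretization, assuming u is held constant over each timestep."""
    n, m = B.shape
    M = np.zeros((n + m, n + m))
    M[:n, :n] = A
    M[:n, n:] = B
    Phi = expm_taylor(M * h)
    # Top-left block is exp(A h); top-right block is the integrated input map.
    return Phi[:n, :n], Phi[:n, n:]


A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
h = 0.1
Ad, Bd = zoh_discretize(A, B, h)
print("exact: Ad =", Ad.tolist(), "Bd =", Bd.ravel().tolist())
print("Euler: Ad =", (np.eye(2) + h * A).tolist(), "Bd =", (h * B).ravel().tolist())
```

For the double integrator the two discretizations share the same $A_d$; they differ only in the first entry of $B_d$, which is $h^2/2$ instead of $0$.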

# LQR via Fitted Value Iteration

Double integrator.  Discrete-time, infinite-horizon, discounted.  This is the "traditional" fitted value iteration, using $J=x^TSx$ as the function approximator (it is linear in the parameters $S$).  It samples both $x$ and $u$, and takes the minimum over the sampled $u$.

In [None]:
# Define the double integrator
timestep = 0.1
A = np.eye(2) + timestep*np.array([[0., 1.], [0., 0.]])
B = timestep*np.array([[0.], [1.]])
Q = timestep*np.eye(2)
R = timestep*np.eye(1)

def quadratic_regulator_cost(x, u):
    return (x * (Q @ x)).sum(axis=0) + (u * (R @ u)).sum(axis=0)

def FittedValueIteration(S_guess, gamma=0.9):

    S_optimal = DiscreteAlgebraicRiccatiEquation(A=np.sqrt(gamma) * A,
                                                 B=B,
                                                 Q=Q,
                                                 R=R / gamma)

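    x1s = np.linspace(-5, 5, 31)
    x2s = np.linspace(-4, 4, 31)
    us = np.linspace(-1, 1, 9)
    # Every (u, x1, x2) combination; u varies slowest after flattening.
    Us, X1s, X2s = np.meshgrid(us, x1s, x2s, indexing='ij')
    XwithU = np.vstack((X1s.flatten(), X2s.flatten()))
    UwithX = Us.flatten().reshape(1, -1)
    Nx = x1s.size * x2s.size  # number of state samples
    X = XwithU[:, :Nx]        # each state sample once (the first u slice)
    N = X1s.size              # number of (x, u) pairs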

    Xnext = A @ XwithU + B @ UwithX
    G = quadratic_regulator_cost(XwithU, UwithX)
    Jnext = np.zeros((1,N))
    Jd = np.zeros((1,Nx))

    def cost_to_go(S, X):
        return (X * (S @ X)).sum(axis=0)  # vectorized quadratic form

    def mean_bellman_residual(S, X, J_desired):
        N = J_desired.size
        err = cost_to_go(S, X) - J_desired
        loss = np.mean(err**2) # == 1/N ∑ᵢ [tr(Sxᵢxᵢ')-yᵢ]²
        # dloss_dS = 2/N ∑ᵢ errᵢ*xᵢxᵢ' = 2/N X*Diag(err)*X'
        # (scale the columns of X by err rather than forming Diag(err))
        dloss_dS = 2/N * (X * err.reshape(1, -1)) @ X.T
        return loss, dloss_dS

    S = np.array(S_guess, dtype=float)  # copy, so the caller's array is not modified
    eta = 0.0001
    last_loss = np.inf
    for epoch in range(1000 if running_as_notebook else 2):
        Jnext = cost_to_go(S, Xnext)
        # Rows index the sampled u, columns the sampled x; minimize over u.
        Jd[:] = np.min((G + gamma*Jnext).reshape(us.size, Nx), axis=0)
        for i in range(10):
            loss, dloss_dS = mean_bellman_residual(S, X, Jd)
            S -= eta*dloss_dS
        clear_output(wait=True)
        print(f"epoch {epoch}: loss = {loss}, S = {S.flatten()}")
        if np.linalg.norm(last_loss - loss) < 1e-8:
            break
        last_loss = loss

    print(f"eigenvalues of S: {np.linalg.eig(S)[0]}")
    print(f"optimal S= {S_optimal.flatten()}")


FittedValueIteration(S_guess=np.eye(2), gamma=0.9)

With the initial guess of $S$ and the discount factor $\gamma$ that I've used above, the algorithm converges nicely.  You'll see that it is close to the optimal $S$, but not exactly the optimal $S$.
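
The inner loop above relies on the gradient formula in the code comment, $\frac{\partial\, \text{loss}}{\partial S} = \frac{2}{N} \sum_i \text{err}_i\, x_i x_i^T$.  Here is a small standalone check of that formula against central finite differences.  The random samples, the targets, and the particular `S` are arbitrary values I made up for the test, and `loss_and_grad` is a hypothetical helper that mirrors `mean_bellman_residual`.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(2, 50))       # sample states, one per column
J_desired = rng.normal(size=50)    # arbitrary regression targets
S = np.array([[2.0, 0.3], [0.3, 1.0]])


def loss_and_grad(S):
    err = (X * (S @ X)).sum(axis=0) - J_desired
    loss = np.mean(err**2)
    # Scaling the columns of X by err is the same as X @ diag(err).
    grad = 2 / err.size * (X * err) @ X.T
    return loss, grad


loss, grad = loss_and_grad(S)

# Central finite differences, perturbing one entry of S at a time.
eps = 1e-6
grad_fd = np.zeros_like(S)
for i in range(2):
    for j in range(2):
        dS = np.zeros_like(S)
        dS[i, j] = eps
        loss_plus = loss_and_grad(S + dS)[0]
        loss_minus = loss_and_grad(S - dS)[0]
        grad_fd[i, j] = (loss_plus - loss_minus) / (2 * eps)

print("analytic gradient:\n", grad)
print("finite-difference gradient:\n", grad_fd)
```

Note that the check treats all four entries of $S$ as independent parameters, which is also what the update `S -= eta*dloss_dS` does.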

Interestingly, if you choose $\gamma$ to be much closer to 1, the algorithm will diverge. At some point, the sampled values over $u$ cause problems -- the Bellman equation does not actually have a steady-state solution for the discretized $u$ version of the problem (the cost-to-go is infinite).

To convince you that this is indeed the problem, here is a version that solves analytically for the optimal $u$ (given the estimated $S$).  It draws samples only over $x$, and converges beautifully to the true optimum, even when $\gamma=1$.

In [None]:
def FittedValueIteration(S_guess, gamma=0.9):

    S_optimal = DiscreteAlgebraicRiccatiEquation(A=np.sqrt(gamma) * A,
                                                 B=B,
                                                 Q=Q,
                                                 R=R / gamma)

    x1s = np.linspace(-5,5,31)
    x2s = np.linspace(-4,4,31)
    X1s, X2s = np.meshgrid(x1s, x2s, indexing='ij')
    X = np.vstack((X1s.flatten(), X2s.flatten()))
    N = X1s.size

    Jd = np.zeros((1,N))

    def cost_to_go(S, X):
        return (X * (S @ X)).sum(axis=0)  # vectorized quadratic form

    def policy(S, X):
        # Closed-form minimizer over u of u'Ru + gamma*(Ax+Bu)'S(Ax+Bu).
        return -gamma*np.linalg.inv(R + gamma*B.T @ S @ B) @ B.T @ S @ A @ X

    def mean_bellman_residual(S, X, J_desired):
        N = J_desired.size
        err = cost_to_go(S, X) - J_desired
        loss = np.mean(err**2) # == 1/N ∑ᵢ [tr(Sxᵢxᵢ')-yᵢ]²
        # dloss_dS = 2/N ∑ᵢ errᵢ*xᵢxᵢ' = 2/N X*Diag(err)*X'
        # (scale the columns of X by err rather than forming Diag(err))
        dloss_dS = 2/N * (X * err.reshape(1, -1)) @ X.T
        return loss, dloss_dS

    S = np.array(S_guess, dtype=float)  # copy, so the caller's array is not modified
    eta = 0.0001
    last_loss = np.inf
    for epoch in range(1000 if running_as_notebook else 2):
        U = policy(S, X)
        Xnext = A @ X + B @ U
        G = quadratic_regulator_cost(X, U)
        Jd = G + gamma*cost_to_go(S, Xnext)
        for i in range(10):
            loss, dloss_dS = mean_bellman_residual(S, X, Jd)
            S -= eta*dloss_dS
        clear_output(wait=True)
        print(f"epoch {epoch}: loss = {loss}, S = {S.flatten()}")
        if np.linalg.norm(last_loss - loss) < 1e-8:
            break
        last_loss = loss

    print(f"eigenvalues of S: {np.linalg.eig(S)[0]}")
    print(f"optimal S= {S_optimal.flatten()}")
    return S


FittedValueIteration(S_guess=np.eye(2), gamma=1);
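
To make the difference between the two versions concrete, the sketch below compares the closed-form minimizer used in `policy` with the best action on the 9-point grid from the sampled version, for a single one-step Bellman backup.  The matrix `S`, the state `x`, and the helper `backup` are arbitrary illustrative choices of mine, not values produced by the algorithm.

```python
import numpy as np

timestep = 0.1
gamma = 0.9
A = np.eye(2) + timestep * np.array([[0.0, 1.0], [0.0, 0.0]])
B = timestep * np.array([[0.0], [1.0]])
Q = timestep * np.eye(2)
R = timestep * np.eye(1)
S = np.array([[3.0, 1.0], [1.0, 2.0]])   # an arbitrary positive-definite guess
x = np.array([[4.0], [3.0]])             # an arbitrary state inside the sampled box


def backup(u):
    """One-step backup x'Qx + u'Ru + gamma * J(Ax + Bu) for a scalar u."""
    u = np.array([[u]])
    x_next = A @ x + B @ u
    return (x.T @ Q @ x + u.T @ R @ u + gamma * x_next.T @ S @ x_next).item()


# Closed-form minimizer, using the same expression as policy() above.
u_star = (-gamma * np.linalg.inv(R + gamma * B.T @ S @ B) @ B.T @ S @ A @ x).item()

# Best action on the same 9-point grid used in the sampled version.
us = np.linspace(-1, 1, 9)
u_grid = us[np.argmin([backup(u) for u in us])]

print(f"analytic u* = {u_star:.4f}, backup = {backup(u_star):.6f}")
print(f"best grid u = {u_grid:.4f}, backup = {backup(u_grid):.6f}")
```

For this state the unconstrained minimizer lies well outside the sampled range $[-1, 1]$, so the grid can only return its boundary value and the sampled backup is strictly larger than the analytic one.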