# LQR: Dynamic Programming

Consider a standard finite-horizon LQR problem:
$$
\begin{aligned}
    \min_{x_{0:N},\,u_{0:N-1}} &\sum_{k=0}^{N-1} \Bigg(\frac{1}{2}x_k^\top Q x_k + \frac{1}{2}u_k^\top R u_k \Bigg) + \frac{1}{2}x_N^\top P x_N \\
    \textrm{s.t.} \quad &x_{k+1} = A x_k + B u_k
\end{aligned}
$$

- Value function (cost-to-go) at stage $k$: $\; V(x_k) = \frac{1}{2} x_k^\top \Pi_k x_k$, with terminal condition $\Pi_N = P$

- Bellman equation:
$$
\begin{aligned}
    V(x_k) &= \min_{u_k} \quad \ell(x_k, u_k) + V(x_{k+1}) \\
    &= \min_{u_k} \quad \frac{1}{2}x_k^\top Q x_k + \frac{1}{2}u_k^\top R u_k + \frac{1}{2}(A x_k + B u_k)^\top \Pi_{k+1} (A x_k + B u_k)  \\
    &= \min_{u_k} \quad \frac{1}{2}u_k^\top (R + B^\top \Pi_{k+1} B) u_k + x_k^\top(A^\top \Pi_{k+1} B)u_k + \textcolor{blue}{\text{constant}} \\
    &= \min_{u_k} \quad Q(x_k, u_k)
\end{aligned}
$$

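- Optimal control input $u_k^*$, from the stationarity condition of the Q-function $Q(x_k, u_k)$ (not to be confused with the state cost matrix $Q$):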
$$
\frac{\partial Q}{\partial u_k} = (R + B^\top \Pi_{k+1} B) u_k^* + B^\top \Pi_{k+1} A x_k = 0
$$
$$
\begin{aligned}
    u_k^* &= - (R + B^\top \Pi_{k+1} B)^{-1} B^\top \Pi_{k+1} A x_k \\
    & = K_k x_k
\end{aligned}
$$

- Value function at stage $k$:
$$
\begin{aligned}
    V(x_k) &= \frac{1}{2}x_k^\top Q x_k + \frac{1}{2}\textcolor{red}{u_k^{*\top}} R \textcolor{red}{u_k^{*}} + \frac{1}{2}(A x_k + B \textcolor{red}{u_k^{*}})^\top \Pi_{k+1} (A x_k + B \textcolor{red}{u_k^{*}}) \\
    &= \frac{1}{2}x_k^\top \left(  \underbrace{Q + K_k^\top R K_k + (A + BK_k)^\top \Pi_{k+1} (A + BK_k)}_{\Pi_k}  \right) x_k \\
    &= \frac{1}{2}x_k^\top \Pi_k x_k
\end{aligned}
$$
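
The single backward step derived above can be checked numerically. The sketch below is only a sanity check under stated assumptions: the matrices `A`, `B`, `Q`, `R` and `Pi_next` are random, hypothetical data (not the example used later), and `q_function` is an illustrative helper name. It verifies that the gradient vanishes at $u_k^*$, that the quadratic form with $\Pi_k$ reproduces the minimized cost, and that the expression for $\Pi_k$ agrees with the commonly used Riccati form $Q + A^\top \Pi_{k+1} A - A^\top \Pi_{k+1} B (R + B^\top \Pi_{k+1} B)^{-1} B^\top \Pi_{k+1} A$.

```python
import numpy as np

rng = np.random.default_rng(0)
nx, nu = 3, 2

# Hypothetical random problem data, used only for this check
A = rng.standard_normal((nx, nx))
B = rng.standard_normal((nx, nu))
Q = np.eye(nx)
R = 0.1 * np.eye(nu)
M = rng.standard_normal((nx, nx))
Pi_next = M @ M.T + np.eye(nx)  # symmetric positive definite Pi_{k+1}


def q_function(x, u):
    """Stage cost plus cost-to-go evaluated at the next state."""
    x_next = A @ x + B @ u
    return 0.5 * x @ Q @ x + 0.5 * u @ R @ u + 0.5 * x_next @ Pi_next @ x_next


# Feedback gain from the stationarity condition
S = R + B.T @ Pi_next @ B
K = -np.linalg.solve(S, B.T @ Pi_next @ A)

# Pi_k from the value-function expression
A_cl = A + B @ K
Pi_k = Q + K.T @ R @ K + A_cl.T @ Pi_next @ A_cl

# Equivalent Riccati form of the same update
Pi_riccati = (
    Q + A.T @ Pi_next @ A
    - A.T @ Pi_next @ B @ np.linalg.solve(S, B.T @ Pi_next @ A)
)

x = rng.standard_normal(nx)
u_star = K @ x

grad = S @ u_star + B.T @ Pi_next @ A @ x   # dQ/du at u_star
v_direct = q_function(x, u_star)            # minimized Q-function
v_quadratic = 0.5 * x @ Pi_k @ x            # 1/2 x^T Pi_k x

# Random perturbations of u_star should never lower the cost
perturbed_costs = [
    q_function(x, u_star + 0.1 * rng.standard_normal(nu)) for _ in range(100)
]

print("gradient norm at u*:", np.linalg.norm(grad))
print("V direct vs. quadratic form:", v_direct, v_quadratic)
print("smallest cost increase under perturbation:", min(perturbed_costs) - v_direct)
```

The check uses `np.linalg.solve` rather than an explicit inverse; any linear solver for the symmetric positive definite matrix $R + B^\top \Pi_{k+1} B$ would serve equally well here.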

# LQR: KKT conditions

In [34]:
import numpy as np
from scipy.linalg import cho_factor, cho_solve
from control import dare

def lqr_backward(A, B, Q, R, Qf, N):
    """
    Backward pass of the LQR algorithm (Riccati recursion).

    Returns the feedback gains Ks (N, nu, nx), with u_k = Ks[k] @ x_k,
    and the cost-to-go matrices Ps (N + 1, nx, nx), with Ps[N] = Qf.
    """
    nx = Q.shape[0]
    nu = R.shape[0]

    Ps = np.zeros((N + 1, nx, nx))
    Ks = np.zeros((N, nu, nx))

    Ps[-1, :, :] = Qf  # terminal cost-to-go
    for i in range(N - 1, -1, -1):
        P_next = Ps[i + 1, :, :]
        # R + B^T P B is symmetric positive definite, so Cholesky applies
        chofac = cho_factor(R + B.T @ P_next @ B)
        K_k = -cho_solve(chofac, B.T @ P_next @ A)
        A_cl = A + B @ K_k  # closed-loop dynamics at stage i
        P_k = Q + K_k.T @ R @ K_k + A_cl.T @ P_next @ A_cl

        Ks[i, :, :] = K_k
        Ps[i, :, :] = P_k

    return Ks, Ps

In [54]:
A = np.array([
    [4 / 3, -2 / 3], 
    [1, 0]
])
B = np.array([1, 0]).reshape(-1, 1)  # TODO
R = np.array([0.001]).reshape(-1, 1)  # TODO
Q = np.array([
    [4 / 9 + 0.001, -2 / 3],
    [-2 / 3, 1 + 0.001]
])
P = Q.copy()

Ks, Ps = lqr_backward(A, B, Q, R, P, 5)

print(f"Ks[0]: {Ks[0]}")

A_cl = A + B @ Ks[0]
eigvals, _ = np.linalg.eig(A_cl)
print(f"eigvals: {eigvals}")



Ks[0]: [[-0.34556462  0.36743129]]
eigvals: [0.49388436+0.23518847j 0.49388436-0.23518847j]
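
One way to sanity-check the backward pass is to roll the closed loop forward and compare the accumulated cost with the prediction $\frac{1}{2} x_0^\top \Pi_0 x_0$ from the value function. The sketch below repeats the recursion in a compact, self-contained form (using `np.linalg.solve` in place of the Cholesky solve) with the same problem data as above; the initial state `x0` is a hypothetical choice made only for illustration.

```python
import numpy as np

# Same problem data as in the cell above
A = np.array([[4 / 3, -2 / 3], [1.0, 0.0]])
B = np.array([[1.0], [0.0]])
R = np.array([[0.001]])
Q = np.array([[4 / 9 + 0.001, -2 / 3], [-2 / 3, 1 + 0.001]])
P = Q.copy()
N = 5

# Backward pass: cost-to-go matrices Pis[k] and gains Ks[k]
Pis = [None] * (N + 1)
Ks = [None] * N
Pis[N] = P
for k in range(N - 1, -1, -1):
    S = R + B.T @ Pis[k + 1] @ B
    Ks[k] = -np.linalg.solve(S, B.T @ Pis[k + 1] @ A)
    A_cl = A + B @ Ks[k]
    Pis[k] = Q + Ks[k].T @ R @ Ks[k] + A_cl.T @ Pis[k + 1] @ A_cl

# Forward rollout with the time-varying feedback u_k = K_k x_k
x0 = np.array([1.0, -1.0])  # hypothetical initial state
x = x0.copy()
cost = 0.0
for k in range(N):
    u = Ks[k] @ x
    cost += 0.5 * x @ Q @ x + 0.5 * u @ R @ u  # stage cost
    x = A @ x + B @ u                           # dynamics
cost += 0.5 * x @ P @ x                         # terminal cost

predicted = 0.5 * x0 @ Pis[0] @ x0

print(f"rollout cost:     {cost:.6f}")
print(f"0.5 x0' Pi_0 x0:  {predicted:.6f}")
```

If the two printed numbers agree up to rounding, the gains and cost-to-go matrices are mutually consistent for this initial state.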


In [43]:
_, _, K_inf = dare(A, B, Q, R)
K_inf = -K_inf

print(f"K_inf: {K_inf}")

K_inf: [[-0.6682962  0.6660034]]
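
The finite-horizon gain `Ks[0]` for $N = 5$ differs noticeably from `K_inf`. A rough way to see how the two relate is to run the same recursion over longer horizons and watch the first gain settle. The sketch below is self-contained and uses plain NumPy instead of `dare`; `first_gain` is a hypothetical helper name, and the sign convention $u_k = K x_k$ matches the one used above.

```python
import numpy as np

# Same problem data as above
A = np.array([[4 / 3, -2 / 3], [1.0, 0.0]])
B = np.array([[1.0], [0.0]])
R = np.array([[0.001]])
Q = np.array([[4 / 9 + 0.001, -2 / 3], [-2 / 3, 1 + 0.001]])


def first_gain(N):
    """Gain K_0 of an N-step horizon with terminal weight Q."""
    Pi = Q.copy()
    K = None
    for _ in range(N):
        S = R + B.T @ Pi @ B
        K = -np.linalg.solve(S, B.T @ Pi @ A)
        A_cl = A + B @ K
        Pi = Q + K.T @ R @ K + A_cl.T @ Pi @ A_cl
    return K


# Print K_0 for increasing horizon lengths
for N in (5, 20, 50, 200):
    print(f"N = {N:3d}: K_0 = {first_gain(N)}")

# Check that one more step no longer changes the gain
K_long = first_gain(200)
K_longer = first_gain(201)
print("change from N=200 to N=201:", np.max(np.abs(K_longer - K_long)))

# Spectral radius of the closed loop under the long-horizon gain
rho = max(abs(np.linalg.eigvals(A + B @ K_long)))
print("closed-loop spectral radius:", rho)
```

When the recursion converges, the long-horizon gain is a fixed point of the backward step, which is the condition the discrete algebraic Riccati equation encodes; comparing the last printed gain with `K_inf` above is then a direct cross-check.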
