## Approximate Dynamic Programming Example

We consider an optimal control problem similar to the one described in the [model predictive control example](https://github.com/cvxgrp/codegen/blob/main/examples/MPC.ipynb). Here, however, the system matrices are functions of the state $x \in \mathbb{R}^n$, i.e., we have the nonlinear system $\left( A(x), B(x)\right) \in \left( \mathbb{R}^{n \times n}, \mathbb{R}^{n \times m} \right)$, and the control input $u \in \mathbb{R}^m$ is the optimization variable. Over a multi-step horizon, the dynamics equation $x_{i+1} = A(x_i) x_i + B(x_i) u_i$ would be a non-convex constraint, because the predicted states $x_i$ would themselves be variables. We therefore perform Approximate Dynamic Programming (ADP) [1]: we predict just one time step ahead from the measured state $x$, for which $A(x)$ and $B(x)$ are constant matrices, and approximate the infinite-horizon cost as $\left(A(x) x + B(x) u\right)^T P \left(A(x) x + B(x) u\right)$. We solve the optimization problem

\begin{equation}
\begin{array}{ll}
\text{minimize} \quad & \left(A(x) x + B(x) u\right)^T P \left(A(x) x + B(x) u\right) + u^T R u\\
\text{subject to} \quad & \Vert u \Vert_2 \leq 1,
\end{array}
\end{equation}

where $u \in \mathbb{R}^m$ is the variable, constrained to the Euclidean unit ball. The cost matrices are positive definite: $P \in \mathbb{S}_{++}^n$ and $R \in \mathbb{S}_{++}^m$. We reformulate the problem to be [DPP-compliant](https://www.cvxpy.org/tutorial/advanced/index.html#disciplined-parametrized-programming), i.e.,

\begin{equation}
\begin{array}{ll}
\text{minimize} \quad & \Vert f+ G u \Vert_2^2 + \Vert R^{1/2} u \Vert_2^2\\
\text{subject to} \quad &\Vert u \Vert_2 \leq 1,
\end{array}
\end{equation}

where the new set of parameters contains $f = P^{1/2} A(x) x$, $G = P^{1/2} B(x)$, and $R^{1/2}$. Let's define the corresponding CVXPY problem.

In [None]:
import cvxpy as cp

# define dimensions
n, m = 6, 3

# define variables
u = cp.Variable(m, name='u')

# define parameters
Rsqrt = cp.Parameter((m, m), name='Rsqrt')
f = cp.Parameter(n, name='f')
G = cp.Parameter((n, m), name='G')

# define objective
objective = cp.Minimize(cp.sum_squares(f+G@u) + cp.sum_squares(Rsqrt@u))

# define constraints
constraints = [cp.norm(u, 2) <= 1]

# define problem
problem = cp.Problem(objective, constraints)
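
The equivalence between the original objective and its DPP-compliant form can be checked numerically. The following is a minimal sketch using NumPy only; the random test data and the helper names `random_pd` and `sqrtm_pd` are assumptions made for illustration and are not part of the example itself.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 6, 3

def random_pd(k):
    # hypothetical helper: random symmetric positive definite matrix
    M = rng.standard_normal((k, k))
    return M @ M.T + k * np.eye(k)

def sqrtm_pd(S):
    # hypothetical helper: symmetric square root via eigendecomposition
    w, V = np.linalg.eigh(S)
    return V @ np.diag(np.sqrt(w)) @ V.T

# random problem data (illustrative only)
P, R = random_pd(n), random_pd(m)
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, m))
x = rng.standard_normal(n)
u = rng.standard_normal(m)

# parameters of the reformulated problem
Psqrt, Rsqrt = sqrtm_pd(P), sqrtm_pd(R)
f = Psqrt @ A @ x
G = Psqrt @ B

# original objective vs. sum-of-squares form
x_next = A @ x + B @ u
original = x_next @ P @ x_next + u @ R @ u
reformulated = np.sum((f + G @ u) ** 2) + np.sum((Rsqrt @ u) ** 2)

print(np.isclose(original, reformulated))
```

Both expressions agree because $P^{1/2}$ and $R^{1/2}$ are symmetric, so $z^T P z = \Vert P^{1/2} z \Vert_2^2$ and $u^T R u = \Vert R^{1/2} u \Vert_2^2$.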

Assign parameter values and solve the problem. In this case, the state $x = \left[p^T v^T\right]^T$ consists of the position $p \in \mathbb{R}^3$ and velocity $v \in \mathbb{R}^3$ of a rigid body in three-dimensional space; rotational dynamics are not considered. The control input $u$ represents aerodynamic actuation: multiplied elementwise with the velocity, it yields the force vector that acts on the body's center of mass. Air resistance enters through terms that are quadratic in the velocity. The discretization step is denoted by $t_d \in \mathbb{R}_{++}$.
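
Read off the `dynamics` function defined next, the model described above can be written out as follows. The symbol $\mu$ for the body's mass (`mass` in the code) and the elementwise product $\odot$ are notation introduced here for illustration only. In continuous time,

\begin{equation}
\dot{p} = v, \qquad \dot{v} = -v \odot v + \frac{1}{\mu} \, v \odot u,
\end{equation}

and a forward Euler step of length $t_d$ gives the state-dependent matrices

\begin{equation}
A(x) = I + t_d \begin{bmatrix} 0 & I \\ 0 & -\operatorname{diag}(v) \end{bmatrix}, \qquad
B(x) = \frac{t_d}{\mu} \begin{bmatrix} 0 \\ \operatorname{diag}(v) \end{bmatrix}.
\end{equation}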

In [None]:
import numpy as np

def dynamics(x):
    
    # continuous-time dynamics
    A_cont = np.array([[0, 0, 0, 1, 0, 0],
                       [0, 0, 0, 0, 1, 0],
                       [0, 0, 0, 0, 0, 1],
                       [0, 0, 0, -x[3], 0, 0],
                       [0, 0, 0, 0, -x[4], 0],
                       [0, 0, 0, 0, 0, -x[5]]])
    mass = 1
    B_cont = np.concatenate((np.zeros((3,3)), 
                             (1/mass)*np.diag(x[3:])), axis=0)

    # discrete-time dynamics
    td = 0.1
    A = np.eye(n)+td*A_cont
    B = td*B_cont
    
    return A, B

# cost parameters
Rsqrt.value = np.sqrt(0.1)*np.eye(m)
Psqrt = np.eye(n)

# measured state and the resulting problem parameters
x = np.array([2, 2, 2, -1, -1, 1])
A, B = dynamics(x)
f.value = Psqrt @ A @ x
G.value = Psqrt @ B

val = problem.solve()
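
As a rough cross-check that does not depend on CVXPY, the same problem can be solved from its optimality conditions: either the unconstrained minimizer already satisfies $\Vert u \Vert_2 \leq 1$, or the solution lies on the boundary and satisfies $(G^T G + R + \lambda I) u = -G^T f$ for some $\lambda > 0$, which can be found by bisection. The sketch below is not part of the original example; the function name `solve_ball_qp` is hypothetical, and the hard-coded `f_val` and `G_val` are assumed to be the values the cell above produces for $x = (2, 2, 2, -1, -1, 1)$ with $P = I$.

```python
import numpy as np

def solve_ball_qp(f, G, Rsqrt, iters=100):
    """Minimize ||f + G u||^2 + ||Rsqrt u||^2 subject to ||u||_2 <= 1.

    Hypothetical helper: bisection on the Lagrange multiplier of the
    norm constraint (not the method used by CVXPY or CVXPYgen).
    """
    H = G.T @ G + Rsqrt.T @ Rsqrt          # positive definite
    g = G.T @ f
    u = -np.linalg.solve(H, g)             # unconstrained minimizer
    if np.linalg.norm(u) <= 1:
        return u
    # ||u(lam)|| decreases in lam and is below 1 at lam = ||g||
    lo, hi = 0.0, np.linalg.norm(g)
    eye = np.eye(len(g))
    for _ in range(iters):
        lam = 0.5 * (lo + hi)
        u = -np.linalg.solve(H + lam * eye, g)
        if np.linalg.norm(u) > 1:
            lo = lam
        else:
            hi = lam
    # use the upper end of the bracket so that the result is feasible
    return -np.linalg.solve(H + hi * eye, g)

# parameter values assumed from the cell above (x = [2, 2, 2, -1, -1, 1], P = I)
f_val = np.array([1.9, 1.9, 2.1, -1.1, -1.1, 0.9])
G_val = np.vstack((np.zeros((3, 3)), np.diag([-0.1, -0.1, 0.1])))
Rsqrt_val = np.sqrt(0.1) * np.eye(3)

u_check = solve_ball_qp(f_val, G_val, Rsqrt_val)
objective_check = (np.sum((f_val + G_val @ u_check) ** 2)
                   + np.sum((Rsqrt_val @ u_check) ** 2))
print(u_check, objective_check)
```

If the parameters are set as above, `objective_check` should agree with the `val` returned by `problem.solve()` up to solver tolerance, and `u_check` with `u.value`.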

Generating C source for the problem is as easy as:

In [None]:
from cvxpygen import cpg

cpg.generate_code(problem, code_dir='ADP_code')

Now you can use a Python wrapper around the generated code as a custom CVXPY solve method.

In [None]:
from ADP_code.cpg_solver import cpg_solve
import numpy as np
import dill as pickle
import time

# load the serialized problem formulation
with open('ADP_code/problem.pickle', 'rb') as file:  # avoid shadowing the parameter f
    prob = pickle.load(file)

# assign parameter values
prob.param_dict['Rsqrt'].value = np.sqrt(0.1)*np.eye(m)
prob.param_dict['f'].value = np.matmul(Psqrt, np.matmul(A, x))
prob.param_dict['G'].value = np.matmul(Psqrt, B)

# solve problem conventionally
t0 = time.time()
# note: no solver options are passed here, so CVXPY uses its own default solver and settings,
# which may differ from those used by the generated code
val = prob.solve()
t1 = time.time()
print('\nCVXPY\nSolve time: %.3f ms' % (1000 * (t1 - t0)))
print('Objective function value: %.6f\n' % val)

# solve problem with C code via Python wrapper
prob.register_solve('CPG', cpg_solve)
t0 = time.time()
val = prob.solve(method='CPG')
t1 = time.time()
print('\nCVXPYgen\nSolve time: %.3f ms' % (1000 * (t1 - t0)))
print('Objective function value: %.6f\n' % val)

\[1\] Wang, Yang, Brendan O'Donoghue, and Stephen Boyd. "Approximate dynamic programming via iterated Bellman inequalities." International Journal of Robust and Nonlinear Control 25.10 (2015): 1472-1496.