In [1]:
%%capture
%load_ext autoreload
%autoreload 2
%matplotlib inline
%load_ext training_rl
%set_random_seed 12

In [2]:
%presentation_style

In [3]:
%load_latex_macros


$\newcommand{\vect}[1]{{\mathbf{\boldsymbol{#1}} }}$
$\newcommand{\amax}{{\text{argmax}}}$
$\newcommand{\P}{{\mathbb{P}}}$
$\newcommand{\E}{{\mathbb{E}}}$
$\newcommand{\R}{{\mathbb{R}}}$
$\newcommand{\Z}{{\mathbb{Z}}}$
$\newcommand{\N}{{\mathbb{N}}}$
$\newcommand{\C}{{\mathbb{C}}}$
$\newcommand{\abs}[1]{{ \left| #1 \right| }}$
$\newcommand{\simpl}[1]{{\Delta^{#1} }}$


<img src="_static/images/aai-institute-cover.svg" alt="Snow" style="width:100%;">
<div class="md-slide title">Recent Developments in Control Theory</div>

# Introduction

So far we have only considered deterministic systems with no noise or disturbances.

In this part of the training, we will focus on stochastic systems and how MPC can be used to handle such a class of systems.

We will also see how MPC can be extended to incorporate learning from data and how it can be combined with Reinforcement Learning to make systems safer.

# Stochastic Optimal Control Problem

The methods discussed in this part deal with the problem of controlling dynamical systems
that are subject to system constraints under uncertainty, which can affect numerous parts of the
problem formulation. The system dynamics in discrete time are given by:

$$
x_{k+1} = f_t(x_k, u_k, k, w_k, \theta_t)
$$

Where:
- $x_k \in \mathbb{R}^n$ is the system state at time $k$.
- $u_k \in \mathbb{R}^m$ is the applied input at time $k$.
- $w_k$ describes a sequence of random variables corresponding to disturbances or process noise in the system, which are often assumed to be independent and identically distributed (i.i.d.).
- $\theta_t \sim \mathcal{Q}^{\theta_t}$ is a random variable describing the parametric uncertainty of the system, which is therefore constant over time.
- The subscript $t$ is used to emphasize that these quantities represent the true system dynamics or true optimal control problem. 

The true problem therefore relates to the development of an optimal controller for a distribution of systems given by $\mathcal{Q}^{\theta_t}$ under random disturbances $w_k$.
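To make the roles of $\theta_t$ and $w_k$ concrete, here is a minimal simulation sketch. The scalar dynamics `f_true`, the uniform distribution for $\theta_t$, the Gaussian noise, and the proportional policy are all illustrative assumptions, not part of the formulation above.

```python
import numpy as np

def simulate(f, x0, policy, theta, noise, T):
    """Roll out x_{k+1} = f(x_k, u_k, k, w_k, theta) for T steps."""
    xs, us = [np.asarray(x0, dtype=float)], []
    for k in range(T):
        u = policy(xs[-1], k)
        us.append(u)
        xs.append(f(xs[-1], u, k, noise[k], theta))
    return np.array(xs), np.array(us)

# Hypothetical scalar system: x_{k+1} = theta * x_k + u_k + w_k
def f_true(x, u, k, w, theta):
    return theta * x + u + w

rng = np.random.default_rng(0)
T = 20
theta = rng.uniform(0.8, 1.2)        # drawn once, then constant over the rollout
w = 0.05 * rng.standard_normal(T)    # i.i.d. process noise, one sample per step
xs, us = simulate(f_true, 1.0, lambda x, k: -0.5 * x, theta, w, T)
```

Note that $\theta_t$ is sampled once per rollout while $w_k$ is sampled at every step, which is the distinction between parametric uncertainty and process noise.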

The optimality of the controller is defined with respect to a cost or objective function. In the
presence of random model uncertainties, the cost is often defined as the expectation of a sum of
potentially time-varying stage costs of the states and inputs over a possibly infinite horizon $T$:

$$
J_t = E\left(\sum \limits_{k=0}^{T} g_t(x_k, u_k, k)\right),
$$

where the expected value is taken with respect to all random variables.
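Since the expectation is rarely available in closed form, it can be approximated by sampling. The sketch below is a plain Monte Carlo estimate of $J_t$ for a fixed policy; the scalar dynamics, the quadratic stage cost, and the distributions are assumptions chosen only for illustration.

```python
import numpy as np

def expected_cost(n_samples, T=20, seed=0):
    """Monte Carlo estimate of E[sum_{k=0}^{T} g(x_k, u_k)] and its standard error."""
    rng = np.random.default_rng(seed)
    costs = np.empty(n_samples)
    for s in range(n_samples):
        theta = rng.uniform(0.8, 1.2)            # parametric uncertainty, one draw per rollout
        x, total = 1.0, 0.0
        for k in range(T + 1):
            u = -0.5 * x                         # fixed (assumed) feedback policy
            total += x**2 + 0.1 * u**2           # assumed quadratic stage cost
            w = 0.05 * rng.standard_normal()     # process noise
            x = theta * x + u + w
        costs[s] = total
    return costs.mean(), costs.std(ddof=1) / np.sqrt(n_samples)

J_hat, std_err = expected_cost(500)
print(f"estimated cost: {J_hat:.3f} +/- {std_err:.3f}")
```

The standard error shrinks with the square root of the number of rollouts, so sampling-based estimates of this kind can become expensive when they sit inside an optimization loop.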

## Stochastic Predictive Control

The constrained stochastic optimal control problem can be formulated as:

$$
\begin{array}{ll}
J_t^* = \displaystyle\min_{\pi_{k}} & 
E\left[\sum\limits_{k=0}^{T} g_t(x_k, u_k, k)\right]
\\
\text{subject to} & x_{k + 1}= f_t(x_k, u_k, k, w_k, \theta_t)
\\
& u_k = \pi_k(x_0, \dots, x_k)\\
& \bar{W} = [w_0, \dots, w_{T - 1}] \sim \mathcal{Q}^{\bar{W}}, \quad \theta_t \sim \mathcal{Q}^{\theta_t}\\
& P\left[\bar{X} = [x_0, \dots, x_{T}] \in \bar{X}_j \right] \ge p_j, \quad \forall j = 1, \dots, n_{cx}\\
& P\left[\bar{U} = [u_0, \dots, u_{T - 1}] \in \bar{U}_j \right] \ge p_j, \quad \forall j = 1, \dots , n_{cu}\\
\end{array}
$$

Here the optimization is over a sequence of control laws $\{\pi_k\}$, each of which can make use of all information in the
form of state measurements $x_0, \dots, x_k$ up to time step $k$. Problems of this form are in
general very hard to solve, and direct efforts typically rely on some form of discretization in space
and approximate dynamic programming or reinforcement learning.

A notable exception, similar to what we have seen previously, is the case of linear systems under additive noise and quadratic stage costs in the unconstrained setting, for which an exact solution is available in the form of the standard linear quadratic regulator (LQR).
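As a reminder of how this exact solution is obtained, the following sketch computes the infinite-horizon LQR gain by iterating the discrete-time Riccati recursion. The double-integrator matrices and weights are assumed example values. For zero-mean i.i.d. additive noise and full state measurement, the same gain remains optimal (certainty equivalence).

```python
import numpy as np

def lqr_gain(A, B, Q, R, iters=500):
    """Iterate the discrete-time Riccati recursion until (approximately) stationary."""
    P = Q.copy()
    for _ in range(iters):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    return K, P

# Assumed example: double integrator with unit sampling time
A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.5], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])

K, P = lqr_gain(A, B, Q, R)
spectral_radius = np.max(np.abs(np.linalg.eigvals(A - B @ K)))
print("K =", K, "closed-loop spectral radius =", spectral_radius)
```

The control law is $u_k = -K x_k$. A dedicated Riccati solver would normally be preferred over fixed-point iteration; the loop is used here only to keep the sketch dependency-free.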

MPC approximates the previous problem by repeatedly solving a simplified version of the
problem initialized at the currently measured state $x_k$ over a shorter horizon $N$
in a receding-horizon fashion.

We introduce the prediction model:

$$
x_{i+1|k} = f(x_{i|k}, u_{i|k}, i + k, w_{i|k}, \theta),
$$

where $f$ is the prediction dynamics. It typically aims at approximating the true dynamics but often differs, e.g.,
for computational reasons or because a succinct description of the true dynamics is unavailable.

We use the subscript $i|k$ to emphasize predictive quantities, where, e.g., $x_{i|k}$ is the $i$-step-ahead prediction of the state, initialized at $x_{0|k} = x_k$.

The most widespread MPC formulations are nominal MPC schemes, which do not consider any uncertainties in the prediction model but instead rely exclusively on the compensation of uncertainties via feedback and by re-solving the problem at the next sampling instant.

In nominal MPC, the optimization can be performed over control sequences $U = [u_{0|k}, \dots , u_{N - 1|k}]$ rather than policies, resulting in the constrained optimal control problem

$$
\begin{array}{ll}
J^* = \displaystyle\min_{U} & g_f(x_{N|k}, u_{N|k}, k + N) + \sum\limits_{i=0}^{N-1} g(x_{i|k}, u_{i|k}, i + k)\\
\text{subject to} & x_{i+1|k} = f(x_{i|k}, u_{i|k}, i + k)\\
& U = [u_{0|k}, \dots, u_{N-1|k}] \in U_j, \quad \forall j = 1, \dots, n_{cu} \\
& X = [x_{0|k}, \dots, x_{N|k}] \in X_j, \quad \forall j = 1, \dots, n_{cx} \\
& x_{N|k} \in X_f\\
& x_{0|k} = x_k\\
\end{array}.
$$

The control law is then implicitly defined through the optimization problem as:

$$
\pi^{\text{MPC}}(x_k, k) = u_{0|k}^*,
$$

where $u_{0|k}^*$ is the first element of the computed optimal control sequence $U^*$.
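The receding-horizon principle can be sketched for a linear prediction model with quadratic costs and a box constraint on the input. Everything specific here is an assumption made for illustration: the double-integrator model, the weights, the bound `u_max`, and the projected-gradient solver, which stands in for a proper QP solver. State constraints and the terminal set $X_f$ are omitted.

```python
import numpy as np

def condense(A, B, N):
    """Stacked predictions [x_1; ...; x_N] = Sx @ x0 + Su @ U."""
    n, m = B.shape
    Sx = np.vstack([np.linalg.matrix_power(A, i) for i in range(1, N + 1)])
    Su = np.zeros((N * n, N * m))
    for i in range(N):                      # block row i predicts x_{i+1|k}
        for j in range(i + 1):
            Su[i*n:(i+1)*n, j*m:(j+1)*m] = np.linalg.matrix_power(A, i - j) @ B
    return Sx, Su

def mpc_control(x, A, B, Q, R, Qf, N, u_max, iters=3000):
    """Solve the box-constrained QP by projected gradient; return u_{0|k}."""
    n, m = B.shape
    Sx, Su = condense(A, B, N)
    Qbar = np.kron(np.eye(N), Q)
    Qbar[-n:, -n:] = Qf                     # terminal weight on x_{N|k}
    Rbar = np.kron(np.eye(N), R)
    H = Su.T @ Qbar @ Su + Rbar             # cost = U'HU + 2 g'U + const
    g = Su.T @ Qbar @ Sx @ x
    step = 1.0 / np.linalg.eigvalsh(H).max()
    U = np.zeros(N * m)
    for _ in range(iters):
        U = np.clip(U - step * (H @ U + g), -u_max, u_max)
    return U[:m]                            # only the first input is applied

# Assumed example system and weights
A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.5], [1.0]])
Q, R = np.eye(2), np.array([[1.0]])

x = np.array([5.0, 0.0])
applied = []
for k in range(30):                         # closed loop: re-solve at every step
    u = mpc_control(x, A, B, Q, R, Qf=Q, N=10, u_max=1.0)
    applied.append(u[0])
    x = A @ x + B @ u
print("final state:", x)
```

At every sampling instant the whole sequence $U$ is recomputed from the newly measured state, and all but its first element are discarded.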

## Control Design Challenges

### Feasibility and Optimality with Short Horizons

The predictive controller plans the system's trajectory over a finite time window of length $N$, which is usually much smaller than the task duration $T$.

For a short $N$, the controller takes only shortsighted control actions, which may be unsafe or result in
poor closed-loop performance.

For instance, in autonomous racing, a predictive controller that plans the vehicle's trajectory over a short horizon without accounting for an upcoming curve may accelerate to the point that safe turning becomes infeasible.

To mitigate the effect of the shortened horizon, a terminal cost $g_f(x_{N|k}, u_{N|k}, k + N)$
and a terminal constraint $X_f$ on the last predicted state are imposed to approximate the cost and the effect of the
constraints over the remainder of the possibly infinite control horizon $T$.
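The effect of the terminal cost is easiest to see in the unconstrained linear-quadratic case. In the sketch below (assumed double-integrator model and weights), a two-step problem whose terminal cost is the infinite-horizon cost-to-go $x^\top P_\infty x$ reproduces the infinite-horizon gain, whereas the same two-step problem with the plain stage weight as terminal cost does not.

```python
import numpy as np

def riccati_step(P, A, B, Q, R):
    """One backward step of the Riccati recursion; returns (P_prev, K)."""
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    return Q + A.T @ P @ (A - B @ K), K

def first_gain(Pf, A, B, Q, R, N):
    """Feedback gain for u_0 in an N-step problem with terminal cost x' Pf x."""
    P = Pf
    for _ in range(N):
        P, K = riccati_step(P, A, B, Q, R)
    return K

# Assumed example system and weights
A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.5], [1.0]])
Q, R = np.eye(2), np.array([[1.0]])

# Infinite-horizon cost-to-go, approximated by iterating to convergence
P_inf = Q.copy()
for _ in range(1000):
    P_inf, _ = riccati_step(P_inf, A, B, Q, R)
_, K_inf = riccati_step(P_inf, A, B, Q, R)

K_with_terminal = first_gain(P_inf, A, B, Q, R, N=2)   # g_f approximates the tail cost
K_without = first_gain(Q, A, B, Q, R, N=2)             # terminal cost ignores the tail
print(K_inf, K_with_terminal, K_without)
```

With constraints, a good terminal cost is no longer sufficient by itself, which is why the terminal set $X_f$ is imposed in addition.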

### Policy Approximation

Satisfying input and state constraints in the presence of uncertainty.

### Chance-Constraint Approximation

Ensuring computational tractability by properly reformulating constraints and costs and parameterizing control policies.

There is no systematic and universal solution to the third challenge, and often the chosen approach is application dependent. Fortunately, the first and second challenges can be addressed by using data. 

# Data-Driven Predictive Control

Model predictive control (MPC) is an established control methodology that systematically uses forecasts to compute real-time optimal control decisions. In MPC, at each time step an optimization problem is solved over a moving horizon. The objective is to find a control policy that minimizes a predicted performance index while
satisfying operating constraints.

Uncertainty in MPC is handled by optimizing over multiple uncertain forecasts. In this case, performance index and
operating constraints take the form of functions defined over a probability space, and the resulting technique is called stochastic MPC.

In this case, we are no longer searching for a deterministic control sequence but instead for a stochastic feedback control policy:

$$
u_t = \pi_t(x_t)
$$

## Scenario MPC

# Learning-Based Model Predictive Control

Learning-based MPC addresses the automated and data-driven generation or adaptation of elements of the MPC formulation such that the control performance with respect to the desired closed-loop system behavior is improved.

The setup in which this learning takes place can be diverse. For instance, offline learning
considers the adaptation of the controller between different trials or episodes of a control task,
during which data are collected.

In methods that learn online, on the other hand, the controller is adjusted during closed-loop operation (e.g., while performing repetitive tasks) or using the data collected during one task execution.

While much of the research in learning-based MPC has focused on automatically improving the model quality, which is the most obvious component affecting MPC performance, several research efforts are addressing the formulation of the MPC problem directly or utilizing the MPC concept to satisfy constraints during learning-based control.

## Learning the system dynamics

MPC relies heavily on suitable and sufficiently accurate
model representations of the system dynamics. One path of learning-based MPC considers
the automatic adjustment of the system model, either during operation or between different
operational instances.

Many learning-based MPC techniques make use of an explicit distinction between a nominal
system model $f_n$ and an additive learned term $f_l$ accommodating uncertainty:

$$
f(x, u, k, w, \theta) = f_n(x, u, k) + f_l(x, u, k, w, \theta)
$$
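One simple way to obtain the learned term is to regress the one-step prediction error of the nominal model on a set of features. The sketch below uses ordinary least squares. The scalar system, the sinusoidal model mismatch, and the feature choice are hypothetical, and in practice Gaussian processes or neural networks are common alternatives.

```python
import numpy as np

rng = np.random.default_rng(1)

def f_nominal(x, u):
    """Nominal model f_n (assumed known)."""
    return 0.9 * x + u

def f_true(x, u, w):
    """Hypothetical true system: nominal part plus an unmodeled nonlinearity."""
    return 0.9 * x + u - 0.2 * np.sin(x) + w

def features(x, u):
    """Feature map for the learned term; expects 1-D arrays of equal length."""
    return np.column_stack([x, u, np.sin(x)])

# Collect transition data (x, u, x_next) from the true system
X = rng.uniform(-2.0, 2.0, 200)
U = rng.uniform(-1.0, 1.0, 200)
W = 0.01 * rng.standard_normal(200)
X_next = f_true(X, U, W)

# Fit f_l to the residual between data and nominal prediction
residual = X_next - f_nominal(X, U)
coef, *_ = np.linalg.lstsq(features(X, U), residual, rcond=None)

def f_learned(x, u):
    """Prediction model f = f_n + f_l."""
    return f_nominal(x, u) + features(x, u) @ coef

print("learned coefficients:", coef)
```

Keeping the nominal model separate means the learned term only has to capture the mismatch, which typically needs less data than learning the full dynamics.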

## Learning the controller design

A second interesting research direction focuses less on the
prediction model and more on the remaining problem formulation, such as the employed
cost function $g$, the constraints $X$, or the terminal components $g_f$ and $X_f$, such that the
resulting closed-loop MPC controller behaves favorably with respect to the underlying task,
i.e., the stochastic optimal control problem. We discuss these approaches in Section 4.

## MPC for safe learning

A third direction is the use of MPC techniques to derive safety
guarantees for learning-based controllers. The main idea is to decouple the optimization of
the objective function $g_t$ from the requirement of constraint satisfaction, which is addressed
using MPC techniques. We discuss this research direction in Section 5.

# Safe Learning in Robotics

<figure>
    <img src="_static/images/40_comparison_model_driven_data_driven.svg" width="90%"/>
    <figcaption>
        A comparison of model-driven, data-driven, and combined approaches.
    </figcaption>
</figure>

<figure>
    <img src="_static/images/40_safe_control_block_diagram.svg" width="80%"/>
    <figcaption>
        Block diagram representing safe learning control approaches.
    </figcaption>
</figure>

## Safety Constraints

### Safety level III: constraint satisfaction guaranteed.

The system satisfies hard constraints:

$$
c_k^j(x_k, u_k, w_k) \le 0
$$

for all times $k \in \{0, \dots , N\}$ and constraint indexes $j \in \{1, \dots, n_c\}$.

### Safety level II: constraint satisfaction with high probability.

The system satisfies probabilistic constraints:

$$
P\left[c_k^j(x_k, u_k, w_k ) \le 0 \right] \ge p^j,
$$

where $P[\cdot]$ denotes the probability and $p^j \in (0, 1)$ defines the likelihood of the $j$th constraint
being satisfied, for all times $k \in \{0, \dots , N\}$ and constraint indexes $j \in \{1, \dots, n_c\}$.
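Whether a given closed loop meets such a probabilistic constraint can be checked empirically by sampling rollouts. The sketch below estimates $P[c_k(x_k) \le 0]$ per time step for a single constraint $c_k(x) = x - x_{\max}$. The scalar closed loop, the noise level, and the threshold $p = 0.9$ are assumed values for illustration.

```python
import numpy as np

def satisfaction_probability(n_rollouts=2000, N=15, x_max=0.2, seed=0):
    """Empirical estimate of P[x_k - x_max <= 0] for k = 0, ..., N."""
    rng = np.random.default_rng(seed)
    x = np.full(n_rollouts, 1.0)            # all rollouts start at x_0 = 1
    prob = np.empty(N + 1)
    for k in range(N + 1):
        prob[k] = np.mean(x - x_max <= 0.0)
        u = -0.5 * x                        # assumed feedback policy
        x = 0.9 * x + u + 0.1 * rng.standard_normal(n_rollouts)
    return prob

p = 0.9
prob = satisfaction_probability()
level_II_ok = prob >= p                     # chance constraint met at step k?
level_III_ok = prob == 1.0                  # no sampled violation at step k
for k, (pk, ok) in enumerate(zip(prob, level_II_ok)):
    print(f"k={k:2d}  P[c <= 0] ~ {pk:.3f}  level II satisfied: {ok}")
```

A sampled probability of one is only evidence for, not a proof of, hard constraint satisfaction; level III guarantees require an argument that holds for every admissible disturbance.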

### Safety level I: constraint satisfaction encouraged

The system encourages constraint satisfaction. This can be achieved in different ways:

- One way is to add a penalty term to the objective function that discourages
  the violation of constraints with a high cost. A non-negative $\epsilon_j$ is added
  to the right-hand side of the inequality in Safety level III, for all times $k \in \{0, \dots , N\}$
  and constraint indexes $j \in \{1, \dots, n_c\}$:
  
  $$
  c_k^j(x_k, u_k, w_k) \le \epsilon_j,
  $$

  and an appropriate penalty term $l(\boldsymbol{\epsilon}) \ge 0$, with $l(\boldsymbol{\epsilon}) = 0 \iff \boldsymbol{\epsilon} = 0$, is added to the objective
  function. The vector $\boldsymbol{\epsilon}$ includes all elements $\epsilon_j$ and is an additional variable of the optimization problem.

- Another way is to provide guarantees on the expected value of the constraint but only at a trajectory level:

  $$
  J_{c^j} = E\left[ \sum\limits_{k=0}^{N-1} c_k^j(x_k, u_k, w_k) \right] \le d_j,
  $$

  where $J_{c^j}$ represents the expected total constraint cost, and $d_j$ defines the constraint threshold.
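Both variants can be evaluated for a given trajectory with a few lines of array code. The sketch below assumes the constraint values $c_k^j$ have already been computed and stored in arrays. The linear penalty $l(\boldsymbol{\epsilon}) = \rho \sum_j \epsilon_j$ and the weight $\rho$ are illustrative choices.

```python
import numpy as np

def slack_and_penalty(c_values, weight=10.0):
    """Smallest slack eps_j >= 0 with c_k^j <= eps_j for all k, and a linear penalty.

    c_values has shape (number of time steps, n_c).
    """
    eps = np.maximum(c_values.max(axis=0), 0.0)
    return eps, weight * eps.sum()          # penalty is zero iff eps is zero

def expected_constraint_cost(c_rollouts):
    """Sample mean of sum_k c_k^j; c_rollouts has shape (n_rollouts, N, n_c)."""
    return c_rollouts.sum(axis=1).mean(axis=0)

# Assumed constraint values for one trajectory with two constraints
c = np.array([[-1.0, 0.5],
              [-0.2, 0.3],
              [-0.5, -0.1]])
eps, penalty = slack_and_penalty(c)
print("slack:", eps, "penalty:", penalty)

# Trajectory-level check against assumed thresholds d_j
d = np.array([0.0, 1.0])
J_c = expected_constraint_cost(c[None, :, :])   # a single rollout as a trivial sample
print("J_c:", J_c, "within threshold:", J_c <= d)
```

In an optimization problem the slack would be a decision variable; computing it after the fact, as done here, only shows which value the optimizer would have to pay for.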

<figure>
    <img src="_static/images/40_safety_levels.svg" width="100%"/>
    <figcaption>
        Illustration of Safety Levels.
    </figcaption>
</figure>

<div>
<figure style="float: left; width: 70%;">
    <img src="_static/images/40_safe_learning_approaches.svg" width="100%"/>
</figure>
<div style="float: left; width: 20%;">
<br><br><br>Summary of safe learning control approaches.
</div>
</div>

<img src="_static/images/aai-institute-cover.svg" alt="Snow" style="width:100%;">
<div class="md-slide title">Thank you for the attention!</div>

# References

- [<b id="rosolia_datadriven_2018">[Rosolia, U., Zhang, X. and Borrelli, F., 2018]</b>](#rosolia_datadriven_2018-back) Rosolia, Ugo, Xiaojing Zhang, and Francesco Borrelli. [Data-driven predictive control for autonomous systems.](https://www.annualreviews.org/doi/full/10.1146/annurev-control-060117-105215) Annual Review of Control, Robotics, and Autonomous Systems 1 (2018): 259-286.

- [<b id="hewing_learningbased_2020">[Hewing, Lukas, et al. 2020]</b>](#hewing_learningbased_2020-back) Hewing, Lukas, Kim P. Wabersich, Marcel Menner, and Melanie N. Zeilinger. [Learning-based model predictive control: Toward safe learning in control.](https://www.annualreviews.org/doi/full/10.1146/annurev-control-090419-075625) Annual Review of Control, Robotics, and Autonomous Systems 3 (2020): 269-296.

- [<b id="brunke_safe_2022">[Brunke, Lukas, et al. 2022]</b>](#brunke_safe_2022-back) Brunke, Lukas, Melissa Greeff, Adam W. Hall, Zhaocong Yuan, Siqi Zhou, Jacopo Panerati, and Angela P. Schoellig. [Safe learning in robotics: From learning-based control to safe reinforcement learning.](https://www.annualreviews.org/doi/abs/10.1146/annurev-control-042920-020211) Annual Review of Control, Robotics, and Autonomous Systems 5 (2022): 411-444.