In [None]:
%%capture
%load_ext autoreload
%autoreload 2
%matplotlib inline
%load_ext training_rl
%set_random_seed 12

In [None]:
%presentation_style

In [None]:
%load_latex_macros

<img src="_static/images/aai-institute-cover.svg" alt="Snow" style="width:100%;">
<div class="md-slide title">Recent Developments in Control Theory</div>

# Stochastic Optimal Control Problem

The methods discussed in this part address the problem of controlling dynamical systems
that are subject to constraints and to uncertainty, which can affect numerous parts of the
problem formulation. The discrete-time system dynamics are given by:

$$
x_{k+1} = f_t(x_k, u_k, k, w_k, \theta_t)
$$

where:
- $x_k \in \mathbf{R}^n$ is the system state at time $k$.
- $u_k \in \mathbf{R}^m$ is the applied input at time $k$.
- $w_k$ is a sequence of random variables corresponding to disturbances or process noise in the system, which are often assumed to be independent and identically distributed (i.i.d.).
- $\theta_t \sim \mathcal{Q}^{\theta_t}$ is a random variable describing the parametric uncertainty of the system, which, unlike $w_k$, is constant over time.
- The subscript $t$ is used to emphasize that these quantities represent the true system dynamics or true optimal control problem. 

The true problem therefore relates to the development of an optimal controller for a distribution of systems given by $\mathcal{Q}^{\theta_t}$ under random disturbances $w_k$.

The optimality of the controller is defined with respect to a cost or objective function. In the
presence of random model uncertainties, the cost is often defined as the expectation of a sum of
potentially time-varying stage costs of the states and inputs over a possibly infinite horizon $N$:

$$
J_t = E\left(\sum \limits_{k=0}^{N} g_t(x_k, u_k, k)\right),
$$

where the expected value is taken with respect to all random variables.
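
The expectation in $J_t$ rarely has a closed form, but for a given controller it can be estimated by averaging the accumulated cost over sampled systems and disturbance sequences. The sketch below assumes a hypothetical scalar instance of the dynamics, $x_{k+1} = \theta x_k + u_k + w_k$, with $\theta \sim \mathcal{U}(0.8, 1.2)$ drawn once per trajectory, i.i.d. Gaussian $w_k$, a fixed linear policy $u_k = -K x_k$, and the stage cost $g_t(x, u, k) = x^2 + 0.1u^2$. All of these choices are illustrative assumptions, not part of the problem formulation above.

```python
import numpy as np


def estimate_cost(K, n_traj=2000, N=50, x0=1.0, noise_std=0.1, seed=0):
    """Monte Carlo estimate of J_t = E[sum_{k=0}^{N} g_t(x_k, u_k, k)] for a toy system.

    Hypothetical system: x_{k+1} = theta * x_k + u_k + w_k, where
    theta ~ U(0.8, 1.2) is drawn once per trajectory (parametric uncertainty) and
    w_k ~ N(0, noise_std^2) is drawn i.i.d. at every step (process noise).
    Hypothetical policy u_k = -K x_k and stage cost g_t = x_k^2 + 0.1 u_k^2.
    """
    rng = np.random.default_rng(seed)
    theta = rng.uniform(0.8, 1.2, size=n_traj)  # one sampled system per trajectory
    x = np.full(n_traj, x0)
    total = np.zeros(n_traj)
    for _ in range(N + 1):                       # k = 0, ..., N
        u = -K * x
        total += x**2 + 0.1 * u**2               # accumulate g_t(x_k, u_k, k)
        w = rng.normal(0.0, noise_std, size=n_traj)
        x = theta * x + u + w                    # x_{k+1}
    return total.mean()                          # sample average approximates E[.]


print(estimate_cost(K=0.0), estimate_cost(K=0.9))
```

With $K = 0$ the sampled systems with $\theta > 1$ diverge, so the estimate is far larger than for the stabilizing gain $K = 0.9$. Note that $\theta$ is sampled outside the time loop and $w_k$ inside it, mirroring the distinction between parametric uncertainty and process noise.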

# Data-Driven Predictive Control

Model predictive control (MPC) is an established control methodology that systematically uses forecasts to compute real-time optimal control decisions. In MPC, at each time step an optimization problem is solved over a moving horizon. The objective is to find a control policy that minimizes a predicted performance index while
satisfying operating constraints.

Uncertainty in MPC can be handled by optimizing over multiple uncertain forecasts. In this case, the performance index and the
operating constraints take the form of functions defined over a probability space, and the resulting technique is called stochastic MPC.
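
A minimal sketch of this receding-horizon idea is given below, under assumptions of our own: a hypothetical scalar system $x_{k+1} = a x_k + b u_k + w_k$ with uncertain $a$, input bounds $|u_k| \le u_{\max}$, and a quadratic cost. The expected cost over the horizon is approximated by averaging over sampled scenarios of $a$ and of the disturbance sequence (a sample average approximation), the open-loop input sequence is optimized approximately by a fixed number of projected gradient steps, and only the first input is applied before the problem is solved again. The parameter values, the scenario distribution, and the solver are illustrative choices, not a prescribed stochastic MPC method.

```python
import numpy as np


def lower_toeplitz(a, N):
    """N x N matrix with entries a**(i - j) for i >= j and 0 above the diagonal."""
    exponent = np.subtract.outer(np.arange(N), np.arange(N))
    return np.where(exponent >= 0, np.power(float(a), np.maximum(exponent, 0)), 0.0)


def smpc_first_input(x0, a_samples, w_samples, b=1.0, q=1.0, r=0.1, u_max=0.5, iters=500):
    """Approximately solve one sample-average stochastic MPC problem; return u_0.

    Scenario s has parameter a_samples[s] and disturbances w_samples[s] = (w_0..w_{N-1}).
    Its predicted states x_1..x_N are X_s = Phi_s * x0 + b * L_s @ U + L_s @ W_s, and
    the averaged cost is mean_s q*||X_s||^2 + r*||U||^2, subject to |u_i| <= u_max.
    """
    S, N = w_samples.shape
    H = 2.0 * r * np.eye(N)          # Hessian of the averaged quadratic cost
    c = np.zeros(N)                  # linear term of the averaged cost
    for a_s, W_s in zip(a_samples, w_samples):
        L = lower_toeplitz(a_s, N)
        Phi = np.power(float(a_s), np.arange(1, N + 1))
        Gamma = b * L
        H += 2.0 * q * Gamma.T @ Gamma / S
        c += 2.0 * q * Gamma.T @ (Phi * x0 + L @ W_s) / S
    step = 1.0 / np.linalg.eigvalsh(H).max()  # 1 / Lipschitz constant of the gradient
    U = np.zeros(N)
    for _ in range(iters):
        # Projected gradient step: move against the gradient H @ U + c, then clip to bounds.
        U = np.clip(U - step * (H @ U + c), -u_max, u_max)
    return U[0]                      # receding horizon: only the first input is applied


def run_closed_loop(x0=2.0, steps=20, N=10, n_scenarios=50, seed=0):
    rng = np.random.default_rng(seed)
    a_true = 1.1                     # hypothetical true (open-loop unstable) system
    x, states, inputs = x0, [x0], []
    for _ in range(steps):
        # Re-sample scenarios from the assumed uncertainty model at every time step.
        a_samples = rng.uniform(1.0, 1.2, size=n_scenarios)
        w_samples = rng.normal(0.0, 0.05, size=(n_scenarios, N))
        u = smpc_first_input(x, a_samples, w_samples)
        x = a_true * x + u + rng.normal(0.0, 0.05)
        states.append(x)
        inputs.append(u)
    return np.array(states), np.array(inputs)


states, inputs = run_closed_loop()
```

Sampling the parameter $a$ per scenario matters here: with linear dynamics, a quadratic cost, and only additive zero-mean noise, averaging over disturbance samples alone would not change the optimal open-loop sequence.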

## Stochastic Predictive Control

## Control Design Challenges

- Ensuring recursive feasibility and achieving optimality despite a short prediction horizon.
- Satisfying input and state constraints in the presence of uncertainty.
- Ensuring computational tractability by properly reformulating constraints and costs and parameterizing control policies.

There is no systematic and universal solution to the third challenge, and the chosen approach is often application-dependent. Fortunately, the first and second challenges can be addressed by using data.

## Data-Driven Stochastic Predictive Control

## Learning Model Predictive Control

# Learning-Based Model Predictive Control

Learning-based MPC addresses the automated and data-driven generation or adaptation of
elements of the MPC formulation such that the control performance with respect to the desired
closed-loop system behavior, i.e., the stochastic optimal control problem introduced above, is
improved. The setup in which this learning takes place can be diverse. For instance, offline learning
considers the adaptation of the controller between different trials or episodes of a control task,
during which data are collected. In methods that learn online, on the other hand, the controller is
adjusted during closed-loop operation (e.g., while performing repetitive tasks) or using the data
collected during one task execution.
While much of the research in learning-based MPC focuses on automatically improving the
model quality, which is the most obvious component affecting MPC performance, several research
efforts address the formulation of the MPC problem directly or use the MPC concept
to satisfy constraints during learning-based control. In the following, we discuss
research in three categories:

## Learning the system dynamics

MPC relies heavily on suitable and sufficiently accurate
model representations of the system dynamics. One path of learning-based MPC therefore considers
the automatic adjustment of the system model, either during operation or between different
operational instances. Section 3 of [[Hewing, Lukas, et al. 2020]](#hewing_learningbased_2020)
provides an overview of this rather broad direction and related issues.

Many learning-based MPC techniques make use of an explicit distinction between a nominal
system model $f_n$ and an additive learned term $f_l$ accommodating uncertainty:

$$
f(x, u, k, w, \theta) = f_n(x, u, k) + f_l(x, u, k, w, \theta)
$$
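
As a minimal sketch of this decomposition, assume a hypothetical scalar system whose nominal model $f_n(x, u) = x + 0.1u$ is known, while the true system additionally contains an unmodeled term $-0.2\sin(x)$ and small process noise. Here the learned term $f_l$ is fit by ridge regression of the one-step residuals $x_{k+1} - f_n(x_k, u_k)$ on hand-picked features $[\sin x, x, u, 1]$. The data-generating system, the feature map, and the regularization weight are all illustrative assumptions.

```python
import numpy as np


def f_nominal(x, u):
    """Known nominal model f_n(x, u) (hypothetical)."""
    return x + 0.1 * u


def f_true(x, u, rng):
    """Hypothetical true system: nominal part, unmodeled term -0.2 sin(x), and process noise."""
    return x + 0.1 * u - 0.2 * np.sin(x) + rng.normal(0.0, 0.01, size=np.shape(x))


def features(x, u):
    """Assumed feature map phi(x, u); the learned term is f_l(x, u) = phi(x, u) @ weights."""
    x, u = np.asarray(x, dtype=float), np.asarray(u, dtype=float)
    return np.column_stack([np.sin(x), x, u, np.ones_like(x)])


def fit_residual(x, u, x_next, reg=1e-3):
    """Ridge regression of the one-step residuals x_next - f_n(x, u) on the features."""
    Phi = features(x, u)
    residual = x_next - f_nominal(x, u)
    A = Phi.T @ Phi + reg * np.eye(Phi.shape[1])
    return np.linalg.solve(A, Phi.T @ residual)


# Collect transition data from the (hypothetical) true system and fit f_l.
rng = np.random.default_rng(0)
x_data = rng.uniform(-2.0, 2.0, size=500)
u_data = rng.uniform(-1.0, 1.0, size=500)
weights = fit_residual(x_data, u_data, f_true(x_data, u_data, rng))


def f_model(x, u):
    """Learned model f = f_n + f_l, e.g. for use as an MPC prediction model."""
    return f_nominal(x, u) + features(x, u) @ weights
```

In this setting the weight on $\sin x$ is expected to come out close to $-0.2$, the coefficient of the unmodeled term, so `f_model` predicts the true one-step map more accurately than `f_nominal` alone.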

## Learning the controller design

A second research direction focuses less on the
prediction model and more on the remaining problem formulation, such as the employed
stage cost $g$, the state constraints $\mathcal{X}$, or the terminal components (terminal cost $g_f$ and terminal set $\mathcal{X}_f$), such that the
resulting closed-loop MPC controller behaves favorably with respect to the underlying task,
i.e., the stochastic optimal control problem. These approaches are discussed in Section 4 of [[Hewing, Lukas, et al. 2020]](#hewing_learningbased_2020).
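
One simple instance of this idea, sketched here under our own assumptions, treats a weight in the MPC cost as a tunable parameter and selects it from closed-loop episodes so that the estimated true objective $J_t$ is minimized. The example uses a hypothetical one-step-horizon MPC for $x_{k+1} = \theta x_k + u_k + w_k$ with nominal parameter $\theta_n$ and MPC cost $x_{k+1}^2 + r u_k^2$, whose unconstrained solution is $u_k = -\frac{\theta_n}{1 + r} x_k$, and tunes $r$ by grid search over offline episodes. The system, horizon, true stage cost, and grid search are illustrative choices, not a method taken from the cited review.

```python
import numpy as np


def closed_loop_cost(r, theta_n=1.0, n_episodes=500, N=30, x0=1.0, seed=0):
    """Estimate the true objective J_t of a one-step MPC whose input weight is r.

    Hypothetical setup: true systems x_{k+1} = theta x_k + u_k + w_k with theta ~ U(0.9, 1.3),
    MPC model with nominal theta_n, MPC cost q*x_{k+1}^2 + r*u_k^2 with q = 1, and
    true stage cost g_t = x_k^2 + 0.1 u_k^2.
    """
    rng = np.random.default_rng(seed)  # fixed seed: identical episodes for every r
    theta = rng.uniform(0.9, 1.3, size=n_episodes)
    gain = theta_n / (1.0 + r)         # closed-form one-step MPC law u = -gain * x
    x = np.full(n_episodes, x0)
    cost = np.zeros(n_episodes)
    for _ in range(N + 1):
        u = -gain * x
        cost += x**2 + 0.1 * u**2      # true stage cost g_t
        x = theta * x + u + rng.normal(0.0, 0.05, size=n_episodes)
    return cost.mean()


# Offline learning between episodes: pick the weight with the lowest estimated J_t.
r_grid = np.array([0.0, 0.01, 0.03, 0.1, 0.3, 1.0])
costs = np.array([closed_loop_cost(r) for r in r_grid])
r_best = r_grid[np.argmin(costs)]
print(dict(zip(r_grid.tolist(), costs.round(4).tolist())), "best r:", r_best)
```

Reusing the same random draws for every candidate $r$ makes the comparison less noisy. The selected value need not coincide with the weight $0.1$ of the true stage cost, since the MPC cost here also compensates for the short horizon and for the mismatch between $\theta_n$ and the true parameter distribution.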

## MPC for safe learning

A third direction is the use of MPC techniques to derive safety
guarantees for learning-based controllers. The main idea is to decouple the optimization of
the objective $J_t$ from the requirement of constraint satisfaction, which is addressed
using MPC techniques. This research direction is discussed in Section 5 of [[Hewing, Lukas, et al. 2020]](#hewing_learningbased_2020).
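
The decoupling can be illustrated with a deliberately simple one-step safety filter, a simplified relative of the MPC-based approach described above rather than a full MPC safety filter. Assume a hypothetical scalar system $x_{k+1} = a x_k + b u_k + w_k$ with $b > 0$, bounded disturbances $|w_k| \le w_{\max}$, state constraint $|x_k| \le x_{\max}$, and input constraint $|u_k| \le u_{\max}$. The filter returns the admissible input closest to the proposal $u_L$ of an arbitrary learning-based controller such that the next state satisfies the state constraint for every admissible disturbance. Recursive feasibility holds here only under the assumption $|a| x_{\max} - b u_{\max} \le x_{\max} - w_{\max}$, which the example parameters satisfy; the parameters and the exploring policy are hypothetical.

```python
import numpy as np


def safety_filter(x, u_learned, a=1.2, b=1.0, x_max=1.0, u_max=0.5, w_max=0.1):
    """Return the input closest to u_learned that keeps |x_next| <= x_max for all |w| <= w_max."""
    margin = x_max - w_max                 # tightened bound on the nominal next state a*x + b*u
    lo = max(-u_max, (-margin - a * x) / b)
    hi = min(u_max, (margin - a * x) / b)
    if lo > hi:
        raise ValueError("no safe input exists for this state")
    return float(np.clip(u_learned, lo, hi))  # projection onto the safe input interval


rng = np.random.default_rng(0)
x, states, inputs = 0.0, [0.0], []
for _ in range(1000):
    u_proposed = rng.uniform(-2.0, 2.0)    # stand-in for an exploring learning-based policy
    u = safety_filter(x, u_proposed)       # constraint satisfaction handled by the filter
    x = 1.2 * x + 1.0 * u + rng.uniform(-0.1, 0.1)
    states.append(x)
    inputs.append(u)
```

The learning-based controller is free to propose any input, and the filter intervenes only when the proposal would put the next state at risk of leaving the constraint set.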

# Safe Learning in Robotics

<figure>
    <img src="_static/images/40_comparison_model_driven_data_driven.svg" width="90%"/>
    <figcaption>
        A comparison of model-driven, data-driven, and combined approaches.
    </figcaption>
</figure>

<figure>
    <img src="_static/images/40_safe_control_block_diagram.svg" width="80%"/>
    <figcaption>
        Block diagram representing safe learning control approaches.
    </figcaption>
</figure>

## Safety Constraints

### Safety level III: constraint satisfaction guaranteed

The system satisfies hard constraints:

$$
c_k^j(x_k, u_k, w_k) \le 0
$$

for all times $k \in \{0, \dots , N\}$ and constraint indexes $j \in \{1, \dots, n_c\}$.

### Safety level II: constraint satisfaction with high probability

The system satisfies probabilistic constraints:

$$
P\left[c_k^j(x_k, u_k, w_k ) \le 0 \right] \ge p^j,
$$

where $P[\cdot]$ denotes the probability and $p^j \in (0, 1)$ is the required probability of satisfying the $j$-th constraint,
for all times $k \in \{0, \dots , N\}$ and constraint indexes $j \in \{1, \dots, n_c\}$.
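
Whether a given closed loop meets a chance constraint of this form can be checked empirically by Monte Carlo simulation. The sketch assumes a hypothetical scalar system $x_{k+1} = 0.9x_k + u_k + w_k$ with Gaussian $w_k$, a linear policy $u_k = -K x_k$, and a single constraint $c_k^1(x_k) = x_k - x_{\max} \le 0$; it estimates the satisfaction probability at each time $k$ and compares it with $p^1$. Such a finite-sample estimate is only a statistical check, not the guarantee that Safety level II asks for.

```python
import numpy as np


def empirical_satisfaction(x_max=1.0, n_samples=10_000, N=20, x0=0.5, K=0.5, noise_std=0.2, seed=0):
    """Empirical P[c_k(x_k) <= 0] for k = 0..N with the single constraint c_k(x_k) = x_k - x_max.

    Hypothetical closed loop: x_{k+1} = 0.9 x_k + u_k + w_k, u_k = -K x_k, w_k ~ N(0, noise_std^2).
    """
    rng = np.random.default_rng(seed)
    x = np.full(n_samples, x0)
    rates = np.empty(N + 1)
    for k in range(N + 1):
        rates[k] = np.mean(x - x_max <= 0.0)   # fraction of samples satisfying c_k <= 0
        u = -K * x
        x = 0.9 * x + u + rng.normal(0.0, noise_std, size=n_samples)
    return rates


p = 0.95                                       # required satisfaction probability p^1
print(np.all(empirical_satisfaction() >= p))   # checked at every time step k
print(np.all(empirical_satisfaction(x_max=0.1) >= p))
```

Tightening the bound to $x_{\max} = 0.1$ leaves the controller unchanged but makes the same closed loop violate the chance constraint, which shows that $p^j$ and the constraint bound must be designed together with the controller.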

### Safety level I: constraint satisfaction encouraged

The system encourages constraint satisfaction. This can be achieved in different ways:

- One way is to add a penalty term to the objective function that discourages
  the violation of constraints with a high cost. A non-negative $\epsilon_j$ is added
  to the right-hand side of the inequality in Safety level III, for all times $k \in \{0, \dots , N\}$
  and constraint indexes $j \in \{1, \dots, n_c\}$:
  
  $$
  c_k^j(x_k, u_k, w_k) \le \epsilon_j,
  $$

  and an appropriate penalty term $l_\epsilon(\epsilon) \ge 0$, with $l_\epsilon(\epsilon) = 0 \iff \epsilon = 0$, is added to the objective
  function. The vector $\epsilon$ includes all elements $\epsilon_j$ and is an additional variable of the optimization problem.

- Another way is to provide guarantees on the expected value of the constraint but only at a trajectory level:

  $$
  J_{c^j} = E\left[ \sum\limits_{k=0}^{N-1} c_k^j(x_k, u_k, w_k) \right] \le d_j,
  $$

  where $J_{c^j}$ represents the expected total constraint cost, and $d_j$ defines the constraint threshold.
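
Both relaxations can be sketched in a few lines under illustrative assumptions. The first function solves a hypothetical scalar problem that tracks a reference $u_{\text{ref}}$ subject to the softened constraint $u - u_{\max} \le \epsilon$, using the quadratic penalty $l_\epsilon(\epsilon) = \rho\epsilon^2$, which satisfies $l_\epsilon(\epsilon) \ge 0$ and $l_\epsilon(\epsilon) = 0 \iff \epsilon = 0$; its closed-form solution shows that the violation shrinks as the weight $\rho$ grows. The second function estimates $J_{c^j}$ by Monte Carlo for a hypothetical closed loop, with the constraint cost chosen here as the indicator $c_k = 1$ if $|x_k| > x_{\max}$ (else $0$), so that $J_{c^j}$ is the expected number of violation steps. The problem data, the penalty form, and the constraint cost are all assumptions.

```python
import numpy as np


def soft_constrained_input(u_ref, u_max, rho):
    """Solve min (u - u_ref)^2 + rho * eps^2  s.t.  u - u_max <= eps,  eps >= 0.

    Closed form for this scalar problem: if u_ref already satisfies the constraint,
    the slack is zero; otherwise u trades off tracking against the slack penalty.
    """
    if u_ref <= u_max:
        return u_ref, 0.0
    u = (u_ref + rho * u_max) / (1.0 + rho)
    return u, u - u_max  # eps = (u_ref - u_max) / (1 + rho) shrinks as rho grows


def expected_violation_steps(K, x_max=1.0, n_traj=5000, N=30, noise_std=0.3, seed=0):
    """Monte Carlo estimate of J_c = E[sum_{k=0}^{N-1} c_k] with c_k = 1 if |x_k| > x_max, else 0.

    Hypothetical closed loop: x_{k+1} = 0.95 x_k - K x_k + w_k, x_0 = 0, w_k ~ N(0, noise_std^2).
    """
    rng = np.random.default_rng(seed)
    x = np.zeros(n_traj)
    total = np.zeros(n_traj)
    for _ in range(N):
        total += np.abs(x) > x_max             # nonnegative constraint cost c_k
        x = (0.95 - K) * x + rng.normal(0.0, noise_std, size=n_traj)
    return total.mean()


print(soft_constrained_input(2.0, 1.0, rho=1.0))   # → (1.5, 0.5)
print(soft_constrained_input(2.0, 1.0, rho=99.0))  # much smaller slack
d = 1.0  # hypothetical threshold d_j
for K in (0.0, 0.5):
    print(K, expected_violation_steps(K) <= d)
```

Unlike Safety level II, the trajectory-level constraint only bounds the expected accumulated cost, so individual trajectories may still violate the constraint several times.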

<figure>
    <img src="_static/images/40_safety_levels.svg" width="100%"/>
    <figcaption>
        Illustration of Safety Levels.
    </figcaption>
</figure>

<div>
<figure style="float: left; width: 70%;">
    <img src="_static/images/40_safe_learning_approaches.svg" width="100%"/>
</figure>
<div style="float: left; width: 20%;">
<br><br><br>Summary of safe learning control approaches.
</div>
</div>

<img src="_static/images/aai-institute-cover.svg" alt="Snow" style="width:100%;">
<div class="md-slide title">Thank you for the attention!</div>

# References

- [<b id="rosolia_datadriven_2018">[Rosolia, U., Zhang, X. and Borrelli, F., 2018]</b>](#rosolia_datadriven_2018-back) Rosolia, Ugo, Xiaojing Zhang, and Francesco Borrelli. [Data-driven predictive control for autonomous systems.](https://www.annualreviews.org/doi/full/10.1146/annurev-control-060117-105215) Annual Review of Control, Robotics, and Autonomous Systems 1 (2018): 259-286.

- [<b id="hewing_learningbased_2020">[Hewing, Lukas, et al. 2020]</b>](#hewing_learningbased_2020-back) Hewing, Lukas, Kim P. Wabersich, Marcel Menner, and Melanie N. Zeilinger. [Learning-based model predictive control: Toward safe learning in control.](https://www.annualreviews.org/doi/full/10.1146/annurev-control-090419-075625) Annual Review of Control, Robotics, and Autonomous Systems 3 (2020): 269-296.

- [<b id="brunke_safe_2022">[Brunke, Lukas, et al. 2022]</b>](#brunke_safe_2022-back) Brunke, Lukas, Melissa Greeff, Adam W. Hall, Zhaocong Yuan, Siqi Zhou, Jacopo Panerati, and Angela P. Schoellig. [Safe learning in robotics: From learning-based control to safe reinforcement learning.](https://www.annualreviews.org/doi/abs/10.1146/annurev-control-042920-020211) Annual Review of Control, Robotics, and Autonomous Systems 5 (2022): 411-444.