In [None]:
%%capture
%load_ext autoreload
%autoreload 2
%matplotlib inline
%load_ext training_rl
%set_random_seed 12

In [None]:
%presentation_style

In [None]:
%load_latex_macros

In [None]:
%autoreload
import os
import warnings
from dataclasses import dataclass
from typing import Protocol

import casadi
import do_mpc
import mediapy as media
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
from gymnasium import Env
from IPython.display import HTML
from ipywidgets import interact, widgets
from matplotlib.animation import FuncAnimation
from numpy.typing import NDArray

from training_rl.control import (
    create_inverted_pendulum_environment,
    InvertedPendulumParameters,
    animate_full_inverted_pendulum_simulation,
    simulate_environment,
    show_video
)

warnings.simplefilter("ignore", UserWarning)
sns.set_theme()
plt.rcParams["figure.figsize"] = [9, 5]
# This is needed because inside docker the rendering of mujoco environments may not work.
render_mode = "rgb_array" if os.environ.get("DISPLAY") else None

<img src="_static/images/aai-institute-cover.svg" alt="Snow" style="width:100%;">
<div class="md-slide title">Recent Developments in Control Theory</div>

# Recent Developments in Control Theory

- So far we have focused on deterministic systems with no noise or disturbances.
- In this part of the training, we will focus on stochastic systems and how MPC can be used to handle such systems.
- We will also see how MPC can leverage data through learning-based approaches.
- Additionally, we will explore how MPC can be combined with Reinforcement Learning to improve systems' safety.

# Stochastic Optimal Control Problem

The methods discussed in this part deal with the problem of controlling dynamical systems
that are subject to system constraints under uncertainty, which can affect numerous parts of the
problem formulation. The discrete-time system dynamics are given by:

$$
x_{k+1} = f_t(x_k, u_k, k, w_k, \theta_t)
$$

Where:
- $x_k \in \mathbf{R}^n$ is the system state at time $k$.
- $u_k \in \mathbf{R}^m$ is the applied input at time $k$.
- $w_k$ describes a sequence of random variables corresponding to disturbances or process noise in the system, which are often assumed to be independent and identically distributed (i.i.d.).
- $\theta_t \sim \mathcal{Q}^{\theta_t}$ is a random variable describing the parametric uncertainty of the system. It is drawn once and is therefore constant over time.
- The subscript $t$ is used to emphasize that these quantities represent the true system dynamics or true optimal control problem. 

The true problem therefore relates to the development of an optimal controller for a distribution of systems given by $\mathcal{Q}^{\theta_t}$ under random disturbances $w_k$.

The optimality of the controller is defined with respect to a cost or objective function. In the
presence of random model uncertainties, the cost is often defined as the expectation of a sum of
potentially time-varying stage costs of the states and inputs over a possibly infinite horizon $T$:

$$
J_t = E\left(\sum \limits_{k=0}^{T} g_t(x_k, u_k, k)\right),
$$

where the expected value is taken with respect to all random variables.
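
As a rough illustration of this expectation, the sketch below estimates $J_t$ by Monte Carlo sampling. The scalar system $x_{k+1} = \theta x_k + u_k + w_k$, the linear feedback gains, the quadratic stage cost and all numerical values are assumptions chosen for this example only.

```python
import numpy as np


def expected_cost(gain, theta_samples, noise_std, horizon, x0, rng):
    """Monte Carlo estimate of E[sum_k g(x_k, u_k)] for x_{k+1} = theta * x_k + u_k + w_k."""
    totals = []
    for theta in theta_samples:  # theta is sampled once per rollout and then held constant
        x, total = x0, 0.0
        for _ in range(horizon + 1):  # k = 0, ..., T
            u = -gain * x  # simple linear state feedback (illustrative policy)
            total += x**2 + 0.1 * u**2  # assumed quadratic stage cost g(x, u)
            x = theta * x + u + rng.normal(0.0, noise_std)  # fresh disturbance w_k
        totals.append(total)
    return float(np.mean(totals))


rng = np.random.default_rng(0)
theta_samples = rng.uniform(0.8, 1.2, size=2000)  # samples of the parametric uncertainty
cost_weak = expected_cost(0.2, theta_samples, 0.1, 20, 1.0, rng)
cost_strong = expected_cost(0.9, theta_samples, 0.1, 20, 1.0, rng)
print(f"gain 0.2: {cost_weak:.2f}, gain 0.9: {cost_strong:.2f}")
```

Each rollout uses one sample of $\theta$ and a fresh disturbance at every step, so the average runs over both sources of randomness.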

## Stochastic Predictive Control

The constrained stochastic optimal control problem can be formulated as:

$$
\begin{array}{ll}
J_t^* = \displaystyle\min_{\pi_{k}} &
E\left[\sum\limits_{k=0}^{T} g_t(x_k, u_k, k)\right]
\\
\text{subject to} & x_{k + 1}= f_t(x_k, u_k, k, w_k, \theta_t)
\\
& u_k = \pi_k(x_0, \dots, x_k)\\
& \bar{W} = [w_0, \dots, w_{T - 1}] \sim \mathcal{Q}^{\bar{W}}, \quad \theta_t \sim \mathcal{Q}^{\theta_t}\\
& P\left(\bar{X} = [x_0, \dots, x_{T}] \in \bar{X}_j \right) \ge p_j, \quad \forall j = 1, \dots, n_{cx}\\
& P\left(\bar{U} = [u_0, \dots, u_{T - 1}] \in \bar{U}_j \right) \ge p_j, \quad \forall j = 1, \dots , n_{cu}\\
\end{array},
$$

where the optimization is carried out over a sequence of control laws $\{\pi_k\}$, each of which can make use of all information in the
form of state measurements up to time step $k$. Problems of this form are in
general very hard to solve, and direct approaches typically rely on some form of discretization in space
combined with approximate dynamic programming or reinforcement learning.

A notable exception, similar to what we have seen previously, is the unconstrained setting with linear systems, additive noise, and quadratic stage costs, for which an exact solution is available in the form of the standard linear quadratic regulator (LQR).
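
For that special case, the gain can be computed with a few lines of NumPy. The sketch below is a minimal example, assuming a discretized double integrator and arbitrarily chosen weights `Q` and `R`. It iterates the discrete-time Riccati equation until it has (numerically) converged. For zero-mean additive noise, the same gain remains optimal by certainty equivalence.

```python
import numpy as np


def dlqr(A, B, Q, R, iterations=500):
    """Infinite-horizon discrete-time LQR gain via fixed-point Riccati iteration."""
    P = Q.copy()
    for _ in range(iterations):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)  # Riccati update written with the current gain
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    return K, P


dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])  # double integrator (assumed example system)
B = np.array([[0.0], [dt]])
Q = np.eye(2)
R = np.array([[0.1]])

K, P = dlqr(A, B, Q, R)
closed_loop_radius = max(abs(np.linalg.eigvals(A - B @ K)))
print(f"K = {K.ravel()}, spectral radius of A - BK = {closed_loop_radius:.3f}")
```

The control law is $u_k = -K x_k$. A spectral radius below one confirms that the closed loop is stable.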

MPC approximates the previous problem by repeatedly solving a simplified version of the
problem initialized at the currently measured state $x_k$ over a shorter horizon $N$
in a receding-horizon fashion.

We introduce the prediction model:

$$
x_{i+1|k} = f(x_{i|k}, u_{i|k}, i + k, w_{i|k}, \theta),
$$

where $f$ is the prediction dynamics. It typically aims at approximating the true dynamics but often differs, e.g.,
for computational reasons or because a succinct description of the true dynamics is unavailable.

We use the subscript $i|k$ to emphasize predictive quantities, where, e.g., $x_{i|k}$ is the i-step-ahead prediction of the state, initialized at $x_{0|k} = x_k$. 

The most widespread MPC formulations are nominal MPC schemes, which do not consider any uncertainties in the prediction model but instead rely exclusively on the compensation of uncertainties via feedback and by re-solving the problem at the next sampling instance.

In nominal MPC, the optimization can be performed over control sequences $U = [u_{0|k}, \dots , u_{N - 1|k}]$ rather than policies, resulting in the constrained optimal control problem

$$
\begin{array}{ll}
J^* = \displaystyle\min_{U} & g_f(x_{N|k}, k + N) + \sum\limits_{i=0}^{N-1} g(x_{i|k}, u_{i|k}, i + k)\\
\text{subject to} & x_{i+1|k} = f(x_{i|k}, u_{i|k}, i + k)\\
& U = [u_{0|k}, \dots, u_{N-1|k}] \in U_j, \quad \forall j = 1, \dots, n_{cu} \\
& X = [x_{0|k}, \dots, x_{N|k}] \in X_j, \quad \forall j = 1, \dots, n_{cx} \\
& x_{N|k} \in X_f\\
& x_{0|k} = x_k\\
\end{array}.
$$

The control law is then implicitly defined through the optimization problem as:

$$
\pi^{\text{MPC}}(x_k, k) = u_{0|k}^*,
$$

where $u_{0|k}^*$ is the first element of the computed optimal control sequence $U^*$.
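
The receding-horizon idea can be sketched without any MPC library. The example below is a simplified illustration under several assumptions: a linear double integrator as prediction model, a quadratic cost whose terminal cost equals the stage cost, input bounds only (no state constraints and no terminal set), and a plain projected gradient method as a stand-in for a proper QP solver.

```python
import numpy as np

dt, N, u_max = 0.1, 20, 1.0
A = np.array([[1.0, dt], [0.0, 1.0]])  # assumed prediction model: double integrator
B = np.array([[0.0], [dt]])
Q, R = np.eye(2), 0.1

# Condensed prediction: stacked [x_1, ..., x_N] = Phi @ x0 + Gamma @ U.
Phi = np.vstack([np.linalg.matrix_power(A, i + 1) for i in range(N)])
Gamma = np.zeros((2 * N, N))
for i in range(N):
    for j in range(i + 1):
        Gamma[2 * i : 2 * i + 2, j] = (np.linalg.matrix_power(A, i - j) @ B).ravel()
Qbar = np.kron(np.eye(N), Q)
H = Gamma.T @ Qbar @ Gamma + R * np.eye(N)
step = 1.0 / (2.0 * np.linalg.eigvalsh(H).max())  # 1 / Lipschitz constant of the gradient


def solve_ocp(x0, U_init, iterations=300):
    """Projected gradient descent on the box-constrained quadratic program."""
    U = U_init.copy()
    linear = Gamma.T @ Qbar @ Phi @ x0
    for _ in range(iterations):
        gradient = 2.0 * (H @ U + linear)
        U = np.clip(U - step * gradient, -u_max, u_max)  # projection onto the input bounds
    return U


x = np.array([1.0, 0.0])
U = np.zeros(N)
applied = []
for k in range(80):
    U = solve_ocp(x, U)  # re-solve from the currently measured state
    u0 = U[0]  # apply only the first input of the optimal sequence
    applied.append(u0)
    x = A @ x + B.ravel() * u0  # here the "true" system equals the prediction model
    U = np.append(U[1:], 0.0)  # shifted solution as warm start for the next step

print(f"final state: {x}, largest input magnitude: {max(abs(u) for u in applied):.2f}")
```

Only `U[0]` is ever applied; the rest of the sequence serves as a warm start for the next solve.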

## Control Design Challenges

- Ensuring recursive feasibility and achieving optimality despite a short prediction horizon.
- Satisfying input and state constraints in the presence of uncertainty.
- Ensuring computational tractability by properly reformulating constraints and costs and parameterizing control policies.

There is no systematic and universal solution to the third challenge, and often the chosen approach is application dependent. Fortunately, the first and second challenges can be addressed by using data. 

# Robust MPC

- Robust MPC guarantees constraint satisfaction for all uncertain element realizations.
- The model is split into a nominal part and additive uncertainty in a compact set. 
- The controller is designed to be robust against the uncertainty.
- The MPC cost is typically optimized for the nominal system.

## Multi-Stage MPC

The basic idea for the multi-stage approach is to consider various scenarios, where a scenario is defined by one possible realization of all uncertain parameters at every control instant within the horizon. The family of all considered discrete scenarios can be represented as a tree structure, called the scenario tree.

<center>
<figure>
    <img src="_static/images/40_multi_state_mpc.png" width="60%"/>
    <figcaption>
        Scenario tree representation of the uncertainty
evolution for multi-stage MPC.
    </figcaption>
</figure>
</center>

- Each node in the tree denotes the possible state of the system at every prediction step.
- The branches represent the different possible realizations of the uncertainty.
- The initial state of the system forms the root node of the tree.
- The root node branches into several nodes in the first stage depending on the number of vertex matrix pairs of the parametric uncertainty.
- All the nodes in the first stage branch again in the second stage.
- The sequence continues until the end of prediction horizon N to form the complete scenario tree.
- A path from the root node to the leaf node represents a scenario.
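
The growth of the tree can be made concrete with a small enumeration. The sketch below is not tied to any MPC library. It assumes that every combination of the discrete parameter values forms one branch and that branching happens at each of the first `branching_stages` stages (a hypothetical argument name), after which each scenario continues without further branching. The parameter values are illustrative.

```python
from itertools import product


def scenario_tree(uncertainty_values, branching_stages):
    """Return all scenarios as tuples of parameter realizations, one per branching stage."""
    names = list(uncertainty_values)
    # One branch per combination of the discrete parameter values.
    branches = [dict(zip(names, combo)) for combo in product(*uncertainty_values.values())]
    # A scenario is a path from the root to a leaf: one branch choice per stage.
    return list(product(branches, repeat=branching_stages))


values = {"m": [1.0, 1.3, 0.7], "M": [1.0, 1.3, 0.7]}  # illustrative relative values
one_stage = scenario_tree(values, branching_stages=1)
two_stages = scenario_tree(values, branching_stages=2)
print(len(one_stage), len(two_stages))  # 9 81
```

The number of scenarios grows exponentially with the number of branching stages, which is why branching is often restricted to the first few stages of the horizon.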

## Example - Inverted Pendulum

In a real system, the model parameters usually cannot be determined exactly, which represents an important source of uncertainty. In this example, we consider that the mass of the pendulum and that of the cart are not known precisely
and vary with respect to their nominal value.

### Model

#### Model, States and Control inputs

In [None]:
model = do_mpc.model.Model("continuous")

pos = model.set_variable(var_type="_x", var_name="position")
theta = model.set_variable(var_type="_x", var_name="theta")
dpos = model.set_variable(var_type="_x", var_name="velocity")
dtheta = model.set_variable(var_type="_x", var_name="dtheta")
u = model.set_variable(var_type="_u", var_name="force")

#### Parameters

In [None]:
# Certain parameters
ip_parameters = InvertedPendulumParameters()
k = 1 / 3
l = ip_parameters.l
gamma = ip_parameters.gamma
g = ip_parameters.g
mu_p = ip_parameters.mu_p
mu_c = ip_parameters.mu_c

In [None]:
# Uncertain parameters
m = model.set_variable("_p", "m")
M = model.set_variable("_p", "M")

#### ODE

In [None]:
numerator = (
    (M + m) * g * casadi.sin(theta)
    - casadi.cos(theta) * (gamma * u + m * l * dtheta**2 * casadi.sin(theta) - mu_c * dpos)
    - (M + m) / (m * l) * mu_p * dtheta
)
denominator = (1 + k) * (M + m) * l - m * l * casadi.cos(theta) ** 2
ddtheta = numerator / denominator

In [None]:
numerator = (
    m * g * casadi.cos(theta) * casadi.sin(theta)
    - (1 + k) * (gamma * u + m * l * dtheta**2 * casadi.sin(theta) - mu_c * dpos)
    - mu_p * dtheta * casadi.cos(theta) / l
)
denominator = m * casadi.cos(theta) ** 2 - (1 + k) * (M + m)
ddpos = numerator / denominator

In [None]:
model.set_rhs("position", dpos)
model.set_rhs("theta", dtheta)
model.set_rhs("velocity", ddpos)
model.set_rhs("dtheta", ddtheta)

#### Setup

In [None]:
model.setup()

### Controller

In [None]:
mpc = do_mpc.controller.MPC(model)

In [None]:
env = create_inverted_pendulum_environment()
mpc_params = {
    "n_horizon": 50,
    "n_robust": 1,
    "t_step": env.dt,
    "state_discretization": "collocation",
    "collocation_type": "radau",
    "collocation_deg": 3,
    "collocation_ni": 1,
    "store_full_solution": True,
    # Linear solver used by ipopt. MUMPS is selected here; MA27 (requires HSL) can be faster if installed:
    "nlpsol_opts": {"ipopt.linear_solver": "mumps"},
}
mpc.set_param(**mpc_params)

#### Objective

In [None]:
env = create_inverted_pendulum_environment()
xss = np.array([0.5, 0, 0, 0])
distance_cost = casadi.bilin(np.diag([1, 100, 0, 0]), model.x.cat - xss)
terminal_cost = distance_cost
stage_cost = distance_cost
print(f"{stage_cost=}")
print(f"{terminal_cost=}")
mpc.set_objective(mterm=terminal_cost, lterm=stage_cost)

In [None]:
force_penalty = 0.1
mpc.set_rterm(force=force_penalty)

#### Constraints

In [None]:
# lower and upper bounds of the position
x_max = 1
mpc.bounds["lower", "_x", "position"] = -x_max
mpc.bounds["upper", "_x", "position"] = x_max
# lower and upper bounds of the input
u_max = 3
mpc.bounds["lower", "_u", "force"] = -u_max
mpc.bounds["upper", "_u", "force"] = u_max

#### Parameter Uncertainty

In [None]:
m_values = ip_parameters.m * np.array([1.0, 1.30, 0.70])
M_values = ip_parameters.M * np.array([1.0, 1.30, 0.70])
mpc.set_uncertainty_values(m=m_values, M=M_values)

#### Setup

In [None]:
mpc.setup()

### Simulation

In [None]:
class MPCController:
    def __init__(self, mpc: do_mpc.controller.MPC) -> None:
        self.mpc = mpc
        self.mpc.reset_history()
        self.mpc.x0 = np.zeros(4)
        self.mpc.set_initial_guess()

    def act(self, observation: NDArray) -> NDArray:
        # Use the controller stored on the instance rather than the global `mpc`.
        return self.mpc.make_step(observation.reshape(-1, 1)).ravel()

In [None]:
%%capture
max_steps = 100
env = create_inverted_pendulum_environment(
    render_mode=render_mode, max_steps=max_steps, cutoff_angle=np.inf, initial_angle=0.99*np.pi
)
controller = MPCController(mpc)
results = simulate_environment(env, max_steps=max_steps, controller=controller)

In [None]:
show_video(results.frames, fps=1 / env.dt)

In [None]:
animate_full_inverted_pendulum_simulation(mpc.data)

# Learning-Based Model Predictive Control

- Learning-based MPC addresses the automated and data-driven generation or adaptation of elements of the MPC formulation to improve control performance.
- The learning setup can be diverse:
  - Offline learning involves adapting the controller between trials or episodes while collecting data.
  - Online learning adjusts the controller during closed-loop operation (e.g. repetitive tasks) or using data from one task execution.
- Much research has focused on automatically improving model quality, as this clearly affects MPC performance.
- Some efforts address the MPC problem formulation directly.
- Others use MPC concepts to satisfy constraints during learning-based control.

## Learning the system dynamics

- MPC relies on accurate system models, so one approach is learning to adjust the model either during operation or between different operational instances.
- Traditionally models are derived offline before control using first principles and identification.
- Robust approaches often consider model uncertainty as well as process noise to lie in compact sets $\theta_t \in T, w_k \in W$.
- Stochastic MPC makes use of distributional information on the uncertainties.
- Learning-based MPC constructs and updates models and uncertainties from data.

- Many learning-based MPC techniques make use of an explicit distinction between a nominal 
  system model $f_n$ and an additive learned term $f_l$ accommodating uncertainty:

  $$
  f(x, u, k, w, \theta) = f_n(x, u, k) + f_l(x, u, k, w, \theta)
  $$

- Successful learning methods are often based on probabilistic formulations, leading to a
  nonlinear stochastic prediction model.

- Leveraging the full potential of such models within MPC, however, is very challenging and remains an active  
  research field. Many model-learning MPC schemes have therefore evolved from the extensively studied field of robust MPC, offering a large body of available theoretical results.

- Robust MPC allows some adaptation but only considers fixed, known model uncertainty.
- Learning-based MPC aims to estimate model uncertainty directly from data.
- This allows adjusting the uncertainty over time to reduce conservatism. 
- Many techniques use set-membership identification with bounded noise.
- Parametric approaches find the set of parameter values $\theta$ consistent with observations.
- Non-parametric approaches form estimates of $f$ directly from data points.
- The goal is to learn the model uncertainty from measurements, not assume it.
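
A parametric set-membership update can be sketched for a scalar system. The example assumes the model $x_{k+1} = \theta x_k + u_k + w_k$ with bounded noise $|w_k| \le \bar{w}$; each measurement then restricts $\theta$ to an interval, and the feasible set is the intersection of all these intervals. The system, the noise bound and the prior interval are assumptions made for illustration.

```python
import numpy as np


def feasible_interval(x, u, w_bar, theta_lo, theta_hi):
    """Intersect the prior interval with all parameter values consistent with the data."""
    for xk, uk, x_next in zip(x[:-1], u, x[1:]):
        if abs(xk) < 1e-9:
            continue  # x_k = 0 carries no information about theta
        a = (x_next - uk - w_bar) / xk
        b = (x_next - uk + w_bar) / xk
        theta_lo = max(theta_lo, min(a, b))  # min/max handle negative x_k
        theta_hi = min(theta_hi, max(a, b))
    return theta_lo, theta_hi


rng = np.random.default_rng(1)
theta_true, w_bar = 0.9, 0.05
u = rng.uniform(-1.0, 1.0, size=50)
x = [1.0]
for uk in u:
    x.append(theta_true * x[-1] + uk + rng.uniform(-w_bar, w_bar))

lo, hi = feasible_interval(np.array(x), u, w_bar, theta_lo=0.5, theta_hi=1.5)
print(f"theta lies in [{lo:.3f}, {hi:.3f}]")
```

The interval never excludes the true parameter as long as the noise bound holds, and it can only shrink as more data arrive, which is what allows the conservatism to be reduced over time.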

#### Stochastic non-parametric approaches

- Gaussian process (GP) regression is commonly used in learning-based control due to its flexible non-parametric stochastic approach.
- GP regression assumes dynamics with additive Gaussian noise and models function values as jointly Gaussian based on a kernel function.
- Using recorded state/input data, GP regression provides posterior mean and variance functions as the estimator.
- The variance indicates residual model uncertainty from insufficient data.
- Prediction involves propagating the stochastic state distributions, often approximated as Gaussian.

- The learned part of the model can address discrepancies with a nominal model.
- Uncertainty estimates enable heuristic constraint tightening for safety.
- Computation time is a challenge, addressed via data selection, approximations, and simplifying variance.
- GP-based MPC has been applied successfully to various robotic and process control tasks.
- It is a highly data-efficient model-based reinforcement learning technique.
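
The posterior mean and variance mentioned above can be computed with plain NumPy. The sketch below is a minimal one-dimensional example, assuming a squared-exponential kernel with fixed hyperparameters and a made-up residual function standing in for the unmodeled dynamics $f_l$.

```python
import numpy as np


def rbf(a, b, length=0.5, signal_var=1.0):
    """Squared-exponential kernel between two sets of scalar inputs."""
    d = a[:, None] - b[None, :]
    return signal_var * np.exp(-0.5 * (d / length) ** 2)


def gp_posterior(x_train, y_train, x_test, noise_var=1e-2):
    """Posterior mean and variance of a zero-mean GP at the test inputs."""
    K = rbf(x_train, x_train) + noise_var * np.eye(len(x_train))
    K_s = rbf(x_train, x_test)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = np.diag(rbf(x_test, x_test)) - np.sum(v**2, axis=0)
    return mean, var


def residual(x):
    """Hypothetical model error that the nominal model does not capture."""
    return 0.5 * np.sin(2.0 * x)


rng = np.random.default_rng(0)
x_train = rng.uniform(-2.0, 2.0, size=25)
y_train = residual(x_train) + rng.normal(0.0, 0.1, size=25)
x_test = np.array([0.0, 5.0])  # one point inside the data, one far outside
mean, var = gp_posterior(x_train, y_train, x_test)
print(f"mean = {mean}, variance = {var}")
```

Far away from the data, the variance returns to the prior variance and the mean to zero. This is the signal that the learned model should not be trusted there.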

<center>
    <img src="_static/images/40_gaussian_process.svg" width="80%"/>
</center>

<figure style="float: left; width: 70%; margin-right: 10%;">
    <img src="_static/images/40_learning_based_mpc_gp.png" width="100%"/>
</figure>

Gaussian process–based MPC for autonomous racing. (b,c) The resulting trajectories of a similar approach applied to miniature radio-controlled cars, with the initial nominal controller shown in panel b and the improved trajectories after learning shown in panel c.

In [None]:
%%html
<iframe width="800" height="480" src="https://www.youtube.com/embed/-cdXw1MyTUA?si=S3DXY90f8QEPFddI" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>

## Learning the controller design

- Beyond the model, the MPC cost function and constraints strongly influence closed-loop performance.
- Learning approaches can design the MPC problem to achieve desired controller behavior.
- A parameterized MPC formulation is considered with cost $g(x,u,\theta_l)$ and constraints $\mathcal{X}(\theta_\mathcal{X}), \mathcal{U}(\theta_\mathcal{U})$:

  $$
  \begin{array}{ll}
  U^* = \displaystyle\arg\min_{U} & \sum\limits_{i=0}^{N} g(x_i,u_i,\theta_l)\\
  \text{subject to} & x_{i+1} = f(x_i, u_i, \theta_f)\\
  & U = [u_0 , \dots, u_N ] \in \mathcal{U}(\theta_\mathcal{U})\\
  & X = [x_0 , \dots, x_N ] \in \mathcal{X}(\theta_\mathcal{X})\\
  & x_0 = x_k\\
  \end{array}
  $$

- Both the cost and constraints are parameterized and learned based on observed data.

### Performance-Driven Controller Learning

- MPC can be a rough approximation of the true stochastic optimal control problem. 
- Performance-driven learning finds MPC parameters to optimize closed-loop performance.
- It uses Bayesian optimization to optimize the true closed-loop cost $J(\theta)$ as a function of parameters $\theta$.
- $J(\theta)$ is modeled as a Gaussian process to enable optimization.

- The finite prediction horizon is a limitation, mitigated via terminal cost and constraint.
- Learning constructs/improves these from data to approximate infinite-horizon cost. 
- For nominal systems, the states visited by any trajectory ending in $X_f$, together with $X_f$ itself, form an enlarged invariant set.
- Data is used to iteratively grow $X_f$ and improve the terminal cost and constraint.

### Learning from Demonstration with Inverse Optimal Control

- Designing a cost function or defining an objective mathematically in order to achieve a desired
  complex behavior can be tedious and require extensive parameter tuning and development time.
- Inverse optimal control addresses this problem by inferring the cost and constraints from demonstrations.
- The hypothesis underlying inverse optimal control is that the observed demonstrations
  are the solution of a corresponding optimal control problem. 
- Most of the research in inverse optimal control has assumed knowledge of potential constraints
  and focused on the cost function instead, following three main steps:

  1. Define the optimal control problem with a parametric cost function $g(x, u, \theta_l)$,
     e.g., quadratic costs with unknown weights $g(x, u, \theta_l) = x^T Q(\theta_l)x + u^T R(\theta_l)u$.
  2. Derive optimality conditions for the parametric optimal control problem.
  3. Solve optimality conditions for the parameters $\theta_l$ given the demonstration.

- A related field of research is inverse reinforcement learning.
- This technique similarly addresses the problem of identifying a cost or reward function,
  typically in the context of probabilistic decision-making, which is often expressed
  in terms of Markov decision processes. 
- The unknown parameters are obtained by maximizing likelihood.
- These frameworks are typically for discrete state and action spaces.
- They do not explicitly consider system constraints.
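
The three steps of inverse optimal control can be followed by hand for a scalar LQR problem. The sketch below assumes the system $x_{k+1} = a x_k + b u_k$ and the cost $q x^2 + r u^2$, with $r$ fixed to 1 because the cost is only identifiable up to scaling. The optimality conditions are the Riccati equation and the gain formula, which are solved for the unknown weight $q$ given a noise-free demonstration. All numbers are illustrative.

```python
import numpy as np

a, b, r = 1.1, 0.5, 1.0  # assumed known dynamics; r is fixed to remove the scale ambiguity


def lqr_gain(q, iterations=1000):
    """Scalar infinite-horizon LQR gain via Riccati iteration."""
    P = q
    for _ in range(iterations):
        P = q + a**2 * P - (a * b * P) ** 2 / (r + b**2 * P)
    return a * b * P / (r + b**2 * P)


# Demonstration: an expert that is optimal for an unknown weight q_true.
q_true = 4.0
K_expert = lqr_gain(q_true)
xs, us = [1.0], []
for _ in range(15):
    us.append(-K_expert * xs[-1])
    xs.append(a * xs[-1] + b * us[-1])

# Step 3: solve the optimality conditions for q given the demonstration.
x_data, u_data = np.array(xs[:-1]), np.array(us)
K_hat = -(x_data @ u_data) / (x_data @ x_data)  # least-squares fit of u = -K x
P_hat = K_hat * r / (b * (a - b * K_hat))  # from K = a b P / (r + b^2 P)
q_hat = P_hat * (1.0 - a**2 + a * b * K_hat)  # from the Riccati equation
print(f"recovered q = {q_hat:.4f}")
```

With noisy demonstrations or constraints, the optimality conditions no longer hold exactly and are typically satisfied in a least-squares sense instead.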

<center>
    <img src="_static/images/40_inverse_optimal_control.png" width="80%"/>
    Concept of learning with inverse optimal control, where the cost function plays the
central role of encoding the demonstrated behavior.
</center>

# Safe Learning in Robotics

- Robot learning aims to enable autonomous operation in complex, uncertain environments.
- Challenges include partial knowledge of dynamics, sensors, and other agents. 
- Safety guarantees are crucial but difficult with partial knowledge.
- Control theory uses models to provide guarantees. 
- Reinforcement learning is data-driven for adaptability but lacks guarantees.
- Combining model- and data-driven approaches leverages their complementary strengths.

- Key directions are:
  - Robustness against worst-case scenarios.
  - Adaptation by learning from observations.
  - Leveraging models from domain knowledge and data.

- Control provides the basis for safety-critical applications.
- Safe RL research has grown rapidly.
- Simulation enables RL progress but transferring to real robots remains challenging.

<center>
<figure>
    <img src="_static/images/40_comparison_model_driven_data_driven.svg" width="90%"/>
    <figcaption>
        A comparison of model-driven, data-driven, and combined approaches.
    </figcaption>
</figure>
</center>

- The safe learning control problem is formulated as an optimization with 3 main components:

  1. System model describing robot dynamics.
  2. Cost function defining the control objective. 
  3. Constraints specifying safety requirements.

- The goal is to find a policy fulfilling the task under the safety constraints.
- Any of the 3 components could be initially unknown or partially known. 

<center>
<figure>
    <img src="_static/images/40_safe_control_block_diagram.svg" width="80%"/>
    <figcaption>
        Block diagram representing safe learning control approaches.
    </figcaption>
</figure>
</center>

## Safety Constraints

<center>
<figure>
    <img src="_static/images/40_safety_levels.svg" width="100%"/>
    <figcaption>
        Illustration of Safety Levels.
    </figcaption>
</figure>
</center>

### Safety level III: constraint satisfaction guaranteed

The system satisfies hard constraints:

$$
c_k^j(x_k, u_k, w_k) \le 0
$$

for all times $k \in \{0, \dots , N\}$ and constraint indexes $j \in \{1, \dots, n_c\}$.

### Safety level II: constraint satisfaction with high probability

The system satisfies probabilistic constraints:

$$
P\left[c_k^j(x_k, u_k, w_k ) \le 0 \right] \ge p^j,
$$

where $P[\cdot]$ denotes the probability and $p^j \in (0, 1)$ defines the likelihood of the jth constraint
being satisfied, for all times $k \in \{0, \dots , N\}$ and constraint indexes $j \in \{1, \dots, n_c\}$.
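
Whether a given state and input satisfy such a probabilistic constraint can be checked empirically by sampling the disturbance. The sketch below assumes a scalar one-step prediction $x + u + w$ with Gaussian noise and the constraint function $c = |x + u + w| - x_{\max}$. These choices are illustrative and not part of the definition above.

```python
import numpy as np


def satisfaction_probability(x, u, noise_std, x_max, n_samples, rng):
    """Empirical P[c(x, u, w) <= 0] for c = |x + u + w| - x_max with Gaussian w."""
    w = rng.normal(0.0, noise_std, size=n_samples)
    c = np.abs(x + u + w) - x_max
    return float(np.mean(c <= 0.0))


rng = np.random.default_rng(0)
p_required = 0.95
p_center = satisfaction_probability(0.0, 0.0, 0.2, 1.0, 20000, rng)
p_edge = satisfaction_probability(0.9, 0.0, 0.2, 1.0, 20000, rng)
print(f"center: {p_center:.3f}, near the bound: {p_edge:.3f}, required: {p_required}")
```

Near the bound, the constraint holds for most but not enough of the sampled disturbances, so a level II controller would have to keep a larger distance from it.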

### Safety level I: constraint satisfaction encouraged

The system encourages constraint satisfaction. This can be achieved in different ways:

- One way is to add a penalty term to the objective function that discourages
  the violation of constraints with a high cost. A non-negative $\epsilon_j$ is added
  to the right-hand side of the inequality in Safety level III, for all times $k \in \{0, \dots , N\}$
  and constraint indexes $j \in \{1, \dots, n_c\}$:
  
  $$
  c_k^j(x_k, u_k, w_k) \le \epsilon_j,
  $$

  and an appropriate penalty term $l_\epsilon(\epsilon) \ge 0$, with $l_\epsilon(\epsilon) = 0 \iff \epsilon = 0$, is added to the objective
  function. The vector $\epsilon$ includes all elements $\epsilon_j$ and is an additional variable of the optimization problem.

- Another way is to provide guarantees on the expected value of the constraint but only at a trajectory level:

  $$
  J_{c^j} = E\left[ \sum\limits_{k=0}^{N-1} c_k^j(x_k, u_k, w_k) \right] \le d_j,
  $$

  where $J_{c^j}$ represents the expected total constraint cost, and $d_j$ defines the constraint threshold.
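
The penalty-based variant can be illustrated for a fixed trajectory. The sketch assumes a linear penalty $\rho \sum_j \epsilon_j$ (one possible choice of penalty term) and uses the fact that, for given constraint values, the smallest admissible slack is $\epsilon_j = \max(0, c_j)$. The trajectory and the weight `rho` are made up for the example.

```python
import numpy as np


def soft_constrained_cost(stage_costs, constraint_values, rho=100.0):
    """Objective with a linear slack penalty rho * sum(eps), where eps = max(0, c)."""
    eps = np.maximum(0.0, constraint_values)  # smallest slack satisfying c <= eps
    total = float(np.sum(stage_costs) + rho * np.sum(eps))
    return total, eps


positions = np.array([0.2, 0.8, 1.1, 0.9])  # illustrative trajectory
x_max = 1.0
total, eps = soft_constrained_cost(positions**2, positions - x_max)
print(f"total cost = {total:.2f}, slacks = {eps}")
```

Only the third position violates the bound, so a single slack is nonzero and the penalty dominates the cost. A larger `rho` makes violations more expensive but does not rule them out.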

# Safe Learning Control Approaches

## Learning uncertain dynamics to safely improve performance

These works rely on an a priori model of the robot dynamics. The robot's performance is improved by learning the uncertain dynamics from data. Safety is typically guaranteed based on standard control-theoretic
frameworks, achieving safety level II or III.

## Encouraging safety and robustness in RL

These works encompass approaches that usually do not have knowledge of an a priori robot model or the safety constraints. Rather than providing hard safety guarantees, these approaches encourage safe robot operation (safety
level I), for example, by penalizing dangerous actions.

## Certifying learning-based control under dynamics uncertainty

These works aim to provide safety certificates for learning-based controllers that do not inherently consider safety
constraints. These approaches modify the learning controller output by constraining the control policy, leveraging a known safe backup controller, or modifying the controller output directly to achieve stability and/or constraint satisfaction. They typically achieve safety level II or III.

### Model Predictive Safety Filter

- General learning-based control, particularly Reinforcement Learning,
  has shown great success in solving complex and high-dimensional control tasks.
- However, most techniques cannot ensure that safety constraints
  under physical limitations are met, particularly during learning iterations.
- To address this limitation, safety frameworks emerged from control theory.
- MPC techniques can be used for such safety filters to turn a safety-critical dynamical system
  into an inherently safe system to which any learning-based controller
  without safety certificates can be applied out of the box.

<div>
<figure style="float: left; width: 70%;">
    <img src="_static/images/40_safety_filter.svg" width="100%"/>
</figure>
<div style="float: right; width: 25%;">
<br><br><br>Based on the current state $x$, a learning-based controller provides an input
$u_L = \pi_L(x) \in \mathbb{R}^m$, which is processed by the safety filter $u = \pi_S(x, u_L)$ and applied to the real system.
</div>
</div>

- The idea is to address the solution to the stochastic optimal control problem
  through learning-based control methods.
- The proposed learning-based control input $u_L(k)$ at time $k$ is
  then verified in terms of safety by computing a safe backup trajectory from the one-step predicted
  state $x_{1|k}$ to a safe terminal set $X_f$ or by modifying $u_L(k)$ as little as possible
  while still providing a safe backup trajectory.
- The optimization problem necessary for validating safety of the input
  is computationally cheaper than a direct optimization of the task
  and can often be carried out over a reasonably short horizon.

The model predictive safety filter $\pi_S$ is realized through an MPC-like optimization problem of the form:

$$
\begin{array}{ll}
\displaystyle\min_{U} & || u_{0|k} - u_L(k)||\\
\text{subject to} & x_{i+1|k} = f(x_{i|k}, u_{i|k}, i + k)\\
& U = [u_{0|k}, \dots, u_{N-1|k}] \in U_j, \quad \forall j = 1, \dots, n_{cu}\\
& X = [x_{0|k}, \dots, x_{N|k}] \in X_j, \quad \forall j = 1, \dots, n_{cx}\\
& x_{N|k} \in X_f \\
& x_{0|k} = x_k
\end{array}
$$
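
For a very simple system the safety filter has a closed-form solution. The sketch below assumes a single integrator $x_{k+1} = x_k + \Delta t \, u_k$ with box constraints on state and input, a horizon of one step, and the state constraint set itself as terminal set (it is invariant under $u = 0$). Under these assumptions, the minimal modification of the proposed input is a clipping operation. The random "learning" policy is a placeholder.

```python
import numpy as np


def safety_filter(x, u_learned, dt=0.1, x_max=1.0, u_max=3.0):
    """Return the input closest to u_learned that keeps the next state within the bounds."""
    lo = max(-u_max, (-x_max - x) / dt)  # smallest input keeping x + dt * u >= -x_max
    hi = min(u_max, (x_max - x) / dt)  # largest input keeping x + dt * u <= x_max
    return float(np.clip(u_learned, lo, hi))


rng = np.random.default_rng(0)
dt, x_max = 0.1, 1.0
x, states = 0.0, [0.0]
for _ in range(200):
    u_learned = rng.uniform(-5.0, 5.0)  # placeholder for an untrusted learning-based policy
    u = safety_filter(x, u_learned, dt=dt, x_max=x_max)
    x = x + dt * u
    states.append(x)

print(f"largest |x| along the trajectory: {max(abs(s) for s in states):.3f}")
```

Inputs that are already safe pass through unchanged, so the learning-based controller is only overridden close to the constraint boundary.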

<div>
<figure style="float: left; width: 70%;">
    <img src="_static/images/40_safe_learning_approaches.svg" width="100%"/>
</figure>
<div style="float: left; width: 20%;">
<br><br><br>Summary of safe learning control approaches.
</div>
</div>

<img src="_static/images/aai-institute-cover.svg" alt="Snow" style="width:100%;">
<div class="md-slide title">Thank you for the attention!</div>

# References

- [<b id="rosolia_datadriven_2018">[Rosolia, U., Zhang, X. and Borrelli, F., 2018]</b>](#rosolia_datadriven_2018-back) Rosolia, Ugo, Xiaojing Zhang, and Francesco Borrelli. [Data-driven predictive control for autonomous systems.](https://www.annualreviews.org/doi/full/10.1146/annurev-control-060117-105215) Annual Review of Control, Robotics, and Autonomous Systems 1 (2018): 259-286.

- [<b id="hewing_learningbased_2020">[Hewing, Lukas, et al. 2020]</b>](#hewing_learningbased_2020-back) Hewing, Lukas, Kim P. Wabersich, Marcel Menner, and Melanie N. Zeilinger. [Learning-based model predictive control: Toward safe learning in control.](https://www.annualreviews.org/doi/full/10.1146/annurev-control-090419-075625) Annual Review of Control, Robotics, and Autonomous Systems 3 (2020): 269-296.

- [<b id="brunke_safe_2022">[Brunke, Lukas, et al. 2022]</b>](#brunke_safe_2022-back) Brunke, Lukas, Melissa Greeff, Adam W. Hall, Zhaocong Yuan, Siqi Zhou, Jacopo Panerati, and Angela P. Schoellig. [Safe learning in robotics: From learning-based control to safe reinforcement learning.](https://www.annualreviews.org/doi/abs/10.1146/annurev-control-042920-020211) Annual Review of Control, Robotics, and Autonomous Systems 5 (2022): 411-444.