In [None]:
%%capture
%load_ext autoreload
%autoreload 2
%matplotlib inline
%load_ext training_ml_control
%set_random_seed 12

In [None]:
%presentation_style

In [None]:
%autoreload
import warnings

import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

import pykoopman as pk

from training_ml_control.environment import (
    create_inverted_pendulum_environment,
    simulate_environment,
)
from training_ml_control.nb_utils import show_video

warnings.simplefilter("ignore", UserWarning)
sns.set_theme()
plt.rcParams["figure.figsize"] = [12, 8]

:::{figure} ./_static/images/aai-institute-cover.png
:width: 90%
:align: center
---
name: aai-institute
---
:::

# Machine Learning & Control

## Learning-Based MPC

- Learning-based MPC addresses the automated and data-driven generation or adaptation of elements of the MPC formulation to improve control performance.
- The learning setup can be diverse:
  - Offline learning involves adapting the controller between trials or episodes while collecting data.
  - Online learning adjusts the controller during closed-loop operation (e.g. repetitive tasks) or using data from one task execution.
- Much research has focused on automatically improving model quality, as this clearly affects MPC performance.
- Some efforts address the MPC problem formulation directly.
- Others use MPC concepts to satisfy constraints during learning-based control, i.e. Safe Learning Control.

## System Identification

- MPC relies on accurate system models, so one approach is learning to adjust the model either during operation or between different operational instances.
- Traditionally, models are derived offline, before control, using first principles and system identification.
- Learning-based MPC constructs and updates models and uncertainties from data.

```{exercise} Dynamic System Model Evaluation
:label: model-evaluation

How do we evaluate the fitted model of the system?
```

:::{solution} model-evaluation
:class: dropdown

- We could evaluate the model's prediction on a held-out validation set.
- Additionally we could evaluate the model's long-term predictions in an open-loop manner.
- On top of that, we could evaluate the model's step and impulse responses from different initial states. This is especially useful for linear systems.
:::
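
The first two checks from the solution can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the toy linear system, the perturbed "fitted" model, and the helper names `one_step_error` and `open_loop_error` are hypothetical and not part of the training code.

```python
import numpy as np


def one_step_error(predict, X, U):
    """RMSE of one-step-ahead predictions along a recorded trajectory.

    X has shape (T + 1, n_states) and U has shape (T, n_inputs).
    """
    preds = np.array([predict(x, u) for x, u in zip(X[:-1], U)])
    return float(np.sqrt(np.mean((preds - X[1:]) ** 2)))


def open_loop_error(predict, X, U):
    """RMSE of an open-loop rollout that starts at X[0] and only sees the inputs."""
    x = X[0]
    preds = []
    for u in U:
        x = predict(x, u)  # the model's own prediction is fed back in
        preds.append(x)
    return float(np.sqrt(np.mean((np.array(preds) - X[1:]) ** 2)))


# Toy ground truth (assumption): a stable linear system x_{t+1} = A x_t + B u_t.
A_true = np.array([[0.9, 0.1], [0.0, 0.8]])
B_true = np.array([[0.0], [0.1]])

# Held-out validation trajectory generated with random inputs.
rng = np.random.default_rng(0)
U_val = rng.uniform(-1.0, 1.0, size=(50, 1))
X_val = np.zeros((51, 2))
X_val[0] = [1.0, -1.0]
for t in range(50):
    X_val[t + 1] = A_true @ X_val[t] + B_true @ U_val[t]

# A slightly wrong "fitted" model (assumption, for illustration only).
A_fit = A_true + 0.02


def fitted_model(x, u):
    return A_fit @ x + B_true @ u


err_one_step = one_step_error(fitted_model, X_val, U_val)
err_open_loop = open_loop_error(fitted_model, X_val, U_val)
print(f"one-step RMSE: {err_one_step:.4f}, open-loop RMSE: {err_open_loop:.4f}")
```

The one-step error resets to the true state at every step, whereas the open-loop rollout lets model errors accumulate, which is closer to how the model is used inside an MPC prediction horizon.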

## Data Collection

Before learning a model of the system, we need to collect data either during the system's operation, i.e. online data collection, or in between episodes of the system's operation, i.e. offline data collection. The collected data has to be comprehensive and representative of all the behaviours of the system that we want to capture.

To do that properly for a controlled system, we need either a human expert or a well-tuned program under supervision for data collection. However, in many safety-critical tasks such as space exploration, there is no expert collecting data. This raises a natural question: can we safely collect data without humans in the loop, and eventually achieve an aggressive control goal? For example, landing a drone faster and faster.

### Inverted Pendulum

In [None]:
env = create_inverted_pendulum_environment(
    max_steps=100, theta_threshold=np.deg2rad(45)
)

In [None]:
observations = []
actions = []
for _ in range(20):
    result = simulate_environment(env)
    observations.append(result.observations)
    actions.append(result.actions)

In [None]:
fig, axes = plt.subplots(2, 2, sharex=True)
axes = axes.ravel()
state_labels = ["$x$", r"$\dot{x}$", r"$\theta$", r"$\dot{\theta}$"]
for i, label in enumerate(state_labels):
    for obs in observations:
        # Assumes each episode's observations are shaped (n_steps, 4),
        # consistent with how X is indexed further below.
        obs = np.asarray(obs)
        axes[i].plot(np.arange(len(obs)), obs[:, i])
    axes[i].set_xlabel("Time")
    axes[i].set_title(label)
fig.tight_layout()
plt.show();

In [None]:
fig, ax = plt.subplots()
for episode_actions in actions:
    ax.plot(np.arange(len(episode_actions)), episode_actions)
ax.set_xlabel("Time")
ax.set_title("$u$")
fig.tight_layout()
plt.show();

## Naive Deep Learning

$$
\mathbf{x}_{t+1} = f_\theta(\mathbf{x}_t, \mathbf{u}_t)
$$

Here the full dynamics are approximated by a neural network $f_\theta$, where $\theta$ are its learnable parameters.

Issues with such an approach:

- Domain shift from the training distribution to the real-world distribution.
- Black-box.

## Deep Learning with Nominal Model

A real-world dynamical system can be described as:

$$
\mathbf{x}_{t+1} = \underbrace{f_n(\mathbf{x}_t, \mathbf{u}_t)}_{\text{nominal dynamics}} + \underbrace{g_r(\mathbf{x}_t, \mathbf{u}_t)}_{\text{residual dynamics}} + \underbrace{w_t}_{\text{disturbance}}
$$

Where $f_n$ is the nominal system model, $g_r$ is an additive residual term accommodating uncertainty, and $w_t$ represents disturbances.

Many learning-based techniques make use of this explicit distinction by only learning the residual term.
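
A minimal sketch of residual learning on a toy system is given below. Everything in it is an assumption made for illustration: the double-integrator nominal model, the sine-shaped residual, the noise level, and the hand-picked feature map. Linear least squares stands in for the neural network, so this is not the method used in the cited work.

```python
import numpy as np

rng = np.random.default_rng(0)

# Nominal model f_n (assumption): a discretised double integrator.
A_n = np.array([[1.0, 0.1], [0.0, 1.0]])
B_n = np.array([0.0, 0.1])


def f_nominal(X, U):
    """Nominal one-step prediction for a batch of states X (N, 2) and inputs U (N,)."""
    return X @ A_n.T + np.outer(U, B_n)


def g_true(X, U):
    """Unknown residual g_r (assumption): a gravity-like term acting on the velocity."""
    out = np.zeros_like(X)
    out[:, 1] = -0.1 * np.sin(X[:, 0])
    return out


def features(X, U):
    """Hand-picked features for the residual regressor (assumption)."""
    return np.column_stack([X[:, 0], X[:, 1], U, np.sin(X[:, 0]), np.cos(X[:, 0])])


# One-step data from the "real" system: nominal + residual + disturbance w_t.
N = 500
X = rng.uniform(-2.0, 2.0, size=(N, 2))
U = rng.uniform(-1.0, 1.0, size=N)
X_next = f_nominal(X, U) + g_true(X, U) + 0.01 * rng.standard_normal((N, 2))

train, test = slice(0, 400), slice(400, N)

# Fit only the residual: the regression target is x_{t+1} - f_n(x_t, u_t).
residual_target = X_next[train] - f_nominal(X[train], U[train])
W, *_ = np.linalg.lstsq(features(X[train], U[train]), residual_target, rcond=None)


def rmse(pred, target):
    return float(np.sqrt(np.mean((pred - target) ** 2)))


pred_nominal = f_nominal(X[test], U[test])
pred_full = pred_nominal + features(X[test], U[test]) @ W
rmse_nominal = rmse(pred_nominal, X_next[test])
rmse_full = rmse(pred_full, X_next[test])
print(f"nominal only: {rmse_nominal:.4f}, nominal + learned residual: {rmse_full:.4f}")
```

Because the regressor only has to explain what the nominal model misses, the learning problem is typically smaller than learning the full dynamics from scratch.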

An example of the use of this approach can be found in {cite}`shi_neural_2019`.

{ref}`neural-lander-diagram` depicts the ground effect that is modelled with a neural network, and
{ref}`neural-lander-gif` compares drone landings with and without neural network modelling.

:::{figure} _static/images/60_neural_lander_diagram.png
:width: 60%
:label: neural-lander-diagram
Visual depiction of the ground effect, the complex aerodynamic effect between the drone and the ground, which is nonlinear, nonstationary, and very hard to model using standard system identification approaches. *Adapted from [Neural Control blog post](https://www.gshi.me/blog/NeuralControl/)*. 
:::

:::{figure} _static/images/60_neural_lander.gif
:width: 60%
:label: neural-lander-gif
Comparison of drone landing with and without neural network modelling of residual ground effect. *Taken from [Neural Control blog post](https://www.gshi.me/blog/NeuralControl/)*. 
:::

Issues with such an approach:

- Requires a nominal model.
- Still susceptible to domain shift issues.

## Koopman Operator

Let $\Omega$ be the state space of our dynamical system and $\mathcal{F}$ a space of functions $g: \Omega \rightarrow \mathbb{C}$. The Koopman operator is defined on a suitable domain $\mathcal{D}(\mathcal{K}) \subset \mathcal{F}$ via the composition formula:

$$
[\mathcal{K}g](x) = [g \circ f](x) = g(f(x)), \quad g \in \mathcal{D}(\mathcal{K})
$$

Where $x_{t+1} = f(x_t)$ describes the dynamics of the system.

The functions $g$, referred to as *observables*, serve as tools for indirectly measuring the state of the system.
Specifically, $g(x_t)$ indirectly measures the state $x_t$.

In this context, $[\mathcal{K}g](x_t) = g(f(x_t)) = g(x_{t+1})$ represents the measurement of the state one time
step ahead of $g(x_t)$. This process effectively captures the dynamic progression of the system.

The key property of the Koopman operator $\mathcal{K}$ is its *linearity*. This linearity holds irrespective of whether the system’s dynamics are linear or nonlinear.
Consequently, the spectral properties of $\mathcal{K}$ become a powerful tool in analyzing the dynamical system’s behavior.
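
The linearity can be checked numerically on a small example. The sketch below uses a well-known toy system (chosen here for illustration, it is not part of the pendulum setup) for which the observables $(x_1, x_2, x_1^2)$ span a finite-dimensional subspace that is invariant under $\mathcal{K}$, so the operator restricted to that subspace is an ordinary $3 \times 3$ matrix. The values of `lam` and `mu` are arbitrary assumptions.

```python
import numpy as np

lam, mu = 0.9, 0.5


def f(x):
    """Nonlinear map x_{t+1} = f(x_t)."""
    x1, x2 = x
    return np.array([lam * x1, mu * x2 + (lam**2 - mu) * x1**2])


def g(x):
    """Vector of observables g(x) = (x1, x2, x1^2)."""
    x1, x2 = x
    return np.array([x1, x2, x1**2])


# Matrix of the Koopman operator restricted to span{x1, x2, x1^2}.
K = np.array(
    [
        [lam, 0.0, 0.0],
        [0.0, mu, lam**2 - mu],
        [0.0, 0.0, lam**2],
    ]
)

x = np.array([1.5, -0.7])
z = g(x)
for _ in range(10):
    x = f(x)  # nonlinear evolution of the state
    z = K @ z  # linear evolution of the observables
    assert np.allclose(g(x), z)

koopman_eigvals = np.linalg.eigvals(K)
print("Eigenvalues of K on this subspace:", koopman_eigvals)
```

For most systems no such finite invariant subspace is known in advance, which is why data-driven approximations are used in practice.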

:::{figure} _static/images/60_koopman_operators_summary.svg
:width: 70%
:align: center
---
name: Koopman Operators Summary
---
Summary of the idea of Koopman operators. By lifting to a space of observables, we
trade a nonlinear finite-dimensional system for a linear infinite-dimensional system {cite}`colbrook_multiverse_2023`.
:::

### Dynamic Mode Decomposition (DMD)

Dynamic Mode Decomposition (DMD) is a popular data-driven analysis technique used to decompose complex, nonlinear systems into a set of modes, revealing underlying patterns and dynamics through spectral analysis.
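
A minimal NumPy sketch of (exact) DMD is shown below, assuming snapshots sampled at a fixed time step and stored row-wise. The function name `dmd` and the toy decaying rotation are assumptions for illustration. Library implementations such as those wrapped by `pykoopman` add rank selection and other refinements that are omitted here.

```python
import numpy as np


def dmd(X, rank=None):
    """Exact DMD of a trajectory X with shape (n_samples, n_states).

    Returns the eigenvalues and modes of the best-fit linear map
    x_{t+1} ≈ A x_t, computed through an (optionally truncated) SVD.
    """
    X0, X1 = X[:-1].T, X[1:].T  # columns are consecutive snapshots
    U, s, Vh = np.linalg.svd(X0, full_matrices=False)
    if rank is not None:
        U, s, Vh = U[:, :rank], s[:rank], Vh[:rank]
    V_Sinv = Vh.conj().T @ np.diag(1.0 / s)
    A_tilde = U.conj().T @ X1 @ V_Sinv  # A projected onto the leading SVD modes
    eigvals, W = np.linalg.eig(A_tilde)
    modes = X1 @ V_Sinv @ W  # exact DMD modes
    return eigvals, modes


# Toy data (assumption): a slowly decaying rotation in the plane.
theta, rho = 0.3, 0.95
A_true = rho * np.array(
    [[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]]
)
X = np.zeros((50, 2))
X[0] = [1.0, 0.0]
for t in range(49):
    X[t + 1] = A_true @ X[t]

eigvals, modes = dmd(X)
print("DMD eigenvalues: ", eigvals)
print("True eigenvalues:", np.linalg.eigvals(A_true))
```

The magnitude of each eigenvalue gives the growth or decay rate of its mode per time step, and its angle gives the oscillation frequency.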

### DMD with Control (DMDc)
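
DMDc extends DMD to systems with inputs by fitting a linear model $x_{t+1} \approx A x_t + B u_t$ to recorded states and inputs. The sketch below is a simplified illustration that uses plain least squares and omits the SVD-based truncation. The function name `dmdc` and the toy system are assumptions. The `pykoopman` cells that follow use the EDMDc regressor, which, to the best of our understanding, applies the same idea to the lifted observables rather than to the raw state.

```python
import numpy as np


def dmdc(X, U):
    """Least-squares fit of x_{t+1} ≈ A x_t + B u_t.

    X has shape (T + 1, n_states) and U has shape (T, n_inputs).
    """
    n_states = X.shape[1]
    Omega = np.hstack([X[:-1], U])  # stacked regressors [x_t, u_t]
    G, *_ = np.linalg.lstsq(Omega, X[1:], rcond=None)  # G = [A^T; B^T]
    A = G[:n_states].T
    B = G[n_states:].T
    return A, B


# Toy system (assumption) excited with random inputs.
A_true = np.array([[0.9, 0.1], [-0.2, 0.8]])
B_true = np.array([[0.0], [0.5]])

rng = np.random.default_rng(0)
T = 100
U = rng.uniform(-1.0, 1.0, size=(T, 1))
X = np.zeros((T + 1, 2))
for t in range(T):
    X[t + 1] = A_true @ X[t] + B_true @ U[t]

A_hat, B_hat = dmdc(X, U)
print("A estimate:\n", A_hat)
print("B estimate:\n", B_hat)
```

The inputs need to excite the system sufficiently. With a constant or zero input, the contributions of $A$ and $B$ cannot be separated from the data.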

In [None]:
X = np.concatenate([obs[:-1] for obs in observations])
U = np.concatenate(actions)

In [None]:
EDMDc = pk.regression.EDMDc()
centers = np.random.uniform(-1.5, 1.5, (4, 4))
RBF = pk.observables.RadialBasisFunction(
    rbf_type="thinplate",
    n_centers=centers.shape[1],
    centers=centers,
    kernel_width=1,
    polyharmonic_coeff=1.0,
    include_state=True,
)

model = pk.Koopman(observables=RBF, regressor=EDMDc)
model.fit(X, u=U, dt=env.dt)

In [None]:
fig, axes = plt.subplots(2, 2, figsize=(10, 10))
axes = axes.ravel()

axes[0].imshow(model.A, aspect="auto", cmap=plt.get_cmap("magma"))
axes[0].set(title="A")

axes[1].imshow(model.B, aspect="1", cmap=plt.get_cmap("magma"))
axes[1].set(title="B")

axes[2].imshow(model.C, aspect="auto", cmap=plt.get_cmap("magma"))
axes[2].set(title="C")

axes[3].imshow(np.real(model.W), aspect="auto", cmap=plt.get_cmap("magma"))
axes[3].set(title=r"$\mathcal{Re}(W)$")
fig.tight_layout()

In [None]:
Xkoop = model.simulate(X[0], U, n_steps=X.shape[0] - 1)
Xkoop = np.vstack([X[0][np.newaxis, :], Xkoop])

In [None]:
t = np.arange(0, len(U))
fig, axs = plt.subplots(3, 1, sharex=True, tight_layout=True, figsize=(9, 6))
axs[0].plot(t, U, "-k")
axs[0].set(ylabel=r"$u$")
axs[1].plot(t, X[:, 0], "-", color="b", label="True")
axs[1].plot(t, Xkoop[:, 0], "--r", label="EDMDc")
axs[1].set(ylabel=r"$x_1$")
axs[2].plot(t, X[:, 1], "-", color="b", label="True")
axs[2].plot(t, Xkoop[:, 1], "--r", label="EDMDc")
axs[2].set(ylabel=r"$x_2$", xlabel=r"$t$")
axs[1].legend(loc="best")
axs[1].set_ylim([-2.5, 2.5])
axs[2].set_ylim([-2.5, 2.5]);

## Eigenvalues

In [None]:
eigval, eigvec = np.linalg.eig(model.A)
fig, ax = plt.subplots()
ax.plot(np.real(eigval), np.imag(eigval), "o", color="lightgrey", label="EDMDc")
ax.set(title="Eigenvalues")
ax.legend()
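
One common way to read such an eigenvalue plot, assuming the fitted matrix acts as a discrete-time update $z_{t+1} = A z_t + B u_t$, is to compare the eigenvalue magnitudes with the unit circle: the uncontrolled linear model is asymptotically stable only if all eigenvalues lie strictly inside it. The sketch below uses two hypothetical matrices rather than the fitted `model.A`.

```python
import numpy as np


def spectral_radius(A):
    """Largest eigenvalue magnitude of a square matrix."""
    return float(np.max(np.abs(np.linalg.eigvals(A))))


def is_discrete_time_stable(A):
    """True if every eigenvalue lies strictly inside the unit circle."""
    return spectral_radius(A) < 1.0


# Hypothetical examples, not taken from the fitted model.
A_stable = np.array([[0.9, 0.2], [0.0, 0.5]])
A_unstable = np.array([[1.1, 0.0], [0.3, 0.7]])

for name, A in [("A_stable", A_stable), ("A_unstable", A_unstable)]:
    print(name, spectral_radius(A), is_discrete_time_stable(A))
```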