In [1]:
%%capture
%load_ext autoreload
%autoreload 2
%matplotlib inline
%load_ext training_ml_control
%set_random_seed 12

In [2]:
%presentation_style

In [3]:
import warnings

warnings.simplefilter("ignore", UserWarning)

In [4]:
%autoreload
import warnings

import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

import pykoopman as pk

from training_ml_control.environments import (
    create_cart_environment,
    create_inverted_pendulum_environment,
    simulate_environment,
)
from training_ml_control.nb_utils import show_video, display_array

warnings.simplefilter("ignore", UserWarning)
sns.set_theme()
plt.rcParams["figure.figsize"] = [12, 8]



:::{figure} ./_static/images/aai-institute-cover.png
:width: 90%
:align: center
---
name: aai-institute
---
:::

# Machine Learning & Control

Modern machine learning provides useful tools and perspectives for control theory. Framing control problems as data modeling tasks enables powerful function approximation, estimation, and optimization techniques from machine learning to be applied.

## Learning-Based Control

- Learning-based control addresses the automated and data-driven generation or adaptation of elements of the controller formulation to improve control performance.
- The learning setup can be diverse:
  - Offline learning involves adapting the controller between trials or episodes while collecting data.
  - Online learning adjusts the controller during closed-loop operation (e.g. repetitive tasks) or using data from one task execution.

## System Identification

- Control relies on accurate system models, so one approach is learning to adjust the model either during operation or between different operational instances.
- Traditionally, models are derived offline, before control, from first principles and system identification.
- Learning-based system identification constructs and updates models and uncertainties from data.

```{exercise} Dynamic System Model Evaluation
:label: model-evaluation

How do we evaluate the fitted model of the system?
```

:::{solution} model-evaluation
:class: dropdown

- We could evaluate the model's prediction on a held-out validation set.
- Additionally we could evaluate the model's long-term predictions in an open-loop manner.
- On top of that, we could evaluate the model's step and impulse responses from different initial states. This is especially useful for linear systems.
:::
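
The first two checks from the solution can be sketched with plain NumPy. The snippet below is only an illustration under stated assumptions: the "true" system is a hypothetical double integrator, and `A_hat` is a deliberately imperfect model of it. None of these names come from the `training_ml_control` package.

```python
import numpy as np

def one_step_error(A, B, X, U):
    """Mean squared one-step prediction error; X is (T + 1, d), U is (T, q)."""
    X_pred = X[:-1] @ A.T + U @ B.T
    return float(np.mean((X[1:] - X_pred) ** 2))

def open_loop_error(A, B, X, U):
    """Mean squared error of a rollout that starts at X[0] and never sees the data again."""
    x = X[0]
    errors = []
    for t in range(len(U)):
        x = A @ x + B @ U[t]
        errors.append(np.mean((X[t + 1] - x) ** 2))
    return float(np.mean(errors))

# Hypothetical true system (double integrator, dt = 0.1) and a slightly wrong model
dt = 0.1
A_true = np.array([[1.0, dt], [0.0, 1.0]])
B_true = np.array([[0.0], [dt]])
A_hat = A_true + np.array([[0.0, 0.0], [0.0, -0.01]])  # spurious velocity damping

# Deterministic, non-negative input signal of shape (T, q)
t_grid = np.arange(200)
U = 0.5 * (1.0 + np.sin(0.1 * t_grid))[:, None]

# Generate the reference trajectory with the true system
X = np.zeros((201, 2))
for t in range(200):
    X[t + 1] = A_true @ X[t] + B_true @ U[t]

err_one_step = one_step_error(A_hat, B_true, X, U)
err_open_loop = open_loop_error(A_hat, B_true, X, U)
```

In this example the open-loop error is larger than the one-step error because the small model error accumulates over the rollout. A low one-step error on held-out data is therefore not sufficient on its own.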

## Data Collection

Before learning a model of the system, we need to collect data, either during the system's operation (online data collection) or in between episodes of the system's operation (offline data collection). The collected data has to be comprehensive and representative of all the behaviours of the system that we want to capture.

To do that properly for a controlled system, we need either a human expert or a well-tuned program under supervision for data collection. However, in many safety-critical tasks such as space exploration, there is no expert collecting data. This raises a natural question: can we safely collect data without humans in the loop and eventually achieve an aggressive control goal, for example landing a drone faster and faster?

### Cart

In [None]:
cart_env = create_cart_environment(goal_position=9)

In [None]:
cart_observations = []
cart_actions = []
for _ in range(20):
    result = simulate_environment(cart_env)
    cart_observations.append(result.observations)
    cart_actions.append(result.actions)

In [None]:
fig, axes = plt.subplots(1, 2, sharex=True)
axes = axes.ravel()
for i, label in zip(range(2), ["$x$", r"$\dot{x}$"]):
    for j in range(len(cart_observations)):
        t = np.arange(len(cart_observations[j][i]))
        axes[i].plot(t, cart_observations[j][i], label=label)
        axes[i].set_xlabel("Time")
        axes[i].set_title(label)
fig.tight_layout()
plt.show();

In [None]:
fig, ax = plt.subplots(1, 1, sharex=True)

for j in range(len(cart_observations)):
    ax.plot(cart_observations[j][0], cart_observations[j][1])
    ax.set_xlabel("$x$")
    ax.set_ylabel(r"$\dot{x}$")
fig.tight_layout()
plt.show();

In [None]:
fig, ax = plt.subplots()
for j in range(len(cart_actions)):
    t = np.arange(len(cart_actions[j]))
    ax.plot(t, cart_actions[j])
    ax.set_xlabel("Time")
    ax.set_title("$u$")
fig.tight_layout()
plt.show();

### Inverted Pendulum

In [None]:
inverted_pendulum_env = create_inverted_pendulum_environment(
    max_steps=100, theta_threshold=np.deg2rad(45)
)

In [None]:
inverted_pendulum_observations = []
inverted_pendulum_actions = []
for _ in range(20):
    result = simulate_environment(inverted_pendulum_env)
    inverted_pendulum_observations.append(result.observations)
    inverted_pendulum_actions.append(result.actions)

In [None]:
fig, axes = plt.subplots(2, 2, sharex=True)
axes = axes.ravel()
for i, label in zip(range(4), ["$x$", r"$\dot{x}$", r"$\theta$", r"$\dot{\theta}$"]):
    for j in range(len(inverted_pendulum_observations)):
        t = np.arange(len(inverted_pendulum_observations[j][i]))
        axes[i].plot(t, inverted_pendulum_observations[j][i], label=label)
        axes[i].set_xlabel("Time")
        axes[i].set_title(label)
fig.tight_layout()
plt.show();

In [None]:
fig, ax = plt.subplots(1, 1)
for j in range(len(inverted_pendulum_observations)):
    ax.plot(inverted_pendulum_observations[j][0], inverted_pendulum_observations[j][2])
    ax.set_xlabel("$x$")
    ax.set_ylabel(r"$\theta$")
fig.tight_layout()
plt.show();

In [None]:
fig, ax = plt.subplots()
for j in range(len(inverted_pendulum_actions)):
    t = np.arange(len(inverted_pendulum_actions[j]))
    ax.plot(t, inverted_pendulum_actions[j])
    ax.set_xlabel("Time")
    ax.set_title("$u$")
fig.tight_layout()
plt.show();

## Naive Deep Learning

$$
\mathbf{x}_{t+1} = f_\theta(\mathbf{x}_t, \mathbf{u}_t)
$$

Where $\theta$ are learnable parameters.

Issues with such an approach:

- Domain shift from the training distribution to the real-world distribution.
- Black-box: the learned model offers little interpretability, which makes it hard to verify.

## Deep Learning with Nominal Model

A real-world dynamical system can be described as:

$$
\mathbf{x}_{t+1} = \underbrace{f_n(\mathbf{x}_t, \mathbf{u}_t)}_{\text{nominal dynamics}} + \underbrace{g_r(\mathbf{x}_t, \mathbf{u}_t)}_{\text{residual dynamics}} + \underbrace{w_t}_{\text{disturbance}}
$$

Where $f_n$ is the nominal system model, $g_r$ is an additive residual term accommodating uncertainty, and $w_t$ represents disturbances.

Many learning-based techniques make use of this explicit distinction by only learning the residual term.

An example of the use of this approach can be found in {cite}`shi_neural_2019`.

{ref}`neural-lander-diagram` is a figure depicting the ground effect that is modelled using a neural network and
{ref}`neural-lander-gif` is a comparison of drone landings with and without neural network modeling.

:::{figure} _static/images/60_neural_lander_diagram.png
:width: 60%
:label: neural-lander-diagram
Visual depiction of the ground effect, the complex aerodynamic effect between the drone and the ground, which is nonlinear, nonstationary, and very hard to model using standard system identification approaches. *Adapted from [Neural Control blog post](https://www.gshi.me/blog/NeuralControl/)*. 
:::

:::{figure} _static/images/60_neural_lander.gif
:width: 60%
:label: neural-lander-gif
Comparison of drone landing with and without neural network modelling of residual ground effect. *Taken from [Neural Control blog post](https://www.gshi.me/blog/NeuralControl/)*. 
:::
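
The idea of learning only the residual term can be sketched as follows. This is a minimal illustration, not the method of {cite}`shi_neural_2019`: it assumes a hypothetical cart whose nominal model ignores a linear drag term, and it fits the residual with ordinary least squares instead of a neural network. All function names are made up for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
dt = 0.1

def f_nominal(x, u):
    """Nominal model: frictionless cart (double integrator), assumed known."""
    return np.array([x[0] + dt * x[1], x[1] + dt * u])

def f_true(x, u):
    """Hypothetical real system: same cart plus an unmodeled linear drag on the velocity."""
    return f_nominal(x, u) + np.array([0.0, -0.3 * dt * x[1]])

# Collect transitions from the "real" system
states = rng.uniform(-2.0, 2.0, size=(500, 2))
inputs = rng.uniform(-1.0, 1.0, size=500)
next_states = np.array([f_true(x, u) for x, u in zip(states, inputs)])
nominal_next = np.array([f_nominal(x, u) for x, u in zip(states, inputs)])

# Regress only the residual y - f_n(x, u) on the features [x, u]
features = np.column_stack([states, inputs])
residual_targets = next_states - nominal_next
W, *_ = np.linalg.lstsq(features, residual_targets, rcond=None)

def f_learned(x, u):
    """Nominal prediction corrected by the fitted residual g_r."""
    return f_nominal(x, u) + np.append(x, u) @ W

x_test, u_test = np.array([1.0, -0.5]), 0.2
nominal_error = np.linalg.norm(f_true(x_test, u_test) - f_nominal(x_test, u_test))
learned_error = np.linalg.norm(f_true(x_test, u_test) - f_learned(x_test, u_test))
```

Because the learner only has to explain the part of the dynamics that the nominal model misses, the regression problem is typically much smaller than learning $f_\theta$ from scratch.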

Issues with such an approach:

- Requires nominal model.
- Still susceptible to domain shift issues.

## Koopman Operator

Let $\Omega$ be the state space of our dynamical system and $\mathcal{F}$ a space of functions $g: \Omega \rightarrow \mathbb{C}$. The Koopman operator is defined on a suitable domain $\mathcal{D}(\mathcal{K}) \subset \mathcal{F}$ via the composition formula:

$$
[\mathcal{K}g](\mathbf{x}) = [g \circ f](\mathbf{x}) = g(f(\mathbf{x})), \quad g \in \mathcal{D}(\mathcal{K})
$$

Where $\mathbf{x}_{t+1} = f(\mathbf{x}_t)$.

The functions $g$, referred to as *observables*, serve as tools for indirectly measuring the state of the system.
Specifically, $g(\mathbf{x}_t)$ indirectly measures the state $\mathbf{x}_t$.

In this context, $[\mathcal{K}g](\mathbf{x}_t) = g(f(\mathbf{x}_t)) = g(\mathbf{x}_{t+1})$ represents the measurement of the state one time
step ahead of $g(\mathbf{x}_t)$. This process effectively captures the dynamic progression of the system.

The key property of the Koopman operator $\mathcal{K}$ is its *linearity*. This linearity holds irrespective of whether the system’s dynamics are linear or nonlinear. Consequently, the spectral properties of $\mathcal{K}$ become a powerful tool in analyzing the dynamical system’s behavior.

If $g \in \mathcal{F}$ is an eigenfunction of $\mathcal{K}$ with eigenvalue $\lambda$, then:

$$
g(\mathbf{x}_t) = [\mathcal{K}^t g](\mathbf{x}_0) = \lambda^t g(\mathbf{x}_0), \quad \forall t \in \mathbb{N}.
$$
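
Both the linearity of $\mathcal{K}$ and the eigenfunction relation can be checked numerically on a small example. The sketch below assumes a hypothetical two-dimensional nonlinear map for which the observables $(x_1, x_2, x_1^2)$ happen to span a finite-dimensional Koopman-invariant subspace. Most systems do not admit such a closed set of observables, so this is an idealized case.

```python
import numpy as np

lam, mu = 0.9, 0.5

def f(x):
    """Nonlinear map: x1 decays linearly, x2 is driven by x1 squared."""
    return np.array([lam * x[0], mu * x[1] + (lam**2 - mu) * x[0] ** 2])

def lift(x):
    """Observables g(x) = (x1, x2, x1^2); they span a Koopman-invariant subspace."""
    return np.array([x[0], x[1], x[0] ** 2])

# Matrix of the Koopman operator restricted to span{x1, x2, x1^2}
K = np.array([
    [lam, 0.0, 0.0],
    [0.0, mu, lam**2 - mu],
    [0.0, 0.0, lam**2],
])

x0 = np.array([1.5, -0.7])
x, z = x0.copy(), lift(x0)
for _ in range(10):
    x = f(x)   # nonlinear evolution of the state
    z = K @ z  # linear evolution of the observables

# g(x) = x1 is an eigenfunction with eigenvalue lam, so g(x_t) = lam**t * g(x_0)
eigenfunction_value = x[0]
predicted_value = lam**10 * x0[0]
```

After ten steps, `z` matches `lift(x)`: the observables evolve linearly even though the state does not.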

One of the most useful features of Koopman operators is the Koopman Mode Decomposition (KMD).
The KMD expresses the state $\mathbf{x}$ or an observable $g(\mathbf{x})$ as a linear combination of dominant coherent structures.
It can be considered a diagonalization of the Koopman operator.

As a result, the KMD is invaluable for tasks such as dimensionality and model reduction. It generalizes the space-time separation of variables typically achieved through the Fourier transform or the singular value decomposition (SVD).

:::{figure} _static/images/60_koopman_operators_summary.svg
:width: 70%
:align: center
---
name: Koopman Operators Summary
---
Summary of the idea of Koopman operators. By lifting to a space of observables, we
trade a nonlinear finite-dimensional system for a linear infinite-dimensional system {cite}`colbrook_multiverse_2023`.
:::

## Dynamic Mode Decomposition (DMD)

Dynamic Mode Decomposition (DMD) is a popular data-driven analysis technique used to decompose complex, nonlinear systems into a set of modes, revealing underlying patterns and dynamics through spectral analysis.

The simplest and historically first interpretation of DMD is as a linear regression.

We consider a discrete-time dynamical system represented as:

$$
\mathbf{x}_{t+1} = f(\mathbf{x}_t), \quad t = 0, 1, 2, \dots
$$

Given discrete-time snapshots of the system:

$$
\{\mathbf{x}^{(m)}, \mathbf{y}^{(m)}\}^M_{m=1}, \quad \text{s.t.} \quad \mathbf{y}^{(m)} = f(\mathbf{x}^{(m)}), \quad m = 1, \dots , M.
$$

We define the snapshot matrices $\mathbf{X}, \mathbf{Y} \in \mathbb{C}^{d\times M}$ as:

$$
\mathbf{X} = \begin{bmatrix}\mathbf{x}^{(1)} & \mathbf{x}^{(2)} & \dots & \mathbf{x}^{(M)} \end{bmatrix}
$$

$$
\mathbf{Y} = \begin{bmatrix}\mathbf{y}^{(1)} & \mathbf{y}^{(2)} & \dots & \mathbf{y}^{(M)} \end{bmatrix}
$$

We seek a matrix $\mathbf{K}_{\text{DMD}}$ such that $\mathbf{Y} \approx \mathbf{K}_{\text{DMD}} \mathbf{X}$. We can think
of this as constructing a linear and approximate dynamical system.

To find a suitable matrix $\mathbf{K}_{\text{DMD}}$, we consider the minimization problem:

$$
\underset{\mathbf{K}_{\text{DMD}} \in \mathbb{C}^{d\times d} }{\min} \left\lVert \mathbf{Y} - \mathbf{K}_{\text{DMD}} \mathbf{X} \right\rVert_F ,
$$ (dmd-minimization)

where $\left\lVert \cdot \right\rVert_F$ denotes the Frobenius norm[^*]. Similar optimization problems are at the heart of the
various DMD-type algorithms that follow. A solution to the problem in {eq}`dmd-minimization` is:

[^*]: The Frobenius norm of a matrix $\mathbf{X} \in \mathbb{C}^{m\times n}$ is defined as $\left\lVert \mathbf{X} \right\rVert_F = \sqrt{\sum_{i=1}^{m} \sum_{j=1}^{n} \lvert x_{i,j} \rvert^2}$.

$$
\mathbf{K}_{\text{DMD}} = \mathbf{Y} \mathbf{X}^{+} \in \mathbb{C}^{d\times d},
$$

where $^{+}$ denotes the Moore–Penrose pseudoinverse.
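
The pseudoinverse solution translates directly into NumPy. The sketch below assumes a hypothetical ground-truth linear map (a damped rotation) that is used only to generate the snapshot pairs. With noise-free data from a linear system, the regression recovers that map.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ground-truth linear map (a damped rotation), used only to generate data
theta = 0.3
K_true = 0.95 * np.array([[np.cos(theta), -np.sin(theta)],
                          [np.sin(theta),  np.cos(theta)]])

# Snapshot matrices of shape (d, M): column m of Y is the successor of column m of X
X = rng.standard_normal((2, 50))
Y = K_true @ X

# Least-squares solution K_DMD = Y X^+
K_dmd = Y @ np.linalg.pinv(X)

# The eigenvalues of K_DMD approximate those of the underlying dynamics
eigvals = np.linalg.eigvals(K_dmd)
```

Here all eigenvalues have magnitude 0.95, which reflects the damping factor of the rotation.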

In practice, this is computed using the Singular Value Decomposition (SVD) as follows:

$$
\begin{array}{ll}
\mathbf{X} \approx U \Sigma V^* & \text{(Truncated SVD of rank $r$)}\\
\tilde{\mathbf{K}}_{\text{DMD}} = U^*\mathbf{Y} V \Sigma^{-1} & \text{(Compute compression)}\\
\tilde{\mathbf{K}}_{\text{DMD}} W = W \Lambda & \text{(Compute eigendecomposition)}\\
\phi = \mathbf{Y} V\Sigma^{-1}W & \text{(Compute the modes)}\\
\end{array}
$$ 
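
These four steps map almost line by line onto NumPy. The following is a minimal sketch, not the implementation used by any particular library. The function name `dmd` and the test system (a two-dimensional damped rotation embedded in ten dimensions) are assumptions made for illustration.

```python
import numpy as np

def dmd(X, Y, rank):
    """DMD via truncated SVD; X and Y are (d, M) snapshot matrices."""
    U, s, Vh = np.linalg.svd(X, full_matrices=False)
    U, s, V = U[:, :rank], s[:rank], Vh[:rank].conj().T  # truncated SVD of rank r
    K_tilde = U.conj().T @ Y @ V / s                     # U* Y V Sigma^{-1}
    eigvals, W = np.linalg.eig(K_tilde)                  # K_tilde W = W Lambda
    modes = Y @ V / s @ W                                # Y V Sigma^{-1} W
    return eigvals, modes

# Hypothetical low-rank system: a 2D damped rotation embedded in 10 dimensions
rng = np.random.default_rng(0)
theta = 0.3
K_latent = 0.95 * np.array([[np.cos(theta), -np.sin(theta)],
                            [np.sin(theta),  np.cos(theta)]])
Q, _ = np.linalg.qr(rng.standard_normal((10, 2)))  # orthonormal embedding
Z = rng.standard_normal((2, 40))
X, Y = Q @ Z, Q @ K_latent @ Z

eigvals, modes = dmd(X, Y, rank=2)
```

Working with the $r \times r$ matrix $\tilde{\mathbf{K}}_{\text{DMD}}$ avoids forming the full $d \times d$ matrix, which matters when the state dimension is much larger than the number of relevant modes.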

The core goal of DMD is to apply linear algebra and spectral techniques to the analysis, prediction, and control of nonlinear dynamical systems.
However, DMD often faces several challenges that have been a driving force for the many versions of the DMD algorithm that have appeared.

Generally speaking, the error of DMD and its approximate KMD can be split into three types:
- The projection error is due to projecting/truncating the Koopman operator onto a finite-dimensional space of observables. This is linked to the issue of closure and lack of (or lack of knowledge of) non-trivial finite-dimensional Koopman invariant subspaces.
- The estimation error is due to estimating the matrices that represent the projected Koopman operator from a finite set of potentially noisy trajectory data.
- Numerical errors (e.g., roundoff, stability, further compression, etc.) incurred when processing the finite DMD matrix.
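
The projection error can be made visible with a small experiment. The sketch below assumes a hypothetical nonlinear map and compares two dictionaries of observables: the state itself, which is not closed under the dynamics, and the state augmented with $x_1^2$, which is. The helper `fit_and_score` is made up for this illustration.

```python
import numpy as np

lam, mu = 0.9, 0.5

def f(x):
    """Hypothetical nonlinear map whose observables (x1, x2, x1^2) evolve linearly."""
    return np.array([lam * x[0], mu * x[1] + (lam**2 - mu) * x[0] ** 2])

def fit_and_score(dictionary, X, Y):
    """Fit K by least squares in the lifted space and return the relative residual."""
    PX = np.array([dictionary(x) for x in X]).T  # (n_observables, M)
    PY = np.array([dictionary(y) for y in Y]).T
    K = PY @ np.linalg.pinv(PX)
    return np.linalg.norm(PY - K @ PX) / np.linalg.norm(PY)

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 2))
Y = np.array([f(x) for x in X])

# Dictionary 1: the state itself (plain DMD); not closed under the dynamics
residual_state = fit_and_score(lambda x: x, X, Y)
# Dictionary 2: state plus x1^2; spans a Koopman-invariant subspace
residual_lifted = fit_and_score(lambda x: np.array([x[0], x[1], x[0] ** 2]), X, Y)
```

With noise-free data, the residual of the first fit is pure projection error, while the second fit is exact up to round-off.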

### Variants

:::{table} Summary of some DMD methods. *Adapted from {cite}`colbrook_multiverse_2023`*.
:widths: auto
:align: center

| DMD Method | Challenges Overcome | Key Insight/Development |
|---|---|---|
| Forward-Backward DMD      | Sensor noise bias. | Take geometric mean of forward and backward propagators for the data. |
| Total Least-Squares DMD   | Sensor noise bias. | Replace least-squares problem with total least-squares problem. |
| Optimized DMD<br>Bagging Optimized DMD | Sensor noise bias.<br>Optimal collective processing of snapshots.| Exponential fitting problem, solve using variable projection method.<br>Statistical bagging sampling strategy. |
| Compressed Sensing        | Computational efficiency.<br>Temporal or spatial undersampling. | Unitary invariance of DMD extended to settings of compressed sensing (e.g., RIP, sparsity-promoting regularizers). |
| Randomized DMD            | Computational efficiency.<br>Memory usage. | Sketch data matrix for computations in reduced-dimensional space. |
| Multiresolution DMD       | Multiscale dynamics. | Filtered decomposition across scales. |
| **DMD with Control**      | Separation of unforced dynamics and actuation. | Generalized regression for globally linear control framework. |
| **Extended DMD**          | Nonlinear observables. | Arbitrary (nonlinear) dictionaries, recasting of DMD as a Galerkin method. |
| Physics-Informed DMD      | Preserving structure of dynamical systems.<br>Numerous instances given in general framework. | Restrict the least-squares optimization to lie on a matrix manifold. | 
:::

### DMD with Control (DMDc)

One of the most successful applications of the Koopman operator framework lies in control, with demonstrated successes in various challenging applications. These include fluid dynamics, robotics, power grids, biology, and chemical processes.

The key point is that Koopman operators represent nonlinear dynamics within a globally linear framework. This approach leads to tractable convex optimization problems and circumvents theoretical and computational limitations associated with nonlinearity. Moreover, it is amenable to data-driven, model-free approaches.

DMDc extends DMD to disambiguate between unforced dynamics and the effect of actuation.

The DMD regression is generalized to:

$$
\mathbf{x}_{t+1} = f(\mathbf{x}_t, \mathbf{u}_t) \approx A \mathbf{x}_t + B \mathbf{u}_t
$$

where $A \in \mathbb{C}^{d\times d}$ and $B \in \mathbb{C}^{d\times q}$ are unknown matrices.

Snapshot triplets of the form $\{\mathbf{x}^{(m)}, \mathbf{y}^{(m)}, \mathbf{u}^{(m)}\}^M_{m=1}$ are collected, where we assume that:

$$
\mathbf{y}^{(m)} \approx f(\mathbf{x}^{(m)}, \mathbf{u}^{(m)}), \quad m = 1, \dots , M.
$$

The control portion of the snapshots is arranged into the matrix $\Upsilon = \begin{pmatrix}u^{(1)} & u^{(2)} & \dots & u^{(M)}\end{pmatrix}$.

The optimization problem in {eq}`dmd-minimization` is replaced by

$$
\underset{A, B}{\min} \left\lVert \mathbf{Y} - \begin{pmatrix}A & B \end{pmatrix}\Omega \right\rVert_F^2,
\quad \text{where} \quad \Omega = \begin{pmatrix}\mathbf{X} \\ \Upsilon\end{pmatrix}
$$ (dmdc-minimization)

A solution is given as $\begin{pmatrix}A & B \end{pmatrix} = \mathbf{Y} \Omega^{+}$.
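
The stacked regression can be sketched in NumPy as follows. The example assumes a hypothetical cart modeled as a double integrator. The matrices `A_true` and `B_true` are used only to generate synthetic snapshot triplets and are not taken from the Cart environment.

```python
import numpy as np

rng = np.random.default_rng(0)
dt = 0.1

# Hypothetical ground truth: double integrator with state (x, x_dot) and force input u
A_true = np.array([[1.0, dt], [0.0, 1.0]])
B_true = np.array([[0.5 * dt**2], [dt]])

# Snapshot triplets stored column-wise: X is (d, M), Upsilon is (q, M), Y is (d, M)
X = rng.standard_normal((2, 100))
Upsilon = rng.standard_normal((1, 100))
Y = A_true @ X + B_true @ Upsilon

# Stack states and inputs, then solve (A B) = Y Omega^+
Omega = np.vstack([X, Upsilon])
AB = Y @ np.linalg.pinv(Omega)
A_dmdc, B_dmdc = AB[:, :2], AB[:, 2:]
```

The first $d$ columns of the solution give the state matrix and the remaining $q$ columns give the input matrix.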

#### Cart

We will use the DMDc method to fit a model on the data collected from the Cart environment. For that we will make use of the [pykoopman](https://pykoopman.readthedocs.io/en/master/index.html) package.

In [None]:
X = np.concatenate([obs[:-1] for obs in cart_observations])
U = np.concatenate(cart_actions)

In [None]:
DMDc = pk.regression.DMDc()
model = pk.Koopman(regressor=DMDc)
model.fit(X, u=U, dt=cart_env.dt)

Once we fit the model, we can access the matrices of the linear state-space model:

In [None]:
display_array("A", model.A)
display_array("B", model.B)
display_array("C", model.C)
display_array("W", model.W)

After that, we can use the model to simulate the system from the first recorded state using the recorded control inputs.

In [None]:
Xkoop = model.simulate(X[0], U, n_steps=X.shape[0] - 1)
Xkoop = np.vstack([X[0][np.newaxis, :], Xkoop])

In [None]:
t = np.arange(0, len(U))
fig, axs = plt.subplots(3, 1, sharex=True, figsize=(16, 8))
axs[0].plot(t, U, "-k")
axs[0].set(ylabel=r"$u$")
axs[1].plot(t, X[:, 0], "-", color="b", label="True")
axs[1].plot(t, Xkoop[:, 0], "--r", label="DMDc")
axs[1].set(ylabel=r"$x_1$")
axs[2].plot(t, X[:, 1], "-", color="b", label="True")
axs[2].plot(t, Xkoop[:, 1], "--r", label="DMDc")
axs[2].set(ylabel=r"$x_2$", xlabel=r"$t$")
axs[1].legend(loc="best")
axs[1].set_ylim([-8, 8])
axs[2].set_ylim([-8, 8])
fig.tight_layout();

#### Eigenvalues

In [None]:
eigval, eigvec = np.linalg.eig(model.A)
display_array("Eigenvalues", eigval)
fig, ax = plt.subplots()
ax.plot(np.real(eigval), np.imag(eigval), "o", color="lightgrey", label="DMDc")
ax.set(title="Eigenvalues")
ax.legend();
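
For a discrete-time linear model, the eigenvalues of the state matrix indicate stability: modes with magnitude below one decay, and modes with magnitude above one grow. The sketch below classifies eigenvalues relative to the unit circle. The helper name and the two example matrices are assumptions that stand in for `model.A`.

```python
import numpy as np

def classify_discrete_eigenvalues(A, tol=1e-6):
    """Label each eigenvalue of a discrete-time state matrix relative to the unit circle."""
    labels = []
    for eigenvalue in np.linalg.eigvals(A):
        radius = abs(eigenvalue)
        if radius < 1 - tol:
            labels.append("stable")
        elif radius > 1 + tol:
            labels.append("unstable")
        else:
            labels.append("marginal")
    return labels

# Hypothetical stand-in for `model.A`: a frictionless cart (double integrator), dt = 0.1
A_cart = np.array([[1.0, 0.1], [0.0, 1.0]])
labels_cart = classify_discrete_eigenvalues(A_cart)

# Adding velocity damping pulls one eigenvalue inside the unit circle
A_damped = np.array([[1.0, 0.1], [0.0, 0.9]])
labels_damped = classify_discrete_eigenvalues(A_damped)
```

A frictionless cart has both eigenvalues on the unit circle, so a fitted model of such a system is expected to have eigenvalues close to one.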

:::{exercise} Inverted Pendulum DMDc
:label: inverted-pendulum-dmdc

Use the DMDc method to fit a model on the data collected from the inverted pendulum environment.
:::


In [None]:
# Your solution here

:::{solution} inverted-pendulum-dmdc
:class: dropdown
**Work in Progress**

```{code-cell}
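# Stack the inverted pendulum data in the same way as the cart data above
X_pendulum = np.concatenate([obs[:-1] for obs in inverted_pendulum_observations])
U_pendulum = np.concatenate(inverted_pendulum_actions)

EDMDc = pk.regression.EDMDc()
# Random RBF centers, one column per center (4 state features, 4 centers)
centers = np.random.uniform(-1.5, 1.5, (4, 4))
RBF = pk.observables.RadialBasisFunction(
    rbf_type="thinplate",
    n_centers=centers.shape[1],
    centers=centers,
    kernel_width=1,
    polyharmonic_coeff=1.0,
    include_state=True,
)

model = pk.Koopman(observables=RBF, regressor=EDMDc)
# Assumes the pendulum environment exposes `dt` like `cart_env` does
model.fit(X_pendulum, u=U_pendulum, dt=inverted_pendulum_env.dt)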
```
:::