# Dictionary Learning for enhanced Koopman Operator approximations

Original paper: Li, Qianxiao, et al. "Extended dynamic mode decomposition with dictionary learning: A data-driven adaptive spectral decomposition of the Koopman operator." Chaos: An Interdisciplinary Journal of Nonlinear Science 27.10 (2017). https://doi.org/10.1063/1.4993854

The conventional fixed dictionary in EDMD can be a limitation, particularly for high-dimensional and nonlinear systems, because suitable observables must be chosen in advance. To overcome this, the above paper proposes dictionary learning: by combining EDMD with a trainable artificial neural network as dictionary, EDMD-DL adapts the observables to the data without the need for preselection. This notebook demonstrates the functionality in datafold by repeating the Duffing oscillator case of the paper (Section IV-A).


### Notes:

- The implementation is in an early stage. This means that the API and class names may change if needed.
- The neural network is specified in `torch`, which needs to be installed separately from datafold's dependencies.
- Currently the neural network does not make use of GPU computations (contributions welcome).

In [None]:
import matplotlib.pyplot as plt
import numpy as np
from torch.optim.lr_scheduler import ReduceLROnPlateau

from datafold import EDMD, TSCIdentity
from datafold.dynfold.dictlearning import DMDDictLearning, FeedforwardNN
from datafold.utils._systems import Duffing
from datafold.utils.plot import plot_eigenvalues

## Specify original system and sample data

In line with the original paper, we parametrize the Duffing system with $\delta=0.5$, $\beta=-1$ and $\alpha=1$. The resulting system has two stable steady states at $(\pm1,0)$, separated by a saddle point at $(0,0)$. By sampling the data at a fixed time interval `dt`, we convert the continuous dynamical system into a discrete one, defined by the flow map that advances a state by `dt`.
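The flow map can be sketched with SciPy, assuming the Duffing equation in the form used by Li et al., $\ddot{x} = -\delta\dot{x} - x(\beta + \alpha x^2)$. The names `duffing_rhs` and `flow_map` are hypothetical and not part of datafold; datafold's `Duffing` class may differ in details such as the integrator and tolerances. The sketch also checks that the three steady states are fixed points of the flow map.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Assumed form (Li et al.): x' = y, y' = -delta*y - x*(beta + alpha*x^2)
delta, beta, alpha = 0.5, -1.0, 1.0
dt = 0.25


def duffing_rhs(t, state):
    x, y = state
    return [y, -delta * y - x * (beta + alpha * x**2)]


def flow_map(state, dt=dt):
    # Integrate the ODE over one interval dt: x_{k+1} = F_dt(x_k)
    sol = solve_ivp(duffing_rhs, (0.0, dt), state, rtol=1e-10, atol=1e-12)
    return sol.y[:, -1]


# Steady states: the vector field vanishes and the flow map leaves them unchanged
fixed_points = np.array([[-1.0, 0.0], [0.0, 0.0], [1.0, 0.0]])
residuals = [np.abs(duffing_rhs(0.0, p)).max() for p in fixed_points]
mapped = np.array([flow_map(p) for p in fixed_points])

# A generic state is moved by the flow map
x1 = flow_map(np.array([0.5, 0.0]))
```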

In [None]:
delta = 0.5
beta = -1.0
alpha = 1.0

system = Duffing(alpha=alpha, beta=beta, delta=delta)

dt = 0.25

### Specify training and test data
For training, we draw 1000 random initial conditions uniformly from the region $[-2,2]^2$. Each initial condition is evolved with the flow map, giving `num_steps = 10` states per trajectory (including the initial condition), for a total of $10^4$ data points in the training set.
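The data count can be checked with the same construction as in the cell below (only NumPy, no ODE solve). The variable names `n_snapshots` and `n_pairs` are our own; counting consecutive snapshot pairs assumes the standard EDMD formulation that fits the map from $x_k$ to $x_{k+1}$ within each trajectory.

```python
import numpy as np

dt = 0.25
num_init = 1000
num_steps = 10

# Same construction as in the notebook: num_steps time values starting at t=0
time_values = np.arange(0, dt * num_steps, dt)
rng = np.random.default_rng(2)
IC = rng.uniform(low=[-2, -2], high=[2, 2], size=(num_init, 2))

# Number of stored states across all training trajectories
n_snapshots = IC.shape[0] * time_values.size
# Consecutive (x_k, x_{k+1}) pairs per trajectory, as used in standard EDMD
n_pairs = IC.shape[0] * (time_values.size - 1)
```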

In addition (not covered in the paper), we define two out-of-sample test trajectories, starting at the lower-left corner $(-2,-2)$ and the upper-right corner $(2,2)$ of the domain, respectively. Both are evolved for 500 time steps.

In [None]:
# train data
num_init = 1000
num_steps = 10

time_values = np.arange(0, dt * num_steps, dt)
rng = np.random.default_rng(2)
IC = rng.uniform(low=[-2, -2], high=[2, 2], size=(num_init, 2))

X, _ = system.predict(IC, time_values=time_values)

# test data
time_values_oos = np.arange(0, dt * 500, dt)
IC_oos = np.array([[-2, -2], [2, 2]], dtype=np.float64)

X_oos, _ = system.predict(IC_oos, time_values=time_values_oos)

### Characteristics of data stored in TSCDataFrame

In [None]:
print(f"{X.n_timeseries=}")
print(f"{X.n_timesteps=}")
print(f"{X.delta_time=}")
X

In [None]:
print(f"{X_oos.n_timeseries=}")
print(f"{X_oos.n_timesteps=}")
print(f"{X_oos.delta_time=}")
X_oos

### Plot both training and test trajectories

In [None]:
f, ax = plt.subplots()

for i, df in X.itertimeseries():
    X_np = df.to_numpy()
    ax.plot(
        X_np[:, 0],
        X_np[:, 1],
        c="black",
        linewidth=0.1,
        label="training" if i == 0 else None,
    )

ax.set_title("Original Duffing system with ODE solver (training and test data)")
for i, df in X_oos.itertimeseries():
    ax.plot(
        df.iloc[:, 0].to_numpy(),
        df.iloc[:, 1].to_numpy(),
        c=["red", "blue"][i],
        label=f"test traj. {i}",
    )
ax.grid()
ax.legend();

## Building EDMD model

In the next step, we train the EDMD-DL model with the `EDMD` class, which supports combining fixed dictionary elements (such as a time-delay embedding) with dictionary learning. For simplicity, and to match the original paper, we use the identity as the only fixed dictionary element.

Dictionary learning is incorporated through a dedicated variant of dynamic mode decomposition, the `DMDDictLearning` class. This class both learns the observables from the data and provides the mode decomposition of the system matrix. Various learning algorithms can in principle be used within `DMDDictLearning`; the primary supported one is `FeedforwardNN`, which follows the specification of Li et al.
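To illustrate the idea, here is a minimal PyTorch sketch of the alternating scheme described by Li et al.: with the dictionary fixed, the Koopman matrix $K$ is computed by regularized least squares; with $K$ fixed, the network weights take gradient steps on $\lVert \Psi(Y) - \Psi(X)K \rVert^2$. This is not datafold's implementation. The class `Dictionary`, the functions `solve_koopman` and `edmd_dl`, the regularization `reg`, the network size, and the toy linear data are all hypothetical choices for illustration.

```python
import numpy as np
import torch

torch.manual_seed(0)


class Dictionary(torch.nn.Module):
    """Hypothetical dictionary psi(x) = [1, x, NN(x)] in the spirit of Li et al."""

    def __init__(self, state_dim=2, n_trainable=5, hidden=32):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(state_dim, hidden),
            torch.nn.Tanh(),
            torch.nn.Linear(hidden, n_trainable),
        )

    def forward(self, x):
        ones = torch.ones(x.shape[0], 1, dtype=x.dtype)
        return torch.cat([ones, x, self.net(x)], dim=1)


def solve_koopman(psi_x, psi_y, reg=1e-3):
    # Tikhonov-regularized least squares for Psi(Y) ≈ Psi(X) @ K
    n = psi_x.shape[1]
    gram = psi_x.T @ psi_x + reg * torch.eye(n, dtype=psi_x.dtype)
    return torch.linalg.solve(gram, psi_x.T @ psi_y)


def edmd_dl(X, Y, n_outer=20, n_inner=10, lr=1e-3):
    dictionary = Dictionary(state_dim=X.shape[1]).double()
    optimizer = torch.optim.Adam(dictionary.parameters(), lr=lr)
    losses = []
    for _ in range(n_outer):
        # Step 1: freeze the dictionary and compute K in closed form
        with torch.no_grad():
            K = solve_koopman(dictionary(X), dictionary(Y))
        # Step 2: freeze K and update the network weights by gradient descent
        for _ in range(n_inner):
            optimizer.zero_grad()
            residual = dictionary(Y) - dictionary(X) @ K
            loss = (residual**2).sum(dim=1).mean()
            loss.backward()
            optimizer.step()
        losses.append(loss.item())
    return dictionary, K, losses


# Usage on toy data from a linear map (not the Duffing data of this notebook)
rng = np.random.default_rng(0)
X = torch.tensor(rng.uniform(-2, 2, size=(500, 2)))
Y = 0.9 * X
dictionary, K, losses = edmd_dl(X, Y, n_outer=5, n_inner=5)
```

Keeping the constant and the state as fixed dictionary entries, as in the paper, ensures that the original state can always be read off the dictionary.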

We specify the neural network with the same number of hidden layers, width per layer, and output size (the number of learned observables) as in the original paper. We train the network for a relatively small number of epochs; additional training parameters can be passed via `fit_params`. Here, we set PyTorch's learning rate scheduler `ReduceLROnPlateau` and use `X_oos` as validation data, whose loss affects the scheduler. The losses are recorded so that we can later plot the training against the validation loss. `fit_params` can also be used to disable the `tqdm` progress bar and to set early stopping, which is not shown in this tutorial.
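For readers unfamiliar with `ReduceLROnPlateau`, a plain-PyTorch sketch of how the scheduler reacts to a validation loss is shown below. It uses a toy linear regression and our own choices for `factor` and `patience`; how datafold instantiates the scheduler class internally is not shown here.

```python
import torch
from torch.optim.lr_scheduler import ReduceLROnPlateau

torch.manual_seed(0)

# Toy regression problem with separate training and validation sets
X_train, X_val = torch.randn(200, 2), torch.randn(50, 2)
w_true = torch.tensor([[1.5], [-0.5]])
y_train, y_val = X_train @ w_true, X_val @ w_true

model = torch.nn.Linear(2, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-1)
# Halve the learning rate if the validation loss has not improved for 3 epochs
scheduler = ReduceLROnPlateau(optimizer, mode="min", factor=0.5, patience=3)

learning_rates = []
for epoch in range(50):
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(X_train), y_train)
    loss.backward()
    optimizer.step()

    with torch.no_grad():
        val_loss = torch.nn.functional.mse_loss(model(X_val), y_val)
    # Unlike most schedulers, ReduceLROnPlateau.step takes the monitored metric
    scheduler.step(val_loss)
    learning_rates.append(optimizer.param_groups[0]["lr"])
```

Because the scheduler only ever lowers the learning rate, `learning_rates` is non-increasing.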

Finally, we pass both the dictionary steps and the dynamic mode decomposition with dictionary learning (`DMDDictLearning`) to the standard `EDMD` class and start the training.

In [None]:
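# Fixed dictionary part: only the identity, as in the original paper
dict_steps = [("_id", TSCIdentity())]

# Trainable dictionary: feedforward network with 3 hidden layers of width 100
# and 22 outputs (the learned observables)
network = FeedforwardNN(
    hidden_size=100,
    n_hidden_layer=3,
    n_dict_elements=22,
    batch_size=5000,
    n_epochs=50,
    sys_regularization=0.00,
    learning_rate=1e-4,
    random_state=1,
)
# DMD variant that learns the dictionary and decomposes the system matrix
dmd = DMDDictLearning(learning_model=network)

# Fit parameters for the DMD step: record losses, use the out-of-sample
# trajectories as validation data, and reduce the learning rate on plateaus
fit_params = dict(
    dmd__record_losses=True,
    dmd__X_val=X_oos,
    dmd__lr_scheduler=ReduceLROnPlateau,
)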

edmd = EDMD(
    dict_steps=dict_steps,
    dmd_model=dmd,
    stepwise_transform=True,
    include_id_state=False,
    dict_preserves_id_state=False,
    sort_koopman_triplets=False,
)
edmd.fit(X, **fit_params)

In [None]:
print(f"Number of parameters {edmd[-2].n_params}")

Looking at the EDMD instance again, we see that the dictionary and the final estimator changed during the fit. `DMDDictLearning` provides both a transformer (in which the dictionary is learned) and a DMD object for the predictions.

The dictionary pipeline (transformers) now consists of `TSCIdentity` and `FeedforwardNN`. This means that `edmd.transform(X)` maps `X` to the output layer of `FeedforwardNN`. The final estimator is a DMD model, which predicts the dictionary states forward in time.
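The "lift, advance linearly, project back" principle behind such predictions can be sketched with a generic EDMD in NumPy. This is not datafold's internal code: the fixed dictionary $\psi(x) = [1, x_1, x_2]$, the helpers `dictionary`, `edmd_fit` and `edmd_reconstruct`, and the linear test map `A` are hypothetical. Because the test map is linear, the dictionary is exactly invariant and the sketch reproduces the true trajectory.

```python
import numpy as np


def dictionary(x):
    # Hypothetical fixed dictionary psi(x) = [1, x_1, x_2]
    return np.concatenate([np.ones((x.shape[0], 1)), x], axis=1)


def edmd_fit(X, Y):
    # Least-squares Koopman matrix with row convention Psi(Y) ≈ Psi(X) @ K
    K, *_ = np.linalg.lstsq(dictionary(X), dictionary(Y), rcond=None)
    # Linear projection B that recovers the state: X ≈ Psi(X) @ B
    B, *_ = np.linalg.lstsq(dictionary(X), X, rcond=None)
    return K, B


def edmd_reconstruct(x0, K, B, n_steps):
    # Lift the initial state once, then advance linearly in dictionary space
    psi = dictionary(x0[None, :])
    states = [x0]
    for _ in range(n_steps - 1):
        psi = psi @ K
        states.append((psi @ B)[0])
    return np.array(states)


# Toy snapshot pairs from a linear map x_{k+1} = A x_k
A = np.array([[0.9, 0.1], [-0.1, 0.9]])
rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(200, 2))
Y = X @ A.T
K, B = edmd_fit(X, Y)

x0 = np.array([1.0, 0.5])
traj = edmd_reconstruct(x0, K, B, n_steps=5)
```

For the nonlinear Duffing system, a fixed dictionary like this one is not invariant, which is exactly the gap the learned dictionary is meant to close.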

In [None]:
edmd

### Plot training process of neural network

In [None]:
plt.figure()
plt.semilogy(edmd[-2].fit_losses_, "-*", label="train error")
plt.semilogy(edmd[-2].val_losses_, "-*", c="orange", label="val error")
plt.legend()
plt.ylabel("loss")
plt.xlabel("iteration")

### Evaluate EDMD model

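We reconstruct both the training and the out-of-sample data and compare the EDMD model against the Duffing system in plots. A reconstruction takes the initial state of each time series and predicts the remaining states at the same time values.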
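To complement the visual comparison with a number, a per-time-step relative error can be computed on plain arrays, e.g. `X_oos.to_numpy()` and `X_oos_reconstruct.to_numpy()`, assuming both frames share the same row order. The helper `relative_error` is hypothetical and the arrays below are toy values; the error is undefined where the true state is exactly zero.

```python
import numpy as np


def relative_error(true, pred):
    """Hypothetical helper: per-time-step relative L2 error between state arrays."""
    true, pred = np.asarray(true), np.asarray(pred)
    return np.linalg.norm(true - pred, axis=1) / np.linalg.norm(true, axis=1)


# Toy arrays standing in for the true and reconstructed states
true = np.array([[2.0, 0.0], [1.0, 1.0], [0.0, 3.0]])
pred = np.array([[2.0, 0.0], [1.0, 0.0], [0.0, 2.4]])
errors = relative_error(true, pred)
```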

In [None]:
X_reconstruct_train = edmd.reconstruct(X)
X_oos_reconstruct = edmd.reconstruct(X_oos)

We can also inspect the dictionary by mapping the data to the dictionary states. Here, these are the original state, a constant, and the output layer of the neural network (psis).

In [None]:
edmd.transform(X).head(10)

### Plot comparison between EDMD-DL and Duffing system

In [None]:
f, ax = plt.subplots(figsize=(10, 5), ncols=2, sharex=True, sharey=True)

for i, df in X.itertimeseries():
    X_np = df.to_numpy()
    ax[0].plot(
        X_np[:, 0],
        X_np[:, 1],
        c="black",
        linewidth=0.1,
        label="training" if i == 0 else None,
    )

ax[0].set_title("Original Duffing system")
for i, df in X_oos.itertimeseries():
    ax[0].plot(
        df.iloc[:, 0].to_numpy(),
        df.iloc[:, 1].to_numpy(),
        c=["red", "blue"][i],
        label=f"test traj. {i}",
    )
ax[0].legend()

for _, df in X_reconstruct_train.itertimeseries():
    X_np = df.to_numpy()
    ax[1].plot(X_np[:, 0], X_np[:, 1], c="black", linewidth=0.1)

ax[1].set_title("Reconstructed with EDMD-DL")

for i, df in X_oos_reconstruct.itertimeseries():
    ax[1].plot(
        df.iloc[:, 0].to_numpy(),
        df.iloc[:, 1].to_numpy(),
        c=["red", "blue"][i],
    )
ax[0].grid()
ax[1].grid()

In [None]:
f, ax = plt.subplots(nrows=2, sharex=True, sharey=True)
for i, df in X_oos.itertimeseries():
    ax[i].plot(
        df.index[:100],
        df.iloc[:100, 0].to_numpy(),
        c=["red", "blue"][i],
        label=f"orig x {i}",
    )

    ax[i].plot(
        df.index[:100],
        df.iloc[:100, 1].to_numpy(),
        c=["red", "blue"][i],
        label=f"orig y {i}",
    )

for i, df in X_oos_reconstruct.itertimeseries():
    ax[i].plot(
        df.index[:100],
        df.iloc[:100, 0].to_numpy(),
        "--",
        c=["red", "blue"][i],
        label=f"pred x {i}",
    )

    ax[i].plot(
        df.index[:100],
        df.iloc[:100, 1].to_numpy(),
        "--",
        c=["red", "blue"][i],
        label=f"pred y {i}",
    )

ax[0].grid()
ax[1].grid()
ax[1].set_xlabel("time")
ax[0].set_ylabel("x/y")
ax[1].set_ylabel("x/y");

The plots show that the predictions match the original system well. Increasing the number of dictionary elements or changing the learning process may further improve the model's quality.

Since the underlying model in EDMD is linear, we can also inspect its eigenvalues and their stability. The following plot can be compared with the analysis in Fig. 2 of Li et al.
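As a reference point for reading the eigenvalue plot, the sketch below linearizes the Duffing system at its steady states, assuming the paper's form of the equation. Each continuous-time eigenvalue $\mu$ of the Jacobian corresponds to $\lambda = e^{\mu\,dt}$ for the flow map; $|\lambda| < 1$ means decay and $|\lambda| > 1$ growth. For attracting steady states, Koopman theory associates such values $e^{\mu\,dt}$ with eigenvalues of the flow map's Koopman operator, so one might look for eigenvalues of modulus close to $e^{-\delta\,dt/2}$ in the learned spectrum. The sketch covers only the linearization and does not reproduce Fig. 2 of Li et al. or datafold's computation; `jacobian` and `moduli` are hypothetical names.

```python
import numpy as np

# Assumed form (Li et al.): x' = y, y' = -delta*y - x*(beta + alpha*x^2)
delta, beta, alpha = 0.5, -1.0, 1.0
dt = 0.25


def jacobian(x):
    # Jacobian of the vector field at a steady state (x, 0)
    return np.array([[0.0, 1.0], [-beta - 3.0 * alpha * x**2, -delta]])


moduli = {}
for x_star in (-1.0, 0.0, 1.0):
    mu = np.linalg.eigvals(jacobian(x_star))  # continuous-time eigenvalues
    lam = np.exp(mu * dt)  # eigenvalues of the linearized flow map
    moduli[x_star] = np.abs(lam)
    print(f"x* = {x_star:+.0f}: |lambda| = {np.round(moduli[x_star], 3)}")
```

The saddle at $(0,0)$ yields one modulus above and one below 1, while both stable states yield a complex pair with modulus $e^{-\delta\,dt/2} \approx 0.94$.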

In [None]:
ax = plot_eigenvalues(edmd.koopman_eigenvalues.to_numpy(), plot_unit_circle=True)
ax.grid();