# Parametric Dynamic Mode Decomposition

In this tutorial, we explore how to use the `PartitionedDMD` class in *datafold*. The method is described in the paper "A Dynamic Mode Decomposition Extension for the Forecasting of Parametric Dynamical Systems" by Andreuzzi et al. (DOI: https://doi.org/10.1137/22M1481658) and extends the standard dynamic mode decomposition (DMD) to parametric problems.

This tutorial follows the [PyDMD tutorial on ParamDMD](https://github.com/PyDMD/PyDMD/blob/master/tutorials/tutorial10/tutorial-10-paramdmd.ipynb).

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib import colors

from datafold import EDMD, TSCDataFrame, TSCIdentity
from datafold.dynfold.dmd import PartitionedDMD

We set up a simple time-dependent problem with a parameter $\mu$: a weighted sum of two complex-valued functions that are periodic in time:

\begin{cases}
f_1(x,t) &:= e^{2.3i \cdot t} \cosh(x+3)^{-1}\\
f_2(x,t) &:= 2 e^{2.8i \cdot t} \tanh(x) \cosh(x)^{-1}\\
f^{\mu}(x,t) &:= \mu f_1(x,t) + (1-\mu) f_2(x,t), \qquad \mu \in [0,1]
\end{cases}

In [None]:
def f1(x, t):
    return 1.0 / np.cosh(x + 3) * np.exp(2.3j * t)


def f2(x, t):
    return 2.0 / np.cosh(x) * np.tanh(x) * np.exp(2.8j * t)


def f(mu, x, t):
    return mu * f1(x, t) + (1 - mu) * f2(x, t)
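
As a quick sanity check of the definitions above, the following minimal sketch evaluates `f` on a small grid. The grid sizes and the names `x_small`, `t_small`, `xg` and `tg` are chosen only for this illustration and are not part of the tutorial. It confirms that $\mu=1$ and $\mu=0$ recover `f1` and `f2`, and that the modulus of `f1` does not change in time, because the time dependence is a pure phase factor.

```python
import numpy as np


def f1(x, t):
    return 1.0 / np.cosh(x + 3) * np.exp(2.3j * t)


def f2(x, t):
    return 2.0 / np.cosh(x) * np.tanh(x) * np.exp(2.8j * t)


def f(mu, x, t):
    return mu * f1(x, t) + (1 - mu) * f2(x, t)


# Small illustrative grid (sizes are arbitrary choices for this sketch)
x_small = np.linspace(-5, 5, 50)
t_small = np.linspace(0, 4 * np.pi, 20)
xg, tg = np.meshgrid(x_small, t_small)  # both arrays have shape (20, 50)

snap = f(0.5, xg, tg)
print(snap.shape, snap.dtype)

# The endpoints of the parameter range recover the two building blocks
print(np.allclose(f(1.0, xg, tg), f1(xg, tg)))
print(np.allclose(f(0.0, xg, tg), f2(xg, tg)))

# The time dependence of f1 is a pure phase, so |f1| is constant in time
print(np.allclose(np.abs(f1(xg, tg)), np.abs(f1(xg, 0.0))))
```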

The following cell includes helper functions which are later used for visualization.

In [None]:
def title(param):
    return rf"$\mu$={param}"


# Tick labels for the testing plots: the time axis (y) covers roughly
# [3*pi, 5*pi] and the space axis (x) covers [-5, 5].
def labels_func(ax):
    n_space = 500
    n_steps = 80  # N_nonpredict + N_predict time steps in the testing data

    ax.set_yticks([0, n_steps // 2, n_steps])
    ax.set_yticklabels([r"3$\pi$", r"4$\pi$", r"5$\pi$"])

    ax.set_xticks([0, n_space // 2, n_space])
    ax.set_xticklabels(["-5", "0", "5"])


def visualize(X, param, ax, log=False, labels_func=None, cmap="viridis"):
    ax.set_title(title(param))
    if labels_func is not None:
        labels_func(ax)
    if log:
        # LogNorm limits must be real, so take them from the plotted real part
        return ax.pcolormesh(
            X.real.T,
            norm=colors.LogNorm(vmin=X.real.min(), vmax=X.real.max()),
            cmap=cmap,
        )
    else:
        return ax.pcolormesh(X.real.T, cmap=cmap)


def visualize_multiple(
    Xs, params, log=False, figsize=(20, 6), labels_func=None, title=None, cmap="viridis"
):
    if log:
        Xs[Xs == 0] = np.min(Xs[Xs != 0])

    fig = plt.figure(figsize=figsize)
    axes = fig.subplots(nrows=1, ncols=5, sharey=True)

    if labels_func is None:

        def labels_func_default(ax):
            ax.set_yticks([0, n_time // 2, n_time])
            ax.set_yticklabels(["0", r"2$\pi$", r"4$\pi$"])

            ax.set_xticks([0, n_space // 2, n_space])
            ax.set_xticklabels(["-5", "0", "5"])

        labels_func = labels_func_default

    im = [
        visualize(X.T, param=param, ax=ax, log=log, labels_func=labels_func, cmap=cmap)
        for X, param, ax in zip(Xs, params, axes)
    ][-1]

    fig.colorbar(im, ax=axes)

    if title is not None:
        fig.suptitle(title)

## Training dataset

We set up the training data on a space-time grid with enough sample points in each dimension to resolve the dynamics. We also choose 10 parameter values ($\mu$ in the equations above) in the range `[0, 1]`, taken from an equally spaced grid and rounded to one decimal place, and visualize them in a plot.

In [None]:
n_space = 500
n_time = 160

x = np.linspace(-5, 5, n_space)
t = np.linspace(0, 4 * np.pi, n_time)

xgrid, tgrid = np.meshgrid(x, t)

training_params = np.round(np.linspace(0, 1, 10), 1)
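
The rounding in `training_params` has a small side effect that is easy to miss. The sketch below is for illustration only and uses just NumPy. It prints the rounded values and their spacing, showing that the samples are no longer exactly equidistant: the value 0.5 is skipped.

```python
import numpy as np

raw_params = np.linspace(0, 1, 10)  # spacing of 1/9 before rounding
training_params = np.round(raw_params, 1)

print(training_params)
# Rounded spacing between neighbouring parameters; one gap is twice as large
print(np.round(np.diff(training_params), 1))
```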

In [None]:
plt.figure(figsize=(8, 2))
plt.scatter(training_params, np.zeros(len(training_params)), label="training")
plt.title("Training parameters")
plt.grid()
plt.xlabel(r"$\mu$")
plt.yticks([], []);

The number of training parameters matters: with too few samples, the model cannot capture how the solution manifold varies with $\mu$.

The training dataset is obtained by evaluating the function `f` on `xgrid` and `tgrid` for each parameter in `training_params`. For the time series data we use *datafold*'s `TSCDataFrame`, and for the parameters we use a pandas `DataFrame`.

Visualizing a selection of parameters shows that the dynamics change with the parameter.

In [None]:
training_snapshots = np.stack([f(x=xgrid, t=tgrid, mu=p) for p in training_params])
X_train_d = TSCDataFrame.from_tensor(training_snapshots, time_values=t)
P_train_d = pd.DataFrame(training_params, index=X_train_d.ids)
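
To make the data layout explicit, the following NumPy-only sketch rebuilds the snapshot tensor and inspects its axes. We assume here that `TSCDataFrame.from_tensor` reads the axes as (time series, time step, feature), which gives one time series per parameter. The flattened array `long_format` is a hypothetical stand-in for such a row-wise layout and is not produced by *datafold*.

```python
import numpy as np


def f(mu, x, t):
    f1 = 1.0 / np.cosh(x + 3) * np.exp(2.3j * t)
    f2 = 2.0 / np.cosh(x) * np.tanh(x) * np.exp(2.8j * t)
    return mu * f1 + (1 - mu) * f2


n_space, n_time = 500, 160
x = np.linspace(-5, 5, n_space)
t = np.linspace(0, 4 * np.pi, n_time)
xgrid, tgrid = np.meshgrid(x, t)  # both have shape (n_time, n_space)

params = np.round(np.linspace(0, 1, 10), 1)
snapshots = np.stack([f(mu=p, x=xgrid, t=tgrid) for p in params])

# Axis 0: parameter (one time series each), axis 1: time, axis 2: space
print(snapshots.shape)

# Hypothetical row-wise layout: one row per (time series, time step) pair
long_format = snapshots.reshape(-1, n_space)
series_id, step = 3, 7
print(np.array_equal(long_format[series_id * n_time + step], snapshots[series_id, step]))
```

With this layout, the parameter table needs exactly one row per time series, which is why `P_train_d` is indexed by `X_train_d.ids`.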

In [None]:
idxes = [0, 2, 4, 6, 8]
visualize_multiple(training_snapshots[idxes], training_params[idxes])

## Training EDMD model

We now train the model with `EDMD`, where we specify a `dmd_model` that can handle the additional parameter input `P`. Note that we do not transform the data here and instead use the identity (`PartitionedDMD` could also be used directly). This is mainly to highlight that parametric models can be used in the more generic framework of `EDMD`, where data can also be transformed.

We then visualize and compare the training data with the reconstructed training data of the parametric `EDMD` model.

In [None]:
pdmd = EDMD(
    dict_steps=[("_id", TSCIdentity())],
    dmd_model=PartitionedDMD(n_components=20, dmd_kwargs=dict(rank=20)),
)
pdmd = pdmd.fit(X_train_d, P=P_train_d)
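
To give an intuition for what a partitioned parametric DMD does, here is a rough NumPy sketch that loosely follows the idea from the paper: a shared POD basis, one DMD per training parameter in the reduced space, and interpolation of the predicted coefficients to a new parameter. This is not *datafold*'s implementation. The helper names `fit_dmd` and `rollout`, the rank of 2, the coarser spatial grid, the unrounded parameters and the linear interpolation are all assumptions made for this illustration. Unlike `reconstruct`, the sketch also does not take an initial condition for the new parameter.

```python
import numpy as np


def f(mu, x, t):
    f1 = 1.0 / np.cosh(x + 3) * np.exp(2.3j * t)
    f2 = 2.0 / np.cosh(x) * np.tanh(x) * np.exp(2.8j * t)
    return mu * f1 + (1 - mu) * f2


x = np.linspace(-5, 5, 100)
t = np.linspace(0, 4 * np.pi, 160)
xg, tg = np.meshgrid(x, t)
params = np.linspace(0, 1, 10)
snaps = np.stack([f(p, xg, tg) for p in params])  # (n_params, n_time, n_space)

# 1) One shared POD basis computed from the snapshots of all parameters.
U, _, _ = np.linalg.svd(snaps.reshape(-1, x.size).T, full_matrices=False)
basis = U[:, :2]  # two spatial profiles generate the data, so rank 2 suffices
coeffs = snaps @ basis.conj()  # reduced coordinates, (n_params, n_time, 2)


def fit_dmd(c):
    """Least-squares linear map A with c[k + 1] ~ A @ c[k]."""
    X, Y = c[:-1].T, c[1:].T
    # The cutoff discards noise-level singular values (rank-1 data at mu=0, 1)
    return Y @ np.linalg.pinv(X, rcond=1e-10)


def rollout(A, c0, n_steps):
    """Iterate c[k + 1] = A @ c[k], starting from c0."""
    out = np.empty((n_steps, c0.size), dtype=complex)
    out[0] = c0
    for k in range(1, n_steps):
        out[k] = A @ out[k - 1]
    return out


# 2) One DMD per training parameter (the "partitions"), each rolled out in time.
pred = np.stack([rollout(fit_dmd(c), c[0], t.size) for c in coeffs])

# 3) Linear interpolation of the predicted coefficients to an unseen parameter.
mu_new = 0.55
i = np.clip(np.searchsorted(params, mu_new) - 1, 0, params.size - 2)
w = (mu_new - params[i]) / (params[i + 1] - params[i])
c_new = (1 - w) * pred[i] + w * pred[i + 1]

# 4) Lift back to the full space and compare with the analytic solution.
X_new = c_new @ basis.T
truth = f(mu_new, xg, tg)
rel_err = np.linalg.norm(X_new - truth) / np.linalg.norm(truth)
print(f"relative error at mu={mu_new}: {rel_err:.2e}")
```

Linear interpolation is sufficient in this toy problem because `f` depends linearly on $\mu$. For problems with a nonlinear parameter dependence, a more flexible interpolation scheme would be needed.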

In [None]:
X_reconstruct = pdmd.reconstruct(X_train_d, P=P_train_d)

In [None]:
visualize_multiple(
    training_snapshots,
    training_params,
    figsize=(20, 2.5),
    title="training",
)
visualize_multiple(
    X_reconstruct.to_tensor("row"),
    training_params,
    figsize=(20, 2.5),
    title="EDMD (datafold)",
)

visualize_multiple(
    np.abs(X_reconstruct.to_tensor("row") - training_snapshots),
    training_params,
    figsize=(20, 2.5),
    title="abs difference",
    cmap="OrRd",
)

## Testing data 

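For prediction, the model must also handle initial conditions for parameter choices that are not contained in the training data but are close to the sampled range. To showcase this, we set up testing data: each testing parameter is a training parameter plus a small offset, and the time window covers the last 40 training time steps plus 40 steps beyond them.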

In [None]:
similar_testing_params = [1, 3, 5, 7, 9]
testing_params = training_params[similar_testing_params] + np.array(
    [5 * pow(10, -i) for i in range(2, 7)]
)
testing_params_labels = [
    str(training_params[similar_testing_params][i - 2]) + f"+$5*10^{{-{i}}}$"
    for i in range(2, 7)
]

time_step = t[1] - t[0]
N_predict = 40
N_nonpredict = 40

t2 = np.array(
    [4 * np.pi + i * time_step for i in range(-N_nonpredict + 1, N_predict + 1)]
)
xgrid2, tgrid2 = np.meshgrid(x, t2)

testing_snapshots = np.array([f(mu=p, x=xgrid2, t=tgrid2) for p in testing_params])

X_test_d = TSCDataFrame.from_tensor(testing_snapshots, time_values=t2)
P_test_d = pd.DataFrame(testing_params, index=X_test_d.ids)
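
The following NumPy-only sketch recomputes the testing parameters and the testing time window so that their values can be inspected directly. The names `offsets` and `inside` are introduced only for this illustration. Note that the last testing parameter ends up marginally above 1, that is, just outside the training range.

```python
import numpy as np

training_params = np.round(np.linspace(0, 1, 10), 1)
offsets = np.array([5 * 10.0**-i for i in range(2, 7)])  # 5e-2 down to 5e-6
testing_params = training_params[[1, 3, 5, 7, 9]] + offsets
print(testing_params)

t = np.linspace(0, 4 * np.pi, 160)
time_step = t[1] - t[0]
N_predict = 40
N_nonpredict = 40

# Same time grid as in training, shifted so that it ends 40 steps after 4*pi
t2 = 4 * np.pi + time_step * np.arange(-N_nonpredict + 1, N_predict + 1)

# Time steps that still fall inside the training interval [0, 4*pi]
inside = t2 <= t[-1] + 1e-12
print(t2.size, int(inside.sum()), int((~inside).sum()))
print(t2[0] / np.pi, t2[-1] / np.pi)  # window in units of pi
```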

In [None]:
plt.figure(figsize=(8, 2))
plt.scatter(training_params, np.zeros(len(training_params)), label="Training")
plt.scatter(testing_params, np.zeros(len(testing_params)), label="Testing")
plt.legend()
plt.grid()
plt.title("Training vs testing parameters")
plt.xlabel(r"$\mu$")
plt.yticks([], []);

We reconstruct the testing data and compare it with the ground truth in a plot.

In [None]:
X_reconstruct_test = pdmd.reconstruct(X_test_d, P=P_test_d)

In [None]:
visualize_multiple(
    testing_snapshots,
    testing_params_labels,
    figsize=(20, 2.5),
    labels_func=labels_func,
    title="ground truth",
)

visualize_multiple(
    X_reconstruct_test.to_tensor("row"),
    testing_params_labels,
    figsize=(20, 2.5),
    labels_func=labels_func,
    title="EDMD (datafold)",
)

visualize_multiple(
    np.abs(X_reconstruct_test.to_tensor("row") - testing_snapshots),
    testing_params_labels,
    figsize=(20, 2.5),
    labels_func=labels_func,
    title="abs difference",
    cmap="OrRd",
)
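
The absolute-difference plots can be complemented with a single error value per parameter. The sketch below defines a hypothetical helper `relative_l2_error` (not part of *datafold*) and demonstrates it on synthetic arrays with the same shape as the testing tensors. The synthetic data is a placeholder only and says nothing about the accuracy of the model.

```python
import numpy as np


def relative_l2_error(pred, truth):
    """Relative L2 error per leading index (here: per parameter)."""
    axes = tuple(range(1, truth.ndim))
    num = np.sqrt(np.sum(np.abs(pred - truth) ** 2, axis=axes))
    den = np.sqrt(np.sum(np.abs(truth) ** 2, axis=axes))
    return num / den


# Synthetic placeholder data with shape (n_params, n_time, n_space)
rng = np.random.default_rng(0)
truth = rng.normal(size=(5, 80, 500)) + 1j * rng.normal(size=(5, 80, 500))
pred = truth * (1 + 1e-3)  # artificial perturbation of 0.1 percent

errors = relative_l2_error(pred, truth)
print(errors)
```

In the tutorial, the same helper could be applied as `relative_l2_error(X_reconstruct_test.to_tensor("row"), testing_snapshots)`, assuming both tensors share the layout (parameter, time, space).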