# Extended Dynamic Mode Decomposition on Limit Cycle

In this tutorial we explore the (Extended-) Dynamic Mode Decomposition (E-DMD). We set up a non-linear ordinary differential equation (ODE) system, generate time series data with it and learn the dynamics with an `EDMD` model. 

Note that all time series models require data of type `TSCDataFrame` for fitting (`.fit`). The initial conditions for `predict` can be either a `numpy.ndarray` or a `pandas.DataFrame` (as long as the initial condition itself is not a time series).

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.integrate import solve_ivp

# NOTE: make sure "path/to/datafold" is in sys.path or PYTHONPATH if not installed
from datafold.pcfold import TSCDataFrame, GaussianKernel
from datafold.dynfold import DMDFull
from datafold.dynfold.transform import TSCRadialBasis, TSCPolynomialFeatures
from datafold.appfold import EDMD

## Set up ODE system

We set up a Hopf ODE system with:

$$
\dot{y}_0 = -y_1 + y_0 (\mu - y_0^2 - y_1^2) \\
\dot{y}_1 = y_0 + y_1 (\mu - y_0^2 - y_1^2)
$$

with $\mu=1$. The ODE system has a circle-shaped attractor (a limit cycle of radius $\sqrt{\mu} = 1$) that is centered at the origin. All sampled initial conditions are off the attractor (i.e. the time series are sampled in the transient region of the phase space).
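
As a short side derivation (not part of the original tutorial code), the circular attractor can be read off in polar coordinates, assuming $r^2 = y_0^2 + y_1^2$ and $\theta = \operatorname{atan2}(y_1, y_0)$:

$$
r \dot{r} = y_0 \dot{y}_0 + y_1 \dot{y}_1 = r^2 (\mu - r^2) \quad \Rightarrow \quad \dot{r} = r (\mu - r^2) \\
r^2 \dot{\theta} = y_0 \dot{y}_1 - y_1 \dot{y}_0 = r^2 \quad \Rightarrow \quad \dot{\theta} = 1
$$

For $\mu > 0$ the radial equation has an unstable fixed point at $r = 0$ and a stable one at $r = \sqrt{\mu}$, so trajectories spiral towards the circle of radius $\sqrt{\mu}$ with constant angular velocity.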

We solve the system by numerical integration (Runge-Kutta 45) with scipy's ODE solver. The function below returns a `TSCDataFrame` that contains one time series for each initial condition (row in `initial_conditions`).

In [None]:
def solve_limit_cycle(initial_conditions, t_eval):
    
    def limit_cycle(t, y):
        """ODE system."""
        mu = 1
        y_dot = np.zeros(2)

        factor = mu - y[0] ** 2 - y[1] ** 2

        y_dot[0] = -y[1] + y[0] * factor
        y_dot[1] = y[0] + y[1] * factor
        return y_dot

    assert initial_conditions.ndim == 2
    assert initial_conditions.shape[1] == 2

    time_series_dfs = []

    for ic in initial_conditions:
        solution = solve_ivp(limit_cycle, t_span=(t_eval[0], t_eval[-1]), y0=ic, t_eval=t_eval)
        
        solution = pd.DataFrame(
            data=solution["y"].T,
            index=solution["t"],
            columns=["x1", "x2"],
        )

        time_series_dfs.append(solution)

    return TSCDataFrame.from_frame_list(time_series_dfs)

## Sampling the dynamical system

We now start collecting time series data from the ODE system (our training set). To sample the phase space, we distribute initial conditions and solve the ODE system for rather short time intervals.
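
The grid of initial conditions used in the next cell can be inspected on its own. The following sketch uses only NumPy and checks, under the assumption that the attractor is the unit circle ($\mu = 1$), that no grid point starts on the attractor. The names `radii` and `distance_to_attractor` are introduced here for illustration only.

```python
import numpy as np

# 8 x 8 grid of initial conditions on [-2, 2] x [-2, 2], one row per condition
grid = np.linspace(-2, 2, 8)
initial_conditions = np.array(np.meshgrid(grid, grid)).T.reshape(-1, 2)

# distance of every initial condition to the unit circle (attractor for mu = 1)
radii = np.linalg.norm(initial_conditions, axis=1)
distance_to_attractor = np.abs(radii - 1.0)

print(f"number of initial conditions: {initial_conditions.shape[0]}")
print(f"smallest distance to attractor: {distance_to_attractor.min():.3f}")
```

Because every initial condition has a non-zero distance to the circle, all sampled time series start in the transient region.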

In [None]:
nr_time_steps = 20
t_eval = np.linspace(0, 0.4, nr_time_steps)

initial_conditions = np.array(np.meshgrid(np.linspace(-2, 2, 8), np.linspace(-2, 2, 8))).T.reshape(-1, 2)

tsc_data = solve_limit_cycle(initial_conditions, t_eval)

print(f"time delta: {tsc_data.delta_time}")
print(f"#time series: {tsc_data.n_timeseries}")
print(f"#time steps per time series: {tsc_data.n_timesteps}")
print(f"(n_samples, n_features): {tsc_data.shape}")
print(f"time interval {tsc_data.time_interval()}")
print(f"Same time values: {tsc_data.is_same_time_values()} ")
print("")
print("Data snippet:")
tsc_data

### Plot: Sampled time series used for training

In [None]:
# function to add a single arrow in the following time series plots
idx_arrow = np.array([t_eval.shape[0] // 2 -1, t_eval.shape[0] // 2])

def include_arrow(ax, df):
    arrow = df.iloc[idx_arrow, :]
    ax.arrow(arrow.iloc[0, 0], 
             arrow.iloc[0, 1], 
             dx=arrow.iloc[1, 0]-arrow.iloc[0, 0], 
             dy=arrow.iloc[1, 1]-arrow.iloc[0, 1], 
             color="black", head_width=0.05)

In [None]:
fig = plt.figure(figsize=[7,7])

ax = fig.add_subplot(1, 1, 1)
for _id, df in tsc_data.itertimeseries():
    ax.plot(df["x1"].to_numpy(), df["x2"].to_numpy(), c="black")
    include_arrow(ax, df)

ax.set_title("sampled time series data from ODE system")
ax.set_xlabel("x1")
ax.set_ylabel("x2")
ax.axis("equal")
ax.grid();

## 1. DMD: Identity dictionary

In our first model we use a Dynamic Mode Decomposition (in `datafold.dynfold.dmd`) model and decompose the data in spatio-temporal coordinates using the original form of the time series. In other words, our dictionary only includes the state identities "x1" and "x2" as observable functions. 

We use the `DMDFull` model directly (the same can be accomplished with `EDMD` and setting `dict_steps=[("id", TSCIdentity())]`).

Note that the API of the DMD-based models aligns with scikit-learn. However, the input type of `X` is restricted to a `TSCDataFrame`. The `predict` method accepts an array of `time_values`, with which we can choose the time samples at which to evaluate the model. In this case, where we are interested in reconstructing the training data, we leave it as `None`. The model then uses the same time values that were available during `fit`.
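
To make the identity-dictionary case concrete, the following is a minimal NumPy sketch of the least-squares idea behind DMD. It is not datafold's implementation; the matrix `A_true` and the random snapshots are made up for illustration. For a truly linear system, the fitted matrix recovers the dynamics exactly.

```python
import numpy as np

# hypothetical linear system x_{k+1} = A_true @ x_k (a damped rotation)
A_true = np.array([[0.9, -0.2],
                   [0.2, 0.9]])

rng = np.random.default_rng(0)
X = rng.normal(size=(2, 50))  # snapshots x_k as columns
Y = A_true @ X                # shifted snapshots x_{k+1}

# least-squares solution of Y = K @ X via the pseudo-inverse
K = Y @ np.linalg.pinv(X)

print(np.round(K, 6))
```

For the non-linear limit cycle system, no 2 x 2 matrix can reproduce the dynamics, which is why the reconstruction in the next cells is expected to be limited.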

In [None]:
dmd = DMDFull().fit(X=tsc_data)  # must be TSCDataFrame
dmd_values = dmd.predict(tsc_data.initial_states(), time_values=None)

print("Data snippet with predicted time series data")
dmd_values

### Compare with training data 

We can now visually compare the original time series data with the data-driven reconstruction of the DMD model. The plots show that the DMD model performs relatively poorly. The Koopman matrix, which describes a linear dynamical system, is only a $\mathbb{R}^{2 \times 2}$ matrix. We can therefore classify the phase portrait with [stability theory](https://en.wikipedia.org/wiki/Stability_theory). Note that the Koopman matrix internally describes a discrete system with a fixed time interval. We therefore first have to convert the Koopman matrix to its continuous form.
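
The conversion from a discrete to a continuous linear system can be sketched with a made-up example. A discrete matrix $K = \exp(A \, \Delta t)$ is mapped back to $A$ with the matrix logarithm, which is not the same as taking the logarithm of each entry. The matrix `A_cont` and the value of `dt` below are assumptions chosen for illustration.

```python
import numpy as np
from scipy.linalg import expm, logm

# hypothetical continuous-time system matrix (a stable spiral) and time step
A_cont = np.array([[-0.5, -1.0],
                   [1.0, -0.5]])
dt = 0.1

K = expm(A_cont * dt)  # discrete-time matrix for step size dt

recovered = logm(K) / dt                      # matrix logarithm
elementwise = np.log(K.astype(complex)) / dt  # entry-wise logarithm (different)

trace = np.trace(recovered).real
determinant = np.linalg.det(recovered).real
print(f"trace={trace:.3f}, determinant={determinant:.3f}")
```

In this example the trace is negative and the determinant is positive, which classifies the origin as stable. Since the trace squared is smaller than four times the determinant, the trajectories spiral.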

#### TODO: the stability analysis is not right... The solution should be a sink source.

In [None]:
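from scipy.linalg import logm

# Convert the discrete-time Koopman matrix to continuous time with the
# matrix logarithm (np.log would only act element-wise).
cont_koopman_matrix = logm(dmd.koopman_matrix_.astype(complex)) / dmd.dt_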

print("Relevant values for stability analysis:")
print(f"determinant {np.linalg.det(cont_koopman_matrix)}")
print(f"trace {np.trace(cont_koopman_matrix)}")

f, ax = plt.subplots(1, 2, figsize=(14, 5))
for _id, df in tsc_data.itertimeseries():
    ax[0].plot(df["x1"].to_numpy(), df["x2"].to_numpy(), c="black")
    include_arrow(ax[0], df)

ax[0].set_title("training data used during fit")
ax[0].set_xlabel("x1")
ax[0].set_ylabel("x2")
ax[0].axis("equal")
ax[0].grid()

for _id, df in dmd_values.itertimeseries():
    ax[1].plot(df["x1"].to_numpy(), df["x2"].to_numpy(), c="black")
    include_arrow(ax[1], df)
    
ax[1].set_title("DMD model (identity state dictionary)")
ax[1].set_xlabel("x1")
ax[1].set_ylabel("x2")
ax[1].axis("equal")
ax[1].grid();

## 2. EDMD: Polynomial feature dictionary

We now get to the "extended" part of a Dynamic Mode Decomposition: We define a *dictionary* with which we process the time series data before we fit a DMD model to it. For this, we use the `datafold.appfold.EDMD` class, which is a [`sklearn.pipeline.Pipeline`](https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html?highlight=pipeline#sklearn.pipeline.Pipeline). In the `EDMD` model, a dictionary can consist of a variable number of transform models that process the time series data consecutively (in the same order as defined). The final estimator has to be a `datafold.dynfold.dmd.DMDBase` model and defaults to `DMDFull`.
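
The effect of a dictionary can be illustrated with a small NumPy sketch that is independent of datafold. The toy map `step`, its constants `a`, `b`, `c` and the helper `dictionary` are assumptions made for this example. The map is non-linear in the state, but linear in the observables $(x_1, x_2, x_1^2)$, so a linear fit in the lifted space is exact while a linear fit on the state alone is not.

```python
import numpy as np

a, b, c = 0.9, 0.5, 0.3  # constants of the hypothetical toy system

def step(x):
    """Non-linear discrete map: x1 -> a*x1, x2 -> b*x2 + c*x1^2."""
    return np.array([a * x[0], b * x[1] + c * x[0] ** 2])

def dictionary(X):
    """Lift states (rows of X) to the observables [x1, x2, x1^2]."""
    return np.column_stack([X[:, 0], X[:, 1], X[:, 0] ** 2])

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(200, 2))  # states x_k as rows
Y = np.array([step(x) for x in X])     # successor states x_{k+1}

# DMD on the state only: least-squares fit of Y = X @ K_dmd
K_dmd = np.linalg.lstsq(X, Y, rcond=None)[0]
# EDMD: the same fit, but in the lifted (dictionary) space
K_edmd = np.linalg.lstsq(dictionary(X), dictionary(Y), rcond=None)[0]

# one-step prediction for an unseen state; the first two lifted
# coordinates are the original state
x0 = np.array([[0.8, -0.4]])
dmd_error = np.abs(x0 @ K_dmd - step(x0[0])).max()
edmd_error = np.abs((dictionary(x0) @ K_edmd)[:, :2] - step(x0[0])).max()

print(f"DMD error:  {dmd_error:.2e}")
print(f"EDMD error: {edmd_error:.2e}")
```

For the limit cycle system no such exact finite dictionary is known from the tutorial, so the dictionaries in the following sections only approximate the dynamics.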

Choosing the "right" dictionary is not an easy task and is similar to "model selection" in classical machine learning. In our choice of dictionary we can include expert knowledge, e.g. if we know the governing equations of the underlying physical system from which the time series are collected. We can also apply methods from mathematical theory to represent the data in another (functional) basis with the aim of linearizing an unknown phase space manifold.

In the first dictionary we use `TSCPolynomialFeatures` which is a wrapper of [`sklearn.preprocessing.PolynomialFeatures`](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.PolynomialFeatures.html?highlight=polynomial#sklearn.preprocessing.PolynomialFeatures) to support `TSCDataFrame` type.
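
To show what polynomial features look like for a two-dimensional state, the following is a hand-written NumPy sketch. It is not the `TSCPolynomialFeatures` implementation; the helper `polynomial_dictionary` is hypothetical, and the choice to leave out the constant term and the ordering of the columns are assumptions that may differ from the library.

```python
import itertools
import numpy as np

def polynomial_dictionary(X, degree):
    """Evaluate all monomials of total degree 1..degree on the rows of X."""
    n_features = X.shape[1]
    powers = [
        p
        for p in itertools.product(range(degree + 1), repeat=n_features)
        if 1 <= sum(p) <= degree
    ]
    # order by total degree, then by decreasing exponent of the first feature
    powers.sort(key=lambda p: (sum(p), tuple(-e for e in p)))
    features = np.column_stack(
        [np.prod(X ** np.array(p), axis=1) for p in powers]
    )
    return features, powers

X = np.array([[2.0, 3.0]])
features, powers = polynomial_dictionary(X, degree=3)

print(powers)
print(features)
```

Each tuple in `powers` holds the exponents of "x1" and "x2" for one column, which is the same kind of information the `powers_` attribute of scikit-learn's `PolynomialFeatures` provides.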

In [None]:
dict_step = [
    (
        "polynomial",
        TSCPolynomialFeatures(degree=3),
    )
]

edmd_poly = EDMD(dict_steps=dict_step, include_id_state=True).fit(X=tsc_data)
edmd_poly_values = edmd_poly.predict(tsc_data.initial_states())

### Analyze the dictionary

Before we compare the model's time series data to the training data, we investigate how to analyze the actual process of dictionary transformations in an `EDMD` model.

For example, we may be interested in the values of the "dictionary space" (the data after all the transformations are applied and before it is handed to the final DMD model). For this we can use the `transform` method of `EDMD`, which applies the dictionary to the available data (e.g. our training set). In this case we can see that the result is a `TSCDataFrame` which includes the original states "x1" and "x2" plus the generated polynomial features.

The individual dictionary models are accessible by their specified name via `named_steps`. Here, we access the model and its attribute `TSCPolynomialFeatures.powers_` through the `EDMD` model.

In [None]:
# access models in the dictionary, the name was given in "dict_step" above 
print(edmd_poly.named_steps["polynomial"])

print("")
print("polynomial degrees for data (first column 'x1' and second 'x2'):")
print(edmd_poly.named_steps["polynomial"].powers_)

print("")
print("Dictionary space values:")
edmd_poly.transform(tsc_data)

### Compare with training data

We see that the reconstruction of the time series has improved and the phase portrait now looks a lot better than with the previous DMD approach. However, there are still differences and some time series even cross, which is not a behavior of the original system.

In [None]:
f, ax = plt.subplots(1, 2, figsize=(14, 5))
for _id, df in tsc_data.itertimeseries():
    ax[0].plot(df["x1"].to_numpy(), df["x2"].to_numpy(), c="black")
    include_arrow(ax[0], df)

ax[0].set_title("training data used during fit")
ax[0].set_xlabel("x1")
ax[0].set_ylabel("x2")
ax[0].axis("equal");

for _id, df in edmd_poly_values.itertimeseries():
    ax[1].plot(df["x1"].to_numpy(), df["x2"].to_numpy(), c="black")
    include_arrow(ax[1], df)
    
ax[1].set_title("EDMD with polynomial dictionary")
ax[1].set_xlabel("x1")
ax[1].set_ylabel("x2")
ax[1].axis("equal");

## 3. EDMD: Radial basis function dictionary

In our last attempt we set up a dictionary with `TSCRadialBasis`. The transform class evaluates each time series sample against a set of radial basis functions whose centers are distributed in the phase space. The radial basis functions therefore provide a way to linearize the phase space manifold. Here we choose a Gaussian kernel and set the centers of the functions to the initial condition states.
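
The following NumPy sketch illustrates the idea of such a radial basis dictionary. It is not the `TSCRadialBasis` implementation. The helper `gaussian_rbf_features` is hypothetical and assumes a Gaussian kernel of the form $\exp(-\lVert x - c \rVert^2 / (2 \varepsilon))$; the exact scaling used by `GaussianKernel` may differ.

```python
import numpy as np

def gaussian_rbf_features(X, centers, epsilon):
    """Evaluate one Gaussian bump per center on every row of X."""
    # pairwise squared distances, shape (n_samples, n_centers)
    sq_dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dist / (2 * epsilon))

# centers on the same 8 x 8 grid that is used for the initial conditions
grid = np.linspace(-2, 2, 8)
centers = np.array(np.meshgrid(grid, grid)).T.reshape(-1, 2)

features = gaussian_rbf_features(centers, centers, epsilon=0.17)
print(features.shape)
```

Every sample is mapped from 2 state values to one feature per center, which is why the feature dimension grows with the number of centers.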

In the time series in "dictionary space" we see that the feature dimension is now much greater than in the beginning (i.e. we provide a larger set of observables to compute the Koopman operator).

In [None]:
dict_step = [("rbf", TSCRadialBasis(kernel=GaussianKernel(epsilon=0.17), center_type="initial_condition"))]

edmd_rbf = EDMD(dict_steps=dict_step, include_id_state=True).fit(X=tsc_data)  # Note that the "extended" part is in the transformations
edmd_rbf_values = edmd_rbf.predict(tsc_data.initial_states())

print(f"shape of Koopman matrix: {edmd_rbf.named_steps['dmd'].koopman_matrix_.shape}")
edmd_rbf.transform(tsc_data)

### Compare with training data

Again, for comparison we plot the training time series next to the EDMD model's time series. This time the phase portraits match quite well. However, at this stage this is only an indicator of a successful model. As with all data-driven machine learning models, there is always the danger of overfitting the training data. The consequence is poor generalization for out-of-sample initial conditions.

The right way to tackle overfitting is to apply cross-validation. For the `EDMD` model this can be achieved with `EDMDCV`, which allows an exhaustive search over a grid of the model's and the dictionary models' parameters. **datafold** provides time series splitting for cross-validation, which enables measuring the model's quality on unseen (partial) time series data.
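
One simple way to split time series data for cross-validation is to assign whole time series to folds by their ID. The sketch below shows this with plain NumPy. The helper `split_time_series_ids` is hypothetical and is neither the splitting strategy nor the API that **datafold** or `EDMDCV` provides.

```python
import numpy as np

def split_time_series_ids(ids, n_splits, seed=0):
    """Yield (train_ids, test_ids) so that each ID is in the test set exactly once."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(np.asarray(ids))
    folds = np.array_split(shuffled, n_splits)
    for i in range(n_splits):
        test_ids = folds[i]
        train_ids = np.concatenate(
            [fold for j, fold in enumerate(folds) if j != i]
        )
        yield train_ids, test_ids

# 64 time series, as in the training set of this tutorial
splits = list(split_time_series_ids(np.arange(64), n_splits=4))

for train_ids, test_ids in splits:
    print(f"train: {train_ids.size} time series, test: {test_ids.size} time series")
```

A model would then be fitted on the time series in `train_ids` and scored on the reconstruction error for the time series in `test_ids`.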

For this tutorial, we only add a single out-of-sample initial condition and compare the prediction to the ODE system. We used this plot to visually "optimize" the Gaussian kernel's epsilon value. When predicting the time series, we want to highlight that the `EDMD` model interpolates in time. This means that we are now able to freely choose the time interval and the number of time samples at which to evaluate the model. In the time series we can see that the model follows the ground truth solution fairly well for some time. However, the `EDMD` model does not yet stay on the attractor for $t \rightarrow \infty$.
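
Why a linear model can be evaluated at arbitrary times can be sketched with an eigendecomposition. The code below is one common way to do this and not necessarily how `EDMD` evaluates internally. The helper `evaluate_linear_model`, the matrix `K` and the step size `dt` are assumptions for illustration.

```python
import numpy as np

def evaluate_linear_model(K, x0, dt, time_values):
    """Evaluate x(t) for the discrete system x_{k+1} = K @ x_k at arbitrary times."""
    eigvals, eigvecs = np.linalg.eig(K)
    # coordinates of the initial condition in the eigenvector basis
    b = np.linalg.solve(eigvecs, x0.astype(complex))
    # continuous-time eigenvalues, so that eigvals == exp(omega * dt)
    omega = np.log(eigvals.astype(complex)) / dt
    states = (eigvecs * b) @ np.exp(np.outer(omega, time_values))
    return states.real.T  # shape (n_time_values, n_features)

# hypothetical discrete system: damped rotation with step size dt
angle, dt = 0.3, 0.1
K = 0.95 * np.array([[np.cos(angle), -np.sin(angle)],
                     [np.sin(angle), np.cos(angle)]])
x0 = np.array([1.0, 0.5])

on_grid = evaluate_linear_model(K, x0, dt, np.array([0.0, dt, 2 * dt]))
fine = evaluate_linear_model(K, x0, dt, np.linspace(0, 2 * dt, 9))
print(fine.shape)
```

At multiples of `dt` the result agrees with repeatedly applying `K`, and in between it provides a smooth interpolation.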

The problem of overfitting can be seen if `epsilon=1` is set in the Gaussian kernel. The reconstructed phase portrait looks equally good, but the out-of-sample quality decreases.

In [None]:
f, ax = plt.subplots(1, 2, sharey=True, figsize=(14, 5))
for _id, df in tsc_data.itertimeseries():
    ax[0].plot(df["x1"].to_numpy(), df["x2"].to_numpy(), c="black")
    include_arrow(ax[0], df)
    
ax[0].set_title("training data used during fit")
ax[0].set_xlabel("x1")
ax[0].set_ylabel("x2")
ax[0].axis("equal");
ax[0].grid()

for _id, df in edmd_rbf_values.itertimeseries():
    ax[1].plot(df["x1"].to_numpy(), df["x2"].to_numpy(), c="black")
    include_arrow(ax[1], df)

ax[1].set_title("EDMD with RBF dictionary")
ax[1].set_xlabel("x1")
ax[1].set_ylabel("x2")
ax[1].axis("equal")
ax[1].grid()


# make out-of-sample prediction
initial_condition = np.array([[2, 1]])
t_eval = np.linspace(0, 7, 400)

ground_truth = solve_limit_cycle(initial_condition, t_eval)
predicted = edmd_rbf.predict(initial_condition, time_values=t_eval)

f, ax = plt.subplots(figsize=(7,7))

ax.plot(ground_truth.loc[:, "x1"], ground_truth.loc[:, "x2"], label="true system")
include_arrow(ax, ground_truth)
ax.plot(predicted.loc[:, "x1"], predicted.loc[:, "x2"], c="orange", label="edmd_rbf")

ax.set_title("out-of-sample prediction")
ax.axis("equal")
ax.grid()
ax.legend();
