In this notebook, we demonstrate some of the examples that appear in the paper.

### Gaussian Balls vs. CBF Subsequences using Persistence Diagrams

This example demonstrates that a typical "clusterable" dataset and the subsequences of an apparently "segmentable" time series have very different persistence diagram representations.

In [6]:
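import sys

sys.path.append("..")  # make the local `topological` package importable

import gtda.homology as ghm
import holoviews as hv
import numpy as np
import sklearn.datasets as skds
import torch
from pyts.datasets import make_cylinder_bell_funnel

# Provides `to_td_pca`; the wildcard import may also re-export `hv`, `np` and `torch`.
from topological.utils.tda_tools import *

hv.extension("plotly")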

We first generate a Gaussian blobs example:

In [7]:
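# Sample 300 points from isotropic Gaussian blobs (three centres by default).
X, y = skds.make_blobs(
    n_samples=300, n_features=2, random_state=2222, center_box=(-5.0, 5.0)
)
# Colour each point by its blob label.
hv.Scatter((X[:, 0], X[:, 1], y), "x", ["y", "c"]).opts(
    color="c", cmap="bkr", width=500, height=500
)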

Let us look at its weighted Rips persistence diagram, which is more robust to noise than the unweighted one:

In [8]:
# giotto-tda expects a batch of point clouds, so we add a leading batch axis.
vr = ghm.WeightedRipsPersistence(reduced_homology=False)
pd = vr.fit_transform_plot(X[None, ...])
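
To see why weighting helps with noise, here is a minimal sketch of one common weighting scheme, the distance-to-measure (DTM), written in plain NumPy. We assume this resembles the default weighting of `WeightedRipsPersistence`; the helper `dtm_weights` and its `n_neighbors` parameter are our own illustration, not part of giotto-tda. Points far from the bulk of the data receive large weights, so they enter the filtration late and contribute less to the diagram.

```python
import numpy as np


def dtm_weights(points, n_neighbors=3):
    """Root-mean-square distance from each point to its nearest neighbours.

    Hypothetical helper for illustration; conventions (e.g. whether a point
    counts as its own neighbour) differ between libraries.
    """
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    # Column 0 of each sorted row is the point itself (distance 0); skip it.
    nearest = np.sort(dists, axis=1)[:, 1 : n_neighbors + 1]
    return np.sqrt(np.mean(nearest**2, axis=1))


rng = np.random.default_rng(0)
cloud = rng.normal(scale=0.5, size=(100, 2))  # one tight blob
outlier = np.array([[10.0, 10.0]])  # a single noisy point
weights = dtm_weights(np.vstack([cloud, outlier]))

print("median weight in blob:", np.median(weights[:-1]))
print("weight of outlier:    ", weights[-1])
```

The outlier's weight is far larger than that of any point inside the blob, which is the behaviour that makes a weighted filtration less sensitive to isolated noisy points.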

We now generate a random cylinder-bell-funnel (CBF) dataset and concatenate its samples into a single time series, so that distinct shape patterns follow one another in random order:

In [60]:
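# Each row of Z is one pattern; u holds the corresponding class labels.
Z, u = make_cylinder_bell_funnel(n_samples=30, random_state=9999)
Z = Z.flatten()  # concatenate the 30 patterns into one long series
hv.Curve(Z).opts(width=1000, height=300)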

To visualise its subsequences, we first take its time-delay (TD) embedding (that is, we extract sliding windows), then use PCA to produce a 3D visualisation, colouring each point by the shape pattern it belongs to:

In [64]:
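# Reshape Z to (batch, time, channels) before embedding.
pcs, _ = to_td_pca(torch.tensor(Z[None, :, None]), 3, 1, 128, random_state=9999)
pcs = pcs.squeeze()
# Expand the per-pattern labels to per-sample labels and keep the trailing
# len(pcs) of them, one per embedded point.
labels = np.repeat(u, 128)[-len(pcs) :]
hv.Scatter3D(
    (pcs[:, 0], pcs[:, 1], pcs[:, 2], labels),
    [("x", "PC1"), ("y", "PC2"), ("z", "PC3")],
    ["c"],
).opts(color="c", width=500, height=500)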
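
The time-delay embedding and projection step can be sketched with NumPy alone. This is a minimal illustration, assuming that `to_td_pca` extracts stride-1 sliding windows and projects them onto the leading principal components; the actual helper in `topological.utils.tda_tools` may differ, and the names `time_delay_embedding` and `pca_project` are ours.

```python
import numpy as np


def time_delay_embedding(series, window, stride=1):
    """Stack sliding windows of a 1D series as rows of a 2D array."""
    windows = np.lib.stride_tricks.sliding_window_view(series, window)
    return windows[::stride]


def pca_project(points, n_components):
    """Project centred points onto their leading principal components."""
    centred = points - points.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return centred @ vt[:n_components].T


# Toy series standing in for the CBF signal used in this notebook.
series = np.sin(np.linspace(0.0, 20.0 * np.pi, 1000))
windows = time_delay_embedding(series, window=128)
pcs_sketch = pca_project(windows, n_components=3)

print(windows.shape)     # one row per window
print(pcs_sketch.shape)  # one 3D point per window
```

Under these assumptions, a series of length $N$ with a window of 128 and a stride of 1 yields $N - 127$ embedded points, which is why the labels in the cell above are trimmed to `len(pcs)`.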

As we can see, although different shape patterns appear to trace different "orbits" in the TD embedding space, these orbits are interconnected and form cyclic structures. Let us look at the persistence diagram:

In [65]:
vr = ghm.WeightedRipsPersistence(reduced_homology=False, n_jobs=-1)
pd = vr.fit_transform_plot(pcs[None, ...])

As this diagram shows, no distinct, long-lived connected components ($H_0$) stand out, which makes it hard to obtain meaningful clusters by clustering the subsequences directly. Moreover, there are a number of noisy cyclic ($H_1$) features, produced by the repeating (but not periodic) patterns.
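
The link between $H_0$ and clustering can be made concrete. For an unweighted Vietoris-Rips filtration, the death times of the $H_0$ classes are the edge lengths of a minimum spanning tree of the point cloud (up to the library's scaling convention), which are also the distances at which single-linkage clustering merges groups. The sketch below is an illustration in plain NumPy, not the weighted filtration used above; `h0_death_times` is a hypothetical helper.

```python
import numpy as np


def h0_death_times(points):
    """Sorted H0 death times of an unweighted Vietoris-Rips filtration.

    Computed as the edge lengths of a minimum spanning tree (Prim's algorithm).
    """
    n = len(points)
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    best = dists[0].copy()  # cheapest known link from each point to the tree
    deaths = []
    for _ in range(n - 1):
        candidate = np.where(in_tree, np.inf, best)
        j = int(np.argmin(candidate))
        deaths.append(candidate[j])  # a component merges at this scale
        in_tree[j] = True
        best = np.minimum(best, dists[j])
    return np.sort(np.array(deaths))


# Three well-separated toy blobs (not the dataset used in this notebook).
rng = np.random.default_rng(0)
centres = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
blobs = np.vstack([c + rng.normal(scale=0.3, size=(50, 2)) for c in centres])
deaths = h0_death_times(blobs)

print("three largest H0 deaths:", deaths[-3:])
```

For three well-separated blobs, two death times are much larger than all the others, one for each merge between clusters. A diagram without such a gap, as for the CBF subsequences, gives no clear indication of how many clusters to look for.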