# Topological Divergences of Time Series

This notebook analyses time series generated by chaotic maps over a range of control parameter values, using various topological divergences. Classical Lyapunov exponent estimators and recent TDA- and HVG-based measures are also computed as baselines.

## Set up parallel processing

Start a cluster of 32 engines with `ipcluster start -n 32` and make sure it is running before executing the code below.

When the notebook has finished, stop the cluster by running `ipcluster stop` in a separate terminal.

In [1]:
import ipyparallel as ipp
clients = ipp.Client()
dv = clients.direct_view()
lbv = clients.load_balanced_view()

## Import modules, classes, and functions

In [32]:
import numpy as np
from scipy import stats

from numpy.random import MT19937
from numpy.random import RandomState
from numpy.random import SeedSequence

from nolds import lyap_r
from nolds import lyap_e

from LogisticMapLCE import logistic_lce
from HenonMapLCE import henon_lce
from IkedaMapLCE import ikeda_lce
from TinkerbellMapLCE import tinkerbell_lce

from TimeSeriesHVG import TimeSeriesHVG as TSHVG
from TimeSeriesMergeTree import TimeSeriesMergeTree as TSMT
from TimeSeriesPersistence import TimeSeriesPersistence as TSPH

## Configure the experiment data

In [3]:
# draw samples from a known random state for reproducibility
SEED = 42
randomState = RandomState(MT19937(SeedSequence(SEED)))

TIME_SERIES_LENGTH = 200
NUM_CONTROL_PARAM_SAMPLES = 10

### Logistic

In [35]:
logistic_control_params = [
    dict(r=r) for r in np.sort(randomState.uniform(3.5, 4.0, NUM_CONTROL_PARAM_SAMPLES))
]
logistic_dataset = [
    logistic_lce(mapParams=params, nIterates=TIME_SERIES_LENGTH, includeTrajectory=True)
    for params in logistic_control_params
]
logistic_trajectories = [
    data["trajectory"][:,0]
    for data in logistic_dataset
]

### Hénon

In [36]:
henon_control_params = [
    dict(a=a, b=0.3) for a in np.sort(randomState.uniform(0.8, 1.4, NUM_CONTROL_PARAM_SAMPLES))
]
henon_dataset = [
    henon_lce(mapParams=params, nIterates=TIME_SERIES_LENGTH, includeTrajectory=True)
    for params in henon_control_params
]
henon_trajectories = [
    data["trajectory"][:,0]
    for data in henon_dataset
]


### Ikeda

In [39]:
ikeda_control_params = [
    dict(a=a) for a in np.sort(randomState.uniform(0.5, 1.0, NUM_CONTROL_PARAM_SAMPLES))
]
ikeda_dataset = [
    ikeda_lce(mapParams=params, nIterates=TIME_SERIES_LENGTH, includeTrajectory=True)
    for params in ikeda_control_params
]
ikeda_trajectories = [
    data["trajectory"][:,0]
    for data in ikeda_dataset
]


### Tinkerbell

In [38]:
tinkerbell_control_params = [
    dict(a=a) for a in np.sort(randomState.uniform(0.7, 0.9, NUM_CONTROL_PARAM_SAMPLES))
]
tinkerbell_dataset = [
    tinkerbell_lce(mapParams=params, nIterates=TIME_SERIES_LENGTH, includeTrajectory=True)
    for params in tinkerbell_control_params
]
tinkerbell_trajectories = [
    data["trajectory"][:,0]
    for data in tinkerbell_dataset
]

## Build time series representations

In [8]:
def build_representation(dataset, rep_class, rep_class_kwargs):
    """Build one `rep_class` instance per dataset entry from the first trajectory coordinate."""
    trajectories = [data["trajectory"][:,0] for data in dataset]
    return [rep_class(ts, **rep_class_kwargs) for ts in trajectories]

In [9]:
tshvg_kwargs = dict(
    DEGREE_DISTRIBUTION_MAX_DEGREE=100,
    DEGREE_DISTRIBUTION_DIVERGENCE_P_VALUE=1.0,
    directed=None,
    weighted=None,
    penetrable_limit=0,
)

In [10]:
tsmt_kwargs = dict(
    INTERLEAVING_DIVERGENCE_MESH=0.5,
    DMT_ALPHA=0.5,
    DISTRIBUTION_VECTOR_LENGTH=100,
    LEAF_NEIGHBOUR_OFFSET=1,
)

In [11]:
tsph_kwargs = dict(
    ENTROPY_SUMMARY_RESOLUTION=100,
    BETTI_CURVE_RESOLUTION=100,
    BETTI_CURVE_NORM_P_VALUE=1.0,
    SILHOUETTE_RESOLUTION=100,
    SILHOUETTE_WEIGHT=1,
    LIFESPAN_CURVE_RESOLUTION=100,
    IMAGE_BANDWIDTH=0.2,
    IMAGE_RESOLUTION=20,
    ENTROPY_SUMMARY_DIVERGENCE_P_VALUE=2.0,
    PERSISTENCE_STATISTICS_DIVERGENCE_P_VALUE=2.0,
    WASSERSTEIN_DIVERGENCE_P_VALUE=1.0,
    BETTI_CURVE_DIVERGENCE_P_VALUE=1.0,
    PERSISTENCE_SILHOUETTE_DIVERGENCE_P_VALUE=2.0,
    PERSISTENCE_LIFESPAN_DIVERGENCE_P_VALUE=2.0,
)

### Logistic

In [12]:
logistic_tshvgs = build_representation(logistic_dataset, TSHVG, tshvg_kwargs)
logistic_tsmts = build_representation(logistic_dataset, TSMT, tsmt_kwargs)
logistic_tsphs = build_representation(logistic_dataset, TSPH, tsph_kwargs)

### Hénon

In [13]:
henon_tshvgs = build_representation(henon_dataset, TSHVG, tshvg_kwargs)
henon_tsmts = build_representation(henon_dataset, TSMT, tsmt_kwargs)
henon_tsphs = build_representation(henon_dataset, TSPH, tsph_kwargs)

### Ikeda

In [14]:
ikeda_tshvgs = build_representation(ikeda_dataset, TSHVG, tshvg_kwargs)
ikeda_tsmts = build_representation(ikeda_dataset, TSMT, tsmt_kwargs)
ikeda_tsphs = build_representation(ikeda_dataset, TSPH, tsph_kwargs)

### Tinkerbell

In [15]:
tinkerbell_tshvgs = build_representation(tinkerbell_dataset, TSHVG, tshvg_kwargs)
tinkerbell_tsmts = build_representation(tinkerbell_dataset, TSMT, tsmt_kwargs)
tinkerbell_tsphs = build_representation(tinkerbell_dataset, TSPH, tsph_kwargs)

## Get Lyapunov exponents and topological divergences

### Lyapunov exponents (ground truth)

Calculated using numerical integration and the Benettin algorithm.

In [16]:
logistic_lces = np.array([data["lce"][0] for data in logistic_dataset])
henon_lces = np.array([data["lce"][0] for data in henon_dataset])
ikeda_lces = np.array([data["lce"][0] for data in ikeda_dataset])
tinkerbell_lces = np.array([data["lce"][0] for data in tinkerbell_dataset])

In [47]:
logistic_lces

array([-0.10634193, -0.06317295,  0.39735755,  0.42847687,  0.43455378,
       -0.13680436, -0.01506717,  0.43273952, -0.07622925,  0.55053153])
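
As a rough illustration of the Benettin approach for a discrete map, the sketch below estimates the largest Lyapunov exponent of the Hénon map by pushing a tangent vector through the Jacobian and renormalising it at every iterate. The helper names `henon_step` and `henon_largest_lce`, the initial condition, and the default iterate counts are assumptions made for this example; this is not the implementation in `HenonMapLCE`.

```python
import math


def henon_step(x, y, a, b):
    """One iterate of the Hénon map (x, y) -> (1 - a*x**2 + y, b*x)."""
    return 1.0 - a * x * x + y, b * x


def henon_largest_lce(a, b=0.3, n_iterates=20000, n_transient=1000):
    """Benettin-style estimate of the largest Lyapunov exponent (hypothetical helper)."""
    x, y = 0.0, 0.0
    # discard a transient so the orbit settles onto its attractor
    for _ in range(n_transient):
        x, y = henon_step(x, y, a, b)

    vx, vy = 1.0, 0.0  # tangent vector
    log_growth = 0.0
    for _ in range(n_iterates):
        # apply the Jacobian [[-2*a*x, 1], [b, 0]] at the current point
        vx, vy = -2.0 * a * x * vx + vy, b * vx
        x, y = henon_step(x, y, a, b)
        # accumulate the log stretching factor, then renormalise
        norm = math.hypot(vx, vy)
        log_growth += math.log(norm)
        vx, vy = vx / norm, vy / norm
    return log_growth / n_iterates


print(henon_largest_lce(a=1.4))  # chaotic regime
print(henon_largest_lce(a=0.2))  # stable fixed point
```

With the classical parameters `a=1.4`, `b=0.3` the estimate should be positive, while for `a=0.2` the orbit settles on a stable fixed point and the estimate is negative.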

### Helper functions

In [26]:
def dict_of_arrays(list_of_dicts):
    """Convert list of dictionaries with equal keys to a dictionary of numpy arrays.
    
    Example
        Input
            [{'a': 1, 'b': 2}, {'a': 3, 'b': 4}]
        Output
            {'a': np.array([1, 3]), 'b': np.array([2, 4])}
    """
    return {key: np.array([d[key] for d in list_of_dicts]) for key in list_of_dicts[0]}

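def topological_divergences(ts_representations):
    """Evaluate `rep.divergences` for each representation on the cluster engines.

    Returns a dictionary of numpy arrays keyed by divergence name.
    """
    divergences = dv.map_sync(lambda rep: rep.divergences, ts_representations)
    return dict_of_arrays(divergences)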

### HVG divergences

Wasserstein and $L_p$ divergences of the time series HVG degree distributions.

In [28]:
logistic_hvg_divergences = topological_divergences(logistic_tshvgs)
henon_hvg_divergences = topological_divergences(henon_tshvgs)
ikeda_hvg_divergences = topological_divergences(ikeda_tshvgs)
tinkerbell_hvg_divergences = topological_divergences(tinkerbell_tshvgs)

In [46]:
logistic_hvg_divergences

{'degree_wasserstein': array([0.00029851, 0.00029851, 0.00139303, 0.00109453, 0.00079602,
        0.00039801, 0.000199  , 0.00109453, 0.0039801 , 0.00169154]),
 'degree_lp': array([0.039801  , 0.7761194 , 0.17910448, 0.24875622, 0.16915423,
        0.039801  , 0.0199005 , 0.17910448, 0.7761194 , 0.21890547])}

### Merge tree divergences

Interleaving divergence and leaf-to-offset-leaf path length distribution divergence.

In [31]:
logistic_mt_divergences = topological_divergences(logistic_tsmts)
henon_mt_divergences = topological_divergences(henon_tsmts)
ikeda_mt_divergences = topological_divergences(ikeda_tsmts)
tinkerbell_mt_divergences = topological_divergences(tinkerbell_tsmts)

In [48]:
logistic_mt_divergences

{'interleaving': array([3.93313734e-01, 6.15667548e-01, 6.88947523e-01, 7.55338508e-01,
        7.67488701e-01, 1.91288674e-08, 8.10931079e-01, 8.20755652e-01,
        7.58843195e-01, 8.79628651e-01]),
 'leaf_to_leaf_path_length': array([0.00059394, 0.00049051, 0.00233158, 0.00088515, 0.00132852,
        0.00029851, 0.0003166 , 0.00142857, 0.00074051, 0.00120482])}
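
One possible reading of the leaf-to-offset-leaf path length is sketched below for the sublevel merge tree of a series. It relies on the fact that two local minima of a one-dimensional series merge at the height of the largest value between them, so the tree path joining them climbs from each leaf to that height. The function names, the `offset` argument (mirroring `LEAF_NEIGHBOUR_OFFSET`), and the assumption that neighbouring values are never equal are choices made for this example; `TimeSeriesMergeTree` may define the quantity differently.

```python
def local_minima(series):
    """Indices of local minima (the merge tree leaves), assuming no equal neighbours."""
    n = len(series)
    return [
        i for i in range(n)
        if (i == 0 or series[i - 1] > series[i])
        and (i == n - 1 or series[i + 1] > series[i])
    ]


def leaf_path_lengths(series, offset=1):
    """Path length in the sublevel merge tree between each leaf and the
    leaf `offset` positions further along the series."""
    leaves = local_minima(series)
    lengths = []
    for a, b in zip(leaves, leaves[offset:]):
        # the two leaves merge at the highest value between them
        merge_height = max(series[a:b + 1])
        lengths.append((merge_height - series[a]) + (merge_height - series[b]))
    return lengths


series = [3, 1, 4, 2, 5, 0, 6]
print(leaf_path_lengths(series, offset=1))
print(leaf_path_lengths(series, offset=2))
```

A distribution of these lengths can then be compared between the sublevel tree and the tree of the negated series, in the same spirit as the degree distributions above.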

### Persistent homology divergences

Various divergences based on the superlevel and sublevel persistence diagrams.

In [30]:
logistic_ph_divergences = topological_divergences(logistic_tsphs)
henon_ph_divergences = topological_divergences(henon_tsphs)
ikeda_ph_divergences = topological_divergences(ikeda_tsphs)
tinkerbell_ph_divergences = topological_divergences(tinkerbell_tsphs)
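
For readers unfamiliar with sublevel and superlevel persistence of a time series, the following sketch computes the finite 0-dimensional persistence pairs with a union-find structure and the elder rule, and derives the superlevel pairs by negating the series. Persistent entropy is included as one example of a diagram summary. The function names are hypothetical and the essential class (born at the global minimum) is omitted; this is not the code in `TimeSeriesPersistence`.

```python
import math


def sublevel_persistence(series):
    """Finite (birth, death) pairs of the sublevel-set filtration of a series."""
    n = len(series)
    parent = [None] * n  # None marks a point not yet in the filtration

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    pairs = []
    # add points in order of increasing value
    for i in sorted(range(n), key=lambda k: series[k]):
        parent[i] = i
        for j in (i - 1, i + 1):
            if 0 <= j < n and parent[j] is not None:
                ri, rj = find(i), find(j)
                if ri == rj:
                    continue
                # elder rule: the component with the higher minimum dies
                young, old = (ri, rj) if series[ri] > series[rj] else (rj, ri)
                if series[young] < series[i]:  # skip zero-persistence pairs
                    pairs.append((series[young], series[i]))
                parent[young] = old
    return pairs


def superlevel_persistence(series):
    """Superlevel pairs, obtained from the sublevel pairs of the negated series."""
    return [(-b, -d) for b, d in sublevel_persistence([-x for x in series])]


def persistent_entropy(pairs):
    """Shannon entropy of the normalised lifetimes of a diagram."""
    lifetimes = [abs(d - b) for b, d in pairs]
    total = sum(lifetimes)
    return -sum((l / total) * math.log(l / total) for l in lifetimes)


series = [3, 1, 4, 2, 5, 0, 6]
print(sublevel_persistence(series))
print(superlevel_persistence(series))
print(persistent_entropy(sublevel_persistence(series)))
```

Roots of the union-find always hold the minimum of their component, which is what lets the elder rule be applied by comparing root values directly.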

In [49]:
logistic_ph_divergences

{'point_summary_entropy': array([5.21179913e-04, 1.93738326e-03, 3.08965474e-03, 7.07017585e-04,
        2.45777300e-03, 7.12452319e-12, 1.29340781e-03, 1.49321068e-03,
        4.43695853e-04, 1.50770478e-03]),
 'point_summary_max_persistence_ratio': array([6.68237354e-10, 3.73538755e-10, 2.13077875e-06, 3.29878903e-07,
        7.41441419e-06, 1.21846977e-11, 1.60118122e-07, 6.46471089e-08,
        4.37202485e-09, 3.52424915e-06]),
 'point_summary_homology_class_ratio': array([0.00497512, 0.00497512, 0.00497512, 0.00497512, 0.00497512,
        0.        , 0.00497512, 0.        , 0.00497512, 0.        ]),
 'entropy': array([0.27211655, 0.46292688, 0.47501362, 0.40765834, 0.38399686,
        0.06347962, 0.16911732, 0.40261831, 0.38422877, 0.44901553]),
 'betti': array([100., 100.,  85.,  97.,  72.,   0.,  42.,  40., 100.,  87.]),
 'silhouette': array([9.00514975e-03, 3.37850710e-02, 3.51090577e-02, 2.54191087e-02,
        3.32253555e-02, 6.35089870e-09, 1.92827642e-02, 3.48520423e-02,
  

## Baselines

Other measures that might approximate or estimate the largest Lyapunov exponent of the trajectory data.

### Classical measures

The Rosenstein (`lyap_r`) and Eckmann (`lyap_e`) estimates from the Python package `nolds`.

In [40]:
logistic_rosenstein_estimates = dv.map_sync(lyap_r, logistic_trajectories)
henon_rosenstein_estimates = dv.map_sync(lyap_r, henon_trajectories)
ikeda_rosenstein_estimates = dv.map_sync(lyap_r, ikeda_trajectories)
tinkerbell_rosenstein_estimates = dv.map_sync(lyap_r, tinkerbell_trajectories)


In [50]:
logistic_rosenstein_estimates

[-0.060033527173494014,
 -0.0301349970759175,
 -0.01609325408935554,
 0.002687146792062013,
 0.006097245570133959,
 0.0003380567505397046,
 -0.0003902937019019587,
 -0.052557697152732764,
 0.000643072041158883,
 0.0006139790982520421]

In [52]:
logistic_eckmann_estimates = np.array([x[0] for x in dv.map_sync(lyap_e, logistic_trajectories)])
henon_eckmann_estimates = np.array([x[0] for x in dv.map_sync(lyap_e, henon_trajectories)])
ikeda_eckmann_estimates = np.array([x[0] for x in dv.map_sync(lyap_e, ikeda_trajectories)])
tinkerbell_eckmann_estimates = np.array([x[0] for x in dv.map_sync(lyap_e, tinkerbell_trajectories)])


In [53]:
logistic_eckmann_estimates

array([ 0.15823905,  0.22930793,  0.69618255,  0.84920216,  0.93647736,
        0.9846051 ,  0.89982027, -0.09447172,  0.30277407,  0.25711146],
      dtype=float32)

### HVG-based measures

The $L_1$ distance between degree distributions of top and bottom HVGs as used in the _Peak vs pit asymmetry_ paper in **Scientific Reports**.

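This is already computed above, with `DEGREE_DISTRIBUTION_DIVERGENCE_P_VALUE=1.0`, as the `"degree_lp"` entry of each HVG divergence dictionary, for example `logistic_hvg_divergences["degree_lp"]`.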

### TDA-based measures