<h1>Let's do the time warp again: time series machine learning with distance functions</h1>

# Distance module overview

## Available distances
| Metric | Distance function             |
|--------|-------------------------------|
| dtw    | aeon.distances.dtw_distance   |
| ddtw   | aeon.distances.ddtw_distance  |
| wdtw   | aeon.distances.wdtw_distance  |
| wddtw  | aeon.distances.wddtw_distance |
| erp    | aeon.distances.erp_distance   |
| edr    | aeon.distances.edr_distance   |
| msm    | aeon.distances.msm_distance   |
| twe    | aeon.distances.twe_distance   |
| lcss   | aeon.distances.lcss_distance  |

## Calling a distance function

In [9]:
import numpy as np
# Inputs can be 1D or 2D arrays and do not have to be of equal length
x = np.array([1, 2, 3, 4, 5])
y = np.array([6, 7, 8, 9, 10])

# Calling a distance directly
from aeon.distances import dtw_distance
dtw_distance(x, y)

# Calling a distance using utility function
from aeon.distances import distance
# Any value in the table above is a valid metric string
distance(x, y, metric='dtw')

108.0
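
To make the returned value concrete, here is a minimal NumPy sketch of the classic DTW recurrence with a squared pointwise cost and no window. It is not aeon's implementation: `dtw_sketch` is a hypothetical name used only for illustration, and it handles 1D series only. Under these assumptions it reproduces the 108.0 shown above for `x` and `y`.

```python
import numpy as np

def dtw_sketch(a, b):
    """Full-window DTW between two 1D series using a squared pointwise cost."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    # cost[i, j] holds the cheapest alignment of a[:i] with b[:j]
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]

x = np.array([1, 2, 3, 4, 5])
y = np.array([6, 7, 8, 9, 10])
print(dtw_sketch(x, y))  # 108.0
```

Because the recurrence only needs the two lengths `n` and `m`, the same sketch also accepts series of unequal length.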

## Distance function parameters

Parameters are passed to a distance function as keyword arguments. The parameters for
each distance are documented at https://www.aeon-toolkit.org/en/latest/api_reference/distances.html.

To pass parameters through the utility function, specify them as kwargs after the
metric parameter.

The example below uses the window parameter with MSM.

In [10]:
# Calling a distance directly
from aeon.distances import msm_distance
msm_distance(x, y, window=0.2)

# Calling a distance using utility function
distance(x, y, metric='msm', window=0.2)

2.0
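
The `window` argument is a fraction of the series length that limits how far the warping path may stray from the diagonal. A rough sketch of that idea for equal-length series follows. `band_mask` is a hypothetical helper, and the rounding rule it uses is an assumption that may differ from aeon's exact bounding logic.

```python
import numpy as np

def band_mask(n, window):
    """Boolean n x n matrix: True where cell (i, j) lies inside the band."""
    radius = int(np.floor(window * n))  # assumed rounding rule, not taken from aeon
    i, j = np.indices((n, n))
    return np.abs(i - j) <= radius

# With five points and window=0.2 the band is one cell either side of the diagonal
print(band_mask(5, 0.2).astype(int))
```

A smaller window means fewer cells of the cost matrix are evaluated, which reduces both the permitted warping and the running time.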

# Other distance module functionality

## Pairwise distance
If we have a collection of time series (a dataset) and want the distance between
every pair of series, we can use a pairwise distance function.

In [11]:
X = np.array([[1, 2, 3, 4, 5],
              [6, 7, 8, 9, 10],
              [11, 12, 13, 14, 15]])

# Calling a distance directly
from aeon.distances import twe_pairwise_distance
twe_pairwise_distance(X)

# Using utility function
from aeon.distances import pairwise_distance
pairwise_distance(X, metric='twe')

array([[ 0.   , 21.008, 26.008],
       [21.008,  0.   , 21.008],
       [26.008, 21.008,  0.   ]])
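
Conceptually, a pairwise function applies one distance to every pair of series and collects the results in a matrix. The sketch below shows that structure with a plain squared Euclidean distance standing in for an elastic distance. `pairwise_sketch` and `squared_euclidean` are hypothetical names, not aeon functions.

```python
import numpy as np

def squared_euclidean(a, b):
    """Stand-in distance for equal-length series."""
    return float(np.sum((np.asarray(a) - np.asarray(b)) ** 2))

def pairwise_sketch(X, dist):
    """Distance between every pair of rows in X."""
    n = len(X)
    out = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            # A symmetric distance only needs the upper triangle computed
            out[i, j] = out[j, i] = dist(X[i], X[j])
    return out

X = np.array([[1, 2, 3, 4, 5],
              [6, 7, 8, 9, 10],
              [11, 12, 13, 14, 15]])
print(pairwise_sketch(X, squared_euclidean))
```

The result is symmetric with a zero diagonal, the same layout as the TWE matrix above.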

We can also compute the pairwise distance between two different collections of time series.

In [44]:
y = np.array([[16, 17, 18, 19, 20],
              [21, 22, 23, 24, 25]])

# Calling a distance directly
from aeon.distances import erp_pairwise_distance
erp_pairwise_distance(X, y)

# Using utility function
pairwise_distance(X, y, metric='erp')

[[ 75. 100.]
 [ 50.  75.]
 [ 25.  50.]]


## Pairwise distance parameters

Pairwise distance functions take the same parameters as the distance functions, and
you pass them in the same way. They MUST be passed as kwargs.

In [13]:
# Calling a distance directly
from aeon.distances import wdtw_pairwise_distance
wdtw_pairwise_distance(X, y, window=0.2)

# Using utility function
pairwise_distance(X, y, metric='wdtw', window=0.2)

array([[527.38945495, 937.58125325],
       [234.39531331, 527.38945495],
       [ 54.24009352, 234.39531331]])
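
One way to picture how a utility function handles a metric string plus extra parameters is a lookup table followed by kwargs forwarding. The sketch below illustrates that pattern only: `REGISTRY`, `scaled_abs` and `pairwise_by_name` are made up for this example and do not reflect aeon's internals.

```python
import numpy as np

def scaled_abs(a, b, scale=1.0):
    """Toy distance with one keyword parameter."""
    return scale * float(np.sum(np.abs(np.asarray(a) - np.asarray(b))))

REGISTRY = {"scaled_abs": scaled_abs}  # metric string -> distance function

def pairwise_by_name(X, Y, metric, **kwargs):
    """Look up the metric, then forward kwargs to every distance call."""
    dist = REGISTRY[metric]
    return np.array([[dist(a, b, **kwargs) for b in Y] for a in X])

X_toy = np.array([[1, 2, 3], [4, 5, 6]])
Y_toy = np.array([[1, 2, 3]])
print(pairwise_by_name(X_toy, Y_toy, metric="scaled_abs", scale=0.5))
```

In this pattern the extra parameters are matched by name when they reach the distance function, so they have to be supplied as keyword arguments.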

# Using distances with scikit-learn

In [30]:
from aeon.datasets import load_gunpoint as load_data
X_train, y_train = load_data(split="TRAIN", return_type="numpy2D")
X_test, y_test = load_data(split="TEST", return_type="numpy2D")

## Clustering

In [31]:
from sklearn.cluster import AgglomerativeClustering
from aeon.distances import twe_pairwise_distance, pairwise_distance

model_precomputed = AgglomerativeClustering(metric="precomputed", linkage="complete")
model_distance = AgglomerativeClustering(metric=twe_pairwise_distance, linkage="complete")

# Precompute pairwise twe distances
train_pw_distance = pairwise_distance(X_train, metric="twe")

# Fit model using precomputed
model_precomputed.fit(train_pw_distance)
# Fit model using distance function
model_distance.fit(X_train)

# Print the cluster labels found on the training data
print("Agglomerative clustering with twe distance labels: ", model_distance.labels_)
print("Agglomerative clustering with precomputed labels: ", model_precomputed.labels_)

Agglomerative clustering with twe distance labels:  [1 1 1 1 1 1 0 0 1 1 1 1 0 1 1 1 1 1 1 0 1 1 1 0 1 1 1 1 0 0 1 0 1 1 1 1 0
 1 0 0 1 1 1 1 0 1 1 1 1 0]
Agglomerative clustering with precomputed labels:  [1 1 1 1 1 1 0 0 1 1 1 1 0 1 1 1 1 1 1 0 1 1 1 0 1 1 1 1 0 0 1 0 1 1 1 1 0
 1 0 0 1 1 1 1 0 1 1 1 1 0]


## Classification

In [34]:
from sklearn.svm import SVC
from aeon.distances import msm_pairwise_distance, pairwise_distance
model_precomputed = SVC(kernel="precomputed")
model_distance = SVC(kernel=msm_pairwise_distance)


# Precompute pairwise msm distances
train_pw_distance = pairwise_distance(X_train, metric="msm")
test_pw_distance = pairwise_distance(X_test, X_train, metric="msm")

# Fit model using precomputed
model_precomputed.fit(train_pw_distance, y_train)
# Fit model using distance function
model_distance.fit(X_train, y_train)

# Score models on test data
print("SVM with msm distance score: ", model_distance.score(X_test, y_test))
print("SVM with precomputed score: ", model_precomputed.score(test_pw_distance, y_test))

SVM with msm distance score:  0.38666666666666666
SVM with precomputed score:  0.38666666666666666
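
With `kernel="precomputed"`, the matrix passed to `fit` is train-by-train and the matrix passed to `score` or `predict` is test-by-train, which is why `X_test` comes first when the test matrix is built. The toy sketch below illustrates the shapes and checks that both routes predict the same labels. It uses a linear kernel (`linear_kernel_sketch`, a hypothetical stand-in) rather than an aeon distance.

```python
import numpy as np
from sklearn.svm import SVC

def linear_kernel_sketch(A, B):
    """Return the (len(A), len(B)) Gram matrix of dot products."""
    return np.asarray(A) @ np.asarray(B).T

rng = np.random.default_rng(0)
X_tr = np.vstack([rng.normal(-2, 1, (15, 3)), rng.normal(2, 1, (15, 3))])
y_tr = np.array([0] * 15 + [1] * 15)
X_te = np.vstack([rng.normal(-2, 1, (5, 3)), rng.normal(2, 1, (5, 3))])

K_train = linear_kernel_sketch(X_tr, X_tr)  # shape (30, 30)
K_test = linear_kernel_sketch(X_te, X_tr)   # shape (10, 30): test rows, train columns

pred_matrix = SVC(kernel="precomputed").fit(K_train, y_tr).predict(K_test)
pred_callable = SVC(kernel=linear_kernel_sketch).fit(X_tr, y_tr).predict(X_te)
print(K_train.shape, K_test.shape, np.array_equal(pred_matrix, pred_callable))
```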


# Performance

In [42]:
import numpy as np
import time
import pandas as pd
# Series lengths to time; the results printed below use the smaller set
# N_VALUES = [1000, 5000, 10000, 50000, 100000]
N_VALUES = [100, 200, 300]
def time_distance(dist_func, dist_name, num_reruns=10, store_dict=None):
    """Time a distance function."""
    if store_dict is None:
        store_dict = {}
    if dist_name not in store_dict:
        store_dict[dist_name] = []

    for n in N_VALUES:
        curr_x = np.random.rand(1, n)
        curr_y = np.random.rand(1, n)
        times = []
        for _ in range(num_reruns):
            start = time.perf_counter()
            dist_func(curr_x, curr_y)
            end = time.perf_counter()
            times.append(end - start)
        store_dict[dist_name].append(np.mean(times))
    return store_dict

from aeon.distances import (
    dtw_distance,
    ddtw_distance,
    erp_distance,
    edr_distance,
    twe_distance,
    msm_distance,
    wdtw_distance,
    wddtw_distance,
    lcss_distance,
    euclidean_distance
)

dists = [
    (dtw_distance, "dtw_distance"),
    (ddtw_distance, "ddtw_distance"),
    (erp_distance, "erp_distance"),
    (edr_distance, "edr_distance"),
    (twe_distance, "twe_distance"),
    (msm_distance, "msm_distance"),
    (wdtw_distance, "wdtw_distance"),
    (wddtw_distance, "wddtw_distance"),
    (lcss_distance, "lcss_distance"),
    (euclidean_distance, "euclidean_distance")
]

store_dict = {}
for dist in dists:
    store_dict = time_distance(dist[0], dist[1], store_dict=store_dict)

df = pd.DataFrame(store_dict)
df.index = N_VALUES
print(df)

     dtw_distance  ddtw_distance  erp_distance  edr_distance  twe_distance  \
100      0.000223       0.000333      0.000249      0.000202      0.000855   
200      0.000939       0.000887      0.000824      0.000808      0.003283   
300      0.002959       0.001958      0.001758      0.001826      0.007376   

     msm_distance  wdtw_distance  wddtw_distance  lcss_distance  \
100      0.000037       0.000250        0.000239       0.000264   
200      0.000151       0.000866        0.000849       0.001095   
300      0.000368       0.001876        0.001861       0.002175   

     euclidean_distance  
100        9.536743e-07  
200        5.245209e-07  
300        4.768372e-07  
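
aeon's distance module relies on numba, so the first call to a distance typically includes compilation time. When timing such functions it can help to make one untimed warm-up call and to use `time.perf_counter`, which has finer resolution than `time.time`. The helper below is a hedged sketch of that approach. `time_call` is a hypothetical name, and the toy distance merely stands in for a real one.

```python
import time
import numpy as np

def time_call(func, *args, repeats=10, warmup=True):
    """Mean wall-clock seconds per call of func(*args)."""
    if warmup:
        func(*args)  # untimed call, absorbs any one-off compilation cost
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        times.append(time.perf_counter() - start)
    return float(np.mean(times))

def toy_distance(a, b):
    """Stand-in for a real distance function."""
    return float(np.sum((a - b) ** 2))

a = np.random.rand(1, 300)
b = np.random.rand(1, 300)
print(time_call(toy_distance, a, b))
```

Very small readings, such as the Euclidean timings above, sit close to the timer's resolution and are best treated as rough orders of magnitude.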
