# Time-Series Clustering using pretrained embeddings (with fallbacks)

This notebook shows how to cluster time-series using pretrained embeddings (if available) and fallbacks to feature-based methods (TSFresh / DTW / tslearn). It includes data prep, embedding extraction, dimensionality reduction, clustering, and evaluation (DTW-aware metrics, silhouette on embeddings).

# Install notes (uncomment when running in Colab)
# !pip install tslearn umap-learn tsfresh numpy matplotlib scikit-learn

print('If you have a pretrained time-series encoder (TS2Vec or similar), install and load it. Otherwise, this notebook falls back to DTW/tslearn features.')

In [1]:
!pip install tslearn umap-learn tsfresh numpy matplotlib scikit-learn



In [None]:
!pip install --force-reinstall tslearn

In [2]:
# Imports
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from tslearn.clustering import TimeSeriesKMeans
from tslearn.datasets import CachedDatasets
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
import umap

sns.set()
print('Imports done (tslearn optional)')

Imports done (tslearn optional)


In [3]:
# Load a sample dataset from tslearn (ECG200) and cluster with TimeSeriesKMeans (DTW)
try:
    X_train, y_train, X_test, y_test = CachedDatasets().load_dataset('ECG200')
    X = np.vstack([X_train, X_test])
    print('Loaded ECG200, shape:', X.shape)
    # normalize each series
    X_norm = (X - np.mean(X, axis=1, keepdims=True)) / (np.std(X, axis=1, keepdims=True) + 1e-8)
    # cluster with DTW-based KMeans
    model = TimeSeriesKMeans(n_clusters=2, metric='dtw', random_state=0)
    labels = model.fit_predict(X_norm)
    print('Cluster labels unique:', np.unique(labels))
    # visualize first 20 series colored by cluster
    plt.figure(figsize=(12,4))
    for i in range(20):
        plt.plot(X_norm[i].ravel(), alpha=0.6, color=['C0','C1'][labels[i]])
    plt.title('Sample time series colored by cluster (first 20)')
    plt.show()
except Exception as e:
    print('tslearn dataset or TimeSeriesKMeans not available in this environment:', e)
    print('You can install tslearn in Colab: !pip install tslearn')


tslearn dataset or TimeSeriesKMeans not available in this environment: [Errno 2] No such file or directory: '/usr/local/lib/python3.12/dist-packages/tslearn/datasets/../.cached_datasets/ECG200.npz'
You can install tslearn in Colab: !pip install tslearn
