This is now available in aeon-toolkit!
Simply run the following:
from aeon.datasets import load_classification
from aeon.clustering.averaging import elastic_barycenter_average
X, y = load_classification(name="Coffee")
# aeon returns the class labels as strings, so compare against "0" rather than 0
average_class_0 = elastic_barycenter_average(X[y == "0"], distance="shape_dtw", reach=15)
This repository contains the code for our paper "ShapeDBA: Generating Effective Time Series Prototypes using ShapeDTW Barycenter Averaging", accepted at the 8th Workshop on Advanced Analytics and Learning on Temporal Data (AALTD 2023), held in conjunction with the 2023 European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases.
This work was done by Ali Ismail-Fawaz, Hassan Ismail Fawaz, François Petitjean, Maxime Devanne, Jonathan Weber, Stefano Berretti, Geoffrey I. Webb and Germain Forestier.
Before doing anything else, run the following command in the root directory to build the necessary Cython
components of the Dynamic Time Warping (DTW) algorithm and its variants.
./utils/build-cython.sh
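For orientation, the quantity these components compute can be illustrated with a minimal pure-Python sketch of classic DTW. This is not the repository's Cython implementation: the function name `dtw_distance` and the squared-difference local cost are assumptions made for illustration, and none of the variants (such as ShapeDTW) are covered.

```python
import math


def dtw_distance(a, b):
    """Classic DTW between two univariate sequences (illustrative sketch).

    Assumption: the local cost is the squared difference between points.
    """
    n, m = len(a), len(b)
    # cost[i][j] holds the minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = (a[i - 1] - b[j - 1]) ** 2
            # extend the cheapest of the three admissible predecessor alignments
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b stays
                cost[i][j - 1],      # b advances, a stays
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]
```

The double loop is quadratic in the series length, which is the usual reason such routines are compiled rather than run in plain Python.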
To use the code, first adapt it to your own machine as follows:
- Download the datasets of the UCR Archive.
- Set the variable root_dir_dataset_archive in the main.py file to the directory containing the datasets of the UCR Archive.
- Set root_dir in the main.py file to the directory where the results will be stored.
Two options are available when running the main.py file:
- Visualize the resulting average per class of a dataset using the following command:
python3 main.py visualize_average <dataset_name> <archive_name> <averaging_method>
An example would be:
python3 main.py visualize_average Coffee UCRArchive_2018 shapeDBA
The <archive_name> should be the same as the name of the directory containing the datasets; for instance, the directory of the dataset should be root_dir_dataset_archive + '<archive_name>/Coffee'.
The choices for <averaging_method> are: mean, DBA, softDBA and ShapeDBA.
- Generate the clustering results of the paper. First, in the constants.py file, choose which datasets to use in the study by editing the UNIVARIATE_DATASET_NAMES_2018 list variable. Then produce the results by running the following command:
python3 main.py data_clustering
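The directory layout described for <archive_name> can be checked before running either option. The sketch below is only an illustration: the helper name `dataset_dir` is hypothetical and not part of this repository, and it uses `os.path.join` instead of string concatenation so that the result does not depend on whether root_dir_dataset_archive ends with a path separator.

```python
import os


def dataset_dir(root_dir_dataset_archive, archive_name, dataset_name):
    """Build the directory expected to hold one dataset (hypothetical helper).

    Mirrors root_dir_dataset_archive + '<archive_name>/<dataset_name>', but is
    robust to a missing trailing separator on the root directory.
    """
    return os.path.join(root_dir_dataset_archive, archive_name, dataset_name)


def dataset_is_available(root_dir_dataset_archive, archive_name, dataset_name):
    """Return True if the expected dataset directory exists on disk."""
    return os.path.isdir(
        dataset_dir(root_dir_dataset_archive, archive_name, dataset_name)
    )
```

For example, `dataset_dir("/data", "UCRArchive_2018", "Coffee")` points at the Coffee directory inside the archive, where `/data` is a placeholder for your own root_dir_dataset_archive.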
We compared k-means with Euclidean distance, DBA, softDBA and ShapeDBA, as well as the k-Shape algorithm, in terms of the Adjusted Rand Index (ARI) and the running time.
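As a reminder of what the ARI measures, here is a minimal pure-Python sketch of the standard pair-counting formula. It is not the code used to produce the results in the paper; the function name `adjusted_rand_index` is chosen for illustration only.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index from the contingency counts of two labelings."""
    n = len(labels_true)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # fewer than two samples: treat the labelings as identical

    # number of sample pairs grouped together in both labelings
    together_both = sum(
        comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values()
    )
    # number of sample pairs grouped together in each labeling separately
    together_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    together_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())

    expected = together_true * together_pred / total_pairs
    maximum = 0.5 * (together_true + together_pred)
    if maximum == expected:
        return 1.0  # degenerate case, e.g. a single cluster in both labelings
    return (together_both - expected) / (maximum - expected)
```

The score is invariant to how clusters are numbered, equals 1 for identical partitions, and is close to 0 (possibly negative) for partitions that agree no better than chance.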
In what follows, we present both the Multi-Comparison Matrix (MCM) and the Critical Difference Diagram (CDD) of both studies.
numpy==1.24.3
tslearn==0.5.3.2
matplotlib==3.7.1
cython==0.29.34
pandas==2.0.1
scikit-learn==1.2.2
scipy==1.10.1
If you use this work, please cite the following corresponding paper:
@inproceedings{ismail-fawaz2023shapedba,
author = {Ismail-Fawaz, Ali and Ismail Fawaz, Hassan and Petitjean, François and Devanne, Maxime and Weber, Jonathan and Berretti, Stefano and Webb, Geoffrey I and Forestier, Germain},
title = {ShapeDBA: Generating Effective Time Series Prototypes using ShapeDTW Barycenter Averaging},
booktitle = {ECML/PKDD Workshop on Advanced Analytics and Learning on Temporal Data},
city = {Turin},
country = {Italy},
year = {2023},
}
This work was supported by the ANR DELEGATION project (grant ANR-21-CE23-0014) of the French Agence Nationale de la Recherche. The authors would like to acknowledge the High Performance Computing Center of the University of Strasbourg for supporting this work by providing scientific support and access to computing resources. Part of the computing resources were funded by the Equipex Equip@Meso project (Programme Investissements d’Avenir) and the CPER Alsacalcul/Big Data. The authors would also like to thank the creators and providers of the UCR Archive.