# Dictionary based time series classification in sktime

Dictionary based approaches adapt the bag of words model commonly used in signal processing, computer vision and audio processing for time series classification. Like shapelet based algorithms, dictionary approaches use phase-independent subsequences, obtained by sliding a window over each time series. However, rather than measuring the distance to a subsequence, as shapelet methods do, they transform each window into a word and record how often each word occurs. Algorithms following the dictionary model build a classifier by:

1. Extracting subsequences, also called windows, from a time series;
2. Transforming each window of real values into a discrete-valued *word* (a sequence of symbols over a fixed alphabet);
3. Building a sparse feature vector of histograms of word counts; and
4. Finally, using a classification method from the machine learning repertoire on these feature vectors.

The figure illustrates these steps from a raw time series to a dictionary model using overlapping windows.
<img src="./img/dictionary.png" width="600" alt="Dictionary based time series classification">
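
The first three steps can be sketched in a few lines of plain Python. This is only an illustration: `toy_word` is a hypothetical stand-in that averages segments of the window and thresholds them with fixed breakpoints, which is not the SFA transform used by the sktime classifiers, and the final classification step is omitted.

```python
from collections import Counter
from statistics import mean


def sliding_windows(series, window_size):
    """Step 1: extract every overlapping window of length window_size."""
    return [series[i:i + window_size] for i in range(len(series) - window_size + 1)]


def toy_word(window, word_length=2, breakpoints=(-0.5, 0.5)):
    """Step 2 (toy stand-in, NOT SFA): average equal-length segments of the
    window and map each average to a letter using fixed breakpoints."""
    letters = "abc"  # alphabet size is len(breakpoints) + 1
    segment = len(window) // word_length
    word = ""
    for k in range(word_length):
        avg = mean(window[k * segment:(k + 1) * segment])
        word += letters[sum(avg > b for b in breakpoints)]
    return word


def word_histogram(series, window_size=4):
    """Step 3: count how often each word occurs in the series."""
    return Counter(toy_word(w) for w in sliding_windows(series, window_size))


series = [0.0, 0.1, 1.0, 1.2, 0.9, -1.0, -1.1, -0.9, 0.0, 0.2]
hist = word_histogram(series)
print(hist)
```

Each series becomes one sparse histogram of this kind, and the histograms for all series form the feature vectors passed to the classifier in step 4.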

Dictionary-based methods differ in the way they transform a window of real-valued measurements into discrete words (discretization). Many methods are based on a symbolic representation called Symbolic Fourier Approximation (SFA). To create a discrete word from a window of continuous values in a series, SFA proceeds as follows:
1. The values in each window are normalized to have a standard deviation of 1.
2. The dimensionality of each normalized window is reduced using a truncated Fourier transform: the window is transformed with a fast Fourier transform, and only the first few coefficients are retained.
3. Each coefficient is discretized into a symbol from an alphabet of fixed size to form a word.

Creating words from windows requires three parameters:
1. `window_size` specifies how long each window is.
2. `word_length` specifies the reduced series length retained in step 2.
3. `alphabet_size` is the number of letters in the alphabet used in step 3.
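
To make the three steps and three parameters concrete, here is a minimal standard-library sketch of an SFA-style word for a single window. It rests on several assumptions that are not taken from sktime: the function name `sfa_style_word` is hypothetical, the window is mean-centred as well as scaled, a direct DFT stands in for the fast Fourier transform, and the `breakpoints` are fixed illustrative values rather than bins learned from training data.

```python
import cmath
from statistics import mean, pstdev


def sfa_style_word(window, word_length=4, breakpoints=(-1.0, 0.0, 1.0)):
    """Turn one window into a word of word_length letters.

    Assumption: the same illustrative breakpoints are used for every
    coefficient; the alphabet size is len(breakpoints) + 1.
    """
    n = len(window)
    # Step 1: normalise to unit standard deviation (and zero mean, by assumption).
    mu, sigma = mean(window), pstdev(window)
    z = [(v - mu) / sigma if sigma > 0 else 0.0 for v in window]
    # Step 2: keep only the first few Fourier coefficients. Coefficient 0 is
    # skipped because it is zero after mean-centring; real and imaginary
    # parts are flattened into one list of word_length values.
    values = []
    k = 1
    while len(values) < word_length:
        coef = sum(z[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        values.extend([coef.real, coef.imag])
        k += 1
    values = values[:word_length]
    # Step 3: map each retained value to a letter via the breakpoints.
    letters = "abcdefghijklmnopqrstuvwxyz"[:len(breakpoints) + 1]
    return "".join(letters[sum(v > b for b in breakpoints)] for v in values)


series = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5, 0.0, 0.5, 1.0, 0.5]
window_size = 8
words = [
    sfa_style_word(series[i:i + window_size])
    for i in range(len(series) - window_size + 1)
]
print(words)
```

Here `window_size` is the length of each slice, `word_length` is the number of retained values, and the alphabet size follows from the number of breakpoints.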

These core parameters are often fixed internally. There are currently four dictionary based classifiers implemented in sktime, all of which use the Symbolic Fourier Approximation (SFA) \[1\] transform to discretise windows into words. These are the Bag of SFA Symbols (BOSS) \[2\], the Contractable Bag of SFA Symbols (cBOSS) \[3\], Word Extraction for Time Series Classification (WEASEL) \[4\] and the Temporal Dictionary Ensemble (TDE) \[5\]. WEASEL has a multivariate extension called MUSE \[7\], and TDE also has multivariate capability. We summarise their characteristics and give example usage in this notebook. More technical details are available in \[8\].


## 1. Imports and Load Data

In [5]:
from sklearn import metrics

from sktime.classification.dictionary_based import (
    IndividualBOSS,
    BOSSEnsemble,
    ContractableBOSS,
    MUSE,
    WEASEL,
    TemporalDictionaryEnsemble,
)
from sktime.datasets import load_basic_motions, load_italy_power_demand

X_train, y_train = load_italy_power_demand(split="train")
X_test, y_test = load_italy_power_demand(split="test")
X_test = X_test[:50]
y_test = y_test[:50]

print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)

X_train_mv, y_train_mv = load_basic_motions(split="train")
X_test_mv, y_test_mv = load_basic_motions(split="test")

X_train_mv = X_train_mv[:20]
y_train_mv = y_train_mv[:20]
X_test_mv = X_test_mv[:20]
y_test_mv = y_test_mv[:20]

print(X_train_mv.shape, y_train_mv.shape, X_test_mv.shape, y_test_mv.shape)

(67, 1, 24) (67,) (50, 1, 24) (50,)
(20, 6, 100) (20,) (20, 6, 100) (20,)


## 3. Bag of SFA Symbols (BOSS): `IndividualBOSS`, `BOSSEnsemble` and `ContractableBOSS`

BOSS is an ensemble of individual BOSS classifiers, each making use of the SFA transform. `IndividualBOSS` has arguments for `window_size` ($w$, default 10), `word_length` ($l$, default 8) and `alphabet_size` ($\alpha$, default 4). Algorithms built on the `IndividualBOSS` classifier form ensembles and diversify their members by varying these parameters.

The `BOSSEnsemble` classifier is an ensemble of `IndividualBOSS` classifiers. It performs a grid search over a large number of combinations of window size, word length and a boolean flag that controls whether each window is normalised. Of the classifiers searched, only those whose accuracy is within 92\% of the best classifier's accuracy are retained. Individual BOSS classifiers use a non-symmetric distance function, the BOSS distance, in conjunction with a nearest neighbour classifier. BOSS tunes itself internally, so there are few parameters to alter, and it should generally be run with its default settings.
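
The two ideas in this paragraph, a non-symmetric histogram distance and the 92\% retention rule, can be sketched as follows. This is a simplified reading rather than sktime's implementation: the function names are hypothetical, the distance sums squared count differences only over words present in the first histogram, and "within 92\%" is interpreted as accuracy of at least 0.92 times the best accuracy.

```python
def boss_distance(hist_a, hist_b):
    """Sum of squared count differences over the words that occur in hist_a.

    Words that occur only in hist_b are ignored, so in general
    boss_distance(a, b) != boss_distance(b, a).
    """
    return sum((count - hist_b.get(word, 0)) ** 2 for word, count in hist_a.items())


def retain_members(accuracies, threshold=0.92):
    """Return the indices of members whose accuracy is >= threshold * best."""
    best = max(accuracies)
    return [i for i, acc in enumerate(accuracies) if acc >= threshold * best]


a = {"ab": 2, "ba": 1}
b = {"ab": 1, "cc": 3}
print(boss_distance(a, b), boss_distance(b, a))
print(retain_members([0.95, 0.90, 0.80, 0.88]))
```

Ignoring words that are absent from the first histogram means a query series is not penalised for patterns that appear only in the series it is compared against.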

cBOSS significantly speeds up BOSS with no significant difference in accuracy by improving how the ensemble is formed. cBOSS randomly samples sets of parameters for $w$, $l$ and $\alpha$, and keeps the best `max_ensemble_size` `IndividualBOSS` classifiers in the ensemble, where best means highest estimated accuracy on the train data. The number of `IndividualBOSS` classifiers to keep and the number of randomly generated parameter sets to test are controlled by `max_ensemble_size` (default 50) and `n_parameter_samples` (default 250). The `n_parameter_samples` parameter can be replaced with a maximum run time via `time_limit_in_minutes`; when it is set, the classifier keeps sampling parameter sets until the time limit is reached. We call this capability contracting.
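
The sampling-and-retention loop described above might look roughly like the sketch below. Everything except the three parameter names is an assumption made for illustration: `sample_ensemble` and `toy_accuracy` are hypothetical, the parameter ranges are invented, and `evaluate` stands in for fitting an `IndividualBOSS` and estimating its train accuracy.

```python
import random
import time


def sample_ensemble(evaluate, n_parameter_samples=250, max_ensemble_size=50,
                    time_limit_in_minutes=0.0, seed=47):
    """Randomly sample parameter sets and keep the best-scoring ones.

    If time_limit_in_minutes > 0, sampling stops when the time contract
    expires; otherwise it stops after n_parameter_samples draws.
    """
    rng = random.Random(seed)
    contracted = time_limit_in_minutes > 0
    deadline = time.monotonic() + 60.0 * time_limit_in_minutes
    members = []  # list of (estimated accuracy, parameter dict)
    n_sampled = 0
    while (time.monotonic() < deadline) if contracted else (n_sampled < n_parameter_samples):
        params = {  # hypothetical ranges, chosen only for this sketch
            "window_size": rng.randint(6, 20),
            "word_length": rng.choice([4, 6, 8]),
            "alphabet_size": rng.choice([2, 4, 6]),
        }
        members.append((evaluate(params), params))
        n_sampled += 1
        # Keep only the max_ensemble_size highest-scoring members.
        members.sort(key=lambda member: member[0], reverse=True)
        del members[max_ensemble_size:]
    return members


def toy_accuracy(params):
    """Stand-in scorer that simply prefers window sizes near 12."""
    return 1.0 / (1.0 + abs(params["window_size"] - 12))


ensemble = sample_ensemble(toy_accuracy, n_parameter_samples=30, max_ensemble_size=5)
print([round(acc, 2) for acc, _ in ensemble])
```

With a time contract the number of parameter sets evaluated depends on how fast each evaluation runs, which is why the contract replaces `n_parameter_samples` rather than combining with it.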

In [6]:
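# A single BOSS classifier with explicit parameters (illustration only; not fitted below)
one_boss = IndividualBOSS(window_size=8, word_length=4, alphabet_size=6)

# The full BOSS ensemble, run with its default settings
boss = BOSSEnsemble(random_state=47)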
boss.fit(X_train, y_train)

boss_preds = boss.predict(X_test)
print("BOSS Accuracy: " + str(metrics.accuracy_score(y_test, boss_preds)))

# cBOSS with the default sampling budget and ensemble size written out explicitly
cboss = ContractableBOSS(n_parameter_samples=250, max_ensemble_size=50, random_state=47)
cboss.fit(X_train, y_train)

cboss_preds = cboss.predict(X_test)
print("cBOSS Accuracy: " + str(metrics.accuracy_score(y_test, cboss_preds)))

BOSS Accuracy: 0.94
cBOSS Accuracy: 0.9


## 5. Word Extraction for Time Series Classification (WEASEL)

WEASEL transforms time series into feature vectors using a sliding-window approach; these vectors are then passed to a machine learning classifier. The novelty of WEASEL lies in its method for deriving features, which results in a much smaller yet much more discriminative feature set than BOSS. It extends SFA with bigrams, feature selection using an ANOVA F-test, and Information Gain Binning (IGB).
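
As a small illustration of the bigram idea only, the sketch below adds counts of consecutive word pairs to the plain word counts. The function name is hypothetical, and pairing each word with its immediate predecessor in the list is a simplification; feature selection and IGB are not shown.

```python
from collections import Counter


def unigram_bigram_counts(words):
    """Count single words plus pairs of consecutive words in the sequence."""
    counts = Counter(words)
    counts.update(f"{prev}_{cur}" for prev, cur in zip(words, words[1:]))
    return counts


words = ["ab", "ba", "ab", "ba"]
features = unigram_bigram_counts(words)
print(features)
```

Bigrams retain some ordering information that a plain bag of words discards, at the cost of a larger feature space, which is one reason feature selection matters for this approach.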

### Univariate

In [13]:
weasel = WEASEL(binning_strategy="equi-depth", anova=False, random_state=47)
weasel.fit(X_train, y_train)

weasel_preds = weasel.predict(X_test)
print("WEASEL Accuracy: " + str(metrics.accuracy_score(y_test, weasel_preds)))

WEASEL Accuracy: 0.98


### Multivariate

WEASEL+MUSE (Multivariate Symbolic Extension) is the multivariate extension of WEASEL.

In [14]:
muse = MUSE()
muse.fit(X_train_mv, y_train_mv)

muse_preds = muse.predict(X_test_mv)
print("MUSE Accuracy: " + str(metrics.accuracy_score(y_test_mv, muse_preds)))


## 6. Temporal Dictionary Ensemble (TDE)

TDE aggregates the best components of three classifiers that extend the original BOSS algorithm. It uses the ensemble structure and improvements of cBOSS \[3\]; introduces spatial pyramids from Spatial BOSS (S-BOSS) \[6\]; and takes from Word Extraction for Time Series Classification (WEASEL) \[4\] both bigrams and Information Gain Binning (IGB), a replacement for the multiple coefficient binning (MCB) used by SFA.
Two new parameters are included in the ensemble parameter search: the number of spatial pyramid levels $h$, and $b$, which selects whether to use IGB or MCB.
A Gaussian process regressor is used to select new parameter sets to evaluate for the ensemble, predicting the accuracy of a set of parameter values from past classifier performances.

Because TDE inherits the cBOSS ensemble structure, the number of parameter samples $k$, the time limit $t$ and the maximum ensemble size $s$ remain as parameters, to be set according to memory and time requirements.
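
The spatial pyramid idea can be illustrated with a short sketch: word counts are collected for the whole series and again for progressively finer segments, so that the features record roughly where in the series a word occurs. The function name and the (level, segment, word) feature keys are assumptions of this sketch, and any per-level weighting is omitted.

```python
from collections import Counter


def pyramid_histogram(words, levels=2):
    """Count words per segment at each pyramid level.

    `words` is the list of words in window order. Level 0 covers the whole
    series, level 1 splits it into halves, level 2 into quarters, and so on.
    """
    features = Counter()
    n = len(words)
    for level in range(levels):
        n_segments = 2 ** level
        for pos, word in enumerate(words):
            segment = pos * n_segments // n  # which segment this window falls in
            features[(level, segment, word)] += 1
    return features


pyramid = pyramid_histogram(["ab", "ab", "ba", "ab"], levels=2)
print(pyramid)
```

Here `levels` plays the role of $h$: with a single level the result reduces to an ordinary bag of words.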

### Univariate

In [None]:
# Recommended non-contract TDE parameters
tde_u = TemporalDictionaryEnsemble(
    n_parameter_samples=250,
    max_ensemble_size=50,
    randomly_selected_params=50,
    random_state=47,
)

# TDE with a 1 minute build time contract
# tde_u = TemporalDictionaryEnsemble(time_limit_in_minutes=1,
#                                    max_ensemble_size=50,
#                                    randomly_selected_params=50,
#                                    random_state=47)

tde_u.fit(X_train, y_train)

tde_u_preds = tde_u.predict(X_test)
print("TDE Accuracy: " + str(metrics.accuracy_score(y_test, tde_u_preds)))

### Multivariate

In [None]:
# Recommended non-contract TDE parameters
tde_mv = TemporalDictionaryEnsemble(
    n_parameter_samples=250,
    max_ensemble_size=50,
    randomly_selected_params=50,
    random_state=47,
)

# TDE with a 1 minute build time contract
# tde_mv = TemporalDictionaryEnsemble(time_limit_in_minutes=1,
#                                     max_ensemble_size=50,
#                                     randomly_selected_params=50,
#                                     random_state=47)

tde_mv.fit(X_train_mv, y_train_mv)

tde_mv_preds = tde_mv.predict(X_test_mv)
print("TDE Accuracy: " + str(metrics.accuracy_score(y_test_mv, tde_mv_preds)))

#### References:

\[1\] Schäfer, P., & Högqvist, M. (2012). SFA: a symbolic fourier approximation and index for similarity search in high dimensional datasets. In Proceedings of the 15th International Conference on Extending Database Technology (pp. 516-527).

\[2\] Schäfer, P. (2015). The BOSS is concerned with time series classification in the presence of noise. Data Mining and Knowledge Discovery, 29(6), 1505-1530.

\[3\] Middlehurst, M., Vickers, W., & Bagnall, A. (2019). Scalable dictionary classifiers for time series classification. In International Conference on Intelligent Data Engineering and Automated Learning (pp. 11-19). Springer, Cham.

\[4\] Schäfer, P., & Leser, U. (2017). Fast and accurate time series classification with WEASEL. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management (pp. 637-646).

\[5\] Middlehurst, M., Large, J., Cawley, G., & Bagnall, A. (2020). The Temporal Dictionary Ensemble (TDE) Classifier for Time Series Classification. In The European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases.

\[6\] Large, J., Bagnall, A., Malinowski, S., & Tavenard, R. (2019). On time series classification with dictionary-based classifiers. Intelligent Data Analysis, 23(5), 1073-1089.

\[7\] Schäfer, P., & Leser, U. (2018). Multivariate time series classification with WEASEL+MUSE. 3rd ECML/PKDD Workshop on AALTD.

\[8\] Middlehurst, M., Schäfer, P., & Bagnall, A. (2023). Bake off redux: a review and experimental evaluation of recent time series classification algorithms. arXiv preprint.