Codebase for our paper "Deep Semi-supervised Learning (SSL) for Time Series Classification (TSC)", to appear at ICMLA '21.
tl;dr: performance gains of semi-supervised models translate well from image classification to time series classification.
This framework allows you to evaluate SSL algorithms originally designed for image classification on time series classification problems, and to compare them with different baseline models.
This PyTorch-based codebase lets you run experiments reproducibly and track and visualize individual experiments via MLflow.
The core of this framework consists of two sub-packages: dl4d for data loading and sampling in a semi-supervised manner, and ssltsc, which contains the different backbone architectures, baseline models and semi-supervised learning strategies.
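Conceptually, semi-supervised sampling boils down to splitting the dataset indices into a small labelled pool and a large unlabelled pool. A minimal sketch of that idea (the function name is illustrative, not the actual dl4d API):

```python
import random

def semi_supervised_split(n_samples, num_labels, seed=0):
    """Split dataset indices into a small labelled pool and a large
    unlabelled pool; a conceptual stand-in for what dl4d samplers do."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    labelled = sorted(indices[:num_labels])
    unlabelled = sorted(indices[num_labels:])
    return labelled, unlabelled

# e.g. 250 labelled samples out of 1000, as in the SITS example below
labelled, unlabelled = semi_supervised_split(n_samples=1000, num_labels=250)
```

During training, each batch then mixes samples drawn from both pools.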
Hyperparameters and general arguments for the model runs are controlled via the config files specifying single experiments in ssltsc/experiments/config_files.
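Such a config file might look roughly like this (a hypothetical sketch; the actual key names in ssltsc/experiments/config_files may differ):

```yaml
# hypothetical experiment config; real key names may differ
model: mixmatch
backbone: FCN
dataset: pamap2
n_steps: 1000
num_labels: 250
optimizer:
  lr: 0.001
  weight_decay: 0.0001
```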
Hyperparameter tuning builds on the same config file syntax and uses Hyperband as implemented in optuna.
All models in this repository were validated on image classification datasets (Cifar10, SVHN) to check the correctness of the implementation. This means you can use the framework not only for semi-supervised time series classification but also as a starting point for semi-supervised image classification.
The core functionalities of this framework are covered by a series of unit tests.
Run python -m unittest discover -s tests from the parent level of this repository to test the crucial parts of the framework via the unittest framework. CI will be integrated on top of these tests soon.
The following UML diagram gives a detailed overview of the different components of this framework:
Install the requirements in a clean Python environment via pip install -r requirements.txt. Then install the ssltsc module by running pip install -e . from the parent level of this repository.
The following examples show how to train or tune different algorithms on different datasets using this framework. Datasets are downloaded to the data folder on the fly when they are first used. These code snippets should be run from ssltsc/experiments.
To train a mixmatch model with an FCN backbone on the pamap2 dataset for 1000 update steps, storing the results in the mlflow experiment hello_mixmatch_fcn, run:
python run.py --config config_files/mixmatch.yaml --n_steps 1000 --dataset pamap2 --backbone FCN --mlflow_name hello_mixmatch_fcn
To verify the correct implementation of the virtual adversarial training (VAT) model on cifar10 with a wideresnet28 backbone run:
python run.py --config config_files/vat.yaml --dataset cifar10 --backbone wideresnet28
To run a Random Forest baseline based on features extracted via tsfresh from the SITS dataset on 250 labelled samples only, run:
python run_baseline.py --config config_files/randomforest.yaml --dataset sits --num_labels 250
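The Random Forest baseline follows the classic pattern of extracting tabular features from each series and fitting a standard classifier on them. A minimal sketch of that pipeline (with hand-rolled summary statistics as a stand-in for the much richer tsfresh feature set used by run_baseline.py):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def extract_basic_features(series_batch):
    """Per-series summary statistics; a simplified stand-in for tsfresh."""
    return np.column_stack([
        series_batch.mean(axis=1),
        series_batch.std(axis=1),
        series_batch.min(axis=1),
        series_batch.max(axis=1),
    ])

rng = np.random.default_rng(0)
X_series = rng.normal(size=(250, 100))   # 250 labelled series of length 100
y = rng.integers(0, 3, size=250)         # 3 hypothetical classes

X_feat = extract_basic_features(X_series)
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_feat, y)
```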
And finally to tune the hyperparameters of the meanteacher model on the crop dataset for 10 hours on 1000 labelled samples, run:
python tune.py --config config_files/meanteacher.yaml --num_labels 1000 --time_budget 36000
All algorithms are stored in ssltsc.models. Currently, the following semi-supervised algorithms are implemented within this framework:
- Mixmatch by Berthelot et al. (2019)
- Virtual Adversarial Training by Miyato et al. (2017)
- Mean Teacher by Tarvainen & Valpola (2017)
- Ladder Net by Rasmus et al. (2015)
- Self-supervised Learning for TSC by Jawed et al. (2020)
and the following baseline models:
- Supervised baseline model
- Random Forest (based on features extracted via tsfresh)
- Logistic Regression (based on features extracted via tsfresh)
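Of these strategies, Mean Teacher is perhaps the simplest to sketch: the teacher's weights are an exponential moving average (EMA) of the student's weights, and the student is trained to be consistent with the teacher's predictions. The core update, illustrated on plain numpy arrays rather than the actual model classes in ssltsc.models:

```python
import numpy as np

def ema_update(teacher_params, student_params, alpha=0.99):
    """One Mean Teacher step (Tarvainen & Valpola, 2017): teacher weights
    become an exponential moving average of the student weights."""
    return [alpha * t + (1.0 - alpha) * s
            for t, s in zip(teacher_params, student_params)]

student = [np.ones((2, 2))]
teacher = [np.zeros((2, 2))]
teacher = ema_update(teacher, student, alpha=0.9)
# teacher moves 10% of the way toward the student: all entries become 0.1
```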
All integrated datasets can be found in dl4d.datasets. The framework currently contains several TSC datasets (the examples above use Pamap2, SITS and Crop), as well as these standard image classification datasets to validate the implementation:
- Cifar10
- SVHN
