Self-Supervised Contrastive Learning for Medical Time Series: A Systematic Review


This repository accompanies the survey paper Self-Supervised Contrastive Learning for Medical Time Series: A Systematic Review, published in Sensors in 2023.

Authors: Ziyu Liu (ziyu.liu2@student.rmit.edu.au), Azadeh Alavi (azadeh.alavi@rmit.edu.au), Minyi Li (liminyi0709@gmail.com) and Xiang Zhang (xiang.zhang@uncc.edu)


Summary:

We carefully reviewed 43 papers in the field of self-supervised contrastive learning for medical time series. Specifically, the paper outlines the contrastive learning pipeline, including pre-training, fine-tuning, and testing. We provide a comprehensive summary of the augmentations applied to medical time series data, the architectures of pre-training encoders, the types of fine-tuning classifiers and clustering methods, and the popular contrastive loss functions. Moreover, we present an overview of the different data types used in medical time series, highlight the medical applications of interest, and provide a comprehensive table of 51 public datasets that have been used in this field. In addition, the paper discusses promising future directions, such as guidance for effective augmentation design, a unified framework for analyzing hierarchical time series, and methods for processing multimodal data. Despite being in its early stages, self-supervised contrastive learning has shown great potential in removing the need for expert-created annotations in medical time series research.
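
As a concrete illustration of the pipeline summarized above (augment two views, encode, project, contrast, then fine-tune the encoder), here is a minimal, hedged sketch in PyTorch. It is not taken from any reviewed paper: the toy 1D-CNN encoder, the NT-Xent (SimCLR-style) loss, and every hyperparameter are placeholder choices for illustration only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Encoder(nn.Module):
    """Toy 1D-CNN encoder for a (batch, channels, time) physiological signal."""

    def __init__(self, in_channels=1, feat_dim=128, proj_dim=64):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv1d(in_channels, 32, kernel_size=7, stride=2, padding=3),
            nn.ReLU(),
            nn.Conv1d(32, feat_dim, kernel_size=7, stride=2, padding=3),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        # Projection head used only during pre-training; a downstream
        # classifier is usually attached to the backbone features instead.
        self.projector = nn.Sequential(
            nn.Linear(feat_dim, feat_dim), nn.ReLU(), nn.Linear(feat_dim, proj_dim)
        )

    def forward(self, x):
        h = self.backbone(x).squeeze(-1)  # representation kept for fine-tuning
        z = self.projector(h)             # projection fed to the contrastive loss
        return h, z


def nt_xent(z1, z2, temperature=0.5):
    """NT-Xent (SimCLR-style) loss over a batch of paired projections."""
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # (2N, d)
    sim = (z @ z.t()) / temperature                      # cosine similarities
    n = z1.size(0)
    mask = torch.eye(2 * n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(mask, float("-inf"))           # drop self-similarities
    # The positive for view i is the other augmented view of the same sample.
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)]).to(z.device)
    return F.cross_entropy(sim, targets)


# One pre-training step: x1 and x2 stand in for two augmented views of a batch
# of 512-point single-channel windows (e.g., ECG/EEG segments).
encoder = Encoder()
x1, x2 = torch.randn(8, 1, 512), torch.randn(8, 1, 512)
_, z1 = encoder(x1)
_, z2 = encoder(x2)
loss = nt_xent(z1, z2)
loss.backward()
```

In practice, the reviewed papers swap in different encoders (ResNets, CPC-style recurrent models, GNNs), contrastive losses (NT-Xent, triplet, multi-similarity), and augmentations, but the overall structure of contrastive pre-training followed by a lightweight fine-tuned classifier stays the same.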

This repo includes:

  • Timeseries_augmentations.ipynb: The implementation of the time series augmentations; this notebook augments time series data at the sample level (a minimal sketch of the idea appears after this list). Code for batch-level and dataset-level augmentation will be released later.
  • Summarized Datasets Table: A summary of the public medical time series (e.g., physiological signal) datasets used in the reviewed papers, ordered by data type. (Identical to the version in the paper except for the dataset citations.)
  • Summarized Studies Table: An extended summary table of the 43 reviewed papers, including title, author/year, challenges, contributions, scenario/task/findings, datasets, preprocessing/perturbation, model, performance, and links to their implementation code (if publicly released).
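
As a pointer to what sample-level augmentation means in practice (see the first bullet above), below is a minimal NumPy sketch of three augmentations that recur across the reviewed papers: jittering (noise addition), scaling, and segment permutation. This is an illustration only and may differ from the code in Timeseries_augmentations.ipynb.

```python
import numpy as np


def jitter(x, sigma=0.05):
    """Add Gaussian noise to a (time, channels) sample."""
    return x + np.random.normal(0.0, sigma, size=x.shape)


def scale(x, sigma=0.1):
    """Multiply each channel by a random factor drawn around 1."""
    factors = np.random.normal(1.0, sigma, size=(1, x.shape[1]))
    return x * factors


def permute(x, n_segments=4):
    """Split the time axis into segments and shuffle their order."""
    segments = np.array_split(x, n_segments, axis=0)
    order = np.random.permutation(len(segments))
    return np.concatenate([segments[i] for i in order], axis=0)


# Example: two augmented views of one 10-second, single-channel window at 100 Hz.
sample = np.random.randn(1000, 1)
view1, view2 = jitter(scale(sample)), permute(sample)
```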

Citation

If you find this paper useful for your research, please consider citing it:

  @article{liu2023self,
   title={Self-Supervised Contrastive Learning for Medical Time Series: A Systematic Review},
   author={Liu, Ziyu and Alavi, Azadeh and Li, Minyi and Zhang, Xiang},
   journal={Sensors},
   volume={23},
   number={9},
   pages={4221},
   year={2023},
   publisher={MDPI}
 }

Summarized Datasets Table

| Dataset | Data type | # of Subjects | Frequency (Hz) | # of Channels | Task |
|---|---|---|---|---|---|
| MotionSense | Acceleration, Angular velocity | 24 | 50 | 12 | Activity Recognition |
| HHAR | Acceleration, Angular velocity | 9 | 50-200 | 16 | Activity Recognition |
| MobiAct | Acceleration, Angular velocity | 57 | 20 | 6 | Activity Recognition |
| UCI HAR | Acceleration, Angular velocity | 30 | 50 | 6 | Activity Recognition |
| PSG dataset | Acceleration, HR, Steps | 31 | - | Acceleration: 3; HR: 1; Steps: 1 | Sleep study |
| Dutch STAN trial | CTG | 5,681 | - | - | Fetal monitoring |
| DiCOVA-ICASSP 2022 challenge | Cough, Speech, Breath | - | 44.1k | - | COVID detection |
| CODE | ECG | 1,558,415 | 300-1000 | 12 | ECG abnormalities detection |
| Ribeiro | ECG | 827 | | 12 | Automatic diagnosis of ECG |
| PhysioNet 2020 | ECG | 6877 | | 12 | ECG classification |
| MIT-BIH Arrhythmia Database | ECG | 47 | 125 | 2 | Cardiac arrhythmia study |
| PhysioNet 2017 | ECG | 8528 (recordings) | 300 | 1 | AF (ECG) Classification |
| CPSC 2018 (ICBEB2018) | ECG | 6,877 | 500 | 12 | Heart diseases study |
| PTB diagnostic ECG | ECG | 290 | 125 | 14 | Heart diseases study |
| Chapman-Shaoxing | ECG | 10,646 | 500 | 12 | Cardiac arrhythmia study |
| Cardiology | ECG | 328 | | 1 | Cardiologist-level arrhythmia detection and classification |
| PTB-XL | ECG | 18,869 | 100 | 12 | Heart diseases study |
| MIT-BIH-SUP | ECG | 78 | 128 | - | Supplement of supraventricular arrhythmias in the MIT-BIH Arrhythmia Database |
| Physiological Synchrony Selective Attention | ECG, EEG, EDA | 26 | 1024 | EEG: 32; EDA: 2; ECG: 2 | Attention focus study |
| Sleep Heart Health Study (SHHS) | ECG, EEG, EOG, EMG, SpO2, RR | 9,736 | EEG: 125; EOG: 50; EMG: 125; ECG: 125/250; SpO2: 1; RR: 10 | EEG: 2; EOG: 2; EMG: 1; ECG: 1; SpO2: 1; RR: 1 | Sleep-disordered breathing study |
| AMIGOS | ECG, EEG, GSR | 40 | ECG: 256; EEG: 128; GSR: 128 | ECG: 4; EEG: 14; GSR: 2 | Emotional states recognition |
| MPI LEMON | EEG, fMRI | 227 | EEG: 2500 | EEG: 62 | Mind-body-emotion interactions study |
| PhysioNet 2016 | ECG, PCG | 3,126 | 2000 | 2 | Heart Sound Recordings classification |
| WESAD | ECG, Temperature, Blood volume pulse, Acceleration, etc. | 15 | ECG: 700 | ECG: 1 | Wearable Stress and Affect Detection |
| SWELL | ECG, Facial expressions, etc. | 25 | ECG: 2048 | - | Work psychology study |
| The Fenland study | ECG, HR, Acceleration, etc. | 2100 | - | - | Obesity, type 2 diabetes, and related metabolic disorders study |
| SEED | EEG | 15 | 200 | 62 | EEG-based Emotion Recognition |
| TUSZ (The TUH EEG Seizure Corpus) | EEG | 315 | 250 | 21 | Seizure study |
| TUAB (TUH EEG Abnormal Corpus) | EEG | 564 | 250 | 21 | EEG abnormalities study |
| EEG Motor Movement/Imagery Dataset | EEG | 109 | 160 | 64 | General-Purpose |
| BCI Competition IV-2A | EEG | 9 | 250 | 22 | Motor-Imagery classification |
| Sleep-EDF Database Expanded (Sleep-EDFx) | EEG | 197 | 100 | 2 | Sleep study |
| MGH Sleep | EEG | 2621 | 200 | 6 | Sleep study |
| MI-2 Dataset | EEG | 25 | 200 | 62 | Motor-Imagery classification |
| EPILEPSIAE | EEG | 275 | 250 | - | Seizure study |
| UPenn & Mayo Clinic's Seizure Detection Challenge | EEG (Intracranial) | 4 dogs, 8 human patients | 400 | 16 | Seizure study |
| Dreem Open Dataset - Healthy (DOD-H) | EEG (PSG data) | 25 | 250 | 12 | Sleep study |
| Dreem Open Dataset - Obstructive (DOD-O) | EEG (PSG data) | 55 | 250 | 8 | Sleep study |
| DREAMER | EEG, ECG | 23 | ECG: 256 | ECG: 4 | Affect recognition |
| Montreal Archive of Sleep Studies (MASS) | EEG, EMG, EOG | 200 | 256 | 16-21* | Sleep study |
| PhysioNet 2018 | EEG, EOG, EMG, EKG, SaO2 | 1,985 | 200 | 5 | Diagnosis of sleep disorders |
| ISRUC-SLEEP | EEG, EOG, EMG, EKG, SaO2 | 100/8/10* | EEG: 200; EOG: 200; EMG: 200; EKG: 200; SaO2: 12.5 | EEG: 6; EOG: 2; EMG: 2; EKG: 1; SaO2: 1 | Sleep study |
| Sleep-EDF | EEG, EOG, chin EMG | 20 | 100 | 2 | |
| MIT DriverDb | EKG, EMG, EDA, PR | 17 | EKG: 496; EMG: 15.5; EDA: 31; PR: 31 | EKG: 1; EMG: 1; EDA: 1; PR: 1 | Stress detection |
| High time Resolution ICU Dataset (HiRID) | ICU | 33,000+ | - | | |
| eICU | ICU | - | - | ~160 variables | - |
| PhysioNet 2012 | ICU | 12,000 | - | 37 | Mortality prediction |
| MIMIC-III | ICU | 4000+ | - | - | - |
| PhysioNet 2022 | PCG | 1,568 | 4000 | 5 | Heart Murmur Detection |
| ICBHI 2017 | Respiratory sound | 126 | 4,000 | 1 | Computational lung auscultation |
| LibriSpeech dataset | Voice | 251 | 16k | - | Speech Recognition |
| mPower data | Voice, Walking kinematics | 3,101 (walking kinematics) | - | - | Parkinson disease study through mobile data |

Summarized Studies Table

Columns: Title | Author (Year) | Challenge | Contribution | Scenario/task/findings | Datasets | Preprocessing/perturbation | Model | Performance | Code
First Steps
Towards
Self-Supervised
Pretraining
of the
12-Lead ECG
Gedon et al.
(2022)
Discover a supervision signal from the data itself for self-supervised representation learning 1) Define a self-supervised learning task and pretraining procedure which
can learn generalizable features of ECG data,
2) Develop and show that a ResNet based architecture can successfully be
used in combination with our learning task.
ECG reconstruction and
(anomalies)classification;
Pretraining on the CODE
training dataset,
Use transfer learning
with the ECG benchmarks:
PTB-XL and CPSC dataset;
CODE,
CPSC 2018,
PTB-XL
U-ResNet:
ResNet + encoder-decoder
+ channel-wise dense layer
+ U-Net based skip-connections.
Downstream task(classification):
encoder (no bottleneck layer,
no U-Net skip connections) +
linear classifier
AUC:
CPSC:
+PT:
0.954;

PTB-XL:
+PT:
0.919
-
Self-supervised
representation
learning from
12-lead ECG data
Mehari et al. (2022) Label scarcity in ECG data 1. Comprehensive assessment of self-supervised representation learning for
12-lead ECG data to foster measurable progress.
2. Compare instance-based self-supervised methods and contrastive
forecasting methods.
3. Modify the CPC architecture and training procedure for
performance improvements.
4. Compare downstream classifiers fine-tuned from self-supervised models with
training from scratch.
Assessment of self-supervised
representation learning from
clinical 12-lead ECG data:
-data efficiency
(downstream performance vs.
number of folds used
in fine-tuning);
-quantitative performance
(macro AUC);
-robustness
(influence of physiological
noise on downstream performance)
Pretraining:
CinC2020,
Chapman,
Ribeiro

Evaluation:
PTB-XL
Modified CPC (4FC+2LSTM+2FC);

Compared with:
Supervised (4FC+2LSTM+2FC)
Supervised (xresnet1d50)
SimCLR(RRC, TO)(xresnet1d50)
SimCLR physio(xresnet1d50)
BYOL(RRC,TO)(xresnet1d50)
BYOL physio. (xresnet1d50)

*Physiological noise,
(RRC, TO) are transformations
Macro AUC
(on PTB-XL):

Modified
CPC:
Linear:
0.9272
fine-tuned:
0.9418

Link
Semi-Supervised
Contrastive Learning
for Generalizable
Motor Imagery
EEG Classification
Han et al.
(2021)
Label scarcity in EEG data 1. A semi-supervised framework with a combination of self-supervised
contrastive learning and adversarial training.
2. Semi-supervised learning structure with contrastive learning for
unlabelled data.
3. Adversarial training to disentangle the subject/session-specific
information from the desired MI information in the latent representation.
BCIC IV 2a
MI-EEG dataset
from the
MOABB library
Filtered between 4 Hz and 40 Hz,
converted to microvolts.
Used all 22 channels of the EEG
and the entire 4 seconds of
the trial windows.
The EEG windows were then
resampled from 250 Hz to
128 Hz, resulting in a length
of 512 sample points for
each window, and processed
through channel-wise z-score
normalisation.
Augmentation-based contrastive loss
+ task classification loss +
domain discriminator loss
EEGNet, DeepConvNet as the encoder
Semi-Deep
ConvNet:
10%: 67.6
20%: 74.3
50%: 77.4
100%: 79.4
-
Self-Supervised
Representation
Learning
from
Electroence-
phalography
Signals
Banville et al.
(2019)
Supervised models are limited by the cost - and
sometimes the impracticality - of data collection and labeling
1. Propose self-supervised strategies to learn end-to-end features from
unlabeled time series such as EEG.
2. Two temporal contrastive learning tasks, referred to as “relative positioning”
and “temporal shuffling”.
3. On a downstream sleep staging task, outperform traditional unsupervised
and purely supervised approaches, specifically in low-data regimes.
Demonstrate that contrastive
learning tasks based on
predicting whether time windows
are close in time can be used
to learn EEG features that
capture multiple components
of the structure underlying
the data (time windows close
in time should share the
same label)
Sleep EDF,
MASS session3
Both: raw EEG -> 4th-order FIR
lowpass filter (20-Hz cutoff
frequency and Hamming window)
MASS: downsampled to 128Hz,
extracted non-overlapping
30-s windows, windows were
normalized
(to focus on Fz,Cz and Oz
channels.)
Pre-train: sample pairs of
time windows (RP: x_t,x_t';
TS: triplets: x_t, x_t', x_t'')
+ feature extractor (CNN) +
contrastive module aggregate
the feature representation of
each window (element-wise
absolute difference)

Finetuning: feature extractor (CNN)
+ linear context discriminative model
Average
per-class
recall:
RP: 76.66
TS: 75.9

EEG
features:
79.43
Fully
supervised:
72.51
-
Anomaly Detection
on
Electroence-
phalography
with
Self-supervised
Learning
Xu et al.
(2020)
1. Hand-crafted features could omit potentially discriminative features;
2. Labeling EEG signals for the state of epileptic seizures has become a bottleneck in applying deep learning;
3. Individual differences among patients with epilepsy, and certain abnormal brain activities shared with other brain diseases, create a generalization issue
1. A new self-supervised learning method based on only normal EEG data
is proposed particularly for detection of any abnormal signal in EEG data.
2. A simple and effective method is proposed to generate the self-labeled data
for self-supervised learning, in which different labels correspond to different
scaling transformations on EEG data.
3. Performs significantly better than existing well-known anomaly
detection approaches, and is robust to varying model structures
and hyperparameter settings.
Higher-frequency signals
in an abnormal EEG data would
probably mislead
the classifier to predict an
incorrect scaling transformation.
UPenn and
Mayo Clinic's
seizure detection
challenge dataset
Generation of self-labeled
EEG data:
Each sequence of EEG data
matrix X_i -> K scaling
transformations -> a longer
sequence s_k *d (number of
values in the original sequence)
-> for each scale s_k, all
the formed new sequences
are collected to form a new
scaled EEG data T_k(X_i)
CNN classifier for prediction
of scaling transformations:
Input: self-labeled dataset;
Output: K values, each
representing the probability
of one scaling transformation.
Cross entropy -> classifier
output, ground truth scaling
transformation (one-hot vector)

Anomaly detection:
Difference between predicted
scaling and ground truth scaling
indicates the degree of abnormality
of new EEG
AUC:
ResNet34:
0.941

Ablation
study on
kernel shapes
(this paper
proposed 3×3
compared to
1×3):
Backbone:
ResNet34:
0.943
VGG19:
0.960
-
Contrastive Representation
Learning for Electroencephalogram
Classification
Mohsenvand et al.
(2020)
Hand-crafted features; deep learning in a supervised manner restricts the use of learned features to a specific task; labeling EEG is cumbersome and requires years of medical training and experimental design; labeled EEG data is limited and existing datasets are small; existing datasets use incompatible EEG setups (different numbers of channels, sampling rates, types of sensors, etc.), making them hard to fuse into a larger dataset for unsupervised learning. 1. Combine multiple EEG datasets,
2. Use the underlying physics of EEG signals to multiply
the number of samples (quadratic increase),
3. Learn representations in a self-supervised manner via
contrastive learning without requiring labels.
1. Emotion recognition.
2. Normal/abnormal classification.
3. Sleep-stage scoring.
1. SEED dataset
(ER)
2. TUH dataset
(NAC)
3. SleepEDF
(SSS)
Channel recombination:
By subtracting two channels,
one obtains a new channel
that represents the voltage
difference between the two
sensors, resulting in another
physiologically valid channel.
Preprocessing: resampled all
datasets to 200Hz and applied
a fifth-order band-pass
Butterworth filter (0.3-80 Hz).
removed the channels that
involved voltages higher
than 500 μVs as they normally
represent artifacts.
To train the encoder,
cut the channels into
chunks of 20 seconds
Channel augmenter: each channel,
randomly applies two of the
augmentations to form a positive pair.
Channel encoder: recurrent encoder,
convolutional encoder.
Projector: downsampling and bidirectional
LSTM units--each direction output
concatenated and fed into dense
layers with a ReLU activation in between.
Contrastive loss: NT-Xent.

Downstream tasks: Classifier: discard
the projector and use a classifier
almost identical to the projector:
fine-tuned
SeqCLR:
1. (C) 50%:
85.21

2. (C) 50%:
87.21

3. (R) 50%:
83.72
-
Forecasting adverse surgical
events using self-supervised
transfer learning for
physiological signals
Chen et al.
(2021)
1. Availability of training data, lack sufficient data or computational resources.
2. Patient privacy considerations mean that large public EHR datasets are unlikely, leaving many institutions with insufficient resources to train performant models on their own.
Improves predictive accuracy by leveraging deep learning to embed
physiological signals. using LSTMs, embeds physiological signals
prior to forecasting adverse events with a downstream model.
Shares models rather than data to address data
insufficiency and improves over alternative methods.
By transferring performant models as has been done in medical
images and clinical text, scientists can collaborate to improve
the accuracy of predictive model without exposing patient data.
Utilize fifteen physiological signal
variables and six static variable
inputs to forecast six possible
outcomes: hypoxemia, hypocapnia,
hypotension, hypertension,
phenylephrine administration,
and epinephrine administration.
Two OR datasets
(private)

ICU dataset
(MIMIC dataset)
LSTM for representation learning,
followed by fully connected layer as
downstream predictor. Use observations
in previous 1 hour to predict next 5 mins.
- Link
T-DPSOM: An Interpretable
Clustering Method for
Unsupervised Learning
of Patient Health States
Manduchi et al.
(2021)
Traditional clustering methods perform poorly on high-dimensional datasets -> dimensionality reduction and feature transformation are used to obtain a low-dimensional representation of the raw data (easier to cluster) -> the clustered features lie in a latent space and cannot be easily visualized, interpreted, or used to investigate the relationships between clusters. The Self-Organizing Map is a clustering method that provides such an interpretable representation. 1. A deep clustering architecture combines a VAE with a novel SOM-based
clustering objective.
2. An extension of this architecture to time series, improving clustering
performance, enabling temporal forecasting.
3. Showing superior performance on static image data and medical time series (ICU).
4. Cluster patients into different sub-phenotypes and gain better understanding
of disease patterns and individual patient health states.
A useful tool to understand and
track patient health states in the ICU.
MNIST,
Fashion-MNIST,
eICU dataset
For eICU: use vital sign(d=14)
and lab measurements(d=84)
resampled to a 1-hour based
grid using forward filling
with population statistics
from training set if no
measurements were available
prior to the time point.
From ICU stays:
3 days<include< 30days,
or has gap in continuous
vital sign monitoring.
Overall data dimension d=98.
The last 72 hours of
multivariate time series
were used for the experiments.
As labels, use a variant of
the current dynamic APACHE.
A data point 𝑥𝑖 is mapped to a continuous
embedding 𝑧𝑖 using a VAE. In T-DPSOM,
the embeddings 𝑧𝑖,𝑡 for 𝑡 = 1,...,𝑇 are
connected by an LSTM, which predicts
the embedding 𝑧𝑡 +1 of the next time step.
clustering
NMI:
0.1115
+-0.0006
Link
CLOCS: Contrastive Learning
of Cardiac Signals Across
Space, Time, and Patients
Kiyasseh et al.
(2021)
1. Propose a family of patient-specific contrastive learning methods, that exploit
both temporal and spatial information present in ECG signals.
2. Outperforms state-of-the-art methods, BYOL and SimCLR, when performing a linear
evaluation of, and fine-tuning on, downstream tasks involving cardiac arrhythmia
classification.
Downstream task: cardiac
arrhythmia classification

Human physiology where abrupt
changes in cardiac function
(on the order of seconds)
are unlikely to occur.
multiple leads
(collected at the same time)
will reflect the same
underlying cardiac function.
PhysioNet 2020,
Chapman,
PhysioNet 2017,
Cardiology
Gaussian, Flip, SpecAugment Pre-train: Contrastive Multi-segment
Coding; Contrastive Multi-lead Coding,
Contrastive Multi-segment Multi-lead Coding

Downstream task: 1)Linear Evaluation of
representation (pre-train, fine-tune same
dataset); 2)Transfer capabilities of
representations (pre-train, fine-tune
different dataset)
AUC:
1)CMSC
(Chapman):
0.896+-0.005
CMSC
(PhysioNet2020):
0.715+-0.033
2)CMSC(
Chapman+
PhysioNet2020):
0.83+-0.002
CMSC
(PhysioNet2020+
Chapman):
0.932+-0.008
CMSMLC
(PhysioNet2020+
PhysioNet2017):
0.774+-0.012
Link
Segment Origin Prediction:
A Self-supervised Learning
Method for Electrocardiogram
Arrhythmia Classification
Luo et al. (2021) 1. Lack of well-annotated labels,
2. Compared to random weight initialization, pre-trained model weights can help to alleviate overfitting
Develop a new augmentation: reorganization. Single-lead ECG classification:
heart arrhythmia detection
PhysioNet2017,
CPSC2018
Discrete wavelet
transform (DWT) for
denoising
One framework with 6 different methods

as encoder structure.
Innovation: a new
augmentation (reorganization).
Take two ECG segments/peaks
from a pool of segments:
if the two taken segments are from
the same recording, assign it pseudo-label 1;
otherwise, assign pseudo-label 0.

A classifier for pseudo-label prediction
serves as supervision signal for pretraining.
PhysioNet2017
for pre-train;
CPSC2018 for
fine-tuning/
test.
F1 score:
0.875
-
Learning Unsupervised
Representations for
ICU Timeseries
Weatherhead et al.
(2022)
1. Lack of labels in ICU time series
2. Alleviate the effect of severe data imbalance
1. Improved TNC model by using autocorrelation encoding-based neighborhood defining.
2. Overcame the negative sampling bias, i.e., the selected negative sample
(far away from the target sample) could have the same label as the target sample
ICU scenarios: mortality,
diagnostic groups, circulatory
failure, cardiopulmonary arrest
HiRID dataset
(public),
High-frequency
ICU (private)
Based on TNC: neighboring samples are
regarded as positive, otherwise negative.
The neighborhood is calculated/defined by
autocorrelation encoding (based on
Pearson correlation)
F1 score:
0.59 in HiRID
mortality;
0.61 in
diagnostic
group, 0.56
in
circulatory
failure,
0.77 in
cardiopulmonary
arrest
-
CROCS: Clustering and
Retrieval of
Cardiac Signals
Based on Patient
Disease
Class, Sex, and Age
Kiyasseh et al.
(2021)
Given a large, unlabelled clinical database,
1. How do we extract attribute information from such unlabelled instances?
2. How do we reliably search for and retrieve relevant instances?
1. A supervised contrastive learning framework, attracts representations
of cardiac signals associated with a unique set of patient attributes
to embeddings, entitled clinical prototypes.
2. Outperforms DTC, in the clustering setting and retrieves relevant
cardiac signals from a large database. At the same time,
clinical prototypes adopt a semantically meaningful arrangement and
thus confer a high degree of interpretability.
Clinical representation learning
and clustering(setting 1),
Clinical information
retrieval(setting 2)
Chapman,
PTB-XL
Chapman: cardiac arrhythmia
labels-> group into 4 major
classes
PTB-XL: disease label ->
group into 5 major classes.
Each dataset contains patient
sex and age information and
is split, at the patient level,
into training, validation,
and test sets. Each time-series
recording is split into
non-overlapping segments of
2500 samples (≈ 5s in duration),
as this is common for
in-hospital recordings.
Supervised clustering. ResNet18 Clustering:
1. cardiac
arrhythmia
class attribute:
CP CROCS(
Chapman)
acc: 90.3:
CP CROCS
(PTB-XL)
acc: 76.0
2. Sex and age
attributes:
Chapman: CP
CROCS(sex):
57.4; (age):
38.0
PTB-XL: CP
CROCS(sex):
73.5; TP
CROCS(age):
39.4
Retrieval:
check paper
-
Self-Supervised Graph
Neural Networks for
Improved
Electroencephalographic
Seizure Analysis
Tang et al.
(2022)
1. Representing non-Euclidean data structure in EEGs,
2. Accurately classifying rare seizure types,
3. Lacking a quantitative interpretability approach to measure model ability to localize seizures.
1. Representing the spatiotemporal dependencies in EEGs using a GNN and
proposing two EEG graph structures that capture the electrode geometry
or dynamic brain connectivity,
2. Proposing a self-supervised pre-training method that predicts preprocessed
signals for the next time period to further improve model performance,
particularly on rare seizure types,
3. Proposing a quantitative model interpretability approach to assess a
model’s ability to localize seizures within EEGs.
Seizure detection and
classification

Use self-supervised pre-training:
predict future 12 seconds to learn
task-agnostic representations and
improve downstream task (detection
and classification) performance
Temple
University
Hospital EEG
Seizure Corpus
(TUSZ),
an in-house dataset
Transform raw EEG to the
frequency domain, and obtain
the log-amplitudes of the
fast Fourier transform of
raw EEG signals.
Detection and self-supervised
pre-training: use both seizures
and non-seizure EEGs, obtain
the 12-s(60-s)EEG clips
non-overlapping 12-s(60-s)
sliding windows.
Classification: use only
seizure EEGs and obtain one
12-s(60-s) EEG clips from
each seizure event(such that
each EEG clip had exactly
one seizure type), use a
refined seizure classification
scheme: four seizure
classes in total.
Augmentation: a) randomly scaling,
b) randomly reflecting the signals
along the scalp midline.

Distance graph: represents the natural
geometry of EEG electrodes,
compute edge weight by applying
a thresholded Gaussian kernel
to the pairwise Euclidean distance
between electrodes.
Correlation graph: capture
dynamic brain connectivity, define
the edge weight as the absolute
value of the normalized cross-correlation
between the preprocessed signals.

Encoder: DCGRU-Diffusion
Convolutional Gated Recurrent Units
With
pre-training:
Seizure
detection
AUROC:
Dist-
DCRNN(12s):
0.866+-0.016
Dist-
DCRNN(60s):
0.875+-0.016

Seizure
classification
weighted
F1-score:
12s:
Dist-DCRNN:
0.746+-0.024
60s:
Corr-DCRNN:
0.749+-0.017
Dist-DCRNN:
0.749+-0.028
Link
Domain-guided
Self-supervision
of EEG Data
Improves Downstream
Classification
Performance
and
Generalizability
Wagh et al.
(2021)
Can we make encoders learn desirable physiological or pathological features through bespoke pretext tasks? 1. Propose SSL tasks for EEG based on the spatial similarity of brain activity,
underlying behavioral states, and age-related differences;
2. Present evidence that an encoder pretrained using the proposed SSL tasks shows
strong predictive performance on multiple downstream classifications;
3. Using two large EEG datasets, show encoder generalizes well to multiple EEG
datasets during downstream evaluations.
Downstream tasks:
EEG grade(normal, abnormal),
eye state(eye open, eyes closed),
age(young, old), and
gender(male, female)
classification
TUH EEG Abnormal
Corpus(TUAB),
MPI LEMON
Pre-text task:
Hemispheric symmetry (HS):
aug1-randomly flipping,
aug2-add Gaussian noise;
Behavioral state
estimation(BSE):
DBR-delta-beta power
ratio(proxy measure of
the subject's
behavioral state);
Age contrastive(AC):
a triplet training tuple
constructed from 3 EEG
epochs:(X,X+,X_),
similarity measured by
Euclidean distance,
triplet loss.
(same age group
labeled similar).
Pre-training:
represented the EEG epochs by
2D images (topographical map of
the spectral power in a brain rhythm band)
->Resnet-18 backbone(feature extractor)
->three linear layers(projector)
-> three SSL pre-text task layer
-> multi-task loss


Fine-tuning:
(Resnet-18 backbone -> linear layer)
x 4 (four downstream tasks)
Binary classification
(AUC):
TUH:
BSE only
(eeg grade):
0.918(3e-4)
LEMON:
BSE-AC(Age):
0.987(1e-3)
HS-BSE-AC
(Gender):
0.803(8e-3)
Link
CLECG: A Novel
Contrastive Learning
Framework for
Electrocardiogram
Arrhythmia
Classification
Chen et al. (2021) Lack of annotations in ECG Contrastive learning framework for ECG pre-training Heart arrhythmia detection PTB-XL for
training,
ICBEB2018
and
PhysioNet 2017
for fine-tuning
Augmentation:
Daubechies wavelet transform,
random crop/drop.

Encoder: xresnet101 backbone
+MLP projection head
F1 0.788 for
PhysioNet2017;
0.942 (F1)
on ICBEB2018
-
Self-Supervised
Learning with
Electrocardiogram
Delineation for
Arrhythmia Detection
Lee et al. (2021) Lack of annotations in ECG Propose a mixed schematic diagram by combining self-supervised representations
and manually extracted features for ECG delineation
Heart arrhythmia detection CPSC,
PTB-XL,
Shaoxing-Chapman
m-ResNet architecture F1:
With 10%
labels,
69.18 for
CPSC,
66.86 for
PTB,
81.49 for
Shaoxing
-
Towards Parkinson’s
Disease Prognosis
Using Self-Supervised
Learning and
Anomaly Detection
Jiang et al.
(2021)
1. Not enough labels to detect PD
2. PD is a chronic disease that lasts a long time; the positive samples could be very diverse as they are collected over a long period.
Frame PD detection as an anomaly detection task. Use contrastive learning
to learn representations in an unsupervised manner,
then detect PD with an anomaly detection model.
PD detection mPower data Sensory signals are
downsampled to 10%
of original sampling
rate, to reduce
high frequency noise
CPC for SSL pre-training,
One-Class Deep SVDD for
anomaly detection.
AUC: 67.3 -
Detection of maternal
and fetal stress from
the electrocardiogram
with self-supervised
representation
learning
Sarkar et al.
(2021)
DL's utility in non-invasive biometric monitoring during pregnancy not well studied 1. Validated the chronic stress exposure by psychological inventory,
maternal hair cortisol and FSI(Fetal Stress Index).
2. Tested two variants of SSL architecture, one trained on the generic
ECG features for emotional recognition obtained from public datasets and
another transfer learned on private data. 3. Provides a novel source of
physiological insights into complex multi‐modal relationships between
different regulatory systems exposed to chronic stress.
Detection of maternal and
fetal stress from abdominal
ECG (the aECG was deconvoluted
into fetal and maternal
ECG-fECG, mECG)
AMIGOS,
DREAMER,
WESAD,
SWELL,
FELICITy
(private)
Performed minimal
pre-processing on
the raw data.
re-sampled ECG signals
to a sampling frequency
of 256 Hz, segmentation
into 10-s windows.
To remove the noisy
parts of aECG and mECG
data, utilized the SQI
values available with
the segments; those with SQI < 0.5
were discarded, which resulted
in removing approximately
4.1% of the total acquired
data.
Transformations: noise addition,
scaling, negation, temporal inversion,
permutation, time-warping
1. Signal transformation recognition
network (pre-train)
Transformed ECG -> three convolutional
blocks, each consists of two
1D convolution layers with ReLU
and a max pooling layer
-> global max pooling
-> several fully connected layers
2. Affective recognition network
(fine-tune) Raw ECG-> Frozen network
-> flattening layer
-> several FC layers
-> classification task &
regression tasks
Classification
(Detection of
stressed mothers):
AUROC:
FELICITy dataset:
(mECG) 0.931
Public dataset
(transfer
learning:
public+private
dataset):
(mECG) 0.982

Regression
(Prediction of
biomarkers):
Public datasets:
(mECG)
Cortisol: 0.931;
FSI: 0.946;
PDQ: 0.961;
PSS: 0.943.
Link
Self-supervised
transfer learning
of physiological
representations
from free-living
wearable data
Spathis et al.
(2021)
1. Label scarcity problem in wearable data;
2. Multimodal learning approaches rely on the modalities being used as parallel inputs, limiting the scope of the resulting representations.
1. The new pre-training task forecasts ECG-level quality HR in real-time by only
utilizing activity signals,
2. Leverage the learned representations of this model to predict personalized
health-related outcomes through transfer learning with linear classifiers.
Set HR responses as the
supervisory signal for
the activity data,
predict personalized
health-related outcomes
The Fenland
study
(not public,
but can request)
Heart rate-noise
removal, accelerometer
data: auto-calibrated
to local gravity,
non-wear time was
inferred and participants
with less than 72 hours
of wear were removed.
Magnitude of acceleration
was calculated through
the Euclidean Norm Minus
One and the high-passed
filtered vector magnitude.
Both the accelerometry
and ECG signals-summarized
to a common time resolution
of one observation per 15
seconds. encoded the sensor
timestamps using cyclical
temporal features.
Input: X (sensors),
M (metadata), y (target HR)
Output: Ẽ
(user-level embedding),
ỹ (target variable)
network:
pass X through CNN & GRU layers;
pass M through reLU layers;
concatenate outputs in E;
forecast & backpropagate
with joint loss L;
use trained network to
extract embeddings E;
aggregate E to the user-level
Ẽ with average pooling;
train a linear model to
predict target variables ỹ;
Downstream:
traditional classifier
(A/R/T)=
acceleration
features/resting
heart
rate/temporal
features

outcome: sex
AUC:
step2heart
(A/R/T):
93.4

outcome:
height
AUC:
step2heart
(A/T):
82.1
Link
Supervised and
Self-Supervised
Pretraining
Based Covid-19
Detection Using
Acoustic
Breathing/Cough/
Speech
Signals
Chen et al.
(2022)
The amount of COVID-19 audio data in each sub-task (breath/cough/speech) is still limited, the traditional MFCC feature might be not sufficiently representative for classification tasks. 1. A supervised pre-training method, the model uses breath, cough and
speech to train three different models and obtain an average model
(used as an initialization model).
2. A self-supervised pre-training method, use the pre-trained
wav2vec2.0 model to extract high-level features, which are input
into the diagnosing model to replace the classic MFCC feature.
3. Ensemble the scores obtained by the two models
COVID-19 detection
(binary classification)
DiCOVA-ICASSP
2022 challenge
dataset
The amplitude of the
raw waveform is
normalized between
-1 to 1, cut off silent
segments, sound data is
downsampled to 16 kHz,
forty-dimensional MFCC
and delta-delta
coefficients are
extracted with a window
of 25 msec of audio samples
and a hop of 10 msec.
use SpecAugment
time-frequency mask to
augment the data
(due to small size
of the training data)
Model: two bi-directional
LSTM layers(encoder) +
two linear transformations
with a ReLU activation in
between(classifier)
Supervised pre-train:
average model (average
three BiLSTM task model)
as initialize of classifier.
Self-supervised pre-training:
wav2vec2.0 model (raw waveform
-> a CNN based encoder + a
transformer encoder +
a quantization model
discretizes the output of
feature encoder as targets
in the contrastive objective.)
Ensemble: train two models,
ensemble the scores.
AUC: 88.44
on blind test
in the
fusion track
-
Contrastive
Predictive
Coding for
Anomaly
Detection
of Fetal
Health from
the
Cardiotocogram
de Vries et al.
(2022)
Low availability of pathological data along with the high variability in pathologies and a scarcity of available labels 1. Extended the original CPC model by making stochastic, recurrent,
and conditioned (upon uterine contractions) predictions,
and a custom loss function.
2. Based on the detection of out-of-distribution behaviour and
deviations from subject-specific behaviour,
the proposed model is capable of achieving promising results
for identification of suspicious and anomalous FHR events in the CTG.
Detection of fetal health
from CTG
* CTG provides a temporal
recording of both the Fetal
Heart Rate (FHR) and
Uterine Contractions (UC)
Dutch STAN trial,
a healthy dataset
Fetal heart rate
signals and toco
data were pre-processed
to yield a constant
sampling frequency
of 4Hz by means of
linear interpolation
and subsequently
normalized using
the mean, and 98th
percentile of the
healthy dataset.
Before normalization,
toco signal was
filtered by a zero-phase,
4th order Butterworth
bandpass-filter with
cut-off frequencies
at 0.001 and 0.1 Hz.
(to eliminate offset
and high-frequency noise)
Conditional CPC (Contrastive
Predictive Coding)
GRU (encoder) + 3 layer MLP
(predictor)
Use three past windows to
predict K=4
Negative pair: same signal
at different time
Training: only use the data
of healthy children.
AUC: 0.96
(normal vs
anomalous)
average
correlation
coefficient
of 0.8+-0.13
with respect
to expert
annotations
-
Self-Supervised
Learning for
Anomalous Channel
Detection in EEG
Graphs:
Application to
Seizure Analysis
Ho et al.
(2022)
Lack of access to the labeled seizure data 1. A self-supervised method for identifying abnormal brain regions
and EEG channels without access to the abnormal
class data during the training phase.
2. Model brain regions and their connectivities using attributed graphs.
3. Employing contrastive and generative learning, propose
an augmentation approach to create the positive and negative pairs
to form contrastive and generative loss.
4. Define a channel-based anomaly score function
(linear combination of the contrastive and reconstruction loss)
Seizure detection
(no access to the
seizure data is needed)
TUSZ For a given EEG clip,
build four types of
EEG graphs:
Dist-EEG-Graph:
use Euclidean distance
between electrodes,
embed the structure
of electrode locations
in the graph's
adjacency matrix.
Rand-EEG-Graph:
random connection
of nodes (assumes all
electrodes are connected
and equally contribute
to brain activities,
so every edge has
a chance of being present
in the graph)
Corr-EEG-Graph:
functional connectivity
between electrodes
(cross-correlation function,
top-3 neighborhood nodes)
DTF-EEG-Graph:
directed transfer
function graph, functional
connectivity of the
brain regions.
Positive and negative pair
sampling: 2 positive & 1
negative sub-graphs for every
node in every constructed EEG
graph (positive: first select
an electrode as the target node,
then anonymize the target node in the positive
subgraph (replace its feature
vector with an all-zero vector);
negative: first find the farthest
electrode from the target node)
Contrastive learning model:
pairs-> GNN encoder -> all embeddings
-> take avg over rows ->
obtain similarity -> contrastive loss;
Generative learning model: (GNN
encoder) -> positive embeddings ->
GNN decoder (reconstructing the target
node anonymized in the positive
subgraphs, using other node features
and edges) -> reconstruction loss;
Specificity:
EEG_t-CGS:
0.989
* EEG_t
refers to
all four
graph types
are
concatenated
and fed to
the system
as the input
representing
the given
EEG clip
Link
A Contrastive
Predictive
Coding-Based
Classification
Framework for
Healthcare
Sensor Data
Ren et al.
(2022)
Annotating data consumes a large amount of manpower and resources 1. Designing a contrastive predictive coding (CPC)-based pretext task
for medical sensor data classification,
redesigning the arrangement of positive sample pairs and negative pairs.
2. Design a lightweight downstream classification model,
further improve the classification accuracy.
1. Sleep stage classification
2. Arrhythmia classification
Sleep-EDF,
MIT-BIH-SUP
A positive sample pair
contains 8 different
samples belonging
to the same category;
in a negative sample pair,
the four left and four right
samples each belong to the
same category, but the
left and right categories
are different.
Pretext: predict future (GRU)
CPC based model
Encoder: four blocks, each block:
a dense layer, a batch
normalization layer, an activation
layer, a dense layer.
Classification: 2 Conv1D layers,
Sleep:
macro avg
ACC:
88.7%

Arrhythmias:
ACC:
97.3%
-
A Contrastive
Learning Framework
for ECG Anomaly
Detection
Li et al.
(2022)
1. Unbalanced data
2. Lack robustness due to inconsistent ECG data representation
1. Effective sequence data augmentation methods are introduced to
ECG signal abnormal detection, aiming at alleviating category imbalance.
2. A new contrastive learning framework that address the challenge of
inconsistent data representation during model learning,
improving robustness and accuracy.
ECG anomaly detection MIT-BIH
arrhythmia
dataset,
PTB
ECG signals were
preprocessed and
segmented, with each
segment corresponding
to one heartbeat.

Augmentation:
two methods:
BiLSTM-CNN,
TimeGAN,
(both used in
this model)
Contrastive learning:
Input->BiLSTM&TimeGAN-> Encoder->
Transformer(based on attention
mechanism with efficient parallel
computing capabilities)-> Non-linear
projection head->Maximize similarity
Detection:
input-> 2 layers of (Conv+Batch Norm)
-> max pool -> transformer
Arrhythmia:
ACC:96.3%

PTB
diagnostic
ECG:
ACC: 94.5%
-
Listen to your heart:
A self-supervised
approach for
detecting murmur
in heart-beat
sounds for the
Physionet
2022 challenge
Ballas et al.
(2022)
Lack of labels in ML training Propose two augmentation combinations to construct effective positive pairs Murmur classification,
and clinical outcome classification
PhysioNet 2016
and PhysioNet2022
challenge datasets
5 sec is a window,
50% overlapping

Augmentation:
View1:250Hz
high pass filtering
View 2: pollute
with uniform noise
and then upsampling
with 0.5 probability
CNN as encoder, 3-layer MLP
as prediction head.
0.606 in
F-score
in murmur
classification;
0.657 in
outcome
classification
(F1)
-
Weak self-supervised
learning for seizure
forecasting:
a feasibility study
Yang et al.
(2022)
Reduce the burden of manual labeling Perform a feasibility study on seizure prediction, which is identified as
an ideal test case, as pre-ictal brainwaves are patient-specific,
and tailoring models to individual patients is known to improve
forecasting performance significantly.
Seizure detection
and forecasting
TUH seizure,
EPILEPSIAE dataset,
RPAH dataset
(private)
12s window, ICA and
STFT are applied
to the EEG before
pre-trained seizure
detection.
ICA is used for
removing EOG artefact.
STFT is then applied
to the clean EEG with
a 250 sample window(1s)
and 50% overlap.
DC component removed.
Same preprocessing
used on EEG for
prediction.
Forecasting model: pre-trained
with EPLIEPSIAE,
Detection model: pre-trained
with TUH,
Both model: 3 layers of
ConvLSTM, 2 layers of FC
(with sigmoid).
All three tests, both
pseudo-prospectively
inference-only real-time
tested on the RPAH dataset.
Average
relative
improvement
in
sensitivity
by 14.3%,
a reduction
in false
alarms by
19.6% in
early
seizure
forecasting.
Link
Contrastive
Heartbeats:
Contrastive
Learning for
Self-Supervised
ECG Representation
and Phenotyping
Wei et al.
(2022)
High cost of manual labels Propose a contrastive learning approach, to utilize the periodic and
meaningful patterns from ECG.
Cardiac arrhythmia classification MIT-BIH,
Chapman,
private
large-scale
ECG dataset
Exclude samples
with <48 bpm, within
the ten-second
measurement;
Positive pair:
the anchor heartbeat
with a positive
heartbeat(sample
from the same ECG);
Negative pair:
the anchor heartbeat
with a negative
heartbeat(sample
from other ECG).
Heartbeats extracted from the
full-length ECG by the
Hamilton R-peak segmentation
algorithm;
Backbone model: Causal CNN;
Projector: additional fully
connected layer(project the
features of the anchor);
Loss: multi-similarity loss.
Linear
evaluation
on:
MIT-BIH:
ACC:
89.25;
(AUROC=
0.9424)
Chapman:
AUROC:
0.920

Semi-
supervised
learning:
(finetune use
partial labels)
MIT-BIH: ACC:
(50%) 0.9461
-
Practical
cardiac events
intelligent
diagnostic
algorithm for
wearable
12-lead ECG
via
self-supervised
learning on
large-scale
dataset
Yang et al.
(2022)
1. Collected 658,948 ECG, 164,538 were diagnosed,
and the remaining 493,948 ECGs were without diagnosis.
2. Train a Siamese network via contrastive learning, transferred the
pretained weights to downstream classification.
3. Designed four data augmentation operations for 1D digital multilead ECG signals.
Cardiac events diagnostic
(55 cardiac events)
CPSC 2018,
large-scale
ECG dataset
(can not be
open-sourced)
5th order
Butterworth
high-pass filter,
with the lower
cutoff frequency
of 0.5 Hz.
Data augmentation:
1. frequency dropout;
2. crop resize;
3. cycle mask:
detect the position
of R peak and segment
the same position
in each heartbeat
to zero;
4. channel mask.
Momentum contrast(MOCO):
an encoder and a momentum
encoder, and a projection
head at the bottom of each
encoder.
On CPSC
2018:
F1 score:
0.839
Link
As easy as APC:
overcoming missing
data and class
imbalance in time
series with
self-supervised
learning
Wever et al.
(2021)
High levels of missing data and strong class imbalance Demonstrate how Autoregressive Predictive Coding (APC),
can be leveraged to overcome both missing data and class
imbalance simultaneously without strong assumptions.
Overcome high missingness
and severe class imbalance
Synthetic
dataset,
Physionet
challenge
2012,
menstrual
cycle
tracking
app Clue
Encoder: GRU-D (GRU Decay)
APC
MaskedMSE
Physionet2012
(binary):
AUROC:
(GRU-APC
without
class
imbalance
method):
86.0+-0.5

Clue dataset
(multi-class
classification):
weighted F1:
(GRU-APC):
90.7+-0.1
Link
DeepClean:
Self-Supervised
Artefact Rejection
for Intensive Care
Waveform Data Using
Deep Generative
Learning
Edinburgh et al.
(2020)
Waveform physiological data in ICU are susceptible to artefacts, removal of artefacts reduced bias and uncertainty in clinical assessment and false positive rate of ICU alarms. 1. A prototype self-supervised artifact detection system using a
convolutional variational autoencoder deep neural network that
avoids manual annotation, requiring only
easily-obtained good data for training.
2. Can identify regions of artefact with high accuracy.
Artefact detection on
ICU waveform physiological data
ABP waveform data
from single
anonymised
patient
throughout a stay
Split the data
into 100-second
windows,
normalising
across the whole
dataset, sampled
uniformly within
the selected
window to generate
10-second samples
to join the test
set (may contain
samples marked as
abnormal by an
expert).
VAE with CNNs for both
encoder and decoder.
Accuracy:
(mean)
VAE:
0.901
ROCAUC:
0.973
Link
SOM-CPC:
Unsupervised
Contrastive
Learning with
Self-Organizing
Maps for
Structured
Representations
of High-Rate
Time Series
Huijben et al.
(2022)
High-dimensional real-world data are difficult to interpret.
Deep learning aims to identify this manifold, but does not promote structure or interpretability.
1. SOM-CPC, suitable for learning structured and interpretable
2D representations of high-rate time series by encoding subsequent
data windows to a topologically ordered set of quantization vectors.
2. Requires far fewer auxiliary loss functions
(and associated hyperparameter tuning)
Clustering Synthetic
dataset,
subset 3
of MASS,
subset of
LibriSpeech
dataset
For MASS:
select three EEG
channels(F4, C4,
O2), two EOG channels,
one chin EMG derivation,
downsampled to 128Hz,
non-overlapping
30-second window.
Before downsampling,
all derivations
filtered with a
zero-phase 5th order
Butterworth band-pass
filter, another
zero-phase 5th order
Butterworth notch filter.
Channels normalized
within-patient and per
channel, yielding mean
subtraction, and
normalization.
SOM-CPC
Encoder: CNNs
(details in appendix)
On sleep
dataset:
Purity:
0.79
NMI:
0.28
Cohen's
kappa:
0.67
l_2 smooth:
1.22+-0.21
TE: 0.042
-
Subject-aware
contrastive
learning for
biosignals
Cheng et al.
(2020)
Dataset for biosignals, limited labels and subjects 1. Apply self-supervised learning to biosignals.
2. Develop data augmentation techniques for biosignals.
3. Integrate subject awareness into the self-supervised learning framework.
1) subject-specific distribution to compute contrastive loss
2) promoting subject invariance through adversarial training
EEG decoding, ECG anomaly detection Physionet
Motor Imagery,
MIT-BIH
arrhythmia
Raw EEG/ECG data
for input.
Data transformations:
temporal cutout,
temporal delays,
noise, bandstop filtering,
signal mixing,
spatial rotation(exception),
spatial shift(exception),
sensor dropout,
sensor cutout(exception)
Encoder and momentum
encoder: 1d ResNet with
ELU activation and batch
normalization.
Project head and momentum
project head: 4-layer
fully-connected network.
Linear classification
using logistic regression
with weight decay.
EEG: ACC
Intersubject:
81.6+-0.8
(subject-
specific,
2 class)
Intrasubject:
79.6+-2.3
(subject-
invariant,
2 class)
ECG:
Overall: ACC:
subject-
specific:
93.2+-1.6
-
Sense and learn:
Self-supervision
for omnipresent
sensors
Saeed et al. (2021) Non-generalizable representations; Lack of annotations 1. Propose 7 data augmentation schemes
2. Design a framework that uses all 7 schemes at the same time to
learn generalizable representations
EEG, EOG, Heart rate,
Skin conductance, accelerometer,
gyroscope
HHAR, MobiAct,
MotionSense,
UCI HAR, HAPT,
Sleep-EDF,
MIT DriveDb,
WiFi CSI
Blend detection,
Fusion magnitude prediction,
Feature prediction from
masked window,
Transformation recognition,
Temporal shift prediction,
Modality denoising,
Odd segment recognition
CNN as backbone Kappa
scores.
HHAR:
0.826,
MobiAct:
0.89,
MotionSense:
0.907,
UCI HAR:
0.888;
HAPT:
0.820;
Sleep-EDF:
0.702;
MIT DriveDb:
0.804;
WiFi CSI:
0.798
-
Self-Supervised
Learning From
Multi-Sensor Data
for Sleep
Recognition
Zhao et al.
(2020)
1. Most sleep recognition methods are limited to single-task recognition, which only involves single-modal sleep data.
2. Shortage and imbalance of sleep samples.
1. Study the problem of sleep recognition at three levels:
sleep position/sleep stage recognition, insomnia detection.
2. Self-supervised sleep recognition model(SSRM) is proposed for
multi-sensor sleep recognition.
Sleep position/sleep stage
recognition, insomnia detection
Sleep
Bioradiolocation
dataset,
Pressure
Map dataset,
PSG dataset
Normalize to [0, 1]
For pressure map:
rotation and frequency-domain
feature extraction to
generate temporary labels.
For PSG: preprocess and
extract four-dimensional
feature and count feature.
Prediction
probability
of CRF as
the final
accuracy.
Bio-radar:
99.03
Pressure-e1:
99.55
Pressure-e2:
98.92
PSG-2class:
95.91
PSG-3class:
78.69
PSG-4class:
71.01
-
Contrastive
Embedding
Learning
Method for
Respiratory
Sound
Classification
Song et al.
(2021)
1. Difficulty of collection and expensive manual annotation; only limited samples available.
2. Do not explicitly encourage intra-class compactness and inter-class separability between the learned embeddings.
Propose a contrastive embedding learning method, input a contrastive tuple,
learn the slight differences among similar samples, the easily confused
samples are more likely to be identified.
Respiratory sound classification ICBHI 2017 Resample audio recordings
to 16kHz and segment them
into respiratory circles
according to onsets and offsets.
Convert the circles to 46-dimension
log Melspectrograms with a window
size of 1024 over a 256-sample hop.
Augmentation: white noise
adding, time shifting,
time stretching and pitch
shifting
Encoder: CNN
Classifier: linear layer
(logistic regression)
ACC: 78.73 -
A Semi-Supervised
Algorithm for
Improving the
Consistency of
Crowdsourced
Datasets:
The COVID-19 Case
Study on Respiratory
Disorder
Classification
Orlandic et al.
(2022)
Labelling inconsistencies and label sparsity in the crowdsourced dataset.
(1. potentially noisy user label, 2. often contradictory expert labels)
1. Provide an automated approach for increasing the
labeling quality of biosignal datasets.
2. The subsample of cough audio recordings identified
through our SSL approach was made public
Respiratory disorder
classification/COVID-19 detection
COUGHVID
dataset
A cough classifier was used to
remove non-cough recordings.
Normalization (4-order Butterworth
lowpass filter; cutoff 6kHz) to
reduce high-frequency noise.
Isolate each individual cough event.
Discard any cough-sound candidates
shorter than 200ms, include 200ms
before and after the cough candidate
in each segment.
Supervised(classifier):
user model(based on user label),
expert 1,2,4 model(based on
labels of experts1,2 and 4)
SSL model: majority agreement
combines the knowledge form
both users and experts, to
identify a subset of high-confidence
samples->used to train on final
classifier, the rest were discarded.
User: Linear discriminant analysis;
Expert1,2,4: Logistic regression;
SSL: Logistic regression.
SSL:
Test AUC
0.763
-
BENDR:
Using Transformers
and a Contrastive
Self-Supervised
Learning Task to
Learn From Massive
Amounts of
EEG Data
Kostas et al. (2021) Lack of generalizability: a task-specific model is required Propose a framework with contrastive pre-training,
which can be used for different tasks/datasets.
MMI, BCIC,
ERN, SSC,
P300
Augmentation: CPC (predict the future) CNN+Transformer-based CPC MMI:
(86.7
in BAC),
BCIC:
42.6 in
Accuracy,
ERN
0.65 in
AUROC,
SSC:
0.72 in
BAC;
P300:
0.72 in
AUROC
-
Unsupervised
Anomaly
Detection on
Temporal
Multiway Data
Nguyen et al.
(2020)
Unsupervised temporal models employed thus far typically work on sequences of feature vectors, and much less on temporal multiway data. 1. Investigate the applications of matrix recurrent neural networks
for unsupervised anomaly detection for temporal multiway data.
2. Two anomaly detection settings (reconstruction and prediction) are examined,
and the empirical results on synthetic data, moving digits
and ECG readings are reported.
Temporal multiway anomaly
detection (looks for
irregularities over space-time)
Use reconstruction loss:
an abnormal sequence does
not exhibit the regularities,
it is hardly compressible,
and thus its reconstruction
error is expected to be higher
than the error in the normal
cases.(if a sequence is regular
(normal), the history may contain
sufficient information to predict
several steps ahead)
Synthetic data,
MNIST,
MIT-BIH
Arrhythmia
dataset
For MIT-BIH: manually pick 38 subjects
(have both MLII and V1 channels
and no paced beats).
For each univariate signal, the raw ECG
is detrended by first fitting a 6-order
polynomial and then subtracting it
from the signal, a 6-order Butterworth
bandpass filter with 5Hz and 15Hz range,
filtered signals are normalized
individually by Z-score normalization.
Pre-training:
(Matrix) LSTM AutoEncoder model:
encoder: matLSTM (compresses X
into C by reading one matrix at
a time)
decoder: matLSTM decompresses
the memory by predicting one
matrix at a time
anomaly: reconstruction loss
Fine-tuning:
(Matrix) LSTM Encoder-Predictor
predictive model:
anomaly score:
mean prediction error
matLSTM:
(for
predicting
5
heartbeats)
AUC:
92.5±0.1
F1:
72.8±0.2
-
Self-supervised
EEG Representation
Learning for
Automatic Sleep
Staging
Yang et al.
(2021)
1. Unlabeled and noisy data.
2. Existing negative sampling strategies often incur sampling bias.
1. Pretext task: address the inherent limitations of
negative sampling in the existing self-supervised methods
(e.g., MoCo2, SimCLR3) by leveraging global data statistics.
2. strengthen our model with an instance-aware world representation
for each sample, where closer samples are assigned larger weights.
Sleep stage classification SHHS,
Sleep EDF,
MGH Sleep
Subjects are randomly assigned to
the pretext group, training group,
test group with different proportions.
Augmentation: Bandpass Filtering,
Noising, Channel Flipping, Shifting.
ContraWR: Contrast with the World
Representation(generate an average
representation of the dataset, 𝒛𝒘
as the only contrastive information.)
ContraWR+: Contrast with
Instance-aware World Representation
(weighted average of the world/dataset,
where the weight is set higher for
closer samples.)
Classifier: training a separate
logistic regression model
(on top of the encoder) on data
from the training group (during
which the encoder is frozen)
and test on new recordings.
Projector: 2-layer fully
connected network.
Encoder: STFT (Short-Time
Fourier Transforms) module,
resulting STFT spectrogram passes
convolutional layer with batch
normalization (CNN-based encoder
is built on top of the spectrogram)
5 class
classification
ContraWR+:
Sleep EDF:
86.90±0.2288
SHHS:
77.97±0.2693
MGH Sleep:
72.03±0.1823

Baseline:
MoCo
SimCLR
BYOL
SimSiam
Link
Self-Supervised
Learning for
Sleep Stage
Classification
with Predictive
and Discriminative
Contrastive Coding
Xiao et al.
(2021)
1. Labeling work is costly and laborious in terms of specialist experience and manual work.
2. Ground-truth labels annotated by sleep experts can also be contradictory, which has a bad influence on label-reliant tasks.
3. Representations extracted by supervised models do not generalize.
1. The proposed SleepDPC framework is a pioneer to apply SSL
on sleep stage classification.
2. Proposed two learning principles, Predictive contrastive coding,
and Discriminative contrastive coding, which enable extracting high-level semantics
(underlying rhythms and patterns) from raw EEG.
Sleep stage classification Sleep-EDF,
ISRUC
Combining PCC and DCC:
PCC(predictive contrastive coding):
other representation(at different
timestep) in the mini-batch are
considered as "unrelated"(negative),
DCC(discriminative contrastive coding):
representations in different segment
of a mini-batch are temporally
distant, as negative pair.
Pre-train:
encoder: CNN
aggregator: GRU and LSTM
predictor: not mentioned
Fine-tuning:
encoder and aggregator are frozen.
classifier: one-layer
fully-connected network.
SleepDPC
(10% labels)
SleepEDF:
Accuracy:
0.701±0.008
F1-macro:
0.640±0.015

ISRUC:
Accuracy:
0.536±0.015
F1-macro:
0.489±0.018
Link
CoSleep:
A Multi-View
Representation
Learning Framework
for Self-Supervised
Learning of Sleep
Stage
Classification
Ye et al.
(2022)
1. Large-scale labeled datasets are still hard to acquire
2. DPC performs discrimination at the instance level (treats each instance as a single class); the seasonality of time series indicates that distant instances can be semantically close
1. Novel co-training scheme by exploiting complementary information from
time and frequency view of physiological signals to mine more positive samples.
2. Extend the framework with a memory module,
implemented by a queue and a moving-averaged encoder,
to enlarge the pool of negative candidates.
Sleep stage classification SleepEDF,
ISRUC
Use multi-instance infoNCE loss,
calculating loss function using
multiple positive samples. Select
the Top-K positive samples by time-
and frequency-domain similarities.
Pre-training:
Two encoders: CNN with residual
connections (ResNet);
aggregator: GRU/LSTM

Finetuning:
encoder and aggregator are frozen.
classifier: one-layer
fully-connected network (10% labels).
CoSleep:
SleepEDF:
ACC:
0.716±0.043
F1:
0.558±0.03

ISRUC:
ACC:
0.579±0.051
F1:
0.501±0.056
Link
A Self-Supervised
Learning Based
Channel Attention
MLP-Mixer Network
for Motor
Imagery Decoding
He et al.
(2022)
1. The performance of CNNs for MI EEG decoding is generally limited due to the small sample size problem.
2. To address 1, EEG trials are segmented into small slices, which inevitably loses the long-range dependencies of temporal information.
1. A new EEG slice prediction task as pretext task to capture
the long-range information in time domain.
2. In the downstream task, an MLP-Mixer is used for the classification
task on signals (rather than images)
3. An attention mechanism is integrated into MLP-Mixer to
estimate the importance of each EEG channel.
Motor Imagery (movement
imagination classification)
MI-2 Dataset,
BCIC-IV-2A Dataset
150-time points sliding window
(overlap of 10 points),
z-score normalization
on each slice.
Pretext task: 3 adjacent EEG slices
-> local encoder(1D CNN) ->
concatenation -> LSTM layers ->
conv and linear -> predicted EEG slice
Downstream task: EEG slice ->
Local encoder(with Weights from pretext)
-> Channel-attention MLP-Mixer (CAU&TMU)
-> Classifier(global average pooling
-> Linear layer -> Softmax -> Prediction)
MI-2:
ACC:
78.5±0.64
F1:
78.39±0.67

BCI-IV-2A:
ACC:
79.43±1.73
F1:
79.42±1.74
-
Self-supervised
Contrastive
Learning for
EEG-based
Sleep Staging
Jiang et al.
(2021)
Data shortage of supervised learning Propose a self-supervised contrastive learning for EEG
sleep staging classification, which measures the feature
similarity of transformed signal pairs.
EEG-based sleep staging
classification
Sleep-edf,
Sleep-edfx,
Dod-O,
Dod-H
Transformations:
Sleep-edf: crop&resize + permutation;
crop&resize + crop&resize.
together: crop&resize + time warping;
crop&resize + permutation
SSL training:
input: transformed unlabelled data;
encoder: ResNet based;
positive pair: homologous pair;
negative pairs: others.
Fine tuning: classifier: FC layers.
Healthy
subjects:
Acc: 88.16;
F1: 81.96

Healthy
and subjects
with sleep
disorders:
Acc: 84.42;
F1: 78.95
Link
Self-Supervised
Contrastive
Pre-Training For
Time Series via
Time-Frequency
Consistency
Zhang et al.
(2022)
Lack of data labels Propose the assumption of Time-Frequency Consistency:
the information taken from the time domain and from
the frequency domain is equivalent.
Sleep disorder,
Epilepsy detection,
Mechanical fault detection,
etc.
Time domain: shift, jittering, etc.
Frequency domain: adding/removing
frequency component
CNN-based encoder,
MLP-based projector
- Link
