Welcome to Listen2YourHeart

License: MIT

Listen2YourHeart is a publicly available, extensible framework for training neural networks via contrastive self-supervised learning (SSL) for phonocardiogram classification.

This is the official code for the paper "Which Augmentation Should I Use? An Empirical Investigation of Augmentations for Self-Supervised Phonocardiogram Representation Learning", which has been submitted for publication.

Note: The repo might change in the future based on revisions and reviewer comments, but it can nevertheless be used as-is for research purposes.

An overview of the proposed training and evaluation protocol is depicted in the image below:

[Figure: overview of the proposed SSL pretraining and downstream evaluation protocol]

Listen2YourHeart is an IntelliJ project. The root folder (the one containing this file) holds two main directories. The scripts dir contains all necessary scripts for a) downloading the data used in this project (scripts/datasets) and b) submitting pretraining and fine-tuning jobs to a SLURM workload manager. As its name suggests, the src dir contains the source code for everything mentioned in our paper.

If not obvious, the repo name and the previous work that inspired it are explained in the References section below.

Project Structure

  • scripts: directory containing scripts for downloading datasets and submitting SLURM jobs.
  • src: directory containing the source code of the project.
  • requirements.txt: a simple requirements file used to re-create the project's environment.
  • generate-requirements.sh: convenience script for generating requirements.txt and the appropriate package dependencies.
  • pcg-ssl.iml: the project file for IntelliJ.
  • README.md: this file.

Datasets

The datasets used in this project are:

  • FPCGDB: Fetal PCG Database
  • EPHNOGRAM: A Simultaneous Electrocardiogram and Phonocardiogram Database
  • PASCAL: Classifying Heart Sounds Challenge
  • PhysioNet2016: Classification of Heart Sound Recordings: The PhysioNet/Computing in Cardiology Challenge 2016
  • PhysioNet2022: Heart Murmur Detection from Phonocardiogram Recordings: The George B. Moody PhysioNet Challenge 2022

Quick Start

1. Download data

The first thing to do is download all necessary data. To download each dataset, run the following scripts and commands from a terminal:

# EPHNOGRAM
./scripts/download-ephnogram.sh

# FPCGDB
./scripts/download-fpcgdb.sh

# PASCAL
./scripts/download-pascal.sh

# PhysioNet2016
wget -r -N -c -np https://physionet.org/files/challenge-2016/1.0.0/

# PhysioNet2022
wget -r -N -c -np https://physionet.org/files/challenge-2022/1.0.0/

2. Specify Experiment Configuration

Once you have downloaded the data, the next step is to specify all parameters needed for the SSL pretraining and downstream task fine-tuning experiments.

To do so, edit the configuration file: ./src/configuration/config.yml
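
For orientation, here is a minimal sketch of what part of such a configuration could look like. Only the [ssl][augmentation1], [ssl][augmentation2], and [results] keys appear in this README; every other name and value below is an illustrative assumption, so consult the template config.yml for the actual schema:

    # Hypothetical sketch only; the real schema is defined by the template
    # in ./src/configuration/config.yml.
    ssl:
      augmentation1:              # first view's augmentation pipeline
        - noise                   # assumed augmentation name
      augmentation2:              # second view's augmentation pipeline
        - cutoff                  # assumed augmentation name
    results: ./results/run.csv    # CSV output path ([results] key)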

3. Fully-Supervised Model

To train a fully supervised CNN, run the following:

  • SLURM

    ./scripts/hpc/submit_baseline.sh "{experiment_name}" "./src/configuration/config.yml"
  • Python

    python -m src.training.pretrain --ds_path "{path_to_save_model}" --conf_path "./src/configuration/config.yml"
    

4. Contrastive Self-Supervised Models

To train the proposed model via SSL, you once again need to specify all necessary parameters in the configuration file (./src/configuration/config.yml).

The most important parameters are the two augmentation combinations, specified by the [ssl][augmentation1] and [ssl][augmentation2] keys in the .yml file. We have provided examples of how to specify the transformations in the template config file. However, you can also use that template to generate all the combinations mentioned in our paper by running the following scripts:

  • Populate different augmentation configurations

    # Populate config files for 0vs1 & 1vs1 augmentation combinations
    
    python3 -m src.configuration.populate_configs --config_file "./src/configuration/config.yml" --export_dir "{export dir of the populated configs}"
    
    # Populate config files for 1vs2 & 2vs2 augmentation combinations
    
    python3 -m src.configuration.populate_configs_2vs_2 --config_file "./src/configuration/config.yml" --export_dir "{export dir of the populated configs}"

To train classifiers using the proposed evaluation framework, you may run the following:

  • SLURM

    # Run multiple experiments with one or several configuration files
    # The script locates configuration .yml files under the specified 
    # dir and submits jobs to the slurm partition specified in 
    # `./scripts/hpc/pretrain.sh` and `./scripts/hpc/downstream.sh`  
    
    python3 -m scripts.hpc.submit_batch --conf_path "{config dir}" 
  • Python

    # Run single experiment with single config file
    # All arguments explained in pretrain.py
    
    python3 -m listen2yourheart.src.training.pretrain \
    --tmp_path {tmp_path} \
    --initial_epoch 0 \
    --ssl_job_epochs 200 \
    --ssl_total_epochs 200 \
    --conf_path {conf_path}

5. Results

All results will be available in a .csv file, whose path is specified in the [results] key of the configuration .yml.
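
If you want to inspect the results programmatically, a minimal Python sketch follows; the path below is an assumption that must match your [results] key, and the column names depend on your experiments:

    import pandas as pd

    # The path is an assumption; use the value of the [results] key
    # from your configuration .yml.
    results = pd.read_csv("./results/run.csv")
    print(results.head())  # each row summarizes one experiment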

Framework Extension Ideas

Our codebase presents future researchers and practitioners with the opportunity to extend our initial findings towards the development of novel approaches and methods.

Specifically, some initial thoughts on such extensions are listed below.

  • Augmentation Development

All implemented augmentations and transformations are located in: ./src/augmentations/augmentations.py

By creating the appropriate Python class, one can implement novel transformations (or known augmentations that have not yet been implemented) and apply them during SSL pretraining.

Once developed, the _create_augmentors function in ./src/dataset/generics.py should also be updated accordingly to accommodate the new transformation class and its specification in the configuration .yml.
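
As a concrete illustration, here is a hypothetical augmentor class in the spirit described above; RandomAmplitudeScale is not part of the repository, and the actual interface expected by _create_augmentors may differ:

    import numpy as np

    class RandomAmplitudeScale:
        """Hypothetical augmentor: rescale a 1D PCG window by a random factor."""

        def __init__(self, min_scale: float = 0.8, max_scale: float = 1.2):
            self.min_scale = min_scale
            self.max_scale = max_scale

        def __call__(self, signal: np.ndarray) -> np.ndarray:
            # Draw one scale factor per call and apply it to the whole window.
            scale = np.random.uniform(self.min_scale, self.max_scale)
            return signal * scale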

  • Additional Data

By creating an additional directory in ./src/datasets/ and developing the appropriate functions, you can add new datasets for model training or evaluation.

For unlabeled dataset loading, you can refer to the structure of ./src/datasets/fpcgdb/fpcgdb.py or ./src/datasets/ephnogram/ephnogram.py. For labeled datasets, refer to the structure of one of the pascal, physionet2016challenge, or physionet2022challenge directories and scripts.

The basic idea is that you load the signals as numpy arrays, which are then preprocessed and split into windows. These windows are then used to create tf.data.Dataset objects, which are finally fed to the model during training.
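
To make that pipeline concrete, a simplified sketch follows; make_windows and the window and hop sizes are illustrative, not the repository's actual helpers:

    import numpy as np
    import tensorflow as tf

    def make_windows(signal: np.ndarray, window_len: int, hop: int) -> np.ndarray:
        """Split a 1D signal into fixed-length, possibly overlapping windows.

        Assumes len(signal) >= window_len; the repository's preprocessing
        under ./src/datasets/ may differ in details.
        """
        n = 1 + (len(signal) - window_len) // hop
        return np.stack([signal[i * hop : i * hop + window_len] for i in range(n)])

    # Hypothetical usage: a 10 s recording at 1 kHz, 2 s windows with 50% overlap.
    signal = np.random.randn(10_000).astype(np.float32)
    windows = make_windows(signal, window_len=2_000, hop=1_000)
    dataset = tf.data.Dataset.from_tensor_slices(windows).batch(32)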

  • Different 1D Signal Types - Transfer Learning

The Listen2YourHeart framework can also be utilized for training SSL models on other 1D signals (e.g., ECG, EEG, EDA, etc.). You can alter the dataloaders for the different datasets and use the same pipelines for SSL model training or evaluation.

  • Model Development

The neural network adopted in our research is a 5-layer 1D CNN, illustrated in the figure below.

[Figure: the 5-layer 1D CNN architecture]

However, you can of course consider using a different or novel architecture for the same task.

All models are specified in the ./src/models/ directory.
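
For illustration only, here is a Keras sketch of a 5-layer 1D CNN in the spirit of the description above; the filter counts and kernel sizes are assumptions, and the actual backbone is defined in ./src/models/:

    import tensorflow as tf
    from tensorflow.keras import layers

    def build_cnn(window_len: int, num_classes: int) -> tf.keras.Model:
        # Five conv/pool blocks followed by a classification head (assumed sizes).
        inputs = tf.keras.Input(shape=(window_len, 1))
        x = inputs
        for filters in (16, 32, 64, 128, 256):
            x = layers.Conv1D(filters, kernel_size=5, padding="same", activation="relu")(x)
            x = layers.MaxPooling1D(pool_size=2)(x)
        x = layers.GlobalAveragePooling1D()(x)
        outputs = layers.Dense(num_classes, activation="softmax")(x)
        return tf.keras.Model(inputs, outputs)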

  • Loss Function

Finally, in our paper we chose to implement the NT-Xent contrastive loss introduced in the SimCLR framework:

$$\ell_{i,j} = -\log \frac{\exp(\mathrm{sim}(z_i, z_j)/\tau)}{\sum_{k=1}^{2N} \mathbb{1}_{[k \neq i]} \exp(\mathrm{sim}(z_i, z_k)/\tau)}$$

where sim(u, v) denotes cosine similarity between embeddings and τ is a temperature hyperparameter.
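
For reference, a compact TensorFlow sketch of this loss is given below; it follows the SimCLR formulation but is not taken from this repository:

    import tensorflow as tf

    def nt_xent_loss(z1: tf.Tensor, z2: tf.Tensor, temperature: float = 0.5) -> tf.Tensor:
        """NT-Xent loss over two batches of projections z1, z2 of shape (N, d)."""
        n = tf.shape(z1)[0]
        z = tf.math.l2_normalize(tf.concat([z1, z2], axis=0), axis=1)  # (2N, d)
        sim = tf.matmul(z, z, transpose_b=True) / temperature          # cosine similarities
        sim = sim - tf.eye(2 * n) * 1e9       # exclude self-similarity terms
        # The positive for index i is its other view at index i + N (and vice versa).
        labels = tf.concat([tf.range(n) + n, tf.range(n)], axis=0)
        loss = tf.keras.losses.sparse_categorical_crossentropy(labels, sim, from_logits=True)
        return tf.reduce_mean(loss)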

You can, of course, replace it and experiment with different contrastive losses.

References

If you use the above code for your research, please cite our papers:

"Which Augmentation Should I Use? An Empirical Investigation of Augmentations for Self-Supervised Phonocardiogram Representation Learning", currently under review.

@misc{ballas2023outofdistribution,
      title={On the Out-Of-Distribution Robustness of Self-Supervised Representation Learning for Phonocardiogram Signals}, 
      author={Aristotelis Ballas and Vasileios Papapanagiotou and Christos Diou},
      year={2023},
      eprint={2312.00502},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}

This repo is an extension of our initial work in "Listen2YourHeart: A Self-Supervised Approach for Detecting Murmur in Heart-Beat Sounds".

@INPROCEEDINGS{10081680,
  author={Ballas, Aristotelis and Papapanagiotou, Vasileios and Delopoulos, Anastasios and Diou, Christos},
  booktitle={2022 Computing in Cardiology (CinC)}, 
  title={Listen2YourHeart: A Self-Supervised Approach for Detecting Murmur in Heart-Beat Sounds}, 
  year={2022},
  volume={498},
  number={},
  pages={1-4},
  doi={10.22489/CinC.2022.298}
}

License

This source code is released under the MIT license.
