Selecting Data Augmentation for Simulating Interventions

by Maximilian Ilse (ilse.maximilian@gmail.com), Jakub M. Tomczak and Patrick Forré

Overview

PyTorch implementation of our paper "Selecting Data Augmentation for Simulating Interventions":

Ilse, M., Tomczak, J. M., & Forré, P. (2020). Selecting Data Augmentation for Simulating Interventions. https://arxiv.org/abs/2005.01856

Used modules

Python 3.6
PyTorch 1.0.1

Datasets

MNIST: http://yann.lecun.com/exdb/mnist/
PACS: https://domaingeneralization.github.io/

Pre-trained AlexNet

To reproduce our results on the PACS dataset, please use: https://drive.google.com/file/d/1wUJTH1Joq2KAgrUDeKJghP1Wf7Q9w4z-/view?usp=sharing

Story behind the paper

Everyone who works with medical imaging data eventually comes across the same problem: appearance variability. This variability is usually caused by the equipment used to acquire the images, e.g., CT scanners from different vendors generate images with different intensity patterns. If we train a CNN on data from a single scanner, it is likely to overfit to that scanner's specific intensity pattern. As a result, it is likely to fail to generalize to data from a different scanner.

In late 2018, we started working on domain generalization, i.e., learning invariant representations, motivated by the appearance variability in medical imaging data described above. In domain generalization, one tries to find a representation that generalizes across different environments, called domains, each with a different shift of the input.
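To make the setting concrete, the sketch below enumerates leave-one-domain-out splits, a common way to evaluate domain generalization: train on all domains but one, then test on the held-out domain. It is only an illustration and not necessarily the protocol used in this repository. The helper `leave_one_domain_out` is a hypothetical name, and the four domain names are those of the PACS dataset.

```python
from typing import List, Tuple


def leave_one_domain_out(domains: List[str]) -> List[Tuple[List[str], str]]:
    """Return one (training domains, held-out test domain) pair per domain.

    Hypothetical helper for illustration; not part of this repository.
    """
    return [
        ([d for d in domains if d != held_out], held_out)
        for held_out in domains
    ]


# The four domains of the PACS dataset.
pacs_domains = ["photo", "art_painting", "cartoon", "sketch"]

splits = leave_one_domain_out(pacs_domains)
for train_domains, test_domain in splits:
    print(f"train on {train_domains}, test on {test_domain}")
```

A model generalizes across domains in this sense if it performs well on each held-out domain without having seen any data from it during training.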

This eventually led to a model that we called the Domain Invariant Variational Autoencoder (DIVA, https://arxiv.org/abs/1905.10427, thanks to my co-authors!). While the results of DIVA were promising, a couple of experiments didn’t make it into the paper because DIVA’s performance didn’t match that of a simple baseline CNN. For a while, we attributed this to optimization issues and similar causes. During 2019, we realized that we had a very poor understanding of the problem itself.

Questions and Issues

If you find any bugs or have any questions about this code, please contact Maximilian. We cannot guarantee any support for this software.

Citation

Please cite our paper if you use this code in your research:

@article{ilse_selecting_2020,
	title = {Selecting {Data} {Augmentation} for {Simulating} {Interventions}},
	url = {http://arxiv.org/abs/2005.01856},
	urldate = {2020-05-06},
	journal = {arXiv:2005.01856 [cs, stat]},
	author = {Ilse, Maximilian and Tomczak, Jakub M. and Forré, Patrick},
	month = may,
	year = {2020},
	note = {arXiv: 2005.01856}
}

Acknowledgments

The work conducted by Maximilian Ilse was funded by the Nederlandse Organisatie voor Wetenschappelijk Onderzoek (Grant DLMedIa: Deep Learning for Medical Image Analysis).

