This GitHub repository contains the code for the reproducible experiments presented in our paper MMD-FUSE: Learning and Combining Kernels for Two-Sample Testing Without Data Splitting.
We provide the code to run the experiments which generate the figures and tables of our paper; these can be found in the figures directory.
To use MMD-FUSE in practice, we recommend using our mmdfuse package; more details are available on the mmdfuse repository.
Python 3.9
Only the jax and jaxlib packages are required to run MMD-FUSE (see mmdfuse); several other packages are required to run the other tests we compare against (see env_mmdfuse.yml and env_autogluon.yml).
In a chosen directory, clone the repository and change to its directory by executing
git clone git@github.com:antoninschrab/mmdfuse-paper.git
cd mmdfuse-paper
We then recommend creating and activating a virtual environment using conda
by running
conda env create -f env_mmdfuse.yml
conda env create -f env_autogluon.yml
conda activate mmdfuse-env
# conda activate autogluon-env
# can be deactivated by running:
# conda deactivate
The results of the six experiments can be reproduced by running the code in the notebooks: experiments_mixture.ipynb, experiments_perturbations.ipynb, experiment_perturbations_vary_kernel.ipynb, experiments_galaxy.ipynb, experiments_cifar.ipynb, and experiments_runtimes.ipynb.
The results are saved as .npy files in the results directory.
The figures of the paper can be obtained from these by running the code in the figures.ipynb notebook.
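For example, a saved result array can be inspected with numpy as in the sketch below; the file name used here is purely illustrative and not an actual output of the notebooks:

```python
import numpy as np

# load one of the saved result arrays (file name is illustrative only)
results = np.load("results/example_result.npy")
print(results.shape, results.mean())
```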
All the experiments consist of 'embarrassingly parallel' for loops; a significant speed-up can be obtained by using parallel computing libraries such as joblib or dask (see the sketch below).
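As a minimal sketch, the loop over repetitions could be parallelised with joblib as follows; run_single_repetition is a hypothetical stand-in for the body of one loop iteration in the experiment notebooks:

```python
from joblib import Parallel, delayed
import numpy as np

def run_single_repetition(seed):
    # hypothetical placeholder for one experiment repetition
    rng = np.random.default_rng(seed)
    return rng.standard_normal()

# run 100 independent repetitions across 8 worker processes
outputs = Parallel(n_jobs=8)(
    delayed(run_single_repetition)(seed) for seed in range(100)
)
outputs = np.array(outputs)
```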
- GalaxyMNIST: Galaxy Zoo DECaLS, Walmsley et al., 2021
- CIFAR-10: Learning Multiple Layers of Features from Tiny Images, Krizhevsky, 2009
- CIFAR-10.1: Do CIFAR-10 Classifiers Generalize to CIFAR-10?, Recht et al., 2018
The MMD-FUSE test is implemented in Jax as the function mmdfuse in mmdfuse.py. It requires only the jax and jaxlib packages.
To use our tests in practice, we recommend using our mmdfuse package, which is available on the mmdfuse repository. It can be installed by running
pip install git+https://github.com/antoninschrab/mmdfuse.git
Installation instructions and example code are available on the mmdfuse repository.
We also provide example code showing how to use MMD-FUSE in the demo_speed.ipynb notebook, which also contains speed comparisons between running the code on CPU and on GPU:
| Speed in s | Jax (GPU) | Jax (CPU) |
|---|---|---|
| MMD-FUSE | 0.0054 | 2.95 |
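As a quick illustration, a call to the test could look like the following sketch; the exact signature of mmdfuse is documented on the mmdfuse repository, and the sample shapes, random key usage, and return convention below are assumptions based on it:

```python
from jax import random
from mmdfuse import mmdfuse  # see the mmdfuse repository for installation

# draw two toy samples with a mean shift (shapes are illustrative)
key = random.PRNGKey(0)
key, subkey_x, subkey_y = random.split(key, num=3)
X = random.normal(subkey_x, shape=(500, 10))
Y = random.normal(subkey_y, shape=(500, 10)) + 0.5

# assumed output convention: 1 (reject the null) or 0 (fail to reject)
output = mmdfuse(X, Y, key)
print(int(output))
```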
Interpretable Distribution Features with Maximum Testing Power. Wittawat Jitkrittum, Zoltán Szabó, Kacper Chwialkowski, Arthur Gretton. (paper, code)
Learning Deep Kernels for Non-Parametric Two-Sample Tests. Feng Liu, Wenkai Xu, Jie Lu, Guangquan Zhang, Arthur Gretton, Danica J. Sutherland. (paper, code)
MMD Aggregated Two-Sample Test. Antonin Schrab, Ilmun Kim, Mélisande Albert, Béatrice Laurent, Benjamin Guedj, Arthur Gretton. (paper, code)
AutoML Two-Sample Test. Jonas M. Kübler, Vincent Stimper, Simon Buchholz, Krikamol Muandet, Bernhard Schölkopf. (paper, code)
Compress Then Test: Powerful Kernel Testing in Near-linear Time. Carles Domingo-Enrich, Raaz Dwivedi, Lester Mackey. (paper, code)
If you have any issues running the code, please do not hesitate to contact Antonin Schrab.
Centre for Artificial Intelligence, Department of Computer Science, University College London
Gatsby Computational Neuroscience Unit, University College London
Inria London
@article{biggs2023mmdfuse,
author = {Biggs, Felix and Schrab, Antonin and Gretton, Arthur},
title = {{MMD-FUSE}: {L}earning and Combining Kernels for Two-Sample Testing Without Data Splitting},
year = {2023},
journal = {Advances in Neural Information Processing Systems},
volume = {36}
}
MIT License (see LICENSE.md).