This repository contains the code to run KSD thinning and reproduce the results in our paper.
The KSD Thinning algorithm relies on the `PruningContainer` module inside the `ksdp` package.
The samples generated by most MCMC algorithms are correlated, so some samples are redundant. We can therefore thin the sample set, reducing computational and memory costs, without sacrificing downstream metrics. In the example below, once the new yellow sample is added, the red samples are no longer needed to approximate the distribution. Our algorithm prunes online, so redundant red samples are removed during the training process and only the yellow and green samples are kept.
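The online pruning idea can be sketched in a few lines: add each incoming sample to the particle set, then greedily drop any particle whose removal lowers the KSD. The sketch below is illustrative only, not the repository's `PruningContainer` API; the standard-normal target, the IMQ base kernel, and the function names `stein_kernel`, `ksd2`, and `thin_online` are all assumptions made for the example.

```python
import numpy as np

def stein_kernel(x, y):
    # Langevin-Stein kernel for a 1D standard-normal target (score s(x) = -x)
    # with an IMQ base kernel k(x, y) = (1 + (x - y)^2)^(-1/2).
    d = x - y
    r2 = 1.0 + d * d
    return r2**-1.5 * (1.0 - d * d) - 3.0 * d * d * r2**-2.5 + x * y * r2**-0.5

def ksd2(xs):
    # Squared KSD (V-statistic): mean of the Stein kernel over all pairs.
    xs = np.asarray(xs, dtype=float)
    return float(stein_kernel(xs[:, None], xs[None, :]).mean())

def thin_online(particles, new_sample):
    # Add the new sample, then greedily drop any particle whose removal
    # strictly lowers the squared KSD of the remaining set.
    particles = particles + [new_sample]
    improved = True
    while improved and len(particles) > 1:
        improved = False
        best = ksd2(particles)
        for i in range(len(particles)):
            trial = particles[:i] + particles[i + 1:]
            if ksd2(trial) < best:
                particles, improved = trial, True
                break
    return particles
```

By construction the pruned set never has a larger squared KSD than the unpruned set, so thinning does not hurt the quality of the approximation under this criterion.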
You may find the classes and functions in the `ksdp` package useful for fast, online, or low-memory KSD computation. The `PruningContainer` class in `KSDP/ksdp/ksdp/pruning_container.py` can recompute the KSD online without rematerializing the entire NxN kernel matrix. Unless specified otherwise, KSD computation is always done row by row, which avoids the O(N^2) memory cost when the number of particles is large.
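Row-by-row accumulation can be sketched as follows. This is an illustrative implementation, not the code in `pruning_container.py`: the IMQ base kernel with exponent `beta`, the `score` callback (gradient of the log target density), and the function name `ksd2_rowwise` are assumptions for the example. Each iteration materializes only one row of the Stein kernel matrix, so peak memory is O(N * dim) rather than O(N^2).

```python
import numpy as np

def ksd2_rowwise(xs, score, c2=1.0, beta=-0.5):
    # Squared KSD (V-statistic) accumulated one kernel row at a time.
    # xs: (n, dim) particles; score: callable returning grad log p at xs.
    # Base kernel: IMQ, k(x, y) = (c2 + ||x - y||^2)^beta.
    xs = np.asarray(xs, dtype=float)
    n, dim = xs.shape
    s = score(xs)                        # (n, dim) scores at every particle
    total = 0.0
    for i in range(n):
        d = xs[i] - xs                   # (n, dim) differences for row i
        dd = (d * d).sum(axis=1)         # ||x_i - x_j||^2
        r2 = c2 + dd
        # div_x div_y k term of the Langevin-Stein kernel:
        trace = -2*beta * (dim * r2**(beta-1) + 2*(beta-1) * dd * r2**(beta-2))
        # s(x).grad_y k and s(y).grad_x k cross terms:
        cross = 2*beta * r2**(beta-1) * ((d * s).sum(axis=1) - d @ s[i])
        # s(x).s(y) k(x, y) term:
        outer = (s @ s[i]) * r2**beta
        total += (trace + cross + outer).sum()
    return total / n**2
```

For example, `ksd2_rowwise(samples, lambda x: -x)` evaluates the squared KSD against a standard-normal target. Because the Stein kernel is positive semi-definite, the V-statistic is always non-negative.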
To create the conda environment `KSDP`, install the prerequisites, and install our `ksdp` package, run:

```bash
bash scripts/setup.sh
```
Scripts and instructions to reproduce our results for the BNN subspace experiments are available in `bnn_subspace`.
Scripts and instructions to reproduce our results for the bio MCMC problems are available in `bio`.
Scripts and instructions to reproduce our results for the parameter and distribution sensitivity studies are available in `toy`.