This repo contains code to reproduce the experiments in our manuscript:
Ryan Giordano, Runjing Liu, Michael I. Jordan, Tamara Broderick. "Evaluating sensitivity to the Stick Breaking Prior in Bayesian Nonparametrics." https://arxiv.org/abs/2107.03584
We evaluated local sensitivity on three data analysis problems:
- a Gaussian mixture model of the canonical iris data set;
- a regression model of time-course gene expression data;
- and a topic model inferring population structure from genetic data.
We recommend installing into a virtual environment, e.g. by running
python3 -m venv venv
source venv/bin/activate
To install the package used for all the models we consider, change to the root directory of the repository, and run
python3 -m pip install --upgrade pip
python3 -m pip install wheel
python3 -m pip install -e BNP_modeling
Dependencies include jax and the jax branch of paragami. These will be installed automatically with the command above.
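After installation, a quick import check can confirm that the dependencies resolved. The following is a minimal sketch, not part of this repository: the helper name `missing_modules` is hypothetical, and the import names `jax` and `paragami` are assumed to match the package names mentioned above.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found on the current path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed import names for the dependencies listed above.
    missing = missing_modules(["jax", "paragami"])
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All dependencies are importable.")
```

Run it with the virtual environment activated, so that the check uses the same interpreter as the install commands.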
Our iris experiments, mice experiments, and population genetics experiments are contained in the ./GMM_clustering/, ./GMM_regression_clustering/, and ./structure/ folders, respectively. To install libraries specific to those experiments, run
python3 -m pip install -e GMM_clustering
python3 -m pip install -e GMM_regression_clustering
python3 -m pip install -e structure
respectively.
Finally, you need to install a Jupyter kernel for the notebooks. With your virtual environment activated, run
python3 -m ipykernel install --user --name=bnp_sensitivity_public
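To confirm that the kernel was registered, one option is to query Jupyter for its kernel specs. This is a minimal sketch under the assumption that the `jupyter` command is on your PATH; the function names `kernel_names` and `installed_kernels` are hypothetical helpers, not part of this repository.

```python
import json
import subprocess


def kernel_names(kernelspec_json):
    """Extract sorted kernel names from `jupyter kernelspec list --json` output."""
    return sorted(json.loads(kernelspec_json).get("kernelspecs", {}))


def installed_kernels():
    """Ask Jupyter for the registered kernels (assumes `jupyter` is on PATH)."""
    result = subprocess.run(
        ["jupyter", "kernelspec", "list", "--json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return kernel_names(result.stdout)
```

If the install command above succeeded, `"bnp_sensitivity_public" in installed_kernels()` should evaluate to `True`.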
The results presented in our main paper are produced entirely within Jupyter notebooks.
In each experiment folder (./GMM_clustering/, ./GMM_regression_clustering/, and ./structure/), the jupyter subfolder contains notebooks to reproduce our results.
For example, the ./GMM_clustering/jupyter/parametric_sensitivity.ipynb file reproduces the parametric sensitivity results for our GMM/iris experiment (Figure 2 in the paper).
The expected number of in-sample clusters (left) and the expected number of predictive clusters (right) as a function of the GEM concentration parameter. The linear approximation is shown in red; results from re-fitting the variational approximation are shown in blue.
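The notebooks can also be executed without opening the Jupyter interface. The sketch below builds a `jupyter nbconvert` command that runs a notebook with the kernel installed above. The helper names `nbconvert_command` and `run_notebook` are hypothetical, and it is an assumption that a given notebook runs end to end without interactive steps or precomputed inputs.

```python
import subprocess


def nbconvert_command(notebook, kernel="bnp_sensitivity_public", output=None):
    """Build a `jupyter nbconvert` command that executes a notebook."""
    cmd = [
        "jupyter", "nbconvert",
        "--to", "notebook",
        "--execute",
        f"--ExecutePreprocessor.kernel_name={kernel}",
    ]
    if output is not None:
        # Optional base name for the executed copy of the notebook.
        cmd += ["--output", output]
    return cmd + [notebook]


def run_notebook(notebook, **kwargs):
    """Run the command; raises CalledProcessError if nbconvert exits with an error."""
    subprocess.run(nbconvert_command(notebook, **kwargs), check=True)
```

For example, `run_notebook("./GMM_clustering/jupyter/parametric_sensitivity.ipynb")` would attempt to execute the iris notebook from the root directory of the repository.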