
Partially factorized mean-field variational Bayes for high-dimensional probit models


augustofasano/Probit-PFMVB


Scalable and Accurate VB for Binary Regression

This repository is associated with the article by Fasano, Durante and Zanella (2020), "Scalable and Accurate Variational Bayes for High-Dimensional Binary Regression Models". Its key contribution is outlined below.

In this article we develop a novel variational approximation for the posterior distribution of the coefficients in high-dimensional probit models with Gaussian priors. Our method uses a representation with global variables (the regression coefficients) and local variables (one latent Gaussian variable per observation). Unlike classical mean-field approximations, it does not assume a fully factorized approximation; instead, it assumes a factorization only for the local variables.
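The factorization described above can be illustrated with a minimal Python sketch. The repository itself is written in R, and the names `pfm_vb` and `_inv_mills` are hypothetical.

The sketch assumes the standard probit data augmentation:

- y_i = 1(z_i > 0), with z_i ~ N(x_iᵀβ, 1);
- prior β ~ N(0, ν² I_p).

It follows one reading of the paper:

- q(β, z) = q(β | z) ∏ q(z_i);
- q(β | z) = N(V Xᵀz, V), with V = (ν⁻² I_p + XᵀX)⁻¹;
- each q(z_i) is N(μ_i, σ_i²) truncated to the half-line allowed by y_i, with σ_i² = 1/(1 − H_ii), H = X V Xᵀ and μ_i = σ_i² Σ_{j≠i} H_ij E[z_j].

Only n × n matrices are inverted, which suits the p > n setting. Treat it as an illustration and check every detail against functionsVariational.R.

```python
import math

import numpy as np


def _inv_mills(t):
    """Return phi(t) / Phi(t) for the standard normal, guarding the far left tail."""
    cdf = 0.5 * math.erfc(-t / math.sqrt(2.0))
    if cdf < 1e-300:
        # Deep left tail: phi(t) / Phi(t) is approximately -t - 1/t.
        return -t - 1.0 / t
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi) / cdf


def pfm_vb(X, y, nu2=1.0, max_iter=1000, tol=1e-8):
    """Hypothetical sketch of PFM-VB coordinate ascent for a probit model
    with prior beta ~ N(0, nu2 * I_p). Returns (mu, sigma2, zbar, beta_mean)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = X.shape[0]
    s = 2.0 * y - 1.0  # +1 where y_i = 1, -1 where y_i = 0

    # Only n x n matrices: Omega = (I_n + nu2 X X^T)^{-1} = I_n - H,
    # where H = X V X^T and V = (I_p / nu2 + X^T X)^{-1}.
    Omega = np.linalg.inv(np.eye(n) + nu2 * X @ X.T)
    H = np.eye(n) - Omega
    sigma2 = 1.0 / np.diag(Omega)  # equals 1 / (1 - H_ii)
    sigma = np.sqrt(sigma2)

    mu = np.zeros(n)
    zbar = np.zeros(n)  # current means of the truncated normals q(z_i)
    for _ in range(max_iter):
        max_change = 0.0
        for i in range(n):
            # Location of q(z_i): sigma_i^2 * sum_{j != i} H_ij * zbar_j.
            mu[i] = sigma2[i] * (H[i] @ zbar - H[i, i] * zbar[i])
            # Mean of N(mu_i, sigma_i^2) truncated to s_i * z_i > 0.
            new = mu[i] + s[i] * sigma[i] * _inv_mills(s[i] * mu[i] / sigma[i])
            max_change = max(max_change, abs(new - zbar[i]))
            zbar[i] = new
        if max_change < tol:
            break

    # Approximate posterior mean of beta: V X^T zbar = nu2 * X^T Omega zbar.
    beta_mean = nu2 * X.T @ (Omega @ zbar)
    return mu, sigma2, zbar, beta_mean


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(20, 50))  # more coefficients than observations
    y = (X @ rng.normal(size=50) + rng.normal(size=20) > 0).astype(float)
    mu, sigma2, zbar, beta_mean = pfm_vb(X, y, nu2=1.0)
    print(beta_mean[:5])
```

The sketch updates the local factors one at a time, as in coordinate ascent. The global factor q(β | z) is never updated, because under this factorization it equals the exact full conditional.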

This repository provides code and tutorials to implement the inference methods associated with this new result. The focus is on the medical applications to Alzheimer's and gastrointestinal lesion data outlined in Section 4 of the paper; see also Craig-Shapiro et al. (2011) for the Alzheimer's application and Mesejo et al. (2016) for the lesion study. The complete tutorial is in the file ApplicationTutorial.md. It also explains how to pre-process the original datasets, which are available in the R package AppliedPredictiveModeling and on the UCI repository, respectively. As explained in the tutorial, the results for the Parkinson and voice datasets can be obtained by running the same code as for the lesion example, after retrieving the corresponding datasets from the publicly available UCI repository.

The goal of the first part of the analysis (i.e., the Alzheimer's application) is to compare the proposed partially factorized mean-field (PFM) approximation with the state-of-the-art competitors that were feasible in this application. These include the classical mean-field (MF) variational approximation (Consonni and Marin, 2007) and Monte Carlo inference based on i.i.d. samples from the exact unified skew-normal posterior derived by Durante (2019). The latter also serves as a benchmark for assessing the accuracy of the approximate methods; see Section 2 of the article for details. We also tried Hamiltonian Monte Carlo (R package rstan) and expectation propagation (R package EPGLM), but these algorithms were impractical in this setting, so we do not focus on them in this repository. The second part instead compares the test deviances obtained with the PFM and MF approximations in the lesion study. In both parts, we also compare predictive performance against the spike-and-slab variational approximation of the posterior distribution for logistic regression developed by Ray et al. (2020).
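A test-deviance comparison needs two ingredients: predictive probabilities for held-out observations, and a score on them. The Python sketch below shows both for a Gaussian approximation q(β) = N(m, S), such as the MF one, using the identity E_q[Φ(xᵀβ)] = Φ(xᵀm / √(1 + xᵀSx)).

It assumes the deviance is the negative held-out log-likelihood; some definitions multiply this by 2. The exact definition used in the article may differ. The function names are hypothetical.

The PFM approximation of β is not Gaussian, so its predictive probability needs an extra step: averaging Φ(xᵀV Xᵀz / √(1 + xᵀVx)) over q(z), for example by Monte Carlo. That step is not shown here.

```python
import math
from statistics import NormalDist

import numpy as np

_STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def gaussian_probit_predictive(x_new, m, S):
    """P(y_new = 1) under a Gaussian approximation q(beta) = N(m, S).

    Uses E_q[Phi(x' beta)] = Phi(x' m / sqrt(1 + x' S x))."""
    x_new = np.asarray(x_new, dtype=float)
    loc = float(x_new @ np.asarray(m, dtype=float))
    var = float(x_new @ np.asarray(S, dtype=float) @ x_new)
    return _STD_NORMAL.cdf(loc / math.sqrt(1.0 + var))


def heldout_deviance(y_test, probs, eps=1e-12):
    """Negative log predictive likelihood of held-out binary labels."""
    total = 0.0
    for yi, pi in zip(y_test, probs):
        pi = min(max(pi, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= yi * math.log(pi) + (1 - yi) * math.log(1.0 - pi)
    return total


if __name__ == "__main__":
    # Hypothetical posterior summaries and test points, for illustration only.
    m = np.array([0.8, -0.3])
    S = 0.1 * np.eye(2)
    X_test = np.array([[1.0, 0.5], [-1.0, 2.0]])
    y_test = [1, 0]
    probs = [gaussian_probit_predictive(x, m, S) for x in X_test]
    print(heldout_deviance(y_test, probs))
```

Lower values indicate better held-out predictions. The posterior variance term xᵀSx pulls predictive probabilities toward 0.5, so the score also rewards well-calibrated uncertainty.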

The functions implementing the above methods are in the R source file functionsVariational.R, and a tutorial explaining their usage in detail is available in the file functionsTutorial.md.

All analyses were performed on a MacBook Pro (macOS Mojave, version 10.14.6; 2.7 GHz Intel Core i5 processor; 8 GB RAM) using R version 3.6.1.

IMPORTANT: Although a seed is set at the beginning of each routine, the outputs reported in ApplicationTutorial.md may vary slightly depending on the versions of the R packages used. This is because the internals of certain functions may change when a package is updated. However, these variations are negligible and do not affect the conclusions.
