Generalizing Phylogenetic Posterior Estimator from MCMC samples via subsplit Bayesian networks

zcrabbit/sbn

Subsplit Bayesian Networks for Generalizing Phylogenetic Posterior Estimation

Thank you for your interest in our paper: Generalizing Tree Probability Estimation via Bayesian Networks.

Please consider citing the paper when any of the material is used for your research.

@incollection{NIPS2018_7418,
title = {Generalizing Tree Probability Estimation via Bayesian Networks},
author = {Zhang, Cheng and Matsen IV, Frederick A},
booktitle = {Advances in Neural Information Processing Systems 31},
editor = {S. Bengio and H. Wallach and H. Larochelle and K. Grauman and N. Cesa-Bianchi and R. Garnett},
pages = {1449--1458},
year = {2018},
publisher = {Curran Associates, Inc.},
url = {http://papers.nips.cc/paper/7418-generalizing-tree-probability-estimation-via-bayesian-networks.pdf}
}

Dependencies

Basic Usage

Load MCMC sample

from utils import summary, mcmc_treeprob
# for golden runs
tree_dict_total, tree_names_total, tree_wts_total = summary(dataname, data_directory)
# for sample runs
tree_dict, tree_names, tree_wts = mcmc_treeprob(path_to_data, 'nexus')
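
The three return values are easiest to picture on a toy sample. The sketch below is not the repository's `mcmc_treeprob`: it assumes the sampled topologies are already available as canonical Newick strings and that the weights are relative sample frequencies. The helper name `summarize_sample` and the `tree_N` naming scheme are hypothetical, chosen only for illustration.

```python
from collections import Counter


def summarize_sample(newick_samples):
    """Collapse a list of sampled topologies into unique trees and frequencies.

    Hypothetical helper for illustration only. It does not parse or
    canonicalize trees, so identical topologies must share an identical string.
    """
    counts = Counter(newick_samples)
    total = sum(counts.values())
    tree_dict, tree_names, tree_wts = {}, [], []
    # most_common() orders the unique trees from most to least frequent
    for i, (newick, count) in enumerate(counts.most_common(), start=1):
        name = f"tree_{i}"
        tree_dict[name] = newick
        tree_names.append(name)
        tree_wts.append(count / total)
    return tree_dict, tree_names, tree_wts


samples = ["((A,B),C,D);", "((A,B),C,D);", "((A,C),B,D);", "((A,B),C,D);"]
toy_dict, toy_names, toy_wts = summarize_sample(samples)
print(toy_names, toy_wts)
```

Under these assumptions, each entry of the weight list lines up with the entry of the name list at the same index, which is the relationship the training calls rely on.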

Run SBN

from models import SBN

# parameters to set up the model
#   @taxa is the list of taxa in the dataset
#   @emp_tree_freq is a dictionary of empirical tree frequencies; it can be left as None if KL divergence computation is not required.
model = SBN(taxa, emp_tree_freq)

# parameters to train the model
#   @tree_dict is the unique tree dictionary
#   @tree_names is the name list of the trees
#   @tree_wts is the list of frequencies for the trees named in tree_names, in the same order

# run sbn-sa
model.bn_train_prob(tree_dict, tree_names, tree_wts)
# run sbn-em
logp = model.bn_em_prob(tree_dict, tree_names, tree_wts, maxiter=200, abstol=1e-05, monitor=True, MAP=False)
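
The names `maxiter` and `abstol` suggest a common stopping rule for iterative fitting: stop after a fixed number of iterations, or earlier once the objective changes by less than an absolute tolerance. The sketch below illustrates that generic rule only. Whether `bn_em_prob` applies exactly this criterion is an assumption, and `run_until_converged` and `toy_step` are hypothetical names that are not part of the repository.

```python
def run_until_converged(step, maxiter=200, abstol=1e-5):
    """Call step() repeatedly and return the list of objective values.

    `step` is a hypothetical callable that performs one update and returns
    the current objective value (for example, a log-likelihood).
    """
    history = []
    for _ in range(maxiter):
        history.append(step())
        # Stop once two consecutive objective values differ by less than abstol
        if len(history) > 1 and abs(history[-1] - history[-2]) < abstol:
            break
    return history


# Toy objective that approaches 0 from below, halving its distance each step
state = {"value": -1.0}


def toy_step():
    state["value"] /= 2.0
    return state["value"]


trace = run_until_converged(toy_step, maxiter=200, abstol=1e-5)
print(len(trace), trace[-1])
```

Returning the whole trace rather than only the final value makes it easy to plot convergence afterwards.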

Once trained, the model can compute the SBN probability of a tree

sbn_est_prob = model.bn_estimate(tree)
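
To score every tree in a sample, the single-tree call can be wrapped in a loop. This is a minimal sketch that assumes `tree_dict` is keyed by the names in `tree_names` and that `bn_estimate` returns one number per tree. `estimate_all` and `StubModel` are hypothetical: the stub stands in for a trained model so that the snippet runs on its own.

```python
class StubModel:
    """Stand-in for a trained model; returns a fixed probability per tree."""

    def __init__(self, probs):
        self._probs = probs

    def bn_estimate(self, tree):
        return self._probs[tree]


def estimate_all(model, tree_dict, tree_names):
    """Return {name: estimated probability} for every named tree."""
    return {name: model.bn_estimate(tree_dict[name]) for name in tree_names}


# Toy data: trees are plain strings here, not real tree objects
stub_trees = {"tree_1": "((A,B),C,D);", "tree_2": "((A,C),B,D);"}
stub = StubModel({"((A,B),C,D);": 0.7, "((A,C),B,D);": 0.3})
est_probs = estimate_all(stub, stub_trees, ["tree_1", "tree_2"])
print(est_probs)
```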

When emp_tree_freq is provided, one can evaluate the KL divergence

sbn_kl_div = model.kl_div(method='bn')['bn']
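
For reference, the KL divergence from an empirical tree distribution to an estimated one can be written out directly. The sketch below is not the repository's `kl_div`. The direction (empirical to estimate), the natural logarithm, and the `eps` floor for trees missing from the estimate are all assumptions made for illustration, and `kl_divergence` is a hypothetical name.

```python
import math


def kl_divergence(emp_freq, est_prob, eps=1e-300):
    """KL(emp || est) in nats, summed over trees with nonzero empirical frequency.

    Both arguments map a tree identifier to a probability. Estimated
    probabilities that are missing or zero are floored at eps (an assumption)
    so that the logarithm stays finite.
    """
    total = 0.0
    for tree, p in emp_freq.items():
        if p <= 0.0:
            continue  # terms with p = 0 contribute nothing
        q = max(est_prob.get(tree, 0.0), eps)
        total += p * math.log(p / q)
    return total


emp = {"tree_1": 0.5, "tree_2": 0.5}
est = {"tree_1": 0.25, "tree_2": 0.75}
print(kl_divergence(emp, emp), kl_divergence(emp, est))
```

The divergence is zero when the two distributions agree and grows as the estimate assigns less mass to frequently sampled trees.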

See the Jupyter notebooks for more detailed examples.
