The Markov Boundary Toolkit (MBTK)

MBTK is a library that assists in the research and development of Markov boundary discovery algorithms. It contains helpful tools, such as a Bayesian network sampler, data set generators and various components used to configure and run various experiments.

The MBTK itself was already used to build successful experiments. Most notably, the experiments/experiment_dcmi folder in MBTK contains the configuration and customizations needed to evaluate the dcMI optimization for the computation of the G-test, as part of a comparative experiment involving the IPC-MB algorithm (published at MDPI Entropy).

If you rely on MBTK in your work in any way, consider citing the article itself (DOI https://doi.org/10.3390/e23111501) and/or this repository (see the CITATION.cff file in the root of the repository).

A previous iteration of MBTK (see MBFF) was used to build an experiment that evaluated 4 individual optimizations for the older KS algorithm (published at ROMJIST, an open-access journal).

For a demonstration of MBTK, see below.

Dependencies

The following Python packages are imported by MBTK.

numpy
scipy
pytest
lark-parser
pympler
matplotlib
pudb

Please install them with pip install --user -r requirements.txt.

Alternatively, use pip install --user numpy scipy pytest lark-parser pympler matplotlib pudb directly.

Components

The MBTK contains many utilities, but also complete implementations of algorithms, data structures and optimizations.

Utilities

Bayesian Interchange Format (BIF) reader, which instantiates a Bayesian network from a BIF file.
Bayesian network sampler, which generates large amounts of fully defined samples from a Bayesian network, i.e. each sample contains a value for each variable in the network.
Data set generators with multiple sources: Bayesian networks, synthetic binary variables or the Reuters Corpus Volume 1 v2.
Experiment infrastructure utilities, such as experimental definition classes and algorithm runners, which manage and measure individual algorithm runs and also collect their results.

Data structures

Bayesian network, which can be constructed either from a BIF file or programmatically; includes an implementation of the d-separation algorithm to test for conditional independence.
AD-trees, which can represent an entire data set completely and allow for very fast access to sample counts, but can consume a lot of memory; both static and dynamic AD-trees are provided (see the static AD-tree article and the dynamic AD-tree article); data structures used by the AD-trees are also provided, namely implementations for contingency tables and contingency trees.

Algorithms

Iterative Parent-Child Markov Blanket algorithm (IPC-MB), implemented in a simple and general way, isolated from any independence test; for details on IPC-MB, see the original article.

Mathematics

G-test implemented in the canonical form; performs queries on the data set alone; used mostly as a baseline for efficiency comparisons.
G-test implemented with AD-trees (configurable for static or dynamic tree); when using the static AD-tree, the data set is not queried for independence tests at all, but the tree must be built completely before the algorithm starts; when using the dynamic AD-tree, no building step is required before the algorithm starts, but access to the data set becomes required when a query cannot be satisfied by the tree alone.
G-test implemented with the dcMI optimization, which decomposes the G statistic into joint entropy terms and caches them for future reuse; access to the data set is required when a test cannot be computed with the cached terms alone.

Demonstration

Make sure the dependencies have been installed before attempting to run the demonstration.

A demonstrative experiment can be performed by running python demo.py in the root MBTK folder. The demo.py script will run the IPC-MB algorithm for each of the 37 variables of a synthetic data set. This experiment consists of two runs: one where IPC-MB is optimized with a dynamic AD-tree and one where IPC-MB is optimized with dcMI. This demonstrative experiment compares dcMI and the dynamic AD-tree, highlighting their difference in efficiency when added to IPC-MB.

Name		Name	Last commit message	Last commit date
Latest commit History 557 Commits
doc		doc
experiments		experiments
mbtk		mbtk
meta		meta
tests		tests
.flake8		.flake8
.gitignore		.gitignore
CITATION.cff		CITATION.cff
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
adtree-server.py		adtree-server.py
control_env.sh		control_env.sh
demo.py		demo.py
dof-analysis.py		dof-analysis.py
download_rcv1v2.sh		download_rcv1v2.sh
mypy.ini		mypy.ini
profiling.py		profiling.py
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

The Markov Boundary Toolkit (MBTK)

Dependencies

Components

Utilities

Data structures

Algorithms

Mathematics

Demonstration

About

Releases

Packages

Languages

License

camilbancioiu/mbtk

Folders and files

Latest commit

History

Repository files navigation

The Markov Boundary Toolkit (MBTK)

Dependencies

Components

Utilities

Data structures

Algorithms

Mathematics

Demonstration

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages