
Big Data vs. complex physical models - a scalable inference algorithm

An algorithm for fitting models to many data sets at once, yielding parameter probability distributions for each. The key idea is that model evaluations are efficiently re-used between data sets, so the total cost scales sub-linearly with the number of data sets.

See paper for details: https://arxiv.org/abs/1707.04476
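A minimal sketch of the key idea (hypothetical model and data, not taken from this repository): a single expensive model evaluation is compared against N data sets at once, so its cost is shared between them.

import numpy as np

N, M = 1000, 50                        # number of data sets, points per data set
x = np.linspace(-5, 5, M)
data = np.random.normal(size=(N, M))   # stand-in for the real observations
sigma = 1.0                            # assumed known noise level

def model(theta, x):
    # placeholder for an expensive physical model
    return theta[0] * np.exp(-0.5 * ((x - theta[1]) / theta[2]) ** 2)

prediction = model((1.0, 0.0, 1.0), x)  # evaluated once ...
loglikes = -0.5 * np.sum(((data - prediction) / sigma) ** 2, axis=1)
# ... but yielding N Gaussian log-likelihoods, one per data set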

How to run

You need to install:

  • python-igraph
  • numpy, scipy
  • h5py
  • progressbar
  • gcc
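The Python dependencies can usually be installed with pip (the names below are the PyPI package names and may differ for your distribution; gcc comes from your system package manager):

$ pip install python-igraph numpy scipy h5py progressbar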

Then run:

$ # build
$ make
$ # simulate data set
$ python gensimple_horns.py 10000
$ # analyse
$ OMP_NUM_THREADS=4 python sample.py data_widths_10000.hdf5 100
$ # simulate no-signal data set
$ python gennothing.py 10000
$ # analyse
$ OMP_NUM_THREADS=4 python sample.py data_nothing_10000.hdf5 10000
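To check what a simulation script produced, the generated HDF5 file can be inspected generically with h5py (a sketch; it makes no assumptions about the dataset names inside the file):

import h5py

# print every group and dataset stored in the generated file
with h5py.File('data_widths_10000.hdf5', 'r') as f:
    f.visititems(lambda name, obj: print(name, obj))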

See the paper linked above for details.

Improving Performance

See TODO.

Implementation notes and code organisation

  • sample.py sets up everything.
  • Define your problem (parameters, model, likelihood) in sample.py.
  • Integrator: multi_nested_integrator.py. Calls the sampler repeatedly.
  • Joint sampler: multi_nested_sampler.py. Manages the graph and the queues, and decides which live points to use for a new draw. Calls draw_constrained.
  • The queues (as described in the paper) are called shelves in the code.
  • RadFriends: hiermetriclearn.py. Suggests new samples from the live points and filters them with the likelihood function, returning a point of higher likelihood (sketched after this list).
  • clustering/: fast C implementations for checking whether a point is in the neighbourhood and for computing safe distances.
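For orientation, here is a heavily simplified sketch of the RadFriends-style constrained draw (hypothetical code, not the actual hiermetriclearn.py implementation; the learning of a safe radius is omitted): propose uniformly from the union of balls around the live points, then filter with the likelihood.

import numpy as np

def draw_constrained_sketch(live_points, radius, loglike, threshold, rng):
    # live_points: (K, d) array of the current live points
    K, d = live_points.shape
    while True:
        # propose uniformly inside a ball around a randomly chosen live point
        center = live_points[rng.integers(K)]
        u = rng.normal(size=d)
        u *= radius * rng.random() ** (1.0 / d) / np.linalg.norm(u)
        candidate = center + u
        # accept with probability 1/n, where n counts the balls containing
        # the candidate, so that the union of balls is sampled uniformly
        n = np.sum(np.linalg.norm(live_points - candidate, axis=1) < radius)
        if rng.random() > 1.0 / n:
            continue
        # filter with the likelihood function: only return a higher point
        if loglike(candidate) > threshold:
            return candidate

A call would pass rng=np.random.default_rng(); the real code additionally chooses the radius from the live points themselves, which is the part handled by clustering/.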
