Emcee and MPI with mpi4py #310

Open
FabRez opened this issue Sep 25, 2019 · 3 comments
FabRez commented Sep 25, 2019

General information:

  • emcee version: 3.0rc2
  • platform: Ubuntu
  • installation method (pip/conda/source/other?): pip install

Problem description:

Hi all,

I am pretty new to MPI parallelization, and this is my first attempt at using it in a Python code.

I have tried parallelizing emcee with mpi4py in a different way from the minimal examples proposed in the code's documentation (which actually use multiprocessing).

The idea behind this workaround is that I want each process to write its own "backend" file while emcee is running, which, to my knowledge so far, cannot be done using the "pool" option of the sampler. Furthermore, compared to the "pool" method, I can draw many more models at the same time, with speed comparable to running the code without parallelization.

What I would like to know is whether this is the right way to obtain what I wanted to achieve, or if there is another way to do it using the "pool" method. The workaround works fine, but I am concerned about the results I am drawing from it: are the results of each MCMC process really independent of one another in this way, or is there something I am missing?

Best,
F.

Minimal example:

import numpy as np
import emcee
from mpi4py import MPI

def lnlike(theta, x, y, yerr):
    m, b, lnf = theta
    model = m * x + b
    inv_sigma2 = 1.0 / (yerr**2 + model**2 * np.exp(2 * lnf))
    return -0.5 * np.sum((y - model)**2 * inv_sigma2 - np.log(inv_sigma2))

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Placeholder data and starting point (x, y, yerr, and p0 were not defined in
# the original snippet; any dataset and initial guess for (m, b, lnf) will do)
x = np.sort(10 * np.random.rand(50))
yerr = 0.1 + 0.5 * np.random.rand(50)
y = 0.5 * x + 1.0 + yerr * np.random.randn(50)
p0 = np.array([0.5, 1.0, -1.0])

sampler = {}
backends = {}

# each MPI rank writes its chain to its own file
filename = "examples_" + str(rank + 1) + ".h5"

ndim, nwalkers = 3, 10

# use the backends option of emcee to store the chains
backends[str(rank)] = emcee.backends.HDFBackend(filename)
backends[str(rank)].reset(nwalkers, ndim)

# create an initialization for the walkers
pos = [p0 + 1e-4 * np.random.randn(ndim) for i in range(nwalkers)]

sampler[str(rank)] = emcee.EnsembleSampler(
    nwalkers, ndim, lnlike, args=(x, y, yerr), backend=backends[str(rank)]
)

# start the actual MCMC process
sampler[str(rank)].run_mcmc(pos, 10000, progress=True)
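
For reference, a script like this is launched with one process per chain, e.g. mpiexec -n 4 python script.py (assuming the snippet is saved as script.py), which would produce examples_1.h5 through examples_4.h5.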


dfm commented Sep 26, 2019

There isn't anything fundamentally wrong with this approach (it can actually be useful, since you have many independent ensembles in this case). But one thing that you need to be careful about is making sure that the random number generator on each process gets properly initialized, or you'll end up with chains that are correlated or even identical.
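
One simple way to do that in the script above is a rank-dependent seed for NumPy's global generator, set before pos and the sampler are created. A minimal sketch (the base seed of 42 is arbitrary):

import numpy as np
from mpi4py import MPI

rank = MPI.COMM_WORLD.Get_rank()

# Seed NumPy's global generator differently on each MPI process so that the
# walker initializations (and anything else drawn from np.random) differ
# from rank to rank.
np.random.seed(42 + rank)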

wenbinlu commented:

I ran into a similar situation. Is it possible to random.seed() the random numbers used in emcee? Or does emcee already adopt a particular way of seeding the random number generator (in which case I won't have control over it)?


dfm commented Oct 31, 2019

The EnsembleSampler creates a RandomState (that it owns) when it's initialized, so the easiest way to have control over the state is to call np.random.seed before initializing the sampler. It would be good to add a seed argument to the sampler, but that's not yet implemented.
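
Concretely, something like the following sketch (log_prob and the seed offset are placeholders), where the np.random.seed call comes before the sampler is constructed:

import numpy as np
import emcee
from mpi4py import MPI

rank = MPI.COMM_WORLD.Get_rank()

# Seed BEFORE constructing the sampler, since the sampler sets up its
# RandomState at initialization time.
np.random.seed(1234 + rank)

def log_prob(theta):
    # placeholder log-probability: a standard Gaussian
    return -0.5 * np.sum(theta**2)

ndim, nwalkers = 3, 10
sampler = emcee.EnsembleSampler(nwalkers, ndim, log_prob)
p0 = np.random.randn(nwalkers, ndim)
sampler.run_mcmc(p0, 100)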
