Large memory consumption #57

Closed
jmatta1 opened this issue Sep 8, 2015 · 2 comments

Comments

jmatta1 commented Sep 8, 2015

I am using the triangle plots package together with emcee in a very simple-minded way. Using emcee, I generated about 1,000,000 samples in 8 dimensions (1000 walkers, 1050 steps each, with the first 50 steps discarded) and then invoked the corner plot. Here is a snippet of the code:

sampler = emcee.EnsembleSampler(1000, ndims,
                                ln_post_prob, args=(cs_lib, struct, bnds))
sampler.run_mcmc(starts, 1050)
num_samples = (1000 * (1050 - 50))
samples = sampler.chain[:, 50:, :].reshape((num_samples, ndims))
lbls = [r"$a_{%d}$" % i for i in range(ndims)]
ranges = [(-0.001, (0.001 + 1.1*samples[:,i].max())) for i in range(ndims)]
fig = tplot.corner(samples, labels=lbls, extents=ranges)
fig.savefig(fig_file_name)

During the MCMC sampling, memory consumption of the Python process is about 132MB. When the system attempts to create a corner plot using this package immediately after the sampling, memory consumption jumps to 840MB.

Repeating the exercise with 8,000,000 samples in 8 dimensions (4000 walkers, 2050 steps each) gives memory consumption of about 624MB during sampling, which then climbs to 4.8GB during corner plot creation.

I recognize that the growth in consumption (as a function of number of samples) is less than linear, but 4.8GB to make a corner plot for 8,000,000 samples in 8 dimensions seems excessive to me.
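
For scale, here is a rough estimate of how much of that memory the samples themselves would occupy. This is only a sketch: it assumes the chain is stored as 8-byte floats (float64, the NumPy default), which the snippet above does not show, and the helper name `chain_megabytes` is made up for illustration.

```python
import numpy as np


def chain_megabytes(n_walkers, n_steps, n_dims, dtype=np.float64):
    """Size in MB (10**6 bytes) of an array of shape (n_walkers, n_steps, n_dims).

    The dtype is an assumption; float64 is NumPy's default float type.
    """
    n_values = n_walkers * n_steps * n_dims
    return n_values * np.dtype(dtype).itemsize / 1e6


# The two runs described above: walkers, steps per walker, dimensions.
small = chain_megabytes(1000, 1050, 8)
large = chain_megabytes(4000, 2050, 8)

print("small run: about %.0f MB of samples" % small)
print("large run: about %.0f MB of samples" % large)
```

Under that assumption the chains account for roughly 67MB and 525MB respectively, so the totals seen during plotting (840MB and 4.8GB) are several times the size of the samples.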

If this is a problem in the underlying matplotlib, I apologize for raising the issue here. If there is some set of options that would reduce memory consumption and that I neglected to use, then I apologize for not seeing them.

Owner

dfm commented Sep 8, 2015

This is always a problem with matplotlib... triangle shouldn't normally copy the samples, so that won't be the problem. I'd recommend thinning your chains by some fraction of the autocorrelation time (MCMC samples aren't independent), because I really can't see a reason why you would ever need to plot 8M samples. Another option is to make a few corner plots with subsets of the parameters instead of the full parameter space in one go.
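
Both suggestions can be sketched with plain NumPy slicing. This is a minimal illustration, not code from triangle or emcee: the helper names `thin_chain` and `select_params` are hypothetical, the random array is only a stand-in for `sampler.chain`, and `thin=20` is a placeholder that would in practice be chosen from an estimate of the autocorrelation time.

```python
import numpy as np


def thin_chain(chain, burn, thin):
    """Drop the first `burn` steps, keep every `thin`-th step, and flatten.

    `chain` is assumed to have shape (n_walkers, n_steps, n_dims),
    matching the layout used in the snippet earlier in this thread.
    """
    kept = chain[:, burn::thin, :]
    return kept.reshape(-1, chain.shape[-1])


def select_params(samples, columns):
    """Keep only the listed parameter columns of a flat (n_samples, n_dims) array."""
    return samples[:, columns]


# Stand-in for sampler.chain: 100 walkers, 1050 steps, 8 dimensions.
rng = np.random.default_rng(0)
chain = rng.normal(size=(100, 1050, 8))

# thin=20 is a placeholder value, not a recommendation.
flat = thin_chain(chain, burn=50, thin=20)

# Two smaller plots instead of one 8-dimensional one.
first_half = select_params(flat, [0, 1, 2, 3])
second_half = select_params(flat, [4, 5, 6, 7])

print(flat.shape, first_half.shape, second_half.shape)
# (5000, 8) (5000, 4) (5000, 4)
```

Each of the smaller arrays could then be passed to the same corner call as in the original snippet, with the labels and extents lists sliced to match.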

Hope that helps!

Author

jmatta1 commented Sep 8, 2015

It certainly does. I was plotting 8M samples because the chi^2 function seems to be multimodal (though I cannot see it in the plots), and I was wondering if any evidence of that would appear in the probability distributions with enough sampling.

Thanks again!

jmatta1 closed this as completed Sep 8, 2015