HDF backend slow #393
Comments
Yes, this is a nice proposal and I'd be happy to review such a PR, but I'm not immediately sure how painful it would be to implement. If you're planning on thinning your chain eventually, you could use the …
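The general idea behind thinning can be illustrated on its own: if only every k-th state reaches the storage layer, the number of writes drops by a factor of k. The following is a minimal, library-agnostic sketch in plain Python. `thin` is a hypothetical helper, not part of emcee's API, and `states` stands for whatever iterator of sampler states you are consuming.

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def thin(states: Iterable[T], every: int) -> Iterator[T]:
    """Yield only every `every`-th state; the first one kept is the `every`-th."""
    if every < 1:
        raise ValueError("every must be >= 1")
    # islice(start=every - 1, step=every) keeps positions every-1, 2*every-1, ...
    return islice(states, every - 1, None, every)
```

For example, `thin(range(1, 11), 3)` yields 3, 6 and 9. Only the kept states would then be written, so a run of 10,000 steps thinned by 10 touches the file 1,000 times instead of 10,000.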
I was about to suggest this as well. I even opened a discussion on the emcee Google Group, and in the fourth reply a user gave a small example of how to run emcee as an iterator, which I was already doing. I was now trying to figure out how to periodically write the chain to disk. One idea I had was to save the chain at the same time as computing the autocorrelation time, in parallel: for longer chains that computation can take a while, and this way you would make the most of that time. In the meantime I'll keep trying to figure it out myself, but having this by default would be great!
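The idea of overlapping the disk write with the autocorrelation estimate can be sketched with a single background thread. At each checkpoint, a snapshot of the chain is handed to a writer thread, and the diagnostic is computed in the main thread in the meantime. This is a minimal sketch under assumptions. `run_with_checkpoints`, `save` and `diagnostic` are hypothetical names. `steps` stands for the sampler used as an iterator. `save` stands for a function that writes the chain to disk, for example to an HDF5 file. `diagnostic` stands for an autocorrelation-time estimate.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, Sequence


def run_with_checkpoints(
    steps: Iterable[Sequence[float]],
    check_every: int,
    save: Callable[[List[Sequence[float]]], None],
    diagnostic: Callable[[List[Sequence[float]]], float],
) -> List[float]:
    """Consume `steps`; every `check_every` steps, save a snapshot of the chain
    in a background thread while computing `diagnostic` in the calling thread."""
    chain: List[Sequence[float]] = []
    results: List[float] = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        for i, step in enumerate(steps, start=1):
            chain.append(step)
            if i % check_every == 0:
                snapshot = list(chain)  # fixed view for the writer thread
                future = pool.submit(save, snapshot)  # write starts in background
                results.append(diagnostic(snapshot))  # runs while the write proceeds
                future.result()  # wait for the write and re-raise any error
    return results
```

Because of Python's GIL, the two tasks only truly overlap when at least one of them releases it, as file I/O and most NumPy routines do. Steps taken after the last checkpoint are not saved by this loop, so a final `save(chain)` after the run may be wanted.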
I've been digging through the source code: each time a step is computed, it's saved to the backend using the method … The same thing happens when getting a value from this backend, with the method … I was trying to modify the … If so, then a way to communicate the buffer to the backend would have to be added, and at the moment there doesn't seem to be any way of communicating with the backend.
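One way to give the storage layer a buffer, as considered above, is to make the buffer size a constructor parameter of a wrapper that sits between the sampler and the file. The following is a minimal sketch in plain Python, not emcee's actual backend interface. `BufferedWriter`, `append`, `flush` and `write_batch` are hypothetical names. `write_batch` stands for a function that opens the HDF5 file once and appends a whole batch of steps.

```python
from typing import Any, Callable, List


class BufferedWriter:
    """Collect steps in memory and pass them to `write_batch` in groups of
    `buffer_size`, so the file is opened once per batch instead of per step."""

    def __init__(self, write_batch: Callable[[List[Any]], None], buffer_size: int = 100) -> None:
        if buffer_size < 1:
            raise ValueError("buffer_size must be >= 1")
        self._write_batch = write_batch
        self.buffer_size = buffer_size
        self._buffer: List[Any] = []

    def append(self, step: Any) -> None:
        self._buffer.append(step)
        if len(self._buffer) >= self.buffer_size:
            self.flush()

    def flush(self) -> None:
        # Write whatever is pending, possibly a partial batch.
        if self._buffer:
            self._write_batch(self._buffer)
            self._buffer = []  # new list, so the written batch is never mutated

    def __enter__(self) -> "BufferedWriter":
        return self

    def __exit__(self, *exc: Any) -> bool:
        self.flush()  # make sure the final partial batch reaches disk
        return False
```

The trade-off is that up to `buffer_size - 1` steps can be lost if the node dies between flushes. Any code that reads the chain back from the file should call `flush()` first, so that it also sees steps that are still buffered.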
General information:
I am looking at your saving example at https://emcee.readthedocs.io/en/v3.0.2/tutorials/monitor/
Problem description:
I very much like the flexibility offered by the HDF backend: being able to save my chain to a file and continue at a later point in time (especially as a backup for long computations when my CPU node dies). However, when I have a fast log_prob function, the overhead of opening/writing/closing the HDF file on each iteration seems disproportionately high, and overall performance is painfully slow.
Expected behavior:
Perhaps an easy solution would be an option to save the chain state only every n-th iteration (where n is some adjustable number, or n is calculated based on the relative progress). This could save some overhead by opening/closing the HDF file only once in a while.
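One possible reading of "n calculated based on the relative progress" is to save whenever the run has advanced by a fixed fraction of its total length, always including the final iteration. This is a minimal sketch of that schedule in plain Python. `checkpoint_iterations` and its parameters are hypothetical, not an existing emcee option.

```python
import math
from typing import List


def checkpoint_iterations(total: int, fraction: float) -> List[int]:
    """1-based iteration numbers at which to save, so that a checkpoint happens
    roughly every `fraction` of the run; the last iteration is always included."""
    if total < 1 or not 0 < fraction <= 1:
        raise ValueError("need total >= 1 and 0 < fraction <= 1")
    every = max(1, math.ceil(total * fraction))
    points = list(range(every, total + 1, every))
    if not points or points[-1] != total:
        points.append(total)
    return points
```

For instance, `checkpoint_iterations(10, 0.25)` gives `[3, 6, 9, 10]`. Inside the sampling loop, one would check membership in `set(checkpoint_iterations(...))` and write to the HDF file only at those iterations, so the number of file opens no longer scales with the chain length.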