Feature Request: Define breaks for stopping #4

Open
peclayson opened this issue Apr 29, 2022 · 4 comments
Comments

@peclayson

I've been playing around with the package a bit. It's really cool!

I read the documentation for chkpt_brms, and I was wondering whether it would be possible in a future release to define actual breaks programmatically (so a user doesn't have to rely on the 'stop' button). I'm hoping for a way to circumvent the need for a user to interact with the fitting.

E.g., with iter_warmup = 5000, iter_sampling = 15000, and iter_perchkpt = 1000, a separate argument could force the fitting to stop every 5,000 iterations, so that a later call to chkpt_brms could pick up where it left off.
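To make the arithmetic in that example concrete, here is a minimal sketch of the break schedule it implies. This is not part of the package: the function name `stop_points` and the `stop_every` argument are hypothetical, and the sketch assumes that warmup and sampling iterations are counted together and that breaks fall on checkpoint boundaries.

```python
def stop_points(iter_warmup, iter_sampling, iter_per_chkpt, stop_every):
    """Return the cumulative iteration counts at which fitting would pause.

    Hypothetical helper: `stop_every` is an assumed argument, not an
    existing chkpt_brms parameter.
    """
    if stop_every <= 0 or iter_per_chkpt <= 0:
        raise ValueError("stop_every and iter_per_chkpt must be positive")
    # Assumption: a break can only happen right after a checkpoint is written.
    if stop_every % iter_per_chkpt != 0:
        raise ValueError("stop_every must be a multiple of iter_per_chkpt")

    total = iter_warmup + iter_sampling
    points = list(range(stop_every, total + 1, stop_every))
    # If the total is not a multiple of stop_every, the last segment is shorter.
    if total % stop_every != 0:
        points.append(total)
    return points


# Values from the example above: 20,000 iterations split into four segments.
schedule = stop_points(5000, 15000, 1000, 5000)
```

Under these assumptions, each entry in `schedule` would correspond to one cluster job, so the example above would need four jobs in total.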

The application I am thinking about is running models on a computer cluster, rather than a desktop. My hope is to force breaks to split up long jobs so they can be run on nodes with shorter wall times.

Thanks,
Peter

@donaldRwilliams
Owner

Hey!

I think that should be possible, but I will have to think a bit about how to implement it.

In RStudio, there is a way to schedule running a .R file. So if that file calls chkpt_brms, I don't think you would have to interact with it (I'm pretty sure this will work, as this is the use case we had in mind).

Let me think about this a bit more!

@peclayson
Author

I don't see any issue using chkpt_brms on the cluster (I've only used it on my desktop so far). I plan to try it out after the semester is over. It will be helpful for saving time after node failures... :)

My hope is that if I have the break built in, once chkpt_brms gets to the breaking point, the function finishes, and then it moves on through the script to queue up another job on the cluster to pick up the baton.

Although it's possible to pick up where the job left off by submitting another job, I would like to automate queuing up the next job. If the script terminates due to reaching the max walltime, it wouldn't continue processing the code to pass the baton.
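The baton-passing idea could be sketched roughly as follows. This is only an illustration of the control flow, not anything the package provides: `run_one_job`, `fit_segment`, and `submit_next` are hypothetical names, where `fit_segment` stands in for a fitting call that returns at the break point and `submit_next` stands in for whatever command the cluster scheduler uses to queue a job.

```python
from typing import Callable


def run_one_job(done: int, total: int, stop_every: int,
                fit_segment: Callable[[int, int], int],
                submit_next: Callable[[], None]) -> int:
    """Fit one segment, then queue a follow-up job only if work remains.

    All names here are hypothetical placeholders for illustration.
    """
    # Never run past the total number of iterations.
    target = min(done + stop_every, total)
    # fit_segment resumes at `done` and returns the iteration it reached.
    done = fit_segment(done, target)
    # Because the function returns normally at the break, the script can
    # continue and hand off to the next job before the wall time runs out.
    if done < total:
        submit_next()
    return done
```

The key point is that the hand-off happens inside the script after the fitting call returns, which is exactly what cannot happen when the scheduler kills the job at the wall-time limit.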

Thanks, Donald!

@donaldRwilliams
Owner

> My hope is that if I have the break built in, once chkpt_brms gets to the breaking point, the function finishes, and then it moves on through the script to queue up another job on the cluster to pick up the baton.

I see! Let me think about how best to implement this, and I will update here with some ideas. Of course, I'm open to any ideas you have about how to implement that in the package.

@venpopov

I implemented this in an open pull request and then saw that there was already a request for it:

#14

3 participants