Restart script with extended time #2780

Closed
NastasiaM opened this issue Oct 31, 2018 · 5 comments

@NastasiaM

Hi!
I have questions about the restart script.
1) Can I start all of this from the last checkpoint of the simulation? (I think I can, but I don't know exactly how.)
2) Is it possible to work around the walltime limit a little with this? For example, I need the simulation to run for ~120 hours in total, but my limit is 100 hours. Can I run it as two 60-hour jobs?

Happy Halloween :)

ax3l added the question label on Nov 1, 2018
ax3l (Member) commented Nov 1, 2018

Hi,

  1. By default, we restart from the most-progressed checkpoint unless another step is requested via:
--checkpoint.restart.step <N>

https://github.com/ComputationalRadiationPhysics/picongpu/blob/master/docs/TBG_macros.cfg#L208-L214

  2. Yes, just change the walltime in the .cfg file. Or maybe checkpoint more often and restart several times?
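The arithmetic behind splitting one long run into several restarted jobs can be sketched in Python. Both helpers are hypothetical (not part of PIConGPU or tbg): they assume the run can be cut into equal-length jobs, each within the queue's walltime limit, and resumed from a checkpoint at every job boundary. `steps_per_hour` is an assumed input you would estimate yourself, for example from the progress the run prints to simOutput/output. For the numbers in the question (about 120 h in total, 100 h limit) the first helper gives two jobs of 60 h each.

```python
import math
from typing import Tuple


def plan_segments(total_hours: float, walltime_limit_hours: float) -> Tuple[int, float]:
    """Fewest equal-length jobs whose individual runtime fits the walltime limit.

    Returns (number_of_jobs, hours_per_job). Hypothetical helper; assumes the
    simulation can resume from a checkpoint at each job boundary.
    """
    if total_hours <= 0 or walltime_limit_hours <= 0:
        raise ValueError("hours must be positive")
    jobs = math.ceil(total_hours / walltime_limit_hours)
    return jobs, total_hours / jobs


def checkpoint_period_steps(steps_per_hour: float, hours_between_checkpoints: float) -> int:
    """Time steps between checkpoints so that one is written every N hours.

    steps_per_hour is an assumed, user-estimated throughput; the result is
    at least 1 step.
    """
    return max(1, math.floor(steps_per_hour * hours_between_checkpoints))
```

In practice you would request a walltime somewhat above the planned job length and checkpoint well before it expires, since any steps computed after the last checkpoint of a job are lost when that job is killed.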

ax3l added this to the 0.5.0 / 1.0.0: Next Stable milestone on Nov 1, 2018
NastasiaM (Author)

Hi,
I tried to start from the last checkpoint, but the simulation started from the beginning. Am I doing something wrong?
Here is the .cfg file:
0004gpus.txt

ax3l (Member) commented Nov 5, 2018

Are you sure your simulation already reached step 23555 and wrote at least one checkpoint? What do simOutput/output and ls simOutput/checkpoints/* show? Otherwise, try adding --checkpoint.restart.step <N> to TBG_restart in your .cfg as well.
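To check which steps actually have checkpoints, and therefore which step a default restart would pick (the most-progressed one), the output of ls simOutput/checkpoints/* can be parsed. The Python sketch below is hypothetical and assumes checkpoint file names of the form checkpoint_<step> followed by an optional suffix (for example checkpoint_23555.bp); that naming pattern is an assumption, so adjust the regular expression to what the directory really contains.

```python
import re
from typing import Iterable, List, Optional

# Assumed naming pattern: "checkpoint_<step>" plus an optional suffix such as ".bp".
_CHECKPOINT_RE = re.compile(r"^checkpoint_(\d+)(?:\D.*)?$")


def checkpoint_steps(filenames: Iterable[str]) -> List[int]:
    """Sorted, de-duplicated time steps found in checkpoint file names."""
    steps = set()
    for name in filenames:
        match = _CHECKPOINT_RE.match(name)
        if match:
            steps.add(int(match.group(1)))
    return sorted(steps)


def latest_step(filenames: Iterable[str]) -> Optional[int]:
    """Most-progressed checkpoint step, or None if no checkpoint was found."""
    steps = checkpoint_steps(filenames)
    return steps[-1] if steps else None
```

For example, `latest_step(os.listdir("simOutput/checkpoints"))` returning None, or a step lower than expected, suggests that no usable checkpoint was written yet.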

ax3l (Member) commented Nov 5, 2018

Ah, I see it: your TBG_restart should also contain a plain --checkpoint.restart to trigger the restart.

TBG_restart="--checkpoint.restart --checkpoint.restart.backend adios --checkpoint.restart.directory /bigdata/hplsim/external/mukhar40/ColloidalMelting_TF_I=6.3_10ps/simOutput/checkpoints"
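The pitfall here is that --checkpoint.restart.backend and --checkpoint.restart.directory only configure the restart; the bare --checkpoint.restart is what triggers it. A hypothetical Python check (not part of PIConGPU or tbg) can split a TBG_restart value into shell words and look for that exact token, which a plain substring test would get wrong:

```python
import shlex


def requests_restart(tbg_restart: str) -> bool:
    """True if the flag string contains the bare --checkpoint.restart switch.

    Hypothetical helper. A substring test would be wrong here:
    "--checkpoint.restart.backend" also contains "--checkpoint.restart",
    but it does not trigger a restart on its own.
    """
    return "--checkpoint.restart" in shlex.split(tbg_restart)
```

Comparing whole shell words rather than substrings is the design choice that matters: every restart option shares the --checkpoint.restart prefix.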

psychocoderHPC (Member)

I will close this issue; I assume it is solved.
