Running on GPUs, I am using restart from a checkpoint, but I end up losing particles upon restart.
It looks like the time per step also drops:
Running normally from step 0, without restart (the job was killed once the wall-clock time limit was reached):
STEP 30122 starts ...
STEP 30122 ends. TIME = 2.559001466e-12 DT = 8.495456697e-17
Evolve time = 77712.85459 s; This step = 2.491132788 s; Avg. per step = 2.579936743 s
After restarting from the checkpoint at step 30000:
STEP 30122 starts ...
STEP 30122 ends. TIME = 2.559001466e-12 DT = 8.495456697e-17
Evolve time = 1115.337932 s; This step = 2.220336455 s; Avg. per step = 9.142114198 s
The time per step also drops, from about 2.5 s to 2.2 s!
I am also dumping out the fields with openPMD, and the file size drops from 410 GB to 280 GB (with just 100 steps after the restart)!
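A minimal sketch of such an openPMD field diagnostic in the inputs, assuming a Full diagnostic; the diag name, backend, and interval are placeholders, not the exact settings from this run:

```
# Illustrative openPMD field diagnostic (placeholder name and values)
diagnostics.diags_names = diag1
diag1.diag_type = Full
diag1.format = openpmd
diag1.openpmd_backend = h5    # or bp, depending on the build
diag1.intervals = 100         # write the fields every 100 steps
```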
Without the restart, a smaller version of the simulation seems to work well.
Also, upon restart I had to invert "amrex.abort_on_out_of_gpu_memory" from its default, otherwise the run errors out with GPU out of memory! I could not get WarpX to run after restarting from the checkpoint without changing this flag.
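In the inputs this is a single line; a minimal sketch, assuming the flip is towards not aborting (the default differs between WarpX/AMReX versions, so check which value your build starts from):

```
# Assumed direction of the flip: 0 = do not abort when the GPU memory arena
# runs out of device memory (invert relative to your build's default).
amrex.abort_on_out_of_gpu_memory = 0
```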
I also seem to have exclusive access to the GPUs/node, i.e. nobody else is currently running on them.
I have tried the restart example in the code and have not yet been able to recreate the issue.
In the end the simulation itself seems to run, but writing the particles after restart does not seem to work correctly, while the fields are still there!
At the start of the simulation I also get the warning below:
Multiple GPUs are visible to each MPI rank, but the number of GPUs per socket or node has not been provided.
This may lead to incorrect or suboptimal rank-to-GPU mapping.
Attached is an example of the electron density (it seems to affect all the slices; notice the particles at the front of the moving window). But the field seems to be good!
Some of the parameters are below: for the restart and for saving the checkpoints I use settings of the usual WarpX form, and the full input script is attached.
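A sketch of that part of the inputs; the parameter names are the standard WarpX ones, but the diag name, checkpoint interval, and checkpoint path are placeholders rather than the exact values from this run:

```
# Checkpoint diagnostic (placeholder name and interval)
diagnostics.diags_names = diag1 chk
chk.diag_type = Full
chk.format = checkpoint
chk.intervals = 10000            # write a checkpoint every 10000 steps

# Restart run: point amr.restart at a checkpoint directory written earlier
amr.restart = diags/chk0030000   # placeholder path
```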
WarpX Version info:
CUDA initialized with 1 GPU per MPI rank; 8 GPU(s) used in total
MPI initialized with 8 MPI processes
MPI initialized with thread support level 3
AMReX (21.12) initialized
WarpX (21.12-nogit)
PICSAR (7b5449f92a4b)
Thanks for reporting this issue!
Would you be able to share the full input script for the first simulation and the restarted simulation? (or a modified version thereof that would still allow us to reproduce this issue)
I tried a much smaller simulation with fewer cells and the problem repeated.
Attached are the input files with different grids. The output dump at step 31000 is much smaller.
For the first run, just comment out the restart line and set max_step to 30000; for the next run, uncomment the restart line and set max_step to 60000 (see the sketch below).
WarpX@21.12
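In other words, the two runs differ only in two lines of the inputs file, roughly as follows (the checkpoint path is a placeholder; use the directory actually written by the first run):

```
# Run 1: no restart
max_step = 30000
#amr.restart = diags/chk0030000

# Run 2: restart from the step-30000 checkpoint
max_step = 60000
amr.restart = diags/chk0030000   # placeholder path
```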