Model run hangs at the end when trying to write iceberg trajectories #1
I think it would be better to allow writing of trajectories while the run is ongoing, rather than relying on memory that might be smaller on a different machine. In order to write trajectory data during the run, rather than only at the end of the run, we need: 1) trajectory i/o that does not clobber existing files; 2) the trajectories to be reset to zero length after each write; 3) optionally, to walk through the list of bergs before calling write_trajectory() and call move_traj() to detach trajectories from active bergs, similar to what is done in icebergs_end(). This is because write_trajectory() only writes from the trajectories list, which is populated when a berg is destroyed (melts / leaves the PE / end of run).
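A minimal sketch of that flush pattern, using toy stand-ins rather than the real icebergs derived types (traj_point, berg_t and every variable name below are hypothetical): detach each active berg's trajectory onto a write list, write it out, and leave the per-berg trajectories at zero length so memory does not keep growing over the run.

```fortran
program flush_demo
  implicit none

  ! Hypothetical, simplified stand-ins for the icebergs derived types.
  type :: traj_point
    real :: lon = 0.0, lat = 0.0
    type(traj_point), pointer :: next => null()
  end type traj_point

  type :: berg_t
    type(traj_point), pointer :: traj => null()   ! this berg's accumulated trajectory
    type(berg_t), pointer :: next => null()
  end type berg_t

  type(berg_t), pointer :: first_berg => null(), berg => null()
  type(traj_point), pointer :: write_list => null(), p => null()
  integer :: i

  ! Build two toy bergs, each with a single trajectory point.
  do i = 1, 2
    allocate(berg); allocate(berg%traj)
    berg%traj%lon = 10.0*i; berg%traj%lat = -60.0
    berg%next => first_berg; first_berg => berg
  end do

  ! (3) Detach each active berg's trajectory onto the write list, and
  ! (2) reset the berg's own trajectory to zero length.
  ! (Each toy berg holds a single point, so a simple head push suffices.)
  berg => first_berg
  do while (associated(berg))
    if (associated(berg%traj)) then
      p => berg%traj
      p%next => write_list
      write_list => p
      berg%traj => null()
    end if
    berg => berg%next
  end do

  ! "Write" the detached points (here: print) and free them.
  do while (associated(write_list))
    print '(a,2f8.2)', ' traj point:', write_list%lon, write_list%lat
    p => write_list%next
    deallocate(write_list)
    write_list => p
  end do
end program flush_demo
```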
Re (1), we could stop clobbering and always append. I think the diag_manager policy is to always clobber, though, so perhaps we should do the same?
I did a little more experimenting. How could writing a file on the order of 100 MB cause a hang? There must be some buffer-size environment variable that fixes this!
Totalview shows that the root pe (0) is stuck at the call
I adjusted a namelist to increase the number of bergs artificially (initial_mass(1) = 8.8e4 and mass_scaling(1) = 1000).
Running in debug mode results in a crash in icebergs_save_restart before the model gets to writing trajectories.
Back to prod-mode testing: I don't think it's a hang, nor is it in the I/O. The model is just extremely slow to "gather" the buffer on the root pe (via mpp_send and mpp_recv) before writing it to file. I just ran a test on 128 pes that produced 40491 bergs in 5 days, and it looked like it hung at the end; but after a 15-minute wait the model finished successfully and wrote a 52 MB trajectory file!
I can see (via Totalview) that the poor root pe has received a massive number of blobs (single point-in-time iceberg data) from the pes in the tile and is sifting through them. It receives the trajectory of every iceberg that has ever come into existence (even if it is no longer present), which is something like N blobs (N = total number of timesteps) for each iceberg that has ever existed. E.g., for a 5-day run the root pe received 16000 blobs from just one pe, so in total something like one million blobs that have to be linked together to make the final trajectory file. I think it is the process of traversing the final linked list to find its tail (to add a new leaf) that is the issue here (every time I halt the run I find it inside that traversal loop). There has to be a better way to do this.
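For illustration only (this is not the actual write_trajectory() code, just a toy with assumed type and variable names), the costly pattern is an append that walks to the tail of a singly linked list for every received blob:

```fortran
program tail_append_demo
  implicit none
  type :: blob
    integer :: step = 0
    type(blob), pointer :: next => null()
  end type blob
  type(blob), pointer :: head => null(), tail => null(), node => null()
  integer :: n
  integer, parameter :: nblobs = 20000

  ! Receive nblobs point-in-time records and append each one by walking
  ! to the tail of the list.  Each append costs O(list length), so the
  ! total cost grows like nblobs**2 -- painful when nblobs is of order
  ! a million.
  do n = 1, nblobs
    allocate(node)
    node%step = n
    if (.not. associated(head)) then
      head => node
    else
      tail => head
      do while (associated(tail%next))
        tail => tail%next
      end do
      tail%next => node
    end if
  end do
  print *, 'appended', nblobs, 'blobs by tail traversal'
end program tail_append_demo
```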
Solutions I can think of are:
The 2nd solution above seems to have worked. There is no more "hang", just a 2-minute delay to reverse a list of 3,241,686 berg nodes and write a 557 MB trajectory file for a 1-year run of Alon's model that was timing out before. I'll make a pull request for review. The git branch is user/nnz/fix_trajectory_append_hang.
- Model run apparently hangs at the end when trying to write icebergs trajectories. This is not an i/o issue but is due to the root pe (or io_tile_root_pes) traversing a linked list of millions of nodes, which just takes a very long time.
- This fix "push"es the nodes onto the top of the list instead of trying to find the tail to append to. The list ends up in reverse order and has to be reversed before writing to file.
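A minimal, self-contained sketch of that fix with toy types (blob and the variable names are assumptions, not the icebergs code): push each node onto the head of the list in O(1), then do a single O(N) in-place reversal before writing so the output keeps its chronological order.

```fortran
program push_and_reverse_demo
  implicit none
  type :: blob
    integer :: step = 0
    type(blob), pointer :: next => null()
  end type blob
  type(blob), pointer :: head => null(), node => null()
  type(blob), pointer :: prev => null(), curr => null(), nxt => null()
  integer :: n

  ! O(1) "push" onto the head of the list for every incoming blob,
  ! instead of an O(N) walk to the tail.
  do n = 1, 5
    allocate(node)
    node%step = n
    node%next => head
    head => node
  end do

  ! One O(N) in-place reversal before writing, so the output comes out
  ! in chronological order rather than newest-first.
  curr => head
  do while (associated(curr))
    nxt => curr%next
    curr%next => prev
    prev => curr
    curr => nxt
  end do
  head => prev

  ! "Write" the list (here: print); order is restored to 1..5.
  curr => head
  do while (associated(curr))
    print *, 'step', curr%step
    curr => curr%next
  end do
end program push_and_reverse_demo
```

Reversing once at write time keeps the total cost linear in the number of nodes, which is consistent with the reported two-minute reversal of ~3.2 million nodes above.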
I tried to keep running the model to see what happens. It's still bad. Look at what happens to the ratio of termination/main and the size of the trajectory file:
This is for io_layout = 0,0. I'll try to set an io_layout for bergs to see what happens. One thing that confuses me is why I was getting only a few minutes for the 1st year above!
I did a more careful analysis of the timings by putting clocks around the trajectory preparation (the gather on the io root) and the actual writing to file. Here are the results for a 2-year run of MOM6_GOLD_SIS2 (Alon's experiment) with 24650 bergs:
The above stdouts are at /lustre/f1/Niki.Zadeh/ulm_mom6_2015.06.03/MOM6_GOLD_SIS2_bergs/ncrc2.intel-prod/stdout/run/. So, as you can see, the old way of writing a trajectory file per processor is the best in terms of performance, and the new way of preparing the trajectories (sending and gathering blobs on the tile io root) is simply not acceptable for long model runs (termination time > main loop time). Until then, I have updated my branch to do it the old way, controlled by a new namelist variable in icebergs_nml.
- write_trajectory() now appends data to an existing file. This is a prerequisite to being able to write trajectories mid-run and thereby reduce the extreme i/o buffer sizes and i/o PE memory requirements when writing trajectories only at the end of the run.
- Relevant to issue #1.
- No answer changes.
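The commit itself isn't shown here, but appending along the unlimited (record) dimension with netcdf-fortran typically looks roughly like the sketch below; the file name 'iceberg_trajectories.nc', the variable name 'lon', and the use of a single record dimension are assumptions for illustration, not the actual layout of the trajectory files.

```fortran
program append_traj_demo
  use netcdf
  implicit none
  integer :: ncid, dimid, varid, nrec
  real :: new_lon(3)

  new_lon = [0.5, 1.5, 2.5]   ! made-up trajectory points to append

  ! Open the existing file for writing instead of clobbering it.
  call check( nf90_open('iceberg_trajectories.nc', NF90_WRITE, ncid) )

  ! Find how many records the unlimited dimension already holds...
  call check( nf90_inquire(ncid, unlimitedDimId=dimid) )
  call check( nf90_inquire_dimension(ncid, dimid, len=nrec) )

  ! ...and write the new points starting one past the current end.
  call check( nf90_inq_varid(ncid, 'lon', varid) )
  call check( nf90_put_var(ncid, varid, new_lon, &
                           start=[nrec+1], count=[size(new_lon)]) )

  call check( nf90_close(ncid) )

contains

  subroutine check(status)
    integer, intent(in) :: status
    if (status /= NF90_NOERR) then
      print *, trim(nf90_strerror(status))
      stop 1
    end if
  end subroutine check

end program append_traj_demo
```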
- We now deallocate the links in the trajectory chain when packing into the i/o buffer. This is in preparation for repeatedly calling write_trajectory(), see issue #1.
- No answer changes.
- traj_write_hrs controls the interval between writing trajectories.
- Should help with issue #1.
- No answer changes.
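For example, an experiment could add something like the following to icebergs_nml in its input.nml; the value 24 is purely illustrative (the commit only says traj_write_hrs controls the write interval, presumably in hours given the name).

```
&icebergs_nml
  traj_write_hrs = 24   ! flush accumulated trajectories to file once per model day
/
```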
This issue has not been seen again. Closing.
A 1-year-long run of an ocean-ice model hangs at the end of the run. stdout indicates that the last thing the model was doing was writing the iceberg trajectories:
The above was on 128 cores, run interactively on gaea.
Note that this happens for the longer runs that have a lot of icebergs at the end (this one has 15170). Since the trajectories stay in memory and get written to file at the end of the run, this may indicate an issue with the I/O buffer.
Is there a way to increase that buffer?