
Memory constraints and load balancing #806

Merged: 27 commits, Sep 19, 2019

Conversation

@tskisner (Member) commented Aug 9, 2019

Add memory constraints to the spectral grouping and redshift tasks. Also add functionality to the nersc_job_size function that detects large load imbalance and reduces the job size to compensate.
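Roughly, the imbalance handling amounts to something like the sketch below. The function and argument names, the threshold, and the scaling rule are illustrative only, not the actual nersc_job_size code:

```python
# Illustrative sketch only -- not the real nersc_job_size implementation.
def shrink_for_imbalance(nodes, task_runtimes, imbalance_limit=2.0):
    """Reduce a requested node count when per-task runtimes are very uneven.

    task_runtimes: estimated runtime per task (hypothetical input).
    If the slowest task exceeds the mean by more than imbalance_limit,
    most nodes would sit idle waiting for it, so request a smaller job.
    """
    if not task_runtimes:
        return nodes
    mean_time = sum(task_runtimes) / len(task_runtimes)
    imbalance = max(task_runtimes) / mean_time
    if imbalance > imbalance_limit:
        nodes = max(1, int(nodes / imbalance))  # never go below one node
    return nodes
```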

@julienguy (Contributor) commented:

For the spectra task, you could scale memory with the sum of the number of targets over the matching healpix_frame rows instead of just the number of matching rows; ntargets is a column of the table. This would be a modest improvement, given that the number of targets per frame is fixed, but still an improvement, because the number of targets from a frame that intersect a pixel varies and is also a function of nside.
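Something along these lines; the healpix_frame table and ntargets column come from the comment above, while the "pixel" column name and the per-target constant are placeholders:

```python
import numpy as np

# Sketch of the suggested estimate. Assumes healpix_frame behaves like an
# astropy Table with "pixel" and "ntargets" columns; gb_per_target is made up.
def spectra_task_memory_gb(healpix_frame, pixel, gb_per_target=0.001):
    rows = healpix_frame[healpix_frame["pixel"] == pixel]
    ntargets_total = int(np.sum(rows["ntargets"]))
    return ntargets_total * gb_per_target
```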

@sbailey (Contributor) commented Aug 9, 2019

@julienguy although the output file scales with the number of targets, we read in the entire frame, since the unused targets are likely to be needed by a neighboring healpix that will be processed soon. So the memory scales with the number of frames (for loading the inputs) and the number of targets covered by the healpix (for constructing the output).

I'd like to test this after the cori I/O outage before merging. Thanks.
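As a back-of-the-envelope illustration of that two-term scaling, with entirely made-up constants:

```python
# Two-term memory model described above: inputs scale with the number of
# frames read (whole frames, even for unused targets), outputs with the
# number of targets covered by the healpix. Constants are placeholders.
def grouping_memory_gb(n_frames, n_targets, gb_per_frame=0.5, gb_per_target=0.001):
    input_gb = n_frames * gb_per_frame
    output_gb = n_targets * gb_per_target
    return input_gb + output_gb
```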

@tskisner (Member, Author) commented Aug 9, 2019

In addition to what @sbailey said, I also set the memory requirement based only on the frames in healpix_frame with a matching pixel; I do not consider the "state=1" readiness, since at the beginning of a fresh production this regrouping job may be scheduled before any of the frames are ready. So the time / memory estimate is a worst case.

@tskisner (Member, Author) commented Aug 9, 2019

I just rebased this against master after the merge of #805

… bug fixes. Add memory and time requirements to spectral regrouping tasks.
.format(nproc)
)
nworker = nproc
taskproc = nproc // nworker
Review comment (Contributor) on the lines above:

Changing this line to

taskproc = min(nproc // nworker, task_classes[tasktype].run_max_procs())

fixes a problem where serial tasks (like fiberflatnight) could be given communicators with multiple ranks that all tried to do the same thing and clobbered each other. That fixed the fiberflatnight step; I'm now running a larger production from the beginning with that one-line change and will report back if it has any other unintended consequences.

Test production at NERSC in /global/cscratch1/sd/sjbailey/desi/svdc2019d/spectro/redux/v11

Follow-up review comment (Contributor):

The larger test is still running, but so far the only problems have been the logfile redirection timeouts and the failures that cascade from there due to dependencies. I'll commit this change.

Review comment (Member Author):

The logic in run.py around this section is wrong, and I had a local commit Saturday on cori to fix it, but did not push because there was another issue... I'll try to reconcile this with my changes and open a PR against this branch.

Stephen Bailey and others added 8 commits September 1, 2019 20:25
…n and use that for both run_task_list and dryrun.
…time rather than trying to deduce this from the MPI communicator size. Also fix the use of the realtime queue on cori.
Move common calculation of runtime distribution to a separate function
Implement a "desi_pipe status" command
@sbailey (Contributor) commented Sep 19, 2019

I reverted the starfit --ncpu 1 change, since that made the step slower rather than faster. For memory reasons that step is allocated 4 cores anyway (16 hypercores on KNL), and without --ncpu 1 it uses multiprocessing parallelized 16x to run faster.

(attached image: distribution plot)

The tail of the distribution is driven by how many standard stars are on a tile, which is variable (e.g. SV MWS tiles have a lot of standard stars).

This does use the MPI+multiprocessing combination, which has been fragile in the past, though only when we tried to use the fancier "copy on write" features of multiprocessing, which don't work with Cray MPI. Empirically, this step's simpler MPI+multiprocessing usage has been working robustly, with only one case of a timeout failure.
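For reference, the simpler pattern being described is one multiprocessing pool per MPI rank, passing work by ordinary pickling rather than copy-on-write sharing. The worker function, task list, and pool size below are placeholders:

```python
from mpi4py import MPI
import multiprocessing as mp

def fit_one_star(task):
    # Placeholder worker; the real per-star fitting would go here.
    return task

if __name__ == "__main__":
    comm = MPI.COMM_WORLD
    my_tasks = list(range(comm.rank, 100, comm.size))  # round-robin split of 100 dummy tasks
    # Each MPI rank runs its own 16-process pool; arguments and results are
    # pickled, avoiding the copy-on-write tricks that break with Cray MPI.
    with mp.Pool(processes=16) as pool:
        results = pool.map(fit_one_star, my_tasks)
```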

The current state of affairs is that the pipeline runs through the spectra step unless it hits a bad node or srun problem, but it is still mis-tuned for the redshift fitting step, which consistently gets killed by slurm. The logs just say "srun: error: nid12668: task 1216: Aborted" without mentioning "memory" or "oom", but past cases of this message were also tied to memory problems.

Even though this branch still doesn't succeed at running end to end (even when there aren't any NERSC I/O or node problems), it does get up to the last step, so I'm inclined to merge it and then restart the redshift task optimization in a new branch.

@sbailey (Contributor) commented Sep 19, 2019

For the record: the test failures in this branch were due to unrelated QA code and were fixed in PR #814 and follow-up work on master.

@tskisner (Member, Author) commented:

That plan sounds good to me (merge sooner rather than later). Many things about the timing estimates in the code that were previously completely wrong (and only worked by happy coincidence) are now fixed and clearer.
