Fix #206 Determine --maxSharedMemory automatically, if not set by user. #229
Merged
Conversation
Our memory estimates turned out to be very imprecise (off by a factor of ~2), so we do not want to override a `--maxSharedMemory` value explicitly set by the user. Before calling `initialize_bigint_syrk_context()`, we look at `MemAvailable` in `/proc/meminfo` and calculate how much memory will be allocated for the various BigFloat matrices (excluding what is already allocated, e.g. SDP or X, Y). We then set `--maxSharedMemory` to 90% of the remaining RAM.

What will be allocated by `SDP_Solver`:
- dX, dY (from `step()`)
- R, Z (from `compute_search_direction()`)
- A_X_inv, A_Y
- schur_complement, schur_complement_cholesky
- schur_off_diagonal
- Q

By `approx_objective`:
- schur_complement, schur_complement_cholesky
- schur_off_diagonal
- Q
- new SDP (created in `quadratic_approximate_objectives`): bilinear_bases, bases_blocks, free_var_matrix

TODO: check for other temporary matrices, and see if we can ignore some of them.
TODO: account for the reduce-scatter buffers for Q. We can do it in `BigInt_Shared_Memory_Syrk_Context`, once we know the split factor.
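A minimal sketch of this computation, assuming the scheme described above; `meminfo_bytes` and `auto_max_shared_memory` are hypothetical illustrative names, not the actual SDPB functions:

```cpp
#include <cstddef>
#include <fstream>
#include <sstream>
#include <string>

// Parse a field such as "MemAvailable:   12345678 kB" from /proc/meminfo.
// Returns the value in bytes, or 0 if the field is not found.
std::size_t meminfo_bytes(const std::string &field)
{
  std::ifstream meminfo("/proc/meminfo");
  std::string line;
  while(std::getline(meminfo, line))
    if(line.rfind(field, 0) == 0) // line starts with e.g. "MemAvailable:"
      {
        std::istringstream is(line.substr(field.size()));
        std::size_t kilobytes = 0;
        is >> kilobytes;
        return kilobytes * 1024;
      }
  return 0;
}

// estimated_allocations: bytes that will be allocated for the matrices
// listed above (dX, dY, R, Z, schur_complement, Q, ...), excluding what
// is already allocated.
std::size_t auto_max_shared_memory(std::size_t estimated_allocations,
                                   double fraction = 0.9)
{
  const std::size_t available = meminfo_bytes("MemAvailable:");
  if(available <= estimated_allocations)
    return 0; // nothing left over; the user should set the limit manually
  return static_cast<std::size_t>(
    fraction * double(available - estimated_allocations));
}
```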
…fers in `--maxSharedMemory` limit (update for #206 Determine `--maxSharedMemory` automatically). Strictly speaking, these buffers are not shared memory. But their size depends on `output_window_split_factor`, so it is natural to account for them together with the shared memory windows.
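A hedged sketch of this accounting (hypothetical function name; in SDPB the bookkeeping would live in `BigInt_Shared_Memory_Syrk_Context`):

```cpp
#include <cstddef>

// Illustrative only: the reduce-scatter buffers are ordinary (non-shared)
// memory, but since their size is driven by output_window_split_factor they
// are charged against the same --maxSharedMemory budget as the windows.
bool fits_in_shared_memory_limit(std::size_t shared_window_bytes,
                                 std::size_t reduce_scatter_buffer_bytes,
                                 std::size_t max_shared_memory_bytes)
{
  return shared_window_bytes + reduce_scatter_buffer_bytes
         <= max_shared_memory_bytes;
}
```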
…(including what is already allocated). When we do a timing run followed by the actual run, some memory is not released explicitly. Thus, for the actual run, `MemAvailable` from `/proc/meminfo` can be much smaller than the memory that is really available. So we have to look at `MemTotal` instead and account for all memory that should be in use by SDPB.
TODO: account for extra overhead, e.g. on Expanse at "start sdpb.solve" MemUsed = 24GB (before even reading the SDP!).
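A sketch of the revised estimate under these assumptions, reusing the hypothetical `meminfo_bytes` helper from the sketch above:

```cpp
#include <cstddef>

// After a timing run, MemAvailable underestimates how much memory we can
// still use, because allocations from the timing run are not all released
// explicitly. So we start from MemTotal and subtract everything that SDPB
// itself should be holding at this point.
std::size_t remaining_memory(std::size_t bytes_held_by_sdpb)
{
  const std::size_t total = meminfo_bytes("MemTotal:");
  return total > bytes_held_by_sdpb ? total - bytes_held_by_sdpb : 0;
}
```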
We print the total allocated memory from all ranks on a node, e.g.:
node=0: allocate Y_cholesky: 2.8 KB
…28 cores at Expanse HPC, it is ~26GB.
…ally) stress-tensors-3d, nmax=18, 4 nodes (Expanse), `--maxSharedMemory=50GB`:
- Timing run with remaining memory estimate = 113.5GB succeeds.
- Actual run with remaining memory estimate = 106.7GB fails with OOM.

This indicates that 50% is too much, so we'll try 40%.
…igint_syrk/Readme.md
…ng memory. When we have a lot of RAM, 50% is enough. In extreme cases (stress-tensors-3d nmax=18, 4 or 5 nodes), 40% does not prevent OOM, and we would need to reduce the fraction to e.g. 25% or even lower. But that is too small for most cases, and we don't want to be too restrictive. Since we cannot guarantee safety anyway, we simply choose a round number, 50%. In case of OOM, the user should adjust the limit manually.
If `--maxSharedMemory=0` (the default), SDPB will try to determine the limit automatically. SDPB will estimate how much memory is required to allocate all matrices used by the solver, and set the limit for the shared windows (used in `bigint_syrk_blas()` for calculating the Q matrix) to 50% of the remaining memory. Note that the memory estimates are rather imprecise, and sometimes one has to set a lower limit (e.g. 25%) to prevent OOM.
Choosing 50% for the automatic limit is somewhat arbitrary. The user should take this value as a starting point and decrease it when necessary. I added these notes to Usage.md.
Note also that the limit is set for each node separately, because each node has its own shared windows and may have blocks of different sizes.
Example output:
In debug mode, SDPB will also print matrix sizes, e.g.
and matrix allocation messages, e.g.