
Fix #206 Determine --maxSharedMemory automatically, if not set by user. #229

Merged: 10 commits merged from memory-limit into master on Apr 17, 2024

Conversation

@vasdommes (Collaborator) commented Apr 16, 2024

If --maxSharedMemory=0 (the default), SDPB will try to determine it automatically.

SDPB will estimate how much memory is required to allocate all matrices used by the solver, and will set the limit for the shared memory windows (used in bigint_syrk_blas() for calculating the Q matrix) to 50% of the remaining memory.

Note that the memory estimates are rather imprecise, and sometimes one has to set a lower limit (e.g. 25%) to prevent OOM.
Choosing 50% for the automatic limit is somewhat arbitrary. Users should treat this value as a starting point and decrease it when necessary. I added these notes to Usage.md.

Note also that the limit is set for each node separately, because each node has its own shared memory windows and may have blocks of different sizes.
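
As a rough illustration, here is a minimal sketch of the computation (the function name and structure are hypothetical, not the actual SDPB code; only the formula, 50% of MemTotal minus the non-shared estimate, follows this PR):

```cpp
#include <cstddef>

// Sketch: automatic --maxSharedMemory limit for one node.
size_t auto_max_shared_memory(size_t mem_total_bytes,
                              size_t nonshared_estimate_bytes,
                              double fraction = 0.5)
{
  // If the non-shared estimate already exceeds MemTotal, nothing remains.
  const size_t remaining = mem_total_bytes > nonshared_estimate_bytes
                             ? mem_total_bytes - nonshared_estimate_bytes
                             : 0;
  // Shared memory windows used by bigint_syrk_blas() get a fraction
  // of the remaining memory (50% by default in this sketch).
  return static_cast<size_t>(fraction * static_cast<double>(remaining));
}
```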

Example output:

Warning: node=0: 
	MemTotal: 251.5 GB
	Required memory estimate (excluding shared memory windows): 81.5 GB
	To prevent OOM, SDPB will set --maxSharedMemory to 50% of the remaining memory, i.e. 85.0 GB
	In case of OOM, consider increasing number of nodes and/or decreasing --maxSharedMemory limit.

In debug mode, SDPB will also print matrix sizes, e.g.

node=4 matrix sizes and memory estimates: 
	#(SDP) = 134307093
	#(X) = 2347739
	#(A_X_inv) = 9390902
	#(schur_complement) = 28771896
	#(B) = 128743304
	#(Q) = 3723776
	BigFloat size: 168 B
	Total BigFloats to be allocated: 359533942 elements = 56.3 GB
	Initial MemUsed (at SDPB start) = 25.9 GB
	Total non-shared memory estimate: 82.1 GB (88187737456 bytes)

and matrix allocation messages, e.g.

node=0: allocate schur_complement: 9.1 GB
node=0: allocate schur_off_diagonal: 35.3 GB
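
(For reference, the debug totals above are consistent: 359533942 elements × 168 B ≈ 56.3 GB, and adding the initial MemUsed of 25.9 GB gives the 82.1 GB (88187737456 bytes) non-shared estimate.)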

Our memory estimates turned out to be very imprecise (off by a factor of ~2), so we do not want to override a --maxSharedMemory value explicitly set by the user.

Before calling initialize_bigint_syrk_context(),
we look at MemAvailable in /proc/meminfo
and calculate how much memory will be allocated for the various BigFloat matrices (excluding what has already been allocated, e.g. the SDP or X, Y).
We then set --maxSharedMemory to 90% of the remaining RAM.
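
For reference, a minimal sketch of reading MemAvailable (or MemTotal) from /proc/meminfo on Linux; the field names and kB units are standard, but the helper itself is only an illustration, not necessarily how SDPB does it:

```cpp
#include <cstddef>
#include <fstream>
#include <sstream>
#include <string>

// Returns the value of a /proc/meminfo field ("MemTotal", "MemAvailable", ...)
// in bytes; /proc/meminfo reports these values in kB.
size_t meminfo_bytes(const std::string &key)
{
  std::ifstream meminfo("/proc/meminfo");
  std::string line;
  while(std::getline(meminfo, line))
    {
      if(line.rfind(key + ":", 0) == 0) // line starts with "<key>:"
        {
          std::istringstream iss(line.substr(key.size() + 1));
          size_t kb = 0;
          iss >> kb;
          return kb * 1024;
        }
    }
  return 0; // key not found
}
```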

What will be allocated by SDP_Solver:
- dX, dY (from step())
- R, Z (from compute_search_direction())
- A_X_inv, A_Y
- schur_complement, schur_complement_cholesky
- schur_off_diagonal
- Q

by approx_objective:
- schur_complement, schur_complement_cholesky
- schur_off_diagonal
- Q
- new SDP (created in quadratic_approximate_objectives): bilinear_bases, bases_blocks, free_var_matrix

TODO: check for other temporary matrices; see if we can ignore some of them.
TODO: account for the reduce-scatter buffers for Q. We can do this in BigInt_Shared_Memory_Syrk_Context, once we know the split factor.
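
A rough tallying sketch of the estimate (a hypothetical helper, not the actual SDPB code): sum the element counts of the matrices listed above, multiply by the BigFloat size, and add what is already in use:

```cpp
#include <cstddef>
#include <initializer_list>

// Sketch: non-shared memory estimate for one node.
size_t nonshared_estimate_bytes(std::initializer_list<size_t> element_counts,
                                size_t bigfloat_size_bytes,
                                size_t initial_mem_used_bytes)
{
  size_t elements = 0;
  for(size_t n : element_counts)
    elements += n;
  // Matches the debug output above:
  // total BigFloats * BigFloat size + memory already in use at SDPB start.
  return elements * bigfloat_size_bytes + initial_mem_used_bytes;
}
```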
…fers in --maxSharedMemory limit

(update for #206 Determine --maxSharedMemory automatically)

Strictly speaking, these buffers are not in shared memory.
But their size depends on output_window_split_factor, so it is natural to account for them together with the shared memory windows.
…(including what is already allocated).

When we do a timing run followed by the actual run, some memory is not released explicitly.
Thus, for the actual run, MemAvailable from /proc/meminfo can be much smaller than the memory that is actually available.
So we have to look at MemTotal instead and account for all memory that SDPB is expected to use.

TODO: account for extra overhead, e.g. on Expanse at "start sdpb.solve" MemUsed = 24 GB (before even reading the SDP!).
We print the total allocated memory from all ranks on a node, e.g.:
node=0: allocate Y_cholesky: 2.8 KB
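
A sketch of how such per-node totals could be gathered, assuming a shared-memory node communicator obtained via MPI_Comm_split_type (an illustration only, not necessarily SDPB's implementation):

```cpp
#include <cstdio>
#include <mpi.h>

// Sum the bytes allocated by each rank on this node and print the total
// from the node's rank 0. Assumes MPI_Init has already been called.
void report_node_allocation(const char *name, unsigned long long local_bytes)
{
  MPI_Comm node_comm;
  MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0, MPI_INFO_NULL,
                      &node_comm);
  unsigned long long node_bytes = 0;
  MPI_Reduce(&local_bytes, &node_bytes, 1, MPI_UNSIGNED_LONG_LONG, MPI_SUM, 0,
             node_comm);
  int node_rank;
  MPI_Comm_rank(node_comm, &node_rank);
  if(node_rank == 0)
    std::printf("allocate %s: %.1f GB\n", name, node_bytes / 1e9);
  MPI_Comm_free(&node_comm);
}
```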
…ally)

stress-tensors-3d, nmax=18, 4 nodes (Expanse), --maxSharedMemory=50GB:
Timing run with remaining memory estimate = 113.5 GB succeeds.
Actual run with remaining memory estimate = 106.7 GB fails with OOM.

This indicates that 50% is too much, so we'll try 40%.
…ng memory.

When we have a lot of RAM, 50% is enough.
In extreme cases (stress-tensors-3d nmax=18, 4 or 5 nodes), 40% does not prevent OOM; we need to reduce it to e.g. 25% or even lower.
That would be too restrictive for most cases, so we don't want to go that low.

Since we cannot guarantee OOM-free runs anyway, we'll simply choose a nice round number, 50%.
In case of OOM, the user should adjust the limit manually.
@vasdommes merged commit be132bd into master on Apr 17, 2024
2 checks passed
@vasdommes deleted the memory-limit branch on April 17, 2024 at 04:33