
Fix #206 Determine --maxSharedMemory automatically, if not set by user. #229

Merged: 10 commits merged from memory-limit into master on Apr 17, 2024

Conversation

@vasdommes (Collaborator) commented Apr 16, 2024

If --maxSharedMemory=0 (the default), SDPB will try to determine it automatically.

SDPB will estimate how much memory is required to allocate all matrices used by the solver, and will set the limit for the shared memory windows (used in bigint_syrk_blas() for calculating the Q matrix) to 50% of the remaining memory.

Note that the memory estimates are rather imprecise, and sometimes one has to set a lower limit (e.g. 25%) to prevent OOM.
Choosing 50% for the automatic limit is somewhat arbitrary. Users should treat this value as a starting point and decrease it when necessary. I added these notes to Usage.md.

Note also that the limit is set for each node separately, because each node has its own shared memory windows and may have blocks of different sizes.
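
As a rough illustration, here is a minimal sketch of the computation (the function name and structure are hypothetical, not the actual SDPB code; only the formula, 50% of MemTotal minus the non-shared estimate, follows this PR):

```cpp
#include <cstddef>

// Sketch: automatic --maxSharedMemory limit for one node.
size_t auto_max_shared_memory(size_t mem_total_bytes,
                              size_t nonshared_estimate_bytes,
                              double fraction = 0.5)
{
  // If the non-shared estimate already exceeds MemTotal, nothing remains.
  const size_t remaining = mem_total_bytes > nonshared_estimate_bytes
                             ? mem_total_bytes - nonshared_estimate_bytes
                             : 0;
  // Shared memory windows used by bigint_syrk_blas() get a fraction
  // of the remaining memory (50% by default in this sketch).
  return static_cast<size_t>(fraction * static_cast<double>(remaining));
}
```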

Example output:

Warning: node=0: 
	MemTotal: 251.5 GB
	Required memory estimate (excluding shared memory windows): 81.5 GB
	To prevent OOM, SDPB will set --maxSharedMemory to 50% of the remaining memory, i.e. 85.0 GB
	In case of OOM, consider increasing number of nodes and/or decreasing --maxSharedMemory limit.

In debug mode, SDPB will also print matrix sizes, e.g.

node=4 matrix sizes and memory estimates: 
	#(SDP) = 134307093
	#(X) = 2347739
	#(A_X_inv) = 9390902
	#(schur_complement) = 28771896
	#(B) = 128743304
	#(Q) = 3723776
	BigFloat size: 168 B
	Total BigFloats to be allocated: 359533942 elements = 56.3 GB
	Initial MemUsed (at SDPB start) = 25.9 GB
	Total non-shared memory estimate: 82.1 GB (88187737456 bytes)

and matrix allocation messages, e.g.

node=0: allocate schur_complement: 9.1 GB
node=0: allocate schur_off_diagonal: 35.3 GB
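
(For reference, the debug totals above are consistent: 359533942 elements × 168 B ≈ 56.3 GB, and adding the initial MemUsed of 25.9 GB gives the 82.1 GB (88187737456 bytes) non-shared estimate.)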

Our memory estimates turned out to be very imprecise (off by a factor of ~2), so we do not want to override a --maxSharedMemory value explicitly set by the user.

Before calling initialize_bigint_syrk_context(),
we look at MemAvailable in /proc/meminfo
and calculate how much memory will be allocated for the various BigFloat matrices (excluding what has already been allocated, e.g. the SDP or X, Y).
We then set --maxSharedMemory to 90% of the remaining RAM.
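
For reference, a minimal sketch of reading MemAvailable (or MemTotal) from /proc/meminfo on Linux; the field names and kB units are standard, but the helper itself is only an illustration, not necessarily how SDPB does it:

```cpp
#include <cstddef>
#include <fstream>
#include <sstream>
#include <string>

// Returns the value of a /proc/meminfo field ("MemTotal", "MemAvailable", ...)
// in bytes; /proc/meminfo reports these values in kB.
size_t meminfo_bytes(const std::string &key)
{
  std::ifstream meminfo("/proc/meminfo");
  std::string line;
  while(std::getline(meminfo, line))
    {
      if(line.rfind(key + ":", 0) == 0) // line starts with "<key>:"
        {
          std::istringstream iss(line.substr(key.size() + 1));
          size_t kb = 0;
          iss >> kb;
          return kb * 1024;
        }
    }
  return 0; // key not found
}
```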

What will be allocated by SDP_Solver:
- dX, dY (from step())
- R, Z (from compute_search_direction())
- A_X_inv, A_Y
- schur_complement, schur_complement_cholesky
- schur_off_diagonal
- Q

by approx_objective:
- schur_complement, schur_complement_cholesky
- schur_off_diagonal
- Q
- new SDP (created in quadratic_approximate_objectives): bilinear_bases, bases_blocks, free_var_matrix

TODO: check for other temporary matrices; see if we can ignore some of them.
TODO: account for the reduce-scatter buffers for Q. We can do this in BigInt_Shared_Memory_Syrk_Context, once we know the split factor.
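
A rough tallying sketch of the estimate (a hypothetical helper, not the actual SDPB code): sum the element counts of the matrices listed above, multiply by the BigFloat size, and add what is already in use:

```cpp
#include <cstddef>
#include <initializer_list>

// Sketch: non-shared memory estimate for one node.
size_t nonshared_estimate_bytes(std::initializer_list<size_t> element_counts,
                                size_t bigfloat_size_bytes,
                                size_t initial_mem_used_bytes)
{
  size_t elements = 0;
  for(size_t n : element_counts)
    elements += n;
  // Matches the debug output above:
  // total BigFloats * BigFloat size + memory already in use at SDPB start.
  return elements * bigfloat_size_bytes + initial_mem_used_bytes;
}
```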
…fers in --maxSharedMemory limit

(update for #206 Determine --maxSharedMemory automatically)

Strictly speaking, these buffers are not in shared memory.
But their size depends on output_window_split_factor, so it is natural to account for them together with the shared memory windows.
…(including what is already allocated).

When we do a timing run followed by the actual run, some memory is not released explicitly.
Thus, for the actual run, MemAvailable from /proc/meminfo can be much smaller than the memory that is actually available.
So we have to look at MemTotal instead and account for all memory that SDPB is expected to use.

TODO: account for extra overhead, e.g. on Expanse at "start sdpb.solve" MemUsed = 24 GB (before even reading the SDP!).
We print the total allocated memory from all ranks on a node, e.g.:
node=0: allocate Y_cholesky: 2.8 KB
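
A sketch of how such per-node totals could be gathered, assuming a shared-memory node communicator obtained via MPI_Comm_split_type (an illustration only, not necessarily SDPB's implementation):

```cpp
#include <cstdio>
#include <mpi.h>

// Sum the bytes allocated by each rank on this node and print the total
// from the node's rank 0. Assumes MPI_Init has already been called.
void report_node_allocation(const char *name, unsigned long long local_bytes)
{
  MPI_Comm node_comm;
  MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0, MPI_INFO_NULL,
                      &node_comm);
  unsigned long long node_bytes = 0;
  MPI_Reduce(&local_bytes, &node_bytes, 1, MPI_UNSIGNED_LONG_LONG, MPI_SUM, 0,
             node_comm);
  int node_rank;
  MPI_Comm_rank(node_comm, &node_rank);
  if(node_rank == 0)
    std::printf("allocate %s: %.1f GB\n", name, node_bytes / 1e9);
  MPI_Comm_free(&node_comm);
}
```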
…ally)

stress-tensors-3d, nmax=18, 4 nodes (Expanse), --maxSharedMemory=50GB:
Timing run with remaining memory estimate = 113.5 GB succeeds.
Actual run with remaining memory estimate = 106.7 GB fails with OOM.

This indicates that 50% is too much, so we'll try 40%.
…ng memory.

When we have a lot of RAM, 50% is enough.
In extreme cases (stress-tensors-3d nmax=18, 4 or 5 nodes), 40% does not prevent OOM; we need to reduce it to e.g. 25% or even lower.
That would be too restrictive for most cases, so we don't want to go that low.

Since we cannot guarantee OOM-free runs anyway, we'll simply choose a nice round number, 50%.
In case of OOM, the user should adjust the limit manually.
@vasdommes merged commit be132bd into master on Apr 17, 2024
2 checks passed
@vasdommes deleted the memory-limit branch on April 17, 2024 at 04:33