Optimize memory tracking with a `memThreshold` #18465

ronawho · 2021-09-27T13:03:49Z

Memory tracking uses a global mutex to serialize access to a hash table,
which makes concurrent allocations very slow. Previously, even if a
memory threshold was used, we would still grab the table lock when
freeing because we didn't know the size of the allocation behind the
pointer. Here, add a `chpl_mem_real_alloc_size` that uses the jemalloc
API to ask for the real size of the allocation before acquiring the
lock. This allows us to avoid taking the lock when freeing allocations
below the threshold, which saves a lot of time.
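
As a rough illustration of the idea above (not the runtime's actual code), the free path can ask the allocator for the real size first and only take the table lock when that size is above the threshold. This sketch assumes glibc's `malloc_usable_size` as a stand-in for the jemalloc query, and `tracked_free`, `table_remove`, `mem_threshold`, and `lock_acquisitions` are hypothetical names. Treating only sizes strictly above the threshold as tracked is also an assumption.

```c
#include <assert.h>
#include <malloc.h>   /* malloc_usable_size (glibc): stand-in for the jemalloc size query */
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the runtime's tracking state. */
static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;
static size_t mem_threshold = 1024;   /* assumed: only sizes above this are tracked */
static size_t lock_acquisitions = 0;  /* how many times the table lock was taken */

/* Placeholder for removing ptr's entry from the tracking hash table. */
static void table_remove(void* ptr) {
  assert(ptr != NULL);
}

/* Free path: query the real size first, lock only for tracked sizes. */
static void tracked_free(void* ptr) {
  if (ptr == NULL) return;
  size_t real_size = malloc_usable_size(ptr);
  if (real_size > mem_threshold) {
    pthread_mutex_lock(&table_lock);
    lock_acquisitions++;
    table_remove(ptr);
    pthread_mutex_unlock(&table_lock);
  }
  free(ptr);
}
```

With this shape, freeing a small allocation never touches `table_lock`, so concurrent tasks that churn through small allocations no longer serialize on it.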

Note that `chpl_mem_real_alloc_size` returns the real allocation size,
not the requested size, e.g. `chpl_real_alloc_size(chpl_malloc(7))`
returns 8. This means that we'll still do some unnecessary locking if
`memThreshold` falls between the requested size and the real size of an
allocation. If we wanted to avoid that, we could silently adjust
`memThreshold` up to the next allocation size class.
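
To make the mismatch concrete, here is a small sketch under stated assumptions. The size-class table is a toy one for illustration, not jemalloc's real classes, and all function names are hypothetical. It also assumes an allocation is tracked only when its size is strictly greater than the threshold, with the allocation side comparing the requested size and the free side comparing the real size.

```c
#include <assert.h>
#include <stddef.h>

/* Toy size classes for illustration only; not jemalloc's actual table. */
static const size_t size_classes[] = {8, 16, 32, 48, 64, 80, 96, 112, 128};
static const size_t num_classes = sizeof(size_classes) / sizeof(size_classes[0]);

/* Real size the toy allocator hands back for a request.
 * Requests above the largest class are returned unchanged. */
static size_t toy_real_alloc_size(size_t requested) {
  for (size_t i = 0; i < num_classes; i++) {
    if (requested <= size_classes[i]) return size_classes[i];
  }
  return requested;
}

/* Allocation side: only the requested size is compared (assumed semantics). */
static int alloc_is_tracked(size_t requested, size_t threshold) {
  return requested > threshold;
}

/* Free side: only the real size is known, so that is what gets compared. */
static int free_takes_lock(size_t requested, size_t threshold) {
  return toy_real_alloc_size(requested) > threshold;
}

/* Possible adjustment: round the threshold up to its size class. */
static size_t round_threshold_up(size_t threshold) {
  return toy_real_alloc_size(threshold);
}
```

With a threshold of 7, a 7-byte request is not tracked on allocation, but its real size of 8 still triggers the lock on free. After `round_threshold_up(7)` yields 8, both sides agree, at the cost of silently leaving requests of exactly 8 bytes untracked.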

Here's a concurrent allocation micro-benchmark that demonstrates the
overhead. Results are from a 128-core Rome CPU:

use Time;
config const trials = 1_000_000;

var t: Timer; t.start();
coforall 1..here.maxTaskPar do
  for i in 1..trials do
    var s = i:string;
writeln(t.elapsed());

| config              |    Time |
| ------------------- | ------: |
| w/o memTrack        |   0.19s |
| w/ memTrack         | 144.50s |
| w/ threshold before |  33.06s |
| w/ threshold now    |   0.22s |

This is motivated by Arkouda, which uses memTrack as a means to detect
if an operation will exceed memory. We recently noticed concurrent
allocations were slower than expected and tracked it down to this.

Related to #10415
Resolves https://github.com/Cray/chapel-private/issues/1330

Signed-off-by: Elliot Ronaghan <ronawho@gmail.com>

gbtitus

Looks great!

@gbtitus

Fix `memThreshold` memory tracking optimization

[discussed with @gbtitus, full review post-commit]

#18465 optimized memory leak tracking to avoid taking a lock when the real allocation size is below `memThreshold`, but there were a few bugs in the initial implementation.

For cases where the memory layer doesn't support `chpl_real_alloc_size` (`CHPL_MEM=cstdlib`) we weren't handling the unknown size sentinel, so we weren't ever running the free hook, which made it look like we were leaking all memory. Fix that to run the hook if the "real size" is 0 (the unknown sentinel).

This was also broken under configurations that separately allocate arrays (like `CHPL_COMM=ugni`). For those configs we were taking a comm layer allocation and trying to ask jemalloc for the size of it, but this results in mixed allocator calls, which is undefined behavior. Fix that by making the higher level interface compute the size and pass it in, instead of computing it below the level where we know what allocator the memory came from.
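
The two fixes described in that commit message can be sketched roughly as follows. This is an illustration only: `track_free`, `free_hook`, `UNKNOWN_SIZE`, and `mem_threshold` are hypothetical names, and the strictly-greater-than comparison is an assumption. The caller, which knows which allocator owns the pointer, supplies the size, and a size of 0 is treated as "unknown" so the hook still runs.

```c
#include <assert.h>
#include <stddef.h>

/* Sentinel from the commit message: 0 means the real size is not available. */
#define UNKNOWN_SIZE ((size_t)0)

static size_t mem_threshold = 1024;  /* hypothetical threshold value */
static int free_hook_runs = 0;       /* counts hook invocations for illustration */

/* Placeholder for the hook that takes the lock and updates the table. */
static void free_hook(void* ptr) {
  assert(ptr != NULL);
  free_hook_runs++;
}

/* The size is computed by the caller, above the level where allocators mix.
 * An unknown size must run the hook; otherwise every free would be skipped
 * and all memory would appear to leak. */
static void track_free(void* ptr, size_t approx_size) {
  if (approx_size == UNKNOWN_SIZE || approx_size > mem_threshold) {
    free_hook(ptr);
  }
}
```

Passing the size in keeps the tracking code from ever querying one allocator about memory that came from another.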

ronawho force-pushed the opt-mem-track-with-threshold branch 2 times, most recently from b501f6b to 061ddf8 Compare September 27, 2021 13:19

ronawho mentioned this pull request Sep 27, 2021

Concurrent memory allocation is slowed by memTrack Bears-R-Us/arkouda#929

Closed

ronawho force-pushed the opt-mem-track-with-threshold branch from 061ddf8 to b666b57 Compare September 27, 2021 21:51

ronawho requested a review from gbtitus September 27, 2021 22:05

ronawho marked this pull request as ready for review September 27, 2021 22:05

gbtitus approved these changes Sep 27, 2021

ronawho merged commit a115645 into chapel-lang:main Sep 28, 2021

ronawho deleted the opt-mem-track-with-threshold branch September 28, 2021 01:36

This was referenced Sep 28, 2021

Fix cstdlib memleak support #18474

Closed

Fix memThreshold memory tracking optimization #18480

Merged

ronawho mentioned this pull request Sep 29, 2021

Avoid tracking small allocations by setting memThreshold Bears-R-Us/arkouda#935

Merged

This was referenced Jan 19, 2022

Regex optimizations Bears-R-Us/arkouda#940

Closed

substring_search benchmark is timing out in nightly performance run Bears-R-Us/arkouda#917

Closed

gbtitus mentioned this pull request Apr 26, 2022

improve worst-case memory tracking performance #10415

Open
