Skip to content

Commit

Permalink
Optimize memory tracking with a memThreshold
Browse files Browse the repository at this point in the history
Memory tracking uses a global mutex to serialize access to a hash table,
this makes concurrent allocations very slow. Previously, even if a
memory threshold was used we would still grab the table lock when
free'ing because we didn't know the pointer size. Here add a
`chpl_mem_real_alloc_size` that will use the jemalloc API to ask for the
real size of the allocation before acquiring the lock. This allows us
to avoid taking the lock when free'ing allocations below the threshold,
which saves a lot of time.

Note that `chpl_mem_real_alloc_size` returns the real allocation size,
not requested size. e.g. (`chpl_real_alloc_size(chpl_malloc(7))` returns
`8`. This means that we'll still do some unnecessary locking if the
allocation size is between the requested size and the real size of an
allocation. If we wanted to avoid that we could silently adjust
memThreshold up to the next allocation size class.

Here's a concurrent allocation micro-benchmark that demonstrates the
overhead. Results are on 128-core Rome CPU:

```chpl
use Time;
config const trials = 1_000_000;

var t: Timer; t.start();
coforall 1..here.maxTaskPar do
  for i in 1..trials do
    var s = i:string;
writeln(t.elapsed());
```

| config              | Time    |
| ------------------- | ------: |
| w/o memTrack        |   0.19s |
| w/ memTrack         | 144.50s |
| w/ threshold before |  33.06s |
| w/ threshold now    |   0.22s |

This is motivated by Arkouda, which uses `memTrack` as a means to detect
if an operation will exceed memory. We recently noticed concurrent
allocations were slower than expected and tracked it down to this.

Related to 10415

Signed-off-by: Elliot Ronaghan <ronawho@gmail.com>
  • Loading branch information
ronawho committed Sep 27, 2021
1 parent aba916c commit 061ddf8
Show file tree
Hide file tree
Showing 4 changed files with 37 additions and 17 deletions.
8 changes: 8 additions & 0 deletions runtime/include/chpl-mem.h
Original file line number Diff line number Diff line change
Expand Up @@ -176,6 +176,14 @@ static inline size_t chpl_mem_good_alloc_size(size_t minSize, int32_t lineno, in
return chpl_good_alloc_size(minSize);
}

// Query the allocator to ask for the size of an existing allocation.
//
// If an allocator does not have the ability to get this information, 0 will be
// returned.
static inline size_t chpl_mem_real_alloc_size(void* ptr, int32_t lineno, int32_t filename) {
return chpl_real_alloc_size(ptr);
}

// free a c_string, no error checking.
// The argument type is explicitly c_string, since only an "owned" string
// should be freed.
Expand Down
4 changes: 4 additions & 0 deletions runtime/include/mem/cstdlib/chpl-mem-impl.h
Original file line number Diff line number Diff line change
Expand Up @@ -63,6 +63,10 @@ static inline size_t chpl_good_alloc_size(size_t minSize) {
#endif
}

static inline size_t chpl_real_alloc_size(void* ptr) {
return 0;
}

#define CHPL_USING_CSTDLIB_MALLOC 1

#ifdef __cplusplus
Expand Down
6 changes: 6 additions & 0 deletions runtime/include/mem/jemalloc/chpl-mem-impl.h
Original file line number Diff line number Diff line change
Expand Up @@ -42,6 +42,7 @@ extern "C" {
#define CHPL_JE_RALLOCX CHPL_JE_(rallocx)
#define CHPL_JE_DALLOCX CHPL_JE_(dallocx)
#define CHPL_JE_NALLOCX CHPL_JE_(nallocx)
#define CHPL_JE_SALLOCX CHPL_JE_(sallocx)
#define CHPL_JE_MALLCTL CHPL_JE_(mallctl)


Expand Down Expand Up @@ -107,6 +108,11 @@ static inline size_t chpl_good_alloc_size(size_t minSize) {
return CHPL_JE_NALLOCX(minSize, MALLOCX_NO_FLAGS);
}

static inline size_t chpl_real_alloc_size(void* ptr) {
return CHPL_JE_SALLOCX(ptr, MALLOCX_NO_FLAGS);
}


#ifdef __cplusplus
}
#endif
Expand Down
36 changes: 19 additions & 17 deletions runtime/src/chplmemtrack.c
Original file line number Diff line number Diff line change
Expand Up @@ -685,25 +685,27 @@ void chpl_track_malloc(void* memAlloc, size_t number, size_t size,


void chpl_track_free(void* memAlloc, int32_t lineno, int32_t filename) {
memTableEntry* memEntry = NULL;
if (chpl_memTrack) {
memTrack_lock();
memEntry = removeMemTableEntry(memAlloc);
if (memEntry) {
if (chpl_verbose_mem) {
fprintf(memLogFile, "%" PRI_c_nodeid_t ": %s:%" PRId32
": free %zuB of %s at %p\n",
chpl_nodeID, (filename ? chpl_lookupFilename(filename) : "--"),
lineno, memEntry->number * memEntry->size,
chpl_mem_descString(memEntry->description), memAlloc);
if (chpl_mem_real_alloc_size(memAlloc, lineno, filename) > memThreshold) {
memTableEntry* memEntry = NULL;
if (chpl_memTrack) {
memTrack_lock();
memEntry = removeMemTableEntry(memAlloc);
if (memEntry) {
if (chpl_verbose_mem) {
fprintf(memLogFile, "%" PRI_c_nodeid_t ": %s:%" PRId32
": free %zuB of %s at %p\n",
chpl_nodeID, (filename ? chpl_lookupFilename(filename) : "--"),
lineno, memEntry->number * memEntry->size,
chpl_mem_descString(memEntry->description), memAlloc);
}
sys_free(memEntry);
}
sys_free(memEntry);
memTrack_unlock();
} else if (chpl_verbose_mem && !memEntry) {
fprintf(memLogFile, "%" PRI_c_nodeid_t ": %s:%" PRId32 ": free at %p\n",
chpl_nodeID, (filename ? chpl_lookupFilename(filename) : "--"),
lineno, memAlloc);
}
memTrack_unlock();
} else if (chpl_verbose_mem && !memEntry) {
fprintf(memLogFile, "%" PRI_c_nodeid_t ": %s:%" PRId32 ": free at %p\n",
chpl_nodeID, (filename ? chpl_lookupFilename(filename) : "--"),
lineno, memAlloc);
}
}

Expand Down

0 comments on commit 061ddf8

Please sign in to comment.