Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce active defragmentation #448

Closed
romange opened this issue Oct 30, 2022 · 5 comments · Fixed by #523
Closed

Introduce active defragmentation #448

romange opened this issue Oct 30, 2022 · 5 comments · Fixed by #523
Assignees

Comments

@romange
Copy link
Collaborator

romange commented Oct 30, 2022

Following feedback from our customers - Dragonfly suffers from external fragmentation in some cases.
This problem is common for any slab allocator like tcmalloc, jemalloc and mimalloc.
Consider the following scenario:

  1. an application allocates lots of "blocks" (from mimalloc jargon) of size 384. Then the allocator mmaps (allocates from the OS) lots of slabs that handle blocks of size 384 and those slabs take up gigabytes of memory. This is a healthy situation because usedmemory ~= rss memory.
  2. Then the application frees the vast majority of blocks across all the slabs of size 384. Most slabs are "almost" free but the allocator can not return it to the OS because some blocks are still busy.
  3. As a result one can observe external fragmentation of the "unused" memory hidden in those slabs (or pages in mimalloc jargon).
    see the graph below. This situation is problematic and this is what we need to fix.

I do not see any other solution besides active defragmentation, i.e. go over all the allocated items in the heap and grouping them together, thus compressing them in fewer slabs. Something similar is done by active-defrag in Redis using je_get_defrag_hint utility function.

We need to be smart about initial trigger conditions for the sweep and how we estimate the effectiveness of this defragmentation procedure in order not to introduce infinite loops.

image

@romange
Copy link
Collaborator Author

romange commented Oct 31, 2022

The solution is to leverage helio's OnIdle task management framework.
In addition, we should add a heuristic that can categorize which pages (areas, slabs) should be defragmented.
This is how it was initially done in redis redis/redis@e8099ca though it evolved later.

I suggest the following heuristic as a base line for mimalloc:

// taken from mimalloc private code.
static inline mi_slice_t* mi_slice_first(const mi_slice_t* slice) {
  mi_slice_t* start = (mi_slice_t*)((uint8_t*)slice - slice->slice_offset);
  return start;
}

// taken from mimalloc private code.
static inline mi_segment_t* _mi_ptr_segment(const void* p) {
  return (mi_segment_t*)((uintptr_t)p & ~MI_SEGMENT_MASK);
}

#define mi_likely(x) __builtin_expect(!!(x), true)


// returns true if page is not active, and its used blocks <= capacity * ratio
bool mi_heap_page_is_underutilized(mi_heap_t* heap, void* p, float ratio) {
  mi_segment_t* const segment = _mi_ptr_segment(p);

  // from _mi_segment_page_of
  ptrdiff_t diff = (uint8_t*)p - (uint8_t*)segment;
  size_t idx = (size_t)diff >> MI_SEGMENT_SLICE_SHIFT;
  mi_slice_t* slice0 = (mi_slice_t*)&segment->slices[idx];
  mi_slice_t* slice = mi_slice_first(slice0);  // adjust to the block that holds the page data
  mi_page_t* page = (mi_page_t*)slice;
  // end from _mi_segment_page_of //

  // from mi_page_heap
  mi_heap_t* page_heap = (mi_heap_t*)(mi_atomic_load_relaxed(&(page)->xheap));

  // the heap id matches and it is not a full page
  if (mi_likely(page_heap == heap && page->flags.x.in_full == 0)) {
    // mi_page_queue_t* pq = mi_heap_page_queue_of(heap, page);

    // first in the list, meaning it's the head of page queue, thus being used for malloc
    if (page->prev == NULL)
      return false;

    // this page belong to this heap and is not first in the page queue. Lets check its
    // utilization.
    return page->used <= unsigned(page->capacity * ratio);
  }
  return false;
}

@romange
Copy link
Collaborator Author

romange commented Oct 31, 2022

a. We can trigger defragmentation based on thread-local conditions like RSS/Used ratio in a specific thread.
We can easily track "used per thread" via EngineShard::UsedMemory. RSS usage can be approximated via committed stats in mi_heap_t.tls->stats. The thing is that we call mi_stats_merge from all the shard threads and I suspect it resets thread local heap stats during the merge. We should fix per-thread accounting if we want to use thread-local triggers.

b. Another option is to use process-global conditions like GetMallocCurrentCommitted and used_mem_current, but I prefer to explore option (a) first.

@boazsade boazsade self-assigned this Nov 10, 2022
@boazsade
Copy link
Contributor

  • Will add idle task to check whether we need to run this operation. In case this is not required, we will return the lowest value (meaning let it wait for as long as possible for this task), if we do need to do the operation, it would return the highest value for the task so that this task will start executing until the memory defrag is done.
  • There is one issue with the above options: option a will not work, see this issue with mi_malloc. so we will restore for the second option - using global from GetMallocCurrentCommitted and the value used_mem_current to calculate whether to run this or not.

@boazsade
Copy link
Contributor

This will implemented as follow:

  1. Check if memory usage is high enough
  2. Check if diff between commited and used memory is high enough
  3. Check if we have memory changes (to ensure that we not running endlessly).
  4. if all the above pass -> run on the shard and try to de-fragment memory by re-allocating values
  • if the cursor for this is signal that we are not done, schedule the task to run at high
  • priority otherwise lower the task priority so that it would not use the CPU when not required

The first version will not include the actual step of the memory re-allocation, to make this a simpler version of the
algorithm. The main reason for this is to be able to verify that we can use the Idle task and to identify that cases where this option is needed. Once we establish that this is indeed working in term of new task in the system, the low level details including the code above will be added.

boazsade added a commit to boazsade/dragonfly that referenced this issue Nov 22, 2022
boazsade added a commit to boazsade/dragonfly that referenced this issue Nov 22, 2022
boazsade added a commit to boazsade/dragonfly that referenced this issue Nov 22, 2022
Signed-off-by: Boaz Sade <boaz@dragonflydb.io>
boazsade added a commit to boazsade/dragonfly that referenced this issue Nov 23, 2022
Signed-off-by: Boaz Sade <boaz@dragonflydb.io>
@boazsade
Copy link
Contributor

The next step is to add the function mi_heap_page_is_underutilized as a patch to the build of third parties at helio. This would enable the implementation of the actual defragmentation.
It should also fix the issue when building mimalloc as release and dragonfly as debug.
It would allow adding unit tests to verify that the the function can checks whether a pointer is located at a location that can be moved.

boazsade added a commit to boazsade/dragonfly that referenced this issue Nov 24, 2022
Signed-off-by: Boaz Sade <boaz@dragonflydb.io>
boazsade added a commit to boazsade/dragonfly that referenced this issue Nov 24, 2022
Signed-off-by: Boaz Sade <boaz@dragonflydb.io>
boazsade added a commit to boazsade/dragonfly that referenced this issue Nov 27, 2022
boazsade added a commit to boazsade/dragonfly that referenced this issue Nov 27, 2022
boazsade added a commit to boazsade/dragonfly that referenced this issue Nov 27, 2022
…onflydb#448

feat(server): memory defrag support - mi_option changed dragonflydb#448

Signed-off-by: Boaz Sade <boaz@dragonflydb.io>
boazsade added a commit to boazsade/dragonfly that referenced this issue Nov 27, 2022
…onflydb#448

feat(server): memory defrag support - mi_option changed dragonflydb#448

feat(server): active fragmentation - spell check issue dragonflydb#448

Signed-off-by: Boaz Sade <boaz@dragonflydb.io>
boazsade added a commit to boazsade/dragonfly that referenced this issue Nov 27, 2022
…onflydb#448

feat(server): memory defrag support - mi_option changed dragonflydb#448

feat(server): active fragmentation - spell check issue dragonflydb#448

feat(server): active memory defrag task dragonflydb#448

Signed-off-by: Boaz Sade <boaz@dragonflydb.io>
boazsade added a commit to boazsade/dragonfly that referenced this issue Nov 27, 2022
…onflydb#448

feat(server): memory defrag support - mi_option changed dragonflydb#448

feat(server): active fragmentation - spell check issue dragonflydb#448

feat(server): active memory defrag task dragonflydb#448

feat(server): active memory defrag task dragonflydb#448

Signed-off-by: Boaz Sade <boaz@dragonflydb.io>
boazsade added a commit to boazsade/dragonfly that referenced this issue Nov 28, 2022
…onflydb#448

feat(server): memory defrag support - mi_option changed dragonflydb#448

feat(server): active fragmentation - spell check issue dragonflydb#448

feat(server): active memory defrag task dragonflydb#448

feat(server): active memory defrag task dragonflydb#448

feat(server): active memory defrag task dragonflydb#448

Signed-off-by: Boaz Sade <boaz@dragonflydb.io>
boazsade added a commit to boazsade/dragonfly that referenced this issue Nov 28, 2022
…onflydb#448

feat(server): memory defrag support - mi_option changed dragonflydb#448

feat(server): active fragmentation - spell check issue dragonflydb#448

feat(server): active memory defrag task dragonflydb#448

Signed-off-by: Boaz Sade <boaz@dragonflydb.io>
boazsade added a commit to boazsade/dragonfly that referenced this issue Nov 28, 2022
Signed-off-by: Boaz Sade <boaz@dragonflydb.io>
boazsade added a commit to boazsade/dragonfly that referenced this issue Nov 28, 2022
Signed-off-by: Boaz Sade <boaz@dragonflydb.io>
boazsade added a commit to boazsade/dragonfly that referenced this issue Dec 1, 2022
…onflydb#448

feat(server): memory defrag support - mi_option changed dragonflydb#448

feat(server): active fragmentation - spell check issue dragonflydb#448

feat(server): active memory defrag task dragonflydb#448

Signed-off-by: Boaz Sade <boaz@dragonflydb.io>
boazsade added a commit to boazsade/dragonfly that referenced this issue Dec 1, 2022
…onflydb#448

feat(server): memory defrag support - mi_option changed dragonflydb#448

feat(server): active fragmentation - spell check issue dragonflydb#448

feat(server): active memory defrag task dragonflydb#448

Signed-off-by: Boaz Sade <boaz@dragonflydb.io>
boazsade added a commit to boazsade/dragonfly that referenced this issue Dec 1, 2022
…onflydb#448

feat(server): memory defrag support - mi_option changed dragonflydb#448

feat(server): active fragmentation - spell check issue dragonflydb#448

feat(server): active memory defrag task dragonflydb#448

Signed-off-by: Boaz Sade <boaz@dragonflydb.io>
boazsade added a commit to boazsade/dragonfly that referenced this issue Dec 1, 2022
…onflydb#448

feat(server): memory defrag support - mi_option changed dragonflydb#448

feat(server): active fragmentation - spell check issue dragonflydb#448

feat(server): active memory defrag task dragonflydb#448

Signed-off-by: Boaz Sade <boaz@dragonflydb.io>
boazsade added a commit to boazsade/dragonfly that referenced this issue Dec 1, 2022
Signed-off-by: Boaz Sade <boaz@dragonflydb.io>
boazsade added a commit to boazsade/dragonfly that referenced this issue Dec 1, 2022
Signed-off-by: Boaz Sade <boaz@dragonflydb.io>
boazsade added a commit to boazsade/dragonfly that referenced this issue Dec 4, 2022
Signed-off-by: Boaz Sade <boaz@dragonflydb.io>
boazsade added a commit to boazsade/dragonfly that referenced this issue Dec 4, 2022
Signed-off-by: Boaz Sade <boaz@dragonflydb.io>
boazsade added a commit to boazsade/dragonfly that referenced this issue Dec 4, 2022
Signed-off-by: Boaz Sade <boaz@dragonflydb.io>
romange pushed a commit that referenced this issue Dec 4, 2022
…#523)

feat(server): active memory defrag task #448

Signed-off-by: Boaz Sade <boaz@dragonflydb.io>
romange added a commit that referenced this issue Dec 5, 2022
Complements #448.

Signed-off-by: Roman Gershman <roman@dragonflydb.io>
romange added a commit that referenced this issue Dec 5, 2022
Complements #448.

Signed-off-by: Roman Gershman <roman@dragonflydb.io>

Signed-off-by: Roman Gershman <roman@dragonflydb.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants