
Active defrag v2 #4691

Merged
merged 2 commits into redis:unstable on Mar 22, 2018

Conversation

oranagra
Member

@oranagra oranagra commented Feb 18, 2018

First commit: Avoid latency spikes when defragging huge keys

  • big keys are not defragged in one go from within the dict scan; instead they are scanned in parts after the main dict hash bucket is done.
  • add latency monitor sample for defrag
  • change default active-defrag-cycle-min to induce lower latency
  • make active defrag start a new scan right away if needed, so it's easier (for the test suite) to detect when it's done
  • make active defrag quit the current cycle after each db / big key
  • defrag some non key long term global allocations
  • some refactoring for smaller functions and more reusable code

Second commit: Adding real allocator fragmentation to INFO and MEMORY command + active defrag test

other fixes / improvements:

  • Lua script memory isn't allocated via zmalloc (it comes from libc malloc), so it could cause a falsely high fragmentation ratio to be displayed
  • there was a problem with "fragmentation" info being calculated from RSS and used_memory sampled at different times (now they are sampled together)

other details:

  • adding a few more allocator info fields to INFO and MEMORY commands
  • improve defrag test to measure defrag latency of big keys
  • increasing the accuracy of the defrag test (by looking at real frag info); this way we can use an even lower threshold and still avoid false positives
  • keep the old (total) "fragmentation" field unchanged, but add new ones for specific things
  • add these to the MEMORY DOCTOR command
  • deduct Lua memory from the RSS in case of a non-jemalloc allocator (one for which we don't have "allocator active/used" stats)
  • reduce sampling rate of the rss and allocator info

One detail from the commit message not covered above: during dict rehashing, one scan iteration of the dict can end up scanning one bucket in the smaller dict and many, many buckets in the larger dict, so waiting for 16 scan iterations before checking the time may be much too long.
@oranagra
Member Author

@antirez from time to time people keep complaining that defrag doesn't work (last example: #4751).
please consider merging this PR; the second commit should clear up that confusion.

@antirez
Contributor

antirez commented Mar 14, 2018

Hello @oranagra, I'm going to merge this because I believe it's too complex for me to dig into the defragmentation code every time, so I nominate you, if you accept, the maintainer of this Redis sub-system, and will just merge anything you PR. However, if I understand correctly, this also changes things beyond defrag, like the way fragmentation is computed.

Questions:

  1. Is the total fragmentation fixed just by accounting for Lua memory (good idea indeed), and by sampling at the same time (also cool)?
  2. If there is any other change, could you explain the changes at a higher level, and tell me if we now hook into APIs (jemalloc or whatever) that may sometimes be slower, introducing latency spikes? So far we just queried our total memory allocations + /proc, which is also technically just a file system, but actually we can kinda imagine it being real-time.

Remarks:

  1. Thanks for improving MEMORY DOCTOR this is cool.
  2. Thanks for all the tests.

Thank you

@oranagra
Member Author

Hi.
This sounds like a good plan, especially since i did do an intensive code review of that code with @guybe7, and we also QA'd it internally at RL. (And considering that this is off by default and marked as experimental, it's low risk anyway.)

I've split that PR into two commits so that it'll be easier for you to review just the second one (which has changes outside the defrag code).
despite that, I do think you might also want to take a quick look at the first commit too, at the parts outside the defrag.c file (e.g. redis.conf).

Anyway, regarding the second commit and the changes it brings to redis INFO / MEMORY command:

The idea this time was not to change the behavior of the old info fields (they report the same measurements they used to), but to add new info fields that break down what this RSS overhead (incorrectly called "fragmentation") is made of.
in this case, the old fragmentation_ratio field will still include the Lua overhead (no change here),
but the new allocator_frag field will not include it (even if both redis and Lua are using glibc malloc).

the thing about sampling memory usage and RSS at the same time is actually a bugfix: otherwise the ratio you calculate can be wrong if the RSS sampled at the server cron was very different from the memory usage at the time of infoCommand.

i will obviously turn your attention to risky things i'm adding, even if you won't review them.
in this case i did add calls to je_mallctl() to collect stats from the server cron, but i also reduced the rate at which they are called (and also the existing call to zmalloc_get_rss) to no more than once per 10ms (in case someone increases hz). i don't believe this is problematic, but if you do, we can reduce the rate further.

all the other changes in that commit are just adding more fields to MEMORY and INFO commands and improving the DOCTOR.

@oranagra
Member Author

@antirez i forgot to tag you for the above reply.

@antirez
Contributor

antirez commented Mar 16, 2018

Thank you @oranagra, so I'll follow your advice and today I'll read both the commit and merge or comment. Also thanks for splitting the PR into two commits for easy reading.

@antirez
Contributor

antirez commented Mar 22, 2018

Hello @oranagra, I read the two commits. On the big one implementing defrag V2 I only scanned the different functions to get an idea of what it does; the code seems well written to me and the improvement in latency very important, however I cannot claim that my review was able to uncover any bug. It would be cool if we could reserve some "office time" in San Francisco, in case you are willing to give me a 20-minute tour of how exactly it works. Also, it is pretty clear that if we go with a radix-tree+listpacks for all the data types (but sorted sets), this will be a big win from the POV of the simplicity that can be achieved inside the defragmentation code: a single code path for Streams, Lists (we should move away from quicklists as well), Hashes, and Sets would be cool.

About the second commit, it looks cool as well, also the hints in MEMORY DOCTOR are useful. I'm merging everything as it is, but the only thing I would like to improve before Redis 5.0 RC1 is the naming of the fields. mem_fragmentation vs allocator_fragmentation is complex to understand. I wonder if we can pick more specific names, and document somewhere effectively the difference between all those.

So starting with merging the PR, and we can continue later from here. Thanks for your work!

@antirez antirez merged commit da62178 into redis:unstable Mar 22, 2018
@oranagra
Member Author

@antirez thanks for merging.
Sure, if we can find some time in RedisConf i'll be happy to go over it with you.

Regarding the names, i originally wanted to change the meaning of mem_fragmentation and have it show what's currently displayed in allocator_fragmentation, and add a field named rss_overhead that would show what's currently displayed in mem_fragmentation.
I think that's more "accurate", but i understand we may want to keep some backwards compatibility, and also show the most inclusive (highest ratio) number where people are used to looking for it, and then expose details elsewhere.

if you want to suggest some alternative names, please give an example, and we'll go from there.
thanks again.

JackieXie168 pushed a commit to JackieXie168/redis that referenced this pull request Apr 10, 2018
pulllock pushed a commit to pulllock/redis that referenced this pull request Jun 28, 2023