Redis 2.8.13 OOM crash even with maxmemory configured #2136
Comments
Either the limit is too high or there are other non-trivial users of instance memory. Could you please post the server INFO output as provided in the crash trace by Redis? Thanks.
@antirez we did attach gdb to the crashed Redis instance when we first started it. So what is the easiest way to get the crash trace for you? We did not find a crash trace in the Redis log file. Any other way we can provide more information for you? This is what it looks like when I grep Redis-related processes on the server:
pid 74655 is the crashed Redis instance.
Thanks, yep: when it's killed by the OOM killer there is no crash report, you are right. I was curious to check whether Redis was persisting on disk (had forked a process) when the OOM killer killed it. Btw, from the OOM killer info we have some information; plus, if you have logs, you can get the whole picture:
I hope this helps,
@antirez Thanks for the feedback and suggestion. Since the last crash, we have set the
For this instance, we disabled both AOF and RDB. If memory keeps growing like this, is Redis still a good fit for an LRU cache? I have set maxmemory to 75% of the instance memory, and still the memory keeps growing. So it seems to me it is not very memory efficient to use Redis as an LRU cache.
I forgot to mention that the value which keeps growing is used_memory_rss. The
Summary:
@antirez Is there a reason the
Other people have mentioned this before too. When Redis does hit
Right now Redis enforces a data size limit, but not a process size limit. By enforcing only the data size limit, the process size can grow unbounded under eviction pressure, which seems really bad. So, technically
The quickest 90% fix would be to base maxmemory on RSS instead of logical memory usage:
diff --git a/src/redis.c b/src/redis.c
index eef5251..6d9f131 100644
--- a/src/redis.c
+++ b/src/redis.c
@@ -3160,7 +3160,7 @@ int freeMemoryIfNeeded(void) {
     /* Remove the size of slaves output buffers and AOF buffer from the
      * count of used memory. */
-    mem_used = zmalloc_used_memory();
+    mem_used = server.resident_set_size;
     if (slaves) {
         listIter li;
         listNode *ln;
That approach still allows memory growth beyond the limit because Redis doesn't count replication buffers towards eviction memory usage, but replication buffers tend to be small anyway (maybe? if they are small, why are we ignoring their memory usage since they don't count for much? if they are large, then... why ignore their memory usage since they can overflow the limits? boggle).
[Sidenote: my favorite party trick with maxmemory + a global eviction policy: have a client run a pipeline request containing thousands of commands so Redis has to build up a huge result buffer in memory (it could even be something dumb like
@mattsta I agree with your suggestion. Currently the
Note: replying to the original issue first and how it can be solved; next is a comment replying to @mattsta.

@benzheren it looks like the problem is not the LRU algorithm of Redis itself: from its point of view it is expiring memory indeed, but for your workload jemalloc is unfortunately fragmenting (usually fragmentation is logarithmic, however). The best thing to do is to work in the reverse way: set a memory limit that allows for a much higher fragmentation, for example up to 1.6, to be sure you have enough room, and monitor the fragmentation. You can do this with CONFIG SET maxmemory at runtime (but it is blocking if you don't do it progressively). However, I have some concern at this point that you may have THP enabled, and this may interact with the ability of jemalloc to free memory. So also make sure at some point to disable transparent huge pages and restart the server (see http://redis.io/topics/admin if you don't know how to disable them). Let's see what happens; maybe there is something else we are not considering here, since there are some parts of Redis that use plain malloc and are not traced for memory usage, and we could have a leak there. However Lua memory usage, which could be one culprit, is low, so that's unlikely. For now the best bet is basically to account upfront for the maximum fragmentation you'll experience. Much more detail in the next reply to Matt.
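To make that advice concrete, here is a back-of-the-envelope sizing in the spirit of the suggestion above (the 122 GB instance size comes from the original report; the ~90 GB RSS budget and the 1.6 fragmentation factor are illustrative assumptions, not prescribed values):

```
# Budget: keep RSS under ~90 GB on the 122 GB box, leaving headroom for the OS.
# Assumed worst-case fragmentation: 1.6
#   maxmemory ~= 90 GB / 1.6 ~= 56 GB
redis-cli CONFIG SET maxmemory 56gb
redis-cli CONFIG SET maxmemory-policy allkeys-lru

# Then watch how the real fragmentation evolves under the workload:
redis-cli INFO memory | grep -E 'used_memory:|used_memory_rss:|mem_fragmentation_ratio:'
```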
@benzheren oh a few questions to understand what could be a cause of fragmentation:
Thanks,
@mattsta @benzheren Limiting memory usage via RSS would be great, but it is extremely impractical, or better, impossible, except in two specific cases:
However, limiting by observing the RSS is not possible, since when you start freeing data the RSS does not change dynamically: it may remain high because the allocator does not unmap a given memory region, or it can go down later, incrementally, as the allocator performs some cleanup. So if we do something like you suggest:

Matt's suggestion was actually implemented by me some time ago, with different changes in order to try to cope with the RSS "slowness" to change, but eventually I gave up in favor of another approach I never used but which looks more promising: we should instead adjust the instantaneous memory reporting based on the fragmentation experienced. For example, if the max fragmentation seen so far when max_memory is near peak_memory is 1.3, the actual instantaneous memory usage to be used for memory limit purposes should be zmalloc_used_memory * fragmentation. This could improve things already, since we would start lowering the memory limit earlier, but it could still be imperfect with certain patterns. For example, we could reach the memory limit with a perfect fragmentation of 1.0, and later the system may start to fragment. The memory reporting could start to adapt and we could start freeing more stuff, but if the objects added are not able to reuse the old allocations, the RSS could still go up, with little chance of going down in the future. However I believe this could already be pretty cool.

Even for the above problem there is a fix, which is to start with a pessimistic fragmentation figure, for example 1.3 or more. When maxmemory is reached in this way, we keep monitoring the actual fragmentation, and slowly adapt the figure based on the real data. If the fragmentation changes we'll adapt in the long run and use all the memory in case it is very low. The key to the implementation of such a system is to make gentle adjustments, otherwise you get trapped in problems similar to RSS monitoring itself (but not as severe, of course). For example, if you start with an estimated fragmentation F, you adapt it at every cycle by composing F like this: give F 99.99% of what it already is, and 0.01% of what the real fragmentation is. These are just examples; you compute the percentages based on how much delay you want in "following" the current fragmentation figure.

Ok, I'll try to implement this stuff Monday and report back... however I would love to have some help from @benzheren, because this is the kind of stuff that should be confirmed in the field. We can make Redis better together. Thanks.
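A minimal sketch of the adaptive-fragmentation idea described above; the names and the 0.9999/0.0001 weights are invented for illustration and are not taken from the actual patch:

```c
/* Sketch only: slowly adapt an estimated fragmentation factor towards the
 * observed RSS/used ratio, and use it to inflate the logical memory figure
 * that the maxmemory check compares against the limit. */
#include <stddef.h>

static double est_frag = 1.3;   /* start with a pessimistic estimate */

/* Call periodically (e.g. from a cron-like task) with fresh samples. */
void updateEstimatedFragmentation(size_t rss, size_t used) {
    if (used == 0) return;
    double real_frag = (double)rss / (double)used;
    /* Exponential moving average: 99.99% old estimate, 0.01% new sample,
     * so the estimate follows the real fragmentation only very gently. */
    est_frag = est_frag * 0.9999 + real_frag * 0.0001;
}

/* The memory figure to compare against maxmemory for eviction purposes. */
size_t effectiveMemoryUsage(size_t used) {
    return (size_t)(used * est_frag);
}
```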
@antirez Answers to your questions:
Hello @benzheren and @mattsta. I investigated how to improve maxmemory today. These are my findings. First, let's start with what anyone using Redis in LRU mode today should do:
Now on the topic of making the above less of a pain, and more automated: I can confirm that the algorithm can't adjust itself once the user-defined maxmemory limit is reached, because if the RSS is already higher than the configured parameter, it will stay higher even if we evict 50% of the objects in memory. Jemalloc will only be able to actually reclaim memory if we evict almost the whole data set. Moreover, if we already went over the configured limit, it is already a problem.

It is also not possible to take decisions once the RSS reaches the configured limit: when this happens, we'll likely find no fragmentation at all. Shit happens once the server starts to evict, since maxmemory was already reached. If at this point we try to evict more objects to get live feedback from the RSS reporting, nothing good happens, since the RSS will stay at the same level.

So basically what is possible to do is to mimic what an operations person would do, that is: guess a fragmentation, set a maxmemory parameter, observe what happens, and modify the setting accordingly. I implemented the above strategy and indeed, it seems to work. This is how it works:
It is still not perfect, but I have a patch for the above that we can use to evaluate whether it is worth it or not. You can find it here: https://github.com/antirez/redis/commits/rssmaxmemory
p.s. the patch is not good at handling the user messing with
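If you want to try that branch, building it should look roughly like this (assuming the rssmaxmemory branch referenced above is still available on that repository):

```
git clone https://github.com/antirez/redis.git
cd redis
git checkout rssmaxmemory
make
```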
@antirez More findings: currently, every time Redis automatically frees up more memory (maybe because of the maxmemory settings), we get some Redis connection timeout errors on the application side. And at this moment, the
@benzheren thanks for the update. It is possible that the fragmentation will go down later; however it looks like there is a workload stressing the allocator. This usually happens with progressively larger objects. It is very hard to handle this well. For example, memcached would not be able to re-use the slabs of small objects easily AFAIK (confirmation welcomed), so it is not trivially solvable using other approaches, i.e. copying what memcached is doing (which would have many other side effects for Redis).

About the latency spikes, freeMemoryIfNeeded() is instrumented with the latency monitor, so please, if you can, enable it with

I'm not sure the latency issues are due to the eviction: it is performed incrementally at every command run, so you should see it continuously. However the latency monitor will give you some good info. After some time, and when you see latency spikes again, you can use:

However note that if the latency is due to transparent huge pages, the latency monitoring system will not be able to detect it, since it happens at random in non-instrumented places.
p.s. of course please post the
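For reference, the latency monitor introduced in 2.8.13 is driven with commands along these lines (the 100 ms threshold is only an example value; the event name placeholder should be taken from the LATENCY LATEST output):

```
# Log any event that takes longer than 100 milliseconds (example threshold):
redis-cli CONFIG SET latency-monitor-threshold 100

# Later, when the application-side timeouts show up again:
redis-cli LATENCY LATEST
redis-cli LATENCY DOCTOR
redis-cli LATENCY HISTORY <event-name-from-latest>
```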
Ah, more importantly, your reported fragmentation is not real:
The instance had a peak of 93 GB. Maybe you used CONFIG SET maxmemory to set a lower memory limit? RSS does not go backward, so this inflates the fragmentation figure.
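As a concrete illustration of why a previous peak inflates the figure (the 93 GB peak is from the report above; the 60 GB used_memory value is a made-up example):

```
mem_fragmentation_ratio = used_memory_rss / used_memory
                        ~= 93 GB / 60 GB
                        ~= 1.55
```

So INFO can report a high ratio even when the live data set is not actually fragmented, simply because the allocator never returned the peak-time pages to the OS.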
@benzheren if you just use Redis as a cache, see this document: http://www.slideshare.net/deview/2b3arcus
Flushing all data (FLUSHALL) will make the RSS small again, but it is not practical in most environments IMHO. Fragmentation of the allocator is something to deal with. Moreover, IMHO @benzheren may not have a real fragmentation problem: the last reported fragmentation is high because there was a previous peak, and the previously reported fragmentation is 1.19, which is not perfect but not critical. So in this specific case the best thing is: set a lower limit, restart the server, observe the fragmentation with your workload, set up a new limit, again with some spare memory. Some fragmentation is something we need to deal with. Probably the user reporting the issue here has nothing to complain about, for his use case, regarding Redis making a less than perfect use of memory, but he does have a point that it is counter-intuitive to have to set a lower limit, since when you start it is unknown what the actual memory usage will be. And I agree with him; however there is no silver bullet.
We have some INFO reports showing
Some examples:
@antirez @mattsta Following up on this issue: we've set up a new server with THP disabled (the only system-level config we changed compared to our last instance) and we set maxmemory to a much lower value, accounting for a fragmentation of 1.4. In addition to that, at the application level, we changed the code to greatly decrease the size of the objects that get stored in Redis. Now, after a week, the server is much more stable with
I will follow up with more data.
@charsyam FLUSHDB is not a good idea in a production environment. Especially if you have lots of data and busy traffic, this could block Redis for a noticeable amount of time.
@benzheren In my system we run it with ZooKeeper, so we can switch from Redis server A to Redis server B, and
@benzheren and I think using multiple instances on a physical server is more useful for this situation :)
Not sure if this helps, but I have caught the following case:
All servers start reporting OOM errors
I have run
Looking at the code where the exception is thrown, I do see in the comments that the error is thrown if Redis was unable to free up enough memory, and looking inside
So the question is: is it possible that memory is taken but there are no keys, or was that the result of a hang
And should
Hello, many things have changed in Redis internals since this issue was opened, but there are currently no known bugs in that code path. However the Redis 4.0

It is possible that memory is taken without keys; it depends on: client output buffers, AOF buffers, the Pub/Sub backlog with slow consumers, and so forth. Recent versions of Redis terminate clients when they are using too much memory and can report all these conditions.

Closing this issue since, as I said, there is no known bug, nor is there a precise hint here about something specific not working as expected (with a full investigation and outputs of INFO and so forth), so there is nothing I can proceed with.
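For anyone hitting a similar condition today, the non-keyspace memory consumers mentioned above (client output buffers in particular) can be inspected and bounded with something like the following (the limit values are illustrative):

```
# Inspect per-client memory usage; the "omem" field is the output buffer size:
redis-cli CLIENT LIST

# Output buffers can be capped in redis.conf, for example:
#   client-output-buffer-limit normal 256mb 64mb 60
#   client-output-buffer-limit pubsub 32mb 8mb 60
```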
We are running a single Redis 2.8.13 instance on an AWS EC2 instance with 122 GB of memory, and we try to configure it as an LRU cache with configuration like:
We disabled both AOF and RDB persistence.
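A typical setup of this kind, with maxmemory-based LRU eviction and persistence disabled, looks roughly like this in redis.conf (values are illustrative, not the exact ones used on this instance):

```
# Illustrative redis.conf sketch, not this instance's exact settings.
maxmemory 90gb
maxmemory-policy allkeys-lru

# Persistence disabled:
save ""
appendonly no
```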
There is no crash log in the Redis log output, but the dmesg output after the crash is:
[605806.165743] cron invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
[605806.165748] cron cpuset=/ mems_allowed=0
[605806.165750] CPU: 7 PID: 1401 Comm: cron Not tainted 3.13.0-29-generic #53-Ubuntu
[605806.165751] Hardware name: Xen HVM domU, BIOS 4.2.amazon 06/02/2014
[605806.165753] 0000000000000000 ffff881dead09980 ffffffff8171a214 ffff8800376617f0
[605806.165756] ffff881dead09a08 ffffffff81714b4f 0000000000000000 0000000000000000
[605806.165759] 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[605806.165761] Call Trace:
[605806.165768] [] dump_stack+0x45/0x56
[605806.165771] [] dump_header+0x7f/0x1f1
[605806.165774] [] oom_kill_process+0x1ce/0x330
[605806.165778] [] ? security_capable_noaudit+0x15/0x20
[605806.165780] [] out_of_memory+0x414/0x450
[605806.165783] [] __alloc_pages_nodemask+0xa5c/0xb80
[605806.165786] [] alloc_pages_current+0xa3/0x160
[605806.165789] [] __page_cache_alloc+0x97/0xc0
[605806.165792] [] filemap_fault+0x185/0x410
[605806.165795] [] __do_fault+0x6f/0x530
[605806.165798] [] handle_mm_fault+0x492/0xf10
[605806.165801] [] ? arch_vtime_task_switch+0x94/0xa0
[605806.165803] [] ? vtime_common_task_switch+0x3d/0x40
[605806.165806] [] ? finish_task_switch+0x128/0x170
[605806.165809] [] __do_page_fault+0x184/0x560
[605806.165813] [] ? sched_clock+0x9/0x10
[605806.165815] [] ? sched_clock_local+0x1d/0x80
[605806.165818] [] ? acct_account_cputime+0x1c/0x20
[605806.165820] [] ? account_user_time+0x8b/0xa0
[605806.165822] [] ? vtime_account_user+0x54/0x60
[605806.165824] [] do_page_fault+0x1a/0x70
[605806.165827] [] page_fault+0x28/0x30
[605806.165829] Mem-Info:
[605806.165830] Node 0 DMA per-cpu:
[605806.165832] CPU 0: hi: 0, btch: 1 usd: 0
[605806.165833] CPU 1: hi: 0, btch: 1 usd: 0
[605806.165834] CPU 2: hi: 0, btch: 1 usd: 0
[605806.165835] CPU 3: hi: 0, btch: 1 usd: 0
[605806.165836] CPU 4: hi: 0, btch: 1 usd: 0
[605806.165837] CPU 5: hi: 0, btch: 1 usd: 0
[605806.165838] CPU 6: hi: 0, btch: 1 usd: 0
[605806.165838] CPU 7: hi: 0, btch: 1 usd: 0
[605806.165839] CPU 8: hi: 0, btch: 1 usd: 0
[605806.165840] CPU 9: hi: 0, btch: 1 usd: 0
[605806.165842] CPU 10: hi: 0, btch: 1 usd: 0
[605806.165843] CPU 11: hi: 0, btch: 1 usd: 0
[605806.165844] CPU 12: hi: 0, btch: 1 usd: 0
[605806.165844] CPU 13: hi: 0, btch: 1 usd: 0
[605806.165845] CPU 14: hi: 0, btch: 1 usd: 0
[605806.165846] CPU 15: hi: 0, btch: 1 usd: 0
[605806.165847] Node 0 DMA32 per-cpu:
[605806.165849] CPU 0: hi: 186, btch: 31 usd: 0
[605806.165850] CPU 1: hi: 186, btch: 31 usd: 0
[605806.165851] CPU 2: hi: 186, btch: 31 usd: 0
[605806.165851] CPU 3: hi: 186, btch: 31 usd: 0
[605806.165852] CPU 4: hi: 186, btch: 31 usd: 139
[605806.165853] CPU 5: hi: 186, btch: 31 usd: 0
[605806.165854] CPU 6: hi: 186, btch: 31 usd: 52
[605806.165855] CPU 7: hi: 186, btch: 31 usd: 30
[605806.165856] CPU 8: hi: 186, btch: 31 usd: 0
[605806.165857] CPU 9: hi: 186, btch: 31 usd: 0
[605806.165858] CPU 10: hi: 186, btch: 31 usd: 0
[605806.165859] CPU 11: hi: 186, btch: 31 usd: 0
[605806.165860] CPU 12: hi: 186, btch: 31 usd: 0
[605806.165861] CPU 13: hi: 186, btch: 31 usd: 0
[605806.165862] CPU 14: hi: 186, btch: 31 usd: 0
[605806.165863] CPU 15: hi: 186, btch: 31 usd: 0
[605806.165864] Node 0 Normal per-cpu:
[605806.165865] CPU 0: hi: 186, btch: 31 usd: 0
[605806.165866] CPU 1: hi: 186, btch: 31 usd: 0
[605806.165867] CPU 2: hi: 186, btch: 31 usd: 0
[605806.165868] CPU 3: hi: 186, btch: 31 usd: 0
[605806.165869] CPU 4: hi: 186, btch: 31 usd: 0
[605806.165870] CPU 5: hi: 186, btch: 31 usd: 0
[605806.165871] CPU 6: hi: 186, btch: 31 usd: 0
[605806.165872] CPU 7: hi: 186, btch: 31 usd: 0
[605806.165873] CPU 8: hi: 186, btch: 31 usd: 0
[605806.165874] CPU 9: hi: 186, btch: 31 usd: 0
[605806.165875] CPU 10: hi: 186, btch: 31 usd: 0
[605806.165876] CPU 11: hi: 186, btch: 31 usd: 0
[605806.165877] CPU 12: hi: 186, btch: 31 usd: 0
[605806.165878] CPU 13: hi: 186, btch: 31 usd: 0
[605806.165879] CPU 14: hi: 186, btch: 31 usd: 0
[605806.165880] CPU 15: hi: 186, btch: 31 usd: 0
[605806.165882] active_anon:31039555 inactive_anon:76 isolated_anon:0
[605806.165882] active_file:60 inactive_file:60 isolated_file:0
[605806.165882] unevictable:0 dirty:0 writeback:0 unstable:0
[605806.165882] free:139633 slab_reclaimable:6506 slab_unreclaimable:11931
[605806.165882] mapped:7 shmem:95 pagetables:61334 bounce:0
[605806.165882] free_cma:0
[605806.165885] Node 0 DMA free:15904kB min:8kB low:8kB high:12kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15988kB managed:15904kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[605806.165889] lowmem_reserve[]: 0 3744 122934 122934
[605806.165891] Node 0 DMA32 free:478708kB min:2056kB low:2568kB high:3084kB active_anon:3317128kB inactive_anon:20kB active_file:36kB inactive_file:36kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3915776kB managed:3836820kB mlocked:0kB dirty:0kB writeback:0kB mapped:4kB shmem:20kB slab_reclaimable:1292kB slab_unreclaimable:1440kB kernel_stack:40kB pagetables:29904kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:9510 all_unreclaimable? yes
[605806.165895] lowmem_reserve[]: 0 0 119190 119190
[605806.165897] Node 0 Normal free:63920kB min:65516kB low:81892kB high:98272kB active_anon:120841092kB inactive_anon:284kB active_file:204kB inactive_file:204kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:124067840kB managed:122050804kB mlocked:0kB dirty:0kB writeback:0kB mapped:24kB shmem:360kB slab_reclaimable:24732kB slab_unreclaimable:46284kB kernel_stack:4392kB pagetables:215432kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:850 all_unreclaimable? yes
[605806.165900] lowmem_reserve[]: 0 0 0 0
[605806.165902] Node 0 DMA: 0*4kB 0*8kB 0*16kB 1*32kB (U) 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (R) 3*4096kB (M) = 15904kB
[605806.165910] Node 0 DMA32: 1454*4kB (UEM) 3121*8kB (UEM) 1970*16kB (UEM) 919*32kB (EM) 469*64kB (UEM) 293*128kB (UEM) 212*256kB (EM) 60*512kB (UM) 157*1024kB (UM) 0*2048kB 18*4096kB (MR) = 478720kB
[605806.165920] Node 0 Normal: 1085*4kB (UEM) 510*8kB (UEM) 255*16kB (UEM) 345*32kB (UEM) 159*64kB (UEM) 71*128kB (UEM) 26*256kB (E) 14*512kB (EM) 7*1024kB (U) 0*2048kB 0*4096kB = 63796kB
[605806.165929] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[605806.165930] 221 total pagecache pages
[605806.165931] 0 pages in swap cache
[605806.165932] Swap cache stats: add 0, delete 0, find 0/0
[605806.165933] Free swap = 0kB
[605806.165934] Total swap = 0kB
[605806.165935] 31999901 pages RAM
[605806.165936] 0 pages HighMem/MovableOnly
[605806.165937] 504259 pages reserved
[605806.165937] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
[605806.165947] [ 691] 0 691 12473 222 27 0 -1000 systemd-udevd
[605806.165949] [ 897] 0 897 2556 575 8 0 0 dhclient
[605806.165951] [ 1283] 101 1283 65115 3682 35 0 0 rsyslogd
[605806.165952] [ 1370] 0 1370 3635 41 12 0 0 getty
[605806.165953] [ 1373] 0 1373 3635 39 12 0 0 getty
[605806.165955] [ 1378] 0 1378 3635 42 12 0 0 getty
[605806.165956] [ 1379] 0 1379 3635 41 12 0 0 getty
[605806.165958] [ 1382] 0 1382 3635 40 12 0 0 getty
[605806.165959] [ 1401] 0 1401 5914 56 17 0 0 cron
[605806.165960] [ 1402] 0 1402 4785 42 13 0 0 atd
[605806.165962] [ 1414] 0 1414 15341 169 33 0 -1000 sshd
[605806.165963] [ 1435] 0 1435 1092 35 8 0 0 acpid
[605806.165965] [ 1436] 102 1436 9803 105 23 0 0 dbus-daemon
[605806.165967] [ 1442] 0 1442 4863 117 13 0 0 irqbalance
[605806.165968] [ 1456] 0 1456 10883 90 26 0 0 systemd-logind
[605806.165970] [ 1512] 0 1512 3635 41 12 0 0 getty
[605806.165971] [ 1514] 0 1514 3197 39 12 0 0 getty
[605806.165973] [ 3073] 0 3073 4869 51 13 0 0 upstart-udev-br
[605806.165974] [ 3077] 0 3077 3819 58 12 0 0 upstart-file-br
[605806.165976] [ 3078] 0 3078 3815 58 11 0 0 upstart-socket-
[605806.165977] [39705] 999 39705 19922 941 39 0 0 gmond
[605806.165979] [74655] 998 74655 31118305 31025991 60725 0 0 redis-server
[605806.165981] [76879] 1003 76879 6511 175 16 0 0 screen
[605806.165982] [76880] 1003 76880 5510 691 15 0 0 bash
[605806.165984] [77322] 0 77322 15918 116 34 0 0 sudo
[605806.165985] [77323] 0 77323 17122 3850 37 0 0 gdb
[605806.165986] [82568] 0 82568 26408 246 55 0 0 sshd
[605806.165988] [82665] 1003 82665 26408 249 53 0 0 sshd
[605806.165989] [82666] 1003 82666 5535 707 15 0 0 bash
[605806.165991] Out of memory: Kill process 74655 (redis-server) score 987 or sacrifice child
[605806.172830] Killed process 74655 (redis-server) total-vm:124473220kB, anon-rss:124103964kB, file-rss:0kB
My question is why, even with maxmemory configured, the OOM still happens. I happened to come across this article online: http://www.couyon.net/blog/using-redis-as-a-lru-cache-dont-do-it.
Is there something more we should pay attention to when we use Redis as an LRU cache?