Sentinel crashes when receiving unexpected replies from the Redis server. #3737

Open
yurongga opened this Issue Jan 9, 2017 · 5 comments


@yurongga
yurongga commented Jan 9, 2017

Hello,

We are running Redis 2.8.19, with around 500 Redis groups to be monitored by 5 Sentinels. After we had added monitoring for around 400 groups, Sentinel failed with the assertion error below:

redis-sentinel: async.c:450: redisProcessCallbacks: Assertion (c->flags & 0x20 || c->flags & 0x40) failed.

Jan 6 18:17:14 SERVER abrt[53743]: Saved core dump of pid 49701 (/opt/app/redis/sbin/redis-sentinel) to /var/spool/abrt/ccpp-2017-01-06-18:17:13-49701 (159539200 bytes)

Jan 6 18:17:14 SERVER abrtd: Directory 'ccpp-2017-01-06-18:17:13-49701' creation detected

Jan 6 18:17:14 SERVER abrtd: Package 'XXXX-Redis' isn't signed with proper key

Jan 6 18:17:14 SERVER abrtd: 'post-create' on '/var/spool/abrt/ccpp-2017-01-06-18:17:13-49701' exited with 1

Jan 6 18:17:14 SERVER abrtd: Corrupted or bad directory '/var/spool/abrt/ccpp-2017-01-06-18:17:13-49701', deleting

We tried adding one group every 30 seconds, but still hit the error after roughly 400 groups. Across many attempts, the same error appeared sometimes after 300 groups and sometimes after 400.
We rebuilt a new machine from scratch to host Sentinel and hit the same error. Any idea what the cause could be?

@antirez
Owner
antirez commented Jan 13, 2017

Hello, what version of Sentinel are you using? It is strongly advised to use the latest version of Sentinel available. 2.8.x is no longer supported.

@yurongga

Thank you very much for your reply

The Sentinel we are using is 2.8.19. The Sentinel code ships as part of the Redis source, so upgrading to the latest version is not a small task (we have hundreds of Redis instances in production). When the assertion triggered, c->flags was REDIS_CONNECTED (neither REDIS_MONITORING nor REDIS_SUBSCRIBED was set).
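To make the flag check concrete, here is a minimal Python sketch of how the bitmask in the assertion can be read. The flag values are the ones I recall from hiredis.h (bundled under deps/hiredis), so please verify them against your own tree. The example value 0x02 is an assumption standing in for "only REDIS_CONNECTED set"; the actual numeric value of c->flags is not shown in this thread.

```python
# Flag bits as recalled from hiredis.h; verify against the bundled copy.
REDIS_FLAGS = {
    0x01: "REDIS_BLOCK",
    0x02: "REDIS_CONNECTED",
    0x04: "REDIS_DISCONNECTING",
    0x08: "REDIS_FREEING",
    0x10: "REDIS_IN_CALLBACK",
    0x20: "REDIS_SUBSCRIBED",
    0x40: "REDIS_MONITORING",
}


def decode_flags(flags):
    """Return the names of all flag bits set in `flags`, lowest bit first."""
    return [name for bit, name in sorted(REDIS_FLAGS.items()) if flags & bit]


def assertion_holds(flags):
    """Mirror the asserted expression: (c->flags & 0x20 || c->flags & 0x40)."""
    return bool(flags & 0x20 or flags & 0x40)


if __name__ == "__main__":
    assumed_flags = 0x02  # assumption: only REDIS_CONNECTED is set
    print(decode_flags(assumed_flags), assertion_holds(assumed_flags))
```

With only REDIS_CONNECTED set, neither 0x20 nor 0x40 is present, so the asserted expression is false, which matches the failure reported above.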

We tried reducing the number of monitored masters from 500 -> 400 -> 300 -> 200 -> 100. We still hit the assertion failure with 200 masters. We will check whether it works with 100 masters.

@yurongga

We analyzed the dump. It seems that in the reply stream one Redis server replies with PONG, PONG, PONG, followed by "unsupported command", which breaks the reply processing. We suspect that one Redis master is misbehaving. We will capture a network trace to see which Redis server replies with "unsupported command", and what state it is in.

(gdb) info local
c = 0x15f51c0
cb = {next = 0x0, fn = 0x468c02 , privdata = 0x0}
reply = 0x173e240
status =
counter = 2
__PRETTY_FUNCTION__ = "redisProcessCallbacks"
(gdb) p c->reader->buf
$14 = 0x1c2dd20 "+PONG\r\n:0\r\n+PONG\r\n+PONG\r\n-unsupported command:\r\n:0\r\n"
(gdb) p c->reader->reply
$15 = (void *) 0x0
(gdb)
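For readers who want to see what is in that reader buffer, the following Python sketch splits it into individual replies. It is not hiredis code, only a simplified parser that handles the single-line RESP types present in the dump ('+' status, '-' error, ':' integer). The function name parse_simple_replies is made up for illustration. Note that the buffer may also contain bytes the reader has already consumed, so this shows what was received, not necessarily what was still pending.

```python
def parse_simple_replies(buf):
    """Split a RESP byte string into (kind, payload) tuples.

    Only the single-line reply types seen in the dump are handled;
    bulk strings and arrays are deliberately out of scope.
    """
    kinds = {b"+": "status", b"-": "error", b":": "integer"}
    replies = []
    for line in buf.split(b"\r\n"):
        if not line:
            continue  # skip the empty piece after the final CRLF
        kind = kinds.get(line[:1])
        if kind is None:
            raise ValueError(f"unsupported RESP type in line: {line!r}")
        payload = line[1:].decode()
        replies.append((kind, int(payload) if kind == "integer" else payload))
    return replies


if __name__ == "__main__":
    # Buffer contents copied from the gdb output above.
    buf = b"+PONG\r\n:0\r\n+PONG\r\n+PONG\r\n-unsupported command:\r\n:0\r\n"
    for reply in parse_simple_replies(buf):
        print(reply)
```

Parsed this way, the buffer holds six replies, one of which is the error reply "unsupported command:".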

@yurongga

The root cause has been identified. We run a customized Redis server with a modified command set. When Sentinel sends a command that this server does not recognize, the server replies with "unsupported command", which causes the problem.

@antirez
Owner
antirez commented Jan 20, 2017

Yet Sentinel should not crash when receiving non-conforming input. I'll fix it, thanks.
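As an illustration of the difference between crashing and tolerating bad input, here is a hypothetical Python model of reply dispatch. It is not the actual fix and not hiredis code: dispatch, ProtocolError, and the strict switch are invented names. The model assumes a plain request/response link where every reply should match one pending callback, and it ignores the subscribed and monitoring modes in which unsolicited replies are legitimate.

```python
from collections import deque


class ProtocolError(Exception):
    """Raised when the peer sends a reply nobody asked for (hypothetical)."""


def dispatch(replies, pending, strict=True):
    """Pair each reply with the oldest pending callback name.

    strict=True models the assert: an unsolicited reply aborts.
    strict=False models a tolerant client: it raises a recoverable
    error so the caller can drop the link and reconnect.
    """
    queue = deque(pending)
    delivered = []
    for reply in replies:
        if queue:
            delivered.append((queue.popleft(), reply))
        elif strict:
            raise AssertionError(f"reply {reply!r} has no pending callback")
        else:
            raise ProtocolError(f"unsolicited reply {reply!r}, closing link")
    return delivered


if __name__ == "__main__":
    replies = ["PONG", 0, "PONG"]
    try:
        dispatch(replies, ["ping"], strict=False)
    except ProtocolError as exc:
        # A caller would close this one connection and carry on.
        print("recoverable:", exc)
```

The point of the tolerant branch is that one misbehaving server costs a single connection rather than the whole monitoring process.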

@antirez antirez added the sentinel label Jan 20, 2017
@antirez antirez added this to the Urgent milestone Jan 20, 2017
@antirez antirez changed the title from redis-sentinel assertion failure to Sentinel crashes when receiving unexpected replies from the Redis server. Jan 20, 2017