
Redis 3.2.0 crashed by signal: 11 #3607

Closed
racielrod opened this issue Nov 14, 2016 · 10 comments


racielrod commented Nov 14, 2016

From the logs:

=== REDIS BUG REPORT START: Cut & paste starting from here ===
981:M 13 Nov 22:04:13.019 # Redis 3.2.0 crashed by signal: 11
981:M 13 Nov 22:04:13.019 # Crashed running the instuction at: 0x7f1eb624ea44
981:M 13 Nov 22:04:13.020 # Accessing address: 0x3735010200
981:M 13 Nov 22:04:13.020 # Failed assertion: (:0)

------ STACK TRACE ------
EIP:
/lib/x86_64-linux-gnu/libc.so.6(+0x153a44)[0x7f1eb624ea44]

Backtrace:
/usr/local/bin/redis-server(logStackTrace+0x29)[0x45dfc9]
/usr/local/bin/redis-server(sigsegvHandler+0xaa)[0x45e4fa]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x10330)[0x7f1eb64d0330]
/lib/x86_64-linux-gnu/libc.so.6(+0x153a44)[0x7f1eb624ea44]
/usr/local/bin/redis-server(clusterLoadConfig+0xb9)[0x463229]
/usr/local/bin/redis-server(clusterInit+0xfd)[0x464d6d]
/usr/local/bin/redis-server(initServer+0x40c)[0x42adec]
/usr/local/bin/redis-server(main+0x48a)[0x41e89a]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7f1eb611cf45]
/usr/local/bin/redis-server[0x41ea92]

------ INFO OUTPUT ------
945:M 13 Nov 22:24:16.653 * Increased maximum number of open files to 10032 (it was originally set to 1024).

# Server

redis_version:3.2.0
redis_git_sha1:00000000
redis_git_dirty:0
redis_build_id:13911a99b348671e
redis_mode:cluster
os:Linux 4.2.0-41-generic x86_64
arch_bits:64
multiplexing_api:epoll
gcc_version:4.8.4
process_id:971
run_id:b7391f7ede1a59304f9069c85200573e8d5bd721
tcp_port:6379
uptime_in_seconds:6694
uptime_in_days:0
hz:10
lru_clock:2706637
executable:/usr/local/bin/redis-server
config_file:/etc/redis/6379.conf

This happened after the 3 Hyper-V hosts were reset by mistake. There are 6 nodes in total, distributed across 3 physical servers.

The rest of the nodes are fine. I can't get this one to work after this issue.

Any ideas on how I can recover from this?
I can set up another slave and join it to the cluster, but ideally I would like to reuse the one I'm not able to start.


antirez commented Nov 14, 2016

Hello, could you please share with me the nodes.conf file that caused the crash?


antirez commented Nov 14, 2016

Also, if possible, please update to the latest 3.2.x; it is able to report more info during crashes. Thanks.

racielrod commented

Thanks for looking at this so quickly.
Please find attached the nodes.conf (renamed to nodes.png) for the node that is crashing.
Note that the crashing node is 192.168.10.38.

Additional info:

  • It looks like the 3 masters were started first and the slaves were started about 30 seconds later.
  • The slave 192.168.10.39 never took over as a master after 192.168.10.38 crashed and failed to start.
  • I had to run a "cluster failover force" on 192.168.10.39 to make the cluster operational.

Not sure whether the info above adds any value, but I figured it wouldn't hurt to mention these facts.
Should I update the failing node to 3.2.5? Would it be able to work with the rest of the nodes if they are running 3.2.0?

My main goal is to restore the cluster to be fully operational with the minimum impact possible during the day. I can update all the nodes tonight.

Thanks again!
(attachment: nodes.png)


antirez commented Nov 14, 2016

Hello, I've a fever so I'm not really able to analyze the situation, but I have an idea of what is going wrong here. For now I hope this helps you: to restart, try removing the final line in the nodes.conf file, which is just a long run of zero bytes if you open it with vim/emacs/whatever. After removing that strange final line, the cluster should start.


antirez commented Nov 14, 2016

Btw, the original bug causing this is related to the way the cluster configuration file is generated with the help of truncate. I'll investigate and fix it as soon as my fever is gone and I'm back at the PC.


antirez commented Nov 14, 2016

P.S. Thanks a lot for your help.

racielrod commented

I took the long route: wiped Redis from the crashing node and re-installed it. I removed the node from the cluster and added it back with redis-trib.rb add-node --slave.
I was planning to update the production cluster to 3.2.5, but will hold off until the fix for this particular issue is released.

Thank you!


antirez commented Nov 16, 2016

Thanks @racielrod. A very important point is the following: is what happened to your VMs similar to a sudden power outage? The zero-padding of the file could be the result of the file's metadata not being flushed. Btw, I pushed a patch that ignores zero-padding when loading the file, so this should be avoided next time; however, other corruptions are possible, so I'm also exploring the idea of changing the implementation to use rename instead of write + truncate.


racielrod commented Nov 16, 2016

@antirez it was a very similar scenario. Someone took all of those hosts down for maintenance at once during a maintenance window.
We now make sure we fail over manually and do maintenance one host at a time, to avoid this issue in the future.
I'm glad this mistake helped find an edge case here.


antirez commented Jan 26, 2017

Thanks, I opened an issue about switching to rename(). The code was also modified when this was reported so that it no longer crashes on trailing zeroes.

@antirez antirez closed this as completed Jan 26, 2017