2.6 Branch Segfault #391

Closed
janoberst opened this Issue Mar 15, 2012 · 7 comments


@janoberst
Contributor

When trying out the current 2.6 branch I got a random segfault.

I'm sorry, but I don't know at which point this instance crashed. It was either when I started a slave that uses the crashed instance as its master, or when my application started sending requests. Judging by the connected clients, I assume it was just the slave connection.

Let me know if you need more info on config/OS/data etc.

client buffer config:

client-output-buffer-limit normal 0 0 0
client-output-buffer-limit slave 256mb 64mb 60
client-output-buffer-limit pubsub 32mb 8mb 60
=== REDIS BUG REPORT START: Cut & paste starting from here ===
[10700] 15 Mar 18:52:27 #     Redis 2.5.2 crashed by signal: 11
[10700] 15 Mar 18:52:27 #     Failed assertion: <no assertion failed> (<no file>:0)
[10700] 15 Mar 18:52:27 # --- STACK TRACE
[10700] 15 Mar 18:52:27 # /usr/local/bin/redis-server(sdsAllocSize+0) [0x415e20]
[10700] 15 Mar 18:52:27 # /usr/local/bin/redis-server(sdsAllocSize+0) [0x415e20]
[10700] 15 Mar 18:52:27 # /usr/local/bin/redis-server(clientsCronResizeQueryBuffer+0xd) [0x4127ed]
[10700] 15 Mar 18:52:27 # /usr/local/bin/redis-server(clientsCron+0x73) [0x4128e3]
[10700] 15 Mar 18:52:27 # /usr/local/bin/redis-server(serverCron+0x13d) [0x413b4d]
[10700] 15 Mar 18:52:27 # /usr/local/bin/redis-server(aeProcessEvents+0x1d3) [0x40fbc3]
[10700] 15 Mar 18:52:27 # /usr/local/bin/redis-server(aeMain+0x2b) [0x40fe8b]
[10700] 15 Mar 18:52:27 # /usr/local/bin/redis-server(main+0x243) [0x40ec43]
[10700] 15 Mar 18:52:27 # /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed) [0x7f7224f6430d]
[10700] 15 Mar 18:52:27 # /usr/local/bin/redis-server() [0x40edb5]
[10700] 15 Mar 18:52:27 # --- INFO OUTPUT
[10700] 15 Mar 18:52:27 # # Server
redis_version:2.5.2
redis_git_sha1:749817b7
redis_git_dirty:0
arch_bits:64
multiplexing_api:epoll
gcc_version:4.6.1
process_id:10700
run_id:659d42c88415c6d5ddc734e84c6c93017b94f9e8
tcp_port:10002
uptime_in_seconds:1625
uptime_in_days:0
lru_clock:1063178

# Clients
connected_clients:0
client_longest_output_list:0
client_biggest_input_buf:0
blocked_clients:0

# Memory
used_memory:650392960
used_memory_human:620.26M
used_memory_rss:665554944
used_memory_peak:667271664
used_memory_peak_human:636.36M
used_memory_lua:25600
mem_fragmentation_ratio:1.02
mem_allocator:jemalloc-2.2.5

# Persistence
loading:0
aof_enabled:0
changes_since_last_save:6
bgsave_in_progress:0
last_save_time:1331836816
last_bgsave_status:ok
bgrewriteaof_in_progress:0

# Stats
total_connections_received:222
total_commands_processed:256
instantaneous_ops_per_sec:0
rejected_connections:0
expired_keys:0
evicted_keys:0
keyspace_hits:0
keyspace_misses:0
pubsub_channels:0
pubsub_patterns:0
latest_fork_usec:582140

# Replication
role:master
connected_slaves:1
slave0:10.1.17.140,58100,online

# CPU
used_cpu_sys:5.90
used_cpu_user:23.27
used_cpu_sys_children:2.24
used_cpu_user_children:10.23

# Commandstats
cmdstat_hincrby:calls=6,usec=52,usec_per_call=8.67
cmdstat_ping:calls=29,usec=129,usec_per_call=4.45
cmdstat_sync:calls=1,usec=588879,usec_per_call=588879.00
cmdstat_info:calls=216,usec=29146,usec_per_call=134.94
cmdstat_slaveof:calls=2,usec=203,usec_per_call=101.50
cmdstat_config:calls=2,usec=63,usec_per_call=31.50

# Keyspace
db0:keys=3291330,expires=9554
hash_init_value: 1332503738

[10700] 15 Mar 18:52:27 # --- CLIENT LIST OUTPUT
[10700] 15 Mar 18:52:27 # addr=10.1.17.140:58100 fd=6 age=819 idle=1 flags=S db=0 sub=0 psub=0 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=sync

[10700] 15 Mar 18:52:27 # --- REGISTERS
[10700] 15 Mar 18:52:27 # 
RAX:0000000000000125 RBX:00007f7209dfc000
RCX:0000000000000001 RDX:0000000000000002
RDI:0000000000000000 RSI:00007f72240a1140
RBP:0000000000000002 RSP:00007ffffd8524b8
R8 :0000000000000001 R9 :0000000000000000
R10:000000000000051f R11:0000000000000001
R12:0000000000000005 R13:0000000000003b65
R14:0000000000000001 R15:0000000000000000
RIP:0000000000415e20 EFL:0000000000010202
CSGSFS:000000000000e033
[10700] 15 Mar 18:52:27 # (00007ffffd852530) -> 0000000000000008
[10700] 15 Mar 18:52:27 # (00007ffffd852528) -> 000000000040fbc3
[10700] 15 Mar 18:52:27 # (00007ffffd852520) -> 0000000000000001
[10700] 15 Mar 18:52:27 # (00007ffffd852518) -> 0000000000000000
[10700] 15 Mar 18:52:27 # (00007ffffd852510) -> 00007f722404c140
[10700] 15 Mar 18:52:27 # (00007ffffd852508) -> 00007f722404c100
[10700] 15 Mar 18:52:27 # (00007ffffd852500) -> 000000004f623a6b
[10700] 15 Mar 18:52:27 # (00007ffffd8524f8) -> 000000000040f42e
[10700] 15 Mar 18:52:27 # (00007ffffd8524f0) -> 00007f722404c150
[10700] 15 Mar 18:52:27 # (00007ffffd8524e8) -> 0000000000413b4d
[10700] 15 Mar 18:52:27 # (00007ffffd8524e0) -> 0000000000000280
[10700] 15 Mar 18:52:27 # (00007ffffd8524d8) -> 0000000000000005
[10700] 15 Mar 18:52:27 # (00007ffffd8524d0) -> 00007f722404c140
[10700] 15 Mar 18:52:27 # (00007ffffd8524c8) -> 00000000004128e3
[10700] 15 Mar 18:52:27 # (00007ffffd8524c0) -> 00007f7209dfc000
[10700] 15 Mar 18:52:27 # (00007ffffd8524b8) -> 00000000004127ed
[10700] 15 Mar 18:52:27 # 
=== REDIS BUG REPORT END. Make sure to include from START to END. ===
@antirez
Owner
antirez commented Mar 15, 2012

This sounds like a real bug indeed; I think I introduced it with the recent query buffer resizing patch. I'm trying to understand how it happens. If you find a way to reproduce it, please ping me. Thank you.

@antirez
Owner
antirez commented Mar 15, 2012

I think I understood what the problem is: just a question, what is the output of CONFIG GET timeout? Thanks.

@janoberst
Contributor

I moved a couple of commits back to version ecb7767f2ac3d1fe96b6e19fcfca5dc6fdd3089c.

It'll take me a couple of minutes to reproduce the crash if that would help you.

I'm using timeout 30 in the config. I assume you're looking for the value in redis.conf (I haven't changed it with CONFIG SET).

@antirez
Owner
antirez commented Mar 15, 2012

timeout 30 is the cause ;) Committing a fix.

@janoberst
Contributor

Awesome. Thank you!

@antirez antirez added a commit that referenced this issue Mar 15, 2012
@antirez Fix for issue #391.
Use a simple protocol between clientsCron() and helper functions to
understand if the client is still valid and clientsCron() should
continue processing or if the client was freed and we should continue
with the next one.
f1eaf57
@antirez antirez added a commit that referenced this issue Mar 15, 2012
@antirez Fix for issue #391.
Use a simple protocol between clientsCron() and helper functions to
understand if the client is still valid and clientsCron() should
continue processing or if the client was freed and we should continue
with the next one.
c9d3dda
@antirez
Owner
antirez commented Mar 15, 2012

Fixed, closing. Thank you a ton for reporting.

@antirez antirez closed this Mar 15, 2012
@janoberst
Contributor

Thank you for fixing it so promptly!
