Skip to content

redis rewriting AOF failed and crash #868

Closed
shuge opened this Issue Jan 9, 2013 · 5 comments

2 participants

@shuge
shuge commented Jan 9, 2013
...
[9561] 09 Jan 17:10:08.374 * Starting automatic rewriting of AOF on 100% growth
[9561] 09 Jan 17:10:08.552 * Background append only file rewriting started by pid 29812
[9561] 09 Jan 17:10:31.007 * Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this ma
y slow down Redis.
[9561] 09 Jan 17:10:35.006 * Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this ma
y slow down Redis.
[9561] 09 Jan 17:10:39.006 * Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this ma
y slow down Redis.
[9561] 09 Jan 17:10:43.002 * Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this ma
y slow down Redis.
[9561] 09 Jan 17:10:46.007 * Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this ma
y slow down Redis.
[29812] 09 Jan 17:10:47.220 #


[29812] 09 Jan 17:10:47.537 # === ASSERTION FAILED ===
[29812] 09 Jan 17:10:47.539 # ==> t_hash.c:325 'vptr != NULL' is not true
[29812] 09 Jan 17:10:47.539 # (forcing SIGSEGV to print the bug report.)
[29812] 09 Jan 17:10:47.539 #     Redis 2.6.6 crashed by signal: 11
[29812] 09 Jan 17:10:47.539 #     Failed assertion: vptr != NULL (t_hash.c:325)
[29812] 09 Jan 17:10:47.539 # --- STACK TRACE
/usr/local/redis/bin/redis-server(logStackTrace+0x75)[0x440fb5]
/usr/local/redis/bin/redis-server(_redisAssert+0x6f)[0x440e2f]
/lib/libpthread.so.0(+0xf8f0)[0x7f0cdee938f0]
/usr/local/redis/bin/redis-server(_redisAssert+0x6f)[0x440e2f]
/usr/local/redis/bin/redis-server(hashTypeNext+0xe6)[0x438996]
/usr/local/redis/bin/redis-server(rewriteHashObject+0xd2)[0x43d772]
/usr/local/redis/bin/redis-server(rewriteAppendOnlyFile+0x1f4)[0x43f424]
/usr/local/redis/bin/redis-server(rewriteAppendOnlyFileBackground+0xc4)[0x43f7e4]
/usr/local/redis/bin/redis-server(serverCron+0x443)[0x41dce3]
/usr/local/redis/bin/redis-server(aeProcessEvents+0x202)[0x417802]
/usr/local/redis/bin/redis-server(aeMain+0x2b)[0x4179fb]
/usr/local/redis/bin/redis-server(main+0x23b)[0x41e6ab]
/lib/libc.so.6(__libc_start_main+0xfd)[0x7f0cdeb1fc4d]
/usr/local/redis/bin/redis-server[0x416dc9]
[29812] 09 Jan 17:10:47.810 # --- INFO OUTPUT
[29812] 09 Jan 17:10:47.810 # # Server
redis_version:2.6.6
redis_git_sha1:00000000
redis_git_dirty:0
redis_mode:standalone
os:Linux 2.6.32-21-server x86_64
arch_bits:64
multiplexing_api:epoll
gcc_version:4.4.3
process_id:29812
run_id:321b8046be44120fd80746fddf64199ab0f13892
tcp_port:6386
uptime_in_seconds:786487
uptime_in_days:9
lru_clock:1554532

# Clients
connected_clients:38
client_longest_output_list:0
blocked_clients:0

# Memory
used_memory:5507789808
used_memory_human:5.13G
used_memory_rss:5633081344
used_memory_peak:5513067952
used_memory_peak_human:5.13G
used_memory_lua:31744
mem_fragmentation_ratio:1.02
mem_allocator:jemalloc-3.2.0

# Persistence
loading:0
rdb_changes_since_last_save:20450345
rdb_bgsave_in_progress:0
rdb_last_save_time:1357660856
rdb_last_bgsave_status:ok
rdb_last_bgsave_time_sec:55
rdb_current_bgsave_time_sec:-1
aof_enabled:1
aof_rewrite_in_progress:0
aof_rewrite_scheduled:0
aof_last_rewrite_time_sec:68
aof_current_rewrite_time_sec:-1
aof_last_bgrewrite_status:ok
aof_current_size:11727211392
aof_base_size:5863605642
aof_pending_rewrite:0
aof_buffer_length:0
aof_rewrite_buffer_length:0
aof_pending_bio_fsync:0
aof_delayed_fsync:38

# Stats
total_connections_received:78996577
total_commands_processed:8553080268
instantaneous_ops_per_sec:14392
rejected_connections:0
expired_keys:0
evicted_keys:0

keyspace_hits:8137002949
keyspace_misses:22946621
pubsub_channels:0
pubsub_patterns:0
latest_fork_usec:219996

# Replication
role:master
connected_slaves:0

# CPU
used_cpu_sys:5.52
used_cpu_user:33.16
used_cpu_sys_children:0.00
used_cpu_user_children:0.00

# Commandstats
cmdstat_del:calls=2535263,usec=17922460,usec_per_call=7.07
cmdstat_lpush:calls=2648761,usec=11724381,usec_per_call=4.43
cmdstat_rpop:calls=3164942,usec=16855835,usec_per_call=5.33
cmdstat_llen:calls=91,usec=229,usec_per_call=2.52
cmdstat_hset:calls=384768500,usec=2597644930,usec_per_call=6.75
cmdstat_hget:calls=5979730156,usec=26494193464,usec_per_call=4.43
cmdstat_hgetall:calls=2180219323,usec=47638987054,usec_per_call=21.85
cmdstat_bgsave:calls=9,usec=1751970,usec_per_call=194663.33
cmdstat_info:calls=13223,usec=4085738,usec_per_call=308.99

# Keyspace
db0:keys=6377758,expires=0
hash_init_value: 1357679091

[29812] 09 Jan 17:10:47.810 # --- CLIENT LIST OUTPUT

...

[29812] 09 Jan 17:10:47.811 # --- REGISTERS
[29812] 09 Jan 17:10:47.811 #
RAX:0000000000000000 RBX:0000000000000145
RCX:0000000000016e40 RDX:00007f0cdee7fe98
RDI:0000000000000000 RSI:0000000000000000
RBP:00000000004991a0 RSP:00007fff0413a3c0
R8 :0000000001cb8e90 R9 :00000000ffffffff
R10:7562206568742074 R11:0000000000000206
R12:00000000004991ba R13:ffffffffffffffda
R14:0000000000000000 R15:0000000000000040
RIP:0000000000440e2f EFL:0000000000010202
CSGSFS:0000000000000033
[29812] 09 Jan 17:10:47.811 # (00007fff0413a438) -> 00007fff0413a4b0
[29812] 09 Jan 17:10:47.811 # (00007fff0413a430) -> ffffffffffffffff
[29812] 09 Jan 17:10:47.811 # (00007fff0413a428) -> 00007fff0413a470
[29812] 09 Jan 17:10:47.811 # (00007fff0413a420) -> 00007f0cde188300
[29812] 09 Jan 17:10:47.811 # (00007fff0413a418) -> 00007f0c77d0bf60
[29812] 09 Jan 17:10:47.811 # (00007fff0413a410) -> 00007f0bb1216e80
[29812] 09 Jan 17:10:47.811 # (00007fff0413a408) -> 00007fff0413a4b0
[29812] 09 Jan 17:10:47.811 # (00007fff0413a400) -> 00007f0bf1f29088
[29812] 09 Jan 17:10:47.811 # (00007fff0413a3f8) -> 000000000043d772
[29812] 09 Jan 17:10:47.811 # (00007fff0413a3f0) -> 00007f0b98870580
[29812] 09 Jan 17:10:47.811 # (00007fff0413a3e8) -> 00007fff0413a470
[29812] 09 Jan 17:10:47.811 # (00007fff0413a3e0) -> 0000000000000020
[29812] 09 Jan 17:10:47.812 # (00007fff0413a3d8) -> 0000000000438996
[29812] 09 Jan 17:10:47.812 # (00007fff0413a3d0) -> 00007f0c6abe5800
[29812] 09 Jan 17:10:47.812 # (00007fff0413a3c8) -> 00007f0c6abf0b27
[29812] 09 Jan 17:10:47.812 # (00007fff0413a3c0) -> 00007f0b98870580
[29812] 09 Jan 17:10:47.812 #

@antirez
Owner
antirez commented Jan 9, 2013

From the bug report, it is either an issue with the hardware (like a memory error), or a subtle bug in the ziplist implementation.

A few things:

  • Please could you test the system memory and report back?
  • In general the system appears to work well? It is the first time you get a crash?

If system is ok it's worth investigating the dump of registers and the stack to see if there is some clue about the values that triggered this issue.

Thanks for reporting

@antirez
Owner
antirez commented Jan 9, 2013

Sorry another question.

From your bug report it seems like only the child crashed, so is the instance actually still running?

In that case, does BGSAVE successfully produce an RDB file, or the child crashes again? Thanks.

@shuge
shuge commented Jan 16, 2013

Please could you test the system memory and report back?

It's a production box, I can't stop current redis and other k-v services for testing,
however, memcached and the old version redis (v2.4.x) is stable and works well on that box for a couple of years.

In general the system appears to work well? It is the first time you get a crash?

I have never see this in v2.4.x, I got first time it crashes after I upgrade it to 2.6.x.

does BGSAVE successfully produce an RDB file, or the child crashes again?

Our engineer write some scripts to monitor and autorestart redis service,
a script try to start redis daemon process after it crashes, I aware it AUTO does rewrite aof,
but it rewrite it failed, then it does it again and again.

Finally, I kill it by SIGKILL, remove all auto-rewrite temporary files, do redis-check-aof for foo.aof,
start it by manual and it works again.

@antirez
Owner
antirez commented Jan 21, 2013

Please could you report if this happened again after the first time? Thanks.

@antirez
Owner
antirez commented Jul 17, 2013

Can't act for lack of info, closing. Thanks for reporting, please reopen if it happens with a recent Redis 2.6.x release.

@antirez antirez closed this Jul 17, 2013
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Something went wrong with that request. Please try again.