KeyDB dying via SIGABORT #170
Comments
Hi @smartattack, this assert wasn't set up to generate stack traces. I've updated the code to ensure a proper debug report is created. Could you try updating to unstable so we can get the callstack? If you use Docker, the unstable image will be updated.
That did the trick - trace attached:
=== KEYDB BUG REPORT START: Cut & paste starting from here ===
------ STACK TRACE ------
Backtrace:
------ INFO OUTPUT ------
# Server
redis_version:0.0.0
# Clients
connected_clients:124
# Memory
used_memory:24334569568
# Persistence
loading:0
# Stats
total_connections_received:33745
# Replication
role:master
# CPU
used_cpu_sys:75901.614487
# Modules
# Commandstats
cmdstat_command:calls=1,usec=2269,usec_per_call=2269.00
# Cluster
cluster_enabled:0
# Keyspace
db0:keys=113355027,expires=113355025,avg_ttl=26339476
------ CLIENT LIST OUTPUT ------
------ REGISTERS ------
------ MODULES INFO OUTPUT ------
------ FAST MEMORY TEST ------
I am running into the same issue. Any updates on this?
6.0 blocking
To be a bit more verbose, I have been able to repro this with the test suite, but it's a rare condition. I'm still trying to track down the underlying cause.
Just FYI, I'm running 5.3.3 and this is happening on a high-load server.
Can you let me know which if any of the following features are used:
If you use both, which is used more frequently? I've been running the test suite for the last 6 hours with my extra debugging logic and unfortunately the issue hasn't repro'd a second time. It's time to get more surgical. We know one side of the lock is freeing the client due to disconnect; the remaining piece of the picture is who is waiting on the other side of the lock.
I got it, not a race condition at all - ignore my questions above. The assert is incorrect in a very subtle way:
That equation relies upon m_avail wrapping as a uint16_t, but the -1 promotes it to an int. So when m_avail is 0, the subtraction yields -1 instead of 65535 and the comparison will be accidentally false. This situation could happen on average once in every 65536 unlocks, but under heavy load that can be hit relatively quickly, as you've experienced.
This fix is currently in unstable and will be released in 6.0.
KeyDB has been dying with a SIGABRT sporadically every few days. This is on a CentOS 7 system using systemd. systemd appears to be hiding the error message, but when run from a terminal the output during the crash is:
-bash-4.2$ /usr/bin/keydb-server /etc/keydb/keydb-tty.conf
keydb-server: fastlock.cpp:436: void fastlock_free(fastlock*): Assertion `(lock->m_ticket.m_active == lock->m_ticket.m_avail) || (lock->m_pidOwner == gettid() && (lock->m_ticket.m_active == lock->m_ticket.m_avail-1))' failed.
There is no logfile entry to correspond with this crash.
Configuration is as follows:
bind 127.0.0.1 10.10.5.51
protected-mode no
port 6379
tcp-backlog 8192
timeout 0
tcp-keepalive 300
daemonize yes
supervised no
pidfile /var/run/keydb/keydb-server.pid
loglevel verbose
logfile /var/log/keydb/keydb-server.log
databases 2
always-show-logo yes
save 3600 100
stop-writes-on-bgsave-error yes
rdbcompression yes
rdbchecksum yes
dbfilename dump.rdb
dir /var/lib/keydb
replica-serve-stale-data yes
replica-read-only yes
repl-diskless-sync no
repl-diskless-sync-delay 5
repl-disable-tcp-nodelay no
replica-priority 100
maxmemory 100gb
lazyfree-lazy-eviction no
lazyfree-lazy-expire no
lazyfree-lazy-server-del no
replica-lazy-flush no
appendonly no
appendfilename "appendonly.aof"
appendfsync everysec
no-appendfsync-on-rewrite no
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb
aof-load-truncated yes
aof-use-rdb-preamble yes
lua-time-limit 5000
slowlog-log-slower-than 10000
slowlog-max-len 128
latency-monitor-threshold 0
notify-keyspace-events ""
hash-max-ziplist-entries 512
hash-max-ziplist-value 64
list-max-ziplist-size -2
list-compress-depth 0
set-max-intset-entries 512
zset-max-ziplist-entries 128
zset-max-ziplist-value 64
hll-sparse-max-bytes 3000
stream-node-max-bytes 4096
stream-node-max-entries 100
activerehashing yes
client-output-buffer-limit normal 0 0 0
client-output-buffer-limit replica 256mb 64mb 60
client-output-buffer-limit pubsub 32mb 8mb 60
hz 10
dynamic-hz yes
aof-rewrite-incremental-fsync yes
rdb-save-incremental-fsync yes
server-threads 4