Crash under high load (2.6.5 and 2.6.6) - handleClientsBlockedOnLists #801

Closed
foxx opened this Issue · 19 comments

3 participants

@foxx

Hopefully this crash report means more to you than it does to me.

If you need any further info or debugging or testing, I'll be more than happy to lend a hand!!

Also, thank you for your hard work on Redis, amazing piece of kit :)

I also ran redis-server --test-memory with no problem.

The core dump can be downloaded here:

http://blog.simplicitymedialtd.co.uk/wp-content/core.1432.gz
(it is 713M in size, compressed to 126M, and contains no sensitive data)

The redis binary can be downloaded here:

http://blog.simplicitymedialtd.co.uk/wp-content/redis-server-1432

Here is the gdb output:

GNU gdb (GDB) 7.0.1-debian
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/bin/redis-server...(no debugging symbols found)...done.
Attaching to program: /usr/bin/redis-server, process 1432
Reading symbols from /lib/libm.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib/libm.so.6
Reading symbols from /lib/libdl.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /lib/libpthread.so.0...(no debugging symbols found)...done.
[Thread debugging using libthread_db enabled]
[New Thread 0x7f1a727fe700 (LWP 1434)]
[New Thread 0x7f1a72fff700 (LWP 1433)]
Loaded symbols for /lib/libpthread.so.0
Reading symbols from /lib/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
0x00007f1a73a26f23 in epoll_wait () from /lib/libc.so.6
(gdb) continue
Continuing.

Program received signal SIGSEGV, Segmentation fault.
0x0000000000430672 in handleClientsBlockedOnLists ()
(gdb) bt
#0  0x0000000000430672 in handleClientsBlockedOnLists ()
#1  0x000000000041b116 in processCommand ()
#2  0x00000000004259df in processInputBuffer ()
#3  0x0000000000425af0 in readQueryFromClient ()
#4  0x0000000000416cec in aeProcessEvents ()
#5  0x0000000000416fab in aeMain ()
#6  0x000000000041d25f in main ()
(gdb)


(gdb) info registers
rax            0x0      0
rbx            0x0      0
rcx            0x20     32
rdx            0x7f1a730114c0   139751575327936
rsi            0x7f1a46c00000   139750832865280
rdi            0x7f1a73006ae8   139751575284456
rbp            0x7f1a59daf000   0x7f1a59daf000
rsp            0x7fff3a5feb40   0x7fff3a5feb40
r8             0xb67    2919
r9             0x3cd    973
r10            0xfffffffffffffff8       -8
r11            0x206    518
r12            0x7f1a46fd3f00   139750836879104
r13            0x1      1
r14            0x7f1a46fd3f40   139750836879168
r15            0x1      1
rip            0x430672 0x430672 <handleClientsBlockedOnLists+306>
eflags         0x10293  [ CF AF SF IF RF ]
cs             0x33     51
ss             0x2b     43
ds             0x0      0
es             0x0      0
fs             0x0      0
gs             0x0      0
fctrl          0x37f    895
fstat          0x20     32
ftag           0xffff   65535
fiseg          0x7f1a   32538
fioff          0x7399ec2d       1939467309
foseg          0x7fff   32767
fooff          0x3a5fdf50       979361616
fop            0x0      0
mxcsr          0x1fa5   [ IE ZE PE IM DM ZM OM UM PM ]

Here is some info about the server:


$ uname -a
Linux test01.internal 2.6.32-5-amd64 #1 SMP Mon Jan 16 16:22:28 UTC 2012 x86_64 GNU/Linux

$ cat /etc/issue
Debian GNU/Linux 6.0 \n \l

$ gcc -v
Using built-in specs.
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 4.4.5-8' --with-bugurl=file:///usr/share/doc/gcc-4.4/README.Bugs --enable-languages=c,c++,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-4.4 --enable-shared --enable-multiarch --enable-linker-build-id --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.4 --libdir=/usr/lib --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug --enable-objc-gc --with-arch-32=i586 --with-tune=generic --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 4.4.5 (Debian 4.4.5-8)

$ apt-cache showpkg redis-server
Package: redis-server
Versions:
2:2.6.5-1~dotdeb.0 (/var/lib/apt/lists/php53.dotdeb.org_dists_stable_all_binary-amd64_Packages) (/var/lib/apt/lists/packages.dotdeb.org_dists_squeeze_all_binary-amd64_Packages) (/var/lib/dpkg/status)
 Description Language:
                 File: /var/lib/apt/lists/php53.dotdeb.org_dists_stable_all_binary-amd64_Packages
                  MD5: 9160ed1405585ab844f8750a9305d33f

2:1.2.6-1 (/var/lib/apt/lists/ftp.us.debian.org_debian_dists_squeeze_main_binary-amd64_Packages)
 Description Language:
                 File: /var/lib/apt/lists/ftp.us.debian.org_debian_dists_squeeze_main_binary-amd64_Packages
                  MD5: 9160ed1405585ab844f8750a9305d33f


Reverse Depends:
Dependencies:
2:2.6.5-1~dotdeb.0 - libc6 (2 2.7) adduser (0 (null))
2:1.2.6-1 - libc6 (2 2.7) adduser (0 (null))
Provides:
2:2.6.5-1~dotdeb.0 -
2:1.2.6-1 -
Reverse Provides:

Here is the Redis crash dump from the logs:

[1432] 01 Dec 05:39:22.420 # Unable to set the max number of files limit to 10032 (Operation not permitted), setting the max clients configuration to 992.
                _._
           _.-``__ ''-._
      _.-``    `.  `_.  ''-._           Redis 2.6.5 (00000000/0) 64 bit
  .-`` .-```.  ```\/    _.,_ ''-._
 (    '      ,       .-`  | `,    )     Running in stand alone mode
 |`-._`-...-` __...-.``-._|'` _.-'|     Port: 6379
 |    `-._   `._    /     _.-'    |     PID: 1432
  `-._    `-._  `-./  _.-'    _.-'
 |`-._`-._    `-.__.-'    _.-'_.-'|
 |    `-._`-._        _.-'_.-'    |           http://redis.io
  `-._    `-._`-.__.-'_.-'    _.-'
 |`-._`-._    `-.__.-'    _.-'_.-'|
 |    `-._`-._        _.-'_.-'    |
  `-._    `-._`-.__.-'_.-'    _.-'
      `-._    `-.__.-'    _.-'
          `-._        _.-'
              `-.__.-'

[1432] 01 Dec 05:39:22.421 # Server started, Redis version 2.6.5
[1432] 01 Dec 05:39:22.421 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
[1432] 01 Dec 05:39:23.382 * DB loaded from disk: 0.961 seconds
[1432] 01 Dec 05:39:23.382 * The server is now ready to accept connections on port 6379
[1432] 01 Dec 05:40:23.006 * 10000 changes in 60 seconds. Saving...
[1432] 01 Dec 05:40:23.008 * Background saving started by pid 1920
[1920] 01 Dec 05:40:27.962 * DB saved on disk
[1920] 01 Dec 05:40:27.963 * RDB: 39 MB of memory used by copy-on-write
[1432] 01 Dec 05:40:27.966 * Background saving terminated with success
[1432] 01 Dec 05:41:28.007 * 10000 changes in 60 seconds. Saving...
[1432] 01 Dec 05:41:28.009 * Background saving started by pid 2416
[2416] 01 Dec 05:41:34.776 * DB saved on disk
[2416] 01 Dec 05:41:34.779 * RDB: 41 MB of memory used by copy-on-write
[1432] 01 Dec 05:41:34.789 * Background saving terminated with success
[1432] 01 Dec 05:42:35.002 * 10000 changes in 60 seconds. Saving...
[1432] 01 Dec 05:42:35.008 * Background saving started by pid 2858
[2858] 01 Dec 05:42:43.990 * DB saved on disk
[2858] 01 Dec 05:42:43.994 * RDB: 87 MB of memory used by copy-on-write
[1432] 01 Dec 05:42:44.024 * Background saving terminated with success
[1432] 01 Dec 05:43:45.001 * 10000 changes in 60 seconds. Saving...
[1432] 01 Dec 05:43:45.006 * Background saving started by pid 3299
[3299] 01 Dec 05:43:55.500 * DB saved on disk
[3299] 01 Dec 05:43:55.506 * RDB: 90 MB of memory used by copy-on-write
[1432] 01 Dec 05:43:55.521 * Background saving terminated with success
[1432] 01 Dec 05:44:56.003 * 10000 changes in 60 seconds. Saving...
[1432] 01 Dec 05:44:56.008 * Background saving started by pid 3730
[3730] 01 Dec 05:45:07.539 * DB saved on disk
[3730] 01 Dec 05:45:07.545 * RDB: 89 MB of memory used by copy-on-write
[1432] 01 Dec 05:45:07.561 * Background saving terminated with success
[1432] 01 Dec 05:46:08.003 * 10000 changes in 60 seconds. Saving...
[1432] 01 Dec 05:46:08.009 * Background saving started by pid 4189
[4189] 01 Dec 05:46:21.364 * DB saved on disk
[4189] 01 Dec 05:46:21.368 * RDB: 65 MB of memory used by copy-on-write
[1432] 01 Dec 05:46:21.389 * Background saving terminated with success
[1432] 01 Dec 05:47:22.000 * 10000 changes in 60 seconds. Saving...
[1432] 01 Dec 05:47:22.009 * Background saving started by pid 4640
[4640] 01 Dec 05:47:36.094 * DB saved on disk
[4640] 01 Dec 05:47:36.103 * RDB: 71 MB of memory used by copy-on-write
[1432] 01 Dec 05:47:36.127 * Background saving terminated with success
[1432] 01 Dec 05:48:37.008 * 10000 changes in 60 seconds. Saving...
[1432] 01 Dec 05:48:37.018 * Background saving started by pid 5091
[5091] 01 Dec 05:48:52.222 * DB saved on disk
[5091] 01 Dec 05:48:52.232 * RDB: 161 MB of memory used by copy-on-write
[1432] 01 Dec 05:48:52.255 * Background saving terminated with success
[1432] 01 Dec 05:49:53.009 * 10000 changes in 60 seconds. Saving...
[1432] 01 Dec 05:49:53.027 * Background saving started by pid 5561
[5561] 01 Dec 05:50:08.434 * DB saved on disk
[5561] 01 Dec 05:50:08.442 * RDB: 158 MB of memory used by copy-on-write
[1432] 01 Dec 05:50:08.469 * Background saving terminated with success
[1432] 01 Dec 05:51:09.004 * 10000 changes in 60 seconds. Saving...
[1432] 01 Dec 05:51:09.018 * Background saving started by pid 6058
[6058] 01 Dec 05:51:26.793 * DB saved on disk
[6058] 01 Dec 05:51:26.801 * RDB: 72 MB of memory used by copy-on-write
[1432] 01 Dec 05:54:32.694 #

=== REDIS BUG REPORT START: Cut & paste starting from here ===
[1432] 01 Dec 05:54:32.694 #     Redis 2.6.5 crashed by signal: 11
[1432] 01 Dec 05:54:32.694 #     Failed assertion: <no assertion failed> (<no file>:0)
[1432] 01 Dec 05:54:32.694 # --- STACK TRACE
/usr/bin/redis-server(logStackTrace+0x75)[0x43f9f5]
/usr/bin/redis-server(handleClientsBlockedOnLists+0x132)[0x430672]
/lib/libpthread.so.0(+0xeff0)[0x7f1a73cc7ff0]
/usr/bin/redis-server(handleClientsBlockedOnLists+0x132)[0x430672]
/usr/bin/redis-server(processCommand+0x326)[0x41b116]
/usr/bin/redis-server(processInputBuffer+0x4f)[0x4259df]
/usr/bin/redis-server(readQueryFromClient+0xa0)[0x425af0]
/usr/bin/redis-server(aeProcessEvents+0x13c)[0x416cec]
/usr/bin/redis-server(aeMain+0x2b)[0x416fab]
/usr/bin/redis-server(main+0x24f)[0x41d25f]
/lib/libc.so.6(__libc_start_main+0xfd)[0x7f1a73975c8d]
/usr/bin/redis-server[0x416309]
[1432] 01 Dec 05:54:32.695 # --- INFO OUTPUT
[1432] 01 Dec 05:54:32.695 # # Server
redis_version:2.6.5
redis_git_sha1:00000000
redis_git_dirty:0
redis_mode:standalone
os:Linux 2.6.32-5-amd64 x86_64
arch_bits:64
multiplexing_api:epoll
gcc_version:4.4.5
process_id:1432
run_id:30b17198e464c30500830a5f53874524ee4d48e9
tcp_port:6379
uptime_in_seconds:711
uptime_in_days:0
lru_clock:1216379

# Clients
connected_clients:12
client_longest_output_list:0
client_biggest_input_buf:0
blocked_clients:0

# Memory
used_memory:702887224
used_memory_human:670.33M
used_memory_rss:718589952
used_memory_peak:703121600
used_memory_peak_human:670.55M
used_memory_lua:31744
mem_fragmentation_ratio:1.02
mem_allocator:jemalloc-3.0.0

# Persistence
loading:0
rdb_changes_since_last_save:857865
rdb_bgsave_in_progress:1
rdb_last_save_time:1354341008
rdb_last_bgsave_status:ok
rdb_last_bgsave_time_sec:15
rdb_current_bgsave_time_sec:203
aof_enabled:0
aof_rewrite_in_progress:0
aof_rewrite_scheduled:0
aof_last_rewrite_time_sec:-1
aof_current_rewrite_time_sec:-1
aof_last_bgrewrite_status:ok

# Stats
total_connections_received:486
total_commands_processed:8957227
instantaneous_ops_per_sec:12254
rejected_connections:0
expired_keys:0
evicted_keys:0
keyspace_hits:1194351
keyspace_misses:118807
pubsub_channels:2
pubsub_patterns:0
latest_fork_usec:14382

# Replication
role:master
connected_slaves:0

# CPU
used_cpu_sys:325.08
used_cpu_user:31.38
used_cpu_sys_children:4.12
used_cpu_user_children:32.77

# Commandstats
cmdstat_get:calls=1181,usec=8775,usec_per_call=7.43
cmdstat_set:calls=563,usec=4747,usec_per_call=8.43
cmdstat_setnx:calls=12,usec=52,usec_per_call=4.33
cmdstat_del:calls=18,usec=83,usec_per_call=4.61
cmdstat_exists:calls=538090,usec=3245347,usec_per_call=6.03
cmdstat_lpush:calls=514190,usec=2929716,usec_per_call=5.70
cmdstat_lpop:calls=295476,usec=2980988,usec_per_call=10.09
cmdstat_blpop:calls=1,usec=7,usec_per_call=7.00
cmdstat_sadd:calls=515977,usec=3480924,usec_per_call=6.75
cmdstat_smove:calls=295475,usec=2408266,usec_per_call=8.15
cmdstat_sismember:calls=523213,usec=3470256,usec_per_call=6.63
cmdstat_scard:calls=114048,usec=511604,usec_per_call=4.49
cmdstat_smembers:calls=5948,usec=80863,usec_per_call=13.59
cmdstat_hset:calls=2210398,usec=19508953,usec_per_call=8.83
cmdstat_hsetnx:calls=809663,usec=6495547,usec_per_call=8.02
cmdstat_hget:calls=647976,usec=4955075,usec_per_call=7.65
cmdstat_hmset:calls=1776,usec=15927,usec_per_call=8.97
cmdstat_hincrby:calls=809665,usec=8414183,usec_per_call=10.39
cmdstat_hdel:calls=1332,usec=8764,usec_per_call=6.58
cmdstat_hkeys:calls=1784,usec=13908,usec_per_call=7.80
cmdstat_hgetall:calls=19008,usec=115967,usec_per_call=6.10
cmdstat_multi:calls=825713,usec=4789939,usec_per_call=5.80
cmdstat_exec:calls=825713,usec=54336230,usec_per_call=65.81
cmdstat_info:calls=1,usec=122,usec_per_call=122.00
cmdstat_subscribe:calls=6,usec=46,usec_per_call=7.67

# Keyspace
db0:keys=32,expires=0
hash_init_value: 1354744661

[1432] 01 Dec 05:54:32.695 # --- CLIENT LIST OUTPUT
[1432] 01 Dec 05:54:32.695 # addr=127.0.0.1:53710 fd=5 age=328 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=90 oll=0 omem=0 events=rw cmd=exec
addr=127.0.0.1:53714 fd=6 age=327 idle=1 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=hset
addr=127.0.0.1:53715 fd=7 age=327 idle=327 flags=N db=0 sub=2 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=subscribe
addr=127.0.0.1:53716 fd=8 age=327 idle=0 flags=u db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=96 oll=0 omem=0 events=rw cmd=blpop
addr=127.0.0.1:53717 fd=9 age=327 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=hset
addr=127.0.0.1:53718 fd=10 age=327 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=hset
addr=127.0.0.1:53719 fd=11 age=327 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=hset
addr=127.0.0.1:53720 fd=12 age=327 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=hset
addr=127.0.0.1:53721 fd=13 age=327 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=hset
addr=127.0.0.1:53722 fd=14 age=327 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=hset
addr=127.0.0.1:53723 fd=15 age=327 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=hset
addr=127.0.0.1:53724 fd=16 age=327 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=hset

[1432] 01 Dec 05:54:32.695 # --- CURRENT CLIENT INFO
[1432] 01 Dec 05:54:32.695 # client: addr=127.0.0.1:53710 fd=5 age=328 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=90 oll=0 omem=0 events=rw cmd=exec
[1432] 01 Dec 05:54:32.695 # argv[0]: 'EXEC'
[1432] 01 Dec 05:54:32.695 # --- REGISTERS
[1432] 01 Dec 05:54:32.695 #
RAX:0000000000000000 RBX:0000000000000000
RCX:0000000000000020 RDX:00007f1a730114c0
RDI:00007f1a73006ae8 RSI:00007f1a46c00000
RBP:00007f1a59daf000 RSP:00007fff3a5feb40
R8 :0000000000000b67 R9 :00000000000003cd
R10:fffffffffffffff8 R11:0000000000000206
R12:00007f1a46fd3f00 R13:0000000000000001
R14:00007f1a46fd3f40 R15:0000000000000001
RIP:0000000000430672 EFL:0000000000010293
CSGSFS:0000000000000033
[1432] 01 Dec 05:54:32.695 # (00007fff3a5febb8) -> 00007f1a5682b000
[1432] 01 Dec 05:54:32.695 # (00007fff3a5febb0) -> 0000000000004000
[1432] 01 Dec 05:54:32.695 # (00007fff3a5feba8) -> 000000000041b116
[1432] 01 Dec 05:54:32.695 # (00007fff3a5feba0) -> 0000000000000005
[1432] 01 Dec 05:54:32.695 # (00007fff3a5feb98) -> 0000000000000001
[1432] 01 Dec 05:54:32.695 # (00007fff3a5feb90) -> 0000000000004000
[1432] 01 Dec 05:54:32.695 # (00007fff3a5feb88) -> 0000000000000000
[1432] 01 Dec 05:54:32.695 # (00007fff3a5feb80) -> 0000000000000000
[1432] 01 Dec 05:54:32.695 # (00007fff3a5feb78) -> 00007f1a5682b000
[1432] 01 Dec 05:54:32.695 # (00007fff3a5feb70) -> 00007f1a46fd3f18
[1432] 01 Dec 05:54:32.696 # (00007fff3a5feb68) -> 0000000000000004
[1432] 01 Dec 05:54:32.696 # (00007fff3a5feb60) -> 00007f1a730114c0
[1432] 01 Dec 05:54:32.696 # (00007fff3a5feb58) -> 00007f1a46fd3f50
[1432] 01 Dec 05:54:32.696 # (00007fff3a5feb50) -> 00007f1a46fe2920
[1432] 01 Dec 05:54:32.696 # (00007fff3a5feb48) -> 00007f1a73011130
[1432] 01 Dec 05:54:32.696 # (00007fff3a5feb40) -> 0000000000000007
[1432] 01 Dec 05:54:32.696 #
=== REDIS BUG REPORT END. Make sure to include from START to END. ===

       Please report the crash opening an issue on github:

           http://github.com/antirez/redis/issues

  Suspect RAM error? Use redis-server --test-memory to veryfy it.

@foxx

Just tried this with 2.6.6, same thing happened.

This time I used "make noopt" and got a bit more info and a smaller core dump.

Core can be downloaded here:

http://blog.simplicitymedialtd.co.uk/wp-content/core.19418.gz

Redis binary can be downloaded here:

http://blog.simplicitymedialtd.co.uk/wp-content/redis-server-19418

gdb output below:

Program received signal SIGSEGV, Segmentation fault.
0x0000000000438c1d in handleClientsBlockedOnLists () at t_list.c:968
968                             redisClient *receiver = clientnode->value;
(gdb) bt
#0  0x0000000000438c1d in handleClientsBlockedOnLists () at t_list.c:968
#1  0x000000000041e248 in processCommand (c=0x7f7683569000) at redis.c:1678
#2  0x000000000042a7cb in processInputBuffer (c=0x7f7683569000) at networking.c:1007
#3  0x000000000042aa73 in readQueryFromClient (el=0x7f7683455150, fd=5, privdata=0x7f7683569000, mask=1) at networking.c:1070
#4  0x0000000000418642 in aeProcessEvents (eventLoop=0x7f7683455150, flags=3) at ae.c:378
#5  0x00000000004187d7 in aeMain (eventLoop=0x7f7683455150) at ae.c:421
#6  0x000000000042097d in main (argc=2, argv=0x7ffff12e4db8) at redis.c:2621
(gdb)

(gdb) info registers
rax            0x0      0
rbx            0x7f7680cde3e0   140146943845344
rcx            0x20     32
rdx            0x42     66
rsi            0x7f7680c00000   140146942935040
rdi            0x7f7683406ae8   140146984905448
rbp            0x7ffff12e4b00   0x7ffff12e4b00
rsp            0x7ffff12e4a90   0x7ffff12e4a90
r8             0x288    648
r9             0xd8     216
r10            0xfffffffffffffff8       -8
r11            0x206    518
r12            0x416e80 4288128
r13            0x7ffff12e4db0   140737239731632
r14            0x0      0
r15            0x0      0
rip            0x438c1d 0x438c1d <handleClientsBlockedOnLists+242>
eflags         0x10202  [ IF RF ]
cs             0x33     51
ss             0x2b     43
ds             0x0      0
es             0x0      0
fs             0x0      0
gs             0x0      0
fctrl          0x37f    895
fstat          0x20     32
ftag           0xffff   65535
fiseg          0x0      0
fioff          0x83f01c2d       -2081416147
foseg          0x0      0
fooff          0xf12e3e50       -248627632
fop            0x0      0
mxcsr          0x1f85   [ IE ZE IM DM ZM OM UM PM ]

Log output here:

                _._
           _.-``__ ''-._
      _.-``    `.  `_.  ''-._           Redis 2.6.6 (00000000/0) 64 bit
  .-`` .-```.  ```\/    _.,_ ''-._
 (    '      ,       .-`  | `,    )     Running in stand alone mode
 |`-._`-...-` __...-.``-._|'` _.-'|     Port: 6379
 |    `-._   `._    /     _.-'    |     PID: 19418
  `-._    `-._  `-./  _.-'    _.-'
 |`-._`-._    `-.__.-'    _.-'_.-'|
 |    `-._`-._        _.-'_.-'    |           http://redis.io
  `-._    `-._`-.__.-'_.-'    _.-'
 |`-._`-._    `-.__.-'    _.-'_.-'|
 |    `-._`-._        _.-'_.-'    |
  `-._    `-._`-.__.-'_.-'    _.-'
      `-._    `-.__.-'    _.-'
          `-._        _.-'
              `-.__.-'

[19418] 01 Dec 06:14:47.026 # Server started, Redis version 2.6.6
[19418] 01 Dec 06:14:47.026 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
[19418] 01 Dec 06:14:47.026 * The server is now ready to accept connections on port 6379
[19418] 01 Dec 06:16:20.926 #

=== REDIS BUG REPORT START: Cut & paste starting from here ===
[19418] 01 Dec 06:16:20.926 #     Redis 2.6.6 crashed by signal: 11
[19418] 01 Dec 06:16:20.926 #     Failed assertion: <no assertion failed> (<no file>:0)
[19418] 01 Dec 06:16:20.926 # --- STACK TRACE
src/redis-server(logStackTrace+0x67)[0x44de45]
src/redis-server(handleClientsBlockedOnLists+0xf2)[0x438c1d]
/lib/libpthread.so.0(+0xeff0)[0x7f768422aff0]
src/redis-server(handleClientsBlockedOnLists+0xf2)[0x438c1d]
src/redis-server(processCommand+0x582)[0x41e248]
src/redis-server(processInputBuffer+0xef)[0x42a7cb]
src/redis-server(readQueryFromClient+0x277)[0x42aa73]
src/redis-server(aeProcessEvents+0x25a)[0x418642]
src/redis-server(aeMain+0x48)[0x4187d7]
src/redis-server(main+0x494)[0x42097d]
/lib/libc.so.6(__libc_start_main+0xfd)[0x7f7683ed8c8d]
src/redis-server[0x416ea9]
[19418] 01 Dec 06:16:20.927 # --- INFO OUTPUT
[19418] 01 Dec 06:16:20.927 # # Server
redis_version:2.6.6
redis_git_sha1:00000000
redis_git_dirty:0
redis_mode:standalone
os:Linux 2.6.32-5-amd64 x86_64
arch_bits:64
multiplexing_api:epoll
gcc_version:4.4.5
process_id:19418
run_id:7d8181770394b9465c238919f545420679780b8d
tcp_port:6379
uptime_in_seconds:39
uptime_in_days:0
lru_clock:1216524

# Clients
connected_clients:12
client_longest_output_list:0
client_biggest_input_buf:0
blocked_clients:0

# Memory
used_memory:21241896
used_memory_human:20.26M
used_memory_rss:22994944
used_memory_peak:21476072
used_memory_peak_human:20.48M
used_memory_lua:31744
mem_fragmentation_ratio:1.08
mem_allocator:jemalloc-3.2.0

# Persistence
loading:0
rdb_changes_since_last_save:238091
rdb_bgsave_in_progress:0
rdb_last_save_time:1354342487
rdb_last_bgsave_status:ok
rdb_last_bgsave_time_sec:-1
rdb_current_bgsave_time_sec:-1
aof_enabled:0
aof_rewrite_in_progress:0
aof_rewrite_scheduled:0
aof_last_rewrite_time_sec:-1
aof_current_rewrite_time_sec:-1
aof_last_bgrewrite_status:ok

# Stats
total_connections_received:35
total_commands_processed:327681
instantaneous_ops_per_sec:13759
rejected_connections:0
expired_keys:0
evicted_keys:0
keyspace_hits:43539
keyspace_misses:6641
pubsub_channels:2
pubsub_patterns:0
latest_fork_usec:0

# Replication
role:master
connected_slaves:0

# CPU
used_cpu_sys:12.16
used_cpu_user:1.47
used_cpu_sys_children:0.00
used_cpu_user_children:0.00

# Commandstats
cmdstat_get:calls=53,usec=386,usec_per_call=7.28
cmdstat_set:calls=27,usec=214,usec_per_call=7.93
cmdstat_setnx:calls=4,usec=20,usec_per_call=5.00
cmdstat_exists:calls=19838,usec=131842,usec_per_call=6.65
cmdstat_lpush:calls=18653,usec=125151,usec_per_call=6.71
cmdstat_lpop:calls=10660,usec=119645,usec_per_call=11.22
cmdstat_blpop:calls=1,usec=51,usec_per_call=51.00
cmdstat_sadd:calls=18730,usec=137456,usec_per_call=7.34
cmdstat_srem:calls=20,usec=127,usec_per_call=6.35
cmdstat_smove:calls=10656,usec=97600,usec_per_call=9.16
cmdstat_sismember:calls=19121,usec=143242,usec_per_call=7.49
cmdstat_scard:calls=5592,usec=25518,usec_per_call=4.56
cmdstat_smembers:calls=282,usec=4159,usec_per_call=14.75
cmdstat_hset:calls=79927,usec=748054,usec_per_call=9.36
cmdstat_hsetnx:calls=29305,usec=259255,usec_per_call=8.85
cmdstat_hget:calls=24108,usec=192825,usec_per_call=8.00
cmdstat_hmset:calls=96,usec=937,usec_per_call=9.76
cmdstat_hincrby:calls=29305,usec=278862,usec_per_call=9.52
cmdstat_hdel:calls=69,usec=493,usec_per_call=7.14
cmdstat_hkeys:calls=92,usec=627,usec_per_call=6.82
cmdstat_hgetall:calls=932,usec=6206,usec_per_call=6.66
cmdstat_multi:calls=30104,usec=177231,usec_per_call=5.89
cmdstat_exec:calls=30104,usec=2045129,usec_per_call=67.94
cmdstat_subscribe:calls=2,usec=70,usec_per_call=35.00

# Keyspace
db0:keys=32,expires=0
hash_init_value: 1354347558

[19418] 01 Dec 06:16:20.927 # --- CLIENT LIST OUTPUT
[19418] 01 Dec 06:16:20.927 # addr=127.0.0.1:57012 fd=5 age=25 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=89 oll=0 omem=0 events=rw cmd=exec
addr=127.0.0.1:57016 fd=6 age=23 idle=3 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=hset
addr=127.0.0.1:57017 fd=7 age=23 idle=23 flags=N db=0 sub=2 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=subscribe
addr=127.0.0.1:57018 fd=8 age=23 idle=0 flags=u db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=96 oll=0 omem=0 events=rw cmd=blpop
addr=127.0.0.1:57021 fd=9 age=22 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=hset
addr=127.0.0.1:57022 fd=10 age=22 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=hset
addr=127.0.0.1:57023 fd=11 age=22 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=hset
addr=127.0.0.1:57024 fd=12 age=22 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=hset
addr=127.0.0.1:57025 fd=13 age=22 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=hset
addr=127.0.0.1:57026 fd=14 age=22 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=hset
addr=127.0.0.1:57027 fd=15 age=22 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=hset
addr=127.0.0.1:57028 fd=16 age=22 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=hset

[19418] 01 Dec 06:16:20.927 # --- CURRENT CLIENT INFO
[19418] 01 Dec 06:16:20.927 # client: addr=127.0.0.1:57012 fd=5 age=25 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=89 oll=0 omem=0 events=rw cmd=exec
[19418] 01 Dec 06:16:20.927 # argv[0]: 'EXEC'
[19418] 01 Dec 06:16:20.927 # --- REGISTERS
[19418] 01 Dec 06:16:20.927 #
RAX:0000000000000000 RBX:00007f7680cde3e0
RCX:0000000000000020 RDX:0000000000000042
RDI:00007f7683406ae8 RSI:00007f7680c00000
RBP:00007ffff12e4b00 RSP:00007ffff12e4a90
R8 :0000000000000288 R9 :00000000000000d8
R10:fffffffffffffff8 R11:0000000000000206
R12:0000000000416e80 R13:00007ffff12e4db0
R14:0000000000000000 R15:0000000000000000
RIP:0000000000438c1d EFL:0000000000010202
CSGSFS:0000000000000033
[19418] 01 Dec 06:16:20.927 # (00007ffff12e4b08) -> 000000000041e248
[19418] 01 Dec 06:16:20.927 # (00007ffff12e4b00) -> 00007ffff12e4b30
[19418] 01 Dec 06:16:20.927 # (00007ffff12e4af8) -> 0000000000000005
[19418] 01 Dec 06:16:20.927 # (00007ffff12e4af0) -> 00007f7680d122b8
[19418] 01 Dec 06:16:20.927 # (00007ffff12e4ae8) -> 00007f7680cde530
[19418] 01 Dec 06:16:20.927 # (00007ffff12e4ae0) -> 0000000000000003
[19418] 01 Dec 06:16:20.927 # (00007ffff12e4ad8) -> 0000000000000000
[19418] 01 Dec 06:16:20.927 # (00007ffff12e4ad0) -> 00007f76836a8000
[19418] 01 Dec 06:16:20.927 # (00007ffff12e4ac8) -> 0000000000000000
[19418] 01 Dec 06:16:20.927 # (00007ffff12e4ac0) -> 0000000280cde548
[19418] 01 Dec 06:16:20.927 # (00007ffff12e4ab8) -> 00007f7683771190
[19418] 01 Dec 06:16:20.927 # (00007ffff12e4ab0) -> 00007f76834a4460
[19418] 01 Dec 06:16:20.927 # (00007ffff12e4aa8) -> 00007f7680cde590
[19418] 01 Dec 06:16:20.927 # (00007ffff12e4aa0) -> 00007f7680cde560
[19418] 01 Dec 06:16:20.927 # (00007ffff12e4a98) -> 00007f7680cd7f00
[19418] 01 Dec 06:16:20.928 # (00007ffff12e4a90) -> 00007f7683411130
[19418] 01 Dec 06:16:20.928 #
=== REDIS BUG REPORT END. Make sure to include from START to END. ===

       Please report the crash opening an issue on github:

           http://github.com/antirez/redis/issues

  Suspect RAM error? Use redis-server --test-memory to veryfy it.



@foxx

From what I can tell, it seems to relate to this commit ("reimplementation of blocking operation internals")
from 2 months ago:
7eb850e

I really don't know enough about C to even begin to guess what's happening, but I did see the commit said it would be "tested with care", so I'm guessing this change carried some risk?

Either way, any feedback would be very much appreciated.

Thanks!

Cal

@charsyam

@foxx Could you show me how to reproduce this crash?

@foxx

@charsyam Essentially, you just need to have a bunch of threads running blpop() across 4 or more lists, and then another thread executing rpush() on each of those lists.

In Python, it'd be something like:

import random
import redis

r = redis.Redis()

# sender (have this running in 4 different procs or threads)
for x in range(50000):
    r.rpush('list1', random.random())
    r.rpush('list2', random.random())
    r.rpush('list3', random.random())
    r.rpush('list4', random.random())

# receiver (have this running in 1 proc or thread)
while True:
    q = ['list1', 'list2', 'list3', 'list4']
    timeout = 0
    r.blpop(q, timeout)
@antirez
Owner

This must be the best bug report we have received in three years (seriously). Thanks @foxx :-)

Time for breakfast, then investigating as soon as a Saturday for a family guy makes it possible.

Btw, it is not a big surprise to me that there are bugs in this part of Redis 2.6, as the blocking operations engine was partially rewritten. Apparently we need to add a stress tester like this to our test suite.

Cheers

@foxx

Glad I could help! :)

Something really strange: I just created a unit test to reproduce this crash, and blpop/rpush on its own doesn't seem to be enough to trigger the problem, so I'm going to try including some other calls. If I can't accurately reproduce it, I'll email you our Python source and two simple shell scripts to run it. Will update shortly.

@charsyam

@foxx maybe you used multi/exec, and that can trigger this crash with blpop.

@foxx

@charsyam Just tried this without the multi/exec around rpush(), and the problem still happens. Just to confirm, blpop() is not running inside a multi/exec.

@antirez I've just tried to reproduce this error with single scripts, and annoyingly I can't seem to make it happen. I have therefore put together a small Python package with some run instructions (really simple, less than 60 seconds); this should be in your inbox now.

I've done a test run of this module using the exact same instructions against 2.6.5/2.6.6, and the crash happened within about 2 minutes.

@antirez
Owner

Ok working on it right now.

@foxx

I have tried replacing the blpop() with a bunch of pipelined llen calls followed by a single lpop, and this does not seem to crash.
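For reference, the non-blocking pattern described above can be sketched like this; the client API is assumed to be redis-py, and the helper names are illustrative:

```python
def pick_nonempty(queues, lengths):
    # Given LLEN results aligned with queue names, return the first
    # queue holding at least one element, or None if all are empty.
    for q, n in zip(queues, lengths):
        if n > 0:
            return q
    return None

def poll_pop(r, queues):
    # Batch the LLEN calls in one pipeline round trip, then LPOP
    # from the first non-empty list.
    pipe = r.pipeline()
    for q in queues:
        pipe.llen(q)
    target = pick_nonempty(queues, pipe.execute())
    return r.lpop(target) if target is not None else None
```

Unlike blpop(), this polls rather than blocks, so it avoids the crashing code path at the cost of busy-waiting.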

@antirez antirez referenced this issue from a commit
@antirez Client should not block multiple times on the same key.
Sending a command like:

BLPOP foo foo foo foo 0

Resulted in a crash before this commit, since the client ended up being
inserted into the waiting list for this key multiple times.
This caused the function handleClientsBlockedOnLists() to fail,
because we have code like this:

    if (de) {
        list *clients = dictGetVal(de);
        int numclients = listLength(clients);

        while(numclients--) {
            listNode *clientnode = listFirst(clients);

            /* serve clients here... */
        }
    }

The code that serves clients removes the served client from the
waiting list, so if a client is blocked multiple times, eventually the
call to listFirst() will return NULL, or worse, will access random
memory, since the list may no longer exist: it is removed by the
function unblockClientWaitingData() when there are no more clients
waiting for this list.

To avoid making the rest of the implementation more complex, this commit
modifies blockForKeys() so that a client will be put just a single time
into the waiting list for a given key.

Since it is Saturday, I hope this fixes issue #801.
cac49a9
@antirez
Owner

Hopefully the "issue-801" branch should fix it, please @foxx could you test it?

Btw, if this is the case, for some reason your queue implementation ends up adding the same key to the BLPOP arguments multiple times, like in:

BLPOP key1 key2 key3 key1 key1 0

That should indeed be allowed, but it was not handled correctly.

Waiting for feedback :-) Thanks!
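The failure mode described above can be sketched in plain Python. This is a simulation of the logic, not Redis code; the comments name the real C functions the steps stand in for, and the flag `dedup_on_block` is an invented switch to contrast pre- and post-fix behaviour:

```python
# Simulation of handleClientsBlockedOnLists() serving one key's waiting
# list, when the same client may be queued more than once for that key.

def serve_blocked_clients(clients, dedup_on_block):
    waiting = []                      # waiting list for one key
    for c in clients:                 # blockForKeys(): one entry per key arg
        if dedup_on_block and c in waiting:
            continue                  # the fix: block once per key
        waiting.append(c)

    served = []
    numclients = len(waiting)         # snapshot, like listLength(clients)
    while numclients:
        numclients -= 1
        client = waiting[0] if waiting else None   # listFirst()
        if client is None:
            raise RuntimeError("crash: listFirst() returned NULL")
        served.append(client)
        # unblockClientWaitingData(): removes *every* entry of this client
        waiting = [c for c in waiting if c is not client]
    return served

# BLPOP foo foo 0 -> the same client is queued twice for "foo"
client = object()
try:
    serve_blocked_clients([client, client], dedup_on_block=False)
except RuntimeError as e:
    print(e)                          # the pre-fix crash

print(serve_blocked_clients([client, client], dedup_on_block=True))
```

The snapshot count is what bites: it is taken before serving starts, but unblocking removes all of a client's entries at once, so with duplicates the loop outlives the list.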

@antirez
Owner

@foxx p.s. about the HSETNX issue, please could you open a new issue reporting exactly what happens? From the short description I more or less grasp what happens, but not enough to do more tests. Thanks!

@foxx

Thanks for your fast work on this (sorry for the slow reply, was absolutely shattered and went to bed in the end lol)

I have good news and bad news.

The bad news is that it's still broken (crashes same as before) against 'issue-801' branch with your fixes.

The good news is that you were right about the duplicate keys, and I was able to construct two small test scripts that will accurately reproduce the problem.

The gdb/core/log/binary can be downloaded here;

http://blog.simplicitymedialtd.co.uk/wp-content/debug-11160.zip

The scripts to reproduce the bug can be downloaded here;

http://blog.simplicitymedialtd.co.uk/wp-content/801-reproduce.zip

Although our own short term fix will be to de-duplicate the keys, I'm more than happy to help continue testing and providing info to get it fixed - just wish I knew more about C!
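The short-term workaround mentioned above can be as simple as an order-preserving de-duplication of the key list before calling BLPOP. A sketch (`dedup_keys` is an invented helper name; order is preserved because BLPOP checks keys in argument order):

```python
# Client-side workaround: drop duplicate keys before calling BLPOP,
# keeping first-seen order.

def dedup_keys(keys):
    # dict.fromkeys preserves insertion order (Python 3.7+)
    return list(dict.fromkeys(keys))

print(dedup_keys(["key1", "key2", "key3", "key1", "key1"]))
# -> ['key1', 'key2', 'key3']
```

With a client library such as redis-py this would wrap the call, e.g. `r.blpop(dedup_keys(keys), timeout=0)`.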

@foxx

Also, in reference to the regression test commit - it seems that this problem only happens when two or more threads are attempting to push and blpop at the same time, with blpop against duplicate keys. I was successfully able to run the 'receiver.py' file from the 801-reproduce without crashing, but as soon as I ran sender.py, it crashed immediately.

And in ref to the HSETNX problem, I'll put together two scripts that will accurately reproduce the problem and submit a separate ticket

@charsyam

@foxx I have a question. I tried to reproduce it, but I couldn't.
So I checked your binary with objdump, and I found something wrong:
the patched version should contain some symbols (dictCreate, dictRelease),
but your binary doesn't have them.
I also compared your binaries 19418 and 11160,
and they are identical.
So I think you just compiled the same source again.

Could you check your binary?
Thank you.

@foxx

@charsyam Damn it, you're right - well spotted.

I really should have double checked before reporting this in, my sincere apologies for the time wasting.

Just recompiled with the correct branch (issue-801) and I can now confirm it is stable with no crashes after 5 minutes!

@antirez and @charsyam - thank you once again, it has been an absolute pleasure working with you guys :)

@charsyam

@foxx @antirez me too, thank you for your report. I have learned from antirez's patch and your reporting, and I understand Redis internals better now! Thank you again.

@antirez
Owner

@foxx @charsyam thank you both a lot! It's cool to know the issue is fixed.

I'll merge this ASAP into 2.6 and release a new patch version. However, I'm unsure about the final shape of the fix; I may modify the code a bit, at least for 2.8 / unstable, because now that we need to avoid duplicates in the queue list there are better ways to organize the code.

Leaving the issue open since it's not yet merged into the actual development branches.

Cheers,
Salvatore

@antirez antirez referenced this issue from a commit
@antirez Blocking POP: use a dictionary to store keys client side.
To store the keys we block on during a blocking pop operation, in case
the client has to wait for more data to arrive, we used a simple linear
array of Redis objects in the blockingState structure:

    robj **keys;
    int count;

However, in order to fix issue #801, we also use a dictionary in order
to avoid ending up in the blocked clients queue for the same key
multiple times with the same client.

The dictionary was only temporary, just to avoid duplicates, but since
we create / destroy it anyway there is no point in doing this duplicated
work, so this commit simply uses a dictionary as the main structure to
store the keys we are blocked on. So instead of the previous fields we
now just have:

    dict *keys;

This simplifies the code and reduces the work done by the server during
a blocking POP operation.
54b08c8
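The idea in the commit above can be illustrated with a small Python sketch (not the actual C): keeping the blocked-on keys in a dictionary makes inserting a duplicate key a natural no-op. `BlockingState` and `block_for_key` are invented names standing in for the real structures:

```python
# Sketch of the refactored blocking state: a dict replaces the old
# `robj **keys; int count;` pair, so duplicate keys are rejected for free.

class BlockingState:
    def __init__(self):
        self.keys = {}                # stands in for `dict *keys`

    def block_for_key(self, key):
        """Return True if newly added, False if the key is a duplicate."""
        if key in self.keys:
            return False              # already blocking on this key: skip
        self.keys[key] = None         # value unused, only membership matters
        return True

bpop = BlockingState()
print(bpop.block_for_key("foo"))      # True  -> enqueue client for "foo"
print(bpop.block_for_key("foo"))      # False -> duplicate, do not enqueue
print(list(bpop.keys))                # ['foo']
```

The caller only enqueues the client in the per-key waiting list when `block_for_key` returns True, which is what keeps the serve loop's count honest.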
@antirez
Owner

Hello again, just to make you aware of the fact that I refactored the code. After the fix it was possible to simplify it in this way: 2f87cf8

However, I carefully checked the code to ensure it is functionally equivalent, and all our tests are passing without issues.

So I merged the new version of the code into the 2.6, 2.8 and unstable branches.

A Redis 2.6.7 release with the fix is planned for today.

So I can close this issue, and say thank you again for your great help.

@antirez antirez closed this
@antirez antirez referenced this issue from a commit
@antirez Memory leak fixed: release client's bpop->keys dictionary.
Refactoring performed after issue #801 resolution (see commit
2f87cf8) introduced a memory leak that
is fixed by this commit.

I simply forgot to free the newly allocated dictionary in the client
structure, trusting the output of "make test" on OS X.

However, due to changes in the "leaks" utility, the test was no longer
testing memory leaks. This problem was also fixed.

Fortunately the CI test running at ci.redis.io spotted the bug in the
valgrind run.

The leak never made it into a stable release.
ab2924c