
"Error: Server closed the connection" while doing pipelining with dynomite #484

Closed
codophobia opened this issue Sep 25, 2017 · 3 comments
@codophobia commented Sep 25, 2017

I am trying to benchmark dynomite pipelining. I have set up two dynomite nodes in different data centers (cross-data-center replication). Without pipelining, everything works well.

Benchmarking pipelining against the dc1 dynomite node:

redis-benchmark -p 8102 -n 1000000 -t set,get -P 10 -q

Output:

Error: Server closed the connection

dynomite (dc1) logs:

[2017-09-25 08:11:18.286] msg_send:1185 <CONN_REMOTE_PEER_SERVER 0x5593c4a569a0 72 to '172.20.22.121:8101:rack2:dc2:0'> Setting ENOTRECOVERABLE happens here!
[2017-09-25 08:11:18.289] msg_send:1185 <CONN_REMOTE_PEER_SERVER 0x5593c4a56840 71 to '172.20.22.121:8101:rack2:dc2:0'> Setting ENOTRECOVERABLE happens here!
[2017-09-25 08:11:18.289] msg_send:1185 <CONN_REMOTE_PEER_SERVER 0x5593c4a569a0 72 to '172.20.22.121:8101:rack2:dc2:0'> Setting ENOTRECOVERABLE happens here!
[2017-09-25 08:11:18.291] msg_send:1185 <CONN_REMOTE_PEER_SERVER 0x5593c4a569a0 72 to '172.20.22.121:8101:rack2:dc2:0'> Setting ENOTRECOVERABLE happens here!
[2017-09-25 08:11:18.291] msg_send:1185 <CONN_REMOTE_PEER_SERVER 0x5593c4a56b00 70 to '172.20.22.121:8101:rack2:dc2:0'> Setting ENOTRECOVERABLE happens here!
[2017-09-25 08:11:18.295] msg_send:1185 <CONN_REMOTE_PEER_SERVER 0x5593c4a569a0 72 to '172.20.22.121:8101:rack2:dc2:0'> Setting ENOTRECOVERABLE happens here!
[2017-09-25 08:11:18.295] msg_send:1185 <CONN_REMOTE_PEER_SERVER 0x5593c4a56840 71 to '172.20.22.121:8101:rack2:dc2:0'> Setting ENOTRECOVERABLE happens here!
[2017-09-25 08:11:18.297] _msg_get:316 alloc_msg_count: 300000 caller: req_forward_remote_dc <CONN_CLIENT 0x5593d94bdca0 56 from '127.0.0.1:48628'>
[2017-09-25 08:11:18.297] client_unref_internal_try_put:89 <CONN_CLIENT 0x5593d94bdca0 -1 from '127.0.0.1:48628'> Waiting for 6 outstanding messages
[2017-09-25 08:11:18.297] client_unref_internal_try_put:94 <CONN_CLIENT 0x5593d94bc280 -1 from '127.0.0.1:48596'> unref owner <POOL 0x5593c4a36080 'dyn_o_mite'>
[2017-09-25 08:11:18.297] msg_send:1185 <CONN_REMOTE_PEER_SERVER 0x5593c4a56b00 70 to '172.20.22.121:8101:rack2:dc2:0'> Setting ENOTRECOVERABLE happens here!
[2017-09-25 08:11:18.297] client_unref_internal_try_put:94 <CONN_CLIENT 0x5593c622e670 -1 from '127.0.0.1:48602'> unref owner <POOL 0x5593c4a36080 'dyn_o_mite'>
[2017-09-25 08:11:18.297] client_unref_internal_try_put:94 <CONN_CLIENT 0x5593d94ba860 -1 from '127.0.0.1:48608'> unref owner <POOL 0x5593c4a36080 'dyn_o_mite'>
[2017-09-25 08:11:18.297] client_unref_internal_try_put:94 <CONN_CLIENT 0x5593d94bb0b0 -1 from '127.0.0.1:48614'> unref owner <POOL 0x5593c4a36080 'dyn_o_mite'>
[2017-09-25 08:11:18.297] client_unref_internal_try_put:94 <CONN_CLIENT 0x5593d94bb570 -1 from '127.0.0.1:48620'> unref owner <POOL 0x5593c4a36080 'dyn_o_mite'>
[2017-09-25 08:11:18.297] client_unref_internal_try_put:94 <CONN_CLIENT 0x5593d94ba730 -1 from '127.0.0.1:48626'> unref owner <POOL 0x5593c4a36080 'dyn_o_mite'>
[2017-09-25 08:11:18.297] client_unref_internal_try_put:94 <CONN_CLIENT 0x5593d94bbc90 -1 from '127.0.0.1:48632'> unref owner <POOL 0x5593c4a36080 'dyn_o_mite'>
[2017-09-25 08:11:18.297] client_unref_internal_try_put:94 <CONN_CLIENT 0x5593d94bd450 -1 from '127.0.0.1:48638'> unref owner <POOL 0x5593c4a36080 'dyn_o_mite'>
[2017-09-25 08:11:18.297] client_unref_internal_try_put:94 <CONN_CLIENT 0x5593d94bcd30 -1 from '127.0.0.1:48644'> unref owner <POOL 0x5593c4a36080 'dyn_o_mite'>
[2017-09-25 08:11:18.297] client_unref_internal_try_put:94 <CONN_CLIENT 0x5593d94bc9a0 -1 from '127.0.0.1:48650'> unref owner <POOL 0x5593c4a36080 'dyn_o_mite'>
[2017-09-25 08:11:18.297] client_unref_internal_try_put:94 <CONN_CLIENT 0x5593d94bd0c0 -1 from '127.0.0.1:48560'> unref owner <POOL 0x5593c4a36080 'dyn_o_mite'>
[2017-09-25 08:11:18.297] client_unref_internal_try_put:94 <CONN_CLIENT 0x5593d94bcf90 -1 from '127.0.0.1:48566'> unref owner <POOL 0x5593c4a36080 'dyn_o_mite'>
[2017-09-25 08:11:18.297] client_unref_internal_try_put:94 <CONN_CLIENT 0x5593d94bdb70 -1 from '127.0.0.1:48572'> unref owner <POOL 0x5593c4a36080 'dyn_o_mite'>
[2017-09-25 08:11:18.297] client_unref_internal_try_put:94 <CONN_CLIENT 0x5593d94bd910 -1 from '127.0.0.1:48578'> unref owner <POOL 0x5593c4a36080 'dyn_o_mite'>
[2017-09-25 08:11:18.297] client_unref_internal_try_put:94 <CONN_CLIENT 0x5593d94bda40 -1 from '127.0.0.1:48584'> unref owner <POOL 0x5593c4a36080 'dyn_o_mite'>
[2017-09-25 08:11:18.298] client_unref_internal_try_put:94 <CONN_CLIENT 0x5593d94babf0 -1 from '127.0.0.1:48590'> unref owner <POOL 0x5593c4a36080 'dyn_o_mite'>
[2017-09-25 08:11:18.298] msg_put:599 Not freeing req 2453171, awaiting_rsps = 1
[2017-09-25 08:11:18.298] msg_put:599 Not freeing req 2453172, awaiting_rsps = 1
[2017-09-25 08:11:18.298] msg_put:599 Not freeing req 2453174, awaiting_rsps = 1
[2017-09-25 08:11:18.298] msg_put:599 Not freeing req 2453176, awaiting_rsps = 1
[2017-09-25 08:11:18.298] msg_put:599 Not freeing req 2453178, awaiting_rsps = 1
[2017-09-25 08:11:18.298] msg_put:599 Not freeing req 2453180, awaiting_rsps = 1
[2017-09-25 08:11:18.298] client_unref_internal_try_put:94 <CONN_CLIENT 0x5593d94bb6a0 -1 from '127.0.0.1:48606'> unref owner <POOL 0x5593c4a36080 'dyn_o_mite'>
[2017-09-25 08:11:18.298] client_unref_internal_try_put:94 <CONN_CLIENT 0x5593d94bbdc0 -1 from '127.0.0.1:48612'> unref owner <POOL 0x5593c4a36080 'dyn_o_mite'>
[2017-09-25 08:11:18.298] client_unref_internal_try_put:94 <CONN_CLIENT 0x5593d94bb900 -1 from '127.0.0.1:48618'> unref owner <POOL 0x5593c4a36080 'dyn_o_mite'>
[2017-09-25 08:11:18.298] client_unref_internal_try_put:94 <CONN_CLIENT 0x5593d94bd320 -1 from '127.0.0.1:48624'> unref owner <POOL 0x5593c4a36080 'dyn_o_mite'>
[2017-09-25 08:11:18.298] client_unref_internal_try_put:94 <CONN_CLIENT 0x5593d94bbb60 -1 from '127.0.0.1:48630'> unref owner <POOL 0x5593c4a36080 'dyn_o_mite'>
[2017-09-25 08:11:18.298] client_unref_internal_try_put:94 <CONN_CLIENT 0x5593d94be030 -1 from '127.0.0.1:48636'> unref owner <POOL 0x5593c4a36080 'dyn_o_mite'>
[2017-09-25 08:11:18.298] client_unref_internal_try_put:94 <CONN_CLIENT 0x5593d94bba30 -1 from '127.0.0.1:48642'> unref owner <POOL 0x5593c4a36080 'dyn_o_mite'>
[2017-09-25 08:11:18.298] client_unref_internal_try_put:94 <CONN_CLIENT 0x5593d94bcc00 -1 from '127.0.0.1:48648'> unref owner <POOL 0x5593c4a36080 'dyn_o_mite'>
[2017-09-25 08:11:18.298] client_unref_internal_try_put:94 <CONN_CLIENT 0x5593d94bad20 -1 from '127.0.0.1:48654'> unref owner <POOL 0x5593c4a36080 'dyn_o_mite'>
[2017-09-25 08:11:18.298] client_unref_internal_try_put:94 <CONN_CLIENT 0x5593d94bae50 -1 from '127.0.0.1:48558'> unref owner <POOL 0x5593c4a36080 'dyn_o_mite'>
[2017-09-25 08:11:18.298] client_unref_internal_try_put:94 <CONN_CLIENT 0x5593d94bc3b0 -1 from '127.0.0.1:48564'> unref owner <POOL 0x5593c4a36080 'dyn_o_mite'>
[2017-09-25 08:11:18.298] client_unref_internal_try_put:94 <CONN_CLIENT 0x5593d94ba990 -1 from '127.0.0.1:48570'> unref owner <POOL 0x5593c4a36080 'dyn_o_mite'>
[2017-09-25 08:11:18.298] client_unref_internal_try_put:94 <CONN_CLIENT 0x5593d94bddd0 -1 from '127.0.0.1:48576'> unref owner <POOL 0x5593c4a36080 'dyn_o_mite'>
[2017-09-25 08:11:18.298] client_unref_internal_try_put:94 <CONN_CLIENT 0x5593d94bb440 -1 from '127.0.0.1:48582'> unref owner <POOL 0x5593c4a36080 'dyn_o_mite'>
[2017-09-25 08:11:18.298] client_unref_internal_try_put:94 <CONN_CLIENT 0x5593d94bc870 -1 from '127.0.0.1:48588'> unref owner <POOL 0x5593c4a36080 'dyn_o_mite'>
[2017-09-25 08:11:18.298] client_unref_internal_try_put:94 <CONN_CLIENT 0x5593d94bd6b0 -1 from '127.0.0.1:48594'> unref owner <POOL 0x5593c4a36080 'dyn_o_mite'>
[2017-09-25 08:11:18.298] client_unref_internal_try_put:94 <CONN_CLIENT 0x5593d94bb7d0 -1 from '127.0.0.1:48600'> unref owner <POOL 0x5593c4a36080 'dyn_o_mite'>
[2017-09-25 08:11:18.298] client_unref_internal_try_put:94 <CONN_CLIENT 0x5593d94bc4e0 -1 from '127.0.0.1:48634'> unref owner <POOL 0x5593c4a36080 'dyn_o_mite'>
[2017-09-25 08:11:18.299] client_unref_internal_try_put:94 <CONN_CLIENT 0x5593d94bce60 -1 from '127.0.0.1:48640'> unref owner <POOL 0x5593c4a36080 'dyn_o_mite'>
[2017-09-25 08:11:18.299] client_unref_internal_try_put:94 <CONN_CLIENT 0x5593d94bdf00 -1 from '127.0.0.1:48646'> unref owner <POOL 0x5593c4a36080 'dyn_o_mite'>
[2017-09-25 08:11:18.299] client_unref_internal_try_put:94 <CONN_CLIENT 0x5593d94bd1f0 -1 from '127.0.0.1:48652'> unref owner <POOL 0x5593c4a36080 'dyn_o_mite'>
[2017-09-25 08:11:18.299] client_unref_internal_try_put:94 <CONN_CLIENT 0x5593d94baf80 -1 from '127.0.0.1:48556'> unref owner <POOL 0x5593c4a36080 'dyn_o_mite'>
[2017-09-25 08:11:18.299] client_unref_internal_try_put:94 <CONN_CLIENT 0x5593d94bbef0 -1 from '127.0.0.1:48562'> unref owner <POOL 0x5593c4a36080 'dyn_o_mite'>
[2017-09-25 08:11:18.299] client_unref_internal_try_put:94 <CONN_CLIENT 0x5593d94bd7e0 -1 from '127.0.0.1:48568'> unref owner <POOL 0x5593c4a36080 'dyn_o_mite'>
[2017-09-25 08:11:18.299] client_unref_internal_try_put:94 <CONN_CLIENT 0x5593d94bc610 -1 from '127.0.0.1:48574'> unref owner <POOL 0x5593c4a36080 'dyn_o_mite'>
[2017-09-25 08:11:18.299] client_unref_internal_try_put:94 <CONN_CLIENT 0x5593d94bc740 -1 from '127.0.0.1:48580'> unref owner <POOL 0x5593c4a36080 'dyn_o_mite'>
[2017-09-25 08:11:18.299] client_unref_internal_try_put:94 <CONN_CLIENT 0x5593d94baac0 -1 from '127.0.0.1:48586'> unref owner <POOL 0x5593c4a36080 'dyn_o_mite'>
[2017-09-25 08:11:18.299] client_unref_internal_try_put:94 <CONN_CLIENT 0x5593d94bcad0 -1 from '127.0.0.1:48592'> unref owner <POOL 0x5593c4a36080 'dyn_o_mite'>
[2017-09-25 08:11:18.299] client_unref_internal_try_put:94 <CONN_CLIENT 0x5593d94bc020 -1 from '127.0.0.1:48598'> unref owner <POOL 0x5593c4a36080 'dyn_o_mite'>
[2017-09-25 08:11:18.299] client_unref_internal_try_put:94 <CONN_CLIENT 0x5593d94bb310 -1 from '127.0.0.1:48604'> unref owner <POOL 0x5593c4a36080 'dyn_o_mite'>
[2017-09-25 08:11:18.299] client_unref_internal_try_put:94 <CONN_CLIENT 0x5593d94bc150 -1 from '127.0.0.1:48610'> unref owner <POOL 0x5593c4a36080 'dyn_o_mite'>
[2017-09-25 08:11:18.299] client_unref_internal_try_put:94 <CONN_CLIENT 0x5593d94bd580 -1 from '127.0.0.1:48616'> unref owner <POOL 0x5593c4a36080 'dyn_o_mite'>
[2017-09-25 08:11:18.299] client_unref_internal_try_put:94 <CONN_CLIENT 0x5593d94bb1e0 -1 from '127.0.0.1:48622'> unref owner <POOL 0x5593c4a36080 'dyn_o_mite'>
[2017-09-25 08:11:21.709] core_timeout:478 <REQ 0x559404f89150 1440173:1440172::0 REQ_REDIS_INCR> on <CONN_REMOTE_PEER_SERVER 0x5593c4a56840 71 to '172.20.22.121:8101:rack2:dc2:0'> timedout, timeout was 30000
[2017-09-25 08:11:21.803] dnode_peer_close:438 <CONN_REMOTE_PEER_SERVER 0x5593c4a56840 71 to '172.20.22.121:8101:rack2:dc2:0'> Closing, Dropped 47247 outqueue & 67388 inqueue requests
[2017-09-25 08:11:21.803] event_del_conn:234 epoll ctl on e 10 sd 71 failed: No such file or directory
[2017-09-25 08:11:21.803] core_timeout:478 <REQ 0x559403e0dc50 1440901:1440900::0 REQ_REDIS_INCR> on <CONN_REMOTE_PEER_SERVER 0x5593c4a569a0 72 to '172.20.22.121:8101:rack2:dc2:0'> timedout, timeout was 30000
[2017-09-25 08:11:21.886] dnode_peer_close:438 <CONN_REMOTE_PEER_SERVER 0x5593c4a569a0 72 to '172.20.22.121:8101:rack2:dc2:0'> Closing, Dropped 68421 outqueue & 38322 inqueue requests
[2017-09-25 08:11:21.886] event_del_conn:234 epoll ctl on e 10 sd 72 failed: No such file or directory
[2017-09-25 08:11:22.804] conn_connect:321 <CONN_REMOTE_PEER_SERVER 0x5593c4a569a0 20 to '172.20.22.121:8101:rack2:dc2:0'> connecting.....
[2017-09-25 08:11:22.804] conn_connect:321 <CONN_REMOTE_PEER_SERVER 0x5593c4a56840 21 to '172.20.22.121:8101:rack2:dc2:0'> connecting.....

dynomite (dc2) logs:

[2017-09-25 08:11:20.025] _msg_get:307 allocated #msgs 300000 hit max allowable limit
[2017-09-25 08:11:20.025] server_ack_err:203 Could not allocate msg.
[2017-09-25 08:11:20.025] _msg_get:307 allocated #msgs 300000 hit max allowable limit
[2017-09-25 08:11:20.025] server_ack_err:203 Could not allocate msg.
[2017-09-25 08:11:20.025] _msg_get:307 allocated #msgs 300000 hit max allowable limit
[2017-09-25 08:11:20.025] server_ack_err:203 Could not allocate msg.
[2017-09-25 08:11:20.025] server_close:269 close <CONN_SERVER 0x562a9b611e10 14 to '127.0.0.1:6379:1'> Dropped 9534 outqueue & 0 inqueue requests
[2017-09-25 08:11:20.025] event_del_conn:234 epoll ctl on e 10 sd 14 failed: No such file or directory
[2017-09-25 08:11:20.025] conn_pool_notify_conn_close:170 <CONN_POOL 0x562a9b611370 active_conn:2 in array 3 max 3> Removing <CONN_SERVER 0x562a9b611e10 -1 to '127.0.0.1:6379:1'>
[2017-09-25 08:11:20.025] _msg_get:307 allocated #msgs 300000 hit max allowable limit
[2017-09-25 08:11:20.025] dn_stacktrace:326 [0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x110c0) [0x7f8006c240c0]
[2017-09-25 08:11:20.026] dn_stacktrace:331 system command did not succeed to print filename
[2017-09-25 08:11:20.026] dn_stacktrace:326 [1] src/dynomite(req_forward_error+0xb4) [0x562a9a950bc0]
[2017-09-25 08:11:20.026] dn_stacktrace:331 system command did not succeed to print filename
[2017-09-25 08:11:20.026] dn_stacktrace:326 [2] src/dynomite(req_forward_all_local_racks+0x178) [0x562a9a95163d]
[2017-09-25 08:11:20.026] dn_stacktrace:331 system command did not succeed to print filename
[2017-09-25 08:11:20.026] dn_stacktrace:326 [3] src/dynomite(+0x1d46d) [0x562a9a95f46d]
[2017-09-25 08:11:20.026] dn_stacktrace:331 system command did not succeed to print filename
[2017-09-25 08:11:20.026] dn_stacktrace:326 [4] src/dynomite(+0x1d660) [0x562a9a95f660]
[2017-09-25 08:11:20.026] dn_stacktrace:331 system command did not succeed to print filename
[2017-09-25 08:11:20.026] dn_stacktrace:326 [5] src/dynomite(+0x2b169) [0x562a9a96d169]
[2017-09-25 08:11:20.026] dn_stacktrace:331 system command did not succeed to print filename
[2017-09-25 08:11:20.026] dn_stacktrace:326 [6] src/dynomite(+0x2b293) [0x562a9a96d293]
[2017-09-25 08:11:20.026] dn_stacktrace:331 system command did not succeed to print filename
[2017-09-25 08:11:20.026] dn_stacktrace:326 [7] src/dynomite(+0x2b848) [0x562a9a96d848]
[2017-09-25 08:11:20.026] dn_stacktrace:331 system command did not succeed to print filename
[2017-09-25 08:11:20.026] dn_stacktrace:326 [8] src/dynomite(msg_recv+0x9d) [0x562a9a96d93a]
[2017-09-25 08:11:20.026] dn_stacktrace:331 system command did not succeed to print filename
[2017-09-25 08:11:20.026] dn_stacktrace:326 [9] src/dynomite(+0x19a52) [0x562a9a95ba52]
[2017-09-25 08:11:20.026] dn_stacktrace:331 system command did not succeed to print filename
[2017-09-25 08:11:20.026] dn_stacktrace:326 [10] src/dynomite(core_core+0x10a) [0x562a9a95c040]
[2017-09-25 08:11:20.026] dn_stacktrace:331 system command did not succeed to print filename
[2017-09-25 08:11:20.026] dn_stacktrace:326 [11] src/dynomite(event_wait+0x197) [0x562a9a9988bd]
[2017-09-25 08:11:20.026] dn_stacktrace:331 system command did not succeed to print filename
[2017-09-25 08:11:20.026] dn_stacktrace:326 [12] src/dynomite(core_loop+0x7c) [0x562a9a95c902]
[2017-09-25 08:11:20.026] dn_stacktrace:331 system command did not succeed to print filename
[2017-09-25 08:11:20.026] dn_stacktrace:326 [13] src/dynomite(+0x43820) [0x562a9a985820]
[2017-09-25 08:11:20.026] dn_stacktrace:331 system command did not succeed to print filename
[2017-09-25 08:11:20.026] dn_stacktrace:326 [14] src/dynomite(main+0x162) [0x562a9a9859cd]
[2017-09-25 08:11:20.026] dn_stacktrace:331 system command did not succeed to print filename
[2017-09-25 08:11:20.026] dn_stacktrace:326 [15] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf1) [0x7f80065922e1]
[2017-09-25 08:11:20.026] dn_stacktrace:331 system command did not succeed to print filename
[2017-09-25 08:11:20.026] dn_stacktrace:326 [16] src/dynomite(_start+0x2a) [0x562a

I have set mbuf_size to its maximum limit of 64000 and max_msgs to 300000.
I also tried setting max_msgs to 1000000 (the max limit), but the same thing still happens.
Can you explain what exactly max_msgs is here?

@ipapapa (Contributor) commented Sep 30, 2017

Similarly to #487, I think you should decrease the mbuf_size. max_msgs is how many messages Dynomite will buffer; once that upper limit is reached, Dynomite stops accepting further writes. This is done so that Dynomite is capped on how much memory it can consume. Our production systems are usually set to 500000, but it depends on the instance type. Check the launch_dynomite script to get some ideas.
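To see why the cap matters, here is a rough back-of-the-envelope (my own arithmetic, assuming the worst case where every allocated message pins one full mbuf of mbuf_size bytes, using the values reported in this issue):

```shell
# Hypothetical worst-case buffer memory: max_msgs messages, each holding
# a full mbuf of mbuf_size bytes (values taken from this issue's report).
mbuf_size=64000     # bytes per mbuf
max_msgs=300000     # allocated-message cap
echo "$(( mbuf_size * max_msgs / 1024 / 1024 )) MB"   # prints "18310 MB"
```

At these settings the theoretical ceiling is around 18 GB, which is why lowering mbuf_size (rather than raising max_msgs) is usually the safer lever.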

@shailesh33 (Contributor) commented

Other than what @ipapapa mentioned, as I told you in the other issue #487, redis-benchmark will hammer Dynomite with as many requests as it can, while the cross-datacenter replication can lag behind. Dynomite then starts failing requests as a means of admission control. Hope this helps. Please look at https://github.com/Netflix/ndbench, the benchmarking project that we use internally to load-test Dynomite and many other projects.
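One quick way to confirm this is admission control rather than a hard failure (my suggestion, not an official procedure) is to rerun the benchmark at increasing pipeline depths and find where errors start. Echoed here as a dry run; drop the leading "echo" to actually execute against your dc1 node:

```shell
# Dry-run sketch: print the benchmark command for a few pipeline depths.
# -P 1 is effectively no pipelining, which the reporter says works fine.
for depth in 1 5 10; do
  echo redis-benchmark -p 8102 -n 1000000 -t set,get -P "$depth" -q
done
```

If low depths succeed and higher ones trip the "Server closed the connection" error, the replication backlog and message cap are the bottleneck rather than the connection itself.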

@shailesh33 (Contributor) commented

Please reopen if you have a follow-up.
