New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
msg/async/AsyncConnection: dispatch write handler on keepalive2 #11601
Conversation
Commit 5c0a689 ("AsyncConnection: remove unnecessary "send" flag") dropped this parameter. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
ping @yuyuyu101 note that at some point we should probably rename 'outcoming' to 'outgoing' :) |
good catch. LGTM |
@yuyuyu101 I'm not entirely clear on the already_dispatch_writer bit - I copied it from the normal message ack logic, which was updated in 88e0b2c ("AsyncConnection: Let receiver ack message ASAP"). Why are we scheduling the writer only once per process() invocation in that case? Given that process() will potentially handle multiple messages, that seems wrong. |
@idryomov oh, yes! it may exists rarely case when handle_write done io but process() one doesn't complete the loop... I think we can move already_dispatch_writer && dispatch codes out of while loop, so we can make sure handle_write could carry on all pending io |
hmm, because this dispatch only need to ack message, so it won't be a big problem. But we need to fix.. |
oh, I think the previous logic is ok... the only problem is this adding codes, we don't need to judge "already_dispatch_writer" and should call dispatch_event_external directly. Anyway, move dispatch_event_external out of while loop should be more clear for all cases |
My concern with the existing logic is that in the presence of multiple messages we are effectively racing the reader with the writer and unless I'm missing something can still exit the loop with unsent seq acks. Setting a new bool need_dispatch_writer in those two places and calling dispatch_event_external() once before returning if need_dispatch_writer is set is definitely better and more intuitive. If you agree, I'll add it to this pull request. |
1ba4813
to
05ba261
Compare
@yuyuyu101 Added "AsyncConnection: eliminate reader vs writer ack race in process()". |
LGTM! |
Suspecting some failures related to this PR in http://pulpito.ceph.com/yuriw-2016-10-25_17:33:31-rados-wip-yuri-testing2_2016_10_24-distro-basic-smithi/ @idryomov will test |
The "eliminate reader vs writer ack race in process()" was a quick addition and I haven't tested it with failure injection - could be something caused by connection resets. This is a low priority fix - I'll hammer on it later. |
@@ -821,6 +816,8 @@ void AsyncConnection::process() | |||
} | |||
} while (prev_state != state); | |||
|
|||
if (need_dispatch_writer) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to change to "if (need_dispatch_writer && can_write == WriteStatus::CANWRITE)"
Otherwise, connection may in STATE_CLOSED but we will dispatch write_handler which will result in EventCenter call a deconstructed method
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
...and a write_lock around it then - I'll take a look, thanks @yuyuyu101.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no write_lock needed. can_write is a atomic variable and read access is allowed without lock. only modify is lock needed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see - I assumed a check in handle_write() was sufficient, but it's not. is_connected() can be used to make it slightly more clear, right?
absolutely! On Thu, Oct 27, 2016 at 5:33 PM, Ilya Dryomov notifications@github.com
Best Regards, Wheat |
process() will potentially handle multiple messages, yet we are scheduling the writer to write out acks only once, after the first message is processed. This effectively races the reader with the writer and leaves the window for exiting the process() loop with pending acks (ack_left > 0 but no kicked writer). Close this window by moving dispatch_event_external() out of the loop. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
…r_ack Commit 48d929c ("msg: async: improve _send_keepalive_or_ack() a little") removed _try_send() call. _try_send(false) was a no-op by then, but still - rename appropriately. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
...otherwise, if the connection is idle (i.e. no proper TAG_MSG messages), keepalive ack will not get sent until some point in the future. Fixes: #17664 Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
05ba261
to
aeb979a
Compare
@yuriw, @tchaikov Pushed a new version and ran one of the failed jobs from #11601 (comment): http://pulpito.ceph.com/dis-2016-11-08_12:59:33-rados-wip-ms-async-keepalive2-distro-basic-mira/. Please pick this up for your next run. |
http://tracker.ceph.com/issues/17664