Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

msg/async/AsyncConnection: dispatch write handler on keepalive2 #11601

Merged
merged 4 commits into from Nov 10, 2016

Conversation

idryomov
Copy link
Contributor

@idryomov idryomov commented Oct 21, 2016

Commit 5c0a689 ("AsyncConnection: remove unnecessary "send" flag")
dropped this parameter.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
@liewegas
Copy link
Member

ping @yuyuyu101

note that at some point we should probably rename 'outcoming' to 'outgoing' :)

@yuyuyu101
Copy link
Member

good catch. LGTM

@idryomov
Copy link
Contributor Author

@yuyuyu101 I'm not entirely clear on the already_dispatch_writer bit - I copied it from the normal message ack logic, which was updated in 88e0b2c ("AsyncConnection: Let receiver ack message ASAP"). Why are we scheduling the writer only once per process() invocation in that case? Given that process() will potentially handle multiple messages, that seems wrong.

@yuyuyu101
Copy link
Member

@idryomov oh, yes! it may exists rarely case when handle_write done io but process() one doesn't complete the loop...

I think we can move already_dispatch_writer && dispatch codes out of while loop, so we can make sure handle_write could carry on all pending io

@yuyuyu101
Copy link
Member

hmm, because this dispatch only need to ack message, so it won't be a big problem. But we need to fix..

@yuyuyu101
Copy link
Member

oh, I think the previous logic is ok... the only problem is this adding codes, we don't need to judge "already_dispatch_writer" and should call dispatch_event_external directly. Anyway, move dispatch_event_external out of while loop should be more clear for all cases

@idryomov
Copy link
Contributor Author

idryomov commented Oct 21, 2016

My concern with the existing logic is that in the presence of multiple messages we are effectively racing the reader with the writer and unless I'm missing something can still exit the loop with unsent seq acks.

Setting a new bool need_dispatch_writer in those two places and calling dispatch_event_external() once before returning if need_dispatch_writer is set is definitely better and more intuitive. If you agree, I'll add it to this pull request.

@idryomov
Copy link
Contributor Author

@yuyuyu101 Added "AsyncConnection: eliminate reader vs writer ack race in process()".

@yuyuyu101
Copy link
Member

LGTM!

@yuriw
Copy link
Contributor

yuriw commented Oct 26, 2016

@idryomov
Copy link
Contributor Author

The "eliminate reader vs writer ack race in process()" was a quick addition and I haven't tested it with failure injection - could be something caused by connection resets. This is a low priority fix - I'll hammer on it later.

@@ -821,6 +816,8 @@ void AsyncConnection::process()
}
} while (prev_state != state);

if (need_dispatch_writer)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to change to "if (need_dispatch_writer && can_write == WriteStatus::CANWRITE)"

Otherwise, connection may in STATE_CLOSED but we will dispatch write_handler which will result in EventCenter call a deconstructed method

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...and a write_lock around it then - I'll take a look, thanks @yuyuyu101.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no write_lock needed. can_write is a atomic variable and read access is allowed without lock. only modify is lock needed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see - I assumed a check in handle_write() was sufficient, but it's not. is_connected() can be used to make it slightly more clear, right?

@yuyuyu101
Copy link
Member

absolutely!

On Thu, Oct 27, 2016 at 5:33 PM, Ilya Dryomov notifications@github.com
wrote:

@idryomov commented on this pull request.

In src/msg/async/AsyncConnection.cc
#11601:

@@ -821,6 +816,8 @@ void AsyncConnection::process()
}
} while (prev_state != state);

  • if (need_dispatch_writer)

I see - I assumed a check in handle_write() was sufficient, but it's not.
is_connected() can be used to make it slightly more clear, right?


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#11601, or mute the thread
https://github.com/notifications/unsubscribe-auth/AA72YVyWz842kceI_7IrsjoxaegyDusIks5q4G_bgaJpZM4Kdfg5
.

Best Regards,

Wheat

@liewegas liewegas changed the title AsyncConnection: dispatch write handler on keepalive2 msg/async/AsyncConnection: dispatch write handler on keepalive2 Nov 3, 2016
@liewegas liewegas modified the milestone: kraken Nov 3, 2016
process() will potentially handle multiple messages, yet we are
scheduling the writer to write out acks only once, after the first
message is processed.  This effectively races the reader with the
writer and leaves the window for exiting the process() loop with
pending acks (ack_left > 0 but no kicked writer).

Close this window by moving dispatch_event_external() out of the loop.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
…r_ack

Commit 48d929c ("msg: async: improve _send_keepalive_or_ack()
a little") removed _try_send() call.  _try_send(false) was a no-op by
then, but still - rename appropriately.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
...otherwise, if the connection is idle (i.e. no proper TAG_MSG
messages), keepalive ack will not get sent until some point in the
future.

Fixes: #17664
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
@idryomov
Copy link
Contributor Author

idryomov commented Nov 8, 2016

@yuriw, @tchaikov Pushed a new version and ran one of the failed jobs from #11601 (comment): http://pulpito.ceph.com/dis-2016-11-08_12:59:33-rados-wip-ms-async-keepalive2-distro-basic-mira/. Please pick this up for your next run.

@tchaikov tchaikov merged commit d3927fe into master Nov 10, 2016
@tchaikov tchaikov deleted the wip-ms-async-keepalive2 branch November 10, 2016 11:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants