msg/async: remove file event lock #10090
2281f9b
to
1705c62
Compare
0925284
to
e8622c1
Compare
@@ -30,6 +30,7 @@ namespace ceph {
int set_nonblock(int sd);
void set_close_on_exec(int sd);
void set_socket_options(int sd);
void set_socket_options(int sd, bool nodelay, int size);
now we have two set_socket_options()?
e8622c1
to
31de313
Compare
@tchaikov thanks!!! it was a rebase conflict
31de313
to
7d8c863
Compare
@@ -126,10 +126,8 @@ EventCenter::~EventCenter()
}
assert(time_events.empty());

if (notify_receive_fd >= 0) {
delete_file_event(notify_receive_fd, EVENT_READABLE);
if (notify_receive_fd >= 0)
could you explain why there is no need to call delete_file_event() in the dtor?
because delete_file_event is used to delete fd from epoll pool. And we will the whole epoll in the following.
will what? i guess you mean "nuke" or something? and what does "in the following" stand for? this commit does not look right on its own, could you move it to the commit where "we will nuke the whole epoll"?
"delete driver"
sorry, I mean we delete the whole epoll instance in "delete driver"
i see you want to have a 1:1 mapping from EventCenter to thread, as you label this PR with performance. but maybe you can put more details on the rationale of why this improves performance in one of the commits, and accompany it with a micro benchmark of the improvement?
the performance improvement is a side effect
okay, i can hardly tell the intention of this move without the context.
@tchaikov sorry, I should have clarified this first.
sorry, i am not familiar with this area or its background. what interested me was #10056, which addresses some of the qa run failures. let me know if you could document the commit a little bit. or maybe we can have some other expert in (async) msgr review this changeset, so less latency is expected.
existing->delay_state->set_center(new_center);
} else if (existing->state == STATE_CLOSED) {
::close(new_fd);
return ;
could remove this line.
no, if we continue, we will dispatch the later events, which we don't want
bb1c912
to
2440b0f
Compare
::shutdown(sd, SHUT_RDWR);
::close(sd);
is sd dup'ed before? if not, ::close() would suffice in this case.
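To make the point about dup'ed descriptors concrete, here is a minimal sketch. It uses an AF_UNIX socketpair rather than the messenger's TCP sockets, and `peer_sees_eof` is a hypothetical helper, not code from this PR. It checks whether the peer observes EOF after `sd` is torn down, with and without a duplicate of `sd` still open.

```cpp
#include <cassert>
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: returns true if the peer reads EOF right after
// we tear down `sd`. `duplicated` keeps a dup() of sd open meanwhile.
bool peer_sees_eof(bool duplicated, bool use_shutdown) {
  int sv[2];
  int rc = ::socketpair(AF_UNIX, SOCK_STREAM, 0, sv);
  assert(rc == 0);
  (void)rc;
  const int sd = sv[0];
  const int peer = sv[1];
  const int extra = duplicated ? ::dup(sd) : -1;

  if (use_shutdown)
    ::shutdown(sd, SHUT_RDWR);  // acts on the socket itself, not only this fd
  ::close(sd);                  // drops just this one descriptor

  // Non-blocking read: 0 means EOF, -1 with EAGAIN means still connected.
  const int flags = ::fcntl(peer, F_GETFL, 0);
  ::fcntl(peer, F_SETFL, flags | O_NONBLOCK);
  char byte;
  const ssize_t n = ::read(peer, &byte, 1);

  if (extra >= 0)
    ::close(extra);
  ::close(peer);
  return n == 0;
}
```

With no duplicate, `::close(sd)` alone ends the connection. With a duplicate still open, only `::shutdown()` makes the peer see EOF immediately, which is the case where keeping the `::shutdown()` call would matter.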
reviewed. i don't think this PR is a refactor. it has following non-trivial changes
other than some nits, generally looks good.
@tchaikov thanks!
Each AsyncConnection/AsyncMessenger now modifies its file events only in the event thread, so make sure create/delete_file_event aren't called directly. Signed-off-by: Haomai Wang <haomai@xsky.com>
Signed-off-by: Haomai Wang <haomai@xsky.com>
Previously we only exchanged the fd when replacing. We will introduce a dpdk plugin in the near future, and it needs every fd to be used locally, unlike a kernel socket, which is shared by all cores. So we need to add EventCenter swapping so that each socket is associated with an EventCenter. Signed-off-by: Haomai Wang <haomai@xsky.com>
EventCenter::init is called by another thread rather than the event thread, so we need to move create_file_event to set_owner, which is called by the event thread. Signed-off-by: Haomai Wang <haomai@xsky.com>
When we create the event thread, it needs a little time to enter the event loop (e.g. to call set_owner). If the caller calls create_file_event before the event thread has entered the event loop, it will trigger an assert. Signed-off-by: Haomai Wang <haomai@xsky.com>
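One way a caller could avoid this startup race is to block until the event thread has registered itself as owner. The sketch below only illustrates that idea. `OwnerGate` and its methods are hypothetical names, and the commit itself may resolve the race differently.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical gate: the event thread calls set_owner() once it starts,
// and other threads call wait_for_owner() before touching file events.
class OwnerGate {
  std::mutex lock;
  std::condition_variable cond;
  bool ready = false;
  std::thread::id owner;

 public:
  // Called by the event thread itself, exactly once.
  void set_owner() {
    {
      std::lock_guard<std::mutex> l(lock);
      assert(!ready);
      owner = std::this_thread::get_id();
      ready = true;
    }
    cond.notify_all();
  }

  // Blocks until the event thread has called set_owner().
  void wait_for_owner() {
    std::unique_lock<std::mutex> l(lock);
    cond.wait(l, [this] { return ready; });
  }

  std::thread::id owner_id() {
    std::lock_guard<std::mutex> l(lock);
    return owner;
  }
};
```

A caller would start the event thread, call `wait_for_owner()`, and only then submit work that ends up in create_file_event.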
because if we are in STATE_CLOSED, fd must be -1 Signed-off-by: Haomai Wang <haomai@xsky.com>
…nished Fixes: http://tracker.ceph.com/issues/16552 Signed-off-by: Haomai Wang <haomai@xsky.com>
Now each EventCenter exists within one thread, which lets all file event API calls run without locks. Signed-off-by: Haomai Wang <haomai@xsky.com>
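The idea behind dropping the file event lock can be sketched as follows: if file events are touched only by the owning thread, the table itself needs no mutex, and other threads hand work over through a small locked queue. `MiniEventCenter` is a simplified, hypothetical stand-in for illustration, not the actual EventCenter class, and it omits the epoll driver and wakeup pipe.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <mutex>
#include <thread>
#include <vector>

class MiniEventCenter {
  std::thread::id owner;           // set by the event thread
  std::map<int, int> file_events;  // fd -> mask, owner thread only, no lock
  std::mutex external_lock;        // guards only the hand-off queue
  std::vector<std::function<void()>> external_events;

 public:
  void set_owner() { owner = std::this_thread::get_id(); }
  bool in_thread() const { return std::this_thread::get_id() == owner; }

  // Lock-free by construction: only the owner thread may call these.
  void create_file_event(int fd, int mask) {
    assert(in_thread());
    file_events[fd] |= mask;
  }
  void delete_file_event(int fd, int mask) {
    assert(in_thread());
    auto it = file_events.find(fd);
    if (it == file_events.end())
      return;
    it->second &= ~mask;
    if (it->second == 0)
      file_events.erase(it);
  }

  // Any thread may queue work for the owner thread.
  void dispatch_event_external(std::function<void()> cb) {
    std::lock_guard<std::mutex> l(external_lock);
    external_events.push_back(std::move(cb));
  }

  // Owner thread drains the queue; returns how many callbacks ran.
  int process_external() {
    assert(in_thread());
    std::vector<std::function<void()>> pending;
    {
      std::lock_guard<std::mutex> l(external_lock);
      pending.swap(external_events);
    }
    for (auto &cb : pending)
      cb();
    return static_cast<int>(pending.size());
  }

  std::size_t file_event_count() const { return file_events.size(); }
};
```

The trade-off in this sketch is that a non-owner thread can no longer register an fd synchronously. It has to queue a callback and let the event thread do the registration.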
Signed-off-by: Haomai Wang <haomai@xsky.com>
9df59d3
to
3c595c2
Compare
Do all resource cleanup in shutdown_socket. Signed-off-by: Haomai Wang <haomai@xsky.com>
If someone calls mark_down while we are replacing, it will call delete_time_event, which isn't allowed because we're exchanging the EventCenter at that point! Signed-off-by: Haomai Wang <haomai@xsky.com>
Signed-off-by: Haomai Wang <haomai@xsky.com>
Fixes: http://tracker.ceph.com/issues/16554 Signed-off-by: Haomai Wang <haomai@xsky.com>
…lacing When replacing, we don't expect any AsyncConnection to dispatch new events, which would cause chaos. Signed-off-by: Haomai Wang <haomai@xsky.com>
Signed-off-by: Haomai Wang <haomai@xsky.com>
Otherwise, if there is a message in the queue, we will reconnect right away, which defeats our intent to delay the connect request. Signed-off-by: Haomai Wang <haomai@xsky.com>
because we want the right log sequence when mixing the ceph logger and cerr. Otherwise, cerr output makes the logs a little disordered. Signed-off-by: Haomai Wang <haomai@xsky.com>
3c595c2
to
bcc07cd
Compare
mostly done
1. A -> B
2. go to standby
3. B mark down
4. A reconnects to B
5. got reset session and dispatch remote reset
6. because remote reset is executed in DispatchQueue, it will be delayed
7. A -> B successfully and begins to send messages
8. assert because we found the first message is missing, but that's reasonable if policy.resetcheck is true
Signed-off-by: Haomai Wang <haomai@xsky.com>
bcc07cd
to
96943ee
Compare
prepare for the next async messenger backend framework