Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

msg/async: remove file event lock #10090

Merged
merged 21 commits into from Jul 12, 2016
Merged

Conversation

yuyuyu101
Copy link
Member

prepare for the next async messenger backend framework

@yuyuyu101
Copy link
Member Author

@@ -30,6 +30,7 @@ namespace ceph {
int set_nonblock(int sd);
void set_close_on_exec(int sd);
void set_socket_options(int sd);
void set_socket_options(int sd, bool nodelay, int size);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

now we have two set_socket_options()?

@yuyuyu101
Copy link
Member Author

@tchaikov thanks!!! it's a rebase conflicting

@@ -126,10 +126,8 @@ EventCenter::~EventCenter()
}
assert(time_events.empty());

if (notify_receive_fd >= 0) {
delete_file_event(notify_receive_fd, EVENT_READABLE);
if (notify_receive_fd >= 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you explain why there is no need to delete delete_file_event() in the dtor?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because delete_file_event is used to delete fd from epoll pool. And we will the whole epoll in the following.

Copy link
Contributor

@tchaikov tchaikov Jul 7, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will what? i guess you mean "nuke" or something? and what does "in the following" stands for? seems this commit does not look right by its own, could you move it to the commit where "we will nuke the whole epoll"?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'delete driver"

Copy link
Member Author

@yuyuyu101 yuyuyu101 Jul 7, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, delete the epoll in "delete driver"

@tchaikov
Copy link
Contributor

tchaikov commented Jul 7, 2016

so each event thread which own EventCenter must operate EventCenter in local thread

i see you want to have a 1:1 mapping from EventCenter to thread, as you label this PR with performance, but maybe you can put more details on the rationale of why this improves the performance in one of the commits, and accompany it with a micro benchmark on the improvement?

@yuyuyu101
Copy link
Member Author

@tchaikov actually the performance isn't the main purpose of this PR. Local eventcenter is required by dpdk. Since this pr is a part of #9230

@yuyuyu101
Copy link
Member Author

the performance is the side effect

@tchaikov
Copy link
Contributor

tchaikov commented Jul 7, 2016

okay, i can hardly tell the intention of this move without the context.

@tchaikov tchaikov removed their assignment Jul 7, 2016
@yuyuyu101
Copy link
Member Author

@tchaikov sorry, I need to clarify at first.

@tchaikov
Copy link
Contributor

tchaikov commented Jul 7, 2016

sorry, i am not familiar with this area or its background. what interested me was #10056 which addresses some of qa run failures. let me know if you could document the commit a little bit. or maybe we can have some other expert in (async) msgr to review this changeset so less latency is expected.

existing->delay_state->set_center(new_center);
} else if (existing->state == STATE_CLOSED) {
::close(new_fd);
return ;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could remove this line.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, if continue, we will dispatch the later event which we wan't

@yuyuyu101 yuyuyu101 force-pushed the wip-remove-async-lock branch 3 times, most recently from bb1c912 to 2440b0f Compare July 7, 2016 14:27
::shutdown(sd, SHUT_RDWR);
::close(sd);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is sd dupped before? if not, ::close() would suffice in this case.

@tchaikov
Copy link
Contributor

tchaikov commented Jul 12, 2016

reviewed. i don't think this PR is a refactor. it has following non-trivial changes

  • fixes on cleanup
  • and shards the fds by the thread handling them
  • as a side effect, the lock for inter-thread resource sharing is removed.

other than some nits, generally looks good.

@yuyuyu101
Copy link
Member Author

@tchaikov thanks!

yuyuyu101 and others added 9 commits July 12, 2016 23:37
We are make each AsyncConnection/AsyncMessenger only modify its file event
in event thread. So make sure create/delete_file_event aren't directly called.

Signed-off-by: Haomai Wang <haomai@xsky.com>
Signed-off-by: Haomai Wang <haomai@xsky.com>
Previously we only exchange fd when replacing, now we will introduce dpdk
plugin in the near future. It needs all fd used locally which not like
kernel socket shared by all cores.

So we need to add EventCenter swapping to let each socket is associated to
EventCenter.

Signed-off-by: Haomai Wang <haomai@xsky.com>
EventCenter::init is called by other thread instead of event thread, so we
need to move create_file_event to set_owner which is called by event thread.

Signed-off-by: Haomai Wang <haomai@xsky.com>
When we create event thread, it need a little time to enter event loop(like
calling set_owner), if caller is going to call create_file_event before event
thread enter event loop, it will trigger assert.

Signed-off-by: Haomai Wang <haomai@xsky.com>
because if we are in STATE_CLOSED, fd must be -1

Signed-off-by: Haomai Wang <haomai@xsky.com>
Now all EventCenter will exists within one thread, it will let all file events
api changes without locks.

Signed-off-by: Haomai Wang <haomai@xsky.com>
Let cleanup resources things all in shutdown_socket

Signed-off-by: Haomai Wang <haomai@xsky.com>
When replacing and someone called mark_down, it will delete_time_event which
isn't allowed. Because we're exchaning EventCenter now!

Signed-off-by: Haomai Wang <haomai@xsky.com>
Signed-off-by: Haomai Wang <haomai@xsky.com>
Fixes: http://tracker.ceph.com/issues/16554
Signed-off-by: Haomai Wang <haomai@xsky.com>
…lacing

When replacing, we don't expect any AsyncConnection dispatch new event which
will cause thing chaos

Signed-off-by: Haomai Wang <haomai@xsky.com>
Signed-off-by: Haomai Wang <haomai@xsky.com>
Otherwise if message in queue, we will continue to reconnect right now,
it won't meet our expectation that we want our connect request delay

Signed-off-by: Haomai Wang <haomai@xsky.com>
because we want to get the right log sequence which mixes ceph logginer and
cerr. Otherwise, cerr output make the logs a little disordered.

Signed-off-by: Haomai Wang <haomai@xsky.com>
@yuyuyu101
Copy link
Member Author

mostly done

1. A -> B
2. goto standby
3. B mark down
4. A reconnect to B
5. got reset session and dispatch remote reset
6. because remote reset is executed in DispatchQueue, it will be delayed
7. A -> B successfully and begin to send message
8. assert because we found the first message is missing but it's reasonble

if policy.resetcheck is true

Signed-off-by: Haomai Wang <haomai@xsky.com>
@yuyuyu101 yuyuyu101 merged commit e489cd4 into ceph:master Jul 12, 2016
@yuyuyu101 yuyuyu101 deleted the wip-remove-async-lock branch July 12, 2016 16:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants