[ROCKETMQ-184] It takes too long (3-33 seconds) to switch to read from slave when master crashes #95
Conversation
final SemaphoreReleaseOnlyOnce once = new SemaphoreReleaseOnlyOnce(this.semaphoreAsync);
final ResponseFuture responseFuture = new ResponseFuture(opaque, timeoutMillis, invokeCallback, once);
final GenericFutureListener<ChannelFuture> chanelCloseListener = new ChannelFutureListener() {
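For context, a minimal sketch (not the exact PR code) of what this per-request approach does: register a listener on the channel's close future so a pending ResponseFuture fails fast when the connection drops, instead of waiting for the periodic timeout scan. The ResponseFuture method names follow RocketMQ's remoting module; the surrounding variables (`channel`, `responseFuture`) come from the enclosing async-invoke method and are assumed here.

```java
import io.netty.channel.Channel;
import io.netty.channel.ChannelFuture;
import io.netty.channel.ChannelFutureListener;

// Fail the pending future as soon as the channel closes, rather than
// leaving it for the timeout scan thread.
final ChannelFutureListener channelCloseListener = new ChannelFutureListener() {
    @Override
    public void operationComplete(ChannelFuture future) throws Exception {
        responseFuture.setSendRequestOK(false); // mark the request as failed
        responseFuture.putResponse(null);       // no response will ever arrive
        responseFuture.executeInvokeCallback(); // fire the user callback now
    }
};
channel.closeFuture().addListener(channelCloseListener);
```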
chanel-->channel
Thanks @Jaskey, this is indeed a good place to improve. For the implementation, I suggest an alternative, more generic way: instead of adding a close future for each request, we add the opaque integer to a per-channel collection, remove the opaque integer on response, or invalidate all of them in NettyConnectManageHandler. The suggested approach has a smaller memory footprint, and we may also easily cover the sync request scenario -- responding early, before the timeout fires.
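To make the suggested alternative concrete, here is a hedged sketch under assumed names (`inflightOpaquesByChannel`, `registerInflight`, and `failFast` are illustrative; `responseTable` follows the field in NettyRemotingAbstract, and the fragment is meant to live in that layer): track in-flight opaque ids per channel, drop an id when its response arrives, and invalidate whatever is left when the connection manager observes the channel closing.

```java
import io.netty.channel.Channel;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// One small set of opaque ids per channel instead of one close listener
// per request.
private final ConcurrentMap<Channel, Set<Integer>> inflightOpaquesByChannel =
    new ConcurrentHashMap<>();

void registerInflight(Channel channel, int opaque) {
    inflightOpaquesByChannel
        .computeIfAbsent(channel, ch -> ConcurrentHashMap.newKeySet())
        .add(opaque);
}

void onResponseReceived(Channel channel, int opaque) {
    Set<Integer> opaques = inflightOpaquesByChannel.get(channel);
    if (opaques != null) {
        opaques.remove(opaque);
    }
}

// Invoked from NettyConnectManageHandler when a close/inactive event fires.
void failFast(Channel channel) {
    Set<Integer> opaques = inflightOpaquesByChannel.remove(channel);
    if (opaques == null) {
        return;
    }
    for (Integer opaque : opaques) {
        ResponseFuture rf = responseTable.remove(opaque);
        if (rf != null) {
            rf.setSendRequestOK(false);
            rf.putResponse(null);
            rf.executeInvokeCallback(); // callback fires immediately, not after 30s
        }
    }
}
```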
I have considered that, but it would take more effort to achieve the same goal, since we need to change some structure to give the connect manager access to the responseFuture map. For my first implementation, I just wanted to raise this problem and involve you guys in the discussion. If you think that is indeed a better approach, I will submit an updated implementation, and then let everyone choose.
You are right that more changes are required for my suggested approach. But, IMO, the suggested way is more unified in design and may also save some memory in case the semaphore's initial capacity is very large. Indeed, this is a place we need to enhance. Let's bring more guys into the discussion before you implement the suggested approach. They should easily see what's going on here by checking the changes made in your PR. Any opinion on this issue? @zhouxinyu @shroman @vongosling
Thanks @Jaskey, agree with @lizhanhui: using NettyConnectManageHandler to handle this close event is a better way; no need to introduce two mechanisms.
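Wiring-wise, a minimal sketch of where the close event would be handled (in RocketMQ, NettyConnectManageHandler is an inner class of the remoting client, so it can reach shared state; `failFast` refers to the kind of helper sketched above and is an assumed name, not necessarily what the final PR uses):

```java
import io.netty.channel.ChannelDuplexHandler;
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelPromise;

class NettyConnectManageHandler extends ChannelDuplexHandler {
    @Override
    public void channelInactive(ChannelHandlerContext ctx) throws Exception {
        // One hook covers every request on this connection: fail all
        // in-flight futures for the dead channel instead of registering
        // a listener per request.
        failFast(ctx.channel());
        super.channelInactive(ctx);
    }

    @Override
    public void close(ChannelHandlerContext ctx, ChannelPromise promise) throws Exception {
        failFast(ctx.channel());
        super.close(ctx, promise);
    }
}
```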
@zhouxinyu @lizhanhui
It looks like this PR has not been updated. Did you forget to push your changes?
It has been refactored to use the close-event handler mechanism per your advice; please review the updated PR.
@lizhanhui @vsair @zhouxinyu @shroman any thoughts on this updated solution?
+1 |
LGTM @zhouxinyu |
@zhouxinyu @vongosling @shroman what's your advice? Can this PR be merged?
This PR had not been synced with the source for a long time, so I just updated it; please review. I think this is a good improvement for HA.
vongosling left a comment
LGTM
JIRA: https://issues.apache.org/jira/browse/ROCKETMQ-184?jql=project%20%3D%20ROCKETMQ
Problem: no listener is triggered when a channel is closed.
When an async command has been sent to the server and the server crashes before sending a response to the client, the callback cannot be invoked in time. Instead, it can only be triggered by the timeout scan service.
This is most visible when pulling messages, since the pull timeout is 30 seconds by default. So if the master crashes before returning a response to the client, the client cannot re-pull until the scan service tells it, which takes up to 30 seconds. The re-pull then adds a 3-second delay, so HA failover to read from the slave can take 3-33 seconds when this problem occurs.
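To ground the 30-second figure, a simplified sketch of what the periodic scan does (modeled on `scanResponseTable` in RocketMQ's NettyRemotingAbstract, with details abbreviated): a future is only completed once its own deadline has passed, so a request on a dead channel with time left on the clock produces no callback until then.

```java
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;

public void scanResponseTable() {
    final List<ResponseFuture> timedOut = new LinkedList<>();
    Iterator<Map.Entry<Integer, ResponseFuture>> it = this.responseTable.entrySet().iterator();
    while (it.hasNext()) {
        ResponseFuture rf = it.next().getValue();
        // Only futures whose deadline has elapsed are removed; a pull
        // request to a crashed master simply waits here until its
        // timeout expires.
        if (rf.getBeginTimestamp() + rf.getTimeoutMillis() + 1000 <= System.currentTimeMillis()) {
            rf.release();
            it.remove();
            timedOut.add(rf);
        }
    }
    for (ResponseFuture rf : timedOut) {
        rf.executeInvokeCallback(); // the pull callback finally fires here
    }
}
```

With a fail-fast close hook as discussed above, the callback runs as soon as the connection drops, so the consumer can fall back to the slave after only the re-pull delay rather than the full timeout.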