Skip to content

Conversation

@Jaskey
Copy link
Contributor

@Jaskey Jaskey commented Apr 20, 2017

JIRA:https://issues.apache.org/jira/browse/ROCKETMQ-184?jql=project%20%3D%20ROCKETMQ

Problem, no listener is triggered when Chanel is close.

When async command sent to the server, and the server is crash before sending response to client, the callback can not be invoked in time. Instead, the callback can only be triggered by the timeout scan service.

This is obvious for pulling message since the timeout is by default 30 seconds. So if master crashes before process response to the client, the client can not repull until scan service tell it, which takes at most 30 seconds. And repull will have 3 seconds delay, so the HA to read from slave has to take 3-33 seconds when this problem occurs.

@coveralls
Copy link

coveralls commented Apr 20, 2017

Coverage Status

Coverage increased (+0.04%) to 34.666% when pulling 0dc37e1 on Jaskey:ROCKETMQ-184-slave-switch into 42f78c2 on apache:develop.

@coveralls
Copy link

coveralls commented Apr 21, 2017

Coverage Status

Coverage decreased (-0.0007%) to 34.631% when pulling 65ffffd on Jaskey:ROCKETMQ-184-slave-switch into 42f78c2 on apache:develop.

final SemaphoreReleaseOnlyOnce once = new SemaphoreReleaseOnlyOnce(this.semaphoreAsync);

final ResponseFuture responseFuture = new ResponseFuture(opaque, timeoutMillis, invokeCallback, once);
final GenericFutureListener<ChannelFuture> chanelCloseListener = new ChannelFutureListener() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

chanel-->channel

@lizhanhui
Copy link
Contributor

Thanks @Jaskey, this is indeed a good place to improve.

For the implementation, I suggest an alternative generic way, instead of add a close future for each request, we add the opaque integer into a collection per channel. Remove the opaque integer on response or invalidate all of them in NettyConnectManageHandler. Suggested approach has fewer memory footprint and we may also easily cover the sync request scenario -- respond earlier before timeoutMillis amount of time elapsed in case channel experiences problems.

@Jaskey
Copy link
Contributor Author

Jaskey commented Apr 22, 2017

@lizhanhui

I have considered that, which will take more efforts to achieve the same goal, since we need to change some structure to make the connect manager to get access to the responseFuture map.

For my first implementation, I just want to issue this problem and involve you guys to discuss.

If you think that is indeed a better approach, I will submit an updated implementations for that, and then let all guys to choose.

@lizhanhui
Copy link
Contributor

You are right that more changes are required for my suggested approach. But, IMO, the suggested way is more unified in design and may also save a few memory footprint in case we have very large semaphore initial capacity.

Indeed, this is the place we need to enhance.

Let's bring more guys into discussion before you implement the suggested approach. They should easily conceive what's going on here via checking changes made in your PR. Any opinion on this issue? @zhouxinyu @shroman @vongosling

@zhouxinyu
Copy link
Member

Thanks @Jaskey , agree with @lizhanhui , use NettyConnectManageHandler to handle this close event is a better way, no need to import two mechanisms.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.1%) to 37.974% when pulling 388e88b on Jaskey:ROCKETMQ-184-slave-switch into 6a9628b on apache:develop.

2 similar comments
@coveralls
Copy link

Coverage Status

Coverage increased (+0.1%) to 37.974% when pulling 388e88b on Jaskey:ROCKETMQ-184-slave-switch into 6a9628b on apache:develop.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.1%) to 37.974% when pulling 388e88b on Jaskey:ROCKETMQ-184-slave-switch into 6a9628b on apache:develop.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.02%) to 37.869% when pulling 40d77ea on Jaskey:ROCKETMQ-184-slave-switch into 6a9628b on apache:develop.

2 similar comments
@coveralls
Copy link

Coverage Status

Coverage increased (+0.02%) to 37.869% when pulling 40d77ea on Jaskey:ROCKETMQ-184-slave-switch into 6a9628b on apache:develop.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.02%) to 37.869% when pulling 40d77ea on Jaskey:ROCKETMQ-184-slave-switch into 6a9628b on apache:develop.

@coveralls
Copy link

coveralls commented Apr 24, 2017

Coverage Status

Coverage increased (+0.004%) to 37.858% when pulling abd61c8 on Jaskey:ROCKETMQ-184-slave-switch into 6a9628b on apache:develop.

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.09%) to 37.761% when pulling abd61c8 on Jaskey:ROCKETMQ-184-slave-switch into 6a9628b on apache:develop.

2 similar comments
@coveralls
Copy link

Coverage Status

Coverage decreased (-0.09%) to 37.761% when pulling abd61c8 on Jaskey:ROCKETMQ-184-slave-switch into 6a9628b on apache:develop.

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.09%) to 37.761% when pulling abd61c8 on Jaskey:ROCKETMQ-184-slave-switch into 6a9628b on apache:develop.

@coveralls
Copy link

coveralls commented Apr 24, 2017

Coverage Status

Coverage increased (+0.04%) to 37.894% when pulling d8faa33 on Jaskey:ROCKETMQ-184-slave-switch into 6a9628b on apache:develop.

@Jaskey
Copy link
Contributor Author

Jaskey commented Apr 24, 2017

@zhouxinyu @lizhanhui
please review the updated pr

@lizhanhui
Copy link
Contributor

please review the updated pr

Looks this PR is not updated. Do you forget to push your changes?

@Jaskey
Copy link
Contributor Author

Jaskey commented May 11, 2017

@lizhanhui

It has been refactored using the close event handler mechanism according to your advice, please review the pr from close callback in NettyRemotingClient.java

@Jaskey
Copy link
Contributor Author

Jaskey commented May 18, 2017

@lizhanhui @vsair @zhouxinyu @shroman

any ideas for this updated solution?

@lizhanhui
Copy link
Contributor

+1

@dongeforever
Copy link
Member

LGTM @zhouxinyu

@Jaskey
Copy link
Contributor Author

Jaskey commented Jun 6, 2017

@zhouxinyu @vongosling @shroman what's your advice, can this pr be merged?

@coveralls
Copy link

Coverage Status

Coverage increased (+0.09%) to 39.238% when pulling 80f4e6b on Jaskey:ROCKETMQ-184-slave-switch into cba3089 on apache:develop.

@Jaskey
Copy link
Contributor Author

Jaskey commented Nov 14, 2017

This pr is not updated from the source for long, so I updated just now , please review.

I think this is a good improvement for HA.

@zhouxinyu @vongosling @shroman @dongeforever

Copy link
Member

@vongosling vongosling left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@vongosling vongosling merged commit 6ae619c into apache:develop Jul 14, 2018
@vongosling vongosling added this to the 4.3.0 milestone Jul 14, 2018
lizhanhui pushed a commit to lizhanhui/rocketmq that referenced this pull request Jun 25, 2019
renshuaibing-aaron pushed a commit to renshuaibing-aaron/rocketmq that referenced this pull request Apr 13, 2020
JiaMingLiu93 pushed a commit to JiaMingLiu93/rocketmq that referenced this pull request May 28, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants