HA housekeeping failure, master close the connection frequently #1082

Open
zhanguohuang opened this issue Mar 13, 2019 · 3 comments

Comments

Contributor

@zhanguohuang zhanguohuang commented Mar 13, 2019

The issue tracker is ONLY used for bug reports (feature requests need to follow the RIP process). Please check whether the same report already exists before you raise a new one.

Alternately (especially if your communication is not a bug report), you can send mail to our mailing lists. We welcome any friendly suggestions, bug fixes, collaboration and other improvements.

Please ensure that your bug report is clear and that it is complete. Otherwise, we may be unable to understand it or to reproduce it, either of which would prevent us from fixing the bug. We strongly recommend that the report (bug report or feature request) include the following information:

BUG REPORT

  1. Please describe the issue you observed:
  • What did you do (The steps to reproduce)?
    Start a master and a slave with the default configuration, then do not send any messages.

  • What did you expect to see?
    Whether or not any messages are sent, the master should not disconnect from the slave unless an exception occurs.

  • What did you see instead?
    The following message appears in the master's store log every 20 seconds:
    ha housekeeping, found this connection[ip:port] expired, 200xx
    At the same time, the following message appears in the slave's store log:
    HAClient, processReadEvent read socket < 0

  2. Please tell us about your environment:
    rocketmq version: 4.3.2
    jdk version: 1.8.0_101

  3. Other information (e.g. detailed explanation, logs, related issues, suggestions how to fix, etc):
    Right after the connection is re-established, WriteSocketService.run() blocks while waiting for the slave to report its offset (5 seconds by default). If a message is sent during this window and the master's broker role is SYNC_MASTER, CommitLog.handleHA(args) takes longer to return, which causes the producer's send to time out.

FEATURE REQUEST

  1. Please describe the feature you are requesting.

  2. Provide any additional detail on your proposed use case for this feature.

  3. Indicate the importance of this issue to you (blocker, must-have, should-have, nice-to-have). Are you currently using any workarounds to address this issue?

  4. If there are some sub-tasks using -[] for each subtask and create a corresponding issue to map to the sub task:

Contributor Author

@zhanguohuang zhanguohuang commented Mar 13, 2019

I have solved this problem. #1083

@chengqipeng chengqipeng commented Mar 31, 2019

Hello zhanguohuang, has the problem above been resolved? We are running RocketMQ with two masters and two slaves, and we frequently see org.apache.rocketmq.remoting.exception.RemotingTimeoutException: wait response on the channel <10.111.xxx.xxx:10911> timeout, 3000(ms)

Contributor Author

@zhanguohuang zhanguohuang commented Mar 31, 2019

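If you have confirmed that your problem is exactly the same as the one I described, you can use either of the following two approaches:
1. Lower the haSendHeartbeatInterval parameter on the slave from 5000 to 4000. The master does not need to be changed. Use mqadmin updateBrokerConfig to apply it; no restart is required (this is the recommended approach).
2. See my PR #1083. It requires modifying the source code, then building and deploying it to the slave machines, and a restart is needed. @chengqipeng

Addendum: another issue I filed, #1108, can also cause the problem you describe; its fix is #1109.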
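
For readers wondering why lowering the slave's interval below the master's would help, here is a toy timing model in Java. It is only a sketch built on an assumption that is not stated in this thread: that the slave resets its heartbeat timer whenever it reads anything from the master, including the master's own idle heartbeat. The class name, method name and the one-millisecond stepping are hypothetical and are not RocketMQ code; the 5000 ms and 20000 ms values mirror the defaults mentioned above.

```java
// Toy model only; NOT RocketMQ source code. All names here are hypothetical.
final class HaHeartbeatSketch {

    /**
     * Steps a simulated idle master/slave pair forward one millisecond at a time.
     * Returns the time (ms) at which the master's housekeeping would expire the
     * connection, or -1 if the connection survives for the whole duration.
     */
    static long simulate(long slaveHeartbeatMs, long masterHeartbeatMs,
                         long housekeepingMs, long durationMs) {
        long masterLastSent = 0; // last time the master wrote to the slave
        long masterLastRead = 0; // last time the master heard from the slave
        long slaveTimer = 0;     // slave's heartbeat timer

        for (long now = 1; now <= durationMs; now++) {
            // Master: with no messages to transfer, send an idle heartbeat.
            if (now - masterLastSent >= masterHeartbeatMs) {
                masterLastSent = now;
                // ASSUMPTION: reading the master's heartbeat resets the slave's timer.
                slaveTimer = now;
            }
            // Slave: report its offset only once its own timer has elapsed.
            if (now - slaveTimer >= slaveHeartbeatMs) {
                slaveTimer = now;
                masterLastRead = now;
            }
            // Master housekeeping: expire the connection if the slave stayed silent.
            if (now - masterLastRead > housekeepingMs) {
                return now;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println("slave 5000 ms: expired at " + simulate(5000, 5000, 20000, 60000));
        System.out.println("slave 4000 ms: expired at " + simulate(4000, 5000, 20000, 60000));
    }
}
```

In this model, equal intervals let the master's heartbeat keep resetting the slave's timer, so the slave never reports and the connection expires just past 20 seconds, which is consistent with the "200xx" value in the master's log. With the slave at 4000 ms, its timer elapses before the next master heartbeat arrives, so it keeps reporting and the connection survives the simulated minute.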

vongosling added a commit that referenced this issue Jul 27, 2019
supercym pushed a commit to supercym/rocketmq that referenced this issue Aug 22, 2019
duhenglucky added a commit that referenced this issue Aug 22, 2019
* Remove the useless files

* Replace PermSize with MetaspaceSize, details see http://openjdk.java.net/jeps/122

* Update DLedgerCommitLog.java (#1145)

Delete useless code

* Remove the duplicate content

* Polish the comment (#1107)

* Minor Typo fix  (#860)

* [ISSUE #1082] Fix disconnection of HA (#1083)

* fixed the text description in chinese doc (#1339)

* fix /dev/shm not found on some OSs (#1345)

* Refactor the protection logic when pulling

* change the MQVersion variable to rocketmq 4.5.2 version;

* Minor polish

* Fix the wrong package name

* [maven-release-plugin] prepare release rocketmq-all-4.5.2

* [maven-release-plugin] prepare for next development iteration

* [RIP-15]Add Ipv6 support for RocketMQ