HA sync timeout is wrong when message length is larger than 160K #1535

Closed
wqliang opened this issue Oct 16, 2019 · 1 comment
Labels
bug
Milestone

Contributor

@wqliang wqliang commented Oct 16, 2019

BUG REPORT

  1. Please describe the issue you observed:
  • What did you do (The steps to reproduce)?
    I deployed a cluster with a SYNC_MASTER and a slave.
    Then I sent a message larger than 160K.

  • What did you see instead?
    The master returned "FLUSH_SLAVE_TIMEOUT" to the producer and logged "transfer messsage to slave timeout".
    In fact, the message had been transferred to the slave successfully.

  2. Please tell us about your environment:
    version: rocketmq-4.5.2

  3. Other information (e.g. detailed explanation, logs, related issues, suggestions how to fix, etc):
    The problem appears to be in HAService.GroupTransferService.doWaitTransfer, in the following loop:
    // Bounded by 5 wakeups, not by 5 seconds: each slave ack can end a wait early.
    for (int i = 0; !transferOK && i < 5; i++) {
        this.notifyTransferObject.waitForRunning(1000);
        transferOK = HAService.this.push2SlaveMaxOffset.get() >= req.getNextOffset();
    }

haTransferBatchSize defaults to 32K, so a message larger than 32K is transferred in chunks of at most 32K each, over several transfers. A message larger than 5*32K (160K) needs at least 6 transfers. Each time the slave receives data and acks to the master, notifyTransferObject is notified. On the 5th notification the loop exits, and transferOK is still false because the 6th transfer has not finished.
In addition, if a message needs more than 10 transfers, the result for the next message right behind it will also be wrong, even if that message is smaller than 32K.

Sending a batch message triggers this bug much more easily, because a single message is compressed when it is larger than 4K.
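
To make the arithmetic above concrete, here is a minimal, self-contained sketch. It is not RocketMQ code: HaWaitSketch, SimulatedSlave, waitByCount and waitByDeadline are hypothetical names, and the slave is simulated so that every wait ends with exactly one acked 32K chunk. It compares a loop bounded by the number of wakeups (the shape of the loop quoted above) with one bounded by a deadline, which is one possible way to avoid the early exit and is not necessarily the fix that was committed.

```java
public class HaWaitSketch {

    // Hypothetical stand-in for the slave: each wait ends after msPerAck
    // simulated milliseconds with one more acked chunk of batchSize bytes.
    static final class SimulatedSlave {
        private final long batchSize;
        private final long msPerAck;
        private long ackedOffset = 0;
        private long nowMs = 0;

        SimulatedSlave(long batchSize, long msPerAck) {
            this.batchSize = batchSize;
            this.msPerAck = msPerAck;
        }

        // Models a timed wait that returns early because an ack arrived.
        void waitForNextAck() {
            nowMs += msPerAck;
            ackedOffset += batchSize;
        }

        long acked() { return ackedOffset; }

        long now() { return nowMs; }
    }

    // Number of transfers needed for a message of messageSize bytes (ceiling division).
    static long chunksNeeded(long messageSize, long batchSize) {
        return (messageSize + batchSize - 1) / batchSize;
    }

    // Count-bounded wait: gives up after maxWakeups wakeups, however little time has passed.
    static boolean waitByCount(SimulatedSlave slave, long nextOffset, int maxWakeups) {
        boolean ok = slave.acked() >= nextOffset;
        for (int i = 0; !ok && i < maxWakeups; i++) {
            slave.waitForNextAck();
            ok = slave.acked() >= nextOffset;
        }
        return ok;
    }

    // Deadline-bounded wait: keeps waiting until the offset is reached or timeoutMs has elapsed.
    static boolean waitByDeadline(SimulatedSlave slave, long nextOffset, long timeoutMs) {
        long deadline = slave.now() + timeoutMs;
        boolean ok = slave.acked() >= nextOffset;
        while (!ok && slave.now() < deadline) {
            slave.waitForNextAck();
            ok = slave.acked() >= nextOffset;
        }
        return ok;
    }

    public static void main(String[] args) {
        long batch = 32L * 1024;   // default haTransferBatchSize, per the report
        long size = 161L * 1024;   // just over 5 * 32K
        System.out.println("chunks needed:    " + chunksNeeded(size, batch));
        System.out.println("count-bounded:    " + waitByCount(new SimulatedSlave(batch, 10), size, 5));
        System.out.println("deadline-bounded: " + waitByDeadline(new SimulatedSlave(batch, 10), size, 5000));
    }
}
```

With a 161K message and one ack every 10 simulated milliseconds, the count-bounded loop reports failure after 5 wakeups (about 50 ms), while the deadline-bounded loop sees the 6th ack and reports success well inside its 5-second budget.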

@wqliang wqliang mentioned this issue Oct 16, 2019
5 of 6 tasks complete
@duhenglucky duhenglucky added the bug label Oct 18, 2019
@duhenglucky duhenglucky added this to the 4.6.0 milestone Oct 18, 2019
duhenglucky added a commit that referenced this issue Oct 21, 2019
[ISSUE #1535] Fix ha sync transfer timeout
Contributor

@duhenglucky duhenglucky commented Oct 22, 2019

fixed
