[Bug] BROADCASTING DefaultMQPushConsumer can't fix illegal pull offset #7740

Closed
3 tasks done
redlsz opened this issue Jan 10, 2024 · 2 comments · Fixed by #7819

Comments

@redlsz
Contributor

redlsz commented Jan 10, 2024

Before Creating the Bug Report

  • I found a bug, not just asking a question, which should be created in GitHub Discussions.

  • I have searched the GitHub Issues and GitHub Discussions of this repository and believe that this is not a duplicate.

  • I have confirmed that this bug belongs to the current repository, not other repositories of RocketMQ.

Runtime platform environment

centos7

RocketMQ version

4.9.x | 5.1.x

JDK Version

JDK 1.8

Describe the Bug

image

We encountered an issue where a DefaultMQPushConsumer in BROADCASTING mode was unable to consume messages. As shown in the log, the client received an OFFSET_ILLEGAL response from the broker and kept trying to correct nextOffset but never succeeded, so no messages were consumed.

Steps to Reproduce

  1. Create a topic and produce some messages.
  2. Wait for the messages to expire (queue minOffset > 0).
  3. Create a subscription group and start a DefaultMQPushConsumer (MessageModel=BROADCASTING, ConsumeFromWhere=CONSUME_FROM_FIRST_OFFSET).
  4. Produce more messages.

What Did You Expect to See?

Normal message consumption.

What Did You See Instead?

The consumer is unable to consume any messages.

Additional Context

image image image

Possible cause, located after reviewing the code:

When the consumer handles an OFFSET_ILLEGAL error, it first stores the correct nextOffset returned by the broker as the consumerOffset (in memory only), and then removes the PQ (ProcessQueue) corresponding to the MQ (MessageQueue). The next time a rebalance is triggered, the consumer is expected to use the new consumerOffset as nextOffset, reinitialize the PQ, and start pulling messages. However, at that point the consumerOffset is read from the store (ReadOffsetType=READ_FROM_STORE), where the corrected value was never written, so the read always returns -1. As a result, the client falls into an infinite loop of the process above.
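
The loop described above can be illustrated with a small stand-alone model. This is only a sketch under stated assumptions, not RocketMQ's actual code: `ToyOffsetStore`, `roundsUntilPullSucceeds`, the queue name, and the offsets are hypothetical names and values invented for illustration. The model assumes the corrected offset reaches only the in-memory table, and that a stored value of -1 maps to offset 0 under CONSUME_FROM_FIRST_OFFSET. The READ_FROM_MEMORY run is included purely as a contrast and is not meant to represent the eventual fix.

```java
import java.util.HashMap;
import java.util.Map;

public class IllegalOffsetLoopSketch {
    // Mirrors the two read modes named in the issue; the enum itself is a toy.
    enum ReadOffsetType { READ_FROM_MEMORY, READ_FROM_STORE }

    /** Toy offset store: an in-memory table plus a "persisted" table standing in for the local file. */
    static class ToyOffsetStore {
        final Map<String, Long> memory = new HashMap<>();
        final Map<String, Long> persisted = new HashMap<>();

        // Assumption: handling OFFSET_ILLEGAL only updates the in-memory table.
        void updateOffset(String mq, long offset) {
            memory.put(mq, offset);
        }

        // Returns -1 when no offset is recorded in the chosen table.
        long readOffset(String mq, ReadOffsetType type) {
            Map<String, Long> source = (type == ReadOffsetType.READ_FROM_MEMORY) ? memory : persisted;
            return source.getOrDefault(mq, -1L);
        }
    }

    /**
     * Simulates rebalance rounds. Returns the round in which a pull is first
     * accepted by the (toy) broker, or -1 if every round ends in OFFSET_ILLEGAL.
     */
    static int roundsUntilPullSucceeds(ReadOffsetType readType, long brokerMinOffset, int maxRounds) {
        ToyOffsetStore store = new ToyOffsetStore();
        String mq = "TopicTest-queue0"; // hypothetical queue name
        for (int round = 1; round <= maxRounds; round++) {
            // Rebalance: compute where to start pulling from.
            long stored = store.readOffset(mq, readType);
            long nextOffset = (stored >= 0) ? stored : 0L; // CONSUME_FROM_FIRST_OFFSET
            if (nextOffset >= brokerMinOffset) {
                return round; // broker accepts the pull
            }
            // Broker answers OFFSET_ILLEGAL and suggests its minOffset.
            // The client records it in memory and drops the process queue,
            // so the next iteration plays the role of the next rebalance.
            store.updateOffset(mq, brokerMinOffset);
        }
        return -1;
    }

    public static void main(String[] args) {
        // Reading from the store never sees the corrected offset: prints -1.
        System.out.println(roundsUntilPullSucceeds(ReadOffsetType.READ_FROM_STORE, 100L, 5));
        // Reading from memory picks it up on the second round: prints 2.
        System.out.println(roundsUntilPullSucceeds(ReadOffsetType.READ_FROM_MEMORY, 100L, 5));
    }
}
```

In this model the READ_FROM_STORE run never leaves the loop no matter how many rounds are allowed, which matches the symptom reported: the client keeps correcting nextOffset without ever consuming a message.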

@humkum
Contributor

humkum commented Jan 11, 2024

I have run into this problem before as well. Feel free to submit a patch to fix it.

@redlsz
Contributor Author

redlsz commented Jan 12, 2024

> I have run into this problem before as well. Feel free to submit a patch to fix it.

I have submitted a patch. If you have time, please help review it.

RongtongJin pushed a commit that referenced this issue Jan 29, 2024
* Fix LocalFileOffsetStore persistAll and persist

* Fix LocalFileOffsetStore removeOffset

* Add test case

2 participants