[Bug] Get message from tiered storage return incorrect next begin offset #7363

Closed
3 tasks done
lizhimins opened this issue Sep 16, 2023 · 0 comments · Fixed by #7365

lizhimins commented Sep 16, 2023

Before Creating the Bug Report

  • I found a bug, not just asking a question, which should be created in GitHub Discussions.

  • I have searched the GitHub Issues and GitHub Discussions of this repository and believe that this is not a duplicate.

  • I have confirmed that this bug belongs to the current repository, not other repositories of RocketMQ.

Runtime platform environment

Linux 4.19

RocketMQ version

RocketMQ develop branch, 5.1.3

JDK Version

JDK11

Describe the Bug

When pull and pop threads read data from tiered storage, they call org.apache.rocketmq.tieredstore.TieredMessageStore#getMessageAsync. Because tiered storage buffers data in batches before uploading it, the current implementation returns an incorrect result: the next begin offset of the get message result cycles between the local cq max offset and the tiered storage cq commit offset, which causes a large number of duplicate messages.

For example, if the local cq holds offsets 100-200, tiered storage may hold only offsets 50-190 at that moment, with messages 190-200 still waiting to be uploaded. The data in 190-200 cannot be read from tiered storage yet, so the max offset of the pull result should be 190, not 200.
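
The offset arithmetic in this example can be sketched as follows. This is only an illustration of the expected behavior, assuming a read from tiered storage is bounded by the tiered storage commit offset. The class and method names are hypothetical, are not RocketMQ APIs, and do not reproduce the actual fix.

```java
// Hypothetical sketch: not RocketMQ code and not the submitted fix.
public final class NextBeginOffsetSketch {

    /**
     * Max offset to report for a read served from tiered storage.
     * Assumption: messages at or beyond the tiered commit offset are still
     * buffered for upload, so the local cq max offset must not be reported.
     */
    public static long maxOffsetForTieredRead(long localCqMaxOffset, long tieredCommitOffset) {
        return Math.min(localCqMaxOffset, tieredCommitOffset);
    }

    /**
     * Next begin offset after reading up to batchSize messages from tiered storage.
     * The result never goes past the tiered commit offset and never moves backwards.
     */
    public static long nextBeginOffset(long requestOffset, int batchSize, long tieredCommitOffset) {
        long readEnd = Math.min(requestOffset + batchSize, tieredCommitOffset);
        return Math.max(requestOffset, readEnd);
    }

    public static void main(String[] args) {
        long localCqMaxOffset = 200L;   // local cq holds 100-200
        long tieredCommitOffset = 190L; // tiered storage holds 50-190

        // Reported max offset is 190, not 200.
        System.out.println(maxOffsetForTieredRead(localCqMaxOffset, tieredCommitOffset));
        // A read starting at 180 stops at the commit offset.
        System.out.println(nextBeginOffset(180L, 32, tieredCommitOffset));
        // A read starting at 190 makes no progress until more data is uploaded.
        System.out.println(nextBeginOffset(190L, 32, tieredCommitOffset));
    }
}
```

With the offsets from the example, all three calls in `main` yield 190, so a consumer reading from tiered storage is never handed the local cq max offset of 200 for data that has not been uploaded yet.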

I will submit a pull request to fix this issue.


Steps to Reproduce

Set the deep storage level to force so that data is always read from tiered storage. Pop consumption then produces a large number of duplicate messages.

What Did You Expect to See?

No duplicate messages.

What Did You See Instead?

Additional Context

No response

@lizhimins lizhimins changed the title [Bug] [Bug] Fix get message from tiered storage return incorrect next begin offset Sep 16, 2023
@lizhimins lizhimins changed the title [Bug] Fix get message from tiered storage return incorrect next begin offset [Bug] Get message from tiered storage return incorrect next begin offset Sep 16, 2023
@lizhimins lizhimins changed the title [Bug] Get message from tiered storage return incorrect next begin offset [Bug] Get message from tiered storage return incorrect next pull offset Sep 16, 2023
@lizhimins lizhimins changed the title [Bug] Get message from tiered storage return incorrect next pull offset [Bug] Get message from tiered storage return incorrect next begin offset Sep 16, 2023
lizhimins added a commit to lizhimins/rocketmq that referenced this issue Sep 16, 2023