[Bug] Get message from tiered storage returns incorrect next begin offset #7363
lizhimins added a commit to lizhimins/rocketmq that referenced this issue on Sep 16, 2023: …ect next pull offset
Before Creating the Bug Report
I found a bug; this is not just a question (questions should be asked in GitHub Discussions).
I have searched the GitHub Issues and GitHub Discussions of this repository and believe that this is not a duplicate.
I have confirmed that this bug belongs to the current repository, not other repositories of RocketMQ.
Runtime platform environment
Linux 4.19
RocketMQ version
RocketMQ develop branch, 5.1.3
JDK Version
JDK11
Describe the Bug
When the pull and pop threads read data from tiered storage, they call org.apache.rocketmq.tieredstore.TieredMessageStore#getMessageAsync. Because tiered storage buffers messages and uploads them in batches, the current implementation returns incorrect results: the next begin offset of the get message result alternates between the local consume queue (CQ) max offset and the tiered storage CQ commit offset, which causes a large number of duplicate messages.
For example, if the local CQ holds offsets 100-200, tiered storage may at that moment hold offsets 50-190, with messages 190-200 still waiting to be uploaded. Those messages cannot be read from tiered storage yet, so the max offset of the pull result should be 190, not 200.
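A minimal Java sketch of the offset bound described above, using the numbers from the example. It is not the fix from the pull request and not RocketMQ's actual code: the class TieredReadPlanSketch, the plan method, and the ReadPlan type are hypothetical. It assumes offsets are half-open ranges (the tiered commit offset is the first offset not yet uploaded, so "50-190" means 50 through 189 are readable), and it simplifies the too-small-offset case by starting at the tiered min offset.

```java
/**
 * Hypothetical sketch (not RocketMQ's actual code): plan a read from tiered
 * storage so that the reported offsets never go past what has been uploaded.
 */
public final class TieredReadPlanSketch {

    /** Offsets describing one pull served from tiered storage. */
    public static final class ReadPlan {
        public final long fromOffset;      // first offset read (inclusive)
        public final long nextBeginOffset; // offset the consumer should pull next
        public final long maxOffset;       // max offset reported to the consumer (exclusive)

        ReadPlan(long fromOffset, long nextBeginOffset, long maxOffset) {
            this.fromOffset = fromOffset;
            this.nextBeginOffset = nextBeginOffset;
            this.maxOffset = maxOffset;
        }

        public long messageCount() {
            return nextBeginOffset - fromOffset;
        }
    }

    /**
     * @param requestOffset      offset requested by the pull/pop thread
     * @param maxMsgNums         maximum number of messages per pull
     * @param tieredMinOffset    first offset available in tiered storage
     * @param tieredCommitOffset first offset not yet uploaded (exclusive end)
     */
    public static ReadPlan plan(long requestOffset, int maxMsgNums,
                                long tieredMinOffset, long tieredCommitOffset) {
        // Report the tiered commit offset, not the local CQ max offset:
        // anything at or beyond it is still buffered for upload.
        long maxOffset = tieredCommitOffset;
        if (requestOffset >= maxOffset) {
            // Nothing readable yet: keep the consumer where it is instead of
            // moving it backward, which would re-deliver messages.
            return new ReadPlan(requestOffset, requestOffset, maxOffset);
        }
        // Simplification (assumption): a too-small offset starts at the tiered min offset.
        long from = Math.max(requestOffset, tieredMinOffset);
        long next = Math.min(from + maxMsgNums, maxOffset);
        return new ReadPlan(from, next, maxOffset);
    }

    public static void main(String[] args) {
        // Example from the issue: local CQ holds 100-200, tiered storage holds 50-190.
        ReadPlan p = plan(170, 32, 50, 190);
        System.out.println(p.fromOffset + " " + p.nextBeginOffset + " " + p.maxOffset); // prints "170 190 190"
    }
}
```

With a request at offset 170 and up to 32 messages, the sketch reads 170-189 and reports both the next begin offset and the max offset as 190. The local CQ max offset of 200 never leaks into the result, so the consumer does not alternate between the two values.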
I will submit a pull request to fix this issue.
Steps to Reproduce
Set the deep storage level to force, so that data is always read from tiered storage. Pop consumption then produces a large number of duplicate messages.
What Did You Expect to See?
No duplicate messages.
What Did You See Instead?
Additional Context
No response