[Bug] Lock granularity issue causing LMQ message loss #7511

Closed
3 tasks done
lzkisok opened this issue Oct 27, 2023 · 2 comments · Fixed by #7525
lzkisok commented Oct 27, 2023

Before Creating the Bug Report

  • I found a bug, not just asking a question, which should be created in GitHub Discussions.

  • I have searched the GitHub Issues and GitHub Discussions of this repository and believe that this is not a duplicate.

  • I have confirmed that this bug belongs to the current repository, not other repositories of RocketMQ.

Runtime platform environment

CentOS 7

RocketMQ version

RMQ 5.1.4

JDK Version

JDk 1.8

Describe the Bug

1. Two MQ messages are sent concurrently with the same INNER_MULTI_DISPATCH value, which is equivalent to a DefaultMQProducer concurrently sending two messages for the same LMQ (same LMQ, different payloads).
2. When the two messages are routed to two different MessageQueues, their queueIds differ. The topicQueueKey is topic-queueId, so topicQueueLock.lock(topicQueueKey) only locks each message's own queueId and the two messages are processed concurrently.
3. While the two messages are processed concurrently, defaultMessageStore.assignOffset(msg) reads the LMQ's queueOffset from QueueOffsetOperator at the same time. Both reads return the same queueOffset, so both LMQ messages get the same INNER_MULTI_QUEUE_OFFSET, i.e. the LMQ offset is duplicated (see the sketch after this list).
4. When two LMQ messages share the same INNER_MULTI_QUEUE_OFFSET, RocketMQ MQTT cannot pull and consume them correctly.
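To make the race concrete, here is a minimal, self-contained sketch (an assumed simplification, not the actual CommitLog/QueueOffsetOperator code): each thread holds only its own topic-queueId lock, yet both read the next offset of the same LMQ from a shared table, so both can observe the same value.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

// Self-contained illustration of the race described above (NOT the real broker code).
// Each thread locks only its own topic-queueId key, analogous to topicQueueLock, while
// both read the next offset of the same LMQ from a shared table, analogous to the
// QueueOffsetOperator lookup done by assignOffset.
public class LmqOffsetRaceDemo {

    static final Map<String, Long> lmqOffsetTable = new ConcurrentHashMap<>();   // shared LMQ offsets
    static final Map<String, Object> topicQueueLocks = new ConcurrentHashMap<>(); // per topic-queueId locks

    // Simplified stand-in for assignOffset: read the current offset, do not advance it yet.
    static long assignLmqOffset(String lmq) {
        return lmqOffsetTable.getOrDefault(lmq, 0L);
    }

    // Simplified stand-in for increaseOffset: advance the offset after the "append".
    static void increaseLmqOffset(String lmq) {
        lmqOffsetTable.merge(lmq, 1L, Long::sum);
    }

    public static void main(String[] args) throws InterruptedException {
        final String lmq = "%LMQ%demo-lmq";
        final long[] assigned = new long[2];
        final CountDownLatch bothAssigned = new CountDownLatch(2);

        Thread[] putThreads = new Thread[2];
        for (int i = 0; i < 2; i++) {
            final int queueId = i; // the two messages land on different MessageQueues
            putThreads[i] = new Thread(() -> {
                String topicQueueKey = "demoTopic-" + queueId;
                Object lock = topicQueueLocks.computeIfAbsent(topicQueueKey, k -> new Object());
                synchronized (lock) { // per-queueId lock: does NOT serialize the shared LMQ
                    assigned[queueId] = assignLmqOffset(lmq); // both threads can read 0 here
                    bothAssigned.countDown();
                    try {
                        bothAssigned.await(); // force "both assigned before either increases"
                    } catch (InterruptedException ignored) {
                    }
                    increaseLmqOffset(lmq);
                }
            });
            putThreads[i].start();
        }
        for (Thread t : putThreads) {
            t.join();
        }
        // Prints "0 and 0": the two messages got the same INNER_MULTI_QUEUE_OFFSET.
        System.out.println("assigned LMQ offsets: " + assigned[0] + " and " + assigned[1]);
    }
}
```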

Steps to Reproduce

1. Use DefaultMQProducer to concurrently send two messages for the same LMQ (same LMQ, different payloads), for example (a fuller sketch of this step follows the list):
   String secondTopic = "/r7";
   setLmq(msg, new HashSet<>(Arrays.asList(TopicUtils.wrapLmq(firstTopic, secondTopic))));
2. Check the RocketMQ broker logs and confirm that the two messages land on different MessageQueues.
3. Consume with an MQTT client: only one of the two messages is consumed.
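A producer-side sketch of step 1, under stated assumptions: the name server address and topic names are placeholders, and setLmq/wrapLmq are stand-ins for the reporter's helper and rocketmq-mqtt's TopicUtils.wrapLmq; the broker is assumed to have multi-dispatch (LMQ) enabled.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import org.apache.rocketmq.client.producer.DefaultMQProducer;
import org.apache.rocketmq.common.message.Message;
import org.apache.rocketmq.common.message.MessageAccessor;
import org.apache.rocketmq.common.message.MessageConst;

// Reproduction sketch: concurrently send two messages that carry the same LMQ.
// The name server address, topic names, and the setLmq/wrapLmq helpers below are
// placeholders standing in for the reporter's code and rocketmq-mqtt's TopicUtils.
public class LmqDuplicateOffsetRepro {

    public static void main(String[] args) throws Exception {
        DefaultMQProducer producer = new DefaultMQProducer("lmq_repro_group");
        producer.setNamesrvAddr("127.0.0.1:9876"); // placeholder name server address
        producer.start();

        String firstTopic = "lmqReproTopic"; // placeholder first-level topic
        String secondTopic = "/r7";

        Runnable sendOne = () -> {
            try {
                Message msg = new Message(firstTopic,
                    ("payload-" + Thread.currentThread().getName()).getBytes());
                // Both messages point at the same LMQ, so their INNER_MULTI_QUEUE_OFFSET
                // values should differ; with the bug they can collide.
                setLmq(msg, new HashSet<>(Arrays.asList(wrapLmq(firstTopic, secondTopic))));
                producer.send(msg);
            } catch (Exception e) {
                e.printStackTrace();
            }
        };

        Thread t1 = new Thread(sendOne);
        Thread t2 = new Thread(sendOne);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        producer.shutdown();
    }

    // Assumed helper mirroring the reporter's setLmq: store the LMQ list in the
    // INNER_MULTI_DISPATCH property (comma-separated, per the issue description).
    static void setLmq(Message msg, Set<String> lmqSet) {
        MessageAccessor.putProperty(msg, MessageConst.PROPERTY_INNER_MULTI_DISPATCH,
            String.join(",", lmqSet));
    }

    // Simplified stand-in for rocketmq-mqtt's TopicUtils.wrapLmq: build an LMQ queue
    // name from the first-level topic and the MQTT sub-topic.
    static String wrapLmq(String firstTopic, String secondTopic) {
        return "%LMQ%" + firstTopic + secondTopic;
    }
}
```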

What Did You Expect to See?

When messages are sent concurrently and land on different MessageQueues, the INNER_MULTI_QUEUE_OFFSET of each LMQ message should still be assigned correctly (i.e. uniquely).

What Did You See Instead?

The two LMQ messages have the same INNER_MULTI_QUEUE_OFFSET; the LMQ offset is duplicated.

Additional Context

[Screenshot] As shown above, the two messages have the same INNER_MULTI_QUEUE_OFFSET.
[Screenshot] As shown above, when DefaultMQProducer concurrently sends two messages for the same LMQ (same LMQ, different payloads), this lock does not protect the LMQ: when the two messages land on different MessageQueues, defaultMessageStore.assignOffset(msg) runs concurrently and assigns the same INNER_MULTI_QUEUE_OFFSET to both LMQ messages.

RongtongJin changed the title from "[Bug] The lock granularity of CommitLog topicQueueLock.lock(topicQueueKey) causes defaultMessageStore.assignOffset(msg) to assign duplicate INNER_MULTI_QUEUE_OFFSET values to LMQ messages, so LMQ messages are lost when consumed" to "[Bug] Lock granularity issue causing LMQ message loss" Oct 30, 2023
RongtongJin (Contributor) commented:

There is indeed such a bug, and I also found the same issue when writing this PR.

lzkisok (Author) commented Oct 30, 2023

We read the code and found a temporary workaround: always send messages for the same LMQ to the same MessageQueue, so that the topic + queueId lock takes effect, as shown below:
[Screenshot]
In practice the LMQ can be sharded by queueId so that the lock takes effect; a sketch of this follows.
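A sketch of that workaround using the standard MessageQueueSelector overload of DefaultMQProducer.send: hash the LMQ name to a fixed queue index so every message of one LMQ lands on the same queueId (class and method names here are illustrative).

```java
import java.util.List;
import org.apache.rocketmq.client.producer.DefaultMQProducer;
import org.apache.rocketmq.client.producer.MessageQueueSelector;
import org.apache.rocketmq.client.producer.SendResult;
import org.apache.rocketmq.common.message.Message;
import org.apache.rocketmq.common.message.MessageQueue;

// Workaround sketch: route every message of a given LMQ to one fixed MessageQueue by
// hashing the LMQ name, so the existing topic-queueId lock serializes offset assignment.
public class LmqStickyQueueSender {

    public static SendResult sendToLmqQueue(DefaultMQProducer producer, Message msg, String lmqName)
            throws Exception {
        MessageQueueSelector selectByLmq = new MessageQueueSelector() {
            @Override
            public MessageQueue select(List<MessageQueue> mqs, Message message, Object arg) {
                String lmq = (String) arg;
                // Shard by LMQ name: the same LMQ always maps to the same queue index.
                int index = Math.floorMod(lmq.hashCode(), mqs.size());
                return mqs.get(index);
            }
        };
        return producer.send(msg, selectByLmq, lmqName);
    }
}
```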

RongtongJin pushed a commit that referenced this issue Nov 8, 2023
* bug fix: assignOffset and increaseOffset in LMQ has concurrency issues in topicQueueLock, should be in putMessageLock

* fix MultiDispatchTest

* fix MultiDispatchTest

* fix unit test
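For reference, a simplified sketch of the direction the commit message above describes, moving LMQ offset assignment under the single put-message lock; names and structure are stand-ins, not the actual CommitLog changes in #7525.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Simplified stand-in (NOT the real CommitLog): LMQ offset assignment and increase are
// done while holding the single put-message lock, so two concurrent puts for the same
// LMQ can never observe the same offset, whichever MessageQueue each message targets.
public class LmqOffsetUnderPutMessageLock {

    private final ReentrantLock putMessageLock = new ReentrantLock(); // global append lock
    private final Map<String, Long> lmqOffsetTable = new ConcurrentHashMap<>();

    /** Assign and advance the LMQ offset atomically with respect to all other puts. */
    public long putLmqMessage(String lmqTopic) {
        putMessageLock.lock();
        try {
            long offset = lmqOffsetTable.getOrDefault(lmqTopic, 0L); // "assignOffset" step
            // ... append the message to the commit log here ...
            lmqOffsetTable.put(lmqTopic, offset + 1);                // "increaseOffset" step
            return offset; // unique per LMQ message
        } finally {
            putMessageLock.unlock();
        }
    }
}
```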