Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rocketmq need enhance stability when commitlog broken #1568

Closed
guyuezheng opened this issue Nov 4, 2019 · 7 comments
Labels
Milestone

Comments

@guyuezheng
Copy link
Contributor

@guyuezheng guyuezheng commented Nov 4, 2019

我们生产发现rocketMq 事务消息不停回调,导致事务消息不可用,丢失消息,目前版本是4.3.1,但是我发现后面版本没有修复

主要原因已经发现如下
事务消息回调,是先批量处理触发回调check,在更新RMQ_SYS_TRANS_HALF_TOPIC上的offset,一旦在批量处理中有异常,就会跳过 更新RMQ_SYS_TRANS_HALF_TOPIC上的offset,导致每次处理都报相同的错和回调同一批事务消息,不管他有没有在RMQ_SYS_TRANS_OP_HALF_TOPIC,

broker 回调check报错日志如下(一分钟一次):
java.lang.NullPointerException
at org.apache.rocketmq.broker.transaction.queue.TransactionalMessageBridge.getMessage(TransactionalMessageBridge.java:136)
at org.apache.rocketmq.broker.transaction.queue.TransactionalMessageBridge.getHalfMessage(TransactionalMessageBridge.java:108)
at org.apache.rocketmq.broker.transaction.queue.TransactionalMessageServiceImpl.pullHalfMsg(TransactionalMessageServiceImpl.java:379)
at org.apache.rocketmq.broker.transaction.queue.TransactionalMessageServiceImpl.getHalfMsg(TransactionalMessageServiceImpl.java:444)
at org.apache.rocketmq.broker.transaction.queue.TransactionalMessageServiceImpl.check(TransactionalMessageServiceImpl.java:164)
at org.apache.rocketmq.broker.transaction.TransactionalMessageCheckService.onWaitEnd(TransactionalMessageCheckService.java:76)
at org.apache.rocketmq.common.ServiceThread.waitForRunning(ServiceThread.java:121)
at org.apache.rocketmq.broker.transaction.TransactionalMessageCheckService.run(TransactionalMessageCheckService.java:65)
at java.lang.Thread.run(Thread.java:745)、
报错细节如下;
1)我们的底层消息字节有损坏,导致org.apache.rocketmq.common.message.MessageDecoder#decode(java.nio.ByteBuffer, boolean, boolean, boolean)方法转化抛异常,decodeMsgList直接吃了异常,返回null
2)org.apache.rocketmq.broker.transaction.queue.TransactionalMessageBridge#decodeMsgList,直接把null加入到返回的 List foundList
3)导致在org.apache.rocketmq.broker.transaction.queue.TransactionalMessageBridge#getMessage,的136行
获取消息存储时间报了空指针,
4)org.apache.rocketmq.broker.transaction.queue.TransactionalMessageServiceImpl#check方法在获取待确定回调时,报空指针,没有更新 transactionalMessageBridge.updateConsumeOffset(messageQueue, newOffset);
5)每次重复相同过程,导致不停重复回调check,事务消息不可用

@duhenglucky duhenglucky changed the title rocketmq 事务消息,底层消息字节损坏导致不停重复回调check(以重复1天),丢失消息 rocketmq need enhance stability when commitlog broken Nov 4, 2019
@duhenglucky duhenglucky added the question label Nov 4, 2019
guyuezheng added a commit to guyuezheng/rocketmq that referenced this issue Nov 4, 2019
 增加非空判断,防止空指针异常,导致循环事务 check
@guyuezheng

This comment has been minimized.

Copy link
Contributor Author

@guyuezheng guyuezheng commented Nov 4, 2019

各位大腿看看,这样修改,影不影响外部 offset,主要是跳过这个 异常的消息,让其不影响正常的事务回调check ,感谢🙏

guyuezheng added a commit to guyuezheng/rocketmq that referenced this issue Nov 4, 2019
 增加非空判断,防止空指针异常,导致循环事务 check
@duhenglucky

This comment has been minimized.

Copy link
Contributor

@duhenglucky duhenglucky commented Nov 4, 2019

@guyuezheng will review this pr ASAP.

@guyuezheng

This comment has been minimized.

Copy link
Contributor Author

@guyuezheng guyuezheng commented Nov 4, 2019

@guyuezheng will review this pr ASAP.
大神,我就加一个if 非空判断
#1569
可以这样改不,我们生产有出现了

@guyuezheng

This comment has been minimized.

Copy link
Contributor Author

@guyuezheng guyuezheng commented Nov 4, 2019

第一次pull request,有一点激动呀

@guyuezheng

This comment has been minimized.

Copy link
Contributor Author

@guyuezheng guyuezheng commented Nov 4, 2019

@guyuezheng will review this pr ASAP.

有没有什么零时解决方案,我们生产还是在报

@YuShanMuGong

This comment has been minimized.

Copy link

@YuShanMuGong YuShanMuGong commented Nov 4, 2019

@guyuezheng will review this pr ASAP.

有没有什么零时解决方案,我们生产还是在报

你可以按照你的修改,自己打包用起来。

@duhenglucky

This comment has been minimized.

Copy link
Contributor

@duhenglucky duhenglucky commented Nov 4, 2019

@guyuezheng commitlog损坏了的话,很难进行修复,因为你后面的消息很有可能也是坏的,这个时候只能把备机切换为主机,其他具体的信息可以加我钉钉: juicehenry

guyuezheng added a commit to guyuezheng/rocketmq that referenced this issue Nov 4, 2019
 增加非空判断,防止空指针异常,导致循环事务 check,增加test覆盖
guyuezheng added a commit to guyuezheng/rocketmq that referenced this issue Nov 4, 2019
 增加非空判断,防止空指针异常,导致循环事务 check,增加test覆盖,修复编码格式
guyuezheng added a commit to guyuezheng/rocketmq that referenced this issue Nov 4, 2019
 增加非空判断,防止空指针异常,导致循环事务 check,增加test覆盖,修复编码格式
guyuezheng added a commit to guyuezheng/rocketmq that referenced this issue Nov 4, 2019
…broken 增加非空判断,防止空指针异常,导致循环事务 check,增加test覆盖,修复编码格式"

This reverts commit a4f6210
guyuezheng added a commit to guyuezheng/rocketmq that referenced this issue Nov 4, 2019
…加非空判断,防止空指针异常,导致循环事务 check,增加test覆盖,修复编码格式
guyuezheng added a commit to guyuezheng/rocketmq that referenced this issue Nov 5, 2019
增加非空判断,防止空指针异常,导致循环事务 check,增加test覆盖,修复编码格式
原因,消息太长压缩后重复解压
@duhenglucky duhenglucky removed the question label Nov 5, 2019
@duhenglucky duhenglucky added this to the 4.6.0 milestone Nov 5, 2019
duhenglucky added a commit that referenced this issue Nov 8, 2019
[ISSUE #1568] rocketmq need enhance stability when commitlog broken
areyouok added a commit to areyouok/rocketmq that referenced this issue Nov 15, 2019
 增加非空判断,防止空指针异常,导致循环事务 check
areyouok added a commit to areyouok/rocketmq that referenced this issue Nov 15, 2019
 增加非空判断,防止空指针异常,导致循环事务 check
areyouok added a commit to areyouok/rocketmq that referenced this issue Nov 15, 2019
 增加非空判断,防止空指针异常,导致循环事务 check,增加test覆盖
areyouok added a commit to areyouok/rocketmq that referenced this issue Nov 15, 2019
 增加非空判断,防止空指针异常,导致循环事务 check,增加test覆盖,修复编码格式
areyouok added a commit to areyouok/rocketmq that referenced this issue Nov 15, 2019
 增加非空判断,防止空指针异常,导致循环事务 check,增加test覆盖,修复编码格式
areyouok added a commit to areyouok/rocketmq that referenced this issue Nov 15, 2019
…broken 增加非空判断,防止空指针异常,导致循环事务 check,增加test覆盖,修复编码格式"

This reverts commit a4f6210
areyouok added a commit to areyouok/rocketmq that referenced this issue Nov 15, 2019
…加非空判断,防止空指针异常,导致循环事务 check,增加test覆盖,修复编码格式
areyouok added a commit to areyouok/rocketmq that referenced this issue Nov 15, 2019
增加非空判断,防止空指针异常,导致循环事务 check,增加test覆盖,修复编码格式
原因,消息太长压缩后重复解压
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants
You can’t perform that action at this time.