[Bug] Fix and refactor handle commit exception in tiered storage #7280

Closed
3 tasks done
lizhimins opened this issue Aug 29, 2023 · 0 comments · Fixed by #7281

lizhimins commented Aug 29, 2023

Before Creating the Bug Report

  • I found a bug, not just asking a question, which should be created in GitHub Discussions.
  • I have searched the GitHub Issues and GitHub Discussions of this repository and believe that this is not a duplicate.
  • I have confirmed that this bug belongs to the current repository, not other repositories of RocketMQ.

Runtime platform environment

Linux 4.19

RocketMQ version

branch: develop latest

JDK Version

JDK11

Describe the Bug

In the tiered storage upload process, error handling for failed uploads has several clear problems:

  • When a cq (consume queue) upload fails, the data must be corrected against the actual position stored at that moment, that is, the current file length in storage.
  • Retrieving the position from distributed storage can itself fail, so -1 and 0 are used to tell the cases apart:
    -1 indicates that the query failed because of the network or another reason, and 0 indicates that the file does not exist or has a length of zero.
  • The control flow around the buffer list during upload is overly complicated; the new approach is simpler and more reliable.
  • Position comparison logs are added to track failed requests.
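
To make the -1 / 0 distinction concrete, here is a minimal sketch of how a failed commit could be corrected against the stored length. It is not the code from the fix: the class, the method names, and the `LongSupplier` standing in for the remote size query are all hypothetical, and the real tiered storage code may structure this differently.

```java
import java.util.function.LongSupplier;

// Hypothetical sketch: none of these names come from the RocketMQ code base.
final class CommitPositionSketch {

    // Sentinel for "the size query itself failed" (network error, timeout, ...).
    // A result of 0 is a valid answer: the file does not exist or is empty.
    static final long SIZE_UNKNOWN = -1L;

    // Runs the (assumed) remote size query and maps any failure to SIZE_UNKNOWN.
    static long querySizeOrUnknown(LongSupplier remoteSizeQuery) {
        try {
            long size = remoteSizeQuery.getAsLong();
            return size < 0 ? SIZE_UNKNOWN : size;
        } catch (RuntimeException e) {
            return SIZE_UNKNOWN;
        }
    }

    // Decides where the next upload should start after a failed commit.
    static long correctCommitPosition(long localCommitPosition, long remoteSize) {
        // Position comparison log, so a failed request can be traced later.
        System.out.printf("commit failed, local position=%d, remote size=%d%n",
            localCommitPosition, remoteSize);
        if (remoteSize == SIZE_UNKNOWN) {
            // The real length is unknown: keep the local position and retry later.
            return localCommitPosition;
        }
        // The length is known (0 included): trust storage over the local view.
        return remoteSize;
    }
}
```

With these assumptions, a local position of 100 stays at 100 when the query fails, resets to 0 when storage reports an empty or missing file, and moves to 160 when storage already holds 160 bytes.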

Steps to Reproduce

Whenever a cq upload fails, the subsequent build process runs into problems.

What Did You Expect to See?

The cq and commitlog are constructed correctly, even after a failed upload.

What Did You See Instead?

None

Additional Context

No response
