
[opt](recycler) Speed up recycling txn info by introducing parallelism #50037


Merged

Conversation

Yukang-Lian
Collaborator

@Yukang-Lian Yukang-Lian commented Apr 14, 2025

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

This PR makes the function that recycles txn labels run concurrently instead of serially.
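For readers unfamiliar with the pattern, here is a minimal sketch of concurrent recycling in general. It is not the code in this PR: `recycle_labels`, `delete_batch`, `batch_size`, and the in-memory `store` are all hypothetical names, and a plain dict stands in for the KV store.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def recycle_labels(keys, delete_batch, concurrency=4, batch_size=100):
    """Delete `keys` in fixed-size batches using `concurrency` worker threads.

    `delete_batch` is a hypothetical callable standing in for one KV-store
    transaction that removes a batch of keys and returns how many it removed.
    """
    batches = [keys[i:i + batch_size] for i in range(0, len(keys), batch_size)]
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # map() yields results in batch order and re-raises a worker's exception.
        return sum(pool.map(delete_batch, batches))


# Usage: an in-memory dict plays the role of the KV store (assumption).
store = {f"txn_label_{i}": b"" for i in range(1000)}
lock = threading.Lock()


def delete_batch(batch):
    with lock:  # protects the shared dict across worker threads
        for key in batch:
            del store[key]
    return len(batch)


removed = recycle_labels(list(store), delete_batch, concurrency=5)
```

With a real remote store the time is spent waiting on network round trips rather than under a local lock, which is where issuing batches in parallel would be expected to help.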

Performance comparison: release build, connected to FDB (FoundationDB), recycling 10,000 transaction-related key-value pairs.

| Configuration | Cost (ms) |
|---|---|
| Before this PR | 15229 |
| After this PR (1 concurrent) | 18764 |
| After this PR (2 concurrent) | 8844 |
| After this PR (5 concurrent) | 3890 |
| After this PR (10 concurrent) | 2165 |
| After this PR (20 concurrent) | 1248 |
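The relative speedups implied by these numbers can be worked out with a few lines of arithmetic. The snippet below only divides the reported baseline by each reported cost; the variable names are illustrative. It shows that a single worker is slower than the old serial path (about 0.81x), while 20 workers are about 12.2x faster.

```python
# Costs in milliseconds, copied from the comparison above.
baseline_ms = 15229
after_ms = {1: 18764, 2: 8844, 5: 3890, 10: 2165, 20: 1248}

# Speedup relative to the pre-PR serial implementation.
speedups = {workers: baseline_ms / cost for workers, cost in after_ms.items()}

for workers, speedup in speedups.items():
    print(f"{workers:>2} concurrent: {speedup:.2f}x")
```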

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@Yukang-Lian
Collaborator Author

run buildall

@gavinchou added the usercase label (description: Important user case type label) on Apr 21, 2025
@gavinchou changed the title from "[Enhancement](recycler) Support concurrent recycle txn kv" to "[opt](recycler) Speed up recycling txn info by introduce parallelism" on Apr 21, 2025
@gavinchou changed the title from "[opt](recycler) Speed up recycling txn info by introduce parallelism" to "[opt](recycler) Speed up recycling txn info by introducing parallelism" on Apr 21, 2025
@Yukang-Lian
Collaborator Author

run buildall

@Yukang-Lian
Collaborator Author

run cloudut

@Yukang-Lian force-pushed the Support-Concurrent-Recycle-Txn-KV branch from f62fc70 to 22128e2 on April 22, 2025 11:21
@Yukang-Lian
Collaborator Author

run buildall

@Yukang-Lian
Collaborator Author

run buildall

@doris-robot

TeamCity cloud ut coverage result:
Function Coverage: 82.98% (1097/1322)
Line Coverage: 65.76% (18342/27892)
Region Coverage: 65.12% (9065/13921)
Branch Coverage: 55.20% (4892/8862)
Coverage Report: http://coverage.selectdb-in.cc/coverage/767af3873ba852b4066ba29c1f91cddd1d97e441_767af3873ba852b4066ba29c1f91cddd1d97e441_cloud/report/index.html

@github-actions bot added the approved label (description: Indicates a PR has been approved by one committer.) on Apr 22, 2025
Contributor

PR approved by at least one committer and no changes requested.

Contributor

PR approved by anyone and no changes requested.

@gavinchou merged commit 3e10093 into apache:master on Apr 23, 2025
26 of 27 checks passed
dataroaring pushed a commit that referenced this pull request Apr 27, 2025
…g parallelism #50037 (#50428)

Cherry-picked from #50037

Co-authored-by: abmdocrt <lianyukang@selectdb.com>
englefly pushed a commit to englefly/incubator-doris that referenced this pull request May 6, 2025
gavinchou pushed a commit that referenced this pull request May 12, 2025
…p failures and 'key not found' errors (#50766)

Related PR: #50037 

If an error occurs during transaction label recycling, the vector
recording keys cannot be cleaned up. Keys that were already cleaned up
in the previous scan and recycle cycle will be carried over to the next
scan and recycle cycle, causing a large number of 'key not found'
errors.
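The failure mode described in this commit message can be illustrated with a small sketch. It is not the Doris code: `recycle_cycle`, `delete_keys`, `pending`, and the dict-backed `store` are hypothetical stand-ins. The point is that the list of recorded keys is cleared on every exit path, so keys deleted in a failed cycle are not retried in the next one.

```python
class KeyNotFound(Exception):
    """Raised when a key scheduled for deletion is already gone."""


def delete_keys(store, keys):
    for key in keys:
        if key not in store:
            raise KeyNotFound(key)
        del store[key]


def recycle_cycle(store, pending, scanned, fail_after_delete=False):
    """Run one scan-and-recycle cycle; `pending` outlives the cycle."""
    pending.extend(scanned)
    try:
        delete_keys(store, pending)
        if fail_after_delete:
            raise RuntimeError("simulated error after the keys were deleted")
    finally:
        # Clear on success and on error, so stale keys never carry over.
        pending.clear()


store = {"label_a": 1, "label_b": 2, "label_c": 3}
pending = []
try:
    recycle_cycle(store, pending, ["label_a", "label_b"], fail_after_delete=True)
except RuntimeError:
    pass  # the first cycle failed, but its keys were already removed

# Without the `finally` above, this cycle would retry label_a and
# raise KeyNotFound; with it, only label_c is processed.
recycle_cycle(store, pending, ["label_c"])
```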
github-actions bot pushed a commit that referenced this pull request May 12, 2025
…p failures and 'key not found' errors (#50766)

koarz pushed a commit to koarz/doris that referenced this pull request Jun 4, 2025
…p failures and 'key not found' errors (apache#50766)

Labels
  • approved (Indicates a PR has been approved by one committer.)
  • dev/3.0.6-merged
  • reviewed
  • usercase (Important user case type label)
6 participants