Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

raftstore: add a concurrency limiter for receiving snapshots #17015

Merged
merged 5 commits into from
May 20, 2024

Conversation

hbisheng
Copy link
Member

@hbisheng hbisheng commented May 14, 2024

What is changed and how it works?

Issue Number: Ref #15972

What's Changed:

This commit starts to introduce a snapshot precheck mechanism to reduce
unnecessary snapshot drops and generations. The mechanism functions as follows: 

Before a leader sends a snapshot to a follower, it first sends a precheck
message. This message serves as a preliminary inquiry to the follower, seeking
confirmation of its readiness to receive a snapshot. Upon receiving the message,
the follower consults its concurrency limiter and returns a response to the
leader. The leader will only proceed to generate the snapshot after the precheck
is passed. 

A passed precheck means the leader has reserved a spot on the follower so the
subsequent snapshot send should succeed. The reservation has a TTL and the
leader is supposed to complete the snapshot generation and transmission within
the TTL timeframe.

Note that this commit implements the concurrency limiter without actually using
it. A follow-up commit will update the snapshot sending process and make use of
this concurrency limiter.

Related changes

  • PR to update pingcap/docs/pingcap/docs-cn:
  • Need to cherry-pick to the release branch

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Release note

None

This commit starts to introduce a snapshot precheck mechanism to reduce
unnecessary snapshot drops and generations. The mechanism functions as follows:

Before a leader sends a snapshot to a follower, it first sends a precheck
message. This message serves as a preliminary inquiry to the follower, seeking
confirmation of its readiness to receive a snapshot. Upon receiving the message,
the follower consults its concurrency limiter and returns a response to the
leader. The leader will only proceed to generate the snapshot after the precheck
is passed.

A passed precheck means the leader has reserved a spot on the follower so the
subsequent snapshot send should succeed. The reservation has a TTL and the
leader is supposed to complete the snapshot generation and transmission within
the TTL timeframe.

Note that this commit implements the concurrency limiter without actually using
it. A follow-up commit will update the snapshot sending process and make use of
this concurrency limiter.

Signed-off-by: Bisheng Huang <hbisheng@gmail.com>
Copy link
Contributor

ti-chi-bot bot commented May 14, 2024

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • Connor1996
  • overvenus

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

@ti-chi-bot ti-chi-bot bot added do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. contribution This PR is from a community contributor. first-time-contributor Indicates that the PR was contributed by an external member and is a first-time contributor. labels May 14, 2024
Copy link
Contributor

ti-chi-bot bot commented May 14, 2024

Hi @hbisheng. Thanks for your PR.

I'm waiting for a tikv member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@ti-chi-bot ti-chi-bot bot added needs-ok-to-test Indicates a PR created by contributors and need ORG member send '/ok-to-test' to start testing. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels May 14, 2024
hbisheng added a commit to hbisheng/tikv that referenced this pull request May 15, 2024
This commit implements a snapshot precheck process that leverages the snapshot
concurrency limiter introduced in tikv#17015. The leader only proceeds to generate
the snapshot after it receives a precheck succeed message from the follower. The
precheck request and response are sent via the raft `ExtraMessage` where two new
message types are added in kvproto.

TODO:
1. Handle the case where the follower is on an old version without the snapshot
precheck API.
2. Throttle the snapshot precheck requests, if necessary.

Signed-off-by: Bisheng Huang <hbisheng@gmail.com>
@Connor1996 Connor1996 self-requested a review May 16, 2024 08:31
Copy link
Member

@Connor1996 Connor1996 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ti-chi-bot ti-chi-bot bot added status/LGT1 Indicates that a PR has LGTM 1. release-note-none Denotes a PR that doesn't merit a release note. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels May 16, 2024
@hbisheng
Copy link
Member Author

/cc @overvenus could you help review this?

The approach right now is similar to what we discussed last week except that the follower would not return a string token and would not ask for a token when receiving the snapshot. The follower merely maintains a counter where each counted object has a TTL. I think this is sufficient for slowing down snapshot requests from the leader. Also, the existing concurrent_recv_snap_limit is still in place to protect the receiver from receiving too many snapshots.

@overvenus overvenus self-requested a review May 20, 2024 05:08
Co-authored-by: Neil Shen <overvenus@gmail.com>
Signed-off-by: Bisheng Huang <hbisheng@gmail.com>
@overvenus
Copy link
Member

/ok-to-test

@ti-chi-bot ti-chi-bot bot added ok-to-test Indicates a PR is ready to be tested. and removed needs-ok-to-test Indicates a PR created by contributors and need ORG member send '/ok-to-test' to start testing. labels May 20, 2024
@overvenus
Copy link
Member

/test

Copy link
Contributor

ti-chi-bot bot commented May 20, 2024

@overvenus: The /test command needs one or more targets.
The following commands are available to trigger required jobs:

  • /test pull-unit-test

Use /test all to run all jobs.

In response to this:

/test

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@overvenus
Copy link
Member

/test all

…ed to 0 when limit is 0

Signed-off-by: Bisheng Huang <hbisheng@gmail.com>
Signed-off-by: Bisheng Huang <hbisheng@gmail.com>
@ti-chi-bot ti-chi-bot bot added status/LGT2 Indicates that a PR has LGTM 2. and removed status/LGT1 Indicates that a PR has LGTM 1. labels May 20, 2024
@overvenus
Copy link
Member

/merge

Copy link
Contributor

ti-chi-bot bot commented May 20, 2024

@overvenus: It seems you want to merge this PR, I will help you trigger all the tests:

/run-all-tests

You only need to trigger /merge once, and if the CI test fails, you just re-trigger the test that failed and the bot will merge the PR for you after the CI passes.

If you have any questions about the PR merge process, please refer to pr process.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

Copy link
Contributor

ti-chi-bot bot commented May 20, 2024

This pull request has been accepted and is ready to merge.

Commit hash: 801a02d

@ti-chi-bot ti-chi-bot bot added the status/can-merge Indicates a PR has been approved by a committer. label May 20, 2024
Copy link
Contributor

ti-chi-bot bot commented May 20, 2024

@hbisheng: Your PR was out of date, I have automatically updated it for you.

If the CI test fails, you just re-trigger the test that failed and the bot will merge the PR for you after the CI passes.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

@ti-chi-bot ti-chi-bot bot merged commit 34d8b38 into tikv:master May 20, 2024
7 checks passed
@ti-chi-bot ti-chi-bot bot added this to the Pool milestone May 20, 2024
@hbisheng hbisheng deleted the snap-recv-limiter branch May 21, 2024 04:24
hbisheng added a commit to hbisheng/tikv that referenced this pull request May 21, 2024
This commit implements a snapshot precheck process that leverages the snapshot
concurrency limiter introduced in tikv#17015. The leader only proceeds to generate
the snapshot after it receives a precheck succeed message from the follower. The
precheck request and response are sent via the raft `ExtraMessage` where two new
message types are added in kvproto.

TODO:
1. Handle the case where the follower is on an old version without the snapshot
precheck API.
2. Throttle the snapshot precheck requests, if necessary.

Signed-off-by: Bisheng Huang <hbisheng@gmail.com>
hbisheng added a commit to hbisheng/tikv that referenced this pull request May 21, 2024
This commit implements a snapshot precheck process that leverages the snapshot
concurrency limiter introduced in tikv#17015. The leader only proceeds to generate
the snapshot after it receives a precheck succeed message from the follower. The
precheck request and response are sent via the raft `ExtraMessage` where two new
message types are added in kvproto.

TODO:
1. Handle the case where the follower is on an old version without the snapshot
precheck API.
2. Throttle the snapshot precheck requests, if necessary.

Signed-off-by: Bisheng Huang <hbisheng@gmail.com>
ti-chi-bot bot added a commit that referenced this pull request Jun 12, 2024
…17019)

close #15972

This commit implements a snapshot precheck process that leverages the snapshot
concurrency limiter introduced in #17015. The leader only proceeds to generate
the snapshot after it receives a precheck succeed message from the follower. The
precheck request and response are sent via the raft `ExtraMessage` where two new
message types are added in kvproto.

Signed-off-by: Bisheng Huang <hbisheng@gmail.com>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
contribution This PR is from a community contributor. first-time-contributor Indicates that the PR was contributed by an external member and is a first-time contributor. ok-to-test Indicates a PR is ready to be tested. release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. status/can-merge Indicates a PR has been approved by a committer. status/LGT2 Indicates that a PR has LGTM 2.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants