Do not reset snapshot replication when a single request timedout #16971

deepthidevaki · 2024-03-19T08:06:01Z

Description

Previously, when replicating a snapshot chunk timedout, leader restarts snapshot replication from the first chunk. This can result in a never ending retry loop if one of the other chunk hits a timeout frequently. This can happen more frequent when network latency is high and/or snapshot is large. To ensure that snapshot replication can complete in such cases,

Leader do not reset snapshot replication, instead re-send the same chunk if it encounters a timeout
Follower accepts the chunk even if it is a duplicate.

Tests are added where we can inject timeout and observer the behavior. RaftRandomizedTest also covers snapshot replication.

Related issues

closes #11496

Definition of Done

Not all items need to be done depending on the issue and the pull request.

Code changes:

The changes are backwards compatibility with previous versions
If it fixes a bug then PRs are created to backport the fix to the last two minor versions. You can trigger a backport by assigning labels (e.g. backport stable/1.3) to the PR, in case that fails you need to create backports manually.

Testing:

There are unit/integration tests that verify all acceptance criterias of the issue
New tests are written to ensure backwards compatibility with further versions
The behavior is tested manually
The change has been verified by a QA run
The impact of the changes is verified by a benchmark

Documentation:

The documentation is updated (e.g. BPMN reference, configuration, examples, get-started guides, etc.)
If the PR changes how BPMN processes are validated (e.g. support new BPMN element) then the Camunda modeling team should be informed to adjust the BPMN linting.

Other teams:
If the change impacts another team an issue has been created for this team, explaining what they need to do to support this change.

Please refer to our review guidelines.

Retry sending the same chunk, otherwise large snapshots can end up in a retry loop just because a single request timed out.

If a leader observed a time out for an InstallRequest, it will resend the same chunk. If the follower had already processed the previous request successfully, it should simply accept the request instead of rejecting it with an out of order error. If it was actually out of order, it will be identified during commit as the checksum would not match.

When the request is retried, the reader might be already at the end. So it will return before seeking to the chunk that should be re-send.

Reader will be pointing to the next chunk, when we want to retry the first chunk. So we must reset the reader, so that it can be re-read.

npepinpe

🚀

npepinpe · 2024-03-20T12:37:27Z

zeebe/atomix/cluster/src/main/java/io/atomix/raft/roles/LeaderAppender.java

-    member.setNextSnapshotChunk(null);
+    final boolean isTimeout =
+        error instanceof TimeoutException
+            || (error != null && error.getCause() instanceof TimeoutException);


❓ What were the case where it would be the cause? Do you think it could also be more deeply nested?

error is usually CompletionException so we always have to get the cause. I don't think it can be more deeply nested.

zeebe/atomix/cluster/src/main/java/io/atomix/raft/roles/PassiveRole.java

zeebe/atomix/cluster/src/main/java/io/atomix/raft/roles/LeaderAppender.java

deepthidevaki · 2024-06-03T13:03:24Z

This was not backported to older versions. But there was an incident today for which this would have helped. I don't see any reason not to backport this, although original issue was tagged as an improvement.

I'm backporting this. /cc @npepinpe

deepthidevaki · 2024-06-03T13:03:58Z

/backport

backport-action · 2024-06-03T13:04:22Z

Created backport PR for stable/8.2:

[Backport stable/8.2] Do not reset snapshot replication when a single request timedout #19028 with remaining conflicts!

Please cherry-pick the changes locally and resolve any conflicts.

git fetch origin backport-16971-to-stable/8.2
git worktree add --checkout .worktree/backport-16971-to-stable/8.2 backport-16971-to-stable/8.2
cd .worktree/backport-16971-to-stable/8.2
git reset --hard HEAD^
git cherry-pick -x 281eb95c0f593c2cc4ffecc873f3738b895bd09b d32d65050d5c8a498ec6a60ddfa5b8dc3eb3ce89 be3c7d703e8ac9c84c3bf2ada4efcde1e39ba5da c16f3c8c9f66fdae85ca800fac4b978546d4f566 13e8d923c8892c03a155615eebc573bc96f079e4 fc7828a67a8cbe5b225abc83eb1d234c8502947e 18e6598d84e3944d10538ec6fe2813900ba7fe71 05ed2dce937338e936387fa6345993b78b14498b dd76387d68891fe97266a671cb4a9bc247610b5e
git push --force-with-lease

backport-action · 2024-06-03T13:04:26Z

Created backport PR for stable/8.3:

[Backport stable/8.3] Do not reset snapshot replication when a single request timedout #19029 with remaining conflicts!

Please cherry-pick the changes locally and resolve any conflicts.

git fetch origin backport-16971-to-stable/8.3
git worktree add --checkout .worktree/backport-16971-to-stable/8.3 backport-16971-to-stable/8.3
cd .worktree/backport-16971-to-stable/8.3
git reset --hard HEAD^
git cherry-pick -x 281eb95c0f593c2cc4ffecc873f3738b895bd09b d32d65050d5c8a498ec6a60ddfa5b8dc3eb3ce89 be3c7d703e8ac9c84c3bf2ada4efcde1e39ba5da c16f3c8c9f66fdae85ca800fac4b978546d4f566 13e8d923c8892c03a155615eebc573bc96f079e4 fc7828a67a8cbe5b225abc83eb1d234c8502947e 18e6598d84e3944d10538ec6fe2813900ba7fe71 05ed2dce937338e936387fa6345993b78b14498b dd76387d68891fe97266a671cb4a9bc247610b5e
git push --force-with-lease

backport-action · 2024-06-03T13:04:31Z

Created backport PR for stable/8.4:

[Backport stable/8.4] Do not reset snapshot replication when a single request timedout #19030 with remaining conflicts!

Please cherry-pick the changes locally and resolve any conflicts.

git fetch origin backport-16971-to-stable/8.4
git worktree add --checkout .worktree/backport-16971-to-stable/8.4 backport-16971-to-stable/8.4
cd .worktree/backport-16971-to-stable/8.4
git reset --hard HEAD^
git cherry-pick -x 281eb95c0f593c2cc4ffecc873f3738b895bd09b d32d65050d5c8a498ec6a60ddfa5b8dc3eb3ce89 be3c7d703e8ac9c84c3bf2ada4efcde1e39ba5da c16f3c8c9f66fdae85ca800fac4b978546d4f566 13e8d923c8892c03a155615eebc573bc96f079e4 fc7828a67a8cbe5b225abc83eb1d234c8502947e 18e6598d84e3944d10538ec6fe2813900ba7fe71 05ed2dce937338e936387fa6345993b78b14498b dd76387d68891fe97266a671cb4a9bc247610b5e
git push --force-with-lease

@deepthidevaki

… request timedout (#19030) # Description Backport of #16971 to `stable/8.4`. relates to #11496 original author: @deepthidevaki

@backport-action

…lication when a single request timedout (#19038) # Description Backport of #19030 to `stable/8.3`. relates to #16971 #11496 original author: @backport-action

@backport-action

…lication when a single request timedout (#19037) # Description Backport of #19030 to `stable/8.2`. relates to #16971 #11496 original author: @backport-action

github-actions bot added the component/zeebe Related to the Zeebe component/team label Mar 19, 2024

deepthidevaki added 2 commits March 19, 2024 09:10

test: verify install requests retry behavior on timeout

281eb95

feat: do not reset snapshot replication when one request timeout

d32d650

Retry sending the same chunk, otherwise large snapshots can end up in a retry loop just because a single request timed out.

deepthidevaki force-pushed the dd-11496-snapshot-replicaiton-timeout branch from 0ffcf25 to 4670203 Compare March 19, 2024 08:10

deepthidevaki added 5 commits March 19, 2024 12:12

test: verify retry when install request is rejected by the follower

c16f3c8

fix: verify reader has next after seek

13e8d92

When the request is retried, the reader might be already at the end. So it will return before seeking to the chunk that should be re-send.

feat: allow SnapshotReader to reset to initial chunk

fc7828a

fix: first snapshot chunk can be retried if it times out

18e6598

Reader will be pointing to the next chunk, when we want to retry the first chunk. So we must reset the reader, so that it can be re-read.

deepthidevaki force-pushed the dd-11496-snapshot-replicaiton-timeout branch from 040e6cd to 18e6598 Compare March 19, 2024 11:12

deepthidevaki marked this pull request as ready for review March 19, 2024 12:12

deepthidevaki requested a review from npepinpe March 19, 2024 12:12

npepinpe approved these changes Mar 20, 2024

View reviewed changes

deepthidevaki added 2 commits March 20, 2024 14:20

refactor: remove unused method

05ed2dc

refactor: document why reader is reset when building install request

dd76387

deepthidevaki enabled auto-merge March 20, 2024 13:27

deepthidevaki added this pull request to the merge queue Mar 20, 2024

github-merge-queue bot removed this pull request from the merge queue due to failed status checks Mar 20, 2024

deepthidevaki added this pull request to the merge queue Mar 20, 2024

Merged via the queue into main with commit 9c34814 Mar 20, 2024
31 checks passed

deepthidevaki deleted the dd-11496-snapshot-replicaiton-timeout branch March 20, 2024 14:53

deepthidevaki added backport stable/8.4 Backport a pull request to 8.4.x backport stable/8.3 Backport a pull request to 8.3.x backport stable/8.2 Backport a pull request to 8.2.x labels Jun 3, 2024

backport-action mentioned this pull request Jun 3, 2024

[Backport stable/8.2] Do not reset snapshot replication when a single request timedout #19028

Closed

backport-action mentioned this pull request Jun 3, 2024

[Backport stable/8.3] Do not reset snapshot replication when a single request timedout #19029

Closed

backport-action mentioned this pull request Jun 3, 2024

[Backport stable/8.4] Do not reset snapshot replication when a single request timedout #19030

Merged

This was referenced Jun 3, 2024

[Backport stable/8.2] [Backport stable/8.4] Do not reset snapshot replication when a single request timedout #19037

Merged

[Backport stable/8.3] [Backport stable/8.4] Do not reset snapshot replication when a single request timedout #19038

Merged

github-actions bot added version:8.3.12 Label that represents issues released on version 8.3.12 version:8.4.8 Label that represents issues released on version 8.4.8 labels Jun 4, 2024

github-actions bot added the version:8.2.29 Marks an issue as being completely or in parts released in 8.2.29 label Jun 26, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Do not reset snapshot replication when a single request timedout #16971

Do not reset snapshot replication when a single request timedout #16971

deepthidevaki commented Mar 19, 2024 •

edited

Loading

npepinpe left a comment

npepinpe Mar 20, 2024

deepthidevaki Mar 20, 2024

deepthidevaki commented Jun 3, 2024

deepthidevaki commented Jun 3, 2024

backport-action commented Jun 3, 2024

backport-action commented Jun 3, 2024

backport-action commented Jun 3, 2024

Do not reset snapshot replication when a single request timedout #16971

Do not reset snapshot replication when a single request timedout #16971

Conversation

deepthidevaki commented Mar 19, 2024 • edited Loading

Description

Related issues

Definition of Done

npepinpe left a comment

Choose a reason for hiding this comment

npepinpe Mar 20, 2024

Choose a reason for hiding this comment

deepthidevaki Mar 20, 2024

Choose a reason for hiding this comment

deepthidevaki commented Jun 3, 2024

deepthidevaki commented Jun 3, 2024

backport-action commented Jun 3, 2024

backport-action commented Jun 3, 2024

backport-action commented Jun 3, 2024

deepthidevaki commented Mar 19, 2024 •

edited

Loading