
Fix flaky deployment recovery test #10938

Merged
2 commits merged into main from 10731_deployment_recovery_test_flaky on Nov 8, 2022

Conversation

remcowesterhoud
Contributor

Description

The MultiPartitionDeploymentRecoveryTest is flaky. After analyzing why, we came to the conclusion that this test does not currently test what is intended, which would be verifying that a deployment works after recovering from a failed deployment partition.

In order to fix this flakiness we have deleted this test and replaced it with a different integration test. This test verifies that a deployment gets redistributed in the event of a deployment partition restart. In order to achieve this we pause one of the partitions of the cluster. This makes sure we don't process the deployment before we stop the deployment partition.

Once we've verified that the third partition has finished its deployment as expected, we resume the second partition and restart the deployment partition. After this the test verifies that the deployment is completed as expected.
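For illustration, here is a rough sketch of the test flow described above. The `TestCluster` type and its pause/resume/restart/await helpers are hypothetical placeholders for the clustering test utilities in the codebase, not the actual API used in this PR; only the deploy command chain is the regular Zeebe Java client API.

```java
// Sketch of the described test flow only. TestCluster and its helpers are
// hypothetical placeholders, not the actual test utilities used in this PR.
import io.camunda.zeebe.client.ZeebeClient;
import org.junit.jupiter.api.Test;

final class DeploymentRedistributionTestSketch {

  private final TestCluster cluster = TestCluster.withPartitions(3); // hypothetical helper
  private final ZeebeClient client = cluster.newClient();            // hypothetical helper

  @Test
  void shouldRedistributeDeploymentAfterDeploymentPartitionRestart() {
    // given: partition 2 is paused so it cannot process the deployment yet
    cluster.pauseProcessing(2);

    // when: a process is deployed via the deployment partition (partition 1)
    client.newDeployResourceCommand()
        .addResourceFromClasspath("process.bpmn")
        .send()
        .join();

    // partition 3 receives and applies the distributed deployment
    cluster.awaitDeploymentDistributedTo(3);

    // resume partition 2 and restart the deployment partition
    cluster.resumeProcessing(2);
    cluster.restartDeploymentPartition();

    // then: the deployment is eventually redistributed to partition 2 as well
    cluster.awaitDeploymentDistributedTo(2);
  }
}
```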

Related issues

closes #10731

Definition of Done

Depending on the issue and the pull request, not all items need to be done.

Code changes:

  • The changes are backwards compatible with previous versions
  • If it fixes a bug then PRs are created to backport the fix to the last two minor versions. You can trigger a backport by assigning labels (e.g. backport stable/1.3) to the PR; if that fails, you need to create the backports manually.

Testing:

  • There are unit/integration tests that verify all acceptance criteria of the issue
  • New tests are written to ensure backwards compatibility with future versions
  • The behavior is tested manually
  • The change has been verified by a QA run
  • The impact of the changes is verified by a benchmark

Documentation:

  • The documentation is updated (e.g. BPMN reference, configuration, examples, get-started guides, etc.)
  • If the PR changes how BPMN processes are validated (e.g. support new BPMN element) then the Camunda modeling team should be informed to adjust the BPMN linting.

Please refer to our review guidelines.

This test was flaky. After analyzing why, we came to the conclusion that this test does not currently test what is intended, which would be verifying that a deployment works after recovering from a failed deployment partition.

We still want a test for this, but it proves difficult to do this in the engine tests. Instead, we will create a new integration test which takes care of this.
Member

@saig0 left a comment


@remcowesterhoud Looks good. 👍

I have some minor suggestions. Please have a look.

@github-actions
Contributor

github-actions bot commented Nov 8, 2022

Test Results

   949 files (-1)      949 suites (-1)    1h 38m 50s ⏱️ (+5m 39s)
 7 649 tests (-111)  7 642 ✔️ (-88)   7 💤 (±0)   0 (±0)
 7 847 runs (-111)   7 838 ✔️ (-88)   9 💤 (±0)   0 (±0)

Results for commit f43905e. ± Comparison against base commit b1bb9e1.

♻️ This comment has been updated with latest results.

This test verifies that a deployment gets redistributed in the event of a deployment partition restart. In order to achieve this we pause one of the partitions of the cluster. This makes sure we don't process the deployment before we stop the deployment partition.

Once we've verified that the third partition has finished its deployment as expected, we resume the second partition and restart the deployment partition. After this the test verifies that the deployment is completed as expected.
remcowesterhoud force-pushed the 10731_deployment_recovery_test_flaky branch from 2bf1ce8 to f43905e on November 8, 2022 14:21
@remcowesterhoud
Contributor Author

bors merge

@ghost

ghost commented Nov 8, 2022

Build succeeded:

  • Test summary

ghost merged commit 12fddeb into main on Nov 8, 2022
ghost deleted the 10731_deployment_recovery_test_flaky branch on November 8, 2022 14:45
@backport-action
Collaborator

Successfully created backport PR #10942 for stable/8.0.

@backport-action
Collaborator

Successfully created backport PR #10943 for stable/8.1.

ghost pushed a commit that referenced this pull request Nov 8, 2022
10943: [Backport stable/8.1] Fix flaky deployment recovery test r=remcowesterhoud a=backport-action

# Description
Backport of #10938 to `stable/8.1`.

relates to #10731

Co-authored-by: Remco Westerhoud <remco@westerhoud.nl>
remcowesterhoud added the version:8.1.4 label (marks an issue as being completely or in parts released in 8.1.4) on Nov 22, 2022