Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test: increase await assertion timeout in OpensearchExporterIT IndexSettingsTest to overcome flaky test runs #17282

Conversation

mustafadagher
Copy link
Contributor

Description

Increase await assertion timeout to overcome flaky test runs

Policy change is an asynchronous background process in opensearch, that's why we use awaits before asserts to reduce flaky results

Related issues

closes #17279

Definition of Done

Not all items need to be done depending on the issue and the pull request.

Code changes:

  • The changes are backwards compatibility with previous versions
  • If it fixes a bug then PRs are created to backport the fix to the last two minor versions. You can trigger a backport by assigning labels (e.g. backport stable/1.3) to the PR, in case that fails you need to create backports manually.

Testing:

  • There are unit/integration tests that verify all acceptance criterias of the issue
  • New tests are written to ensure backwards compatibility with further versions
  • The behavior is tested manually
  • The change has been verified by a QA run
  • The impact of the changes is verified by a benchmark

Documentation:

  • The documentation is updated (e.g. BPMN reference, configuration, examples, get-started guides, etc.)
  • If the PR changes how BPMN processes are validated (e.g. support new BPMN element) then the Camunda modeling team should be informed to adjust the BPMN linting.

Other teams:
If the change impacts another team an issue has been created for this team, explaining what they need to do to support this change.

Please refer to our review guidelines.

@mustafadagher mustafadagher added kind/flake Categorizes issue or PR as related to a flaky test component/zeebe Related to the Zeebe component/team labels Apr 3, 2024
@mustafadagher mustafadagher self-assigned this Apr 3, 2024
policy change is an asynchronous background process in opensearch
that's why we use awaits before asserts to reduce flaky results
@mustafadagher mustafadagher force-pushed the 17279-flaky-test-opensearchexporteritshouldaddindexlifecyclesettingstoexistingindicesonrerunwhenretentionisenabled branch from f1b9979 to 3197e92 Compare April 3, 2024 17:13
Copy link
Contributor

@megglos megglos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@mustafadagher
Copy link
Contributor Author

While policy change is indeed asynchronous, and this change is trying to increase the timeout to avoid flakiness in test runs. this specific failure here, and the one that triggered that PR should never have happened.

It's failing at this line, in the beginning of the test.
At this point above we are trying to create an index with retention initially NOT enabled. This assert is just making sure that this index has no lifecycle policy. which should always be the case. we are not adding or removing a policy at this point, so waiting can't help this failure.

I've been trying to reproduce this locally to see what might have been the problem, but unfortunately, I can't. it never fails locally!

CC: @megglos

@megglos
Copy link
Contributor

megglos commented Apr 4, 2024

While policy change is indeed asynchronous, and this change is trying to increase the timeout to avoid flakiness in test runs. this specific failure here, and the one that triggered that PR should never have happened.

It's failing at this line, in the beginning of the test. At this point above we are trying to create an index with retention initially NOT enabled. This assert is just making sure that this index has no lifecycle policy. which should always be the case. we are not adding or removing a policy at this point, so waiting can't help this failure.

I've been trying to reproduce this locally to see what might have been the problem, but unfortunately, I can't. it never fails locally!

CC: @megglos

hmm could it be that there are template leftovers from a previous test? meaning maybe we need to extend the cleanup to not just cleanup indices but also potentially existing templates

Edit: however that should be overridden by the previous export

@mustafadagher mustafadagher force-pushed the 17279-flaky-test-opensearchexporteritshouldaddindexlifecyclesettingstoexistingindicesonrerunwhenretentionisenabled branch from deb3e6d to 16cfe26 Compare April 4, 2024 08:49
@mustafadagher
Copy link
Contributor Author

While policy change is indeed asynchronous, and this change is trying to increase the timeout to avoid flakiness in test runs. this specific failure here, and the one that triggered that PR should never have happened.
It's failing at this line, in the beginning of the test. At this point above we are trying to create an index with retention initially NOT enabled. This assert is just making sure that this index has no lifecycle policy. which should always be the case. we are not adding or removing a policy at this point, so waiting can't help this failure.
I've been trying to reproduce this locally to see what might have been the problem, but unfortunately, I can't. it never fails locally!
CC: @megglos

hmm could it be that there are template leftovers from a previous test? meaning maybe we need to extend the cleanup to not just cleanup indices but also potentially existing templates

Edit: however that should be overridden by the previous export

I made one try at expanding wildcards to all (open and closed) indices when deleting indices before each test. I'm not sure this is the problem though. it's just a guess.

This is also one other API where opensearch diverges from elasticsearch.

@mustafadagher mustafadagher added this pull request to the merge queue Apr 4, 2024
@mustafadagher
Copy link
Contributor Author

should this be backported to 8.4, 8.5 too?
I don't see the need, except for consistency.

@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Apr 4, 2024
@mustafadagher mustafadagher added this pull request to the merge queue Apr 4, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Apr 4, 2024
@mustafadagher mustafadagher added this pull request to the merge queue Apr 4, 2024
Merged via the queue into main with commit 164d470 Apr 4, 2024
31 checks passed
@mustafadagher mustafadagher deleted the 17279-flaky-test-opensearchexporteritshouldaddindexlifecyclesettingstoexistingindicesonrerunwhenretentionisenabled branch April 4, 2024 10:59
lenaschoenburg added a commit that referenced this pull request Apr 4, 2024
…ensearchExporterIT IndexSettingsTest to overcome flaky test runs (#17306)

# Description
Backport of #17282 to `release 8.5.0`.

relates to #17279
@mustafadagher mustafadagher added the backport stable/8.4 Backport a pull request to 8.4.x label Apr 4, 2024
@mustafadagher
Copy link
Contributor Author

/backport stable/8.4

@backport-action
Copy link
Collaborator

github-merge-queue bot pushed a commit that referenced this pull request Apr 4, 2024
…earchExporterIT IndexSettingsTest to overcome flaky test runs (#17309)

# Description
Backport of #17282 to `stable/8.4`.

relates to #17279
original author: @mustafadagher
@mustafadagher mustafadagher added the backport stable/8.5 Backport a pull request to stable/8.5 label Apr 4, 2024
@camunda camunda deleted a comment from backport-action Apr 4, 2024
@backport-action
Copy link
Collaborator

Backport failed for stable/8.4, because it was unable to cherry-pick the commit(s).

Please cherry-pick the changes locally and resolve any conflicts.

git fetch origin stable/8.4
git worktree add -d .worktree/backport-17282-to-stable/8.4 origin/stable/8.4
cd .worktree/backport-17282-to-stable/8.4
git switch --create backport-17282-to-stable/8.4
git cherry-pick -x 3197e92ced339fce6266acb3ae790383969d2a0b 16cfe26acc27fcee899525d5ddd569b6a0d9d64c

@backport-action
Copy link
Collaborator

github-merge-queue bot pushed a commit that referenced this pull request Apr 5, 2024
…earchExporterIT IndexSettingsTest to overcome flaky test runs (#17312)

# Description
Backport of #17282 to `stable/8.5`.

relates to #17279
original author: @mustafadagher
@Zelldon Zelldon added version:8.4.7 Marks an issue as being completely or in parts released in 8.4.7 version:8.5.1 Marks an issue as being completely or in parts released in 8.5.1 version:8.6.0-alpha1 labels May 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport stable/8.4 Backport a pull request to 8.4.x backport stable/8.5 Backport a pull request to stable/8.5 component/zeebe Related to the Zeebe component/team kind/flake Categorizes issue or PR as related to a flaky test version:8.4.7 Marks an issue as being completely or in parts released in 8.4.7 version:8.5.1 Marks an issue as being completely or in parts released in 8.5.1 version:8.6.0-alpha1 Label that represents issues released on verions 8.6.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Flaky test OpensearchExporterIT.shouldAddIndexLifecycleSettingsToExistingIndicesOnRerunWhenRetentionIsEnabled
4 participants