Increase retries for Kinesis sharding integration tests #12255

Merged
zachjsh merged 1 commit into apache:master from dkoepke:increase-kinesis-shard-retries
Feb 15, 2022
@dkoepke dkoepke commented Feb 10, 2022

Description

This fixes intermittent, spurious failures that we've observed in the Kinesis sharding integration tests, caused by Kinesis taking longer than the code expected to start a sharding operation. The changed method is part of the integration test suite and is used only by the test cases that we've seen to be flaky.

Increase retries

Prior to this change, the tests expected a sharding operation to start within 9 seconds (30 retries * 300ms delay/retry). This change bumps the number of retries to 100, giving Kinesis up to 30 seconds to start the sharding.
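
For illustration, the retry budget above can be sketched as a bounded polling loop. This is a minimal sketch, not the code in KinesisAdminClient: the class and method names (`RetryBudgetSketch`, `maxWaitMillis`, `pollUntil`) are hypothetical, and only the numbers (30 or 100 retries, 300ms delay) come from the description.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper for illustration only; not the Druid implementation.
final class RetryBudgetSketch
{
  // Upper bound on the time spent sleeping between attempts.
  static long maxWaitMillis(int retries, long delayMillis)
  {
    return retries * delayMillis;
  }

  // Checks the condition up to `retries` times, sleeping `delayMillis` after
  // each failed check. Returns true as soon as the condition holds, or false
  // once the retries are exhausted or the thread is interrupted.
  static boolean pollUntil(BooleanSupplier condition, int retries, long delayMillis)
  {
    for (int attempt = 0; attempt < retries; attempt++) {
      if (condition.getAsBoolean()) {
        return true;
      }
      try {
        Thread.sleep(delayMillis);
      }
      catch (InterruptedException e) {
        // Restore the interrupt flag and give up early.
        Thread.currentThread().interrupt();
        return false;
      }
    }
    return false;
  }
}
```

With these assumptions, `maxWaitMillis(30, 300)` is 9000 (the old 9-second budget) and `maxWaitMillis(100, 300)` is 30000 (the new 30-second budget).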

We chose this value to ensure the existing tests always pass for us while still allowing them to fail reasonably fast if there's some integration logic error. Amazon doesn't provide guidance on when an operation might start, but it does document that a "scaling action could take a few minutes to complete".

Clarify the condition

This PR also makes a small, clarifying change to the condition used to determine if sharding has started. Instead of checking if the number of shards has increased (which, due to a Kinesis implementation detail, was technically correct even when the test reduces the number of shards), we now just check if the shard count has changed.
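
As a rough sketch of the difference between the two conditions, assuming the check compares an original shard count against the currently observed one (the class and method names here are hypothetical and do not come from KinesisAdminClient):

```java
// Hypothetical predicates for illustration only; not the Druid implementation.
final class ShardCountCheckSketch
{
  // Earlier style of check as described: only an increase counts as "started".
  static boolean startedIfIncreased(int originalCount, int currentCount)
  {
    return currentCount > originalCount;
  }

  // Style of check after this change: any difference counts as "started".
  static boolean startedIfChanged(int originalCount, int currentCount)
  {
    return currentCount != originalCount;
  }
}
```

The two predicates agree whenever the observed count rises, which is why the earlier check still worked; the second simply states the intent without relying on that behavior.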


Key changed/added classes in this PR
  • org.apache.druid.testing.utils.KinesisAdminClient

This PR has:

  • been self-reviewed.

@kfaraz kfaraz left a comment

LGTM.

@zachjsh zachjsh merged commit 47153cd into apache:master Feb 15, 2022
@abhishekagarwal87 abhishekagarwal87 added this to the 0.23.0 milestone May 11, 2022