ci: retry test-infra container initialization on transient failures #22405
Merged
Add retry with exponential backoff and jitter to TestServiceUtil.tryInitialize() for ContainerFetchException (image pull 404) and ContainerLaunchException (container start timeout). Configurable via system properties camel.test.infra.container.retries (default 3) and camel.test.infra.container.retry.delay.ms (default 5000).
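A minimal sketch of what such a retry wrapper might look like. This is not the code merged in this PR: the class `RetrySketch`, the helper `runWithRetry`, and the use of `IllegalStateException` as a stand-in for `ContainerFetchException` / `ContainerLaunchException` are assumptions for illustration. Only the two system property names and their defaults come from the description above. The sketch also assumes "retries" means additional attempts after the first one, which the description does not spell out.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicInteger;

public class RetrySketch {
    // Property names and defaults as given in the PR description.
    static int maxRetries() {
        return Integer.getInteger("camel.test.infra.container.retries", 3);
    }

    static long baseDelayMs() {
        return Long.getLong("camel.test.infra.container.retry.delay.ms", 5000L);
    }

    // Hypothetical helper: runs the action and retries when it throws
    // IllegalStateException (a stand-in for the container exceptions).
    static void runWithRetry(Runnable action, int retries, long baseDelayMs) {
        for (int attempt = 1; ; attempt++) {
            try {
                action.run();
                return;
            } catch (IllegalStateException e) {
                if (attempt > retries) {
                    throw e; // retries exhausted, surface the last failure
                }
                // Delay grows with the attempt number, plus a random jitter.
                long jitter = ThreadLocalRandom.current().nextLong(0, Math.max(1, baseDelayMs / 2));
                try {
                    Thread.sleep(baseDelayMs * attempt + jitter);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw e;
                }
            }
        }
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        // Simulated initialization that fails twice, then succeeds.
        // A 10 ms base delay keeps the demo fast; baseDelayMs() would give the real default.
        runWithRetry(() -> {
            if (calls.incrementAndGet() < 3) {
                throw new IllegalStateException("simulated transient failure");
            }
        }, maxRetries(), 10L);
        System.out.println("succeeded after " + calls.get() + " calls");
    }
}
```

Catching only a narrow exception type keeps non-transient failures (for example a misconfigured container) failing fast instead of being retried.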
apupier reviewed on Apr 2, 2026
Comment on lines +58 to +59:

    long jitter = ThreadLocalRandom.current().nextLong(0, BASE_DELAY_MS / 2);
    long delay = BASE_DELAY_MS * attempt + jitter;

Contributor
What is the benefit of using a random value to change the delay?
Contributor
Author
In this case, there is most probably no benefit, but it is a pretty common approach.
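For context on the question: jitter is commonly added so that several jobs failing at the same moment do not all retry at the same moment. Below is a rough sketch of the delay range the two quoted lines produce, assuming `BASE_DELAY_MS` is 5000 (the default given in the PR description). The class `DelaySketch` and its method names are hypothetical. As quoted, the formula grows linearly with `attempt`.

```java
import java.util.concurrent.ThreadLocalRandom;

public class DelaySketch {
    // Assumed value: the default delay from the PR description.
    static final long BASE_DELAY_MS = 5000L;

    // Same formula as the two quoted lines.
    static long delayFor(int attempt) {
        long jitter = ThreadLocalRandom.current().nextLong(0, BASE_DELAY_MS / 2);
        return BASE_DELAY_MS * attempt + jitter;
    }

    // Smallest possible delay: jitter is 0.
    static long minDelay(int attempt) {
        return BASE_DELAY_MS * attempt;
    }

    // Largest possible delay: the upper bound of nextLong is exclusive.
    static long maxDelay(int attempt) {
        return BASE_DELAY_MS * attempt + BASE_DELAY_MS / 2 - 1;
    }

    public static void main(String[] args) {
        for (int attempt = 1; attempt <= 3; attempt++) {
            System.out.printf("attempt %d: %d..%d ms (sampled %d ms)%n",
                    attempt, minDelay(attempt), maxDelay(attempt), delayFor(attempt));
        }
    }
}
```

With these assumptions the ranges are 5000..7499 ms, 10000..12499 ms, and 15000..17499 ms for attempts 1 to 3, so two jobs that fail together are unlikely to hit the registry again at exactly the same time.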
From time to time the GCR mirror we are using does not work as expected, for example returning 404 for existing images. This feature implements a retry with jitter.
Summary
- Add retry with exponential backoff and jitter to `TestServiceUtil.tryInitialize()` for transient container errors
- Retries on `ContainerFetchException` (image pull 404, registry cache eviction) and `ContainerLaunchException` (container start timeout, resource contention)
- Configurable via system properties: `camel.test.infra.container.retries` (default 3), `camel.test.infra.container.retry.delay.ms` (default 5000)
- … `tryInitialize()`

Motivation: Transient Docker/registry failures (e.g. `mirror.gcr.io` cache eviction returning 404, Couchbase container startup timeouts) cause CI flakes across multiple components. A retry at the service initialization level handles these without per-component workarounds.

Test plan