ci: retry test-infra container initialization on transient failures #22405

Merged
Croway merged 1 commit into main from ci-fix/test-infra-container-retry
Apr 2, 2026

@Croway Croway commented Apr 2, 2026

From time to time, the GCR mirror we use does not work as expected, for example returning 404 for existing images. This change implements a retry with jitter.

Summary

  • Add retry with exponential backoff and jitter to TestServiceUtil.tryInitialize() for transient container errors
  • Retries on ContainerFetchException (image pull 404, registry cache eviction) and ContainerLaunchException (container start timeout, resource contention)
  • Non-container exceptions (test logic failures) fail immediately without retry
  • Walks the exception cause chain, so wrapped container exceptions are also caught
  • Configurable via system properties: camel.test.infra.container.retries (default 3), camel.test.infra.container.retry.delay.ms (default 5000)
  • All test-infra services benefit automatically since they all flow through tryInitialize()

Motivation: Transient Docker/registry failures (e.g. mirror.gcr.io cache eviction returning 404, Couchbase container startup timeouts) cause CI flakes across multiple components. A retry at the service initialization level handles these without per-component workarounds.
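
The behaviour listed in the summary can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: `FetchFailure` and `LaunchFailure` are hypothetical stand-ins for `ContainerFetchException` and `ContainerLaunchException` (so the sketch compiles without Testcontainers), `RetrySketch` and its method names are invented for the example, and treating the `retries` property as the total number of attempts is an assumption. The delay formula follows the two lines quoted in the review comment on this PR.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical stand-ins for the Testcontainers exceptions named in the summary.
class FetchFailure extends RuntimeException {
    FetchFailure(String message) { super(message); }
}

class LaunchFailure extends RuntimeException {
    LaunchFailure(String message) { super(message); }
}

class RetrySketch {
    // Property names and defaults are taken from the PR description.
    static int maxAttempts() {
        return Integer.getInteger("camel.test.infra.container.retries", 3);
    }

    static long baseDelayMs() {
        return Long.getLong("camel.test.infra.container.retry.delay.ms", 5000L);
    }

    // Walk the cause chain so that wrapped container exceptions are also treated as transient.
    static boolean isTransient(Throwable t) {
        int depth = 0; // guard against cyclic cause chains
        while (t != null && depth++ < 32) {
            if (t instanceof FetchFailure || t instanceof LaunchFailure) {
                return true;
            }
            t = t.getCause();
        }
        return false;
    }

    // Run init; retry only on transient failures, rethrow everything else immediately.
    static <T> T initializeWithRetry(Callable<T> init, int maxAttempts, long baseDelayMs) throws Exception {
        for (int attempt = 1; ; attempt++) {
            try {
                return init.call();
            } catch (Exception e) {
                if (!isTransient(e) || attempt >= maxAttempts) {
                    throw e;
                }
                // nextLong(origin, bound) needs bound > origin, so skip jitter for tiny delays.
                long jitter = baseDelayMs >= 2
                        ? ThreadLocalRandom.current().nextLong(0, baseDelayMs / 2)
                        : 0;
                Thread.sleep(baseDelayMs * attempt + jitter);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Fails twice with a wrapped fetch failure, then succeeds; short delay keeps the demo fast.
        String result = initializeWithRetry(() -> {
            if (++calls[0] < 3) {
                throw new RuntimeException("wrapped", new FetchFailure("404 from mirror"));
            }
            return "started";
        }, 3, 10L);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

In a real run the limits would come from `maxAttempts()` and `baseDelayMs()`, so they could be overridden with `-Dcamel.test.infra.container.retries=...` and `-Dcamel.test.infra.container.retry.delay.ms=...` on the JVM command line.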

Test plan

  • 6 unit tests covering: retry on fetch/launch exceptions, immediate fail on non-container errors, max retries exhaustion, wrapped exceptions, first-attempt success

Add retry with exponential backoff and jitter to TestServiceUtil.tryInitialize()
for ContainerFetchException (image pull 404) and ContainerLaunchException
(container start timeout). Configurable via system properties
camel.test.infra.container.retries (default 3) and
camel.test.infra.container.retry.delay.ms (default 5000).

github-actions bot commented Apr 2, 2026

🌟 Thank you for your contribution to the Apache Camel project! 🌟
🤖 CI automation will test this PR automatically.

🐫 Apache Camel Committers, please review the following items:

  • First-time contributors require MANUAL approval for the GitHub Actions to run
  • You can use the command /component-test (camel-)component-name1 (camel-)component-name2.. to request a test from the test bot although they are normally detected and executed by CI.
  • You can label PRs using build-all, build-dependents, skip-tests and test-dependents to fine-tune the checks executed by this PR.
  • Build and test logs are available in the summary page. Only Apache Camel committers have access to the summary.

⚠️ Be careful when sharing logs. Review their contents before sharing them publicly.

@Croway Croway requested review from apupier and orpiske April 2, 2026 11:54

@orpiske orpiske left a comment

LGTM, thanks

@Croway Croway marked this pull request as ready for review April 2, 2026 12:05

github-actions bot commented Apr 2, 2026

🧪 CI tested the following changed modules:

  • test-infra/camel-test-infra-common

ℹ️ Dependent modules were not tested because the total number of affected modules exceeded the threshold (50). Use the test-dependents label to force testing all dependents.

Comment on lines +58 to +59
long jitter = ThreadLocalRandom.current().nextLong(0, BASE_DELAY_MS / 2);
long delay = BASE_DELAY_MS * attempt + jitter;
Contributor

What is the interest of using a random value to change the delay?

Contributor Author

In this case, most probably there is no interest. But this is a pretty common approach.
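
For reference, the effect of the two quoted lines can be illustrated with a small sketch. It assumes `BASE_DELAY_MS` is 5000 (the default given in the PR description) and that `attempt` starts at 1; the class and method names are hypothetical. With this formula the base delay grows linearly with the attempt number, and the jitter adds up to half the base delay on top. As a general property of the technique, the random offset mostly matters when several jobs retry against the same registry at the same moment.

```java
import java.util.concurrent.ThreadLocalRandom;

class JitterSketch {
    // Assumed value: the default of camel.test.infra.container.retry.delay.ms from the PR description.
    static final long BASE_DELAY_MS = 5000L;

    // Same formula as the two lines under review.
    static long delayFor(int attempt) {
        long jitter = ThreadLocalRandom.current().nextLong(0, BASE_DELAY_MS / 2);
        return BASE_DELAY_MS * attempt + jitter;
    }

    public static void main(String[] args) {
        for (int attempt = 1; attempt <= 3; attempt++) {
            long min = BASE_DELAY_MS * attempt;          // jitter == 0
            long max = min + BASE_DELAY_MS / 2 - 1;      // largest possible jitter
            System.out.printf("attempt %d: sample=%d ms, range=[%d, %d] ms%n",
                    attempt, delayFor(attempt), min, max);
        }
    }
}
```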

@Croway Croway merged commit 82a05e0 into main Apr 2, 2026
5 checks passed
@davsclaus davsclaus deleted the ci-fix/test-infra-container-retry branch April 12, 2026 04:22