Skip to content

test-infra: Fix Docker container name collisions in parallel execution#22287

Open
gnodet wants to merge 4 commits intoapache:mainfrom
gnodet:fix-testinfra-container-name-collision
Open

test-infra: Fix Docker container name collisions in parallel execution#22287
gnodet wants to merge 4 commits intoapache:mainfrom
gnodet:fix-testinfra-container-name-collision

Conversation

@gnodet
Copy link
Copy Markdown
Contributor

@gnodet gnodet commented Mar 26, 2026

Summary

  • Append JVM PID + instance counter to Docker container names generated by ContainerEnvironmentUtil.containerName() to prevent 409 Conflict errors in parallel test execution
  • Remove hardcoded container name in RocketMQNameserverContainer that bypassed ContainerEnvironmentUtil

Problem

Docker container name collisions occur in two scenarios:

  1. Cross-JVM (mvnd parallel builds): When multiple modules sharing the same test infra service (e.g., camel-elasticsearch and camel-elasticsearch-rest-client) run in separate JVMs, both try to create a container with the same name (e.g., camel-elasticsearch). The second JVM fails with 409 Conflict. Seen in chore(deps): Bump org.elasticsearch.client:elasticsearch-rest-client-sniffer from 9.3.1 to 9.3.2 #22237.

  2. Within-JVM (parallel test classes): When camel.failsafe.parallel=true, JUnit 5 runs test classes concurrently (mode.classes.default=concurrent). Non-singleton services each create their own container with the same name, causing the same collision.

Fix

Generate instance-unique container names: camel-{alias}-{pid}-{counter}

  • PID: unique per JVM process (cross-JVM safety)
  • AtomicInteger counter: unique per service instance within a JVM (within-JVM safety)
  • Singletons unaffected: SingletonService.addToStore() uses computeIfAbsent, so the wrapped service (and its container name) is created exactly once

Additionally, removed withCreateContainerCmdModifier(cmd -> cmd.withName("nameserver")) from RocketMQNameserverContainer — this hardcoded name bypassed ContainerEnvironmentUtil and would collide in parallel execution. The network alias "nameserver" (which is network-scoped) is sufficient for broker-to-nameserver communication.

Safe because

  • Container names are only used for cmd.withName() (Docker identification), never for network aliases or inter-container communication — verified across all ~30 services
  • Camel JBang infra run/stop/ps manages services by JVM PID files, not Docker container names
  • Also improves crash recovery: a zombie container from a crashed JVM no longer blocks the next run

Test plan

  • mvn install -B -pl test-infra/camel-test-infra-common,test-infra/camel-test-infra-rocketmq -DskipTests compiles successfully
  • CI passes

🤖 Generated with Claude Code

Claude Code on behalf of Guillaume Nodet

Append the JVM PID to generated Docker container names to avoid
409 Conflict errors when multiple modules sharing the same test
infrastructure (e.g., camel-elasticsearch and camel-elasticsearch-rest-client)
run integration tests in parallel via mvnd.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown
Contributor

🌟 Thank you for your contribution to the Apache Camel project! 🌟
🤖 CI automation will test this PR automatically.

🐫 Apache Camel Committers, please review the following items:

  • First-time contributors require MANUAL approval for the GitHub Actions to run
  • You can use the command /component-test (camel-)component-name1 (camel-)component-name2.. to request a test from the test bot although they are normally detected and executed by CI.
  • You can label PRs using build-all, build-dependents, skip-tests and test-dependents to fine-tune the checks executed by this PR.
  • Build and test logs are available in the summary page. Only Apache Camel committers have access to the summary.

⚠️ Be careful when sharing logs. Review their contents before sharing them publicly.

@gnodet gnodet requested review from Croway and apupier March 26, 2026 23:46
}
// Append PID to avoid Docker container name conflicts when multiple
// modules run tests in parallel (e.g., via mvnd with multiple threads)
name += "-" + ProcessHandle.current().pid();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't we use the thread id instead of the Process Id?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The conflict happens between separate JVM processes (mvnd spawns each module in its own daemon JVM), not between threads within the same JVM. Thread ID wouldn't help because:

  1. Thread IDs are not unique across processes — e.g., both JVMs' main threads could have thread ID 1
  2. Within a single JVM, the SingletonService pattern already ensures only one container is created per service, shared across all test classes/threads
  3. Thread ID can vary depending on which thread first triggers the container creation, making the name non-deterministic within the same JVM — which could actually cause collisions where none exist today

PID is unique per JVM on the host and stable for the entire JVM lifetime, which is exactly the granularity we need.

Claude Code on behalf of Guillaume Nodet

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To clarify further: the singleton is per-JVM, not cross-JVM.

Within a JVM (e.g., all test classes in camel-elasticsearch):

  • SingletonServiceHolder.INSTANCE is a static field — one instance per classloader
  • store.computeIfAbsent("elastic", ...) in JUnit's root ExtensionContext.Store ensures initialize() is called exactly once
  • All test classes share the same container — same PID, same container name, no conflict

Across JVMs (e.g., camel-elasticsearch vs camel-elasticsearch-rest-client in separate mvnd daemons):

  • Each JVM has its own static holder, its own JUnit root context
  • Each creates its own Docker container independently
  • The containers were never shared — without the PID fix, the second JVM simply crashed with a 409 Conflict

So there was no actual container sharing happening across JVMs before either — it was just failing. The PID suffix makes it work by giving each JVM its own uniquely-named container.

Claude Code on behalf of Guillaume Nodet

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also worth noting: the tests are designed to share a single container within a module. Test isolation is not achieved through container isolation but through data partitioning — each test method gets a unique Elasticsearch index prefix via testNameExtension.getCurrentTestName() (see ElasticsearchTestSupport.createPrefix()). This avoids the overhead of restarting Elasticsearch for each test class.

The two modules (camel-elasticsearch and camel-elasticsearch-rest-client) were never sharing a container either — they run in separate surefire/failsafe forks (separate JVMs), each with its own SingletonServiceHolder.INSTANCE. The second JVM was simply crashing trying to create a container with an already-taken Docker name. The PID suffix makes both succeed with their own container, which is what was always intended.

Claude Code on behalf of Guillaume Nodet

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear, the above analysis is not specific to Elasticsearch — ContainerEnvironmentUtil.containerName() is used by all ~30 test infra services (Kafka, MongoDB, Redis, etc.). The same collision would happen whenever two modules sharing the same test infra service are tested in parallel in separate JVMs. The PID suffix fix applies globally to all of them.

Claude Code on behalf of Guillaume Nodet

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, then the comment is a bit misleading as it mentions multiple threads

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You were right that PID alone wasn't sufficient — there is indeed a within-JVM gap when test classes run in parallel. I've updated the fix to use PID + AtomicInteger counter, which covers both dimensions:

  • PID: cross-JVM uniqueness (mvnd parallel builds)
  • Counter: within-JVM uniqueness (parallel test classes)

Thread ID specifically wouldn't have worked (not unique across JVMs, and two services created on the same thread would still collide), but your instinct was correct. Thanks for pushing on this!

Claude Code on behalf of Guillaume Nodet

@gnodet
Copy link
Copy Markdown
Contributor Author

gnodet commented Mar 27, 2026

Follow-up: making integration tests more robust for parallel execution

This PR fixes the cross-JVM container name collision (e.g., mvnd testing two modules in parallel). However, there's a broader gap: within-JVM parallel execution of test classes using non-singleton services is only safe because camel.failsafe.parallel defaults to false.

If parallel integration test execution is ever enabled, non-singleton services sharing the same @InfraService(serviceAlias = ...) would collide on the Docker container name within the same JVM — and the PID suffix doesn't help there since all threads share the same PID.

JUnit 5's @ResourceLock mechanism could address this by declaring the Docker container name as a shared resource, preventing concurrent test classes from conflicting. We should consider a follow-up to:

  • Have test infra services declare @ResourceLock on the container name
  • Or migrate non-singleton services to the singleton pattern where appropriate

This would make the test infrastructure safe for parallel execution without relying on convention.

Claude Code on behalf of Guillaume Nodet

gnodet and others added 2 commits March 27, 2026 11:05
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add AtomicInteger counter alongside PID for within-JVM uniqueness,
  making container names safe for parallel test class execution
- Remove hardcoded withName("nameserver") in RocketMQNameserverContainer
  that bypassed ContainerEnvironmentUtil (the network alias "nameserver"
  is sufficient for inter-container communication)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@gnodet gnodet changed the title test-infra: Fix Docker container name collisions in parallel builds test-infra: Fix Docker container name collisions in parallel execution Mar 27, 2026
@Croway
Copy link
Copy Markdown
Contributor

Croway commented Mar 27, 2026

@gnodet is the PID appended when using camel infra run $SERVICE too? in this case shouldn't be.. when using camel infra run .. it is handy to have the same name so that it can be used in guides, documentation, and development

gnodet added a commit to gnodet/camel that referenced this pull request Mar 27, 2026
Add singleton service support to test-infra service factories that
were missing it. This allows test classes to share a single container
instance per JVM, reducing Docker overhead and preparing the
infrastructure for safe within-JVM parallel test execution.

Each factory now provides:
- A SingletonXxxService inner class extending SingletonService<T>
- A lazy-init SingletonServiceHolder using static initializer
- A createSingletonService() factory method

Factories updated: AzureStorageBlob, AzureStorageQueue, Cassandra,
Consul, Docling, GooglePubSub, Hashicorp, Hazelcast, IbmMQ, Iggy,
Ignite, Keycloak, McpEverything, McpEverythingSse, MicroprofileLRA,
Minio, Mosquitto, Nats, Openldap, Postgres, PostgresVector, RabbitMQ,
Redis, Solr, TensorFlowServing, Triton, Xmpp.

ZooKeeper excluded: requires unique container naming (PR apache#22287)
first, since camel-zookeeper and camel-zookeeper-master both create
containers named "camel-zookeeper" and collide in parallel CI builds.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
gnodet added a commit to gnodet/camel that referenced this pull request Mar 27, 2026
Switch 49 test files from using createService() (non-singleton) to
createSingletonService() (singleton) to allow sharing a single
container instance per JVM. This reduces Docker overhead and enables
safe within-JVM parallel test execution.

Migrated base/support classes (covering all subclasses):
- ConsulTestSupport, NatsITSupport, PahoMqtt5ITSupport
- HazelcastAggregationRepositoryCamelTestSupport
- MinioIntegrationTestSupport, SolrTestSupport, RabbitMQITSupport
- HashicorpVaultBase, PubsubTestSupport, XmppBaseIT
- LdifTestSupport, AbstractLRATestSupport, DoclingITestSupport
- IggyTestBase, TensorFlowServingITSupport, KServeITSupport

Skipped: ConsulHealthIT (instance field with manual lifecycle),
ZooKeeper tests (needs unique container naming from PR apache#22287 first).

Also added createSingletonService() to the spring-rabbitmq
RabbitMQServiceFactory wrapper.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
gnodet added a commit to gnodet/camel that referenced this pull request Mar 27, 2026
Switch 47 test files from using createService() (non-singleton) to
createSingletonService() (singleton) to allow sharing a single
container instance per JVM. This reduces Docker overhead and enables
safe within-JVM parallel test execution.

Migrated base/support classes (covering all subclasses):
- ConsulTestSupport, NatsITSupport, PahoMqtt5ITSupport
- HazelcastAggregationRepositoryCamelTestSupport
- MinioIntegrationTestSupport, SolrTestSupport
- HashicorpVaultBase, PubsubTestSupport, XmppBaseIT
- LdifTestSupport, AbstractLRATestSupport, DoclingITestSupport
- IggyTestBase, TensorFlowServingITSupport, KServeITSupport

Skipped:
- ConsulHealthIT: instance field with manual lifecycle
- ZooKeeper tests: needs unique container naming (PR apache#22287) first
- RabbitMQ tests: use conflicting exchange/queue declarations with
  hardcoded names, not safe for sharing a single container

Also added createSingletonService() to the spring-rabbitmq
RabbitMQServiceFactory wrapper.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
gnodet added a commit to gnodet/camel that referenced this pull request Mar 27, 2026
Switch 47 test files from using createService() (non-singleton) to
createSingletonService() (singleton) to allow sharing a single
container instance per JVM. This reduces Docker overhead and enables
safe within-JVM parallel test execution.

Migrated base/support classes (covering all subclasses):
- ConsulTestSupport, NatsITSupport
- HazelcastAggregationRepositoryCamelTestSupport
- MinioIntegrationTestSupport, SolrTestSupport
- HashicorpVaultBase, PubsubTestSupport, XmppBaseIT
- LdifTestSupport, AbstractLRATestSupport, DoclingITestSupport
- IggyTestBase, TensorFlowServingITSupport, KServeITSupport

Skipped (not safe for singleton):
- ConsulHealthIT: instance field with manual lifecycle
- ZooKeeper tests: needs unique container naming (PR apache#22287) first
- RabbitMQ tests: conflicting exchange/queue declarations
- Mosquitto/MQTT5: PahoMqtt5ReconnectAfterFailureIT creates its own
  container for stop/start testing, collides with singleton name

Also added createSingletonService() to the spring-rabbitmq
RabbitMQServiceFactory wrapper.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@gnodet
Copy link
Copy Markdown
Contributor Author

gnodet commented Mar 27, 2026

@Croway Good question! I looked into this — camel infra run manages services via PID files in ~/.camel/ (e.g., infra-redis-<PID>.json), not by container name. camel infra stop, camel infra ps, etc. all use this PID-based tracking. So the PID+counter suffix in the container name doesn't break any lifecycle management.

That said, I understand the UX concern: if someone runs docker ps, seeing camel-kafka is nicer than camel-kafka-12345-1.

Do you still think we should avoid the suffix specifically for the camel infra run use case? We could potentially use the camel.infra.fixedPort property as a signal — when it's true (set by JBang), skip the PID+counter and use the clean name, since in that mode only one instance runs at a time. Would that be doable?

Claude Code on behalf of Guillaume Nodet

@Croway
Copy link
Copy Markdown
Contributor

Croway commented Mar 27, 2026

@gnodet picture the following blogpost example:

camel infra run postgres
docker exec -i camel-postgres psql -U test -d postgres < schema.sql

This scenario should keep working like this.

We could potentially use the camel.infra.fixedPort property as a signal sounds hacky, but something like this should be done, yeah

Skip PID+counter suffix when camel.infra.fixedPort=true (set by
camel infra run) so containers get predictable names like
camel-postgres for docker exec usability.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@gnodet
Copy link
Copy Markdown
Contributor Author

gnodet commented Mar 27, 2026

@Croway Done — pushed a new commit that skips the PID+counter suffix when camel.infra.fixedPort=true (set by camel infra run). So docker exec -i camel-postgres psql ... keeps working as expected.

In test mode (default), the unique suffix is still appended for parallel safety.

Claude Code on behalf of Guillaume Nodet

gnodet added a commit to gnodet/camel that referenced this pull request Mar 27, 2026
…ce()

Migrate every test class that used createService() to use
createSingletonService(), so all tests share a single container
instance per JVM. This reduces Docker overhead and prepares for
safe within-JVM parallel test execution.

Tests that required isolation fixes to work with shared containers:
- Google PubSub: catch AlreadyExistsException in topic/subscription setup
- Hashicorp Vault: use per-class secret paths to prevent version collisions
- Spring-RabbitMQ: use uniqueName() helper for exchange/queue names
- LRA: delta-tolerant assertions for shared coordinator state
- Consul: remove manual initialization (singleton handles lifecycle)

Note: camel-zookeeper-master tests are kept on createService() because
they share the same @infraservice(serviceAlias="zookeeper") container
name as camel-zookeeper, and mvnd runs these as separate JVMs. The
cross-module container name collision will be resolved by PR apache#22287
(PID-based container naming).

82 test files migrated across 30+ components.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
gnodet added a commit to gnodet/camel that referenced this pull request Mar 27, 2026
…ce()

Migrate every test class that used createService() to use
createSingletonService(), so all tests share a single container
instance per JVM. This reduces Docker overhead and prepares for
safe within-JVM parallel test execution.

Tests that required isolation fixes to work with shared containers:
- Google PubSub: catch AlreadyExistsException in topic/subscription setup
- Hashicorp Vault: use per-class secret paths to prevent version collisions
- Spring-RabbitMQ: use uniqueName() helper for exchange/queue names
- LRA: delta-tolerant assertions for shared coordinator state
- Consul: remove manual initialization (singleton handles lifecycle)

Tests kept on createService() due to container name collisions:
- camel-zookeeper-master: shares @infraservice(serviceAlias="zookeeper")
  with camel-zookeeper; mvnd runs them as separate JVMs
- camel-paho-mqtt5: PahoMqtt5ReconnectAfterFailureIT creates its own
  MosquittoLocalContainerService for broker lifecycle testing, which
  collides with the singleton container name

These cross-module/intra-module collisions will be resolved by PR apache#22287
(PID+counter-based container naming).

82 test files migrated across 30+ components.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants