test-infra: Fix Docker container name collisions in parallel execution#22287
gnodet wants to merge 4 commits into apache:main
Append the JVM PID to generated Docker container names to avoid 409 Conflict errors when multiple modules sharing the same test infrastructure (e.g., camel-elasticsearch and camel-elasticsearch-rest-client) run integration tests in parallel via mvnd. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
🌟 Thank you for your contribution to the Apache Camel project! 🌟 🐫 Apache Camel Committers, please review the following items:
}
// Append PID to avoid Docker container name conflicts when multiple
// modules run tests in parallel (e.g., via mvnd with multiple threads)
name += "-" + ProcessHandle.current().pid();
shouldn't we use the thread id instead of the Process Id?
The conflict happens between separate JVM processes (mvnd spawns each module in its own daemon JVM), not between threads within the same JVM. Thread ID wouldn't help because:
- Thread IDs are not unique across processes — e.g., both JVMs' main threads could have thread ID 1
- Within a single JVM, the SingletonService pattern already ensures only one container is created per service, shared across all test classes/threads
- Thread ID can vary depending on which thread first triggers the container creation, making the name non-deterministic within the same JVM, which could actually cause collisions where none exist today
PID is unique per JVM on the host and stable for the entire JVM lifetime, which is exactly the granularity we need.
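As a minimal sketch of that argument (the class and method names below are hypothetical and not taken from the Camel code base; only the JDK's ProcessHandle and Thread APIs are used), a PID-based suffix yields the same name from any thread in one JVM, while a thread-ID suffix depends on the calling thread:

```java
import java.util.function.Supplier;

final class PidVsThreadId {
    // PID-based suffix: identical for every thread of the current JVM.
    static String nameWithPid(String alias) {
        return "camel-" + alias + "-" + ProcessHandle.current().pid();
    }

    // Hypothetical alternative: suffix derived from the calling thread.
    static String nameWithThreadId(String alias) {
        return "camel-" + alias + "-" + Thread.currentThread().getId();
    }

    // Runs the task on a freshly started thread and returns its result.
    static String onNewThread(Supplier<String> task) {
        String[] result = new String[1];
        Thread worker = new Thread(() -> result[0] = task.get());
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return result[0];
    }

    public static void main(String[] args) {
        // Same name from both threads.
        System.out.println(nameWithPid("elasticsearch"));
        System.out.println(onNewThread(() -> nameWithPid("elasticsearch")));
        // Different names depending on the thread that asks.
        System.out.println(nameWithThreadId("elasticsearch"));
        System.out.println(onNewThread(() -> nameWithThreadId("elasticsearch")));
    }
}
```

The sketch only illustrates the stability property within one JVM; uniqueness across JVMs follows from the operating system assigning distinct PIDs to live processes.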
Claude Code on behalf of Guillaume Nodet
To clarify further: the singleton is per-JVM, not cross-JVM.
Within a JVM (e.g., all test classes in camel-elasticsearch):
- SingletonServiceHolder.INSTANCE is a static field: one instance per classloader
- store.computeIfAbsent("elastic", ...) in JUnit's root ExtensionContext.Store ensures initialize() is called exactly once
- All test classes share the same container: same PID, same container name, no conflict
Across JVMs (e.g., camel-elasticsearch vs camel-elasticsearch-rest-client in separate mvnd daemons):
- Each JVM has its own static holder, its own JUnit root context
- Each creates its own Docker container independently
- The containers were never shared: without the PID fix, the second JVM simply crashed with a 409 Conflict
So there was no actual container sharing happening across JVMs before either — it was just failing. The PID suffix makes it work by giving each JVM its own uniquely-named container.
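The once-per-JVM behaviour described above can be sketched with a plain ConcurrentHashMap standing in for JUnit's root ExtensionContext.Store. That substitution, and every name in the block, is an assumption made for illustration; the real SingletonService code may differ:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

final class SingletonStoreSketch {
    // Stand-in for the JUnit root store (assumption: a concurrent map behaves similarly here).
    private static final Map<String, String> STORE = new ConcurrentHashMap<>();

    // Counts how often the expensive "start a container" step actually runs.
    static final AtomicInteger INITIALIZE_CALLS = new AtomicInteger();

    // Hypothetical initializer: pretends to start a container and returns its name.
    private static String initialize(String key) {
        INITIALIZE_CALLS.incrementAndGet();
        return "camel-" + key + "-" + ProcessHandle.current().pid();
    }

    // Every caller in this JVM gets the same container name for a given key.
    static String getOrStart(String key) {
        return STORE.computeIfAbsent(key, SingletonStoreSketch::initialize);
    }

    public static void main(String[] args) {
        System.out.println(getOrStart("elastic"));
        System.out.println(getOrStart("elastic")); // same name, no second initialize()
        System.out.println("initialize() calls: " + INITIALIZE_CALLS.get());
    }
}
```

A second JVM would have its own static map and its own PID, which is why the two modules never shared a container.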
Claude Code on behalf of Guillaume Nodet
Also worth noting: the tests are designed to share a single container within a module. Test isolation is not achieved through container isolation but through data partitioning — each test method gets a unique Elasticsearch index prefix via testNameExtension.getCurrentTestName() (see ElasticsearchTestSupport.createPrefix()). This avoids the overhead of restarting Elasticsearch for each test class.
The two modules (camel-elasticsearch and camel-elasticsearch-rest-client) were never sharing a container either — they run in separate surefire/failsafe forks (separate JVMs), each with its own SingletonServiceHolder.INSTANCE. The second JVM was simply crashing trying to create a container with an already-taken Docker name. The PID suffix makes both succeed with their own container, which is what was always intended.
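To illustrate the data-partitioning idea, here is a hedged sketch of a per-test index prefix. The helper below is hypothetical and is not the actual ElasticsearchTestSupport.createPrefix() implementation; it assumes the prefix is simply the lower-cased test name, since Elasticsearch index names must be lowercase:

```java
import java.util.Locale;

final class IndexPrefixSketch {
    // Hypothetical prefix: the lower-cased name of the current test method.
    static String createPrefix(String currentTestName) {
        return currentTestName.toLowerCase(Locale.ROOT);
    }

    // Two tests writing to "orders" end up in different indices on the same container.
    static String indexName(String currentTestName, String baseName) {
        return createPrefix(currentTestName) + "-" + baseName;
    }

    public static void main(String[] args) {
        System.out.println(indexName("testIndex", "orders"));
        System.out.println(indexName("testDelete", "orders"));
    }
}
```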
Claude Code on behalf of Guillaume Nodet
To be clear, the above analysis is not specific to Elasticsearch — ContainerEnvironmentUtil.containerName() is used by all ~30 test infra services (Kafka, MongoDB, Redis, etc.). The same collision would happen whenever two modules sharing the same test infra service are tested in parallel in separate JVMs. The PID suffix fix applies globally to all of them.
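The collision itself can be pictured with a toy stand-in for the Docker daemon's per-host name registry. This is not Docker client code; the class is hypothetical and only mirrors the 201 (created) and 409 (conflict) status codes of Docker's create-container endpoint:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

final class NameRegistrySketch {
    // Container names must be unique on a host; this set plays the daemon's role.
    private final Set<String> names = ConcurrentHashMap.newKeySet();

    // Returns 201 when the name is free and 409 when it is already taken.
    int create(String name) {
        return names.add(name) ? 201 : 409;
    }

    public static void main(String[] args) {
        NameRegistrySketch daemon = new NameRegistrySketch();
        // Two JVMs asking for the same fixed name: the second one fails.
        System.out.println(daemon.create("camel-kafka"));
        System.out.println(daemon.create("camel-kafka"));
        // With a per-JVM suffix (made-up PIDs here) both succeed.
        System.out.println(daemon.create("camel-kafka-4242"));
        System.out.println(daemon.create("camel-kafka-4343"));
    }
}
```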
Claude Code on behalf of Guillaume Nodet
ok, then the comment is a bit misleading as it mentions multiple threads
You were right that PID alone wasn't sufficient — there is indeed a within-JVM gap when test classes run in parallel. I've updated the fix to use PID + AtomicInteger counter, which covers both dimensions:
- PID: cross-JVM uniqueness (mvnd parallel builds)
- Counter: within-JVM uniqueness (parallel test classes)
Thread ID specifically wouldn't have worked (not unique across JVMs, and two services created on the same thread would still collide), but your instinct was correct. Thanks for pushing on this!
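A minimal sketch of the PID + counter scheme follows. The class name and exact string layout are assumptions made for illustration; the real change lives in ContainerEnvironmentUtil.containerName() and may be structured differently:

```java
import java.util.concurrent.atomic.AtomicInteger;

final class UniqueContainerNames {
    // Shared by all threads of this JVM; each call gets its own value.
    private static final AtomicInteger COUNTER = new AtomicInteger();

    // Hypothetical generator: camel-{alias}-{pid}-{counter}
    static String containerName(String alias) {
        long pid = ProcessHandle.current().pid();   // unique per live JVM on the host
        int n = COUNTER.incrementAndGet();          // unique per call within this JVM
        return "camel-" + alias + "-" + pid + "-" + n;
    }

    public static void main(String[] args) {
        System.out.println(containerName("kafka"));
        System.out.println(containerName("kafka")); // same alias, different name
    }
}
```

AtomicInteger.incrementAndGet() is atomic, so two services created concurrently, or on the same thread one after the other, never receive the same counter value.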
Claude Code on behalf of Guillaume Nodet
Follow-up: making integration tests more robust for parallel execution
This PR fixes the cross-JVM container name collision (e.g., mvnd testing two modules in parallel). However, there's a broader gap: within-JVM parallel execution of test classes using non-singleton services is only safe because If parallel integration test execution is ever enabled, non-singleton services sharing the same JUnit 5's
This would make the test infrastructure safe for parallel execution without relying on convention.
Claude Code on behalf of Guillaume Nodet
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add AtomicInteger counter alongside PID for within-JVM uniqueness,
making container names safe for parallel test class execution
- Remove hardcoded withName("nameserver") in RocketMQNameserverContainer
that bypassed ContainerEnvironmentUtil (the network alias "nameserver"
is sufficient for inter-container communication)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@gnodet is the PID appended when using |
Add singleton service support to test-infra service factories that were missing it. This allows test classes to share a single container instance per JVM, reducing Docker overhead and preparing the infrastructure for safe within-JVM parallel test execution.
Each factory now provides:
- A SingletonXxxService inner class extending SingletonService<T>
- A lazy-init SingletonServiceHolder using static initializer
- A createSingletonService() factory method
Factories updated: AzureStorageBlob, AzureStorageQueue, Cassandra, Consul, Docling, GooglePubSub, Hashicorp, Hazelcast, IbmMQ, Iggy, Ignite, Keycloak, McpEverything, McpEverythingSse, MicroprofileLRA, Minio, Mosquitto, Nats, Openldap, Postgres, PostgresVector, RabbitMQ, Redis, Solr, TensorFlowServing, Triton, Xmpp.
ZooKeeper excluded: requires unique container naming (PR apache#22287) first, since camel-zookeeper and camel-zookeeper-master both create containers named "camel-zookeeper" and collide in parallel CI builds.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
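The lazy-init holder mentioned in that commit is the standard initialization-on-demand holder idiom. The sketch below is simplified and hypothetical: FakeService stands in for a real test-infra service, and the SingletonService<T> wrapper is left out:

```java
import java.util.concurrent.atomic.AtomicInteger;

final class HolderIdiomSketch {
    static final AtomicInteger CREATED = new AtomicInteger();

    // Hypothetical stand-in for a container-backed service.
    static final class FakeService {
        final int id = CREATED.incrementAndGet();
    }

    // The JVM initializes this nested class on first access, once and thread-safely.
    private static final class SingletonServiceHolder {
        static final FakeService INSTANCE = new FakeService();
    }

    // Shared instance: one per JVM (more precisely, per classloader).
    static FakeService createSingletonService() {
        return SingletonServiceHolder.INSTANCE;
    }

    // Non-singleton variant: a new service, and thus a new container, per call.
    static FakeService createService() {
        return new FakeService();
    }

    public static void main(String[] args) {
        System.out.println(createSingletonService() == createSingletonService());
        System.out.println(createService() == createService());
    }
}
```

Because class initialization is guarded by the JVM, no explicit locking is needed to keep the singleton unique under parallel test classes.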
Switch 49 test files from using createService() (non-singleton) to createSingletonService() (singleton) to allow sharing a single container instance per JVM. This reduces Docker overhead and enables safe within-JVM parallel test execution.
Migrated base/support classes (covering all subclasses):
- ConsulTestSupport, NatsITSupport, PahoMqtt5ITSupport
- HazelcastAggregationRepositoryCamelTestSupport
- MinioIntegrationTestSupport, SolrTestSupport, RabbitMQITSupport
- HashicorpVaultBase, PubsubTestSupport, XmppBaseIT
- LdifTestSupport, AbstractLRATestSupport, DoclingITestSupport
- IggyTestBase, TensorFlowServingITSupport, KServeITSupport
Skipped: ConsulHealthIT (instance field with manual lifecycle), ZooKeeper tests (needs unique container naming from PR apache#22287 first).
Also added createSingletonService() to the spring-rabbitmq RabbitMQServiceFactory wrapper.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Switch 47 test files from using createService() (non-singleton) to createSingletonService() (singleton) to allow sharing a single container instance per JVM. This reduces Docker overhead and enables safe within-JVM parallel test execution.
Migrated base/support classes (covering all subclasses):
- ConsulTestSupport, NatsITSupport, PahoMqtt5ITSupport
- HazelcastAggregationRepositoryCamelTestSupport
- MinioIntegrationTestSupport, SolrTestSupport
- HashicorpVaultBase, PubsubTestSupport, XmppBaseIT
- LdifTestSupport, AbstractLRATestSupport, DoclingITestSupport
- IggyTestBase, TensorFlowServingITSupport, KServeITSupport
Skipped:
- ConsulHealthIT: instance field with manual lifecycle
- ZooKeeper tests: needs unique container naming (PR apache#22287) first
- RabbitMQ tests: use conflicting exchange/queue declarations with hardcoded names, not safe for sharing a single container
Also added createSingletonService() to the spring-rabbitmq RabbitMQServiceFactory wrapper.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Switch 47 test files from using createService() (non-singleton) to createSingletonService() (singleton) to allow sharing a single container instance per JVM. This reduces Docker overhead and enables safe within-JVM parallel test execution.
Migrated base/support classes (covering all subclasses):
- ConsulTestSupport, NatsITSupport
- HazelcastAggregationRepositoryCamelTestSupport
- MinioIntegrationTestSupport, SolrTestSupport
- HashicorpVaultBase, PubsubTestSupport, XmppBaseIT
- LdifTestSupport, AbstractLRATestSupport, DoclingITestSupport
- IggyTestBase, TensorFlowServingITSupport, KServeITSupport
Skipped (not safe for singleton):
- ConsulHealthIT: instance field with manual lifecycle
- ZooKeeper tests: needs unique container naming (PR apache#22287) first
- RabbitMQ tests: conflicting exchange/queue declarations
- Mosquitto/MQTT5: PahoMqtt5ReconnectAfterFailureIT creates its own container for stop/start testing, collides with singleton name
Also added createSingletonService() to the spring-rabbitmq RabbitMQServiceFactory wrapper.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@Croway Good question! I looked into this — That said, I understand the UX concern: if someone runs Do you still think we should avoid the suffix specifically for the Claude Code on behalf of Guillaume Nodet |
@gnodet picture the following blogpost example: This scenario should keep working like this.
Skip PID+counter suffix when camel.infra.fixedPort=true (set by camel infra run) so containers get predictable names like camel-postgres for docker exec usability. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
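The fixed-name mode from that commit might look roughly like the sketch below. It assumes camel.infra.fixedPort is read as a JVM system property, which the commit message does not state explicitly, and all class and method names are hypothetical:

```java
import java.util.concurrent.atomic.AtomicInteger;

final class FixedPortNamingSketch {
    private static final AtomicInteger COUNTER = new AtomicInteger();

    // Explicit flag variant, convenient for testing both modes.
    static String containerName(String alias, boolean fixedPort) {
        String base = "camel-" + alias;
        if (fixedPort) {
            // Predictable name, e.g. camel-postgres, usable with docker exec.
            return base;
        }
        // Test mode: unique suffix for parallel safety.
        return base + "-" + ProcessHandle.current().pid() + "-" + COUNTER.incrementAndGet();
    }

    // Assumption: the flag arrives as -Dcamel.infra.fixedPort=true.
    static String containerName(String alias) {
        return containerName(alias, Boolean.getBoolean("camel.infra.fixedPort"));
    }

    public static void main(String[] args) {
        System.out.println(containerName("postgres", true));
        System.out.println(containerName("postgres", false));
    }
}
```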
@Croway Done — pushed a new commit that skips the PID+counter suffix when In test mode (default), the unique suffix is still appended for parallel safety. Claude Code on behalf of Guillaume Nodet |
…ce()
Migrate every test class that used createService() to use createSingletonService(), so all tests share a single container instance per JVM. This reduces Docker overhead and prepares for safe within-JVM parallel test execution.
Tests that required isolation fixes to work with shared containers:
- Google PubSub: catch AlreadyExistsException in topic/subscription setup
- Hashicorp Vault: use per-class secret paths to prevent version collisions
- Spring-RabbitMQ: use uniqueName() helper for exchange/queue names
- LRA: delta-tolerant assertions for shared coordinator state
- Consul: remove manual initialization (singleton handles lifecycle)
Note: camel-zookeeper-master tests are kept on createService() because they share the same @InfraService(serviceAlias="zookeeper") container name as camel-zookeeper, and mvnd runs these as separate JVMs. The cross-module container name collision will be resolved by PR apache#22287 (PID-based container naming).
82 test files migrated across 30+ components.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ce()
Migrate every test class that used createService() to use createSingletonService(), so all tests share a single container instance per JVM. This reduces Docker overhead and prepares for safe within-JVM parallel test execution.
Tests that required isolation fixes to work with shared containers:
- Google PubSub: catch AlreadyExistsException in topic/subscription setup
- Hashicorp Vault: use per-class secret paths to prevent version collisions
- Spring-RabbitMQ: use uniqueName() helper for exchange/queue names
- LRA: delta-tolerant assertions for shared coordinator state
- Consul: remove manual initialization (singleton handles lifecycle)
Tests kept on createService() due to container name collisions:
- camel-zookeeper-master: shares @InfraService(serviceAlias="zookeeper") with camel-zookeeper; mvnd runs them as separate JVMs
- camel-paho-mqtt5: PahoMqtt5ReconnectAfterFailureIT creates its own MosquittoLocalContainerService for broker lifecycle testing, which collides with the singleton container name
These cross-module/intra-module collisions will be resolved by PR apache#22287 (PID+counter-based container naming).
82 test files migrated across 30+ components.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
- ContainerEnvironmentUtil.containerName() to prevent 409 Conflict errors in parallel test execution
- RocketMQNameserverContainer that bypassed ContainerEnvironmentUtil
Problem
Docker container name collisions occur in two scenarios:
- Cross-JVM (mvnd parallel builds): When multiple modules sharing the same test infra service (e.g., camel-elasticsearch and camel-elasticsearch-rest-client) run in separate JVMs, both try to create a container with the same name (e.g., camel-elasticsearch). The second JVM fails with 409 Conflict. Seen in chore(deps): Bump org.elasticsearch.client:elasticsearch-rest-client-sniffer from 9.3.1 to 9.3.2 #22237.
- Within-JVM (parallel test classes): When camel.failsafe.parallel=true, JUnit 5 runs test classes concurrently (mode.classes.default=concurrent). Non-singleton services each create their own container with the same name, causing the same collision.
Fix
Generate instance-unique container names: camel-{alias}-{pid}-{counter}
- SingletonService.addToStore() uses computeIfAbsent, so the wrapped service (and its container name) is created exactly once
Additionally, removed withCreateContainerCmdModifier(cmd -> cmd.withName("nameserver")) from RocketMQNameserverContainer: this hardcoded name bypassed ContainerEnvironmentUtil and would collide in parallel execution. The network alias "nameserver" (which is network-scoped) is sufficient for broker-to-nameserver communication.
Safe because
- cmd.withName() (Docker identification), never for network aliases or inter-container communication; verified across all ~30 services
- infra run/stop/ps manages services by JVM PID files, not Docker container names
Test plan
- mvn install -B -pl test-infra/camel-test-infra-common,test-infra/camel-test-infra-rocketmq -DskipTests compiles successfully
🤖 Generated with Claude Code
Claude Code on behalf of Guillaume Nodet