Skip to content

test: replace Thread.sleep with Awaitility in integration tests#2080

Closed
dkropachev wants to merge 409 commits intoapache:4.xfrom
dkropachev:replace-thread-sleep-with-awaitility
Closed

test: replace Thread.sleep with Awaitility in integration tests#2080
dkropachev wants to merge 409 commits intoapache:4.xfrom
dkropachev:replace-thread-sleep-with-awaitility

Conversation

@dkropachev
Copy link
Copy Markdown
Contributor

Summary

  • Replace fixed Thread.sleep calls with Awaitility polling in integration tests
  • Affected tests: MetricsITBase, DefaultMetadataTabletMapIT, MockResolverIT
  • DriverBlockHoundIntegrationIT left unchanged (intentional sleep for BlockHound)
  • Eliminates wasted wait time by polling for expected conditions

Test plan

  • All modified tests pass locally
  • CI integration tests pass with no regressions

daniloarodrigues and others added 30 commits August 27, 2024 09:57
Motivation:

Applications are improperly printing messages to the log, causing overload in certain situations. There is no need for a println since the log is already being managed by slf4j.

Modifications:

Remove println from buildAsync method

Result:

No improper messages are displayed in the log, and there is no more overload.

Co-authored-by: Danilo Rodrigues <danilo.rodrigues@accenture.com>
* Add CcmBridge#getNodeIpAddress(int)

Adds tiny utility method to CcmBridge.

* Add Resolver interface and implementations

Adds a middleman between driver and all direct calls to InetAddress.
This is meant to help with testing in case there is a need to substitute
name resolution with resolution by specific ip address without modifying
system environment or providing custom DNS server.

Default implementation redirects all calls to InetAddress class which
should not introduce change in behaviour.
One introduced downside is that if the methods are to be customizable
then the interface cannot have them `static`.

Adds also implementation of MockResolver that has its own mapping of hostnames
to addresses.

Adds MockResolverIT.
If rule throw an exception on `before` it's `after` is not being
executed.
It needs to be respected in CustomCcmRule.
Currently 4.x CI pins the CCM version to a specific commit.
Unfortunately this version incorrectly pulls `aarch64` packages even when
`x64_64` are needed, causing CI to fail. Newer version does not pull `aarch64`
when using defaults.
* Remove ccm rule from MockResolverIT

should_connect_with_mocked_hostname() will use CcmBridge.Builder builder
instead.

* Add `MockResolverIT#replace_cluster_test()`

Adds another method that runs scenario in which we replace three node
cluster with the completely new three node cluster. This method runs 20 times
called by `run_replace_test_20_times()` test method.

* Add flags to CcmBridge.start methods

Adds "--wait-other-notice", "--wait-for-binary-proto" if missing.

* Make defaultResolverFactory settable only once before use
* Pull CCM image before the test

* Remove RC images from test list
Log message notifying of closing the previous ControlConnection's channel
wrongly mentions current channel. Switch it to the previous instead.
* Changed Cassandra to ScyllaDB where applicable
* Updated `DefaultLoadBalancingPolicy` documentation
* Updated links to point to ScyllaDB documentations
Removes old Resolver implementations.
Removes Resolver "hooks" that were added in core code.
Adds org.burningwave:tools dependency as the preferred solution.
Adds MultimapHostResolver implementation allowing for one-to-many mapping of
hostnames to ips.
Adds MultimapHostResolverProvider as a replacement for ResolverProvider.
Adjusts MockResolverIT to use new replacement.
…ladb#341)

* Add `MockResolverIT#cannot_reconnect_with_resolved_socket()`

Adds a method for testing the issues that surface after cluster
replacements. Due to the variable, sometimes long runtime it is not added
to any of the test groups.

* Add optional fallback for `ControlConnection#reconnect()`

Adds an experimental option to allow `ControlConnection` to try
reconnecting to the original contact points held by `MetadataManager`,
in case of getting empty query plan from the load balancing policy.

In order to separate this logic from query plans of other queries
`LoadBalancingPolicyWrapper#newControlReconnectionQueryPlan()` was introduced
and is called during reconnection in place of `newQueryPlan()`.
docs: update workflows

fix: java version

fix: java version

fix: java version
…db#388)

Previously the tablet map would be completely emptied on node removal,
necessitating full rebuild of it. With this change the TabletMap will
be scanned through on node removals (`RemoveNodeRefresh`) in order to delete
only those tablets, that contain removed node as one of the replicas.

Additionally we introduce `TabletMapSchemaChangeListener` which will be
automatically registered on context initialization as long as schema metadata
is enabled. It won't be added if the schema metadata is enabled later at
runtime. This listener ensures that relevant tablets will be removed
on removals and updates of both keyspaces and tables.
The `TabletMapSchemaChangesIT` tests its behaviour.

Addresses scylladb#377.
Bumps [ch.qos.logback:logback-classic](https://github.com/qos-ch/logback) from 1.2.3 to 1.2.13.
- [Commits](qos-ch/logback@v_1.2.3...v_1.2.13)

---
updated-dependencies:
- dependency-name: ch.qos.logback:logback-classic
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [com.google.guava:guava](https://github.com/google/guava) from 25.1-jre to 32.0.0-jre.
- [Release notes](https://github.com/google/guava/releases)
- [Commits](https://github.com/google/guava/commits)

---
updated-dependencies:
- dependency-name: com.google.guava:guava
  dependency-type: direct:development
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [org.testng:testng](https://github.com/cbeust/testng) from 7.3.0 to 7.5.1.
- [Release notes](https://github.com/cbeust/testng/releases)
- [Changelog](https://github.com/testng-team/testng/blob/master/CHANGES.txt)
- [Commits](testng-team/testng@7.3.0...7.5.1)

---
updated-dependencies:
- dependency-name: org.testng:testng
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [org.json:json](https://github.com/douglascrockford/JSON-java) from 20230227 to 20231013.
- [Release notes](https://github.com/douglascrockford/JSON-java/releases)
- [Changelog](https://github.com/stleary/JSON-java/blob/master/docs/RELEASES.md)
- [Commits](https://github.com/douglascrockford/JSON-java/commits)

---
updated-dependencies:
- dependency-name: org.json:json
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* CcmBridge methods adjustments

Adds `startWithArgs` methods that allow avoiding the default arguments like
"--wait-other-notice". This wait causes timeout in multi-dc test
utilizing `join_ring: false` so it needs to be avoided.

Adds `addWithoutStart(n, dc)`. As the name suggests it's an add that does not
immediately start the node.

Adds `updateNodeConfig` in order to allow adding new nodes with different
configurations.

* Add ZeroTokenNodesIT

Adds a test class verifying the behavior of the driver when dealing with
zero-token Scylla nodes.

Adds a configurable option which controls whether to allow zero-token nodes as
valid peers. Until now such nodes were considered invalid. Default value of
this option does not change the driver behavior.
…equest-timeout` (scylladb#404)

Updates the default value of `advanced.metadata.schema.request-timeout`
corresponding to `METADATA_SCHEMA_REQUEST_TIMEOUT` TypedDriverOption
to 20 seconds.

Extends the documentation with information about additional function in
Scylla clusters.
Adopt is not maintaned anymore
CI started picking up 2024.2.2 but the feature is not there yet.
This causes test failures.
Increasing minEnterprise to 2024.2.3.

Fixes scylladb#413
Previous version has a bug causing an IOException during
reading of /etc/os-release. See netty's issue 14479 for details.
* Fetch additional keyspace metadata from scylla_keyspaces

Extends schema related classes with methods for querying scylla_keyspaces
in order to determine whether keyspace is tablets-enabled.

Adjusts schema queries tests and several other integration tests
that mock responses from system tables.

Adds `isUsingTablets()` and `setUsingTablets()` to KeyspaceMetadata.

* Use keyspace metadata in `BasicLoadBalancingPolicy#getReplicas`

Modifies the getReplicas method to do the tablet lookup if and only if the
keyspace metadata indicates that it's a tablets-based keyspace. Otherwise refer
to the token map.
Previous behavior was to try tablet map lookup first regardless of the
keyspace configuration.
Bumps [sphinx-scylladb-theme](https://github.com/scylladb/sphinx-scylladb-theme) from 1.8.3 to 1.8.4.
- [Release notes](https://github.com/scylladb/sphinx-scylladb-theme/releases)
- [Commits](scylladb/sphinx-scylladb-theme@1.8.3...1.8.4)

---
updated-dependencies:
- dependency-name: sphinx-scylladb-theme
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Until now, the versions passed through ccm.version property would be
considered invalid if cannot be parsed to one of the known formats.
Now if the parsing fails, the property will be passed to ccm commands
"as is" which ccm will treat as directory name in local repository.
Numbered version will then be fetched through setting up a makeshift cluster
and asking ccm for it.
dkropachev and others added 29 commits January 19, 2026 14:05
Set - doesn't guarantee the order.
When driver works with Nodes, say read replicas from TokenMap, Tablets,
calculate replicas, API user should expect that order does matter and
preserved.
For that we need to switch all these APIs and internal structures to use
Lists
Python warnings discrupts logic that parses ccm results.
In order to avoid test failures on modern environment we better suppress
them.
It will help to handle rate limit error
Bumps [org.assertj:assertj-core](https://github.com/assertj/assertj) from 3.27.6 to 3.27.7.
- [Release notes](https://github.com/assertj/assertj/releases)
- [Commits](assertj/assertj@assertj-build-3.27.6...assertj-build-3.27.7)

---
updated-dependencies:
- dependency-name: org.assertj:assertj-core
  dependency-version: 3.27.7
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
- Added RequestRoutingType and RequestRoutingMethod enums to define request routing strategies.
- Updated DefaultLoadBalancingPolicy to consider request routing type for replica selection, especially for LWT requests.
- Updated various graph statement classes (BytecodeGraphStatement, DefaultBatchGraphStatement, DefaultFluentGraphStatement, DefaultScriptGraphStatement) to implement getRequestRoutingType method.
- Modified BatchStatementBuilder to set request routing type based on LWT status.
- Enhanced DefaultBatchStatement, DefaultBoundStatement, DefaultPrepareRequest, and DefaultSimpleStatement to include routing type in constructors and methods.
- Added logic to avoid slow replicas based on health checks and in-flight requests.
Introduce RequestRoutingType to distinguish LWT from regular queries and
add a new configuration option to control LWT routing behavior.

The new `advanced.load-balancing-policy.default-lwt-request-routing-method`
option allows choosing between:
- REGULAR: Default shuffling and slow replica avoidance
- PRESERVE_REPLICA_ORDER: Maintains replica order from partitioner

Changes:
- Add RequestRoutingType enum (REGULAR, LWT) to classify requests
- Remove unused RequestRoutingMethod enum from Request interface
- Thread RequestRoutingType through Statement builders and implementations
- Update DefaultLoadBalancingPolicy to route LWT queries according to config
- Add corresponding TypedDriverOption and OptionsMap support
- Update prepared statement creation to detect and mark LWT queries
- Remove RequestRoutingMethod.getRoutingMethod() default method

This enables optimized LWT performance by avoiding unnecessary shuffling
when replica order preservation is beneficial for linearizability.

refactor: update LWT request routing method to preserve replica order
- Fix Javadoc positioning: Move @nonnull annotations after doc comments
  in DefaultLoadBalancingPolicy methods (per Java conventions)
- Add missing @nonnull annotation to StatementBuilderTest mock builder
- Add @nullable annotation to NodeStateIT query plan method signature
- Standardize test infrastructure:
  * Add @RunWith(MockitoJUnitRunner.Silent.class) to 7 test classes
  * Update LoadBalancingPolicyTestBase to stub LWT routing config option
  * Convert base class from @RunWith to abstract (subclasses now declare runner)
- Standardize integration test naming: ccmRule→CCM_RULE, sessionRule→SESSION_RULE
- Update test mocks with RequestRoutingType.REGULAR parameter for compatibility
- Improve LWT integration tests:
  * BatchStatementIT: Fix variable references, enhance LWT batch assertions
  * LWTLoadBalancingIT: Change from single-node to replica-set validation
  * Add LWTLoadBalancingMultiDcIT: New multi-DC LWT routing test coverage

No functional changes to production code—purely code quality and test improvements.

Apply suggestions from code review

Co-authored-by: Dmitry Kropachev <dmitry.kropachev@gmail.com>
commons-lang3 was pulled in transitively via the optional gremlin-core
dependency but used directly in production code (DefaultLoadBalancingPolicy).
Since gremlin-core is optional, consumers would not get commons-lang3
transitively, risking NoClassDefFoundError at runtime.

Replace Pair<Integer, Integer> with int[] in moveReplicasToFront() and
ArrayUtils.addAll() with standard Arrays.copyOf()/System.arraycopy()
in VectorCodecTest.
Fix pre-existing xml-format-maven-plugin check failure.
Bumps [sphinx-multiversion-scylla](https://holzhaus.github.io/sphinx-multiversion/) from 0.3.4 to 0.3.7.

---
updated-dependencies:
- dependency-name: sphinx-multiversion-scylla
  dependency-version: 0.3.7
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Extract hardcoded threadCountClasses=8 into a configurable Maven
property test.parallel.threads. Default remains 8 (no behavior change).
CI can now override with -Dtest.parallel.threads=16.
Most integration tests are I/O-bound (waiting on CCM/Scylla), not
CPU-bound, so higher concurrency helps reduce total test time.
Add test-group matrix (parallelizable, serial, isolated) to both
Cassandra and Scylla integration test jobs in GitHub Actions. Each
group runs in a separate parallel job, reducing wall-clock time from
the sum of all three test phases to the duration of the slowest one.

- Add MAVEN_EXTRA_ARGS support to Makefile integration test targets
- Pass -DskipSerialITs/-DskipParallelizableITs/-DskipIsolatedITs via
  the workflow matrix to select which test group each job runs
- Update artifact names and check names to include test-group
- Remove unused run-tests.sh changes (script is not used by any CI)
Replace CustomCcmRule.builder().build() with CcmRule.getInstance() in
4 test classes that use zero customization, avoiding unnecessary cluster
creation/destruction per test class. Add @category(ParallelizableTests.class)
as required by CcmRule.

Skipped ExecutionInfoWarningsIT because it requires custom Cassandra
configuration (batch_size_warn_threshold_in_kb).
This test uses a mocked Mockito log appender to verify no resource
leaks occur. When run in parallel, other tests' log output contaminates
the appender, causing spurious NeverWantedButInvoked failures.
It must remain isolated (separate JVM fork).
Add @category(ParallelizableTests.class) to 7 integration tests that are
safe to run in parallel:

- LineStringIT, PointIT, PolygonIT: DSE geometry tests using shared
  CcmRule.getInstance() with their own keyspace
- DseAggregateMetadataIT, DseFunctionMetadataIT: DSE schema metadata
  tests using shared CcmRule.getInstance() with their own keyspace
- DriverExecutionProfileReloadIT: Uses isolated SimulacronRule
- DefaultRetryPolicyIT: Uses isolated SimulacronRule

Tests kept serial (not reclassified):
- All graph tests: use CustomCcmRule with DSE graph workloads
- Auth tests: use CustomCcmRule with authentication config
- SSL tests: use CustomCcmRule with SSL config
- Token/metadata tests: use CcmRule.builder() with custom partitioners
- Session add/remove node tests: manipulate CCM nodes
- Metrics tests: explicitly marked as not parallelizable due to
  unsynchronized access to AbstractMetricUpdater.MIN_EXPIRE_AFTER
- HeartbeatDisabledIT: explicitly marked as not parallelizable
- Delete*IT: use CustomCcmRule
DefaultRetryPolicyIT verifies log messages via a mocked appender, which
gets contaminated by other tests running in parallel, causing spurious
TooManyActualInvocations / NeverWantedButInvoked failures.
…e copilot instructions with formatting and build guidelines 🤖
Replace fixed Thread.sleep calls with Awaitility's polling-based
waiting in MetricsITBase, DefaultMetadataTabletMapIT, and
MockResolverIT. This eliminates wasted wait time by polling for
the expected condition instead of sleeping for a fixed duration.

DriverBlockHoundIntegrationIT is left unchanged as its sleep is
intentional for BlockHound testing.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@dkropachev dkropachev closed this Mar 1, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

10 participants