Fix LWT prepared statement routing failure in PRESERVE_REPLICA_ORDER mode#2084
Closed
dkropachev wants to merge 415 commits intoapache:4.xfrom
Closed
Fix LWT prepared statement routing failure in PRESERVE_REPLICA_ORDER mode#2084dkropachev wants to merge 415 commits intoapache:4.xfrom
dkropachev wants to merge 415 commits intoapache:4.xfrom
Conversation
* Pull CCM image before the test * Remove RC images from test list
Log message notifying of closing the previous ControlConnection's channel wrongly mentions current channel. Switch it to the previous instead.
* Changed Cassandra to ScyllaDB where applicable * Updated `DefaultLoadBalancingPolicy` documentation * Updated links to point to ScyllaDB documentations
Removes old Resolver implementations. Removes Resolver "hooks" that were added in core code. Adds org.burningwave:tools dependency as the preferred solution. Adds MultimapHostResolver implementation allowing for one-to-many mapping of hostnames to ips. Adds MultimapHostResolverProvider as a replacement for ResolverProvider. Adjusts MockResolverIT to use new replacement.
* Add `MockResolverIT#cannot_reconnect_with_resolved_socket()` Adds a method for testing the issues that surface after cluster replacements. Due to the variable, sometimes long runtime it is not added to any of the test groups. * Add optional fallback for `ControlConnection#reconnect()` Adds an experimental option to allow `ControlConnection` to try reconnecting to the original contact points held by `MetadataManager`, in case of getting empty query plan from the load balancing policy. In order to separate this logic from query plans of other queries `LoadBalancingPolicyWrapper#newControlReconnectionQueryPlan()` was introduced and is called during reconnection in place of `newQueryPlan()`.
docs: update workflows fix: java version fix: java version fix: java version
This rids us of CVE-2024-47535.
Previously the tablet map would be completely emptied on node removal, necessitating full rebuild of it. With this change the TabletMap will be scanned through on node removals (`RemoveNodeRefresh`) in order to delete only those tablets, that contain removed node as one of the replicas. Additionally we introduce `TabletMapSchemaChangeListener` which will be automatically registered on context initialization as long as schema metadata is enabled. It won't be added if the schema metadata is enabled later at runtime. This listener ensures that relevant tablets will be removed on removals and updates of both keyspaces and tables. The `TabletMapSchemaChangesIT` tests its behaviour. Addresses #377.
Bumps [ch.qos.logback:logback-classic](https://github.com/qos-ch/logback) from 1.2.3 to 1.2.13. - [Commits](qos-ch/logback@v_1.2.3...v_1.2.13) --- updated-dependencies: - dependency-name: ch.qos.logback:logback-classic dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [com.google.guava:guava](https://github.com/google/guava) from 25.1-jre to 32.0.0-jre. - [Release notes](https://github.com/google/guava/releases) - [Commits](https://github.com/google/guava/commits) --- updated-dependencies: - dependency-name: com.google.guava:guava dependency-type: direct:development ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [org.testng:testng](https://github.com/cbeust/testng) from 7.3.0 to 7.5.1. - [Release notes](https://github.com/cbeust/testng/releases) - [Changelog](https://github.com/testng-team/testng/blob/master/CHANGES.txt) - [Commits](testng-team/testng@7.3.0...7.5.1) --- updated-dependencies: - dependency-name: org.testng:testng dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [org.json:json](https://github.com/douglascrockford/JSON-java) from 20230227 to 20231013. - [Release notes](https://github.com/douglascrockford/JSON-java/releases) - [Changelog](https://github.com/stleary/JSON-java/blob/master/docs/RELEASES.md) - [Commits](https://github.com/douglascrockford/JSON-java/commits) --- updated-dependencies: - dependency-name: org.json:json dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* CcmBridge methods adjustments Adds `startWithArgs` methods that allow avoiding the default arguments like "--wait-other-notice". This wait causes timeout in multi-dc test utilizing `join_ring: false` so it needs to be avoided. Adds `addWithoutStart(n, dc)`. As the name suggests it's an add that does not immediately start the node. Adds `updateNodeConfig` in order to allow adding new nodes with different configurations. * Add ZeroTokenNodesIT Adds a test class verifying the behavior of the driver when dealing with zero-token Scylla nodes. Adds a configurable option which controls whether to allow zero-token nodes as valid peers. Until now such nodes were considered invalid. Default value of this option does not change the driver behavior.
…equest-timeout` (#404) Updates the default value of `advanced.metadata.schema.request-timeout` corresponding to `METADATA_SCHEMA_REQUEST_TIMEOUT` TypedDriverOption to 20 seconds. Extends the documentation with information about additional function in Scylla clusters.
Adopt is not maintaned anymore
CI started picking up 2024.2.2 but the feature is not there yet. This causes test failures. Increasing minEnterprise to 2024.2.3. Fixes #413
Previous version has a bug causing an IOException during reading of /etc/os-release. See netty's issue 14479 for details.
* Fetch additional keyspace metadata from scylla_keyspaces Extends schema related classes with methods for querying scylla_keyspaces in order to determine whether keyspace is tablets-enabled. Adjusts schema queries tests and several other integration tests that mock responses from system tables. Adds `isUsingTablets()` and `setUsingTablets()` to KeyspaceMetadata. * Use keyspace metadata in `BasicLoadBalancingPolicy#getReplicas` Modifies the getReplicas method to do the tablet lookup if and only if the keyspace metadata indicates that it's a tablets-based keyspace. Otherwise refer to the token map. Previous behavior was to try tablet map lookup first regardless of the keyspace configuration.
Bumps [sphinx-scylladb-theme](https://github.com/scylladb/sphinx-scylladb-theme) from 1.8.3 to 1.8.4. - [Release notes](https://github.com/scylladb/sphinx-scylladb-theme/releases) - [Commits](scylladb/sphinx-scylladb-theme@1.8.3...1.8.4) --- updated-dependencies: - dependency-name: sphinx-scylladb-theme dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Until now, the versions passed through ccm.version property would be considered invalid if cannot be parsed to one of the known formats. Now if the parsing fails, the property will be passed to ccm commands "as is" which ccm will treat as directory name in local repository. Numbered version will then be fetched through setting up a makeshift cluster and asking ccm for it.
Bumps [sphinx-scylladb-theme](https://github.com/scylladb/sphinx-scylladb-theme) from 1.8.4 to 1.8.5. - [Release notes](https://github.com/scylladb/sphinx-scylladb-theme/releases) - [Commits](scylladb/sphinx-scylladb-theme@1.8.4...1.8.5) --- updated-dependencies: - dependency-name: sphinx-scylladb-theme dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
The changes related to this test are not included in released enterprise versions as of this moment. It should be safe to increase minEnterprise requirement to 2025.1 for now.
) Just like other drivers (Python, GoCQL), the Java 4.x driver can query with this simple WHERE clause added, to improve query performance. Fixes: #282 Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>
Full scan requests slower even when only one record there is
Bumps [io.netty:netty-handler](https://github.com/netty/netty) from 4.1.116.Final to 4.1.118.Final. - [Commits](netty/netty@netty-4.1.116.Final...netty-4.1.118.Final) --- updated-dependencies: - dependency-name: io.netty:netty-handler dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…thub.com/apache/cassandra-java-driver into scylla-4.x-merge-4.18.1 Following conflict resolutions and additional changes were made: - Version was changed to 4.18.1.0-SNAPSHOT - deep-lic-scan.yaml was removed. Likewise we don't have a fossa subscription. - added with modifications previously skipped distribution-source module. Changed groupId, versions and names. - added with modifications previously skipped distribution-tests module. Changed groupId, versions and names. - Skipped license-maven-plugin which was checking for ASF license headers. - removed `<oldArtifacts>` section of revapi plugin configuration pointing to datastax. - Skipped addition of maven-remote-resources-plugin (pom.xml). - Skipped removal of distributionManagement section (pom.xml). We are not using `org.apache` parent pom to define that part. - Switched Scylla specific uses of `AttributeKey.newInstance` to `.valueOf` too - Added `WHERE key='local'` to `system.local` queries in `DefaultTopologyMonitorTest` - Removed irrelevant to Scylla version of the driver parts of the documentation - Merged in translation between older and newer variations of config keys and values. - Discarded some adjustments for upstream's Jenkins setup which conflicted with adjustments for our CI. - Added extra stubs in BasicLoadBalancingPolicyPreferredRemoteDcsTest to match Scylla's modifications.
- Fix Javadoc positioning: Move @nonnull annotations after doc comments in DefaultLoadBalancingPolicy methods (per Java conventions) - Add missing @nonnull annotation to StatementBuilderTest mock builder - Add @nullable annotation to NodeStateIT query plan method signature - Standardize test infrastructure: * Add @RunWith(MockitoJUnitRunner.Silent.class) to 7 test classes * Update LoadBalancingPolicyTestBase to stub LWT routing config option * Convert base class from @RunWith to abstract (subclasses now declare runner) - Standardize integration test naming: ccmRule→CCM_RULE, sessionRule→SESSION_RULE - Update test mocks with RequestRoutingType.REGULAR parameter for compatibility - Improve LWT integration tests: * BatchStatementIT: Fix variable references, enhance LWT batch assertions * LWTLoadBalancingIT: Change from single-node to replica-set validation * Add LWTLoadBalancingMultiDcIT: New multi-DC LWT routing test coverage No functional changes to production code—purely code quality and test improvements. Apply suggestions from code review Co-authored-by: Dmitry Kropachev <dmitry.kropachev@gmail.com>
commons-lang3 was pulled in transitively via the optional gremlin-core dependency but used directly in production code (DefaultLoadBalancingPolicy). Since gremlin-core is optional, consumers would not get commons-lang3 transitively, risking NoClassDefFoundError at runtime. Replace Pair<Integer, Integer> with int[] in moveReplicasToFront() and ArrayUtils.addAll() with standard Arrays.copyOf()/System.arraycopy() in VectorCodecTest.
Fix pre-existing xml-format-maven-plugin check failure.
Bumps [sphinx-multiversion-scylla](https://holzhaus.github.io/sphinx-multiversion/) from 0.3.4 to 0.3.7. --- updated-dependencies: - dependency-name: sphinx-multiversion-scylla dependency-version: 0.3.7 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com>
…to v2.21.1 [security]
Extract hardcoded threadCountClasses=8 into a configurable Maven property test.parallel.threads. Default remains 8 (no behavior change). CI can now override with -Dtest.parallel.threads=16.
Most integration tests are I/O-bound (waiting on CCM/Scylla), not CPU-bound, so higher concurrency helps reduce total test time.
Add test-group matrix (parallelizable, serial, isolated) to both Cassandra and Scylla integration test jobs in GitHub Actions. Each group runs in a separate parallel job, reducing wall-clock time from the sum of all three test phases to the duration of the slowest one. - Add MAVEN_EXTRA_ARGS support to Makefile integration test targets - Pass -DskipSerialITs/-DskipParallelizableITs/-DskipIsolatedITs via the workflow matrix to select which test group each job runs - Update artifact names and check names to include test-group - Remove unused run-tests.sh changes (script is not used by any CI)
Replace CustomCcmRule.builder().build() with CcmRule.getInstance() in 4 test classes that use zero customization, avoiding unnecessary cluster creation/destruction per test class. Add @category(ParallelizableTests.class) as required by CcmRule. Skipped ExecutionInfoWarningsIT because it requires custom Cassandra configuration (batch_size_warn_threshold_in_kb).
This test uses a mocked Mockito log appender to verify no resource leaks occur. When run in parallel, other tests' log output contaminates the appender, causing spurious NeverWantedButInvoked failures. It must remain isolated (separate JVM fork).
Add @category(ParallelizableTests.class) to 7 integration tests that are safe to run in parallel: - LineStringIT, PointIT, PolygonIT: DSE geometry tests using shared CcmRule.getInstance() with their own keyspace - DseAggregateMetadataIT, DseFunctionMetadataIT: DSE schema metadata tests using shared CcmRule.getInstance() with their own keyspace - DriverExecutionProfileReloadIT: Uses isolated SimulacronRule - DefaultRetryPolicyIT: Uses isolated SimulacronRule Tests kept serial (not reclassified): - All graph tests: use CustomCcmRule with DSE graph workloads - Auth tests: use CustomCcmRule with authentication config - SSL tests: use CustomCcmRule with SSL config - Token/metadata tests: use CcmRule.builder() with custom partitioners - Session add/remove node tests: manipulate CCM nodes - Metrics tests: explicitly marked as not parallelizable due to unsynchronized access to AbstractMetricUpdater.MIN_EXPIRE_AFTER - HeartbeatDisabledIT: explicitly marked as not parallelizable - Delete*IT: use CustomCcmRule
DefaultRetryPolicyIT verifies log messages via a mocked appender, which gets contaminated by other tests running in parallel, causing spurious TooManyActualInvocations / NeverWantedButInvoked failures.
…e copilot instructions with formatting and build guidelines 🤖
Extract hardcoded semaphore permit count (5) in SchemaChangeSynchronizer into a configurable system property test.max.concurrent.ddl. This allows tuning DDL concurrency when increasing parallel test thread count, without modifying source code.
The test-integration-* Makefile targets previously ran `mvn verify` across all modules, re-compiling everything unnecessarily. Now we first install all modules to the local repo with -DskipTests, then run verify only on the integration-tests module with -pl.
The workflow never passed SCYLLA_VERSION from the matrix to the resolve-scylla-version make target, so all three Scylla IT variants (LTS-LATEST, LTS-PRIOR, LATEST) silently ran against the LATEST version due to the Makefile default. Fix broken dollar-sign escaping in the LTS-PRIOR scylla-enterprise fallback regex in the Makefile ($ -> $$), which prevented resolution of enterprise-only LTS versions. Additionally, fix the enterprise tablets version threshold from 2023.1.0 to 2024.2.0 in DefaultLoadBalancingPolicyIT and LWTLoadBalancingMultiDcIT. Tablets were introduced in ScyllaDB Enterprise 2024.2.0, not 2023.1.0.
…IT to reuse sessions across tests Convert @rule to @ClassRule for CcmRule and SessionRule so that the CCM cluster and CQL session are created once per test class instead of once per test method. This eliminates repeated cluster startup/teardown overhead across all tests in these classes. For PreparedStatementIT specifically: - Replace DROP+CREATE TABLE with TRUNCATE between tests that don't modify schema, since most tests only need a clean table - Track schema-altering tests (ALTER TABLE, DROP TABLE) with a flag so the table is only recreated when actually needed - Use regular session.execute() for INSERT statements instead of routing them through the slow DDL execution profile - Use try-with-resources in cross-session test to avoid prepare cache interference from the shared session Retain CcmRule as a per-method @rule so that @BackendRequirement, @ScyllaOnly, and @ScyllaSkip annotations are still evaluated per test.
Replace two identical 26-line Thread.sleep(1000) polling loops with a shared waitForAllNodesUp() helper using Awaitility (500ms poll interval, 20s timeout). Reduces timeout from 60s to 20s and uses idiomatic Awaitility consistent with other integration tests. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…mode Addresses issue #830 where LWT prepared statements fail with "No node was available" error when using PRESERVE_REPLICA_ORDER routing (default in 4.19.0.5). Changes: - Enhanced newQueryPlanPreserveReplicas to include non-replica nodes with proper DC prioritization - Added fallback logic to prevent empty query plans for prepared statements - Implemented routing-key-based consistent rotation for non-replica nodes - Replaced diceRoll1d4() with universal randomNextInt(bound) method - Updated tests to validate new behavior including randomness testing The fix maintains replica order preservation while ensuring query plans always contain nodes, resolving the regression introduced in 4.19.0.5.
Contributor
Author
|
Wrong repository - moving to scylladb/java-driver |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fixes issue #830 where LWT prepared statements fail with "No node was available" error when using PRESERVE_REPLICA_ORDER routing method (default in 4.19.0.5).
The root cause was that the new LWT routing logic returned empty query plans when replica information wasn't available (common for prepared statements before parameter binding).
Changes
Core Fix
Enhanced
newQueryPlanPreserveReplicasto implement proper node prioritization:Added proper handling for both
localDc == nullandlocalDc != nullcasesImplemented routing-key-based consistent rotation for deterministic behavior
Replaced
diceRoll1d4()with universalrandomNextInt(bound)for better testabilityTest Updates
randomNextInt()diceRoll1d4()torandomNextInt(4)Behavior Changes
Before: LWT queries with
PRESERVE_REPLICA_ORDERcould fail with empty query plansAfter: LWT queries always include all available nodes with proper DC prioritization
Backward Compatibility
Test Plan
Resolves #830