Jenkinsfile for multibranch support in IBM TAAS jenkins. #4576
### What is the issue
Node crashes during node replacements result in hibernated nodes that cannot join the cluster anymore due to a lack of SYN messages from seeds.
### What does this PR fix and why was it fixed
Ports DB-1482, which allows using a JMX endpoint on a seed to bring the hibernated node back to the gossiping candidate list. Tested via: datastax/cassandra-dtest#75.
Fixes: riptano/cndb#13196
### What is the issue
We were incorrectly sharing a file reader across threads in BM25 queries, which can lead to invalid results, as I reproduced in the test.
### What does this PR fix and why was it fixed
The PR fixes the issue by creating a `DocLengthsReader` object per query.
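The shape of the fix is per-query ownership of the positional reader rather than a shared instance. Below is a minimal, self-contained Java sketch of that pattern; the `DocLengthsReader` and factory names here are stand-ins, not the actual SAI types.

```java
// Illustrative only: each query opens and closes its own reader instead of reusing a
// shared instance whose internal file position other threads could move concurrently.
final class PerQueryReaderSketch
{
    interface DocLengthsReader extends AutoCloseable
    {
        int docLength(int docId);
    }

    interface ReaderFactory
    {
        DocLengthsReader open();
    }

    static long totalDocLength(ReaderFactory factory, int[] docIds) throws Exception
    {
        try (DocLengthsReader reader = factory.open())   // per-query instance, thread-confined
        {
            long total = 0;
            for (int docId : docIds)
                total += reader.docLength(docId);
            return total;
        }
    }
}
```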
### What is the issue
Fixes riptano/cndb#13203. The underlying problem is that the `OutboundConnection` needs to store the message version from the peer. Otherwise, when the local `current_version` is higher than the remote `current_version`, we store the wrong value (the local value, not the agreed-upon value). The PR also fixes an issue in the ANN_OPTIONS serialization logic that we can hit in rare cases where the peers haven't yet connected, so the validation logic that would prevent serialization due to a peer having too low a message version is skipped.
### What does this PR fix and why was it fixed
This PR fixes several issues related to upgrades by improving handshake logic.
Co-authored-by: Sergio Bossa <sergio.bossa@gmail.com>
### What is the issue
CNDB-12599
### What does this PR fix and why was it fixed
Exposes the space requirements of compaction picks and compaction tasks for CNDB's compaction space management.
Fixes riptano/cndb#12683. Table names are used in file names. Since table name length was not validated, creating or flushing a table with a name that is too long fails because the resulting file name is too long. There are two cases where table names are used in file names:
- keyspace_name.table_name-controller-config.JSON
- table_name-32chars_table_id

The maximum allowed file name length is 255 chars. Thus:
- keyspace and table names that are together longer than 231 chars will fail;
- a table name that is longer than 222 chars will fail.

This PR checks that newly created tables do not have names that are too long. New tables are identified through the query's client state, which should not be internal. The limit is 222 chars for combined keyspace and table names. This is more restrictive than necessary, but is easier to explain in the documentation. The change is tested in CNDB: creating new tables with long names is prevented, while existing tables keep working.
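As a rough illustration of the described check (not the actual patch), here is a standalone sketch of the length validation under the stated 222-character combined limit:

```java
// Hypothetical sketch: reject new tables whose keyspace + table name would overflow
// the 255-character file name limit; 222 is the combined limit quoted above.
public final class TableNameLengthCheck
{
    static final int MAX_KEYSPACE_AND_TABLE_NAME_LENGTH = 222;

    static void validateNewTable(String keyspace, String table)
    {
        int combined = keyspace.length() + table.length();
        if (combined > MAX_KEYSPACE_AND_TABLE_NAME_LENGTH)
            throw new IllegalArgumentException(
                "Combined keyspace and table name length " + combined +
                " exceeds the limit of " + MAX_KEYSPACE_AND_TABLE_NAME_LENGTH +
                " imposed by the 255-character file name limit");
    }

    public static void main(String[] args)
    {
        validateNewTable("ks", "short_table");     // ok
        validateNewTable("ks", "x".repeat(230));   // throws IllegalArgumentException
    }
}
```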
…o datastax/cassandra-ccm
…essaging version (#1624) This makes sure the proper version is used: following CNDB-13203, each connection maintains its own version, separate from the one recorded in the MessagingService.
### What is the issue
`ANNOptions#validate()` was previously checking endpoint versions via `MessagingService#versions`, but following #13203, in order to resolve a race condition, each endpoint connection tracks its own `EndpointMessagingVersions` object with the version negotiated during handshake, while `MessagingService#versions` is updated with the max version of each endpoint. It follows that checking `MessagingService#versions` might not reflect the version actually used for the connection.
### What does this PR fix and why was it fixed
This PR introduces a new `MessagingService` method to check the connection version, and uses it inside `ANNOptions#validate()`.
Trying to create an SAI or other index on a column whose name is either long or contains non-alphanumeric characters fails, even though the table was successfully created with such a name. This happens if an index name is not provided. The reason for this limitation is that the column name is used for generating the default index name, which is used in file names. Fixes riptano/cndb#12503, fixes riptano/cndb#12609.

This PR removes the check on column names during index creation, thus allowing indexes on existing columns with any name, including qualified names and long names.
1. The check was preventing qualified column names with characters that cannot be used in file names. This was never an issue, as the index name construction from table and column names already removes any characters that are not alphanumeric or underscores.
2. The check was preventing long column names. This limitation is removed by:
   - adding a calculation of the maximum number of characters that can be used in an index name, so that constructed file names do not exceed the limit of 255 chars (255 is added as a constant);
   - then truncating the index name.
3. The uniqueness of index names is already handled by the existing implementation, which adds a counter prefix when needed.

This change to the generated index name does not affect existing indexes, as the existing name from `IndexMetadata` is always serialized to JSON, thus avoiding re-generating the name. This fix required fixing the CQL tester, since it was assuming that the restriction is in place; the column name regex pattern matcher in `createIndex` is fixed to allow unrestricted column names. There might still be corner cases where some column names used for generating index names will not work with `createIndex`. In such cases `execute` should be used. It also includes several fixes in affected files:
- Improve the name and usage of constants.
- Fix warnings with minor improvements in affected files.
The goal of this commit is to allow low-level code, typically at the level of file reads (say, a custom `FileChannel` implementation), to obtain context on which higher-level operation they correspond to. A typical use case may be tiered storage, where `FileChannel` implementations may forward reads to remote storage, and where having context on the overall operation could help, say, account for prior work when setting timeouts, or aggregate metrics per "user-level" operation (rather than per `FileChannel` operation). To do this, we reuse the existing `ExecutorLocals` mechanism, and simply have the top-level operation set up the proper context as an `ExecutorLocal`, allowing it to be accessed by any low-level operations running on behalf of that operation. This is, in many ways, similar to tracing, but instead of a `TraceState` that collects what happens during the operation, it is a relatively flexible notion of `OperationContext`. As of this patch, this feature is fairly barebones and mostly exists for extensions, in the following sense:
1. Only user reads (`ReadCommand` execution) currently set up such a context. The code is written in such a way that adding support for other operations should be easy, but this is not done.
2. The context set by reads has by default barely any information: it merely has a `toString()` method that can roughly tell what the operation itself is, and so could be useful for debugging, but not much more. Further, that context is not read by anything in this patch. However, said contexts are created through a "factory" and the factory class is configurable through a system property. So extensions can override the factory in order to create contexts with more information/state, and they fetch/use those contexts where they see fit.
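To make the mechanism concrete, here is a minimal, self-contained Java sketch of the general pattern: a per-operation context carried in a thread-local, with a factory class loaded from a system property. The class and property names are illustrative assumptions and do not match the actual Cassandra `ExecutorLocals`/`OperationContext` code.

```java
// Illustrative sketch only; the real implementation piggybacks on ExecutorLocals.
public final class OperationContextSketch
{
    interface OperationContext { }

    interface Factory { OperationContext forRead(String description); }

    // Stand-in for an ExecutorLocal: low-level code reads CURRENT.get().
    static final ThreadLocal<OperationContext> CURRENT = new ThreadLocal<>();

    static Factory loadFactory()
    {
        // Hypothetical property name; extensions would point it at their own factory class.
        String cls = System.getProperty("example.operation_context_factory");
        if (cls == null)
            return desc -> new OperationContext()
            {
                @Override public String toString() { return desc; }
            };
        try
        {
            return (Factory) Class.forName(cls).getDeclaredConstructor().newInstance();
        }
        catch (Exception e)
        {
            throw new IllegalStateException("Cannot load context factory " + cls, e);
        }
    }

    public static void main(String[] args)
    {
        OperationContext ctx = loadFactory().forRead("ReadCommand on ks.tbl");
        CURRENT.set(ctx);      // the top-level operation installs the context
        try
        {
            // ... low-level code (e.g. a FileChannel wrapper) can consult CURRENT.get()
            System.out.println("read issued on behalf of: " + CURRENT.get());
        }
        finally
        {
            CURRENT.remove();
        }
    }
}
```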
patch by Berenguer Blasi; reviewed by Andres de la Peña for CASSANDRA-17333
…as causing test flakiness.
### What is the issue
When a node gets replaced, it announces itself as hibernated (one of the silent-shutdown states). When the node replacement fails, because the other nodes see the replacing node in that state, the replacing node won't receive any gossip messages from the seed at a subsequent startup, which ends up in an exception.
### What does this PR fix and why was it fixed
This patch adds an explicit shutdown announcement via gossip to let other nodes know that the node was explicitly shut down, as it was due to the exception. That allows other nodes (seeds in particular) to contact the replacing node at its next startup, thus allowing it to retry the replacement.
When a wildcard selection is used, the score column
must be included in the selection.
Fixes:
NoHostAvailable: ('Unable to complete the operation against any hosts', {<Host: 10.87.217.86:9042 us-central1>: <Error from server: code=0000 [Server error] message="java.lang.ClassCastException:
class org.apache.cassandra.serializers.UTF8Serializer cannot be cast to class org.apache.cassandra.db.marshal.VectorType$VectorSerializer
(org.apache.cassandra.serializers.UTF8Serializer and org.apache.cassandra.db.marshal.VectorType$VectorSerializer are in unnamed module of loader org.apache.felix.framework.BundleWiringImpl$BundleClassLoader @446c3920)">})
Reject creating indexes with analysis options on frozen collections. We don't have a way to correctly index them, and we don't support querying them either.
Part of riptano/cndb#13375
### What is the issue
Using the keyspace name length in the error message for a too-long table name is confusing, since the length might include CNDB's generated prefix, which is invisible to users.
### What does this PR fix and why was it fixed
Reformulates the error message for a too-long table name to avoid using the keyspace name length, since it might contain a CNDB-generated prefix that is not visible to users.
Creating an index with a too-long index name leads to file errors or incorrect results. This PR prevents creating new (i.e., non-internal) indexes with provided index names longer than 182 characters. This constant is based on the most restrictive requirement across the different index types: it comes from SAI, by subtracting the longest file name addition from the file name limit of 255 characters.
InetAddressAndPort fields were previously InetAddress, and serialised as either byte[] addresses or text hostnames. Adds two unit tests: simple serdes of InetAddress to InetAddressAndPort, and parsing of DSE-6 files (kept in `test/data/legacy-metadata/dse-6.8/nodes`).
The 4th component can be used to distinguish versions which introduce breaking changes; in particular, we want to retain the appropriate version ordering.
…1641)
### What is the issue
Fixes riptano/cndb#13403 by implementing one of the solutions.
### What does this PR fix and why was it fixed
The current synthetic column logic relies on convention to know that the synthetic column's source column is the SAI ORDER BY column. This seems fragile and will break if we find a reason to introduce a new synthetic column. I propose that we serialize the source column name so that we can more easily support multiple synthetic columns. The urgency for this change is that we want to get it in before we are locked into a given serde protocol. Note, however, that I haven't solved the primary issue described in riptano/cndb#13402. CNDB tests: riptano/cndb#13418
…re to use SSTableReader and add more logging to index tracking (#1644)
…NOT AVAILABLE when they cannot query because an index is building. We add a new reason, `RequestFailureReason.INDEX_BUILD_IN_PROGRESS`, to differentiate indexes that are currently building in error messages. Co-authored-by: Sergio Bossa <sergio.bossa@gmail.com>
…ead of the query analyzer (#1615) Fix `RowFilter.AnalyzableExpression#numFilteredValues`, which is used by the `query_filters` guardrail, to use the query analyzer instead of the index analyzer. That prevents the guardrail from being triggered incorrectly.
### What is the issue
The customer requested an upgrade for some libraries (CVEs).
### What does this PR fix and why was it fixed
This PR upgrades to the latest stable versions:
- snappy to 1.1.10.7, fixing CVE-2023-43642
- guava to 33.4.0-jre, fixing CVE-2023-2976 and CVE-2020-8908
- jackson to 2.18.0, fixing CWE-400
- snakeyaml to 2.4, fixing CVE-2022-1471
…ing a row (#1653)
### What is the issue
See CNDB-13480.
### What does this PR fix and why was it fixed
Exclude deleted rows while computing term frequencies and preparing to compute the scores.
…on queries (#1638) Queries with index-based ordering (ANN, BM25, generic `ORDER BY`) don't support CQL aggregation, such as `GROUP BY` or `SELECT COUNT(*)`. The reason is that the coordinator internally uses paging to fetch more results from the replicas until the aggregated limit is satisfied. Since top-k index-based ordering doesn't support paging, these queries will fail. We should reject these queries until we have a way to support them.
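For illustration only (the class name and exception type are hypothetical, not the actual CQL validation code), the rejection amounts to a guard of this shape:

```java
// Reject the combination described above: index-based ordering cannot be paged,
// so aggregation (GROUP BY / COUNT) cannot be satisfied by fetching more pages.
final class TopKAggregationGuard
{
    static void validate(boolean usesIndexOrdering, boolean isAggregation)
    {
        if (usesIndexOrdering && isAggregation)
            throw new IllegalArgumentException(
                "Aggregation queries (GROUP BY, COUNT, ...) are not supported together with " +
                "ANN/BM25/ORDER BY index-based ordering");
    }
}
```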
…1200) Add support for updating a row's terms in the `TrieMemoryIndex`. There are two general cases for row-to-term mappings: 1:1 and 1:many. In the 1:1 case, the update is trivial: we check before insertion that we have a new value, then we insert the new value and remove the old value. In the 1:many case, the solution is somewhat bespoke, but is well documented and encapsulated. The main challenge is deciding when to add/remove terms without increasing the overall time complexity of the update operation. In this design, the majority of the complexity is in the PrimaryKeys object. There, we monitor seen keys and use whether we've seen the key to decide whether to reset the frequency counter, and then subsequently to decide whether to leave the key or remove it (the term could be in both the old and the new values). This whole design hinges on the fact that we synchronize additions to the trie. The one drawback in terms of performance is that collections are added and removed within the sync block. This change is necessary to deal with encoding nuances unless we want to largely refactor the class. The motivation for needing updates is to ensure that we have unique primary keys in the postings list for scalar values and for a given map key. Supporting updates means that the order of terms within an index is internally consistent.
### What is the issue
Fixes riptano/cndb#13446
### What does this PR fix and why was it fixed
This PR fixes an issue in the search-then-sort path where we never apply the query term filter. The change is quite simple: we were already scoring these terms, so we just need to filter out the ones that do not have a frequency of at least 1 for each query term.
…be indexed with a non-analyzed index (#1656)
### What is the issue
... ORDER BY queries return wrong results if we have an analyzed index, as reported in CNDB-13464.
### What does this PR fix and why was it fixed
... ORDER BY should not work with an analyzed index at all. This PR fixes that and improves the error message.
### What is the issue
The original AA SAI format was writing to disk with a different byte-comparable version depending on the version of the sstable files. This causes problems with the ByteComparable version checking that was recently introduced.
### What does this PR fix and why was it fixed
- Introduces a new `ByteComparable.Preencoded` interface meant to store and surface the byte-comparable version something was encoded with.
- Changes tries to always return the `Preencoded` type, so that callers may easily query the version.
- Changes SAI code to use the correct byte-comparable version when that is necessary, and to ignore or use Preencoded's version when it is not feasible or important.
Work done in HCD-82 was further refined in CASSANDRA-20368 and CASSANDRA-20450. This backports C* changes in order to minimize the diff in future backports.
…ows/tombstones (#2132) The metrics partitionsRead, rowsFiltered, rowsPreFiltered and shadowedPrimaryKeyCount, which don't always work as expected, are replaced by these metrics:
* keysFetched: Number of partition/row keys that will be used to fetch rows from the base table.
* partitionsFetched: Number of live partitions fetched from the base table, before post-filtering and sorting. Note that currently ANN fetches one partition per index key, without any grouping of same-partition clusterings.
* partitionsReturned: Number of live partitions returned to the coordinator, after post-filtering and sorting.
* partitionTombstonesFetched: Number of deleted partitions found when reading the base table.
* rowsFetched: Number of live rows fetched from the base table, before post-filtering and sorting.
* rowsReturned: Number of live rows returned to the coordinator, after post-filtering and sorting.
* rowTombstonesFetched: Number of deleted rows or row ranges found when reading the base table.

StorageAttachedIndexSearcher is modified to use the command timestamp, rather than getting the current time every time a query timestamp is needed, which was possibly buggy and inefficient.
While working on CNDB-15608, IntelliJ lint complaints were noticed which are not related to the actual changes in the patch port. Thus I fix them in this separate commit to avoid unnecessary noise while working on the actual patch port. Many of the changes align with what I have already merged earlier in other PRs. Some of the changes might not match the preferences of others, and I am open to discussion. The changes include:
- Remove unused imports
- Use the formatter of the logger instead of string concatenation
- Use method references instead of lambdas
- Remove unnecessary suppression of resource warnings
- Simplify Boolean conditions
- Remove unnecessary modifiers in interfaces
- Fix typos
- Fix links in javadoc comments
- Add static modifier to nested classes
- Remove class fields when not used
- Remove unnecessary throws in method signatures
- Use final when recommended
- Remove unused method arguments
- Replace single-char strings with chars
- Remove unnecessary null variable initialization
- Replace assert true with assert equals
- Change order of assert arguments to have the expected value first
- Remove unnecessary explicit casting
PartitionAwarePrimaryKeyMap implements an overcomplicated `ceiling` method calling `exactRowIdOrInvertedCeiling`. This commit simplifies `PartitionAwarePrimaryKeyMap.ceiling` to use the corresponding method from the reader directly. This can be seen as a follow-up to https://github.com/datastax/cassandra/pull/1096/files#diff-c5011580ab9b0d99d9e504570c4cccb152221d3dbe62c8a956e83fce9070b380
…ands (#2147) Add information about fetched/returned/deleted partitions and rows to the logging reports for non-SAI slow queries. There is a new system property named `cassandra.monitoring_execution_info_enabled` to disable this feature, which is enabled by default.
Extends the IndexQuerySupport framework with new test scenarios mixing row-aware and non-row-aware index versions. In the added scenarios, rows are first inserted and flushed into AA indexes, then the node(s) are upgraded to a row-aware SAI index version, and finally more rows are inserted and flushed. Then the existing workloads are executed against those mixed-version indexes. Compaction of mixed-version indexes is also tested. This commit does not add new test queries or new data models, but only reuses the existing tests we had that were executed against a single version at a time.
…2150)
### What is the issue
CNDB-16135: Descriptor uses different instances of `Path directory` for the same table as a constructor parameter.
### What does this PR fix and why was it fixed
Use the table directory from `Directories` as the `Descriptor` constructor parameter. `Descriptor#directory` is still a different instance in C* due to `directory#toCanonical()`, which always creates a new instance on the local file system.
### What is the issue
UCS settings files are not dropped after the table gets dropped. Instead they are supposed to be cleared after the node restarts. The cleanup is faulty though, and it prevents the node from starting up. Root cause: the cleanupControllerConfig() method in CompactionManager attempts to verify whether a table exists by calling getColumnFamilyStore(). When the table is dropped, this method throws IllegalArgumentException, which was not being caught. The existing catch block only handled NullPointerException (for a missing keyspace).
### What does this PR fix and why was it fixed
Extended the exception handler to catch both NullPointerException and IllegalArgumentException, allowing orphaned controller-config.JSON files to be properly identified and deleted during node restart.
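A minimal, self-contained sketch of the widened exception handling described here; the method names and the lookup stub are illustrative stand-ins for the actual CompactionManager code:

```java
import java.nio.file.Files;
import java.nio.file.Path;

final class ControllerConfigCleanupSketch
{
    static void cleanupControllerConfig(Path configFile, String keyspace, String table)
    {
        try
        {
            lookupColumnFamilyStore(keyspace, table);  // throws if the keyspace or table no longer exists
        }
        catch (NullPointerException | IllegalArgumentException e)
        {
            // Previously only NullPointerException (missing keyspace) was handled; a dropped
            // table throws IllegalArgumentException, leaving the config file orphaned and
            // failing startup. Catching both lets the orphaned file be deleted.
            try
            {
                Files.deleteIfExists(configFile);
            }
            catch (Exception io)
            {
                // best-effort cleanup; ignore
            }
        }
    }

    // Stub standing in for getColumnFamilyStore(): always reports the table as unknown.
    static void lookupColumnFamilyStore(String keyspace, String table)
    {
        throw new IllegalArgumentException("Unknown table " + keyspace + '.' + table);
    }
}
```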
We don't need all the counters in QueryContext to be LongAdder, since they are only accessed by the query's thread. They used to be plain longs until VECTOR-SEARCH-29 parallelized vector searches, making them accessed by multiple threads. However, that parallelization was removed by the BM25 patch. The system properties USE_PARALLEL_INDEX_READ and PARALLEL_INDEX_READ_NUM_THREADS are removed, since they have been no-ops since the introduction of BM25. Since we don't reject unknown system properties, setting them will remain a no-op.
Ports CASSANDRA-19483, which added an option to group invalid legacy protocol magic exceptions with a no-spam logger. The test was slightly modified: it contains a timeout for log watching and ensures the logs start from the mark that happens before the exception, so that the test does not hang without the fix it covers. The motivation behind the port is to fix https://datastax.jira.com/browse/DSP-24639
…ose rowIdIterator (#2183)
### What is the issue
Fixes: riptano/cndb#16336
### What does this PR fix and why was it fixed
In SAI, when we handle already created iterators, we need to close them on the exceptional path. This is not the most aesthetic practice, but it is the current one, so this change follows the current standard of catching the throwable, calling close, and rethrowing the throwable. In this case, we hit an exception when reading from disk for `hasNext`, but there are a couple of spots where exceptions could occur, so I wrapped the whole block.
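For reference, a self-contained sketch of the catch-close-rethrow pattern the fix follows; the `CloseableIterator` type here is a stand-in, not the actual SAI interface:

```java
import java.io.Closeable;
import java.io.IOException;

final class CloseOnErrorSketch
{
    interface CloseableIterator<T> extends Closeable
    {
        boolean hasNext();
        T next();
    }

    static <T> T firstOrNull(CloseableIterator<T> iter)
    {
        try
        {
            // hasNext()/next() may hit disk and throw at several points
            return iter.hasNext() ? iter.next() : null;
        }
        catch (Throwable t)
        {
            try
            {
                iter.close();                   // release resources before propagating
            }
            catch (IOException suppressed)
            {
                t.addSuppressed(suppressed);
            }
            if (t instanceof RuntimeException)
                throw (RuntimeException) t;
            if (t instanceof Error)
                throw (Error) t;
            throw new RuntimeException(t);      // body throws no checked exceptions, so this is a safety net
        }
    }
}
```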
…tionGraph to index multiple columns (#2177)
### What is the issue
Fixes riptano/cndb#16252
### What does this PR fix and why was it fixed
Add a new method to the `IndexComponents.ForWrite` interface named `tmpFileFor`. The new method is expected to create temporary files that are namespaced to the index build. This fixes an issue in the `CompactionGraph` where we incorrectly created temp files with the same name in such a way that they collided. Now the files have the column name and the build id, so they will be properly namespaced while maintaining some form of meaningful name.
ULID-based SSTable ID generation can fail with an NPE when generating a new ID. The root cause is that the underlying ULID generator can return an empty Optional when the clock is moved backwards to before the previously generated ID, or in certain rare overflow conditions when timestamps collide. If it's our first time through the generation loop, we prematurely exit with a null newVal. Top of the error stack:
```
java.lang.NullPointerException
  at org.apache.cassandra.utils.TimeUUID.approximateFromULID(TimeUUID.java:58)
  at org.apache.cassandra.io.sstable.ULIDBasedSSTableId.<init>(ULIDBasedSSTableId.java:52)
  at org.apache.cassandra.io.sstable.ULIDBasedSSTableId$Builder.lambda$generator$0(ULIDBasedSSTableId.java:129)
```
This can cause a flush to fail. The fix is to continue looping until newVal gets a value. The loop can spin until the corrected time catches up to the time of the most recently used ULID generation ID; this should be a short duration in a healthy cluster without large time corrections from sync. Tests are added in ULIDBasedSSTableIdGeneratorTest. A package-protected constructor is introduced for ULIDBasedSSTableIdGeneratorTest.testGeneratorRetryOnEmptyOptional(). Cassandra applicability: upstream doesn't have ULIDBasedSSTableId (and won't, because of CASSANDRA-17048).
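A toy, self-contained sketch of the retry loop described above; the `nextUlid()` generator here is a stand-in that occasionally returns an empty Optional, not the real ULID generator:

```java
import java.util.Optional;
import java.util.concurrent.ThreadLocalRandom;

final class UlidRetrySketch
{
    // Stand-in generator: may yield no value, e.g. when the clock moved backwards.
    static Optional<Long> nextUlid()
    {
        return ThreadLocalRandom.current().nextInt(4) == 0
               ? Optional.empty()
               : Optional.of(System.currentTimeMillis());
    }

    static long generate()
    {
        Long newVal = null;
        while (newVal == null)                  // keep looping instead of exiting with null
            newVal = nextUlid().orElse(null);
        return newVal;
    }

    public static void main(String[] args)
    {
        System.out.println(generate());
    }
}
```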
The patch in CNDB-15608 will remove the token from the partition key serialization, so the case where only the token is provided in the primary key needs to be treated separately. Currently the token-only primary key case is conflated with complete or deferred primary keys. This commit implements a clear separation for token-only primary keys by introducing a dedicated `TokenOnlyPrimaryKey` class. It also implements an `isOnlyToken()` method, which is used instead of comparing the partition key with null in appropriate cases (e.g., when a deferred partition key was loaded). New tests are added to exercise `TokenOnlyPrimaryKey` more and improve code coverage.
…2156)
### What is the issue
The ConcurrentHashMap backing the ChunkCache sometimes falls into treeification (O(log n) access time instead of O(1)). Full context in https://github.com/riptano/cndb/issues/16243
### What does this PR fix and why was it fixed
* Improve the hashCode of the Key to prevent collisions
* Implement Comparable<Key> in case the ConcurrentHashMap falls back into tree mode
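An illustrative key class showing both parts of the fix in isolation (this is not the actual ChunkCache Key, and the fields are assumptions): a hashCode that mixes all fields to spread entries across bins, and a Comparable implementation so that a treeified bin can order keys instead of degenerating into a scan of hash-equal entries.

```java
final class Key implements Comparable<Key>
{
    final String path;     // assumed fields: file path and chunk position
    final long position;

    Key(String path, long position)
    {
        this.path = path;
        this.position = position;
    }

    @Override
    public int hashCode()
    {
        // mix both fields rather than relying mostly on one, reducing bin collisions
        return 31 * path.hashCode() + Long.hashCode(position);
    }

    @Override
    public boolean equals(Object o)
    {
        return o instanceof Key
               && ((Key) o).position == position
               && ((Key) o).path.equals(path);
    }

    @Override
    public int compareTo(Key other)
    {
        // gives treeified ConcurrentHashMap bins a total order to search by
        int cmp = path.compareTo(other.path);
        return cmp != 0 ? cmp : Long.compare(position, other.position);
    }
}
```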
…sort (#2184)
### What is the issue
Fixes: riptano/cndb#16344
### What does this PR fix and why was it fixed
When `ANN_USE_SYNTHETIC_SCORE` is true, we send the score from the replica to the coordinator. We need that score to be the full-precision score to ensure correct ordering. However, when a query supplies `rerank_k: 0`, the score is approximate and can produce out-of-order results, as the test demonstrates. The fix is to configure the `NodeQueueRowIdIterator` appropriately to indicate when the score is in fact approximate. The full-precision score is only computed when `ANN_USE_SYNTHETIC_SCORE` is true.
### What is the issue
Exclusion of *.SF files from jars that get packaged into the dtest-jar does not work as expected. As a result, if a jar contains a *.SF file, like for instance the org.eclipse.jdt:ecj dependency, then the `ECLIPSE_.SF` file gets packaged into the dtest-jar. In dependent projects like HCD, this results in the dtest-jar missing from the classpath, which causes runtime exceptions.
### What does this PR fix and why was it fixed
This patch ports the dtest jar packaging code from C* 5, which unpacks all the jars, filters the files to remove the undesired metadata, and finally repackages the classes into a fat jar. CNDB PR: riptano/cndb#16382
Add execution information to the log reports for aborted queries. The extension of the log messages is identical to what we did for slow successful queries in CNDB-15260, but applied to aborted queries. For regular queries this information is the number of fetched/returned/deleted partitions and rows. For SAI queries, this information is the SAI query metrics and a tree view of the SAI query plan. This also renames the CassandraRelevantProperties related to monitoring, to align them with ASF/CC5. The renames are:
- SLOW_QUERY_LOG_MONITORING_REPORT_INTERVAL_IN_MS -> MONITORING_REPORT_INTERVAL_MS
- SLOW_QUERY_LOG_MONITORING_MAX_OPERATIONS -> MONITORING_MAX_OPERATIONS
- SLOW_QUERY_LOG_EXECUTION_INFO_ENABLED -> MONITORING_EXECUTION_INFO_ENABLED
- SAI_SLOW_QUERY_LOG_EXECUTION_INFO_ENABLED -> SAI_MONITORING_EXECUTION_INFO_ENABLED

Please note that the real underlying system properties are unchanged; this only renames their enum names. The feature is toggled with the cassandra.monitoring_execution_info_enabled system property for general queries, and cassandra.sai.monitoring_execution_info_enabled for SAI queries. The per-writer endpoints to enable/disable slow query logging are http://<HOST>:<PORT>/writer/api/v0/nodetool/cassandra_relevant_property?property=MONITORING_EXECUTION_INFO_ENABLED&new_value=<false|true> for general queries, and http://<HOST>:<PORT>/writer/api/v0/nodetool/cassandra_relevant_property?property=SAI_MONITORING_EXECUTION_INFO_ENABLED&new_value=<false|true> for SAI queries.
…able to enable/disable TableStateMetrics (#2119)
When a memory index contains very few rows and is split into many shards, we can expect a lot of variance in the number of rows between the shards. Hence, if we took only one shard to estimate the number of matched rows and extrapolated that over all shards to compute the estimated matching rows for the whole index, we would risk making a huge estimation error. This commit changes the algorithm to take as many shards as needed to collect enough returned or indexed rows. For very tiny datasets it is likely to use all shards for estimation. For big datasets, one shard will likely be enough, speeding up estimation. This change also allows removing one estimation method: we no longer need to manually choose between the estimation from the first shard and from all shards. Additionally, the accuracy of estimating NOT_EQ rows has been improved by letting the planner know that the union generated by NOT_EQ is disjoint, so the result set cardinality is the sum of the cardinalities of the subplans. The commit also contains a fix for a bug that caused some non-hybrid queries to be counted as hybrid by the query metrics. Unused keyRange parameters have been removed from the methods for estimating row counts in the index classes.
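To make the sampling idea concrete, here is a toy, self-contained sketch (the names, numbers, and threshold are made up, not the planner's actual code): shards are added to the sample until enough indexed rows have been seen, and the sampled match ratio is then extrapolated to the whole index.

```java
final class ShardSamplingEstimateSketch
{
    static double estimateMatches(long[] matchedPerShard, long[] indexedPerShard, long minSampledRows)
    {
        long sampledMatches = 0;
        long sampledRows = 0;
        int used = 0;
        // sample shards until we have seen enough rows to trust the ratio
        while (used < indexedPerShard.length && sampledRows < minSampledRows)
        {
            sampledMatches += matchedPerShard[used];
            sampledRows += indexedPerShard[used];
            used++;
        }

        long totalRows = 0;
        for (long rows : indexedPerShard)
            totalRows += rows;

        // extrapolate the sampled match ratio to the whole index
        return sampledRows == 0 ? 0 : (double) sampledMatches / sampledRows * totalRows;
    }

    public static void main(String[] args)
    {
        long[] matched = { 3, 0, 1, 2 };
        long[] indexed = { 10, 5, 8, 7 };
        // a tiny index: three shards are sampled before the 20-row threshold is reached
        System.out.println(estimateMatches(matched, indexed, 20));
    }
}
```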
…ut. (#2194)
### What is the issue
There are two different ways in which network timeouts are detected during repair operations: there is a configured count-down latch which will fail if the requests take too long, and there is a separate timeout returned by the underlying messaging system. Unfortunately, when the underlying messaging system timed out, it was being treated as a general network error instead of a timeout. The net result is that _very rarely_ the network will time out before the count-down latch, and some tests in the CI build will fail with an incorrect error message.
### What does this PR fix and why was it fixed
This resolves riptano/cndb#14687. The main motivation is flaky unit tests, but the end user will also see a more consistent error message in the event of a network timeout that is detected at a lower level than the latch timeout.
…Written (#2181)
### What is the issue
riptano/cndb#16330
### What does this PR fix and why was it fixed
This removes the lock in `PartialLifecycleTransaction::trackNewWritten` around the main transaction's `trackNewWritten`, because `RemoteLogTransaction::trackNewWritten` performs the upload of SSTables to remote storage, which takes a long time; holding the lock therefore leads to contention and prevents parallel uploads to remote storage.
…ost (#2189)
### What is the issue
Fixes: riptano/cndb#16350
### What does this PR fix and why was it fixed
ChronicleMap gives us several ways to use lower-level APIs to avoid deserializing keys/values and the associated allocation that comes with them. The first key thing to mention is that iteration is very expensive as these maps get big, so we want to avoid it if possible. The second is that if we use the typical map iteration methods, they deserialize the key and the value eagerly. Since the key is typically a high-dimensional vector, it is valuable to avoid such deserialization. This change:
* Removes unnecessary iteration, leveraging the fact that compaction is additive
* Replaces `forEach` with `forEachEntry`, which gives better semantics
* Updates the `maybeAddVector` method to avoid serializing the vector key twice by using the `searchContext`. The `ChronicleMap#put` method uses this pattern internally.

I added two sets of benchmarks; however, `VectorCompactionBench` doesn't seem to register the benefit of the ChronicleMap change, likely because ChronicleMap's cost isn't as expensive as graph construction. I am leaving `VectorCompactionBench` in place since it is still useful. Here are some of the benchmark results. They show between a 50x and 100x improvement, and the improvement seems to increase as we build larger graphs.

Benchmark results before the change:
```
[java] Benchmark (dimension) (numVectors) Mode Cnt Score Error Units
[java] V5VectorPostingsWriterBench.createGenericIdentityMapping 768 100000 avgt 5 271.569 ± 3.473 ms/op
[java] V5VectorPostingsWriterBench.createGenericIdentityMapping 768 1000000 avgt 5 5452.393 ± 227.905 ms/op
[java] V5VectorPostingsWriterBench.createGenericIdentityMapping 1536 100000 avgt 5 1392.607 ± 30.388 ms/op
[java] V5VectorPostingsWriterBench.createGenericIdentityMapping 1536 1000000 avgt 5 11496.696 ± 345.886 ms/op
[java] V5VectorPostingsWriterBench.describeForCompactionOneToMany 768 100000 avgt 5 242.049 ± 20.708 ms/op
[java] V5VectorPostingsWriterBench.describeForCompactionOneToMany 768 1000000 avgt 5 2365.691 ± 84.173 ms/op
[java] V5VectorPostingsWriterBench.describeForCompactionOneToMany 1536 100000 avgt 5 265.395 ± 4.167 ms/op
[java] V5VectorPostingsWriterBench.describeForCompactionOneToMany 1536 1000000 avgt 5 3641.557 ± 130.649 ms/op
```
After the change:
```
[java] Benchmark (dimension) (numVectors) Mode Cnt Score Error Units
[java] V5VectorPostingsWriterBench.createGenericIdentityMapping 768 100000 avgt 5 5.721 ± 1.727 ms/op
[java] V5VectorPostingsWriterBench.createGenericIdentityMapping 768 1000000 avgt 5 124.536 ± 22.464 ms/op
[java] V5VectorPostingsWriterBench.createGenericIdentityMapping 1536 100000 avgt 5 5.662 ± 0.610 ms/op
[java] V5VectorPostingsWriterBench.createGenericIdentityMapping 1536 1000000 avgt 5 122.671 ± 3.343 ms/op
[java] V5VectorPostingsWriterBench.describeForCompactionOneToMany 768 100000 avgt 5 5.364 ± 1.194 ms/op
[java] V5VectorPostingsWriterBench.describeForCompactionOneToMany 768 1000000 avgt 5 119.449 ± 4.809 ms/op
[java] V5VectorPostingsWriterBench.describeForCompactionOneToMany 1536 100000 avgt 5 5.379 ± 0.552 ms/op
[java] V5VectorPostingsWriterBench.describeForCompactionOneToMany 1536 1000000 avgt 5 121.293 ± 3.040 ms/op
```
…porting repair message timeouts. (#2185)
### What is the issue
A repair task in CNDB would hang forever if the timeout for a validate or sync request is reached. This was happening because the repair initiator (the repair service in the primary region) would consider the peer repair service as not supporting timeouts, and therefore it would not fail the repair task, leaving it hanging. This behavior is due to the way C* used to detect whether a peer supports repair timeouts, which is by checking the C* version of the peer in the `Nodes.peers()` table. In CNDB, though, the repair services (or external services in general) are not added to the peers, as they are not handled by the ServiceTracker.
### What does this PR fix and why was it fixed
Add a system property to always consider the remote peer as supporting repair message timeouts. Right now the peer's version is checked to figure out if timeouts are supported, but that doesn't work in CNDB, as repair services are not added to the Nodes.peers() table, so there's no clear way to check what version a remote peer is running. The newly introduced system property allows skipping this version check and always considering timeouts supported by the remote peer. The default value of the property is `false` to provide backward compatibility.
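For illustration, a minimal sketch of the described override; the property name and the fallback version check are assumptions, not the actual flag introduced by the patch:

```java
final class RepairTimeoutSupportSketch
{
    // Hypothetical property name; defaults to false for backward compatibility.
    static final String ASSUME_SUPPORT_PROPERTY = "example.repair.assume_peer_supports_timeouts";

    static boolean peerSupportsTimeouts(String peerVersionOrNull)
    {
        // Override path: skip the version check entirely and assume support.
        if (Boolean.getBoolean(ASSUME_SUPPORT_PROPERTY))
            return true;

        // Legacy path (simplified stand-in): rely on knowing the peer's version,
        // which is unavailable for external services not present in Nodes.peers().
        return peerVersionOrNull != null && !peerVersionOrNull.isEmpty();
    }
}
```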
This commit adds new metrics related to the operation of the SAI query planner. The metrics should help check whether the query planner makes proper decisions, by correlating them with the other metrics, e.g. the metrics of the actual query execution.
Per-query metrics (histograms):
- `RowsToReturnEstimated`: the estimated number of rows to be returned by the query
- `RowsToFetchEstimated`: the estimated number of rows the query is going to fetch from storage
- `KeysToIterateEstimated`: the estimated number of primary keys to read from the indexes when executing the query to completion
- `CostEstimated`: the abstract cost of query execution
- `LogSelectivityEstimated`: minus the decimal logarithm of query selectivity, before applying the query LIMIT (0 means the query selects all rows, 5 means it selects a 10^(-5) = 0.00001 subset of rows)
- `IndexReferencesInQuery`: the number of index references in the unoptimized query execution plan (the same index may be referenced multiple times and counts separately)
- `IndexReferencesInPlan`: the number of index references in the optimized query execution plan (the same index may be referenced multiple times and counts separately)

Per-table metrics:
- `TotalRowsToReturnEstimated`: the sum of all estimates of returned rows from all completed queries
- `TotalRowsToFetchEstimated`: the sum of all estimates of fetched rows from all completed queries
- `TotalKeysToIterateEstimated`: the sum of all estimates of iterated primary keys from all completed queries
- `TotalCostEstimated`: the sum of all cost estimates from all completed queries
…w id mapping (#2192) The PrimaryKeyMap interface provides an important API to retrieve exact (with inverted ceiling), ceiling, or floor row IDs for given primary keys, which are used in the index searcher. One of the API methods was not checking all out-of-range conditions. This bug does not affect current functionality; however, it is an issue for riptano/cndb#15608, where it was discovered. Since there were no unit tests on the API, it was difficult to discover the bug. The bug in the `indexOf` method of `AbstractBlockPackedReader` is fixed by checking the out-of-bound condition on a non-inverted value. Furthermore, since the same condition is repeated in several places, it is moved into a separate method to help prevent re-introducing the bug. Unit tests are added for the primary key map API, covering the exact (with inverted ceiling), ceiling, and floor methods for the partition-aware and row-aware versions, and covering the bug case. `MapWalker` helper classes are introduced in each unit test to make the test cases more readable. While they look very similar, they differ due to differences in the test databases and behaviors between partition-aware and row-aware formats and between skinny and wide tables. The added unit tests are also helpful to prevent regressions during changes in the implementations of the primary key map APIs. Such changes are being implemented in riptano/cndb#15608.
### What is the issue
Fixes: riptano/cndb#16484
### What does this PR fix and why was it fixed
When a partition restriction filters down to rows without vectors, we can get a list of 0 rows, and then we attempt to create a bounded heap of size 0, which is invalid. The added test catches this error. The solution is to exit the method early in the event of a zero-length `SegmentRowIdOrdinalPairs` object.
```
java.lang.IllegalArgumentException: initialSize must be > 0 and < 2147483630; got: 0
  at io.github.jbellis.jvector.util.AbstractLongHeap.<init>(AbstractLongHeap.java:52)
  at io.github.jbellis.jvector.util.BoundedLongHeap.<init>(BoundedLongHeap.java:47)
  at io.github.jbellis.jvector.util.BoundedLongHeap.<init>(BoundedLongHeap.java:43)
  at org.apache.cassandra.index.sai.disk.v2.V2VectorIndexSearcher.orderByBruteForce(V2VectorIndexSearcher.java:328)
  at org.apache.cassandra.index.sai.disk.v2.V2VectorIndexSearcher.orderByBruteForce(V2VectorIndexSearcher.java:283)
  at org.apache.cassandra.index.sai.disk.v2.V2VectorIndexSearcher.searchInternal(V2VectorIndexSearcher.java:246)
  at org.apache.cassandra.index.sai.disk.v2.V2VectorIndexSearcher.orderBy(V2VectorIndexSearcher.java:172)
  at org.apache.cassandra.index.sai.disk.v1.Segment.orderBy(Segment.java:160)
```
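A minimal, self-contained sketch of the early-exit shape of the fix; the method signature and list type are placeholders, not the actual `V2VectorIndexSearcher` code:

```java
import java.util.Collections;
import java.util.List;

final class BruteForceOrderSketch
{
    static List<Integer> orderByBruteForce(List<Integer> segmentRowIds, int limit)
    {
        // Early exit: previously an empty list fell through and a bounded heap of
        // size 0 was constructed, which throws IllegalArgumentException.
        if (segmentRowIds.isEmpty())
            return Collections.emptyList();

        int heapSize = Math.min(limit, segmentRowIds.size());
        // ... in the real code, a bounded heap of heapSize would be built and rows scored ...
        return segmentRowIds.subList(0, heapSize);
    }
}
```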
…upported in UI scripting for multibranch but IBM TAAS needs it in the repository