CASSANDRA-16116 emit oversized mutation metric #743

leeyutang · 2020-09-09T19:27:01Z

No description provided.

src/java/org/apache/cassandra/db/commitlog/CommitLog.java

src/java/org/apache/cassandra/db/Mutation.java
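The PR has no description, so the following is only a hedged illustration of what an "oversized mutation" metric could look like. It is not the code in this patch. It assumes the metric is a counter that is incremented where a mutation's serialized size is compared against the commit log's maximum mutation size, just before the write is rejected. `OversizedMutationGuard` and its methods are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: names are illustrative and are not Cassandra's
// CommitLog/Mutation API or the implementation in this PR.
final class OversizedMutationGuard
{
    private final long maxMutationSizeBytes;
    // Counts every mutation rejected for exceeding the size limit.
    private final AtomicLong oversizedMutations = new AtomicLong();

    OversizedMutationGuard(long maxMutationSizeBytes)
    {
        this.maxMutationSizeBytes = maxMutationSizeBytes;
    }

    /** Increments the metric and rejects the write if the serialized size exceeds the limit. */
    void validate(String keyspace, long serializedSizeBytes)
    {
        if (serializedSizeBytes > maxMutationSizeBytes)
        {
            oversizedMutations.incrementAndGet();
            throw new IllegalArgumentException(String.format(
                "Mutation of %d bytes in keyspace %s exceeds the maximum size of %d bytes",
                serializedSizeBytes, keyspace, maxMutationSizeBytes));
        }
    }

    long oversizedCount()
    {
        return oversizedMutations.get();
    }

    public static void main(String[] args)
    {
        OversizedMutationGuard guard = new OversizedMutationGuard(1024);
        guard.validate("ks", 512); // accepted, counter unchanged
        try
        {
            guard.validate("ks", 4096); // rejected, counter incremented
        }
        catch (IllegalArgumentException expected)
        {
            System.out.println(expected.getMessage());
        }
        System.out.println("oversized mutations: " + guard.oversizedCount()); // oversized mutations: 1
    }
}
```

In a running node, a counter like this would normally be registered with the server's metrics system rather than kept as a bare `AtomicLong`. That wiring is omitted here.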

commit f9e589098e89417cd41ef627b3e1c7371986e3e1
Merge: b0dbc8bd57 d32eed9d68
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Mon Sep 18 11:28:48 2023 -0500

Merge remote-tracking branch 'datastax/vsearch' into cndb-7632-vsearch-rebase-20230915

commit b0dbc8bd57fde3dd473d4aae85a3063f2156106d
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Mon Sep 18 10:12:30 2023 -0500

Update expected string for error message change.

commit f2cafdac9c3e5bf838c10e0078895d56e3271370
Author: qannap <130002578+qannap@users.noreply.github.com>
Date: Thu Sep 14 08:41:07 2023 -0700

Optimize partition-aware queries to use bloom filter (#729)

Add boom filter field in QueryViewBuilder. Here is the link to the issue: [Vector-72](https://datastax.jira.com/browse/VECTOR-72).

commit 9765b23b5b748286391b8e374bab24a31cb5d934
Author: Jonathan Ellis <jbellis@gmail.com>
Date: Thu Sep 14 09:47:19 2023 -0500

Vsearch 18 (#741): optimize MergePostingList.advance for the case where it is called repeatedly

commit 66c00e094942de72f2f1fdb780ac00239e0aa284
Author: Mike Adamson <madamson@datastax.com>
Date: Thu Sep 14 14:33:41 2023 +0100

Remove smile-nlp and add internal Glove implementation for testing (#743)

commit b13098e0cd5a9f961066c0059953b525f5bfa787
Author: Jakub Żytka <jakub.zytka@gmail.com>
Date: Thu Sep 14 13:28:01 2023 +0200

CNDB-7363: extend read tracking API to notify about post-filtered, reconciliated partitions and rows (#692)

* CNDB-7363: extend read tracking API to notify about post-filtered, reconciliated partitions and rows
* RowFilter::getExpressionsPreOrder will now return all row filter's expressions in pre-order

commit 2bca65cfe4ba864a8a6719e0e28ddf76e83c17e0
Author: Michael Marshall <michael.marshall@datastax.com>
Date: Sun Sep 10 01:01:47 2023 -0500

Fix JsonTest and AnalyzerViewTest errors (#730)

commit a0dde9f89e02ef40d9aa87514eb7b5becfdce8e8
Author: Michael Marshall <michael.marshall@datastax.com>
Date: Fri Sep 8 19:34:09 2023 -0500

Reject SAI creation for invalid combinations of options (#736)

commit abd5399f9c1b6feb0f5ba71b279ff5b76d0f92ee
Author: Jonathan Ellis <jbellis@gmail.com>
Date: Fri Sep 8 11:52:53 2023 -0500

Fix exceptions when executing complex queries (#735)

* fix ReorderingRangeIterator.performSkipTo
* fix NPE

commit acc39a8281cf9163057c00ec9413219b210cc262
Author: Michael Marshall <michael.marshall@datastax.com>
Date: Fri Sep 8 11:04:58 2023 -0500

Analyzer cleanup: add comments and fix docs (#733)

commit 4989dff0f150bb65535fa598e53290bca1f3ce0c
Author: Michael Marshall <michael.marshall@datastax.com>
Date: Thu Sep 7 08:51:09 2023 -0500

Fix RowAwarePrimaryKey#hashCode for deferred keys (#725)

* Fix RowAwarePrimaryKey#hashCode for deferred keys
* Use recommendation from code review

commit 2268c1c806632d62cf1a29b815740207a0e2939c
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Sep 6 13:38:39 2023 -0500

optimize MergePostingList.advance -- we don't care what order we evaluate the merged lists in, since we're going to rebuild the PQ anyway, so use a fast iterator instead of a slow poll() loop

commit 6bb72c3272fbc8d12427fb0d9f00916b6f6ee9c0
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Sep 6 13:22:07 2023 -0500

PKM is not threadsafe, need to allocate a new one for every request

commit f1d7c66b54f7b95e12698d7a6ed00a1e8ac958d6
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Sep 6 13:10:11 2023 -0500

r/m slightly gratuitous IOException from PKM.Factory signature

commit 73e96df69a1861d5a8bda04ab69252a480a78c62
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Sep 6 11:35:40 2023 -0500

fix updateTestWithPredicate by removing broken assert (did not account for deleted rows)

commit 36327a93b8c1651acd7af1d8b687dd92e08021e3
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Sep 6 11:34:44 2023 -0500

add failing updateTestWithPredicate

commit 80eb443b65637dc1782cb3a745d4d56431f787d8
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Sep 6 11:05:32 2023 -0500

comment upsertTest

commit 3f5fbc72547aa790989d3c7bce8b92fd4bb2fb2c
Author: Marianne Manaog <91789965+marianne-manaog@users.noreply.github.com>
Date: Wed Sep 6 19:42:15 2023 +0100

Feature/Optimize SAI.getPostQueryOrdering as SAI.postQuerySort (#710)

* Optimize sorting in postQuerySort method
* Remove need for prepareFor method following sort optimization in postQuerySort
* Add testOrderResults
* Simplify creation of resultSet in the test testOrderResults and keep rows as final in ResultSet.java
* Apply DSU pattern to SAI.postQuerySort method and leverage it in SelectStatement.orderResults method
* Use Pair.create instead of new Pair
* Put back comment on ANN support only
* Optimise undecorate step by using List<ByteBuffer> instead of ByteBuffer in the Pair
* Add licence to StorageAttachedIndexTest.java
* Move StorageAttachedIndexTest.java up a directory
* Simplify map to create listPairsVectorsScores

commit 23555ab43101cd6010fe9cc320e7429779faa678
Author: qannap <130002578+qannap@users.noreply.github.com>
Date: Fri Sep 1 15:29:22 2023 -0700

Add row count field to columnFamilyStore (#713)

* Add row count field to columnFamilyStore
* Add test to row count field
* delete used import
* delete used function and separate test
* Add row count field to columnFamilyStore
* Add test t0 row count field
* delete used import
* delete used function and separate test
* resolve version
* return to updated version
* restore other tests as vsearch branch

---------

Co-authored-by: Michael Marshall <michael.marshall@datastax.com>

commit 1eae389fadecfc6637207933fd6ae1739355dc00
Author: Michael Marshall <michael.marshall@datastax.com>
Date: Thu Aug 31 17:59:53 2023 -0500

Fix NPE in LuceneAnalyzer#end (#721)

When testing the analyzer with an integration test, I discovered an NPE from a part of the code that wasn't tested. Here is the code: https://github.com/datastax/cassandra/blob/9cabec504d9f70ddfb345122dc4defed56a40f85/src/java/org/apache/cassandra/index/sai/plan/StorageAttachedIndexSearcher.java#L91-L100

We build the Analyzer to use one of its methods, but not to analyze any text, hence the NPE. Here is the NPE:

```
ERROR [Native-Transport-Requests-1] 2023-08-31 14:35:38,695 QueryMessage.java:121 - Unexpected error during query
java.lang.NullPointerException: null
    at org.apache.cassandra.index.sai.analyzer.LuceneAnalyzer.end(LuceneAnalyzer.java:110)
    at org.apache.cassandra.index.sai.plan.StorageAttachedIndexSearcher.filterReplicaFilteringProtection(StorageAttachedIndexSearcher.java:99)
    at org.apache.cassandra.service.reads.DataResolver.lambda$preCountFilterForReplicaFilteringProtection$6(DataResolver.java:300)
    at org.apache.cassandra.service.reads.DataResolver.resolveInternal(DataResolver.java:341)
    at org.apache.cassandra.service.reads.DataResolver.resolveWithReadRepair(DataResolver.java:244)
    at org.apache.cassandra.service.reads.DataResolver.resolveWithReplicaFilteringProtection(DataResolver.java:284)
    at org.apache.cassandra.service.reads.DataResolver.resolve(DataResolver.java:130)
    at org.apache.cassandra.service.reads.range.SingleRangeResponse.waitForResponse(SingleRangeResponse.java:59)
    at org.apache.cassandra.service.reads.range.SingleRangeResponse.computeNext(SingleRangeResponse.java:65)
    at org.apache.cassandra.service.reads.range.SingleRangeResponse.computeNext(SingleRangeResponse.java:31)
    at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47)
    at org.apache.cassandra.db.transform.BasePartitions.hasNext(BasePartitions.java:93)
    at org.apache.cassandra.service.reads.range.RangeCommandIterator.computeNext(RangeCommandIterator.java:123)
    at org.apache.cassandra.service.reads.range.RangeCommandIterator.computeNext(RangeCommandIterator.java:45)
    at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47)
    at org.apache.cassandra.db.transform.BasePartitions.hasNext(BasePartitions.java:93)
    at org.apache.cassandra.db.partitions.PartitionIterators$1.hasNext(PartitionIterators.java:154)
    at org.apache.cassandra.db.transform.BasePartitions.hasNext(BasePartitions.java:93)
    at org.apache.cassandra.cql3.statements.SelectStatement.process(SelectStatement.java:892)
    at org.apache.cassandra.cql3.statements.SelectStatement.processResults(SelectStatement.java:518)
    at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:495)
    at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:341)
    at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:102)
    at org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:290)
    at org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:394)
    at org.apache.cassandra.transport.messages.QueryMessage.execute(QueryMessage.java:108)
    at org.apache.cassandra.transport.Message$Request.execute(Message.java:242)
    at org.apache.cassandra.transport.Dispatcher.processRequest(Dispatcher.java:111)
    at org.apache.cassandra.transport.Dispatcher.processRequest(Dispatcher.java:131)
    at org.apache.cassandra.transport.Dispatcher.lambda$dispatch$0(Dispatcher.java:78)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
    at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:165)
    at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:137)
    at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:119)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.base/java.lang.Thread.run(Thread.java:829)
```

I am still new to query execution, but I am guessing the unit tests skip this filtering code. I added an integration test that fails without the fix.

commit ea97cc414ef3e68ad30e8c666ecf04b719352901
Author: Michael Marshall <michael.marshall@datastax.com>
Date: Wed Aug 30 09:10:14 2023 -0500

Fix failing KDTreeIndexSearcherTest (#717)

#680 included a change that makes 5 of the KDTreeIndexSearcherTest tests fail. This PR essentially reverts back to the old behavior, and all of the failing tests pass.

commit cee55b1b2139627b331d9f7225c522ca5bee299b
Author: Michael Marshall <michael.marshall@datastax.com>
Date: Tue Aug 29 21:56:24 2023 -0500

Remove ability to configure unique query_analyzer (#712)

commit c248f905cc13833f9857b9d750a60091afc5c38a
Author: Michael Marshall <michael.marshall@datastax.com>
Date: Tue Aug 29 21:55:18 2023 -0500

Fix failing distributed SAI tests using : operator (#715)

commit 367e84adf0663e8baab3f9195f3eb81412640b8c
Author: Michael Marshall <michael.marshall@datastax.com>
Date: Tue Aug 29 21:55:03 2023 -0500

Test compound predicate queries for : operator (#716)

commit 0ea0d95f52ac18522a0cb5a09c1c9ee00069380a
Author: Michael Marshall <michael.marshall@datastax.com>
Date: Tue Aug 29 11:10:02 2023 -0500

Refactor SAI analyzer configuration; add built in analyzers (#711)

The current `OPTIONS` configuration for the `index_analyzer` does not correctly model the available configuration options. In order to make the configuration more directly match the single tokenizer, and the lists of filters and charFilters, I propose we update the expected JSON schema. Further, we add some default analyzers to simplify the configuration of generic analyzers.

commit 67ef0f3794cbd2b9ff41b632205dd087a992e674
Author: Michael Marshall <michael.marshall@datastax.com>
Date: Mon Aug 28 11:58:30 2023 -0500

Return more specific exception on incorrect usage of : operator (#705)

* When the `:` operator is used for indexes that do not support it, the current error is a recommendation to `ALLOW FILTERING` or an `AssertionError`. Instead, we should match the `LIKE` behavior and reject queries that attempt to use the `:` operator on columns that are not properly indexed. The `LIKE` conditional is slightly modified to simplify the logic for rejecting unsupported queries.
* When the `:` operator is used for `DELETE` and `UPDATE` queries, return a helpful error message.
* Update the `toString` method on the `ANALYZER_MATCHES` enum. The `toString` method is used to generate error messages, and it is confusing to see an error message with `: '<term>'` when it really should just be `:`.

commit c26cbfa2298d422813e02b439f4db6494bb64a84
Author: Jakub Żytka <jakub.zytka@gmail.com>
Date: Mon Aug 28 14:12:15 2023 +0200

Additional testcases for read query tracking with different data models (#698)

* Additional testcases for read query tracking with different data models

commit bf143a2b3fc97979e5e083f8eac7aca017cd618d
Author: Michael Marshall <michael.marshall@datastax.com>
Date: Thu Aug 24 13:28:47 2023 -0500

Ignore default analyzer settings when classifying an SAI as analyzed (#706)

### Problem

When creating an SAI, the `AbstractAnalyzer` creates an analyzer in cases where it should not because the logic is too naive.

### Details

Users can create SAI indexes with a `NonTokenizingAnalyzer` using the following queries where `?` can be replaced with `true` or `false`:

```
CREATE CUSTOM INDEX ON table(some-column) USING 'StorageAttachedIndex' WITH OPTIONS = { 'normalize' : ? }
CREATE CUSTOM INDEX ON table(some-column) USING 'StorageAttachedIndex' WITH OPTIONS = { 'case_sensitive' : ? }
CREATE CUSTOM INDEX ON table(some-column) USING 'StorageAttachedIndex' WITH OPTIONS = { 'ascii' : ? }
```

The options are known as "non tokenizing analyzers" because they "analyze" values but do not "tokenize" them. Practically speaking, these analyzers are projections where each input has one and only one output. Lucene calls projections "filters".

The `StorageAttachedIndex` default is for normalize = false, case_sensitive = true, and ascii = false. When a default value is passed for one of the options, the `AbstractAnalyzer` incorrectly classifies the index as "analyzed". Instead, the `AbstractAnalyzer` should return the `NoOpAnalyzer` and the index should not be classified as "analyzed".

### Solution

* Only construct the `NonTokenizingAnalyzer` when `normalize`, `case_sensitive`, or `ascii` are configured with non default values.
* Do not consider an SAI as analyzed if `normalize`, `case_sensitive`, or `ascii` are configured with default values.
* Updated some tests to use `=` on the non-analyzed SAI

commit 7a6c4755f150ad647aae2bc806d8b91c423db14b
Author: Michael Marshall <michael.marshall@datastax.com>
Date: Tue Aug 22 15:57:08 2023 -0500

Add ANALYZER_MATCHES operation; Disallow = on analyzed SAI columns (#702)

* Adds the `:` operator, a.k.a the ANALYZER_MATCHES operator. This operator only works on SAI indexes that are analyzed where analyzed means the values stored in the index are derivative values from the raw value in the column. The new operator uses the enum value `100` to prevent future collisions with the upstream CQL implementation.
* Disambiguate the `=` operator so that it cannot perform matches on an `SAI` column that is indexed. The primary motivation for this change is to prevent surprising matches. If a query is attempted using `=` where the target column only has analyzed indexes, the query must include `ALLOW FILTERING`.
* Updated many tests.
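Commit bf143a2 states its rule in prose: `normalize`, `case_sensitive`, and `ascii` make an index "analyzed" only when they differ from their defaults (false, true, false). A minimal sketch of that check follows. It assumes options arrive as a string map whose values have already been validated as `true`/`false`. `AnalyzerOptionsSketch` is a hypothetical name and does not reproduce the `AbstractAnalyzer` code.

```java
import java.util.Map;

// Hypothetical sketch of the rule described in commit bf143a2; not the AbstractAnalyzer code.
final class AnalyzerOptionsSketch
{
    // Defaults for the non-tokenizing options, as stated in the commit message.
    private static final Map<String, Boolean> DEFAULTS =
        Map.of("normalize", false, "case_sensitive", true, "ascii", false);

    /**
     * Returns true only if at least one non-tokenizing option is set to a non-default value.
     * Assumes option values were already validated as "true" or "false".
     */
    static boolean isAnalyzed(Map<String, String> options)
    {
        for (Map.Entry<String, Boolean> entry : DEFAULTS.entrySet())
        {
            String value = options.get(entry.getKey());
            // An absent option, or one set explicitly to its default, does not make the index analyzed.
            if (value != null && Boolean.parseBoolean(value) != entry.getValue())
                return true;
        }
        return false;
    }

    public static void main(String[] args)
    {
        System.out.println(isAnalyzed(Map.of("case_sensitive", "true")));  // false: default value
        System.out.println(isAnalyzed(Map.of("case_sensitive", "false"))); // true: non-default value
    }
}
```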
commit 77cb7f265e40600288fe90fffd74ef6e428f87fe
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Aug 18 10:48:54 2023 -0500

comment

commit 2dcdd9fd7464a6e26185bfa4ac3cd44b02574fb1
Author: Michael Marshall <michael.marshall@datastax.com>
Date: Thu Aug 17 16:55:51 2023 -0500

Fail SAI index creation when using analyzer on column in primary key (#699)

* add testStandardAnalyzer
* debugging wip
* Fail index creation when using analyzer on column in primary key
* Remove unnecessary debug logging
* cleanup
* Reject all attempts to add analyzer to index on pk columns
* Rename noop analyzer test and add explanation

---------

Co-authored-by: Jonathan Ellis <jbellis@datastax.com>

When the target column for an SAI is part of the primary key and is analyzed by the LuceneAnalyzer, queries do not return correct results. Until the underlying feature is solved, do not allow these kinds of indexes to be created. This change is covered by a test to verify index creation and search works correctly for the `NoopAnalyzer` and another to verify index creation is rejected for the `LuceneAnalyzer`.
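Commit 3f5fbc7, listed earlier in this log, applies the decorate-sort-undecorate (DSU) pattern to `SAI.postQuerySort`. As a generic illustration only, and not the SAI code, DSU computes each row's score once, sorts the (row, score) pairs on the cached score, and then strips the scores. `DsuSort` is a hypothetical name, and the string-length score in `main` is a stand-in for a vector similarity score.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.ToDoubleFunction;

// Generic decorate-sort-undecorate (DSU) sketch; hypothetical, not SAI.postQuerySort.
final class DsuSort
{
    // Decorated element: a row paired with its precomputed score.
    private static final class Scored<T>
    {
        final T row;
        final double score;

        Scored(T row, double score)
        {
            this.row = row;
            this.score = score;
        }
    }

    static <T> List<T> sortByScoreDescending(List<T> rows, ToDoubleFunction<? super T> scorer)
    {
        // Decorate: compute each (possibly expensive) score exactly once.
        List<Scored<T>> decorated = new ArrayList<>(rows.size());
        for (T row : rows)
            decorated.add(new Scored<>(row, scorer.applyAsDouble(row)));

        // Sort on the cached scores, highest first; List.sort is stable, so ties keep input order.
        decorated.sort(Comparator.comparingDouble((Scored<T> s) -> s.score).reversed());

        // Undecorate: keep only the rows, now in score order.
        List<T> sorted = new ArrayList<>(decorated.size());
        for (Scored<T> s : decorated)
            sorted.add(s.row);
        return sorted;
    }

    public static void main(String[] args)
    {
        // String length stands in for a vector similarity score.
        System.out.println(sortByScoreDescending(List.of("a", "bbb", "cc"), String::length)); // [bbb, cc, a]
    }
}
```

The benefit over sorting with a comparator that recomputes scores is that the scorer runs n times instead of O(n log n) times.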
commit 4f5585be198a949b0264eb64d0403e2adda9ab52
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Aug 17 17:41:18 2023 -0500

default vector cache size per segment bumped to 4MB

commit 9464bfe3e2bbd958f4f1d60e799019582a9836cb
Author: Marianne Lyne Manaog <marianne.manaog@datastax.com>
Date: Thu Jul 13 20:32:36 2023 +0100

Deserialize ByteBuffer into a query vector once by moving the deserialization up the call stack

commit bc2ef7ff721493b20b1bb76dd2ae37b43d365071
Author: Shaunak Das <ShaunakDas88@users.noreply.github.com>
Date: Wed Aug 16 13:27:12 2023 -0500

VECTOR-79: Simplify population of VectorCache (#695)

* use BFS at higher levels so we can avoid a separate cachedNodes set
* try to naively cache as much of level 0 as possible
* add VectorCacheTest
* cleanup and revert to intended behavior of not caching L0

---------

Co-authored-by: Jonathan Ellis <jbellis@datastax.com>

commit 1c58e431e55d8b327314ddb82b30bf8eced59269
Author: Shaunak Das <ShaunakDas88@users.noreply.github.com>
Date: Tue Aug 15 11:24:30 2023 -0500

Avoiding sorting node ordinals for level 0 (#697)

* avoiding sorting node ordinals for level 0

commit 16eeaf89a8ac39fee616f7a71a3f9a4b4170521e
Author: Jonathan Ellis <jbellis@gmail.com>
Date: Tue Aug 15 08:47:48 2023 -0500

extract OnDiskVectors to top level class so we can use it in debugging tools (#693)

commit 2271a230761a416fbd5c016d47684b388be93989
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Aug 9 09:59:45 2023 -0500

VECTOR-77 fix reading vectors past the first 2GB mmap region

commit 5705daa97f848454d8d29ab08234296906b3dbc4
Author: Andrés de la Peña <adelapena@users.noreply.github.com>
Date: Thu Aug 10 18:06:28 2023 +0100

VECTOR-76: Add back vector size guardrail (#694)

patch by Andrés de la Peña; reviewed by Brandon Williams and Maxwell Guo for CASSANDRA-18730

commit cb3bedccc255476300528183e8ba30a4e390f57f
Author: Jakub Żytka <jakub.zytka@gmail.com>
Date: Tue Aug 8 16:52:16 2023 +0200

CNDB-7390: fix static rows not being tracked by ReadTrackingTransformation (#691)

commit a8740fb8dae267fb8ddef1a7f5c77e948cd2a0ff
Author: Andrés de la Peña <adelapena@users.noreply.github.com>
Date: Fri Jul 28 15:42:42 2023 +0100

Add system property to reject non-float vectors, true by default (#689)

commit 5a4e439dc73b479d79bd427472d382a53b6fe680
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jul 21 13:10:44 2023 +0200

r/m failing emptyIndexTest until we can address it (VECTOR-59)

commit eec8b7b387cd041fbd54aef8f0090773962a9e49
Author: Piotr Kołaczkowski <pkolaczk@gmail.com>
Date: Tue Jul 18 17:36:43 2023 +0200

VECTOR-69: Fix LWT test failure caused by null key bounds (#686)

* VECTOR-69: Fix LWT test failure caused by null key bounds

---------

Co-authored-by: Jakub Zytka <jakub.zytka@datastax.com>
Co-authored-by: Jonathan Ellis <jbellis@datastax.com>

commit 033ae1a9567dfef88d0214dd34716f2468c5d4e3
Author: Andrés de la Peña <adelapena@users.noreply.github.com>
Date: Tue Jul 18 15:23:30 2023 +0100

VECTOR-68 Vector dimensions are not being validated correctly (#679)

commit e22ff6977f986a4148f931bad4b04645bf93d2e9
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jul 18 06:01:04 2023 +0200

set sai_hnsw_allow_customm_parameters=true for IndexWriterConfigTest

commit ccaba92656e78e63765da5c34f562f2c3b582bd0
Author: Mike Adamson <madamson@datastax.com>
Date: Tue Jul 11 16:12:31 2023 +0100

VECTOR-55: Review REVIEWME comments: (#680)

* VECTOR-55: Review REVIEWME comments:
- Refactor search methods to only return long variant
- Remove SSTableQueryContext in favour of QueryContext
- Add checking to SegmentMetadata.toSegmentRowId method

commit 338c8507de197570f4879019e0e1fc8e1f77025e
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Mon Jul 10 21:46:56 2023 -0500

Fix IndexError when querying a list/set/map column.
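VECTOR-77 above fixes reads past the first 2GB mmap region. A single `MappedByteBuffer` is indexed by `int`, so larger files are typically mapped as several regions. Every offset then has to be computed in `long` arithmetic before it is split into a region index and a position inside that region. The sketch below assumes fixed-size float vectors stored after a header, and regions whose size is a multiple of the vector size so that no vector straddles two regions. `MappedRegionLocator` is hypothetical and is not the `OnDiskVectors` code.

```java
// Hypothetical sketch: locating fixed-size float vectors in a file that is mapped
// as several regions, because a single MappedByteBuffer is indexed by int.
// Assumes regionSize is a multiple of the vector size, so no vector straddles two regions.
final class MappedRegionLocator
{
    private final long regionSize;

    MappedRegionLocator(long regionSize)
    {
        if (regionSize <= 0 || regionSize > Integer.MAX_VALUE)
            throw new IllegalArgumentException("region size must fit in a single mapping");
        this.regionSize = regionSize;
    }

    /** File offset of a vector; the (long) cast keeps the product from overflowing past 2 GiB. */
    static long vectorOffset(long headerBytes, int ordinal, int dimension)
    {
        return headerBytes + (long) ordinal * dimension * Float.BYTES;
    }

    /** Index of the mapped region that holds the given file offset. */
    int regionIndex(long fileOffset)
    {
        return Math.toIntExact(fileOffset / regionSize);
    }

    /** Position inside that region; always < regionSize, so it fits in an int. */
    int positionInRegion(long fileOffset)
    {
        return (int) (fileOffset % regionSize);
    }

    public static void main(String[] args)
    {
        MappedRegionLocator locator = new MappedRegionLocator(1L << 30); // 1 GiB regions
        long offset = vectorOffset(0, 600_000, 1024);
        // prints "2457600000 -> region 2, position 310116352"
        System.out.println(offset + " -> region " + locator.regionIndex(offset)
                           + ", position " + locator.positionInRegion(offset));
        // The same product computed in int arithmetic wraps to a negative number.
        System.out.println(600_000 * 1024 * Float.BYTES);
    }
}
```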
commit 1b4d6de7e0a051b95c0a9b62c8bc167e23ec65ed
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sun Jul 9 21:03:51 2023 +0200

add test that fails before decompose fix and passes after

commit 664911ac7de336d3fcffd90083e6a220069bd4e2
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sun Jul 9 13:44:28 2023 +0200

re-use TypeUtil.decomposeVector

commit e17493c95ac247ca1da7caa3ccd58be866958d6a
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sun Jul 9 13:28:56 2023 +0200

fix casting in deserialize

commit d169ff68f6576ce5b150818cb0a52a12abf6f2e0
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sun Jul 9 13:30:02 2023 +0200

fix NPE when using ReadExecutionController.empty

commit aa899f55f77df5b157e3e6afa930c377e4cacc57
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Fri Jul 7 10:12:28 2023 +0800

assert no single partition trace for SAI request

commit 277251e1aad15e2a5aedaaaae3170d574d224fe5
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 29 12:08:05 2023 -0500

- add Tracing.traceSinglePartitions to avoid cluttering range and index queries with noise
- update QueryViewBuilder to include a count of sstables per index accessed

commit 4682a417fdaaefb135e57b0ebe174c1ff49b7e94
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jul 6 12:13:20 2023 -0500

simplify

commit 52d19a1ef5605aebdb87385786dd323057267164
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Wed Aug 30 14:17:12 2023 -0500

Squashed commit of the following:

commit 4c4c1ff802c7667e1289a4e730cc34bf6b3bd007
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jul 5 12:03:22 2023 -0500

add test for creating index after data is added

commit 0acaae364c19ed7556b66b5576762ecc131af5fd
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Wed Jul 5 09:23:01 2023 -0500

Revert "enable segment compaction by default for vector indexes -- we should prioritize read performance"

This reverts commit a4fa0728337ebf236ad03f6ad708cf51c3dc08c5.
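The `decomposeVector` and `deserialize` commits above, together with 9464bfe (which deserializes the query vector once), all involve converting between `float[]` and `ByteBuffer`. The following minimal round trip assumes a plain big-endian layout of 4-byte floats with no length prefix. `FloatVectorCodec` is hypothetical and is neither Cassandra's `TypeUtil` nor its vector type serializer.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical sketch of round-tripping a float vector through a ByteBuffer;
// not Cassandra's TypeUtil or vector serializer. Layout assumed: big-endian floats, no prefix.
final class FloatVectorCodec
{
    static ByteBuffer decompose(float[] vector)
    {
        ByteBuffer buffer = ByteBuffer.allocate(vector.length * Float.BYTES);
        for (float f : vector)
            buffer.putFloat(f);
        buffer.flip(); // make the written bytes readable from position 0
        return buffer;
    }

    static float[] compose(ByteBuffer bytes)
    {
        // Read through a duplicate so the caller's buffer position is not consumed.
        ByteBuffer in = bytes.duplicate();
        if (in.remaining() % Float.BYTES != 0)
            throw new IllegalArgumentException("length is not a multiple of " + Float.BYTES);
        float[] vector = new float[in.remaining() / Float.BYTES];
        for (int i = 0; i < vector.length; i++)
            vector[i] = in.getFloat();
        return vector;
    }

    public static void main(String[] args)
    {
        float[] query = {0.5f, -1.0f, 2.25f};
        ByteBuffer bytes = decompose(query);
        // Decode once, then reuse the float[] for every segment searched.
        System.out.println(bytes.remaining() + " bytes -> " + Arrays.toString(compose(bytes))); // 12 bytes -> [0.5, -1.0, 2.25]
    }
}
```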
commit e3ddb6426de4e89b5fe61ed742b69257865c2d3c
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 28 10:54:49 2023 -0500

VECTOR-64 set sstable_growth to 0.5 by default

commit 3ca75046e6802ce5836a654459b01948c83d1d7b
Merge: a4fa072833 10a2a31ae8
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 28 10:54:18 2023 -0500

merge ds-trunk

commit a4fa0728337ebf236ad03f6ad708cf51c3dc08c5
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 28 10:24:41 2023 -0500

enable segment compaction by default for vector indexes -- we should prioritize read performance

commit 08153b38549d700505d064c25d7a2070c2ff07aa
Merge: 6a063b4f84 1a45fdc5f9
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 28 10:18:38 2023 -0500

Merge branch 'vsearch' of github.com:datastax/cassandra into vsearch

commit 6a063b4f8458f119d4b99bd6b78cbc8832f27f13
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 28 10:13:51 2023 -0500

VECTOR-63 add cassandra.sai.hnsw.allow_custom_parameters defaulting to false to control setting maximum_node_connections and construction_beam_width, which can cause memory usage to explode if used incorrectly

commit 1a45fdc5f92e9c82dba6e9e56a5e341806070b13
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Mon Jun 26 15:57:10 2023 -0500

Add the option to configure the file_cache_size_in_mb in a -D like it is in 6.8-cndb.
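VECTOR-63 above gates `maximum_node_connections` and `construction_beam_width` behind `cassandra.sai.hnsw.allow_custom_parameters`, which defaults to false. The property and option names come from the commit message. The validation shape below (`HnswOptionGate`, the error text) is an assumption and does not reproduce the actual index configuration code.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: property and option names are taken from the VECTOR-63 commit
// message; the validation shape and error text are assumptions.
final class HnswOptionGate
{
    static final String PROPERTY = "cassandra.sai.hnsw.allow_custom_parameters";
    static final Set<String> GATED_OPTIONS = Set.of("maximum_node_connections", "construction_beam_width");

    static void validate(Map<String, String> indexOptions)
    {
        // Boolean.getBoolean is false unless the system property equals "true" (ignoring case),
        // which gives the default-to-false behaviour the commit describes.
        if (Boolean.getBoolean(PROPERTY))
            return;
        for (String option : GATED_OPTIONS)
        {
            if (indexOptions.containsKey(option))
                throw new IllegalArgumentException(
                    option + " may only be set when -D" + PROPERTY + "=true");
        }
    }

    public static void main(String[] args)
    {
        try
        {
            validate(Map.of("maximum_node_connections", "64"));
        }
        catch (IllegalArgumentException e)
        {
            System.out.println(e.getMessage()); // rejected unless the property is set
        }
    }
}
```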
commit d2e45dab5b57f20fbdd6250d6dc186c83d783ed3
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 22 16:13:00 2023 -0500

VECTOR-62 don't flush a graph that only contains deleted vectors

commit 4a9afe205c20b1f863e89d4e02138557d225c3fc
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 22 16:11:11 2023 -0500

r/m obsolete comment

commit a925dc22696bcbebcad17a460441e8b84ff97660
Merge: 7c7d160dd9 2b22984fe1
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Thu Jun 22 09:03:08 2023 -0500

Merge remote-tracking branch 'datastax/ds-trunk' into vsearch

commit 7c7d160dd90624b0e365287ef0a2f0b7ef94da53
Merge: a14b387372 7a5e374cea
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 20 21:42:51 2023 -0500

Merge branch 'vsearch' of github.com:datastax/cassandra into vsearch

commit a14b387372ef3f220ece68b019acaafaf0d56f42
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 20 17:52:13 2023 -0500

move new options to jvm17-server.options

commit dc67582585a90628e026a693497793b7d8a298ed
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 20 17:47:27 2023 -0500

copy jvm11 options to jvm17

commit b6717410f925486a858c4c9414e417eed50950d7
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 20 17:30:48 2023 -0500

enable simd

commit 92b9e6e2f136254fb1f8e50732809ea529617f48
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 19 15:31:09 2023 -0500

make it run under jdk 20

commit 3e985953637819ac8504955552ce0feba29bdd40
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 20 17:13:56 2023 -0500

update to lucene with simd

commit 7a5e374cea785678670444121ad36be6170dd59b
Merge: 5e5ae4de82 e9ddd5f0d9
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Tue Jun 20 15:00:24 2023 -0500

Merge remote-tracking branch 'datastax/ds-trunk' into vsearch

commit 5e5ae4de824a7e838bbdfdbabc92e8b256911e3f
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 19 10:57:18 2023 -0500

we weren't calculating length correctly (supposed to match ordinals range) but it didn't matter b/c length() is not called on the search path where these classes are used. make this explicit by throwing in length implementation.

commit c0d4055a0bc6914d0220fbd4912d6355e9f6a1d9
Merge: f2c9a7cb59 c605f8c9f0
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 19 10:20:02 2023 -0500

Merge branch 'vsearch' of github.com:datastax/cassandra into vsearch

commit f2c9a7cb59b113a349992011b739079106274c1e
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 19 10:19:34 2023 -0500

VECTOR-61 remove deletedOrdinals collection that was not threadsafe wrt add and remove operations; create postingsByOrdinal map that allows Bits operations to look up the postings, instead

commit 9744a13b405cbeb1b68894bb08cc24796aedd8f7
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 19 09:13:24 2023 -0500

add LVT.testMultiplePostings (works fine)

commit c605f8c9f037e724544b21fe2aadcf0d8503d86f
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Sat Jun 17 12:44:36 2023 +0800

skip replica-filtering-protection for ANN requests

commit 71935bd07c9218ba70f1ffde7b6db1e88009e529
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 16 20:20:06 2023 -0500

forgot that maxBruteForceRows has to stay non-final for Byteman

commit b264afc07727305ce88ab2f18b8a2643d2510bd3
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 16 20:10:19 2023 -0500

VECTOR-54
- validateIndexable also checks for very large vectors (that explode to NaN when we take the cosine)
- check query vectors with the same criteria as vectors to index

commit 66c31a6dcfcbf81b7657ac9c4119803e026d150a
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 16 20:07:41 2023 -0500

if CFS.apply catches an InvalidRequestException, keep the same type when we re-throw so it gets back to the client as intended

commit e0bdfdbe421a1cca30746604cda9ef5b992002bb
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 16 17:33:37 2023 -0500

add [failing] emptyIndexTest

commit 75cba96f90730b32ea4a39375c72aaff4ffd6221
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 15 16:52:46 2023 -0500

re-use bitsets across searches

commit 2af2ce8a321f7fea268d81c393da7f8f6d886776
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 16 17:05:52 2023 -0500

more asserts that graph is in sane state when we write it

commit a63492686d13e12b799acdf50c6f0c20dd680f21
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 16 17:13:23 2023 -0500

update lucene

commit 4c6b43e3606b8c9eb5e4ec4cf0817dc9b061c555
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 16 16:24:33 2023 -0500

add debugging information when node is not found on level

commit 81c8e730bd4ca314a81bc13e30b197603a8eb005
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 16 11:57:24 2023 -0500

add similarityWithAnn test

commit 87c8da9a5dddd10215a84c07a09dbaaecaa522b8
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 14 18:26:02 2023 -0500

add tests to confirm that zero-length vectors are rejected

commit 2510c1b987d846821e7095fbc2c368fa51d9a15d
Merge: a3d9e33554 e86f91c568
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 12 17:09:22 2023 -0500

Merge branch 'vsearch' of github.com:datastax/cassandra into vsearch

commit a3d9e3355470f6400328cd7ac1eb07ff06f8e00b
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 12 17:08:43 2023 -0500

use ArrayList in CVV instead of HashMap since we know the keys are consecutive

commit e86f91c568f8a82e83e61bd0dd768478a7584cfc
Merge: 77c1bc11f3 8a7a6d9c4f
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Mon Jun 12 09:25:06 2023 -0500

Merge remote-tracking branch 'datastax/ds-trunk' into vsearch

commit 77c1bc11f31e4ff001c7eff1e6fe50812e136a45
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Mon Jun 12 09:24:56 2023 -0500

Revert "Revert this after CNDB-6974 is fixed."

This reverts commit 8e04280312583a18085e7e7b9d31810790b039f4.
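VECTOR-54 and the zero-length-vector commits describe two ways cosine similarity breaks. A zero vector has no direction, so the cosine divides by zero. A vector whose squared magnitude overflows `float` turns the cosine into `Infinity / Infinity`, which is NaN. The following minimal validator assumes similarity is computed in `float` precision, as the "explode to NaN" wording suggests. `CosineVectorValidator` is hypothetical and is not the `validateIndexable` implementation.

```java
// Hypothetical sketch of the checks described in VECTOR-54 and the zero-vector commits;
// names and messages are illustrative. Assumes similarity is computed in float precision.
final class CosineVectorValidator
{
    static void validateForCosine(float[] vector)
    {
        double sumOfSquares = 0;
        for (float component : vector)
        {
            if (!Float.isFinite(component))
                throw new IllegalArgumentException("vector contains NaN or infinite components");
            // Accumulate in double so the check itself cannot overflow.
            sumOfSquares += (double) component * component;
        }
        // A zero (or empty) vector has no direction: cosine would divide by zero.
        if (sumOfSquares == 0)
            throw new IllegalArgumentException("zero vector is not valid for cosine similarity");
        // If the squared magnitude does not fit in a float, a float-based cosine yields Infinity / Infinity = NaN.
        if (Float.isInfinite((float) sumOfSquares))
            throw new IllegalArgumentException("vector magnitude is too large for cosine similarity");
    }

    public static void main(String[] args)
    {
        float big = 1e20f;
        // Squaring overflows float, so the cosine ratio becomes NaN.
        System.out.println((big * big) / (big * big)); // NaN
        validateForCosine(new float[] { 0.6f, 0.8f }); // accepted
    }
}
```

Applying the same check to query vectors, as the second VECTOR-54 bullet says, keeps an invalid query from producing NaN scores at search time.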
commit cae30e9879c463d80de2a2b7e4a642979803c13a Author: Jeremiah D Jordan <jeremiah@datastax.com> Date: Mon Jun 12 09:23:39 2023 -0500 Revert "Revert this after CNDB-6974 is fixed." This reverts commit 67a2b7eba3c957a993726d37706643979e92a3a3.
commit 46d1a45de84f40311f193beba3ca5fc3fe7450d7 Author: Jonathan Ellis <jbellis@datastax.com> Date: Mon Jun 12 09:03:03 2023 -0500 replace our TODO entries with VSTODO to make them easier to find
commit 415b3f29e8e7f1526e28655308a1937a1175e575 Merge: 2fd8b848be 481d29721a Author: Jonathan Ellis <jbellis@datastax.com> Date: Mon Jun 12 09:02:16 2023 -0500 Merge branch 'vsearch' of github.com:datastax/cassandra into vsearch
commit 481d29721a7993e9babd44f792be55aca925d41a Author: Zhao Yang <zhaoyangsingapore@gmail.com> Date: Mon Jun 12 21:26:43 2023 +0800 Fix VectorMemtableIndex to handle max token and fix Segment#intersects (#670) * Fix VectorMemtableIndex to handle max token and min/max bound * Fix Segment#intersects to compare bound instead of token and add tests for range search * make brute force rows per query for VectorMemtableIndex * apply feedback on Segment#intersects * add comments to VectorMemtableIndex#search * Fix SegmentTest
commit 2fd8b848be82e62999e64b57cb9800d03de8e953 Author: Jonathan Ellis <jbellis@datastax.com> Date: Mon Jun 12 07:51:18 2023 -0500 clean up a couple REVIEWME
commit 3c066e07be6188c3f48c51c02cfc99b395e2a6bc Author: Zhao Yang <zhaoyangsingapore@gmail.com> Date: Sun Jun 11 14:26:42 2023 +0800 fix flaky VectorDistributedTest
commit 6c719a752e90e031916dddf6cc7b7db23627bc38 Author: Jonathan Ellis <jbellis@datastax.com> Date: Sat Jun 10 10:48:01 2023 -0500 fix use of bruteForceRows -- should be larger of limit,maxBruteForceRows
commit 0a2279d7f010bcb68053bb14ea2974295b21d1c4 Author: Jonathan Ellis <jbellis@datastax.com> Date: Sat Jun 10 10:46:34 2023 -0500 use maxBruteForceRows when deciding whether to skip ANN
commit 1859f7355231ce7972e3d4b7af70e5d2967da516 Author: Jonathan Ellis <jbellis@datastax.com> Date: Sat Jun 10 10:27:03 2023 -0500 simplify partitionKeySearchTest using euclidean distance
commit 4f40e1c2d9b1c30675eeaee7d800a8113cce97c0 Author: Jonathan Ellis <jbellis@datastax.com> Date: Sat Jun 10 10:09:24 2023 -0500 typo
commit 7dbe85ac9a248aa6a0985839df1f40ceb2c08e6c Author: Jonathan Ellis <jbellis@datastax.com> Date: Sat Jun 10 10:08:23 2023 -0500 rename methods that return Bits but had bitset in their names
commit 2861bfdba8f2638a1e3f9e81df9cad3878d0e544 Author: Jonathan Ellis <jbellis@datastax.com> Date: Sat Jun 10 10:06:51 2023 -0500 simplify skipANN
commit fbfe4bcdd03a66f4dedfa91ce9d06206f89827cd Author: Zhao Yang <zhaoyangsingapore@gmail.com> Date: Sat Jun 10 16:36:56 2023 +0800 optimize VectorIndexSearcher#searchPosting - return empty posting if key range is not found in current sstable - return empty posting if all row ids are shadowed - skip ANN if matching row ids are less than limit
commit 168911b70888a3dda0ddf4f615af1b7ff54f43e3 Author: Zhao Yang <zhaoyangsingapore@gmail.com> Date: Sat Jun 10 20:54:41 2023 +0800 fix data visibility issue during flush: make sure SSTableAddedNotification is sent before MemtableDiscardedNotification
commit c5dd47e3ce1c7276703fa3f89a9932a3b2cd43b7 Author: Zhao Yang <zhaoyangsingapore@gmail.com> Date: Sat Jun 10 12:31:53 2023 +0800 fix primaryKeySearchTest and partitionKeySearchTest to use correct selector and expected results
commit 2e636529c63d0d726868d6df9b3e11d15d3871d8 Author: Zhao Yang <zhaoyangsingapore@gmail.com> Date: Sun Jun 11 08:31:39 2023 +0800 Vector-48: index#update is not triggered by partition/range deletion,… (#665) * Vector-48: index#update is not triggered by partition/range deletion, we have to fix VectorMemtableIndex to include shadowed primary keys - during flush, append ordinals that are removed by partition/range deletion into deletedOrdinals set * add comments to VP * revert redundant variables * simplify --------- Co-authored-by: Jonathan Ellis <jbellis@datastax.com>
commit daa2623b880e4b82d99204fe718db22e52ae3b42 Author: Jonathan Ellis <jbellis@datastax.com> Date: Sat Jun 10 13:54:38 2023 -0500 use mmapped builder in OnDiskHnswGraphTest
commit 553778a22cfa8757daf3b3ec0e2cfa6fc7ba29cf Author: Jonathan Ellis <jbellis@datastax.com> Date: Sat Jun 10 08:34:41 2023 -0500 fix logic in partitionKeySearchTest, test still fails
commit a7f5a2f824085691da80ef62e37a430e5637c31c Author: Jonathan Ellis <jbellis@datastax.com> Date: Fri Jun 9 17:57:34 2023 -0500 ignore invalid vectors during build against existing data, instead of failing the build
commit 2af203646a2e57fca0a2521c31ba4269b0d8f326 Author: Jonathan Ellis <jbellis@datastax.com> Date: Fri Jun 9 17:43:47 2023 -0500 switch from ignoring zero vectors to throwing IRE
commit 4c98fbb447a275cbf557ae0adf8f4f8a30fba17f Author: Jonathan Ellis <jbellis@datastax.com> Date: Fri Jun 9 17:15:57 2023 -0500 Revert "if there are no rows in the range to search, return EmptyPostingsList instead of giving the hnsw search an empty Bits (which is a pathological case for the search)" This reverts commit 6e5eab787a3add985dc30eca82c5dee2646f5564.
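Commits 6c719a752e, 0a2279d7f0 and fbfe4bcdd0 above describe when the vector searcher skips the HNSW (ANN) graph and scores the matching rows directly. The Java sketch below illustrates that decision under stated assumptions: the class and method names are hypothetical, squared Euclidean distance stands in for whatever similarity function the index is configured with, and using `matchingRows <= max(limit, maxBruteForceRows)` as the cut-off is an assumed reading of the commit notes, not the actual VectorIndexSearcher code.

```java
import java.util.PriorityQueue;

/**
 * Hypothetical sketch (names are illustrative, not the actual VectorIndexSearcher API):
 * use exact scoring when the candidate set is small, otherwise defer to an ANN graph search.
 */
final class BruteForceTopK {
    /** Assumed rule from the commit notes: the threshold is the larger of the query limit and maxBruteForceRows. */
    static boolean shouldBruteForce(int matchingRows, int limit, int maxBruteForceRows) {
        int bruteForceRows = Math.max(limit, maxBruteForceRows);
        return matchingRows <= bruteForceRows;
    }

    /** Squared Euclidean distance; smaller means closer. */
    static float squaredDistance(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            float d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Exact top-k: score every candidate and keep the k closest; returns candidate indexes, closest first. */
    static int[] topK(float[][] candidates, float[] query, int k) {
        float[] distance = new float[candidates.length];
        for (int i = 0; i < candidates.length; i++)
            distance[i] = squaredDistance(candidates[i], query);

        // Max-heap on distance: the farthest of the current best k sits at the head and is evicted first.
        PriorityQueue<Integer> heap = new PriorityQueue<>((x, y) -> Float.compare(distance[y], distance[x]));
        for (int i = 0; i < candidates.length; i++) {
            heap.add(i);
            if (heap.size() > k)
                heap.poll();
        }

        // Polling yields the farthest remaining candidate first, so fill the result from the back.
        int[] result = new int[heap.size()];
        for (int i = result.length - 1; i >= 0; i--)
            result[i] = heap.poll();
        return result;
    }
}
```

For example, with candidates (0,0), (1,1), (5,5), (0.5,0), the query (0,0) and k = 2, `topK` returns indexes 0 and 3. The bounded max-heap keeps memory at O(k) while every candidate is scanned, which is why exact scoring is attractive when the matching row set is small.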
commit da99598a5d11dfa4ecac3b1eeeb9d19a4bad21e5 Author: Jonathan Ellis <jbellis@datastax.com> Date: Fri Jun 9 17:09:30 2023 -0500 don't attempt to add zero vector to cosine indexes
commit a77792447d33600f69dd0a75a280a2b6dac51389 Author: Jonathan Ellis <jbellis@datastax.com> Date: Fri Jun 9 16:43:34 2023 -0500 add failing partitionKeySearchTest
commit b061dd1c77f30c67324da4f94ea8b5a22d50fba7 Author: Jonathan Ellis <jbellis@datastax.com> Date: Fri Jun 9 10:52:08 2023 -0500 inline the test ops
commit 6e5eab787a3add985dc30eca82c5dee2646f5564 Author: Jonathan Ellis <jbellis@datastax.com> Date: Fri Jun 9 10:51:09 2023 -0500 if there are no rows in the range to search, return EmptyPostingsList instead of giving the hnsw search an empty Bits (which is a pathological case for the search)
commit 518c055dfeabacf0fa0863e40bf814058f2ad981 Author: Jonathan Ellis <jbellis@datastax.com> Date: Fri Jun 9 10:28:57 2023 -0500 cleanup
commit e922b03ffe5436dc1341dbe20ee7aaca9901058f Author: Jonathan Ellis <jbellis@datastax.com> Date: Fri Jun 9 10:27:53 2023 -0500 failing tests for primary key search
commit 428e4b713e25a5820d02ca790c30e9f1477c0f44 Author: Jonathan Ellis <jbellis@datastax.com> Date: Fri Jun 9 10:21:59 2023 -0500 cleanup
commit 5319bcdf3b694e861509b2a9f549f39bf51cd253 Author: Jonathan Ellis <jbellis@datastax.com> Date: Fri Jun 9 10:20:02 2023 -0500 move testInvalidColumnNameWithAnn to VectorInvalidQueryTest
commit d8b64845e59c0bd3658a16df21dcc75564417e3b Author: Jonathan Ellis <jbellis@datastax.com> Date: Fri Jun 9 10:12:40 2023 -0500 upgrade lucene to reduce Integer boxing on build path
commit a62356e49c0a32f61732b8525d2bb655f46dd767 Author: Jonathan Ellis <jbellis@datastax.com> Date: Fri Jun 9 10:11:55 2023 -0500 cleanup
commit b754eb7f021d10d0ae8de83ebdf2ce2f19ce59d8 Author: Jonathan Ellis <jbellis@datastax.com> Date: Fri Jun 9 08:29:34 2023 -0500 replace CHM with NBHMLong to avoid boxing
commit 53d232ad2bcddefd72b10542f703cf36728c05f5 Author: Piotr Kołaczkowski <pkolaczk@datastax.com> Date: Fri Jun 9 12:17:46 2023 +0200 STAR-550 Handle SAI AbortedOperationException AbortedOperationException is thrown by SAI when index search hits a timeout. Now instead of allowing this exception to bubble up to the top and be eventually logged as error, we catch and swallow it in the InboundSink after creating the query response. Additionally now we also we set a proper error code (TIMEOUT) in the response, so the client has a hint on what happened.
commit 5e0c5eb3c54b58002162ce3e39d0891643e1f663 Author: Matt Fleming <matt@codeblueprint.co.uk> Date: Fri Jun 9 08:44:10 2023 +0100 CNDB-7037: Check that memtable exists before flushing and avoid IOOBE (#664) For offline services such as the compactor it's possible to not have live memtables. Account for this during flushing after removing indexes to avoid triggering an IndexOutOfBoundsException: Error happened while updating the schema java.lang.IndexOutOfBoundsException: Index: -1 at java.base/java.util.Collections$EmptyList.get(Collections.java:4483) at org.apache.cassandra.db.lifecycle.View.getCurrentMemtable(View.java:106) at org.apache.cassandra.db.ColumnFamilyStore.forceFlush(ColumnFamilyStore.java:1115) at org.apache.cassandra.db.SystemKeyspace.forceBlockingFlush(SystemKeyspace.java:552) at org.apache.cassandra.db.SystemKeyspace.setIndexRemoved(SystemKeyspace.java:576) at org.apache.cassandra.index.SecondaryIndexManager.markIndexRemoved(SecondaryIndexManager.java:837) at org.apache.cassandra.index.SecondaryIndexManager.removeIndex(SecondaryIndexManager.java:420)
commit c241f7d41ee46c28f60d691611f32e071dac6684 Author: Jonathan Ellis <jbellis@datastax.com> Date: Thu Jun 8 19:19:09 2023 -0500 read neighbors using .intBuffer since we know that the HNSW searcher always reads all the neighbors
commit 32f021a4f0623e57d82d70c54350eab6f0f57db7 Author: Jonathan Ellis <jbellis@datastax.com> Date: Thu Jun 8 18:08:30 2023 -0500 VECTOR-51 leave vectors as ByteBuffers during compaction to avoid 2x memory usage incurred by keeping them around as float[] as well
commit 32d8cbbd19301228530d3d84f7a7e4120b4d4422 Author: Jonathan Ellis <jbellis@datastax.com> Date: Thu Jun 8 17:37:08 2023 -0500 r/m node cache from query metrics since there is nothing the operator can act on there
commit 890df8e6c36c5c9e1314dbcc0a681415401120dd Author: Jonathan Ellis <jbellis@datastax.com> Date: Thu Jun 8 14:15:25 2023 -0500 add more information to exception when reading row offsets goes wrong
commit 5c7fe0cfed1edb10f81b134f8949dd1b5abb2781 Author: Jonathan Ellis <jbellis@datastax.com> Date: Thu Jun 8 13:26:17 2023 -0500 revise ordinals cache as follows: // cache full levels including neighbors up to neighborsRamBudget, starting with the top level, // but always cache all levels above the bottom two levels -- this will be ~1% of the graph. // then on L1, cache at least the offsets // L0 we do not cache since we only need one extra seek (no bsearch) to read the neighbors offset
commit e131014c5cba59624e0ed9fac142abe5340e2fc0 Author: Jonathan Ellis <jbellis@datastax.com> Date: Thu Jun 8 13:53:10 2023 -0500 add deletes test
commit 7c5a880f007535d9f03b7c7d9d3e84dcc79ef3d7 Author: Jonathan Ellis <jbellis@datastax.com> Date: Thu Jun 8 13:34:42 2023 -0500 must use graph.size for bitset size -- not correct to use max rowid, since deleted ordinals will not have a rowid but will still be in the graph
commit 0654fbb61aec86deae69edf83fb1cf653ed7df65 Author: Jonathan Ellis <jbellis@datastax.com> Date: Thu Jun 8 08:44:28 2023 -0500 info -> debug for validatePerIndexComponents
commit b289db2362b928f29b6c5c6289e09b33f0be1e88 Author: Jonathan Ellis <jbellis@datastax.com> Date: Wed Jun 7 18:08:18 2023 -0500 update lucene
commit 347dae644b0d395014d8faf2affd48a6b2546e96 Author: Jonathan Ellis <jbellis@datastax.com> Date: Wed Jun 7 18:23:41 2023 -0500 undo burntest logging config change
commit 87a6a84f38d0a0423acd9a0dda67d386876681b5 Author: Jonathan Ellis <jbellis@datastax.com> Date: Wed Jun 7 15:20:06 2023 -0500 add debug logging
commit 20013bc865c7fb2c34111c98362aaf997dc724dc Author: Jonathan Ellis <jbellis@datastax.com> Date: Wed Jun 7 14:50:39 2023 -0500 add a bit more information to exception when we fail in index construction from disk
commit 86389ceae8188b8e427d8a528fada2f913f1f633 Author: Jonathan Ellis <jbellis@datastax.com> Date: Wed Jun 7 12:25:31 2023 -0500 reduce test vector count from "all of them" to 200
commit 20177a1c84ba34babc82d8c1aa4e3436321b5d9f Merge: 2e2fce09ee f2c697ac46 Author: Jonathan Ellis <jbellis@datastax.com> Date: Wed Jun 7 12:21:20 2023 -0500 Merge branch 'wip' into vsearch
commit f2c697ac46348301260c5947ade4ebed7dee90ee Author: Jonathan Ellis <jbellis@datastax.com> Date: Wed Jun 7 12:16:28 2023 -0500 multipleSegmentsMultiplePostingsTest
commit 59e4a1c53e1c19a7ebba4a0834229e61f9e4d3ce Author: Jonathan Ellis <jbellis@datastax.com> Date: Wed Jun 7 12:16:04 2023 -0500 cleanup
commit 2e2fce09eeb5bef4780cecf7c7b5e492a810f67a Author: Zhao Yang <zhaoyangsingapore@gmail.com> Date: Wed Jun 7 23:33:00 2023 +0800 fix OnDiskOrdinalsMap to seek to segment offset before reading (#661)
commit 9502f1040229f2be6619f7e4497dc25ca5126b39 Author: Jonathan Ellis <jbellis@datastax.com> Date: Wed Jun 7 10:04:38 2023 -0500 fix build
commit 3535444b57366fe705e0fd8c1ca095e60ed2a706 Author: Jonathan Ellis <jbellis@datastax.com> Date: Wed Jun 7 09:19:41 2023 -0500 add write-only workload
commit a112b0965cb298f293b6e384bafc92a1d5a1fb8f Author: Zhao Yang <zhaoyangsingapore@gmail.com> Date: Wed Jun 7 21:15:53 2023 +0800 VECTOR-44: improve in-memory partition-restricted query perf (#660) * VECTOR-44: improve in-memory partition-restricted query perf - using post-filter top-k processor instead of ANN: 14x improvement on partition-restricted query in LongVectorTest
commit cf1ec3be50ddbe40ae99dfc34eea51b9fc5c2648 Author: Zhao Yang <zhaoyangsingapore@gmail.com> Date: Wed Jun 7 21:06:37 2023 +0800 Vector-45: fix bitsetForShadowedPrimaryKeys to skip shadowed primary keys outside of current sstable/segment (#659) * VECTOR-45: fix bitsetForShadowedPrimaryKeys to skip shadowed primary keys outside of current sstable/segment * return empty iterator if results are empty instead of ReorderingRangeIterator
commit 87217b4f49935fbbd13c73f4d0aadb84836696a7 Author: Jonathan Ellis <jbellis@datastax.com> Date: Wed Jun 7 07:31:46 2023 -0500 don't make areL0ShardsEnabled final, it breaks mocks
commit 2bb3a299df43561369544b662674872575975b7c Author: Matt Fleming <matt@codeblueprint.co.uk> Date: Wed Jun 7 10:56:08 2023 +0100 VECTOR-42: Check columns exist to avoid NPE during ANN expr binding (#657) Using an non-existent column with an ANN expression triggers an NPE like so, java.lang.NullPointerException: null at org.apache.cassandra.cql3.ArrayLiteral.forReceiver(ArrayLiteral.java:43) at org.apache.cassandra.cql3.ArrayLiteral.prepare(ArrayLiteral.java:54) at org.apache.cassandra.cql3.Ordering$Raw$Ann.bind(Ordering.java:178) at org.apache.cassandra.cql3.Ordering$Raw.bind(Ordering.java:139) Use TM.getExistingColumn() which throws InvalidRequestException if the column is undefined.
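Commit 7c5a880f00 notes that the bitset handed to the graph search must be sized by `graph.size`, not by the maximum row id, because deleted ordinals stay in the graph without a row id. A minimal Java illustration, assuming a plain `boolean[]` in place of the real Bits implementation and a hypothetical `rowIdForOrdinal` array in which -1 marks a deleted ordinal:

```java
/**
 * Hypothetical illustration (not the actual Cassandra code): the acceptance bits passed to an HNSW
 * search are indexed by graph ordinal, so they must cover every ordinal in the graph.
 */
final class OrdinalBits {
    /**
     * rowIdForOrdinal[o] is the row id for graph ordinal o, or -1 if that ordinal's row was deleted.
     * Returns a fixed-size acceptance array with one slot per ordinal, accepting rows in [minRowId, maxRowId].
     */
    static boolean[] acceptedOrdinals(int[] rowIdForOrdinal, int minRowId, int maxRowId) {
        int graphSize = rowIdForOrdinal.length;   // deleted ordinals are still part of the graph
        boolean[] bits = new boolean[graphSize];  // sizing by (max row id + 1) can be too small
        for (int ordinal = 0; ordinal < graphSize; ordinal++) {
            int rowId = rowIdForOrdinal[ordinal];
            // Deleted ordinals (rowId < 0) are never accepted, but they still occupy a slot.
            bits[ordinal] = rowId >= 0 && rowId >= minRowId && rowId <= maxRowId;
        }
        return bits;
    }
}
```

With `rowIdForOrdinal = {0, 1, -1, 2, -1}` the maximum row id is 2, so an array of 3 slots would overflow as soon as the search visited ordinal 3 or 4. Sizing by the graph (5 slots) yields `{true, true, false, true, false}` for the row range 0..2.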
commit f29d4528fc43f34889e560acafa810eee1b88ba9 Author: Jonathan Ellis <jbellis@datastax.com> Date: Tue Jun 6 21:18:47 2023 -0500 restore query timeouts
commit 6e5734e52b41ab1e355d202ad0433440828fbb75 Author: Jonathan Ellis <jbellis@datastax.com> Date: Tue Jun 6 19:08:52 2023 -0500 looking at performance over time in LongVectorTest
commit f96af8f0b9185163a49156bdab107801d2588bf0 Author: Jonathan Ellis <jbellis@datastax.com> Date: Tue Jun 6 16:00:23 2023 -0500 log level = info for burn tests
commit 1247cf372927cc185be5c9767a06fdd15f102dbe Author: Jonathan Ellis <jbellis@datastax.com> Date: Tue Jun 6 17:48:00 2023 -0500 write the in-memory deleted ordinals to disk at the start of the postings component
commit e319d71bd6b0d265f616c807dcaf4a9d9fc38aef Author: Jonathan Ellis <jbellis@datastax.com> Date: Tue Jun 6 16:36:27 2023 -0500 don't allocate unnecessary objects on the happy path of no tombstones
commit 6d146a8460a17d146de756aab2ccbbf2abdb967a Author: Jonathan Ellis <jbellis@datastax.com> Date: Tue Jun 6 16:19:56 2023 -0500 VECTOR-23: force UCS to not shard L0
commit 79d6e177093312bc1b951d6740ad45ce8a6c5875 Author: Jonathan Ellis <jbellis@datastax.com> Date: Tue Jun 6 16:01:05 2023 -0500 mark AutoCloseable
commit 5ec2f7d7f792bfa2dcf582ea4eed3fa8745fdaff Author: Jonathan Ellis <jbellis@datastax.com> Date: Mon Jun 5 14:37:15 2023 -0500 Support string literals as vectors Co-authored-by: Andrés de la Peña <a.penya.garcia@gmail.com>
commit 73a53c37536835e462e3cf17c452027c7aa591ed Author: Jonathan Ellis <jbellis@datastax.com> Date: Tue Jun 6 12:25:39 2023 -0500 (take 3, this time with the code changes) fix race conditions across concurrent inserts + searches in memtable
commit 51f4f419d31916a81088f171b158f1e002b5c800 Author: Jonathan Ellis <jbellis@datastax.com> Date: Tue Jun 6 12:25:39 2023 -0500 (take 2) fix race conditions across concurrent inserts + searches in memtable
commit 94097a1f7cf1831c03e52b1f539bdc2e42fa038b Author: Jonathan Ellis <jbellis@datastax.com> Date: Tue Jun 6 12:35:32 2023 -0500 Revert "fix race conditions across concurrent inserts + searches in memtable" This reverts commit c4d41b492190fb2644f83bf4902acf71d1e4f891.
commit c4d41b492190fb2644f83bf4902acf71d1e4f891 Author: Jonathan Ellis <jbellis@datastax.com> Date: Tue Jun 6 12:25:39 2023 -0500 fix race conditions across concurrent inserts + searches in memtable
commit 22143a8c5db160ef0fa7687ff3e8cb227de160f0 Author: Jonathan Ellis <jbellis@datastax.com> Date: Tue Jun 6 10:28:30 2023 -0500 fix AOOB in bruteForceRows logic
commit b5624f62c88378e03260399440edf4d335ccc3af Author: Jonathan Ellis <jbellis@datastax.com> Date: Tue Jun 6 10:12:53 2023 -0500 call deserializeFloatArray (which gives float[]) instead of deserialize (which gives List<Float>)
commit 13ad40718737b54652c875d81ec20992b5828777 Author: Jonathan Ellis <jbellis@datastax.com> Date: Tue Jun 6 08:36:32 2023 -0500 create CassandraHnswGraphBuilder with concurrent and serial implementations, so that single-threaded compaction doesn't have to pay the concurrency overhead
commit b83f4e4007e3d597963023b8ae32f4d0934b792d Author: Jonathan Ellis <jbellis@datastax.com> Date: Tue Jun 6 08:00:44 2023 -0500 add ASL header to injections.md to make CI happy
commit 16074336b1322d4bade40eb71562ca50221399dc Merge: 227c6c13a3 1ff6992354 Author: Jonathan Ellis <jbellis@datastax.com> Date: Tue Jun 6 07:48:04 2023 -0500 Merge branch 'VECTOR-37' into vsearch
commit 227c6c13a3cfe42484fc560d629d54a93f8265e0 Author: Jaroslaw Grabowski <jaroslaw.grabowski@datastax.com> Date: Tue Jun 6 13:02:37 2023 +0200 CNDB-7007 return expired tables level from getLevels
commit 312d07de8d297cdede39017354b678f2fb1b1006 Author: Jonathan Ellis <jbellis@datastax.com> Date: Tue Jun 6 07:31:26 2023 -0500 randomize our brute force threshold, which will get the actual index scans exercised more
commit 0b897961869965629893c94250b7d4b1fb7c0f86 Author: Zhao Yang <zhaoyangsingapore@gmail.com> Date: Tue Jun 6 13:44:01 2023 +0800 Reference ANN sstable indexes in case of ann hybrid search (#653) * Reference ANN sstable indexes in case of ann hybrid search * simplify --------- Co-authored-by: Jonathan Ellis <jbellis@datastax.com>
commit 1ff6992354e8411f9f7c48a76f0aadb4a00f2789 Author: Jonathan Ellis <jbellis@datastax.com> Date: Mon Jun 5 21:55:14 2023 -0500 per-query hnsw metrics
commit d3d6ae107737020eb0ca3b00c117d5bc84027dc0 Author: Jonathan Ellis <jbellis@datastax.com> Date: Mon Jun 5 21:16:14 2023 -0500 reduce test size to prevent Jenkins OOM
commit 7765ac73fd29c9664b6f8e946c067d6f1b0a9a03 Merge: 5a04dbcf67 c5cd09cebf Author: Jonathan Ellis <jbellis@datastax.com> Date: Mon Jun 5 21:15:44 2023 -0500 Merge branch 'vsearch' of github.com:datastax/cassandra into vsearch
commit c5cd09cebf0c63f7bafcd2f4548395efe68b12cd Author: Zhao Yang <zhaoyangsingapore@gmail.com> Date: Tue Jun 6 09:38:42 2023 +0800 Skip warning log about receiving a range that is not owned by the current replica, see VECTOR-30
commit 5a04dbcf6738d7681732e28518d6eb37f74357b8 Author: Jonathan Ellis <jbellis@datastax.com> Date: Mon Jun 5 11:21:15 2023 -0500 add injections.md from bdp repo
commit baf9cce301a39f9506bd60c5a75d2011150f2fea Author: Jonathan Ellis <jbellis@datastax.com> Date: Mon Jun 5 10:02:02 2023 -0500 VECTOR-36 fix looking up indexes and SSTableContext by SSTableReader, because the SSTR will be removed from internal structures once it's compacted
commit e7fdc8454e750e2ee886c4ab27fb09abb46303fa Author: Jonathan Ellis <jbellis@datastax.com> Date: Mon Jun 5 09:53:51 2023 -0500 limitToTopResults should skip rows that are not in the current segment (or more preciesely, have no vectors that were indexed)
commit 2f64695a90651d9e76eee0a3399032ea241c983c Author: Zhao Yang <zhaoyangsingapore@gmail.com> Date: Mon Jun 5 14:55:20 2023 +0800 Vector 6 take 2: reapply Vector-6 commit and fix NPE in SelectStatement (#648) * Revert "Revert VECTOR-6" This reverts commit 63a6ff609a02e4f7d9f5499f1af48ca5e58f63a8. * Vector-6 take 2: - fix NPE in SelectStatement by skipping reversed() for null ColumnComparator - fix getOrderingColumns() to return LinkedHashMap to preverse ordering columns' order - fix VectorLocalTest compilation - remove debug log in CassandraOnHeapHnsw
commit 9da51b315d207417de4f3e005616e275c62c74cb Author: Zhao Yang <zhaoyangsingapore@gmail.com> Date: Mon Jun 5 12:38:21 2023 +0800 fix SAI test failures in vsearch branch (#649) - BatchlogEndpointFilterTest - rangeRestrictedTest - SegmentFlushTest - SegmentMergerTest - OperationTest - RangeIntersectionIteratorTest
commit dc53d7db5f0f8fe1c780f6d2b7580794f7b6b5fb Author: Jonathan Ellis <jbellis@datastax.com> Date: Sat Jun 3 13:35:11 2023 -0500 create views for ordinals map so we don't have to open a new Reader for each method call
commit c6385022ff2949c23e0dd0161a043cf4061f050c Author: Jonathan Ellis <jbellis@datastax.com> Date: Sat Jun 3 08:19:32 2023 -0500 add assert sstableContext != null
commit f340e7590db578b71201326e82006f72dd491047 Author: Jonathan Ellis <jbellis@datastax.com> Date: Sat Jun 3 07:37:36 2023 -0500 on-disk Searcher bits should not need to be growable
commit fd5cc7f989a47d03689dd29495b974697ad1338b Author: Jonathan Ellis <jbellis@datastax.com> Date: Sat Jun 3 07:37:18 2023 -0500 add asserts
commit 6436e5f4ad49881a62c1442692edfedfe31968dc Author: Jonathan Ellis <jbellis@datastax.com> Date: Sat Jun 3 07:21:50 2023 -0500 cleanup
commit 32c8e005740d3c7ec0b1008091728e12260aecdd Author: Jonathan Ellis <jbellis@datastax.com> Date: Fri Jun 2 13:57:18 2023 -0500 comment
commit 091d96832ff9a68f6ea7857eafd2ec7ee0c09a0b Author: Jonathan Ellis <jbellis@datastax.com> Date: Fri Jun 2 13:32:24 2023 -0500 r/m unused search method returning iterators over PrimaryKey
commit 4e78a4b429b0ba6690871df9a91c8ffd8d2e53fa Author: Jonathan Ellis <jbellis@datastax.com> Date: Fri Jun 2 11:25:30 2023 -0500 we can use the slightly more lightweight RangeConcatIterator instead of RangeUnionIterator when combining results from different Segments
commit e7a9186bf3426ad025e809481e47db85a7bf7190 Author: Jonathan Ellis <jbellis@datastax.com> Date: Fri Jun 2 11:22:35 2023 -0500 r/m obsolete FIXME
commit c76ee9ea0cab5b70c535de3f50b81cf333bf94a2 Author: Jonathan Ellis <jbellis@datastax.com> Date: Sat Jun 3 07:12:43 2023 -0500 add testAppendedGraphs
commit d00bdbfb6acf9f08d2bdfa2ef06e833e567cf9eb Author: Jonathan Ellis <jbellis@datastax.com> Date: Sat Jun 3 06:57:37 2023 -0500 move write-to-File to test class
commit b10fd1100daf4cde0291b3b4ef06f5ceb3ca650c Author: Jonathan Ellis <jbellis@datastax.com> Date: Sat Jun 3 06:46:01 2023 -0500 fix Ref leaks in test
commit 0656e558bb51172df4e0cde1ebe92946d4dcec8b Author: Jonathan Ellis <jbellis@datastax.com> Date: Sat Jun 3 06:26:05 2023 -0500 fix tests to use View
commit 92797ced03ed26f9a5cce398504d6f0de519b899 Author: Zhao Yang <zhaoyangsingapore@gmail.com> Date: Sat Jun 3 19:19:38 2023 +0800 VECTOR-35: fix vector on-disk writer to append segments (#647)
commit 22f5c01860367586d0cc39def7d975a6d6bae784 Author: Jonathan Ellis <jbellis@datastax.com> Date: Fri Jun 2 18:54:41 2023 -0500 add "Vector indexes only support ANN queries" check
commit bfc0b056dbacd0a1eca86020e901f545dca7670e Author: Jonathan Ellis <jbellis@datastax.com> Date: Fri Jun 2 17:34:24 2023 -0500 split invalid requests into separate test class
commit f66768c35920101382c1975c9d0ba0e3301b7c61 Author: Jonathan Ellis <jbellis@datastax.com> Date: Fri Jun 2 17:28:56 2023 -0500 add specific error message when trying to do ANN without an index
commit 63a6ff609a02e4f7d9f5499f1af48ca5e58f63a8 Author: Jonathan Ellis <jbellis@datastax.com> Date: Fri Jun 2 12:31:22 2023 -0500 Revert VECTOR-6 This reverts commits: a37009c187edeba68389d239dc1b9f40519b1187 5565690fbe1056d4c159ddbe233fa22c7695320a e7733bb8f858a16b082b8a5c64d0322db6f6271a
commit 3d496662b259470b505d88212074c436660c27ad Merge: fab67bb134 4bdae7e362 Author: Jonathan Ellis <jbellis@datastax.com> Date: Fri Jun 2 11:05:53 2023 -0500 Merge branch 'vsearch' of github.com:datastax/cassandra into vsearch
commit 4bdae7e36213f5112efce085696dd24aaa0adfad Author: Mike Adamson <madamson@datastax.com> Date: Fri Jun 2 14:36:01 2023 +0100 Stabilise random tests using word2vec model vectors
commit 2ddc3ec6b11f6810f6c4cbaf6b62bfe46d2d1b6c Merge: 8e04280312 1f9179002a Author: Jeremiah D Jordan <jeremiah@datastax.com> Date: Fri Jun 2 10:05:15 2023 -0500 Merge remote-tracking branch 'datastax/ds-trunk' into vsearch
commit 8e04280312583a18085e7e7b9d31810790b039f4 Author: Jeremiah D Jordan <jeremiah@datastax.com> Date: Fri Jun 2 09:51:06 2023 -0500 Revert this after CNDB-6974 is fixed.
commit fab67bb13499691f813c893cd39a3bbd406653d2 Author: Jonathan Ellis <jbellis@datastax.com> Date: Fri Jun 2 08:59:12 2023 -0500 vector cache defaulting to 1MB per segment
commit 67a2b7eba3c957a993726d37706643979e92a3a3 Author: Jeremiah D Jordan <jeremiah@datastax.com> Date: Thu Jun 1 17:49:35 2023 -0500 Revert this after CNDB-6974 is fixed.
commit 4538956b31d8c40d2e9b603b3b0a3392343e1853 Merge: 7518680a4a af4c83aef4 Author: Jonathan Ellis <jbellis@datastax.com> Date: Thu Jun 1 16:17:14 2023 -0500 Merge branch 'ds-trunk' into vsearch
commit 7518680a4a17cb2589cf06a9175befc10c9eab1a Author: Jonathan Ellis <jbellis@datastax.com> Date: Thu Jun 1 16:14:33 2023 -0500 re-use float[] across calls to vectorValue to avoid allocation overhead (credit to Jake for the idea)
commit d20bd8aedeb6232358dd16f901e2708f217b8108 Author: Jonathan Ellis <jbellis@datastax.com> Date: Thu Jun 1 15:35:17 2023 -0500 optimize vectorValue() with direct access to the mmap-ed region
commit 24572025af5b7893279b5b03c58555b682f3abdb Author: Jonathan Ellis <jbellis@datastax.com> Date: Thu Jun 1 14:33:55 2023 -0500 decomposeVector does not modify the underlying buffer, so no need to duplicate() here
commit efef35d47a24222fae300d1cbd5a801e9ed79442 Author: Jonathan Ellis <jbellis@datastax.com> Date: Thu Jun 1 14:29:30 2023 -0500 specialize deserializeFloatArray for ByteBuffer and FloatBuffer. this saves about 3% total CPU on search workloads
commit 63c27f5550d6c3e22d960b9a42b9097029e3b760 Author: Jonathan Ellis <jbellis@datastax.com> Date: Thu Jun 1 11:04:23 2023 -0500 default target size of 5GB
commit 69c55c8d07c6e1d58b4829c153f5ebf8a4343669 Author: Jonathan Ellis <jbellis@gmail.com> Date: Thu Jun 1 07:20:38 2023 -0700 make the on-disk hnsw code threadsafe using FileHandle.createReader (#641)
commit b194cae2e56133b946d0231768254064a47f01b6 Merge: 7ee6422084 23c2891e7a Author: Jonathan Ellis <jbellis@datastax.com> Date: Thu Jun 1 06:22:17 2023 -0500 Merge branch 'ds-trunk' into vsearch
commit 7ee642208485c9e2ae62aa662dbbddc7b295bae1 Merge: c286ec0ee2 a37009c187 Author: Jonathan Ellis <jbellis@datastax.com> Date: Wed May 31 20:41:48 2023 -0500 Merge branch 'VECTOR-6' into vsearch
commit c286ec0ee231ededded18e81357edebe72064c59 Author: Jonathan Ellis <jbellis@datastax.com> Date: Wed May 31 20:40:21 2023 -0500 add testLargeGraph
commit a37009c187edeba68389d239dc1b9f40519b1187 Author: Zhao Yang <zhaoyangsingapore@gmail.com> Date: Thu Jun 1 08:50:07 2023 +0800 cleanup unused code
commit 6246dbe3a970a0f4672701632003cdf576705c24 Author: Jonathan Ellis <jbellis@datastax.com> Date: Wed May 31 16:43:42 2023 -0500 comment
commit c7420440c7d3c81eac96e716148d570ae3cceb1d Author: Jonathan Ellis <jbellis@datastax.com> Date: Wed May 31 16:43:37 2023 -0500 encapsulate shadowedPrimaryKey better
commit a1a25c33b8bec9d78354dd2c1d992444db2bf76f Author: Jonathan Ellis <jbellis@datastax.com> Date: Wed May 31 15:57:52 2023 -0500 make hnsw cache size configurable, and make the default 128KB (roughly the size of the bloom filter for a 1GB sstable)
commit 55747d2351e915645697fa8dec404c749540456f Author: Andrés de la Peña <a.penya.garcia@gmail.com> Date: Wed May 31 19:28:47 2023 +0100 Replace FunctionParameter.sameAsFirst by FunctionParameter.sameAs, allowing type inferrence in both directions for vector functions
commit 5565690fbe1056d4c159ddbe233fa22c7695320a Author: Jonathan Ellis <jbellis@datastax.com> Date: Wed May 31 12:59:42 2023 -0500 cleanup and fix
commit e7733bb8f858a16b082b8a5c64d0322db6f6271a Author: Zhao Yang <zhaoyangsingapore@gmail.com> Date: Wed May 24 16:01:26 2023 +0800 VECTOR-6: return vector results to client in ANN order instead of token order
commit 771067d1475c94484da178236928d2bad78e00b1 Author: Mike Adamson <madamson@datastax.com> Date: Wed May 31 18:33:55 2023 +0100 Fix annOrderingMustHaveLimit test
commit 8dd76a73541e70da0da022dd463e6711a315f971 Author: Mike Adamson <madamson@datastax.com> Date: Wed May 31 18:30:49 2023 +0100 Improve the validation of ORDER BY <column> ANN OF (#639) * Improve the validation of ORDER BY <column> ANN OF * Change to hasNonClusteredOrdering and improve limit message
commit 1ef6f2e3418db19af23f0eae24cd344d72290455 Author: Jonathan Ellis <jbellis@datastax.com> Date: Fri May 26 18:34:25 2023 -0500 add vector similarity functions
commit f4e3a39a8ece623647ba7891981e7c5ee10ed96b Author: Jonathan Ellis <jbellis@datastax.com> Date: Wed May 31 09:39:45 2023 -0500 pull in the smallest set of changes possible from 93e0ae9a to get FunctionParameter.sameAsFirst
commit 0ef2614346c9426a966004df220221f74353de70 Author: Jonathan Ellis <jbellis@gmail.com> Date: Wed May 31 06:22:41 2023 -0700 Add support for updates + deletes (#636)
commit 5da9fefe1e18f733bb8ffd0a35ef2e1ac3cbd975 Author: Jonathan Ellis <jbellis@datastax.com> Date: Tue May 30 16:32:02 2023 -0500 rename SegmentOrdering.reorderOneComponent to limitToTopResults to align with MemtableOrdering
commit caf6eb2154ddf57c88d69c6246404b65e183057d Author: Mike Adamson <madamson@datastax.com> Date: Tue May 30 22:41:29 2023 +0100 Fix V1SearchableIndex.reorderOneComponent to use segments (#635)
commit 72cfa94e9389d0078d64aee60d26b8b01c6ca22f Author: Piotr Kołaczkowski <pkolaczk@datastax.com> Date: Mon May 29 20:17:44 2023 +0200 VECTOR-14: Move ANN OF expressions from WHERE to ORDER BY Before:
SELEC…

…pache#743)
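Commits 7518680a4a, 24572025af and efef35d47a in the log above concern decoding serialized vectors cheaply: reusing one float[] per reader and reading straight from the buffer without a defensive duplicate(). The sketch below shows that pattern using only standard java.nio calls. `VectorReader` is a hypothetical name, and the sketch assumes each vector is stored as `dimension` consecutive 4-byte floats in the buffer's byte order (big-endian unless the buffer was configured otherwise); it is not the actual vectorValue() or deserializeFloatArray code.

```java
import java.nio.ByteBuffer;
import java.nio.FloatBuffer;

/** Hypothetical reader that decodes fixed-dimension float vectors into a reused scratch array. */
final class VectorReader {
    private final int dimension;
    private final float[] scratch; // reused across calls so each read does not allocate a new float[]

    VectorReader(int dimension) {
        this.dimension = dimension;
        this.scratch = new float[dimension];
    }

    /**
     * Decodes the vector starting at the buffer's current position.
     * The returned array is overwritten by the next call, so callers must copy it if they keep it.
     */
    float[] read(ByteBuffer serialized) {
        if (serialized.remaining() < dimension * Float.BYTES)
            throw new IllegalArgumentException("buffer too short for dimension " + dimension);
        // asFloatBuffer() returns a view starting at the current position; reading through the view
        // does not move the source buffer's position, and it uses the source buffer's byte order.
        FloatBuffer floats = serialized.asFloatBuffer();
        floats.get(scratch, 0, dimension);
        return scratch;
    }
}
```

Because the view leaves the source buffer's position untouched, no `duplicate()` is needed before reading. The trade-off of the shared scratch array is that each call overwrites the previous result, so a caller that keeps a vector must copy it (for example with `Arrays.copyOf`), and a single reader should not be shared across threads.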

commit f9e589098e89417cd41ef627b3e1c7371986e3e1 Merge: b0dbc8bd57 d32eed9d68 Author: Jeremiah D Jordan <jeremiah@datastax.com> Date: Mon Sep 18 11:28:48 2023 -0500 Merge remote-tracking branch 'datastax/vsearch' into cndb-7632-vsearch-rebase-20230915 commit b0dbc8bd57fde3dd473d4aae85a3063f2156106d Author: Jeremiah D Jordan <jeremiah@datastax.com> Date: Mon Sep 18 10:12:30 2023 -0500 Update expected string for error message change. commit f2cafdac9c3e5bf838c10e0078895d56e3271370 Author: qannap <130002578+qannap@users.noreply.github.com> Date: Thu Sep 14 08:41:07 2023 -0700 Optimize partition-aware queries to use bloom filter (#729) Add boom filter field in QueryViewBuilder. Here is the link to the issue: [Vector-72](https://datastax.jira.com/browse/VECTOR-72). commit 9765b23b5b748286391b8e374bab24a31cb5d934 Author: Jonathan Ellis <jbellis@gmail.com> Date: Thu Sep 14 09:47:19 2023 -0500 Vsearch 18 (#741): optimize MergePostingList.advance for the case where it is called repeatedly commit 66c00e094942de72f2f1fdb780ac00239e0aa284 Author: Mike Adamson <madamson@datastax.com> Date: Thu Sep 14 14:33:41 2023 +0100 Remove smile-nlp and add internal Glove implementation for testing (#743) commit b13098e0cd5a9f961066c0059953b525f5bfa787 Author: Jakub Żytka <jakub.zytka@gmail.com> Date: Thu Sep 14 13:28:01 2023 +0200 CNDB-7363: extend read tracking API to notify about post-filtered, reconciliated partitions and rows (#692) * CNDB-7363: extend read tracking API to notify about post-filtered, reconciliated partitions and rows * RowFilter::getExpressionsPreOrder will now return all row filter's expressions in pre-order commit 2bca65cfe4ba864a8a6719e0e28ddf76e83c17e0 Author: Michael Marshall <michael.marshall@datastax.com> Date: Sun Sep 10 01:01:47 2023 -0500 Fix JsonTest and AnalyzerViewTest errors (#730) commit a0dde9f89e02ef40d9aa87514eb7b5becfdce8e8 Author: Michael Marshall <michael.marshall@datastax.com> Date: Fri Sep 8 19:34:09 2023 -0500 Reject SAI creation for invalid 
combinations of options (#736) commit abd5399f9c1b6feb0f5ba71b279ff5b76d0f92ee Author: Jonathan Ellis <jbellis@gmail.com> Date: Fri Sep 8 11:52:53 2023 -0500 Fix exceptions when executing complex queries (#735) * fix ReorderingRangeIterator.performSkipTo * fix NPE commit acc39a8281cf9163057c00ec9413219b210cc262 Author: Michael Marshall <michael.marshall@datastax.com> Date: Fri Sep 8 11:04:58 2023 -0500 Analyzer cleanup: add comments and fix docs (#733) commit 4989dff0f150bb65535fa598e53290bca1f3ce0c Author: Michael Marshall <michael.marshall@datastax.com> Date: Thu Sep 7 08:51:09 2023 -0500 Fix RowAwarePrimaryKey#hashCode for deferred keys (#725) * Fix RowAwarePrimaryKey#hashCode for deferred keys * Use recommendation from code review commit 2268c1c806632d62cf1a29b815740207a0e2939c Author: Jonathan Ellis <jbellis@datastax.com> Date: Wed Sep 6 13:38:39 2023 -0500 optimize MergePostingList.advance -- we don't care what order we evaluate the merged lists in, since we're going to rebuild the PQ anyway, so use a fast iterator instead of a slow poll() loop commit 6bb72c3272fbc8d12427fb0d9f00916b6f6ee9c0 Author: Jonathan Ellis <jbellis@datastax.com> Date: Wed Sep 6 13:22:07 2023 -0500 PKM is not threadsafe, need to allocate a new one for every request commit f1d7c66b54f7b95e12698d7a6ed00a1e8ac958d6 Author: Jonathan Ellis <jbellis@datastax.com> Date: Wed Sep 6 13:10:11 2023 -0500 r/m slightly gratuitous IOException from PKM.Factory signature commit 73e96df69a1861d5a8bda04ab69252a480a78c62 Author: Jonathan Ellis <jbellis@datastax.com> Date: Wed Sep 6 11:35:40 2023 -0500 fix updateTestWithPredicate by removing broken assert (did not account for deleted rows) commit 36327a93b8c1651acd7af1d8b687dd92e08021e3 Author: Jonathan Ellis <jbellis@datastax.com> Date: Wed Sep 6 11:34:44 2023 -0500 add failing updateTestWithPredicate commit 80eb443b65637dc1782cb3a745d4d56431f787d8 Author: Jonathan Ellis <jbellis@datastax.com> Date: Wed Sep 6 11:05:32 2023 -0500 comment upsertTest commit 
3f5fbc72547aa790989d3c7bce8b92fd4bb2fb2c Author: Marianne Manaog <91789965+marianne-manaog@users.noreply.github.com> Date: Wed Sep 6 19:42:15 2023 +0100 Feature/Optimize SAI.getPostQueryOrdering as SAI.postQuerySort (#710) * Optimize sorting in postQuerySort method * Remove need for prepareFor method following sort optimization in postQuerySort * Add testOrderResults * Simplify creation of resultSet in the test testOrderResults and keep rows as final in ResultSet.java * Apply DSU pattern to SAI.postQuerySort method and leverage it in SelectStatement.orderResults method * Use Pair.create instead of new Pair * Put back comment on ANN support only * Optimise undecorate step by using List<ByteBuffer> instead of ByteBuffer in the Pair * Add licence to StorageAttachedIndexTest.java * Move StorageAttachedIndexTest.java up a directory * Simplify map to create listPairsVectorsScores commit 23555ab43101cd6010fe9cc320e7429779faa678 Author: qannap <130002578+qannap@users.noreply.github.com> Date: Fri Sep 1 15:29:22 2023 -0700 Add row count field to columnFamilyStore (#713) * Add row count field to columnFamilyStore * Add test to row count field * delete used import * delete used function and separate test * Add row count field to columnFamilyStore * Add test t0 row count field * delete used import * delete used function and separate test * resolve version * return to updated version * restore other tests as vsearch branch --------- Co-authored-by: Michael Marshall <michael.marshall@datastax.com> commit 1eae389fadecfc6637207933fd6ae1739355dc00 Author: Michael Marshall <michael.marshall@datastax.com> Date: Thu Aug 31 17:59:53 2023 -0500 Fix NPE in LuceneAnalyzer#end (#721) When testing the analyzer with an integration test, I discovered an NPE from a part of the code that wasn't tested. 
Here is the code: https://github.com/datastax/cassandra/blob/9cabec504d9f70ddfb345122dc4defed56a40f85/src/java/org/apache/cassandra/index/sai/plan/StorageAttachedIndexSearcher.java#L91-L100

We build the Analyzer to use one of its methods, but not to analyze any text, hence the NPE.

Here is the NPE:

```
ERROR [Native-Transport-Requests-1] 2023-08-31 14:35:38,695 QueryMessage.java:121 - Unexpected error during query
java.lang.NullPointerException: null
    at org.apache.cassandra.index.sai.analyzer.LuceneAnalyzer.end(LuceneAnalyzer.java:110)
    at org.apache.cassandra.index.sai.plan.StorageAttachedIndexSearcher.filterReplicaFilteringProtection(StorageAttachedIndexSearcher.java:99)
    at org.apache.cassandra.service.reads.DataResolver.lambda$preCountFilterForReplicaFilteringProtection$6(DataResolver.java:300)
    at org.apache.cassandra.service.reads.DataResolver.resolveInternal(DataResolver.java:341)
    at org.apache.cassandra.service.reads.DataResolver.resolveWithReadRepair(DataResolver.java:244)
    at org.apache.cassandra.service.reads.DataResolver.resolveWithReplicaFilteringProtection(DataResolver.java:284)
    at org.apache.cassandra.service.reads.DataResolver.resolve(DataResolver.java:130)
    at org.apache.cassandra.service.reads.range.SingleRangeResponse.waitForResponse(SingleRangeResponse.java:59)
    at org.apache.cassandra.service.reads.range.SingleRangeResponse.computeNext(SingleRangeResponse.java:65)
    at org.apache.cassandra.service.reads.range.SingleRangeResponse.computeNext(SingleRangeResponse.java:31)
    at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47)
    at org.apache.cassandra.db.transform.BasePartitions.hasNext(BasePartitions.java:93)
    at org.apache.cassandra.service.reads.range.RangeCommandIterator.computeNext(RangeCommandIterator.java:123)
    at org.apache.cassandra.service.reads.range.RangeCommandIterator.computeNext(RangeCommandIterator.java:45)
    at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47)
    at org.apache.cassandra.db.transform.BasePartitions.hasNext(BasePartitions.java:93)
    at org.apache.cassandra.db.partitions.PartitionIterators$1.hasNext(PartitionIterators.java:154)
    at org.apache.cassandra.db.transform.BasePartitions.hasNext(BasePartitions.java:93)
    at org.apache.cassandra.cql3.statements.SelectStatement.process(SelectStatement.java:892)
    at org.apache.cassandra.cql3.statements.SelectStatement.processResults(SelectStatement.java:518)
    at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:495)
    at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:341)
    at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:102)
    at org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:290)
    at org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:394)
    at org.apache.cassandra.transport.messages.QueryMessage.execute(QueryMessage.java:108)
    at org.apache.cassandra.transport.Message$Request.execute(Message.java:242)
    at org.apache.cassandra.transport.Dispatcher.processRequest(Dispatcher.java:111)
    at org.apache.cassandra.transport.Dispatcher.processRequest(Dispatcher.java:131)
    at org.apache.cassandra.transport.Dispatcher.lambda$dispatch$0(Dispatcher.java:78)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
    at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:165)
    at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:137)
    at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:119)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.base/java.lang.Thread.run(Thread.java:829)
```

I am still new to query execution, but I am guessing the unit tests skip this filtering code.
I added an integration test that fails without the fix.

commit ea97cc414ef3e68ad30e8c666ecf04b719352901
Author: Michael Marshall <michael.marshall@datastax.com>
Date: Wed Aug 30 09:10:14 2023 -0500

Fix failing KDTreeIndexSearcherTest (#717)

#680 included a change that makes 5 of the KDTreeIndexSearcherTest tests fail. This PR essentially reverts back to the old behavior, and all of the failing tests pass.

commit cee55b1b2139627b331d9f7225c522ca5bee299b
Author: Michael Marshall <michael.marshall@datastax.com>
Date: Tue Aug 29 21:56:24 2023 -0500

Remove ability to configure unique query_analyzer (#712)

commit c248f905cc13833f9857b9d750a60091afc5c38a
Author: Michael Marshall <michael.marshall@datastax.com>
Date: Tue Aug 29 21:55:18 2023 -0500

Fix failing distributed SAI tests using : operator (#715)

commit 367e84adf0663e8baab3f9195f3eb81412640b8c
Author: Michael Marshall <michael.marshall@datastax.com>
Date: Tue Aug 29 21:55:03 2023 -0500

Test compound predicate queries for : operator (#716)

commit 0ea0d95f52ac18522a0cb5a09c1c9ee00069380a
Author: Michael Marshall <michael.marshall@datastax.com>
Date: Tue Aug 29 11:10:02 2023 -0500

Refactor SAI analyzer configuration; add built in analyzers (#711)

The current `OPTIONS` configuration for the `index_analyzer` does not correctly model the available configuration options. In order to make the configuration more directly match the single tokenizer, and the lists of filters and charFilters, I propose we update the expected JSON schema.

Further, we add some default analyzers to simplify the configuration of generic analyzers.

commit 67ef0f3794cbd2b9ff41b632205dd087a992e674
Author: Michael Marshall <michael.marshall@datastax.com>
Date: Mon Aug 28 11:58:30 2023 -0500

Return more specific exception on incorrect usage of : operator (#705)

* When the `:` operator is used for indexes that do not support it, the current error is a recommendation to `ALLOW FILTERING` or an `AssertionError`.
  Instead, we should match the `LIKE` behavior and reject queries that attempt to use the `:` operator on columns that are not properly indexed. The `LIKE` conditional is slightly modified to simplify the logic for rejecting unsupported queries.
* When the `:` operator is used for `DELETE` and `UPDATE` queries, return a helpful error message.
* Update the `toString` method on the `ANALYZER_MATCHES` enum. The `toString` method is used to generate error messages, and it is confusing to see an error message with `: '<term>'` when it really should just be `:`.

commit c26cbfa2298d422813e02b439f4db6494bb64a84
Author: Jakub Żytka <jakub.zytka@gmail.com>
Date: Mon Aug 28 14:12:15 2023 +0200

Additional testcases for read query tracking with different data models (#698)

* Additional testcases for read query tracking with different data models

commit bf143a2b3fc97979e5e083f8eac7aca017cd618d
Author: Michael Marshall <michael.marshall@datastax.com>
Date: Thu Aug 24 13:28:47 2023 -0500

Ignore default analyzer settings when classifying an SAI as analyzed (#706)

### Problem

When creating an SAI, the `AbstractAnalyzer` creates an analyzer in cases where it should not because the logic is too naive.

### Details

Users can create SAI indexes with a `NonTokenizingAnalyzer` using the following queries where `?` can be replaced with `true` or `false`:

```
CREATE CUSTOM INDEX ON table(some-column) USING 'StorageAttachedIndex' WITH OPTIONS = { 'normalize' : ? }
CREATE CUSTOM INDEX ON table(some-column) USING 'StorageAttachedIndex' WITH OPTIONS = { 'case_sensitive' : ? }
CREATE CUSTOM INDEX ON table(some-column) USING 'StorageAttachedIndex' WITH OPTIONS = { 'ascii' : ? }
```

The options are known as "non tokenizing analyzers" because they "analyze" values but do not "tokenize" them. Practically speaking, these analyzers are projections where each input has one and only one output. Lucene calls projections "filters".
The `StorageAttachedIndex` default is for normalize = false, case_sensitive = true, and ascii = false. When a default value is passed for one of the options, the `AbstractAnalyzer` incorrectly classifies the index as "analyzed". Instead, the `AbstractAnalyzer` should return the `NoOpAnalyzer` and the index should not be classified as "analyzed".

### Solution

* Only construct the `NonTokenizingAnalyzer` when `normalize`, `case_sensitive`, or `ascii` are configured with non-default values.
* Do not consider an SAI as analyzed if `normalize`, `case_sensitive`, or `ascii` are configured with default values.
* Updated some tests to use `=` on the non-analyzed SAI

commit 7a6c4755f150ad647aae2bc806d8b91c423db14b
Author: Michael Marshall <michael.marshall@datastax.com>
Date: Tue Aug 22 15:57:08 2023 -0500

Add ANALYZER_MATCHES operation; Disallow = on analyzed SAI columns (#702)

* Adds the `:` operator, a.k.a. the ANALYZER_MATCHES operator. This operator only works on SAI indexes that are analyzed, where "analyzed" means the values stored in the index are derived from the raw value in the column. The new operator uses the enum value `100` to prevent future collisions with the upstream CQL implementation.
* Disambiguate the `=` operator so that it cannot perform matches on an `SAI` column that is indexed. The primary motivation for this change is to prevent surprising matches. If a query is attempted using `=` where the target column only has analyzed indexes, the query must include `ALLOW FILTERING`.
* Updated many tests.
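The #706 rule above (only build a `NonTokenizingAnalyzer` when `normalize`, `case_sensitive`, or `ascii` differs from its default) can be sketched as a small predicate. This is a minimal illustration, not the `AbstractAnalyzer` code: the class `AnalyzerOptions` and the method `needsNonTokenizingAnalyzer` are hypothetical, options are assumed to arrive as a string map, and values are assumed to be parsed with `Boolean.parseBoolean`.

```java
import java.util.Map;

/** Hypothetical helper mirroring the rule described in #706; not the actual AbstractAnalyzer code. */
final class AnalyzerOptions {
    // Defaults as stated in the commit message.
    static final boolean DEFAULT_NORMALIZE = false;
    static final boolean DEFAULT_CASE_SENSITIVE = true;
    static final boolean DEFAULT_ASCII = false;

    private AnalyzerOptions() {}

    /** True only if at least one non-tokenizing option is set to a non-default value. */
    static boolean needsNonTokenizingAnalyzer(Map<String, String> options) {
        return differsFromDefault(options, "normalize", DEFAULT_NORMALIZE)
               || differsFromDefault(options, "case_sensitive", DEFAULT_CASE_SENSITIVE)
               || differsFromDefault(options, "ascii", DEFAULT_ASCII);
    }

    private static boolean differsFromDefault(Map<String, String> options, String key, boolean defaultValue) {
        String value = options.get(key);
        // An absent option is treated the same as an explicit default.
        return value != null && Boolean.parseBoolean(value) != defaultValue;
    }
}
```

Under this sketch, passing only default values (or no options at all) yields `false`, so the index would fall back to a no-op analyzer and not count as analyzed.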
commit 77cb7f265e40600288fe90fffd74ef6e428f87fe
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Aug 18 10:48:54 2023 -0500

comment

commit 2dcdd9fd7464a6e26185bfa4ac3cd44b02574fb1
Author: Michael Marshall <michael.marshall@datastax.com>
Date: Thu Aug 17 16:55:51 2023 -0500

Fail SAI index creation when using analyzer on column in primary key (#699)

* add testStandardAnalyzer
* debugging wip
* Fail index creation when using analyzer on column in primary key
* Remove unnecessary debug logging
* cleanup
* Reject all attempts to add analyzer to index on pk columns
* Rename noop analyzer test and add explanation

---------

Co-authored-by: Jonathan Ellis <jbellis@datastax.com>

When the target column for an SAI is part of the primary key and is analyzed by the LuceneAnalyzer, queries do not return correct results. Until the underlying issue is solved, do not allow these kinds of indexes to be created.

This change is covered by a test to verify index creation and search work correctly for the `NoopAnalyzer` and another to verify index creation is rejected for the `LuceneAnalyzer`.
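The #699 restriction amounts to a validation step at index creation time. A minimal sketch, assuming the caller already knows the primary key columns and whether the requested index is analyzed: `IndexTargetValidator` is hypothetical, and `IllegalArgumentException` stands in for whatever request-validation exception Cassandra actually raises.

```java
import java.util.Set;

/** Hypothetical check mirroring #699: analyzed indexes on primary key columns are rejected. */
final class IndexTargetValidator {
    private IndexTargetValidator() {}

    /**
     * @param analyzed whether the index transforms values (e.g. with a Lucene analyzer);
     *                 a no-op analyzer counts as not analyzed
     */
    static void validate(String targetColumn, Set<String> primaryKeyColumns, boolean analyzed) {
        if (analyzed && primaryKeyColumns.contains(targetColumn)) {
            // Stand-in for the request-validation error a real server would return to the client.
            throw new IllegalArgumentException(
                "Analyzed indexes are not supported on primary key column " + targetColumn);
        }
    }
}
```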
commit 4f5585be198a949b0264eb64d0403e2adda9ab52
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Aug 17 17:41:18 2023 -0500

default vector cache size per segment bumped to 4MB

commit 9464bfe3e2bbd958f4f1d60e799019582a9836cb
Author: Marianne Lyne Manaog <marianne.manaog@datastax.com>
Date: Thu Jul 13 20:32:36 2023 +0100

Deserialize ByteBuffer into a query vector once by moving the deserialization up the call stack

commit bc2ef7ff721493b20b1bb76dd2ae37b43d365071
Author: Shaunak Das <ShaunakDas88@users.noreply.github.com>
Date: Wed Aug 16 13:27:12 2023 -0500

VECTOR-79: Simplify population of VectorCache (#695)

* use BFS at higher levels so we can avoid a separate cachedNodes set
* try to naively cache as much of level 0 as possible
* add VectorCacheTest
* cleanup and revert to intended behavior of not caching L0

---------

Co-authored-by: Jonathan Ellis <jbellis@datastax.com>

commit 1c58e431e55d8b327314ddb82b30bf8eced59269
Author: Shaunak Das <ShaunakDas88@users.noreply.github.com>
Date: Tue Aug 15 11:24:30 2023 -0500

Avoiding sorting node ordinals for level 0 (#697)

* avoiding sorting node ordinals for level 0

commit 16eeaf89a8ac39fee616f7a71a3f9a4b4170521e
Author: Jonathan Ellis <jbellis@gmail.com>
Date: Tue Aug 15 08:47:48 2023 -0500

extract OnDiskVectors to top level class so we can use it in debugging tools (#693)

commit 2271a230761a416fbd5c016d47684b388be93989
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Aug 9 09:59:45 2023 -0500

VECTOR-77 fix reading vectors past the first 2GB mmap region

commit 5705daa97f848454d8d29ab08234296906b3dbc4
Author: Andrés de la Peña <adelapena@users.noreply.github.com>
Date: Thu Aug 10 18:06:28 2023 +0100

VECTOR-76: Add back vector size guardrail (#694)

patch by Andrés de la Peña; reviewed by Brandon Williams and Maxwell Guo for CASSANDRA-18730

commit cb3bedccc255476300528183e8ba30a4e390f57f
Author: Jakub Żytka <jakub.zytka@gmail.com>
Date: Tue Aug 8 16:52:16 2023 +0200

CNDB-7390: fix static rows not being tracked by ReadTrackingTransformation (#691)

commit a8740fb8dae267fb8ddef1a7f5c77e948cd2a0ff
Author: Andrés de la Peña <adelapena@users.noreply.github.com>
Date: Fri Jul 28 15:42:42 2023 +0100

Add system property to reject non-float vectors, true by default (#689)

commit 5a4e439dc73b479d79bd427472d382a53b6fe680
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jul 21 13:10:44 2023 +0200

r/m failing emptyIndexTest until we can address it (VECTOR-59)

commit eec8b7b387cd041fbd54aef8f0090773962a9e49
Author: Piotr Kołaczkowski <pkolaczk@gmail.com>
Date: Tue Jul 18 17:36:43 2023 +0200

VECTOR-69: Fix LWT test failure caused by null key bounds (#686)

* VECTOR-69: Fix LWT test failure caused by null key bounds

---------

Co-authored-by: Jakub Zytka <jakub.zytka@datastax.com>
Co-authored-by: Jonathan Ellis <jbellis@datastax.com>

commit 033ae1a9567dfef88d0214dd34716f2468c5d4e3
Author: Andrés de la Peña <adelapena@users.noreply.github.com>
Date: Tue Jul 18 15:23:30 2023 +0100

VECTOR-68 Vector dimensions are not being validated correctly (#679)

commit e22ff6977f986a4148f931bad4b04645bf93d2e9
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jul 18 06:01:04 2023 +0200

set sai_hnsw_allow_customm_parameters=true for IndexWriterConfigTest

commit ccaba92656e78e63765da5c34f562f2c3b582bd0
Author: Mike Adamson <madamson@datastax.com>
Date: Tue Jul 11 16:12:31 2023 +0100

VECTOR-55: Review REVIEWME comments: (#680)

* VECTOR-55: Review REVIEWME comments:
  - Refactor search methods to only return long variant
  - Remove SSTableQueryContext in favour of QueryContext
  - Add checking to SegmentMetadata.toSegmentRowId method

commit 338c8507de197570f4879019e0e1fc8e1f77025e
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Mon Jul 10 21:46:56 2023 -0500

Fix IndexError when querying a list/set/map column.
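The VECTOR-77 commit above does not describe its fix. As background, a single `MappedByteBuffer` is indexed by `int`, so it can span at most `Integer.MAX_VALUE` bytes (about 2GB), and larger files are usually mapped as several regions. The sketch below uses a hypothetical `VectorLocator` class and assumes fixed-size float vectors and equally sized regions whose size is a multiple of the vector size (so no vector straddles a boundary). It shows the offset arithmetic that has to be done in `long` to reach vectors past the first region.

```java
/** Hypothetical locator for fixed-size float vectors stored across equally sized mmap regions. */
final class VectorLocator {
    private final long regionSize; // bytes per mapped region; must fit in an int
    private final int dimension;   // floats per vector

    VectorLocator(long regionSize, int dimension) {
        if (regionSize <= 0 || regionSize > Integer.MAX_VALUE)
            throw new IllegalArgumentException("regionSize must be positive and fit in an int");
        if (dimension <= 0)
            throw new IllegalArgumentException("dimension must be positive");
        this.regionSize = regionSize;
        this.dimension = dimension;
    }

    /** Absolute byte offset of a vector, computed in long so it cannot overflow past 2GB. */
    long byteOffset(int ordinal) {
        // Widening before the multiply matters: ordinal * dimension * 4 in int arithmetic wraps around.
        return (long) ordinal * dimension * Float.BYTES;
    }

    /** Index of the mapped region that holds the vector. */
    int region(int ordinal) {
        return (int) (byteOffset(ordinal) / regionSize);
    }

    /** Position of the vector inside its region, small enough for a ByteBuffer index. */
    int positionInRegion(int ordinal) {
        return (int) (byteOffset(ordinal) % regionSize);
    }
}
```

With 1024-dimension vectors (4096 bytes each), ordinal 600,000 starts at byte 2,457,600,000, which an `int` offset cannot hold; with 1GB regions it falls in region 2.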
commit 1b4d6de7e0a051b95c0a9b62c8bc167e23ec65ed
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sun Jul 9 21:03:51 2023 +0200

add test that fails before decompose fix and passes after

commit 664911ac7de336d3fcffd90083e6a220069bd4e2
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sun Jul 9 13:44:28 2023 +0200

re-use TypeUtil.decomposeVector

commit e17493c95ac247ca1da7caa3ccd58be866958d6a
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sun Jul 9 13:28:56 2023 +0200

fix casting in deserialize

commit d169ff68f6576ce5b150818cb0a52a12abf6f2e0
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sun Jul 9 13:30:02 2023 +0200

fix NPE when using ReadExecutionController.empty

commit aa899f55f77df5b157e3e6afa930c377e4cacc57
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Fri Jul 7 10:12:28 2023 +0800

assert no single partition trace for SAI request

commit 277251e1aad15e2a5aedaaaae3170d574d224fe5
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 29 12:08:05 2023 -0500

- add Tracing.traceSinglePartitions to avoid cluttering range and index queries with noise
- update QueryViewBuilder to include a count of sstables per index accessed

commit 4682a417fdaaefb135e57b0ebe174c1ff49b7e94
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jul 6 12:13:20 2023 -0500

simplify

commit 52d19a1ef5605aebdb87385786dd323057267164
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Wed Aug 30 14:17:12 2023 -0500

Squashed commit of the following:

commit 4c4c1ff802c7667e1289a4e730cc34bf6b3bd007
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jul 5 12:03:22 2023 -0500

add test for creating index after data is added

commit 0acaae364c19ed7556b66b5576762ecc131af5fd
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Wed Jul 5 09:23:01 2023 -0500

Revert "enable segment compaction by default for vector indexes -- we should prioritize read performance"

This reverts commit a4fa0728337ebf236ad03f6ad708cf51c3dc08c5.
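Several commits above (re-using `TypeUtil.decomposeVector`, fixing casting in `deserialize`) concern converting vectors between `ByteBuffer` and `float[]`. A minimal round-trip sketch, assuming vectors are stored as consecutive 4-byte big-endian floats; `VectorCodec` is hypothetical and does not reproduce `TypeUtil`'s implementation or Cassandra's serialization format.

```java
import java.nio.ByteBuffer;
import java.nio.FloatBuffer;

/** Hypothetical helpers; the real TypeUtil.decomposeVector may differ. */
final class VectorCodec {
    private VectorCodec() {}

    /** Packs the vector as consecutive 4-byte floats (ByteBuffer's default big-endian order). */
    static ByteBuffer decompose(float[] vector) {
        ByteBuffer buffer = ByteBuffer.allocate(vector.length * Float.BYTES);
        // The float view shares the byte buffer's storage, so this writes the bytes directly.
        buffer.asFloatBuffer().put(vector);
        return buffer; // position is still 0, ready to be read
    }

    /** Reads the floats back without moving the source buffer's position. */
    static float[] compose(ByteBuffer buffer) {
        FloatBuffer floats = buffer.duplicate().asFloatBuffer();
        float[] vector = new float[floats.remaining()];
        floats.get(vector);
        return vector;
    }
}
```

Reading through `duplicate()` leaves the caller's buffer position untouched, which matters if the same buffer is handed further down the call stack.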
commit e3ddb6426de4e89b5fe61ed742b69257865c2d3c
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 28 10:54:49 2023 -0500

VECTOR-64 set sstable_growth to 0.5 by default

commit 3ca75046e6802ce5836a654459b01948c83d1d7b
Merge: a4fa072833 10a2a31ae8
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 28 10:54:18 2023 -0500

merge ds-trunk

commit a4fa0728337ebf236ad03f6ad708cf51c3dc08c5
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 28 10:24:41 2023 -0500

enable segment compaction by default for vector indexes -- we should prioritize read performance

commit 08153b38549d700505d064c25d7a2070c2ff07aa
Merge: 6a063b4f84 1a45fdc5f9
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 28 10:18:38 2023 -0500

Merge branch 'vsearch' of github.com:datastax/cassandra into vsearch

commit 6a063b4f8458f119d4b99bd6b78cbc8832f27f13
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 28 10:13:51 2023 -0500

VECTOR-63 add cassandra.sai.hnsw.allow_custom_parameters defaulting to false to control setting maximum_node_connections and construction_beam_width, which can cause memory usage to explode if used incorrectly

commit 1a45fdc5f92e9c82dba6e9e56a5e341806070b13
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Mon Jun 26 15:57:10 2023 -0500

Add the option to configure the file_cache_size_in_mb in a -D like it is in 6.8-cndb.
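VECTOR-63 gates `maximum_node_connections` and `construction_beam_width` behind the `cassandra.sai.hnsw.allow_custom_parameters` property, which defaults to false. A minimal sketch of such a guard, using the standard `Boolean.getBoolean` system-property lookup; `HnswOptionGuard` and its error message are hypothetical and not taken from the Cassandra code.

```java
import java.util.Map;

/** Hypothetical guard for VECTOR-63; option and property names come from the commit message. */
final class HnswOptionGuard {
    static final String ALLOW_PROPERTY = "cassandra.sai.hnsw.allow_custom_parameters";

    private HnswOptionGuard() {}

    static void check(Map<String, String> indexOptions) {
        // Boolean.getBoolean is false unless the property is set to "true" (case-insensitive).
        if (Boolean.getBoolean(ALLOW_PROPERTY))
            return;
        for (String option : new String[]{ "maximum_node_connections", "construction_beam_width" }) {
            if (indexOptions.containsKey(option))
                throw new IllegalArgumentException(option + " requires -D" + ALLOW_PROPERTY + "=true");
        }
    }
}
```

Defaulting to rejection means a misconfigured index fails at creation time rather than surfacing later as unexpected memory growth.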
commit d2e45dab5b57f20fbdd6250d6dc186c83d783ed3
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 22 16:13:00 2023 -0500

VECTOR-62 don't flush a graph that only contains deleted vectors

commit 4a9afe205c20b1f863e89d4e02138557d225c3fc
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 22 16:11:11 2023 -0500

r/m obsolete comment

commit a925dc22696bcbebcad17a460441e8b84ff97660
Merge: 7c7d160dd9 2b22984fe1
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Thu Jun 22 09:03:08 2023 -0500

Merge remote-tracking branch 'datastax/ds-trunk' into vsearch

commit 7c7d160dd90624b0e365287ef0a2f0b7ef94da53
Merge: a14b387372 7a5e374cea
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 20 21:42:51 2023 -0500

Merge branch 'vsearch' of github.com:datastax/cassandra into vsearch

commit a14b387372ef3f220ece68b019acaafaf0d56f42
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 20 17:52:13 2023 -0500

move new options to jvm17-server.options

commit dc67582585a90628e026a693497793b7d8a298ed
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 20 17:47:27 2023 -0500

copy jvm11 options to jvm17

commit b6717410f925486a858c4c9414e417eed50950d7
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 20 17:30:48 2023 -0500

enable simd

commit 92b9e6e2f136254fb1f8e50732809ea529617f48
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 19 15:31:09 2023 -0500

make it run under jdk 20

commit 3e985953637819ac8504955552ce0feba29bdd40
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 20 17:13:56 2023 -0500

update to lucene with simd

commit 7a5e374cea785678670444121ad36be6170dd59b
Merge: 5e5ae4de82 e9ddd5f0d9
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Tue Jun 20 15:00:24 2023 -0500

Merge remote-tracking branch 'datastax/ds-trunk' into vsearch

commit 5e5ae4de824a7e838bbdfdbabc92e8b256911e3f
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 19 10:57:18 2023 -0500

we weren't calculating length correctly (supposed to match ordinals range) but it didn't matter b/c length() is not called on the search path where these classes are used. make this explicit by throwing in length implementation.

commit c0d4055a0bc6914d0220fbd4912d6355e9f6a1d9
Merge: f2c9a7cb59 c605f8c9f0
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 19 10:20:02 2023 -0500

Merge branch 'vsearch' of github.com:datastax/cassandra into vsearch

commit f2c9a7cb59b113a349992011b739079106274c1e
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 19 10:19:34 2023 -0500

VECTOR-61 remove deletedOrdinals collection that was not threadsafe wrt add and remove operations; create postingsByOrdinal map that allows Bits operations to look up the postings, instead

commit 9744a13b405cbeb1b68894bb08cc24796aedd8f7
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 19 09:13:24 2023 -0500

add LVT.testMultiplePostings (works fine)

commit c605f8c9f037e724544b21fe2aadcf0d8503d86f
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Sat Jun 17 12:44:36 2023 +0800

skip replica-filtering-protection for ANN requests

commit 71935bd07c9218ba70f1ffde7b6db1e88009e529
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 16 20:20:06 2023 -0500

forgot that maxBruteForceRows has to stay non-final for Byteman

commit b264afc07727305ce88ab2f18b8a2643d2510bd3
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 16 20:10:19 2023 -0500

VECTOR-54
- validateIndexable also checks for very large vectors (that explode to NaN when we take the cosine)
- check query vectors with the same criteria as vectors to index

commit 66c31a6dcfcbf81b7657ac9c4119803e026d150a
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 16 20:07:41 2023 -0500

if CFS.apply catches an InvalidRequestException, keep the same type when we re-throw so it gets back to the client as intended

commit e0bdfdbe421a1cca30746604cda9ef5b992002bb
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 16 17:33:37 2023 -0500

add [failing] emptyIndexTest

commit 75cba96f90730b32ea4a39375c72aaff4ffd6221
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 15 16:52:46 2023 -0500

re-use bitsets across searches

commit 2af2ce8a321f7fea268d81c393da7f8f6d886776
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 16 17:05:52 2023 -0500

more asserts that graph is in sane state when we write it

commit a63492686d13e12b799acdf50c6f0c20dd680f21
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 16 17:13:23 2023 -0500

update lucene

commit 4c6b43e3606b8c9eb5e4ec4cf0817dc9b061c555
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 16 16:24:33 2023 -0500

add debugging information when node is not found on level

commit 81c8e730bd4ca314a81bc13e30b197603a8eb005
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 16 11:57:24 2023 -0500

add similarityWithAnn test

commit 87c8da9a5dddd10215a84c07a09dbaaecaa522b8
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 14 18:26:02 2023 -0500

add tests to confirm that zero-length vectors are rejected

commit 2510c1b987d846821e7095fbc2c368fa51d9a15d
Merge: a3d9e33554 e86f91c568
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 12 17:09:22 2023 -0500

Merge branch 'vsearch' of github.com:datastax/cassandra into vsearch

commit a3d9e3355470f6400328cd7ac1eb07ff06f8e00b
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 12 17:08:43 2023 -0500

use ArrayList in CVV instead of HashMap since we know the keys are consecutive

commit e86f91c568f8a82e83e61bd0dd768478a7584cfc
Merge: 77c1bc11f3 8a7a6d9c4f
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Mon Jun 12 09:25:06 2023 -0500

Merge remote-tracking branch 'datastax/ds-trunk' into vsearch

commit 77c1bc11f31e4ff001c7eff1e6fe50812e136a45
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Mon Jun 12 09:24:56 2023 -0500

Revert "Revert this after CNDB-6974 is fixed."

This reverts commit 8e04280312583a18085e7e7b9d31810790b039f4.
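VECTOR-54 says very large vectors "explode to NaN when we take the cosine", and other commits in this range reject zero-length and zero vectors. In `float` arithmetic, squaring components near 1e20 overflows to `Infinity`, and `Infinity / Infinity` is `NaN`. A minimal sketch of a pre-index check under that assumption; `CosineValidator` is hypothetical and is not `validateIndexable`.

```java
/** Hypothetical pre-index check for cosine similarity; not Cassandra's validateIndexable. */
final class CosineValidator {
    private CosineValidator() {}

    static void validate(float[] vector) {
        if (vector.length == 0)
            throw new IllegalArgumentException("zero-length vector");
        float squaredNorm = 0f;
        for (float x : vector)
            squaredNorm += x * x; // float math, as a float-based cosine would use
        if (squaredNorm == 0f)
            throw new IllegalArgumentException("zero vector has no defined cosine similarity");
        if (!Float.isFinite(squaredNorm))
            throw new IllegalArgumentException("vector magnitude is not finite; cosine would be NaN");
    }

    /** Plain float cosine, shown only to demonstrate the NaN failure mode. */
    static float cosine(float[] a, float[] b) {
        float dot = 0f, na = 0f, nb = 0f;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return (float) (dot / Math.sqrt((double) na * nb));
    }
}
```

The same check could be applied to query vectors, in line with the second VECTOR-54 bullet.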
commit cae30e9879c463d80de2a2b7e4a642979803c13a
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Mon Jun 12 09:23:39 2023 -0500

Revert "Revert this after CNDB-6974 is fixed."

This reverts commit 67a2b7eba3c957a993726d37706643979e92a3a3.

commit 46d1a45de84f40311f193beba3ca5fc3fe7450d7
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 12 09:03:03 2023 -0500

replace our TODO entries with VSTODO to make them easier to find

commit 415b3f29e8e7f1526e28655308a1937a1175e575
Merge: 2fd8b848be 481d29721a
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 12 09:02:16 2023 -0500

Merge branch 'vsearch' of github.com:datastax/cassandra into vsearch

commit 481d29721a7993e9babd44f792be55aca925d41a
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Mon Jun 12 21:26:43 2023 +0800

Fix VectorMemtableIndex to handle max token and fix Segment#intersects (#670)

* Fix VectorMemtableIndex to handle max token and min/max bound
* Fix Segment#intersects to compare bound instead of token and add tests for range search
* make brute force rows per query for VectorMemtableIndex
* apply feedback on Segment#intersects
* add comments to VectorMemtableIndex#search
* Fix SegmentTest

commit 2fd8b848be82e62999e64b57cb9800d03de8e953
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 12 07:51:18 2023 -0500

clean up a couple REVIEWME

commit 3c066e07be6188c3f48c51c02cfc99b395e2a6bc
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Sun Jun 11 14:26:42 2023 +0800

fix flaky VectorDistributedTest

commit 6c719a752e90e031916dddf6cc7b7db23627bc38
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 10 10:48:01 2023 -0500

fix use of bruteForceRows -- should be larger of limit,maxBruteForceRows

commit 0a2279d7f010bcb68053bb14ea2974295b21d1c4
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 10 10:46:34 2023 -0500

use maxBruteForceRows when deciding whether to skip ANN

commit 1859f7355231ce7972e3d4b7af70e5d2967da516
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 10 10:27:03 2023 -0500

simplify partitionKeySearchTest using euclidean distance

commit 4f40e1c2d9b1c30675eeaee7d800a8113cce97c0
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 10 10:09:24 2023 -0500

typo

commit 7dbe85ac9a248aa6a0985839df1f40ceb2c08e6c
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 10 10:08:23 2023 -0500

rename methods that return Bits but had bitset in their names

commit 2861bfdba8f2638a1e3f9e81df9cad3878d0e544
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 10 10:06:51 2023 -0500

simplify skipANN

commit fbfe4bcdd03a66f4dedfa91ce9d06206f89827cd
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Sat Jun 10 16:36:56 2023 +0800

optimize VectorIndexSearcher#searchPosting
- return empty posting if key range is not found in current sstable
- return empty posting if all row ids are shadowed
- skip ANN if matching row ids are less than limit

commit 168911b70888a3dda0ddf4f615af1b7ff54f43e3
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Sat Jun 10 20:54:41 2023 +0800

fix data visibility issue during flush: make sure SSTableAddedNotification is sent before MemtableDiscardedNotification

commit c5dd47e3ce1c7276703fa3f89a9932a3b2cd43b7
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Sat Jun 10 12:31:53 2023 +0800

fix primaryKeySearchTest and partitionKeySearchTest to use correct selector and expected results

commit 2e636529c63d0d726868d6df9b3e11d15d3871d8
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Sun Jun 11 08:31:39 2023 +0800

Vector-48: index#update is not triggered by partition/range deletion,… (#665)

* Vector-48: index#update is not triggered by partition/range deletion, we have to fix VectorMemtableIndex to include shadowed primary keys
  - during flush, append ordinals that are removed by partition/range deletion into deletedOrdinals set
* add comments to VP
* revert redundant variables
* simplify

---------

Co-authored-by: Jonathan Ellis <jbellis@datastax.com>

commit daa2623b880e4b82d99204fe718db22e52ae3b42
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 10 13:54:38 2023 -0500

use mmapped builder in OnDiskHnswGraphTest

commit 553778a22cfa8757daf3b3ec0e2cfa6fc7ba29cf
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 10 08:34:41 2023 -0500

fix logic in partitionKeySearchTest, test still fails

commit a7f5a2f824085691da80ef62e37a430e5637c31c
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 17:57:34 2023 -0500

ignore invalid vectors during build against existing data, instead of failing the build

commit 2af203646a2e57fca0a2521c31ba4269b0d8f326
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 17:43:47 2023 -0500

switch from ignoring zero vectors to throwing IRE

commit 4c98fbb447a275cbf557ae0adf8f4f8a30fba17f
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 17:15:57 2023 -0500

Revert "if there are no rows in the range to search, return EmptyPostingsList instead of giving the hnsw search an empty Bits (which is a pathological case for the search)"

This reverts commit 6e5eab787a3add985dc30eca82c5dee2646f5564.
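Two commits above say the brute-force threshold "should be larger of limit,maxBruteForceRows" and that ANN is skipped when matching row ids are fewer than the limit. A sketch of that decision with hypothetical names (`SearchPlanner`, `choose`); treating the boundary as inclusive is an assumption, not something the commits state.

```java
/** Hypothetical planner choice between exact (brute-force) scoring and an ANN graph search. */
final class SearchPlanner {
    enum Strategy { BRUTE_FORCE, ANN }

    private SearchPlanner() {}

    static Strategy choose(long matchingRows, int limit, int maxBruteForceRows) {
        // The threshold is the larger of the query limit and the configured brute-force cap.
        long threshold = Math.max(limit, maxBruteForceRows);
        // Assumed inclusive: scoring exactly `threshold` rows is still cheap enough to do exactly.
        return matchingRows <= threshold ? Strategy.BRUTE_FORCE : Strategy.ANN;
    }
}
```

Taking the maximum means a large `LIMIT` never forces an approximate search over a candidate set that could simply be scored in full.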
commit da99598a5d11dfa4ecac3b1eeeb9d19a4bad21e5 Author: Jonathan Ellis <jbellis@datastax.com> Date: Fri Jun 9 17:09:30 2023 -0500 don't attempt to add zero vector to cosine indexes commit a77792447d33600f69dd0a75a280a2b6dac51389 Author: Jonathan Ellis <jbellis@datastax.com> Date: Fri Jun 9 16:43:34 2023 -0500 add failing partitionKeySearchTest commit b061dd1c77f30c67324da4f94ea8b5a22d50fba7 Author: Jonathan Ellis <jbellis@datastax.com> Date: Fri Jun 9 10:52:08 2023 -0500 inline the test ops commit 6e5eab787a3add985dc30eca82c5dee2646f5564 Author: Jonathan Ellis <jbellis@datastax.com> Date: Fri Jun 9 10:51:09 2023 -0500 if there are no rows in the range to search, return EmptyPostingsList instead of giving the hnsw search an empty Bits (which is a pathological case for the search) commit 518c055dfeabacf0fa0863e40bf814058f2ad981 Author: Jonathan Ellis <jbellis@datastax.com> Date: Fri Jun 9 10:28:57 2023 -0500 cleanup commit e922b03ffe5436dc1341dbe20ee7aaca9901058f Author: Jonathan Ellis <jbellis@datastax.com> Date: Fri Jun 9 10:27:53 2023 -0500 failing tests for primary key search commit 428e4b713e25a5820d02ca790c30e9f1477c0f44 Author: Jonathan Ellis <jbellis@datastax.com> Date: Fri Jun 9 10:21:59 2023 -0500 cleanup commit 5319bcdf3b694e861509b2a9f549f39bf51cd253 Author: Jonathan Ellis <jbellis@datastax.com> Date: Fri Jun 9 10:20:02 2023 -0500 move testInvalidColumnNameWithAnn to VectorInvalidQueryTest commit d8b64845e59c0bd3658a16df21dcc75564417e3b Author: Jonathan Ellis <jbellis@datastax.com> Date: Fri Jun 9 10:12:40 2023 -0500 upgrade lucene to reduce Integer boxing on build path commit a62356e49c0a32f61732b8525d2bb655f46dd767 Author: Jonathan Ellis <jbellis@datastax.com> Date: Fri Jun 9 10:11:55 2023 -0500 cleanup commit b754eb7f021d10d0ae8de83ebdf2ce2f19ce59d8 Author: Jonathan Ellis <jbellis@datastax.com> Date: Fri Jun 9 08:29:34 2023 -0500 replace CHM with NBHMLong to avoid boxing commit 53d232ad2bcddefd72b10542f703cf36728c05f5 Author: Piotr Kołaczkowski 
<pkolaczk@datastax.com> Date: Fri Jun 9 12:17:46 2023 +0200 STAR-550 Handle SAI AbortedOperationException AbortedOperationException is thrown by SAI when index search hits a timeout. Now instead of allowing this exception to bubble up to the top and be eventually logged as error, we catch and swallow it in the InboundSink after creating the query response. Additionally now we also we set a proper error code (TIMEOUT) in the response, so the client has a hint on what happened. commit 5e0c5eb3c54b58002162ce3e39d0891643e1f663 Author: Matt Fleming <matt@codeblueprint.co.uk> Date: Fri Jun 9 08:44:10 2023 +0100 CNDB-7037: Check that memtable exists before flushing and avoid IOOBE (#664) For offline services such as the compactor it's possible to not have live memtables. Account for this during flushing after removing indexes to avoid triggering an IndexOutOfBoundsException: Error happened while updating the schema java.lang.IndexOutOfBoundsException: Index: -1 at java.base/java.util.Collections$EmptyList.get(Collections.java:4483) at org.apache.cassandra.db.lifecycle.View.getCurrentMemtable(View.java:106) at org.apache.cassandra.db.ColumnFamilyStore.forceFlush(ColumnFamilyStore.java:1115) at org.apache.cassandra.db.SystemKeyspace.forceBlockingFlush(SystemKeyspace.java:552) at org.apache.cassandra.db.SystemKeyspace.setIndexRemoved(SystemKeyspace.java:576) at org.apache.cassandra.index.SecondaryIndexManager.markIndexRemoved(SecondaryIndexManager.java:837) at org.apache.cassandra.index.SecondaryIndexManager.removeIndex(SecondaryIndexManager.java:420) commit c241f7d41ee46c28f60d691611f32e071dac6684 Author: Jonathan Ellis <jbellis@datastax.com> Date: Thu Jun 8 19:19:09 2023 -0500 read neighbors using .intBuffer since we know that the HNSW searcher always reads all the neighbors commit 32f021a4f0623e57d82d70c54350eab6f0f57db7 Author: Jonathan Ellis <jbellis@datastax.com> Date: Thu Jun 8 18:08:30 2023 -0500 VECTOR-51 leave vectors as ByteBuffers during compaction to avoid 2x 
memory usage incurred by keeping them around as float[] as well commit 32d8cbbd19301228530d3d84f7a7e4120b4d4422 Author: Jonathan Ellis <jbellis@datastax.com> Date: Thu Jun 8 17:37:08 2023 -0500 r/m node cache from query metrics since there is nothing the operator can act on there commit 890df8e6c36c5c9e1314dbcc0a681415401120dd Author: Jonathan Ellis <jbellis@datastax.com> Date: Thu Jun 8 14:15:25 2023 -0500 add more information to exception when reading row offsets goes wrong commit 5c7fe0cfed1edb10f81b134f8949dd1b5abb2781 Author: Jonathan Ellis <jbellis@datastax.com> Date: Thu Jun 8 13:26:17 2023 -0500 revise ordinals cache as follows: // cache full levels including neighbors up to neighborsRamBudget, starting with the top level, // but always cache all levels above the bottom two levels -- this will be ~1% of the graph. // then on L1, cache at least the offsets // L0 we do not cache since we only need one extra seek (no bsearch) to read the neighbors offset commit e131014c5cba59624e0ed9fac142abe5340e2fc0 Author: Jonathan Ellis <jbellis@datastax.com> Date: Thu Jun 8 13:53:10 2023 -0500 add deletes test commit 7c5a880f007535d9f03b7c7d9d3e84dcc79ef3d7 Author: Jonathan Ellis <jbellis@datastax.com> Date: Thu Jun 8 13:34:42 2023 -0500 must use graph.size for bitset size -- not correct to use max rowid, since deleted ordinals will not have a rowid but will still be in the graph commit 0654fbb61aec86deae69edf83fb1cf653ed7df65 Author: Jonathan Ellis <jbellis@datastax.com> Date: Thu Jun 8 08:44:28 2023 -0500 info -> debug for validatePerIndexComponents commit b289db2362b928f29b6c5c6289e09b33f0be1e88 Author: Jonathan Ellis <jbellis@datastax.com> Date: Wed Jun 7 18:08:18 2023 -0500 update lucene commit 347dae644b0d395014d8faf2affd48a6b2546e96 Author: Jonathan Ellis <jbellis@datastax.com> Date: Wed Jun 7 18:23:41 2023 -0500 undo burntest logging config change commit 87a6a84f38d0a0423acd9a0dda67d386876681b5 Author: Jonathan Ellis <jbellis@datastax.com> Date: Wed Jun 7 15:20:06 
2023 -0500 add debug logging commit 20013bc865c7fb2c34111c98362aaf997dc724dc Author: Jonathan Ellis <jbellis@datastax.com> Date: Wed Jun 7 14:50:39 2023 -0500 add a bit more information to exception when we fail in index construction from disk commit 86389ceae8188b8e427d8a528fada2f913f1f633 Author: Jonathan Ellis <jbellis@datastax.com> Date: Wed Jun 7 12:25:31 2023 -0500 reduce test vector count from "all of them" to 200 commit 20177a1c84ba34babc82d8c1aa4e3436321b5d9f Merge: 2e2fce09ee f2c697ac46 Author: Jonathan Ellis <jbellis@datastax.com> Date: Wed Jun 7 12:21:20 2023 -0500 Merge branch 'wip' into vsearch commit f2c697ac46348301260c5947ade4ebed7dee90ee Author: Jonathan Ellis <jbellis@datastax.com> Date: Wed Jun 7 12:16:28 2023 -0500 multipleSegmentsMultiplePostingsTest commit 59e4a1c53e1c19a7ebba4a0834229e61f9e4d3ce Author: Jonathan Ellis <jbellis@datastax.com> Date: Wed Jun 7 12:16:04 2023 -0500 cleanup commit 2e2fce09eeb5bef4780cecf7c7b5e492a810f67a Author: Zhao Yang <zhaoyangsingapore@gmail.com> Date: Wed Jun 7 23:33:00 2023 +0800 fix OnDiskOrdinalsMap to seek to segment offset before reading (#661) commit 9502f1040229f2be6619f7e4497dc25ca5126b39 Author: Jonathan Ellis <jbellis@datastax.com> Date: Wed Jun 7 10:04:38 2023 -0500 fix build commit 3535444b57366fe705e0fd8c1ca095e60ed2a706 Author: Jonathan Ellis <jbellis@datastax.com> Date: Wed Jun 7 09:19:41 2023 -0500 add write-only workload commit a112b0965cb298f293b6e384bafc92a1d5a1fb8f Author: Zhao Yang <zhaoyangsingapore@gmail.com> Date: Wed Jun 7 21:15:53 2023 +0800 VECTOR-44: improve in-memory partition-restricted query perf (#660) * VECTOR-44: improve in-memory partition-restricted query perf - using post-filter top-k processor instead of ANN: 14x improvement on partition-restricted query in LongVectorTest commit cf1ec3be50ddbe40ae99dfc34eea51b9fc5c2648 Author: Zhao Yang <zhaoyangsingapore@gmail.com> Date: Wed Jun 7 21:06:37 2023 +0800 Vector-45: fix bitsetForShadowedPrimaryKeys to skip shadowed primary 
keys outside of current sstable/segment (#659) * VECTOR-45: fix bitsetForShadowedPrimaryKeys to skip shadowed primary keys outside of current sstable/segment * return empty iterator if results are empty instead of ReorderingRangeIterator commit 87217b4f49935fbbd13c73f4d0aadb84836696a7 Author: Jonathan Ellis <jbellis@datastax.com> Date: Wed Jun 7 07:31:46 2023 -0500 don't make areL0ShardsEnabled final, it breaks mocks commit 2bb3a299df43561369544b662674872575975b7c Author: Matt Fleming <matt@codeblueprint.co.uk> Date: Wed Jun 7 10:56:08 2023 +0100 VECTOR-42: Check columns exist to avoid NPE during ANN expr binding (#657) Using a non-existent column with an ANN expression triggers an NPE like so: java.lang.NullPointerException: null at org.apache.cassandra.cql3.ArrayLiteral.forReceiver(ArrayLiteral.java:43) at org.apache.cassandra.cql3.ArrayLiteral.prepare(ArrayLiteral.java:54) at org.apache.cassandra.cql3.Ordering$Raw$Ann.bind(Ordering.java:178) at org.apache.cassandra.cql3.Ordering$Raw.bind(Ordering.java:139) Use TM.getExistingColumn() which throws InvalidRequestException if the column is undefined.
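VECTOR-42 above swaps a nullable column lookup for one that fails fast. The sketch below illustrates that pattern under stated assumptions: `ColumnCatalog`, `ColumnDef`, and the local `InvalidRequestException` are hypothetical stand-ins written for this example, not Cassandra's `TableMetadata` API, and the error text is invented.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a table's column catalog; not Cassandra's TableMetadata.
public final class ColumnCatalog {
    // Local stand-in for a request-validation error (assumption, not Cassandra's class).
    public static final class InvalidRequestException extends RuntimeException {
        public InvalidRequestException(String message) { super(message); }
    }

    public static final class ColumnDef {
        public final String name;
        public final String type;
        ColumnDef(String name, String type) { this.name = name; this.type = type; }
    }

    private final Map<String, ColumnDef> columns = new LinkedHashMap<>();

    public void add(String name, String type) {
        columns.put(name, new ColumnDef(name, type));
    }

    // Nullable lookup: a caller that skips the null check fails later with an NPE,
    // far from the real cause (the shape of the VECTOR-42 bug).
    public ColumnDef getColumn(String name) {
        return columns.get(name);
    }

    // Fail-fast lookup: an undefined column is rejected as a bad request up front.
    public ColumnDef getExistingColumn(String name) {
        ColumnDef column = columns.get(name);
        if (column == null)
            throw new InvalidRequestException("Undefined column name " + name);
        return column;
    }
}
```

The design choice is where the error surfaces: with the fail-fast lookup, a typo in the ANN column name becomes a client-facing validation error at bind time instead of an internal `NullPointerException` deep in literal preparation.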
commit f29d4528fc43f34889e560acafa810eee1b88ba9 Author: Jonathan Ellis <jbellis@datastax.com> Date: Tue Jun 6 21:18:47 2023 -0500 restore query timeouts commit 6e5734e52b41ab1e355d202ad0433440828fbb75 Author: Jonathan Ellis <jbellis@datastax.com> Date: Tue Jun 6 19:08:52 2023 -0500 looking at performance over time in LongVectorTest commit f96af8f0b9185163a49156bdab107801d2588bf0 Author: Jonathan Ellis <jbellis@datastax.com> Date: Tue Jun 6 16:00:23 2023 -0500 log level = info for burn tests commit 1247cf372927cc185be5c9767a06fdd15f102dbe Author: Jonathan Ellis <jbellis@datastax.com> Date: Tue Jun 6 17:48:00 2023 -0500 write the in-memory deleted ordinals to disk at the start of the postings component commit e319d71bd6b0d265f616c807dcaf4a9d9fc38aef Author: Jonathan Ellis <jbellis@datastax.com> Date: Tue Jun 6 16:36:27 2023 -0500 don't allocate unnecessary objects on the happy path of no tombstones commit 6d146a8460a17d146de756aab2ccbbf2abdb967a Author: Jonathan Ellis <jbellis@datastax.com> Date: Tue Jun 6 16:19:56 2023 -0500 VECTOR-23: force UCS to not shard L0 commit 79d6e177093312bc1b951d6740ad45ce8a6c5875 Author: Jonathan Ellis <jbellis@datastax.com> Date: Tue Jun 6 16:01:05 2023 -0500 mark AutoCloseable commit 5ec2f7d7f792bfa2dcf582ea4eed3fa8745fdaff Author: Jonathan Ellis <jbellis@datastax.com> Date: Mon Jun 5 14:37:15 2023 -0500 Support string literals as vectors Co-authored-by: Andrés de la Peña <a.penya.garcia@gmail.com> commit 73a53c37536835e462e3cf17c452027c7aa591ed Author: Jonathan Ellis <jbellis@datastax.com> Date: Tue Jun 6 12:25:39 2023 -0500 (take 3, this time with the code changes) fix race conditions across concurrent inserts + searches in memtable commit 51f4f419d31916a81088f171b158f1e002b5c800 Author: Jonathan Ellis <jbellis@datastax.com> Date: Tue Jun 6 12:25:39 2023 -0500 (take 2) fix race conditions across concurrent inserts + searches in memtable commit 94097a1f7cf1831c03e52b1f539bdc2e42fa038b Author: Jonathan Ellis <jbellis@datastax.com> 
Date: Tue Jun 6 12:35:32 2023 -0500 Revert "fix race conditions across concurrent inserts + searches in memtable" This reverts commit c4d41b492190fb2644f83bf4902acf71d1e4f891. commit c4d41b492190fb2644f83bf4902acf71d1e4f891 Author: Jonathan Ellis <jbellis@datastax.com> Date: Tue Jun 6 12:25:39 2023 -0500 fix race conditions across concurrent inserts + searches in memtable commit 22143a8c5db160ef0fa7687ff3e8cb227de160f0 Author: Jonathan Ellis <jbellis@datastax.com> Date: Tue Jun 6 10:28:30 2023 -0500 fix AOOB in bruteForceRows logic commit b5624f62c88378e03260399440edf4d335ccc3af Author: Jonathan Ellis <jbellis@datastax.com> Date: Tue Jun 6 10:12:53 2023 -0500 call deserializeFloatArray (which gives float[]) instead of deserialize (which gives List<Float>) commit 13ad40718737b54652c875d81ec20992b5828777 Author: Jonathan Ellis <jbellis@datastax.com> Date: Tue Jun 6 08:36:32 2023 -0500 create CassandraHnswGraphBuilder with concurrent and serial implementations, so that single-threaded compaction doesn't have to pay the concurrency overhead commit b83f4e4007e3d597963023b8ae32f4d0934b792d Author: Jonathan Ellis <jbellis@datastax.com> Date: Tue Jun 6 08:00:44 2023 -0500 add ASL header to injections.md to make CI happy commit 16074336b1322d4bade40eb71562ca50221399dc Merge: 227c6c13a3 1ff6992354 Author: Jonathan Ellis <jbellis@datastax.com> Date: Tue Jun 6 07:48:04 2023 -0500 Merge branch 'VECTOR-37' into vsearch commit 227c6c13a3cfe42484fc560d629d54a93f8265e0 Author: Jaroslaw Grabowski <jaroslaw.grabowski@datastax.com> Date: Tue Jun 6 13:02:37 2023 +0200 CNDB-7007 return expired tables level from getLevels commit 312d07de8d297cdede39017354b678f2fb1b1006 Author: Jonathan Ellis <jbellis@datastax.com> Date: Tue Jun 6 07:31:26 2023 -0500 randomize our brute force threshold, which will get the actual index scans exercised more commit 0b897961869965629893c94250b7d4b1fb7c0f86 Author: Zhao Yang <zhaoyangsingapore@gmail.com> Date: Tue Jun 6 13:44:01 2023 +0800 Reference ANN 
sstable indexes in case of ann hybrid search (#653) * Reference ANN sstable indexes in case of ann hybrid search * simplify --------- Co-authored-by: Jonathan Ellis <jbellis@datastax.com> commit 1ff6992354e8411f9f7c48a76f0aadb4a00f2789 Author: Jonathan Ellis <jbellis@datastax.com> Date: Mon Jun 5 21:55:14 2023 -0500 per-query hnsw metrics commit d3d6ae107737020eb0ca3b00c117d5bc84027dc0 Author: Jonathan Ellis <jbellis@datastax.com> Date: Mon Jun 5 21:16:14 2023 -0500 reduce test size to prevent Jenkins OOM commit 7765ac73fd29c9664b6f8e946c067d6f1b0a9a03 Merge: 5a04dbcf67 c5cd09cebf Author: Jonathan Ellis <jbellis@datastax.com> Date: Mon Jun 5 21:15:44 2023 -0500 Merge branch 'vsearch' of github.com:datastax/cassandra into vsearch commit c5cd09cebf0c63f7bafcd2f4548395efe68b12cd Author: Zhao Yang <zhaoyangsingapore@gmail.com> Date: Tue Jun 6 09:38:42 2023 +0800 Skip warning log about receiving a range that is not owned by the current replica, see VECTOR-30 commit 5a04dbcf6738d7681732e28518d6eb37f74357b8 Author: Jonathan Ellis <jbellis@datastax.com> Date: Mon Jun 5 11:21:15 2023 -0500 add injections.md from bdp repo commit baf9cce301a39f9506bd60c5a75d2011150f2fea Author: Jonathan Ellis <jbellis@datastax.com> Date: Mon Jun 5 10:02:02 2023 -0500 VECTOR-36 fix looking up indexes and SSTableContext by SSTableReader, because the SSTR will be removed from internal structures once it's compacted commit e7fdc8454e750e2ee886c4ab27fb09abb46303fa Author: Jonathan Ellis <jbellis@datastax.com> Date: Mon Jun 5 09:53:51 2023 -0500 limitToTopResults should skip rows that are not in the current segment (or more precisely, have no vectors that were indexed) commit 2f64695a90651d9e76eee0a3399032ea241c983c Author: Zhao Yang <zhaoyangsingapore@gmail.com> Date: Mon Jun 5 14:55:20 2023 +0800 Vector 6 take 2: reapply Vector-6 commit and fix NPE in SelectStatement (#648) * Revert "Revert VECTOR-6" This reverts commit 63a6ff609a02e4f7d9f5499f1af48ca5e58f63a8.
* Vector-6 take 2: - fix NPE in SelectStatement by skipping reversed() for null ColumnComparator - fix getOrderingColumns() to return LinkedHashMap to preserve ordering columns' order - fix VectorLocalTest compilation - remove debug log in CassandraOnHeapHnsw commit 9da51b315d207417de4f3e005616e275c62c74cb Author: Zhao Yang <zhaoyangsingapore@gmail.com> Date: Mon Jun 5 12:38:21 2023 +0800 fix SAI test failures in vsearch branch (#649) - BatchlogEndpointFilterTest - rangeRestrictedTest - SegmentFlushTest - SegmentMergerTest - OperationTest - RangeIntersectionIteratorTest commit dc53d7db5f0f8fe1c780f6d2b7580794f7b6b5fb Author: Jonathan Ellis <jbellis@datastax.com> Date: Sat Jun 3 13:35:11 2023 -0500 create views for ordinals map so we don't have to open a new Reader for each method call commit c6385022ff2949c23e0dd0161a043cf4061f050c Author: Jonathan Ellis <jbellis@datastax.com> Date: Sat Jun 3 08:19:32 2023 -0500 add assert sstableContext != null commit f340e7590db578b71201326e82006f72dd491047 Author: Jonathan Ellis <jbellis@datastax.com> Date: Sat Jun 3 07:37:36 2023 -0500 on-disk Searcher bits should not need to be growable commit fd5cc7f989a47d03689dd29495b974697ad1338b Author: Jonathan Ellis <jbellis@datastax.com> Date: Sat Jun 3 07:37:18 2023 -0500 add asserts commit 6436e5f4ad49881a62c1442692edfedfe31968dc Author: Jonathan Ellis <jbellis@datastax.com> Date: Sat Jun 3 07:21:50 2023 -0500 cleanup commit 32c8e005740d3c7ec0b1008091728e12260aecdd Author: Jonathan Ellis <jbellis@datastax.com> Date: Fri Jun 2 13:57:18 2023 -0500 comment commit 091d96832ff9a68f6ea7857eafd2ec7ee0c09a0b Author: Jonathan Ellis <jbellis@datastax.com> Date: Fri Jun 2 13:32:24 2023 -0500 r/m unused search method returning iterators over PrimaryKey commit 4e78a4b429b0ba6690871df9a91c8ffd8d2e53fa Author: Jonathan Ellis <jbellis@datastax.com> Date: Fri Jun 2 11:25:30 2023 -0500 we can use the slightly more lightweight RangeConcatIterator instead of RangeUnionIterator when combining results
from different Segments commit e7a9186bf3426ad025e809481e47db85a7bf7190 Author: Jonathan Ellis <jbellis@datastax.com> Date: Fri Jun 2 11:22:35 2023 -0500 r/m obsolete FIXME commit c76ee9ea0cab5b70c535de3f50b81cf333bf94a2 Author: Jonathan Ellis <jbellis@datastax.com> Date: Sat Jun 3 07:12:43 2023 -0500 add testAppendedGraphs commit d00bdbfb6acf9f08d2bdfa2ef06e833e567cf9eb Author: Jonathan Ellis <jbellis@datastax.com> Date: Sat Jun 3 06:57:37 2023 -0500 move write-to-File to test class commit b10fd1100daf4cde0291b3b4ef06f5ceb3ca650c Author: Jonathan Ellis <jbellis@datastax.com> Date: Sat Jun 3 06:46:01 2023 -0500 fix Ref leaks in test commit 0656e558bb51172df4e0cde1ebe92946d4dcec8b Author: Jonathan Ellis <jbellis@datastax.com> Date: Sat Jun 3 06:26:05 2023 -0500 fix tests to use View commit 92797ced03ed26f9a5cce398504d6f0de519b899 Author: Zhao Yang <zhaoyangsingapore@gmail.com> Date: Sat Jun 3 19:19:38 2023 +0800 VECTOR-35: fix vector on-disk writer to append segments (#647) commit 22f5c01860367586d0cc39def7d975a6d6bae784 Author: Jonathan Ellis <jbellis@datastax.com> Date: Fri Jun 2 18:54:41 2023 -0500 add "Vector indexes only support ANN queries" check commit bfc0b056dbacd0a1eca86020e901f545dca7670e Author: Jonathan Ellis <jbellis@datastax.com> Date: Fri Jun 2 17:34:24 2023 -0500 split invalid requests into separate test class commit f66768c35920101382c1975c9d0ba0e3301b7c61 Author: Jonathan Ellis <jbellis@datastax.com> Date: Fri Jun 2 17:28:56 2023 -0500 add specific error message when trying to do ANN without an index commit 63a6ff609a02e4f7d9f5499f1af48ca5e58f63a8 Author: Jonathan Ellis <jbellis@datastax.com> Date: Fri Jun 2 12:31:22 2023 -0500 Revert VECTOR-6 This reverts commits: a37009c187edeba68389d239dc1b9f40519b1187 5565690fbe1056d4c159ddbe233fa22c7695320a e7733bb8f858a16b082b8a5c64d0322db6f6271a commit 3d496662b259470b505d88212074c436660c27ad Merge: fab67bb134 4bdae7e362 Author: Jonathan Ellis <jbellis@datastax.com> Date: Fri Jun 2 11:05:53 2023 -0500 Merge 
branch 'vsearch' of github.com:datastax/cassandra into vsearch commit 4bdae7e36213f5112efce085696dd24aaa0adfad Author: Mike Adamson <madamson@datastax.com> Date: Fri Jun 2 14:36:01 2023 +0100 Stabilise random tests using word2vec model vectors commit 2ddc3ec6b11f6810f6c4cbaf6b62bfe46d2d1b6c Merge: 8e04280312 1f9179002a Author: Jeremiah D Jordan <jeremiah@datastax.com> Date: Fri Jun 2 10:05:15 2023 -0500 Merge remote-tracking branch 'datastax/ds-trunk' into vsearch commit 8e04280312583a18085e7e7b9d31810790b039f4 Author: Jeremiah D Jordan <jeremiah@datastax.com> Date: Fri Jun 2 09:51:06 2023 -0500 Revert this after CNDB-6974 is fixed. commit fab67bb13499691f813c893cd39a3bbd406653d2 Author: Jonathan Ellis <jbellis@datastax.com> Date: Fri Jun 2 08:59:12 2023 -0500 vector cache defaulting to 1MB per segment commit 67a2b7eba3c957a993726d37706643979e92a3a3 Author: Jeremiah D Jordan <jeremiah@datastax.com> Date: Thu Jun 1 17:49:35 2023 -0500 Revert this after CNDB-6974 is fixed. commit 4538956b31d8c40d2e9b603b3b0a3392343e1853 Merge: 7518680a4a af4c83aef4 Author: Jonathan Ellis <jbellis@datastax.com> Date: Thu Jun 1 16:17:14 2023 -0500 Merge branch 'ds-trunk' into vsearch commit 7518680a4a17cb2589cf06a9175befc10c9eab1a Author: Jonathan Ellis <jbellis@datastax.com> Date: Thu Jun 1 16:14:33 2023 -0500 re-use float[] across calls to vectorValue to avoid allocation overhead (credit to Jake for the idea) commit d20bd8aedeb6232358dd16f901e2708f217b8108 Author: Jonathan Ellis <jbellis@datastax.com> Date: Thu Jun 1 15:35:17 2023 -0500 optimize vectorValue() with direct access to the mmap-ed region commit 24572025af5b7893279b5b03c58555b682f3abdb Author: Jonathan Ellis <jbellis@datastax.com> Date: Thu Jun 1 14:33:55 2023 -0500 decomposeVector does not modify the underlying buffer, so no need to duplicate() here commit efef35d47a24222fae300d1cbd5a801e9ed79442 Author: Jonathan Ellis <jbellis@datastax.com> Date: Thu Jun 1 14:29:30 2023 -0500 specialize deserializeFloatArray for 
ByteBuffer and FloatBuffer. This saves about 3% total CPU on search workloads commit 63c27f5550d6c3e22d960b9a42b9097029e3b760 Author: Jonathan Ellis <jbellis@datastax.com> Date: Thu Jun 1 11:04:23 2023 -0500 default target size of 5GB commit 69c55c8d07c6e1d58b4829c153f5ebf8a4343669 Author: Jonathan Ellis <jbellis@gmail.com> Date: Thu Jun 1 07:20:38 2023 -0700 make the on-disk hnsw code threadsafe using FileHandle.createReader (#641) commit b194cae2e56133b946d0231768254064a47f01b6 Merge: 7ee6422084 23c2891e7a Author: Jonathan Ellis <jbellis@datastax.com> Date: Thu Jun 1 06:22:17 2023 -0500 Merge branch 'ds-trunk' into vsearch commit 7ee642208485c9e2ae62aa662dbbddc7b295bae1 Merge: c286ec0ee2 a37009c187 Author: Jonathan Ellis <jbellis@datastax.com> Date: Wed May 31 20:41:48 2023 -0500 Merge branch 'VECTOR-6' into vsearch commit c286ec0ee231ededded18e81357edebe72064c59 Author: Jonathan Ellis <jbellis@datastax.com> Date: Wed May 31 20:40:21 2023 -0500 add testLargeGraph commit a37009c187edeba68389d239dc1b9f40519b1187 Author: Zhao Yang <zhaoyangsingapore@gmail.com> Date: Thu Jun 1 08:50:07 2023 +0800 cleanup unused code commit 6246dbe3a970a0f4672701632003cdf576705c24 Author: Jonathan Ellis <jbellis@datastax.com> Date: Wed May 31 16:43:42 2023 -0500 comment commit c7420440c7d3c81eac96e716148d570ae3cceb1d Author: Jonathan Ellis <jbellis@datastax.com> Date: Wed May 31 16:43:37 2023 -0500 encapsulate shadowedPrimaryKey better commit a1a25c33b8bec9d78354dd2c1d992444db2bf76f Author: Jonathan Ellis <jbellis@datastax.com> Date: Wed May 31 15:57:52 2023 -0500 make hnsw cache size configurable, and make the default 128KB (roughly the size of the bloom filter for a 1GB sstable) commit 55747d2351e915645697fa8dec404c749540456f Author: Andrés de la Peña <a.penya.garcia@gmail.com> Date: Wed May 31 19:28:47 2023 +0100 Replace FunctionParameter.sameAsFirst by FunctionParameter.sameAs, allowing type inference in both directions for vector functions commit
5565690fbe1056d4c159ddbe233fa22c7695320a Author: Jonathan Ellis <jbellis@datastax.com> Date: Wed May 31 12:59:42 2023 -0500 cleanup and fix commit e7733bb8f858a16b082b8a5c64d0322db6f6271a Author: Zhao Yang <zhaoyangsingapore@gmail.com> Date: Wed May 24 16:01:26 2023 +0800 VECTOR-6: return vector results to client in ANN order instead of token order commit 771067d1475c94484da178236928d2bad78e00b1 Author: Mike Adamson <madamson@datastax.com> Date: Wed May 31 18:33:55 2023 +0100 Fix annOrderingMustHaveLimit test commit 8dd76a73541e70da0da022dd463e6711a315f971 Author: Mike Adamson <madamson@datastax.com> Date: Wed May 31 18:30:49 2023 +0100 Improve the validation of ORDER BY <column> ANN OF (#639) * Improve the validation of ORDER BY <column> ANN OF * Change to hasNonClusteredOrdering and improve limit message commit 1ef6f2e3418db19af23f0eae24cd344d72290455 Author: Jonathan Ellis <jbellis@datastax.com> Date: Fri May 26 18:34:25 2023 -0500 add vector similarity functions commit f4e3a39a8ece623647ba7891981e7c5ee10ed96b Author: Jonathan Ellis <jbellis@datastax.com> Date: Wed May 31 09:39:45 2023 -0500 pull in the smallest set of changes possible from 93e0ae9a to get FunctionParameter.sameAsFirst commit 0ef2614346c9426a966004df220221f74353de70 Author: Jonathan Ellis <jbellis@gmail.com> Date: Wed May 31 06:22:41 2023 -0700 Add support for updates + deletes (#636) commit 5da9fefe1e18f733bb8ffd0a35ef2e1ac3cbd975 Author: Jonathan Ellis <jbellis@datastax.com> Date: Tue May 30 16:32:02 2023 -0500 rename SegmentOrdering.reorderOneComponent to limitToTopResults to align with MemtableOrdering commit caf6eb2154ddf57c88d69c6246404b65e183057d Author: Mike Adamson <madamson@datastax.com> Date: Tue May 30 22:41:29 2023 +0100 Fix V1SearchableIndex.reorderOneComponent to use segments (#635) commit 72cfa94e9389d0078d64aee60d26b8b01c6ca22f Author: Piotr Kołaczkowski <pkolaczk@datastax.com> Date: Mon May 29 20:17:44 2023 +0200 VECTOR-14: Move ANN OF expressions from WHERE to ORDER BY Before: 
SELEC…
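Two of the June 1 commits above, `specialize deserializeFloatArray for ByteBuffer and FloatBuffer` and `re-use float[] across calls to vectorValue`, pair a bulk copy with a reused destination array. A minimal sketch of that combination, assuming a vector is stored as consecutive floats starting at the buffer's current position in the buffer's byte order; `FloatVectorReader` is a hypothetical name, and this is not the Cassandra implementation:

```java
import java.nio.ByteBuffer;
import java.nio.FloatBuffer;

// Hypothetical reader: one scratch array per instance, so it is NOT thread-safe.
public final class FloatVectorReader {
    private final int dimension;
    private final float[] scratch;

    public FloatVectorReader(int dimension) {
        this.dimension = dimension;
        this.scratch = new float[dimension];
    }

    // ByteBuffer source: asFloatBuffer() creates a view starting at src.position()
    // with src's byte order, so src's own position and limit are left untouched.
    // The bulk get() replaces a per-element getFloat() loop.
    public float[] read(ByteBuffer src) {
        src.asFloatBuffer().get(scratch, 0, dimension);
        return scratch;
    }

    // FloatBuffer source (e.g. a view over an mmap-ed region): duplicate() so the
    // caller's position is not advanced by the bulk get().
    public float[] read(FloatBuffer src) {
        src.duplicate().get(scratch, 0, dimension);
        return scratch;
    }
}
```

The trade-off of reuse is that each call overwrites the array returned by the previous call, so a caller that needs to keep a vector must copy it first.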

…pache#743)

commit f9e589098e89417cd41ef627b3e1c7371986e3e1 Merge: b0dbc8bd57 d32eed9d68 Author: Jeremiah D Jordan <jeremiah@datastax.com> Date: Mon Sep 18 11:28:48 2023 -0500 Merge remote-tracking branch 'datastax/vsearch' into cndb-7632-vsearch-rebase-20230915 commit b0dbc8bd57fde3dd473d4aae85a3063f2156106d Author: Jeremiah D Jordan <jeremiah@datastax.com> Date: Mon Sep 18 10:12:30 2023 -0500 Update expected string for error message change. commit f2cafdac9c3e5bf838c10e0078895d56e3271370 Author: qannap <130002578+qannap@users.noreply.github.com> Date: Thu Sep 14 08:41:07 2023 -0700 Optimize partition-aware queries to use bloom filter (#729) Add bloom filter field in QueryViewBuilder. Here is the link to the issue: [Vector-72](https://datastax.jira.com/browse/VECTOR-72). commit 9765b23b5b748286391b8e374bab24a31cb5d934 Author: Jonathan Ellis <jbellis@gmail.com> Date: Thu Sep 14 09:47:19 2023 -0500 Vsearch 18 (#741): optimize MergePostingList.advance for the case where it is called repeatedly commit 66c00e094942de72f2f1fdb780ac00239e0aa284 Author: Mike Adamson <madamson@datastax.com> Date: Thu Sep 14 14:33:41 2023 +0100 Remove smile-nlp and add internal Glove implementation for testing (#743) commit b13098e0cd5a9f961066c0059953b525f5bfa787 Author: Jakub Żytka <jakub.zytka@gmail.com> Date: Thu Sep 14 13:28:01 2023 +0200 CNDB-7363: extend read tracking API to notify about post-filtered, reconciliated partitions and rows (#692) * CNDB-7363: extend read tracking API to notify about post-filtered, reconciliated partitions and rows * RowFilter::getExpressionsPreOrder will now return all row filter's expressions in pre-order commit 2bca65cfe4ba864a8a6719e0e28ddf76e83c17e0 Author: Michael Marshall <michael.marshall@datastax.com> Date: Sun Sep 10 01:01:47 2023 -0500 Fix JsonTest and AnalyzerViewTest errors (#730) commit a0dde9f89e02ef40d9aa87514eb7b5becfdce8e8 Author: Michael Marshall <michael.marshall@datastax.com> Date: Fri Sep 8 19:34:09 2023 -0500 Reject SAI creation for invalid
combinations of options (#736) commit abd5399f9c1b6feb0f5ba71b279ff5b76d0f92ee Author: Jonathan Ellis <jbellis@gmail.com> Date: Fri Sep 8 11:52:53 2023 -0500 Fix exceptions when executing complex queries (#735) * fix ReorderingRangeIterator.performSkipTo * fix NPE commit acc39a8281cf9163057c00ec9413219b210cc262 Author: Michael Marshall <michael.marshall@datastax.com> Date: Fri Sep 8 11:04:58 2023 -0500 Analyzer cleanup: add comments and fix docs (#733) commit 4989dff0f150bb65535fa598e53290bca1f3ce0c Author: Michael Marshall <michael.marshall@datastax.com> Date: Thu Sep 7 08:51:09 2023 -0500 Fix RowAwarePrimaryKey#hashCode for deferred keys (#725) * Fix RowAwarePrimaryKey#hashCode for deferred keys * Use recommendation from code review commit 2268c1c806632d62cf1a29b815740207a0e2939c Author: Jonathan Ellis <jbellis@datastax.com> Date: Wed Sep 6 13:38:39 2023 -0500 optimize MergePostingList.advance -- we don't care what order we evaluate the merged lists in, since we're going to rebuild the PQ anyway, so use a fast iterator instead of a slow poll() loop commit 6bb72c3272fbc8d12427fb0d9f00916b6f6ee9c0 Author: Jonathan Ellis <jbellis@datastax.com> Date: Wed Sep 6 13:22:07 2023 -0500 PKM is not threadsafe, need to allocate a new one for every request commit f1d7c66b54f7b95e12698d7a6ed00a1e8ac958d6 Author: Jonathan Ellis <jbellis@datastax.com> Date: Wed Sep 6 13:10:11 2023 -0500 r/m slightly gratuitous IOException from PKM.Factory signature commit 73e96df69a1861d5a8bda04ab69252a480a78c62 Author: Jonathan Ellis <jbellis@datastax.com> Date: Wed Sep 6 11:35:40 2023 -0500 fix updateTestWithPredicate by removing broken assert (did not account for deleted rows) commit 36327a93b8c1651acd7af1d8b687dd92e08021e3 Author: Jonathan Ellis <jbellis@datastax.com> Date: Wed Sep 6 11:34:44 2023 -0500 add failing updateTestWithPredicate commit 80eb443b65637dc1782cb3a745d4d56431f787d8 Author: Jonathan Ellis <jbellis@datastax.com> Date: Wed Sep 6 11:05:32 2023 -0500 comment upsertTest commit 
3f5fbc72547aa790989d3c7bce8b92fd4bb2fb2c Author: Marianne Manaog <91789965+marianne-manaog@users.noreply.github.com> Date: Wed Sep 6 19:42:15 2023 +0100 Feature/Optimize SAI.getPostQueryOrdering as SAI.postQuerySort (#710) * Optimize sorting in postQuerySort method * Remove need for prepareFor method following sort optimization in postQuerySort * Add testOrderResults * Simplify creation of resultSet in the test testOrderResults and keep rows as final in ResultSet.java * Apply DSU pattern to SAI.postQuerySort method and leverage it in SelectStatement.orderResults method * Use Pair.create instead of new Pair * Put back comment on ANN support only * Optimise undecorate step by using List<ByteBuffer> instead of ByteBuffer in the Pair * Add licence to StorageAttachedIndexTest.java * Move StorageAttachedIndexTest.java up a directory * Simplify map to create listPairsVectorsScores commit 23555ab43101cd6010fe9cc320e7429779faa678 Author: qannap <130002578+qannap@users.noreply.github.com> Date: Fri Sep 1 15:29:22 2023 -0700 Add row count field to columnFamilyStore (#713) * Add row count field to columnFamilyStore * Add test to row count field * delete unused import * delete unused function and separate test * Add row count field to columnFamilyStore * Add test to row count field * delete unused import * delete unused function and separate test * resolve version * return to updated version * restore other tests as vsearch branch --------- Co-authored-by: Michael Marshall <michael.marshall@datastax.com> commit 1eae389fadecfc6637207933fd6ae1739355dc00 Author: Michael Marshall <michael.marshall@datastax.com> Date: Thu Aug 31 17:59:53 2023 -0500 Fix NPE in LuceneAnalyzer#end (#721) When testing the analyzer with an integration test, I discovered an NPE from a part of the code that wasn't tested.
Here is the code: https://github.com/datastax/cassandra/blob/9cabec504d9f70ddfb345122dc4defed56a40f85/src/java/org/apache/cassandra/index/sai/plan/StorageAttachedIndexSearcher.java#L91-L100 We build the Analyzer to use one of its methods, but not to analyze any text, hence the NPE. Here is the NPE:

```
ERROR [Native-Transport-Requests-1] 2023-08-31 14:35:38,695 QueryMessage.java:121 - Unexpected error during query
java.lang.NullPointerException: null
    at org.apache.cassandra.index.sai.analyzer.LuceneAnalyzer.end(LuceneAnalyzer.java:110)
    at org.apache.cassandra.index.sai.plan.StorageAttachedIndexSearcher.filterReplicaFilteringProtection(StorageAttachedIndexSearcher.java:99)
    at org.apache.cassandra.service.reads.DataResolver.lambda$preCountFilterForReplicaFilteringProtection$6(DataResolver.java:300)
    at org.apache.cassandra.service.reads.DataResolver.resolveInternal(DataResolver.java:341)
    at org.apache.cassandra.service.reads.DataResolver.resolveWithReadRepair(DataResolver.java:244)
    at org.apache.cassandra.service.reads.DataResolver.resolveWithReplicaFilteringProtection(DataResolver.java:284)
    at org.apache.cassandra.service.reads.DataResolver.resolve(DataResolver.java:130)
    at org.apache.cassandra.service.reads.range.SingleRangeResponse.waitForResponse(SingleRangeResponse.java:59)
    at org.apache.cassandra.service.reads.range.SingleRangeResponse.computeNext(SingleRangeResponse.java:65)
    at org.apache.cassandra.service.reads.range.SingleRangeResponse.computeNext(SingleRangeResponse.java:31)
    at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47)
    at org.apache.cassandra.db.transform.BasePartitions.hasNext(BasePartitions.java:93)
    at org.apache.cassandra.service.reads.range.RangeCommandIterator.computeNext(RangeCommandIterator.java:123)
    at org.apache.cassandra.service.reads.range.RangeCommandIterator.computeNext(RangeCommandIterator.java:45)
    at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47)
    at org.apache.cassandra.db.transform.BasePartitions.hasNext(BasePartitions.java:93)
    at org.apache.cassandra.db.partitions.PartitionIterators$1.hasNext(PartitionIterators.java:154)
    at org.apache.cassandra.db.transform.BasePartitions.hasNext(BasePartitions.java:93)
    at org.apache.cassandra.cql3.statements.SelectStatement.process(SelectStatement.java:892)
    at org.apache.cassandra.cql3.statements.SelectStatement.processResults(SelectStatement.java:518)
    at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:495)
    at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:341)
    at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:102)
    at org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:290)
    at org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:394)
    at org.apache.cassandra.transport.messages.QueryMessage.execute(QueryMessage.java:108)
    at org.apache.cassandra.transport.Message$Request.execute(Message.java:242)
    at org.apache.cassandra.transport.Dispatcher.processRequest(Dispatcher.java:111)
    at org.apache.cassandra.transport.Dispatcher.processRequest(Dispatcher.java:131)
    at org.apache.cassandra.transport.Dispatcher.lambda$dispatch$0(Dispatcher.java:78)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
    at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:165)
    at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:137)
    at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:119)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.base/java.lang.Thread.run(Thread.java:829)
```

I am still new to query execution, but I am guessing the unit tests skip this filtering code.
I added an integration test that fails without the fix. commit ea97cc414ef3e68ad30e8c666ecf04b719352901 Author: Michael Marshall <michael.marshall@datastax.com> Date: Wed Aug 30 09:10:14 2023 -0500 Fix failing KDTreeIndexSearcherTest (#717) #680 included a change that makes 5 of the KDTreeIndexSearcherTest tests fail. This PR essentially reverts back to the old behavior, and all of the failing tests pass. commit cee55b1b2139627b331d9f7225c522ca5bee299b Author: Michael Marshall <michael.marshall@datastax.com> Date: Tue Aug 29 21:56:24 2023 -0500 Remove ability to configure unique query_analyzer (#712) commit c248f905cc13833f9857b9d750a60091afc5c38a Author: Michael Marshall <michael.marshall@datastax.com> Date: Tue Aug 29 21:55:18 2023 -0500 Fix failing distributed SAI tests using : operator (#715) commit 367e84adf0663e8baab3f9195f3eb81412640b8c Author: Michael Marshall <michael.marshall@datastax.com> Date: Tue Aug 29 21:55:03 2023 -0500 Test compound predicate queries for : operator (#716) commit 0ea0d95f52ac18522a0cb5a09c1c9ee00069380a Author: Michael Marshall <michael.marshall@datastax.com> Date: Tue Aug 29 11:10:02 2023 -0500 Refactor SAI analyzer configuration; add built in analyzers (#711) The current `OPTIONS` configuration for the `index_analyzer` does not correctly model the available configuration options. In order to make the configuration more directly match the single tokenizer, and the lists of filters and charFilters, I propose we update the expected JSON schema. Further, we add some default analyzers to simplify the configuration of generic analyzers. commit 67ef0f3794cbd2b9ff41b632205dd087a992e674 Author: Michael Marshall <michael.marshall@datastax.com> Date: Mon Aug 28 11:58:30 2023 -0500 Return more specific exception on incorrect usage of : operator (#705) * When the `:` operator is used for indexes that do not support it, the current error is a recommendation to `ALLOW FILTERING` or an `AssertionError`. 
Instead, we should match the `LIKE` behavior and reject queries that attempt to use the `:` operator on columns that are not properly indexed. The `LIKE` conditional is slightly modified to simplify the logic for rejecting unsupported queries. * When the `:` operator is used for `DELETE` and `UPDATE` queries, return a helpful error message. * Update the `toString` method on the `ANALYZER_MATCHES` enum. The `toString` method is used to generate error messages, and it is confusing to see an error message with `: '<term>'` when it really should just be `:`. commit c26cbfa2298d422813e02b439f4db6494bb64a84 Author: Jakub Żytka <jakub.zytka@gmail.com> Date: Mon Aug 28 14:12:15 2023 +0200 Additional testcases for read query tracking with different data models (#698) * Additional testcases for read query tracking with different data models commit bf143a2b3fc97979e5e083f8eac7aca017cd618d Author: Michael Marshall <michael.marshall@datastax.com> Date: Thu Aug 24 13:28:47 2023 -0500 Ignore default analyzer settings when classifying an SAI as analyzed (#706) ### Problem When creating an SAI, the `AbstractAnalyzer` creates an analyzer in cases where it should not because the logic is too naive. ### Details Users can create SAI indexes with a `NonTokenizingAnalyzer` using the following queries where `?` can be replaced with `true` or `false`:

```
CREATE CUSTOM INDEX ON table(some-column) USING 'StorageAttachedIndex' WITH OPTIONS = { 'normalize' : ? }
CREATE CUSTOM INDEX ON table(some-column) USING 'StorageAttachedIndex' WITH OPTIONS = { 'case_sensitive' : ? }
CREATE CUSTOM INDEX ON table(some-column) USING 'StorageAttachedIndex' WITH OPTIONS = { 'ascii' : ? }
```

The options are known as "non tokenizing analyzers" because they "analyze" values but do not "tokenize" them. Practically speaking, these analyzers are projections where each input has one and only one output. Lucene calls projections "filters".
The `StorageAttachedIndex` default is for normalize = false, case_sensitive = true, and ascii = false. When a default value is passed for one of the options, the `AbstractAnalyzer` incorrectly classifies the index as "analyzed". Instead, the `AbstractAnalyzer` should return the `NoOpAnalyzer` and the index should not be classified as "analyzed". ### Solution * Only construct the `NonTokenizingAnalyzer` when `normalize`, `case_sensitive`, or `ascii` are configured with non default values. * Do not consider an SAI as analyzed if `normalize`, `case_sensitive`, or `ascii` are configured with default values. * Updated some tests to use `=` on the non-analyzed SAI commit 7a6c4755f150ad647aae2bc806d8b91c423db14b Author: Michael Marshall <michael.marshall@datastax.com> Date: Tue Aug 22 15:57:08 2023 -0500 Add ANALYZER_MATCHES operation; Disallow = on analyzed SAI columns (#702) * Adds the `:` operator, a.k.a the ANALYZER_MATCHES operator. This operator only works on SAI indexes that are analyzed where analyzed means the values stored in the index are derivative values from the raw value in the column. The new operator uses the enum value `100` to prevent future collisions with the upstream CQL implementation. * Disambiguate the `=` operator so that it cannot perform matches on an `SAI` column that is indexed. The primary motivation for this change is to prevent surprising matches. If a query is attempted using `=` where the target column only has analyzed indexes, the query must include `ALLOW FILTERING`. * Updated many tests. 
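The default-value rule described in #706 can be illustrated with a small predicate. An index counts as "analyzed" only when at least one of `normalize`, `case_sensitive`, or `ascii` differs from its stated default (`false`, `true`, and `false` respectively). This is a minimal sketch that assumes the options arrive as a `Map<String, String>` of raw CQL option values. `AnalyzerOptionsSketch` and `isAnalyzed` are hypothetical names and are not part of the `AbstractAnalyzer` API.

```java
import java.util.Map;

/** Hypothetical sketch of the #706 rule; not the actual AbstractAnalyzer implementation. */
public final class AnalyzerOptionsSketch {
    // Defaults stated in the commit message for StorageAttachedIndex.
    private static final Map<String, Boolean> DEFAULTS = Map.of(
            "normalize", false,
            "case_sensitive", true,
            "ascii", false);

    private AnalyzerOptionsSketch() {}

    /**
     * Returns true only if some non-tokenizing option is set to a non-default value,
     * i.e. only then would constructing a NonTokenizingAnalyzer change the indexed terms.
     */
    public static boolean isAnalyzed(Map<String, String> options) {
        for (Map.Entry<String, Boolean> e : DEFAULTS.entrySet()) {
            String raw = options.get(e.getKey());
            if (raw != null && Boolean.parseBoolean(raw) != e.getValue()) {
                return true; // a non-default value was supplied
            }
        }
        return false; // absent or default-valued options behave like a no-op analyzer
    }

    public static void main(String[] args) {
        System.out.println(isAnalyzed(Map.of("case_sensitive", "true"))); // default value
        System.out.println(isAnalyzed(Map.of("normalize", "true")));      // non-default value
    }
}
```

Passing `'case_sensitive' : true` explicitly therefore behaves the same as omitting it, which is the outcome the commit describes for default-valued options.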
commit 77cb7f265e40600288fe90fffd74ef6e428f87fe Author: Jonathan Ellis <jbellis@datastax.com> Date: Fri Aug 18 10:48:54 2023 -0500 comment commit 2dcdd9fd7464a6e26185bfa4ac3cd44b02574fb1 Author: Michael Marshall <michael.marshall@datastax.com> Date: Thu Aug 17 16:55:51 2023 -0500 Fail SAI index creation when using analyzer on column in primary key (#699) * add testStandardAnalyzer * debugging wip * Fail index creation when using analyzer on column in primary key * Remove unnecessary debug logging * cleanup * Reject all attempts to add analyzer to index on pk columns * Rename noop analyzer test and add explanation --------- Co-authored-by: Jonathan Ellis <jbellis@datastax.com> When the target column for an SAI is part of the primary key and is analyzed by the LuceneAnalyzer, queries do not return correct results. Until the underlying feature is solved, do not allow these kinds of indexes to be created. This change is covered by a test to verify index creation and search works correctly for the `NoopAnalyzer` and another to verify index creation is rejected for the `LuceneAnalyzer`. 
commit 4f5585be198a949b0264eb64d0403e2adda9ab52 Author: Jonathan Ellis <jbellis@datastax.com> Date: Thu Aug 17 17:41:18 2023 -0500 default vector cache size per segment bumped to 4MB commit 9464bfe3e2bbd958f4f1d60e799019582a9836cb Author: Marianne Lyne Manaog <marianne.manaog@datastax.com> Date: Thu Jul 13 20:32:36 2023 +0100 Deserialize ByteBuffer into a query vector once by moving the deserialization up the call stack commit bc2ef7ff721493b20b1bb76dd2ae37b43d365071 Author: Shaunak Das <ShaunakDas88@users.noreply.github.com> Date: Wed Aug 16 13:27:12 2023 -0500 VECTOR-79: Simplify population of VectorCache (#695) * use BFS at higher levels so we can avoid a separate cachedNodes set * try to naively cache as much of level 0 as possible * add VectorCacheTest * cleanup and revert to intended behavior of not caching L0 --------- Co-authored-by: Jonathan Ellis <jbellis@datastax.com> commit 1c58e431e55d8b327314ddb82b30bf8eced59269 Author: Shaunak Das <ShaunakDas88@users.noreply.github.com> Date: Tue Aug 15 11:24:30 2023 -0500 Avoiding sorting node ordinals for level 0 (#697) * avoiding sorting node ordinals for level 0 commit 16eeaf89a8ac39fee616f7a71a3f9a4b4170521e Author: Jonathan Ellis <jbellis@gmail.com> Date: Tue Aug 15 08:47:48 2023 -0500 extract OnDiskVectors to top level class so we can use it in debugging tools (#693) commit 2271a230761a416fbd5c016d47684b388be93989 Author: Jonathan Ellis <jbellis@datastax.com> Date: Wed Aug 9 09:59:45 2023 -0500 VECTOR-77 fix reading vectors past the first 2GB mmap region commit 5705daa97f848454d8d29ab08234296906b3dbc4 Author: Andrés de la Peña <adelapena@users.noreply.github.com> Date: Thu Aug 10 18:06:28 2023 +0100 VECTOR-76: Add back vector size guardrail (#694) patch by Andrés de la Peña; reviewed by Brandon Williams and Maxwell Guo for CASSANDRA-18730 commit cb3bedccc255476300528183e8ba30a4e390f57f Author: Jakub Żytka <jakub.zytka@gmail.com> Date: Tue Aug 8 16:52:16 2023 +0200 CNDB-7390: fix static rows not being tracked 
by ReadTrackingTransformation (#691) commit a8740fb8dae267fb8ddef1a7f5c77e948cd2a0ff Author: Andrés de la Peña <adelapena@users.noreply.github.com> Date: Fri Jul 28 15:42:42 2023 +0100 Add system property to reject non-float vectors, true by default (#689) commit 5a4e439dc73b479d79bd427472d382a53b6fe680 Author: Jonathan Ellis <jbellis@datastax.com> Date: Fri Jul 21 13:10:44 2023 +0200 r/m failing emptyIndexTest until we can address it (VECTOR-59) commit eec8b7b387cd041fbd54aef8f0090773962a9e49 Author: Piotr Kołaczkowski <pkolaczk@gmail.com> Date: Tue Jul 18 17:36:43 2023 +0200 VECTOR-69: Fix LWT test failure caused by null key bounds (#686) * VECTOR-69: Fix LWT test failure caused by null key bounds --------- Co-authored-by: Jakub Zytka <jakub.zytka@datastax.com> Co-authored-by: Jonathan Ellis <jbellis@datastax.com> commit 033ae1a9567dfef88d0214dd34716f2468c5d4e3 Author: Andrés de la Peña <adelapena@users.noreply.github.com> Date: Tue Jul 18 15:23:30 2023 +0100 VECTOR-68 Vector dimensions are not being validated correctly (#679) commit e22ff6977f986a4148f931bad4b04645bf93d2e9 Author: Jonathan Ellis <jbellis@datastax.com> Date: Tue Jul 18 06:01:04 2023 +0200 set sai_hnsw_allow_customm_parameters=true for IndexWriterConfigTest commit ccaba92656e78e63765da5c34f562f2c3b582bd0 Author: Mike Adamson <madamson@datastax.com> Date: Tue Jul 11 16:12:31 2023 +0100 VECTOR-55: Review REVIEWME comments: (#680) * VECTOR-55: Review REVIEWME comments: - Refactor search methods to only return long variant - Remove SSTableQueryContext in favour of QueryContext - Add checking to SegmentMetadata.toSegmentRowId method commit 338c8507de197570f4879019e0e1fc8e1f77025e Author: Jeremiah D Jordan <jeremiah@datastax.com> Date: Mon Jul 10 21:46:56 2023 -0500 Fix IndexError when querying a list/set/map column. 
commit 1b4d6de7e0a051b95c0a9b62c8bc167e23ec65ed Author: Jonathan Ellis <jbellis@datastax.com> Date: Sun Jul 9 21:03:51 2023 +0200 add test that fails before decompose fix and passes after commit 664911ac7de336d3fcffd90083e6a220069bd4e2 Author: Jonathan Ellis <jbellis@datastax.com> Date: Sun Jul 9 13:44:28 2023 +0200 re-use TypeUtil.decomposeVector commit e17493c95ac247ca1da7caa3ccd58be866958d6a Author: Jonathan Ellis <jbellis@datastax.com> Date: Sun Jul 9 13:28:56 2023 +0200 fix casting in deserialize commit d169ff68f6576ce5b150818cb0a52a12abf6f2e0 Author: Jonathan Ellis <jbellis@datastax.com> Date: Sun Jul 9 13:30:02 2023 +0200 fix NPE when using ReadExecutionController.empty commit aa899f55f77df5b157e3e6afa930c377e4cacc57 Author: Zhao Yang <zhaoyangsingapore@gmail.com> Date: Fri Jul 7 10:12:28 2023 +0800 assert no single partition trace for SAI request commit 277251e1aad15e2a5aedaaaae3170d574d224fe5 Author: Jonathan Ellis <jbellis@datastax.com> Date: Thu Jun 29 12:08:05 2023 -0500 - add Tracing.traceSinglePartitions to avoid cluttering range and index queries with noise - update QueryViewBuilder to include a count of sstables per index accessed commit 4682a417fdaaefb135e57b0ebe174c1ff49b7e94 Author: Jonathan Ellis <jbellis@datastax.com> Date: Thu Jul 6 12:13:20 2023 -0500 simplify commit 52d19a1ef5605aebdb87385786dd323057267164 Author: Jeremiah D Jordan <jeremiah@datastax.com> Date: Wed Aug 30 14:17:12 2023 -0500 Squashed commit of the following: commit 4c4c1ff802c7667e1289a4e730cc34bf6b3bd007 Author: Jonathan Ellis <jbellis@datastax.com> Date: Wed Jul 5 12:03:22 2023 -0500 add test for creating index after data is added commit 0acaae364c19ed7556b66b5576762ecc131af5fd Author: Jeremiah D Jordan <jeremiah@datastax.com> Date: Wed Jul 5 09:23:01 2023 -0500 Revert "enable segment compaction by default for vector indexes -- we should prioritize read performance" This reverts commit a4fa0728337ebf236ad03f6ad708cf51c3dc08c5. 
commit e3ddb6426de4e89b5fe61ed742b69257865c2d3c Author: Jonathan Ellis <jbellis@datastax.com> Date: Wed Jun 28 10:54:49 2023 -0500 VECTOR-64 set sstable_growth to 0.5 by default commit 3ca75046e6802ce5836a654459b01948c83d1d7b Merge: a4fa072833 10a2a31ae8 Author: Jonathan Ellis <jbellis@datastax.com> Date: Wed Jun 28 10:54:18 2023 -0500 merge ds-trunk commit a4fa0728337ebf236ad03f6ad708cf51c3dc08c5 Author: Jonathan Ellis <jbellis@datastax.com> Date: Wed Jun 28 10:24:41 2023 -0500 enable segment compaction by default for vector indexes -- we should prioritize read performance commit 08153b38549d700505d064c25d7a2070c2ff07aa Merge: 6a063b4f84 1a45fdc5f9 Author: Jonathan Ellis <jbellis@datastax.com> Date: Wed Jun 28 10:18:38 2023 -0500 Merge branch 'vsearch' of github.com:datastax/cassandra into vsearch commit 6a063b4f8458f119d4b99bd6b78cbc8832f27f13 Author: Jonathan Ellis <jbellis@datastax.com> Date: Wed Jun 28 10:13:51 2023 -0500 VECTOR-63 add cassandra.sai.hnsw.allow_custom_parameters defaulting to false to control setting maximum_node_connections and construction_beam_width, which can cause memory usage to explode if used incorrectly commit 1a45fdc5f92e9c82dba6e9e56a5e341806070b13 Author: Jeremiah D Jordan <jeremiah@datastax.com> Date: Mon Jun 26 15:57:10 2023 -0500 Add the option to configure the file_cache_size_in_mb in a -D like it is in 6.8-cndb. 
commit d2e45dab5b57f20fbdd6250d6dc186c83d783ed3 Author: Jonathan Ellis <jbellis@datastax.com> Date: Thu Jun 22 16:13:00 2023 -0500 VECTOR-62 don't flush a graph that only contains deleted vectors commit 4a9afe205c20b1f863e89d4e02138557d225c3fc Author: Jonathan Ellis <jbellis@datastax.com> Date: Thu Jun 22 16:11:11 2023 -0500 r/m obsolete comment commit a925dc22696bcbebcad17a460441e8b84ff97660 Merge: 7c7d160dd9 2b22984fe1 Author: Jeremiah D Jordan <jeremiah@datastax.com> Date: Thu Jun 22 09:03:08 2023 -0500 Merge remote-tracking branch 'datastax/ds-trunk' into vsearch commit 7c7d160dd90624b0e365287ef0a2f0b7ef94da53 Merge: a14b387372 7a5e374cea Author: Jonathan Ellis <jbellis@datastax.com> Date: Tue Jun 20 21:42:51 2023 -0500 Merge branch 'vsearch' of github.com:datastax/cassandra into vsearch commit a14b387372ef3f220ece68b019acaafaf0d56f42 Author: Jonathan Ellis <jbellis@datastax.com> Date: Tue Jun 20 17:52:13 2023 -0500 move new options to jvm17-server.options commit dc67582585a90628e026a693497793b7d8a298ed Author: Jonathan Ellis <jbellis@datastax.com> Date: Tue Jun 20 17:47:27 2023 -0500 copy jvm11 options to jvm17 commit b6717410f925486a858c4c9414e417eed50950d7 Author: Jonathan Ellis <jbellis@datastax.com> Date: Tue Jun 20 17:30:48 2023 -0500 enable simd commit 92b9e6e2f136254fb1f8e50732809ea529617f48 Author: Jonathan Ellis <jbellis@datastax.com> Date: Mon Jun 19 15:31:09 2023 -0500 make it run under jdk 20 commit 3e985953637819ac8504955552ce0feba29bdd40 Author: Jonathan Ellis <jbellis@datastax.com> Date: Tue Jun 20 17:13:56 2023 -0500 update to lucene with simd commit 7a5e374cea785678670444121ad36be6170dd59b Merge: 5e5ae4de82 e9ddd5f0d9 Author: Jeremiah D Jordan <jeremiah@datastax.com> Date: Tue Jun 20 15:00:24 2023 -0500 Merge remote-tracking branch 'datastax/ds-trunk' into vsearch commit 5e5ae4de824a7e838bbdfdbabc92e8b256911e3f Author: Jonathan Ellis <jbellis@datastax.com> Date: Mon Jun 19 10:57:18 2023 -0500 we weren't calculating length correctly (supposed 
to match ordinals range) but it didn't matter b/c length() is not called on the search path where these classes are used. make this explicit by throwing in length implementation. commit c0d4055a0bc6914d0220fbd4912d6355e9f6a1d9 Merge: f2c9a7cb59 c605f8c9f0 Author: Jonathan Ellis <jbellis@datastax.com> Date: Mon Jun 19 10:20:02 2023 -0500 Merge branch 'vsearch' of github.com:datastax/cassandra into vsearch commit f2c9a7cb59b113a349992011b739079106274c1e Author: Jonathan Ellis <jbellis@datastax.com> Date: Mon Jun 19 10:19:34 2023 -0500 VECTOR-61 remove deletedOrdinals collection that was not threadsafe wrt add and remove operations; create postingsByOrdinal map that allows Bits operations to look up the postings, instead commit 9744a13b405cbeb1b68894bb08cc24796aedd8f7 Author: Jonathan Ellis <jbellis@datastax.com> Date: Mon Jun 19 09:13:24 2023 -0500 add LVT.testMultiplePostings (works fine) commit c605f8c9f037e724544b21fe2aadcf0d8503d86f Author: Zhao Yang <zhaoyangsingapore@gmail.com> Date: Sat Jun 17 12:44:36 2023 +0800 skip replica-filtering-protection for ANN requests commit 71935bd07c9218ba70f1ffde7b6db1e88009e529 Author: Jonathan Ellis <jbellis@datastax.com> Date: Fri Jun 16 20:20:06 2023 -0500 forgot that maxBruteForceRows has to stay non-final for Byteman commit b264afc07727305ce88ab2f18b8a2643d2510bd3 Author: Jonathan Ellis <jbellis@datastax.com> Date: Fri Jun 16 20:10:19 2023 -0500 VECTOR-54 - validateIndexable also checks for very large vectors (that explode to NaN when we take the cosine) - check query vectors with the same criteria as vectors to index commit 66c31a6dcfcbf81b7657ac9c4119803e026d150a Author: Jonathan Ellis <jbellis@datastax.com> Date: Fri Jun 16 20:07:41 2023 -0500 if CFS.apply catches an InvalidRequestException, keep the same type when we re-throw so it gets back to the client as intended commit e0bdfdbe421a1cca30746604cda9ef5b992002bb Author: Jonathan Ellis <jbellis@datastax.com> Date: Fri Jun 16 17:33:37 2023 -0500 add [failing] 
emptyIndexTest commit 75cba96f90730b32ea4a39375c72aaff4ffd6221 Author: Jonathan Ellis <jbellis@datastax.com> Date: Thu Jun 15 16:52:46 2023 -0500 re-use bitsets across searches commit 2af2ce8a321f7fea268d81c393da7f8f6d886776 Author: Jonathan Ellis <jbellis@datastax.com> Date: Fri Jun 16 17:05:52 2023 -0500 more asserts that graph is in sane state when we write it commit a63492686d13e12b799acdf50c6f0c20dd680f21 Author: Jonathan Ellis <jbellis@datastax.com> Date: Fri Jun 16 17:13:23 2023 -0500 update lucene commit 4c6b43e3606b8c9eb5e4ec4cf0817dc9b061c555 Author: Jonathan Ellis <jbellis@datastax.com> Date: Fri Jun 16 16:24:33 2023 -0500 add debugging information when node is not found on level commit 81c8e730bd4ca314a81bc13e30b197603a8eb005 Author: Jonathan Ellis <jbellis@datastax.com> Date: Fri Jun 16 11:57:24 2023 -0500 add similarityWithAnn test commit 87c8da9a5dddd10215a84c07a09dbaaecaa522b8 Author: Jonathan Ellis <jbellis@datastax.com> Date: Wed Jun 14 18:26:02 2023 -0500 add tests to confirm that zero-length vectors are rejected commit 2510c1b987d846821e7095fbc2c368fa51d9a15d Merge: a3d9e33554 e86f91c568 Author: Jonathan Ellis <jbellis@datastax.com> Date: Mon Jun 12 17:09:22 2023 -0500 Merge branch 'vsearch' of github.com:datastax/cassandra into vsearch commit a3d9e3355470f6400328cd7ac1eb07ff06f8e00b Author: Jonathan Ellis <jbellis@datastax.com> Date: Mon Jun 12 17:08:43 2023 -0500 use ArrayList in CVV instead of HashMap since we know the keys are consecutive commit e86f91c568f8a82e83e61bd0dd768478a7584cfc Merge: 77c1bc11f3 8a7a6d9c4f Author: Jeremiah D Jordan <jeremiah@datastax.com> Date: Mon Jun 12 09:25:06 2023 -0500 Merge remote-tracking branch 'datastax/ds-trunk' into vsearch commit 77c1bc11f31e4ff001c7eff1e6fe50812e136a45 Author: Jeremiah D Jordan <jeremiah@datastax.com> Date: Mon Jun 12 09:24:56 2023 -0500 Revert "Revert this after CNDB-6974 is fixed." This reverts commit 8e04280312583a18085e7e7b9d31810790b039f4. 
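VECTOR-54 and the related zero-vector commits reject vectors for which cosine similarity is undefined. The sketch below shows the two failure modes involved. The first is an all-zero vector, where `cos = dot / (|a| * |b|)` divides by zero. The second is a vector with components so large that the squared norm overflows `float` to infinity, which makes the cosine `Infinity / Infinity = NaN`. This is an illustration under stated assumptions, not the actual `validateIndexable` code. The class and method names are hypothetical, "zero-length" is interpreted here as zero magnitude, and `IllegalArgumentException` stands in for the `InvalidRequestException` mentioned in the commits.

```java
import java.util.Arrays;

/** Hypothetical sketch of cosine-index vector validation; names are illustrative only. */
public final class CosineVectorValidation {
    private CosineVectorValidation() {}

    /** Squared L2 norm accumulated in float, as a float-based similarity function would compute it. */
    static float squaredNorm(float[] v) {
        float sum = 0f;
        for (float x : v) {
            sum += x * x;
        }
        return sum;
    }

    /** Throws IllegalArgumentException if cosine similarity involving v would be undefined. */
    public static void validateForCosine(float[] v) {
        for (float x : v) {
            if (!Float.isFinite(x)) {
                throw new IllegalArgumentException("non-finite component in " + Arrays.toString(v));
            }
        }
        float norm2 = squaredNorm(v);
        if (norm2 == 0f) {
            // an all-zero vector has no direction, so the cosine denominator is zero
            throw new IllegalArgumentException("zero vector is not supported for cosine similarity");
        }
        if (Float.isInfinite(norm2)) {
            // e.g. {1e20f}: 1e20f * 1e20f exceeds Float.MAX_VALUE, so the cosine would become NaN
            throw new IllegalArgumentException("vector magnitude too large for cosine similarity");
        }
    }

    public static void main(String[] args) {
        validateForCosine(new float[] {0.5f, 1f}); // accepted
        for (float[] bad : new float[][] {{0f, 0f}, {1e20f, 1f}}) {
            try {
                validateForCosine(bad);
            } catch (IllegalArgumentException e) {
                System.out.println("rejected: " + e.getMessage());
            }
        }
    }
}
```

Running the same check on query vectors, as the VECTOR-54 message says, keeps both sides of the similarity computation well-defined.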
commit cae30e9879c463d80de2a2b7e4a642979803c13a Author: Jeremiah D Jordan <jeremiah@datastax.com> Date: Mon Jun 12 09:23:39 2023 -0500 Revert "Revert this after CNDB-6974 is fixed." This reverts commit 67a2b7eba3c957a993726d37706643979e92a3a3. commit 46d1a45de84f40311f193beba3ca5fc3fe7450d7 Author: Jonathan Ellis <jbellis@datastax.com> Date: Mon Jun 12 09:03:03 2023 -0500 replace our TODO entries with VSTODO to make them easier to find commit 415b3f29e8e7f1526e28655308a1937a1175e575 Merge: 2fd8b848be 481d29721a Author: Jonathan Ellis <jbellis@datastax.com> Date: Mon Jun 12 09:02:16 2023 -0500 Merge branch 'vsearch' of github.com:datastax/cassandra into vsearch commit 481d29721a7993e9babd44f792be55aca925d41a Author: Zhao Yang <zhaoyangsingapore@gmail.com> Date: Mon Jun 12 21:26:43 2023 +0800 Fix VectorMemtableIndex to handle max token and fix Segment#intersects (#670) * Fix VectorMemtableIndex to handle max token and min/max bound * Fix Segment#intersects to compare bound instead of token and add tests for range search * make brute force rows per query for VectorMemtableIndex * apply feedback on Segment#intersects * add comments to VectorMemtableIndex#search * Fix SegmentTest commit 2fd8b848be82e62999e64b57cb9800d03de8e953 Author: Jonathan Ellis <jbellis@datastax.com> Date: Mon Jun 12 07:51:18 2023 -0500 clean up a couple REVIEWME commit 3c066e07be6188c3f48c51c02cfc99b395e2a6bc Author: Zhao Yang <zhaoyangsingapore@gmail.com> Date: Sun Jun 11 14:26:42 2023 +0800 fix flaky VectorDistributedTest commit 6c719a752e90e031916dddf6cc7b7db23627bc38 Author: Jonathan Ellis <jbellis@datastax.com> Date: Sat Jun 10 10:48:01 2023 -0500 fix use of bruteForceRows -- should be larger of limit,maxBruteForceRows commit 0a2279d7f010bcb68053bb14ea2974295b21d1c4 Author: Jonathan Ellis <jbellis@datastax.com> Date: Sat Jun 10 10:46:34 2023 -0500 use maxBruteForceRows when deciding whether to skip ANN commit 1859f7355231ce7972e3d4b7af70e5d2967da516 Author: Jonathan Ellis 
<jbellis@datastax.com> Date: Sat Jun 10 10:27:03 2023 -0500 simplify partitionKeySearchTest using euclidean distance commit 4f40e1c2d9b1c30675eeaee7d800a8113cce97c0 Author: Jonathan Ellis <jbellis@datastax.com> Date: Sat Jun 10 10:09:24 2023 -0500 typo commit 7dbe85ac9a248aa6a0985839df1f40ceb2c08e6c Author: Jonathan Ellis <jbellis@datastax.com> Date: Sat Jun 10 10:08:23 2023 -0500 rename methods that return Bits but had bitset in their names commit 2861bfdba8f2638a1e3f9e81df9cad3878d0e544 Author: Jonathan Ellis <jbellis@datastax.com> Date: Sat Jun 10 10:06:51 2023 -0500 simplify skipANN commit fbfe4bcdd03a66f4dedfa91ce9d06206f89827cd Author: Zhao Yang <zhaoyangsingapore@gmail.com> Date: Sat Jun 10 16:36:56 2023 +0800 optimize VectorIndexSearcher#searchPosting - return empty posting if key range is not found in current sstable - return empty posting if all row ids are shadowed - skip ANN if matching row ids are less than limit commit 168911b70888a3dda0ddf4f615af1b7ff54f43e3 Author: Zhao Yang <zhaoyangsingapore@gmail.com> Date: Sat Jun 10 20:54:41 2023 +0800 fix data visibility issue during flush: make sure SSTableAddedNotification is sent before MemtableDiscardedNotification commit c5dd47e3ce1c7276703fa3f89a9932a3b2cd43b7 Author: Zhao Yang <zhaoyangsingapore@gmail.com> Date: Sat Jun 10 12:31:53 2023 +0800 fix primaryKeySearchTest and partitionKeySearchTest to use correct selector and expected results commit 2e636529c63d0d726868d6df9b3e11d15d3871d8 Author: Zhao Yang <zhaoyangsingapore@gmail.com> Date: Sun Jun 11 08:31:39 2023 +0800 Vector-48: index#update is not triggered by partition/range deletion,… (#665) * Vector-48: index#update is not triggered by partition/range deletion, we have to fix VectorMemtableIndex to include shadowed primary keys - during flush, append ordinals that are removed by partition/range deletion into deletedOrdinals set * add comments to VP * revert redundant variables * simplify --------- Co-authored-by: Jonathan Ellis 
<jbellis@datastax.com> commit daa2623b880e4b82d99204fe718db22e52ae3b42 Author: Jonathan Ellis <jbellis@datastax.com> Date: Sat Jun 10 13:54:38 2023 -0500 use mmapped builder in OnDiskHnswGraphTest commit 553778a22cfa8757daf3b3ec0e2cfa6fc7ba29cf Author: Jonathan Ellis <jbellis@datastax.com> Date: Sat Jun 10 08:34:41 2023 -0500 fix logic in partitionKeySearchTest, test still fails commit a7f5a2f824085691da80ef62e37a430e5637c31c Author: Jonathan Ellis <jbellis@datastax.com> Date: Fri Jun 9 17:57:34 2023 -0500 ignore invalid vectors during build against existing data, instead of failing the build commit 2af203646a2e57fca0a2521c31ba4269b0d8f326 Author: Jonathan Ellis <jbellis@datastax.com> Date: Fri Jun 9 17:43:47 2023 -0500 switch from ignoring zero vectors to throwing IRE commit 4c98fbb447a275cbf557ae0adf8f4f8a30fba17f Author: Jonathan Ellis <jbellis@datastax.com> Date: Fri Jun 9 17:15:57 2023 -0500 Revert "if there are no rows in the range to search, return EmptyPostingsList instead of giving the hnsw search an empty Bits (which is a pathological case for the search)" This reverts commit 6e5eab787a3add985dc30eca82c5dee2646f5564. 
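The `bruteForceRows` commits above ("should be larger of limit,maxBruteForceRows", "use maxBruteForceRows when deciding whether to skip ANN") describe a simple decision. When few enough rows match, each row is scored exactly and the top `limit` rows are kept, and the HNSW graph search is skipped. The sketch below uses a size-bounded min-heap for the exact top-k. Several details are assumptions and are not taken from the `VectorIndexSearcher` code. The `<=` boundary is assumed, dot product stands in for the index's configured similarity function, and the ANN path is not modeled; an empty `Optional` signals it. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.PriorityQueue;

/** Hypothetical sketch of the brute-force-vs-ANN decision; not the actual searcher API. */
public final class BruteForceTopK {
    private BruteForceTopK() {}

    /** Exact scoring is used up to the larger of the query limit and the configured cap. */
    public static int bruteForceRows(int limit, int maxBruteForceRows) {
        return Math.max(limit, maxBruteForceRows);
    }

    static float dot(float[] a, float[] b) {
        float s = 0f;
        for (int i = 0; i < a.length; i++) {
            s += a[i] * b[i];
        }
        return s;
    }

    /** Exact top-k by dot product with a min-heap of size <= k; returns row indexes, best first. */
    public static List<Integer> exactTopK(float[] query, List<float[]> rows, int k) {
        float[] scores = new float[rows.size()];
        for (int i = 0; i < scores.length; i++) {
            scores[i] = dot(query, rows.get(i));
        }
        // the weakest row currently kept sits at the head of the heap so it can be evicted
        PriorityQueue<Integer> heap = new PriorityQueue<Integer>(
                Comparator.comparingDouble((Integer i) -> scores[i]));
        for (int i = 0; i < scores.length; i++) {
            heap.add(i);
            if (heap.size() > k) {
                heap.poll(); // drop the lowest-scoring row
            }
        }
        List<Integer> result = new ArrayList<>(heap);
        result.sort(Comparator.comparingDouble((Integer i) -> scores[i]).reversed());
        return result;
    }

    /** Exact results when brute force applies; empty means "fall back to the ANN graph search". */
    public static Optional<List<Integer>> search(float[] query, List<float[]> matching,
                                                 int limit, int maxBruteForceRows) {
        if (matching.size() <= bruteForceRows(limit, maxBruteForceRows)) {
            return Optional.of(exactTopK(query, matching, limit));
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<float[]> rows = List.of(new float[] {1f, 0f}, new float[] {0f, 1f}, new float[] {0.6f, 0.8f});
        System.out.println(search(new float[] {1f, 0f}, rows, 2, 100)); // prints Optional[[0, 2]]
    }
}
```

Taking the maximum ensures that a query whose `limit` exceeds the configured cap is still answered exactly when all matching rows would be returned anyway.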
commit da99598a5d11dfa4ecac3b1eeeb9d19a4bad21e5 Author: Jonathan Ellis <jbellis@datastax.com> Date: Fri Jun 9 17:09:30 2023 -0500 don't attempt to add zero vector to cosine indexes commit a77792447d33600f69dd0a75a280a2b6dac51389 Author: Jonathan Ellis <jbellis@datastax.com> Date: Fri Jun 9 16:43:34 2023 -0500 add failing partitionKeySearchTest commit b061dd1c77f30c67324da4f94ea8b5a22d50fba7 Author: Jonathan Ellis <jbellis@datastax.com> Date: Fri Jun 9 10:52:08 2023 -0500 inline the test ops commit 6e5eab787a3add985dc30eca82c5dee2646f5564 Author: Jonathan Ellis <jbellis@datastax.com> Date: Fri Jun 9 10:51:09 2023 -0500 if there are no rows in the range to search, return EmptyPostingsList instead of giving the hnsw search an empty Bits (which is a pathological case for the search) commit 518c055dfeabacf0fa0863e40bf814058f2ad981 Author: Jonathan Ellis <jbellis@datastax.com> Date: Fri Jun 9 10:28:57 2023 -0500 cleanup commit e922b03ffe5436dc1341dbe20ee7aaca9901058f Author: Jonathan Ellis <jbellis@datastax.com> Date: Fri Jun 9 10:27:53 2023 -0500 failing tests for primary key search commit 428e4b713e25a5820d02ca790c30e9f1477c0f44 Author: Jonathan Ellis <jbellis@datastax.com> Date: Fri Jun 9 10:21:59 2023 -0500 cleanup commit 5319bcdf3b694e861509b2a9f549f39bf51cd253 Author: Jonathan Ellis <jbellis@datastax.com> Date: Fri Jun 9 10:20:02 2023 -0500 move testInvalidColumnNameWithAnn to VectorInvalidQueryTest commit d8b64845e59c0bd3658a16df21dcc75564417e3b Author: Jonathan Ellis <jbellis@datastax.com> Date: Fri Jun 9 10:12:40 2023 -0500 upgrade lucene to reduce Integer boxing on build path commit a62356e49c0a32f61732b8525d2bb655f46dd767 Author: Jonathan Ellis <jbellis@datastax.com> Date: Fri Jun 9 10:11:55 2023 -0500 cleanup commit b754eb7f021d10d0ae8de83ebdf2ce2f19ce59d8 Author: Jonathan Ellis <jbellis@datastax.com> Date: Fri Jun 9 08:29:34 2023 -0500 replace CHM with NBHMLong to avoid boxing commit 53d232ad2bcddefd72b10542f703cf36728c05f5 Author: Piotr Kołaczkowski 
<pkolaczk@datastax.com> Date: Fri Jun 9 12:17:46 2023 +0200 STAR-550 Handle SAI AbortedOperationException AbortedOperationException is thrown by SAI when index search hits a timeout. Now, instead of allowing this exception to bubble up to the top and eventually be logged as an error, we catch and swallow it in the InboundSink after creating the query response. Additionally, we now also set a proper error code (TIMEOUT) in the response, so the client has a hint about what happened. commit 5e0c5eb3c54b58002162ce3e39d0891643e1f663 Author: Matt Fleming <matt@codeblueprint.co.uk> Date: Fri Jun 9 08:44:10 2023 +0100 CNDB-7037: Check that memtable exists before flushing and avoid IOOBE (#664) For offline services such as the compactor it's possible to not have live memtables. Account for this during flushing after removing indexes to avoid triggering an IndexOutOfBoundsException: Error happened while updating the schema java.lang.IndexOutOfBoundsException: Index: -1 at java.base/java.util.Collections$EmptyList.get(Collections.java:4483) at org.apache.cassandra.db.lifecycle.View.getCurrentMemtable(View.java:106) at org.apache.cassandra.db.ColumnFamilyStore.forceFlush(ColumnFamilyStore.java:1115) at org.apache.cassandra.db.SystemKeyspace.forceBlockingFlush(SystemKeyspace.java:552) at org.apache.cassandra.db.SystemKeyspace.setIndexRemoved(SystemKeyspace.java:576) at org.apache.cassandra.index.SecondaryIndexManager.markIndexRemoved(SecondaryIndexManager.java:837) at org.apache.cassandra.index.SecondaryIndexManager.removeIndex(SecondaryIndexManager.java:420) commit c241f7d41ee46c28f60d691611f32e071dac6684 Author: Jonathan Ellis <jbellis@datastax.com> Date: Thu Jun 8 19:19:09 2023 -0500 read neighbors using .intBuffer since we know that the HNSW searcher always reads all the neighbors commit 32f021a4f0623e57d82d70c54350eab6f0f57db7 Author: Jonathan Ellis <jbellis@datastax.com> Date: Thu Jun 8 18:08:30 2023 -0500 VECTOR-51 leave vectors as ByteBuffers during compaction to avoid 2x
memory usage incurred by keeping them around as float[] as well commit 32d8cbbd19301228530d3d84f7a7e4120b4d4422 Author: Jonathan Ellis <jbellis@datastax.com> Date: Thu Jun 8 17:37:08 2023 -0500 r/m node cache from query metrics since there is nothing the operator can act on there commit 890df8e6c36c5c9e1314dbcc0a681415401120dd Author: Jonathan Ellis <jbellis@datastax.com> Date: Thu Jun 8 14:15:25 2023 -0500 add more information to exception when reading row offsets goes wrong commit 5c7fe0cfed1edb10f81b134f8949dd1b5abb2781 Author: Jonathan Ellis <jbellis@datastax.com> Date: Thu Jun 8 13:26:17 2023 -0500 revise ordinals cache as follows: // cache full levels including neighbors up to neighborsRamBudget, starting with the top level, // but always cache all levels above the bottom two levels -- this will be ~1% of the graph. // then on L1, cache at least the offsets // L0 we do not cache since we only need one extra seek (no bsearch) to read the neighbors offset commit e131014c5cba59624e0ed9fac142abe5340e2fc0 Author: Jonathan Ellis <jbellis@datastax.com> Date: Thu Jun 8 13:53:10 2023 -0500 add deletes test commit 7c5a880f007535d9f03b7c7d9d3e84dcc79ef3d7 Author: Jonathan Ellis <jbellis@datastax.com> Date: Thu Jun 8 13:34:42 2023 -0500 must use graph.size for bitset size -- not correct to use max rowid, since deleted ordinals will not have a rowid but will still be in the graph commit 0654fbb61aec86deae69edf83fb1cf653ed7df65 Author: Jonathan Ellis <jbellis@datastax.com> Date: Thu Jun 8 08:44:28 2023 -0500 info -> debug for validatePerIndexComponents commit b289db2362b928f29b6c5c6289e09b33f0be1e88 Author: Jonathan Ellis <jbellis@datastax.com> Date: Wed Jun 7 18:08:18 2023 -0500 update lucene commit 347dae644b0d395014d8faf2affd48a6b2546e96 Author: Jonathan Ellis <jbellis@datastax.com> Date: Wed Jun 7 18:23:41 2023 -0500 undo burntest logging config change commit 87a6a84f38d0a0423acd9a0dda67d386876681b5 Author: Jonathan Ellis <jbellis@datastax.com> Date: Wed Jun 7 15:20:06 
2023 -0500 add debug logging commit 20013bc865c7fb2c34111c98362aaf997dc724dc Author: Jonathan Ellis <jbellis@datastax.com> Date: Wed Jun 7 14:50:39 2023 -0500 add a bit more information to exception when we fail in index construction from disk commit 86389ceae8188b8e427d8a528fada2f913f1f633 Author: Jonathan Ellis <jbellis@datastax.com> Date: Wed Jun 7 12:25:31 2023 -0500 reduce test vector count from "all of them" to 200 commit 20177a1c84ba34babc82d8c1aa4e3436321b5d9f Merge: 2e2fce09ee f2c697ac46 Author: Jonathan Ellis <jbellis@datastax.com> Date: Wed Jun 7 12:21:20 2023 -0500 Merge branch 'wip' into vsearch commit f2c697ac46348301260c5947ade4ebed7dee90ee Author: Jonathan Ellis <jbellis@datastax.com> Date: Wed Jun 7 12:16:28 2023 -0500 multipleSegmentsMultiplePostingsTest commit 59e4a1c53e1c19a7ebba4a0834229e61f9e4d3ce Author: Jonathan Ellis <jbellis@datastax.com> Date: Wed Jun 7 12:16:04 2023 -0500 cleanup commit 2e2fce09eeb5bef4780cecf7c7b5e492a810f67a Author: Zhao Yang <zhaoyangsingapore@gmail.com> Date: Wed Jun 7 23:33:00 2023 +0800 fix OnDiskOrdinalsMap to seek to segment offset before reading (#661) commit 9502f1040229f2be6619f7e4497dc25ca5126b39 Author: Jonathan Ellis <jbellis@datastax.com> Date: Wed Jun 7 10:04:38 2023 -0500 fix build commit 3535444b57366fe705e0fd8c1ca095e60ed2a706 Author: Jonathan Ellis <jbellis@datastax.com> Date: Wed Jun 7 09:19:41 2023 -0500 add write-only workload commit a112b0965cb298f293b6e384bafc92a1d5a1fb8f Author: Zhao Yang <zhaoyangsingapore@gmail.com> Date: Wed Jun 7 21:15:53 2023 +0800 VECTOR-44: improve in-memory partition-restricted query perf (#660) * VECTOR-44: improve in-memory partition-restricted query perf - using post-filter top-k processor instead of ANN: 14x improvement on partition-restricted query in LongVectorTest commit cf1ec3be50ddbe40ae99dfc34eea51b9fc5c2648 Author: Zhao Yang <zhaoyangsingapore@gmail.com> Date: Wed Jun 7 21:06:37 2023 +0800 Vector-45: fix bitsetForShadowedPrimaryKeys to skip shadowed primary 
keys outside of current sstable/segment (#659) * VECTOR-45: fix bitsetForShadowedPrimaryKeys to skip shadowed primary keys outside of current sstable/segment * return empty iterator if results are empty instead of ReorderingRangeIterator commit 87217b4f49935fbbd13c73f4d0aadb84836696a7 Author: Jonathan Ellis <jbellis@datastax.com> Date: Wed Jun 7 07:31:46 2023 -0500 don't make areL0ShardsEnabled final, it breaks mocks commit 2bb3a299df43561369544b662674872575975b7c Author: Matt Fleming <matt@codeblueprint.co.uk> Date: Wed Jun 7 10:56:08 2023 +0100 VECTOR-42: Check columns exist to avoid NPE during ANN expr binding (#657) Using a non-existent column with an ANN expression triggers an NPE like so: java.lang.NullPointerException: null at org.apache.cassandra.cql3.ArrayLiteral.forReceiver(ArrayLiteral.java:43) at org.apache.cassandra.cql3.ArrayLiteral.prepare(ArrayLiteral.java:54) at org.apache.cassandra.cql3.Ordering$Raw$Ann.bind(Ordering.java:178) at org.apache.cassandra.cql3.Ordering$Raw.bind(Ordering.java:139) Use TM.getExistingColumn() which throws InvalidRequestException if the column is undefined.
commit f29d4528fc43f34889e560acafa810eee1b88ba9 Author: Jonathan Ellis <jbellis@datastax.com> Date: Tue Jun 6 21:18:47 2023 -0500 restore query timeouts
commit 6e5734e52b41ab1e355d202ad0433440828fbb75 Author: Jonathan Ellis <jbellis@datastax.com> Date: Tue Jun 6 19:08:52 2023 -0500 looking at performance over time in LongVectorTest
commit f96af8f0b9185163a49156bdab107801d2588bf0 Author: Jonathan Ellis <jbellis@datastax.com> Date: Tue Jun 6 16:00:23 2023 -0500 log level = info for burn tests
commit 1247cf372927cc185be5c9767a06fdd15f102dbe Author: Jonathan Ellis <jbellis@datastax.com> Date: Tue Jun 6 17:48:00 2023 -0500 write the in-memory deleted ordinals to disk at the start of the postings component
commit e319d71bd6b0d265f616c807dcaf4a9d9fc38aef Author: Jonathan Ellis <jbellis@datastax.com> Date: Tue Jun 6 16:36:27 2023 -0500 don't allocate unnecessary objects on the happy path of no tombstones
commit 6d146a8460a17d146de756aab2ccbbf2abdb967a Author: Jonathan Ellis <jbellis@datastax.com> Date: Tue Jun 6 16:19:56 2023 -0500 VECTOR-23: force UCS to not shard L0
commit 79d6e177093312bc1b951d6740ad45ce8a6c5875 Author: Jonathan Ellis <jbellis@datastax.com> Date: Tue Jun 6 16:01:05 2023 -0500 mark AutoCloseable
commit 5ec2f7d7f792bfa2dcf582ea4eed3fa8745fdaff Author: Jonathan Ellis <jbellis@datastax.com> Date: Mon Jun 5 14:37:15 2023 -0500 Support string literals as vectors Co-authored-by: Andrés de la Peña <a.penya.garcia@gmail.com>
commit 73a53c37536835e462e3cf17c452027c7aa591ed Author: Jonathan Ellis <jbellis@datastax.com> Date: Tue Jun 6 12:25:39 2023 -0500 (take 3, this time with the code changes) fix race conditions across concurrent inserts + searches in memtable
commit 51f4f419d31916a81088f171b158f1e002b5c800 Author: Jonathan Ellis <jbellis@datastax.com> Date: Tue Jun 6 12:25:39 2023 -0500 (take 2) fix race conditions across concurrent inserts + searches in memtable
commit 94097a1f7cf1831c03e52b1f539bdc2e42fa038b Author: Jonathan Ellis <jbellis@datastax.com> Date: Tue Jun 6 12:35:32 2023 -0500 Revert "fix race conditions across concurrent inserts + searches in memtable" This reverts commit c4d41b492190fb2644f83bf4902acf71d1e4f891.
commit c4d41b492190fb2644f83bf4902acf71d1e4f891 Author: Jonathan Ellis <jbellis@datastax.com> Date: Tue Jun 6 12:25:39 2023 -0500 fix race conditions across concurrent inserts + searches in memtable
commit 22143a8c5db160ef0fa7687ff3e8cb227de160f0 Author: Jonathan Ellis <jbellis@datastax.com> Date: Tue Jun 6 10:28:30 2023 -0500 fix AOOB in bruteForceRows logic
commit b5624f62c88378e03260399440edf4d335ccc3af Author: Jonathan Ellis <jbellis@datastax.com> Date: Tue Jun 6 10:12:53 2023 -0500 call deserializeFloatArray (which gives float[]) instead of deserialize (which gives List<Float>)
commit 13ad40718737b54652c875d81ec20992b5828777 Author: Jonathan Ellis <jbellis@datastax.com> Date: Tue Jun 6 08:36:32 2023 -0500 create CassandraHnswGraphBuilder with concurrent and serial implementations, so that single-threaded compaction doesn't have to pay the concurrency overhead
commit b83f4e4007e3d597963023b8ae32f4d0934b792d Author: Jonathan Ellis <jbellis@datastax.com> Date: Tue Jun 6 08:00:44 2023 -0500 add ASL header to injections.md to make CI happy
commit 16074336b1322d4bade40eb71562ca50221399dc Merge: 227c6c13a3 1ff6992354 Author: Jonathan Ellis <jbellis@datastax.com> Date: Tue Jun 6 07:48:04 2023 -0500 Merge branch 'VECTOR-37' into vsearch
commit 227c6c13a3cfe42484fc560d629d54a93f8265e0 Author: Jaroslaw Grabowski <jaroslaw.grabowski@datastax.com> Date: Tue Jun 6 13:02:37 2023 +0200 CNDB-7007 return expired tables level from getLevels
commit 312d07de8d297cdede39017354b678f2fb1b1006 Author: Jonathan Ellis <jbellis@datastax.com> Date: Tue Jun 6 07:31:26 2023 -0500 randomize our brute force threshold, which will get the actual index scans exercised more
commit 0b897961869965629893c94250b7d4b1fb7c0f86 Author: Zhao Yang <zhaoyangsingapore@gmail.com> Date: Tue Jun 6 13:44:01 2023 +0800 Reference ANN sstable indexes in case of ann hybrid search (#653) * Reference ANN sstable indexes in case of ann hybrid search * simplify --------- Co-authored-by: Jonathan Ellis <jbellis@datastax.com>
commit 1ff6992354e8411f9f7c48a76f0aadb4a00f2789 Author: Jonathan Ellis <jbellis@datastax.com> Date: Mon Jun 5 21:55:14 2023 -0500 per-query hnsw metrics
commit d3d6ae107737020eb0ca3b00c117d5bc84027dc0 Author: Jonathan Ellis <jbellis@datastax.com> Date: Mon Jun 5 21:16:14 2023 -0500 reduce test size to prevent Jenkins OOM
commit 7765ac73fd29c9664b6f8e946c067d6f1b0a9a03 Merge: 5a04dbcf67 c5cd09cebf Author: Jonathan Ellis <jbellis@datastax.com> Date: Mon Jun 5 21:15:44 2023 -0500 Merge branch 'vsearch' of github.com:datastax/cassandra into vsearch
commit c5cd09cebf0c63f7bafcd2f4548395efe68b12cd Author: Zhao Yang <zhaoyangsingapore@gmail.com> Date: Tue Jun 6 09:38:42 2023 +0800 Skip warning log about receiving a range that is not owned by the current replica, see VECTOR-30
commit 5a04dbcf6738d7681732e28518d6eb37f74357b8 Author: Jonathan Ellis <jbellis@datastax.com> Date: Mon Jun 5 11:21:15 2023 -0500 add injections.md from bdp repo
commit baf9cce301a39f9506bd60c5a75d2011150f2fea Author: Jonathan Ellis <jbellis@datastax.com> Date: Mon Jun 5 10:02:02 2023 -0500 VECTOR-36 fix looking up indexes and SSTableContext by SSTableReader, because the SSTR will be removed from internal structures once it's compacted
commit e7fdc8454e750e2ee886c4ab27fb09abb46303fa Author: Jonathan Ellis <jbellis@datastax.com> Date: Mon Jun 5 09:53:51 2023 -0500 limitToTopResults should skip rows that are not in the current segment (or more preciesely, have no vectors that were indexed)
commit 2f64695a90651d9e76eee0a3399032ea241c983c Author: Zhao Yang <zhaoyangsingapore@gmail.com> Date: Mon Jun 5 14:55:20 2023 +0800 Vector 6 take 2: reapply Vector-6 commit and fix NPE in SelectStatement (#648) * Revert "Revert VECTOR-6" This reverts commit 63a6ff609a02e4f7d9f5499f1af48ca5e58f63a8. * Vector-6 take 2: - fix NPE in SelectStatement by skipping reversed() for null ColumnComparator - fix getOrderingColumns() to return LinkedHashMap to preverse ordering columns' order - fix VectorLocalTest compilation - remove debug log in CassandraOnHeapHnsw
commit 9da51b315d207417de4f3e005616e275c62c74cb Author: Zhao Yang <zhaoyangsingapore@gmail.com> Date: Mon Jun 5 12:38:21 2023 +0800 fix SAI test failures in vsearch branch (#649) - BatchlogEndpointFilterTest - rangeRestrictedTest - SegmentFlushTest - SegmentMergerTest - OperationTest - RangeIntersectionIteratorTest
commit dc53d7db5f0f8fe1c780f6d2b7580794f7b6b5fb Author: Jonathan Ellis <jbellis@datastax.com> Date: Sat Jun 3 13:35:11 2023 -0500 create views for ordinals map so we don't have to open a new Reader for each method call
commit c6385022ff2949c23e0dd0161a043cf4061f050c Author: Jonathan Ellis <jbellis@datastax.com> Date: Sat Jun 3 08:19:32 2023 -0500 add assert sstableContext != null
commit f340e7590db578b71201326e82006f72dd491047 Author: Jonathan Ellis <jbellis@datastax.com> Date: Sat Jun 3 07:37:36 2023 -0500 on-disk Searcher bits should not need to be growable
commit fd5cc7f989a47d03689dd29495b974697ad1338b Author: Jonathan Ellis <jbellis@datastax.com> Date: Sat Jun 3 07:37:18 2023 -0500 add asserts
commit 6436e5f4ad49881a62c1442692edfedfe31968dc Author: Jonathan Ellis <jbellis@datastax.com> Date: Sat Jun 3 07:21:50 2023 -0500 cleanup
commit 32c8e005740d3c7ec0b1008091728e12260aecdd Author: Jonathan Ellis <jbellis@datastax.com> Date: Fri Jun 2 13:57:18 2023 -0500 comment
commit 091d96832ff9a68f6ea7857eafd2ec7ee0c09a0b Author: Jonathan Ellis <jbellis@datastax.com> Date: Fri Jun 2 13:32:24 2023 -0500 r/m unused search method returning iterators over PrimaryKey
commit 4e78a4b429b0ba6690871df9a91c8ffd8d2e53fa Author: Jonathan Ellis <jbellis@datastax.com> Date: Fri Jun 2 11:25:30 2023 -0500 we can use the slightly more lightweight RangeConcatIterator instead of RangeUnionIterator when combining results from different Segments
commit e7a9186bf3426ad025e809481e47db85a7bf7190 Author: Jonathan Ellis <jbellis@datastax.com> Date: Fri Jun 2 11:22:35 2023 -0500 r/m obsolete FIXME
commit c76ee9ea0cab5b70c535de3f50b81cf333bf94a2 Author: Jonathan Ellis <jbellis@datastax.com> Date: Sat Jun 3 07:12:43 2023 -0500 add testAppendedGraphs
commit d00bdbfb6acf9f08d2bdfa2ef06e833e567cf9eb Author: Jonathan Ellis <jbellis@datastax.com> Date: Sat Jun 3 06:57:37 2023 -0500 move write-to-File to test class
commit b10fd1100daf4cde0291b3b4ef06f5ceb3ca650c Author: Jonathan Ellis <jbellis@datastax.com> Date: Sat Jun 3 06:46:01 2023 -0500 fix Ref leaks in test
commit 0656e558bb51172df4e0cde1ebe92946d4dcec8b Author: Jonathan Ellis <jbellis@datastax.com> Date: Sat Jun 3 06:26:05 2023 -0500 fix tests to use View
commit 92797ced03ed26f9a5cce398504d6f0de519b899 Author: Zhao Yang <zhaoyangsingapore@gmail.com> Date: Sat Jun 3 19:19:38 2023 +0800 VECTOR-35: fix vector on-disk writer to append segments (#647)
commit 22f5c01860367586d0cc39def7d975a6d6bae784 Author: Jonathan Ellis <jbellis@datastax.com> Date: Fri Jun 2 18:54:41 2023 -0500 add "Vector indexes only support ANN queries" check
commit bfc0b056dbacd0a1eca86020e901f545dca7670e Author: Jonathan Ellis <jbellis@datastax.com> Date: Fri Jun 2 17:34:24 2023 -0500 split invalid requests into separate test class
commit f66768c35920101382c1975c9d0ba0e3301b7c61 Author: Jonathan Ellis <jbellis@datastax.com> Date: Fri Jun 2 17:28:56 2023 -0500 add specific error message when trying to do ANN without an index
commit 63a6ff609a02e4f7d9f5499f1af48ca5e58f63a8 Author: Jonathan Ellis <jbellis@datastax.com> Date: Fri Jun 2 12:31:22 2023 -0500 Revert VECTOR-6 This reverts commits: a37009c187edeba68389d239dc1b9f40519b1187 5565690fbe1056d4c159ddbe233fa22c7695320a e7733bb8f858a16b082b8a5c64d0322db6f6271a
commit 3d496662b259470b505d88212074c436660c27ad Merge: fab67bb134 4bdae7e362 Author: Jonathan Ellis <jbellis@datastax.com> Date: Fri Jun 2 11:05:53 2023 -0500 Merge branch 'vsearch' of github.com:datastax/cassandra into vsearch
commit 4bdae7e36213f5112efce085696dd24aaa0adfad Author: Mike Adamson <madamson@datastax.com> Date: Fri Jun 2 14:36:01 2023 +0100 Stabilise random tests using word2vec model vectors
commit 2ddc3ec6b11f6810f6c4cbaf6b62bfe46d2d1b6c Merge: 8e04280312 1f9179002a Author: Jeremiah D Jordan <jeremiah@datastax.com> Date: Fri Jun 2 10:05:15 2023 -0500 Merge remote-tracking branch 'datastax/ds-trunk' into vsearch
commit 8e04280312583a18085e7e7b9d31810790b039f4 Author: Jeremiah D Jordan <jeremiah@datastax.com> Date: Fri Jun 2 09:51:06 2023 -0500 Revert this after CNDB-6974 is fixed.
commit fab67bb13499691f813c893cd39a3bbd406653d2 Author: Jonathan Ellis <jbellis@datastax.com> Date: Fri Jun 2 08:59:12 2023 -0500 vector cache defaulting to 1MB per segment
commit 67a2b7eba3c957a993726d37706643979e92a3a3 Author: Jeremiah D Jordan <jeremiah@datastax.com> Date: Thu Jun 1 17:49:35 2023 -0500 Revert this after CNDB-6974 is fixed.
commit 4538956b31d8c40d2e9b603b3b0a3392343e1853 Merge: 7518680a4a af4c83aef4 Author: Jonathan Ellis <jbellis@datastax.com> Date: Thu Jun 1 16:17:14 2023 -0500 Merge branch 'ds-trunk' into vsearch
commit 7518680a4a17cb2589cf06a9175befc10c9eab1a Author: Jonathan Ellis <jbellis@datastax.com> Date: Thu Jun 1 16:14:33 2023 -0500 re-use float[] across calls to vectorValue to avoid allocation overhead (credit to Jake for the idea)
commit d20bd8aedeb6232358dd16f901e2708f217b8108 Author: Jonathan Ellis <jbellis@datastax.com> Date: Thu Jun 1 15:35:17 2023 -0500 optimize vectorValue() with direct access to the mmap-ed region
commit 24572025af5b7893279b5b03c58555b682f3abdb Author: Jonathan Ellis <jbellis@datastax.com> Date: Thu Jun 1 14:33:55 2023 -0500 decomposeVector does not modify the underlying buffer, so no need to duplicate() here
commit efef35d47a24222fae300d1cbd5a801e9ed79442 Author: Jonathan Ellis <jbellis@datastax.com> Date: Thu Jun 1 14:29:30 2023 -0500 specialize deserializeFloatArray for ByteBuffer and FloatBuffer. this saves about 3% total CPU on search workloads
commit 63c27f5550d6c3e22d960b9a42b9097029e3b760 Author: Jonathan Ellis <jbellis@datastax.com> Date: Thu Jun 1 11:04:23 2023 -0500 default target size of 5GB
commit 69c55c8d07c6e1d58b4829c153f5ebf8a4343669 Author: Jonathan Ellis <jbellis@gmail.com> Date: Thu Jun 1 07:20:38 2023 -0700 make the on-disk hnsw code threadsafe using FileHandle.createReader (#641)
commit b194cae2e56133b946d0231768254064a47f01b6 Merge: 7ee6422084 23c2891e7a Author: Jonathan Ellis <jbellis@datastax.com> Date: Thu Jun 1 06:22:17 2023 -0500 Merge branch 'ds-trunk' into vsearch
commit 7ee642208485c9e2ae62aa662dbbddc7b295bae1 Merge: c286ec0ee2 a37009c187 Author: Jonathan Ellis <jbellis@datastax.com> Date: Wed May 31 20:41:48 2023 -0500 Merge branch 'VECTOR-6' into vsearch
commit c286ec0ee231ededded18e81357edebe72064c59 Author: Jonathan Ellis <jbellis@datastax.com> Date: Wed May 31 20:40:21 2023 -0500 add testLargeGraph
commit a37009c187edeba68389d239dc1b9f40519b1187 Author: Zhao Yang <zhaoyangsingapore@gmail.com> Date: Thu Jun 1 08:50:07 2023 +0800 cleanup unused code
commit 6246dbe3a970a0f4672701632003cdf576705c24 Author: Jonathan Ellis <jbellis@datastax.com> Date: Wed May 31 16:43:42 2023 -0500 comment
commit c7420440c7d3c81eac96e716148d570ae3cceb1d Author: Jonathan Ellis <jbellis@datastax.com> Date: Wed May 31 16:43:37 2023 -0500 encapsulate shadowedPrimaryKey better
commit a1a25c33b8bec9d78354dd2c1d992444db2bf76f Author: Jonathan Ellis <jbellis@datastax.com> Date: Wed May 31 15:57:52 2023 -0500 make hnsw cache size configurable, and make the default 128KB (roughly the size of the bloom filter for a 1GB sstable)
commit 55747d2351e915645697fa8dec404c749540456f Author: Andrés de la Peña <a.penya.garcia@gmail.com> Date: Wed May 31 19:28:47 2023 +0100 Replace FunctionParameter.sameAsFirst by FunctionParameter.sameAs, allowing type inferrence in both directions for vector functions
commit 5565690fbe1056d4c159ddbe233fa22c7695320a Author: Jonathan Ellis <jbellis@datastax.com> Date: Wed May 31 12:59:42 2023 -0500 cleanup and fix
commit e7733bb8f858a16b082b8a5c64d0322db6f6271a Author: Zhao Yang <zhaoyangsingapore@gmail.com> Date: Wed May 24 16:01:26 2023 +0800 VECTOR-6: return vector results to client in ANN order instead of token order
commit 771067d1475c94484da178236928d2bad78e00b1 Author: Mike Adamson <madamson@datastax.com> Date: Wed May 31 18:33:55 2023 +0100 Fix annOrderingMustHaveLimit test
commit 8dd76a73541e70da0da022dd463e6711a315f971 Author: Mike Adamson <madamson@datastax.com> Date: Wed May 31 18:30:49 2023 +0100 Improve the validation of ORDER BY <column> ANN OF (#639) * Improve the validation of ORDER BY <column> ANN OF * Change to hasNonClusteredOrdering and improve limit message
commit 1ef6f2e3418db19af23f0eae24cd344d72290455 Author: Jonathan Ellis <jbellis@datastax.com> Date: Fri May 26 18:34:25 2023 -0500 add vector similarity functions
commit f4e3a39a8ece623647ba7891981e7c5ee10ed96b Author: Jonathan Ellis <jbellis@datastax.com> Date: Wed May 31 09:39:45 2023 -0500 pull in the smallest set of changes possible from 93e0ae9a to get FunctionParameter.sameAsFirst
commit 0ef2614346c9426a966004df220221f74353de70 Author: Jonathan Ellis <jbellis@gmail.com> Date: Wed May 31 06:22:41 2023 -0700 Add support for updates + deletes (#636)
commit 5da9fefe1e18f733bb8ffd0a35ef2e1ac3cbd975 Author: Jonathan Ellis <jbellis@datastax.com> Date: Tue May 30 16:32:02 2023 -0500 rename SegmentOrdering.reorderOneComponent to limitToTopResults to align with MemtableOrdering
commit caf6eb2154ddf57c88d69c6246404b65e183057d Author: Mike Adamson <madamson@datastax.com> Date: Tue May 30 22:41:29 2023 +0100 Fix V1SearchableIndex.reorderOneComponent to use segments (#635)
commit 72cfa94e9389d0078d64aee60d26b8b01c6ca22f Author: Piotr Kołaczkowski <pkolaczk@datastax.com> Date: Mon May 29 20:17:44 2023 +0200 VECTOR-14: Move ANN OF expressions from WHERE to ORDER BY Before: SELEC…

clohfink reviewed Sep 22, 2020

src/java/org/apache/cassandra/db/commitlog/CommitLog.java (outdated)

leeyutang force-pushed the CASSANDRA-16116-emit-oversized-metric branch 4 times, most recently from 674cb39 to cf1c73f on September 22, 2020 22:33

clohfink reviewed Sep 22, 2020

src/java/org/apache/cassandra/db/Mutation.java (outdated)

CASSANDRA-16116 emit oversized mutation metric

da1309b

leeyutang force-pushed the CASSANDRA-16116-emit-oversized-metric branch from cf1c73f to da1309b on September 22, 2020 22:43

clohfink approved these changes Sep 22, 2020

smiklosovic closed this Mar 16, 2022

adelapena pushed a commit to adelapena/cassandra that referenced this pull request Sep 26, 2023

Remove smile-nlp and add internal Glove implementation for testing (apache#743)

57591cf

adelapena pushed a commit to adelapena/cassandra that referenced this pull request Sep 26, 2023

Remove smile-nlp and add internal Glove implementation for testing (apache#743)

66c00e0

ekaterinadimitrova2 pushed a commit to ekaterinadimitrova2/cassandra that referenced this pull request Jun 3, 2024

Remove smile-nlp and add internal Glove implementation for testing (apache#743)

84308e2

michaelsembwever pushed a commit to thelastpickle/cassandra that referenced this pull request Jan 7, 2026

Remove smile-nlp and add internal Glove implementation for testing (apache#743)

bdcd8e2
