CASSANDRA-16116 emit oversized mutation metric #743
Closed
leeyutang
wants to merge
1
commit into
apache:trunk
from
leeyutang:CASSANDRA-16116-emit-oversized-metric
clohfink
reviewed
Sep 22, 2020
674cb39 to
cf1c73f
Compare
clohfink
reviewed
Sep 22, 2020
cf1c73f to
da1309b
Compare
clohfink
approved these changes
Sep 22, 2020
adelapena
pushed a commit
to adelapena/cassandra
that referenced
this pull request
Sep 26, 2023
adelapena
pushed a commit
to adelapena/cassandra
that referenced
this pull request
Sep 26, 2023
adelapena
pushed a commit
to adelapena/cassandra
that referenced
this pull request
Sep 26, 2023
commit f9e589098e89417cd41ef627b3e1c7371986e3e1
Merge: b0dbc8bd57 d32eed9d68
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Mon Sep 18 11:28:48 2023 -0500
Merge remote-tracking branch 'datastax/vsearch' into cndb-7632-vsearch-rebase-20230915
commit b0dbc8bd57fde3dd473d4aae85a3063f2156106d
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Mon Sep 18 10:12:30 2023 -0500
Update expected string for error message change.
commit f2cafdac9c3e5bf838c10e0078895d56e3271370
Author: qannap <130002578+qannap@users.noreply.github.com>
Date: Thu Sep 14 08:41:07 2023 -0700
Optimize partition-aware queries to use bloom filter (#729)
Add bloom filter field in QueryViewBuilder. Here is the link to the issue: [Vector-72](https://datastax.jira.com/browse/VECTOR-72).
commit 9765b23b5b748286391b8e374bab24a31cb5d934
Author: Jonathan Ellis <jbellis@gmail.com>
Date: Thu Sep 14 09:47:19 2023 -0500
Vsearch 18 (#741): optimize MergePostingList.advance for the case where it is called repeatedly
commit 66c00e094942de72f2f1fdb780ac00239e0aa284
Author: Mike Adamson <madamson@datastax.com>
Date: Thu Sep 14 14:33:41 2023 +0100
Remove smile-nlp and add internal Glove implementation for testing (#743)
commit b13098e0cd5a9f961066c0059953b525f5bfa787
Author: Jakub Żytka <jakub.zytka@gmail.com>
Date: Thu Sep 14 13:28:01 2023 +0200
CNDB-7363: extend read tracking API to notify about post-filtered, reconciliated partitions and rows (#692)
* CNDB-7363: extend read tracking API to notify about post-filtered, reconciliated partitions and rows
* RowFilter::getExpressionsPreOrder will now return all row filter's expressions in pre-order
commit 2bca65cfe4ba864a8a6719e0e28ddf76e83c17e0
Author: Michael Marshall <michael.marshall@datastax.com>
Date: Sun Sep 10 01:01:47 2023 -0500
Fix JsonTest and AnalyzerViewTest errors (#730)
commit a0dde9f89e02ef40d9aa87514eb7b5becfdce8e8
Author: Michael Marshall <michael.marshall@datastax.com>
Date: Fri Sep 8 19:34:09 2023 -0500
Reject SAI creation for invalid combinations of options (#736)
commit abd5399f9c1b6feb0f5ba71b279ff5b76d0f92ee
Author: Jonathan Ellis <jbellis@gmail.com>
Date: Fri Sep 8 11:52:53 2023 -0500
Fix exceptions when executing complex queries (#735)
* fix ReorderingRangeIterator.performSkipTo
* fix NPE
commit acc39a8281cf9163057c00ec9413219b210cc262
Author: Michael Marshall <michael.marshall@datastax.com>
Date: Fri Sep 8 11:04:58 2023 -0500
Analyzer cleanup: add comments and fix docs (#733)
commit 4989dff0f150bb65535fa598e53290bca1f3ce0c
Author: Michael Marshall <michael.marshall@datastax.com>
Date: Thu Sep 7 08:51:09 2023 -0500
Fix RowAwarePrimaryKey#hashCode for deferred keys (#725)
* Fix RowAwarePrimaryKey#hashCode for deferred keys
* Use recommendation from code review
commit 2268c1c806632d62cf1a29b815740207a0e2939c
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Sep 6 13:38:39 2023 -0500
optimize MergePostingList.advance -- we don't care what order we evaluate the merged lists in, since we're going to rebuild the PQ anyway, so use a fast iterator instead of a slow poll() loop
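The idea in this commit message can be illustrated with a minimal sketch. It is not the actual `MergePostingList` code: `SimplePostingList` and `MergedPostings` are hypothetical stand-ins, and the sketch only assumes that the merged lists sit in a `java.util.PriorityQueue` ordered by their current row ID. Because the heap is rebuilt after every sub-list has moved, the sub-lists can be visited in the queue's unspecified iteration order instead of being polled one by one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical sorted list of row IDs with a cursor; not Cassandra's PostingList. */
final class SimplePostingList implements Comparable<SimplePostingList> {
    static final long END = Long.MAX_VALUE;
    private final long[] ids;   // sorted ascending
    private int pos = 0;

    SimplePostingList(long... ids) { this.ids = ids; }

    /** Current row ID, or END when the list is exhausted. */
    long current() { return pos < ids.length ? ids[pos] : END; }

    /** Move the cursor to the first ID >= target and return it. */
    long advance(long target) {
        while (pos < ids.length && ids[pos] < target) pos++;
        return current();
    }

    @Override
    public int compareTo(SimplePostingList o) { return Long.compare(current(), o.current()); }
}

final class MergedPostings {
    private final PriorityQueue<SimplePostingList> queue = new PriorityQueue<>();

    MergedPostings(List<SimplePostingList> lists) { queue.addAll(lists); }

    /** Advance every sub-list to target, then rebuild the heap once. */
    long advance(long target) {
        List<SimplePostingList> live = new ArrayList<>(queue.size());
        // Iteration order over a PriorityQueue is unspecified, which is fine here:
        // every element is advanced and the heap is rebuilt afterwards.
        for (SimplePostingList list : queue) {
            if (list.advance(target) != SimplePostingList.END) live.add(list);
        }
        queue.clear();
        queue.addAll(live);
        return queue.isEmpty() ? SimplePostingList.END : queue.peek().current();
    }
}
```

A `poll()` loop would pay a heap re-ordering for each removed element, while the loop above touches each sub-list once and pays for a single rebuild.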
commit 6bb72c3272fbc8d12427fb0d9f00916b6f6ee9c0
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Sep 6 13:22:07 2023 -0500
PKM is not threadsafe, need to allocate a new one for every request
commit f1d7c66b54f7b95e12698d7a6ed00a1e8ac958d6
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Sep 6 13:10:11 2023 -0500
r/m slightly gratuitous IOException from PKM.Factory signature
commit 73e96df69a1861d5a8bda04ab69252a480a78c62
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Sep 6 11:35:40 2023 -0500
fix updateTestWithPredicate by removing broken assert (did not account for deleted rows)
commit 36327a93b8c1651acd7af1d8b687dd92e08021e3
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Sep 6 11:34:44 2023 -0500
add failing updateTestWithPredicate
commit 80eb443b65637dc1782cb3a745d4d56431f787d8
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Sep 6 11:05:32 2023 -0500
comment upsertTest
commit 3f5fbc72547aa790989d3c7bce8b92fd4bb2fb2c
Author: Marianne Manaog <91789965+marianne-manaog@users.noreply.github.com>
Date: Wed Sep 6 19:42:15 2023 +0100
Feature/Optimize SAI.getPostQueryOrdering as SAI.postQuerySort (#710)
* Optimize sorting in postQuerySort method
* Remove need for prepareFor method following sort optimization in postQuerySort
* Add testOrderResults
* Simplify creation of resultSet in the test testOrderResults and keep rows as final in ResultSet.java
* Apply DSU pattern to SAI.postQuerySort method and leverage it in SelectStatement.orderResults method
* Use Pair.create instead of new Pair
* Put back comment on ANN support only
* Optimise undecorate step by using List<ByteBuffer> instead of ByteBuffer in the Pair
* Add licence to StorageAttachedIndexTest.java
* Move StorageAttachedIndexTest.java up a directory
* Simplify map to create listPairsVectorsScores
commit 23555ab43101cd6010fe9cc320e7429779faa678
Author: qannap <130002578+qannap@users.noreply.github.com>
Date: Fri Sep 1 15:29:22 2023 -0700
Add row count field to columnFamilyStore (#713)
* Add row count field to columnFamilyStore
* Add test for row count field
* delete unused import
* delete unused function and separate test
* Add row count field to columnFamilyStore
* Add test for row count field
* delete unused import
* delete unused function and separate test
* resolve version
* return to updated version
* restore other tests as vsearch branch
---------
Co-authored-by: Michael Marshall <michael.marshall@datastax.com>
commit 1eae389fadecfc6637207933fd6ae1739355dc00
Author: Michael Marshall <michael.marshall@datastax.com>
Date: Thu Aug 31 17:59:53 2023 -0500
Fix NPE in LuceneAnalyzer#end (#721)
When testing the analyzer with an integration test, I discovered an NPE from a part of the code that wasn't tested.
Here is the code:
https://github.com/datastax/cassandra/blob/9cabec504d9f70ddfb345122dc4defed56a40f85/src/java/org/apache/cassandra/index/sai/plan/StorageAttachedIndexSearcher.java#L91-L100
We build the Analyzer to use one of its methods, but not to analyze any text, hence the NPE.
Here is the NPE:
```
ERROR [Native-Transport-Requests-1] 2023-08-31 14:35:38,695 QueryMessage.java:121 - Unexpected error during query
java.lang.NullPointerException: null
    at org.apache.cassandra.index.sai.analyzer.LuceneAnalyzer.end(LuceneAnalyzer.java:110)
    at org.apache.cassandra.index.sai.plan.StorageAttachedIndexSearcher.filterReplicaFilteringProtection(StorageAttachedIndexSearcher.java:99)
    at org.apache.cassandra.service.reads.DataResolver.lambda$preCountFilterForReplicaFilteringProtection$6(DataResolver.java:300)
    at org.apache.cassandra.service.reads.DataResolver.resolveInternal(DataResolver.java:341)
    at org.apache.cassandra.service.reads.DataResolver.resolveWithReadRepair(DataResolver.java:244)
    at org.apache.cassandra.service.reads.DataResolver.resolveWithReplicaFilteringProtection(DataResolver.java:284)
    at org.apache.cassandra.service.reads.DataResolver.resolve(DataResolver.java:130)
    at org.apache.cassandra.service.reads.range.SingleRangeResponse.waitForResponse(SingleRangeResponse.java:59)
    at org.apache.cassandra.service.reads.range.SingleRangeResponse.computeNext(SingleRangeResponse.java:65)
    at org.apache.cassandra.service.reads.range.SingleRangeResponse.computeNext(SingleRangeResponse.java:31)
    at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47)
    at org.apache.cassandra.db.transform.BasePartitions.hasNext(BasePartitions.java:93)
    at org.apache.cassandra.service.reads.range.RangeCommandIterator.computeNext(RangeCommandIterator.java:123)
    at org.apache.cassandra.service.reads.range.RangeCommandIterator.computeNext(RangeCommandIterator.java:45)
    at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47)
    at org.apache.cassandra.db.transform.BasePartitions.hasNext(BasePartitions.java:93)
    at org.apache.cassandra.db.partitions.PartitionIterators$1.hasNext(PartitionIterators.java:154)
    at org.apache.cassandra.db.transform.BasePartitions.hasNext(BasePartitions.java:93)
    at org.apache.cassandra.cql3.statements.SelectStatement.process(SelectStatement.java:892)
    at org.apache.cassandra.cql3.statements.SelectStatement.processResults(SelectStatement.java:518)
    at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:495)
    at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:341)
    at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:102)
    at org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:290)
    at org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:394)
    at org.apache.cassandra.transport.messages.QueryMessage.execute(QueryMessage.java:108)
    at org.apache.cassandra.transport.Message$Request.execute(Message.java:242)
    at org.apache.cassandra.transport.Dispatcher.processRequest(Dispatcher.java:111)
    at org.apache.cassandra.transport.Dispatcher.processRequest(Dispatcher.java:131)
    at org.apache.cassandra.transport.Dispatcher.lambda$dispatch$0(Dispatcher.java:78)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
    at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:165)
    at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:137)
    at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:119)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.base/java.lang.Thread.run(Thread.java:829)
```
I am still new to query execution, but I am guessing the unit tests skip this filtering code. I added an integration test that fails without the fix.
commit ea97cc414ef3e68ad30e8c666ecf04b719352901
Author: Michael Marshall <michael.marshall@datastax.com>
Date: Wed Aug 30 09:10:14 2023 -0500
Fix failing KDTreeIndexSearcherTest (#717)
#680 included a change that makes five of the KDTreeIndexSearcherTest tests fail. This PR essentially restores the old behavior, so the failing tests pass again.
commit cee55b1b2139627b331d9f7225c522ca5bee299b
Author: Michael Marshall <michael.marshall@datastax.com>
Date: Tue Aug 29 21:56:24 2023 -0500
Remove ability to configure unique query_analyzer (#712)
commit c248f905cc13833f9857b9d750a60091afc5c38a
Author: Michael Marshall <michael.marshall@datastax.com>
Date: Tue Aug 29 21:55:18 2023 -0500
Fix failing distributed SAI tests using : operator (#715)
commit 367e84adf0663e8baab3f9195f3eb81412640b8c
Author: Michael Marshall <michael.marshall@datastax.com>
Date: Tue Aug 29 21:55:03 2023 -0500
Test compound predicate queries for : operator (#716)
commit 0ea0d95f52ac18522a0cb5a09c1c9ee00069380a
Author: Michael Marshall <michael.marshall@datastax.com>
Date: Tue Aug 29 11:10:02 2023 -0500
Refactor SAI analyzer configuration; add built in analyzers (#711)
The current `OPTIONS` configuration for the `index_analyzer` does not correctly model the available configuration options. In order to make the configuration more directly match the single tokenizer, and the lists of filters and charFilters, I propose we update the expected JSON schema. Further, we add some default analyzers to simplify the configuration of generic analyzers.
commit 67ef0f3794cbd2b9ff41b632205dd087a992e674
Author: Michael Marshall <michael.marshall@datastax.com>
Date: Mon Aug 28 11:58:30 2023 -0500
Return more specific exception on incorrect usage of : operator (#705)
* When the `:` operator is used for indexes that do not support it, the current error is a recommendation to `ALLOW FILTERING` or an `AssertionError`. Instead, we should match the `LIKE` behavior and reject queries that attempt to use the `:` operator on columns that are not properly indexed. The `LIKE` conditional is slightly modified to simplify the logic for rejecting unsupported queries.
* When the `:` operator is used for `DELETE` and `UPDATE` queries, return a helpful error message.
* Update the `toString` method on the `ANALYZER_MATCHES` enum. The `toString` method is used to generate error messages, and it is confusing to see an error message with `: '<term>'` when it really should just be `:`.
commit c26cbfa2298d422813e02b439f4db6494bb64a84
Author: Jakub Żytka <jakub.zytka@gmail.com>
Date: Mon Aug 28 14:12:15 2023 +0200
Additional testcases for read query tracking with different data models (#698)
* Additional testcases for read query tracking with different data models
commit bf143a2b3fc97979e5e083f8eac7aca017cd618d
Author: Michael Marshall <michael.marshall@datastax.com>
Date: Thu Aug 24 13:28:47 2023 -0500
Ignore default analyzer settings when classifying an SAI as analyzed (#706)
### Problem
When creating an SAI, the `AbstractAnalyzer` creates an analyzer in cases where it should not because the logic is too naive.
### Details
Users can create SAI indexes with a `NonTokenizingAnalyzer` using the following queries where `?` can be replaced with `true` or `false`:
```
CREATE CUSTOM INDEX ON table(some-column) USING 'StorageAttachedIndex' WITH OPTIONS = { 'normalize' : ? }
CREATE CUSTOM INDEX ON table(some-column) USING 'StorageAttachedIndex' WITH OPTIONS = { 'case_sensitive' : ? }
CREATE CUSTOM INDEX ON table(some-column) USING 'StorageAttachedIndex' WITH OPTIONS = { 'ascii' : ? }
```
The options are known as "non tokenizing analyzers" because they "analyze" values but do not "tokenize" them. Practically speaking, these analyzers are projections where each input has one and only one output. Lucene calls projections "filters".
The `StorageAttachedIndex` defaults are `normalize = false`, `case_sensitive = true`, and `ascii = false`.
When a default value is passed for one of the options, the `AbstractAnalyzer` incorrectly classifies the index as "analyzed". Instead, the `AbstractAnalyzer` should return the `NoOpAnalyzer` and the index should not be classified as "analyzed".
### Solution
* Only construct the `NonTokenizingAnalyzer` when `normalize`, `case_sensitive`, or `ascii` are configured with non default values.
* Do not consider an SAI as analyzed if `normalize`, `case_sensitive`, or `ascii` are configured with default values.
* Updated some tests to use `=` on the non-analyzed SAI
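The classification rule described in this commit message can be sketched as follows. This is an illustration under stated assumptions, not the actual `AbstractAnalyzer` code: the class `AnalyzerOptionsSketch` and its method are hypothetical, and only the three option names and their defaults come from the text above.

```java
import java.util.Map;

/** Hypothetical helper; the real decision lives in Cassandra's analyzer classes. */
final class AnalyzerOptionsSketch {
    // Defaults as stated in the commit message:
    // normalize = false, case_sensitive = true, ascii = false.
    private static final Map<String, Boolean> DEFAULTS =
        Map.of("normalize", false, "case_sensitive", true, "ascii", false);

    /** True only if at least one recognized option is set to a non-default value. */
    static boolean isAnalyzed(Map<String, String> options) {
        for (Map.Entry<String, Boolean> e : DEFAULTS.entrySet()) {
            String raw = options.get(e.getKey());
            if (raw != null && Boolean.parseBoolean(raw) != e.getValue()) return true;
        }
        return false;
    }
}
```

With this rule, `{ 'case_sensitive' : true }` leaves the index non-analyzed, while `{ 'case_sensitive' : false }` makes it analyzed.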
commit 7a6c4755f150ad647aae2bc806d8b91c423db14b
Author: Michael Marshall <michael.marshall@datastax.com>
Date: Tue Aug 22 15:57:08 2023 -0500
Add ANALYZER_MATCHES operation; Disallow = on analyzed SAI columns (#702)
* Adds the `:` operator, a.k.a. the ANALYZER_MATCHES operator. This operator only works on analyzed SAI indexes, where "analyzed" means that the values stored in the index are derived from the raw column value. The new operator uses the enum value `100` to prevent future collisions with the upstream CQL implementation.
* Disambiguate the `=` operator so that it cannot perform matches on an `SAI` column whose index is analyzed. The primary motivation for this change is to prevent surprising matches. If a query uses `=` where the target column only has analyzed indexes, the query must include `ALLOW FILTERING`.
* Updated many tests.
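The "surprising matches" motivation can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the analyzer is a simple lower-casing tokenizer rather than a Lucene analyzer, the class `AnalyzedMatchSketch` is hypothetical, and the rule that every query token must be present is a guess at the `:` semantics, which the commit message does not spell out.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Toy model of an analyzed match; not the SAI implementation. */
final class AnalyzedMatchSketch {
    /** Toy analyzer: lower-case, then split on anything that is not a letter or digit. */
    static Set<String> analyze(String raw) {
        Set<String> tokens = new HashSet<>();
        for (String t : raw.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    /** Assumed semantics of `:`: every analyzed query token appears among the column's tokens. */
    static boolean analyzerMatches(String columnValue, String query) {
        return analyze(columnValue).containsAll(analyze(query));
    }
}
```

Under these assumptions, `analyzerMatches("The quick brown fox", "Quick")` is true, whereas plain equality between the two strings is false. Routing analyzed matching through a separate operator keeps `=` meaning exact equality.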
commit 77cb7f265e40600288fe90fffd74ef6e428f87fe
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Aug 18 10:48:54 2023 -0500
comment
commit 2dcdd9fd7464a6e26185bfa4ac3cd44b02574fb1
Author: Michael Marshall <michael.marshall@datastax.com>
Date: Thu Aug 17 16:55:51 2023 -0500
Fail SAI index creation when using analyzer on column in primary key (#699)
* add testStandardAnalyzer
* debugging wip
* Fail index creation when using analyzer on column in primary key
* Remove unnecessary debug logging
* cleanup
* Reject all attempts to add analyzer to index on pk columns
* Rename noop analyzer test and add explanation
---------
Co-authored-by: Jonathan Ellis <jbellis@datastax.com>
When the target column for an SAI is part of the primary key and is analyzed by the LuceneAnalyzer, queries do not return correct results. Until the underlying issue is resolved, do not allow these kinds of indexes to be created.
This change is covered by a test to verify index creation and search works correctly for the `NoopAnalyzer` and another to verify index creation is rejected for the `LuceneAnalyzer`.
commit 4f5585be198a949b0264eb64d0403e2adda9ab52
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Aug 17 17:41:18 2023 -0500
default vector cache size per segment bumped to 4MB
commit 9464bfe3e2bbd958f4f1d60e799019582a9836cb
Author: Marianne Lyne Manaog <marianne.manaog@datastax.com>
Date: Thu Jul 13 20:32:36 2023 +0100
Deserialize ByteBuffer into a query vector once by moving the deserialization up the call stack
commit bc2ef7ff721493b20b1bb76dd2ae37b43d365071
Author: Shaunak Das <ShaunakDas88@users.noreply.github.com>
Date: Wed Aug 16 13:27:12 2023 -0500
VECTOR-79: Simplify population of VectorCache (#695)
* use BFS at higher levels so we can avoid a separate cachedNodes set
* try to naively cache as much of level 0 as possible
* add VectorCacheTest
* cleanup and revert to intended behavior of not caching L0
---------
Co-authored-by: Jonathan Ellis <jbellis@datastax.com>
commit 1c58e431e55d8b327314ddb82b30bf8eced59269
Author: Shaunak Das <ShaunakDas88@users.noreply.github.com>
Date: Tue Aug 15 11:24:30 2023 -0500
Avoiding sorting node ordinals for level 0 (#697)
* avoiding sorting node ordinals for level 0
commit 16eeaf89a8ac39fee616f7a71a3f9a4b4170521e
Author: Jonathan Ellis <jbellis@gmail.com>
Date: Tue Aug 15 08:47:48 2023 -0500
extract OnDiskVectors to top level class so we can use it in debugging tools (#693)
commit 2271a230761a416fbd5c016d47684b388be93989
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Aug 9 09:59:45 2023 -0500
VECTOR-77 fix reading vectors past the first 2GB mmap region
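The commit message does not say what went wrong past 2GB. One common cause of this class of bug is computing a byte offset in `int` arithmetic, which overflows once the offset exceeds `Integer.MAX_VALUE`. The sketch below shows offset arithmetic done in `long` and split into a region index and an in-region offset. `RegionOffsets` and `REGION_SIZE` are hypothetical and not taken from the actual fix.

```java
/** Hypothetical offset arithmetic for a file mapped as several regions. */
final class RegionOffsets {
    /** Assumed region size; a single mapped buffer cannot exceed Integer.MAX_VALUE bytes. */
    static final long REGION_SIZE = 1L << 30;

    /** Absolute byte offset of a float vector; the cast forces long arithmetic. */
    static long byteOffset(int ordinal, int dimension) {
        return (long) ordinal * dimension * Float.BYTES;
    }

    /** Which mapped region holds the given absolute offset. */
    static int regionIndex(long byteOffset) {
        return (int) (byteOffset / REGION_SIZE);
    }

    /** Position of the given absolute offset inside its region. */
    static int offsetInRegion(long byteOffset) {
        return (int) (byteOffset % REGION_SIZE);
    }
}
```

Without the `(long)` cast, `1_000_000 * 1536 * Float.BYTES` wraps around in `int` arithmetic and silently yields a wrong, smaller offset. A real reader would also have to handle a vector that straddles two regions, which this sketch leaves out.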
commit 5705daa97f848454d8d29ab08234296906b3dbc4
Author: Andrés de la Peña <adelapena@users.noreply.github.com>
Date: Thu Aug 10 18:06:28 2023 +0100
VECTOR-76: Add back vector size guardrail (#694)
patch by Andrés de la Peña; reviewed by Brandon Williams and Maxwell Guo for CASSANDRA-18730
commit cb3bedccc255476300528183e8ba30a4e390f57f
Author: Jakub Żytka <jakub.zytka@gmail.com>
Date: Tue Aug 8 16:52:16 2023 +0200
CNDB-7390: fix static rows not being tracked by ReadTrackingTransformation (#691)
commit a8740fb8dae267fb8ddef1a7f5c77e948cd2a0ff
Author: Andrés de la Peña <adelapena@users.noreply.github.com>
Date: Fri Jul 28 15:42:42 2023 +0100
Add system property to reject non-float vectors, true by default (#689)
commit 5a4e439dc73b479d79bd427472d382a53b6fe680
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jul 21 13:10:44 2023 +0200
r/m failing emptyIndexTest until we can address it (VECTOR-59)
commit eec8b7b387cd041fbd54aef8f0090773962a9e49
Author: Piotr Kołaczkowski <pkolaczk@gmail.com>
Date: Tue Jul 18 17:36:43 2023 +0200
VECTOR-69: Fix LWT test failure caused by null key bounds (#686)
* VECTOR-69: Fix LWT test failure caused by null key bounds
---------
Co-authored-by: Jakub Zytka <jakub.zytka@datastax.com>
Co-authored-by: Jonathan Ellis <jbellis@datastax.com>
commit 033ae1a9567dfef88d0214dd34716f2468c5d4e3
Author: Andrés de la Peña <adelapena@users.noreply.github.com>
Date: Tue Jul 18 15:23:30 2023 +0100
VECTOR-68 Vector dimensions are not being validated correctly (#679)
commit e22ff6977f986a4148f931bad4b04645bf93d2e9
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jul 18 06:01:04 2023 +0200
set sai_hnsw_allow_customm_parameters=true for IndexWriterConfigTest
commit ccaba92656e78e63765da5c34f562f2c3b582bd0
Author: Mike Adamson <madamson@datastax.com>
Date: Tue Jul 11 16:12:31 2023 +0100
VECTOR-55: Review REVIEWME comments: (#680)
* VECTOR-55: Review REVIEWME comments:
- Refactor search methods to only return long variant
- Remove SSTableQueryContext in favour of QueryContext
- Add checking to SegmentMetadata.toSegmentRowId method
commit 338c8507de197570f4879019e0e1fc8e1f77025e
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Mon Jul 10 21:46:56 2023 -0500
Fix IndexError when querying a list/set/map column.
commit 1b4d6de7e0a051b95c0a9b62c8bc167e23ec65ed
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sun Jul 9 21:03:51 2023 +0200
add test that fails before decompose fix and passes after
commit 664911ac7de336d3fcffd90083e6a220069bd4e2
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sun Jul 9 13:44:28 2023 +0200
re-use TypeUtil.decomposeVector
commit e17493c95ac247ca1da7caa3ccd58be866958d6a
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sun Jul 9 13:28:56 2023 +0200
fix casting in deserialize
commit d169ff68f6576ce5b150818cb0a52a12abf6f2e0
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sun Jul 9 13:30:02 2023 +0200
fix NPE when using ReadExecutionController.empty
commit aa899f55f77df5b157e3e6afa930c377e4cacc57
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Fri Jul 7 10:12:28 2023 +0800
assert no single partition trace for SAI request
commit 277251e1aad15e2a5aedaaaae3170d574d224fe5
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 29 12:08:05 2023 -0500
- add Tracing.traceSinglePartitions to avoid cluttering range and index queries with noise
- update QueryViewBuilder to include a count of sstables per index accessed
commit 4682a417fdaaefb135e57b0ebe174c1ff49b7e94
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jul 6 12:13:20 2023 -0500
simplify
commit 52d19a1ef5605aebdb87385786dd323057267164
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Wed Aug 30 14:17:12 2023 -0500
Squashed commit of the following:
commit 4c4c1ff802c7667e1289a4e730cc34bf6b3bd007
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jul 5 12:03:22 2023 -0500
add test for creating index after data is added
commit 0acaae364c19ed7556b66b5576762ecc131af5fd
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Wed Jul 5 09:23:01 2023 -0500
Revert "enable segment compaction by default for vector indexes -- we should prioritize read performance"
This reverts commit a4fa0728337ebf236ad03f6ad708cf51c3dc08c5.
commit e3ddb6426de4e89b5fe61ed742b69257865c2d3c
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 28 10:54:49 2023 -0500
VECTOR-64 set sstable_growth to 0.5 by default
commit 3ca75046e6802ce5836a654459b01948c83d1d7b
Merge: a4fa072833 10a2a31ae8
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 28 10:54:18 2023 -0500
merge ds-trunk
commit a4fa0728337ebf236ad03f6ad708cf51c3dc08c5
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 28 10:24:41 2023 -0500
enable segment compaction by default for vector indexes -- we should prioritize read performance
commit 08153b38549d700505d064c25d7a2070c2ff07aa
Merge: 6a063b4f84 1a45fdc5f9
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 28 10:18:38 2023 -0500
Merge branch 'vsearch' of github.com:datastax/cassandra into vsearch
commit 6a063b4f8458f119d4b99bd6b78cbc8832f27f13
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 28 10:13:51 2023 -0500
VECTOR-63 add cassandra.sai.hnsw.allow_custom_parameters defaulting to false to control setting maximum_node_connections and construction_beam_width, which can cause memory usage to explode if used incorrectly
commit 1a45fdc5f92e9c82dba6e9e56a5e341806070b13
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Mon Jun 26 15:57:10 2023 -0500
Add the option to configure the file_cache_size_in_mb in a -D like it is in 6.8-cndb.
commit d2e45dab5b57f20fbdd6250d6dc186c83d783ed3
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 22 16:13:00 2023 -0500
VECTOR-62 don't flush a graph that only contains deleted vectors
commit 4a9afe205c20b1f863e89d4e02138557d225c3fc
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 22 16:11:11 2023 -0500
r/m obsolete comment
commit a925dc22696bcbebcad17a460441e8b84ff97660
Merge: 7c7d160dd9 2b22984fe1
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Thu Jun 22 09:03:08 2023 -0500
Merge remote-tracking branch 'datastax/ds-trunk' into vsearch
commit 7c7d160dd90624b0e365287ef0a2f0b7ef94da53
Merge: a14b387372 7a5e374cea
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 20 21:42:51 2023 -0500
Merge branch 'vsearch' of github.com:datastax/cassandra into vsearch
commit a14b387372ef3f220ece68b019acaafaf0d56f42
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 20 17:52:13 2023 -0500
move new options to jvm17-server.options
commit dc67582585a90628e026a693497793b7d8a298ed
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 20 17:47:27 2023 -0500
copy jvm11 options to jvm17
commit b6717410f925486a858c4c9414e417eed50950d7
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 20 17:30:48 2023 -0500
enable simd
commit 92b9e6e2f136254fb1f8e50732809ea529617f48
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 19 15:31:09 2023 -0500
make it run under jdk 20
commit 3e985953637819ac8504955552ce0feba29bdd40
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 20 17:13:56 2023 -0500
update to lucene with simd
commit 7a5e374cea785678670444121ad36be6170dd59b
Merge: 5e5ae4de82 e9ddd5f0d9
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Tue Jun 20 15:00:24 2023 -0500
Merge remote-tracking branch 'datastax/ds-trunk' into vsearch
commit 5e5ae4de824a7e838bbdfdbabc92e8b256911e3f
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 19 10:57:18 2023 -0500
we weren't calculating length correctly (supposed to match ordinals range) but it didn't matter b/c length() is not called on the search path where these classes are used. make this explicit by throwing in length implementation.
commit c0d4055a0bc6914d0220fbd4912d6355e9f6a1d9
Merge: f2c9a7cb59 c605f8c9f0
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 19 10:20:02 2023 -0500
Merge branch 'vsearch' of github.com:datastax/cassandra into vsearch
commit f2c9a7cb59b113a349992011b739079106274c1e
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 19 10:19:34 2023 -0500
VECTOR-61 remove deletedOrdinals collection that was not threadsafe wrt add and remove operations; create postingsByOrdinal map that allows Bits operations to look up the postings, instead
commit 9744a13b405cbeb1b68894bb08cc24796aedd8f7
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 19 09:13:24 2023 -0500
add LVT.testMultiplePostings (works fine)
commit c605f8c9f037e724544b21fe2aadcf0d8503d86f
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Sat Jun 17 12:44:36 2023 +0800
skip replica-filtering-protection for ANN requests
commit 71935bd07c9218ba70f1ffde7b6db1e88009e529
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 16 20:20:06 2023 -0500
forgot that maxBruteForceRows has to stay non-final for Byteman
commit b264afc07727305ce88ab2f18b8a2643d2510bd3
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 16 20:10:19 2023 -0500
VECTOR-54
- validateIndexable also checks for very large vectors (that explode to NaN when we take the cosine)
- check query vectors with the same criteria as vectors to index
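Why very large components "explode to NaN" can be shown with a small sketch. `VectorChecks` is a hypothetical class, not the actual `validateIndexable` code, and the zero-magnitude check is an extra assumption. Squaring a large `float` overflows to infinity, and the cosine then evaluates infinity divided by infinity, which is NaN.

```java
/** Hypothetical validation sketch; not Cassandra's validateIndexable. */
final class VectorChecks {
    /** Cosine similarity accumulated in float, as a naive implementation might do. */
    static float cosine(float[] a, float[] b) {
        float dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return (float) (dot / Math.sqrt((double) normA * normB));
    }

    /** Reject vectors whose squared magnitude is zero, infinite, or NaN. */
    static boolean isIndexable(float[] v) {
        float sumSquares = 0;
        for (float x : v) sumSquares += x * x;
        return sumSquares > 0 && Float.isFinite(sumSquares);
    }
}
```

Applying the same predicate to query vectors as to indexed vectors, as the second bullet above describes, keeps a NaN score from entering the search in either direction.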
commit 66c31a6dcfcbf81b7657ac9c4119803e026d150a
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 16 20:07:41 2023 -0500
if CFS.apply catches an InvalidRequestException, keep the same type when we re-throw so it gets back to the client as intended
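A minimal sketch of the rethrow pattern described here, with hypothetical names throughout: `InvalidRequest` stands in for Cassandra's `InvalidRequestException`, and `apply` stands in for the write path. A more specific `catch` clause rethrows the request error unchanged, so only unexpected failures get wrapped.

```java
/** Hypothetical write path; names are stand-ins, not Cassandra classes. */
final class RethrowSketch {
    /** Stand-in for a client-facing validation error. */
    static final class InvalidRequest extends RuntimeException {
        InvalidRequest(String message) { super(message); }
    }

    static void apply(Runnable write) {
        try {
            write.run();
        } catch (InvalidRequest e) {
            // Keep the original type so the client receives a request error.
            throw e;
        } catch (RuntimeException e) {
            // Anything else is an internal failure and is wrapped.
            throw new RuntimeException("write failed", e);
        }
    }
}
```

If the first `catch` clause were missing, a rejected vector would surface to the client as a generic server error instead of an invalid request.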
commit e0bdfdbe421a1cca30746604cda9ef5b992002bb
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 16 17:33:37 2023 -0500
add [failing] emptyIndexTest
commit 75cba96f90730b32ea4a39375c72aaff4ffd6221
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 15 16:52:46 2023 -0500
re-use bitsets across searches
commit 2af2ce8a321f7fea268d81c393da7f8f6d886776
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 16 17:05:52 2023 -0500
more asserts that graph is in sane state when we write it
commit a63492686d13e12b799acdf50c6f0c20dd680f21
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 16 17:13:23 2023 -0500
update lucene
commit 4c6b43e3606b8c9eb5e4ec4cf0817dc9b061c555
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 16 16:24:33 2023 -0500
add debugging information when node is not found on level
commit 81c8e730bd4ca314a81bc13e30b197603a8eb005
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 16 11:57:24 2023 -0500
add similarityWithAnn test
commit 87c8da9a5dddd10215a84c07a09dbaaecaa522b8
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 14 18:26:02 2023 -0500
add tests to confirm that zero-length vectors are rejected
commit 2510c1b987d846821e7095fbc2c368fa51d9a15d
Merge: a3d9e33554 e86f91c568
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 12 17:09:22 2023 -0500
Merge branch 'vsearch' of github.com:datastax/cassandra into vsearch
commit a3d9e3355470f6400328cd7ac1eb07ff06f8e00b
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 12 17:08:43 2023 -0500
use ArrayList in CVV instead of HashMap since we know the keys are consecutive
commit e86f91c568f8a82e83e61bd0dd768478a7584cfc
Merge: 77c1bc11f3 8a7a6d9c4f
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Mon Jun 12 09:25:06 2023 -0500
Merge remote-tracking branch 'datastax/ds-trunk' into vsearch
commit 77c1bc11f31e4ff001c7eff1e6fe50812e136a45
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Mon Jun 12 09:24:56 2023 -0500
Revert "Revert this after CNDB-6974 is fixed."
This reverts commit 8e04280312583a18085e7e7b9d31810790b039f4.
commit cae30e9879c463d80de2a2b7e4a642979803c13a
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Mon Jun 12 09:23:39 2023 -0500
Revert "Revert this after CNDB-6974 is fixed."
This reverts commit 67a2b7eba3c957a993726d37706643979e92a3a3.
commit 46d1a45de84f40311f193beba3ca5fc3fe7450d7
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 12 09:03:03 2023 -0500
replace our TODO entries with VSTODO to make them easier to find
commit 415b3f29e8e7f1526e28655308a1937a1175e575
Merge: 2fd8b848be 481d29721a
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 12 09:02:16 2023 -0500
Merge branch 'vsearch' of github.com:datastax/cassandra into vsearch
commit 481d29721a7993e9babd44f792be55aca925d41a
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Mon Jun 12 21:26:43 2023 +0800
Fix VectorMemtableIndex to handle max token and fix Segment#intersects (#670)
* Fix VectorMemtableIndex to handle max token and min/max bound
* Fix Segment#intersects to compare bound instead of token and add tests for range search
* make brute force rows per query for VectorMemtableIndex
* apply feedback on Segment#intersects
* add comments to VectorMemtableIndex#search
* Fix SegmentTest
commit 2fd8b848be82e62999e64b57cb9800d03de8e953
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 12 07:51:18 2023 -0500
clean up a couple REVIEWME
commit 3c066e07be6188c3f48c51c02cfc99b395e2a6bc
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Sun Jun 11 14:26:42 2023 +0800
fix flaky VectorDistributedTest
commit 6c719a752e90e031916dddf6cc7b7db23627bc38
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 10 10:48:01 2023 -0500
fix use of bruteForceRows -- should be larger of limit,maxBruteForceRows
commit 0a2279d7f010bcb68053bb14ea2974295b21d1c4
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 10 10:46:34 2023 -0500
use maxBruteForceRows when deciding whether to skip ANN
commit 1859f7355231ce7972e3d4b7af70e5d2967da516
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 10 10:27:03 2023 -0500
simplify partitionKeySearchTest using euclidean distance
commit 4f40e1c2d9b1c30675eeaee7d800a8113cce97c0
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 10 10:09:24 2023 -0500
typo
commit 7dbe85ac9a248aa6a0985839df1f40ceb2c08e6c
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 10 10:08:23 2023 -0500
rename methods that return Bits but had bitset in their names
commit 2861bfdba8f2638a1e3f9e81df9cad3878d0e544
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 10 10:06:51 2023 -0500
simplify skipANN
commit fbfe4bcdd03a66f4dedfa91ce9d06206f89827cd
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Sat Jun 10 16:36:56 2023 +0800
optimize VectorIndexSearcher#searchPosting
- return empty posting if key range is not found in current sstable
- return empty posting if all row ids are shadowed
- skip ANN if matching row ids are less than limit
commit 168911b70888a3dda0ddf4f615af1b7ff54f43e3
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Sat Jun 10 20:54:41 2023 +0800
fix data visibility issue during flush: make sure SSTableAddedNotification is sent before MemtableDiscardedNotification
commit c5dd47e3ce1c7276703fa3f89a9932a3b2cd43b7
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Sat Jun 10 12:31:53 2023 +0800
fix primaryKeySearchTest and partitionKeySearchTest to use correct selector and expected results
commit 2e636529c63d0d726868d6df9b3e11d15d3871d8
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Sun Jun 11 08:31:39 2023 +0800
Vector-48: index#update is not triggered by partition/range deletion,… (#665)
* Vector-48: index#update is not triggered by partition/range deletion, we have to fix VectorMemtableIndex to include shadowed primary keys
- during flush, append ordinals that are removed by partition/range deletion into deletedOrdinals set
* add comments to VP
* revert redundant variables
* simplify
---------
Co-authored-by: Jonathan Ellis <jbellis@datastax.com>
commit daa2623b880e4b82d99204fe718db22e52ae3b42
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 10 13:54:38 2023 -0500
use mmapped builder in OnDiskHnswGraphTest
commit 553778a22cfa8757daf3b3ec0e2cfa6fc7ba29cf
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 10 08:34:41 2023 -0500
fix logic in partitionKeySearchTest, test still fails
commit a7f5a2f824085691da80ef62e37a430e5637c31c
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 17:57:34 2023 -0500
ignore invalid vectors during build against existing data, instead of failing the build
commit 2af203646a2e57fca0a2521c31ba4269b0d8f326
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 17:43:47 2023 -0500
switch from ignoring zero vectors to throwing IRE
commit 4c98fbb447a275cbf557ae0adf8f4f8a30fba17f
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 17:15:57 2023 -0500
Revert "if there are no rows in the range to search, return EmptyPostingsList instead of giving the hnsw search an empty Bits (which is a pathological case for the search)"
This reverts commit 6e5eab787a3add985dc30eca82c5dee2646f5564.
commit da99598a5d11dfa4ecac3b1eeeb9d19a4bad21e5
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 17:09:30 2023 -0500
don't attempt to add zero vector to cosine indexes
commit a77792447d33600f69dd0a75a280a2b6dac51389
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 16:43:34 2023 -0500
add failing partitionKeySearchTest
commit b061dd1c77f30c67324da4f94ea8b5a22d50fba7
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 10:52:08 2023 -0500
inline the test ops
commit 6e5eab787a3add985dc30eca82c5dee2646f5564
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 10:51:09 2023 -0500
if there are no rows in the range to search, return EmptyPostingsList instead of giving the hnsw search an empty Bits (which is a pathological case for the search)
commit 518c055dfeabacf0fa0863e40bf814058f2ad981
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 10:28:57 2023 -0500
cleanup
commit e922b03ffe5436dc1341dbe20ee7aaca9901058f
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 10:27:53 2023 -0500
failing tests for primary key search
commit 428e4b713e25a5820d02ca790c30e9f1477c0f44
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 10:21:59 2023 -0500
cleanup
commit 5319bcdf3b694e861509b2a9f549f39bf51cd253
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 10:20:02 2023 -0500
move testInvalidColumnNameWithAnn to VectorInvalidQueryTest
commit d8b64845e59c0bd3658a16df21dcc75564417e3b
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 10:12:40 2023 -0500
upgrade lucene to reduce Integer boxing on build path
commit a62356e49c0a32f61732b8525d2bb655f46dd767
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 10:11:55 2023 -0500
cleanup
commit b754eb7f021d10d0ae8de83ebdf2ce2f19ce59d8
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 08:29:34 2023 -0500
replace CHM with NBHMLong to avoid boxing
commit 53d232ad2bcddefd72b10542f703cf36728c05f5
Author: Piotr Kołaczkowski <pkolaczk@datastax.com>
Date: Fri Jun 9 12:17:46 2023 +0200
STAR-550 Handle SAI AbortedOperationException
AbortedOperationException is thrown by SAI when index search
hits a timeout. Now instead of allowing this exception to
bubble up to the top and be eventually logged as error, we catch
and swallow it in the InboundSink after creating the query response.
Additionally, we now also set a proper error code (TIMEOUT)
in the response, so the client has a hint on what happened.
commit 5e0c5eb3c54b58002162ce3e39d0891643e1f663
Author: Matt Fleming <matt@codeblueprint.co.uk>
Date: Fri Jun 9 08:44:10 2023 +0100
CNDB-7037: Check that memtable exists before flushing and avoid IOOBE (#664)
For offline services such as the compactor it's possible to not have
live memtables. Account for this during flushing after removing indexes to
avoid triggering an IndexOutOfBoundsException:
Error happened while updating the schema
java.lang.IndexOutOfBoundsException: Index: -1
at java.base/java.util.Collections$EmptyList.get(Collections.java:4483)
at org.apache.cassandra.db.lifecycle.View.getCurrentMemtable(View.java:106)
at org.apache.cassandra.db.ColumnFamilyStore.forceFlush(ColumnFamilyStore.java:1115)
at org.apache.cassandra.db.SystemKeyspace.forceBlockingFlush(SystemKeyspace.java:552)
at org.apache.cassandra.db.SystemKeyspace.setIndexRemoved(SystemKeyspace.java:576)
at org.apache.cassandra.index.SecondaryIndexManager.markIndexRemoved(SecondaryIndexManager.java:837)
at org.apache.cassandra.index.SecondaryIndexManager.removeIndex(SecondaryIndexManager.java:420)
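The stack trace above is consistent with reading the last element of an empty list of live memtables. A minimal sketch of the kind of guard the commit message describes is shown here; `MemtableGuard`, `currentMemtable`, and `flushIfPresent` are hypothetical names for illustration and are not the actual `View` or `ColumnFamilyStore` code.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Consumer;

// Hypothetical sketch; not Cassandra's actual View or ColumnFamilyStore code.
final class MemtableGuard
{
    // Returns the most recent live memtable, or empty when there are none
    // (as can happen in an offline service such as the compactor).
    static <T> Optional<T> currentMemtable(List<T> liveMemtables)
    {
        if (liveMemtables.isEmpty())
            return Optional.empty();
        return Optional.of(liveMemtables.get(liveMemtables.size() - 1));
    }

    // Runs the flush action only when a memtable exists; reports whether it ran.
    static <T> boolean flushIfPresent(List<T> liveMemtables, Consumer<T> flush)
    {
        Optional<T> current = currentMemtable(liveMemtables);
        current.ifPresent(flush);
        return current.isPresent();
    }
}
```

With an empty list, `flushIfPresent` returns `false` without calling the flush action, so no index lookup on the empty list is attempted.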
commit c241f7d41ee46c28f60d691611f32e071dac6684
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 8 19:19:09 2023 -0500
read neighbors using .intBuffer since we know that the HNSW searcher always reads all the neighbors
commit 32f021a4f0623e57d82d70c54350eab6f0f57db7
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 8 18:08:30 2023 -0500
VECTOR-51 leave vectors as ByteBuffers during compaction to avoid 2x memory usage incurred by keeping them around as float[] as well
commit 32d8cbbd19301228530d3d84f7a7e4120b4d4422
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 8 17:37:08 2023 -0500
r/m node cache from query metrics since there is nothing the operator can act on there
commit 890df8e6c36c5c9e1314dbcc0a681415401120dd
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 8 14:15:25 2023 -0500
add more information to exception when reading row offsets goes wrong
commit 5c7fe0cfed1edb10f81b134f8949dd1b5abb2781
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 8 13:26:17 2023 -0500
revise ordinals cache as follows:
// cache full levels including neighbors up to neighborsRamBudget, starting with the top level,
// but always cache all levels above the bottom two levels -- this will be ~1% of the graph.
// then on L1, cache at least the offsets
// L0 we do not cache since we only need one extra seek (no bsearch) to read the neighbors offset
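One possible reading of the caching policy in the comment above can be sketched as follows. The class `LevelCachePlan`, the `Mode` enum, and the per-level byte sizes are assumptions made for illustration; the real ordinals cache may account for memory differently.

```java
// Hypothetical sketch of the per-level caching decision; not the actual implementation.
final class LevelCachePlan
{
    enum Mode { FULL, OFFSETS_ONLY, NONE }

    // levelBytes[i] is the assumed size of level i's neighbor lists; level 0 is the bottom.
    static Mode[] plan(long[] levelBytes, long neighborsRamBudget)
    {
        Mode[] modes = new Mode[levelBytes.length];
        long used = 0;
        // Walk from the top level down, as the commit comment describes.
        for (int level = levelBytes.length - 1; level >= 0; level--)
        {
            if (level >= 2)
            {
                // Levels above the bottom two are always cached in full.
                modes[level] = Mode.FULL;
                used += levelBytes[level];
            }
            else if (level == 1)
            {
                // L1 is cached in full only if the budget allows; otherwise offsets only.
                if (used + levelBytes[level] <= neighborsRamBudget)
                {
                    modes[level] = Mode.FULL;
                    used += levelBytes[level];
                }
                else
                {
                    modes[level] = Mode.OFFSETS_ONLY;
                }
            }
            else
            {
                // L0 is not cached: one extra seek finds the neighbors offset.
                modes[level] = Mode.NONE;
            }
        }
        return modes;
    }
}
```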
commit e131014c5cba59624e0ed9fac142abe5340e2fc0
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 8 13:53:10 2023 -0500
add deletes test
commit 7c5a880f007535d9f03b7c7d9d3e84dcc79ef3d7
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 8 13:34:42 2023 -0500
must use graph.size for bitset size -- not correct to use max rowid, since deleted ordinals will not have a rowid but will still be in the graph
commit 0654fbb61aec86deae69edf83fb1cf653ed7df65
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 8 08:44:28 2023 -0500
info -> debug for validatePerIndexComponents
commit b289db2362b928f29b6c5c6289e09b33f0be1e88
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 7 18:08:18 2023 -0500
update lucene
commit 347dae644b0d395014d8faf2affd48a6b2546e96
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 7 18:23:41 2023 -0500
undo burntest logging config change
commit 87a6a84f38d0a0423acd9a0dda67d386876681b5
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 7 15:20:06 2023 -0500
add debug logging
commit 20013bc865c7fb2c34111c98362aaf997dc724dc
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 7 14:50:39 2023 -0500
add a bit more information to exception when we fail in index construction from disk
commit 86389ceae8188b8e427d8a528fada2f913f1f633
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 7 12:25:31 2023 -0500
reduce test vector count from "all of them" to 200
commit 20177a1c84ba34babc82d8c1aa4e3436321b5d9f
Merge: 2e2fce09ee f2c697ac46
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 7 12:21:20 2023 -0500
Merge branch 'wip' into vsearch
commit f2c697ac46348301260c5947ade4ebed7dee90ee
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 7 12:16:28 2023 -0500
multipleSegmentsMultiplePostingsTest
commit 59e4a1c53e1c19a7ebba4a0834229e61f9e4d3ce
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 7 12:16:04 2023 -0500
cleanup
commit 2e2fce09eeb5bef4780cecf7c7b5e492a810f67a
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Wed Jun 7 23:33:00 2023 +0800
fix OnDiskOrdinalsMap to seek to segment offset before reading (#661)
commit 9502f1040229f2be6619f7e4497dc25ca5126b39
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 7 10:04:38 2023 -0500
fix build
commit 3535444b57366fe705e0fd8c1ca095e60ed2a706
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 7 09:19:41 2023 -0500
add write-only workload
commit a112b0965cb298f293b6e384bafc92a1d5a1fb8f
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Wed Jun 7 21:15:53 2023 +0800
VECTOR-44: improve in-memory partition-restricted query perf (#660)
* VECTOR-44: improve in-memory partition-restricted query perf
- using post-filter top-k processor instead of ANN: 14x improvement on partition-restricted query in LongVectorTest
commit cf1ec3be50ddbe40ae99dfc34eea51b9fc5c2648
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Wed Jun 7 21:06:37 2023 +0800
Vector-45: fix bitsetForShadowedPrimaryKeys to skip shadowed primary keys outside of current sstable/segment (#659)
* VECTOR-45: fix bitsetForShadowedPrimaryKeys to skip shadowed primary keys outside of current sstable/segment
* return empty iterator if results are empty instead of ReorderingRangeIterator
commit 87217b4f49935fbbd13c73f4d0aadb84836696a7
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 7 07:31:46 2023 -0500
don't make areL0ShardsEnabled final, it breaks mocks
commit 2bb3a299df43561369544b662674872575975b7c
Author: Matt Fleming <matt@codeblueprint.co.uk>
Date: Wed Jun 7 10:56:08 2023 +0100
VECTOR-42: Check columns exist to avoid NPE during ANN expr binding (#657)
Using a non-existent column with an ANN expression triggers an NPE like
so,
java.lang.NullPointerException: null
at org.apache.cassandra.cql3.ArrayLiteral.forReceiver(ArrayLiteral.java:43)
at org.apache.cassandra.cql3.ArrayLiteral.prepare(ArrayLiteral.java:54)
at org.apache.cassandra.cql3.Ordering$Raw$Ann.bind(Ordering.java:178)
at org.apache.cassandra.cql3.Ordering$Raw.bind(Ordering.java:139)
Use TM.getExistingColumn() which throws InvalidRequestException if the
column is undefined.
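The general idea of failing fast on an undefined column, rather than passing a null along until it is dereferenced, can be sketched as below. `ColumnLookup.requireColumn` is a hypothetical helper, and `IllegalArgumentException` stands in for Cassandra's `InvalidRequestException`; this is not the actual `getExistingColumn()` implementation.

```java
import java.util.Map;

// Hypothetical sketch; IllegalArgumentException stands in for InvalidRequestException.
final class ColumnLookup
{
    // Returns the column definition, or throws a descriptive error if it is undefined.
    static <C> C requireColumn(Map<String, C> columns, String name)
    {
        C column = columns.get(name);
        if (column == null)
            throw new IllegalArgumentException("Undefined column name " + name);
        return column;
    }
}
```

The caller then sees a request-level error that names the missing column instead of a `NullPointerException` deeper in the binding code.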
commit f29d4528fc43f34889e560acafa810eee1b88ba9
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 21:18:47 2023 -0500
restore query timeouts
commit 6e5734e52b41ab1e355d202ad0433440828fbb75
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 19:08:52 2023 -0500
looking at performance over time in LongVectorTest
commit f96af8f0b9185163a49156bdab107801d2588bf0
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 16:00:23 2023 -0500
log level = info for burn tests
commit 1247cf372927cc185be5c9767a06fdd15f102dbe
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 17:48:00 2023 -0500
write the in-memory deleted ordinals to disk at the start of the postings component
commit e319d71bd6b0d265f616c807dcaf4a9d9fc38aef
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 16:36:27 2023 -0500
don't allocate unnecessary objects on the happy path of no tombstones
commit 6d146a8460a17d146de756aab2ccbbf2abdb967a
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 16:19:56 2023 -0500
VECTOR-23: force UCS to not shard L0
commit 79d6e177093312bc1b951d6740ad45ce8a6c5875
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 16:01:05 2023 -0500
mark AutoCloseable
commit 5ec2f7d7f792bfa2dcf582ea4eed3fa8745fdaff
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 5 14:37:15 2023 -0500
Support string literals as vectors
Co-authored-by: Andrés de la Peña <a.penya.garcia@gmail.com>
commit 73a53c37536835e462e3cf17c452027c7aa591ed
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 12:25:39 2023 -0500
(take 3, this time with the code changes) fix race conditions across concurrent inserts + searches in memtable
commit 51f4f419d31916a81088f171b158f1e002b5c800
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 12:25:39 2023 -0500
(take 2) fix race conditions across concurrent inserts + searches in memtable
commit 94097a1f7cf1831c03e52b1f539bdc2e42fa038b
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 12:35:32 2023 -0500
Revert "fix race conditions across concurrent inserts + searches in memtable"
This reverts commit c4d41b492190fb2644f83bf4902acf71d1e4f891.
commit c4d41b492190fb2644f83bf4902acf71d1e4f891
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 12:25:39 2023 -0500
fix race conditions across concurrent inserts + searches in memtable
commit 22143a8c5db160ef0fa7687ff3e8cb227de160f0
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 10:28:30 2023 -0500
fix AOOB in bruteForceRows logic
commit b5624f62c88378e03260399440edf4d335ccc3af
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 10:12:53 2023 -0500
call deserializeFloatArray (which gives float[]) instead of deserialize (which gives List<Float>)
commit 13ad40718737b54652c875d81ec20992b5828777
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 08:36:32 2023 -0500
create CassandraHnswGraphBuilder with concurrent and serial implementations, so that single-threaded compaction doesn't have to pay the concurrency overhead
commit b83f4e4007e3d597963023b8ae32f4d0934b792d
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 08:00:44 2023 -0500
add ASL header to injections.md to make CI happy
commit 16074336b1322d4bade40eb71562ca50221399dc
Merge: 227c6c13a3 1ff6992354
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 07:48:04 2023 -0500
Merge branch 'VECTOR-37' into vsearch
commit 227c6c13a3cfe42484fc560d629d54a93f8265e0
Author: Jaroslaw Grabowski <jaroslaw.grabowski@datastax.com>
Date: Tue Jun 6 13:02:37 2023 +0200
CNDB-7007 return expired tables level from getLevels
commit 312d07de8d297cdede39017354b678f2fb1b1006
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 07:31:26 2023 -0500
randomize our brute force threshold, which will get the actual index scans exercised more
commit 0b897961869965629893c94250b7d4b1fb7c0f86
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Tue Jun 6 13:44:01 2023 +0800
Reference ANN sstable indexes in case of ann hybrid search (#653)
* Reference ANN sstable indexes in case of ann hybrid search
* simplify
---------
Co-authored-by: Jonathan Ellis <jbellis@datastax.com>
commit 1ff6992354e8411f9f7c48a76f0aadb4a00f2789
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 5 21:55:14 2023 -0500
per-query hnsw metrics
commit d3d6ae107737020eb0ca3b00c117d5bc84027dc0
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 5 21:16:14 2023 -0500
reduce test size to prevent Jenkins OOM
commit 7765ac73fd29c9664b6f8e946c067d6f1b0a9a03
Merge: 5a04dbcf67 c5cd09cebf
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 5 21:15:44 2023 -0500
Merge branch 'vsearch' of github.com:datastax/cassandra into vsearch
commit c5cd09cebf0c63f7bafcd2f4548395efe68b12cd
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Tue Jun 6 09:38:42 2023 +0800
Skip warning log about receiving a range that is not owned by the current replica, see VECTOR-30
commit 5a04dbcf6738d7681732e28518d6eb37f74357b8
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 5 11:21:15 2023 -0500
add injections.md from bdp repo
commit baf9cce301a39f9506bd60c5a75d2011150f2fea
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 5 10:02:02 2023 -0500
VECTOR-36 fix looking up indexes and SSTableContext by SSTableReader, because the SSTR will be removed from internal structures once it's compacted
commit e7fdc8454e750e2ee886c4ab27fb09abb46303fa
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 5 09:53:51 2023 -0500
limitToTopResults should skip rows that are not in the current segment (or more precisely, have no vectors that were indexed)
commit 2f64695a90651d9e76eee0a3399032ea241c983c
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Mon Jun 5 14:55:20 2023 +0800
Vector 6 take 2: reapply Vector-6 commit and fix NPE in SelectStatement (#648)
* Revert "Revert VECTOR-6"
This reverts commit 63a6ff609a02e4f7d9f5499f1af48ca5e58f63a8.
* Vector-6 take 2:
- fix NPE in SelectStatement by skipping reversed() for null ColumnComparator
- fix getOrderingColumns() to return LinkedHashMap to preserve ordering columns' order
- fix VectorLocalTest compilation
- remove debug log in CassandraOnHeapHnsw
commit 9da51b315d207417de4f3e005616e275c62c74cb
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Mon Jun 5 12:38:21 2023 +0800
fix SAI test failures in vsearch branch (#649)
- BatchlogEndpointFilterTest
- rangeRestrictedTest
- SegmentFlushTest
- SegmentMergerTest
- OperationTest
- RangeIntersectionIteratorTest
commit dc53d7db5f0f8fe1c780f6d2b7580794f7b6b5fb
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 3 13:35:11 2023 -0500
create views for ordinals map so we don't have to open a new Reader for each method call
commit c6385022ff2949c23e0dd0161a043cf4061f050c
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 3 08:19:32 2023 -0500
add assert sstableContext != null
commit f340e7590db578b71201326e82006f72dd491047
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 3 07:37:36 2023 -0500
on-disk Searcher bits should not need to be growable
commit fd5cc7f989a47d03689dd29495b974697ad1338b
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 3 07:37:18 2023 -0500
add asserts
commit 6436e5f4ad49881a62c1442692edfedfe31968dc
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 3 07:21:50 2023 -0500
cleanup
commit 32c8e005740d3c7ec0b1008091728e12260aecdd
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 2 13:57:18 2023 -0500
comment
commit 091d96832ff9a68f6ea7857eafd2ec7ee0c09a0b
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 2 13:32:24 2023 -0500
r/m unused search method returning iterators over PrimaryKey
commit 4e78a4b429b0ba6690871df9a91c8ffd8d2e53fa
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 2 11:25:30 2023 -0500
we can use the slightly more lightweight RangeConcatIterator instead of RangeUnionIterator when combining results from different Segments
commit e7a9186bf3426ad025e809481e47db85a7bf7190
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 2 11:22:35 2023 -0500
r/m obsolete FIXME
commit c76ee9ea0cab5b70c535de3f50b81cf333bf94a2
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 3 07:12:43 2023 -0500
add testAppendedGraphs
commit d00bdbfb6acf9f08d2bdfa2ef06e833e567cf9eb
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 3 06:57:37 2023 -0500
move write-to-File to test class
commit b10fd1100daf4cde0291b3b4ef06f5ceb3ca650c
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 3 06:46:01 2023 -0500
fix Ref leaks in test
commit 0656e558bb51172df4e0cde1ebe92946d4dcec8b
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 3 06:26:05 2023 -0500
fix tests to use View
commit 92797ced03ed26f9a5cce398504d6f0de519b899
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Sat Jun 3 19:19:38 2023 +0800
VECTOR-35: fix vector on-disk writer to append segments (#647)
commit 22f5c01860367586d0cc39def7d975a6d6bae784
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 2 18:54:41 2023 -0500
add "Vector indexes only support ANN queries" check
commit bfc0b056dbacd0a1eca86020e901f545dca7670e
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 2 17:34:24 2023 -0500
split invalid requests into separate test class
commit f66768c35920101382c1975c9d0ba0e3301b7c61
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 2 17:28:56 2023 -0500
add specific error message when trying to do ANN without an index
commit 63a6ff609a02e4f7d9f5499f1af48ca5e58f63a8
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 2 12:31:22 2023 -0500
Revert VECTOR-6
This reverts commits:
a37009c187edeba68389d239dc1b9f40519b1187
5565690fbe1056d4c159ddbe233fa22c7695320a
e7733bb8f858a16b082b8a5c64d0322db6f6271a
commit 3d496662b259470b505d88212074c436660c27ad
Merge: fab67bb134 4bdae7e362
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 2 11:05:53 2023 -0500
Merge branch 'vsearch' of github.com:datastax/cassandra into vsearch
commit 4bdae7e36213f5112efce085696dd24aaa0adfad
Author: Mike Adamson <madamson@datastax.com>
Date: Fri Jun 2 14:36:01 2023 +0100
Stabilise random tests using word2vec model vectors
commit 2ddc3ec6b11f6810f6c4cbaf6b62bfe46d2d1b6c
Merge: 8e04280312 1f9179002a
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Fri Jun 2 10:05:15 2023 -0500
Merge remote-tracking branch 'datastax/ds-trunk' into vsearch
commit 8e04280312583a18085e7e7b9d31810790b039f4
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Fri Jun 2 09:51:06 2023 -0500
Revert this after CNDB-6974 is fixed.
commit fab67bb13499691f813c893cd39a3bbd406653d2
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 2 08:59:12 2023 -0500
vector cache defaulting to 1MB per segment
commit 67a2b7eba3c957a993726d37706643979e92a3a3
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Thu Jun 1 17:49:35 2023 -0500
Revert this after CNDB-6974 is fixed.
commit 4538956b31d8c40d2e9b603b3b0a3392343e1853
Merge: 7518680a4a af4c83aef4
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 1 16:17:14 2023 -0500
Merge branch 'ds-trunk' into vsearch
commit 7518680a4a17cb2589cf06a9175befc10c9eab1a
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 1 16:14:33 2023 -0500
re-use float[] across calls to vectorValue to avoid allocation overhead (credit to Jake for the idea)
commit d20bd8aedeb6232358dd16f901e2708f217b8108
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 1 15:35:17 2023 -0500
optimize vectorValue() with direct access to the mmap-ed region
commit 24572025af5b7893279b5b03c58555b682f3abdb
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 1 14:33:55 2023 -0500
decomposeVector does not modify the underlying buffer, so no need to duplicate() here
commit efef35d47a24222fae300d1cbd5a801e9ed79442
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 1 14:29:30 2023 -0500
specialize deserializeFloatArray for ByteBuffer and FloatBuffer. this saves about 3% total CPU on search workloads
commit 63c27f5550d6c3e22d960b9a42b9097029e3b760
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 1 11:04:23 2023 -0500
default target size of 5GB
commit 69c55c8d07c6e1d58b4829c153f5ebf8a4343669
Author: Jonathan Ellis <jbellis@gmail.com>
Date: Thu Jun 1 07:20:38 2023 -0700
make the on-disk hnsw code threadsafe using FileHandle.createReader (#641)
commit b194cae2e56133b946d0231768254064a47f01b6
Merge: 7ee6422084 23c2891e7a
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 1 06:22:17 2023 -0500
Merge branch 'ds-trunk' into vsearch
commit 7ee642208485c9e2ae62aa662dbbddc7b295bae1
Merge: c286ec0ee2 a37009c187
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 31 20:41:48 2023 -0500
Merge branch 'VECTOR-6' into vsearch
commit c286ec0ee231ededded18e81357edebe72064c59
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 31 20:40:21 2023 -0500
add testLargeGraph
commit a37009c187edeba68389d239dc1b9f40519b1187
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Thu Jun 1 08:50:07 2023 +0800
cleanup unused code
commit 6246dbe3a970a0f4672701632003cdf576705c24
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 31 16:43:42 2023 -0500
comment
commit c7420440c7d3c81eac96e716148d570ae3cceb1d
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 31 16:43:37 2023 -0500
encapsulate shadowedPrimaryKey better
commit a1a25c33b8bec9d78354dd2c1d992444db2bf76f
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 31 15:57:52 2023 -0500
make hnsw cache size configurable, and make the default 128KB (roughly the size of the bloom filter for a 1GB sstable)
commit 55747d2351e915645697fa8dec404c749540456f
Author: Andrés de la Peña <a.penya.garcia@gmail.com>
Date: Wed May 31 19:28:47 2023 +0100
Replace FunctionParameter.sameAsFirst by FunctionParameter.sameAs, allowing type inference in both directions for vector functions
commit 5565690fbe1056d4c159ddbe233fa22c7695320a
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 31 12:59:42 2023 -0500
cleanup and fix
commit e7733bb8f858a16b082b8a5c64d0322db6f6271a
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Wed May 24 16:01:26 2023 +0800
VECTOR-6: return vector results to client in ANN order instead of token order
commit 771067d1475c94484da178236928d2bad78e00b1
Author: Mike Adamson <madamson@datastax.com>
Date: Wed May 31 18:33:55 2023 +0100
Fix annOrderingMustHaveLimit test
commit 8dd76a73541e70da0da022dd463e6711a315f971
Author: Mike Adamson <madamson@datastax.com>
Date: Wed May 31 18:30:49 2023 +0100
Improve the validation of ORDER BY <column> ANN OF (#639)
* Improve the validation of ORDER BY <column> ANN OF
* Change to hasNonClusteredOrdering and improve limit message
commit 1ef6f2e3418db19af23f0eae24cd344d72290455
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri May 26 18:34:25 2023 -0500
add vector similarity functions
commit f4e3a39a8ece623647ba7891981e7c5ee10ed96b
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 31 09:39:45 2023 -0500
pull in the smallest set of changes possible from 93e0ae9a to get FunctionParameter.sameAsFirst
commit 0ef2614346c9426a966004df220221f74353de70
Author: Jonathan Ellis <jbellis@gmail.com>
Date: Wed May 31 06:22:41 2023 -0700
Add support for updates + deletes (#636)
commit 5da9fefe1e18f733bb8ffd0a35ef2e1ac3cbd975
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue May 30 16:32:02 2023 -0500
rename SegmentOrdering.reorderOneComponent to limitToTopResults to align with MemtableOrdering
commit caf6eb2154ddf57c88d69c6246404b65e183057d
Author: Mike Adamson <madamson@datastax.com>
Date: Tue May 30 22:41:29 2023 +0100
Fix V1SearchableIndex.reorderOneComponent to use segments (#635)
commit 72cfa94e9389d0078d64aee60d26b8b01c6ca22f
Author: Piotr Kołaczkowski <pkolaczk@datastax.com>
Date: Mon May 29 20:17:44 2023 +0200
VECTOR-14: Move ANN OF expressions from WHERE to ORDER BY
Before:
SELEC…
ekaterinadimitrova2
pushed a commit
to ekaterinadimitrova2/cassandra
that referenced
this pull request
Jun 3, 2024
ekaterinadimitrova2
pushed a commit
to ekaterinadimitrova2/cassandra
that referenced
this pull request
Jun 3, 2024
commit f9e589098e89417cd41ef627b3e1c7371986e3e1
Merge: b0dbc8bd57 d32eed9d68
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Mon Sep 18 11:28:48 2023 -0500
Merge remote-tracking branch 'datastax/vsearch' into cndb-7632-vsearch-rebase-20230915
commit b0dbc8bd57fde3dd473d4aae85a3063f2156106d
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Mon Sep 18 10:12:30 2023 -0500
Update expected string for error message change.
commit f2cafdac9c3e5bf838c10e0078895d56e3271370
Author: qannap <130002578+qannap@users.noreply.github.com>
Date: Thu Sep 14 08:41:07 2023 -0700
Optimize partition-aware queries to use bloom filter (#729)
Add bloom filter field in QueryViewBuilder. Here is the link to the issue: [Vector-72](https://datastax.jira.com/browse/VECTOR-72).
commit 9765b23b5b748286391b8e374bab24a31cb5d934
Author: Jonathan Ellis <jbellis@gmail.com>
Date: Thu Sep 14 09:47:19 2023 -0500
Vsearch 18 (#741): optimize MergePostingList.advance for the case where it is called repeatedly
commit 66c00e094942de72f2f1fdb780ac00239e0aa284
Author: Mike Adamson <madamson@datastax.com>
Date: Thu Sep 14 14:33:41 2023 +0100
Remove smile-nlp and add internal Glove implementation for testing (#743)
commit b13098e0cd5a9f961066c0059953b525f5bfa787
Author: Jakub Żytka <jakub.zytka@gmail.com>
Date: Thu Sep 14 13:28:01 2023 +0200
CNDB-7363: extend read tracking API to notify about post-filtered, reconciliated partitions and rows (#692)
* CNDB-7363: extend read tracking API to notify about post-filtered, reconciliated partitions and rows
* RowFilter::getExpressionsPreOrder will now return all row filter's expressions in pre-order
commit 2bca65cfe4ba864a8a6719e0e28ddf76e83c17e0
Author: Michael Marshall <michael.marshall@datastax.com>
Date: Sun Sep 10 01:01:47 2023 -0500
Fix JsonTest and AnalyzerViewTest errors (#730)
commit a0dde9f89e02ef40d9aa87514eb7b5becfdce8e8
Author: Michael Marshall <michael.marshall@datastax.com>
Date: Fri Sep 8 19:34:09 2023 -0500
Reject SAI creation for invalid combinations of options (#736)
commit abd5399f9c1b6feb0f5ba71b279ff5b76d0f92ee
Author: Jonathan Ellis <jbellis@gmail.com>
Date: Fri Sep 8 11:52:53 2023 -0500
Fix exceptions when executing complex queries (#735)
* fix ReorderingRangeIterator.performSkipTo
* fix NPE
commit acc39a8281cf9163057c00ec9413219b210cc262
Author: Michael Marshall <michael.marshall@datastax.com>
Date: Fri Sep 8 11:04:58 2023 -0500
Analyzer cleanup: add comments and fix docs (#733)
commit 4989dff0f150bb65535fa598e53290bca1f3ce0c
Author: Michael Marshall <michael.marshall@datastax.com>
Date: Thu Sep 7 08:51:09 2023 -0500
Fix RowAwarePrimaryKey#hashCode for deferred keys (#725)
* Fix RowAwarePrimaryKey#hashCode for deferred keys
* Use recommendation from code review
commit 2268c1c806632d62cf1a29b815740207a0e2939c
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Sep 6 13:38:39 2023 -0500
optimize MergePostingList.advance -- we don't care what order we evaluate the merged lists in, since we're going to rebuild the PQ anyway, so use a fast iterator instead of a slow poll() loop
commit 6bb72c3272fbc8d12427fb0d9f00916b6f6ee9c0
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Sep 6 13:22:07 2023 -0500
PKM is not threadsafe, need to allocate a new one for every request
commit f1d7c66b54f7b95e12698d7a6ed00a1e8ac958d6
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Sep 6 13:10:11 2023 -0500
r/m slightly gratuitous IOException from PKM.Factory signature
commit 73e96df69a1861d5a8bda04ab69252a480a78c62
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Sep 6 11:35:40 2023 -0500
fix updateTestWithPredicate by removing broken assert (did not account for deleted rows)
commit 36327a93b8c1651acd7af1d8b687dd92e08021e3
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Sep 6 11:34:44 2023 -0500
add failing updateTestWithPredicate
commit 80eb443b65637dc1782cb3a745d4d56431f787d8
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Sep 6 11:05:32 2023 -0500
comment upsertTest
commit 3f5fbc72547aa790989d3c7bce8b92fd4bb2fb2c
Author: Marianne Manaog <91789965+marianne-manaog@users.noreply.github.com>
Date: Wed Sep 6 19:42:15 2023 +0100
Feature/Optimize SAI.getPostQueryOrdering as SAI.postQuerySort (#710)
* Optimize sorting in postQuerySort method
* Remove need for prepareFor method following sort optimization in postQuerySort
* Add testOrderResults
* Simplify creation of resultSet in the test testOrderResults and keep rows as final in ResultSet.java
* Apply DSU pattern to SAI.postQuerySort method and leverage it in SelectStatement.orderResults method
* Use Pair.create instead of new Pair
* Put back comment on ANN support only
* Optimise undecorate step by using List<ByteBuffer> instead of ByteBuffer in the Pair
* Add licence to StorageAttachedIndexTest.java
* Move StorageAttachedIndexTest.java up a directory
* Simplify map to create listPairsVectorsScores
commit 23555ab43101cd6010fe9cc320e7429779faa678
Author: qannap <130002578+qannap@users.noreply.github.com>
Date: Fri Sep 1 15:29:22 2023 -0700
Add row count field to columnFamilyStore (#713)
* Add row count field to columnFamilyStore
* Add test to row count field
* delete unused import
* delete unused function and separate test
* Add row count field to columnFamilyStore
* Add test to row count field
* delete unused import
* delete unused function and separate test
* resolve version
* return to updated version
* restore other tests as vsearch branch
---------
Co-authored-by: Michael Marshall <michael.marshall@datastax.com>
commit 1eae389fadecfc6637207933fd6ae1739355dc00
Author: Michael Marshall <michael.marshall@datastax.com>
Date: Thu Aug 31 17:59:53 2023 -0500
Fix NPE in LuceneAnalyzer#end (#721)
When testing the analyzer with an integration test, I discovered an NPE from a part of the code that wasn't tested.
Here is the code:
https://github.com/datastax/cassandra/blob/9cabec504d9f70ddfb345122dc4defed56a40f85/src/java/org/apache/cassandra/index/sai/plan/StorageAttachedIndexSearcher.java#L91-L100
We build the Analyzer to use one of its methods, but not to analyze any text, hence the NPE.
Here is the NPE:
```
ERROR [Native-Transport-Requests-1] 2023-08-31 14:35:38,695 QueryMessage.java:121 - Unexpected error during query
java.lang.NullPointerException: null
at org.apache.cassandra.index.sai.analyzer.LuceneAnalyzer.end(LuceneAnalyzer.java:110)
at org.apache.cassandra.index.sai.plan.StorageAttachedIndexSearcher.filterReplicaFilteringProtection(StorageAttachedIndexSearcher.java:99)
at org.apache.cassandra.service.reads.DataResolver.lambda$preCountFilterForReplicaFilteringProtection$6(DataResolver.java:300)
at org.apache.cassandra.service.reads.DataResolver.resolveInternal(DataResolver.java:341)
at org.apache.cassandra.service.reads.DataResolver.resolveWithReadRepair(DataResolver.java:244)
at org.apache.cassandra.service.reads.DataResolver.resolveWithReplicaFilteringProtection(DataResolver.java:284)
at org.apache.cassandra.service.reads.DataResolver.resolve(DataResolver.java:130)
at org.apache.cassandra.service.reads.range.SingleRangeResponse.waitForResponse(SingleRangeResponse.java:59)
at org.apache.cassandra.service.reads.range.SingleRangeResponse.computeNext(SingleRangeResponse.java:65)
at org.apache.cassandra.service.reads.range.SingleRangeResponse.computeNext(SingleRangeResponse.java:31)
at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47)
at org.apache.cassandra.db.transform.BasePartitions.hasNext(BasePartitions.java:93)
at org.apache.cassandra.service.reads.range.RangeCommandIterator.computeNext(RangeCommandIterator.java:123)
at org.apache.cassandra.service.reads.range.RangeCommandIterator.computeNext(RangeCommandIterator.java:45)
at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47)
at org.apache.cassandra.db.transform.BasePartitions.hasNext(BasePartitions.java:93)
at org.apache.cassandra.db.partitions.PartitionIterators$1.hasNext(PartitionIterators.java:154)
at org.apache.cassandra.db.transform.BasePartitions.hasNext(BasePartitions.java:93)
at org.apache.cassandra.cql3.statements.SelectStatement.process(SelectStatement.java:892)
at org.apache.cassandra.cql3.statements.SelectStatement.processResults(SelectStatement.java:518)
at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:495)
at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:341)
at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:102)
at org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:290)
at org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:394)
at org.apache.cassandra.transport.messages.QueryMessage.execute(QueryMessage.java:108)
at org.apache.cassandra.transport.Message$Request.execute(Message.java:242)
at org.apache.cassandra.transport.Dispatcher.processRequest(Dispatcher.java:111)
at org.apache.cassandra.transport.Dispatcher.processRequest(Dispatcher.java:131)
at org.apache.cassandra.transport.Dispatcher.lambda$dispatch$0(Dispatcher.java:78)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:165)
at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:137)
at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:119)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.base/java.lang.Thread.run(Thread.java:829)
```
I am still new to query execution, but I am guessing the unit tests skip this filtering code. I added an integration test that fails without the fix.
commit ea97cc414ef3e68ad30e8c666ecf04b719352901
Author: Michael Marshall <michael.marshall@datastax.com>
Date: Wed Aug 30 09:10:14 2023 -0500
Fix failing KDTreeIndexSearcherTest (#717)
#680 included a change that makes 5 of the KDTreeIndexSearcherTest tests fail. This PR essentially restores the old behavior, and all of the failing tests pass.
commit cee55b1b2139627b331d9f7225c522ca5bee299b
Author: Michael Marshall <michael.marshall@datastax.com>
Date: Tue Aug 29 21:56:24 2023 -0500
Remove ability to configure unique query_analyzer (#712)
commit c248f905cc13833f9857b9d750a60091afc5c38a
Author: Michael Marshall <michael.marshall@datastax.com>
Date: Tue Aug 29 21:55:18 2023 -0500
Fix failing distributed SAI tests using : operator (#715)
commit 367e84adf0663e8baab3f9195f3eb81412640b8c
Author: Michael Marshall <michael.marshall@datastax.com>
Date: Tue Aug 29 21:55:03 2023 -0500
Test compound predicate queries for : operator (#716)
commit 0ea0d95f52ac18522a0cb5a09c1c9ee00069380a
Author: Michael Marshall <michael.marshall@datastax.com>
Date: Tue Aug 29 11:10:02 2023 -0500
Refactor SAI analyzer configuration; add built in analyzers (#711)
The current `OPTIONS` configuration for the `index_analyzer` does not correctly model the available configuration options. In order to make the configuration more directly match the single tokenizer, and the lists of filters and charFilters, I propose we update the expected JSON schema. Further, we add some default analyzers to simplify the configuration of generic analyzers.
commit 67ef0f3794cbd2b9ff41b632205dd087a992e674
Author: Michael Marshall <michael.marshall@datastax.com>
Date: Mon Aug 28 11:58:30 2023 -0500
Return more specific exception on incorrect usage of : operator (#705)
* When the `:` operator is used for indexes that do not support it, the current error is a recommendation to `ALLOW FILTERING` or an `AssertionError`. Instead, we should match the `LIKE` behavior and reject queries that attempt to use the `:` operator on columns that are not properly indexed. The `LIKE` conditional is slightly modified to simplify the logic for rejecting unsupported queries.
* When the `:` operator is used for `DELETE` and `UPDATE` queries, return a helpful error message.
* Update the `toString` method on the `ANALYZER_MATCHES` enum. The `toString` method is used to generate error messages, and it is confusing to see an error message with `: '<term>'` when it really should just be `:`.
commit c26cbfa2298d422813e02b439f4db6494bb64a84
Author: Jakub Żytka <jakub.zytka@gmail.com>
Date: Mon Aug 28 14:12:15 2023 +0200
Additional testcases for read query tracking with different data models (#698)
* Additional testcases for read query tracking with different data models
commit bf143a2b3fc97979e5e083f8eac7aca017cd618d
Author: Michael Marshall <michael.marshall@datastax.com>
Date: Thu Aug 24 13:28:47 2023 -0500
Ignore default analyzer settings when classifying an SAI as analyzed (#706)
### Problem
When creating an SAI, the `AbstractAnalyzer` creates an analyzer in cases where it should not because the logic is too naive.
### Details
Users can create SAI indexes with a `NonTokenizingAnalyzer` using the following queries where `?` can be replaced with `true` or `false`:
```
CREATE CUSTOM INDEX ON table(some-column) USING 'StorageAttachedIndex' WITH OPTIONS = { 'normalize' : ? }
CREATE CUSTOM INDEX ON table(some-column) USING 'StorageAttachedIndex' WITH OPTIONS = { 'case_sensitive' : ? }
CREATE CUSTOM INDEX ON table(some-column) USING 'StorageAttachedIndex' WITH OPTIONS = { 'ascii' : ? }
```
The options are known as "non tokenizing analyzers" because they "analyze" values but do not "tokenize" them. Practically speaking, these analyzers are projections where each input has one and only one output. Lucene calls projections "filters".
The `StorageAttachedIndex` defaults are `normalize = false`, `case_sensitive = true`, and `ascii = false`.
When a default value is passed for one of the options, the `AbstractAnalyzer` incorrectly classifies the index as "analyzed". Instead, the `AbstractAnalyzer` should return the `NoOpAnalyzer` and the index should not be classified as "analyzed".
### Solution
* Only construct the `NonTokenizingAnalyzer` when `normalize`, `case_sensitive`, or `ascii` are configured with non default values.
* Do not consider an SAI as analyzed if `normalize`, `case_sensitive`, or `ascii` are configured with default values.
* Updated some tests to use `=` on the non-analyzed SAI
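The classification rule above can be sketched roughly as follows. This is an illustration only: `AnalyzedOptionsSketch`, `DEFAULTS`, and `isAnalyzed` are hypothetical names, not the `AbstractAnalyzer` API, and option parsing is simplified to `Boolean.parseBoolean`.

```java
import java.util.Map;

final class AnalyzedOptionsSketch {
    // Defaults as stated in this PR description.
    static final Map<String, Boolean> DEFAULTS = Map.of(
        "normalize", false,
        "case_sensitive", true,
        "ascii", false);

    // An index counts as "analyzed" only if at least one of the
    // non-tokenizing options is set to a non-default value.
    static boolean isAnalyzed(Map<String, String> options) {
        for (Map.Entry<String, Boolean> entry : DEFAULTS.entrySet()) {
            String configured = options.get(entry.getKey());
            if (configured != null && Boolean.parseBoolean(configured) != entry.getValue())
                return true;
        }
        return false;
    }
}
```

With this rule, `{ 'normalize' : false }` yields a non-analyzed index (the `NoOpAnalyzer` case), while `{ 'case_sensitive' : false }` yields an analyzed one.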
commit 7a6c4755f150ad647aae2bc806d8b91c423db14b
Author: Michael Marshall <michael.marshall@datastax.com>
Date: Tue Aug 22 15:57:08 2023 -0500
Add ANALYZER_MATCHES operation; Disallow = on analyzed SAI columns (#702)
* Adds the `:` operator, a.k.a. the ANALYZER_MATCHES operator. This operator works only on analyzed SAI indexes, where "analyzed" means the values stored in the index are derived from the raw value in the column. The new operator uses the enum value `100` to prevent future collisions with the upstream CQL implementation.
* Disambiguate the `=` operator so that it cannot perform matches against an analyzed `SAI` index. The primary motivation for this change is to prevent surprising matches. If a query uses `=` where the target column has only analyzed indexes, the query must include `ALLOW FILTERING`.
* Updated many tests.
commit 77cb7f265e40600288fe90fffd74ef6e428f87fe
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Aug 18 10:48:54 2023 -0500
comment
commit 2dcdd9fd7464a6e26185bfa4ac3cd44b02574fb1
Author: Michael Marshall <michael.marshall@datastax.com>
Date: Thu Aug 17 16:55:51 2023 -0500
Fail SAI index creation when using analyzer on column in primary key (#699)
* add testStandardAnalyzer
* debugging wip
* Fail index creation when using analyzer on column in primary key
* Remove unnecessary debug logging
* cleanup
* Reject all attempts to add analyzer to index on pk columns
* Rename noop analyzer test and add explanation
---------
Co-authored-by: Jonathan Ellis <jbellis@datastax.com>
When the target column for an SAI is part of the primary key and is analyzed by the LuceneAnalyzer, queries do not return correct results. Until the underlying issue is resolved, do not allow these kinds of indexes to be created.
This change is covered by a test to verify index creation and search works correctly for the `NoopAnalyzer` and another to verify index creation is rejected for the `LuceneAnalyzer`.
commit 4f5585be198a949b0264eb64d0403e2adda9ab52
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Aug 17 17:41:18 2023 -0500
default vector cache size per segment bumped to 4MB
commit 9464bfe3e2bbd958f4f1d60e799019582a9836cb
Author: Marianne Lyne Manaog <marianne.manaog@datastax.com>
Date: Thu Jul 13 20:32:36 2023 +0100
Deserialize ByteBuffer into a query vector once by moving the deserialization up the call stack
commit bc2ef7ff721493b20b1bb76dd2ae37b43d365071
Author: Shaunak Das <ShaunakDas88@users.noreply.github.com>
Date: Wed Aug 16 13:27:12 2023 -0500
VECTOR-79: Simplify population of VectorCache (#695)
* use BFS at higher levels so we can avoid a separate cachedNodes set
* try to naively cache as much of level 0 as possible
* add VectorCacheTest
* cleanup and revert to intended behavior of not caching L0
---------
Co-authored-by: Jonathan Ellis <jbellis@datastax.com>
commit 1c58e431e55d8b327314ddb82b30bf8eced59269
Author: Shaunak Das <ShaunakDas88@users.noreply.github.com>
Date: Tue Aug 15 11:24:30 2023 -0500
Avoiding sorting node ordinals for level 0 (#697)
* avoiding sorting node ordinals for level 0
commit 16eeaf89a8ac39fee616f7a71a3f9a4b4170521e
Author: Jonathan Ellis <jbellis@gmail.com>
Date: Tue Aug 15 08:47:48 2023 -0500
extract OnDiskVectors to top level class so we can use it in debugging tools (#693)
commit 2271a230761a416fbd5c016d47684b388be93989
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Aug 9 09:59:45 2023 -0500
VECTOR-77 fix reading vectors past the first 2GB mmap region
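The commit message does not say what the root cause was. As a general illustration of what reading past a 2GB boundary involves (an assumption, not the actual fix): a single mapped buffer is indexed by `int`, so the absolute position has to be computed in `long` arithmetic and then split into a region index and an offset within that region. `MmapOffsetSketch` and `REGION_SIZE` are hypothetical.

```java
final class MmapOffsetSketch {
    // Hypothetical region size; any value that fits in an int works the same way.
    static final long REGION_SIZE = 1L << 30;

    // Byte position of a float vector; the cast forces 64-bit arithmetic
    // so large ordinals do not overflow int.
    static long vectorOffset(int ordinal, int dimension) {
        return (long) ordinal * dimension * Float.BYTES;
    }

    // Which mapped region holds the given absolute offset.
    static int regionIndex(long offset) {
        return (int) (offset / REGION_SIZE);
    }

    // Position of the offset inside its region.
    static int offsetInRegion(long offset) {
        return (int) (offset % REGION_SIZE);
    }
}
```

A vector that straddles two regions would still need to be read in two parts; the sketch leaves that case out.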
commit 5705daa97f848454d8d29ab08234296906b3dbc4
Author: Andrés de la Peña <adelapena@users.noreply.github.com>
Date: Thu Aug 10 18:06:28 2023 +0100
VECTOR-76: Add back vector size guardrail (#694)
patch by Andrés de la Peña; reviewed by Brandon Williams and Maxwell Guo for CASSANDRA-18730
commit cb3bedccc255476300528183e8ba30a4e390f57f
Author: Jakub Żytka <jakub.zytka@gmail.com>
Date: Tue Aug 8 16:52:16 2023 +0200
CNDB-7390: fix static rows not being tracked by ReadTrackingTransformation (#691)
commit a8740fb8dae267fb8ddef1a7f5c77e948cd2a0ff
Author: Andrés de la Peña <adelapena@users.noreply.github.com>
Date: Fri Jul 28 15:42:42 2023 +0100
Add system property to reject non-float vectors, true by default (#689)
commit 5a4e439dc73b479d79bd427472d382a53b6fe680
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jul 21 13:10:44 2023 +0200
r/m failing emptyIndexTest until we can address it (VECTOR-59)
commit eec8b7b387cd041fbd54aef8f0090773962a9e49
Author: Piotr Kołaczkowski <pkolaczk@gmail.com>
Date: Tue Jul 18 17:36:43 2023 +0200
VECTOR-69: Fix LWT test failure caused by null key bounds (#686)
* VECTOR-69: Fix LWT test failure caused by null key bounds
---------
Co-authored-by: Jakub Zytka <jakub.zytka@datastax.com>
Co-authored-by: Jonathan Ellis <jbellis@datastax.com>
commit 033ae1a9567dfef88d0214dd34716f2468c5d4e3
Author: Andrés de la Peña <adelapena@users.noreply.github.com>
Date: Tue Jul 18 15:23:30 2023 +0100
VECTOR-68 Vector dimensions are not being validated correctly (#679)
commit e22ff6977f986a4148f931bad4b04645bf93d2e9
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jul 18 06:01:04 2023 +0200
set sai_hnsw_allow_custom_parameters=true for IndexWriterConfigTest
commit ccaba92656e78e63765da5c34f562f2c3b582bd0
Author: Mike Adamson <madamson@datastax.com>
Date: Tue Jul 11 16:12:31 2023 +0100
VECTOR-55: Review REVIEWME comments: (#680)
* VECTOR-55: Review REVIEWME comments:
- Refactor search methods to only return long variant
- Remove SSTableQueryContext in favour of QueryContext
- Add checking to SegmentMetadata.toSegmentRowId method
commit 338c8507de197570f4879019e0e1fc8e1f77025e
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Mon Jul 10 21:46:56 2023 -0500
Fix IndexError when querying a list/set/map column.
commit 1b4d6de7e0a051b95c0a9b62c8bc167e23ec65ed
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sun Jul 9 21:03:51 2023 +0200
add test that fails before decompose fix and passes after
commit 664911ac7de336d3fcffd90083e6a220069bd4e2
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sun Jul 9 13:44:28 2023 +0200
re-use TypeUtil.decomposeVector
commit e17493c95ac247ca1da7caa3ccd58be866958d6a
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sun Jul 9 13:28:56 2023 +0200
fix casting in deserialize
commit d169ff68f6576ce5b150818cb0a52a12abf6f2e0
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sun Jul 9 13:30:02 2023 +0200
fix NPE when using ReadExecutionController.empty
commit aa899f55f77df5b157e3e6afa930c377e4cacc57
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Fri Jul 7 10:12:28 2023 +0800
assert no single partition trace for SAI request
commit 277251e1aad15e2a5aedaaaae3170d574d224fe5
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 29 12:08:05 2023 -0500
- add Tracing.traceSinglePartitions to avoid cluttering range and index queries with noise
- update QueryViewBuilder to include a count of sstables per index accessed
commit 4682a417fdaaefb135e57b0ebe174c1ff49b7e94
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jul 6 12:13:20 2023 -0500
simplify
commit 52d19a1ef5605aebdb87385786dd323057267164
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Wed Aug 30 14:17:12 2023 -0500
Squashed commit of the following:
commit 4c4c1ff802c7667e1289a4e730cc34bf6b3bd007
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jul 5 12:03:22 2023 -0500
add test for creating index after data is added
commit 0acaae364c19ed7556b66b5576762ecc131af5fd
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Wed Jul 5 09:23:01 2023 -0500
Revert "enable segment compaction by default for vector indexes -- we should prioritize read performance"
This reverts commit a4fa0728337ebf236ad03f6ad708cf51c3dc08c5.
commit e3ddb6426de4e89b5fe61ed742b69257865c2d3c
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 28 10:54:49 2023 -0500
VECTOR-64 set sstable_growth to 0.5 by default
commit 3ca75046e6802ce5836a654459b01948c83d1d7b
Merge: a4fa072833 10a2a31ae8
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 28 10:54:18 2023 -0500
merge ds-trunk
commit a4fa0728337ebf236ad03f6ad708cf51c3dc08c5
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 28 10:24:41 2023 -0500
enable segment compaction by default for vector indexes -- we should prioritize read performance
commit 08153b38549d700505d064c25d7a2070c2ff07aa
Merge: 6a063b4f84 1a45fdc5f9
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 28 10:18:38 2023 -0500
Merge branch 'vsearch' of github.com:datastax/cassandra into vsearch
commit 6a063b4f8458f119d4b99bd6b78cbc8832f27f13
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 28 10:13:51 2023 -0500
VECTOR-63 add cassandra.sai.hnsw.allow_custom_parameters defaulting to false to control setting maximum_node_connections and construction_beam_width, which can cause memory usage to explode if used incorrectly
commit 1a45fdc5f92e9c82dba6e9e56a5e341806070b13
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Mon Jun 26 15:57:10 2023 -0500
Add the option to configure the file_cache_size_in_mb in a -D like it is in 6.8-cndb.
commit d2e45dab5b57f20fbdd6250d6dc186c83d783ed3
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 22 16:13:00 2023 -0500
VECTOR-62 don't flush a graph that only contains deleted vectors
commit 4a9afe205c20b1f863e89d4e02138557d225c3fc
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 22 16:11:11 2023 -0500
r/m obsolete comment
commit a925dc22696bcbebcad17a460441e8b84ff97660
Merge: 7c7d160dd9 2b22984fe1
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Thu Jun 22 09:03:08 2023 -0500
Merge remote-tracking branch 'datastax/ds-trunk' into vsearch
commit 7c7d160dd90624b0e365287ef0a2f0b7ef94da53
Merge: a14b387372 7a5e374cea
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 20 21:42:51 2023 -0500
Merge branch 'vsearch' of github.com:datastax/cassandra into vsearch
commit a14b387372ef3f220ece68b019acaafaf0d56f42
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 20 17:52:13 2023 -0500
move new options to jvm17-server.options
commit dc67582585a90628e026a693497793b7d8a298ed
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 20 17:47:27 2023 -0500
copy jvm11 options to jvm17
commit b6717410f925486a858c4c9414e417eed50950d7
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 20 17:30:48 2023 -0500
enable simd
commit 92b9e6e2f136254fb1f8e50732809ea529617f48
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 19 15:31:09 2023 -0500
make it run under jdk 20
commit 3e985953637819ac8504955552ce0feba29bdd40
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 20 17:13:56 2023 -0500
update to lucene with simd
commit 7a5e374cea785678670444121ad36be6170dd59b
Merge: 5e5ae4de82 e9ddd5f0d9
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Tue Jun 20 15:00:24 2023 -0500
Merge remote-tracking branch 'datastax/ds-trunk' into vsearch
commit 5e5ae4de824a7e838bbdfdbabc92e8b256911e3f
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 19 10:57:18 2023 -0500
we weren't calculating length correctly (supposed to match ordinals range) but it didn't matter b/c length() is not called on the search path where these classes are used. make this explicit by throwing in length implementation.
commit c0d4055a0bc6914d0220fbd4912d6355e9f6a1d9
Merge: f2c9a7cb59 c605f8c9f0
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 19 10:20:02 2023 -0500
Merge branch 'vsearch' of github.com:datastax/cassandra into vsearch
commit f2c9a7cb59b113a349992011b739079106274c1e
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 19 10:19:34 2023 -0500
VECTOR-61 remove deletedOrdinals collection that was not threadsafe wrt add and remove operations; create postingsByOrdinal map that allows Bits operations to look up the postings, instead
commit 9744a13b405cbeb1b68894bb08cc24796aedd8f7
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 19 09:13:24 2023 -0500
add LVT.testMultiplePostings (works fine)
commit c605f8c9f037e724544b21fe2aadcf0d8503d86f
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Sat Jun 17 12:44:36 2023 +0800
skip replica-filtering-protection for ANN requests
commit 71935bd07c9218ba70f1ffde7b6db1e88009e529
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 16 20:20:06 2023 -0500
forgot that maxBruteForceRows has to stay non-final for Byteman
commit b264afc07727305ce88ab2f18b8a2643d2510bd3
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 16 20:10:19 2023 -0500
VECTOR-54
- validateIndexable also checks for very large vectors (that explode to NaN when we take the cosine)
- check query vectors with the same criteria as vectors to index
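A rough sketch of this kind of check, under stated assumptions: `VectorValidationSketch` is a hypothetical class, and `IllegalArgumentException` stands in for Cassandra's `InvalidRequestException`. Cosine similarity divides by the vector norms, so a zero norm or a norm that overflows `float` gives a NaN score; the same method would be applied to both indexed and query vectors.

```java
final class VectorValidationSketch {
    // Rejects vectors whose squared norm is zero, infinite, or NaN in float arithmetic.
    static void validateIndexable(float[] vector) {
        float sumOfSquares = 0f;
        for (float component : vector)
            sumOfSquares += component * component;

        if (sumOfSquares == 0f)
            throw new IllegalArgumentException("Zero vectors cannot be used with cosine similarity");
        // Very large components overflow to Infinity here; NaN components propagate as NaN.
        if (!Float.isFinite(sumOfSquares))
            throw new IllegalArgumentException("Vector magnitude is too large or not a number");
    }
}
```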
commit 66c31a6dcfcbf81b7657ac9c4119803e026d150a
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 16 20:07:41 2023 -0500
if CFS.apply catches an InvalidRequestException, keep the same type when we re-throw so it gets back to the client as intended
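The pattern is roughly the following; this is a sketch, not the `CFS.apply` code. `InvalidRequest` is a hypothetical stand-in for `InvalidRequestException`, and the wrapping of other failures is assumed for illustration.

```java
final class RethrowSketch {
    // Hypothetical stand-in for Cassandra's InvalidRequestException.
    static final class InvalidRequest extends RuntimeException {
        InvalidRequest(String message) { super(message); }
    }

    static void apply(Runnable write) {
        try {
            write.run();
        } catch (InvalidRequest e) {
            // Keep the original type so the client sees an invalid-request error.
            throw e;
        } catch (RuntimeException e) {
            // Other failures are wrapped (assumed behaviour for this sketch).
            throw new RuntimeException("write failed: " + e.getMessage(), e);
        }
    }
}
```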
commit e0bdfdbe421a1cca30746604cda9ef5b992002bb
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 16 17:33:37 2023 -0500
add [failing] emptyIndexTest
commit 75cba96f90730b32ea4a39375c72aaff4ffd6221
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 15 16:52:46 2023 -0500
re-use bitsets across searches
commit 2af2ce8a321f7fea268d81c393da7f8f6d886776
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 16 17:05:52 2023 -0500
more asserts that graph is in sane state when we write it
commit a63492686d13e12b799acdf50c6f0c20dd680f21
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 16 17:13:23 2023 -0500
update lucene
commit 4c6b43e3606b8c9eb5e4ec4cf0817dc9b061c555
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 16 16:24:33 2023 -0500
add debugging information when node is not found on level
commit 81c8e730bd4ca314a81bc13e30b197603a8eb005
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 16 11:57:24 2023 -0500
add similarityWithAnn test
commit 87c8da9a5dddd10215a84c07a09dbaaecaa522b8
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 14 18:26:02 2023 -0500
add tests to confirm that zero-length vectors are rejected
commit 2510c1b987d846821e7095fbc2c368fa51d9a15d
Merge: a3d9e33554 e86f91c568
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 12 17:09:22 2023 -0500
Merge branch 'vsearch' of github.com:datastax/cassandra into vsearch
commit a3d9e3355470f6400328cd7ac1eb07ff06f8e00b
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 12 17:08:43 2023 -0500
use ArrayList in CVV instead of HashMap since we know the keys are consecutive
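To illustrate why consecutive keys make a list sufficient (a sketch only; `OrdinalVectorsSketch` is hypothetical and is not the CVV class): when ordinals are handed out as 0, 1, 2, ..., the ordinal is simply the list index, so no hashing or `Integer` boxing is needed.

```java
import java.util.ArrayList;
import java.util.List;

final class OrdinalVectorsSketch {
    private final List<float[]> vectors = new ArrayList<>();

    // Appends the vector and returns its ordinal, which equals its list index.
    int add(float[] vector) {
        vectors.add(vector);
        return vectors.size() - 1;
    }

    float[] get(int ordinal) {
        return vectors.get(ordinal);
    }

    int size() {
        return vectors.size();
    }
}
```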
commit e86f91c568f8a82e83e61bd0dd768478a7584cfc
Merge: 77c1bc11f3 8a7a6d9c4f
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Mon Jun 12 09:25:06 2023 -0500
Merge remote-tracking branch 'datastax/ds-trunk' into vsearch
commit 77c1bc11f31e4ff001c7eff1e6fe50812e136a45
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Mon Jun 12 09:24:56 2023 -0500
Revert "Revert this after CNDB-6974 is fixed."
This reverts commit 8e04280312583a18085e7e7b9d31810790b039f4.
commit cae30e9879c463d80de2a2b7e4a642979803c13a
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Mon Jun 12 09:23:39 2023 -0500
Revert "Revert this after CNDB-6974 is fixed."
This reverts commit 67a2b7eba3c957a993726d37706643979e92a3a3.
commit 46d1a45de84f40311f193beba3ca5fc3fe7450d7
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 12 09:03:03 2023 -0500
replace our TODO entries with VSTODO to make them easier to find
commit 415b3f29e8e7f1526e28655308a1937a1175e575
Merge: 2fd8b848be 481d29721a
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 12 09:02:16 2023 -0500
Merge branch 'vsearch' of github.com:datastax/cassandra into vsearch
commit 481d29721a7993e9babd44f792be55aca925d41a
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Mon Jun 12 21:26:43 2023 +0800
Fix VectorMemtableIndex to handle max token and fix Segment#intersects (#670)
* Fix VectorMemtableIndex to handle max token and min/max bound
* Fix Segment#intersects to compare bound instead of token and add tests for range search
* make brute force rows per query for VectorMemtableIndex
* apply feedback on Segment#intersects
* add comments to VectorMemtableIndex#search
* Fix SegmentTest
commit 2fd8b848be82e62999e64b57cb9800d03de8e953
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 12 07:51:18 2023 -0500
clean up a couple REVIEWME
commit 3c066e07be6188c3f48c51c02cfc99b395e2a6bc
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Sun Jun 11 14:26:42 2023 +0800
fix flaky VectorDistributedTest
commit 6c719a752e90e031916dddf6cc7b7db23627bc38
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 10 10:48:01 2023 -0500
fix use of bruteForceRows -- should be larger of limit,maxBruteForceRows
commit 0a2279d7f010bcb68053bb14ea2974295b21d1c4
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 10 10:46:34 2023 -0500
use maxBruteForceRows when deciding whether to skip ANN
commit 1859f7355231ce7972e3d4b7af70e5d2967da516
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 10 10:27:03 2023 -0500
simplify partitionKeySearchTest using euclidean distance
commit 4f40e1c2d9b1c30675eeaee7d800a8113cce97c0
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 10 10:09:24 2023 -0500
typo
commit 7dbe85ac9a248aa6a0985839df1f40ceb2c08e6c
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 10 10:08:23 2023 -0500
rename methods that return Bits but had bitset in their names
commit 2861bfdba8f2638a1e3f9e81df9cad3878d0e544
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 10 10:06:51 2023 -0500
simplify skipANN
commit fbfe4bcdd03a66f4dedfa91ce9d06206f89827cd
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Sat Jun 10 16:36:56 2023 +0800
optimize VectorIndexSearcher#searchPosting
- return empty posting if key range is not found in current sstable
- return empty posting if all row ids are shadowed
- skip ANN if matching row ids are less than limit
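The three short-circuits can be sketched as a small decision function. All names here (`SearchPlanSketch`, `Plan`, `choose`) are hypothetical, and the inputs are simplified to a flag and a count rather than the real key-range and posting-list types.

```java
final class SearchPlanSketch {
    enum Plan { EMPTY, SKIP_ANN, ANN }

    // keyRangeFound: whether the queried key range exists in this sstable.
    // liveMatchingRows: matching row ids left after removing shadowed rows.
    static Plan choose(boolean keyRangeFound, int liveMatchingRows, int limit) {
        if (!keyRangeFound)
            return Plan.EMPTY;      // nothing in this sstable for the range
        if (liveMatchingRows == 0)
            return Plan.EMPTY;      // every candidate row is shadowed
        if (liveMatchingRows < limit)
            return Plan.SKIP_ANN;   // fewer matches than the limit: no graph search needed
        return Plan.ANN;
    }
}
```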
commit 168911b70888a3dda0ddf4f615af1b7ff54f43e3
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Sat Jun 10 20:54:41 2023 +0800
fix data visibility issue during flush: make sure SSTableAddedNotification is sent before MemtableDiscardedNotification
commit c5dd47e3ce1c7276703fa3f89a9932a3b2cd43b7
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Sat Jun 10 12:31:53 2023 +0800
fix primaryKeySearchTest and partitionKeySearchTest to use correct selector and expected results
commit 2e636529c63d0d726868d6df9b3e11d15d3871d8
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Sun Jun 11 08:31:39 2023 +0800
Vector-48: index#update is not triggered by partition/range deletion,… (#665)
* Vector-48: index#update is not triggered by partition/range deletion, we have to fix VectorMemtableIndex to include shadowed primary keys
- during flush, append ordinals that are removed by partition/range deletion into deletedOrdinals set
* add comments to VP
* revert redundant variables
* simplify
---------
Co-authored-by: Jonathan Ellis <jbellis@datastax.com>
commit daa2623b880e4b82d99204fe718db22e52ae3b42
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 10 13:54:38 2023 -0500
use mmapped builder in OnDiskHnswGraphTest
commit 553778a22cfa8757daf3b3ec0e2cfa6fc7ba29cf
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 10 08:34:41 2023 -0500
fix logic in partitionKeySearchTest, test still fails
commit a7f5a2f824085691da80ef62e37a430e5637c31c
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 17:57:34 2023 -0500
ignore invalid vectors during build against existing data, instead of failing the build
commit 2af203646a2e57fca0a2521c31ba4269b0d8f326
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 17:43:47 2023 -0500
switch from ignoring zero vectors to throwing IRE
commit 4c98fbb447a275cbf557ae0adf8f4f8a30fba17f
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 17:15:57 2023 -0500
Revert "if there are no rows in the range to search, return EmptyPostingsList instead of giving the hnsw search an empty Bits (which is a pathological case for the search)"
This reverts commit 6e5eab787a3add985dc30eca82c5dee2646f5564.
commit da99598a5d11dfa4ecac3b1eeeb9d19a4bad21e5
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 17:09:30 2023 -0500
don't attempt to add zero vector to cosine indexes
commit a77792447d33600f69dd0a75a280a2b6dac51389
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 16:43:34 2023 -0500
add failing partitionKeySearchTest
commit b061dd1c77f30c67324da4f94ea8b5a22d50fba7
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 10:52:08 2023 -0500
inline the test ops
commit 6e5eab787a3add985dc30eca82c5dee2646f5564
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 10:51:09 2023 -0500
if there are no rows in the range to search, return EmptyPostingsList instead of giving the hnsw search an empty Bits (which is a pathological case for the search)
commit 518c055dfeabacf0fa0863e40bf814058f2ad981
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 10:28:57 2023 -0500
cleanup
commit e922b03ffe5436dc1341dbe20ee7aaca9901058f
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 10:27:53 2023 -0500
failing tests for primary key search
commit 428e4b713e25a5820d02ca790c30e9f1477c0f44
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 10:21:59 2023 -0500
cleanup
commit 5319bcdf3b694e861509b2a9f549f39bf51cd253
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 10:20:02 2023 -0500
move testInvalidColumnNameWithAnn to VectorInvalidQueryTest
commit d8b64845e59c0bd3658a16df21dcc75564417e3b
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 10:12:40 2023 -0500
upgrade lucene to reduce Integer boxing on build path
commit a62356e49c0a32f61732b8525d2bb655f46dd767
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 10:11:55 2023 -0500
cleanup
commit b754eb7f021d10d0ae8de83ebdf2ce2f19ce59d8
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 08:29:34 2023 -0500
replace CHM with NBHMLong to avoid boxing
commit 53d232ad2bcddefd72b10542f703cf36728c05f5
Author: Piotr Kołaczkowski <pkolaczk@datastax.com>
Date: Fri Jun 9 12:17:46 2023 +0200
STAR-550 Handle SAI AbortedOperationException
AbortedOperationException is thrown by SAI when index search
hits a timeout. Now instead of allowing this exception to
bubble up to the top and be eventually logged as error, we catch
and swallow it in the InboundSink after creating the query response.
Additionally, we now also set a proper error code (TIMEOUT)
in the response, so the client has a hint on what happened.
commit 5e0c5eb3c54b58002162ce3e39d0891643e1f663
Author: Matt Fleming <matt@codeblueprint.co.uk>
Date: Fri Jun 9 08:44:10 2023 +0100
CNDB-7037: Check that memtable exists before flushing and avoid IOOBE (#664)
For offline services such as the compactor it's possible to not have
live memtables. Account for this during flushing after removing indexes to
avoid triggering an IndexOutOfBoundsException:
Error happened while updating the schema
java.lang.IndexOutOfBoundsException: Index: -1
at java.base/java.util.Collections$EmptyList.get(Collections.java:4483)
at org.apache.cassandra.db.lifecycle.View.getCurrentMemtable(View.java:106)
at org.apache.cassandra.db.ColumnFamilyStore.forceFlush(ColumnFamilyStore.java:1115)
at org.apache.cassandra.db.SystemKeyspace.forceBlockingFlush(SystemKeyspace.java:552)
at org.apache.cassandra.db.SystemKeyspace.setIndexRemoved(SystemKeyspace.java:576)
at org.apache.cassandra.index.SecondaryIndexManager.markIndexRemoved(SecondaryIndexManager.java:837)
at org.apache.cassandra.index.SecondaryIndexManager.removeIndex(SecondaryIndexManager.java:420)
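The guard described in this commit can be sketched as follows. This is a minimal illustration, not the actual patch: `LiveMemtables`, `currentUnchecked`, and `currentOrEmpty` are hypothetical names, and memtables are represented by plain strings. It shows why indexing `size() - 1` into an empty list produces `Index: -1`, and how checking for emptiness first lets the caller skip the flush.

```java
import java.util.Collections;
import java.util.List;
import java.util.Optional;

// Hypothetical stand-in for a view over live memtables; not Cassandra's View class.
final class LiveMemtables
{
    private final List<String> live;

    LiveMemtables(List<String> live)
    {
        this.live = live;
    }

    // Unguarded lookup: on an empty list this calls get(-1) and throws IndexOutOfBoundsException.
    String currentUnchecked()
    {
        return live.get(live.size() - 1);
    }

    // Guarded lookup: callers can skip the flush when no memtable exists.
    Optional<String> currentOrEmpty()
    {
        return live.isEmpty() ? Optional.empty() : Optional.of(live.get(live.size() - 1));
    }

    public static void main(String[] args)
    {
        LiveMemtables offline = new LiveMemtables(Collections.emptyList());
        // Nothing to flush for an offline service, so this does not throw.
        offline.currentOrEmpty().ifPresent(m -> System.out.println("flushing " + m));
    }
}
```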
commit c241f7d41ee46c28f60d691611f32e071dac6684
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 8 19:19:09 2023 -0500
read neighbors using .intBuffer since we know that the HNSW searcher always reads all the neighbors
commit 32f021a4f0623e57d82d70c54350eab6f0f57db7
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 8 18:08:30 2023 -0500
VECTOR-51 leave vectors as ByteBuffers during compaction to avoid 2x memory usage incurred by keeping them around as float[] as well
commit 32d8cbbd19301228530d3d84f7a7e4120b4d4422
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 8 17:37:08 2023 -0500
r/m node cache from query metrics since there is nothing the operator can act on there
commit 890df8e6c36c5c9e1314dbcc0a681415401120dd
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 8 14:15:25 2023 -0500
add more information to exception when reading row offsets goes wrong
commit 5c7fe0cfed1edb10f81b134f8949dd1b5abb2781
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 8 13:26:17 2023 -0500
revise ordinals cache as follows:
// cache full levels including neighbors up to neighborsRamBudget, starting with the top level,
// but always cache all levels above the bottom two levels -- this will be ~1% of the graph.
// then on L1, cache at least the offsets
// L0 we do not cache since we only need one extra seek (no bsearch) to read the neighbors offset
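One possible reading of the caching policy in the comment above, sketched under stated assumptions: `OrdinalsCachePlan`, `Mode`, and the per-level byte sizes are all hypothetical, and the real on-disk code may account for the budget differently. Levels above the bottom two are always cached in full, L1 is cached in full only if the remaining budget allows (otherwise offsets only), and L0 is never cached.

```java
import java.util.Arrays;

// Hypothetical planner for the level-caching policy; names and sizes are illustrative only.
final class OrdinalsCachePlan
{
    enum Mode { FULL, OFFSETS_ONLY, NONE }

    // levelBytes[i] is the assumed size of level i's neighbor lists (index 0 = L0, the bottom level).
    static Mode[] plan(long[] levelBytes, long neighborsRamBudget)
    {
        Mode[] modes = new Mode[levelBytes.length];
        Arrays.fill(modes, Mode.NONE);
        long remaining = neighborsRamBudget;
        // Walk from the top level down; L0 is never cached.
        for (int level = levelBytes.length - 1; level >= 1; level--)
        {
            boolean aboveBottomTwo = level >= 2;
            if (aboveBottomTwo || levelBytes[level] <= remaining)
            {
                modes[level] = Mode.FULL;
                remaining -= levelBytes[level];
            }
            else
            {
                // Only L1 can reach this branch: keep at least its offsets.
                modes[level] = Mode.OFFSETS_ONLY;
            }
        }
        return modes;
    }

    public static void main(String[] args)
    {
        System.out.println(Arrays.toString(plan(new long[]{ 1000, 100, 10, 1 }, 50)));
    }
}
```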
commit e131014c5cba59624e0ed9fac142abe5340e2fc0
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 8 13:53:10 2023 -0500
add deletes test
commit 7c5a880f007535d9f03b7c7d9d3e84dcc79ef3d7
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 8 13:34:42 2023 -0500
must use graph.size for bitset size -- not correct to use max rowid, since deleted ordinals will not have a rowid but will still be in the graph
commit 0654fbb61aec86deae69edf83fb1cf653ed7df65
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 8 08:44:28 2023 -0500
info -> debug for validatePerIndexComponents
commit b289db2362b928f29b6c5c6289e09b33f0be1e88
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 7 18:08:18 2023 -0500
update lucene
commit 347dae644b0d395014d8faf2affd48a6b2546e96
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 7 18:23:41 2023 -0500
undo burntest logging config change
commit 87a6a84f38d0a0423acd9a0dda67d386876681b5
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 7 15:20:06 2023 -0500
add debug logging
commit 20013bc865c7fb2c34111c98362aaf997dc724dc
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 7 14:50:39 2023 -0500
add a bit more information to exception when we fail in index construction from disk
commit 86389ceae8188b8e427d8a528fada2f913f1f633
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 7 12:25:31 2023 -0500
reduce test vector count from "all of them" to 200
commit 20177a1c84ba34babc82d8c1aa4e3436321b5d9f
Merge: 2e2fce09ee f2c697ac46
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 7 12:21:20 2023 -0500
Merge branch 'wip' into vsearch
commit f2c697ac46348301260c5947ade4ebed7dee90ee
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 7 12:16:28 2023 -0500
multipleSegmentsMultiplePostingsTest
commit 59e4a1c53e1c19a7ebba4a0834229e61f9e4d3ce
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 7 12:16:04 2023 -0500
cleanup
commit 2e2fce09eeb5bef4780cecf7c7b5e492a810f67a
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Wed Jun 7 23:33:00 2023 +0800
fix OnDiskOrdinalsMap to seek to segment offset before reading (#661)
commit 9502f1040229f2be6619f7e4497dc25ca5126b39
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 7 10:04:38 2023 -0500
fix build
commit 3535444b57366fe705e0fd8c1ca095e60ed2a706
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 7 09:19:41 2023 -0500
add write-only workload
commit a112b0965cb298f293b6e384bafc92a1d5a1fb8f
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Wed Jun 7 21:15:53 2023 +0800
VECTOR-44: improve in-memory partition-restricted query perf (#660)
* VECTOR-44: improve in-memory partition-restricted query perf
- using post-filter top-k processor instead of ANN: 14x improvement on partition-restricted query in LongVectorTest
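For illustration, a post-filter top-k pass over an already restricted candidate set might look like the sketch below. It is an assumption-laden example, not the code from this commit: `BruteForceTopK` is a hypothetical name and dot product stands in for whatever similarity function the index uses. The idea is that when a partition restriction leaves only a few rows, scoring each one and keeping the best k in a bounded heap can be cheaper than a graph search.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical brute-force top-k selection over a small candidate set.
final class BruteForceTopK
{
    // Dot product is used purely as an example similarity function.
    static float dot(float[] a, float[] b)
    {
        float sum = 0f;
        for (int i = 0; i < a.length; i++)
            sum += a[i] * b[i];
        return sum;
    }

    // Returns the indexes of the k candidates most similar to the query, best first.
    static List<Integer> topK(List<float[]> candidates, float[] query, int k)
    {
        float[] scores = new float[candidates.size()];
        // Min-heap on score, so the weakest retained candidate is evicted first.
        PriorityQueue<Integer> heap = new PriorityQueue<>((x, y) -> Float.compare(scores[x], scores[y]));
        for (int i = 0; i < scores.length; i++)
        {
            scores[i] = dot(candidates.get(i), query);
            heap.add(i);
            if (heap.size() > k)
                heap.poll();
        }
        List<Integer> result = new ArrayList<>(heap);
        result.sort((x, y) -> Float.compare(scores[y], scores[x]));
        return result;
    }
}
```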
commit cf1ec3be50ddbe40ae99dfc34eea51b9fc5c2648
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Wed Jun 7 21:06:37 2023 +0800
Vector-45: fix bitsetForShadowedPrimaryKeys to skip shadowed primary keys outside of current sstable/segment (#659)
* VECTOR-45: fix bitsetForShadowedPrimaryKeys to skip shadowed primary keys outside of current sstable/segment
* return empty iterator if results are empty instead of ReorderingRangeIterator
commit 87217b4f49935fbbd13c73f4d0aadb84836696a7
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 7 07:31:46 2023 -0500
don't make areL0ShardsEnabled final, it breaks mocks
commit 2bb3a299df43561369544b662674872575975b7c
Author: Matt Fleming <matt@codeblueprint.co.uk>
Date: Wed Jun 7 10:56:08 2023 +0100
VECTOR-42: Check columns exist to avoid NPE during ANN expr binding (#657)
Using a non-existent column with an ANN expression triggers an NPE like so:
java.lang.NullPointerException: null
at org.apache.cassandra.cql3.ArrayLiteral.forReceiver(ArrayLiteral.java:43)
at org.apache.cassandra.cql3.ArrayLiteral.prepare(ArrayLiteral.java:54)
at org.apache.cassandra.cql3.Ordering$Raw$Ann.bind(Ordering.java:178)
at org.apache.cassandra.cql3.Ordering$Raw.bind(Ordering.java:139)
Use TM.getExistingColumn() which throws InvalidRequestException if the
column is undefined.
commit f29d4528fc43f34889e560acafa810eee1b88ba9
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 21:18:47 2023 -0500
restore query timeouts
commit 6e5734e52b41ab1e355d202ad0433440828fbb75
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 19:08:52 2023 -0500
looking at performance over time in LongVectorTest
commit f96af8f0b9185163a49156bdab107801d2588bf0
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 16:00:23 2023 -0500
log level = info for burn tests
commit 1247cf372927cc185be5c9767a06fdd15f102dbe
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 17:48:00 2023 -0500
write the in-memory deleted ordinals to disk at the start of the postings component
commit e319d71bd6b0d265f616c807dcaf4a9d9fc38aef
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 16:36:27 2023 -0500
don't allocate unnecessary objects on the happy path of no tombstones
commit 6d146a8460a17d146de756aab2ccbbf2abdb967a
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 16:19:56 2023 -0500
VECTOR-23: force UCS to not shard L0
commit 79d6e177093312bc1b951d6740ad45ce8a6c5875
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 16:01:05 2023 -0500
mark AutoCloseable
commit 5ec2f7d7f792bfa2dcf582ea4eed3fa8745fdaff
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 5 14:37:15 2023 -0500
Support string literals as vectors
Co-authored-by: Andrés de la Peña <a.penya.garcia@gmail.com>
commit 73a53c37536835e462e3cf17c452027c7aa591ed
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 12:25:39 2023 -0500
(take 3, this time with the code changes) fix race conditions across concurrent inserts + searches in memtable
commit 51f4f419d31916a81088f171b158f1e002b5c800
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 12:25:39 2023 -0500
(take 2) fix race conditions across concurrent inserts + searches in memtable
commit 94097a1f7cf1831c03e52b1f539bdc2e42fa038b
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 12:35:32 2023 -0500
Revert "fix race conditions across concurrent inserts + searches in memtable"
This reverts commit c4d41b492190fb2644f83bf4902acf71d1e4f891.
commit c4d41b492190fb2644f83bf4902acf71d1e4f891
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 12:25:39 2023 -0500
fix race conditions across concurrent inserts + searches in memtable
commit 22143a8c5db160ef0fa7687ff3e8cb227de160f0
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 10:28:30 2023 -0500
fix AOOB in bruteForceRows logic
commit b5624f62c88378e03260399440edf4d335ccc3af
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 10:12:53 2023 -0500
call deserializeFloatArray (which gives float[]) instead of deserialize (which gives List<Float>)
commit 13ad40718737b54652c875d81ec20992b5828777
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 08:36:32 2023 -0500
create CassandraHnswGraphBuilder with concurrent and serial implementations, so that single-threaded compaction doesn't have to pay the concurrency overhead
commit b83f4e4007e3d597963023b8ae32f4d0934b792d
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 08:00:44 2023 -0500
add ASL header to injections.md to make CI happy
commit 16074336b1322d4bade40eb71562ca50221399dc
Merge: 227c6c13a3 1ff6992354
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 07:48:04 2023 -0500
Merge branch 'VECTOR-37' into vsearch
commit 227c6c13a3cfe42484fc560d629d54a93f8265e0
Author: Jaroslaw Grabowski <jaroslaw.grabowski@datastax.com>
Date: Tue Jun 6 13:02:37 2023 +0200
CNDB-7007 return expired tables level from getLevels
commit 312d07de8d297cdede39017354b678f2fb1b1006
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 07:31:26 2023 -0500
randomize our brute force threshold, which will get the actual index scans exercised more
commit 0b897961869965629893c94250b7d4b1fb7c0f86
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Tue Jun 6 13:44:01 2023 +0800
Reference ANN sstable indexes in case of ann hybrid search (#653)
* Reference ANN sstable indexes in case of ann hybrid search
* simplify
---------
Co-authored-by: Jonathan Ellis <jbellis@datastax.com>
commit 1ff6992354e8411f9f7c48a76f0aadb4a00f2789
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 5 21:55:14 2023 -0500
per-query hnsw metrics
commit d3d6ae107737020eb0ca3b00c117d5bc84027dc0
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 5 21:16:14 2023 -0500
reduce test size to prevent Jenkins OOM
commit 7765ac73fd29c9664b6f8e946c067d6f1b0a9a03
Merge: 5a04dbcf67 c5cd09cebf
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 5 21:15:44 2023 -0500
Merge branch 'vsearch' of github.com:datastax/cassandra into vsearch
commit c5cd09cebf0c63f7bafcd2f4548395efe68b12cd
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Tue Jun 6 09:38:42 2023 +0800
Skip warning log about receiving a range that is not owned by the current replica, see VECTOR-30
commit 5a04dbcf6738d7681732e28518d6eb37f74357b8
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 5 11:21:15 2023 -0500
add injections.md from bdp repo
commit baf9cce301a39f9506bd60c5a75d2011150f2fea
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 5 10:02:02 2023 -0500
VECTOR-36 fix looking up indexes and SSTableContext by SSTableReader, because the SSTR will be removed from internal structures once it's compacted
commit e7fdc8454e750e2ee886c4ab27fb09abb46303fa
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 5 09:53:51 2023 -0500
limitToTopResults should skip rows that are not in the current segment (or more precisely, have no vectors that were indexed)
commit 2f64695a90651d9e76eee0a3399032ea241c983c
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Mon Jun 5 14:55:20 2023 +0800
Vector 6 take 2: reapply Vector-6 commit and fix NPE in SelectStatement (#648)
* Revert "Revert VECTOR-6"
This reverts commit 63a6ff609a02e4f7d9f5499f1af48ca5e58f63a8.
* Vector-6 take 2:
- fix NPE in SelectStatement by skipping reversed() for null ColumnComparator
- fix getOrderingColumns() to return LinkedHashMap to preserve ordering columns' order
- fix VectorLocalTest compilation
- remove debug log in CassandraOnHeapHnsw
commit 9da51b315d207417de4f3e005616e275c62c74cb
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Mon Jun 5 12:38:21 2023 +0800
fix SAI test failures in vsearch branch (#649)
- BatchlogEndpointFilterTest
- rangeRestrictedTest
- SegmentFlushTest
- SegmentMergerTest
- OperationTest
- RangeIntersectionIteratorTest
commit dc53d7db5f0f8fe1c780f6d2b7580794f7b6b5fb
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 3 13:35:11 2023 -0500
create views for ordinals map so we don't have to open a new Reader for each method call
commit c6385022ff2949c23e0dd0161a043cf4061f050c
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 3 08:19:32 2023 -0500
add assert sstableContext != null
commit f340e7590db578b71201326e82006f72dd491047
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 3 07:37:36 2023 -0500
on-disk Searcher bits should not need to be growable
commit fd5cc7f989a47d03689dd29495b974697ad1338b
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 3 07:37:18 2023 -0500
add asserts
commit 6436e5f4ad49881a62c1442692edfedfe31968dc
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 3 07:21:50 2023 -0500
cleanup
commit 32c8e005740d3c7ec0b1008091728e12260aecdd
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 2 13:57:18 2023 -0500
comment
commit 091d96832ff9a68f6ea7857eafd2ec7ee0c09a0b
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 2 13:32:24 2023 -0500
r/m unused search method returning iterators over PrimaryKey
commit 4e78a4b429b0ba6690871df9a91c8ffd8d2e53fa
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 2 11:25:30 2023 -0500
we can use the slightly more lightweight RangeConcatIterator instead of RangeUnionIterator when combining results from different Segments
commit e7a9186bf3426ad025e809481e47db85a7bf7190
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 2 11:22:35 2023 -0500
r/m obsolete FIXME
commit c76ee9ea0cab5b70c535de3f50b81cf333bf94a2
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 3 07:12:43 2023 -0500
add testAppendedGraphs
commit d00bdbfb6acf9f08d2bdfa2ef06e833e567cf9eb
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 3 06:57:37 2023 -0500
move write-to-File to test class
commit b10fd1100daf4cde0291b3b4ef06f5ceb3ca650c
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 3 06:46:01 2023 -0500
fix Ref leaks in test
commit 0656e558bb51172df4e0cde1ebe92946d4dcec8b
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 3 06:26:05 2023 -0500
fix tests to use View
commit 92797ced03ed26f9a5cce398504d6f0de519b899
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Sat Jun 3 19:19:38 2023 +0800
VECTOR-35: fix vector on-disk writer to append segments (#647)
commit 22f5c01860367586d0cc39def7d975a6d6bae784
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 2 18:54:41 2023 -0500
add "Vector indexes only support ANN queries" check
commit bfc0b056dbacd0a1eca86020e901f545dca7670e
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 2 17:34:24 2023 -0500
split invalid requests into separate test class
commit f66768c35920101382c1975c9d0ba0e3301b7c61
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 2 17:28:56 2023 -0500
add specific error message when trying to do ANN without an index
commit 63a6ff609a02e4f7d9f5499f1af48ca5e58f63a8
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 2 12:31:22 2023 -0500
Revert VECTOR-6
This reverts commits:
a37009c187edeba68389d239dc1b9f40519b1187
5565690fbe1056d4c159ddbe233fa22c7695320a
e7733bb8f858a16b082b8a5c64d0322db6f6271a
commit 3d496662b259470b505d88212074c436660c27ad
Merge: fab67bb134 4bdae7e362
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 2 11:05:53 2023 -0500
Merge branch 'vsearch' of github.com:datastax/cassandra into vsearch
commit 4bdae7e36213f5112efce085696dd24aaa0adfad
Author: Mike Adamson <madamson@datastax.com>
Date: Fri Jun 2 14:36:01 2023 +0100
Stabilise random tests using word2vec model vectors
commit 2ddc3ec6b11f6810f6c4cbaf6b62bfe46d2d1b6c
Merge: 8e04280312 1f9179002a
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Fri Jun 2 10:05:15 2023 -0500
Merge remote-tracking branch 'datastax/ds-trunk' into vsearch
commit 8e04280312583a18085e7e7b9d31810790b039f4
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Fri Jun 2 09:51:06 2023 -0500
Revert this after CNDB-6974 is fixed.
commit fab67bb13499691f813c893cd39a3bbd406653d2
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 2 08:59:12 2023 -0500
vector cache defaulting to 1MB per segment
commit 67a2b7eba3c957a993726d37706643979e92a3a3
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Thu Jun 1 17:49:35 2023 -0500
Revert this after CNDB-6974 is fixed.
commit 4538956b31d8c40d2e9b603b3b0a3392343e1853
Merge: 7518680a4a af4c83aef4
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 1 16:17:14 2023 -0500
Merge branch 'ds-trunk' into vsearch
commit 7518680a4a17cb2589cf06a9175befc10c9eab1a
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 1 16:14:33 2023 -0500
re-use float[] across calls to vectorValue to avoid allocation overhead (credit to Jake for the idea)
commit d20bd8aedeb6232358dd16f901e2708f217b8108
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 1 15:35:17 2023 -0500
optimize vectorValue() with direct access to the mmap-ed region
commit 24572025af5b7893279b5b03c58555b682f3abdb
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 1 14:33:55 2023 -0500
decomposeVector does not modify the underlying buffer, so no need to duplicate() here
commit efef35d47a24222fae300d1cbd5a801e9ed79442
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 1 14:29:30 2023 -0500
specialize deserializeFloatArray for ByteBuffer and FloatBuffer. this saves about 3% total CPU on search workloads
commit 63c27f5550d6c3e22d960b9a42b9097029e3b760
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 1 11:04:23 2023 -0500
default target size of 5GB
commit 69c55c8d07c6e1d58b4829c153f5ebf8a4343669
Author: Jonathan Ellis <jbellis@gmail.com>
Date: Thu Jun 1 07:20:38 2023 -0700
make the on-disk hnsw code threadsafe using FileHandle.createReader (#641)
commit b194cae2e56133b946d0231768254064a47f01b6
Merge: 7ee6422084 23c2891e7a
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 1 06:22:17 2023 -0500
Merge branch 'ds-trunk' into vsearch
commit 7ee642208485c9e2ae62aa662dbbddc7b295bae1
Merge: c286ec0ee2 a37009c187
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 31 20:41:48 2023 -0500
Merge branch 'VECTOR-6' into vsearch
commit c286ec0ee231ededded18e81357edebe72064c59
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 31 20:40:21 2023 -0500
add testLargeGraph
commit a37009c187edeba68389d239dc1b9f40519b1187
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Thu Jun 1 08:50:07 2023 +0800
cleanup unused code
commit 6246dbe3a970a0f4672701632003cdf576705c24
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 31 16:43:42 2023 -0500
comment
commit c7420440c7d3c81eac96e716148d570ae3cceb1d
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 31 16:43:37 2023 -0500
encapsulate shadowedPrimaryKey better
commit a1a25c33b8bec9d78354dd2c1d992444db2bf76f
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 31 15:57:52 2023 -0500
make hnsw cache size configurable, and make the default 128KB (roughly the size of the bloom filter for a 1GB sstable)
commit 55747d2351e915645697fa8dec404c749540456f
Author: Andrés de la Peña <a.penya.garcia@gmail.com>
Date: Wed May 31 19:28:47 2023 +0100
Replace FunctionParameter.sameAsFirst by FunctionParameter.sameAs, allowing type inference in both directions for vector functions
commit 5565690fbe1056d4c159ddbe233fa22c7695320a
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 31 12:59:42 2023 -0500
cleanup and fix
commit e7733bb8f858a16b082b8a5c64d0322db6f6271a
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Wed May 24 16:01:26 2023 +0800
VECTOR-6: return vector results to client in ANN order instead of token order
commit 771067d1475c94484da178236928d2bad78e00b1
Author: Mike Adamson <madamson@datastax.com>
Date: Wed May 31 18:33:55 2023 +0100
Fix annOrderingMustHaveLimit test
commit 8dd76a73541e70da0da022dd463e6711a315f971
Author: Mike Adamson <madamson@datastax.com>
Date: Wed May 31 18:30:49 2023 +0100
Improve the validation of ORDER BY <column> ANN OF (#639)
* Improve the validation of ORDER BY <column> ANN OF
* Change to hasNonClusteredOrdering and improve limit message
commit 1ef6f2e3418db19af23f0eae24cd344d72290455
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri May 26 18:34:25 2023 -0500
add vector similarity functions
commit f4e3a39a8ece623647ba7891981e7c5ee10ed96b
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 31 09:39:45 2023 -0500
pull in the smallest set of changes possible from 93e0ae9a to get FunctionParameter.sameAsFirst
commit 0ef2614346c9426a966004df220221f74353de70
Author: Jonathan Ellis <jbellis@gmail.com>
Date: Wed May 31 06:22:41 2023 -0700
Add support for updates + deletes (#636)
commit 5da9fefe1e18f733bb8ffd0a35ef2e1ac3cbd975
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue May 30 16:32:02 2023 -0500
rename SegmentOrdering.reorderOneComponent to limitToTopResults to align with MemtableOrdering
commit caf6eb2154ddf57c88d69c6246404b65e183057d
Author: Mike Adamson <madamson@datastax.com>
Date: Tue May 30 22:41:29 2023 +0100
Fix V1SearchableIndex.reorderOneComponent to use segments (#635)
commit 72cfa94e9389d0078d64aee60d26b8b01c6ca22f
Author: Piotr Kołaczkowski <pkolaczk@datastax.com>
Date: Mon May 29 20:17:44 2023 +0200
VECTOR-14: Move ANN OF expressions from WHERE to ORDER BY
Before:
SELEC…
michaelsembwever
pushed a commit
to thelastpickle/cassandra
that referenced
this pull request
Jan 7, 2026
michaelsembwever
pushed a commit
to thelastpickle/cassandra
that referenced
this pull request
Jan 7, 2026
commit f9e589098e89417cd41ef627b3e1c7371986e3e1
Merge: b0dbc8bd57 d32eed9d68
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Mon Sep 18 11:28:48 2023 -0500
Merge remote-tracking branch 'datastax/vsearch' into cndb-7632-vsearch-rebase-20230915
commit b0dbc8bd57fde3dd473d4aae85a3063f2156106d
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Mon Sep 18 10:12:30 2023 -0500
Update expected string for error message change.
commit f2cafdac9c3e5bf838c10e0078895d56e3271370
Author: qannap <130002578+qannap@users.noreply.github.com>
Date: Thu Sep 14 08:41:07 2023 -0700
Optimize partition-aware queries to use bloom filter (#729)
Add bloom filter field in QueryViewBuilder. Here is the link to the issue: [Vector-72](https://datastax.jira.com/browse/VECTOR-72).
commit 9765b23b5b748286391b8e374bab24a31cb5d934
Author: Jonathan Ellis <jbellis@gmail.com>
Date: Thu Sep 14 09:47:19 2023 -0500
Vsearch 18 (#741): optimize MergePostingList.advance for the case where it is called repeatedly
commit 66c00e094942de72f2f1fdb780ac00239e0aa284
Author: Mike Adamson <madamson@datastax.com>
Date: Thu Sep 14 14:33:41 2023 +0100
Remove smile-nlp and add internal Glove implementation for testing (#743)
commit b13098e0cd5a9f961066c0059953b525f5bfa787
Author: Jakub Żytka <jakub.zytka@gmail.com>
Date: Thu Sep 14 13:28:01 2023 +0200
CNDB-7363: extend read tracking API to notify about post-filtered, reconciliated partitions and rows (#692)
* CNDB-7363: extend read tracking API to notify about post-filtered, reconciliated partitions and rows
* RowFilter::getExpressionsPreOrder will now return all row filter's expressions in pre-order
commit 2bca65cfe4ba864a8a6719e0e28ddf76e83c17e0
Author: Michael Marshall <michael.marshall@datastax.com>
Date: Sun Sep 10 01:01:47 2023 -0500
Fix JsonTest and AnalyzerViewTest errors (#730)
commit a0dde9f89e02ef40d9aa87514eb7b5becfdce8e8
Author: Michael Marshall <michael.marshall@datastax.com>
Date: Fri Sep 8 19:34:09 2023 -0500
Reject SAI creation for invalid combinations of options (#736)
commit abd5399f9c1b6feb0f5ba71b279ff5b76d0f92ee
Author: Jonathan Ellis <jbellis@gmail.com>
Date: Fri Sep 8 11:52:53 2023 -0500
Fix exceptions when executing complex queries (#735)
* fix ReorderingRangeIterator.performSkipTo
* fix NPE
commit acc39a8281cf9163057c00ec9413219b210cc262
Author: Michael Marshall <michael.marshall@datastax.com>
Date: Fri Sep 8 11:04:58 2023 -0500
Analyzer cleanup: add comments and fix docs (#733)
commit 4989dff0f150bb65535fa598e53290bca1f3ce0c
Author: Michael Marshall <michael.marshall@datastax.com>
Date: Thu Sep 7 08:51:09 2023 -0500
Fix RowAwarePrimaryKey#hashCode for deferred keys (#725)
* Fix RowAwarePrimaryKey#hashCode for deferred keys
* Use recommendation from code review
commit 2268c1c806632d62cf1a29b815740207a0e2939c
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Sep 6 13:38:39 2023 -0500
optimize MergePostingList.advance -- we don't care what order we evaluate the merged lists in, since we're going to rebuild the PQ anyway, so use a fast iterator instead of a slow poll() loop
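The optimization described here can be sketched roughly as below. This is not the real `MergePostingList`: `MergedCursors` and `Cursor` are hypothetical, and posting lists are modeled as sorted `int` arrays. The point is that, because the heap is rebuilt after every sub-list has been advanced, the sub-lists can be visited by plain iteration in arbitrary order rather than by repeatedly calling `poll()`.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical merge over several sorted posting lists.
final class MergedCursors
{
    // Hypothetical cursor over one sorted posting list.
    static final class Cursor
    {
        final int[] ids;
        int pos;

        Cursor(int... ids)
        {
            this.ids = ids;
        }

        boolean exhausted()
        {
            return pos >= ids.length;
        }

        int current()
        {
            return ids[pos];
        }

        void advanceTo(int target)
        {
            while (!exhausted() && current() < target)
                pos++;
        }
    }

    private final PriorityQueue<Cursor> queue = new PriorityQueue<>(Comparator.comparingInt(Cursor::current));

    MergedCursors(List<Cursor> cursors)
    {
        for (Cursor c : cursors)
            if (!c.exhausted())
                queue.add(c);
    }

    // Returns the smallest id >= target across all lists, or -1 when every list is exhausted.
    int advance(int target)
    {
        List<Cursor> live = new ArrayList<>(queue.size());
        // Plain iteration visits heap entries in arbitrary order, which is fine because the heap is rebuilt below.
        for (Cursor c : queue)
        {
            c.advanceTo(target);
            if (!c.exhausted())
                live.add(c);
        }
        queue.clear();
        queue.addAll(live);
        return queue.isEmpty() ? -1 : queue.peek().current();
    }
}
```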
commit 6bb72c3272fbc8d12427fb0d9f00916b6f6ee9c0
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Sep 6 13:22:07 2023 -0500
PKM is not threadsafe, need to allocate a new one for every request
commit f1d7c66b54f7b95e12698d7a6ed00a1e8ac958d6
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Sep 6 13:10:11 2023 -0500
r/m slightly gratuitous IOException from PKM.Factory signature
commit 73e96df69a1861d5a8bda04ab69252a480a78c62
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Sep 6 11:35:40 2023 -0500
fix updateTestWithPredicate by removing broken assert (did not account for deleted rows)
commit 36327a93b8c1651acd7af1d8b687dd92e08021e3
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Sep 6 11:34:44 2023 -0500
add failing updateTestWithPredicate
commit 80eb443b65637dc1782cb3a745d4d56431f787d8
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Sep 6 11:05:32 2023 -0500
comment upsertTest
commit 3f5fbc72547aa790989d3c7bce8b92fd4bb2fb2c
Author: Marianne Manaog <91789965+marianne-manaog@users.noreply.github.com>
Date: Wed Sep 6 19:42:15 2023 +0100
Feature/Optimize SAI.getPostQueryOrdering as SAI.postQuerySort (#710)
* Optimize sorting in postQuerySort method
* Remove need for prepareFor method following sort optimization in postQuerySort
* Add testOrderResults
* Simplify creation of resultSet in the test testOrderResults and keep rows as final in ResultSet.java
* Apply DSU pattern to SAI.postQuerySort method and leverage it in SelectStatement.orderResults method
* Use Pair.create instead of new Pair
* Put back comment on ANN support only
* Optimise undecorate step by using List<ByteBuffer> instead of ByteBuffer in the Pair
* Add licence to StorageAttachedIndexTest.java
* Move StorageAttachedIndexTest.java up a directory
* Simplify map to create listPairsVectorsScores
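The DSU (decorate-sort-undecorate) pattern mentioned in the list above can be illustrated with a short sketch. It is only an assumption of how such a sort might look, not the code in `postQuerySort`: `DsuSort`, `Scored`, and the dot-product score are hypothetical. Each row's score is computed exactly once (decorate), the decorated pairs are sorted, and the scores are then dropped (undecorate), which avoids recomputing the similarity inside the comparator.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical decorate-sort-undecorate over vector rows.
final class DsuSort
{
    // A row paired with its precomputed score.
    private static final class Scored
    {
        final float[] row;
        final float score;

        Scored(float[] row, float score)
        {
            this.row = row;
            this.score = score;
        }
    }

    // Dot product is used purely as an example similarity function.
    static float score(float[] row, float[] query)
    {
        float sum = 0f;
        for (int i = 0; i < row.length; i++)
            sum += row[i] * query[i];
        return sum;
    }

    static List<float[]> sortByScore(List<float[]> rows, float[] query)
    {
        // Decorate: compute each score exactly once.
        List<Scored> decorated = new ArrayList<>(rows.size());
        for (float[] row : rows)
            decorated.add(new Scored(row, score(row, query)));
        // Sort: highest score first.
        decorated.sort((a, b) -> Float.compare(b.score, a.score));
        // Undecorate: drop the scores and keep the rows in sorted order.
        List<float[]> sorted = new ArrayList<>(rows.size());
        for (Scored s : decorated)
            sorted.add(s.row);
        return sorted;
    }
}
```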
commit 23555ab43101cd6010fe9cc320e7429779faa678
Author: qannap <130002578+qannap@users.noreply.github.com>
Date: Fri Sep 1 15:29:22 2023 -0700
Add row count field to columnFamilyStore (#713)
* Add row count field to columnFamilyStore
* Add test to row count field
* delete used import
* delete used function and separate test
* Add row count field to columnFamilyStore
* Add test to row count field
* delete used import
* delete used function and separate test
* resolve version
* return to updated version
* restore other tests as vsearch branch
---------
Co-authored-by: Michael Marshall <michael.marshall@datastax.com>
commit 1eae389fadecfc6637207933fd6ae1739355dc00
Author: Michael Marshall <michael.marshall@datastax.com>
Date: Thu Aug 31 17:59:53 2023 -0500
Fix NPE in LuceneAnalyzer#end (#721)
When testing the analyzer with an integration test, I discovered an NPE from a part of the code that wasn't tested.
Here is the code:
https://github.com/datastax/cassandra/blob/9cabec504d9f70ddfb345122dc4defed56a40f85/src/java/org/apache/cassandra/index/sai/plan/StorageAttachedIndexSearcher.java#L91-L100
We build the Analyzer to use one of its methods, but not to analyze any text, hence the NPE.
Here is the NPE:
```
ERROR [Native-Transport-Requests-1] 2023-08-31 14:35:38,695 QueryMessage.java:121 - Unexpected error during query
java.lang.NullPointerException: null
at org.apache.cassandra.index.sai.analyzer.LuceneAnalyzer.end(LuceneAnalyzer.java:110)
at org.apache.cassandra.index.sai.plan.StorageAttachedIndexSearcher.filterReplicaFilteringProtection(StorageAttachedIndexSearcher.java:99)
at org.apache.cassandra.service.reads.DataResolver.lambda$preCountFilterForReplicaFilteringProtection$6(DataResolver.java:300)
at org.apache.cassandra.service.reads.DataResolver.resolveInternal(DataResolver.java:341)
at org.apache.cassandra.service.reads.DataResolver.resolveWithReadRepair(DataResolver.java:244)
at org.apache.cassandra.service.reads.DataResolver.resolveWithReplicaFilteringProtection(DataResolver.java:284)
at org.apache.cassandra.service.reads.DataResolver.resolve(DataResolver.java:130)
at org.apache.cassandra.service.reads.range.SingleRangeResponse.waitForResponse(SingleRangeResponse.java:59)
at org.apache.cassandra.service.reads.range.SingleRangeResponse.computeNext(SingleRangeResponse.java:65)
at org.apache.cassandra.service.reads.range.SingleRangeResponse.computeNext(SingleRangeResponse.java:31)
at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47)
at org.apache.cassandra.db.transform.BasePartitions.hasNext(BasePartitions.java:93)
at org.apache.cassandra.service.reads.range.RangeCommandIterator.computeNext(RangeCommandIterator.java:123)
at org.apache.cassandra.service.reads.range.RangeCommandIterator.computeNext(RangeCommandIterator.java:45)
at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47)
at org.apache.cassandra.db.transform.BasePartitions.hasNext(BasePartitions.java:93)
at org.apache.cassandra.db.partitions.PartitionIterators$1.hasNext(PartitionIterators.java:154)
at org.apache.cassandra.db.transform.BasePartitions.hasNext(BasePartitions.java:93)
at org.apache.cassandra.cql3.statements.SelectStatement.process(SelectStatement.java:892)
at org.apache.cassandra.cql3.statements.SelectStatement.processResults(SelectStatement.java:518)
at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:495)
at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:341)
at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:102)
at org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:290)
at org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:394)
at org.apache.cassandra.transport.messages.QueryMessage.execute(QueryMessage.java:108)
at org.apache.cassandra.transport.Message$Request.execute(Message.java:242)
at org.apache.cassandra.transport.Dispatcher.processRequest(Dispatcher.java:111)
at org.apache.cassandra.transport.Dispatcher.processRequest(Dispatcher.java:131)
at org.apache.cassandra.transport.Dispatcher.lambda$dispatch$0(Dispatcher.java:78)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:165)
at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:137)
at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:119)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.base/java.lang.Thread.run(Thread.java:829)
```
I am still new to query execution, but I am guessing the unit tests skip this filtering code. I added an integration test that fails without the fix.
commit ea97cc414ef3e68ad30e8c666ecf04b719352901
Author: Michael Marshall <michael.marshall@datastax.com>
Date: Wed Aug 30 09:10:14 2023 -0500
Fix failing KDTreeIndexSearcherTest (#717)
#680 included a change that makes 5 of the KDTreeIndexSearcherTest tests fail. This PR essentially reverts to the old behavior, and all of the failing tests pass.
commit cee55b1b2139627b331d9f7225c522ca5bee299b
Author: Michael Marshall <michael.marshall@datastax.com>
Date: Tue Aug 29 21:56:24 2023 -0500
Remove ability to configure unique query_analyzer (#712)
commit c248f905cc13833f9857b9d750a60091afc5c38a
Author: Michael Marshall <michael.marshall@datastax.com>
Date: Tue Aug 29 21:55:18 2023 -0500
Fix failing distributed SAI tests using : operator (#715)
commit 367e84adf0663e8baab3f9195f3eb81412640b8c
Author: Michael Marshall <michael.marshall@datastax.com>
Date: Tue Aug 29 21:55:03 2023 -0500
Test compound predicate queries for : operator (#716)
commit 0ea0d95f52ac18522a0cb5a09c1c9ee00069380a
Author: Michael Marshall <michael.marshall@datastax.com>
Date: Tue Aug 29 11:10:02 2023 -0500
Refactor SAI analyzer configuration; add built in analyzers (#711)
The current `OPTIONS` configuration for the `index_analyzer` does not correctly model the available configuration options. In order to make the configuration more directly match the single tokenizer, and the lists of filters and charFilters, I propose we update the expected JSON schema. Further, we add some default analyzers to simplify the configuration of generic analyzers.
commit 67ef0f3794cbd2b9ff41b632205dd087a992e674
Author: Michael Marshall <michael.marshall@datastax.com>
Date: Mon Aug 28 11:58:30 2023 -0500
Return more specific exception on incorrect usage of : operator (#705)
* When the `:` operator is used for indexes that do not support it, the current error is a recommendation to `ALLOW FILTERING` or an `AssertionError`. Instead, we should match the `LIKE` behavior and reject queries that attempt to use the `:` operator on columns that are not properly indexed. The `LIKE` conditional is slightly modified to simplify the logic for rejecting unsupported queries.
* When the `:` operator is used for `DELETE` and `UPDATE` queries, return a helpful error message.
* Update the `toString` method on the `ANALYZER_MATCHES` enum. The `toString` method is used to generate error messages, and it is confusing to see an error message with `: '<term>'` when it really should just be `:`.
commit c26cbfa2298d422813e02b439f4db6494bb64a84
Author: Jakub Żytka <jakub.zytka@gmail.com>
Date: Mon Aug 28 14:12:15 2023 +0200
Additional testcases for read query tracking with different data models (#698)
* Additional testcases for read query tracking with different data models
commit bf143a2b3fc97979e5e083f8eac7aca017cd618d
Author: Michael Marshall <michael.marshall@datastax.com>
Date: Thu Aug 24 13:28:47 2023 -0500
Ignore default analyzer settings when classifying an SAI as analyzed (#706)
### Problem
When creating an SAI, the `AbstractAnalyzer` creates an analyzer in cases where it should not because the logic is too naive.
### Details
Users can create SAI indexes with a `NonTokenizingAnalyzer` using the following queries where `?` can be replaced with `true` or `false`:
```
CREATE CUSTOM INDEX ON table(some-column) USING 'StorageAttachedIndex' WITH OPTIONS = { 'normalize' : ? }
CREATE CUSTOM INDEX ON table(some-column) USING 'StorageAttachedIndex' WITH OPTIONS = { 'case_sensitive' : ? }
CREATE CUSTOM INDEX ON table(some-column) USING 'StorageAttachedIndex' WITH OPTIONS = { 'ascii' : ? }
```
The options are known as "non tokenizing analyzers" because they "analyze" values but do not "tokenize" them. Practically speaking, these analyzers are projections where each input has one and only one output. Lucene calls projections "filters".
The `StorageAttachedIndex` default is for normalize = false, case_sensitive = true, and ascii = false.
When a default value is passed for one of the options, the `AbstractAnalyzer` incorrectly classifies the index as "analyzed". Instead, the `AbstractAnalyzer` should return the `NoOpAnalyzer` and the index should not be classified as "analyzed".
### Solution
* Only construct the `NonTokenizingAnalyzer` when `normalize`, `case_sensitive`, or `ascii` are configured with non-default values.
* Do not consider an SAI as analyzed if `normalize`, `case_sensitive`, or `ascii` are configured with default values.
* Updated some tests to use `=` on the non-analyzed SAI
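The classification rule described above can be sketched as follows. This is a minimal illustration, not the code from the patch: the class `AnalyzerOptionsSketch` and the method `isAnalyzed` are hypothetical names, and the only facts taken from the commit message are the three option names and their defaults (normalize = false, case_sensitive = true, ascii = false).

```java
import java.util.Map;

// Hypothetical helper, not part of Cassandra: decides whether index options
// should cause an index to be treated as "analyzed".
final class AnalyzerOptionsSketch {
    // Defaults as stated in the commit message.
    private static final Map<String, Boolean> DEFAULTS = Map.of(
        "normalize", false,
        "case_sensitive", true,
        "ascii", false);

    // Returns true only if at least one of the three options is present
    // and set to a value that differs from its default.
    static boolean isAnalyzed(Map<String, String> options) {
        for (Map.Entry<String, Boolean> entry : DEFAULTS.entrySet()) {
            String raw = options.get(entry.getKey());
            if (raw != null && Boolean.parseBoolean(raw) != entry.getValue())
                return true;
        }
        return false;
    }
}
```

Under this sketch, `{ 'normalize' : false }` leaves the index non-analyzed, while `{ 'case_sensitive' : false }` makes it analyzed.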
commit 7a6c4755f150ad647aae2bc806d8b91c423db14b
Author: Michael Marshall <michael.marshall@datastax.com>
Date: Tue Aug 22 15:57:08 2023 -0500
Add ANALYZER_MATCHES operation; Disallow = on analyzed SAI columns (#702)
* Adds the `:` operator, a.k.a. the ANALYZER_MATCHES operator. This operator only works on SAI indexes that are analyzed, where "analyzed" means the values stored in the index are derived from the raw value in the column. The new operator uses the enum value `100` to prevent future collisions with the upstream CQL implementation.
* Disambiguate the `=` operator so that it cannot perform matches on a column whose `SAI` index is analyzed. The primary motivation for this change is to prevent surprising matches. If a query is attempted using `=` where the target column only has analyzed indexes, the query must include `ALLOW FILTERING`.
* Updated many tests.
commit 77cb7f265e40600288fe90fffd74ef6e428f87fe
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Aug 18 10:48:54 2023 -0500
comment
commit 2dcdd9fd7464a6e26185bfa4ac3cd44b02574fb1
Author: Michael Marshall <michael.marshall@datastax.com>
Date: Thu Aug 17 16:55:51 2023 -0500
Fail SAI index creation when using analyzer on column in primary key (#699)
* add testStandardAnalyzer
* debugging wip
* Fail index creation when using analyzer on column in primary key
* Remove unnecessary debug logging
* cleanup
* Reject all attempts to add analyzer to index on pk columns
* Rename noop analyzer test and add explanation
---------
Co-authored-by: Jonathan Ellis <jbellis@datastax.com>
When the target column for an SAI is part of the primary key and is analyzed by the LuceneAnalyzer, queries do not return correct results. Until the underlying issue is resolved, do not allow these kinds of indexes to be created.
This change is covered by a test to verify index creation and search works correctly for the `NoopAnalyzer` and another to verify index creation is rejected for the `LuceneAnalyzer`.
commit 4f5585be198a949b0264eb64d0403e2adda9ab52
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Aug 17 17:41:18 2023 -0500
default vector cache size per segment bumped to 4MB
commit 9464bfe3e2bbd958f4f1d60e799019582a9836cb
Author: Marianne Lyne Manaog <marianne.manaog@datastax.com>
Date: Thu Jul 13 20:32:36 2023 +0100
Deserialize ByteBuffer into a query vector once by moving the deserialization up the call stack
commit bc2ef7ff721493b20b1bb76dd2ae37b43d365071
Author: Shaunak Das <ShaunakDas88@users.noreply.github.com>
Date: Wed Aug 16 13:27:12 2023 -0500
VECTOR-79: Simplify population of VectorCache (#695)
* use BFS at higher levels so we can avoid a separate cachedNodes set
* try to naively cache as much of level 0 as possible
* add VectorCacheTest
* cleanup and revert to intended behavior of not caching L0
---------
Co-authored-by: Jonathan Ellis <jbellis@datastax.com>
commit 1c58e431e55d8b327314ddb82b30bf8eced59269
Author: Shaunak Das <ShaunakDas88@users.noreply.github.com>
Date: Tue Aug 15 11:24:30 2023 -0500
Avoiding sorting node ordinals for level 0 (#697)
* avoiding sorting node ordinals for level 0
commit 16eeaf89a8ac39fee616f7a71a3f9a4b4170521e
Author: Jonathan Ellis <jbellis@gmail.com>
Date: Tue Aug 15 08:47:48 2023 -0500
extract OnDiskVectors to top level class so we can use it in debugging tools (#693)
commit 2271a230761a416fbd5c016d47684b388be93989
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Aug 9 09:59:45 2023 -0500
VECTOR-77 fix reading vectors past the first 2GB mmap region
commit 5705daa97f848454d8d29ab08234296906b3dbc4
Author: Andrés de la Peña <adelapena@users.noreply.github.com>
Date: Thu Aug 10 18:06:28 2023 +0100
VECTOR-76: Add back vector size guardrail (#694)
patch by Andrés de la Peña; reviewed by Brandon Williams and Maxwell Guo for CASSANDRA-18730
commit cb3bedccc255476300528183e8ba30a4e390f57f
Author: Jakub Żytka <jakub.zytka@gmail.com>
Date: Tue Aug 8 16:52:16 2023 +0200
CNDB-7390: fix static rows not being tracked by ReadTrackingTransformation (#691)
commit a8740fb8dae267fb8ddef1a7f5c77e948cd2a0ff
Author: Andrés de la Peña <adelapena@users.noreply.github.com>
Date: Fri Jul 28 15:42:42 2023 +0100
Add system property to reject non-float vectors, true by default (#689)
commit 5a4e439dc73b479d79bd427472d382a53b6fe680
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jul 21 13:10:44 2023 +0200
r/m failing emptyIndexTest until we can address it (VECTOR-59)
commit eec8b7b387cd041fbd54aef8f0090773962a9e49
Author: Piotr Kołaczkowski <pkolaczk@gmail.com>
Date: Tue Jul 18 17:36:43 2023 +0200
VECTOR-69: Fix LWT test failure caused by null key bounds (#686)
* VECTOR-69: Fix LWT test failure caused by null key bounds
---------
Co-authored-by: Jakub Zytka <jakub.zytka@datastax.com>
Co-authored-by: Jonathan Ellis <jbellis@datastax.com>
commit 033ae1a9567dfef88d0214dd34716f2468c5d4e3
Author: Andrés de la Peña <adelapena@users.noreply.github.com>
Date: Tue Jul 18 15:23:30 2023 +0100
VECTOR-68 Vector dimensions are not being validated correctly (#679)
commit e22ff6977f986a4148f931bad4b04645bf93d2e9
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jul 18 06:01:04 2023 +0200
set sai_hnsw_allow_custom_parameters=true for IndexWriterConfigTest
commit ccaba92656e78e63765da5c34f562f2c3b582bd0
Author: Mike Adamson <madamson@datastax.com>
Date: Tue Jul 11 16:12:31 2023 +0100
VECTOR-55: Review REVIEWME comments: (#680)
* VECTOR-55: Review REVIEWME comments:
- Refactor search methods to only return long variant
- Remove SSTableQueryContext in favour of QueryContext
- Add checking to SegmentMetadata.toSegmentRowId method
commit 338c8507de197570f4879019e0e1fc8e1f77025e
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Mon Jul 10 21:46:56 2023 -0500
Fix IndexError when querying a list/set/map column.
commit 1b4d6de7e0a051b95c0a9b62c8bc167e23ec65ed
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sun Jul 9 21:03:51 2023 +0200
add test that fails before decompose fix and passes after
commit 664911ac7de336d3fcffd90083e6a220069bd4e2
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sun Jul 9 13:44:28 2023 +0200
re-use TypeUtil.decomposeVector
commit e17493c95ac247ca1da7caa3ccd58be866958d6a
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sun Jul 9 13:28:56 2023 +0200
fix casting in deserialize
commit d169ff68f6576ce5b150818cb0a52a12abf6f2e0
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sun Jul 9 13:30:02 2023 +0200
fix NPE when using ReadExecutionController.empty
commit aa899f55f77df5b157e3e6afa930c377e4cacc57
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Fri Jul 7 10:12:28 2023 +0800
assert no single partition trace for SAI request
commit 277251e1aad15e2a5aedaaaae3170d574d224fe5
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 29 12:08:05 2023 -0500
- add Tracing.traceSinglePartitions to avoid cluttering range and index queries with noise
- update QueryViewBuilder to include a count of sstables per index accessed
commit 4682a417fdaaefb135e57b0ebe174c1ff49b7e94
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jul 6 12:13:20 2023 -0500
simplify
commit 52d19a1ef5605aebdb87385786dd323057267164
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Wed Aug 30 14:17:12 2023 -0500
Squashed commit of the following:
commit 4c4c1ff802c7667e1289a4e730cc34bf6b3bd007
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jul 5 12:03:22 2023 -0500
add test for creating index after data is added
commit 0acaae364c19ed7556b66b5576762ecc131af5fd
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Wed Jul 5 09:23:01 2023 -0500
Revert "enable segment compaction by default for vector indexes -- we should prioritize read performance"
This reverts commit a4fa0728337ebf236ad03f6ad708cf51c3dc08c5.
commit e3ddb6426de4e89b5fe61ed742b69257865c2d3c
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 28 10:54:49 2023 -0500
VECTOR-64 set sstable_growth to 0.5 by default
commit 3ca75046e6802ce5836a654459b01948c83d1d7b
Merge: a4fa072833 10a2a31ae8
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 28 10:54:18 2023 -0500
merge ds-trunk
commit a4fa0728337ebf236ad03f6ad708cf51c3dc08c5
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 28 10:24:41 2023 -0500
enable segment compaction by default for vector indexes -- we should prioritize read performance
commit 08153b38549d700505d064c25d7a2070c2ff07aa
Merge: 6a063b4f84 1a45fdc5f9
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 28 10:18:38 2023 -0500
Merge branch 'vsearch' of github.com:datastax/cassandra into vsearch
commit 6a063b4f8458f119d4b99bd6b78cbc8832f27f13
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 28 10:13:51 2023 -0500
VECTOR-63 add cassandra.sai.hnsw.allow_custom_parameters defaulting to false to control setting maximum_node_connections and construction_beam_width, which can cause memory usage to explode if used incorrectly
commit 1a45fdc5f92e9c82dba6e9e56a5e341806070b13
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Mon Jun 26 15:57:10 2023 -0500
Add the option to configure the file_cache_size_in_mb in a -D like it is in 6.8-cndb.
commit d2e45dab5b57f20fbdd6250d6dc186c83d783ed3
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 22 16:13:00 2023 -0500
VECTOR-62 don't flush a graph that only contains deleted vectors
commit 4a9afe205c20b1f863e89d4e02138557d225c3fc
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 22 16:11:11 2023 -0500
r/m obsolete comment
commit a925dc22696bcbebcad17a460441e8b84ff97660
Merge: 7c7d160dd9 2b22984fe1
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Thu Jun 22 09:03:08 2023 -0500
Merge remote-tracking branch 'datastax/ds-trunk' into vsearch
commit 7c7d160dd90624b0e365287ef0a2f0b7ef94da53
Merge: a14b387372 7a5e374cea
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 20 21:42:51 2023 -0500
Merge branch 'vsearch' of github.com:datastax/cassandra into vsearch
commit a14b387372ef3f220ece68b019acaafaf0d56f42
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 20 17:52:13 2023 -0500
move new options to jvm17-server.options
commit dc67582585a90628e026a693497793b7d8a298ed
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 20 17:47:27 2023 -0500
copy jvm11 options to jvm17
commit b6717410f925486a858c4c9414e417eed50950d7
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 20 17:30:48 2023 -0500
enable simd
commit 92b9e6e2f136254fb1f8e50732809ea529617f48
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 19 15:31:09 2023 -0500
make it run under jdk 20
commit 3e985953637819ac8504955552ce0feba29bdd40
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 20 17:13:56 2023 -0500
update to lucene with simd
commit 7a5e374cea785678670444121ad36be6170dd59b
Merge: 5e5ae4de82 e9ddd5f0d9
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Tue Jun 20 15:00:24 2023 -0500
Merge remote-tracking branch 'datastax/ds-trunk' into vsearch
commit 5e5ae4de824a7e838bbdfdbabc92e8b256911e3f
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 19 10:57:18 2023 -0500
we weren't calculating length correctly (supposed to match ordinals range) but it didn't matter b/c length() is not called on the search path where these classes are used. make this explicit by throwing in length implementation.
commit c0d4055a0bc6914d0220fbd4912d6355e9f6a1d9
Merge: f2c9a7cb59 c605f8c9f0
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 19 10:20:02 2023 -0500
Merge branch 'vsearch' of github.com:datastax/cassandra into vsearch
commit f2c9a7cb59b113a349992011b739079106274c1e
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 19 10:19:34 2023 -0500
VECTOR-61 remove deletedOrdinals collection that was not threadsafe wrt add and remove operations; create postingsByOrdinal map that allows Bits operations to look up the postings, instead
commit 9744a13b405cbeb1b68894bb08cc24796aedd8f7
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 19 09:13:24 2023 -0500
add LVT.testMultiplePostings (works fine)
commit c605f8c9f037e724544b21fe2aadcf0d8503d86f
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Sat Jun 17 12:44:36 2023 +0800
skip replica-filtering-protection for ANN requests
commit 71935bd07c9218ba70f1ffde7b6db1e88009e529
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 16 20:20:06 2023 -0500
forgot that maxBruteForceRows has to stay non-final for Byteman
commit b264afc07727305ce88ab2f18b8a2643d2510bd3
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 16 20:10:19 2023 -0500
VECTOR-54
- validateIndexable also checks for very large vectors (that explode to NaN when we take the cosine)
- check query vectors with the same criteria as vectors to index
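To illustrate why very large vectors (and zero vectors) are a problem for cosine similarity, here is a small self-contained sketch. It is not the `validateIndexable` implementation from the patch; `CosineNaNSketch`, `cosine`, and `isIndexable` are hypothetical names, and the check shown (reject a vector whose cosine with itself is not finite) is only one assumed way to detect the condition.

```java
// Hypothetical sketch, not Cassandra code: shows how float overflow turns
// a cosine computation into NaN.
final class CosineNaNSketch {
    // Plain cosine similarity with float accumulators.
    static float cosine(float[] a, float[] b) {
        float dot = 0f, normA = 0f, normB = 0f;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];     // overflows to Infinity for very large components
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        // Infinity / Infinity and 0 / 0 both evaluate to NaN.
        return (float) (dot / Math.sqrt((double) normA * normB));
    }

    // Assumed validation rule: a vector is usable only if its cosine with
    // itself is a finite number.
    static boolean isIndexable(float[] v) {
        return Float.isFinite(cosine(v, v));
    }
}
```

With these definitions, `{1f, 2f}` passes, while `{0f, 0f}` and `{1e20f, 1e20f}` are both rejected because the division produces NaN.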
commit 66c31a6dcfcbf81b7657ac9c4119803e026d150a
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 16 20:07:41 2023 -0500
if CFS.apply catches an InvalidRequestException, keep the same type when we re-throw so it gets back to the client as intended
commit e0bdfdbe421a1cca30746604cda9ef5b992002bb
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 16 17:33:37 2023 -0500
add [failing] emptyIndexTest
commit 75cba96f90730b32ea4a39375c72aaff4ffd6221
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 15 16:52:46 2023 -0500
re-use bitsets across searches
commit 2af2ce8a321f7fea268d81c393da7f8f6d886776
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 16 17:05:52 2023 -0500
more asserts that graph is in sane state when we write it
commit a63492686d13e12b799acdf50c6f0c20dd680f21
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 16 17:13:23 2023 -0500
update lucene
commit 4c6b43e3606b8c9eb5e4ec4cf0817dc9b061c555
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 16 16:24:33 2023 -0500
add debugging information when node is not found on level
commit 81c8e730bd4ca314a81bc13e30b197603a8eb005
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 16 11:57:24 2023 -0500
add similarityWithAnn test
commit 87c8da9a5dddd10215a84c07a09dbaaecaa522b8
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 14 18:26:02 2023 -0500
add tests to confirm that zero-length vectors are rejected
commit 2510c1b987d846821e7095fbc2c368fa51d9a15d
Merge: a3d9e33554 e86f91c568
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 12 17:09:22 2023 -0500
Merge branch 'vsearch' of github.com:datastax/cassandra into vsearch
commit a3d9e3355470f6400328cd7ac1eb07ff06f8e00b
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 12 17:08:43 2023 -0500
use ArrayList in CVV instead of HashMap since we know the keys are consecutive
commit e86f91c568f8a82e83e61bd0dd768478a7584cfc
Merge: 77c1bc11f3 8a7a6d9c4f
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Mon Jun 12 09:25:06 2023 -0500
Merge remote-tracking branch 'datastax/ds-trunk' into vsearch
commit 77c1bc11f31e4ff001c7eff1e6fe50812e136a45
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Mon Jun 12 09:24:56 2023 -0500
Revert "Revert this after CNDB-6974 is fixed."
This reverts commit 8e04280312583a18085e7e7b9d31810790b039f4.
commit cae30e9879c463d80de2a2b7e4a642979803c13a
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Mon Jun 12 09:23:39 2023 -0500
Revert "Revert this after CNDB-6974 is fixed."
This reverts commit 67a2b7eba3c957a993726d37706643979e92a3a3.
commit 46d1a45de84f40311f193beba3ca5fc3fe7450d7
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 12 09:03:03 2023 -0500
replace our TODO entries with VSTODO to make them easier to find
commit 415b3f29e8e7f1526e28655308a1937a1175e575
Merge: 2fd8b848be 481d29721a
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 12 09:02:16 2023 -0500
Merge branch 'vsearch' of github.com:datastax/cassandra into vsearch
commit 481d29721a7993e9babd44f792be55aca925d41a
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Mon Jun 12 21:26:43 2023 +0800
Fix VectorMemtableIndex to handle max token and fix Segment#intersects (#670)
* Fix VectorMemtableIndex to handle max token and min/max bound
* Fix Segment#intersects to compare bound instead of token and add tests for range search
* make brute force rows per query for VectorMemtableIndex
* apply feedback on Segment#intersects
* add comments to VectorMemtableIndex#search
* Fix SegmentTest
commit 2fd8b848be82e62999e64b57cb9800d03de8e953
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 12 07:51:18 2023 -0500
clean up a couple REVIEWME
commit 3c066e07be6188c3f48c51c02cfc99b395e2a6bc
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Sun Jun 11 14:26:42 2023 +0800
fix flaky VectorDistributedTest
commit 6c719a752e90e031916dddf6cc7b7db23627bc38
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 10 10:48:01 2023 -0500
fix use of bruteForceRows -- should be the larger of limit and maxBruteForceRows
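As a rough sketch of the threshold this message describes: the names `BruteForceSketch`, `bruteForceThreshold`, and `shouldSkipAnn` are hypothetical, and the `<=` comparison is an assumption, since the commit message only states that the threshold is the larger of the two values.

```java
// Hypothetical sketch, not Cassandra code.
final class BruteForceSketch {
    // The threshold is the larger of the query limit and the configured maximum.
    static int bruteForceThreshold(int limit, int maxBruteForceRows) {
        return Math.max(limit, maxBruteForceRows);
    }

    // Assumed rule: when few enough rows match, scan them directly
    // instead of running an ANN graph search.
    static boolean shouldSkipAnn(int matchingRows, int limit, int maxBruteForceRows) {
        return matchingRows <= bruteForceThreshold(limit, maxBruteForceRows);
    }
}
```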
commit 0a2279d7f010bcb68053bb14ea2974295b21d1c4
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 10 10:46:34 2023 -0500
use maxBruteForceRows when deciding whether to skip ANN
commit 1859f7355231ce7972e3d4b7af70e5d2967da516
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 10 10:27:03 2023 -0500
simplify partitionKeySearchTest using euclidean distance
commit 4f40e1c2d9b1c30675eeaee7d800a8113cce97c0
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 10 10:09:24 2023 -0500
typo
commit 7dbe85ac9a248aa6a0985839df1f40ceb2c08e6c
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 10 10:08:23 2023 -0500
rename methods that return Bits but had bitset in their names
commit 2861bfdba8f2638a1e3f9e81df9cad3878d0e544
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 10 10:06:51 2023 -0500
simplify skipANN
commit fbfe4bcdd03a66f4dedfa91ce9d06206f89827cd
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Sat Jun 10 16:36:56 2023 +0800
optimize VectorIndexSearcher#searchPosting
- return empty posting if key range is not found in current sstable
- return empty posting if all row ids are shadowed
- skip ANN if matching row ids are less than limit
commit 168911b70888a3dda0ddf4f615af1b7ff54f43e3
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Sat Jun 10 20:54:41 2023 +0800
fix data visibility issue during flush: make sure SSTableAddedNotification is sent before MemtableDiscardedNotification
commit c5dd47e3ce1c7276703fa3f89a9932a3b2cd43b7
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Sat Jun 10 12:31:53 2023 +0800
fix primaryKeySearchTest and partitionKeySearchTest to use correct selector and expected results
commit 2e636529c63d0d726868d6df9b3e11d15d3871d8
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Sun Jun 11 08:31:39 2023 +0800
Vector-48: index#update is not triggered by partition/range deletion,… (#665)
* Vector-48: index#update is not triggered by partition/range deletion, we have to fix VectorMemtableIndex to include shadowed primary keys
- during flush, append ordinals that are removed by partition/range deletion into deletedOrdinals set
* add comments to VP
* revert redundant variables
* simplify
---------
Co-authored-by: Jonathan Ellis <jbellis@datastax.com>
commit daa2623b880e4b82d99204fe718db22e52ae3b42
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 10 13:54:38 2023 -0500
use mmapped builder in OnDiskHnswGraphTest
commit 553778a22cfa8757daf3b3ec0e2cfa6fc7ba29cf
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 10 08:34:41 2023 -0500
fix logic in partitionKeySearchTest, test still fails
commit a7f5a2f824085691da80ef62e37a430e5637c31c
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 17:57:34 2023 -0500
ignore invalid vectors during build against existing data, instead of failing the build
commit 2af203646a2e57fca0a2521c31ba4269b0d8f326
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 17:43:47 2023 -0500
switch from ignoring zero vectors to throwing IRE
commit 4c98fbb447a275cbf557ae0adf8f4f8a30fba17f
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 17:15:57 2023 -0500
Revert "if there are no rows in the range to search, return EmptyPostingsList instead of giving the hnsw search an empty Bits (which is a pathological case for the search)"
This reverts commit 6e5eab787a3add985dc30eca82c5dee2646f5564.
commit da99598a5d11dfa4ecac3b1eeeb9d19a4bad21e5
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 17:09:30 2023 -0500
don't attempt to add zero vector to cosine indexes
commit a77792447d33600f69dd0a75a280a2b6dac51389
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 16:43:34 2023 -0500
add failing partitionKeySearchTest
commit b061dd1c77f30c67324da4f94ea8b5a22d50fba7
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 10:52:08 2023 -0500
inline the test ops
commit 6e5eab787a3add985dc30eca82c5dee2646f5564
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 10:51:09 2023 -0500
if there are no rows in the range to search, return EmptyPostingsList instead of giving the hnsw search an empty Bits (which is a pathological case for the search)
commit 518c055dfeabacf0fa0863e40bf814058f2ad981
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 10:28:57 2023 -0500
cleanup
commit e922b03ffe5436dc1341dbe20ee7aaca9901058f
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 10:27:53 2023 -0500
failing tests for primary key search
commit 428e4b713e25a5820d02ca790c30e9f1477c0f44
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 10:21:59 2023 -0500
cleanup
commit 5319bcdf3b694e861509b2a9f549f39bf51cd253
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 10:20:02 2023 -0500
move testInvalidColumnNameWithAnn to VectorInvalidQueryTest
commit d8b64845e59c0bd3658a16df21dcc75564417e3b
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 10:12:40 2023 -0500
upgrade lucene to reduce Integer boxing on build path
commit a62356e49c0a32f61732b8525d2bb655f46dd767
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 10:11:55 2023 -0500
cleanup
commit b754eb7f021d10d0ae8de83ebdf2ce2f19ce59d8
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 08:29:34 2023 -0500
replace CHM with NBHMLong to avoid boxing
commit 53d232ad2bcddefd72b10542f703cf36728c05f5
Author: Piotr Kołaczkowski <pkolaczk@datastax.com>
Date: Fri Jun 9 12:17:46 2023 +0200
STAR-550 Handle SAI AbortedOperationException
AbortedOperationException is thrown by SAI when index search
hits a timeout. Now instead of allowing this exception to
bubble up to the top and be eventually logged as error, we catch
and swallow it in the InboundSink after creating the query response.
Additionally, we now also set a proper error code (TIMEOUT)
in the response, so the client has a hint on what happened.
commit 5e0c5eb3c54b58002162ce3e39d0891643e1f663
Author: Matt Fleming <matt@codeblueprint.co.uk>
Date: Fri Jun 9 08:44:10 2023 +0100
CNDB-7037: Check that memtable exists before flushing and avoid IOOBE (#664)
For offline services such as the compactor it's possible to not have
live memtables. Account for this during flushing after removing indexes to
avoid triggering an IndexOutOfBoundsException:
Error happened while updating the schema
java.lang.IndexOutOfBoundsException: Index: -1
at java.base/java.util.Collections$EmptyList.get(Collections.java:4483)
at org.apache.cassandra.db.lifecycle.View.getCurrentMemtable(View.java:106)
at org.apache.cassandra.db.ColumnFamilyStore.forceFlush(ColumnFamilyStore.java:1115)
at org.apache.cassandra.db.SystemKeyspace.forceBlockingFlush(SystemKeyspace.java:552)
at org.apache.cassandra.db.SystemKeyspace.setIndexRemoved(SystemKeyspace.java:576)
at org.apache.cassandra.index.SecondaryIndexManager.markIndexRemoved(SecondaryIndexManager.java:837)
at org.apache.cassandra.index.SecondaryIndexManager.removeIndex(SecondaryIndexManager.java:420)
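The `Index: -1` in the trace above is consistent with taking the last element of an empty list. The following sketch reproduces that failure mode and shows one possible guard. It is illustrative only: `CurrentMemtableSketch`, `last`, and `lastOrNull` are hypothetical names, and the actual change in `View` or `ColumnFamilyStore` is not shown in this commit message.

```java
import java.util.List;

// Hypothetical sketch, not Cassandra code.
final class CurrentMemtableSketch {
    // Unguarded: throws IndexOutOfBoundsException (index -1) on an empty list.
    static <T> T last(List<T> liveMemtables) {
        return liveMemtables.get(liveMemtables.size() - 1);
    }

    // Guarded: returns null when there is no live memtable, so a caller
    // can skip the flush instead of failing.
    static <T> T lastOrNull(List<T> liveMemtables) {
        return liveMemtables.isEmpty() ? null : liveMemtables.get(liveMemtables.size() - 1);
    }
}
```

A caller such as an offline compactor could then check for `null` and skip the flush when no memtable exists.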
commit c241f7d41ee46c28f60d691611f32e071dac6684
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 8 19:19:09 2023 -0500
read neighbors using .intBuffer since we know that the HNSW searcher always reads all the neighbors
commit 32f021a4f0623e57d82d70c54350eab6f0f57db7
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 8 18:08:30 2023 -0500
VECTOR-51 leave vectors as ByteBuffers during compaction to avoid 2x memory usage incurred by keeping them around as float[] as well
commit 32d8cbbd19301228530d3d84f7a7e4120b4d4422
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 8 17:37:08 2023 -0500
r/m node cache from query metrics since there is nothing the operator can act on there
commit 890df8e6c36c5c9e1314dbcc0a681415401120dd
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 8 14:15:25 2023 -0500
add more information to exception when reading row offsets goes wrong
commit 5c7fe0cfed1edb10f81b134f8949dd1b5abb2781
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 8 13:26:17 2023 -0500
revise ordinals cache as follows:
// cache full levels including neighbors up to neighborsRamBudget, starting with the top level,
// but always cache all levels above the bottom two levels -- this will be ~1% of the graph.
// then on L1, cache at least the offsets
// L0 we do not cache since we only need one extra seek (no bsearch) to read the neighbors offset
commit e131014c5cba59624e0ed9fac142abe5340e2fc0
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 8 13:53:10 2023 -0500
add deletes test
commit 7c5a880f007535d9f03b7c7d9d3e84dcc79ef3d7
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 8 13:34:42 2023 -0500
must use graph.size for bitset size -- not correct to use max rowid, since deleted ordinals will not have a rowid but will still be in the graph
commit 0654fbb61aec86deae69edf83fb1cf653ed7df65
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 8 08:44:28 2023 -0500
info -> debug for validatePerIndexComponents
commit b289db2362b928f29b6c5c6289e09b33f0be1e88
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 7 18:08:18 2023 -0500
update lucene
commit 347dae644b0d395014d8faf2affd48a6b2546e96
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 7 18:23:41 2023 -0500
undo burntest logging config change
commit 87a6a84f38d0a0423acd9a0dda67d386876681b5
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 7 15:20:06 2023 -0500
add debug logging
commit 20013bc865c7fb2c34111c98362aaf997dc724dc
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 7 14:50:39 2023 -0500
add a bit more information to exception when we fail in index construction from disk
commit 86389ceae8188b8e427d8a528fada2f913f1f633
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 7 12:25:31 2023 -0500
reduce test vector count from "all of them" to 200
commit 20177a1c84ba34babc82d8c1aa4e3436321b5d9f
Merge: 2e2fce09ee f2c697ac46
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 7 12:21:20 2023 -0500
Merge branch 'wip' into vsearch
commit f2c697ac46348301260c5947ade4ebed7dee90ee
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 7 12:16:28 2023 -0500
multipleSegmentsMultiplePostingsTest
commit 59e4a1c53e1c19a7ebba4a0834229e61f9e4d3ce
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 7 12:16:04 2023 -0500
cleanup
commit 2e2fce09eeb5bef4780cecf7c7b5e492a810f67a
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Wed Jun 7 23:33:00 2023 +0800
fix OnDiskOrdinalsMap to seek to segment offset before reading (#661)
commit 9502f1040229f2be6619f7e4497dc25ca5126b39
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 7 10:04:38 2023 -0500
fix build
commit 3535444b57366fe705e0fd8c1ca095e60ed2a706
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 7 09:19:41 2023 -0500
add write-only workload
commit a112b0965cb298f293b6e384bafc92a1d5a1fb8f
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Wed Jun 7 21:15:53 2023 +0800
VECTOR-44: improve in-memory partition-restricted query perf (#660)
* VECTOR-44: improve in-memory partition-restricted query perf
- using post-filter top-k processor instead of ANN: 14x improvement on partition-restricted query in LongVectorTest
commit cf1ec3be50ddbe40ae99dfc34eea51b9fc5c2648
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Wed Jun 7 21:06:37 2023 +0800
Vector-45: fix bitsetForShadowedPrimaryKeys to skip shadowed primary keys outside of current sstable/segment (#659)
* VECTOR-45: fix bitsetForShadowedPrimaryKeys to skip shadowed primary keys outside of current sstable/segment
* return empty iterator if results are empty instead of ReorderingRangeIterator
commit 87217b4f49935fbbd13c73f4d0aadb84836696a7
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 7 07:31:46 2023 -0500
don't make areL0ShardsEnabled final, it breaks mocks
commit 2bb3a299df43561369544b662674872575975b7c
Author: Matt Fleming <matt@codeblueprint.co.uk>
Date: Wed Jun 7 10:56:08 2023 +0100
VECTOR-42: Check columns exist to avoid NPE during ANN expr binding (#657)
Using a non-existent column with an ANN expression triggers an NPE like
so,
java.lang.NullPointerException: null
at org.apache.cassandra.cql3.ArrayLiteral.forReceiver(ArrayLiteral.java:43)
at org.apache.cassandra.cql3.ArrayLiteral.prepare(ArrayLiteral.java:54)
at org.apache.cassandra.cql3.Ordering$Raw$Ann.bind(Ordering.java:178)
at org.apache.cassandra.cql3.Ordering$Raw.bind(Ordering.java:139)
Use TM.getExistingColumn() which throws InvalidRequestException if the
column is undefined.
commit f29d4528fc43f34889e560acafa810eee1b88ba9
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 21:18:47 2023 -0500
restore query timeouts
commit 6e5734e52b41ab1e355d202ad0433440828fbb75
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 19:08:52 2023 -0500
looking at performance over time in LongVectorTest
commit f96af8f0b9185163a49156bdab107801d2588bf0
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 16:00:23 2023 -0500
log level = info for burn tests
commit 1247cf372927cc185be5c9767a06fdd15f102dbe
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 17:48:00 2023 -0500
write the in-memory deleted ordinals to disk at the start of the postings component
commit e319d71bd6b0d265f616c807dcaf4a9d9fc38aef
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 16:36:27 2023 -0500
don't allocate unnecessary objects on the happy path of no tombstones
commit 6d146a8460a17d146de756aab2ccbbf2abdb967a
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 16:19:56 2023 -0500
VECTOR-23: force UCS to not shard L0
commit 79d6e177093312bc1b951d6740ad45ce8a6c5875
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 16:01:05 2023 -0500
mark AutoCloseable
commit 5ec2f7d7f792bfa2dcf582ea4eed3fa8745fdaff
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 5 14:37:15 2023 -0500
Support string literals as vectors
Co-authored-by: Andrés de la Peña <a.penya.garcia@gmail.com>
commit 73a53c37536835e462e3cf17c452027c7aa591ed
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 12:25:39 2023 -0500
(take 3, this time with the code changes) fix race conditions across concurrent inserts + searches in memtable
commit 51f4f419d31916a81088f171b158f1e002b5c800
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 12:25:39 2023 -0500
(take 2) fix race conditions across concurrent inserts + searches in memtable
commit 94097a1f7cf1831c03e52b1f539bdc2e42fa038b
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 12:35:32 2023 -0500
Revert "fix race conditions across concurrent inserts + searches in memtable"
This reverts commit c4d41b492190fb2644f83bf4902acf71d1e4f891.
commit c4d41b492190fb2644f83bf4902acf71d1e4f891
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 12:25:39 2023 -0500
fix race conditions across concurrent inserts + searches in memtable
commit 22143a8c5db160ef0fa7687ff3e8cb227de160f0
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 10:28:30 2023 -0500
fix AOOB in bruteForceRows logic
commit b5624f62c88378e03260399440edf4d335ccc3af
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 10:12:53 2023 -0500
call deserializeFloatArray (which gives float[]) instead of deserialize (which gives List<Float>)
commit 13ad40718737b54652c875d81ec20992b5828777
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 08:36:32 2023 -0500
create CassandraHnswGraphBuilder with concurrent and serial implementations, so that single-threaded compaction doesn't have to pay the concurrency overhead
commit b83f4e4007e3d597963023b8ae32f4d0934b792d
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 08:00:44 2023 -0500
add ASL header to injections.md to make CI happy
commit 16074336b1322d4bade40eb71562ca50221399dc
Merge: 227c6c13a3 1ff6992354
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 07:48:04 2023 -0500
Merge branch 'VECTOR-37' into vsearch
commit 227c6c13a3cfe42484fc560d629d54a93f8265e0
Author: Jaroslaw Grabowski <jaroslaw.grabowski@datastax.com>
Date: Tue Jun 6 13:02:37 2023 +0200
CNDB-7007 return expired tables level from getLevels
commit 312d07de8d297cdede39017354b678f2fb1b1006
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 07:31:26 2023 -0500
randomize our brute force threshold, which will get the actual index scans exercised more
commit 0b897961869965629893c94250b7d4b1fb7c0f86
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Tue Jun 6 13:44:01 2023 +0800
Reference ANN sstable indexes in case of ann hybrid search (#653)
* Reference ANN sstable indexes in case of ann hybrid search
* simplify
---------
Co-authored-by: Jonathan Ellis <jbellis@datastax.com>
commit 1ff6992354e8411f9f7c48a76f0aadb4a00f2789
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 5 21:55:14 2023 -0500
per-query hnsw metrics
commit d3d6ae107737020eb0ca3b00c117d5bc84027dc0
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 5 21:16:14 2023 -0500
reduce test size to prevent Jenkins OOM
commit 7765ac73fd29c9664b6f8e946c067d6f1b0a9a03
Merge: 5a04dbcf67 c5cd09cebf
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 5 21:15:44 2023 -0500
Merge branch 'vsearch' of github.com:datastax/cassandra into vsearch
commit c5cd09cebf0c63f7bafcd2f4548395efe68b12cd
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Tue Jun 6 09:38:42 2023 +0800
Skip warning log about receiving a range that is not owned by the current replica, see VECTOR-30
commit 5a04dbcf6738d7681732e28518d6eb37f74357b8
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 5 11:21:15 2023 -0500
add injections.md from bdp repo
commit baf9cce301a39f9506bd60c5a75d2011150f2fea
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 5 10:02:02 2023 -0500
VECTOR-36 fix looking up indexes and SSTableContext by SSTableReader, because the SSTR will be removed from internal structures once it's compacted
commit e7fdc8454e750e2ee886c4ab27fb09abb46303fa
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 5 09:53:51 2023 -0500
limitToTopResults should skip rows that are not in the current segment (or more precisely, have no vectors that were indexed)
commit 2f64695a90651d9e76eee0a3399032ea241c983c
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Mon Jun 5 14:55:20 2023 +0800
Vector 6 take 2: reapply Vector-6 commit and fix NPE in SelectStatement (#648)
* Revert "Revert VECTOR-6"
This reverts commit 63a6ff609a02e4f7d9f5499f1af48ca5e58f63a8.
* Vector-6 take 2:
- fix NPE in SelectStatement by skipping reversed() for null ColumnComparator
- fix getOrderingColumns() to return LinkedHashMap to preserve ordering columns' order
- fix VectorLocalTest compilation
- remove debug log in CassandraOnHeapHnsw
commit 9da51b315d207417de4f3e005616e275c62c74cb
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Mon Jun 5 12:38:21 2023 +0800
fix SAI test failures in vsearch branch (#649)
- BatchlogEndpointFilterTest
- rangeRestrictedTest
- SegmentFlushTest
- SegmentMergerTest
- OperationTest
- RangeIntersectionIteratorTest
commit dc53d7db5f0f8fe1c780f6d2b7580794f7b6b5fb
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 3 13:35:11 2023 -0500
create views for ordinals map so we don't have to open a new Reader for each method call
commit c6385022ff2949c23e0dd0161a043cf4061f050c
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 3 08:19:32 2023 -0500
add assert sstableContext != null
commit f340e7590db578b71201326e82006f72dd491047
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 3 07:37:36 2023 -0500
on-disk Searcher bits should not need to be growable
commit fd5cc7f989a47d03689dd29495b974697ad1338b
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 3 07:37:18 2023 -0500
add asserts
commit 6436e5f4ad49881a62c1442692edfedfe31968dc
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 3 07:21:50 2023 -0500
cleanup
commit 32c8e005740d3c7ec0b1008091728e12260aecdd
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 2 13:57:18 2023 -0500
comment
commit 091d96832ff9a68f6ea7857eafd2ec7ee0c09a0b
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 2 13:32:24 2023 -0500
r/m unused search method returning iterators over PrimaryKey
commit 4e78a4b429b0ba6690871df9a91c8ffd8d2e53fa
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 2 11:25:30 2023 -0500
we can use the slightly more lightweight RangeConcatIterator instead of RangeUnionIterator when combining results from different Segments
commit e7a9186bf3426ad025e809481e47db85a7bf7190
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 2 11:22:35 2023 -0500
r/m obsolete FIXME
commit c76ee9ea0cab5b70c535de3f50b81cf333bf94a2
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 3 07:12:43 2023 -0500
add testAppendedGraphs
commit d00bdbfb6acf9f08d2bdfa2ef06e833e567cf9eb
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 3 06:57:37 2023 -0500
move write-to-File to test class
commit b10fd1100daf4cde0291b3b4ef06f5ceb3ca650c
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 3 06:46:01 2023 -0500
fix Ref leaks in test
commit 0656e558bb51172df4e0cde1ebe92946d4dcec8b
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 3 06:26:05 2023 -0500
fix tests to use View
commit 92797ced03ed26f9a5cce398504d6f0de519b899
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Sat Jun 3 19:19:38 2023 +0800
VECTOR-35: fix vector on-disk writer to append segments (#647)
commit 22f5c01860367586d0cc39def7d975a6d6bae784
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 2 18:54:41 2023 -0500
add "Vector indexes only support ANN queries" check
commit bfc0b056dbacd0a1eca86020e901f545dca7670e
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 2 17:34:24 2023 -0500
split invalid requests into separate test class
commit f66768c35920101382c1975c9d0ba0e3301b7c61
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 2 17:28:56 2023 -0500
add specific error message when trying to do ANN without an index
commit 63a6ff609a02e4f7d9f5499f1af48ca5e58f63a8
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 2 12:31:22 2023 -0500
Revert VECTOR-6
This reverts commits:
a37009c187edeba68389d239dc1b9f40519b1187
5565690fbe1056d4c159ddbe233fa22c7695320a
e7733bb8f858a16b082b8a5c64d0322db6f6271a
commit 3d496662b259470b505d88212074c436660c27ad
Merge: fab67bb134 4bdae7e362
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 2 11:05:53 2023 -0500
Merge branch 'vsearch' of github.com:datastax/cassandra into vsearch
commit 4bdae7e36213f5112efce085696dd24aaa0adfad
Author: Mike Adamson <madamson@datastax.com>
Date: Fri Jun 2 14:36:01 2023 +0100
Stabilise random tests using word2vec model vectors
commit 2ddc3ec6b11f6810f6c4cbaf6b62bfe46d2d1b6c
Merge: 8e04280312 1f9179002a
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Fri Jun 2 10:05:15 2023 -0500
Merge remote-tracking branch 'datastax/ds-trunk' into vsearch
commit 8e04280312583a18085e7e7b9d31810790b039f4
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Fri Jun 2 09:51:06 2023 -0500
Revert this after CNDB-6974 is fixed.
commit fab67bb13499691f813c893cd39a3bbd406653d2
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 2 08:59:12 2023 -0500
vector cache defaulting to 1MB per segment
commit 67a2b7eba3c957a993726d37706643979e92a3a3
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Thu Jun 1 17:49:35 2023 -0500
Revert this after CNDB-6974 is fixed.
commit 4538956b31d8c40d2e9b603b3b0a3392343e1853
Merge: 7518680a4a af4c83aef4
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 1 16:17:14 2023 -0500
Merge branch 'ds-trunk' into vsearch
commit 7518680a4a17cb2589cf06a9175befc10c9eab1a
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 1 16:14:33 2023 -0500
re-use float[] across calls to vectorValue to avoid allocation overhead (credit to Jake for the idea)
commit d20bd8aedeb6232358dd16f901e2708f217b8108
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 1 15:35:17 2023 -0500
optimize vectorValue() with direct access to the mmap-ed region
commit 24572025af5b7893279b5b03c58555b682f3abdb
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 1 14:33:55 2023 -0500
decomposeVector does not modify the underlying buffer, so no need to duplicate() here
commit efef35d47a24222fae300d1cbd5a801e9ed79442
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 1 14:29:30 2023 -0500
specialize deserializeFloatArray for ByteBuffer and FloatBuffer. this saves about 3% total CPU on search workloads
commit 63c27f5550d6c3e22d960b9a42b9097029e3b760
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 1 11:04:23 2023 -0500
default target size of 5GB
commit 69c55c8d07c6e1d58b4829c153f5ebf8a4343669
Author: Jonathan Ellis <jbellis@gmail.com>
Date: Thu Jun 1 07:20:38 2023 -0700
make the on-disk hnsw code threadsafe using FileHandle.createReader (#641)
commit b194cae2e56133b946d0231768254064a47f01b6
Merge: 7ee6422084 23c2891e7a
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 1 06:22:17 2023 -0500
Merge branch 'ds-trunk' into vsearch
commit 7ee642208485c9e2ae62aa662dbbddc7b295bae1
Merge: c286ec0ee2 a37009c187
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 31 20:41:48 2023 -0500
Merge branch 'VECTOR-6' into vsearch
commit c286ec0ee231ededded18e81357edebe72064c59
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 31 20:40:21 2023 -0500
add testLargeGraph
commit a37009c187edeba68389d239dc1b9f40519b1187
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Thu Jun 1 08:50:07 2023 +0800
cleanup unused code
commit 6246dbe3a970a0f4672701632003cdf576705c24
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 31 16:43:42 2023 -0500
comment
commit c7420440c7d3c81eac96e716148d570ae3cceb1d
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 31 16:43:37 2023 -0500
encapsulate shadowedPrimaryKey better
commit a1a25c33b8bec9d78354dd2c1d992444db2bf76f
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 31 15:57:52 2023 -0500
make hnsw cache size configurable, and make the default 128KB (roughly the size of the bloom filter for a 1GB sstable)
commit 55747d2351e915645697fa8dec404c749540456f
Author: Andrés de la Peña <a.penya.garcia@gmail.com>
Date: Wed May 31 19:28:47 2023 +0100
Replace FunctionParameter.sameAsFirst by FunctionParameter.sameAs, allowing type inference in both directions for vector functions
commit 5565690fbe1056d4c159ddbe233fa22c7695320a
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 31 12:59:42 2023 -0500
cleanup and fix
commit e7733bb8f858a16b082b8a5c64d0322db6f6271a
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Wed May 24 16:01:26 2023 +0800
VECTOR-6: return vector results to client in ANN order instead of token order
commit 771067d1475c94484da178236928d2bad78e00b1
Author: Mike Adamson <madamson@datastax.com>
Date: Wed May 31 18:33:55 2023 +0100
Fix annOrderingMustHaveLimit test
commit 8dd76a73541e70da0da022dd463e6711a315f971
Author: Mike Adamson <madamson@datastax.com>
Date: Wed May 31 18:30:49 2023 +0100
Improve the validation of ORDER BY <column> ANN OF (#639)
* Improve the validation of ORDER BY <column> ANN OF
* Change to hasNonClusteredOrdering and improve limit message
commit 1ef6f2e3418db19af23f0eae24cd344d72290455
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri May 26 18:34:25 2023 -0500
add vector similarity functions
commit f4e3a39a8ece623647ba7891981e7c5ee10ed96b
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 31 09:39:45 2023 -0500
pull in the smallest set of changes possible from 93e0ae9a to get FunctionParameter.sameAsFirst
commit 0ef2614346c9426a966004df220221f74353de70
Author: Jonathan Ellis <jbellis@gmail.com>
Date: Wed May 31 06:22:41 2023 -0700
Add support for updates + deletes (#636)
commit 5da9fefe1e18f733bb8ffd0a35ef2e1ac3cbd975
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue May 30 16:32:02 2023 -0500
rename SegmentOrdering.reorderOneComponent to limitToTopResults to align with MemtableOrdering
commit caf6eb2154ddf57c88d69c6246404b65e183057d
Author: Mike Adamson <madamson@datastax.com>
Date: Tue May 30 22:41:29 2023 +0100
Fix V1SearchableIndex.reorderOneComponent to use segments (#635)
commit 72cfa94e9389d0078d64aee60d26b8b01c6ca22f
Author: Piotr Kołaczkowski <pkolaczk@datastax.com>
Date: Mon May 29 20:17:44 2023 +0200
VECTOR-14: Move ANN OF expressions from WHERE to ORDER BY
Before:
SELEC…