CASSANDRA-15908: Added more details for full() indexing requirement o… #660
Force-pushed from 4539616 to 2936c75
  if (column.type.isFrozenCollection() && target.type != Type.FULL)
-     throw ire("Cannot create %s() index on frozen column %s. Frozen collections only support full() indexes", target.type, column);
+     throw ire("Cannot create %s() index on frozen column %s. Frozen collections only support indexes on the " +
Suggest a slightly different wording:
"Cannot create %s() index on frozen column %s. Frozen collections are immutable and must be fully indexed " +
"by using the FULL(<column-name>) modifier"
Maybe even sub in the column name, but be aware that quotes may be needed.
@BrynCooke nice suggestion
done thanks
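A minimal sketch of the wording suggested above, with the column name substituted into the FULL(...) hint and quoted when needed. The class name `FrozenIndexMessage`, the helper `maybeQuote`, and the simplified quoting rule are assumptions made for illustration; this is not the code that was merged.

```java
// Hypothetical sketch; class and method names are illustrative, not Cassandra APIs.
final class FrozenIndexMessage {
    // Assumed rule: identifiers made only of lower-case letters, digits and
    // underscores need no quoting. Anything else is wrapped in double quotes,
    // with embedded double quotes doubled. Reserved keywords are ignored here.
    static String maybeQuote(String name) {
        if (name.matches("[a-z][a-z0-9_]*"))
            return name;
        return "\"" + name.replace("\"", "\"\"") + "\"";
    }

    // Builds the reviewer's proposed message, substituting the column name
    // into the FULL(...) hint.
    static String message(String targetType, String column) {
        return String.format("Cannot create %s() index on frozen column %s. " +
                             "Frozen collections are immutable and must be fully indexed " +
                             "by using the FULL(%s) modifier",
                             targetType, column, maybeQuote(column));
    }
}
```

With this rule a mixed-case column such as myCol would appear as FULL("myCol"), while my_col would stay unquoted.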
Force-pushed from b6fde4c to 042bc03
commit 4c4c1ff802c7667e1289a4e730cc34bf6b3bd007
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jul 5 12:03:22 2023 -0500
add test for creating index after data is added
commit 0acaae364c19ed7556b66b5576762ecc131af5fd
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Wed Jul 5 09:23:01 2023 -0500
Revert "enable segment compaction by default for vector indexes -- we should prioritize read performance"
This reverts commit a4fa0728337ebf236ad03f6ad708cf51c3dc08c5.
commit e3ddb6426de4e89b5fe61ed742b69257865c2d3c
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 28 10:54:49 2023 -0500
VECTOR-64 set sstable_growth to 0.5 by default
commit 3ca75046e6802ce5836a654459b01948c83d1d7b
Merge: a4fa072833 10a2a31ae8
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 28 10:54:18 2023 -0500
merge ds-trunk
commit a4fa0728337ebf236ad03f6ad708cf51c3dc08c5
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 28 10:24:41 2023 -0500
enable segment compaction by default for vector indexes -- we should prioritize read performance
commit 08153b38549d700505d064c25d7a2070c2ff07aa
Merge: 6a063b4f84 1a45fdc5f9
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 28 10:18:38 2023 -0500
Merge branch 'vsearch' of github.com:datastax/cassandra into vsearch
commit 6a063b4f8458f119d4b99bd6b78cbc8832f27f13
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 28 10:13:51 2023 -0500
VECTOR-63 add cassandra.sai.hnsw.allow_custom_parameters defaulting to false to control setting maximum_node_connections and construction_beam_width, which can cause memory usage to explode if used incorrectly
commit 1a45fdc5f92e9c82dba6e9e56a5e341806070b13
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Mon Jun 26 15:57:10 2023 -0500
Add the option to configure the file_cache_size_in_mb in a -D like it is in 6.8-cndb.
commit d2e45dab5b57f20fbdd6250d6dc186c83d783ed3
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 22 16:13:00 2023 -0500
VECTOR-62 don't flush a graph that only contains deleted vectors
commit 4a9afe205c20b1f863e89d4e02138557d225c3fc
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 22 16:11:11 2023 -0500
r/m obsolete comment
commit a925dc22696bcbebcad17a460441e8b84ff97660
Merge: 7c7d160dd9 2b22984fe1
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Thu Jun 22 09:03:08 2023 -0500
Merge remote-tracking branch 'datastax/ds-trunk' into vsearch
commit 7c7d160dd90624b0e365287ef0a2f0b7ef94da53
Merge: a14b387372 7a5e374cea
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 20 21:42:51 2023 -0500
Merge branch 'vsearch' of github.com:datastax/cassandra into vsearch
commit a14b387372ef3f220ece68b019acaafaf0d56f42
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 20 17:52:13 2023 -0500
move new options to jvm17-server.options
commit dc67582585a90628e026a693497793b7d8a298ed
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 20 17:47:27 2023 -0500
copy jvm11 options to jvm17
commit b6717410f925486a858c4c9414e417eed50950d7
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 20 17:30:48 2023 -0500
enable simd
commit 92b9e6e2f136254fb1f8e50732809ea529617f48
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 19 15:31:09 2023 -0500
make it run under jdk 20
commit 3e985953637819ac8504955552ce0feba29bdd40
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 20 17:13:56 2023 -0500
update to lucene with simd
commit 7a5e374cea785678670444121ad36be6170dd59b
Merge: 5e5ae4de82 e9ddd5f0d9
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Tue Jun 20 15:00:24 2023 -0500
Merge remote-tracking branch 'datastax/ds-trunk' into vsearch
commit 5e5ae4de824a7e838bbdfdbabc92e8b256911e3f
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 19 10:57:18 2023 -0500
we weren't calculating length correctly (supposed to match ordinals range) but it didn't matter b/c length() is not called on the search path where these classes are used. make this explicit by throwing in length implementation.
commit c0d4055a0bc6914d0220fbd4912d6355e9f6a1d9
Merge: f2c9a7cb59 c605f8c9f0
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 19 10:20:02 2023 -0500
Merge branch 'vsearch' of github.com:datastax/cassandra into vsearch
commit f2c9a7cb59b113a349992011b739079106274c1e
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 19 10:19:34 2023 -0500
VECTOR-61 remove deletedOrdinals collection that was not threadsafe wrt add and remove operations; create postingsByOrdinal map that allows Bits operations to look up the postings, instead
commit 9744a13b405cbeb1b68894bb08cc24796aedd8f7
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 19 09:13:24 2023 -0500
add LVT.testMultiplePostings (works fine)
commit c605f8c9f037e724544b21fe2aadcf0d8503d86f
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Sat Jun 17 12:44:36 2023 +0800
skip replica-filtering-protection for ANN requests
commit 71935bd07c9218ba70f1ffde7b6db1e88009e529
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 16 20:20:06 2023 -0500
forgot that maxBruteForceRows has to stay non-final for Byteman
commit b264afc07727305ce88ab2f18b8a2643d2510bd3
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 16 20:10:19 2023 -0500
VECTOR-54
- validateIndexable also checks for very large vectors (that explode to NaN when we take the cosine)
- check query vectors with the same criteria as vectors to index
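The kind of check VECTOR-54 describes could look roughly like the sketch below. It assumes the NaN comes from the squared norm overflowing to infinity in float arithmetic; `VectorValidation` and `validateForCosine` are hypothetical names, and the real validateIndexable may use different criteria.

```java
// Hypothetical sketch of a cosine-indexability check; not the actual validateIndexable.
final class VectorValidation {
    // Returns null when the vector is acceptable, otherwise a short reason.
    // The same method would be applied to vectors being indexed and to query vectors.
    static String validateForCosine(float[] vector) {
        float sumSquares = 0f;
        for (float component : vector) {
            if (Float.isNaN(component) || Float.isInfinite(component))
                return "non-finite component";
            sumSquares += component * component;
        }
        // A zero vector has no direction, so cosine similarity is undefined.
        if (sumSquares == 0f)
            return "zero vector";
        // Assumed failure mode: the squared norm overflows float range, and the
        // cosine computed from it later degenerates to NaN.
        if (Float.isInfinite(sumSquares))
            return "vector too large";
        return null;
    }
}
```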
commit 66c31a6dcfcbf81b7657ac9c4119803e026d150a
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 16 20:07:41 2023 -0500
if CFS.apply catches an InvalidRequestException, keep the same type when we re-throw so it gets back to the client as intended
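A rough illustration of the re-throw behaviour described here, using stand-in types. `RethrowSketch` and its nested `InvalidRequest` are hypothetical and only mimic the shape of the change; the actual CFS.apply code is not shown in this log.

```java
// Hypothetical sketch; InvalidRequest stands in for InvalidRequestException.
final class RethrowSketch {
    static final class InvalidRequest extends RuntimeException {
        InvalidRequest(String message) { super(message); }
    }

    static void apply(Runnable write) {
        try {
            write.run();
        } catch (InvalidRequest e) {
            // Keep the original type so the caller (and ultimately the client)
            // sees an invalid-request error rather than a generic failure.
            throw e;
        } catch (RuntimeException e) {
            // Any other failure is wrapped, as a generic server-side error.
            throw new RuntimeException("write failed", e);
        }
    }
}
```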
commit e0bdfdbe421a1cca30746604cda9ef5b992002bb
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 16 17:33:37 2023 -0500
add [failing] emptyIndexTest
commit 75cba96f90730b32ea4a39375c72aaff4ffd6221
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 15 16:52:46 2023 -0500
re-use bitsets across searches
commit 2af2ce8a321f7fea268d81c393da7f8f6d886776
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 16 17:05:52 2023 -0500
more asserts that graph is in sane state when we write it
commit a63492686d13e12b799acdf50c6f0c20dd680f21
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 16 17:13:23 2023 -0500
update lucene
commit 4c6b43e3606b8c9eb5e4ec4cf0817dc9b061c555
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 16 16:24:33 2023 -0500
add debugging information when node is not found on level
commit 81c8e730bd4ca314a81bc13e30b197603a8eb005
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 16 11:57:24 2023 -0500
add similarityWithAnn test
commit 87c8da9a5dddd10215a84c07a09dbaaecaa522b8
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 14 18:26:02 2023 -0500
add tests to confirm that zero-length vectors are rejected
commit 2510c1b987d846821e7095fbc2c368fa51d9a15d
Merge: a3d9e33554 e86f91c568
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 12 17:09:22 2023 -0500
Merge branch 'vsearch' of github.com:datastax/cassandra into vsearch
commit a3d9e3355470f6400328cd7ac1eb07ff06f8e00b
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 12 17:08:43 2023 -0500
use ArrayList in CVV instead of HashMap since we know the keys are consecutive
commit e86f91c568f8a82e83e61bd0dd768478a7584cfc
Merge: 77c1bc11f3 8a7a6d9c4f
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Mon Jun 12 09:25:06 2023 -0500
Merge remote-tracking branch 'datastax/ds-trunk' into vsearch
commit 77c1bc11f31e4ff001c7eff1e6fe50812e136a45
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Mon Jun 12 09:24:56 2023 -0500
Revert "Revert this after CNDB-6974 is fixed."
This reverts commit 8e04280312583a18085e7e7b9d31810790b039f4.
commit cae30e9879c463d80de2a2b7e4a642979803c13a
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Mon Jun 12 09:23:39 2023 -0500
Revert "Revert this after CNDB-6974 is fixed."
This reverts commit 67a2b7eba3c957a993726d37706643979e92a3a3.
commit 46d1a45de84f40311f193beba3ca5fc3fe7450d7
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 12 09:03:03 2023 -0500
replace our TODO entries with VSTODO to make them easier to find
commit 415b3f29e8e7f1526e28655308a1937a1175e575
Merge: 2fd8b848be 481d29721a
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 12 09:02:16 2023 -0500
Merge branch 'vsearch' of github.com:datastax/cassandra into vsearch
commit 481d29721a7993e9babd44f792be55aca925d41a
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Mon Jun 12 21:26:43 2023 +0800
Fix VectorMemtableIndex to handle max token and fix Segment#intersects (#670)
* Fix VectorMemtableIndex to handle max token and min/max bound
* Fix Segment#intersects to compare bound instead of token and add tests for range search
* make brute force rows per query for VectorMemtableIndex
* apply feedback on Segment#intersects
* add comments to VectorMemtableIndex#search
* Fix SegmentTest
commit 2fd8b848be82e62999e64b57cb9800d03de8e953
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 12 07:51:18 2023 -0500
clean up a couple REVIEWME
commit 3c066e07be6188c3f48c51c02cfc99b395e2a6bc
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Sun Jun 11 14:26:42 2023 +0800
fix flaky VectorDistributedTest
commit 6c719a752e90e031916dddf6cc7b7db23627bc38
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 10 10:48:01 2023 -0500
fix use of bruteForceRows -- should be larger of limit,maxBruteForceRows
commit 0a2279d7f010bcb68053bb14ea2974295b21d1c4
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 10 10:46:34 2023 -0500
use maxBruteForceRows when deciding whether to skip ANN
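Taken together with the follow-up fix ("should be larger of limit,maxBruteForceRows"), the decision could be sketched as follows. `AnnPlanner` and `shouldSkipAnn` are hypothetical names, and the inclusive comparison is an assumption; the log does not say whether the boundary case skips ANN.

```java
// Hypothetical sketch of the skip-ANN decision; names are illustrative only.
final class AnnPlanner {
    // matchingRows: rows that survive the key-range / shadowed-key filtering.
    // limit: the query's LIMIT. maxBruteForceRows: configured brute-force cap.
    static boolean shouldSkipAnn(int matchingRows, int limit, int maxBruteForceRows) {
        // The threshold is the larger of the two values, so a query with a big
        // LIMIT is never forced through the graph for fewer rows than it asked for.
        int threshold = Math.max(limit, maxBruteForceRows);
        return matchingRows <= threshold; // assumed inclusive
    }
}
```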
commit 1859f7355231ce7972e3d4b7af70e5d2967da516
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 10 10:27:03 2023 -0500
simplify partitionKeySearchTest using euclidean distance
commit 4f40e1c2d9b1c30675eeaee7d800a8113cce97c0
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 10 10:09:24 2023 -0500
typo
commit 7dbe85ac9a248aa6a0985839df1f40ceb2c08e6c
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 10 10:08:23 2023 -0500
rename methods that return Bits but had bitset in their names
commit 2861bfdba8f2638a1e3f9e81df9cad3878d0e544
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 10 10:06:51 2023 -0500
simplify skipANN
commit fbfe4bcdd03a66f4dedfa91ce9d06206f89827cd
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Sat Jun 10 16:36:56 2023 +0800
optimize VectorIndexSearcher#searchPosting
- return empty posting if key range is not found in current sstable
- return empty posting if all row ids are shadowed
- skip ANN if matching row ids are less than limit
commit 168911b70888a3dda0ddf4f615af1b7ff54f43e3
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Sat Jun 10 20:54:41 2023 +0800
fix data visibility issue during flush: make sure SSTableAddedNotification is sent before MemtableDiscardedNotification
commit c5dd47e3ce1c7276703fa3f89a9932a3b2cd43b7
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Sat Jun 10 12:31:53 2023 +0800
fix primaryKeySearchTest and partitionKeySearchTest to use correct selector and expected results
commit 2e636529c63d0d726868d6df9b3e11d15d3871d8
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Sun Jun 11 08:31:39 2023 +0800
Vector-48: index#update is not triggered by partition/range deletion,… (#665)
* Vector-48: index#update is not triggered by partition/range deletion, we have to fix VectorMemtableIndex to include shadowed primary keys
- during flush, append ordinals that are removed by partition/range deletion into deletedOrdinals set
* add comments to VP
* revert redundant variables
* simplify
---------
Co-authored-by: Jonathan Ellis <jbellis@datastax.com>
commit daa2623b880e4b82d99204fe718db22e52ae3b42
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 10 13:54:38 2023 -0500
use mmapped builder in OnDiskHnswGraphTest
commit 553778a22cfa8757daf3b3ec0e2cfa6fc7ba29cf
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 10 08:34:41 2023 -0500
fix logic in partitionKeySearchTest, test still fails
commit a7f5a2f824085691da80ef62e37a430e5637c31c
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 17:57:34 2023 -0500
ignore invalid vectors during build against existing data, instead of failing the build
commit 2af203646a2e57fca0a2521c31ba4269b0d8f326
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 17:43:47 2023 -0500
switch from ignoring zero vectors to throwing IRE
commit 4c98fbb447a275cbf557ae0adf8f4f8a30fba17f
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 17:15:57 2023 -0500
Revert "if there are no rows in the range to search, return EmptyPostingsList instead of giving the hnsw search an empty Bits (which is a pathological case for the search)"
This reverts commit 6e5eab787a3add985dc30eca82c5dee2646f5564.
commit da99598a5d11dfa4ecac3b1eeeb9d19a4bad21e5
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 17:09:30 2023 -0500
don't attempt to add zero vector to cosine indexes
commit a77792447d33600f69dd0a75a280a2b6dac51389
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 16:43:34 2023 -0500
add failing partitionKeySearchTest
commit b061dd1c77f30c67324da4f94ea8b5a22d50fba7
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 10:52:08 2023 -0500
inline the test ops
commit 6e5eab787a3add985dc30eca82c5dee2646f5564
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 10:51:09 2023 -0500
if there are no rows in the range to search, return EmptyPostingsList instead of giving the hnsw search an empty Bits (which is a pathological case for the search)
commit 518c055dfeabacf0fa0863e40bf814058f2ad981
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 10:28:57 2023 -0500
cleanup
commit e922b03ffe5436dc1341dbe20ee7aaca9901058f
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 10:27:53 2023 -0500
failing tests for primary key search
commit 428e4b713e25a5820d02ca790c30e9f1477c0f44
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 10:21:59 2023 -0500
cleanup
commit 5319bcdf3b694e861509b2a9f549f39bf51cd253
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 10:20:02 2023 -0500
move testInvalidColumnNameWithAnn to VectorInvalidQueryTest
commit d8b64845e59c0bd3658a16df21dcc75564417e3b
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 10:12:40 2023 -0500
upgrade lucene to reduce Integer boxing on build path
commit a62356e49c0a32f61732b8525d2bb655f46dd767
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 10:11:55 2023 -0500
cleanup
commit b754eb7f021d10d0ae8de83ebdf2ce2f19ce59d8
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 08:29:34 2023 -0500
replace CHM with NBHMLong to avoid boxing
commit 53d232ad2bcddefd72b10542f703cf36728c05f5
Author: Piotr Kołaczkowski <pkolaczk@datastax.com>
Date: Fri Jun 9 12:17:46 2023 +0200
STAR-550 Handle SAI AbortedOperationException
AbortedOperationException is thrown by SAI when index search
hits a timeout. Now instead of allowing this exception to
bubble up to the top and be eventually logged as error, we catch
and swallow it in the InboundSink after creating the query response.
Additionally, we now also set a proper error code (TIMEOUT)
in the response, so the client has a hint on what happened.
commit 5e0c5eb3c54b58002162ce3e39d0891643e1f663
Author: Matt Fleming <matt@codeblueprint.co.uk>
Date: Fri Jun 9 08:44:10 2023 +0100
CNDB-7037: Check that memtable exists before flushing and avoid IOOBE (#664)
For offline services such as the compactor it's possible to not have
live memtables. Account for this during flushing after removing indexes to
avoid triggering an IndexOutOfBoundsException:
Error happened while updating the schema
java.lang.IndexOutOfBoundsException: Index: -1
at java.base/java.util.Collections$EmptyList.get(Collections.java:4483)
at org.apache.cassandra.db.lifecycle.View.getCurrentMemtable(View.java:106)
at org.apache.cassandra.db.ColumnFamilyStore.forceFlush(ColumnFamilyStore.java:1115)
at org.apache.cassandra.db.SystemKeyspace.forceBlockingFlush(SystemKeyspace.java:552)
at org.apache.cassandra.db.SystemKeyspace.setIndexRemoved(SystemKeyspace.java:576)
at org.apache.cassandra.index.SecondaryIndexManager.markIndexRemoved(SecondaryIndexManager.java:837)
at org.apache.cassandra.index.SecondaryIndexManager.removeIndex(SecondaryIndexManager.java:420)
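The trace above is consistent with calling get(size() - 1) on an empty list. A minimal sketch of the guard, with hypothetical names (`MemtableViewSketch`, `currentMemtable`, `flushIfPresent`) standing in for the real View and ColumnFamilyStore code:

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Consumer;

// Hypothetical sketch; not the actual View/ColumnFamilyStore implementation.
final class MemtableViewSketch {
    // Assumes the current memtable is the last entry of the live list, which is
    // what "Index: -1" on an empty list suggests.
    static <T> Optional<T> currentMemtable(List<T> liveMemtables) {
        if (liveMemtables.isEmpty())
            return Optional.empty(); // e.g. an offline compactor with no live memtables
        return Optional.of(liveMemtables.get(liveMemtables.size() - 1));
    }

    // Flushes only when a memtable exists; returns whether a flush was triggered.
    static <T> boolean flushIfPresent(List<T> liveMemtables, Consumer<T> flush) {
        Optional<T> current = currentMemtable(liveMemtables);
        current.ifPresent(flush);
        return current.isPresent();
    }
}
```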
commit c241f7d41ee46c28f60d691611f32e071dac6684
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 8 19:19:09 2023 -0500
read neighbors using .intBuffer since we know that the HNSW searcher always reads all the neighbors
commit 32f021a4f0623e57d82d70c54350eab6f0f57db7
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 8 18:08:30 2023 -0500
VECTOR-51 leave vectors as ByteBuffers during compaction to avoid 2x memory usage incurred by keeping them around as float[] as well
commit 32d8cbbd19301228530d3d84f7a7e4120b4d4422
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 8 17:37:08 2023 -0500
r/m node cache from query metrics since there is nothing the operator can act on there
commit 890df8e6c36c5c9e1314dbcc0a681415401120dd
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 8 14:15:25 2023 -0500
add more information to exception when reading row offsets goes wrong
commit 5c7fe0cfed1edb10f81b134f8949dd1b5abb2781
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 8 13:26:17 2023 -0500
revise ordinals cache as follows:
// cache full levels including neighbors up to neighborsRamBudget, starting with the top level,
// but always cache all levels above the bottom two levels -- this will be ~1% of the graph.
// then on L1, cache at least the offsets
// L0 we do not cache since we only need one extra seek (no bsearch) to read the neighbors offset
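One reading of the policy in these comments, sketched as a planning function. `OrdinalsCachePlan`, `Mode`, and the `levelBytes` input are hypothetical, and the interpretation (levels 2 and up always cached in full, L1 cached in full only if it still fits the budget, L0 never cached) is an assumption drawn from the comment text alone.

```java
// Hypothetical sketch of the caching policy; not the actual implementation.
final class OrdinalsCachePlan {
    enum Mode { FULL, OFFSETS_ONLY, NONE }

    // levelBytes[i] is the assumed RAM cost of fully caching level i (0 = bottom).
    static Mode[] plan(long[] levelBytes, long neighborsRamBudget) {
        Mode[] modes = new Mode[levelBytes.length];
        long used = 0;
        // Walk from the top level down, spending the budget as we go.
        for (int level = levelBytes.length - 1; level >= 0; level--) {
            if (level == 0) {
                // L0 is never cached: one extra seek reads the neighbors offset.
                modes[level] = Mode.NONE;
            } else if (level >= 2 || used + levelBytes[level] <= neighborsRamBudget) {
                // Levels above the bottom two are always cached, budget or not.
                modes[level] = Mode.FULL;
                used += levelBytes[level];
            } else {
                // L1 did not fit in the budget: keep at least its offsets.
                modes[level] = Mode.OFFSETS_ONLY;
            }
        }
        return modes;
    }
}
```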
commit e131014c5cba59624e0ed9fac142abe5340e2fc0
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 8 13:53:10 2023 -0500
add deletes test
commit 7c5a880f007535d9f03b7c7d9d3e84dcc79ef3d7
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 8 13:34:42 2023 -0500
must use graph.size for bitset size -- not correct to use max rowid, since deleted ordinals will not have a rowid but will still be in the graph
commit 0654fbb61aec86deae69edf83fb1cf653ed7df65
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 8 08:44:28 2023 -0500
info -> debug for validatePerIndexComponents
commit b289db2362b928f29b6c5c6289e09b33f0be1e88
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 7 18:08:18 2023 -0500
update lucene
commit 347dae644b0d395014d8faf2affd48a6b2546e96
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 7 18:23:41 2023 -0500
undo burntest logging config change
commit 87a6a84f38d0a0423acd9a0dda67d386876681b5
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 7 15:20:06 2023 -0500
add debug logging
commit 20013bc865c7fb2c34111c98362aaf997dc724dc
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 7 14:50:39 2023 -0500
add a bit more information to exception when we fail in index construction from disk
commit 86389ceae8188b8e427d8a528fada2f913f1f633
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 7 12:25:31 2023 -0500
reduce test vector count from "all of them" to 200
commit 20177a1c84ba34babc82d8c1aa4e3436321b5d9f
Merge: 2e2fce09ee f2c697ac46
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 7 12:21:20 2023 -0500
Merge branch 'wip' into vsearch
commit f2c697ac46348301260c5947ade4ebed7dee90ee
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 7 12:16:28 2023 -0500
multipleSegmentsMultiplePostingsTest
commit 59e4a1c53e1c19a7ebba4a0834229e61f9e4d3ce
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 7 12:16:04 2023 -0500
cleanup
commit 2e2fce09eeb5bef4780cecf7c7b5e492a810f67a
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Wed Jun 7 23:33:00 2023 +0800
fix OnDiskOrdinalsMap to seek to segment offset before reading (#661)
commit 9502f1040229f2be6619f7e4497dc25ca5126b39
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 7 10:04:38 2023 -0500
fix build
commit 3535444b57366fe705e0fd8c1ca095e60ed2a706
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 7 09:19:41 2023 -0500
add write-only workload
commit a112b0965cb298f293b6e384bafc92a1d5a1fb8f
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Wed Jun 7 21:15:53 2023 +0800
VECTOR-44: improve in-memory partition-restricted query perf (#660)
* VECTOR-44: improve in-memory partition-restricted query perf
- using post-filter top-k processor instead of ANN: 14x improvement on partition-restricted query in LongVectorTest
commit cf1ec3be50ddbe40ae99dfc34eea51b9fc5c2648
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Wed Jun 7 21:06:37 2023 +0800
Vector-45: fix bitsetForShadowedPrimaryKeys to skip shadowed primary keys outside of current sstable/segment (#659)
* VECTOR-45: fix bitsetForShadowedPrimaryKeys to skip shadowed primary keys outside of current sstable/segment
* return empty iterator if results are empty instead of ReorderingRangeIterator
commit 87217b4f49935fbbd13c73f4d0aadb84836696a7
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 7 07:31:46 2023 -0500
don't make areL0ShardsEnabled final, it breaks mocks
commit 2bb3a299df43561369544b662674872575975b7c
Author: Matt Fleming <matt@codeblueprint.co.uk>
Date: Wed Jun 7 10:56:08 2023 +0100
VECTOR-42: Check columns exist to avoid NPE during ANN expr binding (#657)
Using a non-existent column with an ANN expression triggers an NPE like
so,
java.lang.NullPointerException: null
at org.apache.cassandra.cql3.ArrayLiteral.forReceiver(ArrayLiteral.java:43)
at org.apache.cassandra.cql3.ArrayLiteral.prepare(ArrayLiteral.java:54)
at org.apache.cassandra.cql3.Ordering$Raw$Ann.bind(Ordering.java:178)
at org.apache.cassandra.cql3.Ordering$Raw.bind(Ordering.java:139)
Use TM.getExistingColumn() which throws InvalidRequestException if the
column is undefined.
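The difference between a nullable lookup and a throwing one can be sketched with stand-in types. `ColumnLookupSketch`, its `InvalidRequest` exception, and the error text are hypothetical; only the idea that getExistingColumn throws for an undefined column comes from the commit message.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; a stand-in for table metadata, not the real TableMetadata.
final class ColumnLookupSketch {
    static final class InvalidRequest extends RuntimeException {
        InvalidRequest(String message) { super(message); }
    }

    private final Map<String, String> columnTypes = new HashMap<>();

    ColumnLookupSketch add(String name, String type) {
        columnTypes.put(name, type);
        return this;
    }

    // Nullable lookup: callers that forget the null check hit an NPE later,
    // far from the place where the bad column name was supplied.
    String getColumn(String name) {
        return columnTypes.get(name);
    }

    // Throwing lookup: fails at bind time with an error the client can act on.
    String getExistingColumn(String name) {
        String type = columnTypes.get(name);
        if (type == null)
            throw new InvalidRequest("Undefined column name " + name);
        return type;
    }
}
```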
commit f29d4528fc43f34889e560acafa810eee1b88ba9
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 21:18:47 2023 -0500
restore query timeouts
commit 6e5734e52b41ab1e355d202ad0433440828fbb75
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 19:08:52 2023 -0500
looking at performance over time in LongVectorTest
commit f96af8f0b9185163a49156bdab107801d2588bf0
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 16:00:23 2023 -0500
log level = info for burn tests
commit 1247cf372927cc185be5c9767a06fdd15f102dbe
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 17:48:00 2023 -0500
write the in-memory deleted ordinals to disk at the start of the postings component
commit e319d71bd6b0d265f616c807dcaf4a9d9fc38aef
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 16:36:27 2023 -0500
don't allocate unnecessary objects on the happy path of no tombstones
commit 6d146a8460a17d146de756aab2ccbbf2abdb967a
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 16:19:56 2023 -0500
VECTOR-23: force UCS to not shard L0
commit 79d6e177093312bc1b951d6740ad45ce8a6c5875
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 16:01:05 2023 -0500
mark AutoCloseable
commit 5ec2f7d7f792bfa2dcf582ea4eed3fa8745fdaff
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 5 14:37:15 2023 -0500
Support string literals as vectors
Co-authored-by: Andrés de la Peña <a.penya.garcia@gmail.com>
commit 73a53c37536835e462e3cf17c452027c7aa591ed
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 12:25:39 2023 -0500
(take 3, this time with the code changes) fix race conditions across concurrent inserts + searches in memtable
commit 51f4f419d31916a81088f171b158f1e002b5c800
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 12:25:39 2023 -0500
(take 2) fix race conditions across concurrent inserts + searches in memtable
commit 94097a1f7cf1831c03e52b1f539bdc2e42fa038b
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 12:35:32 2023 -0500
Revert "fix race conditions across concurrent inserts + searches in memtable"
This reverts commit c4d41b492190fb2644f83bf4902acf71d1e4f891.
commit c4d41b492190fb2644f83bf4902acf71d1e4f891
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 12:25:39 2023 -0500
fix race conditions across concurrent inserts + searches in memtable
commit 22143a8c5db160ef0fa7687ff3e8cb227de160f0
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 10:28:30 2023 -0500
fix AOOB in bruteForceRows logic
commit b5624f62c88378e03260399440edf4d335ccc3af
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 10:12:53 2023 -0500
call deserializeFloatArray (which gives float[]) instead of deserialize (which gives List<Float>)
commit 13ad40718737b54652c875d81ec20992b5828777
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 08:36:32 2023 -0500
create CassandraHnswGraphBuilder with concurrent and serial implementations, so that single-threaded compaction doesn't have to pay the concurrency overhead
commit b83f4e4007e3d597963023b8ae32f4d0934b792d
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 08:00:44 2023 -0500
add ASL header to injections.md to make CI happy
commit 16074336b1322d4bade40eb71562ca50221399dc
Merge: 227c6c13a3 1ff6992354
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 07:48:04 2023 -0500
Merge branch 'VECTOR-37' into vsearch
commit 227c6c13a3cfe42484fc560d629d54a93f8265e0
Author: Jaroslaw Grabowski <jaroslaw.grabowski@datastax.com>
Date: Tue Jun 6 13:02:37 2023 +0200
CNDB-7007 return expired tables level from getLevels
commit 312d07de8d297cdede39017354b678f2fb1b1006
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 07:31:26 2023 -0500
randomize our brute force threshold, which will get the actual index scans exercised more
commit 0b897961869965629893c94250b7d4b1fb7c0f86
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Tue Jun 6 13:44:01 2023 +0800
Reference ANN sstable indexes in case of ann hybrid search (#653)
* Reference ANN sstable indexes in case of ann hybrid search
* simplify
---------
Co-authored-by: Jonathan Ellis <jbellis@datastax.com>
commit 1ff6992354e8411f9f7c48a76f0aadb4a00f2789
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 5 21:55:14 2023 -0500
per-query hnsw metrics
commit d3d6ae107737020eb0ca3b00c117d5bc84027dc0
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 5 21:16:14 2023 -0500
reduce test size to prevent Jenkins OOM
commit 7765ac73fd29c9664b6f8e946c067d6f1b0a9a03
Merge: 5a04dbcf67 c5cd09cebf
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 5 21:15:44 2023 -0500
Merge branch 'vsearch' of github.com:datastax/cassandra into vsearch
commit c5cd09cebf0c63f7bafcd2f4548395efe68b12cd
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Tue Jun 6 09:38:42 2023 +0800
Skip warning log about receiving a range that is not owned by the current replica, see VECTOR-30
commit 5a04dbcf6738d7681732e28518d6eb37f74357b8
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 5 11:21:15 2023 -0500
add injections.md from bdp repo
commit baf9cce301a39f9506bd60c5a75d2011150f2fea
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 5 10:02:02 2023 -0500
VECTOR-36 fix looking up indexes and SSTableContext by SSTableReader, because the SSTR will be removed from internal structures once it's compacted
commit e7fdc8454e750e2ee886c4ab27fb09abb46303fa
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 5 09:53:51 2023 -0500
limitToTopResults should skip rows that are not in the current segment (or more precisely, have no vectors that were indexed)
commit 2f64695a90651d9e76eee0a3399032ea241c983c
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Mon Jun 5 14:55:20 2023 +0800
Vector 6 take 2: reapply Vector-6 commit and fix NPE in SelectStatement (#648)
* Revert "Revert VECTOR-6"
This reverts commit 63a6ff609a02e4f7d9f5499f1af48ca5e58f63a8.
* Vector-6 take 2:
- fix NPE in SelectStatement by skipping reversed() for null ColumnComparator
- fix getOrderingColumns() to return LinkedHashMap to preserve ordering columns' order
- fix VectorLocalTest compilation
- remove debug log in CassandraOnHeapHnsw
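On the LinkedHashMap point in the list above: a plain HashMap iterates in hash order, while a LinkedHashMap iterates in insertion order, which matters when the map's key order is the ORDER BY column order. A small sketch, where `OrderingColumnsSketch` and the Boolean "reversed" flag are assumptions for illustration:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; the real getOrderingColumns() works on column metadata.
final class OrderingColumnsSketch {
    // Maps each ordering column name to an assumed "reversed" flag, keeping
    // the columns in the order they were listed in the query.
    static Map<String, Boolean> orderingColumns(String... names) {
        Map<String, Boolean> columns = new LinkedHashMap<>();
        for (String name : names)
            columns.put(name, Boolean.FALSE);
        return columns;
    }
}
```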
commit 9da51b315d207417de4f3e005616e275c62c74cb
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Mon Jun 5 12:38:21 2023 +0800
fix SAI test failures in vsearch branch (#649)
- BatchlogEndpointFilterTest
- rangeRestrictedTest
- SegmentFlushTest
- SegmentMergerTest
- OperationTest
- RangeIntersectionIteratorTest
commit dc53d7db5f0f8fe1c780f6d2b7580794f7b6b5fb
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 3 13:35:11 2023 -0500
create views for ordinals map so we don't have to open a new Reader for each method call
commit c6385022ff2949c23e0dd0161a043cf4061f050c
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 3 08:19:32 2023 -0500
add assert sstableContext != null
commit f340e7590db578b71201326e82006f72dd491047
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 3 07:37:36 2023 -0500
on-disk Searcher bits should not need to be growable
commit fd5cc7f989a47d03689dd29495b974697ad1338b
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 3 07:37:18 2023 -0500
add asserts
commit 6436e5f4ad49881a62c1442692edfedfe31968dc
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 3 07:21:50 2023 -0500
cleanup
commit 32c8e005740d3c7ec0b1008091728e12260aecdd
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 2 13:57:18 2023 -0500
comment
commit 091d96832ff9a68f6ea7857eafd2ec7ee0c09a0b
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 2 13:32:24 2023 -0500
r/m unused search method returning iterators over PrimaryKey
commit 4e78a4b429b0ba6690871df9a91c8ffd8d2e53fa
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 2 11:25:30 2023 -0500
we can use the slightly more lightweight RangeConcatIterator instead of RangeUnionIterator when combining results from different Segments
commit e7a9186bf3426ad025e809481e47db85a7bf7190
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 2 11:22:35 2023 -0500
r/m obsolete FIXME
commit c76ee9ea0cab5b70c535de3f50b81cf333bf94a2
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 3 07:12:43 2023 -0500
add testAppendedGraphs
commit d00bdbfb6acf9f08d2bdfa2ef06e833e567cf9eb
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 3 06:57:37 2023 -0500
move write-to-File to test class
commit b10fd1100daf4cde0291b3b4ef06f5ceb3ca650c
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 3 06:46:01 2023 -0500
fix Ref leaks in test
commit 0656e558bb51172df4e0cde1ebe92946d4dcec8b
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 3 06:26:05 2023 -0500
fix tests to use View
commit 92797ced03ed26f9a5cce398504d6f0de519b899
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Sat Jun 3 19:19:38 2023 +0800
VECTOR-35: fix vector on-disk writer to append segments (#647)
commit 22f5c01860367586d0cc39def7d975a6d6bae784
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 2 18:54:41 2023 -0500
add "Vector indexes only support ANN queries" check
commit bfc0b056dbacd0a1eca86020e901f545dca7670e
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 2 17:34:24 2023 -0500
split invalid requests into separate test class
commit f66768c35920101382c1975c9d0ba0e3301b7c61
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 2 17:28:56 2023 -0500
add specific error message when trying to do ANN without an index
commit 63a6ff609a02e4f7d9f5499f1af48ca5e58f63a8
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 2 12:31:22 2023 -0500
Revert VECTOR-6
This reverts commits:
a37009c187edeba68389d239dc1b9f40519b1187
5565690fbe1056d4c159ddbe233fa22c7695320a
e7733bb8f858a16b082b8a5c64d0322db6f6271a
commit 3d496662b259470b505d88212074c436660c27ad
Merge: fab67bb134 4bdae7e362
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 2 11:05:53 2023 -0500
Merge branch 'vsearch' of github.com:datastax/cassandra into vsearch
commit 4bdae7e36213f5112efce085696dd24aaa0adfad
Author: Mike Adamson <madamson@datastax.com>
Date: Fri Jun 2 14:36:01 2023 +0100
Stabilise random tests using word2vec model vectors
commit 2ddc3ec6b11f6810f6c4cbaf6b62bfe46d2d1b6c
Merge: 8e04280312 1f9179002a
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Fri Jun 2 10:05:15 2023 -0500
Merge remote-tracking branch 'datastax/ds-trunk' into vsearch
commit 8e04280312583a18085e7e7b9d31810790b039f4
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Fri Jun 2 09:51:06 2023 -0500
Revert this after CNDB-6974 is fixed.
commit fab67bb13499691f813c893cd39a3bbd406653d2
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 2 08:59:12 2023 -0500
vector cache defaulting to 1MB per segment
commit 67a2b7eba3c957a993726d37706643979e92a3a3
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Thu Jun 1 17:49:35 2023 -0500
Revert this after CNDB-6974 is fixed.
commit 4538956b31d8c40d2e9b603b3b0a3392343e1853
Merge: 7518680a4a af4c83aef4
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 1 16:17:14 2023 -0500
Merge branch 'ds-trunk' into vsearch
commit 7518680a4a17cb2589cf06a9175befc10c9eab1a
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 1 16:14:33 2023 -0500
re-use float[] across calls to vectorValue to avoid allocation overhead (credit to Jake for the idea)
commit d20bd8aedeb6232358dd16f901e2708f217b8108
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 1 15:35:17 2023 -0500
optimize vectorValue() with direct access to the mmap-ed region
commit 24572025af5b7893279b5b03c58555b682f3abdb
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 1 14:33:55 2023 -0500
decomposeVector does not modify the underlying buffer, so no need to duplicate() here
commit efef35d47a24222fae300d1cbd5a801e9ed79442
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 1 14:29:30 2023 -0500
specialize deserializeFloatArray for ByteBuffer and FloatBuffer. this saves about 3% total CPU on search workloads
commit 63c27f5550d6c3e22d960b9a42b9097029e3b760
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 1 11:04:23 2023 -0500
default target size of 5GB
commit 69c55c8d07c6e1d58b4829c153f5ebf8a4343669
Author: Jonathan Ellis <jbellis@gmail.com>
Date: Thu Jun 1 07:20:38 2023 -0700
make the on-disk hnsw code threadsafe using FileHandle.createReader (#641)
commit b194cae2e56133b946d0231768254064a47f01b6
Merge: 7ee6422084 23c2891e7a
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 1 06:22:17 2023 -0500
Merge branch 'ds-trunk' into vsearch
commit 7ee642208485c9e2ae62aa662dbbddc7b295bae1
Merge: c286ec0ee2 a37009c187
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 31 20:41:48 2023 -0500
Merge branch 'VECTOR-6' into vsearch
commit c286ec0ee231ededded18e81357edebe72064c59
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 31 20:40:21 2023 -0500
add testLargeGraph
commit a37009c187edeba68389d239dc1b9f40519b1187
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Thu Jun 1 08:50:07 2023 +0800
cleanup unused code
commit 6246dbe3a970a0f4672701632003cdf576705c24
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 31 16:43:42 2023 -0500
comment
commit c7420440c7d3c81eac96e716148d570ae3cceb1d
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 31 16:43:37 2023 -0500
encapsulate shadowedPrimaryKey better
commit a1a25c33b8bec9d78354dd2c1d992444db2bf76f
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 31 15:57:52 2023 -0500
make hnsw cache size configurable, and make the default 128KB (roughly the size of the bloom filter for a 1GB sstable)
commit 55747d2351e915645697fa8dec404c749540456f
Author: Andrés de la Peña <a.penya.garcia@gmail.com>
Date: Wed May 31 19:28:47 2023 +0100
Replace FunctionParameter.sameAsFirst by FunctionParameter.sameAs, allowing type inference in both directions for vector functions
commit 5565690fbe1056d4c159ddbe233fa22c7695320a
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 31 12:59:42 2023 -0500
cleanup and fix
commit e7733bb8f858a16b082b8a5c64d0322db6f6271a
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Wed May 24 16:01:26 2023 +0800
VECTOR-6: return vector results to client in ANN order instead of token order
commit 771067d1475c94484da178236928d2bad78e00b1
Author: Mike Adamson <madamson@datastax.com>
Date: Wed May 31 18:33:55 2023 +0100
Fix annOrderingMustHaveLimit test
commit 8dd76a73541e70da0da022dd463e6711a315f971
Author: Mike Adamson <madamson@datastax.com>
Date: Wed May 31 18:30:49 2023 +0100
Improve the validation of ORDER BY <column> ANN OF (#639)
* Improve the validation of ORDER BY <column> ANN OF
* Change to hasNonClusteredOrdering and improve limit message
commit 1ef6f2e3418db19af23f0eae24cd344d72290455
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri May 26 18:34:25 2023 -0500
add vector similarity functions
commit f4e3a39a8ece623647ba7891981e7c5ee10ed96b
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 31 09:39:45 2023 -0500
pull in the smallest set of changes possible from 93e0ae9a to get FunctionParameter.sameAsFirst
commit 0ef2614346c9426a966004df220221f74353de70
Author: Jonathan Ellis <jbellis@gmail.com>
Date: Wed May 31 06:22:41 2023 -0700
Add support for updates + deletes (#636)
commit 5da9fefe1e18f733bb8ffd0a35ef2e1ac3cbd975
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue May 30 16:32:02 2023 -0500
rename SegmentOrdering.reorderOneComponent to limitToTopResults to align with MemtableOrdering
commit caf6eb2154ddf57c88d69c6246404b65e183057d
Author: Mike Adamson <madamson@datastax.com>
Date: Tue May 30 22:41:29 2023 +0100
Fix V1SearchableIndex.reorderOneComponent to use segments (#635)
commit 72cfa94e9389d0078d64aee60d26b8b01c6ca22f
Author: Piotr Kołaczkowski <pkolaczk@datastax.com>
Date: Mon May 29 20:17:44 2023 +0200
VECTOR-14: Move ANN OF expressions from WHERE to ORDER BY
Before:
SELECT * FROM table WHERE <vector column> ANN OF <vector value>;
After:
SELECT * FROM table ORDER BY <vector column> ANN OF <vector value>;
commit 226266ef124fc220819d328ed8547c5d86626c4b
Author: Mike Adamson <madamson@datastax.com>
Date: Tue May 30 18:58:16 2023 +0100
Implement getInstance(TypeParser) for VectorType. Fix losing data at startup.
commit 2bbe88ad88e2243f12dc8df789ea9bdf208ff74b
Merge: dd0ffb64f3 a3b8661746
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue May 30 12:37:58 2023 -0500
merge ds-trunk
commit dd0ffb64f3e54e2ce1af75da52fe3ece66a49f3f
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue May 30 11:40:20 2023 -0500
reduce startup log noise at info level
commit 1ef17101def1aacbb84f3434633a9b706291aa4b
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue May 30 11:34:26 2023 -0500
add partialUpdateTest
commit f816046a17809b06ad754a16c0f089c0d26909f9
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon May 29 12:57:12 2023 -0500
fix recall computation in test
commit 1c1442c36baafdf4e1641a9c0288dbd520cf7c0e
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon May 29 12:34:02 2023 -0500
also apply tolerance for inexact results to searchWithKey, but only for size > 10
commit 39df5112c6a551b53649ad98164f2bde93cbdc10
Merge: d598206329 8ca4bba861
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon May 29 12:27:24 2023 -0500
merge ds-trunk
commit 8ca4bba8617e8d0fa4f48de1c479de66e3a77dd3
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon May 29 12:19:32 2023 -0500
rename MemtableIndex -> TrieMemtableIndex to make merge to vsearch easier
commit d5982063292b6a91b156da32454abacd36fa6980
Merge: ae96e0f423 1488a5f0b9
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon May 29 09:03:24 2023 -0500
Merge branch 'vsearch' of github.com:datastax/cassandra into vsearch
commit ae96e0f42393eab3547e2314f1acdb1ff766f2a6
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon May 29 09:02:12 2023 -0500
check for results within a percentage (I went with 5%) of the expected; the A in ANN means we shouldn't expect to find 100% of matches unless the graph is tiny
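The tolerance check described in this message could be sketched roughly as follows. This is only an illustration: the class and method names are hypothetical and are not taken from the actual test code, and it reads "within a percentage of the expected" as "recall of at least 1 - tolerance", which is an assumption.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not the project's test code.
class RecallCheck {
    // Fraction of the expected (exact) matches that the approximate search returned.
    static double recall(Set<Integer> expected, Collection<Integer> actual) {
        if (expected.isEmpty())
            return 1.0; // nothing to find, so nothing was missed
        int hits = 0;
        // De-duplicate the returned ids so a repeated id is not counted twice.
        for (Integer id : new HashSet<>(actual))
            if (expected.contains(id))
                hits++;
        return (double) hits / expected.size();
    }

    // Assumed interpretation: pass when recall is at least (1 - tolerance), e.g. tolerance = 0.05.
    static boolean withinTolerance(Set<Integer> expected, Collection<Integer> actual, double tolerance) {
        return recall(expected, actual) >= 1.0 - tolerance;
    }
}
```

A test would then assert withinTolerance(expected, actual, 0.05) rather than exact equality of the two result sets.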
commit 1488a5f0b9c65473bb6a69c9e18c045d931a1fe3
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Mon May 29 09:20:28 2023 +0800
Vector-19: fix PrimaryKey min/max prefix ByteComparable and add reversed lookup for row id (#627)
* Vector-19: fix PrimaryKey min/max prefix ByteComparable and add reversed lookup for row id
commit 14c41f5f913dc7f7f4cfc64b932bb6b9df475f19
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 17 17:35:16 2023 -0500
use incremental bytes used estimate from hnsw to avoid recomputing full ramBytesUsed on every call to add
commit f8d2aaad6e96693f532d745f71e0eaabb1ddf934
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Fri May 26 12:02:54 2023 -0500
Update cqlsh for new syntax
commit f0a882ae1171d41bc9c6ccf224ce3be80f7128e8
Author: Mike Adamson <madamson@datastax.com>
Date: Fri May 26 16:24:38 2023 +0100
Only allow VectorType to accept float
commit a29712b1365cdfac9a709e35325e85f7fa6c61c8
Author: Mike Adamson <madamson@datastax.com>
Date: Fri May 26 16:17:51 2023 +0100
Fix max term size for vectors at 16k in SSTableIndexWriter
commit 363d3f4e35c1a4df0f257557309bb042fd3f3ec7
Author: Mike Adamson <madamson@datastax.com>
Date: Fri May 26 15:23:02 2023 +0100
Merge CASSANDRA-18504 vector grammar (#628)
- Now uses vector<type, dimension> to describe vector
- This commit does not bring in the whole 18504 patch
only the essential grammar and type parts of the
patch
commit 18c4e35d4ce14cda7a3c03398e16edf368ad9e6a
Merge: 16158a5b48 b0f71ee2c3
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu May 25 16:24:41 2023 -0500
Merge branch 'VECTOR-3' into vsearch
commit b0f71ee2c371d1003cf241c3aedd7437385bcecb
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu May 25 15:51:27 2023 -0500
cleanup
commit 2d840eae74c079229729767731d3719d12ca1931
Merge: c01cacf300 197a3207b1
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu May 25 15:47:39 2023 -0500
Merge branch 'vsearch' (early part) into VECTOR-3
commit 16158a5b48665f57bde649e87a09d078298932f5
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu May 25 15:45:51 2023 -0500
optimize ramBytesUsed
commit 197a3207b16ec97bae4924883247e2cd6a2923bd
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu May 25 13:34:19 2023 -0500
update lucene
commit a7bfcc7a6f31090c3c8443d299dabb7acd437ccd
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu May 25 08:05:41 2023 -0500
optimize ConcurrentVectorValues.write
commit 456fb08af0b922922ea0e4f24def0184cf4b32c6
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 24 21:25:52 2023 -0500
fix NPE better
commit c01cacf30025d181a2d86c4632042449396333bc
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Thu May 25 08:53:38 2023 +0800
Return negative ordinal if row id is not found; add failing test for null vector
commit 009fadf2ed889bcc6748f500b7fa4b1d43e910df
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 24 18:00:23 2023 -0500
search() errors out when an empty graph is passed to it, so special-case that
commit 8c9972d7453fad46f55e6fe5e7538442235e7fd6
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 24 17:46:50 2023 -0500
use serializer dimensions instead of trying to cache it from the first vector added, because we might need it before vectors are added (if someone tries to search an empty graph)
commit 8423541c4997462b65fe8f1231b0baf15dab3297
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 24 17:42:30 2023 -0500
fix NPEs when nulls are inserted
commit d4510e0f1934f03f66d89b7c585b3d7aac17dda2
Author: Mike Adamson <madamson@datastax.com>
Date: Wed May 24 15:54:38 2023 +0100
Rebuild HNSW graph on flush since we don't know ahead of time where UCS will want to split the range boundaries
commit 6b925531011798285a6b25d70961643092e2febe
Author: Mike Adamson <madamson@datastax.com>
Date: Mon May 22 14:58:22 2023 +0100
disable index segment compaction by default
commit 3480a80e3303c13b8c04f718f83f87489315e381
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 24 14:46:03 2023 -0500
move chatty logs to debug level
commit 05c5758244f6f6d4fe30fbcefbb13932f68f8891
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 24 13:59:01 2023 -0500
r/m fixme obsoleted by #624
commit b09db4e0425c92491e3e9c73ebdbdb84508075fd
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 24 13:56:46 2023 -0500
pre-size the intersection builder
commit 6d62d8c2546cda1f487c14dcde1303e14a18c804
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 24 13:45:14 2023 -0500
fix nested boolean expressions
commit b8301a86d37513b67545883548c5670131281dc1
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Wed May 24 14:19:41 2023 +0800
Vector-3: support partition/range restricted query
commit 8ba56fad62d21da228e3a61ed7d5971441572cf9
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 24 10:35:16 2023 -0500
update lucene version hash
commit a4768c583eb1a029f629e5ebda274ab2fcce3130
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Tue May 23 10:45:02 2023 -0500
Revert the revert of "Use published lucene fork jar."
This reverts commit 7a8ff12860a97d451045c9dd1978a3599a0f4679.
commit 063a176caeec0f1e298d044490aa29ecdd468811
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Wed May 24 22:03:32 2023 +0800
Union the results from multiple segments per index (#624)
- Union the results from multiple segments per index
- Revert intersection behavior to pick 2 most selective indexes if there is no ANN
- Fix SSTableRowIdPostingList to return END_OF_STREAM if next row is END_OF_STREAM
- Skip TermTree for vector index
- Fixes:
- VectorMemtableIndexTest#randomQueryTest
- SegmentMergerTest
- SingleNodeQueryFailureTest#testFailedRangeIteratorOnMultiIndexesQuery
- SelectiveIntersectionTest
- QueryTimeoutTest
commit 643f34c2b654b80573ba92381e386b4286f7fb63
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 24 08:25:35 2023 -0500
switch all the hnsw internals to use float[] so we don't need to keep both representations around
commit 7a8ff12860a97d451045c9dd1978a3599a0f4679
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 24 08:06:28 2023 -0500
Revert "Use published lucene fork jar."
This reverts commit 53c6c90920dd2a48fc3334a6cddb215d3f4ebfcd.
commit 8731eefe4122a5af0b3723a3b07884a201616e7a
Merge: e50d5e955b 53c6c90920
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 24 08:05:18 2023 -0500
Merge branch 'vsearch' of github.com:datastax/cassandra into vsearch
commit e50d5e955bfde6357a5bb892b2ea70d46e8cc77d
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 24 08:02:52 2023 -0500
cache float[] from ByteBuffer
commit 53c6c90920dd2a48fc3334a6cddb215d3f4ebfcd
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Tue May 23 18:29:00 2023 -0500
Use published lucene fork jar.
commit f2567cd2fb2a683e7c0fee23945baac3ceca6ad4
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue May 23 14:56:25 2023 -0500
attempting to add back IN support but with limited success, even "basicOrTest" gets parsed as child nodes
commit 9e1f018f5c0d5832446fc6a0338aaf95c8323303
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue May 23 14:21:23 2023 -0500
reduce log level of some of the chattiest locations
commit efaf501fe3158da42dcd874b06014acb127f5cd9
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue May 23 13:55:40 2023 -0500
clean up RangeIntersection Builder overloads
commit e7127d9d32392347f7010de6267521742f831217
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue May 23 13:46:27 2023 -0500
complete generification of RangeIterator in src/ (but tests are still incomplete)
commit c07ded6420aa25b9dfa2b3c0785178b08d286b26
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue May 23 13:04:16 2023 -0500
TermIterator is more or less replaced by CheckpointingIterator
commit 14d03ae99c69c2419dc0cca3291b4f5eaefbc054
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue May 23 14:45:18 2023 -0500
update lucene jar
commit a15b964d56dcba285c6871d354755d7aa489af1d
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue May 23 12:48:41 2023 -0500
disk and memory recalls should be the same (since disk graph should be exactly equivalent to in-memory)
commit f03680bdf8581fd13a2d772e8265e3c8a713d247
Merge: 868a6b8f11 fd1293dbaf
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue May 23 12:08:46 2023 -0500
Merge branch 'vsearch' of github.com:datastax/cassandra into vsearch
commit 868a6b8f11a25c20b7e59a93d65cc7e2b021a432
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue May 23 12:04:11 2023 -0500
use brute force if we expect to perform fewer comparisons that way than with a graph search
commit 2ed32bf53acaf8180130bef1aa5cebd9b6d32d13
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue May 23 11:59:11 2023 -0500
refactor AnnKeyRangeIterator -> ReorderingRangeIterator
commit d28bad9b2df3fc84b117f9ed0a04314ecb9c0a02
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue May 23 11:55:15 2023 -0500
rename reorderOneComponent to limitToTopResults
commit fd1293dbaf82f054593b3150249ac00110b286cf
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Tue May 23 10:45:02 2023 -0500
Update python driver to one that supports the Vector type
commit 6603f7601671357928948dfb0921b0c731bb96d3
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue May 23 11:06:12 2023 -0500
extract ReorderingPostingList
commit 94b4ff20ccc3a51e966b4798f5364a2d9ed4b966
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue May 23 10:29:33 2023 -0500
deserialize doesn't modify buffer so duplicate is unnecessary
commit 840053d212985a109cb5242c19e8a5876fdec7b3
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue May 23 10:26:57 2023 -0500
use LongHeap instead of generic PQ
commit aac90dc3e0fdb745e465f044f2624228e705be10
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue May 23 09:47:22 2023 -0500
r/m unused defer flag from toPrimaryKeyIterator / toSSTableRowIdsIterator
commit 1fd71c1cb7f2e785237b36c6e42842fb326d2380
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue May 23 09:26:03 2023 -0500
remove unused method marked "to remove"
commit b96c5a9efccf3226acf2f2a08ca149d0acc0b21d
Merge: 54f41e32d3 ba8aa07dd0
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue May 23 09:24:44 2023 -0500
Merge remote-tracking branch 'zhao/vsearch-row-id-iterator-for-reordering' into vsearch
commit 54f41e32d3359941ddf70d34fc8e68a842b9091d
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue May 23 08:14:10 2023 -0500
update lucene to fix NPE in ram usage
commit ba8aa07dd09f52a72e5dffccb950a21ec37a1c11
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Tue May 23 16:51:36 2023 +0800
intersect on row-ids before fetching primary key for multiple non-ann indexes or single index
commit 2a69547167f8dc1b40e8a26b71cdf43a45779280
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Tue May 23 09:44:00 2023 +0800
Use sstable-row-id iterator to re-order ann index
- add searchSSTableRowIds to searcher
commit 04b9a606d9a1aaeeeab3ec0981a5f1300cc2a423
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Tue May 23 00:30:12 2023 +0800
Make RangeIterator generic
commit 4ff716a7411d3ad80009d4fb97bf53af51dd5eba
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon May 22 14:57:34 2023 -0500
r/m obsolete FIXMEs
commit 175ad0edce9cef3fb29f230fb64fb1913328d101
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Tue May 23 00:30:12 2023 +0800
fix vector distributed test: use 1 token and ignore multi-ann-index test (#622)
commit 78adf77d9e6b7f20ece71fe665a5d1aec51f79b6
Author: Mike Adamson <madamson@datastax.com>
Date: Mon May 22 14:26:48 2023 +0100
Copy lucene snapshot to build folder in build-resolver.xml
commit 5172315dcb51e05a44af224ddfe944bca29005d0
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon May 22 08:17:41 2023 -0500
declare lucene dependency to come from lib/
commit d4ee45b8675f79b4a1e06f97d0ce618ab08f030a
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sun May 21 05:40:30 2023 -0500
add caching and test code
commit 5102d96e2fb3460f1e541cc1b481e224d112eac1
Merge: 6afcfabe88 87d39a7b25
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat May 20 11:15:14 2023 -0500
merge cep-vsearch
commit 6afcfabe883c5627c96eda8eb488c51c06e17c91
Merge: a993dbe49a fa85a191c5
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat May 20 09:15:13 2023 -0500
Merge commit 'fa85a191' into vsearch
commit a993dbe49a75dde4b9d5528e0816b81ae427a1dd
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri May 19 22:00:31 2023 -0700
FIXME not sure why version is messed up post merge, this hacks around it
commit aadf168e47a4271cc3faeb62ec7044cbfe193893
Merge: c217120935 5bc4d4b42e
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat May 20 08:52:48 2023 -0500
merge 5bc4d4b42e
commit c2171209350cc307f3b5fcc73a6ec09423d203df
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri May 19 22:00:31 2023 -0700
FIXME not sure why version is messed up post merge, this hacks around it
commit 2e44491d3db6a881324126490e6e1511d6082f39
Merge: 68f85f7009 0ff4566080
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat May 20 08:15:16 2023 -0500
merge 0ff4566080
commit 68f85f700995b274ec0d64654ed2692c7fc8bf61
Merge: c8843779be 2ae60e7411
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat May 20 07:50:16 2023 -0500
Merge commit '2ae60e74' into vsearch
commit c8843779be9fe0c3edd047754bb08f633cff5646
Merge: 3cb57548bb 93616d080c
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat May 20 07:43:18 2023 -0500
Merge commit '93616d080c539e880e37cd22fe5f27396a7a2594' into vsearch
commit 87d39a7b252b3f3cc1f4aae0f149085f6bc83abc
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri May 19 17:24:55 2023 -0700
fix intersection count, and add comments to KRI api
commit 4fdc58fced9ca350333675cb19a42169bebeaccf
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri May 19 15:27:34 2023 -0700
test recall
commit 26fb1006430513a37f267a4dfb81ef473516ac7b
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri May 19 14:19:08 2023 -0700
add tests
commit 1cbf78bbf71fff2d597ddd199fd3a758313c7bf4
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri May 19 14:02:36 2023 -0700
comments
commit 65e41490f041535f1a48a726032e28f6c8febcff
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri May 19 13:57:04 2023 -0700
rm AnnResult
commit 06b10a28c3f6f19c76b056526cdabb4c9aff1528
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri May 19 13:54:08 2023 -0700
cleanup commented-out code
commit e33236ab8b9da7a025f4c59406c52ecbc3b5ae4f
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri May 19 09:52:50 2023 -0700
convert suspicious one-off binary search in OnDiskHnswGraph to use DiskBinarySearch
commit fbd30bce13e87c63b129db26c5c123ae9bf8f9ed
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri May 19 09:31:16 2023 -0700
perform disk-based binary search for ordinals
commit 77bb1b5cfcc743224bdb01d21563365caa184c21
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri May 19 08:22:36 2023 -0700
move OnDiskOrdinalsMap to its own class, and move write() code into it
commit 8a1fd9f0a38f4c16b72d5bc7b2aeb3ee07bacc7a
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu May 18 21:35:28 2023 -0700
add ordinal to row mapping
commit db32d9c7b077133a213ac3a0da161f163c4310a1
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 17 20:43:49 2023 -0500
mostly-working reordering
commit 62ba004d79aa7511f519ebb56fa8af2cb424e21e
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 17 20:13:10 2023 -0500
turn QueryView inside out; perform intersections first for each sstable, then union the results
commit d65039a6ccd8364e8eb8ef31933cdf8dca298437
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 17 18:37:07 2023 -0500
update MIM to give individual iterators back
commit a190f1e917b93b6cabd15347dec55b1683807b1f
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 17 18:59:13 2023 -0500
refactor CheckpointingIterator to just take an iterator to wrap, and index references to close if something goes wrong
commit 307fe1c20b400e17048a356f73822ef51f66f341
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 17 18:47:20 2023 -0500
rename IndexSearchResultIterator -> CheckpointingIterator
commit 690751b216afbe3e77ed0f89ce04d16e5c6e985d
Merge: 4b6d0f2fb5 fa85a191c5
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 17 10:17:49 2023 -0500
Merge branch 'cep-7-sai' into cep-vsearch
commit 4b6d0f2fb531f6832d2b24465b0982683e741dfe
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Wed May 17 16:45:42 2023 +0800
Add vector top-k filter at replica side before returning to coordinator (#618)
* added QueryPlan#postIndexQueryProcessor to filter top-k at replica side before sending response to coordinator
commit 5bc4d4b42e77180df050bdbd9f5fd18fbb37332c
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue May 16 07:38:45 2023 -0500
failing test was failing b/c test was wrong
commit f37e40f5c5d99aeaafb33f41be0bd10bff6b5756
Merge: ae1bffa49b 1c41a095e1
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon May 15 21:24:47 2023 -0500
merge
commit ae1bffa49b4c8757a3d9110ae065b8cb70d8e416
Merge: bbc4f72b8d 381e04aaa1
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon May 15 21:22:16 2023 -0500
Merge commit '381e04aaa132cf7a255efad3f792852d9ec1729d' into cep-vsearch
commit 1c41a095e130bce4c7c215650f9ef3f817ae635e
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon May 15 21:11:47 2023 -0500
fix computeNext to return endOfData when done
commit b845395ad7e5f4670597cd171f28b98c941d03a7
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon May 15 21:11:15 2023 -0500
cleanup
commit 58df11dad894aa5fb36bfbdaaf4c7dc1987fcc94
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon May 15 20:59:12 2023 -0500
failing tests
commit ea5977620ebb07d220ad51800c9b90f470494079
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon May 15 20:58:52 2023 -0500
FIXME temporarily remove checkstyle
commit 715790922379ad7b5dbdea5d3358f24542762889
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon May 15 20:49:10 2023 -0500
cleanup
commit 19e47d363c856f81bda3323be02bfe18f60b51c0
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon May 15 17:55:52 2023 -0500
update NeighborSet usage
commit 920abafce00c94bbc7d6b0c86ba78a2a9d3f9540
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon May 15 17:53:43 2023 -0500
cleanup
commit d58db09004427506efe03e81364cadf44fc914c4
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon May 15 15:46:06 2023 -0500
pull in latest hnsw optimizations
commit bbc4f72b8d7d99db003103103c90d78db4ee04d1
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon May 15 10:02:40 2023 -0500
use IndexOutputWriter in other components
commit 381e04aaa132cf7a255efad3f792852d9ec1729d
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon May 15 09:32:25 2023 -0500
switch other components to also use IndexOutputWriter
commit 5951690f5472502bcb093179e8905212c1fcf815
Author: Mike Adamson <madamson@datastax.com>
Date: Mon May 15 14:48:16 2023 +0100
Fix CassandraHnswGraphWriter to use correct output writer (#616)
commit bc5b956607fa77d53fcab9ad338776af69610feb
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Mon May 15 17:35:42 2023 +0800
Distributed vector search (#613)
* distributed vector search
- query all replicas selected by consistency level at once with full request range
- filter top-K results at coordinator in QueryPlan#postProcessor
- skip short-read-protection, read repair and replica filtering protection because replica response will be top-k
- fail ANN query without limit or limit exceeding MAX_TOP_K
- make vector search max_top_k configurable and default to 1k
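The coordinator-side top-K filtering listed above can be pictured with a small sketch. It assumes each replica returns rows already scored against the query vector and that a higher score means a closer match. The class, the ScoredRow type, and mergeTopK are hypothetical names for illustration, not the QueryPlan#postProcessor implementation, and de-duplication of rows returned by more than one replica is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration, not Cassandra's coordinator code.
class TopKMerge {
    // Hypothetical row holder: a key plus its similarity score to the query vector.
    static class ScoredRow {
        final String key;
        final float score;
        ScoredRow(String key, float score) { this.key = key; this.score = score; }
    }

    // Keep the k highest-scoring rows across all replica responses.
    static List<ScoredRow> mergeTopK(List<List<ScoredRow>> replicaResponses, int k) {
        // Min-heap on score, so the weakest retained row is evicted first.
        PriorityQueue<ScoredRow> heap =
            new PriorityQueue<>((x, y) -> Float.compare(x.score, y.score));
        for (List<ScoredRow> response : replicaResponses) {
            for (ScoredRow row : response) {
                heap.add(row);
                if (heap.size() > k)
                    heap.poll();
            }
        }
        // Return best-first.
        List<ScoredRow> result = new ArrayList<>(heap);
        result.sort((x, y) -> Float.compare(y.score, x.score));
        return result;
    }
}
```

Because every replica response is itself limited to its own top k, the heap never holds more than k + 1 rows at a time, so the merge cost is bounded by the number of replicas times k.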
commit 6d8c94535354f162f60099deecb0523f9f71bb99
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sun May 14 10:01:08 2023 -0500
fix imports for checkstyle
commit 68c20dc9ceeaf0c2169718aa4a1dce9c0f92f003
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat May 13 19:23:58 2023 -0500
add the index and ordinal mapping to the set of components that the system knows about
commit ba7efaac142c8b3181881f86020f659837c90463
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat May 13 17:57:22 2023 -0500
cleanup
commit 5ae06fe31840e5ba1899086f3b8eb8194f825db7
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat May 13 17:23:30 2023 -0500
update to latest lucene snapshot that addresses all known concurrency bugs
commit f996edd63f66cf7b688f8e014b31f15450dd1d3d
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri May 12 16:08:40 2023 -0500
fix for cqlsh by Bret McGuire
commit fa85a191c5e0bd508da584412648308888769cb9
Author: Andrés de la Peña <a.penya.garcia@gmail.com>
Date: Mon Apr 24 16:39:26 2023 +0100
Allow CQL queries on multiple indexes without ALLOW FILTERING
patch by Andrés de la Peña; reviewed by Berenguer Blasi for CASSANDRA-18217
commit e2f3d2150ab2bc5e038f2661b8e2337d3b5cd4bf
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu May 11 14:34:22 2023 -0500
imports
commit 68e59bc715de6d1f69d69530310655a598a54200
Author: Mike Adamson <madamson@datastax.com>
Date: Thu May 11 17:20:52 2023 +0100
Use correct bind types for vector
commit c57119254b7e065433804e36ed6ead71d53984ea
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 10 23:06:19 2023 -0500
avoid calling neighborSet.size() in write path
commit f91e9ea40fa4bfc07c0bb280b7c479d473956da3
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 10 23:05:11 2023 -0500
add more asserts. countNeighbors is failing
commit 1bd6789304a60762f0ef497c2137dd96f806a262
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 10 22:40:01 2023 -0500
lock out updates to the graph while we're writing it to disk
commit da7cb2984f96c06a5890ed2da21c42d946d7c754
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 10 22:06:58 2023 -0500
imports
commit 3676a86b821c6c05d9c5df33f4e4fbc9eeb1f591
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 10 22:03:14 2023 -0500
compact by building graph in memory, like it did before
commit fa1ebd013bc03def3d211932fe2a98d1bb4442d9
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 10 16:15:58 2023 -0500
rewrite save-to-disk without Lucene code, to support multiple rows having the same vector value
bonus: we don't need to rebuild an index that we already have in memory
commit 3b328324d7c3afcda34a95c4b9fcbb9fe017384c
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 10 17:00:02 2023 -0500
upgrade to latest lucene snapshot
commit 1f4e768e9ec3327a5…
commit 4c4c1ff802c7667e1289a4e730cc34bf6b3bd007
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jul 5 12:03:22 2023 -0500
add test for creating index after data is added
commit 0acaae364c19ed7556b66b5576762ecc131af5fd
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Wed Jul 5 09:23:01 2023 -0500
Revert "enable segment compaction by default for vector indexes -- we should prioritize read performance"
This reverts commit a4fa0728337ebf236ad03f6ad708cf51c3dc08c5.
commit e3ddb6426de4e89b5fe61ed742b69257865c2d3c
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 28 10:54:49 2023 -0500
VECTOR-64 set sstable_growth to 0.5 by default
commit 3ca75046e6802ce5836a654459b01948c83d1d7b
Merge: a4fa072833 10a2a31ae8
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 28 10:54:18 2023 -0500
merge ds-trunk
commit a4fa0728337ebf236ad03f6ad708cf51c3dc08c5
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 28 10:24:41 2023 -0500
enable segment compaction by default for vector indexes -- we should prioritize read performance
commit 08153b38549d700505d064c25d7a2070c2ff07aa
Merge: 6a063b4f84 1a45fdc5f9
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 28 10:18:38 2023 -0500
Merge branch 'vsearch' of github.com:datastax/cassandra into vsearch
commit 6a063b4f8458f119d4b99bd6b78cbc8832f27f13
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 28 10:13:51 2023 -0500
VECTOR-63 add cassandra.sai.hnsw.allow_custom_parameters defaulting to false to control setting maximum_node_connections and construction_beam_width, which can cause memory usage to explode if used incorrectly
commit 1a45fdc5f92e9c82dba6e9e56a5e341806070b13
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Mon Jun 26 15:57:10 2023 -0500
Add the option to configure the file_cache_size_in_mb in a -D like it is in 6.8-cndb.
commit d2e45dab5b57f20fbdd6250d6dc186c83d783ed3
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 22 16:13:00 2023 -0500
VECTOR-62 don't flush a graph that only contains deleted vectors
commit 4a9afe205c20b1f863e89d4e02138557d225c3fc
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 22 16:11:11 2023 -0500
r/m obsolete comment
commit a925dc22696bcbebcad17a460441e8b84ff97660
Merge: 7c7d160dd9 2b22984fe1
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Thu Jun 22 09:03:08 2023 -0500
Merge remote-tracking branch 'datastax/ds-trunk' into vsearch
commit 7c7d160dd90624b0e365287ef0a2f0b7ef94da53
Merge: a14b387372 7a5e374cea
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 20 21:42:51 2023 -0500
Merge branch 'vsearch' of github.com:datastax/cassandra into vsearch
commit a14b387372ef3f220ece68b019acaafaf0d56f42
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 20 17:52:13 2023 -0500
move new options to jvm17-server.options
commit dc67582585a90628e026a693497793b7d8a298ed
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 20 17:47:27 2023 -0500
copy jvm11 options to jvm17
commit b6717410f925486a858c4c9414e417eed50950d7
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 20 17:30:48 2023 -0500
enable simd
commit 92b9e6e2f136254fb1f8e50732809ea529617f48
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 19 15:31:09 2023 -0500
make it run under jdk 20
commit 3e985953637819ac8504955552ce0feba29bdd40
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 20 17:13:56 2023 -0500
update to lucene with simd
commit 7a5e374cea785678670444121ad36be6170dd59b
Merge: 5e5ae4de82 e9ddd5f0d9
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Tue Jun 20 15:00:24 2023 -0500
Merge remote-tracking branch 'datastax/ds-trunk' into vsearch
commit 5e5ae4de824a7e838bbdfdbabc92e8b256911e3f
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 19 10:57:18 2023 -0500
we weren't calculating length correctly (supposed to match ordinals range) but it didn't matter b/c length() is not called on the search path where these classes are used. make this explicit by throwing in length implementation.
commit c0d4055a0bc6914d0220fbd4912d6355e9f6a1d9
Merge: f2c9a7cb59 c605f8c9f0
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 19 10:20:02 2023 -0500
Merge branch 'vsearch' of github.com:datastax/cassandra into vsearch
commit f2c9a7cb59b113a349992011b739079106274c1e
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 19 10:19:34 2023 -0500
VECTOR-61 remove deletedOrdinals collection that was not threadsafe wrt add and remove operations; create postingsByOrdinal map that allows Bits operations to look up the postings, instead
commit 9744a13b405cbeb1b68894bb08cc24796aedd8f7
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 19 09:13:24 2023 -0500
add LVT.testMultiplePostings (works fine)
commit c605f8c9f037e724544b21fe2aadcf0d8503d86f
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Sat Jun 17 12:44:36 2023 +0800
skip replica-filtering-protection for ANN requests
commit 71935bd07c9218ba70f1ffde7b6db1e88009e529
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 16 20:20:06 2023 -0500
forgot that maxBruteForceRows has to stay non-final for Byteman
commit b264afc07727305ce88ab2f18b8a2643d2510bd3
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 16 20:10:19 2023 -0500
VECTOR-54
- validateIndexable also checks for very large vectors (that explode to NaN when we take the cosine)
- check query vectors with the same criteria as vectors to index
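A minimal sketch of the kind of check VECTOR-54 describes, assuming the validation works on the squared norm of the vector. The class and method names (`VectorChecks`, `checkCosineIndexable`) are hypothetical and not taken from the patch. A zero vector has no direction, and components large enough to overflow `float` arithmetic turn the norm into infinity, so the cosine becomes NaN in both cases.

```java
// Hypothetical helper, not the actual Cassandra implementation.
final class VectorChecks {
    private VectorChecks() {}

    /** Throws if the vector cannot be used for cosine similarity. */
    static void checkCosineIndexable(float[] vector) {
        float squaredNorm = 0f;
        for (float x : vector) {
            // float multiplication overflows to Infinity for very large components
            squaredNorm += x * x;
        }
        if (squaredNorm == 0f) {
            throw new IllegalArgumentException("Zero vectors cannot be indexed or queried with cosine similarity");
        }
        if (!Float.isFinite(squaredNorm)) {
            // also catches NaN components, since NaN propagates through the sum
            throw new IllegalArgumentException("Vector is too large to compute a finite cosine similarity");
        }
    }
}
```

Applying the same method to both indexed vectors and query vectors would match the second bullet of the commit message.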
commit 66c31a6dcfcbf81b7657ac9c4119803e026d150a
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 16 20:07:41 2023 -0500
if CFS.apply catches an InvalidRequestException, keep the same type when we re-throw so it gets back to the client as intended
commit e0bdfdbe421a1cca30746604cda9ef5b992002bb
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 16 17:33:37 2023 -0500
add [failing] emptyIndexTest
commit 75cba96f90730b32ea4a39375c72aaff4ffd6221
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 15 16:52:46 2023 -0500
re-use bitsets across searches
commit 2af2ce8a321f7fea268d81c393da7f8f6d886776
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 16 17:05:52 2023 -0500
more asserts that graph is in sane state when we write it
commit a63492686d13e12b799acdf50c6f0c20dd680f21
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 16 17:13:23 2023 -0500
update lucene
commit 4c6b43e3606b8c9eb5e4ec4cf0817dc9b061c555
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 16 16:24:33 2023 -0500
add debugging information when node is not found on level
commit 81c8e730bd4ca314a81bc13e30b197603a8eb005
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 16 11:57:24 2023 -0500
add similarityWithAnn test
commit 87c8da9a5dddd10215a84c07a09dbaaecaa522b8
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 14 18:26:02 2023 -0500
add tests to confirm that zero-length vectors are rejected
commit 2510c1b987d846821e7095fbc2c368fa51d9a15d
Merge: a3d9e33554 e86f91c568
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 12 17:09:22 2023 -0500
Merge branch 'vsearch' of github.com:datastax/cassandra into vsearch
commit a3d9e3355470f6400328cd7ac1eb07ff06f8e00b
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 12 17:08:43 2023 -0500
use ArrayList in CVV instead of HashMap since we know the keys are consecutive
commit e86f91c568f8a82e83e61bd0dd768478a7584cfc
Merge: 77c1bc11f3 8a7a6d9c4f
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Mon Jun 12 09:25:06 2023 -0500
Merge remote-tracking branch 'datastax/ds-trunk' into vsearch
commit 77c1bc11f31e4ff001c7eff1e6fe50812e136a45
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Mon Jun 12 09:24:56 2023 -0500
Revert "Revert this after CNDB-6974 is fixed."
This reverts commit 8e04280312583a18085e7e7b9d31810790b039f4.
commit cae30e9879c463d80de2a2b7e4a642979803c13a
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Mon Jun 12 09:23:39 2023 -0500
Revert "Revert this after CNDB-6974 is fixed."
This reverts commit 67a2b7eba3c957a993726d37706643979e92a3a3.
commit 46d1a45de84f40311f193beba3ca5fc3fe7450d7
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 12 09:03:03 2023 -0500
replace our TODO entries with VSTODO to make them easier to find
commit 415b3f29e8e7f1526e28655308a1937a1175e575
Merge: 2fd8b848be 481d29721a
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 12 09:02:16 2023 -0500
Merge branch 'vsearch' of github.com:datastax/cassandra into vsearch
commit 481d29721a7993e9babd44f792be55aca925d41a
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Mon Jun 12 21:26:43 2023 +0800
Fix VectorMemtableIndex to handle max token and fix Segment#intersects (#670)
* Fix VectorMemtableIndex to handle max token and min/max bound
* Fix Segment#intersects to compare bound instead of token and add tests for range search
* make brute force rows per query for VectorMemtableIndex
* apply feedback on Segment#intersects
* add comments to VectorMemtableIndex#search
* Fix SegmentTest
commit 2fd8b848be82e62999e64b57cb9800d03de8e953
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 12 07:51:18 2023 -0500
clean up a couple REVIEWME
commit 3c066e07be6188c3f48c51c02cfc99b395e2a6bc
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Sun Jun 11 14:26:42 2023 +0800
fix flaky VectorDistributedTest
commit 6c719a752e90e031916dddf6cc7b7db23627bc38
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 10 10:48:01 2023 -0500
fix use of bruteForceRows -- should be larger of limit,maxBruteForceRows
commit 0a2279d7f010bcb68053bb14ea2974295b21d1c4
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 10 10:46:34 2023 -0500
use maxBruteForceRows when deciding whether to skip ANN
commit 1859f7355231ce7972e3d4b7af70e5d2967da516
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 10 10:27:03 2023 -0500
simplify partitionKeySearchTest using euclidean distance
commit 4f40e1c2d9b1c30675eeaee7d800a8113cce97c0
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 10 10:09:24 2023 -0500
typo
commit 7dbe85ac9a248aa6a0985839df1f40ceb2c08e6c
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 10 10:08:23 2023 -0500
rename methods that return Bits but had bitset in their names
commit 2861bfdba8f2638a1e3f9e81df9cad3878d0e544
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 10 10:06:51 2023 -0500
simplify skipANN
commit fbfe4bcdd03a66f4dedfa91ce9d06206f89827cd
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Sat Jun 10 16:36:56 2023 +0800
optimize VectorIndexSearcher#searchPosting
- return empty posting if key range is not found in current sstable
- return empty posting if all row ids are shadowed
- skip ANN if matching row ids are less than limit
commit 168911b70888a3dda0ddf4f615af1b7ff54f43e3
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Sat Jun 10 20:54:41 2023 +0800
fix data visibility issue during flush: make sure SSTableAddedNotification is sent before MemtableDiscardedNotification
commit c5dd47e3ce1c7276703fa3f89a9932a3b2cd43b7
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Sat Jun 10 12:31:53 2023 +0800
fix primaryKeySearchTest and partitionKeySearchTest to use correct selector and expected results
commit 2e636529c63d0d726868d6df9b3e11d15d3871d8
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Sun Jun 11 08:31:39 2023 +0800
Vector-48: index#update is not triggered by partition/range deletion,… (#665)
* Vector-48: index#update is not triggered by partition/range deletion, we have to fix VectorMemtableIndex to include shadowed primary keys
- during flush, append ordinals that are removed by partition/range deletion into deletedOrdinals set
* add comments to VP
* revert redundant variables
* simplify
---------
Co-authored-by: Jonathan Ellis <jbellis@datastax.com>
commit daa2623b880e4b82d99204fe718db22e52ae3b42
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 10 13:54:38 2023 -0500
use mmapped builder in OnDiskHnswGraphTest
commit 553778a22cfa8757daf3b3ec0e2cfa6fc7ba29cf
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 10 08:34:41 2023 -0500
fix logic in partitionKeySearchTest, test still fails
commit a7f5a2f824085691da80ef62e37a430e5637c31c
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 17:57:34 2023 -0500
ignore invalid vectors during build against existing data, instead of failing the build
commit 2af203646a2e57fca0a2521c31ba4269b0d8f326
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 17:43:47 2023 -0500
switch from ignoring zero vectors to throwing IRE
commit 4c98fbb447a275cbf557ae0adf8f4f8a30fba17f
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 17:15:57 2023 -0500
Revert "if there are no rows in the range to search, return EmptyPostingsList instead of giving the hnsw search an empty Bits (which is a pathological case for the search)"
This reverts commit 6e5eab787a3add985dc30eca82c5dee2646f5564.
commit da99598a5d11dfa4ecac3b1eeeb9d19a4bad21e5
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 17:09:30 2023 -0500
don't attempt to add zero vector to cosine indexes
commit a77792447d33600f69dd0a75a280a2b6dac51389
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 16:43:34 2023 -0500
add failing partitionKeySearchTest
commit b061dd1c77f30c67324da4f94ea8b5a22d50fba7
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 10:52:08 2023 -0500
inline the test ops
commit 6e5eab787a3add985dc30eca82c5dee2646f5564
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 10:51:09 2023 -0500
if there are no rows in the range to search, return EmptyPostingsList instead of giving the hnsw search an empty Bits (which is a pathological case for the search)
commit 518c055dfeabacf0fa0863e40bf814058f2ad981
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 10:28:57 2023 -0500
cleanup
commit e922b03ffe5436dc1341dbe20ee7aaca9901058f
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 10:27:53 2023 -0500
failing tests for primary key search
commit 428e4b713e25a5820d02ca790c30e9f1477c0f44
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 10:21:59 2023 -0500
cleanup
commit 5319bcdf3b694e861509b2a9f549f39bf51cd253
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 10:20:02 2023 -0500
move testInvalidColumnNameWithAnn to VectorInvalidQueryTest
commit d8b64845e59c0bd3658a16df21dcc75564417e3b
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 10:12:40 2023 -0500
upgrade lucene to reduce Integer boxing on build path
commit a62356e49c0a32f61732b8525d2bb655f46dd767
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 10:11:55 2023 -0500
cleanup
commit b754eb7f021d10d0ae8de83ebdf2ce2f19ce59d8
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 08:29:34 2023 -0500
replace CHM with NBHMLong to avoid boxing
commit 53d232ad2bcddefd72b10542f703cf36728c05f5
Author: Piotr Kołaczkowski <pkolaczk@datastax.com>
Date: Fri Jun 9 12:17:46 2023 +0200
STAR-550 Handle SAI AbortedOperationException
AbortedOperationException is thrown by SAI when index search
hits a timeout. Now instead of allowing this exception to
bubble up to the top and eventually be logged as an error, we catch
and swallow it in the InboundSink after creating the query response.
Additionally, we now also set a proper error code (TIMEOUT)
in the response, so the client has a hint on what happened.
commit 5e0c5eb3c54b58002162ce3e39d0891643e1f663
Author: Matt Fleming <matt@codeblueprint.co.uk>
Date: Fri Jun 9 08:44:10 2023 +0100
CNDB-7037: Check that memtable exists before flushing and avoid IOOBE (#664)
For offline services such as the compactor it's possible to not have
live memtables. Account for this during flushing after removing indexes to
avoid triggering an IndexOutOfBoundsException:
Error happened while updating the schema
java.lang.IndexOutOfBoundsException: Index: -1
at java.base/java.util.Collections$EmptyList.get(Collections.java:4483)
at org.apache.cassandra.db.lifecycle.View.getCurrentMemtable(View.java:106)
at org.apache.cassandra.db.ColumnFamilyStore.forceFlush(ColumnFamilyStore.java:1115)
at org.apache.cassandra.db.SystemKeyspace.forceBlockingFlush(SystemKeyspace.java:552)
at org.apache.cassandra.db.SystemKeyspace.setIndexRemoved(SystemKeyspace.java:576)
at org.apache.cassandra.index.SecondaryIndexManager.markIndexRemoved(SecondaryIndexManager.java:837)
at org.apache.cassandra.index.SecondaryIndexManager.removeIndex(SecondaryIndexManager.java:420)
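The `Index: -1` in the trace above is what `get(size() - 1)` produces on an empty list. A minimal sketch of the guard, assuming the current memtable is the last element of a list of live memtables; `LiveMemtables` and `current` are hypothetical names used only for illustration, not the code in `View` or `ColumnFamilyStore`.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical illustration of "check that a memtable exists before flushing".
final class LiveMemtables {
    private LiveMemtables() {}

    /** Returns the most recent live memtable, or empty if there is none (e.g. in an offline service). */
    static <T> Optional<T> current(List<T> liveMemtables) {
        if (liveMemtables.isEmpty()) {
            // Without this check, get(size() - 1) would be get(-1) and throw IndexOutOfBoundsException.
            return Optional.empty();
        }
        return Optional.of(liveMemtables.get(liveMemtables.size() - 1));
    }
}
```

A caller can then skip the flush when the result is empty instead of letting the exception escape during a schema update.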
commit c241f7d41ee46c28f60d691611f32e071dac6684
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 8 19:19:09 2023 -0500
read neighbors using .intBuffer since we know that the HNSW searcher always reads all the neighbors
commit 32f021a4f0623e57d82d70c54350eab6f0f57db7
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 8 18:08:30 2023 -0500
VECTOR-51 leave vectors as ByteBuffers during compaction to avoid 2x memory usage incurred by keeping them around as float[] as well
commit 32d8cbbd19301228530d3d84f7a7e4120b4d4422
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 8 17:37:08 2023 -0500
r/m node cache from query metrics since there is nothing the operator can act on there
commit 890df8e6c36c5c9e1314dbcc0a681415401120dd
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 8 14:15:25 2023 -0500
add more information to exception when reading row offsets goes wrong
commit 5c7fe0cfed1edb10f81b134f8949dd1b5abb2781
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 8 13:26:17 2023 -0500
revise ordinals cache as follows:
// cache full levels including neighbors up to neighborsRamBudget, starting with the top level,
// but always cache all levels above the bottom two levels -- this will be ~1% of the graph.
// then on L1, cache at least the offsets
// L0 we do not cache since we only need one extra seek (no bsearch) to read the neighbors offset
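One possible reading of the caching policy above, written as a sketch. Everything here is an assumption for illustration: the class `OrdinalsCachePlan`, the `Mode` enum, and the idea that each level's full size in bytes is known up front. It is not the code from the commit.

```java
// Hypothetical sketch of the per-level caching decision described in the commit message.
final class OrdinalsCachePlan {
    enum Mode { FULL, OFFSETS_ONLY, NONE }

    private OrdinalsCachePlan() {}

    /**
     * @param levelBytes         size of each level including neighbors, indexed by level (0 = bottom)
     * @param neighborsRamBudget budget in bytes for fully cached levels
     */
    static Mode[] plan(long[] levelBytes, long neighborsRamBudget) {
        Mode[] modes = new Mode[levelBytes.length];
        long used = 0;
        // Walk from the top level down, since upper levels are small and visited by every search.
        for (int level = levelBytes.length - 1; level >= 0; level--) {
            if (level >= 2) {
                // Levels above the bottom two are always cached, regardless of budget.
                modes[level] = Mode.FULL;
                used += levelBytes[level];
            } else if (level == 1) {
                // L1 is cached fully only if it fits; otherwise keep at least the offsets.
                if (used + levelBytes[level] <= neighborsRamBudget) {
                    modes[level] = Mode.FULL;
                    used += levelBytes[level];
                } else {
                    modes[level] = Mode.OFFSETS_ONLY;
                }
            } else {
                // L0 is not cached: one extra seek reads the neighbors offset.
                modes[level] = Mode.NONE;
            }
        }
        return modes;
    }
}
```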
commit e131014c5cba59624e0ed9fac142abe5340e2fc0
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 8 13:53:10 2023 -0500
add deletes test
commit 7c5a880f007535d9f03b7c7d9d3e84dcc79ef3d7
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 8 13:34:42 2023 -0500
must use graph.size for bitset size -- not correct to use max rowid, since deleted ordinals will not have a rowid but will still be in the graph
commit 0654fbb61aec86deae69edf83fb1cf653ed7df65
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 8 08:44:28 2023 -0500
info -> debug for validatePerIndexComponents
commit b289db2362b928f29b6c5c6289e09b33f0be1e88
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 7 18:08:18 2023 -0500
update lucene
commit 347dae644b0d395014d8faf2affd48a6b2546e96
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 7 18:23:41 2023 -0500
undo burntest logging config change
commit 87a6a84f38d0a0423acd9a0dda67d386876681b5
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 7 15:20:06 2023 -0500
add debug logging
commit 20013bc865c7fb2c34111c98362aaf997dc724dc
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 7 14:50:39 2023 -0500
add a bit more information to exception when we fail in index construction from disk
commit 86389ceae8188b8e427d8a528fada2f913f1f633
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 7 12:25:31 2023 -0500
reduce test vector count from "all of them" to 200
commit 20177a1c84ba34babc82d8c1aa4e3436321b5d9f
Merge: 2e2fce09ee f2c697ac46
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 7 12:21:20 2023 -0500
Merge branch 'wip' into vsearch
commit f2c697ac46348301260c5947ade4ebed7dee90ee
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 7 12:16:28 2023 -0500
multipleSegmentsMultiplePostingsTest
commit 59e4a1c53e1c19a7ebba4a0834229e61f9e4d3ce
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 7 12:16:04 2023 -0500
cleanup
commit 2e2fce09eeb5bef4780cecf7c7b5e492a810f67a
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Wed Jun 7 23:33:00 2023 +0800
fix OnDiskOrdinalsMap to seek to segment offset before reading (#661)
commit 9502f1040229f2be6619f7e4497dc25ca5126b39
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 7 10:04:38 2023 -0500
fix build
commit 3535444b57366fe705e0fd8c1ca095e60ed2a706
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 7 09:19:41 2023 -0500
add write-only workload
commit a112b0965cb298f293b6e384bafc92a1d5a1fb8f
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Wed Jun 7 21:15:53 2023 +0800
VECTOR-44: improve in-memory partition-restricted query perf (#660)
* VECTOR-44: improve in-memory partition-restricted query perf
- using post-filter top-k processor instead of ANN: 14x improvement on partition-restricted query in LongVectorTest
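For readers unfamiliar with the idea, a post-filter top-k pass can be sketched as follows: score every row that survives the partition restriction and keep only the k best, rather than walking the ANN graph. The class `BruteForceTopK`, its methods, and the use of cosine similarity are assumptions made for this illustration and are not taken from the patch.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch of an exact top-k selection over a small candidate set.
final class BruteForceTopK {
    private BruteForceTopK() {}

    static float cosine(float[] a, float[] b) {
        float dot = 0f, normA = 0f, normB = 0f;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return (float) (dot / (Math.sqrt(normA) * Math.sqrt(normB)));
    }

    /** Returns the positions of the k candidates most similar to the query, best first. */
    static List<Integer> topK(float[] query, List<float[]> candidates, int k) {
        float[] scores = new float[candidates.size()];
        for (int i = 0; i < scores.length; i++) {
            scores[i] = cosine(query, candidates.get(i));
        }
        // Min-heap on score: the weakest of the current best k sits at the head and is evicted first.
        Comparator<Integer> byScore = Comparator.comparingDouble(i -> scores[i]);
        PriorityQueue<Integer> heap = new PriorityQueue<>(byScore);
        for (int i = 0; i < scores.length; i++) {
            heap.add(i);
            if (heap.size() > k) {
                heap.poll();
            }
        }
        // Polling yields ascending scores, so prepend to get a best-first list.
        List<Integer> result = new ArrayList<>();
        while (!heap.isEmpty()) {
            result.add(0, heap.poll());
        }
        return result;
    }
}
```

The cost is linear in the number of candidate rows, which is why this approach suits queries restricted to a single partition.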
commit cf1ec3be50ddbe40ae99dfc34eea51b9fc5c2648
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Wed Jun 7 21:06:37 2023 +0800
Vector-45: fix bitsetForShadowedPrimaryKeys to skip shadowed primary keys outside of current sstable/segment (#659)
* VECTOR-45: fix bitsetForShadowedPrimaryKeys to skip shadowed primary keys outside of current sstable/segment
* return empty iterator if results are empty instead of ReorderingRangeIterator
commit 87217b4f49935fbbd13c73f4d0aadb84836696a7
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 7 07:31:46 2023 -0500
don't make areL0ShardsEnabled final, it breaks mocks
commit 2bb3a299df43561369544b662674872575975b7c
Author: Matt Fleming <matt@codeblueprint.co.uk>
Date: Wed Jun 7 10:56:08 2023 +0100
VECTOR-42: Check columns exist to avoid NPE during ANN expr binding (#657)
Using a non-existent column with an ANN expression triggers an NPE like
so:
java.lang.NullPointerException: null
at org.apache.cassandra.cql3.ArrayLiteral.forReceiver(ArrayLiteral.java:43)
at org.apache.cassandra.cql3.ArrayLiteral.prepare(ArrayLiteral.java:54)
at org.apache.cassandra.cql3.Ordering$Raw$Ann.bind(Ordering.java:178)
at org.apache.cassandra.cql3.Ordering$Raw.bind(Ordering.java:139)
Use TM.getExistingColumn() which throws InvalidRequestException if the
column is undefined.
commit f29d4528fc43f34889e560acafa810eee1b88ba9
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 21:18:47 2023 -0500
restore query timeouts
commit 6e5734e52b41ab1e355d202ad0433440828fbb75
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 19:08:52 2023 -0500
looking at performance over time in LongVectorTest
commit f96af8f0b9185163a49156bdab107801d2588bf0
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 16:00:23 2023 -0500
log level = info for burn tests
commit 1247cf372927cc185be5c9767a06fdd15f102dbe
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 17:48:00 2023 -0500
write the in-memory deleted ordinals to disk at the start of the postings component
commit e319d71bd6b0d265f616c807dcaf4a9d9fc38aef
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 16:36:27 2023 -0500
don't allocate unnecessary objects on the happy path of no tombstones
commit 6d146a8460a17d146de756aab2ccbbf2abdb967a
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 16:19:56 2023 -0500
VECTOR-23: force UCS to not shard L0
commit 79d6e177093312bc1b951d6740ad45ce8a6c5875
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 16:01:05 2023 -0500
mark AutoCloseable
commit 5ec2f7d7f792bfa2dcf582ea4eed3fa8745fdaff
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 5 14:37:15 2023 -0500
Support string literals as vectors
Co-authored-by: Andrés de la Peña <a.penya.garcia@gmail.com>
commit 73a53c37536835e462e3cf17c452027c7aa591ed
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 12:25:39 2023 -0500
(take 3, this time with the code changes) fix race conditions across concurrent inserts + searches in memtable
commit 51f4f419d31916a81088f171b158f1e002b5c800
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 12:25:39 2023 -0500
(take 2) fix race conditions across concurrent inserts + searches in memtable
commit 94097a1f7cf1831c03e52b1f539bdc2e42fa038b
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 12:35:32 2023 -0500
Revert "fix race conditions across concurrent inserts + searches in memtable"
This reverts commit c4d41b492190fb2644f83bf4902acf71d1e4f891.
commit c4d41b492190fb2644f83bf4902acf71d1e4f891
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 12:25:39 2023 -0500
fix race conditions across concurrent inserts + searches in memtable
commit 22143a8c5db160ef0fa7687ff3e8cb227de160f0
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 10:28:30 2023 -0500
fix AOOB in bruteForceRows logic
commit b5624f62c88378e03260399440edf4d335ccc3af
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 10:12:53 2023 -0500
call deserializeFloatArray (which gives float[]) instead of deserialize (which gives List<Float>)
commit 13ad40718737b54652c875d81ec20992b5828777
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 08:36:32 2023 -0500
create CassandraHnswGraphBuilder with concurrent and serial implementations, so that single-threaded compaction doesn't have to pay the concurrency overhead
commit b83f4e4007e3d597963023b8ae32f4d0934b792d
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 08:00:44 2023 -0500
add ASL header to injections.md to make CI happy
commit 16074336b1322d4bade40eb71562ca50221399dc
Merge: 227c6c13a3 1ff6992354
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 07:48:04 2023 -0500
Merge branch 'VECTOR-37' into vsearch
commit 227c6c13a3cfe42484fc560d629d54a93f8265e0
Author: Jaroslaw Grabowski <jaroslaw.grabowski@datastax.com>
Date: Tue Jun 6 13:02:37 2023 +0200
CNDB-7007 return expired tables level from getLevels
commit 312d07de8d297cdede39017354b678f2fb1b1006
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 07:31:26 2023 -0500
randomize our brute force threshold, which will get the actual index scans exercised more
commit 0b897961869965629893c94250b7d4b1fb7c0f86
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Tue Jun 6 13:44:01 2023 +0800
Reference ANN sstable indexes in case of ann hybrid search (#653)
* Reference ANN sstable indexes in case of ann hybrid search
* simplify
---------
Co-authored-by: Jonathan Ellis <jbellis@datastax.com>
commit 1ff6992354e8411f9f7c48a76f0aadb4a00f2789
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 5 21:55:14 2023 -0500
per-query hnsw metrics
commit d3d6ae107737020eb0ca3b00c117d5bc84027dc0
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 5 21:16:14 2023 -0500
reduce test size to prevent Jenkins OOM
commit 7765ac73fd29c9664b6f8e946c067d6f1b0a9a03
Merge: 5a04dbcf67 c5cd09cebf
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 5 21:15:44 2023 -0500
Merge branch 'vsearch' of github.com:datastax/cassandra into vsearch
commit c5cd09cebf0c63f7bafcd2f4548395efe68b12cd
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Tue Jun 6 09:38:42 2023 +0800
Skip warning log about receiving a range that is not owned by the current replica, see VECTOR-30
commit 5a04dbcf6738d7681732e28518d6eb37f74357b8
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 5 11:21:15 2023 -0500
add injections.md from bdp repo
commit baf9cce301a39f9506bd60c5a75d2011150f2fea
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 5 10:02:02 2023 -0500
VECTOR-36 fix looking up indexes and SSTableContext by SSTableReader, because the SSTR will be removed from internal structures once it's compacted
commit e7fdc8454e750e2ee886c4ab27fb09abb46303fa
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 5 09:53:51 2023 -0500
limitToTopResults should skip rows that are not in the current segment (or more precisely, have no vectors that were indexed)
commit 2f64695a90651d9e76eee0a3399032ea241c983c
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Mon Jun 5 14:55:20 2023 +0800
Vector 6 take 2: reapply Vector-6 commit and fix NPE in SelectStatement (#648)
* Revert "Revert VECTOR-6"
This reverts commit 63a6ff609a02e4f7d9f5499f1af48ca5e58f63a8.
* Vector-6 take 2:
- fix NPE in SelectStatement by skipping reversed() for null ColumnComparator
- fix getOrderingColumns() to return LinkedHashMap to preserve ordering columns' order
- fix VectorLocalTest compilation
- remove debug log in CassandraOnHeapHnsw
commit 9da51b315d207417de4f3e005616e275c62c74cb
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Mon Jun 5 12:38:21 2023 +0800
fix SAI test failures in vsearch branch (#649)
- BatchlogEndpointFilterTest
- rangeRestrictedTest
- SegmentFlushTest
- SegmentMergerTest
- OperationTest
- RangeIntersectionIteratorTest
commit dc53d7db5f0f8fe1c780f6d2b7580794f7b6b5fb
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 3 13:35:11 2023 -0500
create views for ordinals map so we don't have to open a new Reader for each method call
commit c6385022ff2949c23e0dd0161a043cf4061f050c
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 3 08:19:32 2023 -0500
add assert sstableContext != null
commit f340e7590db578b71201326e82006f72dd491047
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 3 07:37:36 2023 -0500
on-disk Searcher bits should not need to be growable
commit fd5cc7f989a47d03689dd29495b974697ad1338b
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 3 07:37:18 2023 -0500
add asserts
commit 6436e5f4ad49881a62c1442692edfedfe31968dc
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 3 07:21:50 2023 -0500
cleanup
commit 32c8e005740d3c7ec0b1008091728e12260aecdd
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 2 13:57:18 2023 -0500
comment
commit 091d96832ff9a68f6ea7857eafd2ec7ee0c09a0b
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 2 13:32:24 2023 -0500
r/m unused search method returning iterators over PrimaryKey
commit 4e78a4b429b0ba6690871df9a91c8ffd8d2e53fa
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 2 11:25:30 2023 -0500
we can use the slightly more lightweight RangeConcatIterator instead of RangeUnionIterator when combining results from different Segments
commit e7a9186bf3426ad025e809481e47db85a7bf7190
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 2 11:22:35 2023 -0500
r/m obsolete FIXME
commit c76ee9ea0cab5b70c535de3f50b81cf333bf94a2
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 3 07:12:43 2023 -0500
add testAppendedGraphs
commit d00bdbfb6acf9f08d2bdfa2ef06e833e567cf9eb
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 3 06:57:37 2023 -0500
move write-to-File to test class
commit b10fd1100daf4cde0291b3b4ef06f5ceb3ca650c
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 3 06:46:01 2023 -0500
fix Ref leaks in test
commit 0656e558bb51172df4e0cde1ebe92946d4dcec8b
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 3 06:26:05 2023 -0500
fix tests to use View
commit 92797ced03ed26f9a5cce398504d6f0de519b899
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Sat Jun 3 19:19:38 2023 +0800
VECTOR-35: fix vector on-disk writer to append segments (#647)
commit 22f5c01860367586d0cc39def7d975a6d6bae784
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 2 18:54:41 2023 -0500
add "Vector indexes only support ANN queries" check
commit bfc0b056dbacd0a1eca86020e901f545dca7670e
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 2 17:34:24 2023 -0500
split invalid requests into separate test class
commit f66768c35920101382c1975c9d0ba0e3301b7c61
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 2 17:28:56 2023 -0500
add specific error message when trying to do ANN without an index
commit 63a6ff609a02e4f7d9f5499f1af48ca5e58f63a8
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 2 12:31:22 2023 -0500
Revert VECTOR-6
This reverts commits:
a37009c187edeba68389d239dc1b9f40519b1187
5565690fbe1056d4c159ddbe233fa22c7695320a
e7733bb8f858a16b082b8a5c64d0322db6f6271a
commit 3d496662b259470b505d88212074c436660c27ad
Merge: fab67bb134 4bdae7e362
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 2 11:05:53 2023 -0500
Merge branch 'vsearch' of github.com:datastax/cassandra into vsearch
commit 4bdae7e36213f5112efce085696dd24aaa0adfad
Author: Mike Adamson <madamson@datastax.com>
Date: Fri Jun 2 14:36:01 2023 +0100
Stabilise random tests using word2vec model vectors
commit 2ddc3ec6b11f6810f6c4cbaf6b62bfe46d2d1b6c
Merge: 8e04280312 1f9179002a
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Fri Jun 2 10:05:15 2023 -0500
Merge remote-tracking branch 'datastax/ds-trunk' into vsearch
commit 8e04280312583a18085e7e7b9d31810790b039f4
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Fri Jun 2 09:51:06 2023 -0500
Revert this after CNDB-6974 is fixed.
commit fab67bb13499691f813c893cd39a3bbd406653d2
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 2 08:59:12 2023 -0500
vector cache defaulting to 1MB per segment
commit 67a2b7eba3c957a993726d37706643979e92a3a3
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Thu Jun 1 17:49:35 2023 -0500
Revert this after CNDB-6974 is fixed.
commit 4538956b31d8c40d2e9b603b3b0a3392343e1853
Merge: 7518680a4a af4c83aef4
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 1 16:17:14 2023 -0500
Merge branch 'ds-trunk' into vsearch
commit 7518680a4a17cb2589cf06a9175befc10c9eab1a
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 1 16:14:33 2023 -0500
re-use float[] across calls to vectorValue to avoid allocation overhead (credit to Jake for the idea)
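A minimal sketch of the scratch-array idea, not the code from this commit. The class name `ReusableVectorReader`, the packed-float layout, and the `ByteBuffer` backing store are assumptions for illustration. The trade-off is that the returned array is only valid until the next call.

```java
import java.nio.ByteBuffer;

// Hypothetical reader: vectors are stored back to back as packed floats.
final class ReusableVectorReader {
    private final ByteBuffer data;
    private final int dimension;
    private final float[] scratch; // allocated once, overwritten on every call

    ReusableVectorReader(ByteBuffer data, int dimension) {
        this.data = data;
        this.dimension = dimension;
        this.scratch = new float[dimension];
    }

    /** The returned array is reused; callers that keep a vector must copy it. */
    float[] vectorValue(int ordinal) {
        int base = ordinal * dimension * Float.BYTES;
        for (int i = 0; i < dimension; i++) {
            // absolute reads leave the buffer position untouched
            scratch[i] = data.getFloat(base + i * Float.BYTES);
        }
        return scratch;
    }
}
```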
commit d20bd8aedeb6232358dd16f901e2708f217b8108
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 1 15:35:17 2023 -0500
optimize vectorValue() with direct access to the mmap-ed region
commit 24572025af5b7893279b5b03c58555b682f3abdb
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 1 14:33:55 2023 -0500
decomposeVector does not modify the underlying buffer, so no need to duplicate() here
commit efef35d47a24222fae300d1cbd5a801e9ed79442
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 1 14:29:30 2023 -0500
specialize deserializeFloatArray for ByteBuffer and FloatBuffer. this saves about 3% total CPU on search workloads
commit 63c27f5550d6c3e22d960b9a42b9097029e3b760
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 1 11:04:23 2023 -0500
default target size of 5GB
commit 69c55c8d07c6e1d58b4829c153f5ebf8a4343669
Author: Jonathan Ellis <jbellis@gmail.com>
Date: Thu Jun 1 07:20:38 2023 -0700
make the on-disk hnsw code threadsafe using FileHandle.createReader (#641)
commit b194cae2e56133b946d0231768254064a47f01b6
Merge: 7ee6422084 23c2891e7a
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 1 06:22:17 2023 -0500
Merge branch 'ds-trunk' into vsearch
commit 7ee642208485c9e2ae62aa662dbbddc7b295bae1
Merge: c286ec0ee2 a37009c187
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 31 20:41:48 2023 -0500
Merge branch 'VECTOR-6' into vsearch
commit c286ec0ee231ededded18e81357edebe72064c59
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 31 20:40:21 2023 -0500
add testLargeGraph
commit a37009c187edeba68389d239dc1b9f40519b1187
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Thu Jun 1 08:50:07 2023 +0800
cleanup unused code
commit 6246dbe3a970a0f4672701632003cdf576705c24
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 31 16:43:42 2023 -0500
comment
commit c7420440c7d3c81eac96e716148d570ae3cceb1d
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 31 16:43:37 2023 -0500
encapsulate shadowedPrimaryKey better
commit a1a25c33b8bec9d78354dd2c1d992444db2bf76f
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 31 15:57:52 2023 -0500
make hnsw cache size configurable, and make the default 128KB (roughly the size of the bloom filter for a 1GB sstable)
commit 55747d2351e915645697fa8dec404c749540456f
Author: Andrés de la Peña <a.penya.garcia@gmail.com>
Date: Wed May 31 19:28:47 2023 +0100
Replace FunctionParameter.sameAsFirst by FunctionParameter.sameAs, allowing type inference in both directions for vector functions
commit 5565690fbe1056d4c159ddbe233fa22c7695320a
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 31 12:59:42 2023 -0500
cleanup and fix
commit e7733bb8f858a16b082b8a5c64d0322db6f6271a
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Wed May 24 16:01:26 2023 +0800
VECTOR-6: return vector results to client in ANN order instead of token order
commit 771067d1475c94484da178236928d2bad78e00b1
Author: Mike Adamson <madamson@datastax.com>
Date: Wed May 31 18:33:55 2023 +0100
Fix annOrderingMustHaveLimit test
commit 8dd76a73541e70da0da022dd463e6711a315f971
Author: Mike Adamson <madamson@datastax.com>
Date: Wed May 31 18:30:49 2023 +0100
Improve the validation of ORDER BY <column> ANN OF (#639)
* Improve the validation of ORDER BY <column> ANN OF
* Change to hasNonClusteredOrdering and improve limit message
commit 1ef6f2e3418db19af23f0eae24cd344d72290455
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri May 26 18:34:25 2023 -0500
add vector similarity functions
commit f4e3a39a8ece623647ba7891981e7c5ee10ed96b
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 31 09:39:45 2023 -0500
pull in the smallest set of changes possible from 93e0ae9a to get FunctionParameter.sameAsFirst
commit 0ef2614346c9426a966004df220221f74353de70
Author: Jonathan Ellis <jbellis@gmail.com>
Date: Wed May 31 06:22:41 2023 -0700
Add support for updates + deletes (#636)
commit 5da9fefe1e18f733bb8ffd0a35ef2e1ac3cbd975
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue May 30 16:32:02 2023 -0500
rename SegmentOrdering.reorderOneComponent to limitToTopResults to align with MemtableOrdering
commit caf6eb2154ddf57c88d69c6246404b65e183057d
Author: Mike Adamson <madamson@datastax.com>
Date: Tue May 30 22:41:29 2023 +0100
Fix V1SearchableIndex.reorderOneComponent to use segments (#635)
commit 72cfa94e9389d0078d64aee60d26b8b01c6ca22f
Author: Piotr Kołaczkowski <pkolaczk@datastax.com>
Date: Mon May 29 20:17:44 2023 +0200
VECTOR-14: Move ANN OF expressions from WHERE to ORDER BY
Before:
SELECT * FROM table WHERE <vector column> ANN OF <vector value>;
After:
SELECT * FROM table ORDER BY <vector column> ANN OF <vector value>;
commit 226266ef124fc220819d328ed8547c5d86626c4b
Author: Mike Adamson <madamson@datastax.com>
Date: Tue May 30 18:58:16 2023 +0100
Implement getInstance(TypeParser) for VectorType. Fix losing data at startup.
commit 2bbe88ad88e2243f12dc8df789ea9bdf208ff74b
Merge: dd0ffb64f3 a3b8661746
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue May 30 12:37:58 2023 -0500
merge ds-trunk
commit dd0ffb64f3e54e2ce1af75da52fe3ece66a49f3f
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue May 30 11:40:20 2023 -0500
reduce startup log noise at info level
commit 1ef17101def1aacbb84f3434633a9b706291aa4b
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue May 30 11:34:26 2023 -0500
add partialUpdateTest
commit f816046a17809b06ad754a16c0f089c0d26909f9
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon May 29 12:57:12 2023 -0500
fix recall computation in test
commit 1c1442c36baafdf4e1641a9c0288dbd520cf7c0e
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon May 29 12:34:02 2023 -0500
also apply tolerance for inexact results to searchWithKey, but only for size > 10
commit 39df5112c6a551b53649ad98164f2bde93cbdc10
Merge: d598206329 8ca4bba861
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon May 29 12:27:24 2023 -0500
merge ds-trunk
commit 8ca4bba8617e8d0fa4f48de1c479de66e3a77dd3
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon May 29 12:19:32 2023 -0500
rename MemtableIndex -> TrieMemtableIndex to make merge to vsearch easier
commit d5982063292b6a91b156da32454abacd36fa6980
Merge: ae96e0f423 1488a5f0b9
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon May 29 09:03:24 2023 -0500
Merge branch 'vsearch' of github.com:datastax/cassandra into vsearch
commit ae96e0f42393eab3547e2314f1acdb1ff766f2a6
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon May 29 09:02:12 2023 -0500
check for results within a percentage (I went with 5%) of the expected; the A in ANN means we shouldn't expect to find 100% of matches unless the graph is tiny
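A sketch of what a tolerance-based recall check could look like in a test. `RecallCheck` and its methods are hypothetical names, not taken from the test suite; integer arithmetic is used so the 5% boundary is compared exactly.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical test helper for approximate (ANN) results.
final class RecallCheck {
    /** Number of expected keys that the search actually returned. */
    static long hits(List<Integer> expected, List<Integer> actual) {
        Set<Integer> found = new HashSet<>(actual);
        return expected.stream().filter(found::contains).count();
    }

    /** True if at least (100 - tolerancePercent)% of the expected keys were found. */
    static boolean withinTolerance(List<Integer> expected, List<Integer> actual, int tolerancePercent) {
        return hits(expected, actual) * 100 >= (long) expected.size() * (100 - tolerancePercent);
    }
}
```

With 20 expected keys and a 5% tolerance, 19 hits pass and 18 do not.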
commit 1488a5f0b9c65473bb6a69c9e18c045d931a1fe3
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Mon May 29 09:20:28 2023 +0800
Vector-19: fix PrimaryKey min/max prefix ByteComparable and add reversed lookup for row id (#627)
* Vector-19: fix PrimaryKey min/max prefix ByteComparable and add reversed lookup for row id
commit 14c41f5f913dc7f7f4cfc64b932bb6b9df475f19
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 17 17:35:16 2023 -0500
use incremental bytes used estimate from hnsw to avoid recomputing full ramBytesUsed on every call to add
commit f8d2aaad6e96693f532d745f71e0eaabb1ddf934
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Fri May 26 12:02:54 2023 -0500
Update cqlsh for new syntax
commit f0a882ae1171d41bc9c6ccf224ce3be80f7128e8
Author: Mike Adamson <madamson@datastax.com>
Date: Fri May 26 16:24:38 2023 +0100
Only allow VectorType to accept float
commit a29712b1365cdfac9a709e35325e85f7fa6c61c8
Author: Mike Adamson <madamson@datastax.com>
Date: Fri May 26 16:17:51 2023 +0100
Fix max term size for vectors at 16k in SSTableIndexWriter
commit 363d3f4e35c1a4df0f257557309bb042fd3f3ec7
Author: Mike Adamson <madamson@datastax.com>
Date: Fri May 26 15:23:02 2023 +0100
Merge CASSANDRA-18504 vector grammar (#628)
- Now uses vector<type, dimension> to describe vector
- This commit does not bring in the whole 18504 patch, only the essential grammar and type parts of it
commit 18c4e35d4ce14cda7a3c03398e16edf368ad9e6a
Merge: 16158a5b48 b0f71ee2c3
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu May 25 16:24:41 2023 -0500
Merge branch 'VECTOR-3' into vsearch
commit b0f71ee2c371d1003cf241c3aedd7437385bcecb
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu May 25 15:51:27 2023 -0500
cleanup
commit 2d840eae74c079229729767731d3719d12ca1931
Merge: c01cacf300 197a3207b1
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu May 25 15:47:39 2023 -0500
Merge branch 'vsearch' (early part) into VECTOR-3
commit 16158a5b48665f57bde649e87a09d078298932f5
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu May 25 15:45:51 2023 -0500
optimize ramBytesUsed
commit 197a3207b16ec97bae4924883247e2cd6a2923bd
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu May 25 13:34:19 2023 -0500
update lucene
commit a7bfcc7a6f31090c3c8443d299dabb7acd437ccd
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu May 25 08:05:41 2023 -0500
optimize ConcurrentVectorValues.write
commit 456fb08af0b922922ea0e4f24def0184cf4b32c6
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 24 21:25:52 2023 -0500
fix NPE better
commit c01cacf30025d181a2d86c4632042449396333bc
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Thu May 25 08:53:38 2023 +0800
Return negative ordinal if row id is not found; add failing test for null vector
commit 009fadf2ed889bcc6748f500b7fa4b1d43e910df
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 24 18:00:23 2023 -0500
search() errors out when an empty graph is passed to it, so special-case that
commit 8c9972d7453fad46f55e6fe5e7538442235e7fd6
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 24 17:46:50 2023 -0500
use serializer dimensions instead of trying to cache it from the first vector added, because we might need it before vectors are added (if someone tries to search an empty graph)
commit 8423541c4997462b65fe8f1231b0baf15dab3297
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 24 17:42:30 2023 -0500
fix NPEs when nulls are inserted
commit d4510e0f1934f03f66d89b7c585b3d7aac17dda2
Author: Mike Adamson <madamson@datastax.com>
Date: Wed May 24 15:54:38 2023 +0100
Rebuild HNSW graph on flush since we don't know ahead of time where UCS will want to split the range boundaries
commit 6b925531011798285a6b25d70961643092e2febe
Author: Mike Adamson <madamson@datastax.com>
Date: Mon May 22 14:58:22 2023 +0100
disable index segment compaction by default
commit 3480a80e3303c13b8c04f718f83f87489315e381
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 24 14:46:03 2023 -0500
move chatty logs to debug level
commit 05c5758244f6f6d4fe30fbcefbb13932f68f8891
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 24 13:59:01 2023 -0500
r/m fixme obsoleted by #624
commit b09db4e0425c92491e3e9c73ebdbdb84508075fd
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 24 13:56:46 2023 -0500
pre-size the intersection builder
commit 6d62d8c2546cda1f487c14dcde1303e14a18c804
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 24 13:45:14 2023 -0500
fix nested boolean expressions
commit b8301a86d37513b67545883548c5670131281dc1
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Wed May 24 14:19:41 2023 +0800
Vector-3: support partition/range restricted query
commit 8ba56fad62d21da228e3a61ed7d5971441572cf9
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 24 10:35:16 2023 -0500
update lucene version hash
commit a4768c583eb1a029f629e5ebda274ab2fcce3130
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Tue May 23 10:45:02 2023 -0500
Revert the revert of "Use published lucene fork jar."
This reverts commit 7a8ff12860a97d451045c9dd1978a3599a0f4679.
commit 063a176caeec0f1e298d044490aa29ecdd468811
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Wed May 24 22:03:32 2023 +0800
Union the results from multiple segments per index (#624)
- Union the results from multiple segments per index
- Revert intersection behavior to pick 2 most selective indexes if there is no ANN
- Fix SSTableRowIdPostingList to return END_OF_STREAM if next row is END_OF_STREAM
- Skip TermTree for vector index
- Fixes:
- VectorMemtableIndexTest#randomQueryTest
- SegmentMergerTest
- SingleNodeQueryFailureTest#testFailedRangeIteratorOnMultiIndexesQuery
- SelectiveIntersectionTest
- QueryTimeoutTest
commit 643f34c2b654b80573ba92381e386b4286f7fb63
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 24 08:25:35 2023 -0500
switch all the hnsw internals to use float[] so we don't need to keep both representations around
commit 7a8ff12860a97d451045c9dd1978a3599a0f4679
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 24 08:06:28 2023 -0500
Revert "Use published lucene fork jar."
This reverts commit 53c6c90920dd2a48fc3334a6cddb215d3f4ebfcd.
commit 8731eefe4122a5af0b3723a3b07884a201616e7a
Merge: e50d5e955b 53c6c90920
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 24 08:05:18 2023 -0500
Merge branch 'vsearch' of github.com:datastax/cassandra into vsearch
commit e50d5e955bfde6357a5bb892b2ea70d46e8cc77d
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 24 08:02:52 2023 -0500
cache float[] from ByteBuffer
commit 53c6c90920dd2a48fc3334a6cddb215d3f4ebfcd
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Tue May 23 18:29:00 2023 -0500
Use published lucene fork jar.
commit f2567cd2fb2a683e7c0fee23945baac3ceca6ad4
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue May 23 14:56:25 2023 -0500
attempting to add back IN support but with limited success, even "basicOrTest" gets parsed as child nodes
commit 9e1f018f5c0d5832446fc6a0338aaf95c8323303
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue May 23 14:21:23 2023 -0500
reduce log level of some of the chattiest locations
commit efaf501fe3158da42dcd874b06014acb127f5cd9
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue May 23 13:55:40 2023 -0500
clean up RangeIntersection Builder overloads
commit e7127d9d32392347f7010de6267521742f831217
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue May 23 13:46:27 2023 -0500
complete generification of RangeIterator in src/ (but tests are still incomplete)
commit c07ded6420aa25b9dfa2b3c0785178b08d286b26
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue May 23 13:04:16 2023 -0500
TermIterator is more or less replaced by CheckpointingIterator
commit 14d03ae99c69c2419dc0cca3291b4f5eaefbc054
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue May 23 14:45:18 2023 -0500
update lucene jar
commit a15b964d56dcba285c6871d354755d7aa489af1d
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue May 23 12:48:41 2023 -0500
disk and memory recalls should be the same (since disk graph should be exactly equivalent to in-memory)
commit f03680bdf8581fd13a2d772e8265e3c8a713d247
Merge: 868a6b8f11 fd1293dbaf
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue May 23 12:08:46 2023 -0500
Merge branch 'vsearch' of github.com:datastax/cassandra into vsearch
commit 868a6b8f11a25c20b7e59a93d65cc7e2b021a432
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue May 23 12:04:11 2023 -0500
use brute force if we expect to perform fewer comparisons that way than with a graph search
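A sketch of the decision this message describes. The cost model is an invented placeholder (roughly `topK * ceil(log2(graphSize))` comparisons for a graph search); the commit does not state the real estimate, and `SearchStrategy` is a hypothetical name.

```java
// Hypothetical heuristic; the cost model below is an assumption for illustration.
final class SearchStrategy {
    /** Placeholder estimate of vector comparisons made by a graph search. */
    static long estimatedGraphComparisons(int graphSize, int topK) {
        if (graphSize <= 1) return graphSize;
        int ceilLog2 = 32 - Integer.numberOfLeadingZeros(graphSize - 1);
        return (long) topK * ceilLog2;
    }

    /** Brute force compares the query against every candidate row exactly once. */
    static boolean useBruteForce(int candidateCount, int graphSize, int topK) {
        return candidateCount <= estimatedGraphComparisons(graphSize, topK);
    }
}
```

Under this placeholder model, a graph of 1,000,000 nodes with topK = 10 costs about 200 comparisons, so a filtered set of 50 candidates would be scanned directly while 5,000 candidates would go through the graph.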
commit 2ed32bf53acaf8180130bef1aa5cebd9b6d32d13
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue May 23 11:59:11 2023 -0500
refactor AnnKeyRangeIterator -> ReorderingRangeIterator
commit d28bad9b2df3fc84b117f9ed0a04314ecb9c0a02
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue May 23 11:55:15 2023 -0500
rename reorderOneComponent to limitToTopResults
commit fd1293dbaf82f054593b3150249ac00110b286cf
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Tue May 23 10:45:02 2023 -0500
Update python driver to one that supports the Vector type
commit 6603f7601671357928948dfb0921b0c731bb96d3
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue May 23 11:06:12 2023 -0500
extract ReorderingPostingList
commit 94b4ff20ccc3a51e966b4798f5364a2d9ed4b966
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue May 23 10:29:33 2023 -0500
deserialize doesn't modify buffer so duplicate is unnecessary
commit 840053d212985a109cb5242c19e8a5876fdec7b3
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue May 23 10:26:57 2023 -0500
use LongHeap instead of generic PQ
commit aac90dc3e0fdb745e465f044f2624228e705be10
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue May 23 09:47:22 2023 -0500
r/m unused defer flag from toPrimaryKeyIterator / toSSTableRowIdsIterator
commit 1fd71c1cb7f2e785237b36c6e42842fb326d2380
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue May 23 09:26:03 2023 -0500
remove unused method marked "to remove"
commit b96c5a9efccf3226acf2f2a08ca149d0acc0b21d
Merge: 54f41e32d3 ba8aa07dd0
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue May 23 09:24:44 2023 -0500
Merge remote-tracking branch 'zhao/vsearch-row-id-iterator-for-reordering' into vsearch
commit 54f41e32d3359941ddf70d34fc8e68a842b9091d
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue May 23 08:14:10 2023 -0500
update lucene to fix NPE in ram usage
commit ba8aa07dd09f52a72e5dffccb950a21ec37a1c11
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Tue May 23 16:51:36 2023 +0800
intersect on row-ids before fetching primary key for multiple non-ann indexes or single index
commit 2a69547167f8dc1b40e8a26b71cdf43a45779280
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Tue May 23 09:44:00 2023 +0800
Use sstable-row-id iterator to re-order ann index
- add searchSSTableRowIds to searcher
commit 04b9a606d9a1aaeeeab3ec0981a5f1300cc2a423
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Tue May 23 00:30:12 2023 +0800
Make RangeIterator generic
commit 4ff716a7411d3ad80009d4fb97bf53af51dd5eba
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon May 22 14:57:34 2023 -0500
r/m obsolete FIXMEs
commit 175ad0edce9cef3fb29f230fb64fb1913328d101
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Tue May 23 00:30:12 2023 +0800
fix vector distributed test: use 1 token and ignore multi-ann-index test (#622)
commit 78adf77d9e6b7f20ece71fe665a5d1aec51f79b6
Author: Mike Adamson <madamson@datastax.com>
Date: Mon May 22 14:26:48 2023 +0100
Copy lucene snapshot to build folder in build-resolver.xml
commit 5172315dcb51e05a44af224ddfe944bca29005d0
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon May 22 08:17:41 2023 -0500
declare lucene dependency to come from lib/
commit d4ee45b8675f79b4a1e06f97d0ce618ab08f030a
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sun May 21 05:40:30 2023 -0500
add caching and test code
commit 5102d96e2fb3460f1e541cc1b481e224d112eac1
Merge: 6afcfabe88 87d39a7b25
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat May 20 11:15:14 2023 -0500
merge cep-vsearch
commit 6afcfabe883c5627c96eda8eb488c51c06e17c91
Merge: a993dbe49a fa85a191c5
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat May 20 09:15:13 2023 -0500
Merge commit 'fa85a191' into vsearch
commit a993dbe49a75dde4b9d5528e0816b81ae427a1dd
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri May 19 22:00:31 2023 -0700
FIXME not sure why version is messed up post merge, this hacks around it
commit aadf168e47a4271cc3faeb62ec7044cbfe193893
Merge: c217120935 5bc4d4b42e
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat May 20 08:52:48 2023 -0500
merge 5bc4d4b42e
commit c2171209350cc307f3b5fcc73a6ec09423d203df
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri May 19 22:00:31 2023 -0700
FIXME not sure why version is messed up post merge, this hacks around it
commit 2e44491d3db6a881324126490e6e1511d6082f39
Merge: 68f85f7009 0ff4566080
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat May 20 08:15:16 2023 -0500
merge 0ff4566080
commit 68f85f700995b274ec0d64654ed2692c7fc8bf61
Merge: c8843779be 2ae60e7411
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat May 20 07:50:16 2023 -0500
Merge commit '2ae60e74' into vsearch
commit c8843779be9fe0c3edd047754bb08f633cff5646
Merge: 3cb57548bb 93616d080c
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat May 20 07:43:18 2023 -0500
Merge commit '93616d080c539e880e37cd22fe5f27396a7a2594' into vsearch
commit 87d39a7b252b3f3cc1f4aae0f149085f6bc83abc
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri May 19 17:24:55 2023 -0700
fix intersection count, and add comments to KRI api
commit 4fdc58fced9ca350333675cb19a42169bebeaccf
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri May 19 15:27:34 2023 -0700
test recall
commit 26fb1006430513a37f267a4dfb81ef473516ac7b
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri May 19 14:19:08 2023 -0700
add tests
commit 1cbf78bbf71fff2d597ddd199fd3a758313c7bf4
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri May 19 14:02:36 2023 -0700
comments
commit 65e41490f041535f1a48a726032e28f6c8febcff
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri May 19 13:57:04 2023 -0700
rm AnnResult
commit 06b10a28c3f6f19c76b056526cdabb4c9aff1528
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri May 19 13:54:08 2023 -0700
cleanup commented-out code
commit e33236ab8b9da7a025f4c59406c52ecbc3b5ae4f
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri May 19 09:52:50 2023 -0700
convert suspicious one-off binary search in OnDiskHnswGraph to use DiskBinarySearch
commit fbd30bce13e87c63b129db26c5c123ae9bf8f9ed
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri May 19 09:31:16 2023 -0700
perform disk-based binary search for ordinals
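A sketch of a binary search over fixed-width records read in place, with a `ByteBuffer` standing in for the on-disk (or mmap-ed) region. The layout (sorted 4-byte ints) and the name `OrdinalSearch` are assumptions; the actual file format is not described here.

```java
import java.nio.ByteBuffer;

// Hypothetical layout: the buffer holds sorted 4-byte ints from its position to its limit.
final class OrdinalSearch {
    /** Returns the index of target among the sorted ints, or -1 if it is absent. */
    static int find(ByteBuffer sortedInts, int target) {
        int base = sortedInts.position();
        int low = 0;
        int high = sortedInts.remaining() / Integer.BYTES - 1;
        while (low <= high) {
            int mid = (low + high) >>> 1;
            // absolute read: only the probed record is touched, nothing is copied to the heap
            int value = sortedInts.getInt(base + mid * Integer.BYTES);
            if (value < target) low = mid + 1;
            else if (value > target) high = mid - 1;
            else return mid;
        }
        return -1;
    }
}
```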
commit 77bb1b5cfcc743224bdb01d21563365caa184c21
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri May 19 08:22:36 2023 -0700
move OnDiskOrdinalsMap to its own class, and move write() code into it
commit 8a1fd9f0a38f4c16b72d5bc7b2aeb3ee07bacc7a
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu May 18 21:35:28 2023 -0700
add ordinal to row mapping
commit db32d9c7b077133a213ac3a0da161f163c4310a1
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 17 20:43:49 2023 -0500
mostly-working reordering
commit 62ba004d79aa7511f519ebb56fa8af2cb424e21e
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 17 20:13:10 2023 -0500
turn QueryView inside out; perform intersections first for each sstable, then union the results
commit d65039a6ccd8364e8eb8ef31933cdf8dca298437
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 17 18:37:07 2023 -0500
update MIM to give individual iterators back
commit a190f1e917b93b6cabd15347dec55b1683807b1f
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 17 18:59:13 2023 -0500
refactor CheckpointingIterator to just take an iterator to wrap, and index references to close if something goes wrong
commit 307fe1c20b400e17048a356f73822ef51f66f341
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 17 18:47:20 2023 -0500
rename IndexSearchResultIterator -> CheckpointingIterator
commit 690751b216afbe3e77ed0f89ce04d16e5c6e985d
Merge: 4b6d0f2fb5 fa85a191c5
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 17 10:17:49 2023 -0500
Merge branch 'cep-7-sai' into cep-vsearch
commit 4b6d0f2fb531f6832d2b24465b0982683e741dfe
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Wed May 17 16:45:42 2023 +0800
Add vector top-k filter at replica side before returning to coordinator (#618)
* added QueryPlan#postIndexQueryProcessor to filter top-k at replica side before sending response to coordinator
commit 5bc4d4b42e77180df050bdbd9f5fd18fbb37332c
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue May 16 07:38:45 2023 -0500
failing test was failing b/c test was wrong
commit f37e40f5c5d99aeaafb33f41be0bd10bff6b5756
Merge: ae1bffa49b 1c41a095e1
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon May 15 21:24:47 2023 -0500
merge
commit ae1bffa49b4c8757a3d9110ae065b8cb70d8e416
Merge: bbc4f72b8d 381e04aaa1
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon May 15 21:22:16 2023 -0500
Merge commit '381e04aaa132cf7a255efad3f792852d9ec1729d' into cep-vsearch
commit 1c41a095e130bce4c7c215650f9ef3f817ae635e
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon May 15 21:11:47 2023 -0500
fix computeNext to return endOfData when done
commit b845395ad7e5f4670597cd171f28b98c941d03a7
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon May 15 21:11:15 2023 -0500
cleanup
commit 58df11dad894aa5fb36bfbdaaf4c7dc1987fcc94
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon May 15 20:59:12 2023 -0500
failing tests
commit ea5977620ebb07d220ad51800c9b90f470494079
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon May 15 20:58:52 2023 -0500
FIXME temporarily remove checkstyle
commit 715790922379ad7b5dbdea5d3358f24542762889
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon May 15 20:49:10 2023 -0500
cleanup
commit 19e47d363c856f81bda3323be02bfe18f60b51c0
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon May 15 17:55:52 2023 -0500
update NeighborSet usage
commit 920abafce00c94bbc7d6b0c86ba78a2a9d3f9540
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon May 15 17:53:43 2023 -0500
cleanup
commit d58db09004427506efe03e81364cadf44fc914c4
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon May 15 15:46:06 2023 -0500
pull in latest hnsw optimizations
commit bbc4f72b8d7d99db003103103c90d78db4ee04d1
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon May 15 10:02:40 2023 -0500
use IndexOutputWriter in other components
commit 381e04aaa132cf7a255efad3f792852d9ec1729d
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon May 15 09:32:25 2023 -0500
switch other components to also use IndexOutputWriter
commit 5951690f5472502bcb093179e8905212c1fcf815
Author: Mike Adamson <madamson@datastax.com>
Date: Mon May 15 14:48:16 2023 +0100
Fix CassandraHnswGraphWriter to use correct output writer (#616)
commit bc5b956607fa77d53fcab9ad338776af69610feb
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Mon May 15 17:35:42 2023 +0800
Distributed vector search (#613)
* distributed vector search
- query all replicas selected by consistency level at once with full request range
- filter top-K results at coordinator in QueryPlan#postProcessor
- skip short-read-protection, read repair and replica filtering protection because replica response will be top-k
- fail ANN query without limit or limit exceeding MAX_TOP_K
- make vector search max_top_k configurable and default to 1k
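The coordinator-side top-K filter listed above could be sketched roughly as follows. `TopKMerge`, `Scored`, and the string keys are hypothetical; this is not `QueryPlan#postProcessor`. Since every replica is queried with the full range, the same key can arrive more than once, so the sketch keeps one entry per key before selecting the best k.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical coordinator-side merge of per-replica top-k responses.
final class TopKMerge {
    static final class Scored {
        final String key;
        final float score; // higher means more similar
        Scored(String key, float score) { this.key = key; this.score = score; }
    }

    private static final Comparator<Scored> BY_SCORE =
        Comparator.comparingDouble((Scored s) -> s.score);

    static List<Scored> mergeTopK(List<List<Scored>> replicaResponses, int k) {
        // Deduplicate: keep the best-scoring entry for each key.
        Map<String, Scored> best = new HashMap<>();
        for (List<Scored> response : replicaResponses)
            for (Scored s : response)
                best.merge(s.key, s, (a, b) -> a.score >= b.score ? a : b);

        // Min-heap on score: the head is the weakest of the current best k.
        PriorityQueue<Scored> heap = new PriorityQueue<>(BY_SCORE);
        for (Scored s : best.values()) {
            heap.offer(s);
            if (heap.size() > k) heap.poll();
        }
        List<Scored> result = new ArrayList<>(heap);
        result.sort(BY_SCORE.reversed()); // best match first
        return result;
    }
}
```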
commit 6d8c94535354f162f60099deecb0523f9f71bb99
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sun May 14 10:01:08 2023 -0500
fix imports for checkstyle
commit 68c20dc9ceeaf0c2169718aa4a1dce9c0f92f003
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat May 13 19:23:58 2023 -0500
add the index and ordinal mapping to the set of components that the system knows about
commit ba7efaac142c8b3181881f86020f659837c90463
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat May 13 17:57:22 2023 -0500
cleanup
commit 5ae06fe31840e5ba1899086f3b8eb8194f825db7
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat May 13 17:23:30 2023 -0500
update to latest lucene snapshot that addresses all known concurrency bugs
commit f996edd63f66cf7b688f8e014b31f15450dd1d3d
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri May 12 16:08:40 2023 -0500
fix for cqlsh by Bret McGuire
commit fa85a191c5e0bd508da584412648308888769cb9
Author: Andrés de la Peña <a.penya.garcia@gmail.com>
Date: Mon Apr 24 16:39:26 2023 +0100
Allow CQL queries on multiple indexes without ALLOW FILTERING
patch by Andrés de la Peña; reviewed by Berenguer Blasi for CASSANDRA-18217
commit e2f3d2150ab2bc5e038f2661b8e2337d3b5cd4bf
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu May 11 14:34:22 2023 -0500
imports
commit 68e59bc715de6d1f69d69530310655a598a54200
Author: Mike Adamson <madamson@datastax.com>
Date: Thu May 11 17:20:52 2023 +0100
Use correct bind types for vector
commit c57119254b7e065433804e36ed6ead71d53984ea
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 10 23:06:19 2023 -0500
avoid calling neighborSet.size() in write path
commit f91e9ea40fa4bfc07c0bb280b7c479d473956da3
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 10 23:05:11 2023 -0500
add more asserts. countNeighbors is failing
commit 1bd6789304a60762f0ef497c2137dd96f806a262
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 10 22:40:01 2023 -0500
lock out updates to the graph while we're writing it to disk
commit da7cb2984f96c06a5890ed2da21c42d946d7c754
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 10 22:06:58 2023 -0500
imports
commit 3676a86b821c6c05d9c5df33f4e4fbc9eeb1f591
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 10 22:03:14 2023 -0500
compact by building graph in memory, like it did before
commit fa1ebd013bc03def3d211932fe2a98d1bb4442d9
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 10 16:15:58 2023 -0500
rewrite save-to-disk without Lucene code, to support multiple rows having the same vector value
bonus: we don't need to rebuild an index that we already have in memory
commit 3b328324d7c3afcda34a95c4b9fcbb9fe017384c
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 10 17:00:02 2023 -0500
upgrade to latest lucene snapshot
commit 1f4e768e9ec3327a5…
commit f9e589098e89417cd41ef627b3e1c7371986e3e1
Merge: b0dbc8bd57 d32eed9d68
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Mon Sep 18 11:28:48 2023 -0500
Merge remote-tracking branch 'datastax/vsearch' into cndb-7632-vsearch-rebase-20230915
commit b0dbc8bd57fde3dd473d4aae85a3063f2156106d
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Mon Sep 18 10:12:30 2023 -0500
Update expected string for error message change.
commit f2cafdac9c3e5bf838c10e0078895d56e3271370
Author: qannap <130002578+qannap@users.noreply.github.com>
Date: Thu Sep 14 08:41:07 2023 -0700
Optimize partition-aware queries to use bloom filter (#729)
Add bloom filter field in QueryViewBuilder. Here is the link to the issue: [Vector-72](https://datastax.jira.com/browse/VECTOR-72).
commit 9765b23b5b748286391b8e374bab24a31cb5d934
Author: Jonathan Ellis <jbellis@gmail.com>
Date: Thu Sep 14 09:47:19 2023 -0500
Vsearch 18 (#741): optimize MergePostingList.advance for the case where it is called repeatedly
commit 66c00e094942de72f2f1fdb780ac00239e0aa284
Author: Mike Adamson <madamson@datastax.com>
Date: Thu Sep 14 14:33:41 2023 +0100
Remove smile-nlp and add internal Glove implementation for testing (#743)
commit b13098e0cd5a9f961066c0059953b525f5bfa787
Author: Jakub Żytka <jakub.zytka@gmail.com>
Date: Thu Sep 14 13:28:01 2023 +0200
CNDB-7363: extend read tracking API to notify about post-filtered, reconciled partitions and rows (#692)
* CNDB-7363: extend read tracking API to notify about post-filtered, reconciled partitions and rows
* RowFilter::getExpressionsPreOrder will now return all row filter's expressions in pre-order
commit 2bca65cfe4ba864a8a6719e0e28ddf76e83c17e0
Author: Michael Marshall <michael.marshall@datastax.com>
Date: Sun Sep 10 01:01:47 2023 -0500
Fix JsonTest and AnalyzerViewTest errors (#730)
commit a0dde9f89e02ef40d9aa87514eb7b5becfdce8e8
Author: Michael Marshall <michael.marshall@datastax.com>
Date: Fri Sep 8 19:34:09 2023 -0500
Reject SAI creation for invalid combinations of options (#736)
commit abd5399f9c1b6feb0f5ba71b279ff5b76d0f92ee
Author: Jonathan Ellis <jbellis@gmail.com>
Date: Fri Sep 8 11:52:53 2023 -0500
Fix exceptions when executing complex queries (#735)
* fix ReorderingRangeIterator.performSkipTo
* fix NPE
commit acc39a8281cf9163057c00ec9413219b210cc262
Author: Michael Marshall <michael.marshall@datastax.com>
Date: Fri Sep 8 11:04:58 2023 -0500
Analyzer cleanup: add comments and fix docs (#733)
commit 4989dff0f150bb65535fa598e53290bca1f3ce0c
Author: Michael Marshall <michael.marshall@datastax.com>
Date: Thu Sep 7 08:51:09 2023 -0500
Fix RowAwarePrimaryKey#hashCode for deferred keys (#725)
* Fix RowAwarePrimaryKey#hashCode for deferred keys
* Use recommendation from code review
commit 2268c1c806632d62cf1a29b815740207a0e2939c
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Sep 6 13:38:39 2023 -0500
optimize MergePostingList.advance -- we don't care what order we evaluate the merged lists in, since we're going to rebuild the PQ anyway, so use a fast iterator instead of a slow poll() loop
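A minimal sketch of the idea in this commit message, assuming a simplified cursor type: when every merged list has to be advanced and the priority queue is rebuilt afterwards, the visiting order is irrelevant, so plain iteration can replace a `poll()` loop. `Cursor` and `MergeAdvanceSketch` are hypothetical names for illustration and are not the SAI `PostingList` or `MergePostingList` API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical stand-in for a sorted posting list cursor.
final class Cursor {
    private final long[] ids;
    private int pos = 0;

    Cursor(long... sortedIds) { this.ids = sortedIds; }

    /** Current row id, or Long.MAX_VALUE once the cursor is exhausted. */
    long current() { return pos < ids.length ? ids[pos] : Long.MAX_VALUE; }

    /** Move forward to the first id that is >= target. */
    void advance(long target) {
        while (pos < ids.length && ids[pos] < target) pos++;
    }
}

final class MergeAdvanceSketch {
    /** Advance every cursor to target, then rebuild the queue once. */
    static PriorityQueue<Cursor> advanceAll(PriorityQueue<Cursor> queue, long target) {
        List<Cursor> live = new ArrayList<>(queue.size());
        // Iteration visits elements in arbitrary order and avoids the
        // O(log n) sift that each poll() would perform.
        for (Cursor c : queue) {
            c.advance(target);
            if (c.current() != Long.MAX_VALUE) live.add(c);
        }
        // The old queue's ordering is stale after advancing, so build a new one.
        PriorityQueue<Cursor> rebuilt = new PriorityQueue<>(
            Math.max(1, live.size()), Comparator.comparingLong(Cursor::current));
        rebuilt.addAll(live);
        return rebuilt;
    }
}
```

The old queue is discarded after the call, which is why mutating the cursors during iteration is safe in this sketch.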
commit 6bb72c3272fbc8d12427fb0d9f00916b6f6ee9c0
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Sep 6 13:22:07 2023 -0500
PKM is not threadsafe, need to allocate a new one for every request
commit f1d7c66b54f7b95e12698d7a6ed00a1e8ac958d6
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Sep 6 13:10:11 2023 -0500
r/m slightly gratuitous IOException from PKM.Factory signature
commit 73e96df69a1861d5a8bda04ab69252a480a78c62
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Sep 6 11:35:40 2023 -0500
fix updateTestWithPredicate by removing broken assert (did not account for deleted rows)
commit 36327a93b8c1651acd7af1d8b687dd92e08021e3
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Sep 6 11:34:44 2023 -0500
add failing updateTestWithPredicate
commit 80eb443b65637dc1782cb3a745d4d56431f787d8
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Sep 6 11:05:32 2023 -0500
comment upsertTest
commit 3f5fbc72547aa790989d3c7bce8b92fd4bb2fb2c
Author: Marianne Manaog <91789965+marianne-manaog@users.noreply.github.com>
Date: Wed Sep 6 19:42:15 2023 +0100
Feature/Optimize SAI.getPostQueryOrdering as SAI.postQuerySort (#710)
* Optimize sorting in postQuerySort method
* Remove need for prepareFor method following sort optimization in postQuerySort
* Add testOrderResults
* Simplify creation of resultSet in the test testOrderResults and keep rows as final in ResultSet.java
* Apply DSU pattern to SAI.postQuerySort method and leverage it in SelectStatement.orderResults method
* Use Pair.create instead of new Pair
* Put back comment on ANN support only
* Optimise undecorate step by using List<ByteBuffer> instead of ByteBuffer in the Pair
* Add licence to StorageAttachedIndexTest.java
* Move StorageAttachedIndexTest.java up a directory
* Simplify map to create listPairsVectorsScores
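For readers unfamiliar with the DSU pattern mentioned above (read here as decorate-sort-undecorate), the following is a rough sketch under simplified assumptions: each row is paired with a score computed once, the pairs are sorted, and the scores are then dropped. `DsuSortSketch`, `Scored`, and the `scorer` parameter are hypothetical and do not mirror the actual `SAI.postQuerySort` signature.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.ToDoubleFunction;

final class DsuSortSketch {
    /** Pairs a row with its precomputed score (the "decorate" step). */
    private static final class Scored<T> {
        final T row;
        final double score;

        Scored(T row, double score) {
            this.row = row;
            this.score = score;
        }
    }

    /** Sorts rows by descending score, invoking the scorer once per row. */
    static <T> List<T> sortByScoreDescending(List<T> rows, ToDoubleFunction<T> scorer) {
        // Decorate: compute each score exactly once.
        List<Scored<T>> decorated = new ArrayList<>(rows.size());
        for (T row : rows) {
            decorated.add(new Scored<>(row, scorer.applyAsDouble(row)));
        }
        // Sort: compare cached scores only.
        decorated.sort(Comparator.comparingDouble((Scored<T> s) -> s.score).reversed());
        // Undecorate: strip the scores again.
        List<T> sorted = new ArrayList<>(rows.size());
        for (Scored<T> s : decorated) {
            sorted.add(s.row);
        }
        return sorted;
    }
}
```

The benefit is that an expensive scoring function (for example a vector similarity) runs once per row instead of once per comparison.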
commit 23555ab43101cd6010fe9cc320e7429779faa678
Author: qannap <130002578+qannap@users.noreply.github.com>
Date: Fri Sep 1 15:29:22 2023 -0700
Add row count field to columnFamilyStore (#713)
* Add row count field to columnFamilyStore
* Add test to row count field
* delete unused import
* delete unused function and separate test
* Add row count field to columnFamilyStore
* Add test to row count field
* delete unused import
* delete unused function and separate test
* resolve version
* return to updated version
* restore other tests as vsearch branch
---------
Co-authored-by: Michael Marshall <michael.marshall@datastax.com>
commit 1eae389fadecfc6637207933fd6ae1739355dc00
Author: Michael Marshall <michael.marshall@datastax.com>
Date: Thu Aug 31 17:59:53 2023 -0500
Fix NPE in LuceneAnalyzer#end (#721)
When testing the analyzer with an integration test, I discovered an NPE from a part of the code that wasn't tested.
Here is the code:
https://github.com/datastax/cassandra/blob/9cabec504d9f70ddfb345122dc4defed56a40f85/src/java/org/apache/cassandra/index/sai/plan/StorageAttachedIndexSearcher.java#L91-L100
We build the Analyzer to use one of its methods, but not to analyze any text, hence the NPE.
Here is the NPE:
```
ERROR [Native-Transport-Requests-1] 2023-08-31 14:35:38,695 QueryMessage.java:121 - Unexpected error during query
java.lang.NullPointerException: null
at org.apache.cassandra.index.sai.analyzer.LuceneAnalyzer.end(LuceneAnalyzer.java:110)
at org.apache.cassandra.index.sai.plan.StorageAttachedIndexSearcher.filterReplicaFilteringProtection(StorageAttachedIndexSearcher.java:99)
at org.apache.cassandra.service.reads.DataResolver.lambda$preCountFilterForReplicaFilteringProtection$6(DataResolver.java:300)
at org.apache.cassandra.service.reads.DataResolver.resolveInternal(DataResolver.java:341)
at org.apache.cassandra.service.reads.DataResolver.resolveWithReadRepair(DataResolver.java:244)
at org.apache.cassandra.service.reads.DataResolver.resolveWithReplicaFilteringProtection(DataResolver.java:284)
at org.apache.cassandra.service.reads.DataResolver.resolve(DataResolver.java:130)
at org.apache.cassandra.service.reads.range.SingleRangeResponse.waitForResponse(SingleRangeResponse.java:59)
at org.apache.cassandra.service.reads.range.SingleRangeResponse.computeNext(SingleRangeResponse.java:65)
at org.apache.cassandra.service.reads.range.SingleRangeResponse.computeNext(SingleRangeResponse.java:31)
at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47)
at org.apache.cassandra.db.transform.BasePartitions.hasNext(BasePartitions.java:93)
at org.apache.cassandra.service.reads.range.RangeCommandIterator.computeNext(RangeCommandIterator.java:123)
at org.apache.cassandra.service.reads.range.RangeCommandIterator.computeNext(RangeCommandIterator.java:45)
at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47)
at org.apache.cassandra.db.transform.BasePartitions.hasNext(BasePartitions.java:93)
at org.apache.cassandra.db.partitions.PartitionIterators$1.hasNext(PartitionIterators.java:154)
at org.apache.cassandra.db.transform.BasePartitions.hasNext(BasePartitions.java:93)
at org.apache.cassandra.cql3.statements.SelectStatement.process(SelectStatement.java:892)
at org.apache.cassandra.cql3.statements.SelectStatement.processResults(SelectStatement.java:518)
at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:495)
at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:341)
at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:102)
at org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:290)
at org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:394)
at org.apache.cassandra.transport.messages.QueryMessage.execute(QueryMessage.java:108)
at org.apache.cassandra.transport.Message$Request.execute(Message.java:242)
at org.apache.cassandra.transport.Dispatcher.processRequest(Dispatcher.java:111)
at org.apache.cassandra.transport.Dispatcher.processRequest(Dispatcher.java:131)
at org.apache.cassandra.transport.Dispatcher.lambda$dispatch$0(Dispatcher.java:78)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:165)
at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:137)
at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:119)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.base/java.lang.Thread.run(Thread.java:829)
```
I am still new to query execution, but I am guessing the unit tests skip this filtering code. I added an integration test that fails without the fix.
commit ea97cc414ef3e68ad30e8c666ecf04b719352901
Author: Michael Marshall <michael.marshall@datastax.com>
Date: Wed Aug 30 09:10:14 2023 -0500
Fix failing KDTreeIndexSearcherTest (#717)
#680 included a change that makes 5 of the KDTreeIndexSearcherTest tests fail. This PR essentially reverts back to the old behavior, and all of the failing tests pass.
commit cee55b1b2139627b331d9f7225c522ca5bee299b
Author: Michael Marshall <michael.marshall@datastax.com>
Date: Tue Aug 29 21:56:24 2023 -0500
Remove ability to configure unique query_analyzer (#712)
commit c248f905cc13833f9857b9d750a60091afc5c38a
Author: Michael Marshall <michael.marshall@datastax.com>
Date: Tue Aug 29 21:55:18 2023 -0500
Fix failing distributed SAI tests using : operator (#715)
commit 367e84adf0663e8baab3f9195f3eb81412640b8c
Author: Michael Marshall <michael.marshall@datastax.com>
Date: Tue Aug 29 21:55:03 2023 -0500
Test compound predicate queries for : operator (#716)
commit 0ea0d95f52ac18522a0cb5a09c1c9ee00069380a
Author: Michael Marshall <michael.marshall@datastax.com>
Date: Tue Aug 29 11:10:02 2023 -0500
Refactor SAI analyzer configuration; add built in analyzers (#711)
The current `OPTIONS` configuration for the `index_analyzer` does not correctly model the available configuration options. In order to make the configuration more directly match the single tokenizer, and the lists of filters and charFilters, I propose we update the expected JSON schema. Further, we add some default analyzers to simplify the configuration of generic analyzers.
commit 67ef0f3794cbd2b9ff41b632205dd087a992e674
Author: Michael Marshall <michael.marshall@datastax.com>
Date: Mon Aug 28 11:58:30 2023 -0500
Return more specific exception on incorrect usage of : operator (#705)
* When the `:` operator is used for indexes that do not support it, the current error is a recommendation to `ALLOW FILTERING` or an `AssertionError`. Instead, we should match the `LIKE` behavior and reject queries that attempt to use the `:` operator on columns that are not properly indexed. The `LIKE` conditional is slightly modified to simplify the logic for rejecting unsupported queries.
* When the `:` operator is used for `DELETE` and `UPDATE` queries, return a helpful error message.
* Update the `toString` method on the `ANALYZER_MATCHES` enum. The `toString` method is used to generate error messages, and it is confusing to see an error message with `: '<term>'` when it really should just be `:`.
commit c26cbfa2298d422813e02b439f4db6494bb64a84
Author: Jakub Żytka <jakub.zytka@gmail.com>
Date: Mon Aug 28 14:12:15 2023 +0200
Additional testcases for read query tracking with different data models (#698)
* Additional testcases for read query tracking with different data models
commit bf143a2b3fc97979e5e083f8eac7aca017cd618d
Author: Michael Marshall <michael.marshall@datastax.com>
Date: Thu Aug 24 13:28:47 2023 -0500
Ignore default analyzer settings when classifying an SAI as analyzed (#706)
### Problem
When creating an SAI, the `AbstractAnalyzer` creates an analyzer in cases where it should not because the logic is too naive.
### Details
Users can create SAI indexes with a `NonTokenizingAnalyzer` using the following queries where `?` can be replaced with `true` or `false`:
```
CREATE CUSTOM INDEX ON table(some-column) USING 'StorageAttachedIndex' WITH OPTIONS = { 'normalize' : ? }
CREATE CUSTOM INDEX ON table(some-column) USING 'StorageAttachedIndex' WITH OPTIONS = { 'case_sensitive' : ? }
CREATE CUSTOM INDEX ON table(some-column) USING 'StorageAttachedIndex' WITH OPTIONS = { 'ascii' : ? }
```
The options are known as "non tokenizing analyzers" because they "analyze" values but do not "tokenize" them. Practically speaking, these analyzers are projections where each input has one and only one output. Lucene calls projections "filters".
The `StorageAttachedIndex` default is for normalize = false, case_sensitive = true, and ascii = false.
When a default value is passed for one of the options, the `AbstractAnalyzer` incorrectly classifies the index as "analyzed". Instead, the `AbstractAnalyzer` should return the `NoOpAnalyzer` and the index should not be classified as "analyzed".
### Solution
* Only construct the `NonTokenizingAnalyzer` when `normalize`, `case_sensitive`, or `ascii` are configured with non default values.
* Do not consider an SAI as analyzed if `normalize`, `case_sensitive`, or `ascii` are configured with default values.
* Updated some tests to use `=` on the non-analyzed SAI
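The classification rule described in the Solution list can be sketched roughly as follows. The default values come from the message above; the class and method names are hypothetical and this is not the `AbstractAnalyzer` code itself.

```java
import java.util.HashMap;
import java.util.Map;

final class AnalyzedClassificationSketch {
    // Defaults as stated in the commit message:
    // normalize = false, case_sensitive = true, ascii = false.
    private static final Map<String, String> DEFAULTS = new HashMap<>();
    static {
        DEFAULTS.put("normalize", "false");
        DEFAULTS.put("case_sensitive", "true");
        DEFAULTS.put("ascii", "false");
    }

    /**
     * Returns true only when at least one non-tokenizing option is present
     * AND differs from its default. Options that merely restate a default
     * leave the index classified as not analyzed.
     */
    static boolean needsNonTokenizingAnalyzer(Map<String, String> options) {
        for (Map.Entry<String, String> def : DEFAULTS.entrySet()) {
            String configured = options.get(def.getKey());
            if (configured != null && !configured.equalsIgnoreCase(def.getValue())) {
                return true;
            }
        }
        return false;
    }
}
```

Under this sketch, `{ 'normalize' : false }` behaves the same as passing no options at all, while `{ 'case_sensitive' : false }` selects the non-tokenizing analyzer.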
commit 7a6c4755f150ad647aae2bc806d8b91c423db14b
Author: Michael Marshall <michael.marshall@datastax.com>
Date: Tue Aug 22 15:57:08 2023 -0500
Add ANALYZER_MATCHES operation; Disallow = on analyzed SAI columns (#702)
* Adds the `:` operator, a.k.a. the ANALYZER_MATCHES operator. This operator only works on SAI indexes that are analyzed, where analyzed means the values stored in the index are derived from the raw value in the column. The new operator uses the enum value `100` to prevent future collisions with the upstream CQL implementation.
* Disambiguate the `=` operator so that it cannot perform matches on an `SAI` column that is analyzed. The primary motivation for this change is to prevent surprising matches. If a query is attempted using `=` where the target column only has analyzed indexes, the query must include `ALLOW FILTERING`.
* Updated many tests.
commit 77cb7f265e40600288fe90fffd74ef6e428f87fe
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Aug 18 10:48:54 2023 -0500
comment
commit 2dcdd9fd7464a6e26185bfa4ac3cd44b02574fb1
Author: Michael Marshall <michael.marshall@datastax.com>
Date: Thu Aug 17 16:55:51 2023 -0500
Fail SAI index creation when using analyzer on column in primary key (#699)
* add testStandardAnalyzer
* debugging wip
* Fail index creation when using analyzer on column in primary key
* Remove unnecessary debug logging
* cleanup
* Reject all attempts to add analyzer to index on pk columns
* Rename noop analyzer test and add explanation
---------
Co-authored-by: Jonathan Ellis <jbellis@datastax.com>
When the target column for an SAI is part of the primary key and is analyzed by the LuceneAnalyzer, queries do not return correct results. Until the underlying issue is resolved, do not allow these kinds of indexes to be created.
This change is covered by a test to verify index creation and search works correctly for the `NoopAnalyzer` and another to verify index creation is rejected for the `LuceneAnalyzer`.
commit 4f5585be198a949b0264eb64d0403e2adda9ab52
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Aug 17 17:41:18 2023 -0500
default vector cache size per segment bumped to 4MB
commit 9464bfe3e2bbd958f4f1d60e799019582a9836cb
Author: Marianne Lyne Manaog <marianne.manaog@datastax.com>
Date: Thu Jul 13 20:32:36 2023 +0100
Deserialize ByteBuffer into a query vector once by moving the deserialization up the call stack
commit bc2ef7ff721493b20b1bb76dd2ae37b43d365071
Author: Shaunak Das <ShaunakDas88@users.noreply.github.com>
Date: Wed Aug 16 13:27:12 2023 -0500
VECTOR-79: Simplify population of VectorCache (#695)
* use BFS at higher levels so we can avoid a separate cachedNodes set
* try to naively cache as much of level 0 as possible
* add VectorCacheTest
* cleanup and revert to intended behavior of not caching L0
---------
Co-authored-by: Jonathan Ellis <jbellis@datastax.com>
commit 1c58e431e55d8b327314ddb82b30bf8eced59269
Author: Shaunak Das <ShaunakDas88@users.noreply.github.com>
Date: Tue Aug 15 11:24:30 2023 -0500
Avoiding sorting node ordinals for level 0 (#697)
* avoiding sorting node ordinals for level 0
commit 16eeaf89a8ac39fee616f7a71a3f9a4b4170521e
Author: Jonathan Ellis <jbellis@gmail.com>
Date: Tue Aug 15 08:47:48 2023 -0500
extract OnDiskVectors to top level class so we can use it in debugging tools (#693)
commit 2271a230761a416fbd5c016d47684b388be93989
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Aug 9 09:59:45 2023 -0500
VECTOR-77 fix reading vectors past the first 2GB mmap region
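The commit message does not say what the fix was. As background only, the sketch below shows one common way to address data beyond a single 2GB mapping: compute byte offsets in `long` arithmetic and split them into a region index plus an in-region offset. `MmapRegionSketch` and `REGION_SIZE` are assumptions for illustration and are not taken from the Cassandra source.

```java
final class MmapRegionSketch {
    // Hypothetical region size: a single mapped ByteBuffer cannot exceed
    // Integer.MAX_VALUE bytes.
    static final long REGION_SIZE = Integer.MAX_VALUE;

    /** Byte offset of a float vector; the cast to long prevents int overflow. */
    static long vectorOffset(int ordinal, int dimension) {
        return (long) ordinal * dimension * Float.BYTES;
    }

    /** Which mapped region holds the given absolute offset. */
    static int regionIndex(long offset) {
        return (int) (offset / REGION_SIZE);
    }

    /** Position of the given absolute offset within its region. */
    static int offsetInRegion(long offset) {
        return (int) (offset % REGION_SIZE);
    }
}
```

Without the `long` cast, `ordinal * dimension * Float.BYTES` would silently wrap for large indexes. A vector that straddles a region boundary would need additional handling that this sketch leaves out.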
commit 5705daa97f848454d8d29ab08234296906b3dbc4
Author: Andrés de la Peña <adelapena@users.noreply.github.com>
Date: Thu Aug 10 18:06:28 2023 +0100
VECTOR-76: Add back vector size guardrail (#694)
patch by Andrés de la Peña; reviewed by Brandon Williams and Maxwell Guo for CASSANDRA-18730
commit cb3bedccc255476300528183e8ba30a4e390f57f
Author: Jakub Żytka <jakub.zytka@gmail.com>
Date: Tue Aug 8 16:52:16 2023 +0200
CNDB-7390: fix static rows not being tracked by ReadTrackingTransformation (#691)
commit a8740fb8dae267fb8ddef1a7f5c77e948cd2a0ff
Author: Andrés de la Peña <adelapena@users.noreply.github.com>
Date: Fri Jul 28 15:42:42 2023 +0100
Add system property to reject non-float vectors, true by default (#689)
commit 5a4e439dc73b479d79bd427472d382a53b6fe680
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jul 21 13:10:44 2023 +0200
r/m failing emptyIndexTest until we can address it (VECTOR-59)
commit eec8b7b387cd041fbd54aef8f0090773962a9e49
Author: Piotr Kołaczkowski <pkolaczk@gmail.com>
Date: Tue Jul 18 17:36:43 2023 +0200
VECTOR-69: Fix LWT test failure caused by null key bounds (#686)
* VECTOR-69: Fix LWT test failure caused by null key bounds
---------
Co-authored-by: Jakub Zytka <jakub.zytka@datastax.com>
Co-authored-by: Jonathan Ellis <jbellis@datastax.com>
commit 033ae1a9567dfef88d0214dd34716f2468c5d4e3
Author: Andrés de la Peña <adelapena@users.noreply.github.com>
Date: Tue Jul 18 15:23:30 2023 +0100
VECTOR-68 Vector dimensions are not being validated correctly (#679)
commit e22ff6977f986a4148f931bad4b04645bf93d2e9
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jul 18 06:01:04 2023 +0200
set sai_hnsw_allow_custom_parameters=true for IndexWriterConfigTest
commit ccaba92656e78e63765da5c34f562f2c3b582bd0
Author: Mike Adamson <madamson@datastax.com>
Date: Tue Jul 11 16:12:31 2023 +0100
VECTOR-55: Review REVIEWME comments: (#680)
* VECTOR-55: Review REVIEWME comments:
- Refactor search methods to only return long variant
- Remove SSTableQueryContext in favour of QueryContext
- Add checking to SegmentMetadata.toSegmentRowId method
commit 338c8507de197570f4879019e0e1fc8e1f77025e
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Mon Jul 10 21:46:56 2023 -0500
Fix IndexError when querying a list/set/map column.
commit 1b4d6de7e0a051b95c0a9b62c8bc167e23ec65ed
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sun Jul 9 21:03:51 2023 +0200
add test that fails before decompose fix and passes after
commit 664911ac7de336d3fcffd90083e6a220069bd4e2
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sun Jul 9 13:44:28 2023 +0200
re-use TypeUtil.decomposeVector
commit e17493c95ac247ca1da7caa3ccd58be866958d6a
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sun Jul 9 13:28:56 2023 +0200
fix casting in deserialize
commit d169ff68f6576ce5b150818cb0a52a12abf6f2e0
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sun Jul 9 13:30:02 2023 +0200
fix NPE when using ReadExecutionController.empty
commit aa899f55f77df5b157e3e6afa930c377e4cacc57
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Fri Jul 7 10:12:28 2023 +0800
assert no single partition trace for SAI request
commit 277251e1aad15e2a5aedaaaae3170d574d224fe5
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 29 12:08:05 2023 -0500
- add Tracing.traceSinglePartitions to avoid cluttering range and index queries with noise
- update QueryViewBuilder to include a count of sstables per index accessed
commit 4682a417fdaaefb135e57b0ebe174c1ff49b7e94
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jul 6 12:13:20 2023 -0500
simplify
commit 52d19a1ef5605aebdb87385786dd323057267164
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Wed Aug 30 14:17:12 2023 -0500
Squashed commit of the following:
commit 4c4c1ff802c7667e1289a4e730cc34bf6b3bd007
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jul 5 12:03:22 2023 -0500
add test for creating index after data is added
commit 0acaae364c19ed7556b66b5576762ecc131af5fd
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Wed Jul 5 09:23:01 2023 -0500
Revert "enable segment compaction by default for vector indexes -- we should prioritize read performance"
This reverts commit a4fa0728337ebf236ad03f6ad708cf51c3dc08c5.
commit e3ddb6426de4e89b5fe61ed742b69257865c2d3c
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 28 10:54:49 2023 -0500
VECTOR-64 set sstable_growth to 0.5 by default
commit 3ca75046e6802ce5836a654459b01948c83d1d7b
Merge: a4fa072833 10a2a31ae8
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 28 10:54:18 2023 -0500
merge ds-trunk
commit a4fa0728337ebf236ad03f6ad708cf51c3dc08c5
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 28 10:24:41 2023 -0500
enable segment compaction by default for vector indexes -- we should prioritize read performance
commit 08153b38549d700505d064c25d7a2070c2ff07aa
Merge: 6a063b4f84 1a45fdc5f9
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 28 10:18:38 2023 -0500
Merge branch 'vsearch' of github.com:datastax/cassandra into vsearch
commit 6a063b4f8458f119d4b99bd6b78cbc8832f27f13
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 28 10:13:51 2023 -0500
VECTOR-63 add cassandra.sai.hnsw.allow_custom_parameters defaulting to false to control setting maximum_node_connections and construction_beam_width, which can cause memory usage to explode if used incorrectly
commit 1a45fdc5f92e9c82dba6e9e56a5e341806070b13
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Mon Jun 26 15:57:10 2023 -0500
Add the option to configure the file_cache_size_in_mb in a -D like it is in 6.8-cndb.
commit d2e45dab5b57f20fbdd6250d6dc186c83d783ed3
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 22 16:13:00 2023 -0500
VECTOR-62 don't flush a graph that only contains deleted vectors
commit 4a9afe205c20b1f863e89d4e02138557d225c3fc
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 22 16:11:11 2023 -0500
r/m obsolete comment
commit a925dc22696bcbebcad17a460441e8b84ff97660
Merge: 7c7d160dd9 2b22984fe1
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Thu Jun 22 09:03:08 2023 -0500
Merge remote-tracking branch 'datastax/ds-trunk' into vsearch
commit 7c7d160dd90624b0e365287ef0a2f0b7ef94da53
Merge: a14b387372 7a5e374cea
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 20 21:42:51 2023 -0500
Merge branch 'vsearch' of github.com:datastax/cassandra into vsearch
commit a14b387372ef3f220ece68b019acaafaf0d56f42
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 20 17:52:13 2023 -0500
move new options to jvm17-server.options
commit dc67582585a90628e026a693497793b7d8a298ed
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 20 17:47:27 2023 -0500
copy jvm11 options to jvm17
commit b6717410f925486a858c4c9414e417eed50950d7
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 20 17:30:48 2023 -0500
enable simd
commit 92b9e6e2f136254fb1f8e50732809ea529617f48
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 19 15:31:09 2023 -0500
make it run under jdk 20
commit 3e985953637819ac8504955552ce0feba29bdd40
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 20 17:13:56 2023 -0500
update to lucene with simd
commit 7a5e374cea785678670444121ad36be6170dd59b
Merge: 5e5ae4de82 e9ddd5f0d9
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Tue Jun 20 15:00:24 2023 -0500
Merge remote-tracking branch 'datastax/ds-trunk' into vsearch
commit 5e5ae4de824a7e838bbdfdbabc92e8b256911e3f
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 19 10:57:18 2023 -0500
we weren't calculating length correctly (supposed to match ordinals range) but it didn't matter b/c length() is not called on the search path where these classes are used. make this explicit by throwing in length implementation.
commit c0d4055a0bc6914d0220fbd4912d6355e9f6a1d9
Merge: f2c9a7cb59 c605f8c9f0
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 19 10:20:02 2023 -0500
Merge branch 'vsearch' of github.com:datastax/cassandra into vsearch
commit f2c9a7cb59b113a349992011b739079106274c1e
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 19 10:19:34 2023 -0500
VECTOR-61 remove deletedOrdinals collection that was not threadsafe wrt add and remove operations; create postingsByOrdinal map that allows Bits operations to look up the postings, instead
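One possible reading of this change, sketched under loose assumptions: rather than tracking deleted ordinals in a separate collection that add and remove must keep in sync, liveness is derived from a single concurrent map of postings per ordinal. `LiveOrdinalsSketch` and its methods are hypothetical; the real classes and the Lucene `Bits` interface are not reproduced here.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

final class LiveOrdinalsSketch {
    // Single source of truth: row ids currently attached to each vector ordinal.
    private final ConcurrentMap<Integer, Set<Long>> postingsByOrdinal = new ConcurrentHashMap<>();

    /** Attach a row id to an ordinal; safe to call from multiple threads. */
    void add(int ordinal, long rowId) {
        postingsByOrdinal.computeIfAbsent(ordinal, k -> ConcurrentHashMap.newKeySet()).add(rowId);
    }

    /** Detach a row id from an ordinal; a no-op if the ordinal is unknown. */
    void remove(int ordinal, long rowId) {
        Set<Long> postings = postingsByOrdinal.get(ordinal);
        if (postings != null) postings.remove(rowId);
    }

    /** Plays the role of Bits.get(ordinal): live if any posting remains. */
    boolean isLive(int ordinal) {
        Set<Long> postings = postingsByOrdinal.get(ordinal);
        return postings != null && !postings.isEmpty();
    }
}
```

Because liveness is computed from the postings on demand, there is no second collection that can drift out of step when adds and removes interleave.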
commit 9744a13b405cbeb1b68894bb08cc24796aedd8f7
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 19 09:13:24 2023 -0500
add LVT.testMultiplePostings (works fine)
commit c605f8c9f037e724544b21fe2aadcf0d8503d86f
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Sat Jun 17 12:44:36 2023 +0800
skip replica-filtering-protection for ANN requests
commit 71935bd07c9218ba70f1ffde7b6db1e88009e529
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 16 20:20:06 2023 -0500
forgot that maxBruteForceRows has to stay non-final for Byteman
commit b264afc07727305ce88ab2f18b8a2643d2510bd3
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 16 20:10:19 2023 -0500
VECTOR-54
- validateIndexable also checks for very large vectors (that explode to NaN when we take the cosine)
- check query vectors with the same criteria as vectors to index
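To illustrate why both zero vectors and very large vectors are a problem for cosine similarity, here is a small sketch using plain `float` arithmetic. The class and method names are hypothetical, and `IllegalArgumentException` stands in for the `InvalidRequestException` that Cassandra would return to the client; the real `validateIndexable` may use a different check.

```java
final class VectorValidationSketch {
    /** Cosine similarity accumulated in float, as a vector index typically would. */
    static float cosine(float[] a, float[] b) {
        float dot = 0f, normA = 0f, normB = 0f;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return (float) (dot / Math.sqrt((double) normA * normB));
    }

    /**
     * Rejects vectors whose self-similarity is not finite:
     * a zero vector yields 0/0, and components large enough to overflow
     * float when squared yield Infinity/Infinity. Both are NaN.
     */
    static void validateIndexable(float[] vector) {
        if (vector.length == 0) {
            throw new IllegalArgumentException("vector must not be empty");
        }
        if (!Float.isFinite(cosine(vector, vector))) {
            throw new IllegalArgumentException("vector is zero or too large for cosine similarity");
        }
    }
}
```

Applying the same check to query vectors, as the second bullet describes, keeps NaN scores out of the search path as well as out of the index.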
commit 66c31a6dcfcbf81b7657ac9c4119803e026d150a
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 16 20:07:41 2023 -0500
if CFS.apply catches an InvalidRequestException, keep the same type when we re-throw so it gets back to the client as intended
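The rethrow behavior described here can be sketched as below. The nested `InvalidRequestException` is a hypothetical stand-in defined only so the example compiles on its own, and `apply` is not the real `ColumnFamilyStore.apply` signature.

```java
final class RethrowSketch {
    /** Stand-in for Cassandra's client-facing validation error. */
    static final class InvalidRequestException extends RuntimeException {
        InvalidRequestException(String message) { super(message); }
    }

    /** Runs a write, preserving the type of client-facing errors. */
    static void apply(Runnable write) {
        try {
            write.run();
        } catch (InvalidRequestException e) {
            // Rethrow unchanged so the client sees a validation error,
            // not a generic server-side failure.
            throw e;
        } catch (RuntimeException e) {
            // Everything else is wrapped with extra context.
            throw new RuntimeException("write failed", e);
        }
    }
}
```

The order of the catch clauses matters: the more specific exception must be caught first, otherwise it would be wrapped like any other `RuntimeException`.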
commit e0bdfdbe421a1cca30746604cda9ef5b992002bb
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 16 17:33:37 2023 -0500
add [failing] emptyIndexTest
commit 75cba96f90730b32ea4a39375c72aaff4ffd6221
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 15 16:52:46 2023 -0500
re-use bitsets across searches
commit 2af2ce8a321f7fea268d81c393da7f8f6d886776
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 16 17:05:52 2023 -0500
more asserts that graph is in sane state when we write it
commit a63492686d13e12b799acdf50c6f0c20dd680f21
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 16 17:13:23 2023 -0500
update lucene
commit 4c6b43e3606b8c9eb5e4ec4cf0817dc9b061c555
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 16 16:24:33 2023 -0500
add debugging information when node is not found on level
commit 81c8e730bd4ca314a81bc13e30b197603a8eb005
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 16 11:57:24 2023 -0500
add similarityWithAnn test
commit 87c8da9a5dddd10215a84c07a09dbaaecaa522b8
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 14 18:26:02 2023 -0500
add tests to confirm that zero-length vectors are rejected
commit 2510c1b987d846821e7095fbc2c368fa51d9a15d
Merge: a3d9e33554 e86f91c568
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 12 17:09:22 2023 -0500
Merge branch 'vsearch' of github.com:datastax/cassandra into vsearch
commit a3d9e3355470f6400328cd7ac1eb07ff06f8e00b
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 12 17:08:43 2023 -0500
use ArrayList in CVV instead of HashMap since we know the keys are consecutive
commit e86f91c568f8a82e83e61bd0dd768478a7584cfc
Merge: 77c1bc11f3 8a7a6d9c4f
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Mon Jun 12 09:25:06 2023 -0500
Merge remote-tracking branch 'datastax/ds-trunk' into vsearch
commit 77c1bc11f31e4ff001c7eff1e6fe50812e136a45
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Mon Jun 12 09:24:56 2023 -0500
Revert "Revert this after CNDB-6974 is fixed."
This reverts commit 8e04280312583a18085e7e7b9d31810790b039f4.
commit cae30e9879c463d80de2a2b7e4a642979803c13a
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Mon Jun 12 09:23:39 2023 -0500
Revert "Revert this after CNDB-6974 is fixed."
This reverts commit 67a2b7eba3c957a993726d37706643979e92a3a3.
commit 46d1a45de84f40311f193beba3ca5fc3fe7450d7
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 12 09:03:03 2023 -0500
replace our TODO entries with VSTODO to make them easier to find
commit 415b3f29e8e7f1526e28655308a1937a1175e575
Merge: 2fd8b848be 481d29721a
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 12 09:02:16 2023 -0500
Merge branch 'vsearch' of github.com:datastax/cassandra into vsearch
commit 481d29721a7993e9babd44f792be55aca925d41a
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Mon Jun 12 21:26:43 2023 +0800
Fix VectorMemtableIndex to handle max token and fix Segment#intersects (#670)
* Fix VectorMemtableIndex to handle max token and min/max bound
* Fix Segment#intersects to compare bound instead of token and add tests for range search
* make brute force rows per query for VectorMemtableIndex
* apply feedback on Segment#intersects
* add comments to VectorMemtableIndex#search
* Fix SegmentTest
commit 2fd8b848be82e62999e64b57cb9800d03de8e953
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 12 07:51:18 2023 -0500
clean up a couple REVIEWME
commit 3c066e07be6188c3f48c51c02cfc99b395e2a6bc
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Sun Jun 11 14:26:42 2023 +0800
fix flaky VectorDistributedTest
commit 6c719a752e90e031916dddf6cc7b7db23627bc38
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 10 10:48:01 2023 -0500
fix use of bruteForceRows -- should be larger of limit,maxBruteForceRows
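A tiny sketch of the threshold rule stated above, with hypothetical names: the brute-force cutoff is the larger of the query limit and the configured maximum, and the approximate search is skipped when the candidate set is no larger than that cutoff. The exact comparison used in the real code is an assumption here.

```java
final class BruteForceThresholdSketch {
    /** Effective cutoff: the larger of the query LIMIT and the configured maximum. */
    static int bruteForceRows(int limit, int maxBruteForceRows) {
        return Math.max(limit, maxBruteForceRows);
    }

    /**
     * Decide whether to score candidates directly instead of running an ANN
     * graph search. The use of "<=" is an assumption of this sketch.
     */
    static boolean shouldSkipAnn(int matchingRows, int limit, int maxBruteForceRows) {
        return matchingRows <= bruteForceRows(limit, maxBruteForceRows);
    }
}
```

Taking the maximum means a query whose limit exceeds the configured value still scores every candidate directly whenever the candidate set fits within that limit.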
commit 0a2279d7f010bcb68053bb14ea2974295b21d1c4
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 10 10:46:34 2023 -0500
use maxBruteForceRows when deciding whether to skip ANN
commit 1859f7355231ce7972e3d4b7af70e5d2967da516
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 10 10:27:03 2023 -0500
simplify partitionKeySearchTest using euclidean distance
commit 4f40e1c2d9b1c30675eeaee7d800a8113cce97c0
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 10 10:09:24 2023 -0500
typo
commit 7dbe85ac9a248aa6a0985839df1f40ceb2c08e6c
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 10 10:08:23 2023 -0500
rename methods that return Bits but had bitset in their names
commit 2861bfdba8f2638a1e3f9e81df9cad3878d0e544
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 10 10:06:51 2023 -0500
simplify skipANN
commit fbfe4bcdd03a66f4dedfa91ce9d06206f89827cd
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Sat Jun 10 16:36:56 2023 +0800
optimize VectorIndexSearcher#searchPosting
- return empty posting if key range is not found in current sstable
- return empty posting if all row ids are shadowed
- skip ANN if matching row ids are less than limit
commit 168911b70888a3dda0ddf4f615af1b7ff54f43e3
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Sat Jun 10 20:54:41 2023 +0800
fix data visibility issue during flush: make sure SSTableAddedNotification is sent before MemtableDiscardedNotification
commit c5dd47e3ce1c7276703fa3f89a9932a3b2cd43b7
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Sat Jun 10 12:31:53 2023 +0800
fix primaryKeySearchTest and partitionKeySearchTest to use correct selector and expected results
commit 2e636529c63d0d726868d6df9b3e11d15d3871d8
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Sun Jun 11 08:31:39 2023 +0800
Vector-48: index#update is not triggered by partition/range deletion,… (#665)
* Vector-48: index#update is not triggered by partition/range deletion, we have to fix VectorMemtableIndex to include shadowed primary keys
- during flush, append ordinals that are removed by partition/range deletion into deletedOrdinals set
* add comments to VP
* revert redundant variables
* simplify
---------
Co-authored-by: Jonathan Ellis <jbellis@datastax.com>
commit daa2623b880e4b82d99204fe718db22e52ae3b42
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 10 13:54:38 2023 -0500
use mmapped builder in OnDiskHnswGraphTest
commit 553778a22cfa8757daf3b3ec0e2cfa6fc7ba29cf
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 10 08:34:41 2023 -0500
fix logic in partitionKeySearchTest, test still fails
commit a7f5a2f824085691da80ef62e37a430e5637c31c
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 17:57:34 2023 -0500
ignore invalid vectors during build against existing data, instead of failing the build
commit 2af203646a2e57fca0a2521c31ba4269b0d8f326
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 17:43:47 2023 -0500
switch from ignoring zero vectors to throwing IRE
commit 4c98fbb447a275cbf557ae0adf8f4f8a30fba17f
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 17:15:57 2023 -0500
Revert "if there are no rows in the range to search, return EmptyPostingsList instead of giving the hnsw search an empty Bits (which is a pathological case for the search)"
This reverts commit 6e5eab787a3add985dc30eca82c5dee2646f5564.
commit da99598a5d11dfa4ecac3b1eeeb9d19a4bad21e5
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 17:09:30 2023 -0500
don't attempt to add zero vector to cosine indexes
commit a77792447d33600f69dd0a75a280a2b6dac51389
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 16:43:34 2023 -0500
add failing partitionKeySearchTest
commit b061dd1c77f30c67324da4f94ea8b5a22d50fba7
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 10:52:08 2023 -0500
inline the test ops
commit 6e5eab787a3add985dc30eca82c5dee2646f5564
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 10:51:09 2023 -0500
if there are no rows in the range to search, return EmptyPostingsList instead of giving the hnsw search an empty Bits (which is a pathological case for the search)
commit 518c055dfeabacf0fa0863e40bf814058f2ad981
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 10:28:57 2023 -0500
cleanup
commit e922b03ffe5436dc1341dbe20ee7aaca9901058f
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 10:27:53 2023 -0500
failing tests for primary key search
commit 428e4b713e25a5820d02ca790c30e9f1477c0f44
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 10:21:59 2023 -0500
cleanup
commit 5319bcdf3b694e861509b2a9f549f39bf51cd253
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 10:20:02 2023 -0500
move testInvalidColumnNameWithAnn to VectorInvalidQueryTest
commit d8b64845e59c0bd3658a16df21dcc75564417e3b
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 10:12:40 2023 -0500
upgrade lucene to reduce Integer boxing on build path
commit a62356e49c0a32f61732b8525d2bb655f46dd767
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 10:11:55 2023 -0500
cleanup
commit b754eb7f021d10d0ae8de83ebdf2ce2f19ce59d8
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 08:29:34 2023 -0500
replace CHM with NBHMLong to avoid boxing
commit 53d232ad2bcddefd72b10542f703cf36728c05f5
Author: Piotr Kołaczkowski <pkolaczk@datastax.com>
Date: Fri Jun 9 12:17:46 2023 +0200
STAR-550 Handle SAI AbortedOperationException
AbortedOperationException is thrown by SAI when index search
hits a timeout. Now instead of allowing this exception to
bubble up to the top and eventually be logged as an error, we catch
and swallow it in the InboundSink after creating the query response.
Additionally, we now also set a proper error code (TIMEOUT)
in the response, so the client has a hint on what happened.
commit 5e0c5eb3c54b58002162ce3e39d0891643e1f663
Author: Matt Fleming <matt@codeblueprint.co.uk>
Date: Fri Jun 9 08:44:10 2023 +0100
CNDB-7037: Check that memtable exists before flushing and avoid IOOBE (#664)
For offline services such as the compactor it's possible to not have
live memtables. Account for this during flushing after removing indexes to
avoid triggering an IndexOutOfBoundsException:
Error happened while updating the schema
java.lang.IndexOutOfBoundsException: Index: -1
at java.base/java.util.Collections$EmptyList.get(Collections.java:4483)
at org.apache.cassandra.db.lifecycle.View.getCurrentMemtable(View.java:106)
at org.apache.cassandra.db.ColumnFamilyStore.forceFlush(ColumnFamilyStore.java:1115)
at org.apache.cassandra.db.SystemKeyspace.forceBlockingFlush(SystemKeyspace.java:552)
at org.apache.cassandra.db.SystemKeyspace.setIndexRemoved(SystemKeyspace.java:576)
at org.apache.cassandra.index.SecondaryIndexManager.markIndexRemoved(SecondaryIndexManager.java:837)
at org.apache.cassandra.index.SecondaryIndexManager.removeIndex(SecondaryIndexManager.java:420)
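The "Index: -1" in the trace above is consistent with reading the last element of an empty list. A minimal sketch of the kind of guard the commit message describes follows; the class and method names are hypothetical, and the real View and ColumnFamilyStore code may handle this differently.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical sketch; the actual change to View / ColumnFamilyStore is not shown in this log.
public class CurrentMemtableSketch
{
    /** Returns the last (current) memtable, or empty when there are no live memtables. */
    public static <T> Optional<T> currentMemtable(List<T> liveMemtables)
    {
        // Without this check, get(size() - 1) on an empty list is get(-1),
        // which throws IndexOutOfBoundsException.
        if (liveMemtables.isEmpty())
            return Optional.empty();
        return Optional.of(liveMemtables.get(liveMemtables.size() - 1));
    }
}
```

A caller such as a flush path could then skip the flush when the result is empty, instead of failing the schema update.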
commit c241f7d41ee46c28f60d691611f32e071dac6684
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 8 19:19:09 2023 -0500
read neighbors using .intBuffer since we know that the HNSW searcher always reads all the neighbors
commit 32f021a4f0623e57d82d70c54350eab6f0f57db7
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 8 18:08:30 2023 -0500
VECTOR-51 leave vectors as ByteBuffers during compaction to avoid 2x memory usage incurred by keeping them around as float[] as well
commit 32d8cbbd19301228530d3d84f7a7e4120b4d4422
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 8 17:37:08 2023 -0500
r/m node cache from query metrics since there is nothing the operator can act on there
commit 890df8e6c36c5c9e1314dbcc0a681415401120dd
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 8 14:15:25 2023 -0500
add more information to exception when reading row offsets goes wrong
commit 5c7fe0cfed1edb10f81b134f8949dd1b5abb2781
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 8 13:26:17 2023 -0500
revise ordinals cache as follows:
// cache full levels including neighbors up to neighborsRamBudget, starting with the top level,
// but always cache all levels above the bottom two levels -- this will be ~1% of the graph.
// then on L1, cache at least the offsets
// L0 we do not cache since we only need one extra seek (no bsearch) to read the neighbors offset
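One possible reading of the policy above, as a minimal sketch. The class, the enum, and the `levelBytes` input (the size of each level including neighbors, indexed from L0 upward) are assumptions made for illustration; the commit's actual cache code is not shown here.

```java
// Hypothetical sketch of the caching policy described in the commit message.
public class OrdinalsCachePolicySketch
{
    public enum CacheMode { FULL, OFFSETS_ONLY, NONE }

    /**
     * levelBytes[i] is the assumed size in bytes of level i including neighbors (index 0 = L0).
     * Returns the cache mode chosen for each level.
     */
    public static CacheMode[] plan(long[] levelBytes, long neighborsRamBudget)
    {
        CacheMode[] modes = new CacheMode[levelBytes.length];
        long remaining = neighborsRamBudget;

        // Walk from the top level down so the sparsest levels are charged to the budget first.
        for (int level = levelBytes.length - 1; level >= 0; level--)
        {
            if (level >= 2)
            {
                // Levels above the bottom two are always cached in full, even if that exceeds the budget.
                modes[level] = CacheMode.FULL;
                remaining -= levelBytes[level];
            }
            else if (level == 1)
            {
                // L1 is cached in full only if the budget allows; otherwise keep at least the offsets.
                modes[level] = levelBytes[level] <= remaining ? CacheMode.FULL : CacheMode.OFFSETS_ONLY;
            }
            else
            {
                // L0 is not cached: one extra seek is enough to read the neighbors offset.
                modes[level] = CacheMode.NONE;
            }
        }
        return modes;
    }
}
```

For example, with level sizes of 1000, 100, 10 and 1 bytes (L0 to L3) and a budget of 50 bytes, L3 and L2 are cached in full, L1 keeps only its offsets, and L0 is not cached.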
commit e131014c5cba59624e0ed9fac142abe5340e2fc0
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 8 13:53:10 2023 -0500
add deletes test
commit 7c5a880f007535d9f03b7c7d9d3e84dcc79ef3d7
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 8 13:34:42 2023 -0500
must use graph.size for bitset size -- not correct to use max rowid, since deleted ordinals will not have a rowid but will still be in the graph
commit 0654fbb61aec86deae69edf83fb1cf653ed7df65
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 8 08:44:28 2023 -0500
info -> debug for validatePerIndexComponents
commit b289db2362b928f29b6c5c6289e09b33f0be1e88
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 7 18:08:18 2023 -0500
update lucene
commit 347dae644b0d395014d8faf2affd48a6b2546e96
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 7 18:23:41 2023 -0500
undo burntest logging config change
commit 87a6a84f38d0a0423acd9a0dda67d386876681b5
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 7 15:20:06 2023 -0500
add debug logging
commit 20013bc865c7fb2c34111c98362aaf997dc724dc
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 7 14:50:39 2023 -0500
add a bit more information to exception when we fail in index construction from disk
commit 86389ceae8188b8e427d8a528fada2f913f1f633
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 7 12:25:31 2023 -0500
reduce test vector count from "all of them" to 200
commit 20177a1c84ba34babc82d8c1aa4e3436321b5d9f
Merge: 2e2fce09ee f2c697ac46
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 7 12:21:20 2023 -0500
Merge branch 'wip' into vsearch
commit f2c697ac46348301260c5947ade4ebed7dee90ee
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 7 12:16:28 2023 -0500
multipleSegmentsMultiplePostingsTest
commit 59e4a1c53e1c19a7ebba4a0834229e61f9e4d3ce
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 7 12:16:04 2023 -0500
cleanup
commit 2e2fce09eeb5bef4780cecf7c7b5e492a810f67a
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Wed Jun 7 23:33:00 2023 +0800
fix OnDiskOrdinalsMap to seek to segment offset before reading (#661)
commit 9502f1040229f2be6619f7e4497dc25ca5126b39
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 7 10:04:38 2023 -0500
fix build
commit 3535444b57366fe705e0fd8c1ca095e60ed2a706
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 7 09:19:41 2023 -0500
add write-only workload
commit a112b0965cb298f293b6e384bafc92a1d5a1fb8f
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Wed Jun 7 21:15:53 2023 +0800
VECTOR-44: improve in-memory partition-restricted query perf (#660)
* VECTOR-44: improve in-memory partition-restricted query perf
- using post-filter top-k processor instead of ANN: 14x improvement on partition-restricted query in LongVectorTest
commit cf1ec3be50ddbe40ae99dfc34eea51b9fc5c2648
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Wed Jun 7 21:06:37 2023 +0800
Vector-45: fix bitsetForShadowedPrimaryKeys to skip shadowed primary keys outside of current sstable/segment (#659)
* VECTOR-45: fix bitsetForShadowedPrimaryKeys to skip shadowed primary keys outside of current sstable/segment
* return empty iterator if results are empty instead of ReorderingRangeIterator
commit 87217b4f49935fbbd13c73f4d0aadb84836696a7
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 7 07:31:46 2023 -0500
don't make areL0ShardsEnabled final, it breaks mocks
commit 2bb3a299df43561369544b662674872575975b7c
Author: Matt Fleming <matt@codeblueprint.co.uk>
Date: Wed Jun 7 10:56:08 2023 +0100
VECTOR-42: Check columns exist to avoid NPE during ANN expr binding (#657)
Using a non-existent column with an ANN expression triggers an NPE like
so,
java.lang.NullPointerException: null
at org.apache.cassandra.cql3.ArrayLiteral.forReceiver(ArrayLiteral.java:43)
at org.apache.cassandra.cql3.ArrayLiteral.prepare(ArrayLiteral.java:54)
at org.apache.cassandra.cql3.Ordering$Raw$Ann.bind(Ordering.java:178)
at org.apache.cassandra.cql3.Ordering$Raw.bind(Ordering.java:139)
Use TM.getExistingColumn() which throws InvalidRequestException if the
column is undefined.
commit f29d4528fc43f34889e560acafa810eee1b88ba9
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 21:18:47 2023 -0500
restore query timeouts
commit 6e5734e52b41ab1e355d202ad0433440828fbb75
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 19:08:52 2023 -0500
looking at performance over time in LongVectorTest
commit f96af8f0b9185163a49156bdab107801d2588bf0
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 16:00:23 2023 -0500
log level = info for burn tests
commit 1247cf372927cc185be5c9767a06fdd15f102dbe
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 17:48:00 2023 -0500
write the in-memory deleted ordinals to disk at the start of the postings component
commit e319d71bd6b0d265f616c807dcaf4a9d9fc38aef
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 16:36:27 2023 -0500
don't allocate unnecessary objects on the happy path of no tombstones
commit 6d146a8460a17d146de756aab2ccbbf2abdb967a
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 16:19:56 2023 -0500
VECTOR-23: force UCS to not shard L0
commit 79d6e177093312bc1b951d6740ad45ce8a6c5875
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 16:01:05 2023 -0500
mark AutoCloseable
commit 5ec2f7d7f792bfa2dcf582ea4eed3fa8745fdaff
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 5 14:37:15 2023 -0500
Support string literals as vectors
Co-authored-by: Andrés de la Peña <a.penya.garcia@gmail.com>
commit 73a53c37536835e462e3cf17c452027c7aa591ed
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 12:25:39 2023 -0500
(take 3, this time with the code changes) fix race conditions across concurrent inserts + searches in memtable
commit 51f4f419d31916a81088f171b158f1e002b5c800
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 12:25:39 2023 -0500
(take 2) fix race conditions across concurrent inserts + searches in memtable
commit 94097a1f7cf1831c03e52b1f539bdc2e42fa038b
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 12:35:32 2023 -0500
Revert "fix race conditions across concurrent inserts + searches in memtable"
This reverts commit c4d41b492190fb2644f83bf4902acf71d1e4f891.
commit c4d41b492190fb2644f83bf4902acf71d1e4f891
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 12:25:39 2023 -0500
fix race conditions across concurrent inserts + searches in memtable
commit 22143a8c5db160ef0fa7687ff3e8cb227de160f0
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 10:28:30 2023 -0500
fix AOOB in bruteForceRows logic
commit b5624f62c88378e03260399440edf4d335ccc3af
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 10:12:53 2023 -0500
call deserializeFloatArray (which gives float[]) instead of deserialize (which gives List<Float>)
commit 13ad40718737b54652c875d81ec20992b5828777
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 08:36:32 2023 -0500
create CassandraHnswGraphBuilder with concurrent and serial implementations, so that single-threaded compaction doesn't have to pay the concurrency overhead
commit b83f4e4007e3d597963023b8ae32f4d0934b792d
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 08:00:44 2023 -0500
add ASL header to injections.md to make CI happy
commit 16074336b1322d4bade40eb71562ca50221399dc
Merge: 227c6c13a3 1ff6992354
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 07:48:04 2023 -0500
Merge branch 'VECTOR-37' into vsearch
commit 227c6c13a3cfe42484fc560d629d54a93f8265e0
Author: Jaroslaw Grabowski <jaroslaw.grabowski@datastax.com>
Date: Tue Jun 6 13:02:37 2023 +0200
CNDB-7007 return expired tables level from getLevels
commit 312d07de8d297cdede39017354b678f2fb1b1006
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 07:31:26 2023 -0500
randomize our brute force threshold, which will get the actual index scans exercised more
commit 0b897961869965629893c94250b7d4b1fb7c0f86
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Tue Jun 6 13:44:01 2023 +0800
Reference ANN sstable indexes in case of ann hybrid search (#653)
* Reference ANN sstable indexes in case of ann hybrid search
* simplify
---------
Co-authored-by: Jonathan Ellis <jbellis@datastax.com>
commit 1ff6992354e8411f9f7c48a76f0aadb4a00f2789
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 5 21:55:14 2023 -0500
per-query hnsw metrics
commit d3d6ae107737020eb0ca3b00c117d5bc84027dc0
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 5 21:16:14 2023 -0500
reduce test size to prevent Jenkins OOM
commit 7765ac73fd29c9664b6f8e946c067d6f1b0a9a03
Merge: 5a04dbcf67 c5cd09cebf
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 5 21:15:44 2023 -0500
Merge branch 'vsearch' of github.com:datastax/cassandra into vsearch
commit c5cd09cebf0c63f7bafcd2f4548395efe68b12cd
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Tue Jun 6 09:38:42 2023 +0800
Skip warning log about receiving a range that is not owned by the current replica, see VECTOR-30
commit 5a04dbcf6738d7681732e28518d6eb37f74357b8
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 5 11:21:15 2023 -0500
add injections.md from bdp repo
commit baf9cce301a39f9506bd60c5a75d2011150f2fea
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 5 10:02:02 2023 -0500
VECTOR-36 fix looking up indexes and SSTableContext by SSTableReader, because the SSTR will be removed from internal structures once it's compacted
commit e7fdc8454e750e2ee886c4ab27fb09abb46303fa
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 5 09:53:51 2023 -0500
limitToTopResults should skip rows that are not in the current segment (or more precisely, have no vectors that were indexed)
commit 2f64695a90651d9e76eee0a3399032ea241c983c
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Mon Jun 5 14:55:20 2023 +0800
Vector 6 take 2: reapply Vector-6 commit and fix NPE in SelectStatement (#648)
* Revert "Revert VECTOR-6"
This reverts commit 63a6ff609a02e4f7d9f5499f1af48ca5e58f63a8.
* Vector-6 take 2:
- fix NPE in SelectStatement by skipping reversed() for null ColumnComparator
- fix getOrderingColumns() to return LinkedHashMap to preserve ordering columns' order
- fix VectorLocalTest compilation
- remove debug log in CassandraOnHeapHnsw
commit 9da51b315d207417de4f3e005616e275c62c74cb
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Mon Jun 5 12:38:21 2023 +0800
fix SAI test failures in vsearch branch (#649)
- BatchlogEndpointFilterTest
- rangeRestrictedTest
- SegmentFlushTest
- SegmentMergerTest
- OperationTest
- RangeIntersectionIteratorTest
commit dc53d7db5f0f8fe1c780f6d2b7580794f7b6b5fb
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 3 13:35:11 2023 -0500
create views for ordinals map so we don't have to open a new Reader for each method call
commit c6385022ff2949c23e0dd0161a043cf4061f050c
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 3 08:19:32 2023 -0500
add assert sstableContext != null
commit f340e7590db578b71201326e82006f72dd491047
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 3 07:37:36 2023 -0500
on-disk Searcher bits should not need to be growable
commit fd5cc7f989a47d03689dd29495b974697ad1338b
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 3 07:37:18 2023 -0500
add asserts
commit 6436e5f4ad49881a62c1442692edfedfe31968dc
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 3 07:21:50 2023 -0500
cleanup
commit 32c8e005740d3c7ec0b1008091728e12260aecdd
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 2 13:57:18 2023 -0500
comment
commit 091d96832ff9a68f6ea7857eafd2ec7ee0c09a0b
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 2 13:32:24 2023 -0500
r/m unused search method returning iterators over PrimaryKey
commit 4e78a4b429b0ba6690871df9a91c8ffd8d2e53fa
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 2 11:25:30 2023 -0500
we can use the slightly more lightweight RangeConcatIterator instead of RangeUnionIterator when combining results from different Segments
commit e7a9186bf3426ad025e809481e47db85a7bf7190
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 2 11:22:35 2023 -0500
r/m obsolete FIXME
commit c76ee9ea0cab5b70c535de3f50b81cf333bf94a2
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 3 07:12:43 2023 -0500
add testAppendedGraphs
commit d00bdbfb6acf9f08d2bdfa2ef06e833e567cf9eb
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 3 06:57:37 2023 -0500
move write-to-File to test class
commit b10fd1100daf4cde0291b3b4ef06f5ceb3ca650c
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 3 06:46:01 2023 -0500
fix Ref leaks in test
commit 0656e558bb51172df4e0cde1ebe92946d4dcec8b
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 3 06:26:05 2023 -0500
fix tests to use View
commit 92797ced03ed26f9a5cce398504d6f0de519b899
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Sat Jun 3 19:19:38 2023 +0800
VECTOR-35: fix vector on-disk writer to append segments (#647)
commit 22f5c01860367586d0cc39def7d975a6d6bae784
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 2 18:54:41 2023 -0500
add "Vector indexes only support ANN queries" check
commit bfc0b056dbacd0a1eca86020e901f545dca7670e
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 2 17:34:24 2023 -0500
split invalid requests into separate test class
commit f66768c35920101382c1975c9d0ba0e3301b7c61
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 2 17:28:56 2023 -0500
add specific error message when trying to do ANN without an index
commit 63a6ff609a02e4f7d9f5499f1af48ca5e58f63a8
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 2 12:31:22 2023 -0500
Revert VECTOR-6
This reverts commits:
a37009c187edeba68389d239dc1b9f40519b1187
5565690fbe1056d4c159ddbe233fa22c7695320a
e7733bb8f858a16b082b8a5c64d0322db6f6271a
commit 3d496662b259470b505d88212074c436660c27ad
Merge: fab67bb134 4bdae7e362
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 2 11:05:53 2023 -0500
Merge branch 'vsearch' of github.com:datastax/cassandra into vsearch
commit 4bdae7e36213f5112efce085696dd24aaa0adfad
Author: Mike Adamson <madamson@datastax.com>
Date: Fri Jun 2 14:36:01 2023 +0100
Stabilise random tests using word2vec model vectors
commit 2ddc3ec6b11f6810f6c4cbaf6b62bfe46d2d1b6c
Merge: 8e04280312 1f9179002a
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Fri Jun 2 10:05:15 2023 -0500
Merge remote-tracking branch 'datastax/ds-trunk' into vsearch
commit 8e04280312583a18085e7e7b9d31810790b039f4
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Fri Jun 2 09:51:06 2023 -0500
Revert this after CNDB-6974 is fixed.
commit fab67bb13499691f813c893cd39a3bbd406653d2
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 2 08:59:12 2023 -0500
vector cache defaulting to 1MB per segment
commit 67a2b7eba3c957a993726d37706643979e92a3a3
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Thu Jun 1 17:49:35 2023 -0500
Revert this after CNDB-6974 is fixed.
commit 4538956b31d8c40d2e9b603b3b0a3392343e1853
Merge: 7518680a4a af4c83aef4
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 1 16:17:14 2023 -0500
Merge branch 'ds-trunk' into vsearch
commit 7518680a4a17cb2589cf06a9175befc10c9eab1a
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 1 16:14:33 2023 -0500
re-use float[] across calls to vectorValue to avoid allocation overhead (credit to Jake for the idea)
commit d20bd8aedeb6232358dd16f901e2708f217b8108
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 1 15:35:17 2023 -0500
optimize vectorValue() with direct access to the mmap-ed region
commit 24572025af5b7893279b5b03c58555b682f3abdb
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 1 14:33:55 2023 -0500
decomposeVector does not modify the underlying buffer, so no need to duplicate() here
commit efef35d47a24222fae300d1cbd5a801e9ed79442
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 1 14:29:30 2023 -0500
specialize deserializeFloatArray for ByteBuffer and FloatBuffer. this saves about 3% total CPU on search workloads
commit 63c27f5550d6c3e22d960b9a42b9097029e3b760
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 1 11:04:23 2023 -0500
default target size of 5GB
commit 69c55c8d07c6e1d58b4829c153f5ebf8a4343669
Author: Jonathan Ellis <jbellis@gmail.com>
Date: Thu Jun 1 07:20:38 2023 -0700
make the on-disk hnsw code threadsafe using FileHandle.createReader (#641)
commit b194cae2e56133b946d0231768254064a47f01b6
Merge: 7ee6422084 23c2891e7a
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 1 06:22:17 2023 -0500
Merge branch 'ds-trunk' into vsearch
commit 7ee642208485c9e2ae62aa662dbbddc7b295bae1
Merge: c286ec0ee2 a37009c187
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 31 20:41:48 2023 -0500
Merge branch 'VECTOR-6' into vsearch
commit c286ec0ee231ededded18e81357edebe72064c59
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 31 20:40:21 2023 -0500
add testLargeGraph
commit a37009c187edeba68389d239dc1b9f40519b1187
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Thu Jun 1 08:50:07 2023 +0800
cleanup unused code
commit 6246dbe3a970a0f4672701632003cdf576705c24
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 31 16:43:42 2023 -0500
comment
commit c7420440c7d3c81eac96e716148d570ae3cceb1d
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 31 16:43:37 2023 -0500
encapsulate shadowedPrimaryKey better
commit a1a25c33b8bec9d78354dd2c1d992444db2bf76f
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 31 15:57:52 2023 -0500
make hnsw cache size configurable, and make the default 128KB (roughly the size of the bloom filter for a 1GB sstable)
commit 55747d2351e915645697fa8dec404c749540456f
Author: Andrés de la Peña <a.penya.garcia@gmail.com>
Date: Wed May 31 19:28:47 2023 +0100
Replace FunctionParameter.sameAsFirst by FunctionParameter.sameAs, allowing type inference in both directions for vector functions
commit 5565690fbe1056d4c159ddbe233fa22c7695320a
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 31 12:59:42 2023 -0500
cleanup and fix
commit e7733bb8f858a16b082b8a5c64d0322db6f6271a
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Wed May 24 16:01:26 2023 +0800
VECTOR-6: return vector results to client in ANN order instead of token order
commit 771067d1475c94484da178236928d2bad78e00b1
Author: Mike Adamson <madamson@datastax.com>
Date: Wed May 31 18:33:55 2023 +0100
Fix annOrderingMustHaveLimit test
commit 8dd76a73541e70da0da022dd463e6711a315f971
Author: Mike Adamson <madamson@datastax.com>
Date: Wed May 31 18:30:49 2023 +0100
Improve the validation of ORDER BY <column> ANN OF (#639)
* Improve the validation of ORDER BY <column> ANN OF
* Change to hasNonClusteredOrdering and improve limit message
commit 1ef6f2e3418db19af23f0eae24cd344d72290455
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri May 26 18:34:25 2023 -0500
add vector similarity functions
commit f4e3a39a8ece623647ba7891981e7c5ee10ed96b
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 31 09:39:45 2023 -0500
pull in the smallest set of changes possible from 93e0ae9a to get FunctionParameter.sameAsFirst
commit 0ef2614346c9426a966004df220221f74353de70
Author: Jonathan Ellis <jbellis@gmail.com>
Date: Wed May 31 06:22:41 2023 -0700
Add support for updates + deletes (#636)
commit 5da9fefe1e18f733bb8ffd0a35ef2e1ac3cbd975
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue May 30 16:32:02 2023 -0500
rename SegmentOrdering.reorderOneComponent to limitToTopResults to align with MemtableOrdering
commit caf6eb2154ddf57c88d69c6246404b65e183057d
Author: Mike Adamson <madamson@datastax.com>
Date: Tue May 30 22:41:29 2023 +0100
Fix V1SearchableIndex.reorderOneComponent to use segments (#635)
commit 72cfa94e9389d0078d64aee60d26b8b01c6ca22f
Author: Piotr Kołaczkowski <pkolaczk@datastax.com>
Date: Mon May 29 20:17:44 2023 +0200
VECTOR-14: Move ANN OF expressions from WHERE to ORDER BY
Before:
SELEC…
commit 4c4c1ff802c7667e1289a4e730cc34bf6b3bd007
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jul 5 12:03:22 2023 -0500
add test for creating index after data is added
commit 0acaae364c19ed7556b66b5576762ecc131af5fd
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Wed Jul 5 09:23:01 2023 -0500
Revert "enable segment compaction by default for vector indexes -- we should prioritize read performance"
This reverts commit a4fa0728337ebf236ad03f6ad708cf51c3dc08c5.
commit e3ddb6426de4e89b5fe61ed742b69257865c2d3c
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 28 10:54:49 2023 -0500
VECTOR-64 set sstable_growth to 0.5 by default
commit 3ca75046e6802ce5836a654459b01948c83d1d7b
Merge: a4fa072833 10a2a31ae8
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 28 10:54:18 2023 -0500
merge ds-trunk
commit a4fa0728337ebf236ad03f6ad708cf51c3dc08c5
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 28 10:24:41 2023 -0500
enable segment compaction by default for vector indexes -- we should prioritize read performance
commit 08153b38549d700505d064c25d7a2070c2ff07aa
Merge: 6a063b4f84 1a45fdc5f9
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 28 10:18:38 2023 -0500
Merge branch 'vsearch' of github.com:datastax/cassandra into vsearch
commit 6a063b4f8458f119d4b99bd6b78cbc8832f27f13
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 28 10:13:51 2023 -0500
VECTOR-63 add cassandra.sai.hnsw.allow_custom_parameters defaulting to false to control setting maximum_node_connections and construction_beam_width, which can cause memory usage to explode if used incorrectly
commit 1a45fdc5f92e9c82dba6e9e56a5e341806070b13
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Mon Jun 26 15:57:10 2023 -0500
Add the option to configure the file_cache_size_in_mb in a -D like it is in 6.8-cndb.
commit d2e45dab5b57f20fbdd6250d6dc186c83d783ed3
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 22 16:13:00 2023 -0500
VECTOR-62 don't flush a graph that only contains deleted vectors
commit 4a9afe205c20b1f863e89d4e02138557d225c3fc
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 22 16:11:11 2023 -0500
r/m obsolete comment
commit a925dc22696bcbebcad17a460441e8b84ff97660
Merge: 7c7d160dd9 2b22984fe1
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Thu Jun 22 09:03:08 2023 -0500
Merge remote-tracking branch 'datastax/ds-trunk' into vsearch
commit 7c7d160dd90624b0e365287ef0a2f0b7ef94da53
Merge: a14b387372 7a5e374cea
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 20 21:42:51 2023 -0500
Merge branch 'vsearch' of github.com:datastax/cassandra into vsearch
commit a14b387372ef3f220ece68b019acaafaf0d56f42
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 20 17:52:13 2023 -0500
move new options to jvm17-server.options
commit dc67582585a90628e026a693497793b7d8a298ed
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 20 17:47:27 2023 -0500
copy jvm11 options to jvm17
commit b6717410f925486a858c4c9414e417eed50950d7
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 20 17:30:48 2023 -0500
enable simd
commit 92b9e6e2f136254fb1f8e50732809ea529617f48
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 19 15:31:09 2023 -0500
make it run under jdk 20
commit 3e985953637819ac8504955552ce0feba29bdd40
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 20 17:13:56 2023 -0500
update to lucene with simd
commit 7a5e374cea785678670444121ad36be6170dd59b
Merge: 5e5ae4de82 e9ddd5f0d9
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Tue Jun 20 15:00:24 2023 -0500
Merge remote-tracking branch 'datastax/ds-trunk' into vsearch
commit 5e5ae4de824a7e838bbdfdbabc92e8b256911e3f
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 19 10:57:18 2023 -0500
we weren't calculating length correctly (supposed to match ordinals range) but it didn't matter b/c length() is not called on the search path where these classes are used. make this explicit by throwing in length implementation.
commit c0d4055a0bc6914d0220fbd4912d6355e9f6a1d9
Merge: f2c9a7cb59 c605f8c9f0
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 19 10:20:02 2023 -0500
Merge branch 'vsearch' of github.com:datastax/cassandra into vsearch
commit f2c9a7cb59b113a349992011b739079106274c1e
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 19 10:19:34 2023 -0500
VECTOR-61 remove deletedOrdinals collection that was not threadsafe wrt add and remove operations; create postingsByOrdinal map that allows Bits operations to look up the postings, instead
commit 9744a13b405cbeb1b68894bb08cc24796aedd8f7
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 19 09:13:24 2023 -0500
add LVT.testMultiplePostings (works fine)
commit c605f8c9f037e724544b21fe2aadcf0d8503d86f
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Sat Jun 17 12:44:36 2023 +0800
skip replica-filtering-protection for ANN requests
commit 71935bd07c9218ba70f1ffde7b6db1e88009e529
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 16 20:20:06 2023 -0500
forgot that maxBruteForceRows has to stay non-final for Byteman
commit b264afc07727305ce88ab2f18b8a2643d2510bd3
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 16 20:10:19 2023 -0500
VECTOR-54
- validateIndexable also checks for very large vectors (that explode to NaN when we take the cosine)
- check query vectors with the same criteria as vectors to index
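A minimal sketch of why both checks matter for cosine similarity. The names below are hypothetical, and `IllegalArgumentException` stands in for the InvalidRequestException mentioned elsewhere in this log; the real validateIndexable may use different criteria. A zero vector has norm 0, and a vector with very large components overflows the float squared norm to infinity, so dividing by the norm yields NaN in both cases.

```java
// Hypothetical sketch; not the actual validateIndexable implementation.
public class VectorValidationSketch
{
    /** Rejects vectors whose cosine similarity would be undefined (NaN). */
    public static void validateIndexable(float[] vector)
    {
        float squaredNorm = 0f;
        for (float v : vector)
            squaredNorm += v * v;   // float arithmetic: overflows to Infinity for very large components

        if (squaredNorm == 0f)
            throw new IllegalArgumentException("Zero vectors cannot be used with cosine similarity");
        if (!Float.isFinite(squaredNorm))
            throw new IllegalArgumentException("Vector is too large (or contains NaN): squared norm is not finite");
    }
}
```

Applying the same method to query vectors and to vectors being indexed corresponds to the second bullet above.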
commit 66c31a6dcfcbf81b7657ac9c4119803e026d150a
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 16 20:07:41 2023 -0500
if CFS.apply catches an InvalidRequestException, keep the same type when we re-throw so it gets back to the client as intended
commit e0bdfdbe421a1cca30746604cda9ef5b992002bb
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 16 17:33:37 2023 -0500
add [failing] emptyIndexTest
commit 75cba96f90730b32ea4a39375c72aaff4ffd6221
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 15 16:52:46 2023 -0500
re-use bitsets across searches
commit 2af2ce8a321f7fea268d81c393da7f8f6d886776
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 16 17:05:52 2023 -0500
more asserts that graph is in sane state when we write it
commit a63492686d13e12b799acdf50c6f0c20dd680f21
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 16 17:13:23 2023 -0500
update lucene
commit 4c6b43e3606b8c9eb5e4ec4cf0817dc9b061c555
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 16 16:24:33 2023 -0500
add debugging information when node is not found on level
commit 81c8e730bd4ca314a81bc13e30b197603a8eb005
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 16 11:57:24 2023 -0500
add similarityWithAnn test
commit 87c8da9a5dddd10215a84c07a09dbaaecaa522b8
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 14 18:26:02 2023 -0500
add tests to confirm that zero-length vectors are rejected
commit 2510c1b987d846821e7095fbc2c368fa51d9a15d
Merge: a3d9e33554 e86f91c568
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 12 17:09:22 2023 -0500
Merge branch 'vsearch' of github.com:datastax/cassandra into vsearch
commit a3d9e3355470f6400328cd7ac1eb07ff06f8e00b
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 12 17:08:43 2023 -0500
use ArrayList in CVV instead of HashMap since we know the keys are consecutive
commit e86f91c568f8a82e83e61bd0dd768478a7584cfc
Merge: 77c1bc11f3 8a7a6d9c4f
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Mon Jun 12 09:25:06 2023 -0500
Merge remote-tracking branch 'datastax/ds-trunk' into vsearch
commit 77c1bc11f31e4ff001c7eff1e6fe50812e136a45
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Mon Jun 12 09:24:56 2023 -0500
Revert "Revert this after CNDB-6974 is fixed."
This reverts commit 8e04280312583a18085e7e7b9d31810790b039f4.
commit cae30e9879c463d80de2a2b7e4a642979803c13a
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Mon Jun 12 09:23:39 2023 -0500
Revert "Revert this after CNDB-6974 is fixed."
This reverts commit 67a2b7eba3c957a993726d37706643979e92a3a3.
commit 46d1a45de84f40311f193beba3ca5fc3fe7450d7
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 12 09:03:03 2023 -0500
replace our TODO entries with VSTODO to make them easier to find
commit 415b3f29e8e7f1526e28655308a1937a1175e575
Merge: 2fd8b848be 481d29721a
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 12 09:02:16 2023 -0500
Merge branch 'vsearch' of github.com:datastax/cassandra into vsearch
commit 481d29721a7993e9babd44f792be55aca925d41a
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Mon Jun 12 21:26:43 2023 +0800
Fix VectorMemtableIndex to handle max token and fix Segment#intersects (#670)
* Fix VectorMemtableIndex to handle max token and min/max bound
* Fix Segment#intersects to compare bound instead of token and add tests for range search
* make brute force rows per query for VectorMemtableIndex
* apply feedback on Segment#intersects
* add comments to VectorMemtableIndex#search
* Fix SegmentTest
commit 2fd8b848be82e62999e64b57cb9800d03de8e953
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 12 07:51:18 2023 -0500
clean up a couple REVIEWME
commit 3c066e07be6188c3f48c51c02cfc99b395e2a6bc
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Sun Jun 11 14:26:42 2023 +0800
fix flaky VectorDistributedTest
commit 6c719a752e90e031916dddf6cc7b7db23627bc38
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 10 10:48:01 2023 -0500
fix use of bruteForceRows -- should be larger of limit,maxBruteForceRows
commit 0a2279d7f010bcb68053bb14ea2974295b21d1c4
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 10 10:46:34 2023 -0500
use maxBruteForceRows when deciding whether to skip ANN
commit 1859f7355231ce7972e3d4b7af70e5d2967da516
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 10 10:27:03 2023 -0500
simplify partitionKeySearchTest using euclidean distance
commit 4f40e1c2d9b1c30675eeaee7d800a8113cce97c0
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 10 10:09:24 2023 -0500
typo
commit 7dbe85ac9a248aa6a0985839df1f40ceb2c08e6c
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 10 10:08:23 2023 -0500
rename methods that return Bits but had bitset in their names
commit 2861bfdba8f2638a1e3f9e81df9cad3878d0e544
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 10 10:06:51 2023 -0500
simplify skipANN
commit fbfe4bcdd03a66f4dedfa91ce9d06206f89827cd
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Sat Jun 10 16:36:56 2023 +0800
optimize VectorIndexSearcher#searchPosting
- return empty posting if key range is not found in current sstable
- return empty posting if all row ids are shadowed
- skip ANN if matching row ids are fewer than limit
commit 168911b70888a3dda0ddf4f615af1b7ff54f43e3
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Sat Jun 10 20:54:41 2023 +0800
fix data visibility issue during flush: make sure SSTableAddedNotification is sent before MemtableDiscardedNotification
commit c5dd47e3ce1c7276703fa3f89a9932a3b2cd43b7
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Sat Jun 10 12:31:53 2023 +0800
fix primaryKeySearchTest and partitionKeySearchTest to use correct selector and expected results
commit 2e636529c63d0d726868d6df9b3e11d15d3871d8
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Sun Jun 11 08:31:39 2023 +0800
Vector-48: index#update is not triggered by partition/range deletion,… (#665)
* Vector-48: index#update is not triggered by partition/range deletion, we have to fix VectorMemtableIndex to include shadowed primary keys
- during flush, append ordinals that are removed by partition/range deletion into deletedOrdinals set
* add comments to VP
* revert redundant variables
* simplify
---------
Co-authored-by: Jonathan Ellis <jbellis@datastax.com>
commit daa2623b880e4b82d99204fe718db22e52ae3b42
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 10 13:54:38 2023 -0500
use mmapped builder in OnDiskHnswGraphTest
commit 553778a22cfa8757daf3b3ec0e2cfa6fc7ba29cf
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 10 08:34:41 2023 -0500
fix logic in partitionKeySearchTest, test still fails
commit a7f5a2f824085691da80ef62e37a430e5637c31c
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 17:57:34 2023 -0500
ignore invalid vectors during build against existing data, instead of failing the build
commit 2af203646a2e57fca0a2521c31ba4269b0d8f326
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 17:43:47 2023 -0500
switch from ignoring zero vectors to throwing IRE
commit 4c98fbb447a275cbf557ae0adf8f4f8a30fba17f
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 17:15:57 2023 -0500
Revert "if there are no rows in the range to search, return EmptyPostingsList instead of giving the hnsw search an empty Bits (which is a pathological case for the search)"
This reverts commit 6e5eab787a3add985dc30eca82c5dee2646f5564.
commit da99598a5d11dfa4ecac3b1eeeb9d19a4bad21e5
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 17:09:30 2023 -0500
don't attempt to add zero vector to cosine indexes
commit a77792447d33600f69dd0a75a280a2b6dac51389
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 16:43:34 2023 -0500
add failing partitionKeySearchTest
commit b061dd1c77f30c67324da4f94ea8b5a22d50fba7
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 10:52:08 2023 -0500
inline the test ops
commit 6e5eab787a3add985dc30eca82c5dee2646f5564
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 10:51:09 2023 -0500
if there are no rows in the range to search, return EmptyPostingsList instead of giving the hnsw search an empty Bits (which is a pathological case for the search)
commit 518c055dfeabacf0fa0863e40bf814058f2ad981
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 10:28:57 2023 -0500
cleanup
commit e922b03ffe5436dc1341dbe20ee7aaca9901058f
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 10:27:53 2023 -0500
failing tests for primary key search
commit 428e4b713e25a5820d02ca790c30e9f1477c0f44
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 10:21:59 2023 -0500
cleanup
commit 5319bcdf3b694e861509b2a9f549f39bf51cd253
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 10:20:02 2023 -0500
move testInvalidColumnNameWithAnn to VectorInvalidQueryTest
commit d8b64845e59c0bd3658a16df21dcc75564417e3b
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 10:12:40 2023 -0500
upgrade lucene to reduce Integer boxing on build path
commit a62356e49c0a32f61732b8525d2bb655f46dd767
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 10:11:55 2023 -0500
cleanup
commit b754eb7f021d10d0ae8de83ebdf2ce2f19ce59d8
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 08:29:34 2023 -0500
replace CHM with NBHMLong to avoid boxing
commit 53d232ad2bcddefd72b10542f703cf36728c05f5
Author: Piotr Kołaczkowski <pkolaczk@datastax.com>
Date: Fri Jun 9 12:17:46 2023 +0200
STAR-550 Handle SAI AbortedOperationException
AbortedOperationException is thrown by SAI when index search
hits a timeout. Now instead of allowing this exception to
bubble up to the top and be eventually logged as an error, we catch
and swallow it in the InboundSink after creating the query response.
Additionally, we now also set a proper error code (TIMEOUT)
in the response, so the client has a hint on what happened.
commit 5e0c5eb3c54b58002162ce3e39d0891643e1f663
Author: Matt Fleming <matt@codeblueprint.co.uk>
Date: Fri Jun 9 08:44:10 2023 +0100
CNDB-7037: Check that memtable exists before flushing and avoid IOOBE (#664)
For offline services such as the compactor it's possible to not have
live memtables. Account for this during flushing after removing indexes to
avoid triggering an IndexOutOfBoundsException:
Error happened while updating the schema
java.lang.IndexOutOfBoundsException: Index: -1
at java.base/java.util.Collections$EmptyList.get(Collections.java:4483)
at org.apache.cassandra.db.lifecycle.View.getCurrentMemtable(View.java:106)
at org.apache.cassandra.db.ColumnFamilyStore.forceFlush(ColumnFamilyStore.java:1115)
at org.apache.cassandra.db.SystemKeyspace.forceBlockingFlush(SystemKeyspace.java:552)
at org.apache.cassandra.db.SystemKeyspace.setIndexRemoved(SystemKeyspace.java:576)
at org.apache.cassandra.index.SecondaryIndexManager.markIndexRemoved(SecondaryIndexManager.java:837)
at org.apache.cassandra.index.SecondaryIndexManager.removeIndex(SecondaryIndexManager.java:420)
commit c241f7d41ee46c28f60d691611f32e071dac6684
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 8 19:19:09 2023 -0500
read neighbors using .intBuffer since we know that the HNSW searcher always reads all the neighbors
commit 32f021a4f0623e57d82d70c54350eab6f0f57db7
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 8 18:08:30 2023 -0500
VECTOR-51 leave vectors as ByteBuffers during compaction to avoid 2x memory usage incurred by keeping them around as float[] as well
commit 32d8cbbd19301228530d3d84f7a7e4120b4d4422
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 8 17:37:08 2023 -0500
r/m node cache from query metrics since there is nothing the operator can act on there
commit 890df8e6c36c5c9e1314dbcc0a681415401120dd
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 8 14:15:25 2023 -0500
add more information to exception when reading row offsets goes wrong
commit 5c7fe0cfed1edb10f81b134f8949dd1b5abb2781
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 8 13:26:17 2023 -0500
revise ordinals cache as follows:
// cache full levels including neighbors up to neighborsRamBudget, starting with the top level,
// but always cache all levels above the bottom two levels -- this will be ~1% of the graph.
// then on L1, cache at least the offsets
// L0 we do not cache since we only need one extra seek (no bsearch) to read the neighbors offset
commit e131014c5cba59624e0ed9fac142abe5340e2fc0
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 8 13:53:10 2023 -0500
add deletes test
commit 7c5a880f007535d9f03b7c7d9d3e84dcc79ef3d7
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 8 13:34:42 2023 -0500
must use graph.size for bitset size -- not correct to use max rowid, since deleted ordinals will not have a rowid but will still be in the graph
commit 0654fbb61aec86deae69edf83fb1cf653ed7df65
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 8 08:44:28 2023 -0500
info -> debug for validatePerIndexComponents
commit b289db2362b928f29b6c5c6289e09b33f0be1e88
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 7 18:08:18 2023 -0500
update lucene
commit 347dae644b0d395014d8faf2affd48a6b2546e96
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 7 18:23:41 2023 -0500
undo burntest logging config change
commit 87a6a84f38d0a0423acd9a0dda67d386876681b5
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 7 15:20:06 2023 -0500
add debug logging
commit 20013bc865c7fb2c34111c98362aaf997dc724dc
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 7 14:50:39 2023 -0500
add a bit more information to exception when we fail in index construction from disk
commit 86389ceae8188b8e427d8a528fada2f913f1f633
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 7 12:25:31 2023 -0500
reduce test vector count from "all of them" to 200
commit 20177a1c84ba34babc82d8c1aa4e3436321b5d9f
Merge: 2e2fce09ee f2c697ac46
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 7 12:21:20 2023 -0500
Merge branch 'wip' into vsearch
commit f2c697ac46348301260c5947ade4ebed7dee90ee
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 7 12:16:28 2023 -0500
multipleSegmentsMultiplePostingsTest
commit 59e4a1c53e1c19a7ebba4a0834229e61f9e4d3ce
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 7 12:16:04 2023 -0500
cleanup
commit 2e2fce09eeb5bef4780cecf7c7b5e492a810f67a
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Wed Jun 7 23:33:00 2023 +0800
fix OnDiskOrdinalsMap to seek to segment offset before reading (#661)
commit 9502f1040229f2be6619f7e4497dc25ca5126b39
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 7 10:04:38 2023 -0500
fix build
commit 3535444b57366fe705e0fd8c1ca095e60ed2a706
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 7 09:19:41 2023 -0500
add write-only workload
commit a112b0965cb298f293b6e384bafc92a1d5a1fb8f
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Wed Jun 7 21:15:53 2023 +0800
VECTOR-44: improve in-memory partition-restricted query perf (#660)
* VECTOR-44: improve in-memory partition-restricted query perf
- using post-filter top-k processor instead of ANN: 14x improvement on partition-restricted query in LongVectorTest
commit cf1ec3be50ddbe40ae99dfc34eea51b9fc5c2648
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Wed Jun 7 21:06:37 2023 +0800
Vector-45: fix bitsetForShadowedPrimaryKeys to skip shadowed primary keys outside of current sstable/segment (#659)
* VECTOR-45: fix bitsetForShadowedPrimaryKeys to skip shadowed primary keys outside of current sstable/segment
* return empty iterator if results are empty instead of ReorderingRangeIterator
commit 87217b4f49935fbbd13c73f4d0aadb84836696a7
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 7 07:31:46 2023 -0500
don't make areL0ShardsEnabled final, it breaks mocks
commit 2bb3a299df43561369544b662674872575975b7c
Author: Matt Fleming <matt@codeblueprint.co.uk>
Date: Wed Jun 7 10:56:08 2023 +0100
VECTOR-42: Check columns exist to avoid NPE during ANN expr binding (#657)
Using a non-existent column with an ANN expression triggers an NPE like
so:
java.lang.NullPointerException: null
at org.apache.cassandra.cql3.ArrayLiteral.forReceiver(ArrayLiteral.java:43)
at org.apache.cassandra.cql3.ArrayLiteral.prepare(ArrayLiteral.java:54)
at org.apache.cassandra.cql3.Ordering$Raw$Ann.bind(Ordering.java:178)
at org.apache.cassandra.cql3.Ordering$Raw.bind(Ordering.java:139)
Use TM.getExistingColumn() which throws InvalidRequestException if the
column is undefined.
commit f29d4528fc43f34889e560acafa810eee1b88ba9
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 21:18:47 2023 -0500
restore query timeouts
commit 6e5734e52b41ab1e355d202ad0433440828fbb75
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 19:08:52 2023 -0500
looking at performance over time in LongVectorTest
commit f96af8f0b9185163a49156bdab107801d2588bf0
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 16:00:23 2023 -0500
log level = info for burn tests
commit 1247cf372927cc185be5c9767a06fdd15f102dbe
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 17:48:00 2023 -0500
write the in-memory deleted ordinals to disk at the start of the postings component
commit e319d71bd6b0d265f616c807dcaf4a9d9fc38aef
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 16:36:27 2023 -0500
don't allocate unnecessary objects on the happy path of no tombstones
commit 6d146a8460a17d146de756aab2ccbbf2abdb967a
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 16:19:56 2023 -0500
VECTOR-23: force UCS to not shard L0
commit 79d6e177093312bc1b951d6740ad45ce8a6c5875
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 16:01:05 2023 -0500
mark AutoCloseable
commit 5ec2f7d7f792bfa2dcf582ea4eed3fa8745fdaff
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 5 14:37:15 2023 -0500
Support string literals as vectors
Co-authored-by: Andrés de la Peña <a.penya.garcia@gmail.com>
commit 73a53c37536835e462e3cf17c452027c7aa591ed
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 12:25:39 2023 -0500
(take 3, this time with the code changes) fix race conditions across concurrent inserts + searches in memtable
commit 51f4f419d31916a81088f171b158f1e002b5c800
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 12:25:39 2023 -0500
(take 2) fix race conditions across concurrent inserts + searches in memtable
commit 94097a1f7cf1831c03e52b1f539bdc2e42fa038b
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 12:35:32 2023 -0500
Revert "fix race conditions across concurrent inserts + searches in memtable"
This reverts commit c4d41b492190fb2644f83bf4902acf71d1e4f891.
commit c4d41b492190fb2644f83bf4902acf71d1e4f891
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 12:25:39 2023 -0500
fix race conditions across concurrent inserts + searches in memtable
commit 22143a8c5db160ef0fa7687ff3e8cb227de160f0
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 10:28:30 2023 -0500
fix AOOB in bruteForceRows logic
commit b5624f62c88378e03260399440edf4d335ccc3af
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 10:12:53 2023 -0500
call deserializeFloatArray (which gives float[]) instead of deserialize (which gives List<Float>)
commit 13ad40718737b54652c875d81ec20992b5828777
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 08:36:32 2023 -0500
create CassandraHnswGraphBuilder with concurrent and serial implementations, so that single-threaded compaction doesn't have to pay the concurrency overhead
commit b83f4e4007e3d597963023b8ae32f4d0934b792d
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 08:00:44 2023 -0500
add ASL header to injections.md to make CI happy
commit 16074336b1322d4bade40eb71562ca50221399dc
Merge: 227c6c13a3 1ff6992354
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 07:48:04 2023 -0500
Merge branch 'VECTOR-37' into vsearch
commit 227c6c13a3cfe42484fc560d629d54a93f8265e0
Author: Jaroslaw Grabowski <jaroslaw.grabowski@datastax.com>
Date: Tue Jun 6 13:02:37 2023 +0200
CNDB-7007 return expired tables level from getLevels
commit 312d07de8d297cdede39017354b678f2fb1b1006
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 07:31:26 2023 -0500
randomize our brute force threshold, which will get the actual index scans exercised more
commit 0b897961869965629893c94250b7d4b1fb7c0f86
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Tue Jun 6 13:44:01 2023 +0800
Reference ANN sstable indexes in case of ann hybrid search (#653)
* Reference ANN sstable indexes in case of ann hybrid search
* simplify
---------
Co-authored-by: Jonathan Ellis <jbellis@datastax.com>
commit 1ff6992354e8411f9f7c48a76f0aadb4a00f2789
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 5 21:55:14 2023 -0500
per-query hnsw metrics
commit d3d6ae107737020eb0ca3b00c117d5bc84027dc0
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 5 21:16:14 2023 -0500
reduce test size to prevent Jenkins OOM
commit 7765ac73fd29c9664b6f8e946c067d6f1b0a9a03
Merge: 5a04dbcf67 c5cd09cebf
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 5 21:15:44 2023 -0500
Merge branch 'vsearch' of github.com:datastax/cassandra into vsearch
commit c5cd09cebf0c63f7bafcd2f4548395efe68b12cd
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Tue Jun 6 09:38:42 2023 +0800
Skip warning log about receiving a range that is not owned by the current replica, see VECTOR-30
commit 5a04dbcf6738d7681732e28518d6eb37f74357b8
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 5 11:21:15 2023 -0500
add injections.md from bdp repo
commit baf9cce301a39f9506bd60c5a75d2011150f2fea
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 5 10:02:02 2023 -0500
VECTOR-36 fix looking up indexes and SSTableContext by SSTableReader, because the SSTR will be removed from internal structures once it's compacted
commit e7fdc8454e750e2ee886c4ab27fb09abb46303fa
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 5 09:53:51 2023 -0500
limitToTopResults should skip rows that are not in the current segment (or more precisely, have no vectors that were indexed)
commit 2f64695a90651d9e76eee0a3399032ea241c983c
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Mon Jun 5 14:55:20 2023 +0800
Vector 6 take 2: reapply Vector-6 commit and fix NPE in SelectStatement (#648)
* Revert "Revert VECTOR-6"
This reverts commit 63a6ff609a02e4f7d9f5499f1af48ca5e58f63a8.
* Vector-6 take 2:
- fix NPE in SelectStatement by skipping reversed() for null ColumnComparator
- fix getOrderingColumns() to return LinkedHashMap to preserve ordering columns' order
- fix VectorLocalTest compilation
- remove debug log in CassandraOnHeapHnsw
commit 9da51b315d207417de4f3e005616e275c62c74cb
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Mon Jun 5 12:38:21 2023 +0800
fix SAI test failures in vsearch branch (#649)
- BatchlogEndpointFilterTest
- rangeRestrictedTest
- SegmentFlushTest
- SegmentMergerTest
- OperationTest
- RangeIntersectionIteratorTest
commit dc53d7db5f0f8fe1c780f6d2b7580794f7b6b5fb
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 3 13:35:11 2023 -0500
create views for ordinals map so we don't have to open a new Reader for each method call
commit c6385022ff2949c23e0dd0161a043cf4061f050c
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 3 08:19:32 2023 -0500
add assert sstableContext != null
commit f340e7590db578b71201326e82006f72dd491047
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 3 07:37:36 2023 -0500
on-disk Searcher bits should not need to be growable
commit fd5cc7f989a47d03689dd29495b974697ad1338b
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 3 07:37:18 2023 -0500
add asserts
commit 6436e5f4ad49881a62c1442692edfedfe31968dc
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 3 07:21:50 2023 -0500
cleanup
commit 32c8e005740d3c7ec0b1008091728e12260aecdd
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 2 13:57:18 2023 -0500
comment
commit 091d96832ff9a68f6ea7857eafd2ec7ee0c09a0b
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 2 13:32:24 2023 -0500
r/m unused search method returning iterators over PrimaryKey
commit 4e78a4b429b0ba6690871df9a91c8ffd8d2e53fa
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 2 11:25:30 2023 -0500
we can use the slightly more lightweight RangeConcatIterator instead of RangeUnionIterator when combining results from different Segments
commit e7a9186bf3426ad025e809481e47db85a7bf7190
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 2 11:22:35 2023 -0500
r/m obsolete FIXME
commit c76ee9ea0cab5b70c535de3f50b81cf333bf94a2
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 3 07:12:43 2023 -0500
add testAppendedGraphs
commit d00bdbfb6acf9f08d2bdfa2ef06e833e567cf9eb
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 3 06:57:37 2023 -0500
move write-to-File to test class
commit b10fd1100daf4cde0291b3b4ef06f5ceb3ca650c
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 3 06:46:01 2023 -0500
fix Ref leaks in test
commit 0656e558bb51172df4e0cde1ebe92946d4dcec8b
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 3 06:26:05 2023 -0500
fix tests to use View
commit 92797ced03ed26f9a5cce398504d6f0de519b899
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Sat Jun 3 19:19:38 2023 +0800
VECTOR-35: fix vector on-disk writer to append segments (#647)
commit 22f5c01860367586d0cc39def7d975a6d6bae784
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 2 18:54:41 2023 -0500
add "Vector indexes only support ANN queries" check
commit bfc0b056dbacd0a1eca86020e901f545dca7670e
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 2 17:34:24 2023 -0500
split invalid requests into separate test class
commit f66768c35920101382c1975c9d0ba0e3301b7c61
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 2 17:28:56 2023 -0500
add specific error message when trying to do ANN without an index
commit 63a6ff609a02e4f7d9f5499f1af48ca5e58f63a8
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 2 12:31:22 2023 -0500
Revert VECTOR-6
This reverts commits:
a37009c187edeba68389d239dc1b9f40519b1187
5565690fbe1056d4c159ddbe233fa22c7695320a
e7733bb8f858a16b082b8a5c64d0322db6f6271a
commit 3d496662b259470b505d88212074c436660c27ad
Merge: fab67bb134 4bdae7e362
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 2 11:05:53 2023 -0500
Merge branch 'vsearch' of github.com:datastax/cassandra into vsearch
commit 4bdae7e36213f5112efce085696dd24aaa0adfad
Author: Mike Adamson <madamson@datastax.com>
Date: Fri Jun 2 14:36:01 2023 +0100
Stabilise random tests using word2vec model vectors
commit 2ddc3ec6b11f6810f6c4cbaf6b62bfe46d2d1b6c
Merge: 8e04280312 1f9179002a
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Fri Jun 2 10:05:15 2023 -0500
Merge remote-tracking branch 'datastax/ds-trunk' into vsearch
commit 8e04280312583a18085e7e7b9d31810790b039f4
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Fri Jun 2 09:51:06 2023 -0500
Revert this after CNDB-6974 is fixed.
commit fab67bb13499691f813c893cd39a3bbd406653d2
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 2 08:59:12 2023 -0500
vector cache defaulting to 1MB per segment
commit 67a2b7eba3c957a993726d37706643979e92a3a3
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Thu Jun 1 17:49:35 2023 -0500
Revert this after CNDB-6974 is fixed.
commit 4538956b31d8c40d2e9b603b3b0a3392343e1853
Merge: 7518680a4a af4c83aef4
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 1 16:17:14 2023 -0500
Merge branch 'ds-trunk' into vsearch
commit 7518680a4a17cb2589cf06a9175befc10c9eab1a
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 1 16:14:33 2023 -0500
re-use float[] across calls to vectorValue to avoid allocation overhead (credit to Jake for the idea)
commit d20bd8aedeb6232358dd16f901e2708f217b8108
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 1 15:35:17 2023 -0500
optimize vectorValue() with direct access to the mmap-ed region
commit 24572025af5b7893279b5b03c58555b682f3abdb
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 1 14:33:55 2023 -0500
decomposeVector does not modify the underlying buffer, so no need to duplicate() here
commit efef35d47a24222fae300d1cbd5a801e9ed79442
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 1 14:29:30 2023 -0500
specialize deserializeFloatArray for ByteBuffer and FloatBuffer. this saves about 3% total CPU on search workloads
commit 63c27f5550d6c3e22d960b9a42b9097029e3b760
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 1 11:04:23 2023 -0500
default target size of 5GB
commit 69c55c8d07c6e1d58b4829c153f5ebf8a4343669
Author: Jonathan Ellis <jbellis@gmail.com>
Date: Thu Jun 1 07:20:38 2023 -0700
make the on-disk hnsw code threadsafe using FileHandle.createReader (#641)
commit b194cae2e56133b946d0231768254064a47f01b6
Merge: 7ee6422084 23c2891e7a
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 1 06:22:17 2023 -0500
Merge branch 'ds-trunk' into vsearch
commit 7ee642208485c9e2ae62aa662dbbddc7b295bae1
Merge: c286ec0ee2 a37009c187
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 31 20:41:48 2023 -0500
Merge branch 'VECTOR-6' into vsearch
commit c286ec0ee231ededded18e81357edebe72064c59
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 31 20:40:21 2023 -0500
add testLargeGraph
commit a37009c187edeba68389d239dc1b9f40519b1187
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Thu Jun 1 08:50:07 2023 +0800
cleanup unused code
commit 6246dbe3a970a0f4672701632003cdf576705c24
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 31 16:43:42 2023 -0500
comment
commit c7420440c7d3c81eac96e716148d570ae3cceb1d
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 31 16:43:37 2023 -0500
encapsulate shadowedPrimaryKey better
commit a1a25c33b8bec9d78354dd2c1d992444db2bf76f
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 31 15:57:52 2023 -0500
make hnsw cache size configurable, and make the default 128KB (roughly the size of the bloom filter for a 1GB sstable)
commit 55747d2351e915645697fa8dec404c749540456f
Author: Andrés de la Peña <a.penya.garcia@gmail.com>
Date: Wed May 31 19:28:47 2023 +0100
Replace FunctionParameter.sameAsFirst by FunctionParameter.sameAs, allowing type inference in both directions for vector functions
commit 5565690fbe1056d4c159ddbe233fa22c7695320a
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 31 12:59:42 2023 -0500
cleanup and fix
commit e7733bb8f858a16b082b8a5c64d0322db6f6271a
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Wed May 24 16:01:26 2023 +0800
VECTOR-6: return vector results to client in ANN order instead of token order
commit 771067d1475c94484da178236928d2bad78e00b1
Author: Mike Adamson <madamson@datastax.com>
Date: Wed May 31 18:33:55 2023 +0100
Fix annOrderingMustHaveLimit test
commit 8dd76a73541e70da0da022dd463e6711a315f971
Author: Mike Adamson <madamson@datastax.com>
Date: Wed May 31 18:30:49 2023 +0100
Improve the validation of ORDER BY <column> ANN OF (#639)
* Improve the validation of ORDER BY <column> ANN OF
* Change to hasNonClusteredOrdering and improve limit message
commit 1ef6f2e3418db19af23f0eae24cd344d72290455
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri May 26 18:34:25 2023 -0500
add vector similarity functions
commit f4e3a39a8ece623647ba7891981e7c5ee10ed96b
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 31 09:39:45 2023 -0500
pull in the smallest set of changes possible from 93e0ae9a to get FunctionParameter.sameAsFirst
commit 0ef2614346c9426a966004df220221f74353de70
Author: Jonathan Ellis <jbellis@gmail.com>
Date: Wed May 31 06:22:41 2023 -0700
Add support for updates + deletes (#636)
commit 5da9fefe1e18f733bb8ffd0a35ef2e1ac3cbd975
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue May 30 16:32:02 2023 -0500
rename SegmentOrdering.reorderOneComponent to limitToTopResults to align with MemtableOrdering
commit caf6eb2154ddf57c88d69c6246404b65e183057d
Author: Mike Adamson <madamson@datastax.com>
Date: Tue May 30 22:41:29 2023 +0100
Fix V1SearchableIndex.reorderOneComponent to use segments (#635)
commit 72cfa94e9389d0078d64aee60d26b8b01c6ca22f
Author: Piotr Kołaczkowski <pkolaczk@datastax.com>
Date: Mon May 29 20:17:44 2023 +0200
VECTOR-14: Move ANN OF expressions from WHERE to ORDER BY
Before:
SELECT * FROM table WHERE <vector column> ANN OF <vector value>;
After:
SELECT * FROM table ORDER BY <vector column> ANN OF <vector value>;
commit 226266ef124fc220819d328ed8547c5d86626c4b
Author: Mike Adamson <madamson@datastax.com>
Date: Tue May 30 18:58:16 2023 +0100
Implement getInstance(TypeParser) for VectorType. Fix losing data at startup.
commit 2bbe88ad88e2243f12dc8df789ea9bdf208ff74b
Merge: dd0ffb64f3 a3b8661746
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue May 30 12:37:58 2023 -0500
merge ds-trunk
commit dd0ffb64f3e54e2ce1af75da52fe3ece66a49f3f
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue May 30 11:40:20 2023 -0500
reduce startup log noise at info level
commit 1ef17101def1aacbb84f3434633a9b706291aa4b
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue May 30 11:34:26 2023 -0500
add partialUpdateTest
commit f816046a17809b06ad754a16c0f089c0d26909f9
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon May 29 12:57:12 2023 -0500
fix recall computation in test
commit 1c1442c36baafdf4e1641a9c0288dbd520cf7c0e
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon May 29 12:34:02 2023 -0500
also apply tolerance for inexact results to searchWithKey, but only for size > 10
commit 39df5112c6a551b53649ad98164f2bde93cbdc10
Merge: d598206329 8ca4bba861
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon May 29 12:27:24 2023 -0500
merge ds-trunk
commit 8ca4bba8617e8d0fa4f48de1c479de66e3a77dd3
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon May 29 12:19:32 2023 -0500
rename MemtableIndex -> TrieMemtableIndex to make merge to vsearch easier
commit d5982063292b6a91b156da32454abacd36fa6980
Merge: ae96e0f423 1488a5f0b9
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon May 29 09:03:24 2023 -0500
Merge branch 'vsearch' of github.com:datastax/cassandra into vsearch
commit ae96e0f42393eab3547e2314f1acdb1ff766f2a6
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon May 29 09:02:12 2023 -0500
check for results within a percentage (I went with 5%) of the expected; the A in ANN means we shouldn't expect to find 100% of matches unless the graph is tiny
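A minimal sketch of the tolerance check described above, assuming results are compared as sets of integer keys. The class and method names (`RecallCheck`, `recall`, `withinTolerance`) are hypothetical and not taken from the test in this commit.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: illustrates a "within 5% of expected" recall check.
final class RecallCheck
{
    // Fraction of the expected keys that the ANN search actually returned.
    static double recall(List<Integer> expected, List<Integer> actual)
    {
        if (expected.isEmpty())
            return 1.0;
        Set<Integer> found = new HashSet<>(actual);
        long hits = expected.stream().filter(found::contains).count();
        return (double) hits / expected.size();
    }

    // True when no more than `tolerance` (e.g. 0.05) of the expected keys are missing.
    static boolean withinTolerance(List<Integer> expected, List<Integer> actual, double tolerance)
    {
        return recall(expected, actual) >= 1.0 - tolerance;
    }
}
```

With a tolerance of 0.05, a search that returns 97 of 100 expected keys passes, while one that returns 80 of 100 does not.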
commit 1488a5f0b9c65473bb6a69c9e18c045d931a1fe3
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Mon May 29 09:20:28 2023 +0800
Vector-19: fix PrimaryKey min/max prefix ByteComparable and add reversed lookup for row id (#627)
* Vector-19: fix PrimaryKey min/max prefix ByteComparable and add reversed lookup for row id
commit 14c41f5f913dc7f7f4cfc64b932bb6b9df475f19
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 17 17:35:16 2023 -0500
use incremental bytes used estimate from hnsw to avoid recomputing full ramBytesUsed on every call to add
commit f8d2aaad6e96693f532d745f71e0eaabb1ddf934
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Fri May 26 12:02:54 2023 -0500
Update cqlsh for new syntax
commit f0a882ae1171d41bc9c6ccf224ce3be80f7128e8
Author: Mike Adamson <madamson@datastax.com>
Date: Fri May 26 16:24:38 2023 +0100
Only allow VectorType to accept float
commit a29712b1365cdfac9a709e35325e85f7fa6c61c8
Author: Mike Adamson <madamson@datastax.com>
Date: Fri May 26 16:17:51 2023 +0100
Fix max term size for vectors at 16k in SSTableIndexWriter
commit 363d3f4e35c1a4df0f257557309bb042fd3f3ec7
Author: Mike Adamson <madamson@datastax.com>
Date: Fri May 26 15:23:02 2023 +0100
Merge CASSANDRA-18504 vector grammar (#628)
- Now uses vector<type, dimension> to describe vector
- This commit does not bring in the whole 18504 patch
only the essential grammar and type parts of the
patch
commit 18c4e35d4ce14cda7a3c03398e16edf368ad9e6a
Merge: 16158a5b48 b0f71ee2c3
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu May 25 16:24:41 2023 -0500
Merge branch 'VECTOR-3' into vsearch
commit b0f71ee2c371d1003cf241c3aedd7437385bcecb
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu May 25 15:51:27 2023 -0500
cleanup
commit 2d840eae74c079229729767731d3719d12ca1931
Merge: c01cacf300 197a3207b1
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu May 25 15:47:39 2023 -0500
Merge branch 'vsearch' (early part) into VECTOR-3
commit 16158a5b48665f57bde649e87a09d078298932f5
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu May 25 15:45:51 2023 -0500
optimize ramBytesUsed
commit 197a3207b16ec97bae4924883247e2cd6a2923bd
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu May 25 13:34:19 2023 -0500
update lucene
commit a7bfcc7a6f31090c3c8443d299dabb7acd437ccd
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu May 25 08:05:41 2023 -0500
optimize ConcurrentVectorValues.write
commit 456fb08af0b922922ea0e4f24def0184cf4b32c6
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 24 21:25:52 2023 -0500
fix NPE better
commit c01cacf30025d181a2d86c4632042449396333bc
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Thu May 25 08:53:38 2023 +0800
Return negative ordinal if row id is not found; add failing test for null vector
commit 009fadf2ed889bcc6748f500b7fa4b1d43e910df
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 24 18:00:23 2023 -0500
search() errors out when an empty graph is passed to it, so special-case that
commit 8c9972d7453fad46f55e6fe5e7538442235e7fd6
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 24 17:46:50 2023 -0500
use serializer dimensions instead of trying to cache it from the first vector added, because we might need it before vectors are added (if someone tries to search an empty graph)
commit 8423541c4997462b65fe8f1231b0baf15dab3297
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 24 17:42:30 2023 -0500
fix NPEs when nulls are inserted
commit d4510e0f1934f03f66d89b7c585b3d7aac17dda2
Author: Mike Adamson <madamson@datastax.com>
Date: Wed May 24 15:54:38 2023 +0100
Rebuild HNSW graph on flush since we don't know ahead of time where UCS will want to split the range boundaries
commit 6b925531011798285a6b25d70961643092e2febe
Author: Mike Adamson <madamson@datastax.com>
Date: Mon May 22 14:58:22 2023 +0100
disable index segment compaction by default
commit 3480a80e3303c13b8c04f718f83f87489315e381
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 24 14:46:03 2023 -0500
move chatty logs to debug level
commit 05c5758244f6f6d4fe30fbcefbb13932f68f8891
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 24 13:59:01 2023 -0500
r/m fixme obsoleted by #624
commit b09db4e0425c92491e3e9c73ebdbdb84508075fd
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 24 13:56:46 2023 -0500
pre-size the intersection builder
commit 6d62d8c2546cda1f487c14dcde1303e14a18c804
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 24 13:45:14 2023 -0500
fix nested boolean expressions
commit b8301a86d37513b67545883548c5670131281dc1
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Wed May 24 14:19:41 2023 +0800
Vector-3: support partition/range restricted query
commit 8ba56fad62d21da228e3a61ed7d5971441572cf9
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 24 10:35:16 2023 -0500
update lucene version hash
commit a4768c583eb1a029f629e5ebda274ab2fcce3130
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Tue May 23 10:45:02 2023 -0500
Revert the revert of "Use published lucene fork jar."
This reverts commit 7a8ff12860a97d451045c9dd1978a3599a0f4679.
commit 063a176caeec0f1e298d044490aa29ecdd468811
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Wed May 24 22:03:32 2023 +0800
Union the results from multiple segments per index (#624)
- Union the results from multiple segments per index
- Revert intersection behavior to pick 2 most selective indexes if there is no ANN
- Fix SSTableRowIdPostingList to return END_OF_STREAM if next row is END_OF_STREAM
- Skip TermTree for vector index
- Fixes:
- VectorMemtableIndexTest#randomQueryTest
- SegmentMergerTest
- SingleNodeQueryFailureTest#testFailedRangeIteratorOnMultiIndexesQuery
- SelectiveIntersectionTest
- QueryTimeoutTest
commit 643f34c2b654b80573ba92381e386b4286f7fb63
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 24 08:25:35 2023 -0500
switch all the hnsw internals to use float[] so we don't need to keep both representations around
commit 7a8ff12860a97d451045c9dd1978a3599a0f4679
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 24 08:06:28 2023 -0500
Revert "Use published lucene fork jar."
This reverts commit 53c6c90920dd2a48fc3334a6cddb215d3f4ebfcd.
commit 8731eefe4122a5af0b3723a3b07884a201616e7a
Merge: e50d5e955b 53c6c90920
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 24 08:05:18 2023 -0500
Merge branch 'vsearch' of github.com:datastax/cassandra into vsearch
commit e50d5e955bfde6357a5bb892b2ea70d46e8cc77d
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 24 08:02:52 2023 -0500
cache float[] from ByteBuffer
commit 53c6c90920dd2a48fc3334a6cddb215d3f4ebfcd
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Tue May 23 18:29:00 2023 -0500
Use published lucene fork jar.
commit f2567cd2fb2a683e7c0fee23945baac3ceca6ad4
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue May 23 14:56:25 2023 -0500
attempting to add back IN support but with limited success, even "basicOrTest" gets parsed as child nodes
commit 9e1f018f5c0d5832446fc6a0338aaf95c8323303
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue May 23 14:21:23 2023 -0500
reduce log level of some of the chattiest locations
commit efaf501fe3158da42dcd874b06014acb127f5cd9
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue May 23 13:55:40 2023 -0500
clean up RangeIntersection Builder overloads
commit e7127d9d32392347f7010de6267521742f831217
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue May 23 13:46:27 2023 -0500
complete generification of RangeIterator in src/ (but tests are still incomplete)
commit c07ded6420aa25b9dfa2b3c0785178b08d286b26
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue May 23 13:04:16 2023 -0500
TermIterator is more or less replaced by CheckpointingIterator
commit 14d03ae99c69c2419dc0cca3291b4f5eaefbc054
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue May 23 14:45:18 2023 -0500
update lucene jar
commit a15b964d56dcba285c6871d354755d7aa489af1d
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue May 23 12:48:41 2023 -0500
disk and memory recalls should be the same (since disk graph should be exactly equivalent to in-memory)
commit f03680bdf8581fd13a2d772e8265e3c8a713d247
Merge: 868a6b8f11 fd1293dbaf
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue May 23 12:08:46 2023 -0500
Merge branch 'vsearch' of github.com:datastax/cassandra into vsearch
commit 868a6b8f11a25c20b7e59a93d65cc7e2b021a432
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue May 23 12:04:11 2023 -0500
use brute force if we expect to perform fewer comparisons that way than with a graph search
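The decision described above could look roughly like the sketch below. The cost model is an assumption made for illustration only: it guesses that a graph search performs about `topK * ceil(log2(graphSize))` vector comparisons, while brute force performs one comparison per candidate row. The commit message does not state the actual estimate used, and `SearchStrategy` and its methods are hypothetical names.

```java
// Hypothetical sketch: choose between brute force and graph search by estimated comparison count.
final class SearchStrategy
{
    // Assumed (not from the source) estimate of comparisons made by a graph search.
    static long estimatedGraphComparisons(int graphSize, int topK)
    {
        if (graphSize <= 1)
            return graphSize;
        // ceil(log2(graphSize)) computed with integer arithmetic
        int log2 = 64 - Long.numberOfLeadingZeros(graphSize - 1);
        return (long) topK * log2;
    }

    // Brute force compares the query against every candidate row exactly once.
    static boolean useBruteForce(int candidateRows, int graphSize, int topK)
    {
        return candidateRows <= estimatedGraphComparisons(graphSize, topK);
    }
}
```

Under this assumed model, a query whose other predicates leave 50 candidate rows in a graph of 1,000,000 vectors with `topK = 10` would be answered by brute force (50 comparisons against an estimated 200), while 5,000 candidate rows would go through the graph.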
commit 2ed32bf53acaf8180130bef1aa5cebd9b6d32d13
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue May 23 11:59:11 2023 -0500
refactor AnnKeyRangeIterator -> ReorderingRangeIterator
commit d28bad9b2df3fc84b117f9ed0a04314ecb9c0a02
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue May 23 11:55:15 2023 -0500
rename reorderOneComponent to limitToTopResults
commit fd1293dbaf82f054593b3150249ac00110b286cf
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Tue May 23 10:45:02 2023 -0500
Update python driver to one that supports the Vector type
commit 6603f7601671357928948dfb0921b0c731bb96d3
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue May 23 11:06:12 2023 -0500
extract ReorderingPostingList
commit 94b4ff20ccc3a51e966b4798f5364a2d9ed4b966
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue May 23 10:29:33 2023 -0500
deserialize doesn't modify buffer so duplicate is unnecessary
commit 840053d212985a109cb5242c19e8a5876fdec7b3
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue May 23 10:26:57 2023 -0500
use LongHeap instead of generic PQ
commit aac90dc3e0fdb745e465f044f2624228e705be10
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue May 23 09:47:22 2023 -0500
r/m unused defer flag from toPrimaryKeyIterator / toSSTableRowIdsIterator
commit 1fd71c1cb7f2e785237b36c6e42842fb326d2380
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue May 23 09:26:03 2023 -0500
remove unused method marked "to remove"
commit b96c5a9efccf3226acf2f2a08ca149d0acc0b21d
Merge: 54f41e32d3 ba8aa07dd0
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue May 23 09:24:44 2023 -0500
Merge remote-tracking branch 'zhao/vsearch-row-id-iterator-for-reordering' into vsearch
commit 54f41e32d3359941ddf70d34fc8e68a842b9091d
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue May 23 08:14:10 2023 -0500
update lucene to fix NPE in ram usage
commit ba8aa07dd09f52a72e5dffccb950a21ec37a1c11
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Tue May 23 16:51:36 2023 +0800
intersect on row-ids before fetching primary key for multiple non-ann indexes or single index
commit 2a69547167f8dc1b40e8a26b71cdf43a45779280
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Tue May 23 09:44:00 2023 +0800
Use sstable-row-id iterator to re-order ann index
- add searchSSTableRowIds to searcher
commit 04b9a606d9a1aaeeeab3ec0981a5f1300cc2a423
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Tue May 23 00:30:12 2023 +0800
Make RangeIterator generic
commit 4ff716a7411d3ad80009d4fb97bf53af51dd5eba
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon May 22 14:57:34 2023 -0500
r/m obsolete FIXMEs
commit 175ad0edce9cef3fb29f230fb64fb1913328d101
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Tue May 23 00:30:12 2023 +0800
fix vector distributed test: use 1 token and ignore multi-ann-index test (#622)
commit 78adf77d9e6b7f20ece71fe665a5d1aec51f79b6
Author: Mike Adamson <madamson@datastax.com>
Date: Mon May 22 14:26:48 2023 +0100
Copy lucene snapshot to build folder in build-resolver.xml
commit 5172315dcb51e05a44af224ddfe944bca29005d0
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon May 22 08:17:41 2023 -0500
declare lucene dependency to come from lib/
commit d4ee45b8675f79b4a1e06f97d0ce618ab08f030a
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sun May 21 05:40:30 2023 -0500
add caching and test code
commit 5102d96e2fb3460f1e541cc1b481e224d112eac1
Merge: 6afcfabe88 87d39a7b25
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat May 20 11:15:14 2023 -0500
merge cep-vsearch
commit 6afcfabe883c5627c96eda8eb488c51c06e17c91
Merge: a993dbe49a fa85a191c5
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat May 20 09:15:13 2023 -0500
Merge commit 'fa85a191' into vsearch
commit a993dbe49a75dde4b9d5528e0816b81ae427a1dd
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri May 19 22:00:31 2023 -0700
FIXME not sure why version is messed up post merge, this hacks around it
commit aadf168e47a4271cc3faeb62ec7044cbfe193893
Merge: c217120935 5bc4d4b42e
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat May 20 08:52:48 2023 -0500
merge 5bc4d4b42e
commit c2171209350cc307f3b5fcc73a6ec09423d203df
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri May 19 22:00:31 2023 -0700
FIXME not sure why version is messed up post merge, this hacks around it
commit 2e44491d3db6a881324126490e6e1511d6082f39
Merge: 68f85f7009 0ff4566080
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat May 20 08:15:16 2023 -0500
merge 0ff4566080
commit 68f85f700995b274ec0d64654ed2692c7fc8bf61
Merge: c8843779be 2ae60e7411
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat May 20 07:50:16 2023 -0500
Merge commit '2ae60e74' into vsearch
commit c8843779be9fe0c3edd047754bb08f633cff5646
Merge: 3cb57548bb 93616d080c
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat May 20 07:43:18 2023 -0500
Merge commit '93616d080c539e880e37cd22fe5f27396a7a2594' into vsearch
commit 87d39a7b252b3f3cc1f4aae0f149085f6bc83abc
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri May 19 17:24:55 2023 -0700
fix intersection count, and add comments to KRI api
commit 4fdc58fced9ca350333675cb19a42169bebeaccf
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri May 19 15:27:34 2023 -0700
test recall
commit 26fb1006430513a37f267a4dfb81ef473516ac7b
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri May 19 14:19:08 2023 -0700
add tests
commit 1cbf78bbf71fff2d597ddd199fd3a758313c7bf4
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri May 19 14:02:36 2023 -0700
comments
commit 65e41490f041535f1a48a726032e28f6c8febcff
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri May 19 13:57:04 2023 -0700
rm AnnResult
commit 06b10a28c3f6f19c76b056526cdabb4c9aff1528
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri May 19 13:54:08 2023 -0700
cleanup commented-out code
commit e33236ab8b9da7a025f4c59406c52ecbc3b5ae4f
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri May 19 09:52:50 2023 -0700
convert suspicious one-off binary search in OnDiskHnswGraph to use DiskBinarySearch
commit fbd30bce13e87c63b129db26c5c123ae9bf8f9ed
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri May 19 09:31:16 2023 -0700
perform disk-based binary search for ordinals
commit 77bb1b5cfcc743224bdb01d21563365caa184c21
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri May 19 08:22:36 2023 -0700
move OnDiskOrdinalsMap to its own class, and move write() code into it
commit 8a1fd9f0a38f4c16b72d5bc7b2aeb3ee07bacc7a
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu May 18 21:35:28 2023 -0700
add ordinal to row mapping
commit db32d9c7b077133a213ac3a0da161f163c4310a1
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 17 20:43:49 2023 -0500
mostly-working reordering
commit 62ba004d79aa7511f519ebb56fa8af2cb424e21e
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 17 20:13:10 2023 -0500
turn QueryView inside out; perform intersections first for each sstable, then union the results
commit d65039a6ccd8364e8eb8ef31933cdf8dca298437
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 17 18:37:07 2023 -0500
update MIM to give individual iterators back
commit a190f1e917b93b6cabd15347dec55b1683807b1f
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 17 18:59:13 2023 -0500
refactor CheckpointingIterator to just take an iterator to wrap, and index references to close if something goes wrong
commit 307fe1c20b400e17048a356f73822ef51f66f341
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 17 18:47:20 2023 -0500
rename IndexSearchResultIterator -> CheckpointingIterator
commit 690751b216afbe3e77ed0f89ce04d16e5c6e985d
Merge: 4b6d0f2fb5 fa85a191c5
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 17 10:17:49 2023 -0500
Merge branch 'cep-7-sai' into cep-vsearch
commit 4b6d0f2fb531f6832d2b24465b0982683e741dfe
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Wed May 17 16:45:42 2023 +0800
Add vector top-k filter at replica side before returning to coordinator (#618)
* added QueryPlan#postIndexQueryProcessor to filter top-k at replica side before sending response to coordinator
commit 5bc4d4b42e77180df050bdbd9f5fd18fbb37332c
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue May 16 07:38:45 2023 -0500
failing test was failing b/c test was wrong
commit f37e40f5c5d99aeaafb33f41be0bd10bff6b5756
Merge: ae1bffa49b 1c41a095e1
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon May 15 21:24:47 2023 -0500
merge
commit ae1bffa49b4c8757a3d9110ae065b8cb70d8e416
Merge: bbc4f72b8d 381e04aaa1
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon May 15 21:22:16 2023 -0500
Merge commit '381e04aaa132cf7a255efad3f792852d9ec1729d' into cep-vsearch
commit 1c41a095e130bce4c7c215650f9ef3f817ae635e
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon May 15 21:11:47 2023 -0500
fix computeNext to return endOfData when done
commit b845395ad7e5f4670597cd171f28b98c941d03a7
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon May 15 21:11:15 2023 -0500
cleanup
commit 58df11dad894aa5fb36bfbdaaf4c7dc1987fcc94
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon May 15 20:59:12 2023 -0500
failing tests
commit ea5977620ebb07d220ad51800c9b90f470494079
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon May 15 20:58:52 2023 -0500
FIXME temporarily remove checkstyle
commit 715790922379ad7b5dbdea5d3358f24542762889
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon May 15 20:49:10 2023 -0500
cleanup
commit 19e47d363c856f81bda3323be02bfe18f60b51c0
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon May 15 17:55:52 2023 -0500
update NeighborSet usage
commit 920abafce00c94bbc7d6b0c86ba78a2a9d3f9540
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon May 15 17:53:43 2023 -0500
cleanup
commit d58db09004427506efe03e81364cadf44fc914c4
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon May 15 15:46:06 2023 -0500
pull in latest hnsw optimizations
commit bbc4f72b8d7d99db003103103c90d78db4ee04d1
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon May 15 10:02:40 2023 -0500
use IndexOutputWriter in other components
commit 381e04aaa132cf7a255efad3f792852d9ec1729d
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon May 15 09:32:25 2023 -0500
switch other components to also use IndexOutputWriter
commit 5951690f5472502bcb093179e8905212c1fcf815
Author: Mike Adamson <madamson@datastax.com>
Date: Mon May 15 14:48:16 2023 +0100
Fix CassandraHnswGraphWriter to use correct output writer (#616)
commit bc5b956607fa77d53fcab9ad338776af69610feb
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Mon May 15 17:35:42 2023 +0800
Distributed vector search (#613)
* distributed vector search
- query all replicas selected by consistency level at once with full request range
- filter top-K results at coordinator in QueryPlan#postProcessor
- skip short-read-protection, read repair and replica filtering protection because replica response will be top-k
- fail ANN query without limit or limit exceeding MAX_TOP_K
- make vector search max_top_k configurable and default to 1k
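The coordinator-side steps listed above can be sketched as follows. This is an illustration under stated assumptions, not the code from `QueryPlan#postProcessor`: `TopKMerge`, `ScoredRow`, and `validateLimit` are hypothetical names, each row is assumed to carry a similarity score where higher means closer, and reconciliation of the same row returned by several replicas is omitted.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch: keep the global top-K rows out of per-replica top-K responses.
final class TopKMerge
{
    // Default of 1k taken from the commit message; configurable in the real system.
    static final int MAX_TOP_K = 1000;

    static final class ScoredRow
    {
        final String key;
        final float score; // assumed: higher score means more similar

        ScoredRow(String key, float score)
        {
            this.key = key;
            this.score = score;
        }
    }

    // An ANN query must have a positive limit that does not exceed MAX_TOP_K.
    static void validateLimit(int limit)
    {
        if (limit <= 0 || limit > MAX_TOP_K)
            throw new IllegalArgumentException("ANN query requires a LIMIT between 1 and " + MAX_TOP_K);
    }

    static List<ScoredRow> topK(List<List<ScoredRow>> replicaResponses, int limit)
    {
        validateLimit(limit);
        Comparator<ScoredRow> byScore = Comparator.comparingDouble(r -> r.score);
        // Min-heap bounded to `limit` entries: the weakest row is evicted first.
        PriorityQueue<ScoredRow> heap = new PriorityQueue<>(byScore);
        for (List<ScoredRow> response : replicaResponses)
        {
            for (ScoredRow row : response)
            {
                heap.add(row);
                if (heap.size() > limit)
                    heap.poll();
            }
        }
        List<ScoredRow> result = new ArrayList<>(heap);
        result.sort(byScore.reversed()); // best match first
        return result;
    }
}
```

Bounding the heap to `limit` entries keeps coordinator memory proportional to K rather than to the total number of rows returned by all replicas.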
commit 6d8c94535354f162f60099deecb0523f9f71bb99
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sun May 14 10:01:08 2023 -0500
fix imports for checkstyle
commit 68c20dc9ceeaf0c2169718aa4a1dce9c0f92f003
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat May 13 19:23:58 2023 -0500
add the index and ordinal mapping to the set of components that the system knows about
commit ba7efaac142c8b3181881f86020f659837c90463
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat May 13 17:57:22 2023 -0500
cleanup
commit 5ae06fe31840e5ba1899086f3b8eb8194f825db7
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat May 13 17:23:30 2023 -0500
update to latest lucene snapshot that addresses all known concurrency bugs
commit f996edd63f66cf7b688f8e014b31f15450dd1d3d
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri May 12 16:08:40 2023 -0500
fix for cqlsh by Bret McGuire
commit fa85a191c5e0bd508da584412648308888769cb9
Author: Andrés de la Peña <a.penya.garcia@gmail.com>
Date: Mon Apr 24 16:39:26 2023 +0100
Allow CQL queries on multiple indexes without ALLOW FILTERING
patch by Andrés de la Peña; reviewed by Berenguer Blasi for CASSANDRA-18217
commit e2f3d2150ab2bc5e038f2661b8e2337d3b5cd4bf
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu May 11 14:34:22 2023 -0500
imports
commit 68e59bc715de6d1f69d69530310655a598a54200
Author: Mike Adamson <madamson@datastax.com>
Date: Thu May 11 17:20:52 2023 +0100
Use correct bind types for vector
commit c57119254b7e065433804e36ed6ead71d53984ea
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 10 23:06:19 2023 -0500
avoid calling neighborSet.size() in write path
commit f91e9ea40fa4bfc07c0bb280b7c479d473956da3
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 10 23:05:11 2023 -0500
add more asserts. countNeighbors is failing
commit 1bd6789304a60762f0ef497c2137dd96f806a262
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 10 22:40:01 2023 -0500
lock out updates to the graph while we're writing it to disk
commit da7cb2984f96c06a5890ed2da21c42d946d7c754
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 10 22:06:58 2023 -0500
imports
commit 3676a86b821c6c05d9c5df33f4e4fbc9eeb1f591
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 10 22:03:14 2023 -0500
compact by building graph in memory, like it did before
commit fa1ebd013bc03def3d211932fe2a98d1bb4442d9
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 10 16:15:58 2023 -0500
rewrite save-to-disk without Lucene code, to support multiple rows having the same vector value
bonus: we don't need to rebuild an index that we already have in memory
commit 3b328324d7c3afcda34a95c4b9fcbb9fe017384c
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 10 17:00:02 2023 -0500
upgrade to latest lucene snapshot
commit 1f4e768e9ec3327a5…
commit f9e589098e89417cd41ef627b3e1c7371986e3e1
Merge: b0dbc8bd57 d32eed9d68
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Mon Sep 18 11:28:48 2023 -0500
Merge remote-tracking branch 'datastax/vsearch' into cndb-7632-vsearch-rebase-20230915
commit b0dbc8bd57fde3dd473d4aae85a3063f2156106d
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Mon Sep 18 10:12:30 2023 -0500
Update expected string for error message change.
commit f2cafdac9c3e5bf838c10e0078895d56e3271370
Author: qannap <130002578+qannap@users.noreply.github.com>
Date: Thu Sep 14 08:41:07 2023 -0700
Optimize partition-aware queries to use bloom filter (#729)
Add bloom filter field in QueryViewBuilder. Here is the link to the issue: [Vector-72](https://datastax.jira.com/browse/VECTOR-72).
commit 9765b23b5b748286391b8e374bab24a31cb5d934
Author: Jonathan Ellis <jbellis@gmail.com>
Date: Thu Sep 14 09:47:19 2023 -0500
Vsearch 18 (#741): optimize MergePostingList.advance for the case where it is called repeatedly
commit 66c00e094942de72f2f1fdb780ac00239e0aa284
Author: Mike Adamson <madamson@datastax.com>
Date: Thu Sep 14 14:33:41 2023 +0100
Remove smile-nlp and add internal Glove implementation for testing (#743)
commit b13098e0cd5a9f961066c0059953b525f5bfa787
Author: Jakub Żytka <jakub.zytka@gmail.com>
Date: Thu Sep 14 13:28:01 2023 +0200
CNDB-7363: extend read tracking API to notify about post-filtered, reconciled partitions and rows (#692)
* CNDB-7363: extend read tracking API to notify about post-filtered, reconciled partitions and rows
* RowFilter::getExpressionsPreOrder will now return all row filter's expressions in pre-order
commit 2bca65cfe4ba864a8a6719e0e28ddf76e83c17e0
Author: Michael Marshall <michael.marshall@datastax.com>
Date: Sun Sep 10 01:01:47 2023 -0500
Fix JsonTest and AnalyzerViewTest errors (#730)
commit a0dde9f89e02ef40d9aa87514eb7b5becfdce8e8
Author: Michael Marshall <michael.marshall@datastax.com>
Date: Fri Sep 8 19:34:09 2023 -0500
Reject SAI creation for invalid combinations of options (#736)
commit abd5399f9c1b6feb0f5ba71b279ff5b76d0f92ee
Author: Jonathan Ellis <jbellis@gmail.com>
Date: Fri Sep 8 11:52:53 2023 -0500
Fix exceptions when executing complex queries (#735)
* fix ReorderingRangeIterator.performSkipTo
* fix NPE
commit acc39a8281cf9163057c00ec9413219b210cc262
Author: Michael Marshall <michael.marshall@datastax.com>
Date: Fri Sep 8 11:04:58 2023 -0500
Analyzer cleanup: add comments and fix docs (#733)
commit 4989dff0f150bb65535fa598e53290bca1f3ce0c
Author: Michael Marshall <michael.marshall@datastax.com>
Date: Thu Sep 7 08:51:09 2023 -0500
Fix RowAwarePrimaryKey#hashCode for deferred keys (#725)
* Fix RowAwarePrimaryKey#hashCode for deferred keys
* Use recommendation from code review
commit 2268c1c806632d62cf1a29b815740207a0e2939c
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Sep 6 13:38:39 2023 -0500
optimize MergePostingList.advance -- we don't care what order we evaluate the merged lists in, since we're going to rebuild the PQ anyway, so use a fast iterator instead of a slow poll() loop
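A minimal sketch of the idea in this commit message, assuming each merged list is a sorted array of row ids. `MergeAdvance` and `Cursor` are hypothetical stand-ins, not the actual `MergePostingList` or posting list classes. Because the queue is rebuilt afterwards anyway, the sub-lists can be visited with the queue's plain iterator, in whatever order the heap stores them, instead of being removed one at a time with `poll()`.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch: advance every sub-list, then rebuild the priority queue once.
final class MergeAdvance
{
    // Minimal stand-in for a sorted posting list of row ids.
    static final class Cursor
    {
        private final long[] rowIds;
        private int pos = 0;

        Cursor(long... rowIds)
        {
            this.rowIds = rowIds;
        }

        boolean exhausted()
        {
            return pos >= rowIds.length;
        }

        long current()
        {
            return rowIds[pos];
        }

        // Skip forward to the first row id >= target.
        void advance(long target)
        {
            while (!exhausted() && rowIds[pos] < target)
                pos++;
        }
    }

    // Queue ordered by each cursor's current row id.
    static PriorityQueue<Cursor> newQueue()
    {
        Comparator<Cursor> byCurrent = Comparator.comparingLong(Cursor::current);
        return new PriorityQueue<>(byCurrent);
    }

    static void advanceAll(PriorityQueue<Cursor> queue, long target)
    {
        List<Cursor> live = new ArrayList<>(queue.size());
        // Heap iteration order is unspecified, which is fine: the heap is rebuilt below.
        for (Cursor cursor : queue)
        {
            cursor.advance(target);
            if (!cursor.exhausted())
                live.add(cursor);
        }
        queue.clear();
        queue.addAll(live);
    }
}
```

Iterating avoids the sift-down work that each `poll()` performs on a heap whose ordering is about to be discarded.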
commit 6bb72c3272fbc8d12427fb0d9f00916b6f6ee9c0
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Sep 6 13:22:07 2023 -0500
PKM is not threadsafe, need to allocate a new one for every request
commit f1d7c66b54f7b95e12698d7a6ed00a1e8ac958d6
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Sep 6 13:10:11 2023 -0500
r/m slightly gratuitous IOException from PKM.Factory signature
commit 73e96df69a1861d5a8bda04ab69252a480a78c62
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Sep 6 11:35:40 2023 -0500
fix updateTestWithPredicate by removing broken assert (did not account for deleted rows)
commit 36327a93b8c1651acd7af1d8b687dd92e08021e3
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Sep 6 11:34:44 2023 -0500
add failing updateTestWithPredicate
commit 80eb443b65637dc1782cb3a745d4d56431f787d8
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Sep 6 11:05:32 2023 -0500
comment upsertTest
commit 3f5fbc72547aa790989d3c7bce8b92fd4bb2fb2c
Author: Marianne Manaog <91789965+marianne-manaog@users.noreply.github.com>
Date: Wed Sep 6 19:42:15 2023 +0100
Feature/Optimize SAI.getPostQueryOrdering as SAI.postQuerySort (#710)
* Optimize sorting in postQuerySort method
* Remove need for prepareFor method following sort optimization in postQuerySort
* Add testOrderResults
* Simplify creation of resultSet in the test testOrderResults and keep rows as final in ResultSet.java
* Apply DSU pattern to SAI.postQuerySort method and leverage it in SelectStatement.orderResults method
* Use Pair.create instead of new Pair
* Put back comment on ANN support only
* Optimise undecorate step by using List<ByteBuffer> instead of ByteBuffer in the Pair
* Add licence to StorageAttachedIndexTest.java
* Move StorageAttachedIndexTest.java up a directory
* Simplify map to create listPairsVectorsScores
commit 23555ab43101cd6010fe9cc320e7429779faa678
Author: qannap <130002578+qannap@users.noreply.github.com>
Date: Fri Sep 1 15:29:22 2023 -0700
Add row count field to columnFamilyStore (#713)
* Add row count field to columnFamilyStore
* Add test to row count field
* delete used import
* delete used function and separate test
* resolve version
* return to updated version
* restore other tests as vsearch branch
---------
Co-authored-by: Michael Marshall <michael.marshall@datastax.com>
commit 1eae389fadecfc6637207933fd6ae1739355dc00
Author: Michael Marshall <michael.marshall@datastax.com>
Date: Thu Aug 31 17:59:53 2023 -0500
Fix NPE in LuceneAnalyzer#end (#721)
When testing the analyzer with an integration test, I discovered an NPE from a part of the code that wasn't tested.
Here is the code:
https://github.com/datastax/cassandra/blob/9cabec504d9f70ddfb345122dc4defed56a40f85/src/java/org/apache/cassandra/index/sai/plan/StorageAttachedIndexSearcher.java#L91-L100
We build the Analyzer to use one of its methods, but not to analyze any text, hence the NPE.
Here is the NPE:
```
ERROR [Native-Transport-Requests-1] 2023-08-31 14:35:38,695 QueryMessage.java:121 - Unexpected error during query
java.lang.NullPointerException: null
at org.apache.cassandra.index.sai.analyzer.LuceneAnalyzer.end(LuceneAnalyzer.java:110)
at org.apache.cassandra.index.sai.plan.StorageAttachedIndexSearcher.filterReplicaFilteringProtection(StorageAttachedIndexSearcher.java:99)
at org.apache.cassandra.service.reads.DataResolver.lambda$preCountFilterForReplicaFilteringProtection$6(DataResolver.java:300)
at org.apache.cassandra.service.reads.DataResolver.resolveInternal(DataResolver.java:341)
at org.apache.cassandra.service.reads.DataResolver.resolveWithReadRepair(DataResolver.java:244)
at org.apache.cassandra.service.reads.DataResolver.resolveWithReplicaFilteringProtection(DataResolver.java:284)
at org.apache.cassandra.service.reads.DataResolver.resolve(DataResolver.java:130)
at org.apache.cassandra.service.reads.range.SingleRangeResponse.waitForResponse(SingleRangeResponse.java:59)
at org.apache.cassandra.service.reads.range.SingleRangeResponse.computeNext(SingleRangeResponse.java:65)
at org.apache.cassandra.service.reads.range.SingleRangeResponse.computeNext(SingleRangeResponse.java:31)
at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47)
at org.apache.cassandra.db.transform.BasePartitions.hasNext(BasePartitions.java:93)
at org.apache.cassandra.service.reads.range.RangeCommandIterator.computeNext(RangeCommandIterator.java:123)
at org.apache.cassandra.service.reads.range.RangeCommandIterator.computeNext(RangeCommandIterator.java:45)
at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47)
at org.apache.cassandra.db.transform.BasePartitions.hasNext(BasePartitions.java:93)
at org.apache.cassandra.db.partitions.PartitionIterators$1.hasNext(PartitionIterators.java:154)
at org.apache.cassandra.db.transform.BasePartitions.hasNext(BasePartitions.java:93)
at org.apache.cassandra.cql3.statements.SelectStatement.process(SelectStatement.java:892)
at org.apache.cassandra.cql3.statements.SelectStatement.processResults(SelectStatement.java:518)
at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:495)
at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:341)
at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:102)
at org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:290)
at org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:394)
at org.apache.cassandra.transport.messages.QueryMessage.execute(QueryMessage.java:108)
at org.apache.cassandra.transport.Message$Request.execute(Message.java:242)
at org.apache.cassandra.transport.Dispatcher.processRequest(Dispatcher.java:111)
at org.apache.cassandra.transport.Dispatcher.processRequest(Dispatcher.java:131)
at org.apache.cassandra.transport.Dispatcher.lambda$dispatch$0(Dispatcher.java:78)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:165)
at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:137)
at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:119)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.base/java.lang.Thread.run(Thread.java:829)
```
I am still new to query execution, but I am guessing the unit tests skip this filtering code. I added an integration test that fails without the fix.
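A minimal sketch of this failure mode, assuming the analyzer only creates its token stream once it is asked to analyze a value, so calling `end()` on a freshly built analyzer dereferenced a null field. `SketchAnalyzer` and its members are hypothetical stand-ins, not the actual `LuceneAnalyzer` code:

```java
import java.io.StringReader;

public class SketchAnalyzer {
    // Hypothetical stand-in for the token stream; stays null until reset() is called.
    private StringReader stream;
    private boolean ended;

    /** Prepares the analyzer to process one input value. */
    public void reset(String input) {
        stream = new StringReader(input);
        ended = false;
    }

    /** Releases per-input state; must be safe to call even if reset() never ran. */
    public void end() {
        if (stream != null) { // without this guard, end() would throw an NPE
            stream.close();
            stream = null;
        }
        ended = true;
    }

    public boolean isEnded() {
        return ended;
    }
}
```

With the null check in place, building an analyzer only to call one of its other methods and then calling `end()` no longer fails.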
commit ea97cc414ef3e68ad30e8c666ecf04b719352901
Author: Michael Marshall <michael.marshall@datastax.com>
Date: Wed Aug 30 09:10:14 2023 -0500
Fix failing KDTreeIndexSearcherTest (#717)
#680 included a change that makes 5 of the KDTreeIndexSearcherTest tests fail. This PR essentially reverts to the old behavior, and all of the failing tests pass.
commit cee55b1b2139627b331d9f7225c522ca5bee299b
Author: Michael Marshall <michael.marshall@datastax.com>
Date: Tue Aug 29 21:56:24 2023 -0500
Remove ability to configure unique query_analyzer (#712)
commit c248f905cc13833f9857b9d750a60091afc5c38a
Author: Michael Marshall <michael.marshall@datastax.com>
Date: Tue Aug 29 21:55:18 2023 -0500
Fix failing distributed SAI tests using : operator (#715)
commit 367e84adf0663e8baab3f9195f3eb81412640b8c
Author: Michael Marshall <michael.marshall@datastax.com>
Date: Tue Aug 29 21:55:03 2023 -0500
Test compound predicate queries for : operator (#716)
commit 0ea0d95f52ac18522a0cb5a09c1c9ee00069380a
Author: Michael Marshall <michael.marshall@datastax.com>
Date: Tue Aug 29 11:10:02 2023 -0500
Refactor SAI analyzer configuration; add built in analyzers (#711)
The current `OPTIONS` configuration for the `index_analyzer` does not correctly model the available configuration options. In order to make the configuration more directly match the single tokenizer, and the lists of filters and charFilters, I propose we update the expected JSON schema. Further, we add some default analyzers to simplify the configuration of generic analyzers.
commit 67ef0f3794cbd2b9ff41b632205dd087a992e674
Author: Michael Marshall <michael.marshall@datastax.com>
Date: Mon Aug 28 11:58:30 2023 -0500
Return more specific exception on incorrect usage of : operator (#705)
* When the `:` operator is used for indexes that do not support it, the current error is a recommendation to `ALLOW FILTERING` or an `AssertionError`. Instead, we should match the `LIKE` behavior and reject queries that attempt to use the `:` operator on columns that are not properly indexed. The `LIKE` conditional is slightly modified to simplify the logic for rejecting unsupported queries.
* When the `:` operator is used for `DELETE` and `UPDATE` queries, return a helpful error message.
* Update the `toString` method on the `ANALYZER_MATCHES` enum. The `toString` method is used to generate error messages, and it is confusing to see an error message with `: '<term>'` when it really should just be `:`.
commit c26cbfa2298d422813e02b439f4db6494bb64a84
Author: Jakub Żytka <jakub.zytka@gmail.com>
Date: Mon Aug 28 14:12:15 2023 +0200
Additional testcases for read query tracking with different data models (#698)
* Additional testcases for read query tracking with different data models
commit bf143a2b3fc97979e5e083f8eac7aca017cd618d
Author: Michael Marshall <michael.marshall@datastax.com>
Date: Thu Aug 24 13:28:47 2023 -0500
Ignore default analyzer settings when classifying an SAI as analyzed (#706)
### Problem
When creating an SAI, the `AbstractAnalyzer` creates an analyzer in cases where it should not because the logic is too naive.
### Details
Users can create SAI indexes with a `NonTokenizingAnalyzer` using the following queries where `?` can be replaced with `true` or `false`:
```
CREATE CUSTOM INDEX ON table(some-column) USING 'StorageAttachedIndex' WITH OPTIONS = { 'normalize' : ? }
CREATE CUSTOM INDEX ON table(some-column) USING 'StorageAttachedIndex' WITH OPTIONS = { 'case_sensitive' : ? }
CREATE CUSTOM INDEX ON table(some-column) USING 'StorageAttachedIndex' WITH OPTIONS = { 'ascii' : ? }
```
The options are known as "non-tokenizing analyzers" because they "analyze" values but do not "tokenize" them. Practically speaking, these analyzers are projections where each input has one and only one output. Lucene calls projections "filters".
The `StorageAttachedIndex` defaults are `normalize = false`, `case_sensitive = true`, and `ascii = false`.
When a default value is passed for one of the options, the `AbstractAnalyzer` incorrectly classifies the index as "analyzed". Instead, the `AbstractAnalyzer` should return the `NoOpAnalyzer` and the index should not be classified as "analyzed".
### Solution
* Only construct the `NonTokenizingAnalyzer` when `normalize`, `case_sensitive`, or `ascii` are configured with non-default values.
* Do not consider an SAI as analyzed if `normalize`, `case_sensitive`, or `ascii` are configured with default values.
* Updated some tests to use `=` on the non-analyzed SAI
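The classification rule can be sketched as follows. This is an illustration only: `AnalyzedCheck` and `isAnalyzed` are hypothetical names, and the only facts taken from this description are the three option names and their defaults.

```java
import java.util.Map;

public class AnalyzedCheck {
    // Defaults stated above: normalize=false, case_sensitive=true, ascii=false.
    private static final Map<String, String> DEFAULTS =
        Map.of("normalize", "false", "case_sensitive", "true", "ascii", "false");

    /** Returns true only if at least one known option is set to a non-default value. */
    public static boolean isAnalyzed(Map<String, String> options) {
        for (Map.Entry<String, String> e : DEFAULTS.entrySet()) {
            String value = options.get(e.getKey());
            if (value != null && !value.equalsIgnoreCase(e.getValue()))
                return true;
        }
        return false;
    }
}
```

Under this rule, `{ 'normalize' : false }` leaves the index unanalyzed, while `{ 'case_sensitive' : false }` makes it analyzed.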
commit 7a6c4755f150ad647aae2bc806d8b91c423db14b
Author: Michael Marshall <michael.marshall@datastax.com>
Date: Tue Aug 22 15:57:08 2023 -0500
Add ANALYZER_MATCHES operation; Disallow = on analyzed SAI columns (#702)
* Adds the `:` operator, a.k.a. the ANALYZER_MATCHES operator. This operator only works on SAI indexes that are analyzed, where analyzed means the values stored in the index are derived from the raw value in the column. The new operator uses the enum value `100` to prevent future collisions with the upstream CQL implementation.
* Disambiguate the `=` operator so that it cannot perform matches on an `SAI` column that is analyzed. The primary motivation for this change is to prevent surprising matches. If a query is attempted using `=` where the target column only has analyzed indexes, the query must include `ALLOW FILTERING`.
* Updated many tests.
commit 77cb7f265e40600288fe90fffd74ef6e428f87fe
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Aug 18 10:48:54 2023 -0500
comment
commit 2dcdd9fd7464a6e26185bfa4ac3cd44b02574fb1
Author: Michael Marshall <michael.marshall@datastax.com>
Date: Thu Aug 17 16:55:51 2023 -0500
Fail SAI index creation when using analyzer on column in primary key (#699)
* add testStandardAnalyzer
* debugging wip
* Fail index creation when using analyzer on column in primary key
* Remove unnecessary debug logging
* cleanup
* Reject all attempts to add analyzer to index on pk columns
* Rename noop analyzer test and add explanation
---------
Co-authored-by: Jonathan Ellis <jbellis@datastax.com>
When the target column for an SAI is part of the primary key and is analyzed by the LuceneAnalyzer, queries do not return correct results. Until the underlying issue is resolved, do not allow these kinds of indexes to be created.
This change is covered by a test to verify index creation and search works correctly for the `NoopAnalyzer` and another to verify index creation is rejected for the `LuceneAnalyzer`.
commit 4f5585be198a949b0264eb64d0403e2adda9ab52
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Aug 17 17:41:18 2023 -0500
default vector cache size per segment bumped to 4MB
commit 9464bfe3e2bbd958f4f1d60e799019582a9836cb
Author: Marianne Lyne Manaog <marianne.manaog@datastax.com>
Date: Thu Jul 13 20:32:36 2023 +0100
Deserialize ByteBuffer into a query vector once by moving the deserialization up the call stack
commit bc2ef7ff721493b20b1bb76dd2ae37b43d365071
Author: Shaunak Das <ShaunakDas88@users.noreply.github.com>
Date: Wed Aug 16 13:27:12 2023 -0500
VECTOR-79: Simplify population of VectorCache (#695)
* use BFS at higher levels so we can avoid a separate cachedNodes set
* try to naively cache as much of level 0 as possible
* add VectorCacheTest
* cleanup and revert to intended behavior of not caching L0
---------
Co-authored-by: Jonathan Ellis <jbellis@datastax.com>
commit 1c58e431e55d8b327314ddb82b30bf8eced59269
Author: Shaunak Das <ShaunakDas88@users.noreply.github.com>
Date: Tue Aug 15 11:24:30 2023 -0500
Avoiding sorting node ordinals for level 0 (#697)
* avoiding sorting node ordinals for level 0
commit 16eeaf89a8ac39fee616f7a71a3f9a4b4170521e
Author: Jonathan Ellis <jbellis@gmail.com>
Date: Tue Aug 15 08:47:48 2023 -0500
extract OnDiskVectors to top level class so we can use it in debugging tools (#693)
commit 2271a230761a416fbd5c016d47684b388be93989
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Aug 9 09:59:45 2023 -0500
VECTOR-77 fix reading vectors past the first 2GB mmap region
commit 5705daa97f848454d8d29ab08234296906b3dbc4
Author: Andrés de la Peña <adelapena@users.noreply.github.com>
Date: Thu Aug 10 18:06:28 2023 +0100
VECTOR-76: Add back vector size guardrail (#694)
patch by Andrés de la Peña; reviewed by Brandon Williams and Maxwell Guo for CASSANDRA-18730
commit cb3bedccc255476300528183e8ba30a4e390f57f
Author: Jakub Żytka <jakub.zytka@gmail.com>
Date: Tue Aug 8 16:52:16 2023 +0200
CNDB-7390: fix static rows not being tracked by ReadTrackingTransformation (#691)
commit a8740fb8dae267fb8ddef1a7f5c77e948cd2a0ff
Author: Andrés de la Peña <adelapena@users.noreply.github.com>
Date: Fri Jul 28 15:42:42 2023 +0100
Add system property to reject non-float vectors, true by default (#689)
commit 5a4e439dc73b479d79bd427472d382a53b6fe680
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jul 21 13:10:44 2023 +0200
r/m failing emptyIndexTest until we can address it (VECTOR-59)
commit eec8b7b387cd041fbd54aef8f0090773962a9e49
Author: Piotr Kołaczkowski <pkolaczk@gmail.com>
Date: Tue Jul 18 17:36:43 2023 +0200
VECTOR-69: Fix LWT test failure caused by null key bounds (#686)
* VECTOR-69: Fix LWT test failure caused by null key bounds
---------
Co-authored-by: Jakub Zytka <jakub.zytka@datastax.com>
Co-authored-by: Jonathan Ellis <jbellis@datastax.com>
commit 033ae1a9567dfef88d0214dd34716f2468c5d4e3
Author: Andrés de la Peña <adelapena@users.noreply.github.com>
Date: Tue Jul 18 15:23:30 2023 +0100
VECTOR-68 Vector dimensions are not being validated correctly (#679)
commit e22ff6977f986a4148f931bad4b04645bf93d2e9
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jul 18 06:01:04 2023 +0200
set sai_hnsw_allow_customm_parameters=true for IndexWriterConfigTest
commit ccaba92656e78e63765da5c34f562f2c3b582bd0
Author: Mike Adamson <madamson@datastax.com>
Date: Tue Jul 11 16:12:31 2023 +0100
VECTOR-55: Review REVIEWME comments: (#680)
* VECTOR-55: Review REVIEWME comments:
- Refactor search methods to only return long variant
- Remove SSTableQueryContext in favour of QueryContext
- Add checking to SegmentMetadata.toSegmentRowId method
commit 338c8507de197570f4879019e0e1fc8e1f77025e
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Mon Jul 10 21:46:56 2023 -0500
Fix IndexError when querying a list/set/map column.
commit 1b4d6de7e0a051b95c0a9b62c8bc167e23ec65ed
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sun Jul 9 21:03:51 2023 +0200
add test that fails before decompose fix and passes after
commit 664911ac7de336d3fcffd90083e6a220069bd4e2
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sun Jul 9 13:44:28 2023 +0200
re-use TypeUtil.decomposeVector
commit e17493c95ac247ca1da7caa3ccd58be866958d6a
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sun Jul 9 13:28:56 2023 +0200
fix casting in deserialize
commit d169ff68f6576ce5b150818cb0a52a12abf6f2e0
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sun Jul 9 13:30:02 2023 +0200
fix NPE when using ReadExecutionController.empty
commit aa899f55f77df5b157e3e6afa930c377e4cacc57
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Fri Jul 7 10:12:28 2023 +0800
assert no single partition trace for SAI request
commit 277251e1aad15e2a5aedaaaae3170d574d224fe5
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 29 12:08:05 2023 -0500
- add Tracing.traceSinglePartitions to avoid cluttering range and index queries with noise
- update QueryViewBuilder to include a count of sstables per index accessed
commit 4682a417fdaaefb135e57b0ebe174c1ff49b7e94
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jul 6 12:13:20 2023 -0500
simplify
commit 52d19a1ef5605aebdb87385786dd323057267164
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Wed Aug 30 14:17:12 2023 -0500
Squashed commit of the following:
commit 4c4c1ff802c7667e1289a4e730cc34bf6b3bd007
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jul 5 12:03:22 2023 -0500
add test for creating index after data is added
commit 0acaae364c19ed7556b66b5576762ecc131af5fd
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Wed Jul 5 09:23:01 2023 -0500
Revert "enable segment compaction by default for vector indexes -- we should prioritize read performance"
This reverts commit a4fa0728337ebf236ad03f6ad708cf51c3dc08c5.
commit e3ddb6426de4e89b5fe61ed742b69257865c2d3c
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 28 10:54:49 2023 -0500
VECTOR-64 set sstable_growth to 0.5 by default
commit 3ca75046e6802ce5836a654459b01948c83d1d7b
Merge: a4fa072833 10a2a31ae8
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 28 10:54:18 2023 -0500
merge ds-trunk
commit a4fa0728337ebf236ad03f6ad708cf51c3dc08c5
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 28 10:24:41 2023 -0500
enable segment compaction by default for vector indexes -- we should prioritize read performance
commit 08153b38549d700505d064c25d7a2070c2ff07aa
Merge: 6a063b4f84 1a45fdc5f9
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 28 10:18:38 2023 -0500
Merge branch 'vsearch' of github.com:datastax/cassandra into vsearch
commit 6a063b4f8458f119d4b99bd6b78cbc8832f27f13
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 28 10:13:51 2023 -0500
VECTOR-63 add cassandra.sai.hnsw.allow_custom_parameters defaulting to false to control setting maximum_node_connections and construction_beam_width, which can cause memory usage to explode if used incorrectly
commit 1a45fdc5f92e9c82dba6e9e56a5e341806070b13
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Mon Jun 26 15:57:10 2023 -0500
Add the option to configure the file_cache_size_in_mb in a -D like it is in 6.8-cndb.
commit d2e45dab5b57f20fbdd6250d6dc186c83d783ed3
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 22 16:13:00 2023 -0500
VECTOR-62 don't flush a graph that only contains deleted vectors
commit 4a9afe205c20b1f863e89d4e02138557d225c3fc
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 22 16:11:11 2023 -0500
r/m obsolete comment
commit a925dc22696bcbebcad17a460441e8b84ff97660
Merge: 7c7d160dd9 2b22984fe1
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Thu Jun 22 09:03:08 2023 -0500
Merge remote-tracking branch 'datastax/ds-trunk' into vsearch
commit 7c7d160dd90624b0e365287ef0a2f0b7ef94da53
Merge: a14b387372 7a5e374cea
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 20 21:42:51 2023 -0500
Merge branch 'vsearch' of github.com:datastax/cassandra into vsearch
commit a14b387372ef3f220ece68b019acaafaf0d56f42
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 20 17:52:13 2023 -0500
move new options to jvm17-server.options
commit dc67582585a90628e026a693497793b7d8a298ed
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 20 17:47:27 2023 -0500
copy jvm11 options to jvm17
commit b6717410f925486a858c4c9414e417eed50950d7
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 20 17:30:48 2023 -0500
enable simd
commit 92b9e6e2f136254fb1f8e50732809ea529617f48
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 19 15:31:09 2023 -0500
make it run under jdk 20
commit 3e985953637819ac8504955552ce0feba29bdd40
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 20 17:13:56 2023 -0500
update to lucene with simd
commit 7a5e374cea785678670444121ad36be6170dd59b
Merge: 5e5ae4de82 e9ddd5f0d9
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Tue Jun 20 15:00:24 2023 -0500
Merge remote-tracking branch 'datastax/ds-trunk' into vsearch
commit 5e5ae4de824a7e838bbdfdbabc92e8b256911e3f
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 19 10:57:18 2023 -0500
we weren't calculating length correctly (supposed to match ordinals range) but it didn't matter b/c length() is not called on the search path where these classes are used. make this explicit by throwing in length implementation.
commit c0d4055a0bc6914d0220fbd4912d6355e9f6a1d9
Merge: f2c9a7cb59 c605f8c9f0
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 19 10:20:02 2023 -0500
Merge branch 'vsearch' of github.com:datastax/cassandra into vsearch
commit f2c9a7cb59b113a349992011b739079106274c1e
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 19 10:19:34 2023 -0500
VECTOR-61 remove deletedOrdinals collection that was not threadsafe wrt add and remove operations; create postingsByOrdinal map that allows Bits operations to look up the postings, instead
commit 9744a13b405cbeb1b68894bb08cc24796aedd8f7
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 19 09:13:24 2023 -0500
add LVT.testMultiplePostings (works fine)
commit c605f8c9f037e724544b21fe2aadcf0d8503d86f
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Sat Jun 17 12:44:36 2023 +0800
skip replica-filtering-protection for ANN requests
commit 71935bd07c9218ba70f1ffde7b6db1e88009e529
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 16 20:20:06 2023 -0500
forgot that maxBruteForceRows has to stay non-final for Byteman
commit b264afc07727305ce88ab2f18b8a2643d2510bd3
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 16 20:10:19 2023 -0500
VECTOR-54
- validateIndexable also checks for very large vectors (that explode to NaN when we take the cosine)
- check query vectors with the same criteria as vectors to index
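A rough sketch of such a check, assuming the validation works on the squared norm computed in `float` arithmetic. `VectorValidation` and `validateForCosine` are hypothetical names, and `IllegalArgumentException` stands in for the `InvalidRequestException` the real code returns to the client:

```java
public class VectorValidation {
    /**
     * Rejects vectors that cannot be compared by cosine similarity:
     * zero vectors (norm 0) and vectors whose squared norm is not finite.
     */
    public static void validateForCosine(float[] v) {
        float sumOfSquares = 0f;
        for (float x : v)
            sumOfSquares += x * x; // overflows to Infinity for very large components
        if (sumOfSquares == 0f)
            throw new IllegalArgumentException("Zero vectors cannot be used with cosine similarity");
        if (!Float.isFinite(sumOfSquares))
            throw new IllegalArgumentException("Vector is too large for cosine similarity");
    }
}
```

Applying the same method to both indexed vectors and query vectors keeps the two code paths consistent.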
commit 66c31a6dcfcbf81b7657ac9c4119803e026d150a
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 16 20:07:41 2023 -0500
if CFS.apply catches an InvalidRequestException, keep the same type when we re-throw so it gets back to the client as intended
commit e0bdfdbe421a1cca30746604cda9ef5b992002bb
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 16 17:33:37 2023 -0500
add [failing] emptyIndexTest
commit 75cba96f90730b32ea4a39375c72aaff4ffd6221
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 15 16:52:46 2023 -0500
re-use bitsets across searches
commit 2af2ce8a321f7fea268d81c393da7f8f6d886776
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 16 17:05:52 2023 -0500
more asserts that graph is in sane state when we write it
commit a63492686d13e12b799acdf50c6f0c20dd680f21
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 16 17:13:23 2023 -0500
update lucene
commit 4c6b43e3606b8c9eb5e4ec4cf0817dc9b061c555
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 16 16:24:33 2023 -0500
add debugging information when node is not found on level
commit 81c8e730bd4ca314a81bc13e30b197603a8eb005
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 16 11:57:24 2023 -0500
add similarityWithAnn test
commit 87c8da9a5dddd10215a84c07a09dbaaecaa522b8
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 14 18:26:02 2023 -0500
add tests to confirm that zero-length vectors are rejected
commit 2510c1b987d846821e7095fbc2c368fa51d9a15d
Merge: a3d9e33554 e86f91c568
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 12 17:09:22 2023 -0500
Merge branch 'vsearch' of github.com:datastax/cassandra into vsearch
commit a3d9e3355470f6400328cd7ac1eb07ff06f8e00b
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 12 17:08:43 2023 -0500
use ArrayList in CVV instead of HashMap since we know the keys are consecutive
commit e86f91c568f8a82e83e61bd0dd768478a7584cfc
Merge: 77c1bc11f3 8a7a6d9c4f
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Mon Jun 12 09:25:06 2023 -0500
Merge remote-tracking branch 'datastax/ds-trunk' into vsearch
commit 77c1bc11f31e4ff001c7eff1e6fe50812e136a45
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Mon Jun 12 09:24:56 2023 -0500
Revert "Revert this after CNDB-6974 is fixed."
This reverts commit 8e04280312583a18085e7e7b9d31810790b039f4.
commit cae30e9879c463d80de2a2b7e4a642979803c13a
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Mon Jun 12 09:23:39 2023 -0500
Revert "Revert this after CNDB-6974 is fixed."
This reverts commit 67a2b7eba3c957a993726d37706643979e92a3a3.
commit 46d1a45de84f40311f193beba3ca5fc3fe7450d7
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 12 09:03:03 2023 -0500
replace our TODO entries with VSTODO to make them easier to find
commit 415b3f29e8e7f1526e28655308a1937a1175e575
Merge: 2fd8b848be 481d29721a
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 12 09:02:16 2023 -0500
Merge branch 'vsearch' of github.com:datastax/cassandra into vsearch
commit 481d29721a7993e9babd44f792be55aca925d41a
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Mon Jun 12 21:26:43 2023 +0800
Fix VectorMemtableIndex to handle max token and fix Segment#intersects (#670)
* Fix VectorMemtableIndex to handle max token and min/max bound
* Fix Segment#intersects to compare bound instead of token and add tests for range search
* make brute force rows per query for VectorMemtableIndex
* apply feedback on Segment#intersects
* add comments to VectorMemtableIndex#search
* Fix SegmentTest
commit 2fd8b848be82e62999e64b57cb9800d03de8e953
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 12 07:51:18 2023 -0500
clean up a couple REVIEWME
commit 3c066e07be6188c3f48c51c02cfc99b395e2a6bc
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Sun Jun 11 14:26:42 2023 +0800
fix flaky VectorDistributedTest
commit 6c719a752e90e031916dddf6cc7b7db23627bc38
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 10 10:48:01 2023 -0500
fix use of bruteForceRows -- should be larger of limit,maxBruteForceRows
commit 0a2279d7f010bcb68053bb14ea2974295b21d1c4
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 10 10:46:34 2023 -0500
use maxBruteForceRows when deciding whether to skip ANN
commit 1859f7355231ce7972e3d4b7af70e5d2967da516
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 10 10:27:03 2023 -0500
simplify partitionKeySearchTest using euclidean distance
commit 4f40e1c2d9b1c30675eeaee7d800a8113cce97c0
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 10 10:09:24 2023 -0500
typo
commit 7dbe85ac9a248aa6a0985839df1f40ceb2c08e6c
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 10 10:08:23 2023 -0500
rename methods that return Bits but had bitset in their names
commit 2861bfdba8f2638a1e3f9e81df9cad3878d0e544
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 10 10:06:51 2023 -0500
simplify skipANN
commit fbfe4bcdd03a66f4dedfa91ce9d06206f89827cd
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Sat Jun 10 16:36:56 2023 +0800
optimize VectorIndexSearcher#searchPosting
- return empty posting if key range is not found in current sstable
- return empty posting if all row ids are shadowed
- skip ANN if matching row ids are less than limit
commit 168911b70888a3dda0ddf4f615af1b7ff54f43e3
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Sat Jun 10 20:54:41 2023 +0800
fix data visibility issue during flush: make sure SSTableAddedNotification is sent before MemtableDiscardedNotification
commit c5dd47e3ce1c7276703fa3f89a9932a3b2cd43b7
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Sat Jun 10 12:31:53 2023 +0800
fix primaryKeySearchTest and partitionKeySearchTest to use correct selector and expected results
commit 2e636529c63d0d726868d6df9b3e11d15d3871d8
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Sun Jun 11 08:31:39 2023 +0800
Vector-48: index#update is not triggered by partition/range deletion,… (#665)
* Vector-48: index#update is not triggered by partition/range deletion, we have to fix VectorMemtableIndex to include shadowed primary keys
- during flush, append ordinals that are removed by partition/range deletion into deletedOrdinals set
* add comments to VP
* revert redundant variables
* simplify
---------
Co-authored-by: Jonathan Ellis <jbellis@datastax.com>
commit daa2623b880e4b82d99204fe718db22e52ae3b42
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 10 13:54:38 2023 -0500
use mmapped builder in OnDiskHnswGraphTest
commit 553778a22cfa8757daf3b3ec0e2cfa6fc7ba29cf
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 10 08:34:41 2023 -0500
fix logic in partitionKeySearchTest, test still fails
commit a7f5a2f824085691da80ef62e37a430e5637c31c
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 17:57:34 2023 -0500
ignore invalid vectors during build against existing data, instead of failing the build
commit 2af203646a2e57fca0a2521c31ba4269b0d8f326
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 17:43:47 2023 -0500
switch from ignoring zero vectors to throwing IRE
commit 4c98fbb447a275cbf557ae0adf8f4f8a30fba17f
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 17:15:57 2023 -0500
Revert "if there are no rows in the range to search, return EmptyPostingsList instead of giving the hnsw search an empty Bits (which is a pathological case for the search)"
This reverts commit 6e5eab787a3add985dc30eca82c5dee2646f5564.
commit da99598a5d11dfa4ecac3b1eeeb9d19a4bad21e5
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 17:09:30 2023 -0500
don't attempt to add zero vector to cosine indexes
commit a77792447d33600f69dd0a75a280a2b6dac51389
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 16:43:34 2023 -0500
add failing partitionKeySearchTest
commit b061dd1c77f30c67324da4f94ea8b5a22d50fba7
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 10:52:08 2023 -0500
inline the test ops
commit 6e5eab787a3add985dc30eca82c5dee2646f5564
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 10:51:09 2023 -0500
if there are no rows in the range to search, return EmptyPostingsList instead of giving the hnsw search an empty Bits (which is a pathological case for the search)
commit 518c055dfeabacf0fa0863e40bf814058f2ad981
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 10:28:57 2023 -0500
cleanup
commit e922b03ffe5436dc1341dbe20ee7aaca9901058f
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 10:27:53 2023 -0500
failing tests for primary key search
commit 428e4b713e25a5820d02ca790c30e9f1477c0f44
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 10:21:59 2023 -0500
cleanup
commit 5319bcdf3b694e861509b2a9f549f39bf51cd253
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 10:20:02 2023 -0500
move testInvalidColumnNameWithAnn to VectorInvalidQueryTest
commit d8b64845e59c0bd3658a16df21dcc75564417e3b
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 10:12:40 2023 -0500
upgrade lucene to reduce Integer boxing on build path
commit a62356e49c0a32f61732b8525d2bb655f46dd767
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 10:11:55 2023 -0500
cleanup
commit b754eb7f021d10d0ae8de83ebdf2ce2f19ce59d8
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 08:29:34 2023 -0500
replace CHM with NBHMLong to avoid boxing
commit 53d232ad2bcddefd72b10542f703cf36728c05f5
Author: Piotr Kołaczkowski <pkolaczk@datastax.com>
Date: Fri Jun 9 12:17:46 2023 +0200
STAR-550 Handle SAI AbortedOperationException
AbortedOperationException is thrown by SAI when index search
hits a timeout. Now instead of allowing this exception to
bubble up to the top and be eventually logged as error, we catch
and swallow it in the InboundSink after creating the query response.
Additionally, we now also set a proper error code (TIMEOUT)
in the response, so the client has a hint about what happened.
commit 5e0c5eb3c54b58002162ce3e39d0891643e1f663
Author: Matt Fleming <matt@codeblueprint.co.uk>
Date: Fri Jun 9 08:44:10 2023 +0100
CNDB-7037: Check that memtable exists before flushing and avoid IOOBE (#664)
For offline services such as the compactor it's possible to not have
live memtables. Account for this during flushing after removing indexes to
avoid triggering an IndexOutOfBoundsException:
Error happened while updating the schema
java.lang.IndexOutOfBoundsException: Index: -1
at java.base/java.util.Collections$EmptyList.get(Collections.java:4483)
at org.apache.cassandra.db.lifecycle.View.getCurrentMemtable(View.java:106)
at org.apache.cassandra.db.ColumnFamilyStore.forceFlush(ColumnFamilyStore.java:1115)
at org.apache.cassandra.db.SystemKeyspace.forceBlockingFlush(SystemKeyspace.java:552)
at org.apache.cassandra.db.SystemKeyspace.setIndexRemoved(SystemKeyspace.java:576)
at org.apache.cassandra.index.SecondaryIndexManager.markIndexRemoved(SecondaryIndexManager.java:837)
at org.apache.cassandra.index.SecondaryIndexManager.removeIndex(SecondaryIndexManager.java:420)
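A minimal sketch of the guard, assuming the current memtable is the last element of a list of live memtables, which is what the `Index: -1` on an empty list in the trace above suggests. `FlushGuard` and `currentMemtable` are hypothetical names, not the actual `View` or `ColumnFamilyStore` API:

```java
import java.util.List;
import java.util.Optional;

public class FlushGuard {
    /** Returns the newest memtable, or empty when there are none (e.g. in an offline service). */
    public static <T> Optional<T> currentMemtable(List<T> liveMemtables) {
        if (liveMemtables.isEmpty())
            return Optional.empty(); // avoids liveMemtables.get(-1)
        return Optional.of(liveMemtables.get(liveMemtables.size() - 1));
    }
}
```

A caller would then skip the flush when the result is empty instead of indexing into the list unconditionally.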
commit c241f7d41ee46c28f60d691611f32e071dac6684
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 8 19:19:09 2023 -0500
read neighbors using .intBuffer since we know that the HNSW searcher always reads all the neighbors
commit 32f021a4f0623e57d82d70c54350eab6f0f57db7
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 8 18:08:30 2023 -0500
VECTOR-51 leave vectors as ByteBuffers during compaction to avoid 2x memory usage incurred by keeping them around as float[] as well
commit 32d8cbbd19301228530d3d84f7a7e4120b4d4422
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 8 17:37:08 2023 -0500
r/m node cache from query metrics since there is nothing the operator can act on there
commit 890df8e6c36c5c9e1314dbcc0a681415401120dd
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 8 14:15:25 2023 -0500
add more information to exception when reading row offsets goes wrong
commit 5c7fe0cfed1edb10f81b134f8949dd1b5abb2781
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 8 13:26:17 2023 -0500
revise ordinals cache as follows:
// cache full levels including neighbors up to neighborsRamBudget, starting with the top level,
// but always cache all levels above the bottom two levels -- this will be ~1% of the graph.
// then on L1, cache at least the offsets
// L0 we do not cache since we only need one extra seek (no bsearch) to read the neighbors offset
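One possible reading of this policy as code. `OrdinalsCachePlan`, `Mode`, and the per-level byte costs are hypothetical; only `neighborsRamBudget` and the level rules come from the comment above:

```java
public class OrdinalsCachePlan {
    public enum Mode { FULL, OFFSETS_ONLY, NONE }

    /**
     * @param levelBytes levelBytes[i] is the assumed cost of fully caching level i (0 = bottom)
     * @param neighborsRamBudget bytes available for caching neighbors
     * @return the cache mode chosen for each level
     */
    public static Mode[] plan(long[] levelBytes, long neighborsRamBudget) {
        Mode[] modes = new Mode[levelBytes.length];
        long used = 0;
        // Walk from the top level down so the small upper levels claim budget first.
        for (int level = levelBytes.length - 1; level >= 0; level--) {
            if (level == 0) {
                modes[level] = Mode.NONE; // L0 is never cached
            } else if (level >= 2 || used + levelBytes[level] <= neighborsRamBudget) {
                modes[level] = Mode.FULL; // levels above the bottom two are always cached
                used += levelBytes[level];
            } else {
                modes[level] = Mode.OFFSETS_ONLY; // L1 keeps at least its offsets
            }
        }
        return modes;
    }
}
```

In this sketch only L1 can fall back to `OFFSETS_ONLY`, since every level above it is cached regardless of the budget.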
commit e131014c5cba59624e0ed9fac142abe5340e2fc0
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 8 13:53:10 2023 -0500
add deletes test
commit 7c5a880f007535d9f03b7c7d9d3e84dcc79ef3d7
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 8 13:34:42 2023 -0500
must use graph.size for bitset size -- not correct to use max rowid, since deleted ordinals will not have a rowid but will still be in the graph
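A small sketch of why the size matters. A plain `boolean[]` stands in for the fixed-size bitset, and the class name, the ordinals, and the row ids are all made up for illustration. The point is only that ordinals are bounded by the graph size, not by the largest surviving row id.

```java
final class AcceptBitsSizing {
    /** Marks the given live ordinals in an array with one slot per requested size. */
    static boolean[] liveBits(int size, int[] liveOrdinals) {
        boolean[] bits = new boolean[size];
        for (int ordinal : liveOrdinals) {
            bits[ordinal] = true; // throws if the array was sized too small
        }
        return bits;
    }

    public static void main(String[] args) {
        int graphSize = 4;           // ordinals 0..3 are all still present in the graph
        int[] liveOrdinals = {0, 3}; // ordinals 1 and 2 were deleted and have no row id
        int maxRowId = 1;            // assumed: the two surviving rows have row ids 0 and 1

        // Sized by graph size: every ordinal has a slot.
        boolean[] bits = liveBits(graphSize, liveOrdinals);
        System.out.println("ordinal 3 accepted: " + bits[3]);

        // Sized by max row id: ordinal 3 does not fit.
        try {
            liveBits(maxRowId + 1, liveOrdinals);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("sizing by max row id is too small");
        }
    }
}
```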
commit 0654fbb61aec86deae69edf83fb1cf653ed7df65
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 8 08:44:28 2023 -0500
info -> debug for validatePerIndexComponents
commit b289db2362b928f29b6c5c6289e09b33f0be1e88
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 7 18:08:18 2023 -0500
update lucene
commit 347dae644b0d395014d8faf2affd48a6b2546e96
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 7 18:23:41 2023 -0500
undo burntest logging config change
commit 87a6a84f38d0a0423acd9a0dda67d386876681b5
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 7 15:20:06 2023 -0500
add debug logging
commit 20013bc865c7fb2c34111c98362aaf997dc724dc
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 7 14:50:39 2023 -0500
add a bit more information to exception when we fail in index construction from disk
commit 86389ceae8188b8e427d8a528fada2f913f1f633
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 7 12:25:31 2023 -0500
reduce test vector count from "all of them" to 200
commit 20177a1c84ba34babc82d8c1aa4e3436321b5d9f
Merge: 2e2fce09ee f2c697ac46
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 7 12:21:20 2023 -0500
Merge branch 'wip' into vsearch
commit f2c697ac46348301260c5947ade4ebed7dee90ee
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 7 12:16:28 2023 -0500
multipleSegmentsMultiplePostingsTest
commit 59e4a1c53e1c19a7ebba4a0834229e61f9e4d3ce
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 7 12:16:04 2023 -0500
cleanup
commit 2e2fce09eeb5bef4780cecf7c7b5e492a810f67a
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Wed Jun 7 23:33:00 2023 +0800
fix OnDiskOrdinalsMap to seek to segment offset before reading (#661)
commit 9502f1040229f2be6619f7e4497dc25ca5126b39
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 7 10:04:38 2023 -0500
fix build
commit 3535444b57366fe705e0fd8c1ca095e60ed2a706
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 7 09:19:41 2023 -0500
add write-only workload
commit a112b0965cb298f293b6e384bafc92a1d5a1fb8f
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Wed Jun 7 21:15:53 2023 +0800
VECTOR-44: improve in-memory partition-restricted query perf (#660)
* VECTOR-44: improve in-memory partition-restricted query perf
- using post-filter top-k processor instead of ANN: 14x improvement on partition-restricted query in LongVectorTest
commit cf1ec3be50ddbe40ae99dfc34eea51b9fc5c2648
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Wed Jun 7 21:06:37 2023 +0800
Vector-45: fix bitsetForShadowedPrimaryKeys to skip shadowed primary keys outside of current sstable/segment (#659)
* VECTOR-45: fix bitsetForShadowedPrimaryKeys to skip shadowed primary keys outside of current sstable/segment
* return empty iterator if results are empty instead of ReorderingRangeIterator
commit 87217b4f49935fbbd13c73f4d0aadb84836696a7
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 7 07:31:46 2023 -0500
don't make areL0ShardsEnabled final, it breaks mocks
commit 2bb3a299df43561369544b662674872575975b7c
Author: Matt Fleming <matt@codeblueprint.co.uk>
Date: Wed Jun 7 10:56:08 2023 +0100
VECTOR-42: Check columns exist to avoid NPE during ANN expr binding (#657)
Using a non-existent column with an ANN expression triggers an NPE like
so,
java.lang.NullPointerException: null
at org.apache.cassandra.cql3.ArrayLiteral.forReceiver(ArrayLiteral.java:43)
at org.apache.cassandra.cql3.ArrayLiteral.prepare(ArrayLiteral.java:54)
at org.apache.cassandra.cql3.Ordering$Raw$Ann.bind(Ordering.java:178)
at org.apache.cassandra.cql3.Ordering$Raw.bind(Ordering.java:139)
Use TM.getExistingColumn() which throws InvalidRequestException if the
column is undefined.
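The pattern behind this fix, sketched with stand-in types. A `Map` plays the role of the table metadata and `IllegalArgumentException` stands in for `InvalidRequestException`. The class name and the error text are hypothetical and not the real Cassandra API.

```java
import java.util.Map;

final class ColumnLookupSketch {
    /** Nullable lookup: callers that forget the null check hit an NPE later, far from the cause. */
    static String getColumn(Map<String, String> columns, String name) {
        return columns.get(name);
    }

    /** Fail-fast lookup: an undefined column is reported immediately with a clear message. */
    static String getExistingColumn(Map<String, String> columns, String name) {
        String type = columns.get(name);
        if (type == null) {
            throw new IllegalArgumentException("Undefined column name " + name);
        }
        return type;
    }
}
```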
commit f29d4528fc43f34889e560acafa810eee1b88ba9
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 21:18:47 2023 -0500
restore query timeouts
commit 6e5734e52b41ab1e355d202ad0433440828fbb75
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 19:08:52 2023 -0500
looking at performance over time in LongVectorTest
commit f96af8f0b9185163a49156bdab107801d2588bf0
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 16:00:23 2023 -0500
log level = info for burn tests
commit 1247cf372927cc185be5c9767a06fdd15f102dbe
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 17:48:00 2023 -0500
write the in-memory deleted ordinals to disk at the start of the postings component
commit e319d71bd6b0d265f616c807dcaf4a9d9fc38aef
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 16:36:27 2023 -0500
don't allocate unnecessary objects on the happy path of no tombstones
commit 6d146a8460a17d146de756aab2ccbbf2abdb967a
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 16:19:56 2023 -0500
VECTOR-23: force UCS to not shard L0
commit 79d6e177093312bc1b951d6740ad45ce8a6c5875
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 16:01:05 2023 -0500
mark AutoCloseable
commit 5ec2f7d7f792bfa2dcf582ea4eed3fa8745fdaff
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 5 14:37:15 2023 -0500
Support string literals as vectors
Co-authored-by: Andrés de la Peña <a.penya.garcia@gmail.com>
commit 73a53c37536835e462e3cf17c452027c7aa591ed
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 12:25:39 2023 -0500
(take 3, this time with the code changes) fix race conditions across concurrent inserts + searches in memtable
commit 51f4f419d31916a81088f171b158f1e002b5c800
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 12:25:39 2023 -0500
(take 2) fix race conditions across concurrent inserts + searches in memtable
commit 94097a1f7cf1831c03e52b1f539bdc2e42fa038b
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 12:35:32 2023 -0500
Revert "fix race conditions across concurrent inserts + searches in memtable"
This reverts commit c4d41b492190fb2644f83bf4902acf71d1e4f891.
commit c4d41b492190fb2644f83bf4902acf71d1e4f891
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 12:25:39 2023 -0500
fix race conditions across concurrent inserts + searches in memtable
commit 22143a8c5db160ef0fa7687ff3e8cb227de160f0
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 10:28:30 2023 -0500
fix AOOB in bruteForceRows logic
commit b5624f62c88378e03260399440edf4d335ccc3af
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 10:12:53 2023 -0500
call deserializeFloatArray (which gives float[]) instead of deserialize (which gives List<Float>)
commit 13ad40718737b54652c875d81ec20992b5828777
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 08:36:32 2023 -0500
create CassandraHnswGraphBuilder with concurrent and serial implementations, so that single-threaded compaction doesn't have to pay the concurrency overhead
commit b83f4e4007e3d597963023b8ae32f4d0934b792d
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 08:00:44 2023 -0500
add ASL header to injections.md to make CI happy
commit 16074336b1322d4bade40eb71562ca50221399dc
Merge: 227c6c13a3 1ff6992354
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 07:48:04 2023 -0500
Merge branch 'VECTOR-37' into vsearch
commit 227c6c13a3cfe42484fc560d629d54a93f8265e0
Author: Jaroslaw Grabowski <jaroslaw.grabowski@datastax.com>
Date: Tue Jun 6 13:02:37 2023 +0200
CNDB-7007 return expired tables level from getLevels
commit 312d07de8d297cdede39017354b678f2fb1b1006
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 07:31:26 2023 -0500
randomize our brute force threshold, which will get the actual index scans exercised more
commit 0b897961869965629893c94250b7d4b1fb7c0f86
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Tue Jun 6 13:44:01 2023 +0800
Reference ANN sstable indexes in case of ann hybrid search (#653)
* Reference ANN sstable indexes in case of ann hybrid search
* simplify
---------
Co-authored-by: Jonathan Ellis <jbellis@datastax.com>
commit 1ff6992354e8411f9f7c48a76f0aadb4a00f2789
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 5 21:55:14 2023 -0500
per-query hnsw metrics
commit d3d6ae107737020eb0ca3b00c117d5bc84027dc0
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 5 21:16:14 2023 -0500
reduce test size to prevent Jenkins OOM
commit 7765ac73fd29c9664b6f8e946c067d6f1b0a9a03
Merge: 5a04dbcf67 c5cd09cebf
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 5 21:15:44 2023 -0500
Merge branch 'vsearch' of github.com:datastax/cassandra into vsearch
commit c5cd09cebf0c63f7bafcd2f4548395efe68b12cd
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Tue Jun 6 09:38:42 2023 +0800
Skip warning log about receiving a range that is not owned by the current replica, see VECTOR-30
commit 5a04dbcf6738d7681732e28518d6eb37f74357b8
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 5 11:21:15 2023 -0500
add injections.md from bdp repo
commit baf9cce301a39f9506bd60c5a75d2011150f2fea
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 5 10:02:02 2023 -0500
VECTOR-36 fix looking up indexes and SSTableContext by SSTableReader, because the SSTR will be removed from internal structures once it's compacted
commit e7fdc8454e750e2ee886c4ab27fb09abb46303fa
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 5 09:53:51 2023 -0500
limitToTopResults should skip rows that are not in the current segment (or more precisely, have no vectors that were indexed)
commit 2f64695a90651d9e76eee0a3399032ea241c983c
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Mon Jun 5 14:55:20 2023 +0800
Vector 6 take 2: reapply Vector-6 commit and fix NPE in SelectStatement (#648)
* Revert "Revert VECTOR-6"
This reverts commit 63a6ff609a02e4f7d9f5499f1af48ca5e58f63a8.
* Vector-6 take 2:
- fix NPE in SelectStatement by skipping reversed() for null ColumnComparator
- fix getOrderingColumns() to return LinkedHashMap to preserve ordering columns' order
- fix VectorLocalTest compilation
- remove debug log in CassandraOnHeapHnsw
commit 9da51b315d207417de4f3e005616e275c62c74cb
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Mon Jun 5 12:38:21 2023 +0800
fix SAI test failures in vsearch branch (#649)
- BatchlogEndpointFilterTest
- rangeRestrictedTest
- SegmentFlushTest
- SegmentMergerTest
- OperationTest
- RangeIntersectionIteratorTest
commit dc53d7db5f0f8fe1c780f6d2b7580794f7b6b5fb
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 3 13:35:11 2023 -0500
create views for ordinals map so we don't have to open a new Reader for each method call
commit c6385022ff2949c23e0dd0161a043cf4061f050c
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 3 08:19:32 2023 -0500
add assert sstableContext != null
commit f340e7590db578b71201326e82006f72dd491047
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 3 07:37:36 2023 -0500
on-disk Searcher bits should not need to be growable
commit fd5cc7f989a47d03689dd29495b974697ad1338b
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 3 07:37:18 2023 -0500
add asserts
commit 6436e5f4ad49881a62c1442692edfedfe31968dc
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 3 07:21:50 2023 -0500
cleanup
commit 32c8e005740d3c7ec0b1008091728e12260aecdd
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 2 13:57:18 2023 -0500
comment
commit 091d96832ff9a68f6ea7857eafd2ec7ee0c09a0b
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 2 13:32:24 2023 -0500
r/m unused search method returning iterators over PrimaryKey
commit 4e78a4b429b0ba6690871df9a91c8ffd8d2e53fa
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 2 11:25:30 2023 -0500
we can use the slightly more lightweight RangeConcatIterator instead of RangeUnionIterator when combining results from different Segments
commit e7a9186bf3426ad025e809481e47db85a7bf7190
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 2 11:22:35 2023 -0500
r/m obsolete FIXME
commit c76ee9ea0cab5b70c535de3f50b81cf333bf94a2
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 3 07:12:43 2023 -0500
add testAppendedGraphs
commit d00bdbfb6acf9f08d2bdfa2ef06e833e567cf9eb
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 3 06:57:37 2023 -0500
move write-to-File to test class
commit b10fd1100daf4cde0291b3b4ef06f5ceb3ca650c
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 3 06:46:01 2023 -0500
fix Ref leaks in test
commit 0656e558bb51172df4e0cde1ebe92946d4dcec8b
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 3 06:26:05 2023 -0500
fix tests to use View
commit 92797ced03ed26f9a5cce398504d6f0de519b899
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Sat Jun 3 19:19:38 2023 +0800
VECTOR-35: fix vector on-disk writer to append segments (#647)
commit 22f5c01860367586d0cc39def7d975a6d6bae784
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 2 18:54:41 2023 -0500
add "Vector indexes only support ANN queries" check
commit bfc0b056dbacd0a1eca86020e901f545dca7670e
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 2 17:34:24 2023 -0500
split invalid requests into separate test class
commit f66768c35920101382c1975c9d0ba0e3301b7c61
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 2 17:28:56 2023 -0500
add specific error message when trying to do ANN without an index
commit 63a6ff609a02e4f7d9f5499f1af48ca5e58f63a8
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 2 12:31:22 2023 -0500
Revert VECTOR-6
This reverts commits:
a37009c187edeba68389d239dc1b9f40519b1187
5565690fbe1056d4c159ddbe233fa22c7695320a
e7733bb8f858a16b082b8a5c64d0322db6f6271a
commit 3d496662b259470b505d88212074c436660c27ad
Merge: fab67bb134 4bdae7e362
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 2 11:05:53 2023 -0500
Merge branch 'vsearch' of github.com:datastax/cassandra into vsearch
commit 4bdae7e36213f5112efce085696dd24aaa0adfad
Author: Mike Adamson <madamson@datastax.com>
Date: Fri Jun 2 14:36:01 2023 +0100
Stabilise random tests using word2vec model vectors
commit 2ddc3ec6b11f6810f6c4cbaf6b62bfe46d2d1b6c
Merge: 8e04280312 1f9179002a
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Fri Jun 2 10:05:15 2023 -0500
Merge remote-tracking branch 'datastax/ds-trunk' into vsearch
commit 8e04280312583a18085e7e7b9d31810790b039f4
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Fri Jun 2 09:51:06 2023 -0500
Revert this after CNDB-6974 is fixed.
commit fab67bb13499691f813c893cd39a3bbd406653d2
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 2 08:59:12 2023 -0500
vector cache defaulting to 1MB per segment
commit 67a2b7eba3c957a993726d37706643979e92a3a3
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Thu Jun 1 17:49:35 2023 -0500
Revert this after CNDB-6974 is fixed.
commit 4538956b31d8c40d2e9b603b3b0a3392343e1853
Merge: 7518680a4a af4c83aef4
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 1 16:17:14 2023 -0500
Merge branch 'ds-trunk' into vsearch
commit 7518680a4a17cb2589cf06a9175befc10c9eab1a
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 1 16:14:33 2023 -0500
re-use float[] across calls to vectorValue to avoid allocation overhead (credit to Jake for the idea)
commit d20bd8aedeb6232358dd16f901e2708f217b8108
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 1 15:35:17 2023 -0500
optimize vectorValue() with direct access to the mmap-ed region
commit 24572025af5b7893279b5b03c58555b682f3abdb
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 1 14:33:55 2023 -0500
decomposeVector does not modify the underlying buffer, so no need to duplicate() here
commit efef35d47a24222fae300d1cbd5a801e9ed79442
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 1 14:29:30 2023 -0500
specialize deserializeFloatArray for ByteBuffer and FloatBuffer. this saves about 3% total CPU on search workloads
commit 63c27f5550d6c3e22d960b9a42b9097029e3b760
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 1 11:04:23 2023 -0500
default target size of 5GB
commit 69c55c8d07c6e1d58b4829c153f5ebf8a4343669
Author: Jonathan Ellis <jbellis@gmail.com>
Date: Thu Jun 1 07:20:38 2023 -0700
make the on-disk hnsw code threadsafe using FileHandle.createReader (#641)
commit b194cae2e56133b946d0231768254064a47f01b6
Merge: 7ee6422084 23c2891e7a
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 1 06:22:17 2023 -0500
Merge branch 'ds-trunk' into vsearch
commit 7ee642208485c9e2ae62aa662dbbddc7b295bae1
Merge: c286ec0ee2 a37009c187
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 31 20:41:48 2023 -0500
Merge branch 'VECTOR-6' into vsearch
commit c286ec0ee231ededded18e81357edebe72064c59
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 31 20:40:21 2023 -0500
add testLargeGraph
commit a37009c187edeba68389d239dc1b9f40519b1187
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Thu Jun 1 08:50:07 2023 +0800
cleanup unused code
commit 6246dbe3a970a0f4672701632003cdf576705c24
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 31 16:43:42 2023 -0500
comment
commit c7420440c7d3c81eac96e716148d570ae3cceb1d
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 31 16:43:37 2023 -0500
encapsulate shadowedPrimaryKey better
commit a1a25c33b8bec9d78354dd2c1d992444db2bf76f
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 31 15:57:52 2023 -0500
make hnsw cache size configurable, and make the default 128KB (roughly the size of the bloom filter for a 1GB sstable)
commit 55747d2351e915645697fa8dec404c749540456f
Author: Andrés de la Peña <a.penya.garcia@gmail.com>
Date: Wed May 31 19:28:47 2023 +0100
Replace FunctionParameter.sameAsFirst by FunctionParameter.sameAs, allowing type inference in both directions for vector functions
commit 5565690fbe1056d4c159ddbe233fa22c7695320a
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 31 12:59:42 2023 -0500
cleanup and fix
commit e7733bb8f858a16b082b8a5c64d0322db6f6271a
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Wed May 24 16:01:26 2023 +0800
VECTOR-6: return vector results to client in ANN order instead of token order
commit 771067d1475c94484da178236928d2bad78e00b1
Author: Mike Adamson <madamson@datastax.com>
Date: Wed May 31 18:33:55 2023 +0100
Fix annOrderingMustHaveLimit test
commit 8dd76a73541e70da0da022dd463e6711a315f971
Author: Mike Adamson <madamson@datastax.com>
Date: Wed May 31 18:30:49 2023 +0100
Improve the validation of ORDER BY <column> ANN OF (#639)
* Improve the validation of ORDER BY <column> ANN OF
* Change to hasNonClusteredOrdering and improve limit message
commit 1ef6f2e3418db19af23f0eae24cd344d72290455
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri May 26 18:34:25 2023 -0500
add vector similarity functions
commit f4e3a39a8ece623647ba7891981e7c5ee10ed96b
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 31 09:39:45 2023 -0500
pull in the smallest set of changes possible from 93e0ae9a to get FunctionParameter.sameAsFirst
commit 0ef2614346c9426a966004df220221f74353de70
Author: Jonathan Ellis <jbellis@gmail.com>
Date: Wed May 31 06:22:41 2023 -0700
Add support for updates + deletes (#636)
commit 5da9fefe1e18f733bb8ffd0a35ef2e1ac3cbd975
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue May 30 16:32:02 2023 -0500
rename SegmentOrdering.reorderOneComponent to limitToTopResults to align with MemtableOrdering
commit caf6eb2154ddf57c88d69c6246404b65e183057d
Author: Mike Adamson <madamson@datastax.com>
Date: Tue May 30 22:41:29 2023 +0100
Fix V1SearchableIndex.reorderOneComponent to use segments (#635)
commit 72cfa94e9389d0078d64aee60d26b8b01c6ca22f
Author: Piotr Kołaczkowski <pkolaczk@datastax.com>
Date: Mon May 29 20:17:44 2023 +0200
VECTOR-14: Move ANN OF expressions from WHERE to ORDER BY
Before:
SELEC…
commit 4c4c1ff802c7667e1289a4e730cc34bf6b3bd007
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jul 5 12:03:22 2023 -0500
add test for creating index after data is added
commit 0acaae364c19ed7556b66b5576762ecc131af5fd
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Wed Jul 5 09:23:01 2023 -0500
Revert "enable segment compaction by default for vector indexes -- we should prioritize read performance"
This reverts commit a4fa0728337ebf236ad03f6ad708cf51c3dc08c5.
commit e3ddb6426de4e89b5fe61ed742b69257865c2d3c
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 28 10:54:49 2023 -0500
VECTOR-64 set sstable_growth to 0.5 by default
commit 3ca75046e6802ce5836a654459b01948c83d1d7b
Merge: a4fa072833 10a2a31ae8
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 28 10:54:18 2023 -0500
merge ds-trunk
commit a4fa0728337ebf236ad03f6ad708cf51c3dc08c5
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 28 10:24:41 2023 -0500
enable segment compaction by default for vector indexes -- we should prioritize read performance
commit 08153b38549d700505d064c25d7a2070c2ff07aa
Merge: 6a063b4f84 1a45fdc5f9
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 28 10:18:38 2023 -0500
Merge branch 'vsearch' of github.com:datastax/cassandra into vsearch
commit 6a063b4f8458f119d4b99bd6b78cbc8832f27f13
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 28 10:13:51 2023 -0500
VECTOR-63 add cassandra.sai.hnsw.allow_custom_parameters defaulting to false to control setting maximum_node_connections and construction_beam_width, which can cause memory usage to explode if used incorrectly
commit 1a45fdc5f92e9c82dba6e9e56a5e341806070b13
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Mon Jun 26 15:57:10 2023 -0500
Add the option to configure the file_cache_size_in_mb in a -D like it is in 6.8-cndb.
commit d2e45dab5b57f20fbdd6250d6dc186c83d783ed3
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 22 16:13:00 2023 -0500
VECTOR-62 don't flush a graph that only contains deleted vectors
commit 4a9afe205c20b1f863e89d4e02138557d225c3fc
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 22 16:11:11 2023 -0500
r/m obsolete comment
commit a925dc22696bcbebcad17a460441e8b84ff97660
Merge: 7c7d160dd9 2b22984fe1
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Thu Jun 22 09:03:08 2023 -0500
Merge remote-tracking branch 'datastax/ds-trunk' into vsearch
commit 7c7d160dd90624b0e365287ef0a2f0b7ef94da53
Merge: a14b387372 7a5e374cea
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 20 21:42:51 2023 -0500
Merge branch 'vsearch' of github.com:datastax/cassandra into vsearch
commit a14b387372ef3f220ece68b019acaafaf0d56f42
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 20 17:52:13 2023 -0500
move new options to jvm17-server.options
commit dc67582585a90628e026a693497793b7d8a298ed
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 20 17:47:27 2023 -0500
copy jvm11 options to jvm17
commit b6717410f925486a858c4c9414e417eed50950d7
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 20 17:30:48 2023 -0500
enable simd
commit 92b9e6e2f136254fb1f8e50732809ea529617f48
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 19 15:31:09 2023 -0500
make it run under jdk 20
commit 3e985953637819ac8504955552ce0feba29bdd40
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 20 17:13:56 2023 -0500
update to lucene with simd
commit 7a5e374cea785678670444121ad36be6170dd59b
Merge: 5e5ae4de82 e9ddd5f0d9
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Tue Jun 20 15:00:24 2023 -0500
Merge remote-tracking branch 'datastax/ds-trunk' into vsearch
commit 5e5ae4de824a7e838bbdfdbabc92e8b256911e3f
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 19 10:57:18 2023 -0500
we weren't calculating length correctly (supposed to match ordinals range) but it didn't matter b/c length() is not called on the search path where these classes are used. make this explicit by throwing in length implementation.
commit c0d4055a0bc6914d0220fbd4912d6355e9f6a1d9
Merge: f2c9a7cb59 c605f8c9f0
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 19 10:20:02 2023 -0500
Merge branch 'vsearch' of github.com:datastax/cassandra into vsearch
commit f2c9a7cb59b113a349992011b739079106274c1e
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 19 10:19:34 2023 -0500
VECTOR-61 remove deletedOrdinals collection that was not threadsafe wrt add and remove operations; create postingsByOrdinal map that allows Bits operations to look up the postings, instead
commit 9744a13b405cbeb1b68894bb08cc24796aedd8f7
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 19 09:13:24 2023 -0500
add LVT.testMultiplePostings (works fine)
commit c605f8c9f037e724544b21fe2aadcf0d8503d86f
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Sat Jun 17 12:44:36 2023 +0800
skip replica-filtering-protection for ANN requests
commit 71935bd07c9218ba70f1ffde7b6db1e88009e529
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 16 20:20:06 2023 -0500
forgot that maxBruteForceRows has to stay non-final for Byteman
commit b264afc07727305ce88ab2f18b8a2643d2510bd3
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 16 20:10:19 2023 -0500
VECTOR-54
- validateIndexable also checks for very large vectors (that explode to NaN when we take the cosine)
- check query vectors with the same criteria as vectors to index
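To illustrate the failure mode this check guards against, here is a sketch of a cosine computed in `float` arithmetic, plus a validation that rejects vectors whose squared norm is zero or overflows. The class and method names are hypothetical, and the actual `validateIndexable` may use different criteria.

```java
final class VectorValidationSketch {
    /** Cosine similarity accumulated in float, as a simple reference implementation. */
    static float cosine(float[] a, float[] b) {
        float dot = 0f, normA = 0f, normB = 0f;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        // Infinity / Infinity and 0 / 0 both evaluate to NaN.
        return (float) (dot / Math.sqrt((double) normA * normB));
    }

    /** Assumed criterion: the squared norm must be finite and strictly positive. */
    static boolean isIndexable(float[] v) {
        float squaredNorm = 0f;
        for (float x : v) {
            squaredNorm += x * x; // overflows to Infinity for very large components
        }
        return squaredNorm > 0f && Float.isFinite(squaredNorm);
    }
}
```

Applying the same `isIndexable` check to query vectors, as the second bullet describes, keeps NaN scores out of the search path as well as out of the index.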
commit 66c31a6dcfcbf81b7657ac9c4119803e026d150a
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 16 20:07:41 2023 -0500
if CFS.apply catches an InvalidRequestException, keep the same type when we re-throw so it gets back to the client as intended
commit e0bdfdbe421a1cca30746604cda9ef5b992002bb
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 16 17:33:37 2023 -0500
add [failing] emptyIndexTest
commit 75cba96f90730b32ea4a39375c72aaff4ffd6221
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 15 16:52:46 2023 -0500
re-use bitsets across searches
commit 2af2ce8a321f7fea268d81c393da7f8f6d886776
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 16 17:05:52 2023 -0500
more asserts that graph is in sane state when we write it
commit a63492686d13e12b799acdf50c6f0c20dd680f21
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 16 17:13:23 2023 -0500
update lucene
commit 4c6b43e3606b8c9eb5e4ec4cf0817dc9b061c555
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 16 16:24:33 2023 -0500
add debugging information when node is not found on level
commit 81c8e730bd4ca314a81bc13e30b197603a8eb005
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 16 11:57:24 2023 -0500
add similarityWithAnn test
commit 87c8da9a5dddd10215a84c07a09dbaaecaa522b8
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 14 18:26:02 2023 -0500
add tests to confirm that zero-length vectors are rejected
commit 2510c1b987d846821e7095fbc2c368fa51d9a15d
Merge: a3d9e33554 e86f91c568
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 12 17:09:22 2023 -0500
Merge branch 'vsearch' of github.com:datastax/cassandra into vsearch
commit a3d9e3355470f6400328cd7ac1eb07ff06f8e00b
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 12 17:08:43 2023 -0500
use ArrayList in CVV instead of HashMap since we know the keys are consecutive
commit e86f91c568f8a82e83e61bd0dd768478a7584cfc
Merge: 77c1bc11f3 8a7a6d9c4f
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Mon Jun 12 09:25:06 2023 -0500
Merge remote-tracking branch 'datastax/ds-trunk' into vsearch
commit 77c1bc11f31e4ff001c7eff1e6fe50812e136a45
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Mon Jun 12 09:24:56 2023 -0500
Revert "Revert this after CNDB-6974 is fixed."
This reverts commit 8e04280312583a18085e7e7b9d31810790b039f4.
commit cae30e9879c463d80de2a2b7e4a642979803c13a
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Mon Jun 12 09:23:39 2023 -0500
Revert "Revert this after CNDB-6974 is fixed."
This reverts commit 67a2b7eba3c957a993726d37706643979e92a3a3.
commit 46d1a45de84f40311f193beba3ca5fc3fe7450d7
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 12 09:03:03 2023 -0500
replace our TODO entries with VSTODO to make them easier to find
commit 415b3f29e8e7f1526e28655308a1937a1175e575
Merge: 2fd8b848be 481d29721a
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 12 09:02:16 2023 -0500
Merge branch 'vsearch' of github.com:datastax/cassandra into vsearch
commit 481d29721a7993e9babd44f792be55aca925d41a
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Mon Jun 12 21:26:43 2023 +0800
Fix VectorMemtableIndex to handle max token and fix Segment#intersects (#670)
* Fix VectorMemtableIndex to handle max token and min/max bound
* Fix Segment#intersects to compare bound instead of token and add tests for range search
* make brute force rows per query for VectorMemtableIndex
* apply feedback on Segment#intersects
* add comments to VectorMemtableIndex#search
* Fix SegmentTest
commit 2fd8b848be82e62999e64b57cb9800d03de8e953
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 12 07:51:18 2023 -0500
clean up a couple REVIEWME
commit 3c066e07be6188c3f48c51c02cfc99b395e2a6bc
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Sun Jun 11 14:26:42 2023 +0800
fix flaky VectorDistributedTest
commit 6c719a752e90e031916dddf6cc7b7db23627bc38
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 10 10:48:01 2023 -0500
fix use of bruteForceRows -- should be larger of limit,maxBruteForceRows
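A sketch of the threshold described here, assuming the decision is a simple comparison against the number of candidate rows. Only `limit` and `maxBruteForceRows` come from the commit message. The class name, `shouldSkipAnn`, and the comparison itself are illustrative guesses.

```java
final class BruteForceThresholdSketch {
    /** The threshold is the larger of the query limit and the configured maximum. */
    static int bruteForceRows(int limit, int maxBruteForceRows) {
        return Math.max(limit, maxBruteForceRows);
    }

    /** Assumed decision rule: scan candidates directly when there are few enough of them. */
    static boolean shouldSkipAnn(int candidateRows, int limit, int maxBruteForceRows) {
        return candidateRows <= bruteForceRows(limit, maxBruteForceRows);
    }
}
```

Taking the maximum means a query whose limit exceeds `maxBruteForceRows` can still be answered by a direct scan whenever the candidate set is no larger than the limit itself.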
commit 0a2279d7f010bcb68053bb14ea2974295b21d1c4
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 10 10:46:34 2023 -0500
use maxBruteForceRows when deciding whether to skip ANN
commit 1859f7355231ce7972e3d4b7af70e5d2967da516
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 10 10:27:03 2023 -0500
simplify partitionKeySearchTest using euclidean distance
commit 4f40e1c2d9b1c30675eeaee7d800a8113cce97c0
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 10 10:09:24 2023 -0500
typo
commit 7dbe85ac9a248aa6a0985839df1f40ceb2c08e6c
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 10 10:08:23 2023 -0500
rename methods that return Bits but had bitset in their names
commit 2861bfdba8f2638a1e3f9e81df9cad3878d0e544
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 10 10:06:51 2023 -0500
simplify skipANN
commit fbfe4bcdd03a66f4dedfa91ce9d06206f89827cd
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Sat Jun 10 16:36:56 2023 +0800
optimize VectorIndexSearcher#searchPosting
- return empty posting if key range is not found in current sstable
- return empty posting if all row ids are shadowed
- skip ANN if matching row ids are less than limit
commit 168911b70888a3dda0ddf4f615af1b7ff54f43e3
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Sat Jun 10 20:54:41 2023 +0800
fix data visibility issue during flush: make sure SSTableAddedNotification is sent before MemtableDiscardedNotification
commit c5dd47e3ce1c7276703fa3f89a9932a3b2cd43b7
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Sat Jun 10 12:31:53 2023 +0800
fix primaryKeySearchTest and partitionKeySearchTest to use correct selector and expected results
commit 2e636529c63d0d726868d6df9b3e11d15d3871d8
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Sun Jun 11 08:31:39 2023 +0800
Vector-48: index#update is not triggered by partition/range deletion,… (#665)
* Vector-48: index#update is not triggered by partition/range deletion, we have to fix VectorMemtableIndex to include shadowed primary keys
- during flush, append ordinals that are removed by partition/range deletion into deletedOrdinals set
* add comments to VP
* revert redundant variables
* simplify
---------
Co-authored-by: Jonathan Ellis <jbellis@datastax.com>
commit daa2623b880e4b82d99204fe718db22e52ae3b42
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 10 13:54:38 2023 -0500
use mmapped builder in OnDiskHnswGraphTest
commit 553778a22cfa8757daf3b3ec0e2cfa6fc7ba29cf
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 10 08:34:41 2023 -0500
fix logic in partitionKeySearchTest, test still fails
commit a7f5a2f824085691da80ef62e37a430e5637c31c
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 17:57:34 2023 -0500
ignore invalid vectors during build against existing data, instead of failing the build
commit 2af203646a2e57fca0a2521c31ba4269b0d8f326
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 17:43:47 2023 -0500
switch from ignoring zero vectors to throwing IRE
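Background for rejecting zero vectors: cosine similarity divides by the product of the two vector norms, and a zero vector has norm 0, so the result is undefined. A minimal Java sketch, assuming hypothetical class and method names and using IllegalArgumentException as a stand-in for Cassandra's InvalidRequestException (IRE):

```java
// Hypothetical sketch; names are illustrative, not Cassandra's.
final class ZeroVectorCheck {
    /** Plain cosine similarity; yields NaN when either vector is all zeros. */
    static float cosine(float[] a, float[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return (float) (dot / Math.sqrt(normA * normB));
    }

    /** Rejects all-zero vectors up front instead of silently skipping them. */
    static void validate(float[] vector) {
        for (float component : vector)
            if (component != 0f)
                return;
        // Stand-in for InvalidRequestException in this sketch.
        throw new IllegalArgumentException(
            "Zero vectors cannot be indexed or queried with cosine similarity");
    }
}
```

Throwing at validation time surfaces the problem to the client, whereas ignoring the vector would leave a row that is silently missing from the index.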
commit 4c98fbb447a275cbf557ae0adf8f4f8a30fba17f
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 17:15:57 2023 -0500
Revert "if there are no rows in the range to search, return EmptyPostingsList instead of giving the hnsw search an empty Bits (which is a pathological case for the search)"
This reverts commit 6e5eab787a3add985dc30eca82c5dee2646f5564.
commit da99598a5d11dfa4ecac3b1eeeb9d19a4bad21e5
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 17:09:30 2023 -0500
don't attempt to add zero vector to cosine indexes
commit a77792447d33600f69dd0a75a280a2b6dac51389
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 16:43:34 2023 -0500
add failing partitionKeySearchTest
commit b061dd1c77f30c67324da4f94ea8b5a22d50fba7
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 10:52:08 2023 -0500
inline the test ops
commit 6e5eab787a3add985dc30eca82c5dee2646f5564
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 10:51:09 2023 -0500
if there are no rows in the range to search, return EmptyPostingsList instead of giving the hnsw search an empty Bits (which is a pathological case for the search)
commit 518c055dfeabacf0fa0863e40bf814058f2ad981
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 10:28:57 2023 -0500
cleanup
commit e922b03ffe5436dc1341dbe20ee7aaca9901058f
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 10:27:53 2023 -0500
failing tests for primary key search
commit 428e4b713e25a5820d02ca790c30e9f1477c0f44
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 10:21:59 2023 -0500
cleanup
commit 5319bcdf3b694e861509b2a9f549f39bf51cd253
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 10:20:02 2023 -0500
move testInvalidColumnNameWithAnn to VectorInvalidQueryTest
commit d8b64845e59c0bd3658a16df21dcc75564417e3b
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 10:12:40 2023 -0500
upgrade lucene to reduce Integer boxing on build path
commit a62356e49c0a32f61732b8525d2bb655f46dd767
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 10:11:55 2023 -0500
cleanup
commit b754eb7f021d10d0ae8de83ebdf2ce2f19ce59d8
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 08:29:34 2023 -0500
replace CHM with NBHMLong to avoid boxing
commit 53d232ad2bcddefd72b10542f703cf36728c05f5
Author: Piotr Kołaczkowski <pkolaczk@datastax.com>
Date: Fri Jun 9 12:17:46 2023 +0200
STAR-550 Handle SAI AbortedOperationException
AbortedOperationException is thrown by SAI when an index search
hits a timeout. Instead of allowing this exception to
bubble up to the top and eventually be logged as an error, we now catch
and swallow it in the InboundSink after creating the query response.
Additionally, we now also set a proper error code (TIMEOUT)
in the response, so the client has a hint about what happened.
commit 5e0c5eb3c54b58002162ce3e39d0891643e1f663
Author: Matt Fleming <matt@codeblueprint.co.uk>
Date: Fri Jun 9 08:44:10 2023 +0100
CNDB-7037: Check that memtable exists before flushing and avoid IOOBE (#664)
For offline services such as the compactor it's possible to not have
live memtables. Account for this during flushing after removing indexes to
avoid triggering an IndexOutOfBoundsException:
Error happened while updating the schema
java.lang.IndexOutOfBoundsException: Index: -1
at java.base/java.util.Collections$EmptyList.get(Collections.java:4483)
at org.apache.cassandra.db.lifecycle.View.getCurrentMemtable(View.java:106)
at org.apache.cassandra.db.ColumnFamilyStore.forceFlush(ColumnFamilyStore.java:1115)
at org.apache.cassandra.db.SystemKeyspace.forceBlockingFlush(SystemKeyspace.java:552)
at org.apache.cassandra.db.SystemKeyspace.setIndexRemoved(SystemKeyspace.java:576)
at org.apache.cassandra.index.SecondaryIndexManager.markIndexRemoved(SecondaryIndexManager.java:837)
at org.apache.cassandra.index.SecondaryIndexManager.removeIndex(SecondaryIndexManager.java:420)
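The trace above is consistent with reading the last element of an empty list (`get(size - 1)` on an empty list asks for index -1). A guard of the following shape avoids it. This is a hedged Java sketch with hypothetical names, not the actual change to View or ColumnFamilyStore:

```java
import java.util.List;
import java.util.Optional;

// Hypothetical sketch; the real types are Cassandra's View and Memtable.
final class CurrentMemtableSketch {
    /**
     * Returns the most recent live memtable, or empty when there is none
     * (for example in an offline service such as the compactor).
     */
    static <T> Optional<T> currentMemtable(List<T> liveMemtables) {
        if (liveMemtables.isEmpty())
            return Optional.empty();
        return Optional.of(liveMemtables.get(liveMemtables.size() - 1));
    }

    /** Flushes only when a memtable exists; returns whether a flush was requested. */
    static <T> boolean flushIfPresent(List<T> liveMemtables) {
        return currentMemtable(liveMemtables).isPresent();
    }
}
```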
commit c241f7d41ee46c28f60d691611f32e071dac6684
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 8 19:19:09 2023 -0500
read neighbors using .intBuffer since we know that the HNSW searcher always reads all the neighbors
commit 32f021a4f0623e57d82d70c54350eab6f0f57db7
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 8 18:08:30 2023 -0500
VECTOR-51 leave vectors as ByteBuffers during compaction to avoid 2x memory usage incurred by keeping them around as float[] as well
commit 32d8cbbd19301228530d3d84f7a7e4120b4d4422
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 8 17:37:08 2023 -0500
r/m node cache from query metrics since there is nothing the operator can act on there
commit 890df8e6c36c5c9e1314dbcc0a681415401120dd
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 8 14:15:25 2023 -0500
add more information to exception when reading row offsets goes wrong
commit 5c7fe0cfed1edb10f81b134f8949dd1b5abb2781
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 8 13:26:17 2023 -0500
revise ordinals cache as follows:
// cache full levels including neighbors up to neighborsRamBudget, starting with the top level,
// but always cache all levels above the bottom two levels -- this will be ~1% of the graph.
// then on L1, cache at least the offsets
// L0 we do not cache since we only need one extra seek (no bsearch) to read the neighbors offset
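One way to read the policy in the comment above, as a Java sketch. The class, enum, and parameter layout are hypothetical (only `neighborsRamBudget` comes from the commit message), and the handling of L1 is an interpretation: cache it fully if it still fits in the budget, otherwise cache only its offsets.

```java
// Hypothetical sketch of the caching policy; not the actual implementation.
final class OrdinalsCachePlan {
    enum Mode { NONE, OFFSETS_ONLY, FULL }

    /**
     * @param levelBytes         neighbor data size per level; index 0 is the bottom level (L0)
     * @param neighborsRamBudget bytes allowed for cached neighbor lists
     */
    static Mode[] plan(long[] levelBytes, long neighborsRamBudget) {
        Mode[] modes = new Mode[levelBytes.length];
        long used = 0;
        // Walk from the top level down, as the comment describes.
        for (int level = levelBytes.length - 1; level >= 0; level--) {
            if (level == 0) {
                // L0: one extra seek is enough, so nothing is cached.
                modes[level] = Mode.NONE;
            } else if (level >= 2 || used + levelBytes[level] <= neighborsRamBudget) {
                // Levels above the bottom two are always cached in full;
                // L1 is cached in full only if the budget allows.
                modes[level] = Mode.FULL;
                used += levelBytes[level];
            } else {
                // L1 over budget: keep at least the offsets.
                modes[level] = Mode.OFFSETS_ONLY;
            }
        }
        return modes;
    }
}
```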
commit e131014c5cba59624e0ed9fac142abe5340e2fc0
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 8 13:53:10 2023 -0500
add deletes test
commit 7c5a880f007535d9f03b7c7d9d3e84dcc79ef3d7
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 8 13:34:42 2023 -0500
must use graph.size for bitset size -- not correct to use max rowid, since deleted ordinals will not have a rowid but will still be in the graph
commit 0654fbb61aec86deae69edf83fb1cf653ed7df65
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 8 08:44:28 2023 -0500
info -> debug for validatePerIndexComponents
commit b289db2362b928f29b6c5c6289e09b33f0be1e88
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 7 18:08:18 2023 -0500
update lucene
commit 347dae644b0d395014d8faf2affd48a6b2546e96
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 7 18:23:41 2023 -0500
undo burntest logging config change
commit 87a6a84f38d0a0423acd9a0dda67d386876681b5
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 7 15:20:06 2023 -0500
add debug logging
commit 20013bc865c7fb2c34111c98362aaf997dc724dc
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 7 14:50:39 2023 -0500
add a bit more information to exception when we fail in index construction from disk
commit 86389ceae8188b8e427d8a528fada2f913f1f633
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 7 12:25:31 2023 -0500
reduce test vector count from "all of them" to 200
commit 20177a1c84ba34babc82d8c1aa4e3436321b5d9f
Merge: 2e2fce09ee f2c697ac46
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 7 12:21:20 2023 -0500
Merge branch 'wip' into vsearch
commit f2c697ac46348301260c5947ade4ebed7dee90ee
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 7 12:16:28 2023 -0500
multipleSegmentsMultiplePostingsTest
commit 59e4a1c53e1c19a7ebba4a0834229e61f9e4d3ce
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 7 12:16:04 2023 -0500
cleanup
commit 2e2fce09eeb5bef4780cecf7c7b5e492a810f67a
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Wed Jun 7 23:33:00 2023 +0800
fix OnDiskOrdinalsMap to seek to segment offset before reading (#661)
commit 9502f1040229f2be6619f7e4497dc25ca5126b39
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 7 10:04:38 2023 -0500
fix build
commit 3535444b57366fe705e0fd8c1ca095e60ed2a706
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 7 09:19:41 2023 -0500
add write-only workload
commit a112b0965cb298f293b6e384bafc92a1d5a1fb8f
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Wed Jun 7 21:15:53 2023 +0800
VECTOR-44: improve in-memory partition-restricted query perf (#660)
* VECTOR-44: improve in-memory partition-restricted query perf
- using post-filter top-k processor instead of ANN: 14x improvement on partition-restricted query in LongVectorTest
commit cf1ec3be50ddbe40ae99dfc34eea51b9fc5c2648
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Wed Jun 7 21:06:37 2023 +0800
Vector-45: fix bitsetForShadowedPrimaryKeys to skip shadowed primary keys outside of current sstable/segment (#659)
* VECTOR-45: fix bitsetForShadowedPrimaryKeys to skip shadowed primary keys outside of current sstable/segment
* return empty iterator if results are empty instead of ReorderingRangeIterator
commit 87217b4f49935fbbd13c73f4d0aadb84836696a7
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 7 07:31:46 2023 -0500
don't make areL0ShardsEnabled final, it breaks mocks
commit 2bb3a299df43561369544b662674872575975b7c
Author: Matt Fleming <matt@codeblueprint.co.uk>
Date: Wed Jun 7 10:56:08 2023 +0100
VECTOR-42: Check columns exist to avoid NPE during ANN expr binding (#657)
Using a non-existent column with an ANN expression triggers an NPE like
so:
java.lang.NullPointerException: null
at org.apache.cassandra.cql3.ArrayLiteral.forReceiver(ArrayLiteral.java:43)
at org.apache.cassandra.cql3.ArrayLiteral.prepare(ArrayLiteral.java:54)
at org.apache.cassandra.cql3.Ordering$Raw$Ann.bind(Ordering.java:178)
at org.apache.cassandra.cql3.Ordering$Raw.bind(Ordering.java:139)
Use TM.getExistingColumn() which throws InvalidRequestException if the
column is undefined.
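The pattern behind this fix, sketched in Java: look the column up through an accessor that fails fast with a descriptive request error, rather than handing a possible null to later code. The map-based metadata and the class name are hypothetical, and IllegalArgumentException stands in for InvalidRequestException:

```java
import java.util.Map;

// Hypothetical sketch; the real lookup lives on Cassandra's TableMetadata.
final class ColumnLookupSketch {
    /**
     * Returns the column's type, or throws if the column is not defined.
     * Callers never see null, so they cannot trigger an NPE downstream.
     */
    static String getExistingColumn(Map<String, String> columnTypes, String name) {
        String type = columnTypes.get(name);
        if (type == null)
            // Stand-in for InvalidRequestException in this sketch.
            throw new IllegalArgumentException("Undefined column name " + name);
        return type;
    }
}
```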
commit f29d4528fc43f34889e560acafa810eee1b88ba9
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 21:18:47 2023 -0500
restore query timeouts
commit 6e5734e52b41ab1e355d202ad0433440828fbb75
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 19:08:52 2023 -0500
looking at performance over time in LongVectorTest
commit f96af8f0b9185163a49156bdab107801d2588bf0
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 16:00:23 2023 -0500
log level = info for burn tests
commit 1247cf372927cc185be5c9767a06fdd15f102dbe
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 17:48:00 2023 -0500
write the in-memory deleted ordinals to disk at the start of the postings component
commit e319d71bd6b0d265f616c807dcaf4a9d9fc38aef
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 16:36:27 2023 -0500
don't allocate unnecessary objects on the happy path of no tombstones
commit 6d146a8460a17d146de756aab2ccbbf2abdb967a
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 16:19:56 2023 -0500
VECTOR-23: force UCS to not shard L0
commit 79d6e177093312bc1b951d6740ad45ce8a6c5875
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 16:01:05 2023 -0500
mark AutoCloseable
commit 5ec2f7d7f792bfa2dcf582ea4eed3fa8745fdaff
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 5 14:37:15 2023 -0500
Support string literals as vectors
Co-authored-by: Andrés de la Peña <a.penya.garcia@gmail.com>
commit 73a53c37536835e462e3cf17c452027c7aa591ed
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 12:25:39 2023 -0500
(take 3, this time with the code changes) fix race conditions across concurrent inserts + searches in memtable
commit 51f4f419d31916a81088f171b158f1e002b5c800
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 12:25:39 2023 -0500
(take 2) fix race conditions across concurrent inserts + searches in memtable
commit 94097a1f7cf1831c03e52b1f539bdc2e42fa038b
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 12:35:32 2023 -0500
Revert "fix race conditions across concurrent inserts + searches in memtable"
This reverts commit c4d41b492190fb2644f83bf4902acf71d1e4f891.
commit c4d41b492190fb2644f83bf4902acf71d1e4f891
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 12:25:39 2023 -0500
fix race conditions across concurrent inserts + searches in memtable
commit 22143a8c5db160ef0fa7687ff3e8cb227de160f0
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 10:28:30 2023 -0500
fix AOOB in bruteForceRows logic
commit b5624f62c88378e03260399440edf4d335ccc3af
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 10:12:53 2023 -0500
call deserializeFloatArray (which gives float[]) instead of deserialize (which gives List<Float>)
commit 13ad40718737b54652c875d81ec20992b5828777
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 08:36:32 2023 -0500
create CassandraHnswGraphBuilder with concurrent and serial implementations, so that single-threaded compaction doesn't have to pay the concurrency overhead
commit b83f4e4007e3d597963023b8ae32f4d0934b792d
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 08:00:44 2023 -0500
add ASL header to injections.md to make CI happy
commit 16074336b1322d4bade40eb71562ca50221399dc
Merge: 227c6c13a3 1ff6992354
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 07:48:04 2023 -0500
Merge branch 'VECTOR-37' into vsearch
commit 227c6c13a3cfe42484fc560d629d54a93f8265e0
Author: Jaroslaw Grabowski <jaroslaw.grabowski@datastax.com>
Date: Tue Jun 6 13:02:37 2023 +0200
CNDB-7007 return expired tables level from getLevels
commit 312d07de8d297cdede39017354b678f2fb1b1006
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 07:31:26 2023 -0500
randomize our brute force threshold, which will get the actual index scans exercised more
commit 0b897961869965629893c94250b7d4b1fb7c0f86
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Tue Jun 6 13:44:01 2023 +0800
Reference ANN sstable indexes in case of ann hybrid search (#653)
* Reference ANN sstable indexes in case of ann hybrid search
* simplify
---------
Co-authored-by: Jonathan Ellis <jbellis@datastax.com>
commit 1ff6992354e8411f9f7c48a76f0aadb4a00f2789
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 5 21:55:14 2023 -0500
per-query hnsw metrics
commit d3d6ae107737020eb0ca3b00c117d5bc84027dc0
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 5 21:16:14 2023 -0500
reduce test size to prevent Jenkins OOM
commit 7765ac73fd29c9664b6f8e946c067d6f1b0a9a03
Merge: 5a04dbcf67 c5cd09cebf
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 5 21:15:44 2023 -0500
Merge branch 'vsearch' of github.com:datastax/cassandra into vsearch
commit c5cd09cebf0c63f7bafcd2f4548395efe68b12cd
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Tue Jun 6 09:38:42 2023 +0800
Skip warning log about receiving a range that is not owned by the current replica, see VECTOR-30
commit 5a04dbcf6738d7681732e28518d6eb37f74357b8
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 5 11:21:15 2023 -0500
add injections.md from bdp repo
commit baf9cce301a39f9506bd60c5a75d2011150f2fea
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 5 10:02:02 2023 -0500
VECTOR-36 fix looking up indexes and SSTableContext by SSTableReader, because the SSTR will be removed from internal structures once it's compacted
commit e7fdc8454e750e2ee886c4ab27fb09abb46303fa
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 5 09:53:51 2023 -0500
limitToTopResults should skip rows that are not in the current segment (or more precisely, have no vectors that were indexed)
commit 2f64695a90651d9e76eee0a3399032ea241c983c
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Mon Jun 5 14:55:20 2023 +0800
Vector 6 take 2: reapply Vector-6 commit and fix NPE in SelectStatement (#648)
* Revert "Revert VECTOR-6"
This reverts commit 63a6ff609a02e4f7d9f5499f1af48ca5e58f63a8.
* Vector-6 take 2:
- fix NPE in SelectStatement by skipping reversed() for null ColumnComparator
- fix getOrderingColumns() to return LinkedHashMap to preserve ordering columns' order
- fix VectorLocalTest compilation
- remove debug log in CassandraOnHeapHnsw
commit 9da51b315d207417de4f3e005616e275c62c74cb
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Mon Jun 5 12:38:21 2023 +0800
fix SAI test failures in vsearch branch (#649)
- BatchlogEndpointFilterTest
- rangeRestrictedTest
- SegmentFlushTest
- SegmentMergerTest
- OperationTest
- RangeIntersectionIteratorTest
commit dc53d7db5f0f8fe1c780f6d2b7580794f7b6b5fb
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 3 13:35:11 2023 -0500
create views for ordinals map so we don't have to open a new Reader for each method call
commit c6385022ff2949c23e0dd0161a043cf4061f050c
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 3 08:19:32 2023 -0500
add assert sstableContext != null
commit f340e7590db578b71201326e82006f72dd491047
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 3 07:37:36 2023 -0500
on-disk Searcher bits should not need to be growable
commit fd5cc7f989a47d03689dd29495b974697ad1338b
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 3 07:37:18 2023 -0500
add asserts
commit 6436e5f4ad49881a62c1442692edfedfe31968dc
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 3 07:21:50 2023 -0500
cleanup
commit 32c8e005740d3c7ec0b1008091728e12260aecdd
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 2 13:57:18 2023 -0500
comment
commit 091d96832ff9a68f6ea7857eafd2ec7ee0c09a0b
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 2 13:32:24 2023 -0500
r/m unused search method returning iterators over PrimaryKey
commit 4e78a4b429b0ba6690871df9a91c8ffd8d2e53fa
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 2 11:25:30 2023 -0500
we can use the slightly more lightweight RangeConcatIterator instead of RangeUnionIterator when combining results from different Segments
commit e7a9186bf3426ad025e809481e47db85a7bf7190
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 2 11:22:35 2023 -0500
r/m obsolete FIXME
commit c76ee9ea0cab5b70c535de3f50b81cf333bf94a2
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 3 07:12:43 2023 -0500
add testAppendedGraphs
commit d00bdbfb6acf9f08d2bdfa2ef06e833e567cf9eb
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 3 06:57:37 2023 -0500
move write-to-File to test class
commit b10fd1100daf4cde0291b3b4ef06f5ceb3ca650c
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 3 06:46:01 2023 -0500
fix Ref leaks in test
commit 0656e558bb51172df4e0cde1ebe92946d4dcec8b
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 3 06:26:05 2023 -0500
fix tests to use View
commit 92797ced03ed26f9a5cce398504d6f0de519b899
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Sat Jun 3 19:19:38 2023 +0800
VECTOR-35: fix vector on-disk writer to append segments (#647)
commit 22f5c01860367586d0cc39def7d975a6d6bae784
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 2 18:54:41 2023 -0500
add "Vector indexes only support ANN queries" check
commit bfc0b056dbacd0a1eca86020e901f545dca7670e
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 2 17:34:24 2023 -0500
split invalid requests into separate test class
commit f66768c35920101382c1975c9d0ba0e3301b7c61
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 2 17:28:56 2023 -0500
add specific error message when trying to do ANN without an index
commit 63a6ff609a02e4f7d9f5499f1af48ca5e58f63a8
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 2 12:31:22 2023 -0500
Revert VECTOR-6
This reverts commits:
a37009c187edeba68389d239dc1b9f40519b1187
5565690fbe1056d4c159ddbe233fa22c7695320a
e7733bb8f858a16b082b8a5c64d0322db6f6271a
commit 3d496662b259470b505d88212074c436660c27ad
Merge: fab67bb134 4bdae7e362
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 2 11:05:53 2023 -0500
Merge branch 'vsearch' of github.com:datastax/cassandra into vsearch
commit 4bdae7e36213f5112efce085696dd24aaa0adfad
Author: Mike Adamson <madamson@datastax.com>
Date: Fri Jun 2 14:36:01 2023 +0100
Stabilise random tests using word2vec model vectors
commit 2ddc3ec6b11f6810f6c4cbaf6b62bfe46d2d1b6c
Merge: 8e04280312 1f9179002a
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Fri Jun 2 10:05:15 2023 -0500
Merge remote-tracking branch 'datastax/ds-trunk' into vsearch
commit 8e04280312583a18085e7e7b9d31810790b039f4
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Fri Jun 2 09:51:06 2023 -0500
Revert this after CNDB-6974 is fixed.
commit fab67bb13499691f813c893cd39a3bbd406653d2
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 2 08:59:12 2023 -0500
vector cache defaulting to 1MB per segment
commit 67a2b7eba3c957a993726d37706643979e92a3a3
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Thu Jun 1 17:49:35 2023 -0500
Revert this after CNDB-6974 is fixed.
commit 4538956b31d8c40d2e9b603b3b0a3392343e1853
Merge: 7518680a4a af4c83aef4
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 1 16:17:14 2023 -0500
Merge branch 'ds-trunk' into vsearch
commit 7518680a4a17cb2589cf06a9175befc10c9eab1a
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 1 16:14:33 2023 -0500
re-use float[] across calls to vectorValue to avoid allocation overhead (credit to Jake for the idea)
commit d20bd8aedeb6232358dd16f901e2708f217b8108
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 1 15:35:17 2023 -0500
optimize vectorValue() with direct access to the mmap-ed region
commit 24572025af5b7893279b5b03c58555b682f3abdb
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 1 14:33:55 2023 -0500
decomposeVector does not modify the underlying buffer, so no need to duplicate() here
commit efef35d47a24222fae300d1cbd5a801e9ed79442
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 1 14:29:30 2023 -0500
specialize deserializeFloatArray for ByteBuffer and FloatBuffer. this saves about 3% total CPU on search workloads
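For context, decoding a serialized vector straight into a primitive float[] avoids boxing each component into a Float. A minimal Java sketch of that idea using only java.nio; the class and method names are hypothetical and this is not the Cassandra implementation:

```java
import java.nio.ByteBuffer;
import java.nio.FloatBuffer;

// Hypothetical sketch; illustrates the technique, not Cassandra's code.
final class FloatArraySketch {
    /** Decodes the remaining bytes of a ByteBuffer as 4-byte floats. */
    static float[] toFloatArray(ByteBuffer buffer) {
        // asFloatBuffer() creates a view with its own position,
        // so the caller's buffer position is left untouched.
        return toFloatArray(buffer.asFloatBuffer());
    }

    /** Copies the remaining floats of a FloatBuffer view into a new array. */
    static float[] toFloatArray(FloatBuffer floats) {
        float[] out = new float[floats.remaining()];
        floats.get(out);
        return out;
    }
}
```

Having one overload per buffer type lets each caller skip a conversion it does not need, which is the kind of specialization the commit message describes.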
commit 63c27f5550d6c3e22d960b9a42b9097029e3b760
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 1 11:04:23 2023 -0500
default target size of 5GB
commit 69c55c8d07c6e1d58b4829c153f5ebf8a4343669
Author: Jonathan Ellis <jbellis@gmail.com>
Date: Thu Jun 1 07:20:38 2023 -0700
make the on-disk hnsw code threadsafe using FileHandle.createReader (#641)
commit b194cae2e56133b946d0231768254064a47f01b6
Merge: 7ee6422084 23c2891e7a
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 1 06:22:17 2023 -0500
Merge branch 'ds-trunk' into vsearch
commit 7ee642208485c9e2ae62aa662dbbddc7b295bae1
Merge: c286ec0ee2 a37009c187
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 31 20:41:48 2023 -0500
Merge branch 'VECTOR-6' into vsearch
commit c286ec0ee231ededded18e81357edebe72064c59
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 31 20:40:21 2023 -0500
add testLargeGraph
commit a37009c187edeba68389d239dc1b9f40519b1187
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Thu Jun 1 08:50:07 2023 +0800
cleanup unused code
commit 6246dbe3a970a0f4672701632003cdf576705c24
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 31 16:43:42 2023 -0500
comment
commit c7420440c7d3c81eac96e716148d570ae3cceb1d
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 31 16:43:37 2023 -0500
encapsulate shadowedPrimaryKey better
commit a1a25c33b8bec9d78354dd2c1d992444db2bf76f
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 31 15:57:52 2023 -0500
make hnsw cache size configurable, and make the default 128KB (roughly the size of the bloom filter for a 1GB sstable)
commit 55747d2351e915645697fa8dec404c749540456f
Author: Andrés de la Peña <a.penya.garcia@gmail.com>
Date: Wed May 31 19:28:47 2023 +0100
Replace FunctionParameter.sameAsFirst by FunctionParameter.sameAs, allowing type inference in both directions for vector functions
commit 5565690fbe1056d4c159ddbe233fa22c7695320a
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 31 12:59:42 2023 -0500
cleanup and fix
commit e7733bb8f858a16b082b8a5c64d0322db6f6271a
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Wed May 24 16:01:26 2023 +0800
VECTOR-6: return vector results to client in ANN order instead of token order
commit 771067d1475c94484da178236928d2bad78e00b1
Author: Mike Adamson <madamson@datastax.com>
Date: Wed May 31 18:33:55 2023 +0100
Fix annOrderingMustHaveLimit test
commit 8dd76a73541e70da0da022dd463e6711a315f971
Author: Mike Adamson <madamson@datastax.com>
Date: Wed May 31 18:30:49 2023 +0100
Improve the validation of ORDER BY <column> ANN OF (#639)
* Improve the validation of ORDER BY <column> ANN OF
* Change to hasNonClusteredOrdering and improve limit message
commit 1ef6f2e3418db19af23f0eae24cd344d72290455
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri May 26 18:34:25 2023 -0500
add vector similarity functions
commit f4e3a39a8ece623647ba7891981e7c5ee10ed96b
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 31 09:39:45 2023 -0500
pull in the smallest set of changes possible from 93e0ae9a to get FunctionParameter.sameAsFirst
commit 0ef2614346c9426a966004df220221f74353de70
Author: Jonathan Ellis <jbellis@gmail.com>
Date: Wed May 31 06:22:41 2023 -0700
Add support for updates + deletes (#636)
commit 5da9fefe1e18f733bb8ffd0a35ef2e1ac3cbd975
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue May 30 16:32:02 2023 -0500
rename SegmentOrdering.reorderOneComponent to limitToTopResults to align with MemtableOrdering
commit caf6eb2154ddf57c88d69c6246404b65e183057d
Author: Mike Adamson <madamson@datastax.com>
Date: Tue May 30 22:41:29 2023 +0100
Fix V1SearchableIndex.reorderOneComponent to use segments (#635)
commit 72cfa94e9389d0078d64aee60d26b8b01c6ca22f
Author: Piotr Kołaczkowski <pkolaczk@datastax.com>
Date: Mon May 29 20:17:44 2023 +0200
VECTOR-14: Move ANN OF expressions from WHERE to ORDER BY
Before:
SELECT * FROM table WHERE <vector column> ANN OF <vector value>;
After:
SELECT * FROM table ORDER BY <vector column> ANN OF <vector value>;
commit 226266ef124fc220819d328ed8547c5d86626c4b
Author: Mike Adamson <madamson@datastax.com>
Date: Tue May 30 18:58:16 2023 +0100
Implement getInstance(TypeParser) for VectorType. Fix losing data at startup.
commit 2bbe88ad88e2243f12dc8df789ea9bdf208ff74b
Merge: dd0ffb64f3 a3b8661746
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue May 30 12:37:58 2023 -0500
merge ds-trunk
commit dd0ffb64f3e54e2ce1af75da52fe3ece66a49f3f
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue May 30 11:40:20 2023 -0500
reduce startup log noise at info level
commit 1ef17101def1aacbb84f3434633a9b706291aa4b
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue May 30 11:34:26 2023 -0500
add partialUpdateTest
commit f816046a17809b06ad754a16c0f089c0d26909f9
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon May 29 12:57:12 2023 -0500
fix recall computation in test
commit 1c1442c36baafdf4e1641a9c0288dbd520cf7c0e
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon May 29 12:34:02 2023 -0500
also apply tolerance for inexact results to searchWithKey, but only for size > 10
commit 39df5112c6a551b53649ad98164f2bde93cbdc10
Merge: d598206329 8ca4bba861
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon May 29 12:27:24 2023 -0500
merge ds-trunk
commit 8ca4bba8617e8d0fa4f48de1c479de66e3a77dd3
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon May 29 12:19:32 2023 -0500
rename MemtableIndex -> TrieMemtableIndex to make merge to vsearch easier
commit d5982063292b6a91b156da32454abacd36fa6980
Merge: ae96e0f423 1488a5f0b9
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon May 29 09:03:24 2023 -0500
Merge branch 'vsearch' of github.com:datastax/cassandra into vsearch
commit ae96e0f42393eab3547e2314f1acdb1ff766f2a6
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon May 29 09:02:12 2023 -0500
check for results within a percentage (I went with 5%) of the expected; the A in ANN means we shouldn't expect to find 100% of matches unless the graph is tiny
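A tolerance check of this kind can be expressed as a recall threshold: the fraction of expected keys that the ANN search actually returned must be at least 1 minus the tolerance. A Java sketch with hypothetical names; the 5% figure is the one quoted in the commit message:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical test helper; not the helper used in the actual tests.
final class RecallCheck {
    /** Fraction of expected keys present in the actual result. */
    static double recall(Set<Integer> expected, Set<Integer> actual) {
        if (expected.isEmpty())
            return 1.0; // nothing to find, so nothing was missed
        Set<Integer> hits = new HashSet<>(expected);
        hits.retainAll(actual);
        return (double) hits.size() / expected.size();
    }

    /** True when at most `tolerance` (e.g. 0.05) of the expected keys are missing. */
    static boolean withinTolerance(Set<Integer> expected, Set<Integer> actual, double tolerance) {
        return recall(expected, actual) >= 1.0 - tolerance;
    }
}
```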
commit 1488a5f0b9c65473bb6a69c9e18c045d931a1fe3
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Mon May 29 09:20:28 2023 +0800
Vector-19: fix PrimaryKey min/max prefix ByteComparable and add reversed lookup for row id (#627)
* Vector-19: fix PrimaryKey min/max prefix ByteComparable and add reversed lookup for row id
commit 14c41f5f913dc7f7f4cfc64b932bb6b9df475f19
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 17 17:35:16 2023 -0500
use incremental bytes used estimate from hnsw to avoid recomputing full ramBytesUsed on every call to add
commit f8d2aaad6e96693f532d745f71e0eaabb1ddf934
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Fri May 26 12:02:54 2023 -0500
Update cqlsh for new syntax
commit f0a882ae1171d41bc9c6ccf224ce3be80f7128e8
Author: Mike Adamson <madamson@datastax.com>
Date: Fri May 26 16:24:38 2023 +0100
Only allow VectorType to accept float
commit a29712b1365cdfac9a709e35325e85f7fa6c61c8
Author: Mike Adamson <madamson@datastax.com>
Date: Fri May 26 16:17:51 2023 +0100
Fix max term size for vectors at 16k in SSTableIndexWriter
commit 363d3f4e35c1a4df0f257557309bb042fd3f3ec7
Author: Mike Adamson <madamson@datastax.com>
Date: Fri May 26 15:23:02 2023 +0100
Merge CASSANDRA-18504 vector grammar (#628)
- Now uses vector<type, dimension> to describe vector
- This commit does not bring in the whole 18504 patch
only the essential grammar and type parts of the
patch
commit 18c4e35d4ce14cda7a3c03398e16edf368ad9e6a
Merge: 16158a5b48 b0f71ee2c3
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu May 25 16:24:41 2023 -0500
Merge branch 'VECTOR-3' into vsearch
commit b0f71ee2c371d1003cf241c3aedd7437385bcecb
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu May 25 15:51:27 2023 -0500
cleanup
commit 2d840eae74c079229729767731d3719d12ca1931
Merge: c01cacf300 197a3207b1
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu May 25 15:47:39 2023 -0500
Merge branch 'vsearch' (early part) into VECTOR-3
commit 16158a5b48665f57bde649e87a09d078298932f5
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu May 25 15:45:51 2023 -0500
optimize ramBytesUsed
commit 197a3207b16ec97bae4924883247e2cd6a2923bd
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu May 25 13:34:19 2023 -0500
update lucene
commit a7bfcc7a6f31090c3c8443d299dabb7acd437ccd
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu May 25 08:05:41 2023 -0500
optimize ConcurrentVectorValues.write
commit 456fb08af0b922922ea0e4f24def0184cf4b32c6
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 24 21:25:52 2023 -0500
fix NPE better
commit c01cacf30025d181a2d86c4632042449396333bc
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Thu May 25 08:53:38 2023 +0800
Return negative ordinal if row id is not found; add failing test for null vector
commit 009fadf2ed889bcc6748f500b7fa4b1d43e910df
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 24 18:00:23 2023 -0500
search() errors out when an empty graph is passed to it, so special-case that
commit 8c9972d7453fad46f55e6fe5e7538442235e7fd6
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 24 17:46:50 2023 -0500
use serializer dimensions instead of trying to cache it from the first vector added, because we might need it before vectors are added (if someone tries to search an empty graph)
commit 8423541c4997462b65fe8f1231b0baf15dab3297
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 24 17:42:30 2023 -0500
fix NPEs when nulls are inserted
commit d4510e0f1934f03f66d89b7c585b3d7aac17dda2
Author: Mike Adamson <madamson@datastax.com>
Date: Wed May 24 15:54:38 2023 +0100
Rebuild HNSW graph on flush since we don't know ahead of time where UCS will want to split the range boundaries
commit 6b925531011798285a6b25d70961643092e2febe
Author: Mike Adamson <madamson@datastax.com>
Date: Mon May 22 14:58:22 2023 +0100
disable index segment compaction by default
commit 3480a80e3303c13b8c04f718f83f87489315e381
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 24 14:46:03 2023 -0500
move chatty logs to debug level
commit 05c5758244f6f6d4fe30fbcefbb13932f68f8891
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 24 13:59:01 2023 -0500
r/m fixme obsoleted by #624
commit b09db4e0425c92491e3e9c73ebdbdb84508075fd
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 24 13:56:46 2023 -0500
pre-size the intersection builder
commit 6d62d8c2546cda1f487c14dcde1303e14a18c804
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 24 13:45:14 2023 -0500
fix nested boolean expressions
commit b8301a86d37513b67545883548c5670131281dc1
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Wed May 24 14:19:41 2023 +0800
Vector-3: support partition/range restricted query
commit 8ba56fad62d21da228e3a61ed7d5971441572cf9
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 24 10:35:16 2023 -0500
update lucene version hash
commit a4768c583eb1a029f629e5ebda274ab2fcce3130
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Tue May 23 10:45:02 2023 -0500
Revert the revert of "Use published lucene fork jar."
This reverts commit 7a8ff12860a97d451045c9dd1978a3599a0f4679.
commit 063a176caeec0f1e298d044490aa29ecdd468811
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Wed May 24 22:03:32 2023 +0800
Union the results from multiple segments per index (#624)
- Union the results from multiple segments per index
- Revert intersection behavior to pick 2 most selective indexes if there is no ANN
- Fix SSTableRowIdPostingList to return END_OF_STREAM if next row is END_OF_STREAM
- Skip TermTree for vector index
- Fixes:
- VectorMemtableIndexTest#randomQueryTest
- SegmentMergerTest
- SingleNodeQueryFailureTest#testFailedRangeIteratorOnMultiIndexesQuery
- SelectiveIntersectionTest
- QueryTimeoutTest
commit 643f34c2b654b80573ba92381e386b4286f7fb63
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 24 08:25:35 2023 -0500
switch all the hnsw internals to use float[] so we don't need to keep both representations around
commit 7a8ff12860a97d451045c9dd1978a3599a0f4679
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 24 08:06:28 2023 -0500
Revert "Use published lucene fork jar."
This reverts commit 53c6c90920dd2a48fc3334a6cddb215d3f4ebfcd.
commit 8731eefe4122a5af0b3723a3b07884a201616e7a
Merge: e50d5e955b 53c6c90920
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 24 08:05:18 2023 -0500
Merge branch 'vsearch' of github.com:datastax/cassandra into vsearch
commit e50d5e955bfde6357a5bb892b2ea70d46e8cc77d
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 24 08:02:52 2023 -0500
cache float[] from ByteBuffer
commit 53c6c90920dd2a48fc3334a6cddb215d3f4ebfcd
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Tue May 23 18:29:00 2023 -0500
Use published lucene fork jar.
commit f2567cd2fb2a683e7c0fee23945baac3ceca6ad4
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue May 23 14:56:25 2023 -0500
attempting to add back IN support but with limited success, even "basicOrTest" gets parsed as child nodes
commit 9e1f018f5c0d5832446fc6a0338aaf95c8323303
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue May 23 14:21:23 2023 -0500
reduce log level of some of the chattiest locations
commit efaf501fe3158da42dcd874b06014acb127f5cd9
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue May 23 13:55:40 2023 -0500
clean up RangeIntersection Builder overloads
commit e7127d9d32392347f7010de6267521742f831217
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue May 23 13:46:27 2023 -0500
complete generification of RangeIterator in src/ (but tests are still incomplete)
commit c07ded6420aa25b9dfa2b3c0785178b08d286b26
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue May 23 13:04:16 2023 -0500
TermIterator is more or less replaced by CheckpointingIterator
commit 14d03ae99c69c2419dc0cca3291b4f5eaefbc054
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue May 23 14:45:18 2023 -0500
update lucene jar
commit a15b964d56dcba285c6871d354755d7aa489af1d
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue May 23 12:48:41 2023 -0500
disk and memory recalls should be the same (since disk graph should be exactly equivalent to in-memory)
commit f03680bdf8581fd13a2d772e8265e3c8a713d247
Merge: 868a6b8f11 fd1293dbaf
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue May 23 12:08:46 2023 -0500
Merge branch 'vsearch' of github.com:datastax/cassandra into vsearch
commit 868a6b8f11a25c20b7e59a93d65cc7e2b021a432
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue May 23 12:04:11 2023 -0500
use brute force if we expect to perform fewer comparisons that way than with a graph search
commit 2ed32bf53acaf8180130bef1aa5cebd9b6d32d13
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue May 23 11:59:11 2023 -0500
refactor AnnKeyRangeIterator -> ReorderingRangeIterator
commit d28bad9b2df3fc84b117f9ed0a04314ecb9c0a02
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue May 23 11:55:15 2023 -0500
rename reorderOneComponent to limitToTopResults
commit fd1293dbaf82f054593b3150249ac00110b286cf
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Tue May 23 10:45:02 2023 -0500
Update python driver to one that supports the Vector type
commit 6603f7601671357928948dfb0921b0c731bb96d3
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue May 23 11:06:12 2023 -0500
extract ReorderingPostingList
commit 94b4ff20ccc3a51e966b4798f5364a2d9ed4b966
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue May 23 10:29:33 2023 -0500
deserialize doesn't modify buffer so duplicate is unnecessary
commit 840053d212985a109cb5242c19e8a5876fdec7b3
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue May 23 10:26:57 2023 -0500
use LongHeap instead of generic PQ
commit aac90dc3e0fdb745e465f044f2624228e705be10
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue May 23 09:47:22 2023 -0500
r/m unused defer flag from toPrimaryKeyIterator / toSSTableRowIdsIterator
commit 1fd71c1cb7f2e785237b36c6e42842fb326d2380
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue May 23 09:26:03 2023 -0500
remove unused method marked "to remove"
commit b96c5a9efccf3226acf2f2a08ca149d0acc0b21d
Merge: 54f41e32d3 ba8aa07dd0
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue May 23 09:24:44 2023 -0500
Merge remote-tracking branch 'zhao/vsearch-row-id-iterator-for-reordering' into vsearch
commit 54f41e32d3359941ddf70d34fc8e68a842b9091d
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue May 23 08:14:10 2023 -0500
update lucene to fix NPE in ram usage
commit ba8aa07dd09f52a72e5dffccb950a21ec37a1c11
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Tue May 23 16:51:36 2023 +0800
intersect on row-ids before fetching primary key for multiple non-ann indexes or single index
commit 2a69547167f8dc1b40e8a26b71cdf43a45779280
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Tue May 23 09:44:00 2023 +0800
Use sstable-row-id iterator to re-order ann index
- add searchSSTableRowIds to searcher
commit 04b9a606d9a1aaeeeab3ec0981a5f1300cc2a423
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Tue May 23 00:30:12 2023 +0800
Make RangeIterator generic
commit 4ff716a7411d3ad80009d4fb97bf53af51dd5eba
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon May 22 14:57:34 2023 -0500
r/m obsolete FIXMEs
commit 175ad0edce9cef3fb29f230fb64fb1913328d101
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Tue May 23 00:30:12 2023 +0800
fix vector distributed test: use 1 token and ignore multi-ann-index test (#622)
commit 78adf77d9e6b7f20ece71fe665a5d1aec51f79b6
Author: Mike Adamson <madamson@datastax.com>
Date: Mon May 22 14:26:48 2023 +0100
Copy lucene snapshot to build folder in build-resolver.xml
commit 5172315dcb51e05a44af224ddfe944bca29005d0
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon May 22 08:17:41 2023 -0500
declare lucene dependency to come from lib/
commit d4ee45b8675f79b4a1e06f97d0ce618ab08f030a
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sun May 21 05:40:30 2023 -0500
add caching and test code
commit 5102d96e2fb3460f1e541cc1b481e224d112eac1
Merge: 6afcfabe88 87d39a7b25
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat May 20 11:15:14 2023 -0500
merge cep-vsearch
commit 6afcfabe883c5627c96eda8eb488c51c06e17c91
Merge: a993dbe49a fa85a191c5
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat May 20 09:15:13 2023 -0500
Merge commit 'fa85a191' into vsearch
commit a993dbe49a75dde4b9d5528e0816b81ae427a1dd
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri May 19 22:00:31 2023 -0700
FIXME not sure why version is messed up post merge, this hacks around it
commit aadf168e47a4271cc3faeb62ec7044cbfe193893
Merge: c217120935 5bc4d4b42e
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat May 20 08:52:48 2023 -0500
merge 5bc4d4b42e
commit c2171209350cc307f3b5fcc73a6ec09423d203df
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri May 19 22:00:31 2023 -0700
FIXME not sure why version is messed up post merge, this hacks around it
commit 2e44491d3db6a881324126490e6e1511d6082f39
Merge: 68f85f7009 0ff4566080
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat May 20 08:15:16 2023 -0500
merge 0ff4566080
commit 68f85f700995b274ec0d64654ed2692c7fc8bf61
Merge: c8843779be 2ae60e7411
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat May 20 07:50:16 2023 -0500
Merge commit '2ae60e74' into vsearch
commit c8843779be9fe0c3edd047754bb08f633cff5646
Merge: 3cb57548bb 93616d080c
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat May 20 07:43:18 2023 -0500
Merge commit '93616d080c539e880e37cd22fe5f27396a7a2594' into vsearch
commit 87d39a7b252b3f3cc1f4aae0f149085f6bc83abc
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri May 19 17:24:55 2023 -0700
fix intersection count, and add comments to KRI api
commit 4fdc58fced9ca350333675cb19a42169bebeaccf
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri May 19 15:27:34 2023 -0700
test recall
commit 26fb1006430513a37f267a4dfb81ef473516ac7b
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri May 19 14:19:08 2023 -0700
add tests
commit 1cbf78bbf71fff2d597ddd199fd3a758313c7bf4
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri May 19 14:02:36 2023 -0700
comments
commit 65e41490f041535f1a48a726032e28f6c8febcff
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri May 19 13:57:04 2023 -0700
rm AnnResult
commit 06b10a28c3f6f19c76b056526cdabb4c9aff1528
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri May 19 13:54:08 2023 -0700
cleanup commented-out code
commit e33236ab8b9da7a025f4c59406c52ecbc3b5ae4f
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri May 19 09:52:50 2023 -0700
convert suspicious one-off binary search in OnDiskHnswGraph to use DiskBinarySearch
commit fbd30bce13e87c63b129db26c5c123ae9bf8f9ed
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri May 19 09:31:16 2023 -0700
perform disk-based binary search for ordinals
commit 77bb1b5cfcc743224bdb01d21563365caa184c21
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri May 19 08:22:36 2023 -0700
move OnDiskOrdinalsMap to its own class, and move write() code into it
commit 8a1fd9f0a38f4c16b72d5bc7b2aeb3ee07bacc7a
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu May 18 21:35:28 2023 -0700
add ordinal to row mapping
commit db32d9c7b077133a213ac3a0da161f163c4310a1
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 17 20:43:49 2023 -0500
mostly-working reordering
commit 62ba004d79aa7511f519ebb56fa8af2cb424e21e
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 17 20:13:10 2023 -0500
turn QueryView inside out; perform intersections first for each sstable, then union the results
commit d65039a6ccd8364e8eb8ef31933cdf8dca298437
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 17 18:37:07 2023 -0500
update MIM to give individual iterators back
commit a190f1e917b93b6cabd15347dec55b1683807b1f
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 17 18:59:13 2023 -0500
refactor CheckpointingIterator to just take an iterator to wrap, and index references to close if something goes wrong
commit 307fe1c20b400e17048a356f73822ef51f66f341
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 17 18:47:20 2023 -0500
rename IndexSearchResultIterator -> CheckpointingIterator
commit 690751b216afbe3e77ed0f89ce04d16e5c6e985d
Merge: 4b6d0f2fb5 fa85a191c5
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 17 10:17:49 2023 -0500
Merge branch 'cep-7-sai' into cep-vsearch
commit 4b6d0f2fb531f6832d2b24465b0982683e741dfe
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Wed May 17 16:45:42 2023 +0800
Add vector top-k filter at replica side before returning to coordinator (#618)
* added QueryPlan#postIndexQueryProcessor to filter top-k at replica side before sending response to coordinator
commit 5bc4d4b42e77180df050bdbd9f5fd18fbb37332c
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue May 16 07:38:45 2023 -0500
failing test was failing b/c test was wrong
commit f37e40f5c5d99aeaafb33f41be0bd10bff6b5756
Merge: ae1bffa49b 1c41a095e1
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon May 15 21:24:47 2023 -0500
merge
commit ae1bffa49b4c8757a3d9110ae065b8cb70d8e416
Merge: bbc4f72b8d 381e04aaa1
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon May 15 21:22:16 2023 -0500
Merge commit '381e04aaa132cf7a255efad3f792852d9ec1729d' into cep-vsearch
commit 1c41a095e130bce4c7c215650f9ef3f817ae635e
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon May 15 21:11:47 2023 -0500
fix computeNext to return endOfData when done
commit b845395ad7e5f4670597cd171f28b98c941d03a7
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon May 15 21:11:15 2023 -0500
cleanup
commit 58df11dad894aa5fb36bfbdaaf4c7dc1987fcc94
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon May 15 20:59:12 2023 -0500
failing tests
commit ea5977620ebb07d220ad51800c9b90f470494079
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon May 15 20:58:52 2023 -0500
FIXME temporarily remove checkstyle
commit 715790922379ad7b5dbdea5d3358f24542762889
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon May 15 20:49:10 2023 -0500
cleanup
commit 19e47d363c856f81bda3323be02bfe18f60b51c0
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon May 15 17:55:52 2023 -0500
update NeighborSet usage
commit 920abafce00c94bbc7d6b0c86ba78a2a9d3f9540
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon May 15 17:53:43 2023 -0500
cleanup
commit d58db09004427506efe03e81364cadf44fc914c4
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon May 15 15:46:06 2023 -0500
pull in latest hnsw optimizations
commit bbc4f72b8d7d99db003103103c90d78db4ee04d1
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon May 15 10:02:40 2023 -0500
use IndexOutputWriter in other components
commit 381e04aaa132cf7a255efad3f792852d9ec1729d
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon May 15 09:32:25 2023 -0500
switch other components to also use IndexOutputWriter
commit 5951690f5472502bcb093179e8905212c1fcf815
Author: Mike Adamson <madamson@datastax.com>
Date: Mon May 15 14:48:16 2023 +0100
Fix CassandraHnswGraphWriter to use correct output writer (#616)
commit bc5b956607fa77d53fcab9ad338776af69610feb
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Mon May 15 17:35:42 2023 +0800
Distributed vector search (#613)
* distributed vector search
- query all replicas selected by consistency level at once with full request range
- filter top-K results at coordinator in QueryPlan#postProcessor
- skip short-read-protection, read repair and replica filtering protection because replica response will be top-k
- fail ANN query without limit or limit exceeding MAX_TOP_K
- make vector search max_top_k configurable and default to 1k
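A rough sketch of the coordinator-side top-K filtering described above, assuming each replica returns rows that already carry a similarity score. `TopKMergeSketch`, `ScoredRow`, and `mergeTopK` are hypothetical names for illustration only; this is not the actual `QueryPlan#postProcessor` code, and the constant simply mirrors the "default to 1k" note.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Set;

public class TopKMergeSketch {
    // Assumed default, taken from "default to 1k" in the commit message.
    public static final int MAX_TOP_K = 1000;

    /** Hypothetical stand-in for a row returned by a replica, with its similarity score. */
    public static final class ScoredRow {
        public final String key;
        public final float score;

        public ScoredRow(String key, float score) {
            this.key = key;
            this.score = score;
        }
    }

    /** Merges per-replica top-K responses and keeps the best `limit` rows, highest score first. */
    public static List<ScoredRow> mergeTopK(List<List<ScoredRow>> replicaResponses, int limit) {
        // Mirrors "fail ANN query without limit or limit exceeding MAX_TOP_K".
        if (limit <= 0 || limit > MAX_TOP_K)
            throw new IllegalArgumentException("ANN query needs a LIMIT between 1 and " + MAX_TOP_K);

        // Min-heap on score: the weakest retained row sits at the head and is evicted first.
        PriorityQueue<ScoredRow> heap =
            new PriorityQueue<>(Comparator.comparingDouble((ScoredRow r) -> r.score));
        Set<String> seen = new HashSet<>(); // replicas may return the same row

        for (List<ScoredRow> response : replicaResponses) {
            for (ScoredRow row : response) {
                if (!seen.add(row.key))
                    continue;
                heap.add(row);
                if (heap.size() > limit)
                    heap.poll();
            }
        }

        List<ScoredRow> result = new ArrayList<>(heap);
        result.sort(Comparator.comparingDouble((ScoredRow r) -> r.score).reversed());
        return result;
    }
}
```

The sketch assumes that a row returned by several replicas has the same score in each response, so the first copy seen is kept; real reconciliation of differing replica data is out of scope here.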
commit 6d8c94535354f162f60099deecb0523f9f71bb99
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sun May 14 10:01:08 2023 -0500
fix imports for checkstyle
commit 68c20dc9ceeaf0c2169718aa4a1dce9c0f92f003
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat May 13 19:23:58 2023 -0500
add the index and ordinal mapping to the set of components that the system knows about
commit ba7efaac142c8b3181881f86020f659837c90463
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat May 13 17:57:22 2023 -0500
cleanup
commit 5ae06fe31840e5ba1899086f3b8eb8194f825db7
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat May 13 17:23:30 2023 -0500
update to latest lucene snapshot that addresses all known concurrency bugs
commit f996edd63f66cf7b688f8e014b31f15450dd1d3d
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri May 12 16:08:40 2023 -0500
fix for cqlsh by Bret McGuire
commit fa85a191c5e0bd508da584412648308888769cb9
Author: Andrés de la Peña <a.penya.garcia@gmail.com>
Date: Mon Apr 24 16:39:26 2023 +0100
Allow CQL queries on multiple indexes without ALLOW FILTERING
patch by Andrés de la Peña; reviewed by Berenguer Blasi for CASSANDRA-18217
commit e2f3d2150ab2bc5e038f2661b8e2337d3b5cd4bf
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu May 11 14:34:22 2023 -0500
imports
commit 68e59bc715de6d1f69d69530310655a598a54200
Author: Mike Adamson <madamson@datastax.com>
Date: Thu May 11 17:20:52 2023 +0100
Use correct bind types for vector
commit c57119254b7e065433804e36ed6ead71d53984ea
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 10 23:06:19 2023 -0500
avoid calling neighborSet.size() in write path
commit f91e9ea40fa4bfc07c0bb280b7c479d473956da3
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 10 23:05:11 2023 -0500
add more asserts. countNeighbors is failing
commit 1bd6789304a60762f0ef497c2137dd96f806a262
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 10 22:40:01 2023 -0500
lock out updates to the graph while we're writing it to disk
commit da7cb2984f96c06a5890ed2da21c42d946d7c754
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 10 22:06:58 2023 -0500
imports
commit 3676a86b821c6c05d9c5df33f4e4fbc9eeb1f591
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 10 22:03:14 2023 -0500
compact by building graph in memory, like it did before
commit fa1ebd013bc03def3d211932fe2a98d1bb4442d9
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 10 16:15:58 2023 -0500
rewrite save-to-disk without Lucene code, to support multiple rows having the same vector value
bonus: we don't need to rebuild an index that we already have in memory
commit 3b328324d7c3afcda34a95c4b9fcbb9fe017384c
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 10 17:00:02 2023 -0500
upgrade to latest lucene snapshot
commit 1f4e768e9ec3327a5…
commit f9e589098e89417cd41ef627b3e1c7371986e3e1
Merge: b0dbc8bd57 d32eed9d68
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Mon Sep 18 11:28:48 2023 -0500
Merge remote-tracking branch 'datastax/vsearch' into cndb-7632-vsearch-rebase-20230915
commit b0dbc8bd57fde3dd473d4aae85a3063f2156106d
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Mon Sep 18 10:12:30 2023 -0500
Update expected string for error message change.
commit f2cafdac9c3e5bf838c10e0078895d56e3271370
Author: qannap <130002578+qannap@users.noreply.github.com>
Date: Thu Sep 14 08:41:07 2023 -0700
Optimize partition-aware queries to use bloom filter (#729)
Add bloom filter field in QueryViewBuilder. Here is the link to the issue: [Vector-72](https://datastax.jira.com/browse/VECTOR-72).
commit 9765b23b5b748286391b8e374bab24a31cb5d934
Author: Jonathan Ellis <jbellis@gmail.com>
Date: Thu Sep 14 09:47:19 2023 -0500
Vsearch 18 (#741): optimize MergePostingList.advance for the case where it is called repeatedly
commit 66c00e094942de72f2f1fdb780ac00239e0aa284
Author: Mike Adamson <madamson@datastax.com>
Date: Thu Sep 14 14:33:41 2023 +0100
Remove smile-nlp and add internal Glove implementation for testing (#743)
commit b13098e0cd5a9f961066c0059953b525f5bfa787
Author: Jakub Żytka <jakub.zytka@gmail.com>
Date: Thu Sep 14 13:28:01 2023 +0200
CNDB-7363: extend read tracking API to notify about post-filtered, reconciliated partitions and rows (#692)
* CNDB-7363: extend read tracking API to notify about post-filtered, reconciliated partitions and rows
* RowFilter::getExpressionsPreOrder will now return all row filter's expressions in pre-order
commit 2bca65cfe4ba864a8a6719e0e28ddf76e83c17e0
Author: Michael Marshall <michael.marshall@datastax.com>
Date: Sun Sep 10 01:01:47 2023 -0500
Fix JsonTest and AnalyzerViewTest errors (#730)
commit a0dde9f89e02ef40d9aa87514eb7b5becfdce8e8
Author: Michael Marshall <michael.marshall@datastax.com>
Date: Fri Sep 8 19:34:09 2023 -0500
Reject SAI creation for invalid combinations of options (#736)
commit abd5399f9c1b6feb0f5ba71b279ff5b76d0f92ee
Author: Jonathan Ellis <jbellis@gmail.com>
Date: Fri Sep 8 11:52:53 2023 -0500
Fix exceptions when executing complex queries (#735)
* fix ReorderingRangeIterator.performSkipTo
* fix NPE
commit acc39a8281cf9163057c00ec9413219b210cc262
Author: Michael Marshall <michael.marshall@datastax.com>
Date: Fri Sep 8 11:04:58 2023 -0500
Analyzer cleanup: add comments and fix docs (#733)
commit 4989dff0f150bb65535fa598e53290bca1f3ce0c
Author: Michael Marshall <michael.marshall@datastax.com>
Date: Thu Sep 7 08:51:09 2023 -0500
Fix RowAwarePrimaryKey#hashCode for deferred keys (#725)
* Fix RowAwarePrimaryKey#hashCode for deferred keys
* Use recommendation from code review
commit 2268c1c806632d62cf1a29b815740207a0e2939c
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Sep 6 13:38:39 2023 -0500
optimize MergePostingList.advance -- we don't care what order we evaluate the merged lists in, since we're going to rebuild the PQ anyway, so use a fast iterator instead of a slow poll() loop
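A minimal sketch of the idea in this commit message, under the assumption that the merged posting lists live in a priority queue keyed by their current row id. `MergeAdvanceSketch` and `Postings` are hypothetical illustration types, not the real `MergePostingList` classes: because the queue is rebuilt after every `advance`, the lists can be visited in arbitrary order with a plain iteration rather than being drained one `poll()` at a time.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class MergeAdvanceSketch {
    public static final long END = Long.MAX_VALUE; // stand-in for END_OF_STREAM

    /** Hypothetical sorted posting list over an in-memory array. */
    public static final class Postings {
        private final long[] ids;
        private int pos = 0;

        public Postings(long... ids) {
            this.ids = ids;
        }

        public long current() {
            return pos < ids.length ? ids[pos] : END;
        }

        public void advance(long target) {
            while (pos < ids.length && ids[pos] < target)
                pos++;
        }
    }

    private static final Comparator<Postings> BY_CURRENT = Comparator.comparingLong(Postings::current);
    private final PriorityQueue<Postings> queue = new PriorityQueue<>(BY_CURRENT);

    public MergeAdvanceSketch(List<Postings> lists) {
        queue.addAll(lists);
    }

    /** Advances every sub-list to `target` and returns the smallest id >= target, or END. */
    public long advance(long target) {
        // Copying the queue iterates it in arbitrary (heap) order; that is fine because
        // the queue is rebuilt below, so no O(log n) poll() per element is needed.
        List<Postings> all = new ArrayList<>(queue);
        queue.clear();
        for (Postings p : all) {
            p.advance(target);
            if (p.current() != END)
                queue.add(p); // exhausted lists are dropped
        }
        return queue.isEmpty() ? END : queue.peek().current();
    }
}
```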
commit 6bb72c3272fbc8d12427fb0d9f00916b6f6ee9c0
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Sep 6 13:22:07 2023 -0500
PKM is not threadsafe, need to allocate a new one for every request
commit f1d7c66b54f7b95e12698d7a6ed00a1e8ac958d6
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Sep 6 13:10:11 2023 -0500
r/m slightly gratuitous IOException from PKM.Factory signature
commit 73e96df69a1861d5a8bda04ab69252a480a78c62
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Sep 6 11:35:40 2023 -0500
fix updateTestWithPredicate by removing broken assert (did not account for deleted rows)
commit 36327a93b8c1651acd7af1d8b687dd92e08021e3
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Sep 6 11:34:44 2023 -0500
add failing updateTestWithPredicate
commit 80eb443b65637dc1782cb3a745d4d56431f787d8
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Sep 6 11:05:32 2023 -0500
comment upsertTest
commit 3f5fbc72547aa790989d3c7bce8b92fd4bb2fb2c
Author: Marianne Manaog <91789965+marianne-manaog@users.noreply.github.com>
Date: Wed Sep 6 19:42:15 2023 +0100
Feature/Optimize SAI.getPostQueryOrdering as SAI.postQuerySort (#710)
* Optimize sorting in postQuerySort method
* Remove need for prepareFor method following sort optimization in postQuerySort
* Add testOrderResults
* Simplify creation of resultSet in the test testOrderResults and keep rows as final in ResultSet.java
* Apply DSU pattern to SAI.postQuerySort method and leverage it in SelectStatement.orderResults method
* Use Pair.create instead of new Pair
* Put back comment on ANN support only
* Optimise undecorate step by using List<ByteBuffer> instead of ByteBuffer in the Pair
* Add licence to StorageAttachedIndexTest.java
* Move StorageAttachedIndexTest.java up a directory
* Simplify map to create listPairsVectorsScores
commit 23555ab43101cd6010fe9cc320e7429779faa678
Author: qannap <130002578+qannap@users.noreply.github.com>
Date: Fri Sep 1 15:29:22 2023 -0700
Add row count field to columnFamilyStore (#713)
* Add row count field to columnFamilyStore
* Add test to row count field
* delete unused import
* delete unused function and separate test
* Add row count field to columnFamilyStore
* Add test to row count field
* delete unused import
* delete unused function and separate test
* resolve version
* return to updated version
* restore other tests as vsearch branch
---------
Co-authored-by: Michael Marshall <michael.marshall@datastax.com>
commit 1eae389fadecfc6637207933fd6ae1739355dc00
Author: Michael Marshall <michael.marshall@datastax.com>
Date: Thu Aug 31 17:59:53 2023 -0500
Fix NPE in LuceneAnalyzer#end (#721)
When testing the analyzer with an integration test, I discovered an NPE from a part of the code that wasn't tested.
Here is the code:
https://github.com/datastax/cassandra/blob/9cabec504d9f70ddfb345122dc4defed56a40f85/src/java/org/apache/cassandra/index/sai/plan/StorageAttachedIndexSearcher.java#L91-L100
We build the Analyzer to use one of its methods, but not to analyze any text, hence the NPE.
Here is the NPE:
```
ERROR [Native-Transport-Requests-1] 2023-08-31 14:35:38,695 QueryMessage.java:121 - Unexpected error during query
java.lang.NullPointerException: null
at org.apache.cassandra.index.sai.analyzer.LuceneAnalyzer.end(LuceneAnalyzer.java:110)
at org.apache.cassandra.index.sai.plan.StorageAttachedIndexSearcher.filterReplicaFilteringProtection(StorageAttachedIndexSearcher.java:99)
at org.apache.cassandra.service.reads.DataResolver.lambda$preCountFilterForReplicaFilteringProtection$6(DataResolver.java:300)
at org.apache.cassandra.service.reads.DataResolver.resolveInternal(DataResolver.java:341)
at org.apache.cassandra.service.reads.DataResolver.resolveWithReadRepair(DataResolver.java:244)
at org.apache.cassandra.service.reads.DataResolver.resolveWithReplicaFilteringProtection(DataResolver.java:284)
at org.apache.cassandra.service.reads.DataResolver.resolve(DataResolver.java:130)
at org.apache.cassandra.service.reads.range.SingleRangeResponse.waitForResponse(SingleRangeResponse.java:59)
at org.apache.cassandra.service.reads.range.SingleRangeResponse.computeNext(SingleRangeResponse.java:65)
at org.apache.cassandra.service.reads.range.SingleRangeResponse.computeNext(SingleRangeResponse.java:31)
at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47)
at org.apache.cassandra.db.transform.BasePartitions.hasNext(BasePartitions.java:93)
at org.apache.cassandra.service.reads.range.RangeCommandIterator.computeNext(RangeCommandIterator.java:123)
at org.apache.cassandra.service.reads.range.RangeCommandIterator.computeNext(RangeCommandIterator.java:45)
at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47)
at org.apache.cassandra.db.transform.BasePartitions.hasNext(BasePartitions.java:93)
at org.apache.cassandra.db.partitions.PartitionIterators$1.hasNext(PartitionIterators.java:154)
at org.apache.cassandra.db.transform.BasePartitions.hasNext(BasePartitions.java:93)
at org.apache.cassandra.cql3.statements.SelectStatement.process(SelectStatement.java:892)
at org.apache.cassandra.cql3.statements.SelectStatement.processResults(SelectStatement.java:518)
at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:495)
at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:341)
at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:102)
at org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:290)
at org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:394)
at org.apache.cassandra.transport.messages.QueryMessage.execute(QueryMessage.java:108)
at org.apache.cassandra.transport.Message$Request.execute(Message.java:242)
at org.apache.cassandra.transport.Dispatcher.processRequest(Dispatcher.java:111)
at org.apache.cassandra.transport.Dispatcher.processRequest(Dispatcher.java:131)
at org.apache.cassandra.transport.Dispatcher.lambda$dispatch$0(Dispatcher.java:78)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:165)
at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:137)
at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:119)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.base/java.lang.Thread.run(Thread.java:829)
```
I am still new to query execution, but I am guessing the unit tests skip this filtering code. I added an integration test that fails without the fix.
commit ea97cc414ef3e68ad30e8c666ecf04b719352901
Author: Michael Marshall <michael.marshall@datastax.com>
Date: Wed Aug 30 09:10:14 2023 -0500
Fix failing KDTreeIndexSearcherTest (#717)
#680 included a change that makes 5 of the KDTreeIndexSearcherTest tests fail. This PR essentially reverts back to the old behavior, and all of the failing tests pass.
commit cee55b1b2139627b331d9f7225c522ca5bee299b
Author: Michael Marshall <michael.marshall@datastax.com>
Date: Tue Aug 29 21:56:24 2023 -0500
Remove ability to configure unique query_analyzer (#712)
commit c248f905cc13833f9857b9d750a60091afc5c38a
Author: Michael Marshall <michael.marshall@datastax.com>
Date: Tue Aug 29 21:55:18 2023 -0500
Fix failing distributed SAI tests using : operator (#715)
commit 367e84adf0663e8baab3f9195f3eb81412640b8c
Author: Michael Marshall <michael.marshall@datastax.com>
Date: Tue Aug 29 21:55:03 2023 -0500
Test compound predicate queries for : operator (#716)
commit 0ea0d95f52ac18522a0cb5a09c1c9ee00069380a
Author: Michael Marshall <michael.marshall@datastax.com>
Date: Tue Aug 29 11:10:02 2023 -0500
Refactor SAI analyzer configuration; add built in analyzers (#711)
The current `OPTIONS` configuration for the `index_analyzer` does not correctly model the available configuration options. In order to make the configuration more directly match the single tokenizer, and the lists of filters and charFilters, I propose we update the expected JSON schema. Further, we add some default analyzers to simplify the configuration of generic analyzers.
commit 67ef0f3794cbd2b9ff41b632205dd087a992e674
Author: Michael Marshall <michael.marshall@datastax.com>
Date: Mon Aug 28 11:58:30 2023 -0500
Return more specific exception on incorrect usage of : operator (#705)
* When the `:` operator is used for indexes that do not support it, the current error is a recommendation to `ALLOW FILTERING` or an `AssertionError`. Instead, we should match the `LIKE` behavior and reject queries that attempt to use the `:` operator on columns that are not properly indexed. The `LIKE` conditional is slightly modified to simplify the logic for rejecting unsupported queries.
* When the `:` operator is used for `DELETE` and `UPDATE` queries, return a helpful error message.
* Update the `toString` method on the `ANALYZER_MATCHES` enum. The `toString` method is used to generate error messages, and it is confusing to see an error message with `: '<term>'` when it really should just be `:`.
commit c26cbfa2298d422813e02b439f4db6494bb64a84
Author: Jakub Żytka <jakub.zytka@gmail.com>
Date: Mon Aug 28 14:12:15 2023 +0200
Additional testcases for read query tracking with different data models (#698)
* Additional testcases for read query tracking with different data models
commit bf143a2b3fc97979e5e083f8eac7aca017cd618d
Author: Michael Marshall <michael.marshall@datastax.com>
Date: Thu Aug 24 13:28:47 2023 -0500
Ignore default analyzer settings when classifying an SAI as analyzed (#706)
### Problem
When creating an SAI, the `AbstractAnalyzer` creates an analyzer in cases where it should not because the logic is too naive.
### Details
Users can create SAI indexes with a `NonTokenizingAnalyzer` using the following queries where `?` can be replaced with `true` or `false`:
```
CREATE CUSTOM INDEX ON table(some-column) USING 'StorageAttachedIndex' WITH OPTIONS = { 'normalize' : ? }
CREATE CUSTOM INDEX ON table(some-column) USING 'StorageAttachedIndex' WITH OPTIONS = { 'case_sensitive' : ? }
CREATE CUSTOM INDEX ON table(some-column) USING 'StorageAttachedIndex' WITH OPTIONS = { 'ascii' : ? }
```
The options are known as "non tokenizing analyzers" because they "analyze" values but do not "tokenize" them. Practically speaking, these analyzers are projections where each input has one and only one output. Lucene calls projections "filters".
The `StorageAttachedIndex` default is for normalize = false, case_sensitive = true, and ascii = false.
When a default value is passed for one of the options, the `AbstractAnalyzer` incorrectly classifies the index as "analyzed". Instead, the `AbstractAnalyzer` should return the `NoOpAnalyzer` and the index should not be classified as "analyzed".
### Solution
* Only construct the `NonTokenizingAnalyzer` when `normalize`, `case_sensitive`, or `ascii` is configured with a non-default value.
* Do not consider an SAI as analyzed if `normalize`, `case_sensitive`, or `ascii` are configured with default values.
* Updated some tests to use `=` on the non-analyzed SAI
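The classification rule above can be sketched in Java as follows. This is only an illustration, not the patch itself: the class `AnalyzerOptionsSketch` and its methods are hypothetical names, and only the option names and their defaults are taken from the description.

```java
import java.util.Map;

// Hypothetical helper; this is not the actual AbstractAnalyzer code.
final class AnalyzerOptionsSketch {
    // Defaults as stated in the commit message.
    static final boolean DEFAULT_NORMALIZE = false;
    static final boolean DEFAULT_CASE_SENSITIVE = true;
    static final boolean DEFAULT_ASCII = false;

    // True only when at least one option is set to a non-default value.
    static boolean isAnalyzed(Map<String, String> options) {
        return differs(options, "normalize", DEFAULT_NORMALIZE)
            || differs(options, "case_sensitive", DEFAULT_CASE_SENSITIVE)
            || differs(options, "ascii", DEFAULT_ASCII);
    }

    private static boolean differs(Map<String, String> options, String key, boolean defaultValue) {
        String raw = options.get(key);
        // An absent option is treated the same as the default.
        return raw != null && Boolean.parseBoolean(raw) != defaultValue;
    }
}
```

Under this sketch, `{ 'normalize' : false }` and an empty options map both classify as not analyzed, while `{ 'case_sensitive' : false }` classifies as analyzed.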
commit 7a6c4755f150ad647aae2bc806d8b91c423db14b
Author: Michael Marshall <michael.marshall@datastax.com>
Date: Tue Aug 22 15:57:08 2023 -0500
Add ANALYZER_MATCHES operation; Disallow = on analyzed SAI columns (#702)
* Adds the `:` operator, a.k.a. the ANALYZER_MATCHES operator. This operator only works on analyzed SAI indexes, where "analyzed" means that the values stored in the index are derived from the raw value in the column. The new operator uses the enum value `100` to prevent future collisions with the upstream CQL implementation.
* Disambiguate the `=` operator so that it cannot perform matches on an `SAI` column that is analyzed. The primary motivation for this change is to prevent surprising matches. If a query is attempted using `=` where the target column only has analyzed indexes, the query must include `ALLOW FILTERING`.
* Updated many tests.
commit 77cb7f265e40600288fe90fffd74ef6e428f87fe
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Aug 18 10:48:54 2023 -0500
comment
commit 2dcdd9fd7464a6e26185bfa4ac3cd44b02574fb1
Author: Michael Marshall <michael.marshall@datastax.com>
Date: Thu Aug 17 16:55:51 2023 -0500
Fail SAI index creation when using analyzer on column in primary key (#699)
* add testStandardAnalyzer
* debugging wip
* Fail index creation when using analyzer on column in primary key
* Remove unnecessary debug logging
* cleanup
* Reject all attempts to add analyzer to index on pk columns
* Rename noop analyzer test and add explanation
---------
Co-authored-by: Jonathan Ellis <jbellis@datastax.com>
When the target column for an SAI is part of the primary key and is analyzed by the LuceneAnalyzer, queries do not return correct results. Until the underlying issue is resolved, do not allow these kinds of indexes to be created.
This change is covered by a test to verify index creation and search works correctly for the `NoopAnalyzer` and another to verify index creation is rejected for the `LuceneAnalyzer`.
commit 4f5585be198a949b0264eb64d0403e2adda9ab52
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Aug 17 17:41:18 2023 -0500
default vector cache size per segment bumped to 4MB
commit 9464bfe3e2bbd958f4f1d60e799019582a9836cb
Author: Marianne Lyne Manaog <marianne.manaog@datastax.com>
Date: Thu Jul 13 20:32:36 2023 +0100
Deserialize ByteBuffer into a query vector once by moving the deserialization up the call stack
commit bc2ef7ff721493b20b1bb76dd2ae37b43d365071
Author: Shaunak Das <ShaunakDas88@users.noreply.github.com>
Date: Wed Aug 16 13:27:12 2023 -0500
VECTOR-79: Simplify population of VectorCache (#695)
* use BFS at higher levels so we can avoid a separate cachedNodes set
* try to naively cache as much of level 0 as possible
* add VectorCacheTest
* cleanup and revert to intended behavior of not caching L0
---------
Co-authored-by: Jonathan Ellis <jbellis@datastax.com>
commit 1c58e431e55d8b327314ddb82b30bf8eced59269
Author: Shaunak Das <ShaunakDas88@users.noreply.github.com>
Date: Tue Aug 15 11:24:30 2023 -0500
Avoiding sorting node ordinals for level 0 (#697)
* avoiding sorting node ordinals for level 0
commit 16eeaf89a8ac39fee616f7a71a3f9a4b4170521e
Author: Jonathan Ellis <jbellis@gmail.com>
Date: Tue Aug 15 08:47:48 2023 -0500
extract OnDiskVectors to top level class so we can use it in debugging tools (#693)
commit 2271a230761a416fbd5c016d47684b388be93989
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Aug 9 09:59:45 2023 -0500
VECTOR-77 fix reading vectors past the first 2GB mmap region
commit 5705daa97f848454d8d29ab08234296906b3dbc4
Author: Andrés de la Peña <adelapena@users.noreply.github.com>
Date: Thu Aug 10 18:06:28 2023 +0100
VECTOR-76: Add back vector size guardrail (#694)
patch by Andrés de la Peña; reviewed by Brandon Williams and Maxwell Guo for CASSANDRA-18730
commit cb3bedccc255476300528183e8ba30a4e390f57f
Author: Jakub Żytka <jakub.zytka@gmail.com>
Date: Tue Aug 8 16:52:16 2023 +0200
CNDB-7390: fix static rows not being tracked by ReadTrackingTransformation (#691)
commit a8740fb8dae267fb8ddef1a7f5c77e948cd2a0ff
Author: Andrés de la Peña <adelapena@users.noreply.github.com>
Date: Fri Jul 28 15:42:42 2023 +0100
Add system property to reject non-float vectors, true by default (#689)
commit 5a4e439dc73b479d79bd427472d382a53b6fe680
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jul 21 13:10:44 2023 +0200
r/m failing emptyIndexTest until we can address it (VECTOR-59)
commit eec8b7b387cd041fbd54aef8f0090773962a9e49
Author: Piotr Kołaczkowski <pkolaczk@gmail.com>
Date: Tue Jul 18 17:36:43 2023 +0200
VECTOR-69: Fix LWT test failure caused by null key bounds (#686)
* VECTOR-69: Fix LWT test failure caused by null key bounds
---------
Co-authored-by: Jakub Zytka <jakub.zytka@datastax.com>
Co-authored-by: Jonathan Ellis <jbellis@datastax.com>
commit 033ae1a9567dfef88d0214dd34716f2468c5d4e3
Author: Andrés de la Peña <adelapena@users.noreply.github.com>
Date: Tue Jul 18 15:23:30 2023 +0100
VECTOR-68 Vector dimensions are not being validated correctly (#679)
commit e22ff6977f986a4148f931bad4b04645bf93d2e9
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jul 18 06:01:04 2023 +0200
set sai_hnsw_allow_customm_parameters=true for IndexWriterConfigTest
commit ccaba92656e78e63765da5c34f562f2c3b582bd0
Author: Mike Adamson <madamson@datastax.com>
Date: Tue Jul 11 16:12:31 2023 +0100
VECTOR-55: Review REVIEWME comments: (#680)
* VECTOR-55: Review REVIEWME comments:
- Refactor search methods to only return long variant
- Remove SSTableQueryContext in favour of QueryContext
- Add checking to SegmentMetadata.toSegmentRowId method
commit 338c8507de197570f4879019e0e1fc8e1f77025e
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Mon Jul 10 21:46:56 2023 -0500
Fix IndexError when querying a list/set/map column.
commit 1b4d6de7e0a051b95c0a9b62c8bc167e23ec65ed
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sun Jul 9 21:03:51 2023 +0200
add test that fails before decompose fix and passes after
commit 664911ac7de336d3fcffd90083e6a220069bd4e2
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sun Jul 9 13:44:28 2023 +0200
re-use TypeUtil.decomposeVector
commit e17493c95ac247ca1da7caa3ccd58be866958d6a
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sun Jul 9 13:28:56 2023 +0200
fix casting in deserialize
commit d169ff68f6576ce5b150818cb0a52a12abf6f2e0
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sun Jul 9 13:30:02 2023 +0200
fix NPE when using ReadExecutionController.empty
commit aa899f55f77df5b157e3e6afa930c377e4cacc57
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Fri Jul 7 10:12:28 2023 +0800
assert no single partition trace for SAI request
commit 277251e1aad15e2a5aedaaaae3170d574d224fe5
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 29 12:08:05 2023 -0500
- add Tracing.traceSinglePartitions to avoid cluttering range and index queries with noise
- update QueryViewBuilder to include a count of sstables per index accessed
commit 4682a417fdaaefb135e57b0ebe174c1ff49b7e94
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jul 6 12:13:20 2023 -0500
simplify
commit 52d19a1ef5605aebdb87385786dd323057267164
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Wed Aug 30 14:17:12 2023 -0500
Squashed commit of the following:
commit 4c4c1ff802c7667e1289a4e730cc34bf6b3bd007
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jul 5 12:03:22 2023 -0500
add test for creating index after data is added
commit 0acaae364c19ed7556b66b5576762ecc131af5fd
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Wed Jul 5 09:23:01 2023 -0500
Revert "enable segment compaction by default for vector indexes -- we should prioritize read performance"
This reverts commit a4fa0728337ebf236ad03f6ad708cf51c3dc08c5.
commit e3ddb6426de4e89b5fe61ed742b69257865c2d3c
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 28 10:54:49 2023 -0500
VECTOR-64 set sstable_growth to 0.5 by default
commit 3ca75046e6802ce5836a654459b01948c83d1d7b
Merge: a4fa072833 10a2a31ae8
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 28 10:54:18 2023 -0500
merge ds-trunk
commit a4fa0728337ebf236ad03f6ad708cf51c3dc08c5
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 28 10:24:41 2023 -0500
enable segment compaction by default for vector indexes -- we should prioritize read performance
commit 08153b38549d700505d064c25d7a2070c2ff07aa
Merge: 6a063b4f84 1a45fdc5f9
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 28 10:18:38 2023 -0500
Merge branch 'vsearch' of github.com:datastax/cassandra into vsearch
commit 6a063b4f8458f119d4b99bd6b78cbc8832f27f13
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 28 10:13:51 2023 -0500
VECTOR-63 add cassandra.sai.hnsw.allow_custom_parameters defaulting to false to control setting maximum_node_connections and construction_beam_width, which can cause memory usage to explode if used incorrectly
commit 1a45fdc5f92e9c82dba6e9e56a5e341806070b13
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Mon Jun 26 15:57:10 2023 -0500
Add the option to configure the file_cache_size_in_mb in a -D like it is in 6.8-cndb.
commit d2e45dab5b57f20fbdd6250d6dc186c83d783ed3
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 22 16:13:00 2023 -0500
VECTOR-62 don't flush a graph that only contains deleted vectors
commit 4a9afe205c20b1f863e89d4e02138557d225c3fc
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 22 16:11:11 2023 -0500
r/m obsolete comment
commit a925dc22696bcbebcad17a460441e8b84ff97660
Merge: 7c7d160dd9 2b22984fe1
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Thu Jun 22 09:03:08 2023 -0500
Merge remote-tracking branch 'datastax/ds-trunk' into vsearch
commit 7c7d160dd90624b0e365287ef0a2f0b7ef94da53
Merge: a14b387372 7a5e374cea
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 20 21:42:51 2023 -0500
Merge branch 'vsearch' of github.com:datastax/cassandra into vsearch
commit a14b387372ef3f220ece68b019acaafaf0d56f42
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 20 17:52:13 2023 -0500
move new options to jvm17-server.options
commit dc67582585a90628e026a693497793b7d8a298ed
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 20 17:47:27 2023 -0500
copy jvm11 options to jvm17
commit b6717410f925486a858c4c9414e417eed50950d7
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 20 17:30:48 2023 -0500
enable simd
commit 92b9e6e2f136254fb1f8e50732809ea529617f48
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 19 15:31:09 2023 -0500
make it run under jdk 20
commit 3e985953637819ac8504955552ce0feba29bdd40
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 20 17:13:56 2023 -0500
update to lucene with simd
commit 7a5e374cea785678670444121ad36be6170dd59b
Merge: 5e5ae4de82 e9ddd5f0d9
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Tue Jun 20 15:00:24 2023 -0500
Merge remote-tracking branch 'datastax/ds-trunk' into vsearch
commit 5e5ae4de824a7e838bbdfdbabc92e8b256911e3f
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 19 10:57:18 2023 -0500
we weren't calculating length correctly (supposed to match ordinals range) but it didn't matter b/c length() is not called on the search path where these classes are used. make this explicit by throwing in length implementation.
commit c0d4055a0bc6914d0220fbd4912d6355e9f6a1d9
Merge: f2c9a7cb59 c605f8c9f0
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 19 10:20:02 2023 -0500
Merge branch 'vsearch' of github.com:datastax/cassandra into vsearch
commit f2c9a7cb59b113a349992011b739079106274c1e
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 19 10:19:34 2023 -0500
VECTOR-61 remove deletedOrdinals collection that was not threadsafe wrt add and remove operations; create postingsByOrdinal map that allows Bits operations to look up the postings, instead
commit 9744a13b405cbeb1b68894bb08cc24796aedd8f7
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 19 09:13:24 2023 -0500
add LVT.testMultiplePostings (works fine)
commit c605f8c9f037e724544b21fe2aadcf0d8503d86f
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Sat Jun 17 12:44:36 2023 +0800
skip replica-filtering-protection for ANN requests
commit 71935bd07c9218ba70f1ffde7b6db1e88009e529
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 16 20:20:06 2023 -0500
forgot that maxBruteForceRows has to stay non-final for Byteman
commit b264afc07727305ce88ab2f18b8a2643d2510bd3
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 16 20:10:19 2023 -0500
VECTOR-54
- validateIndexable also checks for very large vectors (that explode to NaN when we take the cosine)
- check query vectors with the same criteria as vectors to index
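As an illustration of the kind of check described above, here is a minimal Java sketch. It is an assumption, not the real `validateIndexable`: the class and method names are hypothetical, and `IllegalArgumentException` stands in for whatever exception the patch throws.

```java
// Hypothetical sketch; not the actual validateIndexable implementation.
final class VectorValidationSketch {
    // Rejects vectors whose squared magnitude is zero or not finite, since
    // cosine similarity divides by the magnitude and would produce NaN.
    static void validateForCosine(float[] vector) {
        float sumOfSquares = 0f;
        for (float component : vector) {
            sumOfSquares += component * component; // overflows to Infinity for very large components
        }
        if (sumOfSquares == 0f) {
            throw new IllegalArgumentException("Zero vectors cannot be used with cosine similarity");
        }
        if (!Float.isFinite(sumOfSquares)) {
            throw new IllegalArgumentException("Vector magnitude is not finite");
        }
    }
}
```

The same method can be applied to both vectors being indexed and query vectors, which is the symmetry the second bullet asks for.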
commit 66c31a6dcfcbf81b7657ac9c4119803e026d150a
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 16 20:07:41 2023 -0500
if CFS.apply catches an InvalidRequestException, keep the same type when we re-throw so it gets back to the client as intended
commit e0bdfdbe421a1cca30746604cda9ef5b992002bb
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 16 17:33:37 2023 -0500
add [failing] emptyIndexTest
commit 75cba96f90730b32ea4a39375c72aaff4ffd6221
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 15 16:52:46 2023 -0500
re-use bitsets across searches
commit 2af2ce8a321f7fea268d81c393da7f8f6d886776
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 16 17:05:52 2023 -0500
more asserts that graph is in sane state when we write it
commit a63492686d13e12b799acdf50c6f0c20dd680f21
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 16 17:13:23 2023 -0500
update lucene
commit 4c6b43e3606b8c9eb5e4ec4cf0817dc9b061c555
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 16 16:24:33 2023 -0500
add debugging information when node is not found on level
commit 81c8e730bd4ca314a81bc13e30b197603a8eb005
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 16 11:57:24 2023 -0500
add similarityWithAnn test
commit 87c8da9a5dddd10215a84c07a09dbaaecaa522b8
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 14 18:26:02 2023 -0500
add tests to confirm that zero-length vectors are rejected
commit 2510c1b987d846821e7095fbc2c368fa51d9a15d
Merge: a3d9e33554 e86f91c568
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 12 17:09:22 2023 -0500
Merge branch 'vsearch' of github.com:datastax/cassandra into vsearch
commit a3d9e3355470f6400328cd7ac1eb07ff06f8e00b
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 12 17:08:43 2023 -0500
use ArrayList in CVV instead of HashMap since we know the keys are consecutive
commit e86f91c568f8a82e83e61bd0dd768478a7584cfc
Merge: 77c1bc11f3 8a7a6d9c4f
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Mon Jun 12 09:25:06 2023 -0500
Merge remote-tracking branch 'datastax/ds-trunk' into vsearch
commit 77c1bc11f31e4ff001c7eff1e6fe50812e136a45
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Mon Jun 12 09:24:56 2023 -0500
Revert "Revert this after CNDB-6974 is fixed."
This reverts commit 8e04280312583a18085e7e7b9d31810790b039f4.
commit cae30e9879c463d80de2a2b7e4a642979803c13a
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Mon Jun 12 09:23:39 2023 -0500
Revert "Revert this after CNDB-6974 is fixed."
This reverts commit 67a2b7eba3c957a993726d37706643979e92a3a3.
commit 46d1a45de84f40311f193beba3ca5fc3fe7450d7
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 12 09:03:03 2023 -0500
replace our TODO entries with VSTODO to make them easier to find
commit 415b3f29e8e7f1526e28655308a1937a1175e575
Merge: 2fd8b848be 481d29721a
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 12 09:02:16 2023 -0500
Merge branch 'vsearch' of github.com:datastax/cassandra into vsearch
commit 481d29721a7993e9babd44f792be55aca925d41a
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Mon Jun 12 21:26:43 2023 +0800
Fix VectorMemtableIndex to handle max token and fix Segment#intersects (#670)
* Fix VectorMemtableIndex to handle max token and min/max bound
* Fix Segment#intersects to compare bound instead of token and add tests for range search
* make brute force rows per query for VectorMemtableIndex
* apply feedback on Segment#intersects
* add comments to VectorMemtableIndex#search
* Fix SegmentTest
commit 2fd8b848be82e62999e64b57cb9800d03de8e953
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 12 07:51:18 2023 -0500
clean up a couple REVIEWME
commit 3c066e07be6188c3f48c51c02cfc99b395e2a6bc
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Sun Jun 11 14:26:42 2023 +0800
fix flaky VectorDistributedTest
commit 6c719a752e90e031916dddf6cc7b7db23627bc38
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 10 10:48:01 2023 -0500
fix use of bruteForceRows -- should be larger of limit,maxBruteForceRows
commit 0a2279d7f010bcb68053bb14ea2974295b21d1c4
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 10 10:46:34 2023 -0500
use maxBruteForceRows when deciding whether to skip ANN
commit 1859f7355231ce7972e3d4b7af70e5d2967da516
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 10 10:27:03 2023 -0500
simplify partitionKeySearchTest using euclidean distance
commit 4f40e1c2d9b1c30675eeaee7d800a8113cce97c0
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 10 10:09:24 2023 -0500
typo
commit 7dbe85ac9a248aa6a0985839df1f40ceb2c08e6c
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 10 10:08:23 2023 -0500
rename methods that return Bits but had bitset in their names
commit 2861bfdba8f2638a1e3f9e81df9cad3878d0e544
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 10 10:06:51 2023 -0500
simplify skipANN
commit fbfe4bcdd03a66f4dedfa91ce9d06206f89827cd
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Sat Jun 10 16:36:56 2023 +0800
optimize VectorIndexSearcher#searchPosting
- return empty posting if key range is not found in current sstable
- return empty posting if all row ids are shadowed
- skip ANN if matching row ids are less than limit
commit 168911b70888a3dda0ddf4f615af1b7ff54f43e3
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Sat Jun 10 20:54:41 2023 +0800
fix data visibility issue during flush: make sure SSTableAddedNotification is sent before MemtableDiscardedNotification
commit c5dd47e3ce1c7276703fa3f89a9932a3b2cd43b7
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Sat Jun 10 12:31:53 2023 +0800
fix primaryKeySearchTest and partitionKeySearchTest to use correct selector and expected results
commit 2e636529c63d0d726868d6df9b3e11d15d3871d8
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Sun Jun 11 08:31:39 2023 +0800
Vector-48: index#update is not triggered by partition/range deletion,… (#665)
* Vector-48: index#update is not triggered by partition/range deletion, we have to fix VectorMemtableIndex to include shadowed primary keys
- during flush, append ordinals that are removed by partition/range deletion into deletedOrdinals set
* add comments to VP
* revert redundant variables
* simplify
---------
Co-authored-by: Jonathan Ellis <jbellis@datastax.com>
commit daa2623b880e4b82d99204fe718db22e52ae3b42
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 10 13:54:38 2023 -0500
use mmapped builder in OnDiskHnswGraphTest
commit 553778a22cfa8757daf3b3ec0e2cfa6fc7ba29cf
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 10 08:34:41 2023 -0500
fix logic in partitionKeySearchTest, test still fails
commit a7f5a2f824085691da80ef62e37a430e5637c31c
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 17:57:34 2023 -0500
ignore invalid vectors during build against existing data, instead of failing the build
commit 2af203646a2e57fca0a2521c31ba4269b0d8f326
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 17:43:47 2023 -0500
switch from ignoring zero vectors to throwing IRE
commit 4c98fbb447a275cbf557ae0adf8f4f8a30fba17f
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 17:15:57 2023 -0500
Revert "if there are no rows in the range to search, return EmptyPostingsList instead of giving the hnsw search an empty Bits (which is a pathological case for the search)"
This reverts commit 6e5eab787a3add985dc30eca82c5dee2646f5564.
commit da99598a5d11dfa4ecac3b1eeeb9d19a4bad21e5
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 17:09:30 2023 -0500
don't attempt to add zero vector to cosine indexes
commit a77792447d33600f69dd0a75a280a2b6dac51389
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 16:43:34 2023 -0500
add failing partitionKeySearchTest
commit b061dd1c77f30c67324da4f94ea8b5a22d50fba7
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 10:52:08 2023 -0500
inline the test ops
commit 6e5eab787a3add985dc30eca82c5dee2646f5564
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 10:51:09 2023 -0500
if there are no rows in the range to search, return EmptyPostingsList instead of giving the hnsw search an empty Bits (which is a pathological case for the search)
commit 518c055dfeabacf0fa0863e40bf814058f2ad981
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 10:28:57 2023 -0500
cleanup
commit e922b03ffe5436dc1341dbe20ee7aaca9901058f
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 10:27:53 2023 -0500
failing tests for primary key search
commit 428e4b713e25a5820d02ca790c30e9f1477c0f44
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 10:21:59 2023 -0500
cleanup
commit 5319bcdf3b694e861509b2a9f549f39bf51cd253
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 10:20:02 2023 -0500
move testInvalidColumnNameWithAnn to VectorInvalidQueryTest
commit d8b64845e59c0bd3658a16df21dcc75564417e3b
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 10:12:40 2023 -0500
upgrade lucene to reduce Integer boxing on build path
commit a62356e49c0a32f61732b8525d2bb655f46dd767
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 10:11:55 2023 -0500
cleanup
commit b754eb7f021d10d0ae8de83ebdf2ce2f19ce59d8
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 9 08:29:34 2023 -0500
replace CHM with NBHMLong to avoid boxing
commit 53d232ad2bcddefd72b10542f703cf36728c05f5
Author: Piotr Kołaczkowski <pkolaczk@datastax.com>
Date: Fri Jun 9 12:17:46 2023 +0200
STAR-550 Handle SAI AbortedOperationException
AbortedOperationException is thrown by SAI when index search
hits a timeout. Now instead of allowing this exception to
bubble up to the top and be eventually logged as error, we catch
and swallow it in the InboundSink after creating the query response.
Additionally, we now also set a proper error code (TIMEOUT)
in the response, so the client has a hint on what happened.
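The pattern described above can be sketched roughly as follows. All types here are hypothetical stand-ins (the nested `AbortedOperationException`, `Code`, and `Response` are not the real Cassandra classes), and the real change lives in the InboundSink rather than in a helper like this.

```java
import java.util.function.Supplier;

// Hypothetical stand-ins; none of these types are the real Cassandra classes.
final class AbortedSearchSketch {
    static final class AbortedOperationException extends RuntimeException {}

    enum Code { OK, TIMEOUT }

    static final class Response {
        final Code code;
        final String payload;

        Response(Code code, String payload) {
            this.code = code;
            this.payload = payload;
        }
    }

    // Runs the search and converts an aborted (timed-out) search into a
    // TIMEOUT response instead of letting the exception propagate.
    static Response respond(Supplier<String> search) {
        try {
            return new Response(Code.OK, search.get());
        } catch (AbortedOperationException e) {
            return new Response(Code.TIMEOUT, null);
        }
    }
}
```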
commit 5e0c5eb3c54b58002162ce3e39d0891643e1f663
Author: Matt Fleming <matt@codeblueprint.co.uk>
Date: Fri Jun 9 08:44:10 2023 +0100
CNDB-7037: Check that memtable exists before flushing and avoid IOOBE (#664)
For offline services such as the compactor it's possible to not have
live memtables. Account for this during flushing after removing indexes to
avoid triggering an IndexOutOfBoundsException:
Error happened while updating the schema
java.lang.IndexOutOfBoundsException: Index: -1
at java.base/java.util.Collections$EmptyList.get(Collections.java:4483)
at org.apache.cassandra.db.lifecycle.View.getCurrentMemtable(View.java:106)
at org.apache.cassandra.db.ColumnFamilyStore.forceFlush(ColumnFamilyStore.java:1115)
at org.apache.cassandra.db.SystemKeyspace.forceBlockingFlush(SystemKeyspace.java:552)
at org.apache.cassandra.db.SystemKeyspace.setIndexRemoved(SystemKeyspace.java:576)
at org.apache.cassandra.index.SecondaryIndexManager.markIndexRemoved(SecondaryIndexManager.java:837)
at org.apache.cassandra.index.SecondaryIndexManager.removeIndex(SecondaryIndexManager.java:420)
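A minimal Java sketch of the guard, assuming the current memtable is the last element of a list of live memtables (which matches the `Index: -1` in the trace above). `MemtableGuardSketch` and `currentMemtable` are hypothetical names, not the patched `View` code.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical sketch of checking for a live memtable before flushing.
final class MemtableGuardSketch {
    // Returns the most recent memtable, or empty when none are live
    // (as can happen in offline services such as the compactor).
    static <T> Optional<T> currentMemtable(List<T> liveMemtables) {
        if (liveMemtables.isEmpty()) {
            return Optional.empty(); // avoids get(-1) on an empty list
        }
        return Optional.of(liveMemtables.get(liveMemtables.size() - 1));
    }
}
```

A caller would then skip the flush when the result is empty rather than indexing into the list unconditionally.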
commit c241f7d41ee46c28f60d691611f32e071dac6684
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 8 19:19:09 2023 -0500
read neighbors using .intBuffer since we know that the HNSW searcher always reads all the neighbors
commit 32f021a4f0623e57d82d70c54350eab6f0f57db7
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 8 18:08:30 2023 -0500
VECTOR-51 leave vectors as ByteBuffers during compaction to avoid 2x memory usage incurred by keeping them around as float[] as well
commit 32d8cbbd19301228530d3d84f7a7e4120b4d4422
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 8 17:37:08 2023 -0500
r/m node cache from query metrics since there is nothing the operator can act on there
commit 890df8e6c36c5c9e1314dbcc0a681415401120dd
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 8 14:15:25 2023 -0500
add more information to exception when reading row offsets goes wrong
commit 5c7fe0cfed1edb10f81b134f8949dd1b5abb2781
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 8 13:26:17 2023 -0500
revise ordinals cache as follows:
// cache full levels including neighbors up to neighborsRamBudget, starting with the top level,
// but always cache all levels above the bottom two levels -- this will be ~1% of the graph.
// then on L1, cache at least the offsets
// L0 we do not cache since we only need one extra seek (no bsearch) to read the neighbors offset
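One way to read the caching policy above is sketched below in Java. This is an interpretation under stated assumptions, not the committed code: the per-level byte costs are hypothetical inputs, the always-cached upper levels are assumed to count against `neighborsRamBudget`, and the names `OrdinalsCachePlanSketch`, `Policy`, and `plan` are invented for illustration.

```java
// Hypothetical sketch of the per-level caching decision.
final class OrdinalsCachePlanSketch {
    enum Policy { NONE, OFFSETS_ONLY, FULL }

    // neighborBytesPerLevel[i] is an assumed RAM cost of caching level i's
    // neighbors; index 0 is the bottom level of the graph.
    static Policy[] plan(long[] neighborBytesPerLevel, long neighborsRamBudget) {
        int levels = neighborBytesPerLevel.length;
        Policy[] plan = new Policy[levels];
        long used = 0;
        // Walk from the top level down, as the commit message describes.
        for (int level = levels - 1; level >= 0; level--) {
            long cost = neighborBytesPerLevel[level];
            if (level == 0) {
                plan[level] = Policy.NONE;          // L0 is never cached
            } else if (level >= 2) {
                plan[level] = Policy.FULL;          // always cached above the bottom two levels
                used += cost;
            } else if (used + cost <= neighborsRamBudget) {
                plan[level] = Policy.FULL;          // L1 fits in the remaining budget
                used += cost;
            } else {
                plan[level] = Policy.OFFSETS_ONLY;  // L1: cache at least the offsets
            }
        }
        return plan;
    }
}
```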
commit e131014c5cba59624e0ed9fac142abe5340e2fc0
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 8 13:53:10 2023 -0500
add deletes test
commit 7c5a880f007535d9f03b7c7d9d3e84dcc79ef3d7
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 8 13:34:42 2023 -0500
must use graph.size for bitset size -- not correct to use max rowid, since deleted ordinals will not have a rowid but will still be in the graph
commit 0654fbb61aec86deae69edf83fb1cf653ed7df65
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 8 08:44:28 2023 -0500
info -> debug for validatePerIndexComponents
commit b289db2362b928f29b6c5c6289e09b33f0be1e88
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 7 18:08:18 2023 -0500
update lucene
commit 347dae644b0d395014d8faf2affd48a6b2546e96
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 7 18:23:41 2023 -0500
undo burntest logging config change
commit 87a6a84f38d0a0423acd9a0dda67d386876681b5
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 7 15:20:06 2023 -0500
add debug logging
commit 20013bc865c7fb2c34111c98362aaf997dc724dc
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 7 14:50:39 2023 -0500
add a bit more information to exception when we fail in index construction from disk
commit 86389ceae8188b8e427d8a528fada2f913f1f633
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 7 12:25:31 2023 -0500
reduce test vector count from "all of them" to 200
commit 20177a1c84ba34babc82d8c1aa4e3436321b5d9f
Merge: 2e2fce09ee f2c697ac46
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 7 12:21:20 2023 -0500
Merge branch 'wip' into vsearch
commit f2c697ac46348301260c5947ade4ebed7dee90ee
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 7 12:16:28 2023 -0500
multipleSegmentsMultiplePostingsTest
commit 59e4a1c53e1c19a7ebba4a0834229e61f9e4d3ce
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 7 12:16:04 2023 -0500
cleanup
commit 2e2fce09eeb5bef4780cecf7c7b5e492a810f67a
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Wed Jun 7 23:33:00 2023 +0800
fix OnDiskOrdinalsMap to seek to segment offset before reading (#661)
commit 9502f1040229f2be6619f7e4497dc25ca5126b39
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 7 10:04:38 2023 -0500
fix build
commit 3535444b57366fe705e0fd8c1ca095e60ed2a706
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 7 09:19:41 2023 -0500
add write-only workload
commit a112b0965cb298f293b6e384bafc92a1d5a1fb8f
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Wed Jun 7 21:15:53 2023 +0800
VECTOR-44: improve in-memory partition-restricted query perf (#660)
* VECTOR-44: improve in-memory partition-restricted query perf
- using post-filter top-k processor instead of ANN: 14x improvement on partition-restricted query in LongVectorTest
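For readers unfamiliar with the idea, a brute-force top-k pass over an already-filtered candidate set can be sketched as below. This is a generic illustration, not the processor used in the patch: `TopKSketch`, `Scored`, and the use of a dot product as the score are all assumptions.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch: score every candidate row and keep only the best k.
final class TopKSketch {
    static final class Scored {
        final int rowId;
        final float score;

        Scored(int rowId, float score) {
            this.rowId = rowId;
            this.score = score;
        }
    }

    static float dot(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    // Returns the row ids of the k highest-scoring candidates, best first.
    static List<Integer> topK(float[][] candidates, float[] query, int k) {
        // Min-heap on score, so the weakest retained candidate is evicted first.
        PriorityQueue<Scored> heap =
            new PriorityQueue<>(Comparator.comparingDouble((Scored s) -> s.score));
        for (int rowId = 0; rowId < candidates.length; rowId++) {
            heap.add(new Scored(rowId, dot(candidates[rowId], query)));
            if (heap.size() > k) {
                heap.poll();
            }
        }
        List<Integer> result = new ArrayList<>();
        while (!heap.isEmpty()) {
            result.add(0, heap.poll().rowId); // prepend so the best score ends up first
        }
        return result;
    }
}
```

When the partition restriction leaves only a small number of candidate rows, an exact pass like this avoids traversing the graph at all, which is the trade-off the commit exploits.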
commit cf1ec3be50ddbe40ae99dfc34eea51b9fc5c2648
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Wed Jun 7 21:06:37 2023 +0800
Vector-45: fix bitsetForShadowedPrimaryKeys to skip shadowed primary keys outside of current sstable/segment (#659)
* VECTOR-45: fix bitsetForShadowedPrimaryKeys to skip shadowed primary keys outside of current sstable/segment
* return empty iterator if results are empty instead of ReorderingRangeIterator
commit 87217b4f49935fbbd13c73f4d0aadb84836696a7
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed Jun 7 07:31:46 2023 -0500
don't make areL0ShardsEnabled final, it breaks mocks
commit 2bb3a299df43561369544b662674872575975b7c
Author: Matt Fleming <matt@codeblueprint.co.uk>
Date: Wed Jun 7 10:56:08 2023 +0100
VECTOR-42: Check columns exist to avoid NPE during ANN expr binding (#657)
Using a non-existent column with an ANN expression triggers an NPE like
so:
java.lang.NullPointerException: null
at org.apache.cassandra.cql3.ArrayLiteral.forReceiver(ArrayLiteral.java:43)
at org.apache.cassandra.cql3.ArrayLiteral.prepare(ArrayLiteral.java:54)
at org.apache.cassandra.cql3.Ordering$Raw$Ann.bind(Ordering.java:178)
at org.apache.cassandra.cql3.Ordering$Raw.bind(Ordering.java:139)
Use TM.getExistingColumn() which throws InvalidRequestException if the
column is undefined.
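The difference between a nullable lookup and a throwing lookup can be sketched as follows. The names here are hypothetical, a `Map` stands in for table metadata, and `IllegalArgumentException` stands in for `InvalidRequestException`.

```java
import java.util.Map;

// Hypothetical sketch; a Map stands in for the table metadata.
final class ColumnLookupSketch {
    // Fails fast with a descriptive error instead of returning null and
    // letting a later dereference throw a NullPointerException.
    static String getExistingColumn(Map<String, String> columnTypes, String name) {
        String type = columnTypes.get(name);
        if (type == null) {
            throw new IllegalArgumentException("Undefined column name " + name);
        }
        return type;
    }
}
```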
commit f29d4528fc43f34889e560acafa810eee1b88ba9
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 21:18:47 2023 -0500
restore query timeouts
commit 6e5734e52b41ab1e355d202ad0433440828fbb75
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 19:08:52 2023 -0500
looking at performance over time in LongVectorTest
commit f96af8f0b9185163a49156bdab107801d2588bf0
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 16:00:23 2023 -0500
log level = info for burn tests
commit 1247cf372927cc185be5c9767a06fdd15f102dbe
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 17:48:00 2023 -0500
write the in-memory deleted ordinals to disk at the start of the postings component
commit e319d71bd6b0d265f616c807dcaf4a9d9fc38aef
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 16:36:27 2023 -0500
don't allocate unnecessary objects on the happy path of no tombstones
commit 6d146a8460a17d146de756aab2ccbbf2abdb967a
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 16:19:56 2023 -0500
VECTOR-23: force UCS to not shard L0
commit 79d6e177093312bc1b951d6740ad45ce8a6c5875
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 16:01:05 2023 -0500
mark AutoCloseable
commit 5ec2f7d7f792bfa2dcf582ea4eed3fa8745fdaff
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 5 14:37:15 2023 -0500
Support string literals as vectors
Co-authored-by: Andrés de la Peña <a.penya.garcia@gmail.com>
commit 73a53c37536835e462e3cf17c452027c7aa591ed
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 12:25:39 2023 -0500
(take 3, this time with the code changes) fix race conditions across concurrent inserts + searches in memtable
commit 51f4f419d31916a81088f171b158f1e002b5c800
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 12:25:39 2023 -0500
(take 2) fix race conditions across concurrent inserts + searches in memtable
commit 94097a1f7cf1831c03e52b1f539bdc2e42fa038b
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 12:35:32 2023 -0500
Revert "fix race conditions across concurrent inserts + searches in memtable"
This reverts commit c4d41b492190fb2644f83bf4902acf71d1e4f891.
commit c4d41b492190fb2644f83bf4902acf71d1e4f891
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 12:25:39 2023 -0500
fix race conditions across concurrent inserts + searches in memtable
commit 22143a8c5db160ef0fa7687ff3e8cb227de160f0
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 10:28:30 2023 -0500
fix AOOB in bruteForceRows logic
commit b5624f62c88378e03260399440edf4d335ccc3af
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 10:12:53 2023 -0500
call deserializeFloatArray (which gives float[]) instead of deserialize (which gives List<Float>)
commit 13ad40718737b54652c875d81ec20992b5828777
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 08:36:32 2023 -0500
create CassandraHnswGraphBuilder with concurrent and serial implementations, so that single-threaded compaction doesn't have to pay the concurrency overhead
commit b83f4e4007e3d597963023b8ae32f4d0934b792d
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 08:00:44 2023 -0500
add ASL header to injections.md to make CI happy
commit 16074336b1322d4bade40eb71562ca50221399dc
Merge: 227c6c13a3 1ff6992354
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 07:48:04 2023 -0500
Merge branch 'VECTOR-37' into vsearch
commit 227c6c13a3cfe42484fc560d629d54a93f8265e0
Author: Jaroslaw Grabowski <jaroslaw.grabowski@datastax.com>
Date: Tue Jun 6 13:02:37 2023 +0200
CNDB-7007 return expired tables level from getLevels
commit 312d07de8d297cdede39017354b678f2fb1b1006
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue Jun 6 07:31:26 2023 -0500
randomize our brute force threshold, which will get the actual index scans exercised more
commit 0b897961869965629893c94250b7d4b1fb7c0f86
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Tue Jun 6 13:44:01 2023 +0800
Reference ANN sstable indexes in case of ann hybrid search (#653)
* Reference ANN sstable indexes in case of ann hybrid search
* simplify
---------
Co-authored-by: Jonathan Ellis <jbellis@datastax.com>
commit 1ff6992354e8411f9f7c48a76f0aadb4a00f2789
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 5 21:55:14 2023 -0500
per-query hnsw metrics
commit d3d6ae107737020eb0ca3b00c117d5bc84027dc0
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 5 21:16:14 2023 -0500
reduce test size to prevent Jenkins OOM
commit 7765ac73fd29c9664b6f8e946c067d6f1b0a9a03
Merge: 5a04dbcf67 c5cd09cebf
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 5 21:15:44 2023 -0500
Merge branch 'vsearch' of github.com:datastax/cassandra into vsearch
commit c5cd09cebf0c63f7bafcd2f4548395efe68b12cd
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Tue Jun 6 09:38:42 2023 +0800
Skip warning log about receiving a range that is not owned by the current replica, see VECTOR-30
commit 5a04dbcf6738d7681732e28518d6eb37f74357b8
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 5 11:21:15 2023 -0500
add injections.md from bdp repo
commit baf9cce301a39f9506bd60c5a75d2011150f2fea
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 5 10:02:02 2023 -0500
VECTOR-36 fix looking up indexes and SSTableContext by SSTableReader, because the SSTR will be removed from internal structures once it's compacted
commit e7fdc8454e750e2ee886c4ab27fb09abb46303fa
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Mon Jun 5 09:53:51 2023 -0500
limitToTopResults should skip rows that are not in the current segment (or more precisely, have no vectors that were indexed)
commit 2f64695a90651d9e76eee0a3399032ea241c983c
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Mon Jun 5 14:55:20 2023 +0800
Vector 6 take 2: reapply Vector-6 commit and fix NPE in SelectStatement (#648)
* Revert "Revert VECTOR-6"
This reverts commit 63a6ff609a02e4f7d9f5499f1af48ca5e58f63a8.
* Vector-6 take 2:
- fix NPE in SelectStatement by skipping reversed() for null ColumnComparator
- fix getOrderingColumns() to return LinkedHashMap to preserve ordering columns' order
- fix VectorLocalTest compilation
- remove debug log in CassandraOnHeapHnsw
commit 9da51b315d207417de4f3e005616e275c62c74cb
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Mon Jun 5 12:38:21 2023 +0800
fix SAI test failures in vsearch branch (#649)
- BatchlogEndpointFilterTest
- rangeRestrictedTest
- SegmentFlushTest
- SegmentMergerTest
- OperationTest
- RangeIntersectionIteratorTest
commit dc53d7db5f0f8fe1c780f6d2b7580794f7b6b5fb
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 3 13:35:11 2023 -0500
create views for ordinals map so we don't have to open a new Reader for each method call
commit c6385022ff2949c23e0dd0161a043cf4061f050c
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 3 08:19:32 2023 -0500
add assert sstableContext != null
commit f340e7590db578b71201326e82006f72dd491047
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 3 07:37:36 2023 -0500
on-disk Searcher bits should not need to be growable
commit fd5cc7f989a47d03689dd29495b974697ad1338b
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 3 07:37:18 2023 -0500
add asserts
commit 6436e5f4ad49881a62c1442692edfedfe31968dc
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 3 07:21:50 2023 -0500
cleanup
commit 32c8e005740d3c7ec0b1008091728e12260aecdd
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 2 13:57:18 2023 -0500
comment
commit 091d96832ff9a68f6ea7857eafd2ec7ee0c09a0b
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 2 13:32:24 2023 -0500
r/m unused search method returning iterators over PrimaryKey
commit 4e78a4b429b0ba6690871df9a91c8ffd8d2e53fa
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 2 11:25:30 2023 -0500
we can use the slightly more lightweight RangeConcatIterator instead of RangeUnionIterator when combining results from different Segments
commit e7a9186bf3426ad025e809481e47db85a7bf7190
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 2 11:22:35 2023 -0500
r/m obsolete FIXME
commit c76ee9ea0cab5b70c535de3f50b81cf333bf94a2
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 3 07:12:43 2023 -0500
add testAppendedGraphs
commit d00bdbfb6acf9f08d2bdfa2ef06e833e567cf9eb
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 3 06:57:37 2023 -0500
move write-to-File to test class
commit b10fd1100daf4cde0291b3b4ef06f5ceb3ca650c
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 3 06:46:01 2023 -0500
fix Ref leaks in test
commit 0656e558bb51172df4e0cde1ebe92946d4dcec8b
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Sat Jun 3 06:26:05 2023 -0500
fix tests to use View
commit 92797ced03ed26f9a5cce398504d6f0de519b899
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Sat Jun 3 19:19:38 2023 +0800
VECTOR-35: fix vector on-disk writer to append segments (#647)
commit 22f5c01860367586d0cc39def7d975a6d6bae784
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 2 18:54:41 2023 -0500
add "Vector indexes only support ANN queries" check
commit bfc0b056dbacd0a1eca86020e901f545dca7670e
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 2 17:34:24 2023 -0500
split invalid requests into separate test class
commit f66768c35920101382c1975c9d0ba0e3301b7c61
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 2 17:28:56 2023 -0500
add specific error message when trying to do ANN without an index
commit 63a6ff609a02e4f7d9f5499f1af48ca5e58f63a8
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 2 12:31:22 2023 -0500
Revert VECTOR-6
This reverts commits:
a37009c187edeba68389d239dc1b9f40519b1187
5565690fbe1056d4c159ddbe233fa22c7695320a
e7733bb8f858a16b082b8a5c64d0322db6f6271a
commit 3d496662b259470b505d88212074c436660c27ad
Merge: fab67bb134 4bdae7e362
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 2 11:05:53 2023 -0500
Merge branch 'vsearch' of github.com:datastax/cassandra into vsearch
commit 4bdae7e36213f5112efce085696dd24aaa0adfad
Author: Mike Adamson <madamson@datastax.com>
Date: Fri Jun 2 14:36:01 2023 +0100
Stabilise random tests using word2vec model vectors
commit 2ddc3ec6b11f6810f6c4cbaf6b62bfe46d2d1b6c
Merge: 8e04280312 1f9179002a
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Fri Jun 2 10:05:15 2023 -0500
Merge remote-tracking branch 'datastax/ds-trunk' into vsearch
commit 8e04280312583a18085e7e7b9d31810790b039f4
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Fri Jun 2 09:51:06 2023 -0500
Revert this after CNDB-6974 is fixed.
commit fab67bb13499691f813c893cd39a3bbd406653d2
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri Jun 2 08:59:12 2023 -0500
vector cache defaulting to 1MB per segment
commit 67a2b7eba3c957a993726d37706643979e92a3a3
Author: Jeremiah D Jordan <jeremiah@datastax.com>
Date: Thu Jun 1 17:49:35 2023 -0500
Revert this after CNDB-6974 is fixed.
commit 4538956b31d8c40d2e9b603b3b0a3392343e1853
Merge: 7518680a4a af4c83aef4
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 1 16:17:14 2023 -0500
Merge branch 'ds-trunk' into vsearch
commit 7518680a4a17cb2589cf06a9175befc10c9eab1a
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 1 16:14:33 2023 -0500
re-use float[] across calls to vectorValue to avoid allocation overhead (credit to Jake for the idea)
commit d20bd8aedeb6232358dd16f901e2708f217b8108
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 1 15:35:17 2023 -0500
optimize vectorValue() with direct access to the mmap-ed region
commit 24572025af5b7893279b5b03c58555b682f3abdb
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 1 14:33:55 2023 -0500
decomposeVector does not modify the underlying buffer, so no need to duplicate() here
commit efef35d47a24222fae300d1cbd5a801e9ed79442
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 1 14:29:30 2023 -0500
specialize deserializeFloatArray for ByteBuffer and FloatBuffer. this saves about 3% total CPU on search workloads
commit 63c27f5550d6c3e22d960b9a42b9097029e3b760
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 1 11:04:23 2023 -0500
default target size of 5GB
commit 69c55c8d07c6e1d58b4829c153f5ebf8a4343669
Author: Jonathan Ellis <jbellis@gmail.com>
Date: Thu Jun 1 07:20:38 2023 -0700
make the on-disk hnsw code threadsafe using FileHandle.createReader (#641)
commit b194cae2e56133b946d0231768254064a47f01b6
Merge: 7ee6422084 23c2891e7a
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Thu Jun 1 06:22:17 2023 -0500
Merge branch 'ds-trunk' into vsearch
commit 7ee642208485c9e2ae62aa662dbbddc7b295bae1
Merge: c286ec0ee2 a37009c187
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 31 20:41:48 2023 -0500
Merge branch 'VECTOR-6' into vsearch
commit c286ec0ee231ededded18e81357edebe72064c59
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 31 20:40:21 2023 -0500
add testLargeGraph
commit a37009c187edeba68389d239dc1b9f40519b1187
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Thu Jun 1 08:50:07 2023 +0800
cleanup unused code
commit 6246dbe3a970a0f4672701632003cdf576705c24
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 31 16:43:42 2023 -0500
comment
commit c7420440c7d3c81eac96e716148d570ae3cceb1d
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 31 16:43:37 2023 -0500
encapsulate shadowedPrimaryKey better
commit a1a25c33b8bec9d78354dd2c1d992444db2bf76f
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 31 15:57:52 2023 -0500
make hnsw cache size configurable, and make the default 128KB (roughly the size of the bloom filter for a 1GB sstable)
commit 55747d2351e915645697fa8dec404c749540456f
Author: Andrés de la Peña <a.penya.garcia@gmail.com>
Date: Wed May 31 19:28:47 2023 +0100
Replace FunctionParameter.sameAsFirst by FunctionParameter.sameAs, allowing type inference in both directions for vector functions
commit 5565690fbe1056d4c159ddbe233fa22c7695320a
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 31 12:59:42 2023 -0500
cleanup and fix
commit e7733bb8f858a16b082b8a5c64d0322db6f6271a
Author: Zhao Yang <zhaoyangsingapore@gmail.com>
Date: Wed May 24 16:01:26 2023 +0800
VECTOR-6: return vector results to client in ANN order instead of token order
commit 771067d1475c94484da178236928d2bad78e00b1
Author: Mike Adamson <madamson@datastax.com>
Date: Wed May 31 18:33:55 2023 +0100
Fix annOrderingMustHaveLimit test
commit 8dd76a73541e70da0da022dd463e6711a315f971
Author: Mike Adamson <madamson@datastax.com>
Date: Wed May 31 18:30:49 2023 +0100
Improve the validation of ORDER BY <column> ANN OF (#639)
* Improve the validation of ORDER BY <column> ANN OF
* Change to hasNonClusteredOrdering and improve limit message
commit 1ef6f2e3418db19af23f0eae24cd344d72290455
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Fri May 26 18:34:25 2023 -0500
add vector similarity functions
commit f4e3a39a8ece623647ba7891981e7c5ee10ed96b
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Wed May 31 09:39:45 2023 -0500
pull in the smallest set of changes possible from 93e0ae9a to get FunctionParameter.sameAsFirst
commit 0ef2614346c9426a966004df220221f74353de70
Author: Jonathan Ellis <jbellis@gmail.com>
Date: Wed May 31 06:22:41 2023 -0700
Add support for updates + deletes (#636)
commit 5da9fefe1e18f733bb8ffd0a35ef2e1ac3cbd975
Author: Jonathan Ellis <jbellis@datastax.com>
Date: Tue May 30 16:32:02 2023 -0500
rename SegmentOrdering.reorderOneComponent to limitToTopResults to align with MemtableOrdering
commit caf6eb2154ddf57c88d69c6246404b65e183057d
Author: Mike Adamson <madamson@datastax.com>
Date: Tue May 30 22:41:29 2023 +0100
Fix V1SearchableIndex.reorderOneComponent to use segments (#635)
commit 72cfa94e9389d0078d64aee60d26b8b01c6ca22f
Author: Piotr Kołaczkowski <pkolaczk@datastax.com>
Date: Mon May 29 20:17:44 2023 +0200
VECTOR-14: Move ANN OF expressions from WHERE to ORDER BY
Before:
SELEC…
…n frozen columns