STAR-1693: Changes from OSS - DO NOT MERGE #552
Closed
jacek-lewandowski wants to merge 372 commits into ds-trunk from
Conversation
Add page size in bytes flag to protocol
- Introduce PageSize object
- Protocol version changes (no support for DESCRIBE statements yet)
- Simplify SecondaryIndexManager page calculation
- Add page size in bytes to DataLimits
- Refactor pagers
- Add / pull some tests
- Add some toString implementations
- Add PageSize to expected classes in DatabaseDescriptorRefTest
- Fix AggregationPartitionIterator

So far we were passing the main page size to the AggregationPartitionIterator, which was pointless, because there is no paging when we aggregate everything, and actually harmful, because AggregationPartitionIterator is a subclass of GroupByPartitionIterator, and the latter updates the subPager's limits with the minimum of the main page size and the number of remaining rows. That is correct when group-aware limits are used, where the count applies to whole groups; but when we aggregate everything, plain CQL limits are used and the count limit applies to rows. Without this fix, we would limit the number of aggregated rows to the main page size, which is not what we want.

(cherry picked from commit e11d716) (cherry picked from commit 13d4569) (cherry picked from commit 4f65564)
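A minimal sketch of what a page-size value object carrying either a row count or a byte count might look like. All names here are illustrative, not the actual Cassandra `PageSize` implementation:

```java
// Hypothetical sketch of a page limit expressed either in rows or in bytes.
// Not the real Cassandra PageSize class; names are illustrative only.
public class PageSizeSketch {
    enum Unit { ROWS, BYTES }

    static final class PageSize {
        final int size;
        final Unit unit;

        PageSize(int size, Unit unit) {
            if (size <= 0)
                throw new IllegalArgumentException("page size must be positive: " + size);
            this.size = size;
            this.unit = unit;
        }

        boolean inRows()  { return unit == Unit.ROWS; }
        boolean inBytes() { return unit == Unit.BYTES; }

        @Override
        public String toString() { return size + (inRows() ? " rows" : " bytes"); }
    }

    public static void main(String[] args) {
        PageSize rows = new PageSize(5000, Unit.ROWS);
        PageSize bytes = new PageSize(1 << 20, Unit.BYTES);
        assert rows.inRows() && !rows.inBytes();
        assert bytes.inBytes();
        System.out.println(rows + " / " + bytes);
    }
}
```

Carrying the unit alongside the value is what lets DataLimits and the pagers distinguish a byte-based page from a row-based one without overloading a single integer.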
This was failing because off-heap native clustering keys were used in stats metadata without being copied, referencing memory that could be overwritten. Also fixes a problem creating retainable/minimized versions of clustering bounds and boundaries. (cherry picked from commit f00e340) (cherry picked from commit f0904a3) (cherry picked from commit 80a0383)
STAR-823: Refactor background compactions

A CompactionManager.BackgroundCompactionCandidate task is scheduled on the compaction executor; when it detects that there are compaction tasks to run, it starts each one as a separate job on the same executor and blocks until all of them finish. When the pool size is n and n background tasks are submitted in parallel, and all of them find compactions to run, they schedule those compactions and block until they finish. The compactions, however, cannot start because the pool is full: all n threads are occupied by background tasks waiting for them.

Another, perhaps minor, issue is that we use getActiveCount() on the executor to check how many tasks it is currently running, and based on that decide whether to schedule new tasks. The problem is that this method returns an approximate result and should not be used for such decisions.

To address these problems, the running of background compactions was refactored. The whole logic was extracted into a distinct class, BackgroundCompactionsRunner. It allows flagging CFSs for compaction and schedules scans of the flagged CFSs on a dedicated executor, so that scans and compaction tasks no longer share the same executor.

Co-authored-by: Branimir Lambov <branimir.lambov@datastax.com> (cherry picked from commit 96bf61c) (cherry picked from commit 804b885) (cherry picked from commit 2c0dcd7)
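The shape of the fix can be sketched with two plain executors. This is a toy model under assumed names (requestScan is hypothetical, not the BackgroundCompactionsRunner API): because the scan that blocks on its compaction tasks runs on a different pool than the tasks themselves, the pool-starvation deadlock described above cannot occur.

```java
import java.util.List;
import java.util.concurrent.*;

public class BackgroundCompactionsSketch {
    // Dedicated pool for scans that look for compaction work...
    static final ExecutorService scanExecutor = Executors.newFixedThreadPool(2);
    // ...and a separate pool for the compaction tasks themselves.
    static final ExecutorService compactionExecutor = Executors.newFixedThreadPool(2);

    static Future<?> requestScan(String cfs) {
        return scanExecutor.submit(() -> {
            // Pretend the scan found two compaction tasks. It submits them to
            // the *other* pool and blocks until they finish. With one shared
            // pool, this wait could deadlock once every thread was a scan.
            List<Future<?>> tasks = List.of(
                compactionExecutor.submit(() -> {}),
                compactionExecutor.submit(() -> {}));
            for (Future<?> t : tasks) {
                try { t.get(); }
                catch (Exception e) { throw new RuntimeException(e); }
            }
        });
    }

    public static void main(String[] args) throws Exception {
        // Completes promptly; compaction tasks always have free threads.
        requestScan("cfs1").get(5, TimeUnit.SECONDS);
        scanExecutor.shutdown();
        compactionExecutor.shutdown();
        System.out.println("ok");
    }
}
```

Note also that `ThreadPoolExecutor.getActiveCount()` is documented to return only an approximate count, which is why the refactoring stops basing scheduling decisions on it.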
Squashed commit history:
- added more language tests; added Brazilian CQL test (passes)
- added support for setting a Lucene analyzer; CQL JSON test passes; fixed up some things; cleanup
- added query analyzer; cleanup; added constants
- added exception handling in unit test; added bad-options unit tests
- added char filter; removed comments and extra code
- added IllegalArgumentException to LuceneAnalyzer#hasNext
- added stop word support; then reworked with no more stop words
- added lowercase filter test; added ngram filter test; added simplepattern test; snowball off
- added Czech and Porter
- fixed alloc; removed commented-out code; removed extra code; fixed minor issues
- maybe fixed setMinMax; cleanup; reverted; reverted to a new byte[] per tokenized term
- fixed SASI test; fixed unit test bug; cleanup; refactored for NPE
- addressed review comments; fixed NPE bug; fixed a couple of bugs
- removed json_ from option names; applied Sonar comments; fixed Sonar comments; fixed unit test bug
- changed exception thrown; get -> create; fixed minor issue

(cherry picked from commit 3227a57) (cherry picked from commit add6b8d) (cherry picked from commit 1c60b2d)
Partition key ByteBuffer and columns btree were not taken into account and some ByteBuffers were not measured correctly. Also fixes flakes in MemtableSizeTest caused by including allocator pool in measurements and updates it to test all memtable allocation types. (cherry picked from commit d8d3e8b) (cherry picked from commit f8963ca)
* STAR-865: Porting metrics from cndb-884, riptano/bdp@03b23db6a5697baaf71d46d661c0ac1c908bc33e and riptano/bdp/#19515
  Co-authored-by: Zhao Yang <jasonstack.zhao@gmail.com>
  Co-authored-by: Jake Luciani <tjake@users.noreply.github.com>
* STAR-865: Porting MicrometerChunkCacheMetrics from: CNDB-161 Add MicrometerMetrics class; CNDB-780 Add Micrometer metrics for the chunk cache
  Co-authored-by: Stefania Alborghetti <stefania.alborghetti@datastax.com>

(cherry picked from commit 5e0d889)

Fix ConcurrencyFactorTest: the metrics require a reset if we want to measure them, especially the max values. (cherry picked from commit 3f89514)
This is a port of:
- https://github.com/riptano/bdp/commit/b6f0a18cb832c62f05cdcbd9cdcc2923f2fa727f
- https://github.com/riptano/bdp/pull/19468

The first change set introduces the QueryInfoTracker (QIT) interface and hooks it into StorageProxy. The second adds ClientState to the interface. The original QIT utilizes ReadReconciliationObserver in the ReadTracker paths. Only the onRow, onPartition and queried callbacks are used by CNDB, and thus only these methods are ported to Converged Cassandra (CC). The callbacks differ slightly, though:
- The callback methods are added directly to ReadTracker, as CC doesn't have ReadReconciliationObserver. That class was added as part of the NodeSync effort and is rather superfluous; porting it whole would add unnecessary complexity, while adding the required methods directly to ReadTracker keeps the interface cleaner and easier to understand.
- CC operates on ReplicaPlans instead of plain host lists, which is why queried was changed to onReplicaPlan.

(cherry picked from commit c32e91f) (cherry picked from commit 093bc63)
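The ported hook points can be sketched as a small callback interface plus a counting implementation. This is a hedged illustration of the shape described above, not the actual ReadTracker from Converged Cassandra:

```java
import java.util.concurrent.atomic.AtomicLong;

public class ReadTrackerSketch {
    // Hypothetical shape of the ported callbacks. The real ReadTracker is
    // richer; only the hook points named in the commit message are modeled.
    interface ReadTracker {
        default void onReplicaPlan(Object plan) {}   // replaces the old "queried"
        default void onPartition(Object partitionKey) {}
        default void onRow(Object row) {}
    }

    // A tracker that just counts what it observes, the kind of thing a
    // query-info consumer like CNDB could build on these callbacks.
    static final class CountingTracker implements ReadTracker {
        final AtomicLong partitions = new AtomicLong();
        final AtomicLong rows = new AtomicLong();
        @Override public void onPartition(Object pk) { partitions.incrementAndGet(); }
        @Override public void onRow(Object row) { rows.incrementAndGet(); }
    }

    public static void main(String[] args) {
        CountingTracker t = new CountingTracker();
        t.onReplicaPlan("plan");
        t.onPartition("pk1");
        t.onRow("r1");
        t.onRow("r2");
        assert t.partitions.get() == 1;
        assert t.rows.get() == 2;
        System.out.println(t.partitions.get() + " partitions, " + t.rows.get() + " rows");
    }
}
```

Default methods keep implementations lean: a consumer overrides only the callbacks it cares about, which matches the rationale for not porting the whole ReadReconciliationObserver.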
FailingRepairTest uses serialization to pass Verbs back and forth between the nodes during the test. Unfortunately, Verbs aren't serializable anymore because they're no longer enums, and this broke the test. Instead of passing a verb around, pass the verb id and look up the verb inside the test method. (cherry picked from commit 47d0719) (cherry picked from commit 6ed9a94)
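The pattern is simple enough to sketch: send a stable numeric id over the wire and resolve it against a registry on the receiving side. The registry and record below are hypothetical stand-ins, not the actual Verb API:

```java
import java.util.Map;

public class VerbLookupSketch {
    // Verbs are no longer enums, so they no longer serialize for free.
    // Instead of serializing the object, ship its stable id and resolve it.
    // (Illustrative names; not the real Verb registry.)
    record Verb(int id, String name) {}

    static final Map<Integer, Verb> REGISTRY = Map.of(
        1, new Verb(1, "READ_REQ"),
        2, new Verb(2, "REPAIR_REQ"));

    static Verb fromId(int id) {
        Verb v = REGISTRY.get(id);
        if (v == null)
            throw new IllegalArgumentException("unknown verb id " + id);
        return v;
    }

    public static void main(String[] args) {
        int wireId = 2;                     // the only thing that crosses nodes
        Verb resolved = fromId(wireId);     // looked up inside the test method
        assert resolved.name().equals("REPAIR_REQ");
        System.out.println(resolved.name());
    }
}
```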
The main objective of this refactoring is to enable compaction strategies to operate on a lean abstraction of an sstable and the compaction space instead of the full-blown open SSTableReader and ColumnFamilyStore. The compaction process itself must still operate on SSTableReaders, which provide the mechanisms for reading the data; switching between the two representations is done when compaction signals it is ready to start compaction on a set of sstables via the realm's tryModify method. Most files in the compaction package have been changed to rely solely on CompactionSSTable and CompactionRealm, with the exception of CompactionManager and BackgroundCompactionsRunner, which are part of the CFS implementation.

Also includes some small fixes and simplifications identified during the refactoring:
- Fixes the bloom filter size in Upgrader, which was calculated for splitting to the compaction strategy's sstable size limit while files weren't actually split.
- Stops checking an sstable's bloom filter if its minTimestamp is already above the current min for purge functions.
- Some collection construction/processing simplifications.
- Breaks up compaction -> CFS -> compaction reference cycles.
- Refactors some methods to lower their complexity, as requested by SonarCloud.
- Changes some remaining ...LatencyPerKb names to ...TimePerKb.

(cherry picked from commit 943ae99) (cherry picked from commit e0fd645)
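A much-simplified sketch of the idea, under assumed names (the real CompactionSSTable and CompactionRealm interfaces are far richer): strategies reason about candidates through a lean read-only view, and only the realm hands out the heavyweight representation when a compaction is actually committed.

```java
import java.util.List;

public class CompactionRealmSketch {
    // Lean view of an sstable: enough for a strategy to pick candidates
    // (sizes, timestamps) without holding an open SSTableReader.
    interface CompactionSSTable {
        long onDiskLength();
        long minTimestamp();
    }

    // Trivial stand-in implementation for illustration.
    record Stub(long onDiskLength, long minTimestamp) implements CompactionSSTable {}

    // A strategy-side computation that needs only the lean abstraction.
    static long totalBytes(List<? extends CompactionSSTable> candidates) {
        return candidates.stream().mapToLong(CompactionSSTable::onDiskLength).sum();
    }

    public static void main(String[] args) {
        List<Stub> candidates = List.of(new Stub(100, 1), new Stub(250, 5));
        assert totalBytes(candidates) == 350;
        System.out.println(totalBytes(candidates) + " bytes across candidates");
    }
}
```

The switch to full readers would then happen in one place (the realm's tryModify in the description above), which is what lets the rest of the compaction package stay decoupled from ColumnFamilyStore.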
This replaces the Node-based walks and transformations. The result is drastically less intermediate object creation, improved performance and somewhat simpler code at the expense of the concept being a little harder to understand initially. Adds further documentation and expands tests for sliced tries. (cherry picked from commit 2b3c4c5) (cherry picked from commit 43c5206)
(cherry picked from commit e0982c6)
* LogTransaction: add ILogTransactionsFactory to provide custom log transactions
* UCS: port CNDB-2134 to disable shards on UCS L0
* UCS: add CompactionAggregatePrioritizer to prioritize sstables based on the remote file cache
* NativeLibrary: add INativeLibrary interface to provide a custom implementation
* SSTableWatcher: discover custom components before opening sstables
* StorageProvider: support a custom file system and change Descriptor to use URIs
* StorageFeatureFlags: disable features that are not supported by a custom file system
* StorageHandler: reload sstables from a custom file system

(cherry picked from commit e27ee69) (cherry picked from commit e98d05a)
Add methods to PathUtils:
* deleteContent: recursively deletes the contents of a directory, leaving the directory empty;
* listPaths: lists all the paths in a directory, optionally using a provided filter.

Add method to Descriptor:
* validFilenameWithComponent: returns the Component from an sstable file name.

(cherry picked from commit 4f1c86b) (cherry picked from commit ef840e5)
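The two PathUtils helpers can be sketched with plain `java.nio.file` calls. The real methods differ in signatures and error handling; this only illustrates the described behavior (empty the directory but keep it, and list with an optional filter):

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.Comparator;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Stream;

public class PathUtilsSketch {
    // Delete everything under dir, children before parents, but keep dir itself.
    static void deleteContent(Path dir) throws IOException {
        try (Stream<Path> walk = Files.walk(dir)) {
            walk.sorted(Comparator.reverseOrder())   // deepest paths first
                .filter(p -> !p.equals(dir))         // spare the directory itself
                .forEach(p -> {
                    try { Files.delete(p); }
                    catch (IOException e) { throw new UncheckedIOException(e); }
                });
        }
    }

    // List direct children of dir matching the given filter.
    static List<Path> listPaths(Path dir, Predicate<Path> filter) throws IOException {
        try (Stream<Path> s = Files.list(dir)) {
            return s.filter(filter).toList();
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("sketch");
        Files.createFile(dir.resolve("a-Data.db"));
        Files.createDirectories(dir.resolve("sub"));
        assert listPaths(dir, p -> p.toString().endsWith("Data.db")).size() == 1;
        deleteContent(dir);
        assert Files.isDirectory(dir);               // directory survives, now empty
        try (Stream<Path> s = Files.list(dir)) { assert s.count() == 0; }
        System.out.println("ok");
    }
}
```

Sorting the walk in reverse order is what makes the recursive delete safe: files and subdirectory contents are removed before the directories that contain them.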
Extends the API of NativeLibrary to create a directory given the path as a string, so specialized file system implementations don't have to convert the path into a Cassandra File only for it to be converted back into a string representation. (cherry picked from commit 3ba1a16) (cherry picked from commit aff175b)
In STAR-1335 we ported most of CNDB-4090, but we missed a call to StorageProvider.invalidateFileSystemCache(), which is required to invalidate the remote storage cache in CNDB whenever we encounter corruption. This was discovered because the RemoteFileCacheCorruptedPageTest unit test was failing.
Co-authored-by: Stefania Alborghetti <stef1927@users.noreply.github.com>
e8758e5 to 9b7dc0d
STAR-1697: Port keyspace renaming (KeyspaceMetadata::rename) from DB-3896 (#555). Required by CNDB-3170 and CNDB-4909.
This patch adds a way to customize the compaction overhead, i.e. the transient amount of space required by a compaction whilst both input and output sstables are present. In BDP this is simply estimated to be the size of the input sstables. It's unclear if we can improve on this in CNDB, but I kept the refactoring because initially I got confused, thinking that in CNDB we could just waive this requirement since the input sstables are in the file cache. So I think it's good to spell out why we use the input sstable sizes by encapsulating the calculation in a method with javadoc.

The patch also adds a warning to the logs: if a compaction cannot be performed because the space overhead is larger than the space available, the logs now say so. Without this, troubleshooting why compaction tasks are skipped is quite hard. This warning was already present in BDP but was missing for CNDB.

Port CNDB-4385
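The estimate and the new warning can be sketched as follows. This is a hedged illustration under assumed names, not the actual method added by the patch:

```java
import java.util.List;

public class CompactionOverheadSketch {
    record SSTable(String name, long onDiskBytes) {}

    /**
     * Estimated transient disk space needed while a compaction runs.
     * Inputs and outputs coexist until the transaction commits, so in the
     * worst case (no savings from tombstones or overwrites) the output is
     * as large as the inputs; hence the sum of input sizes.
     */
    static long estimatedOverheadBytes(List<SSTable> inputs) {
        return inputs.stream().mapToLong(SSTable::onDiskBytes).sum();
    }

    // Skip (and log why) when the overhead exceeds available space.
    static boolean canRun(List<SSTable> inputs, long availableBytes) {
        long needed = estimatedOverheadBytes(inputs);
        if (needed > availableBytes) {
            System.out.printf(
                "Skipping compaction of %d sstables: needs %d bytes but only %d available%n",
                inputs.size(), needed, availableBytes);
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        List<SSTable> inputs = List.of(new SSTable("a", 400), new SSTable("b", 600));
        assert estimatedOverheadBytes(inputs) == 1000;
        assert canRun(inputs, 2000);
        assert !canRun(inputs, 500);   // logs the skip reason
    }
}
```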
…fication

This commit changes the API of UCS as follows:
- The Bucket inner class is now public.
- The method for extracting shards with buckets is now public, and it accepts a custom list of sstables.

These changes are required for CNDB, so that we can classify all live sstables and visualize their corresponding shards and buckets in a diagnostic tool such as Autobot. The comments for warnIfSizeAbove have been clarified and moved to the method Javadoc.

Port of CNDB-4385
Port of CNDB-5113: fixed DroppedColumn#toCQLString by using the CQL string version of the column name, which also double-quotes the name if it's in mixed case. Co-authored-by: Massimiliano Tomassi <max.tomassi@datastax.com>
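The quoting rule at play can be sketched as follows. This is an illustrative simplification (the real CQL rule also covers reserved keywords), not the actual ColumnIdentifier code:

```java
public class CqlNameQuotingSketch {
    // A CQL identifier that is not a plain lowercase name must be
    // double-quoted to round-trip, with any embedded quotes doubled.
    // Simplified: the real rule also quotes reserved keywords.
    static String toCqlString(String name) {
        if (name.matches("[a-z][a-z0-9_]*"))
            return name;                                // safe unquoted
        return '"' + name.replace("\"", "\"\"") + '"';  // mixed case etc.
    }

    public static void main(String[] args) {
        assert toCqlString("mycolumn").equals("mycolumn");
        assert toCqlString("myColumn").equals("\"myColumn\"");
        System.out.println(toCqlString("myColumn"));
    }
}
```

Emitting the unquoted form for a mixed-case name is exactly the bug being fixed: CQL lowercases unquoted identifiers, so `myColumn` without quotes would no longer refer to the same column.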
Co-authored-by: Stefania Alborghetti <stef1927@users.noreply.github.com>
…arn about it (#558) Co-authored-by: Matt Fleming <mfleming@users.noreply.github.com>
…le txn bug

Port the BDP part of CNDB-4035: restore sstables if they cannot be dropped, and fix a lifecycle txn bug so that SSTables are added back to the live set if we fail to drop them.
…n region comes online (#551)
Port CNDB-4855: fixed streaming to connect back using the peer's preferred address instead of Channel#remoteAddress.
9b7dc0d to 6a4c3b2
There are a couple of things here:
- The `unsafeFree` method in `BufferPool` did not do what it was probably expected to do: the direct buffer was not released properly, because `allocateDirectAligned` actually returns a slice of the original buffer, and the only reference to the original buffer is in the `attachment` field of the returned slice. This is mitigated by a new cleaning method, which releases the parent buffer by recursively going through the attachment hierarchy.
- For in-jvm dtests, releasing all buffers in the buffer pools was added as the very last step of instance shutdown; this fixes memory leaking between subsequent instance restarts. In a production run we just stop the JVM and the buffers go with it, but in these dtests we need to deal with that explicitly.
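A toy model of the attachment-hierarchy fix, using plain objects instead of real direct ByteBuffers (whose attachment field is JDK-internal): a slice only reaches the buffer that owns the memory through its parent reference, so a correct free must walk that chain.

```java
public class AttachmentFreeSketch {
    // Stand-in for a direct buffer. A slice keeps its parent alive only via
    // an "attachment" reference; the root owns the actual allocation.
    static final class Buf {
        final Buf attachment;   // parent buffer for slices, null for the root
        boolean freed;
        Buf(Buf attachment) { this.attachment = attachment; }
    }

    // Release this buffer, then recurse through the attachment hierarchy so
    // the original allocation is released too, not just the slice.
    static void free(Buf buf) {
        buf.freed = true;
        if (buf.attachment != null)
            free(buf.attachment);
    }

    public static void main(String[] args) {
        Buf root = new Buf(null);
        Buf slice = new Buf(root);   // what an aligned-allocation slice looks like
        free(slice);
        assert slice.freed && root.freed;   // freeing only the slice would leak root
        System.out.println("root freed: " + root.freed);
    }
}
```

The bug described above is exactly the case where only the slice was "freed" while the parent allocation, reachable solely through the attachment, leaked.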
6a4c3b2 to 55301ae
blambov approved these changes on Oct 17, 2022

blambov left a comment:
The Allocator/Cloner changes look good to me.
55301ae to 0ac29ec
2e521cf to dfae73c
Kudos, SonarCloud Quality Gate passed!
michaeljmarshall added a commit that referenced this pull request on Feb 20, 2026
…2042)

### What is the issue
Fixes: https://github.com/riptano/cndb/issues/15527
CNDB test PR: https://github.com/riptano/cndb/pull/16797

### What does this PR fix and why was it fixed
This PR upgrades jvector, which brings several improvements. Here are the git commits brought in:

```
8b3e93cf (tag: 4.0.0-rc.8) chore: update changelog for 4.0.0-rc.8 (#627)
9d0488e5 release 4.0.0-rc.8 (#626)
570bd118 Refactor parallel writer (#608)
20c348ec Move buffer position in ByteBufferIndexWriter#writeFloats (#607)
d9ddce51 Ensure extractTrainingVectors return a list of at most MAX_PQ_TRAINING_SET_SIZE (#610)
d663b4f7 add config options for regression testing (#609)
7e493eee On-disk index cache for the Grid benchmark harness (#612)
e263cc80 Improved dataset loading; fixes, safeties, diagnostics, and better feedback (#613)
6b235ce7 bump to next SNAPSHOT (#605)
84bf5708 (tag: 4.0.0-rc.7) chore: update changelog for 4.0.0-rc.7 (#604)
fceeb885 release 4.0.0-rc.7 (#603)
51807cba add protection against bad ordinal mappings (#602)
6ca3b5e2 adding memory and disk usage stats to bench tests (#591)
a66fd914 Fix OnDiskGraphIndex#ramBytesUsed NPE (#588)
0ca5a392 Move float bulk-write into IndexWriter to enforce endianness (#577)
a6c6c09b Add diversityScoreFunctionFor to avoid creation of wrapper object (#592)
977c21d4 Relax the threshold of a flaky test related to an experimental feature (#598)
fa808d69 adding average nodes visited to benchmark tests (#552)
3bd15e70 Virtualize and Modularize DataSetLoader logic (#593)
42259e9f Speed up ivec reads by buffering (#584)
f967f1c9 virtualize DataSet (#589)
55f902f4 turn off parallel writes in grid (#582)
019a241d Parallelize graph writes (#542)
02fea879 Save allocation of a large array in PQVectors.encodeAndBuild (#574)
32a51821 javadoc for base [graph] (#548)
4eb607f8 javadoc for base [disk,exceptions] (#547)
30e8932c Enable the fused graph index (#561)
d8848fc6 Start development on 4.0.0-rc.7-SNAPSHOT (#573)
c57f3a62 (tag: 4.0.0-rc.6) chore: update changelog for 4.0.0-rc.6 (#572)
214b7c20 release 4.0.0-rc.6 (#571)
e3686999 fix javadoc error (#570)
88669887 Ignoring testIncrementalInsertionFromOnDiskIndex_withNonIdentityOrdinalMapping and adding a TODO in buildAndMergeNewNodes (#569)
29a943e1 Computation of reconstruction errors for vector compressors (#567)
d8e9cb16 Add NVQ paper in README (#560)
d5cbe658 Add ImmutableGraphIndex.isHierarchical (#563)
b484dae2 Harden tests for heap graph reconstruction (#543)
9471c57d Make the thresholds in TestLowCardinalityFiltering tighter (#559)
21e4a226 Begin development on 4.0.0-rc.6 (#558)
4f661d99 Revert "Start development on 4.0.0-rc.6-SNAPSHOT"
fdee5779 Start development on 4.0.0-rc.6-SNAPSHOT
```

### SAI Version Bump
Adds a new SAI on-disk version: `fa`

### Fused PQ
With this version, we are adding a new, experimental feature to write PQ vectors fused into the graph. In doing so, we are able to skip writing the PQ vectors to the PQ file, which results in significant memory savings, since the PQ vectors in the `CassandraDiskAnn` graph searcher consume `O(n)` memory based on the number of vectors and their quantized size. The fused PQ vectors mostly fit within the page cache as we read a node and its neighbors from disk, so we see minimal latency reduction from this change, though further testing is required to see the real impact.

In order to enable fused PQ, the runtime needs `cassandra.sai.latest.version=fa` or greater and `cassandra.sai.vector.enable_fused=true`. Note that because this feature is still experimental, `cassandra.sai.vector.enable_fused` defaults to `false`.

Another experimental feature introduced in this commit via the jvector upgrade is parallel graph encoding and writing to disk. Writing the fused graph requires increased CPU time to encode the graph node, and we write more bytes to disk, so this parallelism is likely necessary to keep vector index creation/compaction times down.

The key configurations available, with their associated defaults:

```java
// When building a compaction graph, encode layer 0 nodes in parallel and subsequently use async io for writes.
// This feature is experimental, so defaults to false.
SAI_ENCODE_AND_WRITE_VECTOR_GRAPH_IN_PARALLEL_ENABLED("cassandra.sai.vector.encode_and_write_graph_in_parallel.enabled", "false"),
// When parallel graph encoding is enabled, the number of threads to use for encoding. Defaults to 0, meaning
// use all available processors as reported by the JVM.
SAI_ENCODE_AND_WRITE_VECTOR_GRAPH_IN_PARALLEL_NUM_THREADS("cassandra.sai.vector.encode_and_write_graph_in_parallel.num_threads", "0"),
// When parallel graph encoding is enabled, whether to use direct buffers. Defaults to false, meaning heap
// buffers are used. A buffer will be allocated per encoding thread. The size of each buffer is the size
// of the encoded graph node at layer 0, which varies based on graph feature settings.
SAI_ENCODE_AND_WRITE_VECTOR_GRAPH_IN_PARALLEL_USE_DIRECT_BUFFERS("cassandra.sai.vector.encode_and_write_graph_in_parallel.use_direct_buffers", "false"),
```

### OnDiskVectorValues and OnDiskVectorValuesWriter
`OnDiskVectorValues` is now in its own file and is now thread safe, in order to account for some necessary implementation details within jvector. Added `OnDiskVectorValuesWriter` to improve test coverage and to abstract away the flush issues associated with `BufferedRandomAccessWriter`, as described in datastax/jvector#562.

### Verification
This PR also introduces new benchmarks as well as improved unit testing. The new benchmarks verify the performance of the `OnDiskVectorValues` and `OnDiskVectorValuesWriter` to confirm (at least directionally) the time associated with read and write operations. New tests have been added to verify that, when we iterate over an sstable's rows, the sstable's vector value's similarity to the one stored in the vector graph is ~1. This testing is valuable in that it confirms the row-id-to-ordinal mapping is correct at every node. Previously, we relied on recall results to verify this for us. This new pattern allows us to confirm _every_ node, which is more thorough and removes most edge cases that might have led to partially correct graphs that still achieved acceptable recall.
driftx pushed a commit that referenced this pull request on Apr 27, 2026
driftx pushed a commit that referenced this pull request on Apr 28, 2026
No description provided.