Conversation
driftx
commented
Feb 10, 2026
…ost (#2189)

### What is the issue

Fixes: https://github.com/riptano/cndb/issues/16350

### What does this PR fix and why was it fixed

ChronicleMap exposes several lower-level APIs that let us avoid deserializing keys and values, and the allocation that comes with them. Two points are key. First, iteration becomes very expensive as these maps grow, so we want to avoid it where possible. Second, the typical map iteration methods deserialize the key and the value eagerly; since the key is typically a high-dimensional vector, skipping that deserialization is valuable.

This change:

* Removes unnecessary iteration by leveraging the fact that compaction is additive
* Replaces `forEach` with `forEachEntry`, which gives better semantics
* Updates the `maybeAddVector` method to avoid serializing the vector key twice by using the `searchContext`; the `ChronicleMap#put` method uses this pattern internally

I added two sets of benchmarks; however, `VectorCompactionBench` doesn't register the benefit of the ChronicleMap change, likely because ChronicleMap's cost is small relative to graph construction. I am leaving `VectorCompactionBench` in place since it is still useful. The benchmark results below show between a 50x and 100x improvement, and the improvement grows as we build larger graphs.
benchmark results before change:

```
[java] Benchmark                                                   (dimension) (numVectors) Mode Cnt      Score     Error Units
[java] V5VectorPostingsWriterBench.createGenericIdentityMapping            768      100000  avgt   5    271.569 ±   3.473 ms/op
[java] V5VectorPostingsWriterBench.createGenericIdentityMapping            768     1000000  avgt   5   5452.393 ± 227.905 ms/op
[java] V5VectorPostingsWriterBench.createGenericIdentityMapping           1536      100000  avgt   5   1392.607 ±  30.388 ms/op
[java] V5VectorPostingsWriterBench.createGenericIdentityMapping           1536     1000000  avgt   5  11496.696 ± 345.886 ms/op
[java] V5VectorPostingsWriterBench.describeForCompactionOneToMany          768      100000  avgt   5    242.049 ±  20.708 ms/op
[java] V5VectorPostingsWriterBench.describeForCompactionOneToMany          768     1000000  avgt   5   2365.691 ±  84.173 ms/op
[java] V5VectorPostingsWriterBench.describeForCompactionOneToMany         1536      100000  avgt   5    265.395 ±   4.167 ms/op
[java] V5VectorPostingsWriterBench.describeForCompactionOneToMany         1536     1000000  avgt   5   3641.557 ± 130.649 ms/op
```

after change:

```
[java] Benchmark                                                   (dimension) (numVectors) Mode Cnt      Score     Error Units
[java] V5VectorPostingsWriterBench.createGenericIdentityMapping            768      100000  avgt   5      5.721 ±   1.727 ms/op
[java] V5VectorPostingsWriterBench.createGenericIdentityMapping            768     1000000  avgt   5    124.536 ±  22.464 ms/op
[java] V5VectorPostingsWriterBench.createGenericIdentityMapping           1536      100000  avgt   5      5.662 ±   0.610 ms/op
[java] V5VectorPostingsWriterBench.createGenericIdentityMapping           1536     1000000  avgt   5    122.671 ±   3.343 ms/op
[java] V5VectorPostingsWriterBench.describeForCompactionOneToMany          768      100000  avgt   5      5.364 ±   1.194 ms/op
[java] V5VectorPostingsWriterBench.describeForCompactionOneToMany          768     1000000  avgt   5    119.449 ±   4.809 ms/op
[java] V5VectorPostingsWriterBench.describeForCompactionOneToMany         1536      100000  avgt   5      5.379 ±   0.552 ms/op
[java] V5VectorPostingsWriterBench.describeForCompactionOneToMany         1536     1000000  avgt   5    121.293 ±   3.040 ms/op
```
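The eager-versus-lazy distinction the description leans on can be sketched without ChronicleMap itself. The class, names, and counter below are illustrative stand-ins (not ChronicleMap's actual API, which works over off-heap bytes): a `forEach`-style loop pays the key deserialization for every entry whether the caller needs it or not, while a `forEachEntry`-style loop hands back the entry and makes deserialization opt-in.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class LazyIterationSketch {
    static int deserializations = 0;

    // Stand-in for an entry whose key is a serialized high-dimensional vector.
    record RawEntry(byte[] keyBytes, int ordinal) {
        float[] deserializeKey() {                 // expensive for 768/1536-dim vectors
            deserializations++;
            return new float[keyBytes.length / Float.BYTES];
        }
    }

    // forEach-style: hands the caller a fully deserialized key, wanted or not.
    static void forEachEager(List<RawEntry> entries, Consumer<float[]> action) {
        for (RawEntry e : entries) action.accept(e.deserializeKey());
    }

    // forEachEntry-style: hands the caller the entry; deserialization is opt-in.
    static void forEachLazy(List<RawEntry> entries, Consumer<RawEntry> action) {
        for (RawEntry e : entries) action.accept(e);
    }

    public static void main(String[] args) {
        List<RawEntry> entries = new ArrayList<>();
        for (int i = 0; i < 1000; i++)
            entries.add(new RawEntry(new byte[768 * Float.BYTES], i));

        forEachEager(entries, key -> { /* only needed the ordinal, paid anyway */ });
        int eager = deserializations;

        deserializations = 0;
        forEachLazy(entries, e -> { int ord = e.ordinal(); /* key never touched */ });
        int lazy = deserializations;

        System.out.println("eager=" + eager + " lazy=" + lazy);  // eager=1000 lazy=0
    }
}
```

A pass that only needs per-entry metadata (here, the ordinal) does zero key deserializations under the lazy shape, which is why the win scales with map size and key dimension.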
Checklist before you submit for review

❌ Build ds-cassandra-pr-gate/PR-2226 rejected by Butler: 3 regressions found (3 new test failures, 4 known test failures)
Member
Possible regression in TimeWindowCompactionStrategyTest? This is a port from main? I added a comment pointing to this PR and CNDB-16680 to https://github.com/riptano/cndb/issues/16713 since I was adding a bunch of new tickets for porting from main.
Author
Doesn't reproduce for me locally.
Thanks. Yes, I started fixing the vector compaction test and discovered CNDB-16350 solves it, so I did the porting.
djatnieks
approved these changes
Feb 13, 2026
michaelsembwever
pushed a commit
that referenced
this pull request
Mar 5, 2026
```
CNDB-16350: Optimize ChronicleMap access, iteration to reduce serde cost

Co-authored-by: Michael Marshall <michael.marshall@datastax.com>
```
michaelsembwever
pushed a commit
that referenced
this pull request
Mar 25, 2026
```
CNDB-16350: Optimize ChronicleMap access, iteration to reduce serde cost

Co-authored-by: Michael Marshall <michael.marshall@datastax.com>
```
michaelsembwever
pushed a commit
that referenced
this pull request
Mar 27, 2026
```
CNDB-16350: Optimize ChronicleMap access, iteration to reduce serde cost

Co-authored-by: Michael Marshall <michael.marshall@datastax.com>
```
michaelsembwever
pushed a commit
that referenced
this pull request
Apr 14, 2026
```
CNDB-16350: Optimize ChronicleMap access, iteration to reduce serde cost

Co-authored-by: Michael Marshall <michael.marshall@datastax.com>
```
michaelsembwever
pushed a commit
that referenced
this pull request
Apr 15, 2026
```
CNDB-16350: Optimize ChronicleMap access, iteration to reduce serde cost

Co-authored-by: Michael Marshall <michael.marshall@datastax.com>
```