Backport CNDB-16350 7ffbbbc2e907 to main-5.0 by driftx · Pull Request #2226 · datastax/cassandra

driftx · 2026-02-10T20:42:25Z

CNDB-16350: Optimize ChronicleMap access, iteration to reduce serde cost 

### What is the issue
Fixes: https://github.com/riptano/cndb/issues/16350

### What does this PR fix and why was it fixed

ChronicleMap provides several lower-level APIs that let us avoid deserializing keys and values, along with the allocation that comes with it. First, iteration becomes very expensive as these maps grow, so we want to avoid it where possible. Second, the typical map iteration methods deserialize the key and the value eagerly. Since the key is typically a high-dimensional vector, avoiding that deserialization is valuable. This change:

* Removes unnecessary iteration, leveraging the fact that compaction is additive
* Replaces `forEach` with `forEachEntry`, which exposes each entry without eagerly deserializing its key and value
* Updates the `maybeAddVector` method to avoid serializing the vector key twice by using the `searchContext`; `ChronicleMap#put` uses the same pattern internally
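The PR does not show the ChronicleMap calls themselves, so the following is only a toy, self-contained model of the two ideas above, not ChronicleMap's API. It assumes a map that stores `float[]` vector keys as serialized bytes and counts key (de)serializations. `LazyEntryDemo`, `Entry`, `addIfAbsentTwice`, and `addIfAbsentOnce` are hypothetical names; the `forEachEntry` method here merely mimics the lazy-access idea.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiConsumer;
import java.util.function.Consumer;

// Toy model, NOT ChronicleMap's API: keys are stored as serialized bytes, and
// static counters record how often float[] vector keys are (de)serialized.
final class LazyEntryDemo
{
    static int keySerializations = 0;
    static int keyDeserializations = 0;

    // Serialized key bytes -> ordinal; ByteBuffer equality/hashing is content-based.
    private final Map<ByteBuffer, Integer> store = new HashMap<>();

    static ByteBuffer serialize(float[] vector)
    {
        keySerializations++;
        ByteBuffer buf = ByteBuffer.allocate(vector.length * Float.BYTES);
        for (float f : vector)
            buf.putFloat(f);
        buf.flip();
        return buf;
    }

    static float[] deserialize(ByteBuffer bytes)
    {
        keyDeserializations++;
        ByteBuffer view = bytes.duplicate(); // leave the stored buffer's position untouched
        float[] vector = new float[view.remaining() / Float.BYTES];
        for (int i = 0; i < vector.length; i++)
            vector[i] = view.getFloat();
        return vector;
    }

    // Hypothetical lazy entry: the key is decoded only if key() is called.
    interface Entry
    {
        float[] key();
        int value();
    }

    // Eager iteration (like Map#forEach): every key is deserialized up front.
    void forEach(BiConsumer<float[], Integer> action)
    {
        store.forEach((k, v) -> action.accept(deserialize(k), v));
    }

    // Lazy iteration (in the spirit of forEachEntry): callers pay only for what they read.
    void forEachEntry(Consumer<Entry> action)
    {
        store.forEach((k, v) -> action.accept(new Entry()
        {
            public float[] key() { return deserialize(k); }
            public int value() { return v; }
        }));
    }

    // Naive add-if-absent: serializes the key once for the lookup and again for the insert.
    boolean addIfAbsentTwice(float[] vector, int ordinal)
    {
        if (store.containsKey(serialize(vector)))
            return false;
        store.put(serialize(vector), ordinal);
        return true;
    }

    // Context-style add-if-absent: serialize once and reuse the bytes for lookup and insert.
    boolean addIfAbsentOnce(float[] vector, int ordinal)
    {
        return store.putIfAbsent(serialize(vector), ordinal) == null;
    }

    public static void main(String[] args)
    {
        LazyEntryDemo map = new LazyEntryDemo();
        for (int i = 0; i < 1000; i++)
            map.addIfAbsentOnce(new float[]{ i, i + 1, i + 2 }, i);
        System.out.println("serializations: " + keySerializations);

        long[] sum = { 0 };
        keyDeserializations = 0;
        map.forEach((key, ordinal) -> sum[0] += ordinal);
        System.out.println("forEach deserializations: " + keyDeserializations);

        keyDeserializations = 0;
        map.forEachEntry(entry -> sum[0] += entry.value());
        System.out.println("forEachEntry deserializations: " + keyDeserializations);
    }
}
```

In this model, a callback that only reads ordinals decodes no keys through the lazy path, while the eager path decodes every key; likewise the single-serialization insert does half the key encoding work of the naive lookup-then-put on a new key.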

I added two sets of benchmarks. The `VectorCompactionBench` doesn't seem to register the benefit of the ChronicleMap changes, likely because ChronicleMap's cost is lower than that of graph construction. I am leaving `VectorCompactionBench` in place since it is still useful.

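Here are some of the benchmark results. They show roughly a 20x to 250x improvement. The absolute time saved grows as we build larger graphs (for example, about 5.3 s per op for 1,000,000 768-dimensional vectors in `createGenericIdentityMapping`), and after the change the scores are nearly independent of vector dimension.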

Benchmark results before the change:

```
[java] Benchmark                                                   (dimension)  (numVectors)  Mode  Cnt      Score     Error  Units
[java] V5VectorPostingsWriterBench.createGenericIdentityMapping            768        100000  avgt    5    271.569 ±   3.473  ms/op
[java] V5VectorPostingsWriterBench.createGenericIdentityMapping            768       1000000  avgt    5   5452.393 ± 227.905  ms/op
[java] V5VectorPostingsWriterBench.createGenericIdentityMapping           1536        100000  avgt    5   1392.607 ±  30.388  ms/op
[java] V5VectorPostingsWriterBench.createGenericIdentityMapping           1536       1000000  avgt    5  11496.696 ± 345.886  ms/op
[java] V5VectorPostingsWriterBench.describeForCompactionOneToMany          768        100000  avgt    5    242.049 ±  20.708  ms/op
[java] V5VectorPostingsWriterBench.describeForCompactionOneToMany          768       1000000  avgt    5   2365.691 ±  84.173  ms/op
[java] V5VectorPostingsWriterBench.describeForCompactionOneToMany         1536        100000  avgt    5    265.395 ±   4.167  ms/op
[java] V5VectorPostingsWriterBench.describeForCompactionOneToMany         1536       1000000  avgt    5   3641.557 ± 130.649  ms/op
```

After the change:

```
[java] Benchmark                                                   (dimension)  (numVectors)  Mode  Cnt    Score    Error  Units
[java] V5VectorPostingsWriterBench.createGenericIdentityMapping            768        100000  avgt    5    5.721 ±  1.727  ms/op
[java] V5VectorPostingsWriterBench.createGenericIdentityMapping            768       1000000  avgt    5  124.536 ± 22.464  ms/op
[java] V5VectorPostingsWriterBench.createGenericIdentityMapping           1536        100000  avgt    5    5.662 ±  0.610  ms/op
[java] V5VectorPostingsWriterBench.createGenericIdentityMapping           1536       1000000  avgt    5  122.671 ±  3.343  ms/op
[java] V5VectorPostingsWriterBench.describeForCompactionOneToMany          768        100000  avgt    5    5.364 ±  1.194  ms/op
[java] V5VectorPostingsWriterBench.describeForCompactionOneToMany          768       1000000  avgt    5  119.449 ±  4.809  ms/op
[java] V5VectorPostingsWriterBench.describeForCompactionOneToMany         1536        100000  avgt    5    5.379 ±  0.552  ms/op
[java] V5VectorPostingsWriterBench.describeForCompactionOneToMany         1536       1000000  avgt    5  121.293 ±  3.040  ms/op
```
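The per-row speedups can be recomputed directly from the `Score` columns above. This small Java program (the class name `BenchmarkSpeedups` is hypothetical) divides each "before" score by the matching "after" score and also prints the absolute time saved per op.

```java
import java.util.Locale;

// Recomputes before/after speedups from the V5VectorPostingsWriterBench scores above.
final class BenchmarkSpeedups
{
    // Rows in table order: benchmark method, dimension, numVectors.
    static final String[] CASES = {
        "createGenericIdentityMapping    768  100000",
        "createGenericIdentityMapping    768 1000000",
        "createGenericIdentityMapping   1536  100000",
        "createGenericIdentityMapping   1536 1000000",
        "describeForCompactionOneToMany  768  100000",
        "describeForCompactionOneToMany  768 1000000",
        "describeForCompactionOneToMany 1536  100000",
        "describeForCompactionOneToMany 1536 1000000",
    };
    // Average time in ms/op, copied from the "before" and "after" results.
    static final double[] BEFORE = { 271.569, 5452.393, 1392.607, 11496.696, 242.049, 2365.691, 265.395, 3641.557 };
    static final double[] AFTER = { 5.721, 124.536, 5.662, 122.671, 5.364, 119.449, 5.379, 121.293 };

    static double[] speedups()
    {
        double[] ratios = new double[BEFORE.length];
        for (int i = 0; i < ratios.length; i++)
            ratios[i] = BEFORE[i] / AFTER[i];
        return ratios;
    }

    public static void main(String[] args)
    {
        double[] ratios = speedups();
        for (int i = 0; i < ratios.length; i++)
            System.out.printf(Locale.ROOT, "%-45s %6.1fx  (saves %.0f ms/op)%n",
                              CASES[i], ratios[i], BEFORE[i] - AFTER[i]);
    }
}
```

The ratios range from about 19.8x (`describeForCompactionOneToMany`, 768 dimensions, 1,000,000 vectors) to about 246x (`createGenericIdentityMapping`, 1536 dimensions, 100,000 vectors).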

…ost (#2189)

github-actions · 2026-02-10T20:42:51Z

cassci-bot · 2026-02-10T21:49:11Z

❌ Build ds-cassandra-pr-gate/PR-2226 rejected by Butler

3 regressions found
See build details here

Found 3 new test failures

| Test | Explanation | Runs | Upstream |
|---|---|---|---|
| o.a.c.db.compaction.TimeWindowCompactionStrategyTest.testPrepBucket (compression) | REGRESSION | 🔴 | 0 / 13 |
| o.a.c.distributed.test.repair.ForceRepairTest.terminated successfully () | NEW | 🔴 | 2 / 13 |
| o.a.c.index.sai.cql.VectorCompaction100dTest.testOneToManyCompactionTooManyHoles[db true] () | NEW | 🔴 | 0 / 13 |

Found 4 known test failures

djatnieks · 2026-02-12T01:39:32Z

Possible regression in TimeWindowCompactionStrategyTest?

This is a port from main? I added a comment pointing to this PR and CNDB-16680 to https://github.com/riptano/cndb/issues/16713 since I was adding a bunch of new tickets for porting from main.

driftx · 2026-02-12T12:31:19Z

> Possible regression in TimeWindowCompactionStrategyTest?

Doesn't reproduce for me locally.

> This is a port from main? I added a comment pointing to this PR and CNDB-16680 to https://github.com/riptano/cndb/issues/16713 since I was adding a bunch of new tickets for porting from main.

Thanks. Yes, I started fixing the vector compaction test and discovered that CNDB-16350 solves it, so I ported it.

Co-authored-by: Michael Marshall <michael.marshall@datastax.com>

djatnieks approved these changes Feb 13, 2026


driftx merged commit 6cf698b into main-5.0 Feb 13, 2026
2 of 4 checks passed

driftx deleted the CNDB-16680 branch February 13, 2026 21:43

driftx merged 1 commit into main-5.0 from CNDB-16680