Backport CNDB-16350 7ffbbbc2e907 to main-5.0 #2226

Merged: driftx merged 1 commit into main-5.0 from CNDB-16680 on Feb 13, 2026

@driftx

@driftx driftx commented Feb 10, 2026

CNDB-16350: Optimize ChronicleMap access, iteration to reduce serde cost 

### What is the issue
Fixes: https://github.com/riptano/cndb/issues/16350

### What does this PR fix and why was it fixed

ChronicleMap exposes several lower-level APIs that let us avoid deserializing keys and values, along with the allocation that deserialization entails. Two points drive this change. First, iteration becomes very expensive as these maps grow, so we want to avoid it where possible. Second, the typical map iteration methods deserialize both the key and the value eagerly. Since the key is typically a high-dimensional vector, avoiding that deserialization is valuable. This change:

* Removes unnecessary iteration by leveraging the fact that compaction is additive
* Replaces `forEach` with `forEachEntry`, which avoids the eager key and value deserialization described above
* Updates the `maybeAddVector` method to avoid serializing the vector key twice by using the `searchContext`. The `ChronicleMap#put` method uses this pattern internally
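
To illustrate why lazy entry access matters, here is a minimal sketch of eager versus lazy iteration over serialized vector keys. It is a toy model built on the JDK only and does not use the ChronicleMap API; `SerializedVectorStore`, `Entry`, `forEachEager`, `forEachLazy`, and the `keyDecodes` counter are all hypothetical names chosen for this example.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;
import java.util.function.Consumer;

// Toy stand-in for a map that keeps entries in serialized form only.
// This is NOT the ChronicleMap API; every name here is hypothetical.
final class SerializedVectorStore {
    // Counts how many keys were decoded, so the cost is observable.
    int keyDecodes = 0;

    private final List<byte[]> keys = new ArrayList<>();
    private final List<Integer> ordinals = new ArrayList<>();

    /** Lazy view of one entry: the key stays as bytes until key() is called. */
    final class Entry {
        private final int index;

        Entry(int index) {
            this.index = index;
        }

        float[] key() {
            keyDecodes++;
            return decode(keys.get(index));
        }

        int ordinal() {
            return ordinals.get(index);
        }
    }

    void add(float[] vector, int ordinal) {
        keys.add(encode(vector));
        ordinals.add(ordinal);
    }

    /** Eager iteration: every key is decoded, whether or not the caller needs it. */
    void forEachEager(BiConsumer<float[], Integer> action) {
        for (int i = 0; i < keys.size(); i++) {
            keyDecodes++;
            action.accept(decode(keys.get(i)), ordinals.get(i));
        }
    }

    /** Lazy iteration: the caller decides whether to pay for decoding. */
    void forEachLazy(Consumer<Entry> action) {
        for (int i = 0; i < keys.size(); i++) {
            action.accept(new Entry(i));
        }
    }

    static byte[] encode(float[] vector) {
        ByteBuffer buf = ByteBuffer.allocate(vector.length * Float.BYTES);
        for (float f : vector) {
            buf.putFloat(f);
        }
        return buf.array();
    }

    static float[] decode(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        float[] vector = new float[bytes.length / Float.BYTES];
        for (int i = 0; i < vector.length; i++) {
            vector[i] = buf.getFloat();
        }
        return vector;
    }
}
```

A caller that only needs the ordinals can use `forEachLazy` and never touch `key()`, so no vector is decoded. With `forEachEager`, each entry pays one decode and one `float[]` allocation regardless of what the caller does with it.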

I added two sets of benchmarks; however, `VectorCompactionBench` doesn't seem to register the benefit of the ChronicleMap changes. This is likely because ChronicleMap's cost is small compared with the cost of graph construction. I am leaving `VectorCompactionBench` in place since it is still useful.

Here are some of the benchmark results. Computed from the scores below, the speedup ranges from roughly 20x to roughly 250x, and most configurations land between 30x and 100x.

Benchmark results before the change:

 [java] Benchmark                                                   (dimension)  (numVectors)  Mode  Cnt      Score     Error  Units
 [java] V5VectorPostingsWriterBench.createGenericIdentityMapping            768        100000  avgt    5    271.569 ±   3.473  ms/op
 [java] V5VectorPostingsWriterBench.createGenericIdentityMapping            768       1000000  avgt    5   5452.393 ± 227.905  ms/op
 [java] V5VectorPostingsWriterBench.createGenericIdentityMapping           1536        100000  avgt    5   1392.607 ±  30.388  ms/op
 [java] V5VectorPostingsWriterBench.createGenericIdentityMapping           1536       1000000  avgt    5  11496.696 ± 345.886  ms/op
 [java] V5VectorPostingsWriterBench.describeForCompactionOneToMany          768        100000  avgt    5    242.049 ±  20.708  ms/op
 [java] V5VectorPostingsWriterBench.describeForCompactionOneToMany          768       1000000  avgt    5   2365.691 ±  84.173  ms/op
 [java] V5VectorPostingsWriterBench.describeForCompactionOneToMany         1536        100000  avgt    5    265.395 ±   4.167  ms/op
 [java] V5VectorPostingsWriterBench.describeForCompactionOneToMany         1536       1000000  avgt    5   3641.557 ± 130.649  ms/op

Benchmark results after the change:

 [java] Benchmark                                                   (dimension)  (numVectors)  Mode  Cnt    Score    Error  Units
 [java] V5VectorPostingsWriterBench.createGenericIdentityMapping            768        100000  avgt    5    5.721 ±  1.727  ms/op
 [java] V5VectorPostingsWriterBench.createGenericIdentityMapping            768       1000000  avgt    5  124.536 ± 22.464  ms/op
 [java] V5VectorPostingsWriterBench.createGenericIdentityMapping           1536        100000  avgt    5    5.662 ±  0.610  ms/op
 [java] V5VectorPostingsWriterBench.createGenericIdentityMapping           1536       1000000  avgt    5  122.671 ±  3.343  ms/op
 [java] V5VectorPostingsWriterBench.describeForCompactionOneToMany          768        100000  avgt    5    5.364 ±  1.194  ms/op
 [java] V5VectorPostingsWriterBench.describeForCompactionOneToMany          768       1000000  avgt    5  119.449 ±  4.809  ms/op
 [java] V5VectorPostingsWriterBench.describeForCompactionOneToMany         1536        100000  avgt    5    5.379 ±  0.552  ms/op
 [java] V5VectorPostingsWriterBench.describeForCompactionOneToMany         1536       1000000  avgt    5  121.293 ±  3.040  ms/op
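
The per-row speedup can be worked out directly from the two tables. The sketch below divides each "before" score by the matching "after" score; the class and method names (`BenchmarkSpeedup`, `speedups`) are hypothetical, and the numbers are copied from the tables above in row order.

```java
import java.util.Arrays;

// Scores (ms/op) copied from the benchmark tables above, in row order:
// four createGenericIdentityMapping rows, then four describeForCompactionOneToMany rows.
final class BenchmarkSpeedup {
    static final double[] BEFORE_MS = {
        271.569, 5452.393, 1392.607, 11496.696,
        242.049, 2365.691, 265.395, 3641.557
    };
    static final double[] AFTER_MS = {
        5.721, 124.536, 5.662, 122.671,
        5.364, 119.449, 5.379, 121.293
    };

    /** Speedup factor per row: time before the change divided by time after it. */
    static double[] speedups() {
        double[] result = new double[BEFORE_MS.length];
        for (int i = 0; i < result.length; i++) {
            result[i] = BEFORE_MS[i] / AFTER_MS[i];
        }
        return result;
    }

    public static void main(String[] args) {
        double[] s = speedups();
        // Print the smallest and largest speedup across all eight rows.
        System.out.printf("min %.1fx, max %.1fx%n",
                Arrays.stream(s).min().getAsDouble(),
                Arrays.stream(s).max().getAsDouble());
    }
}
```

This ignores the reported error margins, so it should be read as a rough comparison of average times rather than a statistical claim.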

…ost (#2189)

@github-actions

Checklist before you submit for review

  • This PR adheres to the Definition of Done
  • Make sure there is a PR in the CNDB project updating the Converged Cassandra version
  • Use NoSpamLogger for log lines that may appear frequently in the logs
  • Verify test results on Butler
  • Test coverage for new/modified code is > 80%
  • Proper code formatting
  • Proper title for each commit starting with the project-issue number, like CNDB-1234
  • Each commit has a meaningful description
  • Each commit is not very long and contains related changes
  • Renames, moves and reformatting are in distinct commits
  • All new files should contain the DataStax copyright header instead of the Apache License one

@cassci-bot

❌ Build ds-cassandra-pr-gate/PR-2226 rejected by Butler


3 regressions found
See build details here


Found 3 new test failures

| Test | Explanation | Runs | Upstream |
|---|---|---|---|
| o.a.c.db.compaction.TimeWindowCompactionStrategyTest.testPrepBucket (compression) | REGRESSION | 🔴 | 0 / 13 |
| o.a.c.distributed.test.repair.ForceRepairTest.terminated successfully () | NEW | 🔴 | 2 / 13 |
| o.a.c.index.sai.cql.VectorCompaction100dTest.testOneToManyCompactionTooManyHoles[db true] () | NEW | 🔴 | 0 / 13 |

Found 4 known test failures

@djatnieks
Member

djatnieks commented Feb 12, 2026

Possible regression in TimeWindowCompactionStrategyTest?

This is a port from main? I added a comment pointing to this PR and CNDB-16680 to https://github.com/riptano/cndb/issues/16713 since I was adding a bunch of new tickets for porting from main.

@driftx
Author

driftx commented Feb 12, 2026

> Possible regression in TimeWindowCompactionStrategyTest?

Doesn't reproduce for me locally.

> This is a port from main? I added a comment pointing to this PR and CNDB-16680 to https://github.com/riptano/cndb/issues/16713 since I was adding a bunch of new tickets for porting from main.

Thanks. Yes, I started fixing the vector compaction test and discovered CNDB-16350 solves it, so I did the porting.

@driftx driftx merged commit 6cf698b into main-5.0 Feb 13, 2026
2 of 4 checks passed
@driftx driftx deleted the CNDB-16680 branch February 13, 2026 21:43
michaelsembwever pushed a commit that referenced this pull request Mar 5, 2026

CNDB-16350: Optimize ChronicleMap access, iteration to reduce serde cost
Co-authored-by: Michael Marshall <michael.marshall@datastax.com>

michaelsembwever pushed a commit that referenced this pull request Mar 25, 2026
michaelsembwever pushed a commit that referenced this pull request Mar 27, 2026
michaelsembwever pushed a commit that referenced this pull request Apr 14, 2026
michaelsembwever pushed a commit that referenced this pull request Apr 15, 2026