Remove query-time usage of ByteSequence::slice in PQVectors to reduce object allocations #403
Merged
marianotepper merged 1 commit into datastax:main on Mar 21, 2025
Author (Member):
I keep seeing failures for:
This doesn't seem related, and it looks flaky since the tests pass on retry.
Contributor:
Looks reasonable to me. I agree that this is a bit brittle and suggests we might want to revisit the API in the future, but I wouldn't call it disproportionately awkward or brittle compared to other performance-driven sacrifices we've made around these code paths.
jkni approved these changes on Mar 21, 2025
Contributor:
I agree with the comments above, and the PR looks good to me. I suggest we open an issue about the interface so that we don't forget.
marianotepper approved these changes on Mar 21, 2025
michaeljmarshall added a commit to datastax/cassandra that referenced this pull request on Mar 21, 2025
With #370, I introduced the `ByteSequence::slice` method and started using it on the query path. That PR replaced long-lived PQ vector `ByteSequence`s with short-lived slices of larger `ByteSequence`s created at query time. However, profiling shows that these slice objects are created very frequently in certain workloads, especially brute-force workloads in Cassandra.

After analyzing the code, I think it is straightforward to migrate to a solution that does not create a slice object on every access to a PQ vector. The main downsides are the extra `offset` and `length` arguments on many methods, and that these changes feel brittle if we want to keep the `slice()` construct, which I think we do because it keeps non-query-path code simple. However, given the performance implications and the fact that the interface changes are all internal to jvector, I think this change is worth considering.
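A minimal sketch of the trade-off described above, in Java. This is a hypothetical illustration, not jvector's code: `ByteSeq`, `ArrayByteSeq`, `SliceView`, and `PqScoringSketch` are invented stand-ins, and scoring is assumed to be a simple asymmetric-distance lookup into a precomputed table of `M * 256` partial distances (one byte per subspace code). It contrasts creating a view object per scored vector with passing an `offset` and `length` into the shared sequence.

```java
// Hypothetical, simplified byte-sequence abstraction; NOT jvector's actual ByteSequence API.
interface ByteSeq {
    byte get(int i);
    int length();
    ByteSeq slice(int offset, int length);
}

// Backed by one large array holding all PQ codes back to back (M bytes per vector).
final class ArrayByteSeq implements ByteSeq {
    private final byte[] data;
    ArrayByteSeq(byte[] data) { this.data = data; }
    public byte get(int i) { return data[i]; }
    public int length() { return data.length; }
    // Every call allocates a new view object: the per-access allocation in question.
    public ByteSeq slice(int offset, int length) { return new SliceView(this, offset, length); }
}

// Lightweight view into a parent sequence; cheap, but still one object per call.
final class SliceView implements ByteSeq {
    private final ByteSeq parent;
    private final int offset;
    private final int length;
    SliceView(ByteSeq parent, int offset, int length) {
        this.parent = parent;
        this.offset = offset;
        this.length = length;
    }
    public byte get(int i) { return parent.get(offset + i); }
    public int length() { return length; }
    public ByteSeq slice(int off, int len) { return new SliceView(parent, offset + off, len); }
}

final class PqScoringSketch {
    static final int K = 256; // assumed: 256 centroids per subspace, one unsigned byte per code

    // table[m * K + c] = assumed partial distance from the query's m-th subvector to centroid c.
    // Reads the code through a per-vector view object.
    static float scoreWithSlice(ByteSeq code, float[] table) {
        float sum = 0f;
        for (int m = 0; m < code.length(); m++) {
            sum += table[m * K + Byte.toUnsignedInt(code.get(m))];
        }
        return sum;
    }

    // Same computation, reading directly from the shared sequence at (offset, length),
    // so no intermediate view object is created.
    static float scoreWithOffset(ByteSeq all, int offset, int length, float[] table) {
        float sum = 0f;
        for (int m = 0; m < length; m++) {
            sum += table[m * K + Byte.toUnsignedInt(all.get(offset + m))];
        }
        return sum;
    }

    public static void main(String[] args) {
        int m = 2; // subspaces per vector
        // Two vectors' codes stored contiguously: vector 0 = {1, 2}, vector 1 = {3, 4}.
        ByteSeq all = new ArrayByteSeq(new byte[] {1, 2, 3, 4});
        float[] table = new float[m * K];
        for (int i = 0; i < table.length; i++) table[i] = i; // toy partial distances
        int ordinal = 1;
        float a = scoreWithSlice(all.slice(ordinal * m, m), table); // allocates a SliceView
        float b = scoreWithOffset(all, ordinal * m, m, table);      // no per-access allocation
        System.out.println(a + " " + b); // prints "263.0 263.0"
    }
}
```

Both paths return the same score; the offset-based one avoids one allocation per scored vector, which adds up when a brute-force scan scores every vector. The cost is that each caller must pass a consistent `offset`/`length` pair instead of a self-describing slice, which is the brittleness raised in the review.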