@maedhroz
Contributor

No description provided.

@maedhroz
Contributor Author

blambov pushed a commit to blambov/cassandra that referenced this pull request Nov 29, 2023
* Optimize vector searcher limitToTopResults

* Only set ceiling objects when the PK is not found

* Add test

This test fails for the first commit but passes with the fix in
the second commit.

* Use binary search instead of iterating through PKs

* refactor

* make callers pass ArrayList to MergePostingList so we can iterate through it more efficiently

* r/m expensive assert

* don't execute a search for zero candidate keys (nbd for brute force but graph will visit every node)

* evaluate maxBruteForceRows after we have an accurate key count

* refactor to reduce nesting

* fix updateExpectedNodes to use the observed row count instead of segment id (!)

* use sublist to represent keys per segment instead of copying them

* cleanup

* r/m unused TermsIteratorMerger

* clarify javadoc

* Add exactRowIdOrCeiling method to PKM

* switch from bsearch to seq scan when lots of rows match

* use sliding window histogram to adapt scan vs bsearch as contents change

* cleanup and comment

* comment

* merge nextAfter into ceiling

* Fix logic when clustering columns present; improve method name

* Do not share reader for basic ceiling tests

* Add warning to javadoc

* Update SortedTermsTest to use ceiling at most once

---------

Co-authored-by: Jonathan Ellis <jbellis@datastax.com>
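The commit message above describes three related changes: replacing the walk over primary keys with a binary search, adding an exact-or-ceiling lookup, and switching between binary search and a sequential scan based on a sliding window of recent behaviour. The Java sketch below illustrates that idea under several stated assumptions. Keys are modelled as a sorted, duplicate-free `long[]`, and lookups arrive in non-decreasing order, like a forward-only cursor. The names `AdaptiveCeilingSearch`, `exactOrCeiling`, `windowSize` and `scanThreshold` are hypothetical; they are not Cassandra's actual `PrimaryKeyMap` API. A ring buffer of recent skip distances stands in for the histogram mentioned in the commit, as a simplification.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch (not Cassandra's PrimaryKeyMap API): forward-only ceiling lookups over a
 * sorted, duplicate-free array of keys. Each lookup uses either binary search or a sequential
 * scan, chosen by how far recent lookups advanced the cursor.
 */
final class AdaptiveCeilingSearch {
    private final long[] keys;       // assumed sorted ascending with no duplicates
    private final int[] recentSkips; // ring buffer: positions advanced by each recent lookup
    private final int scanThreshold; // scan when the average recent skip is at or below this
    private int next = 0;            // ring-buffer slot to overwrite next
    private int filled = 0;          // number of valid ring-buffer entries
    private int position = 0;        // cursor; targets are assumed to be non-decreasing

    AdaptiveCeilingSearch(long[] keys, int windowSize, int scanThreshold) {
        if (windowSize <= 0) throw new IllegalArgumentException("windowSize must be positive");
        this.keys = keys;
        this.recentSkips = new int[windowSize];
        this.scanThreshold = scanThreshold;
    }

    /**
     * Returns the index of target if present. Otherwise returns -(ceiling index) - 1, where the
     * ceiling is the first key greater than target (keys.length if none). This is the same
     * encoding that Arrays.binarySearch uses.
     */
    int exactOrCeiling(long target) {
        int result = usingScan() ? scan(target) : binarySearch(target);
        int ceiling = result >= 0 ? result : -result - 1;
        record(ceiling - position); // how far this lookup moved the cursor
        position = ceiling;
        return result;
    }

    /** True if the next lookup will use a sequential scan rather than binary search. */
    boolean usingScan() {
        if (filled == 0) return false; // no history yet: default to binary search
        long sum = 0;
        for (int k = 0; k < filled; k++) sum += recentSkips[k];
        return sum <= (long) scanThreshold * filled; // i.e. average skip <= threshold
    }

    private int binarySearch(long target) {
        // O(log n) over the remaining range [position, keys.length); returns absolute indices
        return Arrays.binarySearch(keys, position, keys.length, target);
    }

    private int scan(long target) {
        // O(skip) walk from the cursor; cheap when matching keys are close together
        int i = position;
        while (i < keys.length && keys[i] < target) i++;
        return (i < keys.length && keys[i] == target) ? i : -i - 1;
    }

    private void record(int skip) {
        recentSkips[next] = skip;
        next = (next + 1) % recentSkips.length;
        if (filled < recentSkips.length) filled++;
    }
}
```

The choice of strategy depends on how many rows match. When many rows match, consecutive targets sit close together, so a short linear walk from the cursor beats restarting a binary search each time. When matches are sparse, the skips are large, and binary search over the remaining range wins. Keeping only a bounded window of recent skips lets the choice adapt as the distribution of matches changes.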
ekaterinadimitrova2 pushed a commit to ekaterinadimitrova2/cassandra that referenced this pull request Jun 3, 2024
michaelsembwever pushed a commit to thelastpickle/cassandra that referenced this pull request Jan 7, 2026


2 participants