CASSANDRA-16209 Log Warning Rather than Verbose Trace when Preview Repair Validation Conflicts with Incremental Repair #776
Closed
…onflicts w/ a pending incremental repair.
cb713c7 to
11dabad
maedhroz
commented
Oct 13, 2020
SimpleCondition continuePreviewRepair = new SimpleCondition();
// this pauses the validation request sent from node1 to node2 until the inc repair below has run
cluster.filters()
       .outbound()
Contributor
Author
Note: If we don't use outbound() explicitly here, we'll deadlock, with the single-threaded anti-entropy stage blocked waiting for the incremental repair propose message, which itself has to be processed on the same thread.
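The deadlock described in this comment can be illustrated outside Cassandra with a plain single-threaded executor. The sketch below is only an analogy under stated assumptions: `SingleThreadStageDemo`, `waiterReleased`, and the two executors are hypothetical names, not Cassandra's stage or messaging API. A task that blocks the only thread of a stage while waiting for a signal can never be released if the signalling work is queued on that same stage. A timeout is used so the demo terminates instead of hanging.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration; not Cassandra code.
class SingleThreadStageDemo {

    /**
     * Submits a blocking "waiter" task to a single-threaded stage, then submits
     * the task that would release it either to the same stage or to another thread.
     * Returns true if the waiter was released before its 200 ms timeout.
     */
    static boolean waiterReleased(boolean signalOnSameStage) {
        ExecutorService stage = Executors.newSingleThreadExecutor();
        ExecutorService other = Executors.newSingleThreadExecutor();
        CountDownLatch signal = new CountDownLatch(1);
        try {
            // Occupies the stage's only thread until signalled or timed out.
            Callable<Boolean> waitForSignal = () -> signal.await(200, TimeUnit.MILLISECONDS);
            Runnable sendSignal = signal::countDown;

            Future<Boolean> waiter = stage.submit(waitForSignal);
            // On the same stage, this task sits in the queue behind the waiter.
            (signalOnSameStage ? stage : other).submit(sendSignal);
            return waiter.get(2, TimeUnit.SECONDS);
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            stage.shutdownNow();
            other.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("signal on same stage released waiter: " + waiterReleased(true));
        System.out.println("signal on other thread released waiter: " + waiterReleased(false));
    }
}
```

Without the timeout, the same-stage case would block forever, which corresponds to the deadlock the comment warns about.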
adelapena
pushed a commit
to adelapena/cassandra
that referenced
this pull request
Oct 13, 2023
* Revert "Skip similarity score caching when using brute force (apache#769)"
  This reverts commit ca7eea6.
* Revert "Compute similarity to query only once per indexed vector (per replica) (apache#723)"
  This reverts commit 692491f.
* Add test that exposed score caching impl flaw
* Do not revert fix to VectorTester

Similarity score caching does not currently work when there are updates to vectors. The added test shows the issue. Conceptually, the problem materializes when we observe an earlier instance of a row with a close score and don't observe the row's later instance of a low score. The error is ranking some rows higher than they should be.

The problem stems from updates to vectors. If we are searching for vector `v` and assume we have a row in sstable `a` that is close to `v` and an update to that row in sstable `b` that is far from `v`, the graph search will only find the version of the row in `a`. Then, the score caching will only observe `a`'s close score and then in `VectorTopKProcessor` we will assume that the row's score is for vector `a`, but that is out of date. As far as I understand, we don't have a way to know the vector for which we scored against, which means we don't have a way to verify that the vector in `VectorTopKProcessor` (the one read from storage) is the same as the one from the index.
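The stale-score scenario in this commit message can be sketched in a few lines. This is a simplified model under stated assumptions, not the actual index or `VectorTopKProcessor` code: `StaleScoreCacheDemo`, the dot-product `score` function, the `row1` key, and the plain map used as a cache are all hypothetical. The index search sees only the older version of the row and caches its score, while the read from storage resolves to the newer vector.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration; not Cassandra code.
class StaleScoreCacheDemo {

    // Dot product, used here as a stand-in similarity function.
    static float score(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    /** Returns {cached score, score recomputed against the current vector}. */
    static float[] cachedVersusRecomputed() {
        float[] query = {1f, 0f};
        float[] versionInA = {1f, 0f}; // older version of the row, close to the query
        float[] versionInB = {0f, 1f}; // later update to the same row, far from the query

        // The index search only encounters the older version and caches its score.
        Map<String, Float> scoreCache = new HashMap<>();
        scoreCache.put("row1", score(query, versionInA));

        // The read from storage resolves the row to its latest version.
        float[] current = versionInB;

        float cached = scoreCache.get("row1");
        float recomputed = score(query, current);
        return new float[] {cached, recomputed};
    }

    public static void main(String[] args) {
        float[] result = cachedVersusRecomputed();
        System.out.println("cached score:     " + result[0]);
        System.out.println("recomputed score: " + result[1]);
    }
}
```

In this model the cached score is 1.0 while the score for the vector actually stored is 0.0, so ranking by the cached value would place the row higher than it should be.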
ekaterinadimitrova2
pushed a commit
to ekaterinadimitrova2/cassandra
that referenced
this pull request
Jun 3, 2024
michaelsembwever
pushed a commit
to thelastpickle/cassandra
that referenced
this pull request
Jan 7, 2026