Add generic sorting (ORDER BY) on non-clustering columns using SAI #1073
Key changes in this commit:
* Replace `ScorePrimaryKey` with `PrimaryKeyWithSortKey`. This is an abstract class that currently implements `PrimaryKey` and allows a subclass to supply a sort key, which is used for sorting peer `PrimaryKey`s and will eventually support validating rows read from storage.
* Similarly, replace `ScoredRowId` with `RowIdWithMeta`, though the name "meta" might become "SortKey" in a later commit.

This commit does not introduce any new feature, and all tests should still pass.
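To make the idea concrete, here is a minimal sketch of an abstract key whose subclasses supply their own sort key: one subclass sorts by a `float` score (highest first, as for vector similarity) and one by unsigned byte order. The class names `Key`, `KeyWithSortKey`, `KeyWithScore`, `KeyWithBytes`, and `SortDemo` are hypothetical stand-ins, not the actual SAI classes, and `Key` is reduced to a string.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a primary key: just a comparable string here.
final class Key implements Comparable<Key> {
    final String value;
    Key(String value) { this.value = value; }
    @Override public int compareTo(Key o) { return value.compareTo(o.value); }
}

// Abstract key whose subclasses supply the sort key used to order peers.
abstract class KeyWithSortKey implements Comparable<KeyWithSortKey> {
    final Key key;
    KeyWithSortKey(Key key) { this.key = key; }

    // Compares sort keys; only peers of the same subclass are expected here.
    protected abstract int compareSortKey(KeyWithSortKey other);

    @Override
    public int compareTo(KeyWithSortKey other) {
        int cmp = compareSortKey(other);
        // Break ties on the primary key so the order is total and deterministic.
        return cmp != 0 ? cmp : key.compareTo(other.key);
    }
}

// Vector-style subclass: a higher similarity score sorts first.
final class KeyWithScore extends KeyWithSortKey {
    final float score;
    KeyWithScore(Key key, float score) { super(key); this.score = score; }
    @Override protected int compareSortKey(KeyWithSortKey other) {
        return Float.compare(((KeyWithScore) other).score, score);
    }
}

// Numeric/text-style subclass: ascending unsigned lexicographic byte order.
final class KeyWithBytes extends KeyWithSortKey {
    final byte[] sortKey;
    KeyWithBytes(Key key, byte[] sortKey) { super(key); this.sortKey = sortKey; }
    @Override protected int compareSortKey(KeyWithSortKey other) {
        return Arrays.compareUnsigned(sortKey, ((KeyWithBytes) other).sortKey);
    }
}

final class SortDemo {
    private SortDemo() {}
    static <T extends Comparable<? super T>> List<T> sortedCopy(List<T> in) {
        List<T> out = new ArrayList<>(in);
        Collections.sort(out);
        return out;
    }
}
```

Keeping the primary-key tie-break in the base class means every subclass gets a total order for free, which matters when results from several memtables and sstables are merged.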
This commit does not work, but I am adding it to save my work.
* Also adds more test coverage; the tests found some issues, which I fixed.
* I still have some open questions about the correct `ByteComparable` implementation details, so I am going to rely on tests to catch issues.
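As background for the `ByteComparable` questions, the common technique for making a signed integer byte-comparable is to flip its sign bit and write it big-endian, so unsigned lexicographic byte order matches numeric order. The sketch below illustrates only that general technique; `ByteComparableInts` is a hypothetical helper, not Cassandra's `ByteComparable` API.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

final class ByteComparableInts {
    private ByteComparableInts() {}

    // Flipping the sign bit maps Integer.MIN_VALUE..MAX_VALUE onto 0x00000000..0xFFFFFFFF;
    // ByteBuffer writes big-endian by default.
    static byte[] encode(int value) {
        return ByteBuffer.allocate(Integer.BYTES).putInt(value ^ Integer.MIN_VALUE).array();
    }

    static int decode(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getInt() ^ Integer.MIN_VALUE;
    }

    // Compares two ints through their encoded forms, as a byte-ordered index would.
    static int compareEncoded(int a, int b) {
        return Arrays.compareUnsigned(encode(a), encode(b));
    }

    // Plain two's-complement bytes, for contrast: negatives sort after positives.
    static int compareRaw(int a, int b) {
        byte[] x = ByteBuffer.allocate(Integer.BYTES).putInt(a).array();
        byte[] y = ByteBuffer.allocate(Integer.BYTES).putInt(b).array();
        return Arrays.compareUnsigned(x, y);
    }
}
```

The `compareRaw` contrast shows the kind of bug that tests catch: without the sign-bit flip, `-1` encodes as `FF FF FF FF` and sorts after `1`.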
Vector search uses a similarity score calculation to ensure correctness of order. However, such a computation is not possible if we do not have the source value stored in the index or if we're unable to efficiently find the source value. Instead, we can use the source of a given cell's value to determine whether it is the same as the index's source. When the materialized cell and the index's value come from the same sstable or memtable, we know that the value was not subsequently updated, and therefore we know that the index's ordering is correct. If they do not match, the ordering's correctness is unknown (though it could still be correct if the value was overwritten with the same value), so we throw out the row to ensure absolute correctness. Future work will remove our reliance on scores and ByteComparable objects and replace them with an overquery of at least one extra row per sstable and memtable. This is required because, once we switch to validating source reference equality, index ordering results can no longer be compared with each other. However, one benefit is that we will not need to do any computations or byte buffer comparisons to determine order validity. Thus, the new design has some benefits and some drawbacks. The largest benefit is supporting a completely new class of types that are eligible for ORDER BY clauses.
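A minimal sketch of the source-equality check described above: each ordered index hit remembers which memtable or sstable produced it, each live cell remembers which one supplied its value, and a hit is kept only when the two are the same object. `Source`, `IndexHit`, `LiveCell`, and `SourceValidation` are hypothetical names used for illustration, not SAI types.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical handle for a memtable or sstable; compared by identity, not by name.
final class Source {
    final String name;
    Source(String name) { this.name = name; }
}

// An ordered index hit: the row's key plus the source whose index produced it.
final class IndexHit {
    final String primaryKey;
    final Source indexSource;
    IndexHit(String primaryKey, Source indexSource) {
        this.primaryKey = primaryKey;
        this.indexSource = indexSource;
    }
}

// The reconciled live cell: its value and the source that supplied it.
final class LiveCell {
    final String value;
    final Source cellSource;
    LiveCell(String value, Source cellSource) {
        this.value = value;
        this.cellSource = cellSource;
    }
}

final class SourceValidation {
    private SourceValidation() {}

    // Keeps hits in index order, dropping rows whose live cell came from another source.
    static List<String> validate(List<IndexHit> orderedHits, Map<String, LiveCell> liveCells) {
        List<String> result = new ArrayList<>();
        for (IndexHit hit : orderedHits) {
            LiveCell cell = liveCells.get(hit.primaryKey);
            // Missing cell: the row is gone. Different source: the value may have been
            // overwritten, so the index's position for this row is not trusted.
            if (cell != null && cell.cellSource == hit.indexSource) {
                result.add(hit.primaryKey);
            }
        }
        return result;
    }
}
```

No values are compared at all, which is the benefit the commit message points out; the cost is that a row overwritten with an identical value is still discarded.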
There are several valid scenarios where the source tables are not equal, but the rows could have scores indicating the row is valid. See several tests in VectorUpdateDeleteTest that fail on the last commit. With this change, those tests are passing. I include stricter assertions on row order to ensure we do not have a subtle regression. Note that the cases where the score would otherwise indicate the row is valid are cases where we are doing something suboptimal. However, these are unlikely scenarios and not ones we need to optimize for. Instead, we're prioritizing supporting more generic order by.
So far, everything is ascending and a bit hacky. I need to write more tests. Now that we're ordering a bunch of types, it is relevant to get a bit more of a framework of tests.
With this change, hybrid queries work with non-ANN ORDER BY queries.
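The PR description later explains that hybrid queries read the ORDER BY column's value from storage for the rows matched by the WHERE clause and then sort them. The sketch below models only that flow in memory, assuming the matched keys arrive as a list and a map stands in for reading live values; `HybridOrderBy` and `topK` are hypothetical names.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class HybridOrderBy {
    private HybridOrderBy() {}

    // matchingKeys: rows selected by the WHERE clause.
    // liveValues: stand-in for reading each row's ORDER BY value from storage.
    static List<String> topK(List<String> matchingKeys, Map<String, Integer> liveValues, int limit) {
        // Ascending by value, with the key as a tie-breaker for a stable total order.
        Comparator<String> byValue = Comparator.<String, Integer>comparing(liveValues::get)
                .thenComparing(Comparator.naturalOrder());
        return matchingKeys.stream()
                .filter(liveValues::containsKey)   // skip rows with no live value
                .sorted(byValue)
                .limit(limit)
                .collect(Collectors.toList());
    }
}
```

Because the values come straight from storage, no source-table validation is needed here, which matches why hybrid queries are left out of the view logic.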
SimpleDateType is serialized incorrectly for ordering. InetAddressType might be sortable, but I am skipping for now. UUIDType does not seem meaningful to sort, but I could be mistaken.
This commit is a hack to prevent issues related to using an empty byte buffer in the SimpleExpression class. I tried to find a way to break orderings out of the current framework, but it is not easy to do, so for now, we continue to deepen the hack...
Because we are using memtable references and sstable IDs to validate that a given cell in an ORDER BY result from an SAI index is valid, we need to make sure that we have a single, consistent view of the indexes and their associated memtables and sstables. The abstraction could probably be improved, but the underlying design seems to work. We do not use the lock on the boolean predicates because those are validated using their actual predicate logic; therefore, we can skip the extra work of using the ORDER BY view for the WHERE clauses.
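One way to picture "a single, consistent view held for the query" is a reference-counted snapshot: the query references every table in one published set, rolls back and retries if any table was released concurrently, and releases them all when it finishes. This is a hypothetical sketch (`Table`, `QueryView`, and `acquire` are invented names), not the actual `QueryViewBuilder` logic.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical memtable/sstable handle with a reference count.
final class Table {
    final String name;
    // Starts at 1: the reference held by the live set of tables.
    private final AtomicInteger refs = new AtomicInteger(1);

    Table(String name) { this.name = name; }

    // Takes a reference unless the table has already been fully released.
    boolean tryRef() {
        while (true) {
            int n = refs.get();
            if (n <= 0) return false;
            if (refs.compareAndSet(n, n + 1)) return true;
        }
    }

    void unref() { refs.decrementAndGet(); }

    int refCount() { return refs.get(); }
}

// A snapshot of tables held for the duration of one query.
final class QueryView implements AutoCloseable {
    final List<Table> tables;

    private QueryView(List<Table> tables) { this.tables = tables; }

    // Assumes a table is removed from `live` before its last reference is dropped,
    // so a failed tryRef() means `live` changed since we read it and a retry sees the new set.
    static QueryView acquire(AtomicReference<List<Table>> live) {
        while (true) {
            List<Table> current = live.get();
            int acquired = 0;
            for (Table t : current) {
                if (!t.tryRef()) break;
                acquired++;
            }
            if (acquired == current.size()) return new QueryView(current);
            // Roll back the partial acquisition and retry against the latest set.
            for (int i = 0; i < acquired; i++) current.get(i).unref();
        }
    }

    @Override
    public void close() { tables.forEach(Table::unref); }
}
```

Holding the snapshot in a try-with-resources block keeps a flushed memtable alive until the query finishes, so its identity can still be compared against the source of each returned cell.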
This is still a WIP. The abstraction is getting closer though.
All of the failing tests were related to merging SingleColumnRestrictions. Because I moved the logic to add ordering restrictions earlier in the StatementRestrictions#doBuild method, I changed the order that we can expect restrictions to get merged. This simplifies the logic and keeps the complexity all within the ordering restrictions, which might someday get moved out to their own abstraction. Note: I still need to fix SelectOrderByTest.
(cherry picked from commit 1e546f9)
I had removed the ORDER BY predicates, but they were actually doing some work filtering out tombstoned rows.
The merge broke the two classes because their posting list file handles were closed too early.
src/java/org/apache/cassandra/index/sai/plan/QueryViewBuilder.java
@eolivelli @pkolaczk - I think this PR is very close (I know we want it merged as soon as possible), but I reviewed some of the TODOs left in the code, and I think it might be worth another pass either before merging or shortly after. All of the tests are passing, which is a great sign, but the TODOs capture open questions I had while writing the PR, and I still don't have answers for them.
I found a bug:
Reproduction: Load data with … Do not flush after loading. Issue the query:
It works fine after flushing. I mean, almost, because it is still slow (takes 3 seconds!)
@pkolaczk - nice catch. I introduced that bug while resolving merge conflicts.
Retested it and now it works fine.
This is an overall great new feature!
I'm conditionally approving this, provided the following issues are fixed:
- `SELECT * FROM tab WHERE a > const ORDER BY a` queries currently don't work because of the huge performance cost, for both on-disk and in-memory indexes. Let's either fix it or at least disable it.
- Don't forget to merge the PR for lazy memory index range scans, as that PR solves the first performance issues I mentioned in the doc.
Reviewed the most recent five commits, particularly the CANONICAL filter change. The CANONICAL change resolves all my local reproducers for issues with source Object mismatch when using LIVE sstables containing preemptively opened readers. LGTM!
This test started failing when I inverted the compareTo for the PrimaryKeyWithScore.
Merging this PR with the understanding that we will follow up with individual PRs to address these issues.
…1073)

Add support for `ORDER BY` via SAI. The PR essentially makes the ANN vector ordering logic more generic by adding a few abstractions. This PR adds support for ordering all existing BA and CA indexes without writing any new index files. AA indexes cannot be sorted because the primary key map is based on partitions, not primary keys.

* `PrimaryKeyWithSortKey` - this is a primary key that has some metadata that can be used for sorting. The two implementations at the moment are a `float` used for the vector similarity score and a `ByteComparable` object used by numeric and text indexes.
* `RowWithSourceTable` and `CellWithSourceTable` - this abstraction gives us a way to validate the source of a given cell. That source is then used to confirm whether an index's ranking of the data is valid.
* `ORDER BY` on SAI requires CL=1.
* Updates on same memtable: unlike the `VectorMemtableIndex`, the `TrieMemoryIndex` does not override `MemtableIndex#update`. Instead, an update is handled as an insertion. That presents two problems: first, the memtable has both values, and second, the resulting sstable index has both values. This PR addresses that edge case by comparing the index's view of the value to the memtable's or sstable's live view of the value for the row. If we implement `MemtableIndex#update`, we can remove the `PrimaryKeyWithSortKey#isIndexDataEqualToLiveData` guard.
* The `RowWithSourceTable` and `CellWithSourceTable` are used by vector ordering to avoid re-calculating the score in the `TopKProcessor`. Once we solve the update problem for numeric and text indexes, we will do the same for those indexes.
* In order for the "source table" logic to work, we must acquire a single view of the sstables and memtables and their associated indexes. Otherwise, we risk classifying a valid row as "updated". In the `StorageAttachedIndexSearcher`, we get this view and hold it for the duration of the query. Because hybrid queries do not need to have the source table validated, they are not included in the view logic, which simplifies determining which indexes need to be locked.
* Hybrid queries are only weakly index based. Because we do not have an easy way to map from `PrimaryKey` to `value` in our numeric and text indexes, the quickest solution to implement was to read the value from the sstable and then sort in the index.
* @pkolaczk updated the cost based query planner for kd-tree and trie based ordering. The cost estimates might need tweaking in subsequent testing.

Currently, there are 3 types that are not sortable via SAI:

* `InetAddressType`, because we need to add decoding logic based on SAI's `TypeUtil.encode` method, and because SAI sorts `InetAddress` objects differently than Cassandra's `InetAddressType` comparator sorts them
* `DecimalType`, because SAI truncates the value to 24 bytes
* `IntegerType`, because SAI truncates the value to 20 bytes

The first one could be fixed within the current paradigm, and the last two will likely require a change to the index format. Technically, we could optionally pull out all of the values that have more than 20 or 24 bytes, but I think that isn't worth the effort.

---------

Co-authored-by: Piotr Kołaczkowski <pkolaczk@datastax.com>
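The update-as-insertion edge case can be pictured as follows: when an update is indexed as a second insertion, the index holds one entry per value the row has ever had, so an ordered scan must drop entries whose indexed value no longer equals the row's live value. This is a hypothetical sketch of that guard's intent (`IndexEntry` and `StaleEntryFilter` are invented names, with `int` values standing in for the encoded sort key), not the code behind `isIndexDataEqualToLiveData`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical index entry: the value the index recorded and the row it points to.
final class IndexEntry {
    final int indexedValue;
    final String primaryKey;
    IndexEntry(int indexedValue, String primaryKey) {
        this.indexedValue = indexedValue;
        this.primaryKey = primaryKey;
    }
}

final class StaleEntryFilter {
    private StaleEntryFilter() {}

    // `sortedEntries` is in index order. Because an update was indexed as another
    // insertion, a row can appear once per value it has had; only the entry whose
    // indexed value equals the row's live value survives.
    static List<String> filter(List<IndexEntry> sortedEntries, Map<String, Integer> liveValues) {
        List<String> out = new ArrayList<>();
        for (IndexEntry e : sortedEntries) {
            Integer live = liveValues.get(e.primaryKey);
            if (live != null && live.intValue() == e.indexedValue) {
                out.add(e.primaryKey);
            }
        }
        return out;
    }
}
```

For example, a row inserted with `10` and then updated to `50` in the same memtable yields index entries at both positions; only the entry at `50` is kept, so the row appears once and in its correct place.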
High Level Design
Add support for `ORDER BY` via SAI. The PR essentially makes the ANN vector ordering logic more generic by adding a few abstractions. This PR adds support for ordering all existing BA and CA indexes without writing any new index files. AA indexes cannot be sorted because the primary key map is based on partitions, not primary keys.
New Abstractions
- `PrimaryKeyWithSortKey` - this is a primary key that has some metadata that can be used for sorting. The two implementations at the moment are a `float` used for the vector similarity score and a `ByteComparable` object used by numeric and text indexes.
- `RowWithSourceTable` and `CellWithSourceTable` - this abstraction gives us a way to validate the source of a given cell. That source is then used to confirm whether an index's ranking of the data is valid.
Notable design details
- `ORDER BY` on SAI requires CL=1.
- Updates on same memtable: unlike the `VectorMemtableIndex`, the `TrieMemoryIndex` does not override `MemtableIndex#update`. Instead, an update is handled as an insertion. That presents two problems: first, the memtable has both values, and second, the resulting sstable index has both values. This PR addresses that edge case by comparing the index's view of the value to the memtable's or sstable's live view of the value for the row. If we implement `MemtableIndex#update`, we can remove the `PrimaryKeyWithSortKey#isIndexDataEqualToLiveData` guard.
- The `RowWithSourceTable` and `CellWithSourceTable` are used by vector ordering to avoid re-calculating the score in the `TopKProcessor`. Once we solve the update problem for numeric and text indexes, we will do the same for those indexes.
- In order for the "source table" logic to work, we must acquire a single view of the sstables and memtables and their associated indexes. Otherwise, we risk classifying a valid row as "updated". In the `StorageAttachedIndexSearcher`, we get this view and hold it for the duration of the query. Because hybrid queries do not need to have the source table validated, they are not included in the view logic, which simplifies determining which indexes need to be locked.
- Hybrid queries are only weakly index based. Because we do not have an easy way to map from `PrimaryKey` to `value` in our numeric and text indexes, the quickest solution to implement was to read the value from the sstable and then sort in the index.
- @pkolaczk updated the cost based query planner for kd-tree and trie based ordering. The cost estimates might need tweaking in subsequent testing.
Unsupported data types
Currently, there are 3 types that are not sortable via SAI:
- `InetAddressType`, because we need to add decoding logic based on SAI's `TypeUtil.encode` method, and because SAI sorts `InetAddress` objects differently than Cassandra's `InetAddressType` comparator sorts them
- `DecimalType`, because SAI truncates the value to 24 bytes
- `IntegerType`, because SAI truncates the value to 20 bytes
The first one could be fixed within the current paradigm, and the last two will likely require a change to the index format. Technically, we could optionally pull out all of the values that have more than 20 or 24 bytes, but I think that isn't worth the effort.
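To see why truncation rules out ordering, the sketch below keeps only the first N bytes of each value's encoding and compares what remains. It assumes, for illustration only, that `BigInteger.toByteArray()` (big-endian two's complement) is the encoding and that both values are non-negative with encodings of equal length; SAI's actual encoding may differ. `TruncationDemo` is a hypothetical name.

```java
import java.math.BigInteger;
import java.util.Arrays;

final class TruncationDemo {
    private TruncationDemo() {}

    // Keeps only the first maxBytes bytes, mimicking a truncating index (hypothetical).
    static byte[] truncate(byte[] encoded, int maxBytes) {
        return Arrays.copyOf(encoded, Math.min(encoded.length, maxBytes));
    }

    // Valid as an order only for non-negative values whose encodings have equal length.
    static int compareTruncated(BigInteger a, BigInteger b, int maxBytes) {
        return Arrays.compareUnsigned(truncate(a.toByteArray(), maxBytes),
                                      truncate(b.toByteArray(), maxBytes));
    }
}
```

For `2^200 + 1` and `2^200 + 2`, both encodings are 26 bytes long and differ only in the last byte: compared in full they order correctly, but after truncation to 20 bytes they compare as equal, so the index cannot tell which comes first.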
Dependencies
The following dependency PRs were originally merged to vsearch and then into this branch: