ART prefix refactor #7930
# Conflicts: # src/execution/index/art/leaf.cpp # src/execution/index/art/leaf_segment.cpp # src/execution/index/art/node.cpp # src/execution/index/art/prefix.cpp # src/include/duckdb/execution/index/art/fixed_size_allocator.hpp # src/include/duckdb/execution/index/art/node.hpp
# Conflicts: # test/sql/copy/parquet/parquet_glob.test
# Conflicts: # extension/json/json_functions.cpp
This looks very good. I just had a small comment about one of the functions.
In general, the refactored code for the iterator looks much better.
One thing I wonder is whether this has any impact on performance. I understand that with this change our ART is potentially smaller, since it won't allocate prefix memory when it's not necessary, but I guess it will increase pointer chasing, especially if every node has a prefix before it.
That's a fair point! To investigate this possible overhead, I can add some benchmarks for
# Conflicts: # test/api/test_reset.cpp
I will open a separate PR (once the CI passes locally) to refactor the testing and benchmarking of the ART. That also adds a benchmark for
# Conflicts: # test/sql/storage/test_index_checkpoint.test
# Conflicts: # .github/config/uncovered_files.csv
@pdet local benchmark comparison, since the
Thanks for running the benchmarks and adding them as regression tests, Tania!
It looks great!
Thanks! Looks great
This PR refactors the prefix handling of our ART index. Additionally, it fully reworks the iterator code and increases the code coverage of the ART.
Before this PR, there were seven ART nodes. Most of these had an inlined `Prefix`. That prefix contained a `count` of 4 bytes, and either 8 bytes of prefix data or a `Node` pointer of 8 bytes, so 12 bytes in total.

With this PR, we introduce a new prefix node. This prefix node contains data and a node pointer to a subsequent node. The data consists of `Node::PREFIX_SIZE + 1` bytes, of which the first byte is the count of that prefix node, and the other bytes are the prefix data. For prefixes exceeding `Node::PREFIX_SIZE`, the prefix down to a non-prefix node becomes a chain of prefix nodes; otherwise the prefix is stored in a single prefix node. If there is no prefix, then we directly traverse to the subsequent node.

This change has two main benefits.

`Node` pointer of the previous node, completely omitting the need for a separate `Leaf` node. [ART] RowID Leaf node specialization #5365

This PR slightly improves the memory consumption for longer keys and does not regress in the presence of shorter keys. The ART still has a fairly high memory pressure, and the goal is to further decrease this in future PRs.
Also related to #5865.
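
To make the prefix-node layout from the description easier to picture, here is a minimal standalone sketch. It is not the code from this PR: `SketchNode`, `BuildPrefixChain`, and `SkipPrefix` are hypothetical names, the value 15 for `PREFIX_SIZE` is an assumption standing in for `Node::PREFIX_SIZE`, and `std::unique_ptr` replaces the allocator-backed `Node` pointers of the real index. It shows the count byte followed by the prefix bytes, the chain for long prefixes, the empty-prefix case, and the pointer hops raised in the review.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Assumed value, standing in for Node::PREFIX_SIZE.
constexpr std::size_t PREFIX_SIZE = 15;
static_assert(PREFIX_SIZE <= 255, "the count must fit into one byte");

// Hypothetical simplified node: a prefix node or any non-prefix node.
struct SketchNode {
    bool is_prefix = false;
    // data[0] is the count, data[1..count] are the prefix bytes.
    uint8_t data[PREFIX_SIZE + 1] = {0};
    // Subsequent node; only set for prefix nodes.
    std::unique_ptr<SketchNode> next;
};

// Put `bytes` in front of `target` as a chain of prefix nodes.
// An empty prefix allocates nothing and returns `target` unchanged.
std::unique_ptr<SketchNode> BuildPrefixChain(const std::vector<uint8_t> &bytes,
                                             std::unique_ptr<SketchNode> target) {
    std::unique_ptr<SketchNode> head = std::move(target);
    const std::size_t chunks = (bytes.size() + PREFIX_SIZE - 1) / PREFIX_SIZE;
    // Build back to front so each prefix node can own its successor.
    for (std::size_t i = chunks; i > 0; i--) {
        const std::size_t start = (i - 1) * PREFIX_SIZE;
        const std::size_t count = std::min(PREFIX_SIZE, bytes.size() - start);
        std::unique_ptr<SketchNode> node(new SketchNode());
        node->is_prefix = true;
        node->data[0] = static_cast<uint8_t>(count);
        std::copy(bytes.data() + start, bytes.data() + start + count, node->data + 1);
        node->next = std::move(head);
        head = std::move(node);
    }
    return head;
}

// Walk to the first non-prefix node, appending the prefix bytes to `out`
// and counting one pointer hop per prefix node.
const SketchNode *SkipPrefix(const SketchNode *node, std::vector<uint8_t> &out,
                             std::size_t &hops) {
    hops = 0;
    while (node && node->is_prefix) {
        const uint8_t count = node->data[0];
        assert(count >= 1 && count <= PREFIX_SIZE);
        out.insert(out.end(), node->data + 1, node->data + 1 + count);
        node = node->next.get();
        hops++;
    }
    return node;
}
```

Under these assumptions, a 33-byte prefix becomes a chain of three prefix nodes (15 + 15 + 3 bytes), so reaching the non-prefix node costs three hops, while an empty prefix costs none. That is the trade-off discussed in the review: no prefix memory where none is needed, at the price of extra pointer chasing for long prefixes.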