-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Arrow (Dev)] Refactor arrow scan internals #8430
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for picking this refactor up, I think the code looks way cleaner!
Just added some nitpicks
unique_ptr<ArrowType> dictionary_type; | ||
}; | ||
|
||
using arrow_column_map_t = unordered_map<idx_t, ArrowType>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the type alias is a bit of an excessive abstraction here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm yea I agree, before this refactor it was all over the place, but now it's only in a single location
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, on further inspection I think it makes the prototypes less bulky and clearly defines the relation between ArrowTableType and the arrow scan functions.
It also allows us to change these things in one place, for example when I made the change to unique_ptr
over move semantics it was really nice to just change the using definition and not have to hunt down the other uses of it.
…that happens, checking if consistent
a51e99e
to
ee66ea7
Compare
Thanks! |
Apply patch from duckdb/duckdb#8430 and update duckdb to latest main
Previously we populated
ArrowConvertData
while scanning the arrow schema and the function returned a DuckDB LogicalType.This has been removed, instead the function now outputs an
ArrowType
, this contains both the LogicalType and the data that would be put into theArrowConvertData
.This allows us to simplify the arrow scan a lot, as we no longer need to pass the column index, the map and the 'arrow_convert_index'
This
arrow_convert_index
consisted of a vector ofArrowConvertDataIndices
, this was state used to remember which column/column child we were scanning.This has now also been removed in its entirety.