[Arrow (Dev)] Refactor arrow scan internals #8430

Tishj · 2023-07-31T11:59:20Z

Previously, we populated ArrowConvertData while scanning the Arrow schema, and the function returned a DuckDB LogicalType.

This has been removed. Instead, the function now outputs an ArrowType, which contains both the LogicalType and the data that would previously have been put into the ArrowConvertData.
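Here is a minimal C++ sketch of the idea. An ArrowType carries both the DuckDB type and the conversion details, and the schema-walking function returns it directly instead of filling a side map. Only the name `ArrowType` and its `dictionary_type` member come from this PR. `LogicalTypeId`, `ArrowVariableSize`, `ArrowDateTimeType`, `SchemaNode` and `GetArrowType` are simplified or hypothetical stand-ins, and the real DuckDB definitions differ. The format strings handled here (`i`, `l`, `u`, `U`, `tss:`/`tsm:`/`tsu:`/`tsn:`, `+l`, `+L`) follow the Arrow C data interface.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

// Simplified stand-ins for DuckDB's LogicalType and for the conversion details
// that used to live in ArrowConvertData (hypothetical, for illustration only).
enum class LogicalTypeId { INTEGER, BIGINT, VARCHAR, TIMESTAMP, LIST };
enum class ArrowVariableSize { NONE, NORMAL, SUPER_SIZE };
enum class ArrowDateTimeType { NONE, SECONDS, MILLISECONDS, MICROSECONDS, NANOSECONDS };

// One object describes the DuckDB type *and* how to read the Arrow data.
struct ArrowType {
	LogicalTypeId type;
	ArrowVariableSize size_type = ArrowVariableSize::NONE;
	ArrowDateTimeType date_time_precision = ArrowDateTimeType::NONE;
	std::vector<std::unique_ptr<ArrowType>> children; // nested child types, e.g. for LIST
	std::unique_ptr<ArrowType> dictionary_type;       // for dictionary-encoded data (not filled in this sketch)

	explicit ArrowType(LogicalTypeId type_p) : type(type_p) {
	}
};

// Hypothetical, simplified stand-in for ArrowSchema: a format string plus children.
struct SchemaNode {
	std::string format;
	std::vector<SchemaNode> children;
};

// Returns the complete ArrowType rather than a bare LogicalType plus
// conversion data recorded elsewhere as a side effect.
std::unique_ptr<ArrowType> GetArrowType(const SchemaNode &schema) {
	const std::string &f = schema.format;
	if (f == "i") {
		return std::make_unique<ArrowType>(LogicalTypeId::INTEGER);
	}
	if (f == "l") {
		return std::make_unique<ArrowType>(LogicalTypeId::BIGINT);
	}
	if (f == "u" || f == "U") { // utf8 with 32-bit ("u") or 64-bit ("U") offsets
		auto result = std::make_unique<ArrowType>(LogicalTypeId::VARCHAR);
		result->size_type = f == "u" ? ArrowVariableSize::NORMAL : ArrowVariableSize::SUPER_SIZE;
		return result;
	}
	if (f.size() >= 4 && f.compare(0, 2, "ts") == 0 && f[3] == ':') { // e.g. "tsu:UTC"
		auto result = std::make_unique<ArrowType>(LogicalTypeId::TIMESTAMP);
		switch (f[2]) {
		case 's': result->date_time_precision = ArrowDateTimeType::SECONDS; break;
		case 'm': result->date_time_precision = ArrowDateTimeType::MILLISECONDS; break;
		case 'u': result->date_time_precision = ArrowDateTimeType::MICROSECONDS; break;
		case 'n': result->date_time_precision = ArrowDateTimeType::NANOSECONDS; break;
		default: throw std::runtime_error("unsupported timestamp unit: " + f);
		}
		return result;
	}
	if (f == "+l" || f == "+L") { // list with 32-bit or 64-bit offsets
		auto result = std::make_unique<ArrowType>(LogicalTypeId::LIST);
		result->size_type = f == "+l" ? ArrowVariableSize::NORMAL : ArrowVariableSize::SUPER_SIZE;
		result->children.push_back(GetArrowType(schema.children.at(0)));
		return result;
	}
	throw std::runtime_error("unsupported format: " + f);
}
```

Because the child types hang off the parent, a `LIST<TIMESTAMP>` column yields one tree that already knows its timestamp unit. No separate map entry is needed.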

This lets us simplify the arrow scan considerably, as we no longer need to pass the column index, the map, and the 'arrow_convert_index'.

The arrow_convert_index was a vector of ArrowConvertDataIndices: state used to remember which column (or column child) we were scanning.
It has now been removed entirely as well.
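Here is a sketch of how this simplifies the scan. Each ArrowType owns its children, so a recursive conversion just passes the child's ArrowType to the recursive call. There is no column index, map lookup, or `arrow_convert_index` to keep in sync. `ArrayNode` and `RenderValue` are hypothetical, and the types are cut down to BIGINT and LIST. This illustrates the pattern and is not DuckDB's actual scan code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

// Cut-down ArrowType: just a type id and the owned child types.
enum class LogicalTypeId { BIGINT, LIST };

struct ArrowType {
	LogicalTypeId type;
	std::vector<std::unique_ptr<ArrowType>> children;
	explicit ArrowType(LogicalTypeId type_p) : type(type_p) {
	}
};

// Hypothetical stand-in for an Arrow array: BIGINT arrays use `values`;
// LIST arrays use `offsets` (row count + 1 entries) and a single child array.
struct ArrayNode {
	std::vector<std::int64_t> values;
	std::vector<std::size_t> offsets;
	std::vector<ArrayNode> children;
};

// Renders one row as text. The nested call receives *type.children[0]
// directly, so no column index or convert-index vector is threaded through.
std::string RenderValue(const ArrayNode &array, const ArrowType &type, std::size_t row) {
	switch (type.type) {
	case LogicalTypeId::BIGINT:
		return std::to_string(array.values.at(row));
	case LogicalTypeId::LIST: {
		std::string result = "[";
		const std::size_t begin = array.offsets.at(row);
		const std::size_t end = array.offsets.at(row + 1);
		for (std::size_t i = begin; i < end; i++) {
			if (i > begin) {
				result += ", ";
			}
			result += RenderValue(array.children.at(0), *type.children.at(0), i);
		}
		return result + "]";
	}
	}
	throw std::runtime_error("unsupported type");
}
```

With offsets `{0, 2, 2, 3}` over child values `{10, 20, 30}`, the three rows render as `[10, 20]`, `[]` and `[30]`.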


pdet

Thanks for picking this refactor up, I think the code looks way cleaner!

Just added some nitpicks

src/include/duckdb/function/table/arrow/arrow_duck_schema.hpp

pdet · 2023-08-01T11:05:40Z


+	unique_ptr<ArrowType> dictionary_type;
+};
+
+using arrow_column_map_t = unordered_map<idx_t, ArrowType>;


I think the type alias is a bit of an excessive abstraction here

Tishj · 2023-08-01

Hmm, yeah, I agree. Before this refactor it was used all over the place, but now it's only used in a single location.

Tishj · 2023-08-02

Actually, on further inspection, I think it makes the prototypes less bulky and clearly defines the relation between ArrowTableType and the arrow scan functions.

It also lets us change these things in one place. For example, when I switched from move semantics to unique_ptr, it was really nice to change just the `using` definition instead of hunting down every other use of it.
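This sketch illustrates the point. If every prototype refers to the map type through `arrow_column_map_t`, changing the mapped type touches only the `using` line. Here the mapped type is assumed to be `unique_ptr<ArrowType>` after the move away from move semantics. The `ArrowTableType` wrapper with `AddColumn`/`GetColumns` borrows the names from this PR's commits, but its members and signatures are assumptions. `GetColumnType` is a hypothetical helper, and `ArrowType` is reduced to a single field.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <stdexcept>
#include <unordered_map>
#include <utility>

using idx_t = std::uint64_t;

// Reduced stand-in for ArrowType (the real one carries the LogicalType and
// the conversion details).
struct ArrowType {
	int logical_type_id;
	explicit ArrowType(int id) : logical_type_id(id) {
	}
};

// Assumed post-refactor alias: column index -> owned ArrowType. Switching the
// mapped type only requires editing this one line.
using arrow_column_map_t = std::unordered_map<idx_t, std::unique_ptr<ArrowType>>;

// Hypothetical ArrowTableType: owns the per-column types so that adding and
// reading columns is implemented once instead of being duplicated.
class ArrowTableType {
public:
	void AddColumn(idx_t index, std::unique_ptr<ArrowType> type) {
		bool inserted = columns.emplace(index, std::move(type)).second;
		if (!inserted) {
			throw std::invalid_argument("column already added");
		}
	}
	const arrow_column_map_t &GetColumns() const {
		return columns;
	}

private:
	arrow_column_map_t columns;
};

// Prototypes only spell the alias, which keeps them short (hypothetical helper).
const ArrowType &GetColumnType(const arrow_column_map_t &columns, idx_t index) {
	return *columns.at(index);
}
```

Had the alias still mapped to `ArrowType` by value, only the `using` line and the code that constructs entries would change. `GetColumnType`'s signature would stay the same.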


Mytherin · 2023-08-07T12:28:57Z

Thanks!


Tishj added 11 commits July 27, 2023 10:55

update GetArrowLogicalType

8dcc917

add a typedef for the arrow conversion map

5197443

one more change

ab525d7

Merge remote-tracking branch 'upstream/master' into arrow_conversion_refactor

47bd6ae

remove the ArrowConvertData, use ArrowType instead

f169fd9

make the column data of the arrow scan local state private

be9ef8f

dont leave values undefined

c64c087

fix converting from array with dictionary

79d8645

add patches needed for this refactor

24b029d

add APPLY_PATCHES to arrow compilation

fe242b4

remove the ArrowConvertDataIndices struct

832e679

Tishj requested a review from pdet July 31, 2023 11:59

regenerate enum utils

eb1931e

github-actions bot marked this pull request as draft July 31, 2023 12:06

Tishj added 4 commits August 1, 2023 01:04

fix patch for spatial

9d941bd

use ArrowTableType to reduce duplication of the AddColumn and GetColumns methods

c6bf0ac

update uncovered files

a39e493

Merge remote-tracking branch 'upstream/master' into arrow_conversion_refactor

948bbfa

pdet suggested changes Aug 1, 2023


Tishj added 2 commits August 2, 2023 00:14

rerun CI, got timed out on 'test_3324' or 'test_3654', not clear why that happens, checking if consistent

79496a7

remove utility include

ee66ea7

Tishj force-pushed the arrow_conversion_refactor branch from a51e99e to ee66ea7 August 2, 2023 08:45

Tishj added 6 commits August 2, 2023 11:20

use unique_ptr instead of move constructor, update arrow+spatial extension patches

906bb89

comment style

0e86a82

Merge branch 'master' into arrow_conversion_refactor

96fcc1c

update patch

bf4fc65

Merge remote-tracking branch 'upstream/master' into arrow_conversion_refactor

3e6537b

Merge remote-tracking branch 'upstream/master' into arrow_conversion_refactor

53829da

Tishj marked this pull request as ready for review August 5, 2023 17:23

Mytherin merged commit 3ab8d38 into duckdb:master Aug 7, 2023
53 checks passed

Maxxen added a commit to duckdb/duckdb_spatial that referenced this pull request Aug 31, 2023

Merge pull request #121 from carlopi/patches

2f55d5d

Apply patch from duckdb/duckdb#8430 and update duckdb to latest main
