
[Parquet] Implement reading byte stream split encoded data #9240

Merged
Mytherin merged 5 commits into duckdb:main from adamreeve:byte_stream_split on Oct 6, 2023


@adamreeve

Contributor

This adds support for reading BYTE_STREAM_SPLIT encoded Parquet data, which covers the read side of #9104. That issue also mentions encode support, which this PR doesn't address.
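For readers unfamiliar with the encoding, here is a minimal sketch of what decoding involves. It is not the code from this PR. As I understand the Parquet format, for N values of a K-byte type the encoded page holds K consecutive streams of N bytes each, where stream j contains byte j of every value. `DecodeByteStreamSplit` and `EncodeByteStreamSplit` are hypothetical helper names chosen for illustration; the encoder is included only so the decoder can be exercised with a round trip.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical helper (not DuckDB's reader): gathers byte j of value i
// from position j * count + i, i.e. from the j-th byte stream.
template <typename T>
std::vector<T> DecodeByteStreamSplit(const std::vector<uint8_t> &encoded) {
    constexpr size_t width = sizeof(T);
    if (encoded.size() % width != 0) {
        throw std::invalid_argument("encoded size is not a multiple of the value width");
    }
    const size_t count = encoded.size() / width;
    std::vector<T> result(count);
    for (size_t i = 0; i < count; i++) {
        uint8_t bytes[width];
        for (size_t j = 0; j < width; j++) {
            bytes[j] = encoded[j * count + i];
        }
        // memcpy avoids strict-aliasing problems when reassembling the value
        std::memcpy(&result[i], bytes, width);
    }
    return result;
}

// Hypothetical inverse, for testing only: scatters byte j of value i
// into the j-th stream.
template <typename T>
std::vector<uint8_t> EncodeByteStreamSplit(const std::vector<T> &values) {
    constexpr size_t width = sizeof(T);
    const size_t count = values.size();
    std::vector<uint8_t> encoded(count * width);
    for (size_t i = 0; i < count; i++) {
        uint8_t bytes[width];
        std::memcpy(bytes, &values[i], width);
        for (size_t j = 0; j < width; j++) {
            encoded[j * count + i] = bytes[j];
        }
    }
    return encoded;
}
```

The sketch decodes one value at a time for clarity; a real reader would more likely work on whole pages and handle offsets into a larger buffer.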

  DELTA_BYTE_ARRAY = 7,
- RLE_DICTIONARY = 8
+ RLE_DICTIONARY = 8,
+ BYTE_STREAM_SPLIT = 9,

Contributor Author


This file has a comment saying it's autogenerated and not to edit it, but from what I can tell it was autogenerated in the Arrow repository and then copied into DuckDB, so it should be fine to edit. Is that correct?

@Mytherin Mytherin merged commit d5bc1a2 into duckdb:main Oct 6, 2023
@Mytherin

Mytherin commented Oct 6, 2023

Collaborator

Thanks for the PR! Looks great.

@adamreeve adamreeve deleted the byte_stream_split branch October 8, 2023 07:29