Currently the Rust parquet reader returns a regular array (e.g. string array) even when the column is dictionary encoded in the parquet file.
If the parquet reader could return dictionary arrays for dictionary encoded columns, this would bring many benefits, such as:
- faster reading of dictionary encoded columns from parquet (as no conversion/expansion into a regular array would be necessary)
- more efficient memory use, as a dictionary array stores each distinct value only once and so takes less memory than the expanded array
- faster filtering operations, as SIMD can be used to filter over the numeric keys of a dictionary string array instead of comparing string values in a string array
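
To make the filtering benefit concrete, here is a minimal sketch in plain Rust. It does not use the arrow or parquet crates: `DictColumn`, `encode` and `filter_eq` are hypothetical names invented for illustration, and the real `DictionaryArray` has a different API. The point is only that an equality filter needs one string comparison per distinct value, followed by a pass over integer keys.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified dictionary-encoded string column.
/// This is an illustration only, not the arrow crate's DictionaryArray.
struct DictColumn {
    /// One small integer per row, indexing into `values`.
    keys: Vec<u32>,
    /// Each distinct string is stored exactly once.
    values: Vec<String>,
}

impl DictColumn {
    /// Build a dictionary-encoded column from plain string rows.
    fn encode(rows: &[&str]) -> Self {
        let mut index: HashMap<&str, u32> = HashMap::new();
        let mut values: Vec<String> = Vec::new();
        let mut keys: Vec<u32> = Vec::with_capacity(rows.len());
        for &row in rows {
            // Assign the next free key the first time a value is seen.
            let key = *index.entry(row).or_insert_with(|| {
                values.push(row.to_string());
                (values.len() - 1) as u32
            });
            keys.push(key);
        }
        DictColumn { keys, values }
    }

    /// Equality filter: look the needle up in the dictionary once,
    /// then compare integer keys row by row instead of strings.
    fn filter_eq(&self, needle: &str) -> Vec<bool> {
        match self.values.iter().position(|v| v == needle) {
            Some(pos) => {
                let wanted = pos as u32;
                self.keys.iter().map(|&k| k == wanted).collect()
            }
            // Value not in the dictionary: no row can match.
            None => vec![false; self.keys.len()],
        }
    }
}
```

Calling `DictColumn::encode(&["a", "b", "a"])` and then `filter_eq("a")` compares the string `"a"` against the two dictionary entries only; the per-row work is a `u32` comparison, which is the kind of loop that vectorizes well.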
Andrew Lamb / @alamb: @yordan-pavlov I think this would be amazing – and we would definitely use it in IOx. This is the kind of thing that is on our longer term roadmap and I would love to help (e.g. code review, testing, or documentation).
[~nevime], @alamb let me know what you think.

Reporter: Yordan Pavlov / @yordan-pavlov
Note: This issue was originally created as ARROW-11410. Please see the migration documentation for further details.