Improve the performance of "DictionaryValue" row encoding #4712

Closed
alamb opened this issue Aug 17, 2023 · 3 comments
Labels
arrow: Changes to the arrow crate
arrow-flight: Changes to the arrow-flight crate
enhancement: Any new improvement worthy of a entry in the changelog

Comments


alamb commented Aug 17, 2023

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

We are considering using the "don't use dictionary interning" mode in DataFusion for high-cardinality columns: apache/datafusion#7200 (comment)

@tustvold mentioned this mode could be made faster

Describe the solution you'd like
Review and optimize Codec::DictionaryValues

https://github.com/apache/arrow-rs/blob/b810e8f207bbc70294b01acba4be32153c18a6ab/arrow-row/src/lib.rs#L437C14-L437C14

Perhaps this could be made faster:

Codec::DictionaryValues(converter, _) => {

(I am not sure)
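
For readers unfamiliar with the idea, here is a minimal conceptual sketch of encoding dictionary values without interning: each distinct dictionary value is encoded once, and the encoded bytes are then copied per key. This is not the arrow-row implementation. `DictColumn`, `encode_value`, `encode_rows`, and the one-byte validity marker are all hypothetical names and choices made up for illustration; the real row format differs.

```rust
/// Hypothetical stand-in for a dictionary-encoded string column:
/// `keys[i]` indexes into `values`, and `None` marks a null row.
struct DictColumn {
    keys: Vec<Option<usize>>,
    values: Vec<String>,
}

/// Encode one value as a 1-byte validity marker followed by its UTF-8 bytes.
/// (Illustrative only; this is not the arrow-row byte layout.)
fn encode_value(value: Option<&str>) -> Vec<u8> {
    match value {
        Some(s) => {
            let mut out = vec![1u8];
            out.extend_from_slice(s.as_bytes());
            out
        }
        None => vec![0u8],
    }
}

/// Encode every distinct dictionary value once, then copy the
/// pre-encoded bytes for each key instead of re-encoding per row.
fn encode_rows(column: &DictColumn) -> Vec<Vec<u8>> {
    let encoded_values: Vec<Vec<u8>> = column
        .values
        .iter()
        .map(|v| encode_value(Some(v.as_str())))
        .collect();
    let encoded_null = encode_value(None);

    column
        .keys
        .iter()
        .map(|key| match key {
            Some(k) => encoded_values[*k].clone(),
            None => encoded_null.clone(),
        })
        .collect()
}
```

In a sketch like this, the per-row cost is a lookup plus a byte copy, so any speedup would presumably come from how the values are encoded up front and how the copies are laid out in memory.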

Describe alternatives you've considered
There may not be any way to make this faster, but I wanted to file the ticket as a follow-on to the meeting.

Additional context

@alamb alamb added the enhancement Any new improvement worthy of a entry in the changelog label Aug 17, 2023
@alamb alamb changed the title from 'Improve the format of "DictionaryValue" row encoding' to 'Improve the performance of "DictionaryValue" row encoding' Aug 17, 2023

alamb commented Sep 25, 2023

For the record, this was resolved because we deleted the code that this issue tracked optimizing 🔥

@tustvold tustvold added the arrow Changes to the arrow crate label Oct 18, 2023
@tustvold

label_issue.py automatically added labels {'arrow'} from #4819

@tustvold tustvold added the arrow-flight Changes to the arrow-flight crate label Oct 18, 2023
@tustvold

label_issue.py automatically added labels {'arrow-flight'} from #4819
