@rluvaton
Member

Which issue does this PR close?

No issue

Rationale for this change

This improves performance when appending an already-built dictionary to an existing builder, by taking advantage of the fact that we don't need to check the value for each key.
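To illustrate the idea, here is a minimal std-only sketch, not the arrow-rs implementation. `ToyDictionary`, `ToyDictionaryBuilder`, `get_or_insert_key`, and `demo` are hypothetical names made up for this example; only the method name `extend_dictionary` comes from this PR, and its real signature and null handling may differ. The sketch assumes string values and does not model null entries inside the dictionary values themselves. The point is that the bulk path does one hash lookup per distinct dictionary value and then remaps keys by indexing, instead of one hash lookup per row.

```rust
use std::collections::HashMap;

/// Toy dictionary-encoded column: `keys[i]` indexes into `values`; `None` is a null row.
struct ToyDictionary {
    keys: Vec<Option<usize>>,
    values: Vec<String>,
}

/// Toy builder that deduplicates values through a hash map (hypothetical, not arrow-rs).
#[derive(Default)]
struct ToyDictionaryBuilder {
    dedup: HashMap<String, usize>,
    values: Vec<String>,
    keys: Vec<Option<usize>>,
}

impl ToyDictionaryBuilder {
    /// Returns the key for `value`, inserting it if it has not been seen yet.
    fn get_or_insert_key(&mut self, value: &str) -> usize {
        if let Some(&key) = self.dedup.get(value) {
            return key;
        }
        let key = self.values.len();
        self.values.push(value.to_string());
        self.dedup.insert(value.to_string(), key);
        key
    }

    /// Row-at-a-time path: one hash lookup per appended row.
    fn append(&mut self, value: &str) {
        let key = self.get_or_insert_key(value);
        self.keys.push(Some(key));
    }

    fn append_null(&mut self) {
        self.keys.push(None);
    }

    /// Bulk path: one hash lookup per distinct value in `other`,
    /// then every key is remapped through a plain index.
    /// Note: in this sketch, values in `other` that no key refers to
    /// are still added to the builder's values.
    fn extend_dictionary(&mut self, other: &ToyDictionary) {
        let mapping: Vec<usize> = other
            .values
            .iter()
            .map(|v| self.get_or_insert_key(v))
            .collect();
        self.keys
            .extend(other.keys.iter().map(|k| k.map(|k| mapping[k])));
    }
}

/// Small usage example: two rows appended one by one, then four rows in bulk.
fn demo() -> ToyDictionaryBuilder {
    let mut builder = ToyDictionaryBuilder::default();
    builder.append("a");
    builder.append_null();
    let other = ToyDictionary {
        keys: vec![Some(1), None, Some(0), Some(1)],
        values: vec!["b".to_string(), "a".to_string()],
    };
    builder.extend_dictionary(&other);
    builder
}
```

In `demo`, the bulk call performs two hash lookups (one per value in `other`) for four appended rows; the gap grows with the number of rows per distinct value.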

What changes are included in this PR?

Added `extend_dictionary` to `PrimitiveDictionaryBuilder` and `GenericByteDictionaryBuilder`.

Are there any user-facing changes?

Yes, these are public methods.

@github-actions github-actions bot added the arrow Changes to the arrow crate label Dec 12, 2024
@rluvaton
Member Author

Hey @tustvold, can you please re-review?

Contributor

@alamb alamb left a comment

Thank you @rluvaton -- this looks like a nice improvement to me.

@alamb
Contributor

alamb commented Dec 18, 2024

I added some small suggestions on how to improve the docstrings, but we could do that as a follow on PR as well

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
@rluvaton
Member Author

> I added some small suggestions on how to improve the docstrings, but we could do that as a follow on PR as well

Applied.

@alamb alamb merged commit 2908a80 into apache:main Dec 19, 2024
26 checks passed
@alamb
Contributor

alamb commented Dec 19, 2024

Thanks @rluvaton

@alamb
Contributor

alamb commented Dec 19, 2024

@rluvaton rluvaton deleted the add-extend-dict branch December 21, 2024 16:37
CurtHagenlocher pushed a commit to CurtHagenlocher/arrow-rs that referenced this pull request Dec 28, 2024
apache#6875)

* add `extend_dictionary` in dictionary builder for improved performance

* fix extends all nulls

* support null in mapped value

* adding comment

* run `clippy` and `fmt`

* fix ci

* Apply suggestions from code review

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

Labels

arrow Changes to the arrow crate

3 participants