Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add freeze_with_dictionary API to MutableArrayData #2915

Closed
wants to merge 1 commit into from

Conversation

viirya
Copy link
Member

@viirya viirya commented Oct 24, 2022

Which issue does this PR close?

Closes #2914.

Rationale for this change

What changes are included in this PR?

Are there any user-facing changes?

@github-actions github-actions bot added the arrow Changes to the arrow crate label Oct 24, 2022
@viirya
Copy link
Member Author

viirya commented Oct 24, 2022

cc @sunchao

///
/// # Safety
///
/// As this doesn't validate the provided dictionary `ArrayData` values, the input
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not entirely sure this will work as expected, MutableArrayData has some pretty hard assumptions when it comes to dictionaries - it concatenates the dictionary arrays, and computes new keys based on this assumption

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is for the values of dictionaries. What MutableArrayData does is simply concatenates the value arrays with another MutableArrayData. The caller must ensure it provides correct concatenated dictionary value array.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm? That is also where I looked at. For dictionaries, it uses another MutableArrayData to extend from their value arrays. Isn't it?

Copy link
Contributor

@tustvold tustvold left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think some tests of this functionality would be good, as I'm not entirely sure it is correct.

I would also draw your attention to #2832 and #2798

@viirya
Copy link
Member Author

viirya commented Oct 24, 2022

I would also draw your attention to #2832 and #2798

Just tried it. I can use DictionaryArray::with_values for our case. Still copying dictionary value array with another MutableArrayData, then I changed to invoke with_values on the dictionary array obtained from freezing the first MutableArrayData.

Just feeling that this way it is somehow not efficient as it needs to freeze and make an array, then takes the value array and copy and attach it back.

For multiple dictionaries, one more issue is there as internally MutableArrayData will extend from sources' values. For single dictionary, it has different behavior as no extending.

@viirya
Copy link
Member Author

viirya commented Oct 24, 2022

I'm closing this as DictionaryArray::with_values can fit our use case for now. It is not as convenient as proposed way though, seems no need to add a new API.

@viirya viirya closed this Oct 24, 2022
@tustvold
Copy link
Contributor

tustvold commented Oct 24, 2022

Just feeling that this way it is somehow not efficient as it needs to freeze and make an array, then takes the value array and copy and attach it back.

I think I am missing something. I don't think there is a way to avoid copying and attaching back - this is what MutableArrayData does also? I did create #1981 a while back to track allowing mutation of arrays, but currently there isn't any support for this?

Still copying dictionary value array with another MutableArrayData

This is the step I don't understand, what is this operation you are implementing here? Why are you using MutableArrayData for this at all?

@viirya
Copy link
Member Author

viirya commented Oct 24, 2022

I think I am missing something. I don't think there is a way to avoid copying and attaching back - this is what MutableArrayData does also? I did create #1981 a while back to track allowing mutation of arrays, but currently there isn't any support for this?

Oh, no, copying is unavoidable. I meant that I need to create an array first and take value value from it. I feel that it is not smooth in usage as letting MutableArrayData to handle it as the proposed API. Just my preference.

Still copying dictionary value array with another MutableArrayData

This is the step I don't understand, what is this operation you are implementing here? Why are you using MutableArrayData for this at all?

We reuse array allocation generally. But for certain operations, we need to hold on and wait for all arrays before we can do the operation. So we need to copy arrays there otherwise they will be overwritten. But while MutableArrayData extends DictionaryArray, it doesn't copy the value array (if only one DictionaryArray there).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add freeze_with_dictionary API to MutableArrayData
2 participants