Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add new_from_strings to create MapArrays #1507

Merged
merged 5 commits into from
Apr 7, 2022
Merged

Conversation

viirya
Copy link
Member

@viirya viirya commented Mar 30, 2022

Which issue does this PR close?

Closes #1158.

Rationale for this change

What changes are included in this PR?

Are there any user-facing changes?

@github-actions github-actions bot added the arrow Changes to the arrow crate label Mar 30, 2022
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @viirya -- this is certainly looking like a much better API

I looked at arrow2 as well and it didn't look like it had anything nicer https://github.com/jorgecarleitao/arrow2/blob/f7f6186ab69c086b013d9f454c91bf3f9438436d/src/array/map/mod.rs

arrow/src/array/array_map.rs Outdated Show resolved Hide resolved
/// Creates map array from provided keys, values and entry_offsets.
pub fn new_from_strings<'a>(
keys: impl Iterator<Item = &'a str>,
values: ArrayData,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would you think about using a values: &dyn Array here? I think that might be more natural

Also while reviewing this PR, it seems like keys and entry_offsets are connected to be matched -- I wonder if it would make more sense to have a function like

    pub fn new_keys_and_offsets<'a>(
        keys_and_offsets: impl Iterator<Item = (&'a str, usize)>,
        values: &dyn Array,
    )
...

So that the user would provide some way to provide the key names and offsets into values directly 🤔

Copy link
Member Author

@viirya viirya Apr 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, if we make it values: &dyn Array, how can we call StructArray::from or make StructArray from it? We cannot move values.

Copy link
Member Author

@viirya viirya Apr 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For new_keys_and_offsets, the keys and offsets are not 1:1 matched.

For example, in the test, the keys is vec!["a", "b", "c", "d", "e", "f", "g", "h"]; and the offsets are [0, 3, 6, 8]. So it represents [[a, b, c], [d, e, f], [g, h]].

Otherwise, do you mean ("a", 0), ("b", 0), ("c", 0), ("d", 3), ("e", 3), ("f", 3), ("g", 6), ("h", 6) as keys_and_offsets?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, if we make it values: &dyn Array, how can we call StructArray::from or make StructArray from it? We cannot move values.

I guess I was thinking we could use array.data() to get the ArrayData needed to construct the StructArray

For example, in the test, the keys is vec!["a", "b", "c", "d", "e", "f", "g", "h"]; and the offsets are [0, 3, 6, 8]. So it represents [[a, b, c], [d, e, f], [g, h]].

Ah, sorry, I missed that point.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I was thinking we could use array.data() to get the ArrayData needed to construct the StructArray

Got it. data() returns &ArrayData, so we need a clone here, but I think it should be fine as it won't copy array data.

viirya and others added 2 commits March 31, 2022 16:55
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
@codecov-commenter
Copy link

Codecov Report

Merging #1507 (a3922af) into master (c5442cf) will increase coverage by 0.07%.
The diff coverage is 95.65%.

@@            Coverage Diff             @@
##           master    #1507      +/-   ##
==========================================
+ Coverage   82.68%   82.76%   +0.07%     
==========================================
  Files         188      190       +2     
  Lines       54361    54748     +387     
==========================================
+ Hits        44951    45313     +362     
- Misses       9410     9435      +25     
Impacted Files Coverage Δ
arrow/src/array/array_map.rs 84.89% <95.65%> (+3.38%) ⬆️
arrow/src/array/equal/utils.rs 74.45% <0.00%> (-3.24%) ⬇️
parquet_derive/src/parquet_field.rs 65.75% <0.00%> (-0.46%) ⬇️
arrow/src/datatypes/field.rs 53.79% <0.00%> (-0.31%) ⬇️
arrow/src/array/transform/mod.rs 86.46% <0.00%> (-0.12%) ⬇️
arrow/src/array/builder.rs 86.68% <0.00%> (-0.05%) ⬇️
arrow/src/compute/kernels/filter.rs 88.00% <0.00%> (ø)
arrow/src/compute/kernels/length.rs 100.00% <0.00%> (ø)
arrow/src/ffi_stream.rs 79.03% <0.00%> (ø)
arrow/src/array/equal/union.rs 100.00% <0.00%> (ø)
... and 9 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c5442cf...a3922af. Read the comment docs.

@alamb
Copy link
Contributor

alamb commented Apr 7, 2022

Thanks @viirya

@alamb alamb merged commit 252a983 into apache:master Apr 7, 2022
@viirya
Copy link
Member Author

viirya commented Apr 7, 2022

Thanks @alamb !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Nicer API to create MapArrays
3 participants