Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Read/write nested dictionary under large list in ipc stream reader/writer #1585

Merged
merged 2 commits into from
Apr 20, 2022

Conversation

viirya
Copy link
Member

@viirya viirya commented Apr 18, 2022

Which issue does this PR close?

Closes #1584.

Rationale for this change

What changes are included in this PR?

Are there any user-facing changes?

@github-actions github-actions bot added the arrow Changes to the arrow crate label Apr 18, 2022
@codecov-commenter
Copy link

codecov-commenter commented Apr 18, 2022

Codecov Report

Merging #1585 (e44e52f) into master (6bb6ed0) will increase coverage by 0.08%.
The diff coverage is 97.87%.

❗ Current head e44e52f differs from pull request most recent head 9321baa. Consider uploading reports for the commit 9321baa to get more accurate results

@@            Coverage Diff             @@
##           master    #1585      +/-   ##
==========================================
+ Coverage   82.87%   82.95%   +0.08%     
==========================================
  Files         193      193              
  Lines       55304    55384      +80     
==========================================
+ Hits        45832    45943     +111     
+ Misses       9472     9441      -31     
Impacted Files Coverage Δ
arrow/src/ipc/writer.rs 81.52% <62.50%> (-0.28%) ⬇️
arrow/src/array/array_struct.rs 88.44% <100.00%> (ø)
arrow/src/array/builder.rs 86.68% <100.00%> (ø)
arrow/src/array/data.rs 83.24% <100.00%> (ø)
arrow/src/array/equal/utils.rs 74.45% <100.00%> (ø)
arrow/src/compute/kernels/boolean.rs 96.79% <100.00%> (ø)
arrow/src/compute/kernels/cast.rs 95.81% <100.00%> (+0.08%) ⬆️
arrow/src/compute/kernels/substring.rs 100.00% <100.00%> (+1.68%) ⬆️
arrow/src/csv/reader.rs 89.89% <100.00%> (ø)
arrow/src/csv/writer.rs 71.32% <100.00%> (ø)
... and 17 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6bb6ed0...9321baa. Read the comment docs.

Copy link
Contributor

@tustvold tustvold left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me, not sure about the map change, and left some comments around additional tests you might consider adding

@@ -131,7 +131,7 @@ impl Field {
DataType::List(field)
| DataType::LargeList(field)
| DataType::FixedSizeList(field, _)
| DataType::Map(field, _) => collected_fields.push(field),
| DataType::Map(field, _) => collected_fields.extend(field.fields()),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this change relate to?

Copy link
Contributor

@tustvold tustvold Apr 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly an accidental inclusion from #1583 ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.

@@ -1444,29 +1444,27 @@ mod tests {
assert_eq!(input_batch, output_batch);
}

#[test]
fn test_roundtrip_stream_nested_dict_dict() {
fn test_test_roundtrip_stream_nested_dict_dict_for_list<
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
fn test_test_roundtrip_stream_nested_dict_dict_for_list<
fn test_roundtrip_stream_dict_of_list_of_dict_impl<

Maybe?

let list_data = ArrayData::builder(list_data_type)
.len(3)
.add_buffer(value_offsets)
.add_child_data(dict_data.clone())
.build()
.unwrap();
let list_array = ListArray::from(list_data);
let list_array = GenericListArray::<OffsetSize>::from(list_data);

let dict_dict_array =
DictionaryArray::<Int8Type>::try_new(&keys, &list_array).unwrap();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could consider using a different keys array for this dictionary for added 🌶️

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay.

1,
false,
)));
let offsets: &[i32; 4] = &[0, 2, 4, 6];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe try a zero-length slice, instead of uniform 2 each time?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, added a zero-length one.

@@ -1481,4 +1479,35 @@ mod tests {
let output_batch = roundtrip_ipc_stream(&input_batch);
assert_eq!(input_batch, output_batch);
}

#[test]
fn test_roundtrip_stream_nested_dict_dict_in_list() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
fn test_roundtrip_stream_nested_dict_dict_in_list() {
fn test_roundtrip_stream_dict_of_list_of_dict() {

let list_data_type = DataType::List(Box::new(Field::new_dict(
"item",
DataType::Dictionary(Box::new(DataType::Int8), Box::new(DataType::Utf8)),
false,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could also test some nulls, just because they always seem to break things 😅

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Added a few nulls.

@viirya
Copy link
Member Author

viirya commented Apr 19, 2022

Thanks @tustvold.

@tustvold tustvold merged commit b4642ec into apache:master Apr 20, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Read/write nested dictionary under large list in ipc stream reader/write
3 participants