Read/write nested dictionary under large list in ipc stream reader/writer #1585
Codecov Report
@@            Coverage Diff             @@
##           master    #1585      +/-   ##
==========================================
+ Coverage   82.87%   82.95%   +0.08%
==========================================
  Files         193      193
  Lines       55304    55384      +80
==========================================
+ Hits        45832    45943     +111
+ Misses       9472     9441      -31
This looks good to me. I'm not sure about the map change, and I left some comments about additional tests you might consider adding.
arrow/src/datatypes/field.rs
Outdated
@@ -131,7 +131,7 @@ impl Field {
  DataType::List(field)
  | DataType::LargeList(field)
  | DataType::FixedSizeList(field, _)
- | DataType::Map(field, _) => collected_fields.push(field),
+ | DataType::Map(field, _) => collected_fields.extend(field.fields()),
What does this change relate to?
Possibly an accidental inclusion from #1583 ?
Removed.
arrow/src/ipc/reader.rs
Outdated
@@ -1444,29 +1444,27 @@ mod tests {
  assert_eq!(input_batch, output_batch);
  }

  #[test]
- fn test_roundtrip_stream_nested_dict_dict() {
+ fn test_test_roundtrip_stream_nested_dict_dict_for_list<
Suggested change:
- fn test_test_roundtrip_stream_nested_dict_dict_for_list<
+ fn test_roundtrip_stream_dict_of_list_of_dict_impl<
Maybe?
arrow/src/ipc/reader.rs
Outdated
  let list_data = ArrayData::builder(list_data_type)
      .len(3)
      .add_buffer(value_offsets)
      .add_child_data(dict_data.clone())
      .build()
      .unwrap();
- let list_array = ListArray::from(list_data);
+ let list_array = GenericListArray::<OffsetSize>::from(list_data);

  let dict_dict_array =
      DictionaryArray::<Int8Type>::try_new(&keys, &list_array).unwrap();
Could consider using a different keys array for this dictionary for added 🌶️
Okay.
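To illustrate the suggestion above, here is a minimal std-only Rust sketch (not the arrow API) of why distinct key arrays make the round-trip test stronger. The `decode` and `keys_are_distinguishable` helpers and the sample data are hypothetical, and keys are assumed to be non-negative and in range.

```rust
/// Resolves dictionary keys against `values`: `keys[i]` selects `values[keys[i]]`.
/// Hypothetical helper for illustration; assumes every key is non-negative and in range.
fn decode<'a, T>(keys: &[i8], values: &'a [T]) -> Vec<&'a T> {
    keys.iter().map(|&k| &values[k as usize]).collect()
}

/// Decodes the same values with two key arrays and reports whether the results differ.
/// If they never differ, a reader that picks up the wrong key array goes unnoticed.
fn keys_are_distinguishable<T: PartialEq>(a: &[i8], b: &[i8], values: &[T]) -> bool {
    decode(a, values) != decode(b, values)
}
```

When the inner and outer dictionaries share one key array, `keys_are_distinguishable` returns `false` for them, so a mix-up between the two dictionaries would still pass an equality check. Using a second, different key array removes that blind spot.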
arrow/src/ipc/reader.rs
Outdated
1,
false,
)));
let offsets: &[i32; 4] = &[0, 2, 4, 6];
Maybe try a zero-length slice, instead of uniform 2 each time?
Yea, added a zero-length one.
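As a rough illustration of what a zero-length entry means for list offsets, here is a std-only Rust sketch (not the arrow API). The `list_entries` helper is hypothetical, uses `usize` offsets rather than the `i32` offsets in the test, and assumes the offsets are non-decreasing and within bounds. The offsets actually used in the updated test are not shown in this thread.

```rust
/// Splits `values` into list entries using Arrow-style offsets:
/// entry `i` covers `values[offsets[i]..offsets[i + 1]]`.
/// Hypothetical helper; assumes offsets are non-decreasing and within bounds.
fn list_entries<'a, T>(offsets: &[usize], values: &'a [T]) -> Vec<&'a [T]> {
    offsets
        .windows(2)
        .map(|w| &values[w[0]..w[1]])
        .collect()
}
```

With six child values, offsets `[0, 2, 4, 6]` yield three entries of length 2 each, while offsets such as `[0, 2, 2, 6]` yield entries of length 2, 0, and 4, which exercises the empty-list path.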
arrow/src/ipc/reader.rs
Outdated
@@ -1481,4 +1479,35 @@ mod tests {
  let output_batch = roundtrip_ipc_stream(&input_batch);
  assert_eq!(input_batch, output_batch);
  }
+
+ #[test]
+ fn test_roundtrip_stream_nested_dict_dict_in_list() {
Suggested change:
- fn test_roundtrip_stream_nested_dict_dict_in_list() {
+ fn test_roundtrip_stream_dict_of_list_of_dict() {
arrow/src/ipc/reader.rs
Outdated
let list_data_type = DataType::List(Box::new(Field::new_dict(
    "item",
    DataType::Dictionary(Box::new(DataType::Int8), Box::new(DataType::Utf8)),
    false,
Could also test some nulls, just because they always seem to break things 😅
Good point. Added a few nulls.
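For readers unfamiliar with how nulls interact with dictionary keys, here is a small std-only Rust sketch (not the arrow API). The `decode_nullable` helper is hypothetical and models a null slot as `None`; every present key is assumed to be non-negative and in range.

```rust
/// Resolves nullable dictionary keys: a `None` key stays `None`,
/// and a `Some(k)` key selects `values[k]`.
/// Hypothetical helper; assumes every present key is non-negative and in range.
fn decode_nullable<'a, T>(keys: &[Option<i8>], values: &'a [T]) -> Vec<Option<&'a T>> {
    keys.iter()
        .map(|&k| k.map(|i| &values[i as usize]))
        .collect()
}
```

A round-trip test that includes `None` keys checks that null slots survive serialization alongside the dictionary values, which a test with only non-null keys cannot detect.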
Thanks @tustvold.
Which issue does this PR close?
Closes #1584.
Rationale for this change
What changes are included in this PR?
Are there any user-facing changes?