-
Notifications
You must be signed in to change notification settings - Fork 693
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(ipc): Support serializing structs containing dictionaries #848
Conversation
Dictionary fields nested in structs were not properly marked as dictionary fields when serializing to fb.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@helgikrs could you please run |
Codecov Report
@@ Coverage Diff @@
## master #848 +/- ##
==========================================
- Coverage 82.65% 82.63% -0.03%
==========================================
Files 168 168
Lines 48081 48077 -4
==========================================
- Hits 39742 39727 -15
- Misses 8339 8350 +11
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like a nice improvement to me. Thank you @helgikrs
If I don't hear anything from other reviewers, I plan to merge this in tomorrow (and include it in the 6.1.0 release, slated for end of this week, early next week)
@@ -610,21 +610,7 @@ pub(crate) fn get_fb_field_type<'a>( | |||
// struct's fields are children | |||
let mut children = vec![]; | |||
for field in fields { | |||
let inner_types = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In case anyone else is curious, without this code change the test in this PR fails like this:
failures:
---- ipc::convert::tests::convert_schema_round_trip stdout ----
thread 'ipc::convert::tests::convert_schema_round_trip' panicked at 'assertion failed: `(left == right)`
left: `Schema { fields: [Field { name: "uint8", data_type: UInt8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: Some({"k": "v"}) }, Field { name: "uint16", data_type: UInt16, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "uint32", data_type: UInt32, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "uint64", data_type: UInt64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "int8", data_type: Int8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "int16", data_type: Int16, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "int32", data_type: Int32, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "int64", data_type: Int64, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "float16", data_type: Float16, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "float32", data_type: Float32, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "float64", data_type: Float64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "null", data_type: Null, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "bool", data_type: Boolean, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "date32", data_type: Date32, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "date64", data_type: Date64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "time32[s]", data_type: Time32(Second), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "time32[ms]", data_type: Time32(Millisecond), nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "time64[us]", data_type: Time64(Microsecond), nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "time64[ns]", data_type: Time64(Nanosecond), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "timestamp[s]", data_type: Timestamp(Second, None), nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "timestamp[ms]", data_type: Timestamp(Millisecond, None), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "timestamp[us]", data_type: Timestamp(Microsecond, Some("Africa/Johannesburg")), nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "timestamp[ns]", data_type: Timestamp(Nanosecond, None), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "interval[ym]", data_type: Interval(YearMonth), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "interval[dt]", data_type: Interval(DayTime), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "utf8", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "binary", data_type: Binary, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "list[u8]", data_type: List(Field { name: "item", data_type: UInt8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None }), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "list[struct<float32, int32, bool>]", data_type: List(Field { name: "struct", data_type: Struct([Field { name: "float32", data_type: UInt8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "int32", data_type: Int32, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "bool", data_type: Boolean, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }]), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }), nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "struct<dictionary<int32, utf8>>", data_type: Struct([Field { name: "dictionary<int32, utf8>", data_type: Dictionary(Int32, Utf8), nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None }]), nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "struct<int64, list[struct<date32, list[struct<>]>]>", data_type: Struct([Field { name: "int64", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "list[struct<date32, list[struct<>]>]", data_type: List(Field { name: "struct", data_type: Struct([Field { name: "date32", data_type: Date32, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "list[struct<>]", data_type: List(Field { name: "struct", data_type: Struct([]), nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None }), nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None }]), nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None }), nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None }]), nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "struct<>", data_type: Struct([]), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "dictionary<int32, utf8>", data_type: Dictionary(Int32, Utf8), nullable: true, dict_id: 123, dict_is_ordered: true, metadata: None }, Field { name: "dictionary<uint8, uint32>", data_type: Dictionary(UInt8, UInt32), nullable: true, dict_id: 123, dict_is_ordered: true, metadata: None }, Field { name: "decimal<usize, usize>", data_type: Decimal(10, 6), nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None }], metadata: {"Key": "value"} }`,
right: `Schema { fields: [Field { name: "uint8", data_type: UInt8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: Some({"k": "v"}) }, Field { name: "uint16", data_type: UInt16, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "uint32", data_type: UInt32, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "uint64", data_type: UInt64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "int8", data_type: Int8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "int16", data_type: Int16, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "int32", data_type: Int32, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "int64", data_type: Int64, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "float16", data_type: Float16, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "float32", data_type: Float32, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "float64", data_type: Float64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "null", data_type: Null, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "bool", data_type: Boolean, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "date32", data_type: Date32, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "date64", data_type: Date64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "time32[s]", data_type: Time32(Second), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "time32[ms]", data_type: Time32(Millisecond), nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "time64[us]", data_type: Time64(Microsecond), nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "time64[ns]", data_type: Time64(Nanosecond), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "timestamp[s]", data_type: Timestamp(Second, None), nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "timestamp[ms]", data_type: Timestamp(Millisecond, None), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "timestamp[us]", data_type: Timestamp(Microsecond, Some("Africa/Johannesburg")), nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "timestamp[ns]", data_type: Timestamp(Nanosecond, None), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "interval[ym]", data_type: Interval(YearMonth), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "interval[dt]", data_type: Interval(DayTime), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "utf8", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "binary", data_type: Binary, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "list[u8]", data_type: List(Field { name: "item", data_type: UInt8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None }), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "list[struct<float32, int32, bool>]", data_type: List(Field { name: "struct", data_type: Struct([Field { name: "float32", data_type: UInt8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "int32", data_type: Int32, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "bool", data_type: Boolean, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }]), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }), nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "struct<dictionary<int32, utf8>>", data_type: Struct([Field { name: "dictionary<int32, utf8>", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None }]), nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "struct<int64, list[struct<date32, list[struct<>]>]>", data_type: Struct([Field { name: "int64", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "list[struct<date32, list[struct<>]>]", data_type: List(Field { name: "struct", data_type: Struct([Field { name: "date32", data_type: Date32, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "list[struct<>]", data_type: List(Field { name: "struct", data_type: Struct([]), nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None }), nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None }]), nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None }), nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None }]), nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "struct<>", data_type: Struct([]), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "dictionary<int32, utf8>", data_type: Dictionary(Int32, Utf8), nullable: true, dict_id: 123, dict_is_ordered: true, metadata: None }, Field { name: "dictionary<uint8, uint32>", data_type: Dictionary(UInt8, UInt32), nullable: true, dict_id: 123, dict_is_ordered: true, metadata: None }, Field { name: "decimal<usize, usize>", data_type: Decimal(10, 6), nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None }], metadata: {"Key": "value"} }`', arrow/src/ipc/convert.rs:864:9
stack backtrace:
0: rust_begin_unwind
at /rustc/09c42c45858d5f3aedfa670698275303a3d19afa/library/std/src/panicking.rs:517:5
1: core::panicking::panic_fmt
at /rustc/09c42c45858d5f3aedfa670698275303a3d19afa/library/core/src/panicking.rs:101:14
2: core::panicking::assert_failed_inner
at /rustc/09c42c45858d5f3aedfa670698275303a3d19afa/library/core/src/panicking.rs:183:17
3: core::panicking::assert_failed
at /rustc/09c42c45858d5f3aedfa670698275303a3d19afa/library/core/src/panicking.rs:140:5
4: arrow::ipc::convert::tests::convert_schema_round_trip
at ./src/ipc/convert.rs:864:9
5: arrow::ipc::convert::tests::convert_schema_round_trip::{{closure}}
at ./src/ipc/convert.rs:707:5
6: core::ops::function::FnOnce::call_once
at /rustc/09c42c45858d5f3aedfa670698275303a3d19afa/library/core/src/ops/function.rs:227:5
7: core::ops::function::FnOnce::call_once
at /rustc/09c42c45858d5f3aedfa670698275303a3d19afa/library/core/src/ops/function.rs:227:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
* fix(ipc): Support serializing structs containing dictionaries Dictionary fields nested in structs were not properly marked as dictionary fields when serializing to fb. * style: cargo fmt
…865) * fix(ipc): Support serializing structs containing dictionaries Dictionary fields nested in structs were not properly marked as dictionary fields when serializing to fb. * style: cargo fmt Co-authored-by: Helgi Kristvin Sigurbjarnarson <helgikrs@gmail.com>
Which issue does this PR close?
Closes #847 and is a part of #846.
What changes are included in this PR?
Dictionary fields nested in structs are not properly marked as dictionary fields when serializing a schema to FB. This PR fixes that issue and adds a new test case.