Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(ipc): Support serializing structs containing dictionaries #848

Merged
merged 2 commits into from
Oct 26, 2021

Conversation

helgikrs
Copy link
Contributor

@helgikrs helgikrs commented Oct 22, 2021

Which issue does this PR close?

Closes #847 and is a part of #846.

What changes are included in this PR?

Dictionary fields nested in structs are not properly marked as dictionary fields when serializing a schema to FB. This PR fixes that issue and adds a new test case.

Dictionary fields nested in structs were not properly marked as
dictionary fields when serializing to fb.
@github-actions github-actions bot added the arrow Changes to the arrow crate label Oct 22, 2021
@alamb alamb requested a review from nevi-me October 23, 2021 11:53
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @helgikrs -- this looks like a very nice change. @nevi-me do you have time to review this (relatively small) PR?

@alamb
Copy link
Contributor

alamb commented Oct 23, 2021

@helgikrs could you please run cargo fmt to get the Lint CI check to pass?

@codecov-commenter
Copy link

Codecov Report

Merging #848 (917a091) into master (bd9d561) will decrease coverage by 0.02%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #848      +/-   ##
==========================================
- Coverage   82.65%   82.63%   -0.03%     
==========================================
  Files         168      168              
  Lines       48081    48077       -4     
==========================================
- Hits        39742    39727      -15     
- Misses       8339     8350      +11     
Impacted Files Coverage Δ
arrow/src/ipc/convert.rs 92.87% <100.00%> (-0.07%) ⬇️
arrow/src/ipc/gen/Schema.rs 41.52% <0.00%> (-1.03%) ⬇️
parquet/src/encodings/encoding.rs 94.67% <0.00%> (+0.19%) ⬆️
arrow/src/array/equal_json.rs 85.21% <0.00%> (+0.28%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update bd9d561...917a091. Read the comment docs.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a nice improvement to me. Thank you @helgikrs

If I don't hear anything from other reviewers, I plan to merge this in tomorrow (and include it in the 6.1.0 release, slated for end of this week, early next week)

@@ -610,21 +610,7 @@ pub(crate) fn get_fb_field_type<'a>(
// struct's fields are children
let mut children = vec![];
for field in fields {
let inner_types =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case anyone else is curious, without this code change the test in this PR fails like this:

failures:

---- ipc::convert::tests::convert_schema_round_trip stdout ----
thread 'ipc::convert::tests::convert_schema_round_trip' panicked at 'assertion failed: `(left == right)`
  left: `Schema { fields: [Field { name: "uint8", data_type: UInt8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: Some({"k": "v"}) }, Field { name: "uint16", data_type: UInt16, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "uint32", data_type: UInt32, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "uint64", data_type: UInt64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "int8", data_type: Int8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "int16", data_type: Int16, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "int32", data_type: Int32, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "int64", data_type: Int64, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "float16", data_type: Float16, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "float32", data_type: Float32, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "float64", data_type: Float64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "null", data_type: Null, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "bool", data_type: Boolean, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "date32", data_type: Date32, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "date64", data_type: Date64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "time32[s]", data_type: Time32(Second), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "time32[ms]", data_type: Time32(Millisecond), nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "time64[us]", data_type: Time64(Microsecond), nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "time64[ns]", data_type: Time64(Nanosecond), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "timestamp[s]", data_type: Timestamp(Second, None), nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "timestamp[ms]", data_type: Timestamp(Millisecond, None), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "timestamp[us]", data_type: Timestamp(Microsecond, Some("Africa/Johannesburg")), nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "timestamp[ns]", data_type: Timestamp(Nanosecond, None), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "interval[ym]", data_type: Interval(YearMonth), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "interval[dt]", data_type: Interval(DayTime), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "utf8", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "binary", data_type: Binary, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "list[u8]", data_type: List(Field { name: "item", data_type: UInt8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None }), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "list[struct<float32, int32, bool>]", data_type: List(Field { name: "struct", data_type: Struct([Field { name: "float32", data_type: UInt8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "int32", data_type: Int32, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "bool", data_type: Boolean, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }]), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }), nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "struct<dictionary<int32, utf8>>", data_type: Struct([Field { name: "dictionary<int32, utf8>", data_type: Dictionary(Int32, Utf8), nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None }]), nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "struct<int64, list[struct<date32, list[struct<>]>]>", data_type: Struct([Field { name: "int64", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "list[struct<date32, list[struct<>]>]", data_type: List(Field { name: "struct", data_type: Struct([Field { name: "date32", data_type: Date32, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "list[struct<>]", data_type: List(Field { name: "struct", data_type: Struct([]), nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None }), nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None }]), nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None }), nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None }]), nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "struct<>", data_type: Struct([]), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "dictionary<int32, utf8>", data_type: Dictionary(Int32, Utf8), nullable: true, dict_id: 123, dict_is_ordered: true, metadata: None }, Field { name: "dictionary<uint8, uint32>", data_type: Dictionary(UInt8, UInt32), nullable: true, dict_id: 123, dict_is_ordered: true, metadata: None }, Field { name: "decimal<usize, usize>", data_type: Decimal(10, 6), nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None }], metadata: {"Key": "value"} }`,
 right: `Schema { fields: [Field { name: "uint8", data_type: UInt8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: Some({"k": "v"}) }, Field { name: "uint16", data_type: UInt16, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "uint32", data_type: UInt32, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "uint64", data_type: UInt64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "int8", data_type: Int8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "int16", data_type: Int16, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "int32", data_type: Int32, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "int64", data_type: Int64, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "float16", data_type: Float16, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "float32", data_type: Float32, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "float64", data_type: Float64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "null", data_type: Null, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "bool", data_type: Boolean, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "date32", data_type: Date32, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "date64", data_type: Date64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "time32[s]", data_type: Time32(Second), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "time32[ms]", data_type: Time32(Millisecond), nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "time64[us]", data_type: Time64(Microsecond), nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "time64[ns]", data_type: Time64(Nanosecond), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "timestamp[s]", data_type: Timestamp(Second, None), nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "timestamp[ms]", data_type: Timestamp(Millisecond, None), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "timestamp[us]", data_type: Timestamp(Microsecond, Some("Africa/Johannesburg")), nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "timestamp[ns]", data_type: Timestamp(Nanosecond, None), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "interval[ym]", data_type: Interval(YearMonth), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "interval[dt]", data_type: Interval(DayTime), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "utf8", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "binary", data_type: Binary, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "list[u8]", data_type: List(Field { name: "item", data_type: UInt8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None }), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "list[struct<float32, int32, bool>]", data_type: List(Field { name: "struct", data_type: Struct([Field { name: "float32", data_type: UInt8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "int32", data_type: Int32, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "bool", data_type: Boolean, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }]), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }), nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "struct<dictionary<int32, utf8>>", data_type: Struct([Field { name: "dictionary<int32, utf8>", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None }]), nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "struct<int64, list[struct<date32, list[struct<>]>]>", data_type: Struct([Field { name: "int64", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "list[struct<date32, list[struct<>]>]", data_type: List(Field { name: "struct", data_type: Struct([Field { name: "date32", data_type: Date32, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "list[struct<>]", data_type: List(Field { name: "struct", data_type: Struct([]), nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None }), nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None }]), nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None }), nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None }]), nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "struct<>", data_type: Struct([]), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "dictionary<int32, utf8>", data_type: Dictionary(Int32, Utf8), nullable: true, dict_id: 123, dict_is_ordered: true, metadata: None }, Field { name: "dictionary<uint8, uint32>", data_type: Dictionary(UInt8, UInt32), nullable: true, dict_id: 123, dict_is_ordered: true, metadata: None }, Field { name: "decimal<usize, usize>", data_type: Decimal(10, 6), nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None }], metadata: {"Key": "value"} }`', arrow/src/ipc/convert.rs:864:9
stack backtrace:
   0: rust_begin_unwind
             at /rustc/09c42c45858d5f3aedfa670698275303a3d19afa/library/std/src/panicking.rs:517:5
   1: core::panicking::panic_fmt
             at /rustc/09c42c45858d5f3aedfa670698275303a3d19afa/library/core/src/panicking.rs:101:14
   2: core::panicking::assert_failed_inner
             at /rustc/09c42c45858d5f3aedfa670698275303a3d19afa/library/core/src/panicking.rs:183:17
   3: core::panicking::assert_failed
             at /rustc/09c42c45858d5f3aedfa670698275303a3d19afa/library/core/src/panicking.rs:140:5
   4: arrow::ipc::convert::tests::convert_schema_round_trip
             at ./src/ipc/convert.rs:864:9
   5: arrow::ipc::convert::tests::convert_schema_round_trip::{{closure}}
             at ./src/ipc/convert.rs:707:5
   6: core::ops::function::FnOnce::call_once
             at /rustc/09c42c45858d5f3aedfa670698275303a3d19afa/library/core/src/ops/function.rs:227:5
   7: core::ops::function::FnOnce::call_once
             at /rustc/09c42c45858d5f3aedfa670698275303a3d19afa/library/core/src/ops/function.rs:227:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.


@alamb alamb merged commit e44f1ad into apache:master Oct 26, 2021
alamb pushed a commit that referenced this pull request Oct 26, 2021
* fix(ipc): Support serializing structs containing dictionaries

Dictionary fields nested in structs were not properly marked as
dictionary fields when serializing to fb.

* style: cargo fmt
alamb added a commit that referenced this pull request Oct 27, 2021
…865)

* fix(ipc): Support serializing structs containing dictionaries

Dictionary fields nested in structs were not properly marked as
dictionary fields when serializing to fb.

* style: cargo fmt

Co-authored-by: Helgi Kristvin Sigurbjarnarson <helgikrs@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Structs containing nested dictionaries don't serialize correctly to FB in IPC
3 participants