
Incorrect nullable flag when reading maps ( test_read_maps fails when force_validate is active) #1587

Closed
jhorstmann opened this issue Apr 18, 2022 · 1 comment · Fixed by #1592
Labels
bug parquet Changes to the parquet crate

Comments

@jhorstmann
Contributor

Describe the bug

The validation fails when comparing the nullable flag of a MapArray data field.

---- arrow::arrow_reader::tests::test_read_maps stdout ----
thread 'arrow::arrow_reader::tests::test_read_maps' panicked at 'called `Result::unwrap()` on an `Err` value: InvalidArgumentError("Child type mismatch for Struct([Field { name: \"key\", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: \"value\", data_type: Map(Field { name: \"key_value\", data_type: Struct([Field { name: \"key\", data_type: Int32, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: \"value\", data_type: Boolean, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None }]), nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None }, false), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }]). Expected Map(Field { name: \"key_value\", data_type: Struct([Field { name: \"key\", data_type: Int32, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: \"value\", data_type: Boolean, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None }]), nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None }, false) but child data had Map(Field { name: \"key_value\", data_type: Struct([Field { name: \"key\", data_type: Int32, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: \"value\", data_type: Boolean, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None }]), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, false)")', arrow/src/array/data.rs:308:34
stack backtrace:
   0: rust_begin_unwind
             at /rustc/4ce3749235fc31d15ebd444b038a9877e8c700d7/library/std/src/panicking.rs:584:5
   1: core::panicking::panic_fmt
             at /rustc/4ce3749235fc31d15ebd444b038a9877e8c700d7/library/core/src/panicking.rs:143:14
   2: core::result::unwrap_failed
             at /rustc/4ce3749235fc31d15ebd444b038a9877e8c700d7/library/core/src/result.rs:1749:5
   3: core::result::Result<T,E>::unwrap
             at /rustc/4ce3749235fc31d15ebd444b038a9877e8c700d7/library/core/src/result.rs:1065:23
   4: arrow::array::data::ArrayData::new_unchecked
             at /home/jhorstmann/Source/github/apache/arrow-rs/arrow/src/array/data.rs:308:9
   5: arrow::array::data::ArrayDataBuilder::build_unchecked
             at /home/jhorstmann/Source/github/apache/arrow-rs/arrow/src/array/data.rs:1446:9
   6: <parquet::arrow::array_reader::map_array::MapArrayReader as parquet::arrow::array_reader::ArrayReader>::next_batch
             at ./src/arrow/array_reader/map_array.rs:106:35
   7: <parquet::arrow::array_reader::StructArrayReader as parquet::arrow::array_reader::ArrayReader>::next_batch::{{closure}}
             at ./src/arrow/array_reader.rs:719:27
   8: core::iter::adapters::map::map_try_fold::{{closure}}
             at /rustc/4ce3749235fc31d15ebd444b038a9877e8c700d7/library/core/src/iter/adapters/map.rs:91:28
   9: core::iter::traits::iterator::Iterator::try_fold
             at /rustc/4ce3749235fc31d15ebd444b038a9877e8c700d7/library/core/src/iter/traits/iterator.rs:2109:21
  10: <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::try_fold
             at /rustc/4ce3749235fc31d15ebd444b038a9877e8c700d7/library/core/src/iter/adapters/map.rs:117:9
  11: <parquet::arrow::array_reader::StructArrayReader as parquet::arrow::array_reader::ArrayReader>::next_batch
             at ./src/arrow/array_reader.rs:716:30
  12: <parquet::arrow::arrow_reader::ParquetRecordBatchReader as core::iter::traits::iterator::Iterator>::next
             at ./src/arrow/arrow_reader.rs:229:15
  13: parquet::arrow::arrow_reader::tests::test_read_maps
             at ./src/arrow/arrow_reader.rs:1066:22
  14: parquet::arrow::arrow_reader::tests::test_read_maps::{{closure}}
             at ./src/arrow/arrow_reader.rs:1056:5
  15: core::ops::function::FnOnce::call_once
             at /rustc/4ce3749235fc31d15ebd444b038a9877e8c700d7/library/core/src/ops/function.rs:227:5
  16: core::ops::function::FnOnce::call_once
             at /rustc/4ce3749235fc31d15ebd444b038a9877e8c700d7/library/core/src/ops/function.rs:227:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

To Reproduce

$ RUST_BACKTRACE=1 cargo test --features force_validate -- test_read_map

Additional context
I haven't looked deeper into this. My first idea was that the validation should perhaps use DataType::equals_datatype for the comparison, but that also compares the nullable flag.

@viirya
Member

viirya commented Apr 20, 2022

According to the schema, the inner map's data type has a key_value field whose nullable flag is false:

Map(Field { name: "key_value", data_type: Struct([Field { name: "key", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "value", data_type: Map(Field { name: "key_value", data_type: Struct([Field { name: "key", data_type: Int32, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "value", data_type: Boolean, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None }]), nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None }, false), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }]), nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None }, false)

However, visit_map uses Type.is_optional as the nullable flag for the inner map, so it picks up the optionality of the field that wraps the map (i.e., the value field of the top-level map), and that is true.

That is why the map type of the child array data differs from the schema in its nullable flag.
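To make the mismatch concrete, here is a minimal sketch using simplified stand-ins for the types involved. The `DataType` and `Field` definitions below are hypothetical and only mirror the shape printed in the panic message; they are not the arrow crate's real types, and `inner_map` and `mismatch_demo` are illustrative helpers, not code from `visit_map`. The sketch assumes the comparison is a plain structural equality that includes the nullable flag, as the error message suggests.

```rust
// Simplified, hypothetical stand-ins for arrow's DataType and Field.
// Equality is structural and includes the `nullable` flag.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int32,
    Boolean,
    Struct(Vec<Field>),
    // (entries field, keys_sorted)
    Map(Box<Field>, bool),
}

#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: DataType,
    nullable: bool,
}

impl Field {
    fn new(name: &str, data_type: DataType, nullable: bool) -> Self {
        Field { name: name.to_string(), data_type, nullable }
    }
}

/// Builds the inner Map<Int32, Boolean> type, parameterised by the
/// nullability assigned to its `key_value` entries field.
fn inner_map(key_value_nullable: bool) -> DataType {
    let entries = DataType::Struct(vec![
        Field::new("key", DataType::Int32, false),
        Field::new("value", DataType::Boolean, false),
    ]);
    DataType::Map(
        Box::new(Field::new("key_value", entries, key_value_nullable)),
        false,
    )
}

/// Returns (type built from the wrapping field's optionality == schema type,
///          type built with a non-nullable `key_value` == schema type).
fn mismatch_demo() -> (bool, bool) {
    // The schema says `key_value` is not nullable.
    let expected = inner_map(false);

    // The outer `value` field that wraps the inner map is optional.
    let wrapping_field_is_optional = true;

    // Reusing the wrapping field's optionality flips the inner flag to true.
    let from_wrapping_field = inner_map(wrapping_field_is_optional);

    // Using the schema's own flag for `key_value` keeps the types equal.
    let from_schema = inner_map(false);

    (expected == from_wrapping_field, expected == from_schema)
}

fn main() {
    let (wrapping_matches, schema_matches) = mismatch_demo();
    println!("built from wrapping field matches schema: {}", wrapping_matches);
    println!("built from schema flag matches schema: {}", schema_matches);
}
```

In this model the two types differ only in the `nullable` flag of `key_value`, which is the same difference shown between the "Expected" and "child data had" types in the panic message.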

@alamb changed the title from "test_read_maps fails when force_validate is active" to "Incorrect nullable flag when reading maps ( test_read_maps fails when force_validate is active)" on Apr 27, 2022
@alamb alamb added parquet Changes to the parquet crate arrow Changes to the arrow crate and removed arrow Changes to the arrow crate labels Apr 27, 2022