Exclude dict_id and dict_is_ordered from equality comparison of Field #1646

Closed
viirya opened this issue May 4, 2022 · 0 comments · Fixed by #1647
Labels
arrow Changes to the arrow crate enhancement Any new improvement worthy of a entry in the changelog

viirya commented May 4, 2022

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

This issue was discovered while debugging the generate_nested_dictionary_case failure in the C++/Rust integration tests, run with archery --debug integration --with-cpp=true --with-rust=true:

Error: ComputeError("Schemas do not match. JSON: Schema { fields: [Field { name: \"list_dict\", data_type: Dictionary(Int8, List(Field { name: \"str_dict\", data_type: Dictionary(Int8, Utf8), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None })), nullable: true, dict_id: 1, dict_is_ordered: false, metadata: None }, Field { name: \"struct_dict\", data_type: Dictionary(Int8, Struct([Field { name: \"str_dict_a\", data_type: Dictionary(Int8, Utf8), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: \"str_dict_b\", data_type: Dictionary(Int8, Utf8), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }])), nullable: true, dict_id: 2, dict_is_ordered: false, metadata: None }], metadata: {} }. Arrow: Schema { fields: [Field { name: \"list_dict\", data_type: Dictionary(Int8, List(Field { name: \"str_dict\", data_type: Dictionary(Int8, Utf8), nullable: true, dict_id: 1, dict_is_ordered: false, metadata: None })), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: \"struct_dict\", data_type: Dictionary(Int8, Struct([Field { name: \"str_dict_a\", data_type: Dictionary(Int8, Utf8), nullable: true, dict_id: 3, dict_is_ordered: false, metadata: None }, Field { name: \"str_dict_b\", data_type: Dictionary(Int8, Utf8), nullable: true, dict_id: 4, dict_is_ordered: false, metadata: None }])), nullable: true, dict_id: 2, dict_is_ordered: false, metadata: None }], metadata: {} }")

The schema of the Arrow file produced by C++ differs from the schema read from the JSON file on the Rust side only in dict_id. Once dict_id is excluded from the equality comparison of Field, the schemas and record batches are exactly the same.

The C++ implementation of Field (API doc) doesn't contain dictionary-related properties like dict_id, so its equality comparison doesn't take them into account.

In the Arrow spec, I don't see an id specified in the dictionary-encoded layout; it is only mentioned in the dictionary message in IPC. It is basically used to know where a dictionary is used in the schema. So it seems to me that dict_id doesn't need to take part in the equality comparison of Field. It only needs to be consistent between the dictionary-encoded data and the schema (so we can match the correct dictionary to the correct field). That said, it looks weird to include dict_id and dict_is_ordered in the equality comparison of Field outside of IPC messages.

Describe the solution you'd like
Exclude dict_id and dict_is_ordered from equality comparison of Field.

archery --debug integration --with-cpp=true --with-rust=true should pass with 0 failures.
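
To illustrate the proposed behavior, here is a minimal sketch of a hand-written PartialEq that ignores dict_id and dict_is_ordered. The DataType and Field types below are simplified stand-ins written for this example (metadata is left out, and only a few data types are modeled). They are not the actual arrow crate definitions, and the helper functions dict_field and list_dict are hypothetical.

```rust
// Simplified stand-ins for illustration only; NOT the real arrow-rs types.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int8,
    Utf8,
    List(Box<Field>),
    Dictionary(Box<DataType>, Box<DataType>),
}

#[allow(dead_code)]
#[derive(Debug, Clone)]
struct Field {
    name: String,
    data_type: DataType,
    nullable: bool,
    dict_id: i64,
    dict_is_ordered: bool,
}

// Equality deliberately skips dict_id and dict_is_ordered.
// Nested fields inside data_type go through this same impl,
// so their dictionary ids are ignored as well.
impl PartialEq for Field {
    fn eq(&self, other: &Self) -> bool {
        self.name == other.name
            && self.data_type == other.data_type
            && self.nullable == other.nullable
    }
}

// Hypothetical helper: a nullable Dictionary(Int8, value_type) field.
#[allow(dead_code)]
fn dict_field(name: &str, value_type: DataType, dict_id: i64) -> Field {
    Field {
        name: name.to_string(),
        data_type: DataType::Dictionary(Box::new(DataType::Int8), Box::new(value_type)),
        nullable: true,
        dict_id,
        dict_is_ordered: false,
    }
}

// Hypothetical helper mirroring the "list_dict" field from the error above:
// a dictionary of lists whose items are themselves dictionary-encoded strings.
#[allow(dead_code)]
fn list_dict(outer_id: i64, inner_id: i64) -> Field {
    let inner = dict_field("str_dict", DataType::Utf8, inner_id);
    dict_field("list_dict", DataType::List(Box::new(inner)), outer_id)
}
```

With this impl, list_dict(1, 0) (the ids on the JSON side) and list_dict(0, 1) (the ids on the Arrow file side) compare equal, while fields that differ in name, data type, or nullability still compare unequal.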

Describe alternatives you've considered

Additional context
