Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add type_ids in Union datatype #1703

Merged
merged 5 commits into from
May 17, 2022
Merged

Add type_ids in Union datatype #1703

merged 5 commits into from
May 17, 2022

Conversation

viirya
Copy link
Member

@viirya viirya commented May 16, 2022

Which issue does this PR close?

Closes #1690.

Rationale for this change

What changes are included in this PR?

Are there any user-facing changes?

@github-actions github-actions bot added arrow Changes to the arrow crate parquet Changes to the parquet crate labels May 16, 2022
@viirya viirya added the api-change Changes to the arrow API label May 16, 2022
@codecov-commenter
Copy link

codecov-commenter commented May 16, 2022

Codecov Report

Merging #1703 (46cfbfc) into master (5b154ea) will increase coverage by 0.07%.
The diff coverage is 86.27%.

@@            Coverage Diff             @@
##           master    #1703      +/-   ##
==========================================
+ Coverage   83.25%   83.32%   +0.07%     
==========================================
  Files         195      195              
  Lines       55906    56012     +106     
==========================================
+ Hits        46544    46672     +128     
+ Misses       9362     9340      -22     
Impacted Files Coverage Δ
integration-testing/src/lib.rs 0.00% <0.00%> (ø)
parquet/src/arrow/arrow_writer.rs 97.66% <ø> (ø)
parquet/src/arrow/levels.rs 84.80% <0.00%> (ø)
parquet/src/arrow/schema.rs 96.79% <ø> (ø)
arrow/src/datatypes/datatype.rs 65.79% <75.00%> (+0.34%) ⬆️
arrow/src/datatypes/field.rs 55.87% <80.00%> (-0.45%) ⬇️
arrow/src/util/display.rs 73.17% <80.00%> (ø)
arrow/src/array/array_union.rs 90.92% <84.61%> (+0.29%) ⬆️
arrow/src/array/data.rs 84.50% <85.71%> (+1.20%) ⬆️
arrow/src/ipc/convert.rs 95.56% <95.23%> (-0.06%) ⬇️
... and 17 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5b154ea...46cfbfc. Read the comment docs.

@viirya viirya force-pushed the union_typeids branch 2 times, most recently from f2dfab6 to 29d0d16 Compare May 16, 2022 06:53
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @viirya

Would it be correct to say that this PR also "Adds IPC support for UnionArray with non increasing integer type ids" ?

@@ -115,7 +115,7 @@ pub enum DataType {
/// A nested datatype that contains a number of sub-fields.
Struct(Vec<Field>),
/// A nested datatype that can represent slots of differing types.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// A nested datatype that can represent slots of differing types.
/// A nested datatype that can represent slots of differing types. Components:
///
/// 1. [`Field`] for each possible child type the Union can hold
/// 2. The corresponding `type_id` used to identify which Field
/// 3. The type of union (Sparse or Dense)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Updated as suggested.

@@ -516,24 +516,15 @@ impl DataType {
.as_array()
.unwrap()
.iter()
.map(|t| t.as_i64().unwrap())
.map(|t| t.as_i64().unwrap() as i8)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we could call t.as_i8()?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is no as_i8() API for Value. 😢

@@ -632,39 +632,13 @@ fn array_from_json(
let array = MapArray::from(array_data);
Ok(Arc::new(array))
}
DataType::Union(fields, _) => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 This certainly makes it easier to understand

@@ -435,19 +435,10 @@ mod tests {
"my_union",
DataType::Union(
vec![
Field::new("f1", DataType::Int32, true).with_metadata(Some(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why was the metadata removed from this test? Is it simply no longer necessary?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea, it is used to store type id. Now it is stored directly in union type.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because we have no place to keep the type ids from Json, I put them in metadata in previously. Now we can keep them in union type formally.

@viirya
Copy link
Member Author

viirya commented May 16, 2022

Thanks @viirya

Would it be correct to say that this PR also "Adds IPC support for UnionArray with non increasing integer type ids" ?

Thanks @alamb for reviewing this. Yeah, that's correct. Previously we don't have type ids in Union IPC message and we assume it is always the default case.

@alamb alamb merged commit 20ffaf8 into master May 17, 2022
@viirya viirya deleted the union_typeids branch May 17, 2022 20:50
@alamb alamb changed the title Add type ids in Union datatype Add type_ids in Union datatype May 26, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-change Changes to the arrow API arrow Changes to the arrow crate parquet Changes to the parquet crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Keep type ids in Union datatype to follow Arrow spec and integrate with other implementations
3 participants