feat(schema): allow null fields to be merged with other datatypes #4902

Merged
merged 1 commit into from
Oct 9, 2023

@kskalski kskalski commented Oct 9, 2023

Which issue does this PR close?

Closes #4901

Rationale for this change

Any field can be marked nullable, so a field of any data type can easily be made compatible with a field of DataType::Null.

What changes are included in this PR?

Handle DataType::Null in schema field merge

Are there any user-facing changes?

Merging a field of DataType::Null with a field of another data type no longer returns an error.
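
To illustrate the intended rule, here is a minimal, self-contained sketch. It does not use the arrow crate: `ToyType`, `ToyField`, and this `try_merge` are hypothetical stand-ins, and the choice to mark the merged field nullable and adopt the non-null type is an assumption drawn from the rationale above, not a copy of the actual implementation.

```rust
// Simplified stand-ins for illustration only; these are NOT the arrow-rs types.
#[derive(Debug, Clone, PartialEq)]
enum ToyType {
    Null,
    Int32,
    Utf8,
}

#[derive(Debug, Clone, PartialEq)]
struct ToyField {
    name: String,
    data_type: ToyType,
    nullable: bool,
}

impl ToyField {
    fn new(name: &str, data_type: ToyType, nullable: bool) -> Self {
        ToyField { name: name.to_string(), data_type, nullable }
    }

    /// Merge `from` into `self`, treating Null as compatible with any type.
    fn try_merge(&mut self, from: &ToyField) -> Result<(), String> {
        if self.name != from.name {
            return Err(format!("name mismatch: {} vs {}", self.name, from.name));
        }
        if self.data_type != from.data_type {
            if from.data_type == ToyType::Null {
                // Assumption: keep our type, but the merged field must admit nulls.
                self.nullable = true;
            } else if self.data_type == ToyType::Null {
                // Assumption: adopt the concrete type and remain nullable.
                self.data_type = from.data_type.clone();
                self.nullable = true;
            } else {
                return Err(format!(
                    "type mismatch: {:?} vs {:?}",
                    self.data_type, from.data_type
                ));
            }
        }
        // The merged field is nullable if either side is nullable.
        self.nullable |= from.nullable;
        Ok(())
    }
}
```

In this sketch, merging a non-nullable `Int32` field with a `Null` field of the same name yields a nullable `Int32` field, while merging `Int32` with `Utf8` still fails.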

@github-actions github-actions bot added the arrow Changes to the arrow crate label Oct 9, 2023
@@ -494,7 +497,9 @@ impl Field {
| DataType::LargeUtf8
| DataType::Decimal128(_, _)
| DataType::Decimal256(_, _) => {
if self.data_type != from.data_type {
if from.data_type == DataType::Null {

@tustvold tustvold Oct 9, 2023

I'm not sure about this change as a NullArray is a physically distinct type and is therefore not inherently compatible with any other types. It can be coerced to the same type, via the cast kernels, but it is not inherently compatible which is the notion encoded in this method.

Perhaps https://docs.rs/arrow-cast/latest/arrow_cast/cast/fn.can_cast_types.html can meet your requirements?

kskalski commented Oct 9, 2023

I see. In the C++ API this behavior is guarded by an options flag (which defaults to true): https://arrow.apache.org/docs/cpp/api/datatype.html#_CPPv4N5arrow5Field12MergeOptionsE
So the question is whether this PR needs to add options to the API to allow disabling the feature, or whether this could be the only behavior until options are added.

@tustvold tustvold left a comment

Revisiting this again, I was conflating this API with Schema::contains and friends. This change makes sense to me, as this method isn't actually returning a notion of interchangeability; in fact, it will even re-order struct fields, which is something the cast kernel can't currently handle (#4908).

@tustvold tustvold merged commit 2af5163 into apache:master Oct 9, 2023
25 checks passed
Development

Successfully merging this pull request may close these issues.

Allow schema fields to merge with Null datatype
2 participants