-
Notifications
You must be signed in to change notification settings - Fork 3.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[C++] Handling "logical" nulls: add GetLogicalNullCount / update IsNull() to check for logical null #34361
Labels
Milestone
Comments
jorisvandenbossche
added
Component: C++
Breaking Change
Includes a breaking change to the API
labels
Feb 27, 2023
take |
Might make sense to do the same for Go too, to keep parity in features. |
zeroshade
pushed a commit
that referenced
this issue
Apr 6, 2023
…itmaps like Unions and Run-End Encoded (#34408) Bonus: add `ArrayData::IsValid()` to make it consistent with `Array` and `ArraySpan`. ### Rationale for this change This is the proposed fix to #34361 plus the addition of more APIs dealing with validity/nullity. ### What changes are included in this PR? This PR changes the behavior of `IsNull` and `IsValid` in `Array`, `ArrayData`, and `ArraySpan`. It preserves the behavior of `MayHaveNulls`, `GetNullCount` and introduces new member functions to `Array`, `ArrayData`, and `ArraySpan`: - `bool HasValidityBitmap() const` - `bool MayHaveLogicalNulls() const` - `int64_t ComputeLogicalNullCount() const` ### Are these changes tested? Yes. ### Are there any user-facing changes? Yes. See above. Breakage with these changes can only happen if users rely on `IsNull(i)` always returning `true` for union types, but we have users reporting that the current behavior or broken #34315. This is why the behavior of `IsNull` and `IsValid` is changing., **This PR contains a "Critical Fix".** * Closes: #34361 Authored-by: Felipe Oliveira Carvalho <felipekde@gmail.com> Signed-off-by: Matt Topol <zotthewizard@gmail.com>
3 tasks
liujiacheng777
pushed a commit
to LoongArch-Python/arrow
that referenced
this issue
May 11, 2023
…hout bitmaps like Unions and Run-End Encoded (apache#34408) Bonus: add `ArrayData::IsValid()` to make it consistent with `Array` and `ArraySpan`. ### Rationale for this change This is the proposed fix to apache#34361 plus the addition of more APIs dealing with validity/nullity. ### What changes are included in this PR? This PR changes the behavior of `IsNull` and `IsValid` in `Array`, `ArrayData`, and `ArraySpan`. It preserves the behavior of `MayHaveNulls`, `GetNullCount` and introduces new member functions to `Array`, `ArrayData`, and `ArraySpan`: - `bool HasValidityBitmap() const` - `bool MayHaveLogicalNulls() const` - `int64_t ComputeLogicalNullCount() const` ### Are these changes tested? Yes. ### Are there any user-facing changes? Yes. See above. Breakage with these changes can only happen if users rely on `IsNull(i)` always returning `true` for union types, but we have users reporting that the current behavior or broken apache#34315. This is why the behavior of `IsNull` and `IsValid` is changing., **This PR contains a "Critical Fix".** * Closes: apache#34361 Authored-by: Felipe Oliveira Carvalho <felipekde@gmail.com> Signed-off-by: Matt Topol <zotthewizard@gmail.com>
ArgusLi
pushed a commit
to Bit-Quill/arrow
that referenced
this issue
May 15, 2023
…hout bitmaps like Unions and Run-End Encoded (apache#34408) Bonus: add `ArrayData::IsValid()` to make it consistent with `Array` and `ArraySpan`. ### Rationale for this change This is the proposed fix to apache#34361 plus the addition of more APIs dealing with validity/nullity. ### What changes are included in this PR? This PR changes the behavior of `IsNull` and `IsValid` in `Array`, `ArrayData`, and `ArraySpan`. It preserves the behavior of `MayHaveNulls`, `GetNullCount` and introduces new member functions to `Array`, `ArrayData`, and `ArraySpan`: - `bool HasValidityBitmap() const` - `bool MayHaveLogicalNulls() const` - `int64_t ComputeLogicalNullCount() const` ### Are these changes tested? Yes. ### Are there any user-facing changes? Yes. See above. Breakage with these changes can only happen if users rely on `IsNull(i)` always returning `true` for union types, but we have users reporting that the current behavior or broken apache#34315. This is why the behavior of `IsNull` and `IsValid` is changing., **This PR contains a "Critical Fix".** * Closes: apache#34361 Authored-by: Felipe Oliveira Carvalho <felipekde@gmail.com> Signed-off-by: Matt Topol <zotthewizard@gmail.com>
rtpsw
pushed a commit
to rtpsw/arrow
that referenced
this issue
May 16, 2023
…hout bitmaps like Unions and Run-End Encoded (apache#34408) Bonus: add `ArrayData::IsValid()` to make it consistent with `Array` and `ArraySpan`. ### Rationale for this change This is the proposed fix to apache#34361 plus the addition of more APIs dealing with validity/nullity. ### What changes are included in this PR? This PR changes the behavior of `IsNull` and `IsValid` in `Array`, `ArrayData`, and `ArraySpan`. It preserves the behavior of `MayHaveNulls`, `GetNullCount` and introduces new member functions to `Array`, `ArrayData`, and `ArraySpan`: - `bool HasValidityBitmap() const` - `bool MayHaveLogicalNulls() const` - `int64_t ComputeLogicalNullCount() const` ### Are these changes tested? Yes. ### Are there any user-facing changes? Yes. See above. Breakage with these changes can only happen if users rely on `IsNull(i)` always returning `true` for union types, but we have users reporting that the current behavior or broken apache#34315. This is why the behavior of `IsNull` and `IsValid` is changing., **This PR contains a "Critical Fix".** * Closes: apache#34361 Authored-by: Felipe Oliveira Carvalho <felipekde@gmail.com> Signed-off-by: Matt Topol <zotthewizard@gmail.com>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
There are some data types where the nulls are not stored "physically" using a validity bitmap on the parent ArrayData, but through nulls in child data:
(sidenote: Dictionary arrays could be considered here as well, but are a bit a mixed bag: there is a top-level null count through nulls in the indices, but additionally also the dictionary can contain nulls. So nulls can be encoded in two different ways)
The format specification has a "null_count" (in the IPC
FieldNode
in the Recordbatch message, and in the C Data Interface), and in those cases this refers to the "physical" null count. And this is followed by the C++ implementation, where the baseArray::null_count()
(implemented byArrayData::GetNullCount()
) looks at the validity buffer (typically the first buffer) to count the the unset bits, or directly return 0 if there is no validity buffer.However, in practice you often want to know if there are actual "logical" nulls (not considering those leads to bugs, for example #34315).
@felipecrv @westonpace and I had some discussion about this on zulip (https://ursalabs.zulipchat.com/#narrow/stream/180245-dev/topic/Null.20count.20of.20a.20UnionArray/near/336538844), and I think our current idea would be:
Add a
GetLogicalNullCount
to complement the existingArray::null_count()
/ArrayData::GetNullCount()
(changingnull_count()
itself might be too much of a breaking change? And would also create an inconsistency with where this is used in the specs)Change
Array::IsNull(i)
to consider logical nulls instead of just physical nullsThe text was updated successfully, but these errors were encountered: