-
Notifications
You must be signed in to change notification settings - Fork 3.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Python] pa.compute.is_null()
returns incorrect answer for dense union arrays and segfaults for dense union scalars
#34315
Comments
pa.compute.is_null()
returns incorrect answer for dense union arrays and segfaults for dense union scalarspa.compute.is_null()
returns incorrect answer for dense union arrays and segfaults for dense union scalars
@dannygoldstein thanks for the report! That's indeed a bug. The problem is that for a union array, there is no top-level validity, but this is defined by the validity bitmaps of its child arrays. But the The problem is also that the
|
thanks for the quick response @jorisvandenbossche! and thanks also for all the great work on arrow. it is an awesome package :) |
Looking at the kernel it seems both problems are there. It does indeed shortcut based on |
…itmaps like Unions and Run-End Encoded (#34408) Bonus: add `ArrayData::IsValid()` to make it consistent with `Array` and `ArraySpan`. ### Rationale for this change This is the proposed fix to #34361 plus the addition of more APIs dealing with validity/nullity. ### What changes are included in this PR? This PR changes the behavior of `IsNull` and `IsValid` in `Array`, `ArrayData`, and `ArraySpan`. It preserves the behavior of `MayHaveNulls`, `GetNullCount` and introduces new member functions to `Array`, `ArrayData`, and `ArraySpan`: - `bool HasValidityBitmap() const` - `bool MayHaveLogicalNulls() const` - `int64_t ComputeLogicalNullCount() const` ### Are these changes tested? Yes. ### Are there any user-facing changes? Yes. See above. Breakage with these changes can only happen if users rely on `IsNull(i)` always returning `true` for union types, but we have users reporting that the current behavior or broken #34315. This is why the behavior of `IsNull` and `IsValid` is changing., **This PR contains a "Critical Fix".** * Closes: #34361 Authored-by: Felipe Oliveira Carvalho <felipekde@gmail.com> Signed-off-by: Matt Topol <zotthewizard@gmail.com>
…ded logical nulls
…hout bitmaps like Unions and Run-End Encoded (apache#34408) Bonus: add `ArrayData::IsValid()` to make it consistent with `Array` and `ArraySpan`. ### Rationale for this change This is the proposed fix to apache#34361 plus the addition of more APIs dealing with validity/nullity. ### What changes are included in this PR? This PR changes the behavior of `IsNull` and `IsValid` in `Array`, `ArrayData`, and `ArraySpan`. It preserves the behavior of `MayHaveNulls`, `GetNullCount` and introduces new member functions to `Array`, `ArrayData`, and `ArraySpan`: - `bool HasValidityBitmap() const` - `bool MayHaveLogicalNulls() const` - `int64_t ComputeLogicalNullCount() const` ### Are these changes tested? Yes. ### Are there any user-facing changes? Yes. See above. Breakage with these changes can only happen if users rely on `IsNull(i)` always returning `true` for union types, but we have users reporting that the current behavior or broken apache#34315. This is why the behavior of `IsNull` and `IsValid` is changing., **This PR contains a "Critical Fix".** * Closes: apache#34361 Authored-by: Felipe Oliveira Carvalho <felipekde@gmail.com> Signed-off-by: Matt Topol <zotthewizard@gmail.com>
…hout bitmaps like Unions and Run-End Encoded (apache#34408) Bonus: add `ArrayData::IsValid()` to make it consistent with `Array` and `ArraySpan`. ### Rationale for this change This is the proposed fix to apache#34361 plus the addition of more APIs dealing with validity/nullity. ### What changes are included in this PR? This PR changes the behavior of `IsNull` and `IsValid` in `Array`, `ArrayData`, and `ArraySpan`. It preserves the behavior of `MayHaveNulls`, `GetNullCount` and introduces new member functions to `Array`, `ArrayData`, and `ArraySpan`: - `bool HasValidityBitmap() const` - `bool MayHaveLogicalNulls() const` - `int64_t ComputeLogicalNullCount() const` ### Are these changes tested? Yes. ### Are there any user-facing changes? Yes. See above. Breakage with these changes can only happen if users rely on `IsNull(i)` always returning `true` for union types, but we have users reporting that the current behavior or broken apache#34315. This is why the behavior of `IsNull` and `IsValid` is changing., **This PR contains a "Critical Fix".** * Closes: apache#34361 Authored-by: Felipe Oliveira Carvalho <felipekde@gmail.com> Signed-off-by: Matt Topol <zotthewizard@gmail.com>
…hout bitmaps like Unions and Run-End Encoded (apache#34408) Bonus: add `ArrayData::IsValid()` to make it consistent with `Array` and `ArraySpan`. ### Rationale for this change This is the proposed fix to apache#34361 plus the addition of more APIs dealing with validity/nullity. ### What changes are included in this PR? This PR changes the behavior of `IsNull` and `IsValid` in `Array`, `ArrayData`, and `ArraySpan`. It preserves the behavior of `MayHaveNulls`, `GetNullCount` and introduces new member functions to `Array`, `ArrayData`, and `ArraySpan`: - `bool HasValidityBitmap() const` - `bool MayHaveLogicalNulls() const` - `int64_t ComputeLogicalNullCount() const` ### Are these changes tested? Yes. ### Are there any user-facing changes? Yes. See above. Breakage with these changes can only happen if users rely on `IsNull(i)` always returning `true` for union types, but we have users reporting that the current behavior or broken apache#34315. This is why the behavior of `IsNull` and `IsValid` is changing., **This PR contains a "Critical Fix".** * Closes: apache#34361 Authored-by: Felipe Oliveira Carvalho <felipekde@gmail.com> Signed-off-by: Matt Topol <zotthewizard@gmail.com>
…ded logical nulls
Describe the bug, including details regarding any error messages, version, and platform.
In pyarrow version 11.0.0 and 10.0.1, if I create a dense array with some null elements,
pa.compute.is_null()
returns that they are not null. Repro:Illustration of the first issue:
Illustration of the second issue:
I do
pa.compute.is_null()
on a null element of the array, i get a segfault:Component(s)
Python
The text was updated successfully, but these errors were encountered: