[Python] Add unittests for converter arrays with pandas masks #29956

asfimport · 2021-10-19T22:39:35Z

Cover the changes in #11465

Reporter: Krisztian Szucs / @kszucs
Assignee: Alessandro Molina / @amol-

PRs and other links:

GitHub Pull Request #11481

Note: This issue was originally created as ARROW-14388. Please see the migration documentation for further details.

asfimport · 2021-10-20T13:40:37Z

Alessandro Molina / @amol-:
@jorisvandenbossche @kszucs I was able to reproduce the segfault with the provided test. To confirm that the test really covers the issue, I reverted the changes in #11465 and the test triggered the segfault.

I also added an extra check to catch future regressions. I verified that it prevents the segfault and raises a proper Invalid("Invalid mask type") error instead.

I couldn't find a way to trigger that error with the current codebase on master, so the check itself is not covered by a test. That's because the current code converts anything that is not a numpy.array into one, so there is normally no way to end up in that situation. Ideally I would simulate this kind of issue by monkeypatching, but since everything runs within Cython I can't monkeypatch get_values. In any case, the check is there and should prevent us from reintroducing the same issue in the future.

asfimport · 2021-10-20T13:43:05Z

Alessandro Molina / @amol-:
PS: If you are wondering about the pandas.Series values, they are taken directly from the pyspark test that triggered the segfault for me:

test_createDataFrame_column_name_encoding (pyspark.sql.tests.test_arrow.ArrowTests) ... 
ARRAY 0    1
Name: a, dtype: int64 TYPE <class 'pandas.core.series.Series'>
---
MASK 0    False
Name: a, dtype: bool TYPE <class 'pandas.core.series.Series'>


Had test failures in pyspark.sql.tests.test_arrow ArrowTests with /Users/amol/ARROW/venv/bin/python3; see logs.

asfimport · 2021-11-03T17:13:52Z

Joris Van den Bossche / @jorisvandenbossche:
Issue resolved by pull request 11481
#11481

asfimport closed this as completed Nov 3, 2021

asfimport assigned amol- Jan 11, 2023
