
[Python] Add unittests for converter arrays with pandas masks #29956

Closed
asfimport opened this issue Oct 19, 2021 · 3 comments


@asfimport

Cover the changes in #11465

cc @amol-

Reporter: Krisztian Szucs / @kszucs
Assignee: Alessandro Molina / @amol-


Note: This issue was originally created as ARROW-14388. Please see the migration documentation for further details.

@asfimport

Alessandro Molina / @amol-:
@jorisvandenbossche @kszucs I was able to reproduce the segfault using the provided test. To confirm that the test really covers the issue, I reverted the changes in #11465 and the test triggered the segfault.

I also added a check that replaces the segfault with a proper Invalid("Invalid mask type") error, and verified that it does prevent the crash, so that future regressions are caught.

I couldn't find a way to trigger that error with the current codebase on master, so it ends up not being covered by a test. That's because, with the current codebase, anything that is not a numpy.array gets converted to one, so there is no way to end up in that situation normally. Ideally it's the kind of issue I would simulate by monkeypatching, but given that everything runs within Cython I can't monkeypatch get_values. In any case, the check is there and should prevent us from reintroducing the same issue in the future.
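To illustrate why the guard is unreachable in normal use, here is a minimal pure-Python sketch of the pattern described above. It is not the actual Arrow code: `Invalid`, `normalize_mask`, `check_mask` and `convert_with_mask` are hypothetical names, and the `normalize` parameter exists only so the normalization step can be swapped out, which is the kind of substitution that is not possible when the step lives inside Cython.

```python
import numpy as np


class Invalid(Exception):
    """Hypothetical stand-in for the Invalid error mentioned above."""


def normalize_mask(mask):
    # Hypothetical normalization step: Series-like objects and plain
    # sequences are all turned into numpy arrays before conversion.
    if hasattr(mask, "to_numpy"):
        return mask.to_numpy()
    return np.asarray(mask)


def check_mask(mask):
    # Defensive guard: raise a clear error instead of handing an
    # unexpected object to lower-level code.
    if not isinstance(mask, np.ndarray):
        raise Invalid("Invalid mask type")
    return mask


def convert_with_mask(values, mask, normalize=normalize_mask):
    # Normalize first, then guard, then apply the mask
    # (True means the value is treated as null).
    mask = check_mask(normalize(mask))
    return [None if masked else value for value, masked in zip(values, mask)]


# Normal path: the list is normalized, so the guard never fires.
print(convert_with_mask([1, 2], [True, False]))

# Simulated regression: skipping normalization makes the guard fire.
try:
    convert_with_mask([1], [False], normalize=lambda m: m)
except Invalid as exc:
    print("caught:", exc)
```

In this sketch the guard only fires when the normalization step is bypassed, which mirrors why the real check cannot be exercised by a regular test.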

@asfimport

Alessandro Molina / @amol-:
PS: If you are wondering about the pandas.Series values, those are taken exactly from the pyspark test that triggered the segfault for me:

test_createDataFrame_column_name_encoding (pyspark.sql.tests.test_arrow.ArrowTests) ... 
ARRAY 0    1
Name: a, dtype: int64 TYPE <class 'pandas.core.series.Series'>
---
MASK 0    False
Name: a, dtype: bool TYPE <class 'pandas.core.series.Series'>


Had test failures in pyspark.sql.tests.test_arrow ArrowTests with /Users/amol/ARROW/venv/bin/python3; see logs. 

@asfimport

Joris Van den Bossche / @jorisvandenbossche:
Issue resolved by pull request 11481
#11481
