ARROW-14381: [CI][Python] Fix Spark integration failures #11465

Closed
wants to merge 5 commits into from

kszucs
Member

@kszucs kszucs commented Oct 19, 2021

I don't have a small reproducer, but either a pandas Series or a DataFrame gets passed as `mask` to `pa.array()`.

@kszucs
Member Author

kszucs commented Oct 19, 2021

@github-actions crossbow submit -spark-

@github-actions

⚠️ Ticket has not been started in JIRA, please click 'Start Progress'.

@kszucs
Member Author

kszucs commented Oct 19, 2021

@amol- I don't have a concise reproducer, but this has resolved the Spark tests for me locally.

@github-actions

Revision: bbd6804

Submitted crossbow builds: ursacomputing/crossbow @ actions-1009

Task Status
test-conda-python-3.7-spark-branch-3.0 Github Actions
test-conda-python-3.8-spark-master Github Actions

@kszucs
Member Author

kszucs commented Oct 19, 2021

@github-actions crossbow submit -spark-

@github-actions

Revision: 2e9d790

Submitted crossbow builds: ursacomputing/crossbow @ actions-1010

Task Status
test-conda-python-3.7-spark-branch-3.0 Github Actions
test-conda-python-3.8-spark-master Github Actions

@kszucs
Member Author

kszucs commented Oct 19, 2021

@github-actions crossbow submit -spark-

@github-actions

Revision: 06fb996

Submitted crossbow builds: ursacomputing/crossbow @ actions-1012

Task Status
test-conda-python-3.6-spark-v3.0.3 Github Actions
test-conda-python-3.7-spark-v3.1.2 Github Actions
test-conda-python-3.8-spark-v3.2.0 Github Actions
test-conda-python-3.9-spark-master Github Actions

"when converting numpy arrays")
if mask is not None:
    if _is_array_like(mask):
        mask = get_values(mask, &is_pandas_object)
Member Author

The Spark integration tests verify that this resolves the Python error, but we should cover this with unit tests.
Deferred to https://issues.apache.org/jira/browse/ARROW-14388

Member

Passing a Series or a pandas extension array here should trigger it (I suppose that's what @amol- is testing)?

@kszucs
Member Author

kszucs commented Oct 19, 2021

@BryanCutler I updated the Spark tasks to build against specific Spark releases to maintain compatibility.

@kszucs kszucs changed the title ARROW-14381: [CI] Spark integration failures ARROW-14381: [CI][Python] Spark integration failures Oct 19, 2021
Member Author

@kszucs kszucs left a comment

+1

@kszucs kszucs changed the title ARROW-14381: [CI][Python] Spark integration failures ARROW-14381: [CI][Python] Fix Spark integration failures Oct 19, 2021
@kszucs kszucs closed this in bc223c6 Oct 19, 2021
@ursabot

ursabot commented Oct 19, 2021

Benchmark runs are scheduled for baseline = 0960fa6 and contender = bc223c6. bc223c6 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Finished ⬇️0.51% ⬆️0.0%] ursa-i9-9960x
[Finished ⬇️0.58% ⬆️0.0%] ursa-thinkcentre-m75q
Supported benchmarks:
ursa-i9-9960x: langs = Python, R, JavaScript
ursa-thinkcentre-m75q: langs = C++, Java
ec2-t3-xlarge-us-east-2: cloud = True

Member

@BryanCutler BryanCutler left a comment

LGTM thanks @kszucs !

@kiszk
Member

kiszk commented Oct 20, 2021

Thanks @kszucs

ViniciusSouzaRoque pushed a commit to s1mbi0se/arrow that referenced this pull request Nov 3, 2021
I don't have a small reproducer, but either a pandas series or a dataframe gets passed as mask to `pa.array()`

Closes apache#11465 from kszucs/ARROW-14381

Authored-by: Krisztián Szűcs <szucs.krisztian@gmail.com>
Signed-off-by: Krisztián Szűcs <szucs.krisztian@gmail.com>