Allow sequences (tuples and lists) as pivot values argument in PySpark. #33083

Closed

@wrobell wrobell commented Jun 25, 2021

Both tuples and lists are accepted by PySpark at runtime.

@holdenk
Contributor

holdenk commented Jun 25, 2021

Jenkins ok to test

@holdenk
Contributor

holdenk commented Jun 25, 2021

This looks reasonable to me. I'm not very familiar with the typing code for Python yet, so cc @zero323.

@SparkQA

SparkQA commented Jun 25, 2021

Test build #140338 has finished for PR 33083 at commit 8187108.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jun 25, 2021

Kubernetes integration test unable to build dist.

exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44869/

@zero323
Member

zero323 commented Jun 25, 2021

Technically speaking, Sequence is more than Tuple or List (I mention that because we have quite a few cases where we explicitly restrict inputs to these two), but Py4J should be able to map any Sequence in the same way (I've done some rough testing on a custom Sequence implementation to be sure, and it seems to work fine).
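
To illustrate the point about `Sequence` being broader than `Tuple` or `List`, here is a minimal sketch in plain Python. It does not touch PySpark or Py4J. `FixedValues` and `normalize_pivot_values` are hypothetical names invented for this example, and rejecting `str` is an assumption of the sketch, not something stated in this PR.

```python
from collections import abc
from typing import Any, List, Optional, Sequence


class FixedValues(abc.Sequence):
    """Hypothetical custom sequence that is neither a list nor a tuple."""

    def __init__(self, *items: Any) -> None:
        self._items = items

    def __getitem__(self, index):
        return self._items[index]

    def __len__(self) -> int:
        return len(self._items)


def normalize_pivot_values(values: Optional[Sequence[Any]]) -> Optional[List[Any]]:
    """Hypothetical helper: accept any Sequence and copy it into a list."""
    if values is None:
        return None
    # str is also a Sequence; this sketch rejects it (an assumption, see above).
    if isinstance(values, str) or not isinstance(values, abc.Sequence):
        raise TypeError("values must be a sequence such as a list or tuple")
    return list(values)


# Lists, tuples, and the custom sequence all normalize to the same list.
as_list = normalize_pivot_values(["dotNET", "Java"])
as_tuple = normalize_pivot_values(("dotNET", "Java"))
as_custom = normalize_pivot_values(FixedValues("dotNET", "Java"))
```

A `set` is not a `Sequence`, so the helper raises `TypeError` for it, which is the kind of input a `Sequence` annotation would also flag under a type checker.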

Member

@zero323 zero323 left a comment

LGTM, subject to passing tests.

@AmplabJenkins

Can one of the admins verify this patch?

@HyukjinKwon
Member

@wrobell, can you file a JIRA (see https://spark.apache.org/contributing.html), and enable GitHub Actions in your fork repo (see https://github.com/apache/spark/pull/33083/checks?check_run_id=2913538608)?

Also, please keep the GitHub PR template (https://github.com/apache/spark/blob/master/.github/PULL_REQUEST_TEMPLATE) and format the PR title properly.

@HyukjinKwon
Member

Otherwise, looks fine to me too. I'll leave it to him.

@zero323
Member

zero323 commented Jul 8, 2021

> Otherwise, looks fine to me too. I'll leave it to him.

Sure, I'll handle this once pending comments are addressed.

@zero323
Member

zero323 commented Aug 4, 2021

Gentle ping @wrobell

@wrobell
Author

wrobell commented Aug 13, 2021

Sorry, but due to personal circumstances I will not be able to help with this for the next couple of weeks.

@dchvn
Contributor

dchvn commented Oct 26, 2021

Any update here? python/pyspark/sql/group.pyi has been removed by #34197, so can I create a JIRA ticket and a PR for this issue? @HyukjinKwon @zero323 @wrobell
Thanks!
