[SPARK-33636][PYTHON][ML][3.0] Add labelsArray to PySpark StringIndexer #30580

Closed
wants to merge 1 commit into from

viirya
Member

@viirya viirya commented Dec 3, 2020

What changes were proposed in this pull request?

This is a follow-up that adds the missing labelsArray accessor to PySpark StringIndexer.

Why are the changes needed?

labelsArray exposes the labels for the multi-column case of StringIndexer. This accessor should be available on the PySpark side too.
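To illustrate what the accessor is expected to return, here is a minimal sketch rather than the code from this patch. `expected_labels_array` is a hypothetical pure-Python model of the per-column label ordering, assuming the default `frequencyDesc` string order with alphabetical tie-breaking. `spark_labels_array` is a hypothetical helper showing how `labelsArray` might be read from a fitted multi-column indexer; it assumes a working PySpark installation that includes this change and is only defined here, not called.

```python
from collections import Counter


def expected_labels_array(rows, columns, input_cols):
    """Pure-Python model of per-column label ordering (illustrative only).

    Assumption: labels are ordered by descending frequency, with ties
    broken alphabetically. This does not run Spark.
    """
    result = []
    for col in input_cols:
        position = columns.index(col)
        counts = Counter(row[position] for row in rows)
        # Most frequent label first; equal counts fall back to string order.
        ordered = sorted(counts, key=lambda label: (-counts[label], label))
        result.append(ordered)
    return result


def spark_labels_array(spark, rows, columns, input_cols):
    """Fit a multi-column StringIndexer and read labelsArray.

    Hypothetical helper: requires pyspark and an active SparkSession.
    """
    from pyspark.ml.feature import StringIndexer  # imported lazily on purpose

    df = spark.createDataFrame(rows, columns)
    indexer = StringIndexer(
        inputCols=input_cols,
        outputCols=[col + "_idx" for col in input_cols],
    )
    model = indexer.fit(df)
    # One list of labels per input column, in the order of inputCols.
    return [list(labels) for labels in model.labelsArray]


rows = [("a", "x"), ("b", "x"), ("a", "y"), ("c", "y"), ("a", "x")]
columns = ["c1", "c2"]
labels = expected_labels_array(rows, columns, ["c1", "c2"])
```

With a SparkSession available, `spark_labels_array(spark, rows, columns, ["c1", "c2"])` would be the call to compare against the pure-Python model.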

Does this PR introduce any user-facing change?

Yes. labelsArray was missing from PySpark StringIndexer in Spark 3.0, and this change adds it.

How was this patch tested?

Unit test.

@viirya
Member Author

viirya commented Dec 3, 2020

cc @srowen

@viirya
Member Author

viirya commented Dec 3, 2020

cc @HyukjinKwon too.

@HyukjinKwon
Member

LGTM but I will leave it to @srowen.

@SparkQA

SparkQA commented Dec 3, 2020

Test build #132080 has finished for PR 30580 at commit f5faca4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 3, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/36680/

@SparkQA

SparkQA commented Dec 3, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/36680/

@viirya
Member Author

viirya commented Dec 3, 2020

Thanks. Then I will merge this to branch-3.0.

viirya added a commit that referenced this pull request Dec 3, 2020
Closes #30580 from viirya/SPARK-33636.

Authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: Liang-Chi Hsieh <viirya@gmail.com>
@viirya viirya closed this Dec 3, 2020
@viirya viirya deleted the SPARK-33636 branch December 27, 2023 18:24