[SPARK-24513][ML] Attribute support in UnaryTransformer #21525

Closed

dongjinleekr
Contributor

What changes were proposed in this pull request?

This PR adds Metadata support to UnaryTransformer, as preliminary work for SPARK-13998 and SPARK-13964.

How was this patch tested?

unit test: build/mvn -Dtest=none -DwildcardSuites=org.apache.spark.ml.feature.* test

@dongjinleekr
Contributor Author

If needed, I can propose a draft version of SPARK-13998 implemented on top of this work.

@dongjinleekr
Contributor Author

dongjinleekr commented Jun 12, 2018

@jkbradley Excuse me. Could you have a look when you are free? Thanks.

@HyukjinKwon
Member

retest this please

@SparkQA

SparkQA commented Sep 5, 2018

Test build #95732 has finished for PR 21525 at commit cfeae4d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

val transformUDF = udf(this.createTransformFunc, outputDataType)
dataset.withColumn($(outputCol), transformUDF(dataset($(inputCol))))
val metadata = outputMetadata(outputSchema, dataset)
Member

HashingTF is an example where the metadata is created in transformSchema and attached to outputSchema. So my question is: do we need an extra API, outputMetadata, to do this?
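
For reference, the pattern described in this comment can be sketched at the schema level. This is a minimal illustration under stated assumptions, not HashingTF's actual source: `AttributeSchemaSketch`, `appendFeatures`, the column names, and the feature count are hypothetical. Only `AttributeGroup` and the Spark SQL types are existing APIs.

```scala
import org.apache.spark.ml.attribute.AttributeGroup
import org.apache.spark.sql.types.{ArrayType, StringType, StructField, StructType}

// Hypothetical helper object, used only for illustration.
object AttributeSchemaSketch {

  // Shaped like a transformSchema method: append an output field whose
  // metadata records how many features the output vector will hold.
  def appendFeatures(schema: StructType, outputCol: String, numFeatures: Int): StructType = {
    require(!schema.fieldNames.contains(outputCol), s"Column $outputCol already exists.")
    val attrGroup = new AttributeGroup(outputCol, numFeatures)
    // toStructField() yields a vector-typed field carrying the ML attribute metadata.
    StructType(schema.fields :+ attrGroup.toStructField())
  }

  def main(args: Array[String]): Unit = {
    val input = StructType(Seq(StructField("words", ArrayType(StringType))))
    val output = appendFeatures(input, "features", 1 << 18)
    // The vector size can be read back from the schema alone, without any data.
    println(AttributeGroup.fromStructField(output("features")).size)
  }
}
```

Nothing here needs a new API: the metadata lives entirely in the `StructField` returned from the schema step, which is the point the question raises.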

Contributor Author

Sorry for the late reply. The answer is that the ultimate goal is to make HashingTF extend UnaryTransformer, not just to attach attributes. You are right that HashingTF is an example of how metadata is created and attached to outputSchema. However, to change HashingTF extends Transformer with HasInputCol with HasOutputCol into HashingTF extends UnaryTransformer, we need a method in UnaryTransformer that wraps that metadata routine. (Please refer to Joseph K. Bradley's comment on SPARK-13998.)
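
To make the trade-off concrete, here is a rough sketch of what a UnaryTransformer subclass has to do today to attach attribute metadata using only public APIs. It is not the code from this PR and does not use the proposed outputMetadata hook. `PairVectorizer` and its squaring function are hypothetical; the sketch assumes the existing `UnaryTransformer`, `AttributeGroup`, and `Column.as(alias, metadata)` APIs.

```scala
import org.apache.spark.ml.UnaryTransformer
import org.apache.spark.ml.attribute.AttributeGroup
import org.apache.spark.ml.linalg.{Vector, Vectors}
import org.apache.spark.ml.linalg.SQLDataTypes.VectorType
import org.apache.spark.ml.util.Identifiable
import org.apache.spark.sql.{DataFrame, Dataset}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{DataType, StructType}

// Hypothetical transformer: maps a Double x to the vector (x, x * x).
class PairVectorizer(override val uid: String)
    extends UnaryTransformer[Double, Vector, PairVectorizer] {

  def this() = this(Identifiable.randomUID("pairVectorizer"))

  override protected def createTransformFunc: Double => Vector =
    (x: Double) => Vectors.dense(x, x * x)

  override protected def outputDataType: DataType = VectorType

  // Let the parent validate the input and append the output field, then
  // swap that field for one that carries the attribute metadata.
  override def transformSchema(schema: StructType): StructType = {
    val base = super.transformSchema(schema)
    val field = new AttributeGroup($(outputCol), 2).toStructField()
    StructType(base.fields.map(f => if (f.name == $(outputCol)) field else f))
  }

  // Re-alias the output column so the metadata from the schema step is
  // attached to the actual DataFrame column as well.
  override def transform(dataset: Dataset[_]): DataFrame = {
    val metadata = transformSchema(dataset.schema)($(outputCol)).metadata
    super.transform(dataset)
      .withColumn($(outputCol), col($(outputCol)).as($(outputCol), metadata))
  }
}
```

Every subclass that wants attributes would repeat both overrides. A single metadata hook in UnaryTransformer, as argued in this reply, would let subclasses supply only the metadata and leave the schema and column plumbing in the base class.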

@dongjinleekr
Contributor Author

@dongjoon-hyun I just rebased the PR against the latest master. Could you have a look when you are free? And, could you assign the JIRA ticket to me?

@SparkQA

SparkQA commented Jul 18, 2019

Test build #107807 has finished for PR 21525 at commit 09cbbb1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Member

Sorry for the delay, @dongjinleekr. In fact, I am not well versed in the ML module.
Hi, @srowen. Could you review this PR?

@AmplabJenkins

Can one of the admins verify this patch?

@github-actions

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions bot added the Stale label Jan 10, 2020
@github-actions github-actions bot closed this Jan 11, 2020