-
Notifications
You must be signed in to change notification settings - Fork 28.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-24513][ML] Attribute support in UnaryTransformer #21525
Conversation
This comment has been minimized.
This comment has been minimized.
If needed, I can propose a draft version of SPARK-13998 implemented on top of this work. |
@jkbradley Excuse me. Could you have a look when you are free? Thanks. |
retest this please |
This comment has been minimized.
This comment has been minimized.
18b5b25
to
cfeae4d
Compare
Test build #95732 has finished for PR 21525 at commit
|
val transformUDF = udf(this.createTransformFunc, outputDataType) | ||
dataset.withColumn($(outputCol), transformUDF(dataset($(inputCol)))) | ||
val metadata = outputMetadata(outputSchema, dataset) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
HashingTF
is an example that the metadata is created in transformSchema
and attached to outputSchema
. So my question is, do we need an extra API outputMetadata
to do this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the late reply. Here is the answer: because the ultimate goal is to make HashingTF
to extend UnaryTransformer
, not just attaching attribute. Yes, you are right, HashingTF
is an example of how metadata is created and attached to outputSchema
. However, we need a method to wrap that metadata routine to replace HashingTF extends Transformer with HasInputCol with HasOutputCol
into HashingTF extends UnaryTransformer
. It's why. (Please refer Joseph K. Bradley's comment at SPARK-13998)
cfeae4d
to
09cbbb1
Compare
@dongjoon-hyun I just rebased the PR against the latest master. Could you have a look when you are free? And, could you assign the JIRA ticket to me? |
Test build #107807 has finished for PR 21525 at commit
|
Sorry for being late, @dongjinleekr . In fact, this is not good at ML module. |
Can one of the admins verify this patch? |
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. |
What changes were proposed in this pull request?
This PR adds Metadata support in
UnaryTransformer
, as a preliminary work of SPARK-13998 and SPARK-13964.How was this patch tested?
unit test:
build/mvn -Dtest=none -DwildcardSuites=org.apache.spark.ml.feature.* test