[SPARK-27249][ML] Developers API for Transformers beyond UnaryTransformer #28492
|
Can one of the admins verify this patch? |
|
This is what VectorTransformer is for in general. What's the use case in Spark? |
Please review the thread in SPARK-27249. A way to perform more complex transforms was requested that could act on multiple columns in a |
|
Yes I saw it, but is it needed by anything in Spark? |
|
I'm not in a position to comment further, but I'll ask the requester to chime in. I did post twice on spark-dev inquiring about the need and whether or not existing APIs covered this (nobody replied). Pinged the requester in the Jira ticket. I'm not aware of how the custom user |
|
@srowen Please review Everett Rush's comment in the ticket today and advise. |
|
All of this seems not-crazy but not necessary in Spark, and I do not see there's much general use case for it. Users can implement custom Transformers exactly as they need. |
|
That makes sense. I did inquire on the dev list before starting to work on this Jira ticket (nobody replied). Maybe it would be helpful if superfluous Jira tickets could be periodically closed. |
|
They are, very asynchronously, as are PRs. That's good, but there is also pushback on prematurely closing tickets - hey, what do you mean you don't like my idea, why won't you let someone work on it, etc. I favor earlier closing, as it's at least decisive if not guaranteed to be correct, and at least it's reversible. |
What changes were proposed in this pull request?
Adding class MultiTransformer to allow encapsulation of a sequence of transforms.
Why are the changes needed?
Requested in SPARK-27249 as the current API is limiting.
Does this PR introduce any user-facing change?
No
How was this patch tested?
By running the new example