@nafshartous nafshartous commented May 10, 2020

What changes were proposed in this pull request?

Adding class MultiTransformer to allow encapsulation of a sequence of transforms.
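
As a rough illustration of what "encapsulating a sequence of transforms" can mean, here is a minimal, library-free sketch. It is not the code in this PR and does not use Spark's API: the class name `MultiTransformerSketch`, the row-as-dict representation, and the helper functions `add_sum` and `add_ratio` are all hypothetical, chosen only to show steps that read and write several columns being chained behind one `transform` call.

```python
from typing import Callable, Dict, List, Sequence

# Assumption: a "row" is modeled as a plain dict of column name -> value.
Row = Dict[str, float]
# Hypothetical step type: takes a row and returns a new row.
Step = Callable[[Row], Row]


class MultiTransformerSketch:
    """Hypothetical stand-in that applies a fixed sequence of row transforms in order."""

    def __init__(self, steps: Sequence[Step]) -> None:
        self.steps = list(steps)

    def transform(self, rows: List[Row]) -> List[Row]:
        out = [dict(r) for r in rows]  # copy so the input rows are left untouched
        for step in self.steps:
            out = [step(r) for r in out]
        return out


def add_sum(row: Row) -> Row:
    # Reads two input columns and writes one new column.
    return {**row, "sum": row["a"] + row["b"]}


def add_ratio(row: Row) -> Row:
    # Depends on the column produced by the previous step.
    return {**row, "ratio": row["a"] / row["sum"]}


pipeline = MultiTransformerSketch([add_sum, add_ratio])
result = pipeline.transform([{"a": 1.0, "b": 3.0}])
```

The point of the sketch is only that later steps can rely on columns added by earlier ones, which a single-input, single-output transform does not express directly.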

Why are the changes needed?

Requested in SPARK-27249, as the current API is limiting.

Does this PR introduce any user-facing change?

No

How was this patch tested?

By running the new example

bin/run-example ml.MultiTransformerExample

@AmplabJenkins

Can one of the admins verify this patch?

@HyukjinKwon HyukjinKwon changed the title [SPARK-27249] Developers API for Transformers beyond UnaryTransformer [SPARK-27249][ML] Developers API for Transformers beyond UnaryTransformer May 11, 2020
@srowen
Member

srowen commented May 11, 2020

This is what VectorTransformer is for in general. What's the use case in Spark?

@nafshartous
Author

> This is what VectorTransformer is for in general. What's the use case in Spark?

Please review the thread in SPARK-27249. A way to perform more complex transforms that could act on multiple columns in a Dataset was requested. VectorTransformer doesn't seem to fill this need, as it's a trait without an implementation for chaining together transforms of Datasets.

@srowen
Member

srowen commented May 11, 2020

Yes I saw it, but is it needed by anything in Spark?
Compared to just writing this in your own code, as we're already talking about supporting a custom user Transformer.

@nafshartous
Author

nafshartous commented May 11, 2020

I'm not in a position to comment further, but I'll ask the requester to chime in. I did post twice in spark-dev inquiring about the need and whether or not existing APIs covered this (nobody replied).

Pinged the requester in the Jira ticket. I'm not aware of how the custom user Transformer would work. The MultiTransformer seems like a nice companion to UnaryTransformer because it shows how to compose more complex transformations involving multiple columns.

@nafshartous
Author

@srowen Please review Everett Rush's comment in the ticket today and advise.

https://issues.apache.org/jira/browse/SPARK-27249

@srowen
Member

srowen commented Aug 16, 2020

All of this seems not-crazy but not necessary in Spark, and I do not see much of a general use case for it. Users can implement custom Transformers exactly as they need.

@nafshartous
Author

That makes sense. I did inquire on the dev list before starting to work on this Jira ticket (nobody replied). Maybe it would be helpful if superfluous Jira tickets could be periodically closed.

@srowen
Member

srowen commented Aug 16, 2020

They are, very asynchronously, as are PRs. That's good, but there is also pushback on prematurely closing tickets: hey, what do you mean you don't like my idea, why won't you let someone work on it, etc. I favor earlier closing, as it's at least decisive if not guaranteed to be correct, and at least it's reversible.
