[SPARK-13951][ML][PYTHON] Nested Pipeline persistence #11866
Test build #53711 has finished for PR 11866 at commit
Test build #53712 has finished for PR 11866 at commit
python/pyspark/ml/pipeline.py
def load(cls, path):
    """Reads an ML instance from the input path, a shortcut of `read().load(path)`."""
    return cls.read().load(path)
def _transfer_stage_from_java(cls, java_stage):
It is a little confusing to call it java_stage while the expected input is a pipeline. Shall we rename it to _from_java_pipeline and _transfer_stage_to_java to _to_java_pipeline? If we have to use the same method name, I would recommend _from_java and _to_java instead.
I'd like to use the same method name, so I'll switch to _from/to_java.
Not part of this PR, in PySpark
Making MLWritable.write a property sounds good. I'll create a JIRA for it.
Test build #53785 has finished for PR 11866 at commit
LGTM. Merged into master. Thanks!
@inherit_doc
class JavaMLWriter(MLWriter):
    """
    (Private) Specialization of :py:class:`MLWriter` for :py:class:`JavaWrapper` types
@jkbradley @mengxr Excuse me, why do we need to add a Private annotation here and in other places?
What changes were proposed in this pull request?
Adds support for saving and loading nested ML Pipelines from Python. Pipeline and PipelineModel do not extend JavaWrapper, but they can still use the JavaMLWriter and JavaMLReader implementations.
Also:
How was this patch tested?
Added a new unit test for nested Pipelines. Abstracted the validity check into a helper method shared by the two unit tests.
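For readers who want the idea without a Spark installation, here is a toy sketch of nested pipeline persistence and of a shared validity-check helper. It is not the code from this PR and does not use PySpark: the `Stage` and `Pipeline` classes, the `from_dict` and `same_structure` functions, and the JSON file format are all hypothetical stand-ins chosen for illustration. The only point it tries to show is that a pipeline whose stages may themselves be pipelines can be saved and restored by recursing over the stages.

```python
import json
import os
import tempfile


class Stage:
    """Hypothetical leaf stage: a uid plus a dict of params (not a Spark class)."""

    def __init__(self, uid, params=None):
        self.uid = uid
        self.params = dict(params or {})

    def to_dict(self):
        return {"type": "Stage", "uid": self.uid, "params": self.params}


class Pipeline:
    """Hypothetical pipeline whose stages may be Stage or Pipeline objects."""

    def __init__(self, uid, stages):
        self.uid = uid
        self.stages = list(stages)

    def to_dict(self):
        # Recurse into each stage so nested pipelines are serialized too.
        return {
            "type": "Pipeline",
            "uid": self.uid,
            "stages": [s.to_dict() for s in self.stages],
        }

    def save(self, path):
        with open(path, "w") as f:
            json.dump(self.to_dict(), f)

    @classmethod
    def load(cls, path):
        with open(path) as f:
            return from_dict(json.load(f))


def from_dict(d):
    """Rebuild a Stage or a (possibly nested) Pipeline from its dict form."""
    if d["type"] == "Pipeline":
        return Pipeline(d["uid"], [from_dict(s) for s in d["stages"]])
    return Stage(d["uid"], d["params"])


def same_structure(a, b):
    """Validity-check helper: compare types, uids, params and stages recursively."""
    if type(a) is not type(b) or a.uid != b.uid:
        return False
    if isinstance(a, Pipeline):
        return len(a.stages) == len(b.stages) and all(
            same_structure(x, y) for x, y in zip(a.stages, b.stages)
        )
    return a.params == b.params


# Build a pipeline that contains another pipeline as its first stage.
inner = Pipeline("inner", [Stage("tok"), Stage("tf", {"numFeatures": 16})])
outer = Pipeline("outer", [inner, Stage("lr", {"maxIter": 5})])

# Round-trip it through a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "pipeline.json")
    outer.save(path)
    loaded = Pipeline.load(path)

print(same_structure(outer, loaded))
```

Keeping the comparison in one helper such as `same_structure` mirrors the testing approach described above: both the flat and the nested round-trip tests can call the same check.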