[SPARK-13951][ML][PYTHON] Nested Pipeline persistence #11866

jkbradley · 2016-03-21T20:54:06Z

What changes were proposed in this pull request?

Adds support for saving and loading nested ML Pipelines from Python. Pipeline and PipelineModel do not extend JavaWrapper, but they can still use the JavaMLWriter and JavaMLReader implementations.

Also:

- Separates the interfaces from the Java wrapper implementations for MLWritable, MLReadable, MLWriter, and MLReader.
- Moves the methods _stages_java2py and _stages_py2java into Pipeline and PipelineModel as _transfer_stage_from_java and _transfer_stage_to_java.
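
To illustrate the interface/implementation split described above, here is a minimal plain-Python sketch. It does not use PySpark: ToyWriter, ToyWritable, JsonWriter, and Scaler are hypothetical stand-ins, and the JSON backend is an assumption chosen only to keep the example runnable. The point is that the interface classes know nothing about the storage backend, so a class that is not a Java wrapper can still plug in a concrete writer.

```python
import json
import os
import tempfile
from abc import ABC, abstractmethod


class ToyWriter(ABC):
    """Interface only: says nothing about how an instance is stored."""

    @abstractmethod
    def save(self, path):
        """Persist the wrapped instance to `path`."""


class ToyWritable(ABC):
    """Interface for anything that can hand out a writer."""

    @abstractmethod
    def write(self):
        """Return a ToyWriter for this instance."""

    def save(self, path):
        # Shortcut for write().save(path).
        self.write().save(path)


class JsonWriter(ToyWriter):
    """One concrete backend (hypothetical): dumps the params as JSON."""

    def __init__(self, instance):
        self._instance = instance

    def save(self, path):
        with open(path, "w") as f:
            json.dump(self._instance.params, f)


class Scaler(ToyWritable):
    """A toy stage that picks JsonWriter as its writer implementation."""

    def __init__(self, factor):
        self.params = {"factor": factor}

    def write(self):
        return JsonWriter(self)


def demo():
    """Save a Scaler to a temporary file and read the JSON back."""
    path = os.path.join(tempfile.mkdtemp(), "scaler.json")
    Scaler(2.5).save(path)
    with open(path) as f:
        return json.load(f)
```

Swapping JsonWriter for another ToyWriter subclass would not require touching ToyWritable, which is the property the separation is meant to provide.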

How was this patch tested?

Added a new unit test for nested Pipelines. Factored the validity check into a helper method shared by the two unit tests.

SparkQA · 2016-03-21T21:10:27Z

Test build #53711 has finished for PR 11866 at commit 1c070e7.

This patch fails PySpark unit tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- class ElementwiseProduct(JavaTransformer, HasInputCol, HasOutputCol, JavaMLReadable,
- class HashingTF(JavaTransformer, HasInputCol, HasOutputCol, HasNumFeatures, JavaMLReadable,
- class PolynomialExpansion(JavaTransformer, HasInputCol, HasOutputCol, JavaMLReadable,
- class JavaMLWritable(MLWritable):
- class JavaMLReader(MLReader):

SparkQA · 2016-03-21T21:26:49Z

Test build #53712 has finished for PR 11866 at commit 6d74b97.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

mengxr · 2016-03-21T22:56:42Z

python/pyspark/ml/pipeline.py

-    def load(cls, path):
-        """Reads an ML instance from the input path, a shortcut of `read().load(path)`."""
-        return cls.read().load(path)
+    def _transfer_stage_from_java(cls, java_stage):


It is a little confusing to call it java_stage while the expected input is a pipeline. Shall we rename it to _from_java_pipeline and _transfer_stage_to_java to _to_java_pipeline? If we have to use the same method name, I would recommend _from_java and _to_java instead.

jkbradley · 2016-03-22

I'd like to use the same method name, so I'll switch to _from_java and _to_java.
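
As a rough illustration of what a shared method name buys, the sketch below shows a container converting its stages by calling the same method on each one, so a nested container needs no special case. This is a toy, not the PySpark implementation: Leaf, ToyPipeline, _REGISTRY, and the dict-based "Java side" are all hypothetical.

```python
class Leaf:
    """Hypothetical leaf stage; the 'Java side' is modelled as a plain dict."""

    def __init__(self, name):
        self.name = name

    def _to_java(self):
        return {"type": "leaf", "name": self.name}

    @classmethod
    def _from_java(cls, java_obj):
        return cls(java_obj["name"])


class ToyPipeline:
    """Hypothetical container whose stages may themselves be containers."""

    def __init__(self, stages):
        self.stages = stages

    def _to_java(self):
        # Every stage exposes _to_java, so nesting is handled by recursion.
        return {"type": "pipeline",
                "stages": [s._to_java() for s in self.stages]}

    @classmethod
    def _from_java(cls, java_obj):
        # Dispatch on the recorded type and call the same-named method.
        return cls([_REGISTRY[s["type"]]._from_java(s)
                    for s in java_obj["stages"]])


# Maps the type tag stored on the toy 'Java side' back to a Python class.
_REGISTRY = {"leaf": Leaf, "pipeline": ToyPipeline}


def names(stage):
    """Return the stage names as a nested list, for easy comparison."""
    if isinstance(stage, ToyPipeline):
        return [names(s) for s in stage.stages]
    return stage.name


nested = ToyPipeline([Leaf("tokenizer"),
                      ToyPipeline([Leaf("hashingTF"), Leaf("lr")])])
restored = ToyPipeline._from_java(nested._to_java())
```

Here `names(restored)` has the same nested shape as `names(nested)`, which is the kind of round-trip check a nested-pipeline test would make.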

mengxr · 2016-03-21T22:57:41Z

Not part of this PR, but in PySpark DataFrame.write is a property rather than a method. Shall we follow the same convention?

jkbradley · 2016-03-22T17:06:54Z

Making MLWritable.write a property sounds good. I'll create a JIRA for it.
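
For readers unfamiliar with the distinction, here is a small plain-Python sketch of the two call styles under discussion. The classes are hypothetical and only mimic the shape of the API; the toy save returns a tuple instead of touching disk.

```python
class ToyWriter:
    """Hypothetical writer; save() only reports what it would do."""

    def __init__(self, instance):
        self.instance = instance
        self.should_overwrite = False

    def overwrite(self):
        # Returning self allows chaining, e.g. writer.overwrite().save(path).
        self.should_overwrite = True
        return self

    def save(self, path):
        return (type(self.instance).__name__, path, self.should_overwrite)


class MethodStyle:
    """write is a method: callers write obj.write().save(path)."""

    def write(self):
        return ToyWriter(self)


class PropertyStyle:
    """write is a property: callers write obj.write.save(path)."""

    @property
    def write(self):
        return ToyWriter(self)


method_result = MethodStyle().write().overwrite().save("/tmp/m")
property_result = PropertyStyle().write.overwrite().save("/tmp/p")
```

Both styles return a fresh writer on each access; the property form simply drops the parentheses, matching the DataFrame.write convention mentioned above.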

SparkQA · 2016-03-22T17:20:24Z

Test build #53785 has finished for PR 11866 at commit be879e0.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

mengxr · 2016-03-22T19:12:29Z

LGTM. Merged into master. Thanks!

zhengruifeng · 2019-09-24T08:03:00Z

python/pyspark/ml/util.py

+@inherit_doc
+class JavaMLWriter(MLWriter):
+    """
+    (Private) Specialization of :py:class:`MLWriter` for :py:class:`JavaWrapper` types


@jkbradley @mengxr Excuse me, why do we need to add a "(Private)" annotation here and in other places?

jkbradley added 2 commits March 21, 2016 13:39

Added nested Pipeline persistence

89d5382

cleanups, and style fixes

1c070e7

jkbradley mentioned this pull request Mar 21, 2016

[SPARK-13951] Add nested Pipeline load/save supports in PySpark #11835

Closed

removed println

6d74b97

Renamed _transfer_stage_to/from_java to _to/from_java

be879e0

asfgit closed this in 7e3423b Mar 22, 2016

jkbradley deleted the nested-pipeline-io branch March 22, 2016 22:20
