[SPARK-13032] [ML] [PySpark] PySpark support model export/import and take LinearRegression as example #10469

yanboliang · 2015-12-24T09:20:34Z

Implement MLWriter/MLWritable/MLReader/MLReadable for PySpark.
Make LinearRegression support save/load as an example. After this is merged, adding the same support to other transformers/estimators will be easy, and we can list those tasks and distribute them to the community.
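For readers unfamiliar with the writable/readable pattern named above, here is a minimal pure-Python sketch of the idea. It does not use PySpark or the JVM; the class names (`SketchWriter`, `SketchWritable`, `SketchReader`, `SketchReadable`, `SketchEstimator`) and the JSON file layout are assumptions made for illustration only and are not the classes added by this patch.

```python
import json
import os
import tempfile


class SketchWriter(object):
    """Hypothetical writer: stores an instance's params dict as JSON."""

    def __init__(self, instance):
        self._instance = instance

    def save(self, path):
        # This sketch refuses to overwrite an existing path.
        if os.path.exists(path):
            raise IOError("Path %s already exists" % path)
        os.makedirs(path)
        with open(os.path.join(path, "params.json"), "w") as f:
            json.dump(self._instance.params, f)


class SketchWritable(object):
    """Mixin: any class with a `params` dict gains write() and save()."""

    def write(self):
        return SketchWriter(self)

    def save(self, path):
        self.write().save(path)


class SketchReader(object):
    """Hypothetical reader: rebuilds an instance of `cls` from JSON."""

    def __init__(self, cls):
        self._cls = cls

    def load(self, path):
        with open(os.path.join(path, "params.json")) as f:
            params = json.load(f)
        # Construct with defaults, then restore the saved params.
        instance = self._cls()
        instance.params = params
        return instance


class SketchReadable(object):
    """Mixin: adds the class-level read() and load() entry points."""

    @classmethod
    def read(cls):
        return SketchReader(cls)

    @classmethod
    def load(cls, path):
        return cls.read().load(path)


class SketchEstimator(SketchWritable, SketchReadable):
    def __init__(self, maxIter=100, regParam=0.0):
        self.params = {"maxIter": maxIter, "regParam": regParam}


# Round trip through a fresh temporary directory.
save_dir = os.path.join(tempfile.mkdtemp(), "lr")
SketchEstimator(maxIter=5, regParam=0.1).save(save_dir)
restored = SketchEstimator.load(save_dir)
```

Under this design, supporting persistence in another class only requires mixing in the two helper classes, which is the sense in which follow-up work on other transformers/estimators becomes easy.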

cc @mengxr @jkbradley

SparkQA · 2015-12-24T09:47:37Z

Test build #48300 has finished for PR 10469 at commit 19b07b5.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- class LinearRegressionModel(JavaModel, MLWritable, TransformerMLReadable):
- class MLWriter(object):
- class MLWritable(object):
- class MLReader(object):
- class MLReadable(object):
- java_class = cls._java_loader_class()
- class TransformerMLReadable(MLReadable):
- class EstimatorMLReadable(MLReadable):

jkbradley · 2016-01-11T23:59:54Z

@yanboliang I'll take a look at this now. Sorry for the delay!

jkbradley · 2016-01-12T02:24:53Z

python/pyspark/ml/regression.py

+    True
+    >>> abs(model.intercept - model2.intercept) < 0.001
+    True
+    >>> model_path = path + "/model"


Use directory "/lr_model"?
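To illustrate the shape of the round-trip check in the doctest under review, here is a small self-contained sketch that saves to a "/lr_model" subdirectory of a temporary path and compares values with the same `< 0.001` tolerance. The `save_model` and `load_model` helpers and the dict standing in for the model are hypothetical; the real doctest calls the model's own save/load methods.

```python
import json
import os
import shutil
import tempfile


def save_model(model, model_path):
    # Hypothetical helper: write the model dict into its own directory.
    os.makedirs(model_path)
    with open(os.path.join(model_path, "data.json"), "w") as f:
        json.dump(model, f)


def load_model(model_path):
    # Hypothetical helper: read the model dict back.
    with open(os.path.join(model_path, "data.json")) as f:
        return json.load(f)


path = tempfile.mkdtemp()
try:
    model = {"intercept": 0.25, "coefficients": [1.5, -2.0]}
    # A descriptive subdirectory name keeps several artifacts apart
    # when more than one thing is saved under the same temp path.
    model_path = path + "/lr_model"
    save_model(model, model_path)
    model2 = load_model(model_path)
    intercept_matches = abs(model["intercept"] - model2["intercept"]) < 0.001
    coefficients_match = all(
        abs(a - b) < 0.001
        for a, b in zip(model["coefficients"], model2["coefficients"])
    )
finally:
    # Remove the temporary directory even if a check above fails.
    shutil.rmtree(path, ignore_errors=True)
```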

jkbradley · 2016-01-12T02:25:52Z

I just added some comments quickly, but let me know if my suggestions are workable. I did not test the suggestions myself.

SparkQA · 2016-01-25T11:18:56Z

Test build #49987 has finished for PR 10469 at commit 106a2b8.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

jkbradley · 2016-01-26T02:30:53Z

@yanboliang Thanks for the updates; I'll try to make final comments soon. I left one response in one of the threads above.

SparkQA · 2016-01-26T10:57:47Z

Test build #50099 has finished for PR 10469 at commit 3a0d6be.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

yanboliang · 2016-01-26T15:18:53Z

python/pyspark/ml/wrapper.py

@@ -159,15 +151,16 @@ class JavaModel(Model, JavaTransformer):

    __metaclass__ = ABCMeta

-    def __init__(self, java_model):
+    def __init__(self, java_model=None):


Unify the construction of Model and Estimator. A Model can now be instantiated without arguments, which load relies on.

Can you add a note in the doc to explain this?
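The reason an argument-free constructor helps loading can be sketched without any JVM involved. In the example below, `BackendStub` and `WrappedModel` are hypothetical stand-ins for the Java-side object and the Python wrapper; the real `JavaModel` differs in its details.

```python
class BackendStub(object):
    """Hypothetical stand-in for the JVM-side model object."""

    def __init__(self, intercept):
        self.intercept = intercept


class WrappedModel(object):
    def __init__(self, backend=None):
        # None means "not attached yet"; load() attaches it afterwards.
        self._backend = backend

    @property
    def intercept(self):
        if self._backend is None:
            raise RuntimeError("model has no backing object yet")
        return self._backend.intercept

    @classmethod
    def load(cls, stored_intercept):
        # Build an empty wrapper first, then attach the restored backend.
        instance = cls()
        instance._backend = BackendStub(stored_intercept)
        return instance


# Fitting passes the backend in; loading starts from an empty wrapper.
fitted = WrappedModel(BackendStub(0.5))
loaded = WrappedModel.load(0.5)
```

With the default of `None`, estimators and models share one construction path, at the cost that a freshly constructed model is unusable until something attaches its backing object, which is why a note in the docs is worth having.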

jkbradley · 2016-01-26T21:33:28Z

Thanks for the updates! Done with a pass. They are mostly minor comments, except for making MLReadable, MLWritable more general and not specific to Java wrappers.

yanboliang · 2016-01-27T08:06:17Z

@jkbradley Thanks for your comments! I have made MLReadable and MLWritable more general and not specific to Java wrappers, and addressed all comments except the one about setting Param.parent. On that issue, I left an inline comment in the threads above.

SparkQA · 2016-01-27T08:15:54Z

Test build #50182 has finished for PR 10469 at commit 7329124.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- class JavaMLWriter(object):
- class JavaMLReader(object):

Wenpei · 2016-01-27T13:36:06Z

Hi,
I would like to raise a general issue I noticed when I started looking at PySpark. I found only one test.py, which tests the basic RDD and spark-submit related APIs. There are currently no ml & mllib related unit tests. I know we only implement wrappers here, but some places may need unit tests, for example parameter processing.
Just raising what I thought.

Regards.
Wenpei

jkbradley · 2016-01-27T19:15:14Z

@Wenpei This isn't the right forum for posting comments like that (since hardly anyone will see your comment). I'd recommend identifying missing unit tests and making JIRAs for them. We do have tests.py files under pyspark/ml and pyspark/mllib, so please do check those first before making JIRAs.

jkbradley · 2016-01-28T06:03:33Z

@yanboliang I hope you don't mind, but I took the liberty of experimenting a bit myself and sending this PR: [https://github.com/yanboliang/pull/4] Please let me know what you think!

Btw, thanks for the generalization-related updates. I guess we'd have to go further (providing MLWriter, MLReader abstract classes) if we wanted to allow Python developers to implement persistence from Python, but we can address that in the future (if anyone requests it). Your changes should help us work towards that though.
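As a rough illustration of what abstract writer/reader classes could look like for a pure-Python implementation, here is a sketch under stated assumptions. The names `AbstractWriter`, `AbstractReader`, `PickleWriter`, `PickleReader` and `Scaler` are hypothetical, pickle is only one possible storage choice, and none of this is part of the patch.

```python
import abc
import os
import pickle
import tempfile


class AbstractWriter(abc.ABC):
    """Hypothetical base class: subclasses decide how to persist."""

    @abc.abstractmethod
    def save(self, path):
        """Persist the wrapped instance to `path`."""


class AbstractReader(abc.ABC):
    """Hypothetical base class: subclasses decide how to restore."""

    @abc.abstractmethod
    def load(self, path):
        """Return an instance restored from `path`."""


class PickleWriter(AbstractWriter):
    def __init__(self, instance):
        self._instance = instance

    def save(self, path):
        # Store only the instance attributes, not the class itself.
        with open(path, "wb") as f:
            pickle.dump(self._instance.__dict__, f)


class PickleReader(AbstractReader):
    def __init__(self, cls):
        self._cls = cls

    def load(self, path):
        with open(path, "rb") as f:
            state = pickle.load(f)
        # Bypass __init__ and restore the saved attributes directly.
        instance = self._cls.__new__(self._cls)
        instance.__dict__.update(state)
        return instance


class Scaler(object):
    """A transformer implemented entirely in Python."""

    def __init__(self, factor):
        self.factor = factor

    def transform(self, values):
        return [v * self.factor for v in values]


scaler_path = os.path.join(tempfile.mkdtemp(), "scaler.pkl")
PickleWriter(Scaler(3)).save(scaler_path)
scaler2 = PickleReader(Scaler).load(scaler_path)
```

The abstract bases fix the `save`/`load` contract while leaving the storage format to each implementation, so a Java-backed writer and a Python-only writer could sit side by side.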

SparkQA · 2016-01-28T07:56:33Z

Test build #50264 has finished for PR 10469 at commit f07ffcb.

This patch fails Python style tests.
This patch does not merge cleanly.
This patch adds the following public classes (experimental):
- class LinearRegressionModel(JavaModel, MLWritable, MLReadable):
- class JavaMLWriter(object):
- class MLWritable(object):
- class JavaMLReader(object):
- java_class = cls._java_loader_class(clazz)
- class MLReadable(object):


SparkQA · 2016-01-28T08:52:05Z

Test build #50267 has finished for PR 10469 at commit 7334be9.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

yanboliang · 2016-01-28T09:00:09Z

@jkbradley Your PR looks good and has been merged, thanks!

jkbradley · 2016-01-29T16:58:51Z

LGTM
Thanks for this PR! This makes it really simple to add persistence.
I'll merge it with master.

jkbradley reviewed Jan 12, 2016

yanboliang reviewed Jan 26, 2016

yanboliang changed the title ~~[SPARK-11939] [ML] [PySpark] PySpark support model export/import and take LinearRegression as example~~ [SPARK-13032] [ML] [PySpark] PySpark support model export/import and take LinearRegression as example Jan 27, 2016

yanboliang added 7 commits January 28, 2016 15:49

PySpark support model export/import and take LinearRegression as example

0cf2566

Address comments

61324d3

Combine Estimator & Transformer MLReadable

63db658

MLWritable should _transfer_params_to_java

0ccb130

update docs

e9ea63d

Make MLReadable general and not specific to Java wrappers

62e31b4

address comments

bbd032f

jkbradley and others added 2 commits January 28, 2016 15:58

added unit test for persistence to check for param UIDs carefully, an…

08b9760

…d fixed current issues

fix typos & docs

7334be9

yanboliang force-pushed the spark-11939 branch from f07ffcb to 7334be9 Compare January 28, 2016 08:26

asfgit closed this in e51b6ea Jan 29, 2016

yanboliang deleted the spark-11939 branch January 30, 2016 04:31
