Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-13032] [ML] [PySpark] PySpark support model export/import and take LinearRegression as example #10469

Closed
wants to merge 9 commits into from

Conversation

yanboliang
Copy link
Contributor

  • Implement MLWriter/MLWritable/MLReader/MLReadable for PySpark.
  • Making LinearRegression to support save/load as example. After this merged, the work for other transformers/estimators will be easy, then we can list and distribute the tasks to the community.

cc @mengxr @jkbradley

@SparkQA
Copy link

SparkQA commented Dec 24, 2015

Test build #48300 has finished for PR 10469 at commit 19b07b5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):\n * class LinearRegressionModel(JavaModel, MLWritable, TransformerMLReadable):\n * class MLWriter(object):\n * class MLWritable(object):\n * class MLReader(object):\n * class MLReadable(object):\n * java_class = cls._java_loader_class()\n * class TransformerMLReadable(MLReadable):\n * class EstimatorMLReadable(MLReadable):\n

@jkbradley
Copy link
Member

@yanboliang I'll take a look at this now. Sorry for the delay!

True
>>> abs(model.intercept - model2.intercept) < 0.001
True
>>> model_path = path + "/model"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use directory "/lr_model"?

@jkbradley
Copy link
Member

I just added some comments quickly, but let me know if my suggestions are workable. I did not test the suggestions myself.

@SparkQA
Copy link

SparkQA commented Jan 25, 2016

Test build #49987 has finished for PR 10469 at commit 106a2b8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jkbradley
Copy link
Member

@yanboliang Thanks for the updates; I'll try to make final comments soon. I left one response in one of the threads above.

@SparkQA
Copy link

SparkQA commented Jan 26, 2016

Test build #50099 has finished for PR 10469 at commit 3a0d6be.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -159,15 +151,16 @@ class JavaModel(Model, JavaTransformer):

__metaclass__ = ABCMeta

def __init__(self, java_model):
def __init__(self, java_model=None):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unify the construction of Model and Estimator. Model can be instantiated without argument which is used by load.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a note in the doc to explain this?

@jkbradley
Copy link
Member

Thanks for the updates! Done with a pass. They are mostly minor comments, except for making MLReadable, MLWritable more general and not specific to Java wrappers.

@yanboliang
Copy link
Contributor Author

@jkbradley Thanks for your comments! I have made MLReadable and MLWritable more general and not specific to Java wrappers, addressed all comments except for setting the Param.parent. To that issue, I left the inline comment in the threads above.

@yanboliang yanboliang changed the title [SPARK-11939] [ML] [PySpark] PySpark support model export/import and take LinearRegression as example [SPARK-13032] [ML] [PySpark] PySpark support model export/import and take LinearRegression as example Jan 27, 2016
@SparkQA
Copy link

SparkQA commented Jan 27, 2016

Test build #50182 has finished for PR 10469 at commit 7329124.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class JavaMLWriter(object):
    • class JavaMLReader(object):

@Wenpei
Copy link
Contributor

Wenpei commented Jan 27, 2016

HI
I raise a common issues here when I start look pyspark. I found there is only one test.py to test basic RDD and spark submit related api. There is no ml & mllib related unit test currently, I knew we just implement wrapper here, but some place may need ut, for example, parameter process.
Just raise what I thought.

Regards.
Wenpei

@jkbradley
Copy link
Member

@Wenpei This isn't the right forum for posting comments like that (since hardly anyone will see your comment). I'd recommend identifying missing unit tests and making JIRAs for them. We do have tests.py files under pyspark/ml and pyspark/mllib, so please do check those first before making JIRAs.

@jkbradley
Copy link
Member

@yanboliang I hope you don't mind, but I took the liberty of experimenting a bit myself and sending this PR: [https://github.com/yanboliang/pull/4] Please let me know what you think!

Btw, thanks for the generalization-related updates. I guess we'd have to go further (providing MLWriter, MLReader abstract classes) if we wanted to allow Python developers to implement persistence from Python, but we can address that in the future (if anyone requests it). Your changes should help us work towards that though.

@SparkQA
Copy link

SparkQA commented Jan 28, 2016

Test build #50264 has finished for PR 10469 at commit f07ffcb.

  • This patch fails Python style tests.
  • This patch does not merge cleanly.
  • This patch adds the following public classes (experimental):\n * class LinearRegressionModel(JavaModel, MLWritable, MLReadable):\n * class JavaMLWriter(object):\n * class MLWritable(object):\n * class JavaMLReader(object):\n * java_class = cls._java_loader_class(clazz)\n * class MLReadable(object):\n

@SparkQA
Copy link

SparkQA commented Jan 28, 2016

Test build #50267 has finished for PR 10469 at commit 7334be9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yanboliang
Copy link
Contributor Author

@jkbradley You PR looks good and get merged, thanks!

@jkbradley
Copy link
Member

LGTM
Thanks for this PR! This makes it really simple to add persistence.
I'll merge it with master.

@asfgit asfgit closed this in e51b6ea Jan 29, 2016
@yanboliang yanboliang deleted the spark-11939 branch January 30, 2016 04:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants