Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-10509][PYSPARK] Reduce excessive param boiler plate code #10216

Conversation

holdenk
Copy link
Contributor

@holdenk holdenk commented Dec 9, 2015

The current python ml params require cut-and-pasting the param setup and description between the class & __init__ methods. Remove this possible case of errors & simplify use of custom params by adding a _copy_new_parent method to param so as to avoid cut and pasting (and cut and pasting at different indentation levels urgh).

@SparkQA
Copy link

SparkQA commented Dec 9, 2015

Test build #47408 has finished for PR 10216 at commit a976b78.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@holdenk
Copy link
Contributor Author

holdenk commented Dec 14, 2015

cc @JoshRosen & @yanboliang who have probably had to deal with a fair amount of this boilerplate code in the past.

@holdenk
Copy link
Contributor Author

holdenk commented Dec 30, 2015

ping @yanboliang if you have a chance to look at this would be appreciated.

@SparkQA
Copy link

SparkQA commented Dec 30, 2015

Test build #48492 has finished for PR 10216 at commit 384fd0d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@holdenk
Copy link
Contributor Author

holdenk commented Jan 7, 2016

re-ping @yanboliang if you have any thoughts on this approach.

@holdenk
Copy link
Contributor Author

holdenk commented Jan 7, 2016

cc @jkbradley

self.threshold = Param(self, "threshold",
"Threshold in binary classification prediction, in range [0, 1]." +
" If threshold and thresholds are both set, they must match.")
self.threshold = LogisticRegression.threshold._copy_new_parent(self)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LogisticRegression.threshold is a placeholder to make the param appear in the generated doc. After this change, it has more uses, should we also change the annotation of that dummy param?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh good point - I'll remove the comments about them just being dummy params.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hope we can eliminate the need for these calls in concrete classes' init methods.

@yanboliang
Copy link
Contributor

@holdenk Sorry for late response, please see my inline comments.
This patch works well, but I think we should to ensure it conforms to the structure of the basic design.
cc @jkbradley @mengxr for another pass.

@SparkQA
Copy link

SparkQA commented Jan 8, 2016

Test build #49031 has finished for PR 10216 at commit 8e8cbae.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 8, 2016

Test build #49032 has finished for PR 10216 at commit e0f3f00.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 11, 2016

Test build #49176 has finished for PR 10216 at commit 7aecb59.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 16, 2016

Test build #49533 has finished for PR 10216 at commit c4a2919.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@holdenk
Copy link
Contributor Author

holdenk commented Jan 19, 2016

ping @jkbradley if you've got the time to look at this that would be awesome.

@SparkQA
Copy link

SparkQA commented Jan 25, 2016

Test build #50013 has finished for PR 10216 at commit 53edd3d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jkbradley
Copy link
Member

@holdenk Sorry for the delay. This looks great, but I was wondering if it could be improved even further: Would it be possible to add a method in the __init__ method of class Params which inspects the class, identifies attributes of type Param, and adds them to the new instance? That would eliminate the call required for each Param inside each class' init method.

@holdenk
Copy link
Contributor Author

holdenk commented Jan 25, 2016

@jkbradley sounds interesting - would probably want to change the code generated params as well then to match but yah I'll take a crack at that this week.

@jkbradley
Copy link
Member

@holdenk I'm not quite sure what you mean, but I'll comment in a few places to make sure I make my idea clear.

@@ -72,7 +72,6 @@ class LogisticRegression(JavaEstimator, HasFeaturesCol, HasLabelCol, HasPredicti
.. versionadded:: 1.3.0
"""

# a placeholder to make it appear in the generated doc
threshold = Param(Params._dummy(), "threshold",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can hopefully remain the same.

@holdenk
Copy link
Contributor Author

holdenk commented Jan 26, 2016

@jkbradley I mean there is also the shared params code gen, so we would want to update the generated params as well (right now the change only affects the manual params).

@jkbradley
Copy link
Member

Ohh, right! Thanks for remembering. Should I make another pass?

@SparkQA
Copy link

SparkQA commented Jan 26, 2016

Test build #50122 has finished for PR 10216 at commit 0d28922.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

$dummyPlaceHolders

def __init__(self):
super(DecisionTreeParams, self).__init__()
$realParams'''
super(DecisionTreeParams, self).__init__()'''
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

keep newline at end

@jkbradley
Copy link
Member

Just a few small comments now

…t and setters in tree params codegen, regen shared params
@holdenk
Copy link
Contributor Author

holdenk commented Jan 26, 2016

@jkbradley thanks addressed those small issues :)

@jkbradley
Copy link
Member

LGTM pending tests.
Thanks a lot for this PR---this is awesome!

@holdenk
Copy link
Contributor Author

holdenk commented Jan 26, 2016

I hate copy'paste code, so glad to be able to kill some (and not make more of making Python wrappers) :)

@SparkQA
Copy link

SparkQA commented Jan 26, 2016

Test build #50134 has finished for PR 10216 at commit 8396aef.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jkbradley
Copy link
Member

Merging with master

Btw, I verified that the doc generates in the same way.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants