New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-14104][PYSPARK][ML] All Python param setters should use the _set
method
#11939
Conversation
Test build #54062 has finished for PR 11939 at commit
|
@@ -444,7 +444,14 @@ def _setDefault(self, **kwargs): | |||
Sets default params. | |||
""" | |||
for param, value in kwargs.items(): | |||
self._defaultParamMap[getattr(self, param)] = value | |||
p = getattr(self, param) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In a previous PR a parameter was given an incorrect type converter, and this was not caught by the tests. Enforcing _setDefault
to use the type converter for the param will ensure that all params with default values cannot be given incompatible type converters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch
Test build #54068 has finished for PR 11939 at commit
|
@@ -1721,7 +1721,7 @@ def __init__(self, inputCol=None, outputCol=None, stopWords=None, | |||
self._java_obj = self._new_java_obj("org.apache.spark.ml.feature.StopWordsRemover", | |||
self.uid) | |||
stopWordsObj = _jvm().org.apache.spark.ml.feature.StopWords | |||
defaultStopWords = stopWordsObj.English() | |||
defaultStopWords = list(stopWordsObj.English()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the change to _setDefault
, I had to change this default to be a list instead of JavaObject. The other option would be to have type converters do nothing if they encounter JavaObjects. It is nice to leave stop words as a JavaObject if they are never accessed explicitly on the Python side. Would appreciate thoughts on this problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we ever explicitly access them on the Python side - although a users application might attempt to do that and append stop words to the existing list in which case having it as a list is maybe good. One could get a similar effect by changing getStopWords without having to round trip the list in cases where it isn't ever accessed on the python side.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is simple to make this change, so I think it's a good idea. This will help in the future for similar cases or if the list of stopwords grows even larger. I changed getStopWords
to return a list always which is better for users, I think. Thanks for the suggestion!
cc @holdenk @jkbradley Could you take a look whenever you get a chance? |
Test build #54711 has finished for PR 11939 at commit
|
Would be good to update to master so can run the latest tests, but at its current point it seems to have gotten all of the direct paramMap sets (although there may be more in master now). It might make sense to also add a clearParam function and then add a note that no one (including developers) should directly access the param map but instead use one of the access functions? |
@holdenk I added a |
Test build #55767 has finished for PR 11939 at commit
|
I'll take a look now |
@@ -125,7 +125,8 @@ class BinaryClassificationEvaluator(JavaEvaluator, HasLabelCol, HasRawPrediction | |||
""" | |||
|
|||
metricName = Param(Params._dummy(), "metricName", | |||
"metric name in evaluation (areaUnderROC|areaUnderPR)") | |||
"metric name in evaluation (areaUnderROC|areaUnderPR)", | |||
TypeConverters.toString) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Specify typeConverter as a keyword arg (here and elsewhere)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Test build #55930 has finished for PR 11939 at commit
|
Thanks for the updates. I just sent #12422 Could you please take a look at it? |
Test build #2792 has finished for PR 11939 at commit
|
LGTM |
…set` method ## What changes were proposed in this pull request? Param setters in python previously accessed the _paramMap directly to update values. The `_set` method now implements type checking, so it should be used to update all parameters. This PR eliminates all direct accesses to `_paramMap` besides the one in the `_set` method to ensure type checking happens. Additional changes: * [SPARK-13068](apache#11663) missed adding type converters in evaluation.py so those are done here * An incorrect `toBoolean` type converter was used for StringIndexer `handleInvalid` param in previous PR. This is fixed here. ## How was this patch tested? Existing unit tests verify that parameters are still set properly. No new functionality is actually added in this PR. Author: sethah <seth.hendrickson16@gmail.com> Closes apache#11939 from sethah/SPARK-14104.
What changes were proposed in this pull request?
Param setters in python previously accessed the _paramMap directly to update values. The
_set
method now implements type checking, so it should be used to update all parameters. This PR eliminates all direct accesses to_paramMap
besides the one in the_set
method to ensure type checking happens.Additional changes:
toBoolean
type converter was used for StringIndexerhandleInvalid
param in previous PR. This is fixed here.How was this patch tested?
Existing unit tests verify that parameters are still set properly. No new functionality is actually added in this PR.