[SPARK-14104][PYSPARK][ML] All Python param setters should use the _set method #11939

Closed
wants to merge 7 commits into from

sethah
Contributor

@sethah sethah commented Mar 24, 2016

What changes were proposed in this pull request?

Param setters in Python previously accessed _paramMap directly to update values. The _set method now implements type checking, so it should be used to update all parameters. This PR eliminates all direct accesses to _paramMap except the one in the _set method, ensuring that type checking always happens.

Additional changes:

  • SPARK-13068 missed adding type converters in evaluation.py, so they are added here.
  • An incorrect toBoolean type converter was used for the StringIndexer handleInvalid param in a previous PR. This is fixed here.
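As a rough illustration of why every setter should go through `_set`, here is a minimal, self-contained sketch. The `Param`, `Params`, and `TypeConverters` classes below are simplified stand-ins written for this example, not the real `pyspark.ml.param` classes, and `StringIndexerLike` / `BuggyStringIndexerLike` are hypothetical. It shows `_set` applying each param's type converter before writing to `_paramMap`, and how a `toBoolean` converter on the string-valued `handleInvalid` param rejects a legitimate value such as `"skip"`.

```python
class TypeConverters(object):
    """Simplified stand-ins for a few type converters (not PySpark's code)."""

    @staticmethod
    def identity(value):
        return value

    @staticmethod
    def toString(value):
        if isinstance(value, str):
            return value
        raise TypeError("Could not convert %r to string" % (value,))

    @staticmethod
    def toBoolean(value):
        if isinstance(value, bool):
            return value
        raise TypeError("Boolean param requires value of type bool. Found %s." % type(value))


class Param(object):
    def __init__(self, name, doc, typeConverter=None):
        self.name = name
        self.doc = doc
        self.typeConverter = typeConverter if typeConverter is not None else TypeConverters.identity


class Params(object):
    def __init__(self):
        self._paramMap = {}

    def _set(self, **kwargs):
        # The only place that writes to _paramMap: every value is converted first.
        for name, value in kwargs.items():
            p = getattr(self, name)
            if value is not None:
                try:
                    value = p.typeConverter(value)
                except TypeError as e:
                    raise TypeError('Invalid param value given for param "%s". %s' % (p.name, e))
            self._paramMap[p] = value
        return self


class StringIndexerLike(Params):
    # Hypothetical class: handleInvalid takes strings such as "error" or "skip".
    handleInvalid = Param("handleInvalid", "how to handle invalid entries",
                          typeConverter=TypeConverters.toString)

    def setHandleInvalid(self, value):
        return self._set(handleInvalid=value)


class BuggyStringIndexerLike(StringIndexerLike):
    # The kind of mistake fixed in this PR: a boolean converter on a string param.
    handleInvalid = Param("handleInvalid", "how to handle invalid entries",
                          typeConverter=TypeConverters.toBoolean)


indexer = StringIndexerLike().setHandleInvalid("skip")
print(indexer._paramMap[StringIndexerLike.handleInvalid])  # skip

try:
    BuggyStringIndexerLike().setHandleInvalid("skip")
except TypeError as e:
    print(e)  # the boolean converter rejects the string "skip"
```

Because the check lives in `_set`, any setter that bypasses it and writes to `_paramMap` directly would silently skip the conversion, which is what this PR removes.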

How was this patch tested?

Existing unit tests verify that parameters are still set properly. No new functionality is actually added in this PR.

@SparkQA

SparkQA commented Mar 24, 2016

Test build #54062 has finished for PR 11939 at commit d8d97e8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -444,7 +444,14 @@ def _setDefault(self, **kwargs):
         Sets default params.
         """
         for param, value in kwargs.items():
-            self._defaultParamMap[getattr(self, param)] = value
+            p = getattr(self, param)
Contributor Author

In a previous PR a parameter was given an incorrect type converter, and this was not caught by the tests. Making _setDefault apply each param's type converter ensures that any param with a default value fails immediately if it is given an incompatible type converter.

Member

Nice catch
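The `_setDefault` change discussed in this thread can be sketched as follows. This is a simplified, self-contained stand-in, not the actual PySpark source; `Param`, `Params`, `toFloat`, `toBoolean`, `GoodEstimator`, and `BadEstimator` are hypothetical names for illustration. Defaults pass through the same type converter as explicitly set values, so a param declared with a converter that cannot accept its own default fails as soon as the object is constructed, rather than going unnoticed.

```python
def toFloat(value):
    # Simplified converter: accept ints and floats (but not bools), return a float.
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return float(value)
    raise TypeError("Could not convert %r to float" % (value,))


def toBoolean(value):
    if isinstance(value, bool):
        return value
    raise TypeError("Could not convert %r to bool" % (value,))


class Param(object):
    def __init__(self, name, typeConverter):
        self.name = name
        self.typeConverter = typeConverter


class Params(object):
    def __init__(self):
        self._defaultParamMap = {}

    def _setDefault(self, **kwargs):
        # Defaults are converted too, so a mismatched converter is caught immediately.
        for name, value in kwargs.items():
            p = getattr(self, name)
            if value is not None:
                try:
                    value = p.typeConverter(value)
                except TypeError as e:
                    raise TypeError('Invalid default param value given for param "%s". %s'
                                    % (p.name, e))
            self._defaultParamMap[p] = value
        return self


class GoodEstimator(Params):
    regParam = Param("regParam", toFloat)

    def __init__(self):
        super(GoodEstimator, self).__init__()
        self._setDefault(regParam=0)  # stored as 0.0


class BadEstimator(Params):
    regParam = Param("regParam", toBoolean)  # wrong converter for a numeric param

    def __init__(self):
        super(BadEstimator, self).__init__()
        self._setDefault(regParam=0)  # raises TypeError at construction time


print(GoodEstimator()._defaultParamMap[GoodEstimator.regParam])  # 0.0
try:
    BadEstimator()
except TypeError as e:
    print(e)
```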

@SparkQA

SparkQA commented Mar 24, 2016

Test build #54068 has finished for PR 11939 at commit 793ba7c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -1721,7 +1721,7 @@ def __init__(self, inputCol=None, outputCol=None, stopWords=None,
         self._java_obj = self._new_java_obj("org.apache.spark.ml.feature.StopWordsRemover",
                                             self.uid)
         stopWordsObj = _jvm().org.apache.spark.ml.feature.StopWords
-        defaultStopWords = stopWordsObj.English()
+        defaultStopWords = list(stopWordsObj.English())
Contributor Author

With the change to _setDefault, I had to change this default to be a list instead of a JavaObject. The other option would be to have type converters do nothing when they encounter JavaObjects. It is nice to leave the stop words as a JavaObject if they are never accessed explicitly on the Python side. I would appreciate thoughts on this problem.

Contributor

I don't think we ever explicitly access them on the Python side, although a user's application might, for example to append stop words to the existing list, in which case having it as a list is arguably good. One could get a similar effect by changing getStopWords, without having to round-trip the list in cases where it is never accessed on the Python side.

Contributor Author

It is simple to make this change, so I think it's a good idea. It will help in the future with similar cases, or if the list of stop words grows even larger. I also changed getStopWords to always return a list, which I think is better for users. Thanks for the suggestion!
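One way to realize the approach in this thread, keeping the default as an opaque JVM-side value until someone actually reads it, is sketched below. `OpaqueJavaArray` is a hypothetical stand-in for a py4j `JavaObject` (no JVM is involved), and `toListString` and `StopWordsRemoverLike` are simplified illustrations, not the merged PySpark code: the list converter passes opaque objects through untouched, and `getStopWords` builds a Python list only on access.

```python
class OpaqueJavaArray(object):
    """Hypothetical stand-in for a py4j JavaObject wrapping a JVM String[]."""

    def __init__(self, items):
        self._items = tuple(items)

    def __iter__(self):
        return iter(self._items)


def toListString(value):
    # Leave JVM-side objects alone; only plain Python values are converted.
    if isinstance(value, OpaqueJavaArray):
        return value
    if isinstance(value, (list, tuple)) and all(isinstance(v, str) for v in value):
        return list(value)
    raise TypeError("Could not convert %r to list of strings" % (value,))


class StopWordsRemoverLike(object):
    """Hypothetical transformer holding its stop words in a single slot."""

    def __init__(self, defaultStopWords):
        self._stopWords = toListString(defaultStopWords)

    def setStopWords(self, value):
        self._stopWords = toListString(value)
        return self

    def getStopWords(self):
        # Materialize a Python list only when the value is actually read.
        return list(self._stopWords)


remover = StopWordsRemoverLike(OpaqueJavaArray(["a", "an", "the"]))
words = remover.getStopWords()
words.append("spark")  # the user-side pattern mentioned in the review
remover.setStopWords(words)
print(remover.getStopWords())  # ['a', 'an', 'the', 'spark']
```

The trade-off is that the round trip to a Python list is paid only by code that reads the stop words, while the getter still always hands users a list.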

@sethah
Contributor Author

sethah commented Apr 1, 2016

cc @holdenk @jkbradley Could you take a look whenever you get a chance?

@SparkQA

SparkQA commented Apr 1, 2016

Test build #54711 has finished for PR 11939 at commit 3b0f89b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@holdenk
Contributor

holdenk commented Apr 11, 2016

It would be good to update to master so we can run the latest tests, but in its current state it seems to have caught all of the direct _paramMap sets (although there may be more in master now). It might also make sense to add a clearParam function, and then add a note that no one (including developers) should access the param map directly, but should instead use one of the accessor functions?

@sethah
Contributor Author

sethah commented Apr 14, 2016

@holdenk I added a _clearParam function. I am open to adding a note, but I'm not sure where to put it so that it is most effective. It seems a bit awkward to put the note in the API docs, since it is aimed at developers. You were right: more direct param sets had been added in Generalized Linear Regression, so I've removed those as well.
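A minimal sketch of what a `_clearParam` helper does alongside `_set` and `_setDefault`, so that `_paramMap` is only ever touched through these methods. The `Param`, `Params`, and `Model` classes here are simplified, self-contained stand-ins (not the PySpark source), the body of `_clearParam` is an assumption, `maxIter` is illustrative, and type conversion is omitted for brevity. Clearing an explicitly set value makes lookups fall back to the default again.

```python
class Param(object):
    def __init__(self, name):
        self.name = name


class Params(object):
    def __init__(self):
        self._paramMap = {}         # explicitly set values
        self._defaultParamMap = {}  # default values

    def _set(self, **kwargs):
        for name, value in kwargs.items():
            self._paramMap[getattr(self, name)] = value
        return self

    def _setDefault(self, **kwargs):
        for name, value in kwargs.items():
            self._defaultParamMap[getattr(self, name)] = value
        return self

    def _clearParam(self, param):
        # Remove an explicitly set value; the default (if any) applies again.
        self._paramMap.pop(param, None)
        return self

    def isSet(self, param):
        return param in self._paramMap

    def getOrDefault(self, param):
        if param in self._paramMap:
            return self._paramMap[param]
        return self._defaultParamMap[param]


class Model(Params):
    maxIter = Param("maxIter")

    def __init__(self):
        super(Model, self).__init__()
        self._setDefault(maxIter=100)


m = Model()._set(maxIter=5)
print(m.getOrDefault(Model.maxIter))  # 5
m._clearParam(Model.maxIter)
print(m.isSet(Model.maxIter), m.getOrDefault(Model.maxIter))  # False 100
```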

@SparkQA

SparkQA commented Apr 14, 2016

Test build #55767 has finished for PR 11939 at commit 8079c11.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jkbradley
Member

I'll take a look now

@@ -125,7 +125,8 @@ class BinaryClassificationEvaluator(JavaEvaluator, HasLabelCol, HasRawPrediction
     """

     metricName = Param(Params._dummy(), "metricName",
-                       "metric name in evaluation (areaUnderROC|areaUnderPR)")
+                       "metric name in evaluation (areaUnderROC|areaUnderPR)",
+                       TypeConverters.toString)
Member

Specify typeConverter as a keyword arg (here and elsewhere)

Contributor Author

Done.
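For reference, the call style requested in this review thread passes the converter by keyword. The snippet below uses a self-contained stand-in for `Param` with the same argument order as the diff in this thread (`parent, name, doc, typeConverter`); `DummyParent` and `toString` are simplified placeholders, not PySpark imports. Naming the argument keeps the declaration readable and avoids a positional mix-up if the constructor ever gains parameters.

```python
class DummyParent(object):
    """Placeholder parent, standing in for Params._dummy() in the diff."""


def toString(value):
    if isinstance(value, str):
        return value
    raise TypeError("Could not convert %r to string" % (value,))


class Param(object):
    def __init__(self, parent, name, doc, typeConverter=None):
        self.parent = parent
        self.name = name
        self.doc = doc
        # No converter means values are passed through unchanged.
        self.typeConverter = typeConverter if typeConverter is not None else (lambda v: v)


metricName = Param(DummyParent(), "metricName",
                   "metric name in evaluation (areaUnderROC|areaUnderPR)",
                   typeConverter=toString)

print(metricName.typeConverter("areaUnderPR"))  # areaUnderPR
```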

@SparkQA

SparkQA commented Apr 15, 2016

Test build #55930 has finished for PR 11939 at commit 37b9ac5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jkbradley
Member

Thanks for the updates. I just sent #12422. Could you please take a look at it?

@SparkQA

SparkQA commented Apr 15, 2016

Test build #2792 has finished for PR 11939 at commit 37b9ac5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jkbradley
Member

LGTM
Merging with master
Thanks for the PR!

@asfgit asfgit closed this in 129f2f4 Apr 15, 2016
lw-lin pushed a commit to lw-lin/spark that referenced this pull request Apr 20, 2016
…set` method

## What changes were proposed in this pull request?

Param setters in Python previously accessed `_paramMap` directly to update values. The `_set` method now implements type checking, so it should be used to update all parameters. This PR eliminates all direct accesses to `_paramMap` except the one in the `_set` method, ensuring that type checking always happens.

Additional changes:
* [SPARK-13068](apache#11663) missed adding type converters in evaluation.py, so they are added here.
* An incorrect `toBoolean` type converter was used for the StringIndexer `handleInvalid` param in a previous PR. This is fixed here.

## How was this patch tested?

Existing unit tests verify that parameters are still set properly. No new functionality is actually added in this PR.

Author: sethah <seth.hendrickson16@gmail.com>

Closes apache#11939 from sethah/SPARK-14104.
asfgit pushed a commit that referenced this pull request Apr 20, 2016
… method

## What changes were proposed in this pull request?
#11939 made Python param setters use the `_set` method. This PR fixes the ones that were missed.

## How was this patch tested?
Existing tests.

cc jkbradley sethah

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #12531 from yanboliang/setters-omissive.
4 participants