Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-13761] [ML] Deprecate validateParams #11620

Closed
wants to merge 9 commits into from

Conversation

hhbyyh
Copy link
Contributor

@hhbyyh hhbyyh commented Mar 10, 2016

What changes were proposed in this pull request?

Deprecate validateParams() method here:


Move all functionality in overridden methods to transformSchema().
Check docs to make sure they indicate complex Param interaction checks should be done in transformSchema.

How was this patch tested?

unit tests

@SparkQA
Copy link

SparkQA commented Mar 10, 2016

Test build #52781 has finished for PR 11620 at commit 5682bfc.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Mar 10, 2016

Test build #52822 has finished for PR 11620 at commit 3ccf90a.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Mar 10, 2016

Test build #52833 has finished for PR 11620 at commit 29c5a2f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

*/
@deprecated("All the checks should be merged into transformSchema", "2.0.0")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also say this method will be removed in 2.1.0

@jkbradley
Copy link
Member

Only a few small comments

@@ -61,7 +63,8 @@ private[regression] trait GeneralizedLinearRegressionBase extends PredictorParam
* Param for the name of link function which provides the relationship
* between the linear predictor and the mean of the distribution function.
* Supported options: "identity", "log", "inverse", "logit", "probit", "cloglog" and "sqrt".
* @group param
*
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indentation

@jkbradley
Copy link
Member

Thanks for the updates. Still some indentation issues, but that's it.

@SparkQA
Copy link

SparkQA commented Mar 16, 2016

Test build #53347 has finished for PR 11620 at commit 91f72a9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@hhbyyh
Copy link
Contributor Author

hhbyyh commented Mar 16, 2016

@jkbradley Sorry for those unintentional changes and thanks for the patience.

@SparkQA
Copy link

SparkQA commented Mar 17, 2016

Test build #53375 has finished for PR 11620 at commit f348044.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jkbradley
Copy link
Member

No problem. Thanks for the PR!
LGTM
Merging with master

@asfgit asfgit closed this in 92b7057 Mar 17, 2016
*/
@deprecated("Will be removed in 2.1.0. Checks should be merged into transformSchema.", "2.0.0")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this now causes a number of deprecation warnings in the Spark code, which we're trying to get rid of. Can most of the remaining usages be transformed to not use this method?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies! I should have checked the Jenkins logs. I'll send a clean-up PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @jkbradley have you already started on this? Sorry for the troubling. I didn't remove the ones in CrossValidator and TrainValidationSplit because I think it can be handy if we can run some validation before submitting the paramMap. Let me know if I can help in any way.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No problem; I just sent a PR for it.

asfgit pushed a commit that referenced this pull request Mar 17, 2016
## What changes were proposed in this pull request?

Cleanups from [#11620]: remove remaining uses of validateParams, and put functionality into transformSchema

## How was this patch tested?

Existing unit tests, modified to check using transformSchema instead of validateParams

Author: Joseph K. Bradley <joseph@databricks.com>

Closes #11790 from jkbradley/SPARK-13761-cleanup.
roygao94 pushed a commit to roygao94/spark that referenced this pull request Mar 22, 2016
## What changes were proposed in this pull request?

Deprecate validateParams() method here: https://github.com/apache/spark/blob/035d3acdf3c1be5b309a861d5c5beb803b946b5e/mllib/src/main/scala/org/apache/spark/ml/param/params.scala#L553
Move all functionality in overridden methods to transformSchema().
Check docs to make sure they indicate complex Param interaction checks should be done in transformSchema.

## How was this patch tested?

unit tests

Author: Yuhao Yang <hhbyyh@gmail.com>

Closes apache#11620 from hhbyyh/depreValid.
roygao94 pushed a commit to roygao94/spark that referenced this pull request Mar 22, 2016
## What changes were proposed in this pull request?

Cleanups from [apache#11620]: remove remaining uses of validateParams, and put functionality into transformSchema

## How was this patch tested?

Existing unit tests, modified to check using transformSchema instead of validateParams

Author: Joseph K. Bradley <joseph@databricks.com>

Closes apache#11790 from jkbradley/SPARK-13761-cleanup.
ghost pushed a commit to dbtsai/spark that referenced this pull request Sep 26, 2016
…hon API.

## What changes were proposed in this pull request?
apache#14597 modified ```ChiSqSelector``` to support ```fpr``` type selector, however, it left some issue need to be addressed:
* We should allow users to set selector type explicitly rather than switching them by using different setting function, since the setting order will involves some unexpected issue. For example, if users both set ```numTopFeatures``` and ```percentile```, it will train ```kbest``` or ```percentile``` model based on the order of setting (the latter setting one will be trained). This make users confused, and we should allow users to set selector type explicitly. We handle similar issues at other place of ML code base such as ```GeneralizedLinearRegression``` and ```LogisticRegression```.
* Meanwhile, if there are more than one parameter except ```alpha``` can be set for ```fpr``` model, we can not handle it elegantly in the existing framework. And similar issues for ```kbest``` and ```percentile``` model. Setting selector type explicitly can solve this issue also.
* If setting selector type explicitly by users is allowed, we should handle param interaction such as if users set ```selectorType = percentile``` and ```alpha = 0.1```, we should notify users the parameter ```alpha``` will take no effect. We should handle complex parameter interaction checks at ```transformSchema```. (FYI apache#11620)
* We should use lower case of the selector type names to follow MLlib convention.
* Add ML Python API.

## How was this patch tested?
Unit test.

Author: Yanbo Liang <ybliang8@gmail.com>

Closes apache#15214 from yanboliang/spark-17017.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants