[SPARK-10686] [ML] Add quantilesCol to AFTSurvivalRegression#8836
yanboliang wants to merge 5 commits into apache:master from
Test build #42711 has finished for PR 8836 at commit
The changes look good to me. Just want to discuss the semantics to enable quantiles in output. We set a default value for I would suggest making
…es to enable quantiles output
Test build #42750 has finished for PR 8836 at commit
hasQuantilesCol && hasQuantileProbabilities? If only one is set, I think we should log a warning.
@yanboliang Shall we have a test without quantilesCol, or one that sets only quantileProbabilities?
The semantics:
1. Users set both quantileProbabilities and quantilesCol => output w/ quantiles column.
2. Users set quantileProbabilities but not quantilesCol => print a warning here and output w/o quantiles column; users can still use predictQuantiles.
3. Users set quantilesCol but not quantileProbabilities => print a warning here and output w/o quantiles column. Users cannot use predictQuantiles either; calling it throws IllegalArgumentException.
4. Neither quantileProbabilities nor quantilesCol is set => users intend to output w/o quantiles column, which is in line with their expectation.
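The four cases above can be sketched as a small decision function. This is a Spark-free illustration, not the PR's code: "not set" is modeled with Option, and the names QuantileSemantics, resolve and requireProbabilities are hypothetical.

```scala
// Hypothetical sketch of the four-case semantics; not Spark's Params code.
// None stands for "param not set by the user".
object QuantileSemantics {
  sealed trait Outcome
  case object WithQuantilesCol extends Outcome
  case object WithoutQuantilesCol extends Outcome

  /** Returns the output mode plus the warning (if any) that would be logged. */
  def resolve(
      quantileProbabilities: Option[Array[Double]],
      quantilesCol: Option[String]): (Outcome, Option[String]) =
    (quantileProbabilities, quantilesCol) match {
      // Case 1: both set, so the quantiles column is appended.
      case (Some(_), Some(_)) => (WithQuantilesCol, None)
      // Case 2: only probabilities set; warn, predictQuantiles still usable.
      case (Some(_), None) =>
        (WithoutQuantilesCol, Some("quantilesCol is not set; use predictQuantiles instead"))
      // Case 3: only the column name set; warn, nothing to compute.
      case (None, Some(_)) =>
        (WithoutQuantilesCol, Some("quantileProbabilities is not set; quantiles column skipped"))
      // Case 4: neither set; plain output, no warning.
      case (None, None) => (WithoutQuantilesCol, None)
    }

  /** Case 3 for predictQuantiles: require throws IllegalArgumentException. */
  def requireProbabilities(quantileProbabilities: Option[Array[Double]]): Array[Double] = {
    require(quantileProbabilities.isDefined, "quantileProbabilities is not set")
    quantileProbabilities.get
  }
}
```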
I think the semantics are clear. On the other hand, can we consider these two as a combo, i.e.
case class QuantileParams(quantileProbabilities: Array[Double], quantilesCol: String)
Then the user will have to set both items simultaneously.
Not sure it will work, but just trying to brainstorm. :)
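A rough sketch of the "combo" idea, where both settings can only be set together. QuantileParams follows the case class proposed above; AFTSettings, setQuantileParams and the (0, 1) range check are hypothetical additions for illustration, not what the PR merged.

```scala
// Hypothetical sketch: bundling both settings means "set one but not the
// other" cannot happen, so no warning cases are needed.
final case class QuantileParams(quantileProbabilities: Array[Double], quantilesCol: String) {
  require(quantileProbabilities.nonEmpty, "need at least one probability")
  // Assumed validation: each probability must lie strictly inside (0, 1).
  require(quantileProbabilities.forall(p => p > 0.0 && p < 1.0),
    "probabilities must lie in (0, 1)")
  require(quantilesCol.nonEmpty, "quantilesCol must be a non-empty name")
}

final class AFTSettings {
  // None means "no quantiles column in the output".
  private var quantileParams: Option[QuantileParams] = None

  def setQuantileParams(value: QuantileParams): this.type = {
    quantileParams = Some(value)
    this
  }

  /** Columns transform would produce, given the prediction column name. */
  def outputColumns(predictionCol: String): Seq[String] =
    Seq(predictionCol) ++ quantileParams.map(_.quantilesCol).toSeq
}
```

The trade-off, as noted in the reply below, is that this departs from the R-style convention of a default prediction plus optional extras.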
@rotationsymmetry Thanks for your comments! I think we should follow the same convention as R, which has a default prediction and lets other predictions be enabled by parameters.
I think if users set quantilesCol but not quantileProbabilities, we should throw an exception rather than a warning message, because users would expect quantilesCol in the output. Actually, I think it is more convenient to set the default value of quantileProbabilities to [0.01, 0.05, 0.1, 0.25, 0.5, 0.75, 0.9, 0.95, 0.99], and then require a non-empty array for this param. In this case, quantilesCol controls whether to output quantiles in transform.
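The alternative proposed above can be sketched as follows: quantileProbabilities always has a non-empty default, so quantilesCol is the only switch. The object and method names (DefaultProbabilitiesSketch, Settings, outputsQuantiles) are hypothetical; only the default values come from the comment.

```scala
// Hypothetical sketch: a non-empty default for quantileProbabilities removes
// the "probabilities unset" cases, leaving quantilesCol as the single switch.
object DefaultProbabilitiesSketch {
  // Default values as proposed in the review comment.
  val DefaultQuantileProbabilities: Array[Double] =
    Array(0.01, 0.05, 0.1, 0.25, 0.5, 0.75, 0.9, 0.95, 0.99)

  final case class Settings(
      quantileProbabilities: Array[Double] = DefaultQuantileProbabilities,
      quantilesCol: Option[String] = None) {
    // The param must never be empty.
    require(quantileProbabilities.nonEmpty, "quantileProbabilities must be non-empty")

    /** Whether transform would append the quantiles column. */
    def outputsQuantiles: Boolean = quantilesCol.isDefined
  }
}
```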
Test build #42813 has finished for PR 8836 at commit
It is better to put setters before fit to make the estimator and the model it produces consistent.
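One way to read this suggestion: expose the quantile setters on the estimator so they are applied before fit, and have fit copy them into the model. The toy sketch below (ToyEstimator, ToyModel, QuantileSettings are all hypothetical, with training omitted) shows that the fitted model then carries the same settings as the estimator.

```scala
// Hypothetical, Spark-free sketch: settings configured on the estimator are
// copied into the fitted model, so both agree on the quantile configuration.
final case class QuantileSettings(probabilities: Array[Double], quantilesCol: Option[String])

final class ToyEstimator {
  private var probabilities: Array[Double] = Array(0.5)
  private var quantilesCol: Option[String] = None

  def setQuantileProbabilities(value: Array[Double]): this.type = { probabilities = value; this }
  def setQuantilesCol(value: String): this.type = { quantilesCol = Some(value); this }

  // Training is omitted; the point is that the model inherits the settings.
  // clone() keeps later estimator changes from leaking into the model.
  def fit(): ToyModel = new ToyModel(QuantileSettings(probabilities.clone(), quantilesCol))
}

final class ToyModel(val settings: QuantileSettings)
```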
Test build #42897 has finished for PR 8836 at commit
Jenkins, test this please.
Test build #42901 has finished for PR 8836 at commit
LGTM. Merged into master. Thanks!
By default quantilesCol should be empty. If quantileProbabilities is set, we should append quantiles as a new column (of type Vector).
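To make "append quantiles as a new column" concrete, here is a per-row sketch assuming a Weibull AFT model (the distribution AFTSurvivalRegression is based on), where lambda = exp(coefficients · features + intercept) and the p-quantile is lambda · (−log(1 − p))^scale. Array[Double] stands in for Spark's Vector, and QuantileColumnSketch, quantiles and appendQuantiles are hypothetical names.

```scala
// Hypothetical sketch assuming a Weibull AFT model:
//   lambda      = exp(coefficients . features + intercept)
//   quantile(p) = lambda * (-log(1 - p))^scale
// Array[Double] stands in for the Vector-typed quantiles column.
object QuantileColumnSketch {
  def quantiles(
      coefficients: Array[Double],
      intercept: Double,
      scale: Double,
      probabilities: Array[Double],
      features: Array[Double]): Array[Double] = {
    val margin = coefficients.zip(features).map { case (b, x) => b * x }.sum + intercept
    val lambda = math.exp(margin)
    probabilities.map(p => lambda * math.pow(-math.log(1 - p), scale))
  }

  /** Pairs each row's features with its quantiles, i.e. the appended column. */
  def appendQuantiles(
      rows: Seq[Array[Double]],
      coefficients: Array[Double],
      intercept: Double,
      scale: Double,
      probabilities: Array[Double]): Seq[(Array[Double], Array[Double])] =
    rows.map(f => (f, quantiles(coefficients, intercept, scale, probabilities, f)))
}
```

With zero coefficients, zero intercept and scale 1, the 0.5-quantile reduces to log 2, which is a quick sanity check on the formula.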