Skip to content

[SPARK-10686] [ML] Add quantilesCol to AFTSurvivalRegression#8836

Closed
yanboliang wants to merge 5 commits intoapache:masterfrom
yanboliang:spark-10686
Closed

[SPARK-10686] [ML] Add quantilesCol to AFTSurvivalRegression#8836
yanboliang wants to merge 5 commits intoapache:masterfrom
yanboliang:spark-10686

Conversation

@yanboliang
Copy link
Contributor

By default quantilesCol should be empty. If quantileProbabilities is set, we should append quantiles as a new column (of type Vector).

@SparkQA
Copy link

SparkQA commented Sep 19, 2015

Test build #42711 has finished for PR 8836 at commit 1d8a682.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@mengxr
Copy link
Contributor

mengxr commented Sep 20, 2015

The changes look good to me. Just want to discuss the semantics to enable quantiles in output. We set a default value for quantilesCol, and use quantileProbabilities to control whether to output quantiles. Another issue is that predictQuantiles requires quantileProbabilities. It means in order to use predictQuantiles, we need to append a new column in transform. It is not straightforward to understand the logic here.

I would suggest making quantilesCol default to an empty string and ask users to explicitly set both quantilesCol and quantileProbabilities to enable quantile output.

@SparkQA
Copy link

SparkQA commented Sep 21, 2015

Test build #42750 has finished for PR 8836 at commit 6b86903.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hasQuantilesCol && hasQuantileProbabilities? If only one is set, I think we should make a warn log message.

@mengxr
Copy link
Contributor

mengxr commented Sep 21, 2015

@yanboliang Shall we have a test without "quantilesCol" or having "quantilesProbabilitiesbut notquantilesCol`?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The semantic:
1, Users set both quantileProbabilities and quantilesCol => output w/ quantiles column
2, Users set quantileProbabilities but not quantilesCol => print warning here, output w/o quantiles column and users can use predictQuantiles
3, Users set quantilesCol but not quantileProbabilities => print warning here, output w/o quantiles column. Users can not use predictQuantiles otherwise throw IllegalArgumentException at predictQuantiles
4, Neither quantileProbabilities nor quantilesCol is set => Users purpose to output w/o quantiles column and it will in line with their expectation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the sematic is clear to me. On the other hand, can we consider these two as a combo, i.e.
case class QuantileParams(quantileProbabilities: Array[Double], quantilesCol: String)
Then the user will have to set both items simultaneously.

Not sure it will work but try to brain storm. :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rotationsymmetry Thanks for your comments! I think we should keep the same tradition with R which have a default prediction and some other predictions could be enabled by parameter.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if users set quantilesCol but not quantileProbabilities, we should throw an exception rather than an warning message because users should expect quantilesCol in the output. Actually, I think it is more convenient to set the default value of quantileProbabilities to [0.01, 0.05, 0.1, 0.25, 0.5, 0.75, 0.9, 0.95, 0.99], and then require a non-empty array for this param. In this case, quantilesCol controls whether to output quantiles in transform.

@SparkQA
Copy link

SparkQA commented Sep 22, 2015

Test build #42813 has finished for PR 8836 at commit 34c518d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is better to put setters before fit to make the estimator and the model it produces consistent.

@SparkQA
Copy link

SparkQA commented Sep 23, 2015

Test build #42897 has finished for PR 8836 at commit 417c8cb.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yanboliang
Copy link
Contributor Author

Jenkins, test this please.

@SparkQA
Copy link

SparkQA commented Sep 23, 2015

Test build #42901 has finished for PR 8836 at commit 417c8cb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@mengxr
Copy link
Contributor

mengxr commented Sep 23, 2015

LGTM. Merged into master. Thanks!

@asfgit asfgit closed this in ce2b056 Sep 23, 2015
@yanboliang yanboliang deleted the spark-10686 branch September 28, 2015 03:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants