Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[R] Use type argument to control prediction types #7947

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

david-cortes
Copy link
Contributor

ref #7906

In R packages for statistical modeling, the predict functions for model objects tend to allow switching between different prediction types such as probabilities or classes through a parameter type (for example, base R's predict.glm does it like that).

xgboost offers several mutually exclusive parameters to control prediction types (outputmargin, precontrib, predinteraction, predleaf). This PR merges them all into a single type parameter in line with base R and other popular packages, adding an option to predict classes for classification objectives.

Note however that objective multi:softmax is problematic in this scenario, since it doesn't allow switching between probabilities and classes on the fly, so an exception about it is mentioned in the docs.

Right now the code logic is to check the objective and determine if and how it should determine the predicted class from the outputs, but:

  • I wasn't able to find any C-level function to extract the booster's objective, and am not sure that the way is done in this PR is ideal.
  • It currently lists a set of objectives which allow class prediction from the original output, but I think a better approach would be to check if the objective starts with binary or multi - problem with such approach right now is that multi:softmax doesn't output probabilities.

Disclaimer: this is likely to cause several breakages in R packages that depend on xgboost.

@trivialfis
Copy link
Member

Thank you for the work on aligning XGBoost with the R ecosystem. I will take a deeper look later to see if it's possible to keep the compatibility with some deprecation warnings. Otherwise we will have to run a reverse dependency check and talk to other projects before cran submission for the next release.

@trivialfis
Copy link
Member

@hetong007 Would you great if you can help review the changes.

@hetong007
Copy link
Member

hetong007 commented May 30, 2022

In terms of the potential breaks, may I suggest that (at least in the next release) we keep both the new argument type and the old set of arguments? In this case, users can work with predict in either way. Just when users predict with the old arguments, predict throws out a deprecation warning.

Moreover, by setting a higher priority to type we resolve the case where the new and the old are conflicting.

@david-cortes
Copy link
Contributor Author

Added deprecation warnings and acceptance of the old argument types.


for (arg in names(old_args_for_type)) {
if (arg %in% extra_argnames) {
warning(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My only concern with warning is that it might break CRAN checks of reverse dependencies. We need to talk to their maintainers before pushing this change to CRAN.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It most definitly will. Good thing is, next version is going to be a 2.0, which signals a major release and thus potentially breaking changes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In concept for general development this is true, while for CRAN it still regardlessly breaks reverse dependencies and got Ripleyed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing the interface of predict() will definitively cause chaos, not only on CRAN.

@david-cortes
Copy link
Contributor Author

Renamed the types to be more in line with base R.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants