Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[R] Remove parameters and attributes related to ntree and rebase iterationrange #9935

Merged
merged 3 commits into from
Jan 20, 2024

Conversation

david-cortes
Copy link
Contributor

@david-cortes david-cortes commented Dec 29, 2023

ref #9810

After the introduction of newer tree modalities such as multi-quantile regression, the code for determining the number of trees in a model is no longer correct, and other sections that rely on it might produce incorrect results. I see that parameters referring to number of trees have been deprecated in favor of parameters referring to number of iterations, so I'm making the switch here.

This PR:

  • Removes the deprecated argument ntree_limit in predict.
  • Removes the 'ntree' attribute and internal accessor which are no longer needed.
  • Fixes incorrect usage of num_class to determine prediction shapes, as now there are more ways in which predict can output multi-dimensional results.
  • Changes the format of iterationrange to match with R's sequences/ranges.

I've based it off from the current last commit of previous PR #9924 which is a requisite for the changes introduced here.

I'm not sure how to make the PR show the diff w.r.t to that other PR, as it's not a branch of this repository, and I cannot open a PR here to merge to a branch on my own repository, so this PR will need to be rebased later on in order to make it mergable (I think the kind of commits that github generates will also mess up git merge later on so a rebase will anyway be needed).

@hcho3
Copy link
Collaborator

hcho3 commented Jan 3, 2024

I'm not sure how to make the PR show the diff w.r.t to that other PR, as it's not a branch of this repository, and I cannot open a PR here to merge to a branch on my own repository, so this PR will need to be rebased later on in order to make it mergable

It's fine. It's common for us to open pull requests that include the commits from another pull request. Let's try to merge #9924 soon.

@david-cortes
Copy link
Contributor Author

@trivialfis Would be ideal if you could review this PR next.

Copy link
Member

@trivialfis trivialfis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm looking into #9948 as well, would be great if there's a common reindexing function either in C or in R that handles all the translation and can be tested independently.

I find it difficult to reason that all places that need indexing are performing the translation consistently.

R-package/R/xgb.Booster.R Show resolved Hide resolved
R-package/R/xgb.Booster.R Outdated Show resolved Hide resolved
@david-cortes
Copy link
Contributor Author

I'm looking into #9948 as well, would be great if there's a common reindexing function either in C or in R that handles all the translation and can be tested independently.

I find it difficult to reason that all places that need indexing are performing the translation consistently.

Do you have a list of places where such reindexing should be applied?

As I see it, currently there is:

  • best_iteration - but this one follows base-0 in the C attribute and base-1 in the R attribute, so handling would be quite different from the others.
  • iterationrange - not too hard to do the conversion, since it always amounts to just subtracting from the initial index.
  • Booster slicing - but this one requires a very different approach that the others do not share; and if it were to be re-implemented by taking an integer array instead of start/end/step, the handling would also be completely different if we want it to follow R's idioms.
  • Class index in the linear coefficients callback (currently base-0) - but I think this one would make more sense to match with the numbers in label which are base-0.

@trivialfis
Copy link
Member

trivialfis commented Jan 15, 2024

Thank you for sharing, it might be difficult to gather everything into one place. An additional concern is the categorical data.

It's not just 0-based indexing v.s. 1-based indexing, exclusive and inclusive on the end of a range is also problematic, would be great if we could at least find a place to document all the differences so that we can lookup in the future.

@david-cortes
Copy link
Contributor Author

Thank you for sharing, it might be difficult to gather everything into one place. An additional concern is the categorical data.

It's not just 0-based indexing v.s. 1-based indexing, exclusive and inclusive on the end of a range is also problematic, would be great if we could at least find a place to document all the differences so that we can lookup in the future.

You mean in one of those .rst files from the readthedocs page for developers?

@trivialfis
Copy link
Member

You mean in one of those .rst files from the readthedocs page for developers?

sounds good!

@trivialfis trivialfis mentioned this pull request Jan 20, 2024
27 tasks
@trivialfis trivialfis merged commit c5d0608 into dmlc:master Jan 20, 2024
25 of 29 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants