Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support hist in the partition builder under column split #9120

Merged
merged 4 commits into from May 10, 2023

Conversation

rongou
Copy link
Contributor

@rongou rongou commented May 4, 2023

First step in supporting column split for hist.

@rongou
Copy link
Contributor Author

rongou commented May 4, 2023

@trivialfis

@@ -218,7 +232,27 @@ class PartitionBuilder {
}
}
} else {
LOG(FATAL) << "Column data split is only supported for the `approx` tree method";
auto pred_hist = [&](auto ridx, auto bin_id) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logically, is there any difference from

auto pred_hist = [&](auto ridx, auto bin_id) {
?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes they are the same, but since it's a closure, can't be easily reused. Maybe we should extract into a standalone function?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The partitioner is becoming less and less manageable now. I don't have any good suggestions on how to refactor it at the moment and can approve the PR as-is. But would be great if there's a way to simplify/modularize it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be helpful to separate out the column-split logic into its own class?

Copy link
Member

@trivialfis trivialfis May 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe next time. Currently, we have total 8 combinations for tree_method, categorical, column-split, along with these, we also need to take multi-output into account. After these combinations, we have type dispatching for histogram bins and dense/sparse/dense-with-missing. So ... how many cases there are?

I'm trying to think of a way to make the code easier to understand and write a test suite specifically for partition builder.

@rongou
Copy link
Contributor Author

rongou commented May 8, 2023

@trivialfis looks like the CI failure is unrelated. Can this be merged now? Thanks!

@trivialfis
Copy link
Member

I have restarted all the tests.

@trivialfis trivialfis merged commit 603f8ce into dmlc:master May 10, 2023
24 checks passed
@rongou rongou deleted the colsplit-hist branch September 25, 2023 16:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants