Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[POC] Experimental support for l1 error. #7812

Merged
merged 127 commits into from
Apr 26, 2022
Merged

Conversation

trivialfis
Copy link
Member

@trivialfis trivialfis commented Apr 15, 2022

Support adaptive tree, a feature supported by both sklearn and lightgbm. The tree leaf is recomputed based on residue of labels and predictions after construction.

For l1 error, the optimal value is the median (50 percentile).

This is marked as experimental support for the following reasons:

  • The value is not well defined for distributed training, where we might have empty leaves for local workers. Right now I just use the original leaf value for computing the average with other workers, which might cause significant errors.
  • The GPU implementation is not ideal due to sampling support. The partition is calculated from node position, if a node doesn't have any valid sample then it's missing in the partitioner and we will have to fill it back.

Copy link
Member

@RAMitchell RAMitchell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks significantly better to me without trying to share the internal row representations inside the updaters.

auto p_begin = row_set.Data()->data();
ParallelFor(row_set.Size(), ctx->Threads(), [&](size_t i) {
auto const& node = row_set[i];
if (node.node_id < 0 || !tree[node.node_id].IsLeaf()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this actually occur?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the CPU, all nodes are being kept track of including internal nodes. For node_id < 0, it should be an internal node and we can skip the IsLeaf check, I will see if there's any edge case I missed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Turned the condition into a check.

src/tree/gpu_hist/row_partitioner.cuh Show resolved Hide resolved
});
} else {
// Avoid copying nodes by using evaluator to get leaf weight on-the-fly, this is
// useful when tree leaf is not updated after tree construction.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's worth keeping this optimisation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haven't run the full benchmark yet, copying nodes is probably fine, but not sure about the memory allocation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.

info_->num_col_,
batch_param));
dh::safe_cuda(cudaSetDevice(ctx_->gpu_id));
info_->feature_types.SetDevice(ctx_->gpu_id);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lots of unrelated changes in this PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change for device_ -> ctx_->gpu_id might not be strictly related, I changed it when I ran into a segfault during the test of gpu_id being specified to 1. I can revert those if needed, but I think these changes are easy to skip.

@trivialfis
Copy link
Member Author

trivialfis commented Apr 25, 2022

This is an initial benchmark with gbm-bench. I haven't run the benchmark with the newly added l1 error, which can use some optimizations in quantile computation and I will run the benchmark after the related work is finished.

bosch higgs years
Time AUC Time AUC MSE
CPU Hist (master) 79.08961106099741 0.6902547809021118 141.6087378029988 0.8399291373085628 30.87248679699769 79.77742340435458
CPU Hist (PR) 77.38803030599956 0.6902547809021118 137.95923503099766 0.8399291373085628 24.732420017997356 79.77742340435458
GPU Hist (master) 21.8356087760003 0.6887260447943401 23.57480316100191 0.839534647606739 10.975372369997785 80.31760322683175
GPU Hist (PR) 21.80807627499962 0.6887260388302106 24.510682308002288 0.8395346476950757 10.876273957001104 79.91012003330812

@trivialfis trivialfis merged commit fdf533f into dmlc:master Apr 26, 2022
@trivialfis trivialfis deleted the adaptive-tree branch April 26, 2022 13:41
@s-banach
Copy link

Hey @trivialfis, looks like great work!
Sorry I can't read C++ very well, so I hope I can ask for clarification here.
The 1.7.0 release notes claim you can optimize "without a valid hessian".
Does that mean the hessian is ignored completely when choosing splits?

It seems to be implicit in the xgboost paper than the hessian is supposed to be nonnegative.
I wonder if this approach would allow custom objectives where the hessian may sometimes be negative?

@trivialfis
Copy link
Member Author

@s-banach

At the moment XGBoost uses the sample weight as hessian (default to 1) for l1, and recomputes the leaf values based on input labels after growing the tree.

I wonder if this approach would allow custom objectives where the hessian may sometimes be negative

We haven't exposed the feature to custom objectives yet unless you are writing C++ extensions.

@kayhman
Copy link

kayhman commented Jan 17, 2023

Hi @trivialfis,

This is a very interesting approach. How do you update the value after the tree is built ?

The changelog mention line search, but I see no mention of that in the code. I guess that this is the method UpdateTreeLeaf of the objectives that does the job (in UpdateTreeLeafHost ?), but I don't understand what mathematical operation is done.

Can you provide more details?

Best regards,

@kayhman
Copy link

kayhman commented Jan 17, 2023

ok, I see. The line search is done thought the alpha parameter, and you use the alpha-quantile as leaf value?

And as far as I can see, alpha is always set to 0.5.

Is that right?

@trivialfis
Copy link
Member Author

For l1, the optimal is median, which is 0.5 quantile.

@kayhman
Copy link

kayhman commented Jan 22, 2023

ok, thanks for the answer. So no needs for line search in this case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants