fix: handle both 1D and 2D input shapes in XGBoost custom objectives #866
Hi @Soutehkeshan, thanks so much for picking this up and submitting a fix so quickly! Really appreciated. The code change looks correct. Both On the DCO check failing: if you haven't already, you can either add a sign-off by amending your commit ( Looking forward to seeing more contributions from you!
Adds regression tests verifying both arctan_loss_multi_objective and pinball_loss_multi_objective accept 2D (n_samples, n_quantiles) inputs as passed by XGBoost 3.2, producing identical results to the flat 1D arrays used in older versions. Tests intentionally FAIL on the unfixed code and PASS after PR #866 is merged into release/v4.0.0. Closes #865 (once #866 is merged and these pass) Signed-off-by: Egor Dmitriev <egor.dmitriev@alliander.com>
## Summary

Adds regression tests verifying that `arctan_loss_multi_objective` and `pinball_loss_multi_objective` accept **2D `(n_samples, n_quantiles)` input arrays** as passed by XGBoost 3.2+, and produce results identical to the flat 1D arrays used in older versions.

## Context

XGBoost 3.2 changed the calling convention for custom objectives with multi-output trees: arrays are now passed as 2D `(n_samples, n_outputs)` instead of the flat 1D `(n_samples * n_outputs,)` used previously. This was tracked in #865 and fixed in #866.

## Relationship to #866

- These tests **intentionally FAIL** on the unfixed code (this branch's base, `release/v4.0.0` before #866).
- They will **pass** once #866 is merged.
- This PR should be merged **after** #866 to ensure CI stays green.

## Tests added

- `test_loss_fn__2d_input_matches_1d_input[pinball]`: pinball objective with 2D input
- `test_loss_fn__2d_input_matches_1d_input[arctan]`: arctan objective with 2D input
- `test_loss_fn__2d_input_with_sample_weights_matches_1d[pinball]`: pinball with 2D input + sample weights
- `test_loss_fn__2d_input_with_sample_weights_matches_1d[arctan]`: arctan with 2D input + sample weights

Closes #865 (when merged after #866)

---------

Signed-off-by: Egor Dmitriev <egor.dmitriev@alliander.com>
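The equivalence these tests check can be sketched with NumPy alone. This is an illustration under assumptions, not the test code from the PR: `pinball_gradient_sketch` is a hypothetical stand-in for the real objectives, whose exact signatures are not shown here, and the row-major flattening of the 1D input is assumed from the `(n_samples * n_outputs,)` description.

```python
import numpy as np


def pinball_gradient_sketch(y_true, y_pred, quantiles, sample_weight=None):
    """Hypothetical stand-in objective: pinball-loss gradient per (sample, quantile).

    Accepts flat 1D arrays of length n_samples * n_quantiles or
    2D arrays of shape (n_samples, n_quantiles).
    """
    quantiles = np.asarray(quantiles, dtype=float)
    n_quantiles = quantiles.size
    # Shape-agnostic normalisation: valid for both 1D and 2D input.
    y_true = np.reshape(y_true, (-1, n_quantiles))
    y_pred = np.reshape(y_pred, (-1, n_quantiles))
    # Gradient w.r.t. the prediction: -q where pred < true, (1 - q) otherwise.
    grad = np.where(y_pred < y_true, -quantiles, 1.0 - quantiles)
    if sample_weight is not None:
        # One weight per sample, broadcast across the quantile columns.
        grad = grad * np.asarray(sample_weight, dtype=float)[:, None]
    return grad.ravel()


def check_2d_matches_1d(with_weights):
    """Compare the 2D call against the flat 1D call on the same data."""
    rng = np.random.default_rng(0)
    quantiles = [0.1, 0.5, 0.9]
    n_samples = 8
    y_true_2d = rng.normal(size=(n_samples, len(quantiles)))
    y_pred_2d = rng.normal(size=(n_samples, len(quantiles)))
    weights = rng.uniform(0.5, 2.0, size=n_samples) if with_weights else None

    grad_2d = pinball_gradient_sketch(y_true_2d, y_pred_2d, quantiles, weights)
    grad_1d = pinball_gradient_sketch(
        y_true_2d.ravel(), y_pred_2d.ravel(), quantiles, weights
    )
    np.testing.assert_allclose(grad_2d, grad_1d)
    return grad_2d.shape


for flag in (False, True):
    check_2d_matches_1d(flag)
```

The two `check_2d_matches_1d` calls mirror the with and without sample-weight cases listed above; the real tests are parametrised over the pinball and arctan objectives instead.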
Hi @egordm, Thanks for your feedback. I'm not sure if sign-off is still needed considering that the PR has already been merged. If so, could you please handle it with a squash-merge on your side? Looking forward to contributing more.



Summary
Fixes compatibility with XGBoost 3.2, which changed the shape of arrays passed to custom objectives via the sklearn interface. With `multi_strategy="one_output_per_tree"`, `y_true` and `y_pred` are now passed as 2D `(n_samples, n_quantiles)` arrays instead of 1D flattened arrays of length `n_samples * n_quantiles`.

Closes #865
Root Cause
The old reshape logic in both objective functions assumed 1D input:
With 2D input, `len(y_true)` returns `n_samples` (not `n_samples * n_quantiles`), so `n_rows = n_samples // n_quantiles` is wrong, causing the reshape to fail or produce incorrect results.

The image below shows a minimal script run with XGBoost 3.1:

The image below shows the same minimal script run with XGBoost 3.2:
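Separately from the screenshots, the failure mode described under Root Cause can be reproduced with NumPy alone. The sketch below is not the original code: `old_style_reshape` is a hypothetical helper that assumes the pre-fix logic derived the row count from `len()`, as the description implies.

```python
import numpy as np

n_samples, n_quantiles = 6, 3

# The same data in the flat 1D layout (older XGBoost) and the 2D layout (XGBoost 3.2).
y_flat = np.arange(n_samples * n_quantiles, dtype=float)
y_2d = y_flat.reshape(n_samples, n_quantiles)


def old_style_reshape(arr, n_quantiles):
    # ASSUMPTION: mimics the pre-fix logic implied by the description,
    # which derives the row count from len() and so expects 1D input.
    n_rows = len(arr) // n_quantiles
    return np.reshape(arr, (n_rows, n_quantiles))


# 1D input: len() is n_samples * n_quantiles, so the row count is correct.
ok = old_style_reshape(y_flat, n_quantiles)

# 2D input: len() is n_samples, so n_rows becomes 6 // 3 = 2 and NumPy
# refuses to reshape 18 elements into shape (2, 3).
try:
    old_style_reshape(y_2d, n_quantiles)
    failed = False
except ValueError:
    failed = True

print(ok.shape, failed)
```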

Fix
Replaced with shape-agnostic normalisation in `arctan_loss_multi_objective` and `pinball_loss_multi_objective`:

`np.reshape(arr, (-1, n_quantiles))` is a no-op when the array is already `(n_samples, n_quantiles)` and correctly reshapes a 1D flat array, maintaining backward compatibility with XGBoost <3.2.

Investigation Notes
- `mean_pinball_loss`: already used `np.reshape(y_pred, [-1, n_quantiles])`, which handles both shapes correctly; no change needed
- `metrics_probabilistic.py`: no changes needed; `sample_weight_eval_set` verified working correctly with XGBoost 3.2
- `xgboost_forecaster.py`: no changes needed
- `pyproject.toml`: already at `>=3,<4` on this branch; no change needed
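The normalisation described under Fix can be sketched as follows. The helper name `as_samples_by_quantiles` is hypothetical and only wraps the `np.reshape` call quoted above; it is not claimed to match how the objectives are actually structured.

```python
import numpy as np


def as_samples_by_quantiles(arr, n_quantiles):
    """Return arr as (n_samples, n_quantiles) for flat 1D or 2D input."""
    # -1 lets NumPy infer n_samples from the total number of elements.
    return np.reshape(arr, (-1, n_quantiles))


n_samples, n_quantiles = 4, 3
# Flat layout, as passed by XGBoost < 3.2.
flat = np.arange(n_samples * n_quantiles, dtype=float)
# 2D layout, as passed by XGBoost 3.2.
two_d = flat.reshape(n_samples, n_quantiles)

from_flat = as_samples_by_quantiles(flat, n_quantiles)
from_2d = as_samples_by_quantiles(two_d, n_quantiles)

# Both layouts end up with the same shape and the same values.
assert from_flat.shape == from_2d.shape == (n_samples, n_quantiles)
assert np.array_equal(from_flat, from_2d)
```

One trade-off of this approach: `np.reshape` raises `ValueError` when the element count is not divisible by `n_quantiles`, but a 2D array with a different column count whose total size happens to be divisible is reshaped silently, so an explicit shape check may be worth adding where that matters.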