
Conversation

@cakedev0
Contributor

@cakedev0 cakedev0 commented Oct 21, 2025

Resolves #340

Specs:

  • no weights: ND support, NaNs propagate; supported methods: "linear", "inverted_cdf", "averaged_inverted_cdf"
    • adding support for omitting NaNs shouldn't be too hard (if you assume NaNs sort to the end, which is what everyone does), but I don't think it's needed for now.
  • with weights: only 1D or 2D support, propagate or omit NaNs; supported methods: "inverted_cdf", "averaged_inverted_cdf"
  • Delegates to all major supported backends when possible (numpy, cupy, torch, and jax), except dask, as it was raising errors.
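
As a reference point for the three supported methods, here is how NumPy (>= 1.22), which uses the same method names and semantics, behaves on a small example:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])

# "linear" interpolates between the two nearest order statistics.
lin = np.quantile(x, 0.5, method="linear")                 # 2.5
# "inverted_cdf" picks the smallest value whose empirical CDF >= q.
inv = np.quantile(x, 0.5, method="inverted_cdf")           # 2.0
# "averaged_inverted_cdf" averages the two candidates when the CDF
# hits q exactly at a jump.
avg = np.quantile(x, 0.5, method="averaged_inverted_cdf")  # 2.5
```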

Comparison with NumPy:

These specs match NumPy's quantile for nan_policy="propagate" and nanquantile for nan_policy="omit". The differences are:

  • using nan_policy="propagate"|"omit" (inspired by scipy) instead of two functions like numpy (quantile for nan_policy="propagate" and nanquantile for nan_policy="omit"). Why? Less code/doc duplication and clearer behavior, IMO.
  • weighted case:
    • support for "averaged_inverted_cdf" method (only "inverted_cdf" method is supported by numpy): we need this in scikit-learn.
    • only up to 2D: we don't need more for now.
  • non-weighted case:
    • fewer methods. Why? We don't need more: I've only found one call in scikit-learn that passes a method, and it's a deprecated one.
    • no support for nan_policy="omit": there are a few calls to nanpercentile in scikit-learn. It would be easy to implement, following what has been done in scipy.
    • implementation: we sort instead of relying on partitioning like numpy does. A partitioning-based implementation would be quite a bit more complex and would not benefit performance in most cases, as xpx.partition currently relies on sorting when not delegating. This could be worked on later if needed.
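
To make the sort-based approach concrete, here is a minimal 1D sketch of the "linear" method (a hypothetical helper for illustration, not the actual implementation, which is ND and array-API-agnostic):

```python
import numpy as np

def linear_quantile(x, q):
    """Minimal sort-based 'linear' quantile for 1D input (sketch only)."""
    xs = np.sort(np.asarray(x, dtype=np.float64))
    n = xs.shape[0]
    # Virtual index into the sorted data: (n - 1) * q, split into an
    # integer part and a fractional part used for interpolation.
    virtual = (n - 1) * q
    lo = int(np.floor(virtual))
    hi = min(lo + 1, n - 1)
    frac = virtual - lo
    return xs[lo] * (1 - frac) + xs[hi] * frac
```

For example, `linear_quantile([3, 1, 4, 2], 0.5)` interpolates halfway between the 2nd and 3rd order statistics, matching `np.quantile` with `method="linear"`.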

Comparison with SciPy:

The main difference is the broadcasting/reduction behavior; we aligned with numpy for this one.

Also, SciPy doesn't support weights.

The implementation for the non-weighted case is more or less copy-pasted from SciPy, except that I've only kept support for three methods.

Future use of xpx.quantile in scikit-learn

These specs should be enough to replace sklearn.utils.stats._weighted_percentile and most uses of np.quantile/np.percentile when rewriting numpy-based functions to support the Array API standard.

Note that the implementation for the weighted case differs a bit from sklearn's _weighted_percentile: it's mostly the same approach (sort the weights along a and compute a cumulative sum), but they differ in how edge cases are handled (mostly null weights). I believe my implementation is easier to read and to get right, and equivalent in performance (dominated by argsort for large inputs).
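
For illustration, the "sort and cumulative sum" approach can be sketched as follows for the 1D "inverted_cdf" case (a hypothetical, simplified helper, not the PR's actual code):

```python
import numpy as np

def weighted_quantile(a, q, weights):
    """1D weighted 'inverted_cdf' quantile sketch: sort the weights along
    a, take the cumulative weight, and return the first value whose
    cumulative weight reaches q * total_weight."""
    a = np.asarray(a, dtype=np.float64)
    w = np.asarray(weights, dtype=np.float64)
    order = np.argsort(a)  # dominates the cost for large inputs
    a_sorted, w_sorted = a[order], w[order]
    cum_w = np.cumsum(w_sorted)
    # First index where the weighted CDF reaches the target mass;
    # values with zero weight contribute no mass and are skipped over.
    idx = np.searchsorted(cum_w, q * cum_w[-1], side="left")
    return a_sorted[idx]
```

With uniform weights this reduces to the unweighted "inverted_cdf" method, and a null weight simply removes that value's mass from the CDF.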


Still TODO:

  • decide whether to support dask. I don't think we should: it's very slow and doesn't match the purpose of dask as mentioned here. Supporting it would also slow down the CI quite a bit.
  • read scikit-learn's tests for _weighted_percentile to see if there is something I missed in my tests

@cakedev0 cakedev0 marked this pull request as draft October 21, 2025 10:05
@lucascolley lucascolley changed the title FEA: Add quantile function - method="linear", no weights ENH: add quantile function - method="linear", no weights Oct 21, 2025
@lucascolley lucascolley added enhancement New feature or request new function labels Oct 21, 2025
@lucascolley lucascolley mentioned this pull request Oct 21, 2025
@lucyleeow

From scipy/scipy#23832

There is a slight difference between NumPy gufunc and scipy.stats rules that has to do with prepending 1s to the shapes when the arguments have different dimensionality. Specifically, in scipy.stats, axis identifies the core dimension after 1s are prepended to match the dimensionalities.

I am assuming all delegate packages do what numpy does, in terms of broadcasting? If that is the case we should probably follow unless we think scipy has better rules?
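
For reference, NumPy's gufunc-style rule (which the delegating backends are assumed to follow) prepends the shape of q to the output and removes the reduced axis:

```python
import numpy as np

x = np.arange(12, dtype=np.float64).reshape(3, 4)
q = np.array([0.25, 0.75])

out = np.quantile(x, q, axis=1)
# The q axis is prepended and the reduced axis (length 4) is removed,
# giving shape (2, 3) rather than (3, 2).
```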

@cakedev0 cakedev0 changed the title ENH: add quantile function - method="linear", no weights ENH: add quantile function with weights support Oct 23, 2025
@cakedev0
Contributor Author

cakedev0 commented Oct 23, 2025

This PR is now ~95% finished, so I'm marking it ready for review. I've updated the PR description.

@cakedev0 cakedev0 marked this pull request as ready for review October 23, 2025 15:19
raise ValueError(msg)
else:
if method not in {"inverted_cdf", "averaged_inverted_cdf"}:
msg = f"`method` '{method}' not supported with weights."


only methods x/y support weights?


methods = {"linear", "inverted_cdf", "averaged_inverted_cdf"}
if method not in methods:
msg = f"`method` must be one of {methods}"


Sort methods to get a deterministic output?

Contributor Author


I think it's deterministic already. But do you mean declaring methods in the sorted order?

Like this: methods = {"averaged_inverted_cdf", "inverted_cdf", "linear"}

raise ValueError(msg)
nan_policies = {"propagate", "omit"}
if nan_policy not in nan_policies:
msg = f"`nan_policy` must be one of {nan_policies}"


ditto

@lucascolley
Member

@cakedev0 would you be able to fill out the PR description a little more to justify API decisions, primarily comparing how the implementation compares to what is available in NumPy and SciPy respectively?

Is there a link to a PR which would use this in sklearn, or are we not at that stage yet?

@cakedev0
Contributor Author

Added some comparisons with numpy/scipy in the PR description.

Is there a link to a PR which would use this in sklearn, or are we not at that stage yet?

We are not at that stage yet, but I would be happy to open an "experiment PR" where I replace most calls to np.percentile/np.quantile/_weighted_percentile with calls to xpx.quantile and check that the tests pass. This would help a lot in building confidence in this PR (both the API decisions and the implementation correctness).

@lucascolley
Member

Thanks, yeah, that sounds like a good idea.
