Closes #2878: Add groupby for floats #2884

Merged 1 commit on Dec 14, 2023

@stress-tess stress-tess commented Dec 14, 2023

This PR (closes #2878) adds groupby for floats by reinterpreting their bits as a uint (using transmute) and using the resulting uint pdarray for the groupby.
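
To illustrate the idea outside of Arkouda, here is a minimal standard-library sketch of grouping floats by their bit patterns. It is not the PR's implementation (which uses transmute on the server side); the helper names `float_to_bits`, `bits_to_float`, and `groupby_count` are hypothetical and exist only for this example.

```python
import struct
from collections import Counter


def float_to_bits(x: float) -> int:
    """Reinterpret the 8 bytes of a float64 as an unsigned 64-bit integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def bits_to_float(b: int) -> float:
    """Inverse of float_to_bits: reinterpret a uint64 bit pattern as a float64."""
    return struct.unpack("<d", struct.pack("<Q", b))[0]


def groupby_count(values):
    """Group floats by bit pattern; return (unique keys, count per key)."""
    # Grouping happens entirely on the integer bit patterns.
    counts = Counter(float_to_bits(v) for v in values)
    # Sort the integer keys, then reinterpret them back to floats for display.
    ordered = sorted(counts)
    return [bits_to_float(b) for b in ordered], [counts[b] for b in ordered]


keys, counts = groupby_count([1.5, 2.5, 1.5, 2.5, 2.5, 0.25])
print(keys, counts)
```

Two caveats apply to this sketch specifically: values are grouped by exact bit pattern, so `0.0` and `-0.0` land in different groups, and sorting the integer keys matches numeric order only for non-negative floats. Whether and how the PR handles those cases is not stated here.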

Verification that groupby and related methods work on floats as expected:

>>> select_from = ak.linspace(2**65, 2**66, 10)
>>> idx = ak.randint(1, 10, 100)
>>> group_on = select_from[idx]
>>> ak.GroupBy(group_on).count()
(array([4.099276460824345e+19 4.5092041069067796e+19 4.9191317529892135e+19 5.3290593990716482e+19 5.7389870451540828e+19 6.1489146912365167e+19 6.5588423373189513e+19 6.968769983401386e+19 73786976294838206464.00000000000000000]),
 array([9 10 14 10 11 6 17 11 12]))

>>> ak.GroupBy([group_on,group_on]).count()
((array([4.099276460824345e+19 4.5092041069067796e+19 4.9191317529892135e+19 5.3290593990716482e+19 5.7389870451540828e+19 6.1489146912365167e+19 6.5588423373189513e+19 6.968769983401386e+19 73786976294838206464.00000000000000000]),
  array([4.099276460824345e+19 4.5092041069067796e+19 4.9191317529892135e+19 5.3290593990716482e+19 5.7389870451540828e+19 6.1489146912365167e+19 6.5588423373189513e+19 6.968769983401386e+19 73786976294838206464.00000000000000000])),
 array([9 10 14 10 11 6 17 11 12]))

>>> ak.sort(group_on)
array([4.099276460824345e+19 4.099276460824345e+19 4.099276460824345e+19 ... 73786976294838206464.00000000000000000 73786976294838206464.00000000000000000 73786976294838206464.00000000000000000])

>>> ak.argsort(group_on)
array([2 14 16 ... 42 67 71])

>>> ak.coargsort([group_on,group_on])
array([2 14 16 ... 42 67 71])

>>> ak.unique(group_on)
array([4.099276460824345e+19 4.5092041069067796e+19 4.9191317529892135e+19 5.3290593990716482e+19 5.7389870451540828e+19 6.1489146912365167e+19 6.5588423373189513e+19 6.968769983401386e+19 73786976294838206464.00000000000000000])

>>> ak.GroupBy(group_on).unique_keys
array([4.099276460824345e+19 4.5092041069067796e+19 4.9191317529892135e+19 5.3290593990716482e+19 5.7389870451540828e+19 6.1489146912365167e+19 6.5588423373189513e+19 6.968769983401386e+19 73786976294838206464.00000000000000000])

This PR (closes Bears-R-Us#2878) adds groupby for floats by reinterpreting their bits as a uint and running the groupby on that
@stress-tess stress-tess added this pull request to the merge queue Dec 14, 2023
Merged via the queue into Bears-R-Us:master with commit b761a88 Dec 14, 2023
12 checks passed
@stress-tess stress-tess deleted the 2878_float_groupby branch January 4, 2024 15:41
Successfully merging this pull request may close these issues.

Add groupby for floats