-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Multimethods for statistical functions #70
Multimethods for statistical functions #70
Conversation
That's a bug. We should probably add that.
Just do
That seems perfect to me. |
I've implemented the default for
If this is okay I will implement other defaults similarly. |
Unfortunately, not. |
Not entirely sure how to do the defaults. I'm guessing that they have to be implemented with respect to some other method since item assignment is undesirable. Regarding the traversing of the dimensions, I think it could be done moving the given axes with |
In general, I'd follow the following pattern:
For median, sort and then apply the selection. For mean, it's just the For For the |
When you mention reshape with |
That's correct. |
The last commit fixes this.
This was added as well. |
@hameerabbasi The last commit refactors The implementation is also a mashup of these two functions. Also, I think eventually the if/else part can be refactored further into a helper function that works for other reduce methods such as If this seems alright to you I can start implementing the other defaults. |
LGTM so far, just one minor comment. |
2d2ca56
to
4d87111
Compare
91b7cfa
to
3b39aa7
Compare
There's a few issues with the defaults:
In [2]: a = onp.asarray([[10, onp.nan, 4], [3, 2, 1]])
In [3]: onp.sort(a, axis=1)
Out[3]:
array([[ 4., 10., nan],
[ 1., 2., 3.]])
In [4]: onp.median(a, axis=1)
Out[4]: array([nan, 2.]) In this example, the
|
For 1: Use |
The default's results are commented: In [2]: a = onp.asarray([[1, 2], [3, 4]])
In [3]: onp.var(a, ddof=1)
Out[3]: 1.6666666666666667 # -1.1111111111111125
In [4]: onp.var(a, axis=0, ddof=1)
Out[4]: array([2., 2.]) # array([ -6. -16.])
In [5]: onp.var(a, axis=1, ddof=1)
Out[5]: array([0.5, 0.5]) # array([ -4. -24.]) |
Can you merge |
Is anything remaining here? |
No, I think I added all that I wanted. You can merge if you think everything is okay. 😄 |
dims[ax] = 1 | ||
dims = tuple(dims) | ||
|
||
a = transpose(a, unselected_axis + axis) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you transpose and call functions with axis=-1
when you could just call the basic functions with axis=axis
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggested that during the meeting to make the implementation simpler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I'm not mistaken it was necessary for median
's default which led me to using it on the other reduce methods as well. But now that you point that out it's possible that the only method that needs it is median
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll remove _reduce
from the methods where it's not needed.
unumpy/_multimethods.py
Outdated
mask = any(isnan(a), axis=-1).reshape(a.shape[0:-1] + (1,)) | ||
a = where(mask, nan, a) | ||
|
||
a = sort(a, axis=-1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason not to use partition
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess it's mostly for simplicity. Is partition
more efficient?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be slightly more efficient as it doesn't completely sort the array.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I replaced sort
with partition
as you suggested.
unumpy/_multimethods.py
Outdated
N = 0 | ||
for ax in axis: | ||
N += a.shape[ax] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
N = 0 | |
for ax in axis: | |
N += a.shape[ax] | |
N = 1 | |
for ax in axis: | |
N *= a.shape[ax] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, you might want a _prod
helper so you can do:
N = _prod(a.shape[ax] for ax in axis)
Since this comes up in a few different places.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is fixed now.
unumpy/_multimethods.py
Outdated
N = 0 | ||
for ax in axis: | ||
N += a.shape[ax] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same issue here.
I think this can be merged now. |
Thanks for the excellent work, @joaosferreira! |
This pull request adds multimethods for statistical functions. The multimethods added are the following:
Order statistics
percentile
nanpercentile
quantile
nanquantile
Averages and variances
median
average
mean
nanmedian
nanmean
nanstd
nanvar
Correlating
corrcoef
correlate
cov
Histograms
histogram
histogram2d
histogramdd
bincount
histogram_bin_edges
digitize
A few things to discuss:
dtype
not being dispatched instd
andvar
? Other multimethods that use_reduce_argreplacer
also don't dispatchdtype
.min
andmax
seem to be equal toamin
andamax
that this PR intends to add:median
that reduce array slices along an axis might be the easier ones for now. As I understand these require a for loop over the given axis, so in terms of complexity it would be O(n) where n is the length of that dimension.