fix ordering of quantiles in describe. #4647
Previously, dataframe.describe relied on the built-in set to order percentiles, but a set has no guaranteed iteration order. This led to unpredictably wrong results. For example, percentiles=[0.25, 0.5, 0.75, 0.99] was reordered by set to [0.25, 0.5, 0.99, 0.75], which broke dataframe.quantiles, which assumes quantiles are sorted by value.
After this patch the built-in sorted will be used to sort quantiles.
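To illustrate the ordering fix, here is a minimal sketch in plain Python. The helper name `normalize_percentiles` is hypothetical and is not the code in this patch; it only shows the general idea of deduplicating first and sorting last, so that the final order no longer depends on set iteration.

```python
def normalize_percentiles(percentiles):
    """Hypothetical helper: ascending, de-duplicated percentiles including the median."""
    # Deduplicate with set() first, then sort. sorted() always returns a list
    # in ascending order, whereas iterating a set has no guaranteed order.
    return sorted(set(percentiles + [0.5]))


result = normalize_percentiles([0.25, 0.5, 0.75, 0.99])
print(result)  # [0.25, 0.5, 0.75, 0.99]
```

Calling `sorted` as the outermost step is the design point: wrapping a sorted list in `set` afterwards would discard the ordering again.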
Mar 29, 2019
Appears to fail on flake8 linting:
Yesterday's patch, #4650, should help with this problem, but this patch is still important. Currently in
percentiles = list(set(sorted(percentiles + [0.5])))
But look what happens if
import numpy as np
import dask.array
import dask.dataframe

A = dask.array.arange(101)
s = dask.dataframe.from_dask_array(A)
s.describe(percentiles=np.array([0.25])).compute()
It calculated the 75th percentile instead of the 25th because NumPy added 0.5 to each element of the array (np.array([0.25]) + [0.5] gives array([0.75])) instead of concatenating.
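The difference between the two `+` behaviours can be reproduced without dask. The sketch below assumes only NumPy; the helper `with_median` is hypothetical and is one possible way to normalize the input, not necessarily what dask does.

```python
import numpy as np

# list + list concatenates
as_list = [0.25] + [0.5]
# ndarray + list broadcasts and adds elementwise
as_array = np.array([0.25]) + [0.5]

print(as_list)   # [0.25, 0.5]
print(as_array)  # [0.75]


def with_median(percentiles):
    """Hypothetical helper: accept a list or 1-d ndarray, return sorted percentiles with 0.5."""
    # Convert to a plain Python list first so that + concatenates
    # instead of broadcasting.
    values = np.asarray(percentiles, dtype=float).tolist()
    return sorted(set(values + [0.5]))


safe = with_median(np.array([0.25]))
print(safe)  # [0.25, 0.5]
```

Converting to a list before appending the median keeps the behaviour identical for lists, tuples, and 1-d arrays.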
I'm off on vacation without a computer. Back in a week.
Hi, I'm back.
As a user, I'm certainly surprised whenever a function in the SciPy ecosystem doesn't accept a 1-d ndarray when a 1-d container is needed, but I admit to having had to google "list-like." Is there a tight definition for this in dask? It turns out Pandas has a function
It looks new to me. I agree with Martin's assessment that it's probably due to the 1.1.0 release. I'm slammed today. Does anyone else have time to resolve this?…
On Tue, Apr 9, 2019 at 1:10 PM Martin Durant wrote: (Bokeh 1.1.0 was released today, with a lot of layout refactoring)