-
-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Knuth’s rule fails with simple and small array (eats up system's memory) #11357
Comments
Did you see this error? I see it when I run with 4.3.dev:
|
Nope, no error no warnings. Is that branch accessible? I can not see it here |
You can get 4.3.dev by installing the development version from astropy/astropy/stats/histogram.py Line 16 in 6fd98e5
|
I think I tracked it down to here: astropy/astropy/stats/histogram.py Line 328 in 6fd98e5
But I don't know how to fix it. >>> import numpy as np
>>> from scipy import optimize
>>> from astropy.stats.histogram import _KnuthF, freedman_bin_width
>>> x = np.array([0.05555556, 0, 0, 0, 0, 1, 0, 0, 0, 0.5])
>>> a = np.asarray(x).ravel()
>>> dx0, bins0 = freedman_bin_width(a, True)
>>> dx0
0.03867991004116571
>>> bins0
array([0. , 0.03867991, 0.07735982, 0.11603973, 0.15471964,
0.19339955, 0.23207946, 0.27075937, 0.30943928, 0.34811919,
0.3867991 , 0.42547901, 0.46415892, 0.50283883, 0.54151874,
0.58019865, 0.61887856, 0.65755847, 0.69623838, 0.73491829,
0.7735982 , 0.81227811, 0.85095802, 0.88963793, 0.92831784,
0.96699775, 1.00567766])
>>> len(bins0)
27
>>> knuthF = _KnuthF(a)
>>> zzz = optimize.fmin(knuthF, len(bins0), disp=True) # This takes a long time
Optimization terminated successfully.
Current function value: -11.814026
Iterations: 63
Function evaluations: 153
>>> zzz
array([45298336.05000024])
>>> M = zzz[0]
>>> bins = knuthF.bins(M)
>>> len(bins) # over 42 million elements
45298337 |
It looks like something in |
An easy (hacky) fix would be to introduce an error when the number of bins gets too large here astropy/astropy/stats/histogram.py Line 399 in d971b1c
Something like:
|
Why not do the check here (after this line)? astropy/astropy/stats/histogram.py Line 328 in d971b1c
Then you can actually have |
Because the issue happens inside the |
The issue is that for this particular instance of |
Since this log-likelihood function is one-dimensional, it would be more sensible to just evaluate it in a bounded grid of number of bins and then pick the value of |
@mirca could you write a quick example of how your fix would work? |
@Gabriel-p The solution I propose is to replace astropy/astropy/stats/histogram.py Line 328 in d971b1c
by an integer grid search over knuthF. |
@mirca sounds reasonable, but I'm worried that it could slow down the evaluation. Over what domain do you propose to run the grid search? From 1 to 1000000? Do you propose to use |
@Gabriel-p Even if that is case, I think it's better than the current code which tries to optimize a highly nonconvex function that can possibly have no minimum. A grid search with a sensible choice for the upper-bound on the number of bins seems a reasonable solution. What do you think? PS: The max number of bins could even be an input parameter with some sensible default. |
Ok, I've been running some tests with N=50 N=500 Two things are apparent in these simple tests:
What do you guys think?
|
Description
The
knuth_bin_width
is not able to handle a small and simple array.Expected behavior
A histogram should be generated, or an error shown explaining why it was not possible to obtain it.
Actual behavior
The function starts to gobble up the system's memory.
Steps to Reproduce
System Details
The text was updated successfully, but these errors were encountered: