Incorrect histogram_ bounds check #163

Closed
GregorySchwartz opened this issue Jun 11, 2020 · 8 comments

Comments

@GregorySchwartz

The data is too large for me to narrow down, but I can try. However, I get

/Data/Vector/Generic/Mutable.hs:697 (read): index out of bounds (10,10)

with 10 bins for data ranging from 0.0 to 747.0564541606117, using a custom range set to exactly those values (the minimum and maximum of the list). Is there a rounding issue here?

@GregorySchwartz
Author

Using ceiling for the upper bounds resolves the issue, so there must be something wrong with the calculation of the last bin.

@Shimuuar
Collaborator

Floating point strikes again. Here is a reproducer:

> (\hi -> histogram_ 10 0 hi (U.fromList [hi::Double]) :: U.Vector Double) 747.0564541606117
*** Exception: ./Data/Vector/Generic/Mutable.hs:697 (read): index out of bounds (10,10)

The problem is that when the upper limit of the histogram is set to the maximum value of the sample, that value can land in bin N+1, which is out of range. I'm not sure how to fix this.
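To make the failure mode concrete, here is a minimal sketch of an equal-width bin index calculation. The name `naiveBinIndex` is hypothetical and this is not the library's code; it only illustrates why a sample equal to the upper limit ends up one past the last bin.

```haskell
-- Hypothetical helper, not the library's implementation: map a sample to a
-- bin index by splitting the range [lo, hi] into numBins equal-width bins.
naiveBinIndex :: Int -> Double -> Double -> Double -> Int
naiveBinIndex numBins lo hi x = truncate ((x - lo) / d)
  where
    d = (hi - lo) / fromIntegral numBins  -- width of a single bin
```

Valid indices are 0 to numBins - 1, but `naiveBinIndex 4 0 8 8` evaluates to 4. Any fix therefore has to ensure that x == hi maps to the last bin without also accepting values that are truly outside the range.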

@GregorySchwartz
Author

If it's just a floating point issue, can we assume the value always lands in bin N+1 in this case? If so, can we just clamp it to the maximum number of bins?

@Shimuuar
Collaborator

Not quite. histogram_ is underspecified for out-of-range inputs. What should it do in the following case?

histogram_ 10 0 1 (U.fromList [2 :: Double]) :: U.Vector Int

In histogram-fill I had special underflow/overflow bins. Here the original semantics should be kept: anything outside the specified range should throw an exception. The only question is how to calculate the bins.

@GregorySchwartz
Author

How about clamping if the value is within floating point precision error of the upper bound?

@Shimuuar
Collaborator

I think that's what has been tried. It didn't quite work out:

https://github.com/bos/statistics/blob/6aedd2dd7c595b308c4a005fec96029fd6df3dbe/Statistics/Sample/Histogram.hs#L78

I think it's viable but requires a very careful implementation.

@Shimuuar
Collaborator

Not to mention that this approach is plain wrong. It pretends to work for any RealFrac, but the constant is specific to Double.
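A small sketch of why a Double-specific constant does not generalise. It assumes the linked line widens the range by a factor of (1 + realToFrac m_epsilon); the names `epsDouble` and `widenAs` are hypothetical, and the epsilon value is written out locally so the snippet has no dependencies.

```haskell
-- Machine epsilon for Double (2^-52), written out so the sketch is standalone.
epsDouble :: Double
epsDouble = 2.220446049250313e-16

-- Hypothetical stand-in for the range widening: polymorphic in the sample
-- type, but the widening factor is always derived from the Double epsilon.
widenAs :: RealFrac a => a -> a
widenAs w = w * (1 + realToFrac epsDouble)
```

For Double, `widenAs 1` is strictly greater than 1. For Float, 1 + 2^-52 rounds back to 1, so `widenAs (1 :: Float) == 1` and the widening silently does nothing.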

@GregorySchwartz
Author

GregorySchwartz commented Jun 15, 2020

What about special-casing the final bin when the element is equal to the upper bound?
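One way this proposal could look, as a sketch under stated assumptions rather than a patch for the library: reject anything outside [lo, hi] first, then clamp the computed index to the last bin. The name `safeBinIndex` and the Maybe return type are hypothetical; a caller that wants the original semantics can turn Nothing into an exception.

```haskell
-- Hypothetical helper: Nothing for samples outside [lo, hi] (or for a
-- degenerate range), otherwise a valid index between 0 and numBins - 1.
safeBinIndex :: Int -> Double -> Double -> Double -> Maybe Int
safeBinIndex numBins lo hi x
  | numBins <= 0 || hi <= lo = Nothing  -- no meaningful bins to fill
  | isNaN x                  = Nothing  -- NaN is never in range
  | x < lo || x > hi         = Nothing  -- genuinely out of range
  | otherwise                = Just (min (numBins - 1) (truncate ((x - lo) / d)))
  where
    d = (hi - lo) / fromIntegral numBins  -- width of a single bin
```

The sketch clamps instead of testing x == hi only, because a value just below hi may also round up to index numBins after the division. Since the range check runs before the clamp, a call like `safeBinIndex 10 0 1 2` still yields Nothing instead of being folded into the last bin.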
