different result compared to numpy #61

Open
d5423197 opened this issue Jan 18, 2023 · 4 comments

d5423197 commented Jan 18, 2023

Hello there,

I am trying to use this package as a replacement for NumPy's histogram, but I get different results.

I set the range to the minimum and maximum of the input, but the result is missing the counts for the maximum value.

For example,

test_case = np.array([1, 1, 2, 2, 3, 3, 10, 10])
freq, bins = np.histogram(test_case, range(np.min(test_case), np.max(test_case + 1)))
result = histogram1d(test_case, bins=9, range=(np.min(test_case), np.max(test_case)))

d5423197 (Author) commented

Is this repo still maintained?


d5423197 commented Jan 19, 2023

With NumPy's 1D histogram function, if you pass bins as a sequence of 10 edges (as in the code below), the returned hist has length 9. But with fast-histogram's 1D function, if you set bins to 10, the returned hist has length 10, so the two calls do not line up.

import numpy as np
from fast_histogram import histogram1d

test_case = np.array([1, 1, 2, 2, 3, 3, 10, 10])
# NumPy reference: the 10 integer edges 1..10 define 9 bins
freq, bins = np.histogram(test_case, bins=range(np.min(test_case), np.max(test_case) + 1))
test = np.bincount(test_case, minlength=9)
# fast-histogram attempts with different bins/range settings
result = histogram1d(test_case, bins=10, range=(np.min(test_case), np.max(test_case)))
result_1 = histogram1d(test_case, bins=9, range=(np.min(test_case), np.max(test_case) + 1))
result_2 = histogram1d(test_case, bins=10, range=(np.min(test_case), np.max(test_case) + 1))
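For reference, the length difference can be reproduced with NumPy alone. The sketch below is only an illustration (the variable names are made up for it): an integer `bins` asks for that many bins, while a sequence is read as bin edges, so N edges give N - 1 counts.

```python
import numpy as np

data = np.array([1, 1, 2, 2, 3, 3, 10, 10])

# An integer `bins` requests that many equal-width bins,
# so the counts array has 10 entries and there are 11 edges.
counts_int, edges_int = np.histogram(data, bins=10)

# A sequence `bins` is interpreted as bin edges.
# The 10 edges 1, 2, ..., 10 describe only 9 bins.
edges = np.arange(data.min(), data.max() + 1)
counts_seq, _ = np.histogram(data, bins=edges)

print(len(counts_int), len(edges_int))  # number of bins, number of edges
print(len(edges), len(counts_seq))      # number of edges, number of bins
```

So a hist of length 9 comes from passing 10 edges, not from asking for 10 bins.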

I realized that fast-histogram treats the upper end of the range as exclusive, which is inconsistent with NumPy. Correct me if I am wrong.
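A minimal sketch of that difference, using NumPy only. `half_open_counts` is a hypothetical stand-in written for this illustration; it models a histogram whose range is the half-open interval [lo, hi) and is not fast-histogram's actual code.

```python
import numpy as np

data = np.array([1, 1, 2, 2, 3, 3, 10, 10])
lo, hi = data.min(), data.max()

# NumPy closes the last bin, so values equal to `hi` are counted.
np_counts, _ = np.histogram(data, bins=9, range=(lo, hi))


def half_open_counts(values, bins, lo, hi):
    """Hypothetical model: equal-width bins over [lo, hi), upper bound excluded."""
    values = np.asarray(values, dtype=float)
    keep = (values >= lo) & (values < hi)  # drops values equal to hi
    idx = ((values[keep] - lo) * (bins / (hi - lo))).astype(int)
    return np.bincount(idx, minlength=bins)


ho_counts = half_open_counts(data, 9, lo, hi)

print(np_counts.sum(), ho_counts.sum())  # the two 10s are missing from the second
```

The only difference between the two results is the last bin, which is where the values equal to the maximum would land.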

I have tried several combinations. result_2 is the closest, but it has length 10.

I would really like to replace NumPy's histogram with fast-histogram, but I need the same result.

astrofrog (Owner) commented

Yes this is still maintained - will respond soon!

astrofrog (Owner) commented

@d5423197 if you are trying to bin integers, I highly recommend using np.bincount. What you are seeing here is a subtle difference between NumPy and fast-histogram: a value exactly equal to the upper bound of the range is not included by fast-histogram (this is for performance). If you prefer not to use np.bincount (which should be the fastest option if you really are binning integers), another option is to add a tiny value to the upper end of the range when calling fast-histogram, e.g., instead of binning from 0 to 10 you would bin from 0 to 10 + 1e-10 or similar. The offset has to be large enough to survive float64 rounding; 10 + 1e-30 evaluates to exactly 10.0. Does this make sense?
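A rough sketch of both suggestions, using NumPy only. The variable names are illustrative, and the final `histogram1d` call is left as a comment rather than executed, so treat it as an assumption about how the nudged bound would be used.

```python
import numpy as np

data = np.array([1, 1, 2, 2, 3, 3, 10, 10])
lo, hi = int(data.min()), int(data.max())

# Option 1: np.bincount counts each non-negative integer directly.
# Slicing from `lo` drops the unused leading slots (here, the count for 0).
int_counts = np.bincount(data)[lo:]

# The same counts from np.histogram with one unit-wide bin per integer.
ref_counts, _ = np.histogram(data, bins=hi - lo + 1, range=(lo, hi + 1))

# Option 2: nudge the upper bound. In float64 an offset of 1e-30 is
# rounded away, so the nudge must be at least one representable step.
offset_lost = (hi + 1e-30) == float(hi)
upper = np.nextafter(float(hi), np.inf)  # smallest float64 above hi

print(int_counts.tolist(), ref_counts.tolist())
print(offset_lost, upper > hi)
# `upper` could then be passed as range=(lo, upper) to histogram1d (not run here).
```

For integer data the bincount route avoids the boundary question entirely, since no bin edges are involved.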
