hist normed=True problem? #206
Comments
I can't reproduce this at 3419eb8 with:
from pylab import *
npts = 900
nbins = 200
b = 1.0005509853363037
a = 0.99959301948547363
d = (b - a) * randn(npts) + a
pdf, bins, patches = hist(d, nbins, normed=True)
The maximum value on the Can anyone else? Might the bug in
@louking: Hopefully you are the same louking on sourceforge. Do you remember raising this 18 months ago? Is @dmcdougall missing anything with his attempt to reproduce? @dmcdougall: Thanks for taking the time to get a small example to reproduce. I will close this issue as fixed in a couple of weeks if we can't get in touch with @louking.
Yes, I am the same louking. I remember raising this. Now that I am looking at this again, it is possible that I made a mistake in raising the issue. Unfortunately, I do not have access to the data I used to get the results which, at the time, I thought were wrong. I appreciate you taking the time to look at this. Edited to add: I am somewhat confused about what the pdf means over a very small range, and am doing some investigation. Please keep this open until I respond further, or two weeks have passed. Thanks.
I did my investigation. Since I don't have the old data, I cannot prove whether the integral was greater than 1. If you are saying that the integral of this data == 1, then I guess we can close this ticket.
All the bins are the same width, so:
In [8]: h = bins[1] - bins[0]; print h
2.99437080707e-05
In [9]: area = 0
In [10]: for i in range(len(pdf)):
   ....:     area += pdf[i] * h
   ....:
In [11]: print area
1.0
:)
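The same area check can be written without the explicit loop. This is only a minimal sketch: it assumes `numpy.histogram` with `density=True` as a stand-in for `hist(..., normed=True)` (the `normed` keyword has since been replaced by `density`), and the seed and the name `widths` are illustrative rather than taken from the session above.

```python
import numpy as np

# Arbitrary seed, used only so the sketch is repeatable.
rng = np.random.default_rng(0)

npts = 900
nbins = 200
b = 1.0005509853363037
a = 0.99959301948547363
d = (b - a) * rng.standard_normal(npts) + a

# density=True scales each bar to count / (npts * bin_width).
pdf, bins = np.histogram(d, bins=nbins, density=True)

# Sum of height * width over all bins, i.e. the area under the histogram.
widths = np.diff(bins)
area = float(np.sum(pdf * widths))
print(area)
```

The printed area should equal 1 up to floating-point rounding, even though individual bar heights are far larger than 1.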
Original report at SourceForge, opened Mon May 9 09:36:46 2011
I have a series of 900 points, all very close to 1.000 (e.g., (min(dataList), max(dataList)) = (0.99959301948547363, 1.0005509853363037)). For numBins = 200, hist with normed=True
produces a histogram with very large Y axis numbers (e.g., 18892 for 99 occurrences), per the formula in the matplotlib documentation:
(Pdb) dbin = (max(dataList)-min(dataList))/numBins
(Pdb) dbin
4.7898292541503906e-06
(Pdb) 1/(len(dataList)*dbin)
190.83702861801964
(Pdb) 99/(len(dataList)*dbin)
18892.865833183947
Yes, I agree the dbin is very small, but should that cause this behavior?
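The arithmetic above is what a density normalization would be expected to give: each bar height is count / (N * bin width), so a bin width of a few millionths turns modest counts into heights in the thousands, while the area under the bars remains 1. The sketch below illustrates this with hypothetical data (uniform samples over the reported range, not the original dataList). It assumes `numpy.histogram` with `density=True` as the modern equivalent of `normed=True`, and the names `manual` and `fraction` are illustrative.

```python
import numpy as np

# Hypothetical stand-in data; the original dataList is not available.
rng = np.random.default_rng(1)
lo, hi = 0.99959301948547363, 1.0005509853363037
data = rng.uniform(lo, hi, size=900)
num_bins = 200

counts, edges = np.histogram(data, bins=num_bins)
density, _ = np.histogram(data, bins=num_bins, density=True)
dbin = np.diff(edges)  # per-bin widths (all equal here)

# Density height computed by hand: count / (N * bin width).
manual = counts / (len(data) * dbin)

# Fraction of points per bin: these sum to 1, but are not a density.
fraction = counts / len(data)

print(np.allclose(density, manual))
print(fraction.sum(), (density * dbin).sum())
print(density.max())
```

If the goal is a Y axis that reads as "share of points per bin", plotting `fraction` is one option; the density heights are large only because they are expressed per unit of a very narrow X range.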
SourceForge Comments
On Mon May 9 10:33:32 2011, louking wrote:
Looking at this a bit more, I think the following:
In the case where the bin sizes are the same, the Y axis should be
n / sum(dataList)
And in the case where the bin sizes are different, the Y axis should be
(n / sum(dataList)) * weight_i
where
weight_i = dbin_i / mean(dbin)
SourceForge History