hist normed=True problem? #206

Closed
ddale opened this issue Jun 20, 2011 · 5 comments
Labels
status: needs clarification, status: needs confirmation

Comments

@ddale
Contributor

ddale commented Jun 20, 2011

Original report at SourceForge, opened Mon May 9 09:36:46 2011

I have a series of 900 points, all very close to 1.000 (e.g., (min(dataList), max(dataList)) = (0.99959301948547363, 1.0005509853363037)). For numBins = 200,

    pdf, bins, patches = ax.hist(dataList, numBins, normed=True)

produces a histogram with very large Y-axis values (e.g., 18892 for 99 occurrences), per the formula in the matplotlib documentation:

    (Pdb) dbin = (max(dataList)-min(dataList))/numBins
    (Pdb) dbin
    4.7898292541503906e-06

    (Pdb) 1/(len(dataList)*dbin)
    190.83702861801964
    (Pdb) 99/(len(dataList)*dbin)
    18892.865833183947

Yes, I agree the dbin is very small, but should that cause this behavior?
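The large values follow from how a normalized histogram is defined: each bar height is count / (N * bin_width), exactly as in the Pdb session above, so a bin width of a few millionths gives heights in the thousands while the total area stays 1. A minimal sketch with NumPy, assuming synthetic uniform data over roughly the same narrow interval (the reporter's dataList is not available) and using `np.histogram(..., density=True)` as the NumPy counterpart of a normalized histogram:

```python
import numpy as np

# Synthetic stand-in for the reporter's data: uniform values over roughly the
# same narrow interval around 1.0 (an assumption; the original data was lost).
rng = np.random.default_rng(0)
lo, hi = 0.99959301948547363, 1.0005509853363037
data = rng.uniform(lo, hi, size=900)

num_bins = 200
counts, edges = np.histogram(data, bins=num_bins)
density, _ = np.histogram(data, bins=num_bins, density=True)

# All widths are (nearly) equal here: (data.max() - data.min()) / num_bins.
widths = np.diff(edges)

# Each density value is count / (N * bin_width) ...
manual = counts / (len(data) * widths)

# ... so tiny bins give very large heights, yet the total area is still 1.
area = np.sum(density * widths)

print(widths[0], density.max(), area)
```

Shrinking the bin width raises every bar height in proportion, which is why the heights alone say nothing about whether the normalization is wrong; the area is the quantity to check.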

SourceForge Comments

On Mon May 9 10:33:32 2011, louking wrote:

Looking at this a bit more, I think the following:

In the case where the bin sizes are the same, the Y axis should be

n / sum(dataList)

And in the case where the bin sizes are different, the Y axis should be

(n / sum(dataList)) * weight_i

where

weight_i = dbin_i / mean(dbin)
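For comparison with the proposal above, NumPy's `density=True` handles unequal bin widths by dividing each count by N times that bin's own width, rather than by weighting against the mean width; this makes the bar areas, not the bar heights, sum to 1. A small sketch with hypothetical data and hand-picked unequal edges, for illustration only:

```python
import numpy as np

# Hypothetical data and deliberately unequal bin edges (illustration only).
data = np.array([0.1, 0.2, 0.25, 0.3, 0.6, 0.7, 0.9])
edges = np.array([0.0, 0.5, 0.75, 1.0])  # widths 0.5, 0.25, 0.25

counts, _ = np.histogram(data, bins=edges)
density, _ = np.histogram(data, bins=edges, density=True)
widths = np.diff(edges)

# Each bin is divided by N times its *own* width.
expected = counts / (counts.sum() * widths)

print(counts)  # [4 2 1]
print(density)
# The areas height * width add up to 1 (up to floating-point rounding).
print(np.sum(density * widths))
```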

SourceForge History

  • On Mon May 9 09:37:01 2011, by louking: File Added: 411137: loutest.csv-plot-probability.png
@dmcdougall
Member

I can't reproduce this at 3419eb8 with numpy version 1.6.2 and the following code:

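from pylab import *   # pylab-style session (2012-era matplotlib)
npts = 900
nbins = 200
# endpoints of the reporter's data range
b = 1.0005509853363037
a = 0.99959301948547363
# normally distributed points centred on a, standard deviation b - a
d = (b - a) * randn(npts) + a
# newer matplotlib versions replace normed=True with density=True
pdf, bins, patches = hist(d, nbins, normed=True)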

The maximum value on the y axis is about 700:

[Image: histogram picture]

Can anyone else reproduce it? Might the bug in numpy circa version 1.4 regarding normed histograms be the cause?

@pelson
Member

pelson commented Aug 27, 2012

@louking: Hopefully you are the same louking as on SourceForge. Do you remember raising this 18 months ago? Is @dmcdougall missing anything with his attempt to reproduce?

@dmcdougall: Thanks for taking the time to get a small example to reproduce. I will close this issue as fixed in a couple of weeks if we can't get in touch with @louking.

@louking

louking commented Aug 27, 2012

Yes, I am the same louking. I remember raising this.

Now that I am re-looking at this, it is possible that I made a mistake raising the issue. Unfortunately, I no longer have access to the data that produced the results, which at the time I thought were wrong.

I appreciate you taking the time to look at this.

Edited to add: I am somewhat confused about what the pdf values mean when the data range is very small, and am doing some investigation. Please keep this open until I respond further, or until two weeks have passed. Thanks.

@louking

louking commented Aug 27, 2012

I did my investigation. Since I don't have the old data, I cannot prove whether the integral was greater than 1. If you are saying that the integral of the normalized histogram for this data is 1, then I guess we can close this ticket.

@dmcdougall
Member

All the bins are the same width, so:

In [8]: h = bins[1] - bins[0]; print h
2.99437080707e-05

In [9]: area = 0

In [10]: for i in range(len(pdf)):
   ....:     area += pdf[i] * h
   ....:     

In [11]: print area
1.0

:)
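The same area check can be written without an explicit loop. A sketch assuming `pdf` and `bins` are the arrays a normalized histogram returns, recomputed here with `np.histogram(..., density=True)` on data generated the same way as the reproduction script (the seed is an arbitrary choice):

```python
import numpy as np

# Data generated like the reproduction script: normal, centred on a,
# with standard deviation b - a. The seed is arbitrary.
rng = np.random.default_rng(1)
a, b = 0.99959301948547363, 1.0005509853363037
d = (b - a) * rng.standard_normal(900) + a

pdf, bins = np.histogram(d, bins=200, density=True)

# Area under the normalized histogram: sum of height * width over all bins.
# Using np.diff(bins) also covers histograms with unequal bin widths.
area = np.sum(pdf * np.diff(bins))
print(area)
```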

efiring closed this as completed Aug 28, 2012