New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
stats.effhist - estimate efficiency and uncertainties #754
Conversation
These look great and is well tested, but my only pause is that I'm not sure what we settled on for what gets in |
The binomial confidence interval is something fundamental and widely usable that ought to be in something like scipy, but isn't, so making it available in astropy.stats seems like a good idea for the time being. Apropos of that -- Is there a particular reason you're using a flat prior for the binomial confidence interval? I ask because previous discussions of a Bayesian approach to the problem seem to use a Jeffreys prior. (As a side comment, using the Wilson [1927] confidence intervals would probably be just as accurate, and wouldn't require scipy, since it's just a simple bit of algebra.) |
@eteq It may belong in the core package. But if we decide @perwin Both good questions. You're probably right that this should use a Jeffreys prior (I don't really know, but that's what Bayesian experts seem to think...). Also, the confidence interval I'm estimating is reasonably well-motivated, but seems non-standard. Perhaps I should use a more "standard" one, such as the Jeffreys interval (which would also use I think this sort of thing isn't included in |
My perspective is largely based on Brown, Cai, & Dasgupta (2001, Statistical Science); you can find a PDF of the article here: I have a slight preference for the Wilson interval mainly because it's relatively easy to compute and doesn't require any additional libraries, but if people are happier with a Bayesian derivation and don't mind requiring the scipy library, there's not much reason to object. (I have Python code for computing the Wilson interval, but I haven't bothered properly generalizing it to handle arbitrary percentages for the interval; it only works with multiples of the 68.3% interval.) Given how many astronomers are still using the "standard" Gaussian approximation (what Brown et al. refer to as the "Wald interval"), pretty much any improvement would be a good thing! It might be useful to do some testing to see if the confidence intervals vary appreciably between using the Jeffreys prior and the uniform prior (and the Wilson intervals); if the variation is slight, then in most cases people won't care which they're using, as long as it's clearly better than the Wald interval. |
I like the idea of an |
@perwin Thanks for the reference! I'll have a closer look at it. One question I have about the Jeffreys interval: it seems like for k=0 and k=n, the interval contains 1 - alpha/2 rather than 1 - alpha (alpha being, e.g. 0.317 for I like the idea of an But since no one here is arguing that, I'm going to go ahead and implement the kwarg. I think |
Just a minor comment - can we think of a better/more explicit name than |
👍 to |
"detection" might be too specific. This works for anything where you are estimating probability of an outcome as a function of some parameter (for example, the probability of passing some cut). How about cc @sosey because she mentioned completeness studies in the photometry API discussion. |
@kbarbary -- sounds like a good idea. (I agree that a single function with an option to specify how the interval is calculated is simpler and the way to go.) I'm not sure I know the proper answer to your question about the Jeffreys interval definition (but presumably you could code up both version and test which matches the Wilson interval better!). |
I've updated this PR in response to the comments. Rebuilt docs are in the same place as before: http://kbarbary.github.com/astropy/stats/index.html Two notes:
|
Desired probability content contained in confidence interval | ||
(p - perr[0], p + perr[1]). Default is 0.68269. | ||
interval : {'wilson', 'jeffreys'}, optional | ||
Formula used for confidence interval. Default is 'wilson'. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe you should add a sentence here that directs users to the binom_conf_interval
docstrings for a detailed description of what these intervals mean? When I first looked at it, I thought you hadn't documented that. I see that the link is below, but if someone reads this over without background info, they won't necessarily understand that that's where they should go for the background needed to understand.
It's fine to use Can you add a And what about adding a 'gaussian' or 'normal' to the list of |
@eteq Excellent comments, thanks. The plot directive is especially cool. I think the example is a lot better now as well. I implemented all your suggestions with the exception of the 'gaussian' / 'normal' (or 'wald') interval. I personally want to demote its use as much as possible, so I think that it should not even be an option, lest someone use it by accident. It gives nonsensical results, particularly in typical astronomy use cases: When the detection efficiency is near 0 or 1 (see binned efficiency example) the interval gives an uncertainty of 0, even when there has only been a single trial. You can argue about priors and whatnot with the Jeffreys and Wilson and other intervals, but the Wald interval is clearly wrong. In any case, it's only a single line of code
so users can compare it to |
@eteq Before this is merged, should I squash these commits into one? Will that remove the png image from the repo history? (probably doesn't matter much but...) |
@kbarbary - Good point about the commits. I would say don't squish all the commits, just remove the one that added the image in the first place. You should be able to do something like this Does anyone else (@astrofrog or @perwin) have strong opinions on including/not including a 'normal' option? I agree it gives wrong results, but I still think people will want to use it for the "what not to do" sorts of examples, as well as comparing to earlier work that does use it. And anyway, it does work fine for the large n case, right? (even if that's very rare in astronomy) An alternative might be to add another example showing why you should not use the Wald interval. If that example includes your single line of code here, then anyone who actually reads it should have no problem reproducing the same thing (in which case I agree there's less need for it as an explicit option.) |
@eteq when the efficiency is near 0 or 1, the Wald interval gives wrong results even for large n. The example of why not to use it is a good idea though; I'll do that. |
I don't have any strong feelings one way or the other about making the Wald interval an available option (as long as it's not the default!). I agree that not including it at all is potentially a good way of encouraging people not to use it -- and that mentioning/illustrating the disadvantages of the Wald interval is a good idea. As a minor comment, it might be useful to alternate "frequency" with "efficiency" in the discussion/examples for binom_conf_interval, just because people may not realize that it's the same thing. E.g., someone wanting to know what the confidence intervals for the frequency of galaxies with some property measured from their sample may find the term "detection efficiency" somewhat confusing. |
I'm reconsidering the 'wald' interval. If it isn't the default and there's a warning in the docs about why it probably shouldn't be used, then it might be fine to use it as an option. As long as we call it the 'wald' interval and not 'normal' or 'gaussian'. I don't want a user to see 'normal' and think, "I'm familiar with that; I know what it means... I'll use that." (and having it as an option in As for terminology @perwin, I was battling with this a bit. You're right that "efficiency" doesn't really apply in many potential use cases. I see how you're using "frequency" but "frequency of galaxies with some property" could mean something else, such as number per unit volume. I think the most general term is probably "proportion", but I was worried that this sounds a bit more opaque to a common user, so I tried to use both "proportion" and "efficiency" in places. Do you think |
@eteq I implemented the Wald interval and removed the image commit. Let me know what you think. Docs are getting long! http://kbarbary.github.com/astropy/stats/index.html Still to do:
|
Looks great - long docs are a good thing, especially for statistics! |
@kbarbary -- sure, "proportion" is probably just as good as "frequency" (and the Wikipedia entry on binomial statistics seems to use "proportion", so there's that...). The revised documentation for binom_conf_interval looks good! |
From the name, it's not apparent to me what |
@kbarbary, I guess I would prefer binned_proportion, since it seems more straightforwardly descriptive (my first reaction to the initial code was "I wonder what an 'efficiency histogram' is..." -- I guess this is a particular usage that makes perfect sense to you, but it is not something I've come across). I'll admit to still being a little confused as to what that function is actually doing... maybe a slightly simpler example of usage? |
When I think about it a bit more, I think a more fully descriptive name is better to remove the ambiguity of the word "proportion." Something like |
-------- | ||
Suppose we wish to estimate the detection efficiency of an astronomical | ||
source as a function of magnitude using a Monte Carlo experiment with | ||
100 trials between magnitude 20. and 30. Suppose the true 50% detection |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
misplaced period here, I think. Also, above, I think you mean "efficiency of a survey in detecting an astronomical source", right? Right now it sounds like you're estimating how good some star is at detecting things :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep. Will change.
@perwin I'll explain my use case for For supernova surveys, we often want to find the efficiency at which SNe are detected as a function of magnitude. To do this, I'd insert a bunch of "fake stars" with magnitudes spread over a wide range at the start of the data reduction pipeline. Then at the other end of the pipeline, I check whether each fake star was detected or not. So I have a list of magnitudes and a list of True/False values ("detected"/"not detected"). |
@eteq Name-wise, I think we should stick with |
@kbarbary -- Ah, OK, that makes more sense now. It wasn't quite clear to me that you were doing something like "start with a set of data points where x values are continuously distributed and y values are binomial, then bin them according to x value, then calculate proportions and confidence intervals" -- right? I could still argue that there are use cases for which the term "efficiency" seems a bit odd -- e.g., a lot of the plots in a couple of recent papers I was co-author on were just this kind of histogram w/ binomial confidence intervals, but we were plotting "fraction of galaxies in each bin with property X" (spiral morphology, detected bar, AGN spectrum) vs e.g. galaxy absolute magnitude. |
Two comments on the current help text for binned_efficiency() [as seen on the current documentation page]:
that should be "number of successes / total number", no? (Or even just "fraction of successes"?)
"Suppose we wish to estimate the detection efficiencies for astronomical sources as a function of magnitude (i.e., the probability of detecting a source given its magnitude). In a realistic case, we might prepare a large number of sources with randomly selected magnitudes, inject them into simulated images, and then record which were detected at the end of the reduction pipeline. As a toy example, we generate 100 data points with randomly selected magnitudes between 20 and 30 and "observe" them with a known detection function (here, the error function, with 50% detection probability at magnitude 25):" |
@perwin Exactly right. I should add "continuously distributed" to the description for
|
I guess I still see "efficiency" and "proportion" as fairly ambiguous words that could mean multiple things (although Although I admit this name stuff is all quibbling, as people probably need to read the docs for this one, anyway... |
Well, as @perwin's example points out, it's not always going to be describing a success/failure scenario. |
Well, to me that's still a "success"/total ratio, in the statistical sense of the word. That is, fraction with property X might be treated as the "success" rate of that property. That's actually why I suggested "bernoulli trial" before - it gets across the stats concept without necessarily tying it directly to a specific example (which is good in the docs, but not necessarily in the name). But it is becoming clear to me that this is probably just we each have slightly different hazy memories of the language used in our earliest stats textbook or something :) And that means there may not be any good answer. So having said my piece (and perhaps a bit more...), I'll leave it to you to contemplate the discussion and I'm fine with whatever you settle on. |
Thanks for all the comments both of you. I've updated docstrings according to the comments, and implemented basic tests for the wald interval. Naming: I agree @eteq there is probably not a name that satisfies: (1) unambiguous (2) completely general (3) not opaque (4) not > 80 characters :) . I changed the name to If you look at the built docs table: http://kbarbary.github.com/astropy/stats/index.html at least the descriptions for the two functions match up nicely. As far as I'm concerned this is ready to merge. |
>>> plt.errorbar(bins, p, xerr=binshw, yerr=perr, ls='none', marker='o', | ||
... label='estimate') | ||
|
||
.. plot:: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if maybe you should add a title to this plot (and perhaps the above one)? The text above this is easily confused with the previous plot. Maybe even title this one "Why the Gaussian approximation is bad" so it's obvious you're anti-recommending it?
Aside from that possible one-line change I just suggested, looks good to me. I'll give a day or so and then merge, in case anyone else wants to comment and/or you want to do something regarding the above comment. |
Looks pretty good to me... |
@eteq good idea. I added titles and also added to the discussion before the second plot example. http://kbarbary.github.com/astropy/_generated/astropy.stats.funcs.binned_binom_proportion.html |
Looks great! As I said, I'll give it a day and merge if there's nothing else. |
Whoops, it appears travis is unhappy because it doesn't have matplotlib, so it can't do the I'll have to take some time to fiddle with the travis configuration to make this work. It should be fine to just install matplotlib - if so, I'll merge this alongside a commit to add that. |
@eteq - note that if you want matplotlib to work on Travis, you'll have issues with Python 3 unless you use the latest master: matplotlib/matplotlib#1781 (comment) |
@astrofrog - I think that's not an issue because our sphinx build is on 2.7, but I'll test it on my fork before merging anything |
See https://travis-ci.org/eteq/astropy/builds/5148687, which passed. The sphinx build now takes quite some time (~25 minutes) because it now needs scipy, so I moved it higher up in the build order. So I'll merge that into master, which should close this. Thanks @kbarbary! |
…est fixes also included from eteq/effhist
This implements a function
stats.effhist()
that is very useful for estimating detection efficiency as a function of some parameter (e.g., magnitude) in Monte Carlo experiments. Importantly, reasonable uncertainties on the detection efficiency are calculated. This is done with a helper functionstats.binom_conf_interval()
.These functions both require
scipy
.Built docstrings for the two functions can be viewed here:
http://kbarbary.github.com/astropy/stats/index.html