Boundary parameter handling #283

Open
1 of 3 tasks
simonbyrne opened this issue Sep 16, 2014 · 18 comments
@simonbyrne
Member

simonbyrne commented Sep 16, 2014

How should we handle parameters which lie on the boundary? e.g.

In most cases the limits end up being Dirac measures, though in some cases there can be ambiguity (e.g. Beta(0,0)).

If we do include these, we also need to decide how to handle skewness/kurtosis: presumably either NaN or throw an error.

Update:
Don't allow this for continuous distributions. Distributions that need updating:

  • Poisson
  • Geometric
  • NegativeBinomial
@andreasnoack
Member

As argued on the list, I don't see the problem in extending the Poisson distribution to include λ=0. I doubt that skewness/kurtosis will be used for anything in the degenerate case anyway so whether they return Inf, NaN or an error is less important to me. I think the limits (Inf) are nicest and that it is okay that degenerate distributions return different results for skewness/kurtosis depending on which distribution they degenerate from.

I'd say let's wait and see whether there is demand before changing Beta and Normal. Going from continuous to discrete is a bit more dramatic, so it may not be as useful as the Poisson case.

@simonbyrne
Member Author

That seems like a reasonable idea. This would also match the handling of Binomial.

@nalimilan
Member

I agree with @andreasnoack. As long as the limits are clearly defined, there's no reason to raise an error.

@spaceLem

As the one who brought it up, I'm for the change. It's a more useful result than an error (at least to me and other modellers who are likely to encounter it), and it makes sense in the limit lambda -> 0. Also it matches behaviour in Matlab, Octave, R, and the GSL (although not Scipy).
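
For reference, the limit lambda -> 0 can be written out from the standard Poisson probability mass function. This is a sketch using the textbook definition, not a statement about how the package implements it:

```latex
P(X = k \mid \lambda) = \frac{e^{-\lambda}\,\lambda^{k}}{k!}, \qquad k = 0, 1, 2, \dots

\lim_{\lambda \to 0^{+}} P(X = k \mid \lambda) =
\begin{cases}
1, & k = 0,\\
0, & k \ge 1.
\end{cases}
```

So the limiting distribution is a point mass at 0. With the convention $0^{0} = 1$, the same formula evaluated directly at $\lambda = 0$ gives the same result.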

@jiahao
Member

jiahao commented Sep 16, 2014

The mailing list thread had a question about the limiting skewness of Poisson distributions. One can do a more careful derivation, but empirically it looks like the limit is well defined as positive infinity:

julia> for i=1:15
        P = Poisson(10.0^-i)
        println(i,"\t", skewness(P))
       end
1   3.162277660168379
2   10.0
3   31.622776601683796
4   100.0
5   316.2277660168379
6   1000.0
7   3162.277660168379
8   10000.0
9   31622.776601683792
10  99999.99999999999
11  316227.76601683797
12  1.0e6
13  3.1622776601683795e6
14  1.0e7
15  3.1622776601683795e7
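
The pattern in these numbers follows from the standard closed-form moments of the Poisson distribution (mean, variance and third central moment all equal to $\lambda$). A short derivation, given here as a sketch from the textbook formulas:

```latex
\operatorname{skewness}(X)
  = \frac{\mathbb{E}\!\left[(X - \lambda)^{3}\right]}{\operatorname{Var}(X)^{3/2}}
  = \frac{\lambda}{\lambda^{3/2}}
  = \lambda^{-1/2}
  \;\xrightarrow[\lambda \to 0^{+}]{}\; +\infty
```

With $\lambda = 10^{-i}$ this gives $10^{i/2}$, which matches the printed values up to floating-point rounding (for example $i = 2$ gives 10 and $i = 3$ gives about 31.62).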

@johnmyleswhite
Member

As the person who's most worried about this proposed change, I'd like to argue for making this kind of change much more systematically. What worried me about the original proposal is that it seemed to only offer a small bit of convenience at the potential expense of formal correctness in lots of other computations.

In general, I've come to strongly prefer making decisions about core packages based on sweeping principles of design that dictate how the package should behave no matter what specific case is being considered. As @simonbyrne points out, there are many other boundary cases we should consider before deciding to allow Poisson(0).

For all of those cases, we might adopt the design principle that whenever a boundary condition exists and has a clear well-defined limit, we adopt the value at the limit as the value for that boundary condition. In particular, whenever a boundary condition is equivalent to a Dirac measure, we produce outputs equivalent to those we would produce for a hypothetical Dirac measure distribution type.

If we do adopt that kind of design principle, I'd like to make sure we apply it systematically and not wait for someone to complain about inconsistencies in how we handle different distributions.

I suspect this principle could affect a lot of other distributions, including at least:

  • Binomial with zero counts
  • Cauchy with 0 scale
  • Dirichlet with some alpha = 0
  • DiscreteUniform with lower bound = upper bound
  • Exponential with scale = 0
  • Gamma with shape = 0 and scale = 0
  • Geometric with p = 0 or p = 1
  • InverseGamma with shape = 0 and scale = 0
  • Laplace with scale = 0
  • Logistic with scale = 0
  • LogNormal with log standard deviation = 0
  • NegativeBinomial with p = 0
  • Uniform with lower bound = upper bound

So I'd say that, if we're going to embrace boundary conditions, we should really embrace them and figure out how this design principle would impact everything in Distributions.

@andreasnoack
Member

Okay, let me try to break the filibuster attempt. If we were paid to spend all our time on Distributions, I think your proposal would be reasonable. However, our resources are scarce, so we should devote them where they are most useful. I don't think that leaves much time to spend on going through all the methods of the Gumbel distribution for a zero scale parameter.

@spaceLem proposed a small change to the Poisson distribution which would make it a bit easier to use in an application and I don't believe that the change will give problems elsewhere.

A compromise could be to extend the discrete distributions only. I think it makes sense because, as argued on the list, the change from continuous to a point measure is more dramatic and, I think, less relevant.

@StefanKarpinski

Another way to look at this issue is to ask when to signal a problem for certain values of distribution parameters. Some values are all-around useless and should cause an error immediately. Others, like those being discussed here, seem to be OK or not depending on the question one then asks about the resulting distribution object. In such cases, it seems reasonable, and in line with Julia's dynamic nature, to allow construction and sensible questions and to defer errors until the wrong question is actually asked. It also seems that for a lot of these questions there's an arguably correct non-finite answer.

@StefanKarpinski

Also, there is a middle ground between handling cases in an ad hoc fashion and implementing everything at once, consistently: figure out a good principle and implement some cases, but don't try to deal with all of them right away.

@nalimilan
Member

That's what I was going to suggest. @johnmyleswhite's criteria are good, but we can wait for actual use cases to come up before implementing them. Starting with common cases is a good strategy.

@StefanKarpinski

Yeah, having a coherent policy means that any time the issue comes up, everyone knows what to do.

@lindahua
Contributor

lindahua commented Nov 8, 2014

I have no problem with allowing zero rate for Poisson. However, allowing zero scale for continuous distributions seems to be a more complex problem. Dealing with atomic distributions is nontrivial. How can you tell an infinite density with probability mass 1.0 from that with probability mass 0.5?

@lindahua
Contributor

lindahua commented Nov 8, 2014

Also, now Poisson distributions depend on Rmath. Does R support zero rate?

@andreasnoack
Member

> rpois(10, 0)
 [1] 0 0 0 0 0 0 0 0 0 0

@lindahua lindahua added this to the v0.8 milestone Jul 29, 2015
@richardreeve
Contributor

I've submitted pull request #398 to fix the Poisson(0) issue.

johnmyleswhite added a commit that referenced this issue Aug 4, 2015
Fixing Boundary parameter handling #283 for Poisson(0)
@richardreeve
Contributor

The Poisson(0) issue is now fixed since pull requests #398 and #401 have been merged.

@lindahua lindahua removed this from the v0.9 milestone Feb 4, 2017
@simonbyrne
Member Author

I would be keen to allow Normal with std dev = 0, as it comes up fairly often.

simonbyrne added a commit that referenced this issue Oct 26, 2018
Part of #283. I have hit this issue a few times: it can be very handy, especially for simulation.

See also JuliaStats/StatsFuns.jl#62, which tweaks the underlying functions to be more useful.
@itsdebartha
Contributor

itsdebartha commented Mar 6, 2024

I would very much like to incorporate Geometric with the success parameter p=1. I came across a situation in a simulation study of a response-adaptive treatment allocation and encountered some error relating to zero(p) < p < one(p) when an allocation probability becomes 1. Moreover, I think including p=1 will be generalising this distribution a bit more.

Am willing to create a PR if this seems a satisfactory addition to the people here...


10 participants