Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ajr/negativebinomialvariants #1536

Open
wants to merge 55 commits into
base: master
Choose a base branch
from

Conversation

andrewjradcliffe
Copy link

This introduces 3 new distribution types, corresponding to
re-parameterizations of the NegativeBinomial distribution as

  1. location-scale Negative binomial distribution (alternative paramerization), Stan
  2. loglocation-scale Negative binomial distribution (log alternative parameterization), Stan
  3. shape-scale Negative binomial distribution, Bayesian Data Analysis (3rd edition), Appendix A

These have a number of uses. In particular, 1. and 2. emphasize the
that the negative binomial distribution is a robust form of the Poisson,
which has noted utility for negative binomial regression involving
offsets (a model frequently employed in epidemiology, failure
modeling). The shape-scale parameterization confers superior
numerical stability, which can be useful when working with rare
events (particularly when one wishes to draw samples).

In particular, for rare events, the alternative parameterizations
are robust in the presence of values which might otherwise suffer
from floating point roundoff using the NegativeBinomial.
This can be demonstrated by considering the conversion of a
location-scale NegativeBinomial2 to NegativeBinomial

using Plots
gr(size=(600,400))
f(μ, ϕ) = ϕ / (μ + ϕ)
p1 = contour(1e-16:5e-17:1e-15, 1.0:0.01:10.0, (x, y) -> log(f(x, y)), ylabel="ϕ", xlabel="μ", right_margin=40*Plots.px);
p2 = heatmap(1e-16:5e-17:1e-15, 1.0:0.01:10.0, (x, y) -> log(f(x, y)), ylabel="ϕ", xlabel="μ", right_margin=40*Plots.px);

I write the above not to malign the choice of the NegativeBinomial
in terms of (r, p), but to emphasize that there are real
advantages to using the alternative parameterizations. Naturally,
conversions are provided between the parameterizations, so that
the experience is seamless.

Notes:

  • ] test Distributions revealed no issues.
  • Documentation additions seem to work without issue.

This introduces 3 new distribution types, corresponding to
re-parameterizations of the NegativeBinomial distribution as
1. location-scale [Negative binomial distribution (alternative
paramerization),
Stan](https://mc-stan.org/docs/2_29/functions-reference/nbalt.html)
2. loglocation-scale [Negative binomial distribution (log alternative
parameterization),
Stan](https://mc-stan.org/docs/2_29/functions-reference/neg-binom-2-log.html)
3. shape-scale [Negative binomial distribution, Bayesian Data
Analysis (3rd edition), Appendix
A](http://www.stat.columbia.edu/~gelman/book/BDA3.pdf)

These have a number of uses. In particular, 1. and 2. emphasize the
that the negative binomial distribution is a robust form of the Poisson,
which has noted utility for negative binomial regression involving
offsets (a model frequently employed in epidemiology, failure
modeling). The shape-scale parameterization confers superior
numerical stability, which can be useful when working with rare
events (particularly when one wishes to draw samples).

In particular, for rare events, the alternative parameterizations
are robust in the presence of values which might otherwise suffer
from floating point roundoff using the `NegativeBinomial`.
This can be demonstrated by considering the conversion of a
location-scale `NegativeBinomial2` to `NegativeBinomial`

```
using Plots
gr(size=(600,400))
f(μ, ϕ) = ϕ / (μ + ϕ)
p1 = contour(collect(1e-16:5e-17:1e-15), collect(1.0:0.01:10.0),
(x, y) -> log(f(x, y)), ylabel="ϕ", xlabel="μ",
right_margin=40*Plots.px);
p2 = heatmap(collect(1e-16:5e-17:1e-15), collect(1.0:0.01:10.0),
(x, y) -> log(f(x, y)), ylabel="ϕ", xlabel="μ",
right_margin=40*Plots.px);
```

I write the above not to malign the choice of the `NegativeBinomial`
in terms of (`r`, `p`), but to emphasize that there are real
advantages to using the alternative parameterizations. Naturally,
conversions are provided between the parameterizations, so that
the experience is seamless.
@codecov-commenter
Copy link

codecov-commenter commented Apr 25, 2022

Codecov Report

Merging #1536 (c98862a) into master (bb11df8) will decrease coverage by 0.08%.
The diff coverage is 77.90%.

@@            Coverage Diff             @@
##           master    #1536      +/-   ##
==========================================
- Coverage   85.45%   85.37%   -0.09%     
==========================================
  Files         128      131       +3     
  Lines        7819     8035     +216     
==========================================
+ Hits         6682     6860     +178     
- Misses       1137     1175      +38     
Impacted Files Coverage Δ
src/Distributions.jl 100.00% <ø> (ø)
src/univariates.jl 74.07% <ø> (ø)
...nivariate/discrete/negativebinomialpoissongamma.jl 72.00% <72.00%> (ø)
...rc/univariate/discrete/negativebinomiallocation.jl 75.00% <75.00%> (ø)
...univariate/discrete/negativebinomialloglocation.jl 76.00% <76.00%> (ø)
src/conversion.jl 100.00% <100.00%> (ø)
src/univariate/discrete/bernoulli.jl 89.39% <0.00%> (-1.92%) ⬇️
src/univariate/continuous/uniform.jl 92.20% <0.00%> (-0.74%) ⬇️
src/univariate/discrete/binomial.jl 94.28% <0.00%> (-0.28%) ⬇️
src/univariate/discrete/geometric.jl 89.47% <0.00%> (-0.24%) ⬇️
... and 8 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update bb11df8...c98862a. Read the comment docs.

Copy link
Member

@devmotion devmotion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR!

I made some suggestions. My main suggestion would be to use more descriptive names for the different parameterizations, similar to e.g. Normal/NormalCanon.

Additionally, I think the tests should be extended such that most/all newly added lines are covered.

docs/src/univariate.md Outdated Show resolved Hide resolved
src/univariate/discrete/negativebinomial.jl Outdated Show resolved Hide resolved
src/univariate/discrete/negativebinomial2.jl Outdated Show resolved Hide resolved
src/univariate/discrete/negativebinomial2log.jl Outdated Show resolved Hide resolved
src/univariate/discrete/negativebinomial2.jl Outdated Show resolved Hide resolved
src/univariate/discrete/negativebinomial3.jl Outdated Show resolved Hide resolved
src/univariate/discrete/negativebinomial3.jl Outdated Show resolved Hide resolved
src/univariate/discrete/negativebinomial3.jl Outdated Show resolved Hide resolved
src/univariate/discrete/negativebinomial3.jl Outdated Show resolved Hide resolved
src/univariate/discrete/negativebinomial3.jl Outdated Show resolved Hide resolved
andrewjradcliffe and others added 19 commits April 25, 2022 14:55
Co-authored-by: David Widmann <devmotion@users.noreply.github.com>
Co-authored-by: David Widmann <devmotion@users.noreply.github.com>
Co-authored-by: David Widmann <devmotion@users.noreply.github.com>
Co-authored-by: David Widmann <devmotion@users.noreply.github.com>
Co-authored-by: David Widmann <devmotion@users.noreply.github.com>
Co-authored-by: David Widmann <devmotion@users.noreply.github.com>
Co-authored-by: David Widmann <devmotion@users.noreply.github.com>
Co-authored-by: David Widmann <devmotion@users.noreply.github.com>
Co-authored-by: David Widmann <devmotion@users.noreply.github.com>
Co-authored-by: David Widmann <devmotion@users.noreply.github.com>
Co-authored-by: David Widmann <devmotion@users.noreply.github.com>
Co-authored-by: David Widmann <devmotion@users.noreply.github.com>
Co-authored-by: David Widmann <devmotion@users.noreply.github.com>
Co-authored-by: David Widmann <devmotion@users.noreply.github.com>
Co-authored-by: David Widmann <devmotion@users.noreply.github.com>
Co-authored-by: David Widmann <devmotion@users.noreply.github.com>
Co-authored-by: David Widmann <devmotion@users.noreply.github.com>
Co-authored-by: David Widmann <devmotion@users.noreply.github.com>
Co-authored-by: David Widmann <devmotion@users.noreply.github.com>
@andrewjradcliffe
Copy link
Author

Additionally, I think the tests should be extended such that most/all newly added lines are covered.

With respect to tests, a quick inspection of the NegativeBinomial reveals that it mostly relies on R for the tests. The readme.md file in /test/ref/ seems to indicate that extending the R tests to these additional distributions should proceed analogously to NormalCanon?

Thoughts:

  • What sorts of additional tests should I add in Julia?
  • The R tests are nicely set up, but for this example, will merely verify the math specifying the conversions between parameterizations. Any other suggestions for tests?

@devmotion
Copy link
Member

Yeah, unfortunately the test setup is a bit of a mess, mostly due to historical reasons.

Since NegativeBinomial is already checked with the values from R, I think a good start would be to check that the alternative parameterizations are consistent with NegativeBinomial in Julia. Currently, in this PR only the logpdf values are compared it seems but actually ideally we would check all implemented functions. Sampling should be checked as well, e.g., by computing summary statistics (not sure how it's done for NegativeBinomial).

Regarding R, I guess one could copy the existing references for NegativeBinomial and replace the Julia type and its parameters with the ones from the alternative parameterizations (ie. update the Julia side without changing the R part).

This follows the Poisson distribution's @check_args, and,
admittedly, did not arise until I was pressing some extreme cases
of rare events and a sampler threw an exception due to location
being 0. (The location would never be in fact not zero, but perhaps
numerical roundoff might drive it there. -- I do not know the history
of Poisson distribution's @check_args, but I surmise something similar).
@DoktorMike
Copy link

Just a friendly ping of this PR since I indeed would like to use these alternative parameterizations of the NegativeBinomial. Is there anything major holding a merge back?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants