Theoretical mode and modes fail with some models #1772

Open
aerdely opened this issue Sep 8, 2023 · 5 comments
aerdely commented Sep 8, 2023

The package does not correctly calculate the theoretical mode and/or modes for models such as Binomial, Poisson, and NegativeBinomial. Here are some examples:

using Distributions

## Example 1
B = Binomial(3, 0.5)
v = collect(0:3);
hcat(v, pdf.(B, v)) # theoretical modes are 1 and 2
pdf(B, 1) == pdf(B, 2) # true, as expected
modes(B) # should return 1 and 2 not just 2
mode(B) # should return 1 not 2
# But surprisingly:
D = DiscreteNonParametric(v, pdf.(B, v))
pdf(D, 1) == pdf(D, 2) # true, as expected
modes(D) # 1 and 2, as expected
mode(D) # 1, as expected

## Example 2
P = Poisson(2.0)
x = collect(0:4);
hcat(x, pdf.(P, x)) # theoretical modes are 1 and 2
pdf(P, 1) == pdf(P, 2) # true, as expected
modes(P) # 1 and 2, as expected
mode(P) # should return 1 not 2

## Example 3
N = NegativeBinomial(3, 0.5)
y = collect(0:3);
hcat(y, pdf.(N, y)) # theoretical modes are 1 and 2
pdf(N, 1) == pdf(N, 2) # should be true
pdf(N, 1) - pdf(N, 2) # very small positive difference
pdf(N, 1) > pdf(N, 2) # so numerically the mode should be 1 but:
mode(N) # oops! 2 instead of 1
modes(N) # should return 1 and 2 not just 2
itsdfish commented Sep 8, 2023

I want to add that the problem for NegativeBinomial is that it uses the fallback method. The problem for Binomial is similar, even though it has its own method.
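
For illustration, here is a minimal sketch of the kind of generic fallback being described (an assumed shape, not the actual Distributions.jl source): a modes that just wraps the scalar mode, so by construction it can never report a tie:

using Distributions

# assumed shape of the generic fallback: wrap the single reported mode in a vector
naive_modes(d::Distribution) = [mode(d)]

naive_modes(Binomial(3, 0.5))  # returns [2]; the tied mode at 1 is never reported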

@nalimilan (Member)

That fallback seems dangerous. Maybe we should remove it and replace it with individual methods for distributions that have a single mode?
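
For concreteness, one way that suggestion could look (a sketch with a hypothetical selection of distributions, written as external method definitions purely for illustration): delete the generic fallback, give each distribution that is known to be unimodal its own trivial method, and let everything else fail with a MethodError instead of returning a wrong answer:

using Distributions
import Distributions: modes  # only needed to sketch the methods outside the package

# distributions known to be unimodal get a trivial, safe definition
modes(d::Normal) = [mode(d)]
modes(d::Exponential) = [mode(d)]

# distributions without a safe definition would simply have no method,
# so modes(d) errors instead of silently dropping a tied mode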

itsdfish commented Sep 8, 2023

Yes. I was just about to make that point. The fallback inappropriately assumes a single mode. A more accurate approach would be to check the pdfs across the support of a distribution:

function modes(d::DiscreteUnivariateDistribution)
    # step through the support with unit steps (assumes an integer-valued support)
    _support = minimum(d):maximum(d)
    pdfs = pdf.(d, _support)
    # largest probability mass found on the support
    max_pdf, _ = findmax(pdfs)
    # keep every support point whose mass is approximately maximal,
    # so ties (including ties up to floating-point error) are preserved
    indices = map(p -> p ≈ max_pdf, pdfs)
    return _support[indices]
end

Brute force is inefficient, but it's better than efficiently arriving at an incorrect result. Perhaps someone could make some improvements to the above code, but that general approach would be better. Otherwise, I agree with @nalimilan: the fallback should be removed unless it always provides an accurate result.
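
For reference, applying that brute-force idea to the bounded example from the report does recover both tied modes (restated compactly here under the name brute_modes so it doesn't clash with the existing Distributions.modes):

using Distributions

# compact restatement of the sketch above, scanning the full (finite) support
function brute_modes(d::DiscreteUnivariateDistribution)
    _support = minimum(d):maximum(d)
    pdfs = pdf.(d, _support)
    max_pdf = maximum(pdfs)
    return _support[map(p -> p ≈ max_pdf, pdfs)]
end

brute_modes(Binomial(3, 0.5))  # [1, 2], both tied modes are reported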

@nalimilan (Member)

minimum(d):maximum(d) returns a range with a unit step, so it will give extremely inexact results and is unlikely to ever identify two modes even if they exist. Conversely, going over all possible float values will take forever. We should define specific methods for each distribution instead, and if we don't have a good solution for some distributions, throwing an error is better.
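
As an illustration of what such a distribution-specific method could look like (a sketch, not a tested patch; it uses the standard fact that Binomial(n, p) has two modes, (n+1)p - 1 and (n+1)p, exactly when (n+1)p is an integer between 1 and n):

using Distributions
import Distributions: modes  # written as an external extension only for illustration

function modes(d::Binomial)
    n, p = params(d)
    m = (n + 1) * p
    # two tied modes exactly when (n + 1)p is an integer with 1 <= (n + 1)p <= n
    if isinteger(m) && 1 <= m <= n
        return [Int(m) - 1, Int(m)]
    end
    return [min(floor(Int, m), n)]  # min handles the p == 1 edge case
end

modes(Binomial(3, 0.5))   # [1, 2]
modes(Binomial(10, 0.3))  # [3]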

itsdfish commented Sep 8, 2023

My mistake. I updated the example above so that it is restricted to DiscreteUnivariateDistribution. But I think the issue would still apply in cases where the range is large or unbounded.
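
One possible workaround for that (my illustration, not something proposed above; the helper name and tail cutoff are made up): for an unbounded discrete support, restrict the brute-force scan to a quantile-bounded window that carries essentially all of the mass, since the mode cannot fall outside it:

using Distributions

# hypothetical helper: scan only a high-probability window of the support
function windowed_modes(d::DiscreteUnivariateDistribution; tail = 1e-12)
    lo = floor(Int, quantile(d, tail))
    hi = ceil(Int, quantile(d, 1 - tail))
    window = lo:hi
    pdfs = pdf.(d, window)
    max_pdf = maximum(pdfs)
    return window[map(p -> p ≈ max_pdf, pdfs)]
end

windowed_modes(Poisson(2.0))              # [1, 2]
windowed_modes(NegativeBinomial(3, 0.5))  # [1, 2]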
