Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix logarithm bugs #87

Closed
wants to merge 28 commits into from

Conversation

ericproffitt
Copy link

binomlogpdf was unable to handle 0log(0), fixed using the xlogy function.

using log1p with logbeta results in slight numerical inaccuracies, so changed to using all loggamma.

fixed xlogy, should return zero(log(y)).

Eric Proffitt added 5 commits December 26, 2019 18:27
using `log1p` with `logbeta` results in slight numerical inaccuracies, so changed to using all `loggamma`.

`xlogy` needs to return zero(type(y)) in ternary expression, otherwise if x < 0 it throws complex result error.
This reverts commit 9992f42.
This reverts commit 0f1ec23.
@nalimilan
Copy link
Member

Thanks. Can you add tests for things which were previously incorrect?

@ericproffitt
Copy link
Author

ericproffitt commented Dec 29, 2019

Ok, so after running some tests, the situation was nastier than I thought.

I think I've got everything correct now.

It turned out the simplest way to do it was with a ternary operator which returns convert(typeof(float(p)), -Inf) if k isn't an integer between zero(k) and n, inclusive. It also should be type stable, since the return type is always equal to float(p).

Tests have now been included.

test/basicfuns.jl Outdated Show resolved Hide resolved
@test xlogy(-2, Base.MathConstants.e) ≈ -2
@test xlogy(0, 1) ≈ 0
@test xlogy(-1, 1) ≈ 0
@test xlogy(0, 0) ≈ 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't supposed to work according to the docstring. We could update it to say y ≥ 0, but then it wouldn't be right about the "correct limit" AFAICT. Maybe we should just drop that restriction and just say that zero is returned when x == 0. That's what SciPy does.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the docstring is incorrect in this case. Maybe I'm misunderstanding the purpose of having a function like xlogy, but being able to evaluate 0log(0) is I believe intended to be its principal benefit.

I'm not sure I understand what you mean when you say,

...but then it wouldn't be right about the "correct limit" AFAICT

could you expand on this?

I dont believe dropping the constraint on y entirely is correct, as it should error out when y is a negative real, irrespective of what x is.

My appraisal of the situation is that the docstring should be changed to y≥0.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed docstring to y≥0.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant that I don't think it's correct to say that the limit of x * log(y) when x tends towards zero is zero when y == 0, is it? x * log(0) isn't defined when x > 0, so it's not appropriate to speak about its limit at 0.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I see what you're getting at now, thanks for clarifying.

So strictly speaking you're right, the limit would not be defined. However I think the practical benefits of having xlogy evaluate 0log(0) as if you were actually taking the limit xlog(x) as x->0⁺ outweigh the slight lapse in mathematical rigor.

The docstring, as it currently stands, is:

Return x * log(y) for y > 0 with correct limit at x = 0.

The mention of the limit at x = 0 being entirely superfluous when y>0, since it's unnecessary to consider any limits in this context, exact evaluation is always well-defined.

And to reiterate, if it isn't the problem of 0log(0) which we're trying to resolve, then it's not clear to me that the function xlogy provides any benefit over just simply evaluating x * log(y).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it's not a big deal. Though we could also use a more explicit mention about what happens when x == 0.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, added a more descriptive docstring.

test/misc.jl Outdated Show resolved Hide resolved
test/misc.jl Outdated
for p in push!(rand(50), [0, 1]...)
for n in 0:50
for k in push!(collect(Any, -50:50), randn(10)...)
@test binomlogpdf(n, Float32(p), k) ≈ log(pdf(Binomial(n, p), k))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this a tautological test, since Binomial uses binomlogpdf under the hood?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So Distributions.binompdf has two methods, if p is of type Union{Int64, Float64}, then it makes a ccall to an underlying C library, which, as far as I can tell, is bug free and always returns the correct answer; please see

src/rmath.jl:70

However when p is of type Real but not of type Union{Int64, Float64}, then it dispatches on the binompdf found in

src/distrs/binom.jl:16

So the above testing code is simply checking that the methods match, under the assumption that the method which dispatches on the C code is correct (which I'm pretty sure it is).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, thanks. Though can you call binomlogpdf directly instead of using log(pdf(Binomial(...), ...))? Binomial is not defined in this package, but in Distributions. Maybe also add a comment explaining that you test the generic Julia definition against Rmath.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed.

src/distrs/binom.jl Outdated Show resolved Hide resolved
src/basicfuns.jl Outdated Show resolved Hide resolved
src/distrs/binom.jl Outdated Show resolved Hide resolved
test/misc.jl Outdated Show resolved Hide resolved
test/misc.jl Outdated Show resolved Hide resolved
src/basicfuns.jl Outdated Show resolved Hide resolved
Eric Proffitt and others added 9 commits December 31, 2019 12:53
Co-Authored-By: Milan Bouchet-Valat <nalimilan@club.fr>
Co-Authored-By: Milan Bouchet-Valat <nalimilan@club.fr>
Co-Authored-By: Milan Bouchet-Valat <nalimilan@club.fr>
x = -Inf
end

return convert(typeof(float(p)), x)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually this conversion seems to create problems with dual numbers, as shown by the CI failure. One solution would probably be to do oftype(float(promote(n, p, k)), x) (haven't checked).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is getting outside my area of expertise, I can make your suggested change if you want, but I can't vouch for its correctness.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At least if tests pass that would be a good sign...

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, just made the change

Eric Proffitt and others added 3 commits December 31, 2019 14:49
Co-Authored-By: Lyndon White <oxinabox@ucc.asn.au>
src/basicfuns.jl Outdated Show resolved Hide resolved
Eric Proffitt and others added 3 commits December 31, 2019 15:24
Co-Authored-By: Milan Bouchet-Valat <nalimilan@club.fr>
Copy link
Member

@nalimilan nalimilan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, looks good as far as I can tell. Though I'd like somebody more familiar with this code to double-check.

test/misc.jl Outdated Show resolved Hide resolved
x = -Inf
end

return convert(float(promote_type(typeof(n), typeof(p), typeof(k))), x)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally, it's better to just compute as much of the general case as possible, i.e. without risking exceptions, and then use the type of that result as the return type. It would be something like

_lpdf = xlogy(k, p) + xlogy(n - k, 1 - p)
if isinteger(k) & (zero(k) <= k <= n)
    return _lpdf + loggamma(n + 1) - loggamma(k + 1) - loggamma(n - k + 1)
else
    return oftype(_lpdf, -Inf)
end

Another thing, by computing 1-p explicitly, we are losing some precision when n is very large. Maybe we need a xlog1py or we should just special case the corner case here in this function and continue to use log1p.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that really better? I mean, it's much more complex than float and it's still not completely right. If we go to that level of complexity, let's do it 100% correctly by also calling loggamma(one(n-k))?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. It's better to avoid promote_type and just use a computed value of the correct type. I can't come up with a case where xlogy and loggamma would produce different types.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fine with me -- but I find it really annoying that we have to rely on soft rules like that... :-/

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I've also been annoyed by this. I've been wondering if you could have some kind of language support for branches that enforces the same return type.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, something like @sametype if would be nice. But that would only work if inference is able to figure out the type, and AFAIK we shouldn't rely on it. What we could rely on is the fact that for Number variables, one is defined: the macro could compute the first branch with all variables set to one(var), and then convert the result of the second branch to the type of the result. We probably need a more specific name for that macro...

test/misc.jl Outdated Show resolved Hide resolved
Eric Proffitt and others added 2 commits January 4, 2020 16:38
Co-Authored-By: Andreas Noack <andreas@noack.dk>
@nalimilan
Copy link
Member

@andreasnoack Are you OK with the PR?

@andreasnoack
Copy link
Member

I'd still prefer that we rely on the promotion in log instead of using promote_type and float explicitly. the former is more general.

@nalimilan nalimilan mentioned this pull request May 21, 2020
@nalimilan
Copy link
Member

@ericproffitt Do you want to apply what @andreasnoack proposed? BTW, the xlogx changes have just been done by another PR so that can be left out of this one.

@ericproffitt
Copy link
Author

Apologies, I've been unable to set aside enough time to revisit this PR.

Is there someone else who can make the appropriate changes? Otherwise I will revisit when I'm able to.

@andreasnoack
Copy link
Member

It looks like the xlog(x/y) part has been fixed in versions in https://github.com/JuliaStats/LogExpFunctions.jl and binomlogpdf looks like it was fixed by #113 so I'm closing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants