entropy with isprobvec check #865

Open: wants to merge 1 commit into base: master


@milankl commented May 31, 2023

Fixes #769. Docstrings added; tests are still missing.

@devmotion (Member) left a comment

I think silently incorrect results should be avoided as much as possible. I want to point out, though, that based on my experience with Distributions, this PR trades performance and convenience for safety: because of floating-point round-off, the check can fail even when the user computes the input vector in a way that would make it exactly normalized in exact (non-floating-point) arithmetic.
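To illustrate the round-off problem, here is a minimal sketch using the `isprobvec` definition from this PR. The input vector is hypothetical: its third entry is computed as `0.3 - 0.1 - 0.2`, which is exactly zero in real arithmetic but evaluates to a tiny negative `Float64`. The sum still passes the `isapprox` test, but the nonnegativity test has no tolerance, so the vector is rejected.

```julia
# isprobvec as defined in this PR (taken from Distributions.isprobvec).
isprobvec(p::AbstractVector{<:Real}) =
    all(x -> x ≥ zero(x), p) && isapprox(sum(p), one(eltype(p)))

# Hypothetical input: in exact arithmetic `leftover` is 0 and p sums to 1.
leftover = 0.3 - 0.1 - 0.2
p = [0.1, 0.2, leftover, 0.7]

println(leftover)               # a tiny negative number (about -2.8e-17)
println(isapprox(sum(p), 1.0))  # true: the sum is within the default tolerance
println(isprobvec(p))           # false: `x ≥ 0` fails for `leftover`
```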

A simple benchmark:

master:

julia> using StatsBase, Zygote, BenchmarkTools

julia> @btime entropy($(fill(1e-5, 10^5)));
  472.918 μs (0 allocations: 0 bytes)

julia> _, pb = Zygote.pullback(entropy, fill(1e-5, 10^5));

julia> @btime $pb(1.0);
  74.084 μs (16 allocations: 781.58 KiB)

This PR:

julia> using StatsBase, Zygote, BenchmarkTools

julia> @btime entropy($(fill(1e-5, 10^5)));
  512.763 μs (0 allocations: 0 bytes)

julia> _, pb = Zygote.pullback(entropy, fill(1e-5, 10^5));

julia> @btime $pb(1.0);
  79.138 μs (21 allocations: 781.73 KiB)
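For scale, the relative slowdown can be computed from the minimum times that `@btime` reported above; the helper `overhead` is only for illustration. The check adds roughly 8.4% to the forward `entropy` call and roughly 6.8% to the pullback, presumably because `isprobvec` makes two additional passes over `p` (one in `all`, one in `sum`).

```julia
# Minimum times reported by @btime in the benchmark above, in microseconds.
entropy_master, entropy_pr = 472.918, 512.763
pullback_master, pullback_pr = 74.084, 79.138

# Relative slowdown of the PR versus master, in percent.
overhead(old, new) = 100 * (new - old) / old

println(round(overhead(entropy_master, entropy_pr); digits = 1))    # 8.4
println(round(overhead(pullback_master, pullback_pr); digits = 1))  # 6.8
```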

A more general comment: can you add tests?

return -sum(xlogx, p)
end

entropy(p, b::Real; check::Bool = true) = entropy(p; check) / log(b)
Review comment (Member):

Should we check `b` as well? In any case, we need this change:

Suggested change:
- entropy(p, b::Real; check::Bool = true) = entropy(p; check) / log(b)
+ entropy(p, b::Real; check::Bool = true) = entropy(p; check = check) / log(b)
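A sketch of how the two methods could fit together, with the keyword forwarded explicitly as suggested. The explicit `check = check` form works on every Julia 1.x release, whereas the bare `f(; check)` shorthand needs Julia 1.5 or later, which is presumably the reason for the suggestion. The check on `b` is only one hypothetical answer to the question above (reject `b ≤ 0`, where `log(b)` is undefined, and `b == 1`, where `log(b) == 0`). `checked_entropy` and the local `xlogx` are illustrative stand-ins for StatsBase's `entropy` and LogExpFunctions' `xlogx`, not the PR's code.

```julia
# Hypothetical sketch, not the PR's code. `xlogx` is a local stand-in for
# LogExpFunctions.xlogx, using the convention 0 * log(0) = 0.
xlogx(x::Real) = iszero(x) ? zero(float(x)) : x * log(x)

# isprobvec as defined in this PR.
isprobvec(p::AbstractVector{<:Real}) =
    all(x -> x ≥ zero(x), p) && isapprox(sum(p), one(eltype(p)))

function checked_entropy(p::AbstractVector{<:Real}; check::Bool = true)
    if check && !isprobvec(p)
        throw(ArgumentError("p is not a probability vector"))
    end
    return -sum(xlogx, p)
end

function checked_entropy(p::AbstractVector{<:Real}, b::Real; check::Bool = true)
    # Hypothetical check on b: reject b ≤ 0 (log undefined) and b == 1 (log(b) == 0).
    if check && (b <= 0 || b == 1)
        throw(DomainError(b, "the base must be positive and different from 1"))
    end
    # Explicit keyword forwarding, as in the suggested change.
    return checked_entropy(p; check = check) / log(b)
end

println(checked_entropy([0.5, 0.5], 2))  # entropy of a fair coin in bits
```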


"""
Checks whether `p` is a probability vector, i.e. `p[i] ≥ 0` for each index `i` and `sum(p) ≈ 1`.
Taken from `Distributions.isprobvec`."""
isprobvec(p::AbstractVector{<:Real}) = all(x -> x ≥ zero(x), p) && isapprox(sum(p), one(eltype(p)))
Review comment (Member):

It would be nice to support tolerances here because of floating-point inaccuracies, but I don't see a clean way to forward them to this function.
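One hypothetical way to expose tolerances is to give the check its own keyword arguments; `isprobvec_tol`, `atol`, and `rtol` below are illustrative names, not an existing StatsBase or Distributions API. For floating-point inputs the defaults reproduce the strict behavior of `isprobvec` (including `isapprox`'s default relative tolerance), while a small `atol` accepts entries that round-off has pushed slightly below zero. This does not settle how `entropy` would forward such keywords, which is the open question in the comment. Accepting slightly negative entries would also require the entropy computation itself to handle them, since `log` of a negative `Float64` throws a `DomainError`.

```julia
# Hypothetical tolerance-aware variant of isprobvec; not part of the PR.
function isprobvec_tol(p::AbstractVector{<:Real};
                       atol::Real = 0,
                       rtol::Real = sqrt(eps(float(eltype(p)))))
    # Entries may undershoot zero by at most atol (atol = 0 is the strict check).
    nonneg = all(x -> x ≥ -atol, p)
    # The sum must match one within the given tolerances.
    return nonneg && isapprox(sum(p), one(eltype(p)); atol = atol, rtol = rtol)
end

# Hypothetical input: 0.3 - 0.1 - 0.2 evaluates to about -2.8e-17, not 0.
p = [0.1, 0.2, 0.3 - 0.1 - 0.2, 0.7]

println(isprobvec_tol(p))                # false with the strict default
println(isprobvec_tol(p; atol = 1e-12))  # true with a small absolute tolerance
```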

Labels: none yet
Projects: none yet
Development: successfully merging this pull request may close the issue "Entropy calculation with non-probability vectors".
2 participants