Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use julia implementations for pdfs and some cdf-like functions #113

Merged
merged 34 commits into from
Feb 18, 2022
Merged
Show file tree
Hide file tree
Changes from 28 commits
Commits
Show all changes
34 commits
Select commit Hold shift + click to select a range
139d141
Use Julia implementations for (log)pdfs. Adjust Rmath tests to use
andreasnoack May 4, 2021
0bffee8
Use Julia implementations for most cdf-like functions of the Chisq
andreasnoack May 7, 2021
6bfc68d
Change Beta distribution to be using native implementations.
andreasnoack May 18, 2021
60b64f3
Switch binomial cdfs to use native implementations (but not the inver…
andreasnoack May 18, 2021
542a6c4
Use native implementations for most cdf like gamma functions
andreasnoack May 19, 2021
6d66ba3
Just use the gamma defitions for the chisq
andreasnoack May 19, 2021
15b321c
Use the beta definitions for binomial
andreasnoack May 19, 2021
49178c2
Add native implementions for Poisson's cdf-like functions
andreasnoack May 19, 2021
6366eab
Rely on SpecialFunctions promotion handling in beta functions
andreasnoack Oct 24, 2021
b2ed6c9
Rely on gamma_inc_inv promotion in gammainv(c)cdf
andreasnoack Oct 24, 2021
15e20ff
Apply suggestions from code review
andreasnoack Oct 26, 2021
9c22066
Add compat for HypergeometricFunctions
andreasnoack Oct 24, 2021
fd4db7e
Changes from review
andreasnoack Oct 26, 2021
3c942e0
Handle negative arguments in Poisson (log)(c)cdf
andreasnoack Jan 29, 2022
7c67a0d
Handle negative x in Gamma (log)(c)cdf
andreasnoack Jan 29, 2022
90a6404
Handle arguments outside support for Beta and Fdist (log)(c)cdf
andreasnoack Jan 29, 2022
0a54399
Fix Fdist quantile functions
andreasnoack Jan 29, 2022
8cd5a76
Handle arguments out of support in Binomial (log)(c)cdf and apply
andreasnoack Jan 29, 2022
73f20b7
Avoid accidental promotion to Float64 in fdist(log)(c)cdf
andreasnoack Jan 30, 2022
71025f9
Fix typo in "x" calculation for fdist(log)(c)cdf
andreasnoack Jan 30, 2022
5a8d121
Drop redundant low tolerance tests for Beta(1000, 2). We are now
andreasnoack Jan 30, 2022
60ca443
Handle some non-finite edge cases in gamma and binomial
andreasnoack Feb 4, 2022
2578dbe
Only switch to log-version of betalogcdf when the result is tiny
andreasnoack Feb 4, 2022
9ec4aef
Handle the degenerate cases in beta(log)(c)cdf when alpha is zero
andreasnoack Feb 6, 2022
99cd041
Handle the degenerate case of the gamma distribution when k==0.
andreasnoack Feb 6, 2022
b031752
Address review comments
andreasnoack Feb 8, 2022
f3ceda8
Avoid rational return type in gamma(log)(c)cdf
andreasnoack Feb 8, 2022
beb5dcd
Update src/distrs/gamma.jl
andreasnoack Feb 8, 2022
14882a3
Update src/distrs/beta.jl
andreasnoack Feb 8, 2022
fbdb4eb
Update src/distrs/binom.jl
andreasnoack Feb 8, 2022
c3300d2
Update src/distrs/gamma.jl
andreasnoack Feb 8, 2022
46bd90d
Update src/distrs/gamma.jl
andreasnoack Feb 8, 2022
af14aba
Update src/distrs/gamma.jl
andreasnoack Feb 8, 2022
cb1f325
Handle the case when alpha and beta are zero in the betacdf
andreasnoack Feb 10, 2022
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
2 changes: 2 additions & 0 deletions Project.toml
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,7 @@ version = "0.9.15"

[deps]
ChainRulesCore = "d360d2e6-b24c-11e9-a2a3-2a2ae2dbcce4"
HypergeometricFunctions = "34004b35-14d8-5ef3-9330-4cdb6864b03a"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This requires a [compat] entry.

This also pulls in some additional dependencies since HypergeometricFunctions depends on DualNumbers, it seems? I am just wondering if people are fine with this since my ChainRules PR (#106) was not received positively mainly due to the additional dependency on ChainRulesCore.

On a side note, HypergeometricFunctions would be needed in SpecialFunctions to resolve JuliaMath/SpecialFunctions.jl#317 and JuliaMath/SpecialFunctions.jl#160. There was some discussion about depending on HypergeometricFunctions in JuliaMath/SpecialFunctions.jl#175, but I assume for the very same reasons not everyone would welcome such a dependency in SpecialFunctions.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add a compat entry (or CompatHelper will do it).

I can try to measure the effect on load time which I assume is the main concern here. I doubt that it will be much and more generally, I think that HypergeomericFunctions is a natural dependency here. Many CDFs have hypergeometric representations so I think this is a bit different from the AD discussion. I don't think it would make sense to have the CDFs defined in a separate package. (I think I'm in favor of #106 btw)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that probably it is more central to the package than AD (even though, of course, I'm in favor of the ChainRules PR 😉 ). I am fine with the dependency but I wanted to raise it due to the reactions in the other PR.

InverseFunctions = "3587e190-3f89-42d0-90ee-14403ec27112"
IrrationalConstants = "92d709cd-6900-40b7-9082-c6be49f344b6"
LogExpFunctions = "2ab3a3ac-af41-5b50-aa03-7779005ae688"
Expand All @@ -13,6 +14,7 @@ SpecialFunctions = "276daf66-3868-5448-9aa4-cd146d93841b"

[compat]
ChainRulesCore = "1"
HypergeometricFunctions = "0.3"
InverseFunctions = "0.1"
IrrationalConstants = "0.1"
LogExpFunctions = "0.3.2"
Expand Down
74 changes: 67 additions & 7 deletions src/distrs/beta.jl
Original file line number Diff line number Diff line change
@@ -1,14 +1,17 @@
# functions related to beta distributions

using HypergeometricFunctions: _₂F₁

# R implementations
# For pdf and logpdf we use the Julia implementation
using .RFunctions:
betacdf,
betaccdf,
betalogcdf,
betalogccdf,
betainvcdf,
betainvccdf,
# betapdf,
# betalogpdf,
# betacdf,
# betaccdf,
# betalogcdf,
# betalogccdf,
# betainvcdf,
# betainvccdf,
andreasnoack marked this conversation as resolved.
Show resolved Hide resolved
betainvlogcdf,
betainvlogccdf

Expand All @@ -22,3 +25,60 @@ function betalogpdf(α::T, β::T, x::T) where {T<:Real}
val = xlogy(α - 1, y) + xlog1py(β - 1, -y) - logbeta(α, β)
return x < 0 || x > 1 ? oftype(val, -Inf) : val
end

function betacdf(α::Real, β::Real, x::Real)
# Handle a degenerate case
if iszero(α)
return float(last(promote(α, β, x, x >= 0)))
devmotion marked this conversation as resolved.
Show resolved Hide resolved
end

return first(beta_inc(α, β, clamp(x, 0, 1)))
end

function betaccdf(α::Real, β::Real, x::Real)
# Handle a degenerate case
if iszero(α)
return float(last(promote(α, β, x, x < 0)))
end

last(beta_inc(α, β, clamp(x, 0, 1)))
end

# The log version is currently based on non-log version. When the cdf is very small we shift
# to an implementation based on the hypergeometric function ₂F₁ to avoid underflow.
function betalogcdf(α::T, β::T, x::T) where {T<:Real}
# Handle a degenerate case
if iszero(α)
return log(last(promote(α, β, x, x >= 0)))
andreasnoack marked this conversation as resolved.
Show resolved Hide resolved
end

_x = clamp(x, 0, 1)
p, q = beta_inc(α, β, _x)
if p < floatmin(p)
# see https://dlmf.nist.gov/8.17#E7
return -log(α) + xlogy(α, _x) + log(_₂F₁(promote(α, 1 - β, α + 1, _x)...)) - logbeta(α, β)
elseif p <= 0.7
return log(p)
else
return log1p(-q)
end
end
betalogcdf(α::Real, β::Real, x::Real) = betalogcdf(promote(α, β, x)...)

function betalogccdf(α::Real, β::Real, x::Real)
# Handle a degenerate case
if iszero(α)
return log(last(promote(α, β, x, x < 0)))
end

p, q = beta_inc(α, β, clamp(x, 0, 1))
if q < 0.7
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just noticed that this is a bit consistent with betalogcdf - could/should we use an additional branch q < eps(one(q)) here as well?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I couldn't find a way to write the complementary version in terms of the hypergeometric in the same way as I could with the logcdf.

return log(q)
else
return log1p(-p)
end
end

betainvcdf(α::Real, β::Real, p::Real) = first(beta_inc_inv(α, β, p))

betainvccdf(α::Real, β::Real, p::Real) = last(beta_inc_inv(β, α, p))
36 changes: 30 additions & 6 deletions src/distrs/binom.jl
Original file line number Diff line number Diff line change
@@ -1,18 +1,18 @@
# functions related to binomial distribution

# R implementations
# For pdf and logpdf we use the Julia implementation
using .RFunctions:
binomcdf,
binomccdf,
binomlogcdf,
binomlogccdf,
# binompdf,
# binomlogpdf,
# binomcdf,
# binomccdf,
# binomlogcdf,
# binomlogccdf,
binominvcdf,
binominvccdf,
binominvlogcdf,
binominvlogccdf


# Julia implementations
binompdf(n::Real, p::Real, k::Real) = exp(binomlogpdf(n, p, k))

Expand All @@ -22,3 +22,27 @@ function binomlogpdf(n::T, p::T, k::T) where {T<:Real}
val = min(0, betalogpdf(m + 1, n - m + 1, p) - log(n + 1))
return 0 <= k <= n && isinteger(k) ? val : oftype(val, -Inf)
end

for l in ("", "log"), compl in (false, true)
fbinom = Symbol(string("binom", l, ifelse(compl, "c", "" ), "cdf"))
fbeta = Symbol(string("beta" , l, ifelse(compl, "", "c"), "cdf"))
@eval function ($fbinom)(n::Real, p::Real, k::Real)
if isnan(k)
return last(promote(n, p, k))
end
res = ($fbeta)(max(0, floor(k) + 1), max(0, n - floor(k)), p)

# When p == 1 the betaccdf doesn't return the correct result
# so these cases have to be special cased
if isone(p)
newres = oftype(res, $compl ? k < n : k >= n)
if $l === ""
return newres
else
return log(newres)
end
andreasnoack marked this conversation as resolved.
Show resolved Hide resolved
else
return res
end
end
end
29 changes: 9 additions & 20 deletions src/distrs/chisq.jl
Original file line number Diff line number Diff line change
@@ -1,22 +1,11 @@
# functions related to chi-square distribution

# R implementations
# For pdf and logpdf we use the Julia implementation
using .RFunctions:
chisqcdf,
chisqccdf,
chisqlogcdf,
chisqlogccdf,
chisqinvcdf,
chisqinvccdf,
chisqinvlogcdf,
chisqinvlogccdf

# Julia implementations
# promotion ensures that we do forward e.g. `chisqpdf(::Int, ::Float32)` to
# `gammapdf(::Float32, ::Int, ::Float32)` but not `gammapdf(::Float64, ::Int, ::Float32)`
chisqpdf(k::Real, x::Real) = chisqpdf(promote(k, x)...)
chisqpdf(k::T, x::T) where {T<:Real} = gammapdf(k / 2, 2, x)

chisqlogpdf(k::Real, x::Real) = chisqlogpdf(promote(k, x)...)
chisqlogpdf(k::T, x::T) where {T<:Real} = gammalogpdf(k / 2, 2, x)
# Just use the Gamma definitions
for f in ("pdf", "logpdf", "cdf", "ccdf", "logcdf", "logccdf", "invcdf", "invccdf", "invlogcdf", "invlogccdf")
_chisqf = Symbol("chisq"*f)
_gammaf = Symbol("gamma"*f)
@eval begin
$(_chisqf)(k::Real, x::Real) = $(_chisqf)(promote(k, x)...)
$(_chisqf)(k::T, x::T) where {T<:Real} = $(_gammaf)(k/2, 2, x)
end
end
28 changes: 16 additions & 12 deletions src/distrs/fdist.jl
Original file line number Diff line number Diff line change
@@ -1,17 +1,5 @@
# functions related to F distribution

# R implementations
# For pdf and logpdf we use the Julia implementation
using .RFunctions:
fdistcdf,
fdistccdf,
fdistlogcdf,
fdistlogccdf,
fdistinvcdf,
fdistinvccdf,
fdistinvlogcdf,
fdistinvlogccdf

# Julia implementations
fdistpdf(ν1::Real, ν2::Real, x::Real) = exp(fdistlogpdf(ν1, ν2, x))

Expand All @@ -23,3 +11,19 @@ function fdistlogpdf(ν1::T, ν2::T, x::T) where {T<:Real}
val = (xlogy(ν1, ν1ν2) + xlogy(ν1 - 2, y) - xlogy(ν1 + ν2, 1 + ν1ν2 * y)) / 2 - logbeta(ν1 / 2, ν2 / 2)
return x < 0 ? oftype(val, -Inf) : val
end

for f in ("cdf", "ccdf", "logcdf", "logccdf")
ff = Symbol("fdist"*f)
bf = Symbol("beta"*f)
@eval $ff(ν1::T, ν2::T, x::T) where {T<:Real} = $bf(ν1/2, ν2/2, inv(1 + ν2/(ν1*max(0, x))))
@eval $ff(ν1::Real, ν2::Real, x::Real) = $ff(promote(ν1, ν2, x)...)
end
for f in ("invcdf", "invccdf", "invlogcdf", "invlogccdf")
ff = Symbol("fdist"*f)
bf = Symbol("beta"*f)
@eval function $ff(ν1::T, ν2::T, y::T) where {T<:Real}
x = $bf(ν1/2, ν2/2, y)
return x/(1 - x)*ν2/ν1
end
@eval $ff(ν1::Real, ν2::Real, y::Real) = $ff(promote(ν1, ν2, y)...)
end
94 changes: 87 additions & 7 deletions src/distrs/gamma.jl
Original file line number Diff line number Diff line change
@@ -1,14 +1,17 @@
# functions related to gamma distribution

using HypergeometricFunctions: drummond1F1

# R implementations
# For pdf and logpdf we use the Julia implementation
using .RFunctions:
gammacdf,
gammaccdf,
gammalogcdf,
gammalogccdf,
gammainvcdf,
gammainvccdf,
# gammapdf,
# gammalogpdf,
# gammacdf,
# gammaccdf,
# gammalogcdf,
# gammalogccdf,
# gammainvcdf,
# gammainvccdf,
gammainvlogcdf,
gammainvlogccdf

Expand All @@ -22,3 +25,80 @@ function gammalogpdf(k::T, θ::T, x::T) where {T<:Real}
val = -loggamma(k) + xlogy(k - 1, xθ) - log(θ) - xθ
return x < 0 ? oftype(val, -Inf) : val
end

function gammacdf(k::T, θ::T, x::T) where {T<:Real}
# Handle the degenerate case
if iszero(k)
return float(last(promote(x, x >= 0)))
end
return first(gamma_inc(k, max(0, x)/θ))
end
gammacdf(k::Real, θ::Real, x::Real) = gammacdf(map(float, promote(k, θ, x))...)

function gammaccdf(k::T, θ::T, x::T) where {T<:Real}
# Handle the degenerate case
if iszero(k)
return last(promote(k, θ, x, x < 0))/sqrt(one(θ))
andreasnoack marked this conversation as resolved.
Show resolved Hide resolved
end
return last(gamma_inc(k, max(0, x)/θ))
end
gammaccdf(k::Real, θ::Real, x::Real) = gammaccdf(map(float, promote(k, θ, x))...)

gammalogcdf(k::Real, θ::Real, x::Real) = _gammalogcdf(map(float, promote(k, θ, x))...)
andreasnoack marked this conversation as resolved.
Show resolved Hide resolved

# Implemented via the non-log version. For tiny values, we recompute the result with
# loggamma. In that situation, there is little risk of significant cancellation.
function _gammalogcdf(k::Float64, θ::Float64, x::Float64)
# Handle the degenerate case
if iszero(k)
return log(last(promote(k, θ, x, x >= 0))/sqrt(one(θ)))
andreasnoack marked this conversation as resolved.
Show resolved Hide resolved
end

xdθ = max(0, x)/θ
l, u = gamma_inc(k, xdθ)
if l < floatmin(Float64) && isfinite(k) && isfinite(xdθ)
return -log(k) + k*log(xdθ) - xdθ + log(drummond1F1(1.0, 1 + k, xdθ)) - loggamma(k)
elseif l < 0.7
return log(l)
else
return log1p(-u)
end
end
function _gammalogcdf(k::T, θ::T, x::T) where {T<:Union{Float16,Float32}}
return T(_gammalogcdf(Float64(k), Float64(θ), Float64(x)))
end

gammalogccdf(k::Real, θ::Real, x::Real) = _gammalogccdf(map(float, promote(k, θ, x))...)

# Implemented via the non-log version. For tiny values, we recompute the result with
# loggamma. In that situation, there is little risk of significant cancellation.
function _gammalogccdf(k::Float64, θ::Float64, x::Float64)
# Handle the degenerate case
if iszero(k)
return log(last(promote(k, θ, x, x < 0))/sqrt(one(θ)))
andreasnoack marked this conversation as resolved.
Show resolved Hide resolved
end

xdθ = max(0, x)/θ
l, u = gamma_inc(k, xdθ)
if u < floatmin(Float64)
return loggamma(k, xdθ) - loggamma(k)
elseif u < 0.7
return log(u)
else
return log1p(-l)
end
end

function _gammalogccdf(k::T, θ::T, x::T) where {T<:Union{Float16,Float32}}
return T(_gammalogccdf(Float64(k), Float64(θ), Float64(x)))
end

function gammainvcdf(k::Real, θ::Real, p::Real)
_k, _θ, _p = promote(k, θ, p)
return _θ*gamma_inc_inv(_k, _p, 1 - _p)
end

function gammainvccdf(k::Real, θ::Real, p::Real)
_k, _θ, _p = promote(k, θ, p)
return _θ*gamma_inc_inv(_k, 1 - _p, _p)
end
20 changes: 15 additions & 5 deletions src/distrs/pois.jl
Original file line number Diff line number Diff line change
@@ -1,12 +1,13 @@
# functions related to Poisson distribution

# R implementations
# For pdf and logpdf we use the Julia implementation
using .RFunctions:
poiscdf,
poisccdf,
poislogcdf,
poislogccdf,
# poispdf,
# poislogpdf,
# poiscdf,
# poisccdf,
# poislogcdf,
# poislogccdf,
poisinvcdf,
poisinvccdf,
poisinvlogcdf,
Expand All @@ -20,3 +21,12 @@ function poislogpdf(λ::T, x::T) where {T <: Real}
val = xlogy(x, λ) - λ - loggamma(x + 1)
return x >= 0 && isinteger(x) ? val : oftype(val, -Inf)
end

# Just use the Gamma definitions
poiscdf(λ::Real, x::Real) = gammaccdf(max(0, floor(x + 1)), 1, λ)

poisccdf(λ::Real, x::Real) = gammacdf(max(0, floor(x + 1)), 1, λ)

poislogcdf(λ::Real, x::Real) = gammalogccdf(max(0, floor(x + 1)), 1, λ)

poislogccdf(λ::Real, x::Real) = gammalogcdf(max(0, floor(x + 1)), 1, λ)
3 changes: 2 additions & 1 deletion src/distrs/tdist.jl
Original file line number Diff line number Diff line change
@@ -1,8 +1,9 @@
# functions related to student's T distribution

# R implementations
# For pdf and logpdf we use the Julia implementation
using .RFunctions:
# tdistpdf,
# tdistlogpdf,
tdistcdf,
tdistccdf,
tdistlogcdf,
Expand Down