Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use julia implementations for pdfs and some cdf-like functions #113

Merged
merged 34 commits into from Feb 18, 2022

Conversation

andreasnoack
Copy link
Member

I think we should begin to use Julia implementations when they are available. There is is really no need to call out to Rmath when we have implementations available. Eventually, we should then also be able to drop Rmath as a package dependency and just use it for testing although there is still some work to do. However, SpecialFunctions now has some of the more tricky functions available so we can get quite far. This has a conflict with #112 so it would be good to get than one merged first and I can update this PR afterwards.

@codecov-commenter
Copy link

codecov-commenter commented May 13, 2021

Codecov Report

Merging #113 (cb1f325) into master (13e231a) will increase coverage by 9.34%.
The diff coverage is 95.55%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #113      +/-   ##
==========================================
+ Coverage   39.30%   48.64%   +9.34%     
==========================================
  Files          13       13              
  Lines         430      516      +86     
==========================================
+ Hits          169      251      +82     
- Misses        261      265       +4     
Impacted Files Coverage Δ
src/distrs/tdist.jl 100.00% <ø> (ø)
src/distrs/binom.jl 78.57% <62.50%> (-21.43%) ⬇️
src/distrs/gamma.jl 97.91% <97.61%> (-2.09%) ⬇️
src/distrs/beta.jl 100.00% <100.00%> (ø)
src/distrs/chisq.jl 100.00% <100.00%> (ø)
src/distrs/fdist.jl 100.00% <100.00%> (ø)
src/distrs/pois.jl 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 13e231a...cb1f325. Read the comment docs.

@andreasnoack
Copy link
Member Author

I've added methods for Poisson via the definitions for gamma so this now supersedes #112. It requires JuliaRegistries/General#37083 for the tests to pass so I'll restart once that PR is merged.

@andreasnoack andreasnoack marked this pull request as ready for review May 19, 2021 21:00
@andreasnoack andreasnoack reopened this May 19, 2021
Copy link
Member

@devmotion devmotion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some initial comments but I'm too tired for a thorough review now. I'll check it more thoroughly tomorrow.

@@ -3,6 +3,7 @@ uuid = "4c63d2b9-4356-54db-8cca-17b64c39e42c"
version = "0.9.8"

[deps]
HypergeometricFunctions = "34004b35-14d8-5ef3-9330-4cdb6864b03a"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This requires a [compat] entry.

This also pulls in some additional dependencies since HypergeometricFunctions depends on DualNumbers, it seems? I am just wondering if people are fine with this since my ChainRules PR (#106) was not received positively mainly due to the additional dependency on ChainRulesCore.

On a side note, HypergeometricFunctions would be needed in SpecialFunctions to resolve JuliaMath/SpecialFunctions.jl#317 and JuliaMath/SpecialFunctions.jl#160. There was some discussion about depending on HypergeometricFunctions in JuliaMath/SpecialFunctions.jl#175, but I assume for the very same reasons not everyone would welcome such a dependency in SpecialFunctions.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add a compat entry (or CompatHelper will do it).

I can try to measure the effect on load time which I assume is the main concern here. I doubt that it will be much and more generally, I think that HypergeomericFunctions is a natural dependency here. Many CDFs have hypergeometric representations so I think this is a bit different from the AD discussion. I don't think it would make sense to have the CDFs defined in a separate package. (I think I'm in favor of #106 btw)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that probably it is more central to the package than AD (even though, of course, I'm in favor of the ChainRules PR 😉 ). I am fine with the dependency but I wanted to raise it due to the reactions in the other PR.

src/distrs/beta.jl Show resolved Hide resolved
Comment on lines 38 to 39
# The log version is currently based on non-log version. When the cdf is very small we shift
# to a implementation based on the hypergeometric function ₂F₁ to avoid underflow.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reference for these cutoffs?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately not. My thinking is that using the non-log version is fine as long as it doesn't underflow. I realize that I'm using floatmin() instead of eps() in a similar situation for one of the other distributions.

src/distrs/beta.jl Outdated Show resolved Hide resolved
src/distrs/beta.jl Outdated Show resolved Hide resolved
src/distrs/beta.jl Outdated Show resolved Hide resolved
src/distrs/binom.jl Outdated Show resolved Hide resolved
src/distrs/beta.jl Outdated Show resolved Hide resolved
src/distrs/beta.jl Outdated Show resolved Hide resolved
src/distrs/beta.jl Outdated Show resolved Hide resolved
src/distrs/beta.jl Outdated Show resolved Hide resolved
src/distrs/binom.jl Outdated Show resolved Hide resolved
src/distrs/fdist.jl Outdated Show resolved Hide resolved
src/distrs/gamma.jl Outdated Show resolved Hide resolved
src/distrs/gamma.jl Outdated Show resolved Hide resolved
src/distrs/gamma.jl Outdated Show resolved Hide resolved
src/distrs/tdist.jl Outdated Show resolved Hide resolved
@blackeneth
Copy link

Let me butt in here to comment. Andreas has functions (pdf cdf ccdf logpdf logccdf invcdf invccdf invlogcdf invlogccdf) for Beta, Binomial, Chi Squared, F dist, Gamma, Normal, Poisson, and Student's T distribution.

A long ago PR in Distributions.jl has functions for Beta, Binomial, Cauchy, Chi Squared, F dist, Gamma, Geometric, HyperGeometric, logistic, lognormal, negative binomial, normal, poisson, student's t, uniform, weibull, and rayleigh distributions.

These pull requests always seem to stall. Review comments from devmotion tend to be along the lines of 1) small code optimizations, 2) questions on the "source" to certain algorithm constants. Multiply that by 8 for the all the distributions that Andreas is promoting. It results in a lot of little issues to fix/improve/modify. Review comments are valuable and it is leads to improved code, but one must manage the feedback process.

Perhaps it would be better to do a separate pull request for each distribution. It is more manageable to fix/modify one function at a time, rather than a set of 8. Learnings from fixing/modifying the first distribution would carry on to the second and subsequent distributions, so hopefully the process would be smoother. It is valuable to reference the code back to research papers; however, lacking that, we could depend on the test results matching the existing R functions.

Also, one might consider a parallel release strategy. Keep the R functions for now. Release a pure Julia version in parallel -- this is the "Beta test" -- let the community use it, study it, and identify problems (such as corner cases). After the Beta period, swap them for the R functions.

A long term goal has been to eliminate the dependence on the R functions. Don't let the perfect be the enemy of the good. These functions need to make it over the finish line.

@andreasnoack
Copy link
Member Author

I think it's fine. @devmotion comments here are very valid and should be addressed. The main complication is that some of the changes require adjustments in SpecialFunctions which makes it slightly more work but it's not that much. I had to look into some other things for a while but I'll get this completed.

@fernando-duarte
Copy link

Tried to fix support for betalogpdf, cdf can use a similar fix I think
https://github.com/JuliaStats/StatsFuns.jl/pull/113/files#r663370467

bdeonovic added a commit to bdeonovic/SpecialFunctions.jl that referenced this pull request Jul 28, 2021
Chain rules for partial derivatives of incomplete beta. Code is by @arzwa and taken from 
https://github.com/arzwa/IncBetaDer/blob/main/src/beta_inc_grad.jl
Related to effort to provide julia implementations of pdf/cdf like functions (JuliaStats/StatsFuns.jl#113) and JuliaStats/Distributions.jl#1334
@Pramodh-G
Copy link

Pramodh-G commented Sep 3, 2021

Hello,
I stumbled across this issue while benchmarking R vs Distributions.jl sampling. I think having pure julia implementations can be really useful to the ecosystem as a whole. I want to help out with this issue wherever I can; Is there anything that I can do?

PS: I have little experience in writing numerical software but am fluent in julia; given Rmath/similar implementations, I would be happy to port it to native julia :).

@andreasnoack
Copy link
Member Author

Yay! All tests pass now and I believe I have addressed the review comments so I think this one is ready. It only took me about eight months.

Copy link
Member

@devmotion devmotion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the fixes in SpecialFunctions are necessary for the tests to pass, I guess it would be reasonable to update the SpecialFunctions compat entry (maybe the fixes should be backported to SpecialFunctions 1 if we want to keep support for SpecialFunctions 1.X)?

src/distrs/beta.jl Show resolved Hide resolved
src/distrs/gamma.jl Outdated Show resolved Hide resolved
@andreasnoack
Copy link
Member Author

andreasnoack commented Feb 8, 2022

I theory we should probably backport the SpecialFunctions PRs since I consider all of them bug fixes.

Co-authored-by: David Widmann <devmotion@users.noreply.github.com>
src/distrs/beta.jl Outdated Show resolved Hide resolved
src/distrs/binom.jl Outdated Show resolved Hide resolved
src/distrs/gamma.jl Outdated Show resolved Hide resolved
src/distrs/gamma.jl Outdated Show resolved Hide resolved
src/distrs/gamma.jl Outdated Show resolved Hide resolved
andreasnoack and others added 6 commits February 8, 2022 11:36
Co-authored-by: David Widmann <devmotion@users.noreply.github.com>
Co-authored-by: David Widmann <devmotion@users.noreply.github.com>
Co-authored-by: David Widmann <devmotion@users.noreply.github.com>
Co-authored-by: David Widmann <devmotion@users.noreply.github.com>
Co-authored-by: David Widmann <devmotion@users.noreply.github.com>
@andreasnoack
Copy link
Member Author

@devmotion The fixes have been backported in JuliaRegistries/General#54891 so are we good to go here?

Copy link
Member

@devmotion devmotion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! Great PR 🎉

@andreasnoack andreasnoack merged commit 28d732a into master Feb 18, 2022
@andreasnoack andreasnoack deleted the an/nopdf branch February 18, 2022 11:04
devmotion added a commit that referenced this pull request Apr 1, 2022
devmotion added a commit that referenced this pull request Apr 1, 2022
devmotion added a commit that referenced this pull request Apr 8, 2022
* Revert "Use julia implementations for pdfs and some cdf-like functions (#113)"

This reverts commit 28d732a.

* Fix test errors on Julia 1.0

* Run CI on Julia 1.0

* Bump version

* Update Project.toml
devmotion pushed a commit that referenced this pull request Apr 20, 2022
* Use Julia implementations for (log)pdfs. Adjust Rmath tests to use
testsets to give more informative errors.

* Use Julia implementations for most cdf-like functions of the Chisq
distribution.

* Change Beta distribution to be using native implementations.

* Switch binomial cdfs to use native implementations (but not the inverses)

* Use native implementations for most cdf like gamma functions

* Just use the gamma defitions for the chisq

* Use the beta definitions for binomial

* Add native implementions for Poisson's cdf-like functions

* Rely on SpecialFunctions promotion handling in beta functions

* Rely on gamma_inc_inv promotion in gammainv(c)cdf

* Apply suggestions from code review

* Add compat for HypergeometricFunctions

* Handle negative arguments in Poisson (log)(c)cdf

* Handle negative x in Gamma (log)(c)cdf

* Handle arguments outside support for Beta and Fdist (log)(c)cdf

* Fix Fdist quantile functions

* Handle arguments out of support in Binomial (log)(c)cdf and apply
the floor function to make the results correct for non-integer arguments

* Avoid accidental promotion to Float64 in fdist(log)(c)cdf

* Fix typo in "x" calculation for fdist(log)(c)cdf

* Drop redundant low tolerance tests for Beta(1000, 2). We are now
generally more accurate than Rmath so failure are because of Rmath.

* Handle some non-finite edge cases in gamma and binomial

* Only switch to log-version of betalogcdf when the result is tiny

* Handle the degenerate cases in beta(log)(c)cdf when alpha is zero

* Handle the degenerate case of the gamma distribution when k==0.
Also, only calculate the log(c)cdf based on the hypergeometric
function when p is zero or subnormal

* Address review comments

* Avoid rational return type in gamma(log)(c)cdf

* Handle the case when alpha and beta are zero in the betacdf
devmotion added a commit that referenced this pull request Apr 29, 2022
* Use julia implementations for pdfs and some cdf-like functions (#113)

* Use Julia implementations for (log)pdfs. Adjust Rmath tests to use
testsets to give more informative errors.

* Use Julia implementations for most cdf-like functions of the Chisq
distribution.

* Change Beta distribution to be using native implementations.

* Switch binomial cdfs to use native implementations (but not the inverses)

* Use native implementations for most cdf like gamma functions

* Just use the gamma defitions for the chisq

* Use the beta definitions for binomial

* Add native implementions for Poisson's cdf-like functions

* Rely on SpecialFunctions promotion handling in beta functions

* Rely on gamma_inc_inv promotion in gammainv(c)cdf

* Apply suggestions from code review

* Add compat for HypergeometricFunctions

* Handle negative arguments in Poisson (log)(c)cdf

* Handle negative x in Gamma (log)(c)cdf

* Handle arguments outside support for Beta and Fdist (log)(c)cdf

* Fix Fdist quantile functions

* Handle arguments out of support in Binomial (log)(c)cdf and apply
the floor function to make the results correct for non-integer arguments

* Avoid accidental promotion to Float64 in fdist(log)(c)cdf

* Fix typo in "x" calculation for fdist(log)(c)cdf

* Drop redundant low tolerance tests for Beta(1000, 2). We are now
generally more accurate than Rmath so failure are because of Rmath.

* Handle some non-finite edge cases in gamma and binomial

* Only switch to log-version of betalogcdf when the result is tiny

* Handle the degenerate cases in beta(log)(c)cdf when alpha is zero

* Handle the degenerate case of the gamma distribution when k==0.
Also, only calculate the log(c)cdf based on the hypergeometric
function when p is zero or subnormal

* Address review comments

* Avoid rational return type in gamma(log)(c)cdf

* Handle the case when alpha and beta are zero in the betacdf

* Fix compat entries

* Remove precompile statement

* Update version number

Co-authored-by: Andreas Noack <andreas@noack.dk>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants