Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MVN cdf #114

Closed
wants to merge 3 commits into from
Closed

MVN cdf #114

wants to merge 3 commits into from

Conversation

blackeneth
Copy link

Adds function "qsimvnv" to /distrs -- this function is a Julia translation of the Matlab "qsimvnv" function, which computes the Multivariate Normal CDF over a rectangular area. The original code is by Prof. Alan Genz of the University of Washington, who allows redistribution of his code if credit is given in the source code (which has been done). A private function _chlrdr is also added, which re-arranges the integration limits and computes the Cholesky decomposition.

A test set is present in /test, called MVN.jl. runtests.jl has been modified to call it. Given that the definition of covariance matrices takes a lot of code space, it was broken out as a separate file for clarity. Total of 33 tests (all pass).

This is my first PR. Please be kind to my mistakes.

This function is a translation of the same Matlab function (qsimvnv), code by Prof. Alan Genz (used will permission / credit). 
It uses Monte Carlo integration to compute the Multivariate Normal (MVN) CDF over a rectangular region.
Types added to arguments for qsimvnv() and _chldr() ;  tests needed to be updated to be consistent with these types
@tpapp
Copy link
Contributor

tpapp commented Jun 16, 2021

allows redistribution of his code if credit is given in the source code (which has been done)

Note however that the package is under the MIT "Expat" license, and it is unclear if Prof. Genz is also OK with relicensing derived code.

@blackeneth
Copy link
Author

The tests are failing because the LinearAlgebra depency isn't in the test environment (Primes is also needed). After studying travis, Google Actions, continuous integration, ... I don't know how to fix it. It seems like CompatHelper should fix it?

As for the license, StatsFuns has already released code based on Prof. Genz's work -- tvpack.jl. MATLAB includes his code in their commercial product. But I will write him directly for clarification.

@blackeneth
Copy link
Author

I received a reply from Professor Genz:

Hello Andy,

Thanks for your work with the Julia translation of qsimvnv. I am happy to have the code derived from my original >implementations being redistributed under the MIT/Expat license.

Sincerely,

Alan Genz

So we are good to go on the licensing front. Still need to figure out how to fix the LinearAlgebra depency causing the tests to fail.

qsimvnv: 
Input types now AbstractArray{<:Real}, AbstractVector{<:Real}, AbstractVector{<:Real}
Updated docstring to reflect that (removed AbstractPDMat)
Since BigFloat is a Real, but the function won't finish if used, check for BigFloat and print a warning if found, then convert it to Float64

MVN: 
Corrected test JuliaStats#4 (test passed, but for wrong matrix and result) 
added tests 15 & 16 to check for the 2- and 3-dimensional shortcut code for Orthant probabilities
added note to ignore warnings on test 5 & 6 -- the isposdef() check is wrong (test 5) or irrelevant (test 6)
added test to check for BigFloat warning message
@devmotion
Copy link
Member

I haven't checked the implementation but IMHO this PR adds too many heavy dependencies. In particular, I am worried about the circular dependency on Distributions. Can they be avoided? Otherwise it might be better to add this functionality to Distributions or a separate package.

@blackeneth
Copy link
Author

The dependency on the Distributions package is the Normal() distribution, which was "moved" some time ago from StatsFuns to Distributions. Although "norm.jl" is still in StatsFuns; I could call that directly or compute it via an erfc() call; don't know how you'd feel about that. In general, I prefer to call the "Plan Of Record" (POR) code -- the one official function, as it will get maintained over time and get bug fixes. So I reference that function, I get those fixes too.

The dependency on LinearAlgebra is a natural from using vectors for a multivariate distribution . The dependency on Random comes from using quasi-random integration. The dependency on Primes is due to the construction of Richtmyer generators for the integration; this could be replaced by a tuple of, say, the first 1000 prime numbers. Although I figured any statistician would probably have Primes loaded anyhow.

Prior discussions on Discourse steered me away from adding it to Distributions and instead putting it into StatsFuns. That said, once qsimvnv.jl was added to StatsFuns, it would be fairly trivial to add the CDF function the MvNormal distribution.

Some time ago StatsFuns was split from Distributions. @andreasnoack has discussed undoing that and moving the statistics functions back to Distributions. Perhaps MVN could be the first one -- or, more likely, participate in the R-replacement stats funs that @andreasnoack is implementing moving back to Distributions (if you decide to do that). Not my decision, but I would of course cooperate with whatever direction is set.

@tpapp
Copy link
Contributor

tpapp commented Jul 10, 2021

FWIW, I still think that a mini-package for this specific functionality would be best.

@blackeneth
Copy link
Author

So @devmotion, @tpapp -- what's the call? Into StatsFuns.jl or as a separate mini-package?

If it's "mini-package":

  • I would appreciate you looking over the code and providing any suggestions for improvements.
  • would I create this mini-package under my github account?
  • I would fork tvpack.jl and put it in the mini-package -- that way, all of the "vector" functions are in one package

@tpapp
Copy link
Contributor

tpapp commented Aug 13, 2021

@blackeneth: I would recommend a specific, separate package for this. Yes, create it under your github account, then register if you want to.

I can provide some code review on superficial API details once you have set up this package (just ping me), but the actual computation I am not familiar with.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants