Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect calculation of logpdf for multivariate normal distribution #19

Closed
lindahua opened this issue Feb 25, 2013 · 10 comments
Closed
Labels

Comments

@lindahua
Copy link
Contributor

I implemented a multivariate gaussian distribution in my Bayesian package, and compared the results from my implementation with those yielded by Distributions.jl. I found that the logpdf method for MultivariateNormal produces incorrect results.

Here is a simple example:

mu = zeros(3)
c = [4. -2. -1.; -2. 5. -1.; -1. -1. 6.]
g = MultivariateNormal(mu, c)
x = [3., 4., 5.]
logpdf(g, x)

This produces -17.348506846053837.

The correct result should be -15.75539253001834. -- You may confirm this with MATLAB or other scientific software.

Also, cond(c) = 3.43, so it is in pretty good condition. Hence, such a big difference should not be due to numerical errors. There should be a bug somewhere in this package.

I think this package needs a thorough test suite to ensure the correctness of implementation -- the role of this package is fundamental for statistics.

@lindahua
Copy link
Contributor Author

I made a quick correction at 19d958e

@dmbates
Copy link
Contributor

dmbates commented Feb 25, 2013

Thanks.

@johnmyleswhite
Copy link
Member

Yes, we badly need tests. I would say that everything that's not being brought in from R should be treated as alpha or even pre-alpha software.

@andreasnoack
Copy link
Member

@lindahua I don't think the old version was wrong. Please note that the mean is standardized with respect to the cholesky factor of the covariance matrix and not the covariance itself. Also, in the new version the result of your own example is

julia> logpdf(g,x)
-21.291568715206875

I have just tried to reproduce the MATLAB result and first i couldn't, but then I tried

julia> log(1/(sqrt(2pi)^3*sqrt(det(c)))*exp(-0.5*x'*(c\x)))
1-element Float64 Array:
 -15.7554

The problem is that the pdf is tiny, 1.43721e-7, and hence the calculation should be done in logs. MATLAB and Mathematica don't do that and get results with large errors.

@andreasnoack
Copy link
Member

Okay, please ignore last part of my message. I should have thought a little more about it. The new and my old versions are still wrong but my idea that MATLAB and Mathematica had it wrong was not that smart. I'll think it through after lunch.

@andreasnoack
Copy link
Member

The problem was the transpose of the cholesky factor. I have pushed a fix 19d958e.

@lindahua
Copy link
Contributor Author

@andreasnoackjensen Oh, it is my fault I did not test the correction myself. The problem was that I did not notice it was d.covchol.LR instead of d.covchol. The correct version of the quadratic part should be
(x - d.mean) * (d.covchol \ (x - d.mean)). Instead of (x - d.mean) * (d.covchol.LR \ (x - d.mean))

@lindahua
Copy link
Contributor Author

The pdf is not very large (at the level of 1.4e-7, but this is not enough to produce such big difference. Actually, a correct implementation will produce very accurate result as long as the pdf is not over/underflow.

@andreasnoack
Copy link
Member

You are right about the 1.4e-7. I should have waited half an hour and looked at it again before posting. However, hopefully we now have a correct version. You are also completely right about the tests. It is so easy to get these things wrong.

@lindahua
Copy link
Contributor Author

@andreasnoackjensen I made a revision at 819db1f

It is now correct (I added a test to ensure this) and much faster (see my gist ).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants