Bug with residual variances for PCA/PPCA #156

mgmverburg · 2021-06-03T11:34:14Z

Hi, I believe I found a bug in PCA and PPCA.
PCA implements tresidualvar, described in the docs as "The total residual variance."
PPCA implements var, described in the docs as "The total residual variance."

However, the two implementations differ. With the :ml method, even when PCA and PPCA give identical projections, the residual variance reported by tresidualvar for PCA is much smaller than the one reported by var for PPCA. Going through the code, I found that PPCA does not divide the variances (V) by the number of observations, as the PCA implementation does in this line.
I am not sure whether this is intentional, but it is certainly confusing, so at the very least the docs would need to be updated.
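A minimal sketch of why the missing division matters, using only the Julia standard library (it does not call MultivariateStats, and the data are made up for illustration): the squared singular values of centered data are n - 1 times the eigenvalues of the sample covariance, so skipping the rescaling inflates every variance by that factor.

```julia
using LinearAlgebra, Random, Statistics

Random.seed!(1)
n, d = 1000, 4
X = randn(n, d) * Diagonal([3.0, 2.0, 0.5, 0.1])   # n observations × d variables

Z = X .- mean(X; dims = 1)         # center each column (variable)
s = svdvals(Z)                     # singular values, in descending order

unscaled = abs2.(s)                # "variances" if the rescaling is forgotten
scaled   = abs2.(s) ./ (n - 1)     # eigenvalues of the sample covariance

# Reference: eigenvalues of cov(X) (which divides by n - 1), sorted descending
λcov = sort(eigvals(Symmetric(cov(X))); rev = true)
println("unscaled:        ", unscaled)
println("scaled:          ", scaled)
println("cov eigenvalues: ", λcov)
```

The two variants differ only by the constant factor n - 1, which is why the unscaled PPCA value comes out so much larger than the PCA one.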

Additionally, PPCA with :em selected doesn't give the same value either. I haven't yet figured out why.

To make the issue easy to reproduce, I made an MWE:

```julia
using Random, LinearAlgebra, Statistics
using Distributions, MultivariateStats

begin
    Random.seed!(1)
    n = 1000
    nr_d = 4 # number of variables
    nr_f = 2 # number of latent factors
    loadings_matrix = rand(Normal(0, 1), nr_d, nr_f)
    latent_factor = rand(MvNormal([-1, 1], LinearAlgebra.I), n)'
    y = latent_factor * loadings_matrix' + rand(Normal(0, 0.1), n, nr_d)
    M_PCA = MultivariateStats.fit(PCA, y'; method=:svd)
    println(tresidualvar(M_PCA))
    M_PPCA = MultivariateStats.fit(PPCA, y'; method=:ml, maxoutdim=2, maxiter=10000000)
    println(var(M_PPCA))
    M_PPCA_em = MultivariateStats.fit(PPCA, y'; method=:em, maxoutdim=2, maxiter=10000000)
    println(var(M_PPCA_em))
end
```
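On the :em discrepancy: the EM updates for PPCA from Tipping and Bishop are expected to converge to the same maximum-likelihood σ² as the closed-form solution, so one way to check an EM result is to compare the two on the same sample covariance. The sketch below does that with the standard library only. ppca_em and ppca_ml_sigma2 are hypothetical helpers written for illustration, not MultivariateStats functions, and the fixed iteration count is an assumption standing in for a proper convergence test.

```julia
using LinearAlgebra, Random, Statistics

# Hypothetical helper (NOT the MultivariateStats implementation): EM updates
# for PPCA expressed through the d × d sample covariance S (Tipping & Bishop).
function ppca_em(S::AbstractMatrix, q::Integer; iters::Integer = 5000)
    d = size(S, 1)
    W = randn(d, q)          # random initial loadings
    σ² = 1.0                 # initial noise variance
    for _ in 1:iters
        M = W' * W + σ² * I                        # q × q, uses old W and σ²
        SW = S * W
        Wnew = SW / (σ² * I + (M \ (W' * SW)))     # loadings update
        σ² = tr(S - SW * (M \ Wnew')) / d          # noise-variance update
        W = Wnew
    end
    return W, σ²
end

# Hypothetical helper: closed-form ML noise variance, i.e. the mean of the
# d - q smallest eigenvalues of S (eigvals of a Symmetric matrix are ascending).
ppca_ml_sigma2(S, q) = mean(eigvals(Symmetric(S))[1:end-q])

Random.seed!(1)
n, d, q = 1000, 4, 2
Y = randn(n, q) * randn(q, d) .+ 0.1 .* randn(n, d)   # n observations × d variables
S = cov(Y)                                              # divides by n - 1

_, σ²_em = ppca_em(S, q)
σ²_ml = ppca_ml_sigma2(S, q)
println("EM: ", σ²_em, "   closed form: ", σ²_ml)
```

If a library's :em result disagrees with its :ml result on the same data by more than the convergence tolerance, a difference in how the covariance is scaled is one plausible cause worth checking.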


wildart · 2021-06-22T21:23:09Z

Indeed: because the data for the :ml method isn't scaled before the SVD, the squared singular values must be divided by n-1 afterward to get the correct variances (the eigenvalues of the sample covariance), and that wasn't done. This line should be the following: V = abs2.(λ[ord]) ./ (n-1).

After that change, the residual variance σ² is reported correctly:

```
julia> M_PPCA2 = MultivariateStats.fit(PPCA, y', method=:ml, maxoutdim=2, maxiter=10000000)
Probabilistic PCA(indim = 4, outdim = 2, σ² = 0.0103621926397605)
```
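When comparing the corrected :ml value with PCA's tresidualvar, note that in Tipping and Bishop's PPCA the closed-form ML noise variance σ² is the mean of the d - q discarded covariance eigenvalues, whereas a "total residual variance" is naturally their sum. Assuming (not verified against the package source) that tresidualvar returns the sum and var(::PPCA) returns σ², the two would still differ by a factor of d - q = 2 in the MWE after the fix. The sketch below computes both quantities from scratch; cov_eigvals_svd is a hypothetical helper that mirrors the corrected line, not a library function.

```julia
using LinearAlgebra, Random, Statistics

# Hypothetical helper mirroring the fix: covariance eigenvalues obtained from
# an SVD of the centered data, rescaled by 1/(n - 1). Rows of X are observations.
function cov_eigvals_svd(X::AbstractMatrix)
    n = size(X, 1)
    Z = X .- mean(X; dims = 1)
    return abs2.(svdvals(Z)) ./ (n - 1)      # descending order
end

Random.seed!(1)
n, d, q = 1000, 4, 2
W = randn(d, q)                               # loadings
F = randn(n, q)                               # latent factors
Y = F * W' .+ 0.1 .* randn(n, d)              # noise variance 0.01 per variable

λ = cov_eigvals_svd(Y)
total_residual = sum(λ[q+1:end])              # sum of discarded eigenvalues
σ² = total_residual / (d - q)                 # ML noise variance (per dimension)

println("total residual variance = ", total_residual)
println("σ² (per dimension)      = ", σ²)
```

With noise of standard deviation 0.1, σ² should land near 0.01, in line with the value printed above, while the summed residual variance should be roughly twice that.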

wildart added the bug label Jun 22, 2021

wildart added a commit to wildart/MultivariateStats.jl that referenced this issue Jan 24, 2022: "fixed JuliaStats#156" (b81596d)

wildart closed this as completed in a98fa12 Jan 24, 2022
