Hi, I believe I found a bug involving PCA and PPCA.
PCA implements `tresidualvar`, described in the docs as "The total residual variance."
PPCA implements `var`, which the docs also describe as "The total residual variance."
However, the two implementations differ. With PPCA and the `:ml` method selected, when PCA and PPCA produce identical projections, the value returned by `tresidualvar` for PCA is much smaller than the value returned by `var` for PPCA. I went through the code, and indeed PPCA doesn't divide the variance (`V`) by the total number of observations, as the PCA implementation does in this line.
I am not sure whether this is intentional, but it is certainly confusing, so at the very least the docs would need to be updated.
Additionally, PPCA with `:em` selected doesn't give the same value either. I haven't yet figured out why.
To reproduce the issue easily, I made an MWE:
using Random, LinearAlgebra, Statistics
using Distributions, MultivariateStats

begin
    Random.seed!(1)
    n = 1000
    nr_d = 4  # number of variables
    nr_f = 2  # number of latent factors
    loadings_matrix = rand(Normal(0, 1), nr_d, nr_f)
    latent_factor = rand(MvNormal([-1.0, 1.0], LinearAlgebra.I), n)'  # n × nr_f
    # observations in rows: low-rank signal plus small Gaussian noise
    y = latent_factor * loadings_matrix' + rand(Normal(0, 0.1), n, nr_d)
    # fit expects observations in columns, hence y'
    M_PCA = MultivariateStats.fit(PCA, y'; method=:svd)
    println(tresidualvar(M_PCA))
    M_PPCA = MultivariateStats.fit(PPCA, y'; method=:ml, maxoutdim=2, maxiter=10000000)
    println(var(M_PPCA))
    M_PPCA_em = MultivariateStats.fit(PPCA, y'; method=:em, maxoutdim=2, maxiter=10000000)
    println(var(M_PPCA_em))
end
Indeed. Because the data for the `:ml` method isn't scaled before the SVD, the eigenvalues must be scaled afterward to get correct variances, and that wasn't done. This line should be `V = abs2.(λ[ord]) ./ (n-1)`.
After that change, the explained variance is reported correctly.
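
For readers following along, here is a short sketch of why the division by `n-1` is needed. The symbols $Z$, $U$, $S$, $W$, $s_i$, $C$ and $\mu_i$ are notation introduced here for illustration, not names from the package, and the sketch assumes that `λ` in the quoted line holds the singular values of the centered, unscaled data matrix (which the `abs2` suggests). Let $Z \in \mathbb{R}^{d \times n}$ be the centered data with observations in columns:

$$
Z = U S W^\top, \qquad
C = \frac{1}{n-1}\, Z Z^\top = U \,\frac{S^2}{n-1}\, U^\top
\quad\Longrightarrow\quad
\mu_i = \frac{s_i^2}{n-1},
$$

where $s_i$ are the singular values of $Z$ and $\mu_i$ are the eigenvalues of the sample covariance $C$, i.e. the variances along the principal directions. Using $s_i^2$ without the scaling therefore inflates every variance by a factor of $n-1$, which is 999 for the MWE above with `n = 1000`. That is consistent with the PPCA `:ml` value being much larger than the PCA one.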