Singular values & proportion of explained variance #46
In the code, we have Line 246 in 307c7e2
where Recall that only the first PCs are computed, so you can't sum over only the first values to get the total variance. Does the result of |
Thank you for your reply. I was surprised that the total proportion of explained variance was not 1, but after your reply I realized that the function uses a truncated SVD, so, as you said, only the first K PCs are taken into account. Am I correct? Best, |
Yes, only the first K PCs are used. |
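To illustrate the point about truncation, here is a minimal sketch in base R on simulated data. It does not use pcadapt; the matrix `X`, its dimensions, and the choice `K = 5` are assumptions made for illustration, and the full `svd()` stands in for a truncated one. It compares normalizing the first K squared singular values by their own sum with normalizing by the total variance.

```r
set.seed(42)
n <- 50; p <- 200; K <- 5

# Simulated data, centered and scaled column-wise (illustrative only).
X <- scale(matrix(rnorm(n * p), nrow = n, ncol = p))

d_full <- svd(X)$d          # all singular values
d_K <- d_full[seq_len(K)]   # the K leading values a truncated SVD would keep

total_var <- sum(d_full^2)  # equals (n - 1) * p for scaled columns

pve_true <- d_K^2 / total_var        # proportion of the total variance
pve_truncated <- d_K^2 / sum(d_K^2)  # normalized by the first K values only

sum(pve_true)       # below 1: the remaining PCs carry the rest
sum(pve_truncated)  # 1 by construction, whatever K is
```

Normalizing by the first K values alone always gives proportions that sum to 1, which is why it cannot be used as the total variance.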
Did you forget to square the singular values? |
Thanks! No, I did not. From the scree plot: The first few axes alone sum to over 1, and this is confirmed by the singular values:
Very much appreciate the help (and the package). Ludo |
Weird indeed. What is the size of your data? |
Hi Florian, it is only 52k SNPs. I sent it by email to ******.21@gmail.com. Hopefully it is the right address! Ludo |
It seems that your genotypes do not follow HWE at all (too few 1s). |
That makes sense. Is the HWE assumption for the imputation or within the PCA? Would it be correct to take the total estimated variance over the first, say, 100 axes as the "total explained variance" and then look at each of those axes as a proportion of that variance, basically scaling it? |
The scaling used in the PCA (sqrt(2p(1-p))) is supposed to give you variables that have variance 1 under HWE. As for your second question: no, I don't think it would be correct. But usually the singular values (or their squares) decrease almost linearly, so you can extrapolate the rest of them and therefore the total variance. |
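The effect of this scaling can be checked with a small simulation. The sketch below is base R only and does not use pcadapt; the allele frequency `p`, the sample size, and the helper name `scaled_variance` are illustrative assumptions. It draws genotypes under HWE and from an extreme heterozygote deficit (no 1s at all), then compares the variances after dividing by sqrt(2p(1-p)).

```r
set.seed(1)
n <- 1e5
p <- 0.3   # assumed allele frequency

# Hypothetical helper: variance of genotypes after dividing by sqrt(2p(1-p)),
# with p estimated from the genotypes themselves.
scaled_variance <- function(g) {
  p_hat <- mean(g) / 2
  var(g / sqrt(2 * p_hat * (1 - p_hat)))
}

# Under HWE, genotypes follow Binomial(2, p).
g_hwe <- rbinom(n, size = 2, prob = p)

# Heterozygote deficit: same allele frequency, but only genotypes 0 and 2.
g_deficit <- 2 * rbinom(n, size = 1, prob = p)

v_hwe <- scaled_variance(g_hwe)          # close to 1
v_deficit <- scaled_variance(g_deficit)  # close to 2

c(HWE = v_hwe, deficit = v_deficit)
```

With too few heterozygotes, the scaled variables have variance above 1, which is consistent with the first few axes appearing to explain more than the expected total.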
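Ok, I think that should work. You can extrapolate the variance explained like this (using K = 100 as input):

```r
# Cumulative sum of the squared singular values of the first K = 100 PCs
y <- cumsum(x$singular.values^2)
plot(y, log = "")
# Extend the cumulative curve from PC 101 up to nrow(x$scores) - 1 with a monotone spline
y2 <- splinefun(seq_along(y), y, method = "monoH.FC")(101:(nrow(x$scores) - 1))
plot(c(y, y2))
# Proportion of variance explained, relative to the extrapolated total
EV <- x$singular.values^2 / max(y2)
```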
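For readers without a pcadapt object at hand, the same extrapolation can be tried on simulated data where the true total variance is known. This is only a sketch under stated assumptions: the matrix `X`, the sizes, and `K = 20` are made up, and `svd()` on the full matrix stands in for the truncated SVD. It also relies on `splinefun(..., method = "monoH.FC")` extrapolating linearly beyond the data range, which is its default behaviour as far as the R documentation indicates.

```r
set.seed(1)
n <- 60; p <- 500; K <- 20

# Simulated data, centered and scaled column-wise (illustrative only).
X <- scale(matrix(rnorm(n * p), nrow = n, ncol = p))

d_full <- svd(X)$d         # reference: full spectrum
sv <- d_full[seq_len(K)]   # pretend only K singular values were computed

y <- cumsum(sv^2)
# Extend the cumulative sum up to n - 1 components (the rank after centering).
f <- splinefun(seq_along(y), y, method = "monoH.FC")
y2 <- f((K + 1):(n - 1))

total_est <- max(y2)          # extrapolated total variance
total_true <- sum(d_full^2)   # (n - 1) * p for scaled columns
EV <- sv^2 / total_est

c(estimated = total_est, true = total_true)
sum(EV)   # below 1, since the extrapolated total exceeds the first K terms
```

Because the squared singular values keep decreasing, continuing the cumulative sum at its last slope tends to overestimate the total, so the resulting proportions err on the low side.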
Thank you so much, very grateful for your help. That is awesome! Interestingly, simply scaling using the total variance gives relatively similar results. See below for the first five axes:
Thank you. Ludo |
There is now a better estimate of the total variance used in v4.4. |
Hi,
In the pcadapt documentation, the singular.values vector in a pcadapt object is described as "the vector containing the K ordered squared root of the proportion of variance explained by each PC". The values for the y-axis on the scree plot are also computed by "squaring" the singular values.
However, should the proportion of explained variance for the i-th PC not rather be computed as:
?
I am sorry if I missed something and if I am mistaken.
Best regards,
Yann