Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ERROR: Can't compute SVD #48

Closed
jrglass6 opened this issue Mar 27, 2020 · 18 comments
Closed

ERROR: Can't compute SVD #48

jrglass6 opened this issue Mar 27, 2020 · 18 comments

Comments

@jrglass6
Copy link

Hello,

I am getting the "Can't compute SVD. Are there SNPs or individuals with missing values only?" error for one of my SNP datasets after removing individuals and filtering for missing data using VCFtools and PLINK. When I look at the corresponding "imiss" and "lmiss" files, there are no individuals or loci with 100% missing data. I'm able to perform PCA on the same dataset using other programs. Is there a better way to troubleshoot what specifically is flagging this error? I'm using the exact same code to remove individuals from another dataset and pcadapt works fine.

Cmel_FR16.lmiss.txt
Cmel_FR16.imiss.txt

@privefl
Copy link
Member

privefl commented Mar 27, 2020

The decomposition usually fails when there are variants with no variation.
It should be either all missing or very low MAF (but we filter for that by default).

@jrglass6
Copy link
Author

jrglass6 commented Mar 27, 2020 via email

@privefl
Copy link
Member

privefl commented Mar 27, 2020

Function pcadapt() comes with a parameter min.maf with default value of 0.05.
Are you using this?

@jrglass6
Copy link
Author

I was not because I was already filtering for min.maf using VCFtools, but I just tried re-running pcadapt with min.maf = 0.05 and it gave the same error (Can't compute SVD).

Looking at the missing sites file (Cmel_FR16.lmiss.txt) there is no indication that there are any variants with all missing data.

@jrglass6
Copy link
Author

The error is somehow related to removing a large number (78) of individuals at once. When I run the same filtering steps without removing all of the individuals (or even some of the individuals) I do not get the error. But, looking at the "Cmel_FR16.imiss" file, I can't tell whether one or more individuals are problematic because none of them have a significant amount of missing data.

@privefl
Copy link
Member

privefl commented Mar 27, 2020

Please try running pcadapt:::get_af() on your input.

@jrglass6
Copy link
Author

As "input" you mean the .bed file, correct? It's unable to calculate the allele frequencies and I get the following error. Mine is an R object called "Cmel.pcad2"

Error in pcadapt:::get_af(Cmel.pcad2) :
Expecting an external pointer: [type=character].

This is the input:
[1] "/path/to/directory/Cmel.pcad.bed"
attr(,"n")
[1] 80
attr(,"p")
[1] 28172
attr(,"class")
[1] "pcadapt_bed"

@privefl
Copy link
Member

privefl commented Mar 27, 2020

Sorry, you need to use

n <- attr(input, "n")
p <- attr(input, "p")
xptr <- structure(pcadapt:::bedXPtr(input, n, p), n = n, p = p, class = "xptr_bed")

@jrglass6
Copy link
Author

Sorry but I'm having issues installing bigsnpr to get structure() to run. It's running into errors and not installing from CRAN or from GitHub.

I calculated the allele frequencies using VCFtools instead and have attached the file here.

Cmel_FR16.frq.txt

@privefl
Copy link
Member

privefl commented Mar 27, 2020

Sorry, not {bigsnpr}, but {pcadapt}.

@privefl
Copy link
Member

privefl commented Mar 27, 2020

So, you have variants with no variation indeed.
I don't understand why you get the error when using min.maf.

@jrglass6
Copy link
Author

So does pcadapt not work if there are alleles that are fixed?

I also don't understand why I am getting this error, even after filtering for min.maf both in VCFtools and now in pcadapt...

Is it possible that there are some individuals that have unique scaffolds, and for some reason VCFtools is not removing those scaffold (CHROM) names so it looks like there should be data associated with a scaffold but there isn't? I have been investigating this, but looking at the VCF files the scaffold numbers are reduced when I take out the individuals, so I don't think this is the issue.

@privefl
Copy link
Member

privefl commented Mar 27, 2020

Do you know how to use the debugger in RStudio?
You can use debugonce(pcadapt) and then run it, you will be able to run it line by line and to see the values of everything that is computed.

Otherwise, are you able to share your data?

I see now that your sample size if very small (n = 80), you should try increasing min.maf to 0.1.

@jrglass6
Copy link
Author

jrglass6 commented Mar 27, 2020

I tried increasing min.maf to 0.1 and still no luck :-(

I'm happy to share the .vcf file with you. Can you access it here via GoogleDrive?

Ran debugonce(pcadapt) and it gives the following output on this code:
Cmel.pcad3 <- pcadapt(Cmel.pcad2, K = 100, min.maf = 0.1, method = "mahalanobis")

function (input, K = 2, method = "mahalanobis", min.maf = 0.05,
ploidy = 2, LD.clumping = NULL, pca.only = FALSE, tol = 1e-04)
{
if (missing(input)) {
appDir <- system.file("shiny-examples/app-pcadapt",
package = "pcadapt")
if (appDir == "") {
stop("Could not find Shiny app in pcadapt.", call. = FALSE)
}
shiny::runApp(appDir, display.mode = "normal")
}
else {
if (any(grepl("^pcadapt_", class(input)))) {
UseMethod("pcadapt")
}
else {
stop("Remember to always use read.pcadapt() before using pcadapt().")
}
}
}

@privefl
Copy link
Member

privefl commented Mar 27, 2020

Are you using K = 100??
You have only 80 individuals, you can't get more than 80, and should probably not ask more than 20.

@jrglass6
Copy link
Author

jrglass6 commented Mar 27, 2020

Previously I had several hundred individuals so tried a first-run using a high value of K and didn't change it here.

Then I have been re-running the analysis using an appropriate value of K.

*Update. You are 100% right. I just re-ran with an appropriate value of K (which I thought I had done before but apparently hadn't -- I've been trying all different types of datasets!). Oh my gosh. It makes intuitive sense, I just didn't even think about the K value since I've always had >100 individuals and I assumed the error had to do with missing data. I am so sorry.

@privefl
Copy link
Member

privefl commented Mar 27, 2020

Ok, I'll look into it later.
Can you provide the bed/bim/fam instead of the vcf?

@privefl
Copy link
Member

privefl commented Mar 27, 2020

Good you found the problem.

It is probably my fault as I'm trying to catch an error to make it more explicit, but actually an other error happened and the actual error message should have been more explicit than "Can't compute SVD".

@privefl privefl closed this as completed Mar 27, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants