Segmentation fault issue when calling all loci #491

Closed
elizeng opened this issue May 27, 2022 · 4 comments

elizeng commented May 27, 2022

I am running into segmentation faults that I do not usually see. This may be because I am trying to call all possible loci, although I have done that before without any issue. I am using the latest version (I also tried 0.935), running on a cluster node with 512 GB of memory, which should be sufficient for the job.

So if anyone has any input on this issue I would greatly appreciate hearing from you.

Thank you.

-> angsd version: 0.937-81-g88fd84d (htslib: 1.14-9-ge769401) build(May 27 2022 16:43:01)
        -> /u/yxeng/programs/sandbox/angsd/angsd -GL 1 -doDepth 1 -doCounts 1 -doGeno 2 -doPost 1 -doGlf 2 -doMaf 2 -doMajorMinor 1 --ignore-RG 0 -bam /u/yxeng/seabird/scripts/snpcalling/APT_tern_bamfilelist_5individuals.filelist -minMapQ 30 -minQ 30 -minInd 4 -setMinDepthInd 3 -geno_minDepth 3 -out /scratch/yxeng/seabird/species_analysis/APT/snps/APT_snpcalling/APT_tern/APT_tern_round3_gatk/APT_tern_round3_gatk -doPlink 2 -doBcf 1 -nThreads 24
        -> Inputtype is BAM/CRAM
[multiReader] 5 samples in 5 input files

[bcfoutput]     Please add the following parameters
                 '-gl 1 -dopost 1 -domajorminor 1 -domaf 1 -dobcf 1 --ignore-RG 0 -dogeno 1 -docounts 1'

        -> Inputtype is BAM/CRAM
        -> Parsing 5 number of samples
/var/spool/pbs/mom_priv/jobs/310100.kunanyi-ohpc.tpac.org.au.SC: line 16: 15683 Segmentation fault      /u/yxeng/programs/sandbox/angsd/angsd -GL 1 -doDepth 1 -doCounts 1 -doGeno 2 -doPost 1 -doGlf 2 -doMaf 2 -doMajorMinor 1 --ignore-RG 0 -bam /u/yxeng/seabird/scripts/snpcalling/APT_tern_bamfilelist_5individuals.filelist -minMapQ 30 -minQ 30 -minInd 4 -setMinDepthInd 3 -geno_minDepth 3 -out /scratch/yxeng/seabird/species_analysis/APT/snps/APT_snpcalling/APT_tern/APT_tern_round3_gatk/APT_tern_round3_gatk -doPlink 2 -doBcf 1 -nThreads 24
@TeresaPegan

In my experience ANGSD has segmentation faults for all kinds of problems that don't have anything to do with memory. The program is suggesting that you add a bunch of parameters to your command -- maybe try that and see if it fixes the problem?
I use ANGSD to get GLs for all loci in my genomes (~1GB in length) all the time, so I don't think it has anything to do with that. Recently, whenever I have gotten a segfault, it has been because I entered a command wrong. (For example, accidentally running "angsd index" instead of "angsd sites index" on a sites file I wanted to index resulted in a segfault.)
Good luck!
-Teresa
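
The suggestion above can be sketched as a dry run. This is only an illustration: the flag string is copied verbatim from the `[bcfoutput]` hint in the log (note that it asks for `-domaf 1` and `-dogeno 1`, where the original command used `2`), while `bams.filelist` and `out_prefix` are placeholder names that do not come from the thread. Whether these flags actually avoid the segfault is not established here.

```shell
# Placeholder inputs (hypothetical names, not the poster's real paths).
BAMLIST="bams.filelist"
OUT="out_prefix"

# Flags copied verbatim from the [bcfoutput] hint printed by ANGSD in the log.
BCF_FLAGS="-gl 1 -dopost 1 -domajorminor 1 -domaf 1 -dobcf 1 --ignore-RG 0 -dogeno 1 -docounts 1"

# Assemble the command and print it for inspection instead of running it.
CMD="angsd $BCF_FLAGS -bam $BAMLIST -out $OUT"
echo "$CMD"

# Once the printed command looks right, run it by uncommenting the next line:
# $CMD
```

The quality and depth filters from the original command (`-minMapQ`, `-minQ`, `-minInd`, and so on) can be appended to `CMD` unchanged.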

Owner

ANGSD commented Jun 26, 2022

Dear @elizeng, it is quite difficult to tell what is causing this.

I would do the following.

  1. Disable the threading part
  2. See if you can recreate the error.
  3. If you can recreate the error, then try to disable the bcf writing.

Let me know how this goes.

Best
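
The isolation steps above can be sketched as a dry run that derives each test command from the full one. This is a minimal illustration under stated assumptions: `bams.filelist` and `test_run` are placeholder names, the flag list is a shortened version of the one in the log, and dropping `-nThreads` is assumed here to be an acceptable way of disabling threading.

```shell
# Shortened version of the failing command, with placeholder file names.
FULL="angsd -GL 1 -doCounts 1 -doGeno 2 -doPost 1 -doMaf 2 -doMajorMinor 1 --ignore-RG 0 -bam bams.filelist -out test_run -doBcf 1 -nThreads 24"

# Step 1: the same command with the threading option removed.
NO_THREADS=$(printf '%s\n' "$FULL" | sed 's/ -nThreads [0-9][0-9]*//')

# Step 3: additionally remove the bcf writing.
NO_BCF=$(printf '%s\n' "$NO_THREADS" | sed 's/ -doBcf [0-9][0-9]*//')

# Print both commands; run each by hand and note which one still segfaults.
echo "step 1: $NO_THREADS"
echo "step 3: $NO_BCF"
```

Changing one option at a time keeps the comparison clean: if step 1 still crashes and step 3 does not, the bcf output is the likely trigger.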

Author

elizeng commented Jun 30, 2022

> Dear @elizeng, it is quite difficult to tell what is causing this.
>
> I would do the following.
>
>   1. Disable the threading part
>   2. See if you can recreate the error.
>   3. If you can recreate the error, then try to disable the bcf writing.
>
> Let me know how this goes.
>
> Best

I have tried your suggestion: after disabling only threading, the error still occurred.
I am now rerunning with bcf writing disabled, and it seems to work. However, the next step needs VCF output, which is why the bcf output was included in the first place, so I am not sure how this will work for me. I also have a plink output, but my understanding is that it carries much less information than a bcf/vcf output.

In the grand scheme of things, my main aim here is to get one of the following:

  1. SNPs and the non-variant regions
  2. All loci

This will be for running SMC++, which requires all loci, variant and non-variant, for the model. Is there a way for ANGSD to output the non-variant sites as a list, together with the regions they fall in?

Cheers,
Elize

Author

elizeng commented Jun 30, 2022

Update: The run with ANGSD seems to be going fine after removing the -dobcf flag, but it has not finished yet.

As for my underlying issue, I have found a solution: use bedtools complement to get the sites that are not called as variants. That way I can skip calling all sites.
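
The bedtools approach can be sketched roughly as follows. This is a toy illustration, not the poster's actual workflow: `genome.txt`, `variants.bed`, and `nonvariant.bed` are hypothetical file names, the coordinates are made up, and how the variant positions are converted to BED is left out. `bedtools complement` takes a sorted BED file (`-i`) and a tab-separated genome file of chromosome names and lengths (`-g`).

```shell
# Toy genome file: chromosome name and length, tab-separated.
printf 'chr1\t1000\n' > genome.txt

# Toy variant sites in BED format (0-based, half-open), sorted by position.
printf 'chr1\t99\t100\nchr1\t499\t500\n' > variants.bed

# Report every interval of the genome that is NOT covered by variants.bed.
if command -v bedtools >/dev/null 2>&1; then
    bedtools complement -i variants.bed -g genome.txt > nonvariant.bed
    cat nonvariant.bed
else
    echo "bedtools not found; skipping the complement step" >&2
fi
```

Note that the complement contains every position absent from the variant list, so it also includes sites that had no usable data, not only sites confirmed to be invariant.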
