Execute HTSJDK In Non-Strict Mode #254

Closed
DarioS opened this issue Sep 29, 2019 · 1 comment

@DarioS
DarioS commented Sep 29, 2019

Could an example of how to run HTSJDK in non-strict mode be added to the user guide? For TCGA whole genome sequencing, different centres used different versions of hg19 with different numbers of contigs. This causes an exception like:

INFO	2019-09-30 02:43:41	TwoBitBufferedReferenceSequenceFile	Caching reference genome contig 22
[Mon Sep 30 02:44:13 AEST 2019] gridss.SoftClipsToSplitReads done. Elapsed time: 16.12 minutes.
Runtime.totalMemory()=4052221952
Exception in thread "main" java.lang.IllegalArgumentException: Reference index for 'NC_007605' not found in sequence dictionary.

In HTSJDK, it comes from


            if (NO_ALIGNMENT_REFERENCE_INDEX == referenceIndex) {
                if (strict) {
                    throw new IllegalArgumentException("Reference index for '" + referenceName + "' not found in sequence dictionary.");
                }

Indeed, the BAM file doesn't have an SQ entry for that contig (sample processed by Baylor College of Medicine), but the FASTA file (provided by Broad Institute) does have a record for it.
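
To confirm which contigs are affected, the two sets of names can be compared directly. The following is a minimal sketch and not part of GRIDSS or HTSJDK: `missing_from_header` is a hypothetical helper, `header.sam` is assumed to contain the BAM header as text (for example, the output of `samtools view -H`), and the reference index is assumed to be in the usual `.fai` layout with the contig name in the first column.

```shell
# Hypothetical helper: print contigs listed in a FASTA index (.fai)
# that have no matching @SQ SN: entry in a SAM/BAM text header.
# Usage: missing_from_header header.sam reference.fa.fai
missing_from_header() {
    header="$1"
    fai="$2"
    awk -F'\t' '
        # First file: collect every SN: value found on @SQ lines.
        FILENAME == ARGV[1] {
            if ($1 == "@SQ") {
                for (i = 2; i <= NF; i++) {
                    if (substr($i, 1, 3) == "SN:") seen[substr($i, 4)] = 1
                }
            }
            next
        }
        # Second file: report index entries whose name was never seen.
        !($1 in seen) { print $1 }
    ' "$header" "$fai"
}
```

Run as `missing_from_header header.sam hg19.fa.fai` (file names are placeholders). For the files described here, the output would be expected to include NC_007605.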

I don't care much about chromosomes other than 1, 2, 3, …, X, Y, MT, so I hope that when I run GRIDSS, I can force HTSJDK to ignore these discrepancies by specifying a simple option.

@d-cameron
Member

You'll need to add VALIDATION_STRINGENCY=LENIENT or VALIDATION_STRINGENCY=SILENT to the java calls in the driver script.

A --picardoptions pass-through option has been added to the driver script, e.g.:

 ../scripts/gridss.sh --picardoptions "COMPRESSION_LEVEL=0 VALIDATION_STRINGENCY=LENIENT"  -t 4 -b wgEncodeDacMapabilityConsensusExcludable.bed -r ../hg19.fa -w out -o out/gridss.full.chr12.1527326.DEL1024.vcf -a out/gridss.full.chr12.1527326.DEL1024.assembly.bam -j ../target/gridss-2.6.2-gridss-jar-with-dependencies.jar --jvmheap 8g chr12.1527326.DEL1024.bam
