tools that expect unaligned reads shouldn't validate the sequence dictionary #4131

lbergelson · 2018-01-11T20:00:10Z

Tools that take an unaligned bam shouldn't expect that the bam has contigs that match the reference.

These include BwaAndMarkDuplicatesSpark, BwaSpark, and ReadsPipelineSpark (when running in alignment mode).


nyl2002 · 2018-01-16T02:43:10Z

Are the errors below part of this, when starting BwaSpark with spark-submit?
I activated "--disable-sequence-dictionary-validation true", but that doesn't help.

It is very unclear why a BAM file is not recognized as a BAM file. I have tried all kinds of ways to make sure that it is a BAM and not a SAM file.
The documentation for BwaSpark also says "BAM/SAM/CRAM file containing reads", so if SAM files are really not possible, that should probably be changed.
...
Even at verbosity DEBUG, the log messages are not at all helpful for getting at the problem.
E.g. "Cannot retrieve file pointers within SAM text files."
Is that a general statement about SAM files? Or does it only say that in this specific SAM file (which is actually a BAM file), file pointers cannot be found?
What pointers are meant exactly?
How could this be fixed?

"SamReaderFactory	Unable to detect file format from input URL or stream, assuming SAM format."
Which URL?
Which stream?
Why would this happen? What could be the error?
The SAM/BAM distinction seems very unclear. It would be more helpful if some specific missing aspect (e.g. not queryname sorted) were clearly declared as the culprit.
...
00:29 DEBUG: [kryo] Write: SAMFileHeader{VN=1.5, SO=queryname}
...
WARNING	2018-01-16 02:11:25	SamReaderFactory	Unable to detect file format from input URL or stream, assuming SAM format.
...
java.lang.UnsupportedOperationException: Cannot retrieve file pointers within SAM text files.
	at htsjdk.samtools.SAMTextReader.getFilePointerSpanningReads(SAMTextReader.java:185)
...

lbergelson · 2018-01-17T16:44:20Z

@nyl2002 What's your command line that's hitting problems? Are you trying to run BWA-MEM spark on a SAM file or on a BAM file?

I agree that we should change documentation and produce a better error message if it's failing on SAM files.
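In the meantime, one way to check independently whether a local file really is a BAM: per the SAM/BAM specification, a BAM file is BGZF-compressed (gzip-compatible), and its decompressed stream starts with the four magic bytes `BAM\1`. The following is a minimal sketch of such a check. The class and method names (`BamMagicCheck`, `looksLikeBam`) are made up for illustration, and this is not necessarily how htsjdk's `SamReaderFactory` performs its own detection.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.ZipException;

// Hypothetical helper, not part of GATK or htsjdk.
public class BamMagicCheck {

    // Magic bytes at the start of the decompressed BAM stream: 'B', 'A', 'M', 0x01.
    private static final byte[] BAM_MAGIC = {'B', 'A', 'M', 1};

    /**
     * Returns true if the file is gzip-compatible and its decompressed
     * content begins with the BAM magic bytes. Only the first few bytes
     * are read, so this says nothing about the rest of the file.
     */
    public static boolean looksLikeBam(Path file) throws IOException {
        try (InputStream raw = Files.newInputStream(file);
             DataInputStream in = new DataInputStream(new GZIPInputStream(raw))) {
            byte[] head = new byte[BAM_MAGIC.length];
            in.readFully(head); // throws EOFException if fewer than 4 bytes decompress
            return Arrays.equals(head, BAM_MAGIC);
        } catch (ZipException | EOFException e) {
            // Not gzip at all (e.g. a plain-text SAM file) or too short to be a BAM.
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            Path file = Paths.get(arg);
            System.out.println(file + ": "
                    + (looksLikeBam(file) ? "BAM magic found" : "no BAM magic"));
        }
    }
}
```

This only works on a file reachable through the local filesystem. If the check passes locally but the Spark job still falls back to SAM, the problem is more likely in how the input path or URL is being opened than in the file itself.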

* Previously, tools that align reads required you to manually disable sequence dictionary validation. If you didn't, they would fail because the unaligned bam didn't have the required sequence dictionary.
* Extracted a SequenceDictionaryValidationArgumentCollection and provided a method for GATKSparkTools to configure it. ReadsPipeline couldn't easily make use of this, so instead it overrides the method that does validation.
* BwaSpark / BwaAndMarkDuplicatesPipelineSpark now do not require or allow dictionary validation.
* Fixes #4131
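The design described in this commit message can be sketched roughly as follows. This is an illustration only, not the actual GATK code: `SequenceDictionaryValidationArgumentCollection` is the only name taken from the message, while every other class and method name here (`SparkToolSketch`, `getValidationArgs`, `shouldValidate`, and so on) is hypothetical. The alignment-mode behaviour of the pipeline class is likewise an assumption based on the issue description.

```java
// Hypothetical sketch of the pattern; names other than
// SequenceDictionaryValidationArgumentCollection are invented.
public class DictionaryValidationSketch {

    /** Decides whether a tool validates sequence dictionaries. */
    public interface SequenceDictionaryValidationArgumentCollection {
        boolean performSequenceDictionaryValidation();
    }

    /** Default: validation is on unless the user disables it. */
    public static final class StandardValidation
            implements SequenceDictionaryValidationArgumentCollection {
        // In a real tool this field would be bound to a command-line flag.
        public boolean disableValidation = false;

        @Override
        public boolean performSequenceDictionaryValidation() {
            return !disableValidation;
        }
    }

    /** For aligners: never validates and exposes no flag to turn it on. */
    public static final class NoValidation
            implements SequenceDictionaryValidationArgumentCollection {
        @Override
        public boolean performSequenceDictionaryValidation() {
            return false;
        }
    }

    /** Stand-in for a Spark tool base class. */
    public abstract static class SparkToolSketch {
        // Hook that subclasses override to choose a collection.
        protected SequenceDictionaryValidationArgumentCollection getValidationArgs() {
            return new StandardValidation();
        }

        // The method that decides whether validation runs.
        public boolean shouldValidate() {
            return getValidationArgs().performSequenceDictionaryValidation();
        }
    }

    /** Ordinary tool operating on aligned reads: keeps the default. */
    public static final class AlignedReadsToolSketch extends SparkToolSketch { }

    /** Aligner-style tool: swaps in the collection that never validates. */
    public static final class AlignerToolSketch extends SparkToolSketch {
        @Override
        protected SequenceDictionaryValidationArgumentCollection getValidationArgs() {
            return new NoValidation();
        }
    }

    /** Pipeline-style tool: overrides the validation method directly. */
    public static final class PipelineToolSketch extends SparkToolSketch {
        private final boolean align;

        public PipelineToolSketch(boolean align) {
            this.align = align;
        }

        @Override
        public boolean shouldValidate() {
            // Assumption: skip validation only when the input is being aligned.
            return !align && super.shouldValidate();
        }
    }

    public static void main(String[] args) {
        System.out.println("aligned-reads tool validates: "
                + new AlignedReadsToolSketch().shouldValidate());
        System.out.println("aligner validates: "
                + new AlignerToolSketch().shouldValidate());
        System.out.println("pipeline (align) validates: "
                + new PipelineToolSketch(true).shouldValidate());
    }
}
```

The point of the separate collection is that an aligner never offers the user a validation flag at all, so there is nothing to forget to disable.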

lbergelson mentioned this issue Jan 11, 2018

Error in executing the BwaAndMarkDuplicatesPipelineSpark #4112

Closed

lbergelson added bug Spark labels Jan 11, 2018

lbergelson mentioned this issue Jan 30, 2018

prevent sequence dictionary validation when aligning reads #4308

Merged

lbergelson closed this as completed in #4308 May 21, 2018
