Adam2Fastq should output reverse complement when 0x10 flag is set for read #980

Closed
jpdna opened this Issue Mar 25, 2016 · 3 comments

Comments

Projects
None yet
3 participants
@jpdna
Member

jpdna commented Mar 25, 2016

The expected output of bam2fastq conversion is to reverse complement the output sequence when the alignment record was marked as reverse complemented with the 0x10 flag.in the SAM/BAM file.

Currently this does not happen, as the same seq string as is found in the BAM file is output in fastq regardless of 0x10 flag. Qualities need to be reversed as well.

Samtools follows this standard and recent discussion with Shop from Simons project brought this issue up as a common mistake in bam2fastq utilities.

@jpdna jpdna self-assigned this Mar 25, 2016

@arahuja

This comment has been minimized.

Show comment
Hide comment
@arahuja

arahuja Mar 25, 2016

Contributor

@jpdna Looks like the seq and quality scores are reverse complemented/reversed under some conditions - are these not correct?

https://github.com/bigdatagenomics/adam/blob/master/adam-core/src/main/scala/org/bdgenomics/adam/converters/AlignmentRecordConverter.scala#L78-L85

I did some comparisons with adam and picard's bam2fastq to see the same results, I think one of these tests is checked in

Contributor

arahuja commented Mar 25, 2016

@jpdna Looks like the seq and quality scores are reverse complemented/reversed under some conditions - are these not correct?

https://github.com/bigdatagenomics/adam/blob/master/adam-core/src/main/scala/org/bdgenomics/adam/converters/AlignmentRecordConverter.scala#L78-L85

I did some comparisons with adam and picard's bam2fastq to see the same results, I think one of these tests is checked in

@jpdna

This comment has been minimized.

Show comment
Hide comment
@jpdna

jpdna Mar 26, 2016

Member

@arahuja, thanks - you are correct that in most cases the rev comp is happening as intended, but as described below I think the case of reads with flags set as both reverse and unmapped is currently handled incorrectly in the code and tests.

It happens that the first case I looked at was one where the reverse 0x10 flag was set and also the read was unmapped 0x4. Both samtools and picard currently reverse complement the fastq output in this case. The BAM spec says:

• Bit 0x10 indicates whether SEQ has been reverse complemented and QUAL reversed. 
When bit 0x4 is unset, this corresponds to the strand to which the segment has been 
mapped.  When 0x4 is set,this indicates whether the unmapped read is stored in its
original orientation as it came off the sequencing machine

I interpret the above to to mean that seeing flags 0x10 && 0x4 indicates that when creating the BAM file the aligner chose to represent the SEQ string as reverse complemented despite the fact the read was unmapped, and thus to produce a correct pre-alignment fastq dump true to the original, we need to undo that by reverse complementing again when we make fastq dump.

So, I think that samtools and Picard are correct and the current behavior in ADAM should be changed to match, and the fastq currently generated by ADAM is actually incorrect in these cases and not suitable to be used as input to a new alignment because they will appear to map to the wrong strand.
PR on the way. #982

Member

jpdna commented Mar 26, 2016

@arahuja, thanks - you are correct that in most cases the rev comp is happening as intended, but as described below I think the case of reads with flags set as both reverse and unmapped is currently handled incorrectly in the code and tests.

It happens that the first case I looked at was one where the reverse 0x10 flag was set and also the read was unmapped 0x4. Both samtools and picard currently reverse complement the fastq output in this case. The BAM spec says:

• Bit 0x10 indicates whether SEQ has been reverse complemented and QUAL reversed. 
When bit 0x4 is unset, this corresponds to the strand to which the segment has been 
mapped.  When 0x4 is set,this indicates whether the unmapped read is stored in its
original orientation as it came off the sequencing machine

I interpret the above to to mean that seeing flags 0x10 && 0x4 indicates that when creating the BAM file the aligner chose to represent the SEQ string as reverse complemented despite the fact the read was unmapped, and thus to produce a correct pre-alignment fastq dump true to the original, we need to undo that by reverse complementing again when we make fastq dump.

So, I think that samtools and Picard are correct and the current behavior in ADAM should be changed to match, and the fastq currently generated by ADAM is actually incorrect in these cases and not suitable to be used as input to a new alignment because they will appear to map to the wrong strand.
PR on the way. #982

@fnothaft

This comment has been minimized.

Show comment
Hide comment
@fnothaft

fnothaft Mar 29, 2016

Member

I interpret the above to to mean that seeing flags 0x10 && 0x4 indicates that when creating the BAM file the aligner chose to represent the SEQ string as reverse complemented despite the fact the read was unmapped, and thus to produce a correct pre-alignment fastq dump true to the original, we need to undo that by reverse complementing again when we make fastq dump.

This sounds like a reasonable read to me. If I had my druthers, SAM would strictly not allow certain fields (e.g., 0x4) to be set unless the read was mapped, but alas, such is not the case.

Member

fnothaft commented Mar 29, 2016

I interpret the above to to mean that seeing flags 0x10 && 0x4 indicates that when creating the BAM file the aligner chose to represent the SEQ string as reverse complemented despite the fact the read was unmapped, and thus to produce a correct pre-alignment fastq dump true to the original, we need to undo that by reverse complementing again when we make fastq dump.

This sounds like a reasonable read to me. If I had my druthers, SAM would strictly not allow certain fields (e.g., 0x4) to be set unless the read was mapped, but alas, such is not the case.

@fnothaft fnothaft closed this in b8e36b2 Mar 29, 2016

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment