[ADAM-946] Fixes to FlagStat for Samtools concordance issue #954

Closed
wants to merge 3 commits into
from

Conversation

Projects
None yet
5 participants
@jpdna
Member

jpdna commented Feb 21, 2016

These changes result in perfect concordance between ADAM and Samtools flagstat output stats, addressing corner cases around secondary, supplemental, and unmapped reads as described in #946

@AmplabJenkins

This comment has been minimized.

Show comment
Hide comment
@AmplabJenkins

AmplabJenkins Feb 21, 2016

Can one of the admins verify this patch?

Can one of the admins verify this patch?

@fnothaft

This comment has been minimized.

Show comment
Hide comment
@fnothaft

fnothaft Feb 21, 2016

Member

Jenkins, add to whitelist.

Member

fnothaft commented Feb 21, 2016

Jenkins, add to whitelist.

@fnothaft

This comment has been minimized.

Show comment
Hide comment
@fnothaft

fnothaft Feb 21, 2016

Member

Stellar! Thanks for sorting this out, @jpdna! Interesting to see that it is the "unmapped, yet have alignment info" reads that are causing the trouble. I've always been unsure as to what is the correct way to handle them.

Member

fnothaft commented Feb 21, 2016

Stellar! Thanks for sorting this out, @jpdna! Interesting to see that it is the "unmapped, yet have alignment info" reads that are causing the trouble. I've always been unsure as to what is the correct way to handle them.

@AmplabJenkins

This comment has been minimized.

Show comment
Hide comment
@AmplabJenkins

AmplabJenkins Feb 21, 2016

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1092/

Build result: FAILURE

GitHub pull request #954 of commit 6a497b9 automatically merged.Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'[EnvInject] - Loading node environment variables.Building remotely on amp-jenkins-worker-05 (centos spark-test) in workspace /home/jenkins/workspace/ADAM-prb > git rev-parse --is-inside-work-tree # timeout=10Fetching changes from the remote Git repository > git config remote.origin.url https://github.com/bigdatagenomics/adam.git # timeout=10Fetching upstream changes from https://github.com/bigdatagenomics/adam.git > git --version # timeout=10 > git fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/pull/:refs/remotes/origin/pr/ > git rev-parse origin/pr/954/merge^{commit} # timeout=10 > git branch -a --contains 6792655 # timeout=10 > git rev-parse remotes/origin/pr/954/merge^{commit} # timeout=10Checking out Revision 6792655 (origin/pr/954/merge) > git config core.sparsecheckout # timeout=10 > git checkout -f 6792655a74f389324d18134be2bca4b8a5f773bfFirst time build. Skipping changelog.Triggering ADAM-prb ? 2.6.0,2.10,1.4.1,centosTriggering ADAM-prb ? 2.6.0,2.11,1.4.1,centosTouchstone configurations resulted in FAILURE, so aborting...Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
Test FAILed.

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1092/

Build result: FAILURE

GitHub pull request #954 of commit 6a497b9 automatically merged.Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'[EnvInject] - Loading node environment variables.Building remotely on amp-jenkins-worker-05 (centos spark-test) in workspace /home/jenkins/workspace/ADAM-prb > git rev-parse --is-inside-work-tree # timeout=10Fetching changes from the remote Git repository > git config remote.origin.url https://github.com/bigdatagenomics/adam.git # timeout=10Fetching upstream changes from https://github.com/bigdatagenomics/adam.git > git --version # timeout=10 > git fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/pull/:refs/remotes/origin/pr/ > git rev-parse origin/pr/954/merge^{commit} # timeout=10 > git branch -a --contains 6792655 # timeout=10 > git rev-parse remotes/origin/pr/954/merge^{commit} # timeout=10Checking out Revision 6792655 (origin/pr/954/merge) > git config core.sparsecheckout # timeout=10 > git checkout -f 6792655a74f389324d18134be2bca4b8a5f773bfFirst time build. Skipping changelog.Triggering ADAM-prb ? 2.6.0,2.10,1.4.1,centosTriggering ADAM-prb ? 2.6.0,2.11,1.4.1,centosTouchstone configurations resulted in FAILURE, so aborting...Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
Test FAILed.

@jpdna

This comment has been minimized.

Show comment
Hide comment
@jpdna

jpdna Feb 21, 2016

Member

oops, I see now this doesn't pass the tests.
Among other things it seems: " Unaligned read can't be a primary alignment" while constructing DecadentRead " I'll investigate ways to make flagstat match with Samtools while still not violating these assumptions.

Member

jpdna commented Feb 21, 2016

oops, I see now this doesn't pass the tests.
Among other things it seems: " Unaligned read can't be a primary alignment" while constructing DecadentRead " I'll investigate ways to make flagstat match with Samtools while still not violating these assumptions.

@jpdna jpdna closed this Feb 21, 2016

@fnothaft

This comment has been minimized.

Show comment
Hide comment
@fnothaft

fnothaft Feb 21, 2016

Member

Among other things it seems: " Unaligned read can't be a primary alignment" while constructing DecadentRead

I'd need to read the source to be 100%, but I think DecadentRead is a bit over aggressive with input checking and would just remove that requirement.

Member

fnothaft commented Feb 21, 2016

Among other things it seems: " Unaligned read can't be a primary alignment" while constructing DecadentRead

I'd need to read the source to be 100%, but I think DecadentRead is a bit over aggressive with input checking and would just remove that requirement.

+ // the statistics output by other programs, specifically Samtools Flagstat
+
+ if (samRecord.getReadUnmappedFlag) {
+ builder.setReadMapped(false)

This comment has been minimized.

@massie

massie Feb 22, 2016

Member

This can be simplified to...builder.setReadMapped(!samRecord.getReadUnmappedFlag) .. without needing a conditional.

@massie

massie Feb 22, 2016

Member

This can be simplified to...builder.setReadMapped(!samRecord.getReadUnmappedFlag) .. without needing a conditional.

+ }
+
+ if (samRecord.getNotPrimaryAlignmentFlag) {
+ builder.setPrimaryAlignment(false)

This comment has been minimized.

@massie

massie Feb 22, 2016

Member

Same here... builder.setPrimaryAlignment(!samRecord.getNotPrimaryAlignmentFlag)

@massie

massie Feb 22, 2016

Member

Same here... builder.setPrimaryAlignment(!samRecord.getNotPrimaryAlignmentFlag)

+ }
+
+ if (samRecord.getSupplementaryAlignmentFlag) {
+ builder.setSupplementaryAlignment(true)

This comment has been minimized.

@massie

massie Feb 22, 2016

Member

And here...

@massie

massie Feb 22, 2016

Member

And here...

@jpdna jpdna reopened this Feb 24, 2016

@jpdna

This comment has been minimized.

Show comment
Hide comment
@jpdna

jpdna Feb 24, 2016

Owner

Based on the input data, this looked to me like it was a bug in the test that was covered up by fact that unmapped reads simply did not have this flag set. On the on the other hand, interpretation of ReadNegativeStrand seems dubious without mapping, though setting it in that case seems likely to be benign.

Based on the input data, this looked to me like it was a bug in the test that was covered up by fact that unmapped reads simply did not have this flag set. On the on the other hand, interpretation of ReadNegativeStrand seems dubious without mapping, though setting it in that case seems likely to be benign.

@@ -249,7 +249,7 @@ class AlignmentRecordConverterSuite extends FunSuite {
.split('\n')
assert(!secondRecord.getReadMapped)
- assert(!secondRecord.getReadNegativeStrand)

This comment has been minimized.

@jpdna

jpdna Feb 24, 2016

Member

Given the input data, this seems likely a bug in the test that was previously covered up by fact that for unmapped reads the code previously did not set this flag even if present in input.

@jpdna

jpdna Feb 24, 2016

Member

Given the input data, this seems likely a bug in the test that was previously covered up by fact that for unmapped reads the code previously did not set this flag even if present in input.

@AmplabJenkins

This comment has been minimized.

Show comment
Hide comment
@AmplabJenkins

AmplabJenkins Feb 24, 2016

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1094/
Test PASSed.

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1094/
Test PASSed.

@jpdna

This comment has been minimized.

Show comment
Hide comment
@jpdna

jpdna Feb 24, 2016

Member

I've pushed changes to address flag setting code simplification @massie suggests above and also modified tests in a way that the code now passes the tests. The tests modifications seem to remove a bug or unnecessary requirements, but should be reviewed by others to make sure no downstream assumptions are violated.

Member

jpdna commented Feb 24, 2016

I've pushed changes to address flag setting code simplification @massie suggests above and also modified tests in a way that the code now passes the tests. The tests modifications seem to remove a bug or unnecessary requirements, but should be reviewed by others to make sure no downstream assumptions are violated.

@@ -78,7 +78,7 @@ private[adam] object DecadentRead extends Logging with Serializable {
@deprecated("Use RichAlignmentRecord wherever possible in new development.", since = "0.18.0")
private[adam] class DecadentRead(val record: RichAlignmentRecord) extends Logging {
// Can't be a primary alignment unless it has been aligned
- require(!record.getPrimaryAlignment || record.getReadMapped, "Unaligned read can't be a primary alignment")
+ //require(!record.getPrimaryAlignment || record.getReadMapped, "Unaligned read can't be a primary alignment")

This comment has been minimized.

@massie

massie Feb 24, 2016

Member

You might consider logging a warning message or dropping this line altogether instead of commenting out the code.

@massie

massie Feb 24, 2016

Member

You might consider logging a warning message or dropping this line altogether instead of commenting out the code.

This comment has been minimized.

@jpdna

jpdna Feb 24, 2016

Member

You might consider logging a warning message or dropping this line altogether instead of commenting out the code.

Done. - removed the require line that had been commented-out

@jpdna

jpdna Feb 24, 2016

Member

You might consider logging a warning message or dropping this line altogether instead of commenting out the code.

Done. - removed the require line that had been commented-out

@AmplabJenkins

This comment has been minimized.

Show comment
Hide comment
@AmplabJenkins

AmplabJenkins Feb 24, 2016

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1095/
Test PASSed.

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1095/
Test PASSed.

+ // are unclear when the read is not mapped or reference is not defined,
+ // it is nonetheless favorable to set these flags in the ADAM file
+ // in same way as they appear in the input BAM inorder to match exactly
+ // the statistics output by other programs, specifically Samtools Flagstat

This comment has been minimized.

@heuermh

heuermh Feb 24, 2016

Member

I agree with this

@heuermh

heuermh Feb 24, 2016

Member

I agree with this

@heuermh

This comment has been minimized.

Show comment
Hide comment
@heuermh

heuermh Feb 24, 2016

Member

LGTM

Member

heuermh commented Feb 24, 2016

LGTM

@heuermh heuermh added this to the 0.19.0 milestone Feb 24, 2016

@heuermh heuermh changed the title from Fixes to FlagStat for Samtools concordance issue #946 to [ADAM-946] Fixes to FlagStat for Samtools concordance issue Feb 24, 2016

@heuermh

This comment has been minimized.

Show comment
Hide comment
@heuermh

heuermh Feb 24, 2016

Member

Fixes #946

Member

heuermh commented Feb 24, 2016

Fixes #946

@heuermh

This comment has been minimized.

Show comment
Hide comment
@heuermh

heuermh Feb 24, 2016

Member

Thank you, @jpdna. Merged manually.

Member

heuermh commented Feb 24, 2016

Thank you, @jpdna. Merged manually.

@heuermh heuermh closed this Feb 24, 2016

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment