MarkDuplicates fails if library name is not set #934

Closed
fnothaft opened this Issue Feb 9, 2016 · 0 comments

fnothaft commented Feb 9, 2016

Reported by @jpdna. See https://github.com/bigdatagenomics/adam/blob/master/adam-core/src/main/scala/org/bdgenomics/adam/rdd/read/MarkDuplicates.scala#L73. If the library name is not set for a read group, we call .get on a None, which throws an exception. The "correct" behavior in this case is to treat the library ID as null. This isn't ideal, but it is equivalent to our MarkDuplicates implementation pre-#906, and it works correctly in the degenerate case where all reads are sequenced from a single library.

@fnothaft fnothaft added this to the 0.19.0 milestone Feb 9, 2016

fnothaft added a commit to fnothaft/adam that referenced this issue Feb 9, 2016

[ADAM-934] Properly handle unset library name during duplicate marking
Resolves #934. If the library name for a read group is not set, we will use a
null string during the groupBy. This is equivalent to our pre-#906
implementation. Additionally, this commit adds logging that prints a warning
message for the user if there are read groups whose library ID is not set.
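
For illustration, the behavior described in this commit message could be sketched roughly as follows. This is not the code from the commit: ReadGroup, Read, LibraryGrouping, and its methods are hypothetical stand-ins rather than ADAM's classes, and the warning is printed to stderr instead of going through ADAM's logging.

```scala
// Hypothetical stand-ins for illustration only; these are not ADAM's classes.
case class ReadGroup(name: String, library: Option[String])
case class Read(readName: String, readGroupName: String)

object LibraryGrouping {
  // Library ID for a read, or null when the read group (or its library) is unset.
  // Option.orNull avoids the exception that calling .get on a None would throw.
  def libraryOf(read: Read, groups: Map[String, ReadGroup]): String =
    groups.get(read.readGroupName).flatMap(_.library).orNull

  // Names of read groups with no library set, so the caller can warn about them.
  def groupsMissingLibrary(groups: Map[String, ReadGroup]): Seq[String] =
    groups.values.filter(_.library.isEmpty).map(_.name).toSeq.sorted

  // Groups reads by library ID; reads without a library all share the null key.
  def groupByLibrary(reads: Seq[Read],
                     groups: Map[String, ReadGroup]): Map[String, Seq[Read]] = {
    val missing = groupsMissingLibrary(groups)
    if (missing.nonEmpty) {
      Console.err.println(
        s"Warning: library ID not set for read groups: ${missing.mkString(", ")}")
    }
    reads.groupBy(libraryOf(_, groups))
  }
}
```

In this sketch, every read whose library is unset lands under the same null key, which matches the single-library degenerate case mentioned in the issue.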

@heuermh heuermh closed this in 883ac4d Feb 13, 2016