Encoding tag types in the ADAMRecord attributes, adding the 'tags' command #99
All automated tests passed.
 * tag.
 */
object PrintTags extends AdamCommandCompanion {
  val commandName: String = "tags"
This should be print_tags, no?
Thanks @tdanford! Overall, looks good; I've dropped my comments inline.
Guys, let me know if you think I didn't address any of your comments/questions in the latest commit. Otherwise, tell me if you want to merge and I can rebase this down to one commit first.
One or more automated tests failed.
All automated tests passed.
@tdanford, this looks great to me now. If you can squash down, I will merge.
…mmand. This commit fixes issue 92 (bigdatagenomics#92). The old style of encoding the "optional fields" from the SAM/BAM was to store them as key=value pairs in the ADAMRecord.attributes string. However, this loses information about the _type_ of the tag/value, which is necessary if we want to reconstruct the original value type (for example, for re-exporting BAM files from ADAM files). This update is non-backwards-compatible, changing the format of the attributes field to tag:type:value and introducing a new Attribute class for parsing and handling these values. It also adds functions to AdamRDDFunctions to allow for filtering and subsetting of reads based on their tags, or to count the number of distinct tags or tag-values across a set of reads.
Thanks @fnothaft, should be good to go.
Thanks @tdanford!
Thank you, Frank!
This commit fixes issue #92.
The old style of encoding the "optional fields" from the SAM/BAM was to store
them as key=value pairs in the ADAMRecord.attributes string. However, this
loses information about the type of the tag/value, which is necessary if
we want to reconstruct the original value type (for example, for re-exporting
BAM files from ADAM files).
This update is non-backwards-compatible, changing the format of the attributes
field to tag:type:value and introducing a new Attribute class for parsing and
handling these values. It also adds functions to AdamRDDFunctions to allow for
filtering and subsetting of reads based on their tags, or to count the number of
distinct tags or tag-values across a set of reads.
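
To make the tag:type:value format concrete, here is a minimal sketch of how such a string could be parsed and how tags could be counted across reads. It is not the Attribute class or the AdamRDDFunctions code from this PR: `TagAttribute`, `parse`, `parseAll`, and `countTags` are hypothetical names, the tab separator between triples is an assumption, and plain Scala collections stand in for an RDD. Only the `i`, `f`, and `A` type codes are converted; every other type is kept as a string.

```scala
// Hypothetical sketch: names and behavior are assumptions, not ADAM's actual Attribute API.
final case class TagAttribute(tag: String, tagType: Char, value: Any)

object TagAttribute {
  // Parse one "tag:type:value" triple, e.g. "NM:i:3".
  // The limit of 3 keeps any further colons inside the value.
  def parse(encoded: String): TagAttribute =
    encoded.split(":", 3) match {
      case Array(tag, tpe, raw) if tpe.length == 1 =>
        val typed: Any = tpe.head match {
          case 'i'                    => raw.toInt
          case 'f'                    => raw.toFloat
          case 'A' if raw.length == 1 => raw.head
          case _                      => raw // 'Z', 'H', 'B', etc. stay as strings in this sketch
        }
        TagAttribute(tag, tpe.head, typed)
      case _ =>
        throw new IllegalArgumentException(s"Not in tag:type:value form: $encoded")
    }

  // Assumption: triples within one attributes string are tab-separated.
  def parseAll(attributes: String): Seq[TagAttribute] =
    attributes.split("\t").filter(_.nonEmpty).map(parse).toSeq

  // Count how many times each tag occurs across a set of attribute strings.
  def countTags(records: Seq[String]): Map[String, Int] =
    records.flatMap(parseAll).groupBy(_.tag).map { case (t, as) => t -> as.size }
}
```

Because the type code travels with each value, a re-export step can tell an integer `NM:i:3` apart from a string `NM:Z:3`, which the old key=value encoding could not do.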