Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Encoding tag types in the ADAMRecord attributes, adding the 'tags' command #99

Merged
merged 1 commit into from Feb 14, 2014

Conversation

tdanford
Copy link
Contributor

This commit fixes issue #92

The old style of encoding the "optional fields" from the SAM/BAM was to store
them as key=value pairs in the ADAMRecord.attributes string. However, this
loses information about the type of the tag/value, which is necessary if
we want to reconstruct the original value type (for example, for re-exporting
BAM files from ADAM files).

This update is non-backwards-compatible, changing the format of the attributes
field to tag:type:value and introducing a new Attribute class for parsing and
handling these values. It also adds functions to AdamRDDFunctions to allow for
filtering and subsetting of reads based on their tags, or to count the number of
distinct tags or tag-values across a set of reads.

@AmplabJenkins
Copy link

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/ADAM-prb/111/

* tag.
*/
object PrintTags extends AdamCommandCompanion {
val commandName: String = "tags"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be print_tags, no?

@fnothaft
Copy link
Member

Thanks @tdanford! Overall, looks good; I've dropped my comments inline.

@massie
Copy link
Member

massie commented Feb 13, 2014

I just reviewed this but didn't have any questions that @fnothaft didn't already ask. Looks really good to me, @tdanford .

@tdanford
Copy link
Contributor Author

Thanks @massie and @fnothaft -- I'm struggling to write a test for this plugin PR at the moment, but when I get it working I'll come back and try to clean up these issues.

@tdanford
Copy link
Contributor Author

Guys, let me know if you think I didn't address any of your comments/questions in the latest commit -- otherwise, tell me if you want to merge and I can rebase this down to one commit first.

@AmplabJenkins
Copy link

One or more automated tests failed
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/ADAM-prb/116/

@AmplabJenkins
Copy link

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/ADAM-prb/118/

@fnothaft
Copy link
Member

@tdanford , this looks great to me now. If you can squash down, I will merge.

…mmand.

This commit fixes issue 92 (bigdatagenomics#92).

The old style of encoding the "optional fields" from the SAM/BAM was to store
them as key=value pairs in the ADAMRecord.attributes string. However, this
loses information about the _type_ of the tag/value, which is necessary if
we want to reconstruct the original value type (for example, for re-exporting
BAM files from ADAM files).

This update is non-backwards-compatible, changing the format of the attributes
field to tag:type:value and introducing a new Attribute class for parsing and
handling these values.  It also adds functions to AdamRDDFunctions to allow for
filtering and subsetting of reads based on their tags, or to count the number of
distinct tags or tag-values across a set of reads.
@tdanford
Copy link
Contributor Author

Thanks @fnothaft, should be good to go.

fnothaft added a commit that referenced this pull request Feb 14, 2014
Encoding tag types in the ADAMRecord attributes, adding the 'tags' command
@fnothaft fnothaft merged commit b23728f into bigdatagenomics:master Feb 14, 2014
@fnothaft
Copy link
Member

Thanks @tdanford !

@tdanford
Copy link
Contributor Author

Thank you, Frank!

@tdanford tdanford deleted the attributeParser branch February 14, 2014 16:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants