[WIP] Ann io #48
Thanks for working on this first of all! The structure seems fine to me!
Regarding the name: I still think that "ann_io" sounds like the name of a secret agent 🕵🏻‍♀️, but I also don't have a great replacement, so we can stick with this for now. We can also discuss it with @smehringer to see what she thinks.
Note that I have changed a small thing about reader_base in #47, so the inheritance works a little differently now.
itemRgb, //!< An RGB value to determine the color of the displayed track in the browser.
blockCount, //!< The number of blocks (exons) in the BED file.
blockSizes, //!< A list of the block sizes, corresponding to blockCount.
blockStarts, //!< A list of block starts, relative to offset.
I know these use camelCase in the specification, but it looks very strange to have that mixed with the other formatting styles in this library.
Can we change this to snake_case, or do you think that will confuse users?
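To make the proposed renaming concrete, here is a minimal sketch of how the spec's camelCase names would map to snake_case. The helper `to_snake_case` is hypothetical and only for illustration; it is not part of this library or of the PR.

```cpp
#include <cctype>
#include <string>
#include <string_view>

// Hypothetical helper (not part of the library): converts a camelCase
// name from the BED specification into the snake_case style used in the code.
inline std::string to_snake_case(std::string_view camel)
{
    std::string out;
    for (char c : camel)
    {
        auto const uc = static_cast<unsigned char>(c);
        if (std::isupper(uc))
        {
            // Start a new word at every upper-case letter (except at the front).
            if (!out.empty())
                out.push_back('_');
            out.push_back(static_cast<char>(std::tolower(uc)));
        }
        else
        {
            out.push_back(c);
        }
    }
    return out;
}
```

Under this mapping, itemRgb, blockCount, blockSizes and blockStarts would become item_rgb, block_count, block_sizes and block_starts.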
I could change it! I just did it this way because I wasn't sure if I should be consistent with the specs or with our code haha.
Yeah, I think we should stick with the code style for now. I would love to have a table like this in the documentation at some point:
| bio::field:: | bio::fasta | bio::fastq | bio::vcf | bio::bcf | bio::sam | bio::bed |
|---|---|---|---|---|---|---|
| ::id == ::qname | description line | description line | ID | ID | QNAME | name |
| ::seq | sequence data | sequence data | – | – | SEQ | – |
| ::chrom == ::rname | – | – | CHROM | CHROM | RNAME | chrom |
| ::qual == ::mapq | – | quality data | QUAL | QUAL | MAPQ | – |
| ::pos == ::chrom_start | – | – | POS | POS | POS | chromStart |
...
So, instead of having individual documentation for all the fields, one big table with the format-specific terminology would be more helpful I think.
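One way the `==` rows of such a table could be expressed in code is with enumerator aliases. The following is only a sketch under that assumption: the namespace `sketch` and the enumerator values are hypothetical and do not reflect the library's actual `bio::field` definition.

```cpp
#include <cstdint>

// Hypothetical sketch; NOT the library's actual definition of bio::field.
namespace sketch
{

enum class field : std::uint8_t
{
    id,
    qname = id,        // alias: SAM calls the identifier QNAME
    seq,
    chrom,
    rname = chrom,     // alias: SAM calls the reference RNAME
    qual,
    mapq = qual,       // alias: SAM's MAPQ shares the slot of QUAL
    pos,
    chrom_start = pos  // alias: BED's chromStart, written in snake_case
};

// Aliases compare equal, so generic code can use either name.
static_assert(field::id == field::qname);
static_assert(field::chrom == field::rname);
static_assert(field::pos == field::chrom_start);

} // namespace sketch
```

With aliases like these, format-specific names stay available to users who know the specification, while each field still has a single underlying identifier.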
For the naming I opened a discussion thread: #51
This is a very first draft/WIP for the annotation IO. This will cover annotation file types (BED, bedGraph, wiggle, etc.).
At the moment I've only implemented a very basic BED format (three columns, chrom, chromStart, chromEnd) and the BED header.
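For readers unfamiliar with the format, the following is a rough, self-contained sketch of what reading a three-column BED line involves. It is not the code from this PR: `bed3_record`, `is_bed_header_line` and `parse_bed3_line` are hypothetical names, and the sketch assumes tab-separated columns and treats `browser`, `track` and `#` lines as header lines.

```cpp
#include <charconv>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>
#include <system_error>

// Hypothetical record type for the three mandatory BED columns.
struct bed3_record
{
    std::string   chrom;
    std::uint64_t chrom_start{};
    std::uint64_t chrom_end{};
};

// Assumption: header lines are empty lines, '#' comments, and lines
// starting with "browser" or "track".
inline bool is_bed_header_line(std::string_view line)
{
    auto has_prefix = [&](std::string_view p) { return line.compare(0, p.size(), p) == 0; };
    return line.empty() || line.front() == '#' || has_prefix("browser") || has_prefix("track");
}

// Parses an unsigned integer; fails on empty input or trailing characters.
inline bool parse_uint(std::string_view text, std::uint64_t & value)
{
    if (text.empty())
        return false;
    auto const * last = text.data() + text.size();
    auto [ptr, ec] = std::from_chars(text.data(), last, value);
    return ec == std::errc{} && ptr == last;
}

// Returns a record for a data line, or std::nullopt for header or malformed lines.
inline std::optional<bed3_record> parse_bed3_line(std::string_view line)
{
    if (is_bed_header_line(line))
        return std::nullopt;

    std::string_view fields[3];
    for (auto & f : fields)
    {
        std::size_t const tab = line.find('\t');
        f    = line.substr(0, tab);
        line = (tab == std::string_view::npos) ? std::string_view{} : line.substr(tab + 1);
    }

    bed3_record rec;
    rec.chrom = std::string{fields[0]};
    if (rec.chrom.empty() || !parse_uint(fields[1], rec.chrom_start) || !parse_uint(fields[2], rec.chrom_end))
        return std::nullopt;

    return rec;
}
```

Any columns after the third are ignored here, which matches the current three-column scope described above.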
TODO