[WIP] Ann io #48

joshuak94 · 2022-04-04T13:57:02Z

This is a very first draft/WIP for the annotation IO. This will cover annotation file types (BED, bedGraph, wiggle, etc.).

At the moment I've only implemented a very basic BED format (three columns: chrom, chromStart, chromEnd) and the BED header.
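For readers unfamiliar with the three-column layout, here is a minimal standalone sketch of parsing one BED3 data line. It is not the reader implemented in this PR: `bed3_record`, `next_field`, `parse_int`, and `parse_bed3_line` are hypothetical names chosen for illustration, and the sketch assumes tab-separated fields and does not handle header lines (`browser`, `track`, `#`).

```cpp
#include <charconv>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>
#include <system_error>

// Hypothetical record type for illustration only; not the type used in this PR.
struct bed3_record
{
    std::string  chrom;
    std::int64_t chrom_start{}; // 0-based, inclusive
    std::int64_t chrom_end{};   // 0-based, exclusive
};

// Returns the next tab-separated field and advances `line` past it.
inline std::string_view next_field(std::string_view & line)
{
    auto const tab = line.find('\t');
    std::string_view const field = line.substr(0, tab);
    line = (tab == std::string_view::npos) ? std::string_view{} : line.substr(tab + 1);
    return field;
}

// Parses a whole field as an integer; rejects empty fields and trailing garbage.
inline std::optional<std::int64_t> parse_int(std::string_view s)
{
    if (s.empty())
        return std::nullopt;

    std::int64_t value{};
    auto const [ptr, ec] = std::from_chars(s.data(), s.data() + s.size(), value);
    if (ec != std::errc{} || ptr != s.data() + s.size())
        return std::nullopt;
    return value;
}

// Parses the first three columns of a BED data line; any further columns are ignored.
inline std::optional<bed3_record> parse_bed3_line(std::string_view line)
{
    std::string_view const chrom = next_field(line);
    auto const start = parse_int(next_field(line));
    auto const end   = parse_int(next_field(line));

    if (chrom.empty() || !start || !end || *start < 0 || *end < *start)
        return std::nullopt;

    return bed3_record{std::string{chrom}, *start, *end};
}
```

Calling `parse_bed3_line("chr1\t100\t200")` yields a record with `chrom == "chr1"`, `chrom_start == 100`, and `chrom_end == 200`; a line with a missing or non-numeric coordinate yields `std::nullopt`.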

TODO

Extend BED3 format to full BED format w/ optional fields.
Implement the writer.
Add support for, at the very least, bigBed (binary, indexed BED), wig, and bigWig files.
Allow reading BAM files as annotation files??

h-2

Thanks for working on this first of all! The structure seems fine to me!

Regarding the name: I still think that "ann_io" sounds like the name of a secret agent 🕵🏻‍♀️ but I also don't have a great replacement, so we can stick with this for now. We can also discuss with @smehringer to see what she thinks.

Note that I have changed a small thing about reader_base in #47, so the inheritance works a little differently now.

h-2 · 2022-04-05T10:42:34Z

include/bio/record.hpp

+    itemRgb,     //!< An RGB value to determine the color of the displayed track in the browser.
+    blockCount,  //!< The number of blocks (exons) in the BED file.
+    blockSizes,  //!< A list of the block sizes, corresponding to blockCount.
+    blockStarts, //!< A list of block starts, relative to offset.


h-2 · Apr 5, 2022

I know these use camelCase in the specification, but it looks very strange to have that mixed with the other formatting styles in this library. Can we change these to snake_case, or do you think that will confuse users?

joshuak94 · Apr 7, 2022

I could change it! I just did it this way because I wasn't sure whether I should be consistent with the spec or with our code, haha.

h-2 · Apr 8, 2022

Yeah, I think we should stick with the code style for now. I would love to have a table like this in the documentation at some point:

bio::field::	bio::fasta	bio::fastq	bio::vcf	bio::bcf	bio::sam	bio::bed
::id == ::qname	description line	description line	ID	ID	QNAME	name
::seq	sequence data	sequence data	–	–	SEQ	–
::chrom == ::rname	–	–	CHROM	CHROM	RNAME	chrom
::qual == ::mapq	–	quality data	QUAL	QUAL	MAPQ	–
::pos == ::chrom_start	–	–	POS	POS	POS	chromStart
...

So, instead of having individual documentation for all the fields, one big table with the format-specific terminology would be more helpful I think.
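One way the `==` rows of such a table could map onto code is an enum in which the format-specific names are aliases of the generic ones. The following is only a sketch under that assumption: the `sketch` namespace, the enum layout, and the `bed_term` helper are hypothetical and do not reflect the actual definition of `bio::field`.

```cpp
#include <string_view>

namespace sketch // hypothetical namespace; not part of the library
{

// Generic field identifiers plus format-specific aliases, following the table above.
enum class field
{
    id,
    seq,
    chrom,
    qual,
    pos,
    // Aliases share the value of the generic enumerator they correspond to.
    qname       = id,
    rname       = chrom,
    mapq        = qual,
    chrom_start = pos,
};

// Returns the BED terminology for a field, or an empty view where the table lists none.
constexpr std::string_view bed_term(field f)
{
    switch (f)
    {
        case field::id:    return "name";
        case field::chrom: return "chrom";
        case field::pos:   return "chromStart";
        default:           return "";
    }
}

// The aliases compare equal to their generic counterparts.
static_assert(field::id == field::qname);
static_assert(field::pos == field::chrom_start);

} // namespace sketch
```

Because the aliases are ordinary enumerators with the same value, `sketch::bed_term(sketch::field::chrom_start)` and `sketch::bed_term(sketch::field::pos)` give the same result, so a documentation table and the code can use whichever name suits the format.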

smehringer · 2022-04-16T08:51:46Z

For the naming I opened a discussion thread: #51

h-2 reviewed Apr 5, 2022

joshuak94 marked this pull request as draft April 7, 2022 07:16

joshuak94 force-pushed the ann_io branch 2 times, most recently from 9331634 to a9c4b3e · July 4, 2022 14:15

joshuak94 added 3 commits November 25, 2022 11:35

Add minimal (3 field) bed reader functionality.

ed9c1db

Add basic header functionality.

9a71943

Add basic writer functionality and testing.

77041f8

joshuak94 force-pushed the ann_io branch from a9c4b3e to 77041f8 · November 25, 2022 10:35

joshuak94 closed this Jan 26, 2023

joshuak94 deleted the ann_io branch January 26, 2023 15:11
