improve Sam module parsing and printing #134

Open
7 of 19 tasks
agarwal opened this issue Jul 7, 2014 · 1 comment

Comments

@agarwal
Member

agarwal commented Jul 7, 2014

  • Define printers. Done in e3d9046.
  • Printer should ensure each alignment in the stream satisfies the header.
  • Header
    • Define type header. Done in a9d8fb0.
    • Convert list of header_items to a header. Done in a9d8fb0.
    • Sort PG items by their PP field to accurately reflect chain of programs used.
    • Redefine read's type to be: Reader.t -> (header * alignment Or_error.t Pipe.Reader.t) Or_error.t Deferred.t. Done in a9d8fb0.
  • RNAME and RNEXT: Ensure the given value is in the SQ dictionary if any SQ items are given. Done in bcfba26.
  • FLAG: Define a more structured type. Various fields are meaningless when another is or is not set. It might be clearer to extract this logic into a variant type. Actually, this could apply to the whole alignment type, e.g. if the read is a single fragment, then RNEXT, PNEXT, and certain FLAG bits are meaningless. But beware the risk of over-designing types.
  • CIGAR: Add more checks in parse_cigar.
    • H can only be present as the first and/or last operation.
    • S may only have H operations between them and the ends of the CIGAR string.
    • For mRNA-to-genome alignment, an N operation represents an intron. For other types of alignments, the interpretation of N is not defined.
    • Sum of lengths of the M/I/S/=/X operations shall equal the length of SEQ.
  • QUAL: If given, length must equal that of SEQ. Must not be given if SEQ not given. Done in bcfba26.
  • Optional Fields: Ensure unique TAG per alignment. Done in bcfba26.
  • Optional Fields: Improve type.
    • Define type representing defined TAGS. Currently, we're just loosely using string.
    • Use correct OCaml type for the various numeric types. Currently retaining most values as plain strings.
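
The CIGAR checks listed above could be sketched roughly as follows. This is only an illustration using the OCaml standard library: the `cigar_op` type, `check_cigar`, and the helper names are hypothetical and are not the definitions in the Sam module. It also assumes SEQ is present; when SEQ is `*`, the length check would presumably be skipped by the caller.

```ocaml
(* Hypothetical CIGAR operation type; not the actual Sam module type. *)
type cigar_op =
  | M of int | I of int | D of int | N of int | S of int
  | H of int | P of int | Eq of int | X of int

let is_h = function H _ -> true | _ -> false
let is_s = function S _ -> true | _ -> false

(* Remove one leading hard clip, if present. *)
let strip_leading_h = function
  | H _ :: rest -> rest
  | ops -> ops

(* Remove one leading soft clip, if present. *)
let strip_leading_s = function
  | S _ :: rest -> rest
  | ops -> ops

(* Number of SEQ bases accounted for by the CIGAR: M/I/S/=/X only. *)
let query_length ops =
  List.fold_left
    (fun acc op ->
      match op with
      | M n | I n | S n | Eq n | X n -> acc + n
      | D _ | N _ | H _ | P _ -> acc)
    0 ops

(* Check the clipping rules and the SEQ length rule.
   [seq_len] is the length of SEQ, assumed to be known. *)
let check_cigar ~seq_len ops =
  (* Peel an optional H and then an optional S off each end. *)
  let trim l = strip_leading_s (strip_leading_h l) in
  let core = List.rev (trim (List.rev (trim ops))) in
  if List.exists is_h core then
    Error "H may only be the first and/or last operation"
  else if List.exists is_s core then
    Error "S may only have H between it and the ends of the CIGAR"
  else if query_length ops <> seq_len then
    Error "sum of M/I/S/=/X lengths must equal length of SEQ"
  else Ok ()
```

For example, `check_cigar ~seq_len:10 [H 3; S 2; M 6; S 2; H 1]` passes all three checks, while `[M 4; H 2; M 4]` is rejected because of the interior hard clip.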
@agarwal
Member Author

agarwal commented Sep 2, 2014

For more int types, see the uint package.
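
As a rough illustration of the "Optional Fields: Improve type" item, here is a minimal sketch of a typed value for `TAG:TYPE:VALUE` fields, using only the standard library. The names `opt_value` and `parse_opt_field` are hypothetical, not the Sam module's API. Integers are held as `int64` simply as a placeholder wide enough for the 32-bit signed and unsigned ranges; the uint package's types are not used here, and `B` arrays are left out.

```ocaml
(* Hypothetical typed representation of an optional field value. *)
type opt_value =
  | Opt_char of char      (* type A *)
  | Opt_int of int64      (* type i *)
  | Opt_float of float    (* type f *)
  | Opt_string of string  (* type Z *)
  | Opt_hex of string     (* type H, kept as the raw hex string *)

(* Parse a single "TAG:TYPE:VALUE" field into a tag and a typed value. *)
let parse_opt_field s =
  match String.split_on_char ':' s with
  | tag :: typ :: (_ :: _ as rest) when String.length tag = 2 ->
    (* A Z value may itself contain ':', so rejoin the remainder. *)
    let v = String.concat ":" rest in
    (match typ with
     | "A" when String.length v = 1 -> Ok (tag, Opt_char v.[0])
     | "i" ->
       (* Note: Int64.of_string_opt is more lenient than the SAM grammar
          (it also accepts forms such as "0x1f"). *)
       (match Int64.of_string_opt v with
        | Some n -> Ok (tag, Opt_int n)
        | None -> Error ("invalid integer: " ^ v))
     | "f" ->
       (match float_of_string_opt v with
        | Some x -> Ok (tag, Opt_float x)
        | None -> Error ("invalid float: " ^ v))
     | "Z" -> Ok (tag, Opt_string v)
     | "H" -> Ok (tag, Opt_hex v)
     | _ -> Error ("unsupported or malformed type: " ^ typ))
  | _ -> Error ("malformed optional field: " ^ s)
```

With this shape, `parse_opt_field "NM:i:3"` yields a typed integer rather than the plain string `"3"`, and malformed numeric values are reported as errors at parse time.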
