Decide on output format for interoperability between parsers #4

Closed
alanrice opened this Issue Jul 12, 2014 · 2 comments

@alanrice
Contributor

alanrice commented Jul 12, 2014

The parser currently outputs an array of objects. Maybe JSON would be better?

@alanrice

Contributor

alanrice commented Jul 12, 2014

{
  "sequences":[
    {
      "name":"SEQUENCE_1",
      "seq":"MTEITAAMVKELRESTGAGMMDCKNALSETNGDFDKAVQLLREKGLGKAAKKADRLAAEGLVSVKVSDDFTIAAMRPSYLSYEDLDMTFVENEYKALVAELEKENEERRRLKDPNKPEHKIPQFASRKQLSDAILKEAEEKIKEELKAQGKPEKIWDNIIPGKMNSFIADNSQLDSKLTLMGQFYVMDDKKTVEQVIAEKEKEFGGKIKIVEFICFEVGEGLEKKTEDFAAEVAAQL"
    },
    {
      "name":"SEQUENCE_2",
      "seq":"SATVSEINSETDFVAKNDQFIALTKDTTAHIQSNSLQSVEELHSSTINGVKFEEYLKSQIATIGENLVVRRFATLKAGANGVVNGYIHTNGRVGVVIAAACDSAEVASKSRDLLRQICMH"
    }
  ]
}

If unique headers can be assumed then:

{
  "SEQUENCE_1":{
    "seq":"MTEITAAMVKELRESTGAGMMDCKNALSETNGDFDKAVQLLREKGLGKAAKKADRLAAEGLVSVKVSDDFTIAAMRPSYLSYEDLDMTFVENEYKALVAELEKENEERRRLKDPNKPEHKIPQFASRKQLSDAILKEAEEKIKEELKAQGKPEKIWDNIIPGKMNSFIADNSQLDSKLTLMGQFYVMDDKKTVEQVIAEKEKEFGGKIKIVEFICFEVGEGLEKKTEDFAAEVAAQL",
    "seqType": "protein"
  },
  "SEQUENCE_2":{
    "seq":"SATVSEINSETDFVAKNDQFIALTKDTTAHIQSNSLQSVEELHSSTINGVKFEEYLKSQIATIGENLVVRRFATLKAGANGVVNGYIHTNGRVGVVIAAACDSAEVASKSRDLLRQICMH",
    "seqType": "protein"
  }
}
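
To make the difference between the two layouts above concrete, here is a minimal Python sketch that converts the list form (`"sequences": [...]`) into the form keyed by header. The function name `key_by_name` is hypothetical and not part of this project, and the sequences are shortened for readability. It assumes headers are unique and raises an error otherwise.

```python
import json


def key_by_name(doc):
    """Convert {"sequences": [{"name": ..., "seq": ...}]} into {name: {"seq": ...}}.

    Hypothetical helper for illustration; assumes every header is unique.
    """
    keyed = {}
    for record in doc["sequences"]:
        name = record["name"]
        if name in keyed:
            # The keyed layout cannot represent two entries with the same header.
            raise ValueError(f"duplicate header: {name}")
        # Keep every field except the name, which becomes the key.
        keyed[name] = {k: v for k, v in record.items() if k != "name"}
    return keyed


doc = {
    "sequences": [
        {"name": "SEQUENCE_1", "seq": "MTEITAAMVK"},  # shortened sample sequence
        {"name": "SEQUENCE_2", "seq": "SATVSEINSE"},  # shortened sample sequence
    ]
}
print(json.dumps(key_by_name(doc), indent=2))
```

The list form can hold repeated headers, while the keyed form gives direct lookup by header, which is why the keyed form only works if unique headers can be assumed.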
@alanrice

Contributor

alanrice commented Aug 6, 2014

This is really two issues in one: deciding on the output format (it should be JSON for interoperability) and deciding how to deal with non-unique headers.
Non-unique FASTA headers (discussed online in several places previously) should, I think, be dealt with by also checking whether the sequence is duplicated. If the sequence is duplicated too, the redundant entry can be ignored; if it is a novel sequence, throw an error.
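
A rough Python sketch of that duplicate-header policy, for discussion only. The names `read_records` and `merge_records` are hypothetical and not taken from this project, and the FASTA reading is deliberately simplified (the whole header line after `>` is used as the name).

```python
def read_records(text):
    """Yield (header, sequence) pairs from FASTA text (simplified reader)."""
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].strip(), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def merge_records(pairs):
    """Build {header: {"seq": ...}}, applying the policy described above."""
    records = {}
    for name, seq in pairs:
        existing = records.get(name)
        if existing is None:
            records[name] = {"seq": seq}
        elif existing["seq"] != seq:
            # Same header but a novel sequence: refuse to guess, raise an error.
            raise ValueError(f"header {name!r} repeated with a different sequence")
        # Same header and same sequence: redundant entry, silently ignored.
    return records


fasta = ">A\nMTE\nITA\n>B\nSAT\n>A\nMTEITA\n"
result = merge_records(read_records(fasta))
```

Here the second `>A` entry carries the same sequence as the first, so it is dropped; changing its sequence would raise `ValueError` instead.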
