Decide on output format for interoperability between parsers #4

Closed
alanrice opened this issue Jul 12, 2014 · 2 comments

@alanrice

The parser currently outputs an array of objects. Would JSON be a better output format for interoperability between parsers?

@alanrice

{
  "sequences":[
    {
      "name":"SEQUENCE_1",
      "seq":"MTEITAAMVKELRESTGAGMMDCKNALSETNGDFDKAVQLLREKGLGKAAKKADRLAAEGLVSVKVSDDFTIAAMRPSYLSYEDLDMTFVENEYKALVAELEKENEERRRLKDPNKPEHKIPQFASRKQLSDAILKEAEEKIKEELKAQGKPEKIWDNIIPGKMNSFIADNSQLDSKLTLMGQFYVMDDKKTVEQVIAEKEKEFGGKIKIVEFICFEVGEGLEKKTEDFAAEVAAQL"
    },
    {
      "name":"SEQUENCE_2",
      "seq":"SATVSEINSETDFVAKNDQFIALTKDTTAHIQSNSLQSVEELHSSTINGVKFEEYLKSQIATIGENLVVRRFATLKAGANGVVNGYIHTNGRVGVVIAAACDSAEVASKSRDLLRQICMH"
    }
  ]
}

If headers can be assumed to be unique, the sequences could instead be keyed by header:

{
  "SEQUENCE_1":{
    "seq":"MTEITAAMVKELRESTGAGMMDCKNALSETNGDFDKAVQLLREKGLGKAAKKADRLAAEGLVSVKVSDDFTIAAMRPSYLSYEDLDMTFVENEYKALVAELEKENEERRRLKDPNKPEHKIPQFASRKQLSDAILKEAEEKIKEELKAQGKPEKIWDNIIPGKMNSFIADNSQLDSKLTLMGQFYVMDDKKTVEQVIAEKEKEFGGKIKIVEFICFEVGEGLEKKTEDFAAEVAAQL",
    "seqType": "protein"
  },
  "SEQUENCE_2":{
    "seq":"SATVSEINSETDFVAKNDQFIALTKDTTAHIQSNSLQSVEELHSSTINGVKFEEYLKSQIATIGENLVVRRFATLKAGANGVVNGYIHTNGRVGVVIAAACDSAEVASKSRDLLRQICMH",
    "seqType": "protein"
  }
}
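
To show how the two layouts relate, here is a rough Python sketch that converts the list layout into the keyed one. The function name `key_by_name` and the shortened sequences are made up for illustration and are not part of this project. It assumes every record has a `name` field and refuses to convert when two records share a header, since the keyed layout cannot represent that.

```python
import json


def key_by_name(doc):
    """Turn {"sequences": [{"name": ..., "seq": ...}]} into {name: {"seq": ...}}.

    Hypothetical helper for illustration only.
    """
    keyed = {}
    for record in doc["sequences"]:
        name = record["name"]
        if name in keyed:
            # The keyed layout has no room for two entries under one header.
            raise ValueError(f"duplicate header: {name}")
        # Keep every field except the one that became the key.
        keyed[name] = {k: v for k, v in record.items() if k != "name"}
    return keyed


# Shortened, made-up sequences to keep the example readable.
listed = {
    "sequences": [
        {"name": "SEQUENCE_1", "seq": "MTEITAAMVK"},
        {"name": "SEQUENCE_2", "seq": "SATVSEINSE"},
    ]
}

print(json.dumps(key_by_name(listed), indent=2))
```

The list layout keeps file order explicit and tolerates repeated headers, while the keyed layout gives direct lookup by header, so the choice mostly depends on whether unique headers can be guaranteed.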

@alanrice

alanrice commented Aug 6, 2014

This is really two issues in one: deciding on the output format (it should be JSON for interoperability) and deciding how to deal with non-unique headers.
Non-unique FASTA headers (previously discussed in various places online) should, I think, be handled by also checking whether the sequence itself is duplicated. If the sequence is identical, the redundant entry can be ignored; if the same header carries a different sequence, throw an error.
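
A minimal sketch of that rule in Python, assuming the parser has already produced `(header, sequence)` pairs in file order. The name `merge_records` is hypothetical, and sequences are compared as exact strings, so case or whitespace differences would count as a different sequence.

```python
def merge_records(records):
    """Collapse (header, sequence) pairs into a dict keyed by header.

    Hypothetical helper: an exact duplicate is ignored, while a repeated
    header with a different sequence raises an error.
    """
    seen = {}
    for header, seq in records:
        if header not in seen:
            seen[header] = seq
        elif seen[header] != seq:
            # Same header, novel sequence: ambiguous, so fail loudly.
            raise ValueError(f"header {header!r} reused for a different sequence")
        # Same header and same sequence: redundant entry, nothing to do.
    return seen


# Made-up records: the third one repeats the first exactly and is dropped.
records = [("SEQUENCE_1", "MTEITAAMVK"), ("SEQUENCE_2", "SATVSEINSE"), ("SEQUENCE_1", "MTEITAAMVK")]
print(merge_records(records))
```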
