Peak interpretation format #23
Comments
Are interested parties supposed to make suggestions here or in the working document? |
Either one is fine. If your comment is easy to express as an issue in this venue, then that's great. If your comments are better expressed in the context of the document, it is fine to make comments directly in the document. |
OK, so: Currently
I guess this all works, but it is becoming quite the decision tree. Could this not be unified a little bit? The
For the peptide fragment ion types This is certainly the most relevant for most users here... I can think of three variants off the top of my head
etc., and the same rules are used for the ion type and the neutral loss. Certainly the letters and precise choices would need some discussion. Note that this makes everything more extensible, since we are not limited to single letters for the annotation type. Especially for less important cases one could consider e.g. Examples for combinations
Extra suggestions This could also address the issue of the side peaks currently denoted with square brackets: instead, the side peak could be mapped to the main peak with Original example:
New suggestion
Note that this allows the apex of the peak to be marked with the correct interpretation, rather than the peak entry closest to the mass. So this could be |
This could go even further. Currently the
To give examples: page 17 original:
new:
page 18 orig:
new:
(or correspondingly page 12 orig:
new:
Overall the goal of my proposed modifications is to make complex annotations more easily readable, both to humans and machines. The drawback, if you want to call it one, is increased verbosity. But computers don't care about verbosity, as long as it's well specified. Fewer suffixes and prefixes. Space to integrate e.g. lipids easily: |
Hi @meowcat thanks for this well-reasoned alternative. I will summarize here in a table what I see as a translation table between the current proposal and your proposed alternative:
What do others think? I think we will likely discuss this in depth in the call this coming Friday. Would you join us, @meowcat ? |
Hi, Note that since this suggested format is vaguely approaching JSON in appearance and scope, another idea would be to make it JSON entirely. Some notes on your enhanced suggestion:
|
The idea of using these predicates/operators to describe the annotations looks like a good way to break out of the conflicting "annotation style" issue between domains, and it does go some way towards improving machine parseability while retaining human readability. On the other hand, it makes the common use-cases use a lot more space as we now have to explicitly tag every attribute. I don't think it's reasonable to do away with arbitrary arithmetic expressions. While you could argue that If we do use predicates, we would need to specify what each predicate means, and whether certain predicates can go "together", for instance if you use One compromise would be to keep using "annotation styles" but just text-encode annotations using the predicate format instead of the compact notation, but this doesn't mandate anything for binary formats where what would be saved could be the annotation data, not necessarily the text-encoding of the annotation itself. I do think that the extensibility idea is a good one though. I apologize for the overly negative tone of this post as it was written in haste. |
I mean, in principle the Say we gather a solid base set of what we currently think is needed and call this (But I'm not good at the technical part of ontologies, so others might disagree on how this should be done.)
My feeling here is that overspecifying things will not help. In the end, what is the purpose of these annotations? 1) an interpretation aid for the reader, 2) an interpretation aid for software, 1.5) an interpretation aid for the reader that is visualized by software, 3) something else? (In my opinion I wouldn't see a reason why I can't have
The same goes for any other annotation (like the compact format) though; I actually see advantages for more streamlined binary serialization with the predicate format over the compact format.
Yes, that's certainly true. For visualization in software this can be circumvented, but in text-format-serialized records it will stay bulky. A shorthand like |
Hi all, I see f{C6H12O6} made it into the specification - any chance we can see s{SMILES} for known substructures? Would greatly enhance the generality of the format. Otherwise we have a big gap between "peptide" and "formula" that could IMO be avoided. |
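For illustration, here is a minimal sketch of how a reader might pull apart the braced tokens mentioned above (`f{C6H12O6}` and the proposed `s{SMILES}`). The function names and regular expressions are assumptions made for this example and are not taken from the specification. The formula parser only handles plain element/count pairs (no isotopes, parentheses, or charges).

```python
import re
from collections import Counter

# Assumed token shape: a one-letter tag ("f" = formula, "s" = SMILES)
# followed by a payload in curly braces. Illustrative only.
TOKEN_RE = re.compile(r"([fs])\{([^{}]*)\}")
FORMULA_RE = re.compile(r"(?:[A-Z][a-z]?\d*)+")
ELEMENT_RE = re.compile(r"([A-Z][a-z]?)(\d*)")


def split_token(text):
    """Return (tag, payload) for a token such as 'f{C6H12O6}'."""
    match = TOKEN_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a braced token: {text!r}")
    return match.group(1), match.group(2)


def formula_counts(formula):
    """Count elements in a simple formula such as 'C6H12O6'."""
    if not FORMULA_RE.fullmatch(formula):
        raise ValueError(f"unsupported formula: {formula!r}")
    counts = Counter()
    for element, number in ELEMENT_RE.findall(formula):
        # A missing count means a single atom of that element.
        counts[element] += int(number) if number else 1
    return dict(counts)


tag, payload = split_token("f{C6H12O6}")
print(tag, formula_counts(payload))
```

Under the same assumptions, an `s{...}` token would pass through `split_token` unchanged, and its payload would be handed to a cheminformatics toolkit rather than to `formula_counts`.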
@meowcat I'm not familiar enough with SMILES to say this with certainty, but I think it uses curly braces to denote charge, which may or may not make |
https://www.daylight.com/dayhtml/doc/theory/theory.smiles.html Charges are expressed like I'm finding this regex which takes into account that There is an extension called "ChemAxon Extended SMILES" (CxSMILES) where curly brackets are used for R-group description, but this is far outside the proposed scope. |
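A small sketch of the delimiter point: standard SMILES writes charges inside square brackets (e.g. `[NH4+]`) and does not use curly braces, so a SMILES string can be wrapped in `s{...}` and recovered by stripping the outer braces. The helper names are hypothetical. The check below simply refuses input containing braces (as a CxSMILES extension might) and does not validate the SMILES itself.

```python
def wrap_smiles(smiles):
    """Wrap a plain SMILES string in an s{...} token (hypothetical helper)."""
    if "{" in smiles or "}" in smiles:
        # Braces would make the closing delimiter ambiguous.
        raise ValueError(f"curly braces not supported here: {smiles!r}")
    return "s{" + smiles + "}"


def unwrap_smiles(token):
    """Recover the SMILES payload from an s{...} token."""
    if not (token.startswith("s{") and token.endswith("}")):
        raise ValueError(f"not an s{{...}} token: {token!r}")
    return token[2:-1]


# Charged species use square brackets, so they round-trip unchanged.
for smiles in ("[NH4+]", "CC(=O)[O-]"):
    token = wrap_smiles(smiles)
    assert unwrap_smiles(token) == smiles
    print(token)
```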
We discussed this at the last weekly call, and agreed to add Returning to the SMILES charge specification, we concluded that the expected net charge of the ion would be written as part of the peak annotation format. The writer is free to specify any local charges, though not all readers will know what to do with them. |
This has been included in the mzPAF specification, which is currently under community review. If there are further concerns, please report them against the current version: |
See working document for ongoing discussion.