Specify ionic species further #8

Open
douweschulte opened this issue Apr 5, 2024 · 0 comments

Ion charges can be represented in the current version of ProForma (section 7.1, MS Extensions):

EMEVEESPEK/3[+2Na+,+H+]
EMEVEESPEK/-1[+e-]

Examples given in the specification
The specification does not define a notation for the ionic species themselves. It does show that fairly complex options are valid, such as the removal of an OH-, but it does not state how more highly charged ionic species should be written, for example highly charged metal ions like Fe[III].

EMEVEESPEK/7[+2Fe+3,+H+]
EMEVEESPEK/1[-OH-]
EMEVEESPEK/1[+N1H3+]

Potential uses for a higher-complexity notation. Note on the last example: the 3 indicates that there are three H atoms, and the trailing + indicates that the charge of the whole species is +1.

To me it seems logical to specify this as the full modification Formula: notation (with e allowed as well), followed by the total charge of that species. But that implies that this field can contain paired square brackets [] as well as positive and negative numbers internally, which might introduce some visual ambiguity about what the final number means.
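To make the proposed reading concrete, here is a minimal Python sketch that splits each ionic species into a count, a formula, and a per-species charge, and then checks that the species add up to the total charge after the slash. The names `IonicSpecies`, `parse_charge`, and `is_consistent` are hypothetical and not part of ProForma or of any library. The sketch assumes that a missing or sign-only count means one species, that a sign-only charge means a charge of one, and it ignores isotope brackets inside formulas.

```python
import re
from typing import List, NamedTuple, Tuple


class IonicSpecies(NamedTuple):
    count: int    # how many times the species is added (negative: removed)
    formula: str  # chemical formula of one species, e.g. "Fe" or "N1H3"
    charge: int   # charge of one species, e.g. +3 for Fe+3


# Assumed shape of one species: optional signed count, formula, signed charge.
# "e" is accepted as a pseudo-element for the electron; isotopes are not handled.
SPECIES_RE = re.compile(
    r"(?P<count>[+-]?\d*)(?P<formula>(?:[A-Z][a-z]?\d*|e)+)(?P<charge>[+-]\d*)"
)
# Total charge, optionally followed by a bracketed, comma-separated species list.
CHARGE_RE = re.compile(r"(?P<total>[+-]?\d+)(?:\[(?P<ions>[^\]]+)\])?")


def _signed(text: str) -> int:
    """Read '', '+', '-', '2', '+3' or '-1' as a signed integer (bare sign = 1)."""
    if text in ("", "+"):
        return 1
    if text == "-":
        return -1
    return int(text)


def parse_charge(proforma: str) -> Tuple[int, List[IonicSpecies]]:
    """Parse the part after '/' into (total charge, list of ionic species)."""
    _, _, charge_part = proforma.partition("/")
    match = CHARGE_RE.fullmatch(charge_part)
    if match is None:
        raise ValueError(f"cannot parse charge section: {charge_part!r}")
    species = []
    items = match["ions"].split(",") if match["ions"] else []
    for item in items:
        sm = SPECIES_RE.fullmatch(item)
        if sm is None:
            raise ValueError(f"cannot parse ionic species: {item!r}")
        species.append(
            IonicSpecies(_signed(sm["count"]), sm["formula"], _signed(sm["charge"]))
        )
    return int(match["total"]), species


def is_consistent(proforma: str) -> bool:
    """True if the listed species sum to the stated total charge."""
    total, species = parse_charge(proforma)
    return sum(s.count * s.charge for s in species) == total
```

Under these assumptions, `EMEVEESPEK/7[+2Fe+3,+H+]` reads as two Fe species of charge +3 plus one H species of charge +1, which sums to the stated total of 7, and `EMEVEESPEK/1[-OH-]` reads as removing one species of charge -1, which gives +1.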

Additionally, one example in the specification does not use a sign on the number of ions, while other examples use only a sign. Formalising the notation seems warranted to me.

EMEVEESPEK/-2[2I-]

Example from the specification that uses only an unsigned number for the number of times an ionic species is present

\/([+-]?\d+)(\[((?:(?:[+-]?\d+)|(?:[+-]))((?:\[\d+[A-Z][A-Za-z]?\d+)|(?:[A-Z][A-Za-z]?\d*))+([+-]\d*),)*((?:(?:[+-]?\d+)|(?:[+-]))((?:\[\d+[A-Z][A-Za-z]?\d+)|(?:[A-Z][A-Za-z]?\d*))+([+-]\d*))\])?

Here is a beast of a regular expression for how this format could look (does not check for use of valid elements)
Here it is in regex101 with some example matches
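As a quick way to try the expression outside regex101, the following Python sketch compiles it verbatim with the standard `re` module and applies it to the charge section of a ProForma string. The helper name `matches_issue_regex` is hypothetical. Python's regex flavour may differ in details from the one used on regex101, so treat this as a sketch of how the expression behaves under `re`, not as a reference implementation.

```python
import re

# The expression from this issue, copied verbatim and split over two string
# literals only for line length.
ISSUE_REGEX = re.compile(
    r"\/([+-]?\d+)(\[((?:(?:[+-]?\d+)|(?:[+-]))((?:\[\d+[A-Z][A-Za-z]?\d+)|(?:[A-Z][A-Za-z]?\d*))+([+-]\d*),)*"
    r"((?:(?:[+-]?\d+)|(?:[+-]))((?:\[\d+[A-Z][A-Za-z]?\d+)|(?:[A-Z][A-Za-z]?\d*))+([+-]\d*))\])?"
)


def matches_issue_regex(proforma: str) -> bool:
    """Check whether everything from the '/' onwards matches the expression."""
    charge_part = proforma[proforma.index("/"):]
    return ISSUE_REGEX.fullmatch(charge_part) is not None


if __name__ == "__main__":
    for example in [
        "EMEVEESPEK/3[+2Na+,+H+]",
        "EMEVEESPEK/7[+2Fe+3,+H+]",
        "EMEVEESPEK/1[-OH-]",
        "EMEVEESPEK/1[+N1H3+]",
        "EMEVEESPEK/-2[2I-]",
        "EMEVEESPEK/-1[+e-]",
    ]:
        print(example, matches_issue_regex(example))
```

One thing this makes visible: as written, the element branch requires an uppercase first letter, so under Python's `re` the electron example `EMEVEESPEK/-1[+e-]` is not accepted, while the other examples in this issue are. Allowing `e` would need an extra alternative in the formula part.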

"/" <number> ("[" <number_or_sign> <formula> <number_or_sign> ("," <number_or_sign> <formula> <number_or_sign>)* "]")?

This is the same in a somewhat more readable BNF-like notation

For a bit of background: I came upon this while implementing my own parser for ProForma. I have no serious need or use for any of the complexity here, but my code internally allows any chemical formula to be specified as an ionic species, so I was reading this section to work out how to export the internal peptides back to fully valid ProForma.

douweschulte added a commit to snijderlab/rustyms that referenced this issue Apr 11, 2024
Signed-off-by: Douwe Schulte <d.schulte@uu.nl>