Isotope Philosophy in CDK #330
Comments
On Tue, Jun 6, 2017 at 12:07 PM, John Mayfield ***@***.***> wrote:
> Daniel's been doing some validation of InChIs generated by CDK and
> noticed the isotopes get lost. This is a known problem that I fixed in
> SMILES with an 'AtomMassStrict' flag. I would like some clarification on the
> following:
>
> What exactly does/should IsotopeFactory.configure do and why is it needed?

To add isotope info based on the mass number.

> and
>
> How do we represent 'natural abundance'?

Good point. I always assumed (and I think all the code does too) that
when no isotope info is given, it actually means 'natural abundance'...
But that indeed leaves the question of how the CDK defines "we have no
clue"...

> The documentation is not clear: IsotopeFactory.configure(atom)
> <http://cdk.github.io/cdk/2.0/docs/api/org/openscience/cdk/config/IsotopeFactory.html#configure-org.openscience.cdk.interfaces.IAtom->
> In many code examples (for example Groovy CDK) atoms are typed and
> configured. This makes it impossible to distinguish [12CH4] vs [CH4], as
> [CH4] comes in with a null atom mass which then gets configured to
> [12CH4]. However, to avoid this ugliness in SMILES we check what the
> 'major' isotope is (12C) and omit it.

Not very clean indeed...

So, the factory now takes the major isotope when no isotope info is given?
That's different behavior from the methods to calculate molecular weights,
AFAIK...

> Thus
> both [12CH4] and [CH4] become [CH4]. To work round this the 'strict' flag
> treats 'null' as the undefined and all carbon 12. This means you can round
> trip the listed SMILES providing you don't call IsotopeFactory.configure().
> If you do call IsotopeFactory.configure(atom) then [12CH4] and [CH4] both
> become [12CH4].

OK, so we need a mechanism to distinguish "natural abundance" and "no clue"
and maybe even "major isotope"?

> The same happens with the InChI, at the moment:
> [12CH4] and [CH4] becomes InChI=1S/CH4/h1H4
> [12CH4] should be InChI=1S/CH4/h1H4/i1+0
> [CH4] should be InChI=1S/CH4/h1H4

What does InChI mean if it does not have an /i layer?

Nice catch of this inconsistency... not so nice that it exists...

So, how do you propose to proceed? (Besides properly defining the intention
of the API...)
Egon
--
E.L. Willighagen
Department of Bioinformatics - BiGCaT
Maastricht University (http://www.bigcat.unimaas.nl/)
Homepage: http://egonw.github.com/
LinkedIn: http://se.linkedin.com/in/egonw
Blog: http://chem-bla-ics.blogspot.com/
PubList: http://www.citeulike.org/user/egonw/tag/papers
ORCID: 0000-0001-7542-0286
ImpactStory: https://impactstory.org/u/egonwillighagen
Agreed, and I think that is the way forward. We do, however, need to update certain parts to 'fix' this: I think this condition should be removed: Also, I think these methods are wrong... AtomContainerManipulator (also in MolecularFormulaManipulator). There needs to be a method that adds the exact mass of each isotope if one is defined, and otherwise (if null) takes the element's natural abundance weight.
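As a rough illustration of the mass rule asked for in this comment, here is a minimal Java sketch. It is not CDK code: `MassSketch`, its `Atom` class, and the two lookup tables are hypothetical stand-ins, and the mass values are rounded illustrative numbers rather than anything read from CDK's isotope data. The only point is the branch on a `null` mass number.

```java
import java.util.HashMap;
import java.util.Map;

final class MassSketch {
    // Illustrative, rounded values (assumed); not loaded from CDK.
    static final Map<String, Double> AVERAGE_MASS = new HashMap<>();
    static final Map<String, Double> EXACT_MASS = new HashMap<>();
    static {
        AVERAGE_MASS.put("C", 12.011);   // natural abundance weight
        AVERAGE_MASS.put("H", 1.008);
        EXACT_MASS.put("C12", 12.0);     // exact isotope masses
        EXACT_MASS.put("C13", 13.00335);
        EXACT_MASS.put("H1", 1.00783);
        EXACT_MASS.put("H2", 2.01410);
    }

    /** Hypothetical stand-in for an atom: element symbol plus nullable mass number. */
    static final class Atom {
        final String symbol;
        final Integer massNumber; // null means "no isotope specified"

        Atom(String symbol, Integer massNumber) {
            this.symbol = symbol;
            this.massNumber = massNumber;
        }
    }

    /** Sums exact isotope masses where defined, natural abundance weights otherwise. */
    static double mass(Atom[] atoms) {
        double sum = 0.0;
        for (Atom atom : atoms) {
            if (atom.massNumber == null) {
                sum += AVERAGE_MASS.get(atom.symbol);
            } else {
                sum += EXACT_MASS.get(atom.symbol + atom.massNumber);
            }
        }
        return sum;
    }
}
```

Under these assumed values, [CH4] and [12CH4] give different masses, which is only possible if nothing has filled in the `null` mass number beforehand.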
Another quirk... is elements like [Hs] for which we have the following data:
If asked to get the major isotope, it returns the first one... which isn't correct. I'll make some patches and submit a pull request.
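One way to avoid the "first one wins" result is sketched below. This is a hypothetical Java example, not the CDK implementation: `MajorIsotopeSketch` and its `Isotope` class are invented for illustration, and it assumes that the isotope entries of a synthetic element such as Hs all carry zero natural abundance. Under that assumption, the lookup returns an empty result instead of an arbitrary entry.

```java
import java.util.List;
import java.util.Optional;

final class MajorIsotopeSketch {
    /** Hypothetical isotope record: symbol, mass number, natural abundance in percent. */
    static final class Isotope {
        final String symbol;
        final int massNumber;
        final double abundance;

        Isotope(String symbol, int massNumber, double abundance) {
            this.symbol = symbol;
            this.massNumber = massNumber;
            this.abundance = abundance;
        }
    }

    /**
     * Returns the isotope with the highest positive abundance,
     * or empty when no isotope has a positive abundance.
     */
    static Optional<Isotope> major(List<Isotope> isotopes) {
        Isotope best = null;
        for (Isotope iso : isotopes) {
            if (iso.abundance > 0.0 && (best == null || iso.abundance > best.abundance)) {
                best = iso;
            }
        }
        return Optional.ofNullable(best);
    }
}
```

With this rule the caller has to decide explicitly what "major isotope" means for an element with no natural abundance, rather than silently receiving whichever entry happens to be listed first.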
Daniel's been doing some validation of InChIs generated by CDK and noticed the isotopes get lost. This is a known problem that I fixed in SMILES with an 'AtomMassStrict' flag. I would like some clarification on the following:
What exactly does/should IsotopeFactory.configure do and why is it needed?
and
How do we represent 'natural abundance'?
The documentation is not clear: IsotopeFactory.configure(atom)
In many code examples (for example Groovy CDK) atoms are typed and configured. This makes it impossible to distinguish `[12CH4]` vs `[CH4]`, as `[CH4]` comes in with a `null` atom mass which then gets configured to `[12CH4]`. However, to avoid this ugliness in SMILES we check what the 'major' isotope is (12C) and omit it. Thus both `[12CH4]` and `[CH4]` become `[CH4]`. To work round this the 'strict' flag treats 'null' as the undefined and all carbon 12. This means you can round trip the listed SMILES providing you don't call `IsotopeFactory.configure()`. If you do call `IsotopeFactory.configure(atom)` then `[12CH4]` and `[CH4]` both become `[12CH4]`.
The same happens with the InChI, at the moment:
`[12CH4]` and `[CH4]` becomes `InChI=1S/CH4/h1H4`
`[12CH4]` should be `InChI=1S/CH4/h1H4/i1+0`
`[CH4]` should be `InChI=1S/CH4/h1H4`
IIRC the Molfile calls configure on input so the information is always lost.
I would like to propose deprecating `IsotopeFactory.configure()` to avoid these problems or perhaps changing it such that 'null' does take default major isotope...
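The round-trip problem described in this issue can be modelled in a few lines of Java. This is a hedged sketch only: `IsotopeLabelSketch`, `label`, and `configure` are hypothetical names that mimic the behaviour described above, not CDK's SMILES writer or `IsotopeFactory`.

```java
final class IsotopeLabelSketch {
    /**
     * Hypothetical writer rule: returns the isotope prefix for a bracket atom.
     * A null mass number is never labelled; in non-strict mode the major
     * isotope is omitted as well.
     */
    static String label(Integer massNumber, int majorMassNumber, boolean strict) {
        if (massNumber == null) {
            return "";
        }
        if (!strict && massNumber.intValue() == majorMassNumber) {
            return "";
        }
        return Integer.toString(massNumber);
    }

    /** Mimics the configure step described above: a null mass becomes the major isotope. */
    static Integer configure(Integer massNumber, int majorMassNumber) {
        return massNumber == null ? Integer.valueOf(majorMassNumber) : massNumber;
    }
}
```

In this model, strict mode keeps `[12CH4]` and `[CH4]` apart as long as the mass number stays `null`; once `configure` has run, both atoms carry mass 12 and the strict writer labels both.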