Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Isotope Philosophy in CDK #330

Closed
johnmay opened this issue Jun 6, 2017 · 3 comments
Closed

Isotope Philosophy in CDK #330

johnmay opened this issue Jun 6, 2017 · 3 comments

Comments

@johnmay
Copy link
Member

johnmay commented Jun 6, 2017

Daniel's being doing some validation of InChI's generated by CDK and noticed the isotopes get lost. This is a known problem that I fixed in SMILES with a 'AtomMassStrict' flag. I would like some clarification on the following:

What exactly does/should IsotopeFactory.confgure do and why is it needed?
and
How do we represent 'natural abundance'?

The documentation is not clear: IsotopeFactor.configure(atom)

In many code examples (for example Groovy CDK) atoms are typed and configured. This makes it impossible to distinguish [12CH4] vs [CH4] as [CH4] comes in with a null atom mass which then get's configured to [12CH4] however to avoid this ugly-ness in SMILES we check what the 'major' isotope is (12C) and omit it. Thus

both [12CH4] and [CH4] become [CH4]. To work round this the 'strict' flag treats 'null' as the undefined and all carbon 12. This means you can round trip the listed SMILES providing you don't call IsotopeFactory.configure(). If you do call IsotopeFactory.configure(atom) then [12CH4] and [CH4] both become [12CH4].

The same happens with the InChI, at the moment:

[12CH4] and [CH4] becomes InChI=1S/CH4/h1H4
[12CH4] should be InChI=1S/CH4/h1H4/i1+0
[CH4] should be InChI=1S/CH4/h1H4

IIRC the Molfile calls configure on input so the information is always lost.

I would like to propose deprecating the IsotopeFactory.configure() to avoid these problems or perhaps changing it such that 'null' does take default major isotope...

@egonw
Copy link
Member

egonw commented Jun 6, 2017 via email

@johnmay
Copy link
Member Author

johnmay commented Jun 6, 2017

Good point. I always assumed (and I think all the code does too), that
when no isotope info is given, it actually means 'natural abundance'...

Agreed and I think that is the way forward. We do however need to update certain parts to 'fix' this:

I think this condition should be removed:
https://github.com/cdk/cdk/blob/master/base/core/src/main/java/org/openscience/cdk/config/IsotopeFactory.java#L314-L315
It may break other areas but if a used wants to do this can do it explicitly with getMajorIsotope() and then configure based on that.

Also I think these methods are wrong... AtomContainerManipulator (alos in MolecularFormulaManipulator). There needs to a method that will add the exact mass of isotopes (if defined) otherwise if null (take their natural abundance weight).

@johnmay
Copy link
Member Author

johnmay commented Jun 6, 2017

Another quirk... is elements like [Hs] for which we have the following data:

263 0.0
264 0.0
265 0.0
266 0.0
267 0.0
268 0.0
269 0.0
270 0.0
271 0.0
272 0.0
273 0.0
274 0.0
275 0.0
276 0.0
277 0.0

If asked to get the major isotope then it returns the first one... which isn't correct.

I'll make some patches and submit a pull request.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants