Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

stereo information unavailable when generating SMILES from IAtomcontainer #1013

Open
biotech7 opened this issue Sep 10, 2023 · 10 comments
Open
Assignees

Comments

@biotech7
Copy link

Based on this link https://egonw.github.io/cdkbook/stereo.html#sec:stereo:bond, a simple molecule was constructed. but stereo information unvalaible when converting molecule to smiles. code to reproduce.

IAtomContainer isomer = new AtomContainer();
        isomer.addAtom(new Atom("C"));
        isomer.addAtom(new Atom("Cl"));
        isomer.addAtom(new Atom("Br"));
        isomer.addAtom(new Atom("F"));
        isomer.addAtom(new Atom("I"));
        isomer.addBond(0, 1, IBond.Order.SINGLE);
        isomer.addBond(0, 2, IBond.Order.SINGLE);
        isomer.getBond(1).setStereo(IBond.Stereo.UP);
        isomer.addBond(0, 3, IBond.Order.SINGLE);
        isomer.getBond(2).setStereo(IBond.Stereo.UP);
        isomer.addBond(0, 4, IBond.Order.SINGLE);

        AtomContainerManipulator.percieveAtomTypesAndConfigureAtoms(isomer);
        CDKHydrogenAdder adder = CDKHydrogenAdder.getInstance(SilentChemObjectBuilder.getInstance());
        adder.addImplicitHydrogens(isomer);

        SmilesGenerator generator = new SmilesGenerator(SmiFlavor.UniversalSmiles);
        System.out.println(generator.create(isomer));

output: C(Br)(Cl)(F)I

Another trial was conducted as following procedure:stereo-included smiles was introduced and converted to IAtomcontainer, finally
"reversed" to smiles string. stereo info showed up. code to reproduce:

        String smi = "F[C@](Cl)(Br)I";
        IChemObjectBuilder builder = new SilentChemObjectBuilder();
        SmilesParser parser = new SmilesParser(builder);
        isomer = parser.parseSmiles(smi);
        System.out.println(generator.create(isomer));

output: C@@(Cl)(F)I

why the stereo info omitted in the first example?

env. information:
OS: windows 11
CDK: 2.9
IDE: IDEA 2023.2

@egonw
Copy link
Member

egonw commented Sep 10, 2023

@johnmay, I guess .setStereo(IBond.Stereo.UP); needs to be converted to another stereo model, which is used by the SmilesGenerator, right? Because SmiFlavor.UniversalSmiles does include stereochemistry, I just checked.

@johnmay
Copy link
Member

johnmay commented Sep 11, 2023

Well there are no coordinates so up/down bonds are meaningless, I mean I'm not sure how you would expect it to guess what you meant by giving it two UP bonds?

IAtomContainer isomer = new AtomContainer();
isomer.addAtom(new Atom("C"));
isomer.addAtom(new Atom("Cl"));
isomer.addAtom(new Atom("Br"));
isomer.addAtom(new Atom("F"));
isomer.addAtom(new Atom("I"));
isomer.addBond(0, 1, IBond.Order.SINGLE);
isomer.addBond(0, 2, IBond.Order.SINGLE);
isomer.addBond(0, 3, IBond.Order.SINGLE);
isomer.addBond(0, 4, IBond.Order.SINGLE);
        
// if you have an implicit hydrogen you need to put the 'focus' in the list where you want it
IAtom chiralAtom = isomer.getAtom(0); // the 'focus'
isomer.addStereoElement(
  new TetrahedralChirality(
    chiralAtom,
    isomer.getConnectedAtomsList(chiralAtom).toArray(new IAtom[4]),
    ITetrahedralChirality.Stereo.CLOCKWISE));

Oh you should also use the builder otherwise you'll have a log of pain in future when APIs change, the new Atoms are OK but the builder.newAtomContainer() gives you back an IAtomContainer (which is actually an AtomContainer2). The constructors on the Atom/AtomContainer should be protected to avoid you usage but unfortunately the builder was added later. The AtomContainer2 is expected by a lot of code now and you will get some warnings telling you to use it.

IChemObjectBuilder builder = SilentChemObjectBuilder.getInstance();
IAtomContainer isomer = builder.newAtomContainer();
isomer.addAtom(new Atom("C")); // builder.newAtom(IAtom.class, "C");
isomer.addAtom(new Atom("Cl"));
isomer.addAtom(new Atom("Br"));
isomer.addAtom(new Atom("F"));
isomer.addAtom(new Atom("I"));
isomer.addBond(0, 1, IBond.Order.SINGLE);
isomer.addBond(0, 2, IBond.Order.SINGLE);
isomer.addBond(0, 3, IBond.Order.SINGLE);
isomer.addBond(0, 4, IBond.Order.SINGLE);

// if you have an implicit hydrogen
IAtom chiralAtom = isomer.getAtom(0); // the 'focus'
isomer.addStereoElement(
  new TetrahedralChirality(
    chiralAtom,
    isomer.getConnectedAtomsList(chiralAtom).toArray(new IAtom[4]),
    ITetrahedralChirality.Stereo.CLOCKWISE));

@johnmay
Copy link
Member

johnmay commented Sep 11, 2023

Oh on the main branch you can actually do this which is much cleaner:

IChemObjectBuilder builder = SilentChemObjectBuilder.getInstance();
IAtomContainer isomer = builder.newAtomContainer();
isomer.newAtom(IAtom.C, 0);
isomer.newAtom(IAtom.Cl, 0);
isomer.newAtom(IAtom.Br, 0);
isomer.newAtom(IAtom.I, 0);
isomer.newAtom(IAtom.F, 0);
isomer.addBond(0, 1, IBond.Order.SINGLE);
isomer.addBond(0, 2, IBond.Order.SINGLE);
isomer.addBond(0, 3, IBond.Order.SINGLE);
isomer.addBond(0, 4, IBond.Order.SINGLE);

@egonw
Copy link
Member

egonw commented Sep 11, 2023

Well there are no coordinates so up/down bonds are meaningless,

Oh, duh :/

@biotech7
Copy link
Author

biotech7 commented Sep 12, 2023

@egonw @johnmay But in most cases, many "focuses" have no saturated bonds during constructing molecules until final implicit hydrogens addition. so could it be possible to set stereochemistry properties of bond/atom in a molecule rigidly before constructing the entire molecule?
This idea originated from a dilemma of parsing cdxml format strings, such as <b Z="102" B="13" E="2" Order="1" Display="WedgeBegin"/>, which contains few bond/atom info for setting up stereochemistry properties effectively. and occasionally resulting in NullPointerException: A carrier was undefined when creating new TetrahedralChirality().

@johnmay
Copy link
Member

johnmay commented Sep 12, 2023

NullPointerException: A carrier was undefined

Got a link to your code? I feel like with have https://en.wikipedia.org/wiki/XY_problem.

But in most cases, many "focuses" have no saturated bonds during constructing molecules until final implicit hydrogens addition.

From ChemDraw it's only the carbons atoms which don't have a hydrogen count defined. You can handle those much more efficiently than using the CDK's atom typer/hydrogen added. Generally it's preferred if the I/O does this since the valence model depends on the format. For example coming from MOLfile vs SMILES we use a different valence model.

Anyways that is some what redundant. Your error is because you MUST have 4 neighbours for tetrahedral stereochemistry, if there is an implicit hydrogen or lone pair you can set it with. You don't actually need the implicit hydrogen count to know an atom with a wedge only has 3 neighbours.

From CDXML if you have coordinates you can just treat it like a MOLfile (see MDLV2000Reader) and call the StereoElementFactory to assigned stereochemistry from 2D+wedges/3D coordinates. I actually think the more robust way of doing it though is ignoring the wedges an using the Geometry/BondOrdering attribute:

<n
 id="1055"
 p="1157.01 394.05"
 Z="1055"
 Geometry="Tetrahedral"
 AS="R"
 BondOrdering="1063 1064 0 1066"
/>

That's almost a one-to-one translation to the CDK data structure but with bonds rather than atoms. Note the 0 is where the implicit part goes.

@biotech7
Copy link
Author

biotech7 commented Sep 13, 2023

I might need to reconsider my strategy for effective stereo-related molecular building. But fewer choices were found based on JAVA development env..

@johnmay
Copy link
Member

johnmay commented Sep 13, 2023

I might need to reconsider my strategy for effective stereo-related molecular building. But fewer choices were found based on JAVA development env..

?

You do may me think maybe I should ad CDX/CDXML support in CDK. I've done a fair amount of work (in Java) on it for work but more the aspect of cleaning stuff up before getting the chemistry out. Extracting the chemical graph to SMILES etc is quite simple.

https://nextmovesoftware.com/posters/May_SketchySketches_Sheffield_201607.pdf
https://nextmovesoftware.com/posters/Mayfield_SketchySketchesII_Sheffield_202306.pdf

@biotech7
Copy link
Author

The greater the ability, the heavier the task. thanks for your best-practice documents, enriching me much more!

@johnmay
Copy link
Member

johnmay commented Mar 9, 2024

OK to close?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants