Enhanced stereochemistry support #140

merged 13 commits into from Feb 14, 2021

johnmay commented Feb 7, 2021

Adds a new "stereo group" field to stereogenic atoms; the group can be Rac(emic), Rel(ative), or Abs(olute). Normally enhanced stereo support has sub-groups (e.g. Rac1/&1, Rac2/&2, etc.), but to my knowledge there is no official way to represent these in IUPAC names; maybe one could infer them from bracket nesting. I think the most common use case is more likely to be mixing the different groups rather than different sub-groups, and that is supported by IUPAC. For example, it makes sense that some atoms can have absolute stereo while others are racemic and/or relative.

  • An open question I had was whether the "rac-" and "rel-" prefixes in isolation should be allowed on structures with multiple stereocenters. I originally handled these by just putting in an unlocanted "dummy" element, which meant only the first atom encountered would be set. I then settled on simply adding a new type "REL/RAC" that instructs the stereo handler to set every stereo atom it can find. Maybe it should be a warning if more than one atom config is set in this function?
  • The SMILESWriter was hard to unit test since the reader doesn't actually read stereochemistry, but since the changes are minimal it should be easy to confirm they are correct.
  • The CXSMILES documentation reports |r| as the "relative flag"; however, its interpretation is to be as though the MOLfile chiral flag is 0. BIOVIA documents that these cases should be read as racemic (see Page 251). So I think ChemAxon just goofed it up. Note they also distinguish ABS from normal SMILES stereo when matching, which doesn't make sense. Oh well. Anyway, as you can see, if all atoms end up being &1 we emit an r as it is cleaner, but given the possible ambiguity this may be confusing.
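
To make the last two points concrete, here is a minimal Python sketch of the behaviour described above. It is not the code in this PR (which is Java); the function names and the dict-based atom model are hypothetical. It assumes Abs, Rel and Rac map to the CXSMILES fields `a:`, `o1:` and `&1:`, as in the outputs below. What the writer does for an all-absolute molecule is not stated in this thread, so emitting `a:` there is also an assumption.

```python
# Illustrative sketch only: hypothetical names, not the PR's Java code.
# Assumed mapping from stereo group to CXSMILES field prefix.
CX_PREFIX = (("abs", "a"), ("rel", "o1"), ("rac", "&1"))


def apply_bare_prefix(stereo_atoms, group):
    """A bare 'rac-'/'rel-' prefix marks every stereo atom it can find."""
    return {idx: group for idx in stereo_atoms}


def cx_stereo_layer(groups):
    """Build the stereo part of a CXSMILES layer from {atom_index: group}."""
    if not groups:
        return ""
    # If every stereo atom is racemic, emit the shorter 'r' flag.
    if all(g == "rac" for g in groups.values()):
        return "r"
    parts = []
    for group, prefix in CX_PREFIX:
        atoms = sorted(i for i, g in groups.items() if g == group)
        if atoms:
            parts.append(prefix + ":" + ",".join(str(i) for i in atoms))
    return ",".join(parts)


print(cx_stereo_layer(apply_bare_prefix([6], "rac")))  # r
print(cx_stereo_layer({2: "rel", 3: "rel"}))           # o1:2,3
print(cx_stereo_layer({2: "abs", 3: "rac"}))           # a:2,&1:3
```

The three printed values correspond to the stereo fields in the first three generated outputs below; the `$_AV:...$` atom values are left out of the sketch.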


Here are some generated outputs (running live on CDK depict).


C1(=CC=CC=C1)[C@@H](C)O |$_AV:1;2;3;4;5;6;1;2;O$,r|


CN[C@H]([C@H](O)C1=CC=CC=C1)C |$_AV:1;N;2;1;O;1;2;3;4;5;6;3$,o1:2,3|


CN[C@H]([C@H](O)C1=CC=CC=C1)C |$_AV:1;N;2;1;O;1;2;3;4;5;6;3$,a:2,&1:3|

(1R and S)-1-(1-pentyl-1H-pyrazol-5-yl)ethanol

C(CCCC)N1N=CC=C1[C@@H](C)O |$_AV:1;2;3;4;5;1;2;3;4;5;1;2;O$,r|

…Rac(emic) or Rel(ative). We don't need numbered groups (e.g. &1, &2) because IUPAC doesn't currently have a way to specify this (to my knowledge). SMILES output does what it always did; CXSMILES includes the extra information. SRac/SRel flags could be activated to generate a non-standard InChI.
…the element we can just store RS as R (group=Rac) and S (group=Rac). This also then mirrors the relative case where "R*" was stemmed to "R".
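
The storage scheme in the second commit note can be sketched roughly as follows. This is a hypothetical Python illustration, not the PR's Java code. The commit message is truncated, so treating "RS" as R with group Rac and "SR" as S with group Rac is an assumption, as is the `parse_descriptor` name.

```python
# Illustrative sketch only: hypothetical helper, not the PR's Java code.
def parse_descriptor(token):
    """Split a stereo descriptor into (configuration, stereo group)."""
    if token in ("RS", "SR"):
        # Assumption: keep the first letter, record the group as racemic.
        return token[0], "rac"
    if token in ("R*", "S*"):
        # Mirrors the relative case: "R*" is stemmed to "R".
        return token[0], "rel"
    if token in ("R", "S"):
        return token, "abs"
    raise ValueError("unrecognised stereo descriptor: " + token)


print(parse_descriptor("RS"))  # ('R', 'rac')
print(parse_descriptor("R*"))  # ('R', 'rel')
print(parse_descriptor("S"))   # ('S', 'abs')
```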

dan2097 commented Feb 8, 2021

Thanks, looks good. As discussed, a few small changes would be good:

  • CML/StdInChI writer should have the same behaviour as SMILES. I'm a bit dubious as to whether non-standard InChI should allow it either, as this is only represented as a flag on the entire molecule, while in principle the molecule could have stereogroups with different relationships. A mixture of absolute and relative stereo also can't be expressed.
  • initAll sets all the stereocentres in notExplicitlyDefinedStereoCentreMap, whereas attemptAssignmentOfCisTransRingStereoToFragment, when applying cis/trans, tries to find at least one atom that's not explicitly defined, e.g. "rac-Cis-N4-(2,2-dimethyl-3,4-dihydro-3-oxo-2H-pyrido[3,2-b][1,4]oxazin-6-yl)-N2-[6-[2,6-dimethylmorpholino)pyridin-3-yl]-5-fluoro-2,4-pyrimidinediamine". I'm a bit dubious whether initAll should be removing the stereocenter from notExplicitlyDefinedStereoCentreMap (although attemptAssignmentOfCisTransRingStereoToFragment isn't checking this map anyway; maybe it should).
  • In a small number of names the relative * can be preceded by a redundant ^ e.g. "rac-(3R^,4R^)-3,4-dimethyladipic acid", this should be ignored/removed.
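
For the last point, one possible way to drop the redundant caret is a small pre-processing step, sketched here in Python. This is not OPSIN's code; the function name is hypothetical, and the sketch assumes the descriptors are written as `R^*` / `S^*`, i.e. with the caret sitting directly between the letter and the asterisk.

```python
import re

# Illustrative sketch only: hypothetical helper, not OPSIN's code.
# Matches a caret that follows R or S and directly precedes the relative "*".
_REDUNDANT_CARET = re.compile(r"(?<=[RS])\^(?=\*)")


def strip_redundant_caret(name):
    """Remove the redundant '^' so that 'R^*' is read the same as 'R*'."""
    return _REDUNDANT_CARET.sub("", name)


print(strip_redundant_caret("rac-(3R^*,4R^*)-3,4-dimethyladipic acid"))
# rac-(3R*,4R*)-3,4-dimethyladipic acid
```

Carets elsewhere in a name are left untouched, since the lookarounds only match the caret in that exact position.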

dan2097 merged commit b371404 into dan2097:master Feb 14, 2021