One vs. two word esters #4

Closed
dan2097 opened this issue May 20, 2011 · 6 comments
Labels
enhancement New feature or request major

Comments

@dan2097
Owner

dan2097 commented May 20, 2011

Original report by Steve Chapman (Bitbucket: isomerdesign).


Omitting the space makes a difference:

[9-Hydroxy-6-methyl-3-(5-phenylpentan-2-yl)oxy-5,6,6a,7,8,9,10,10a-octahydrophenanthridin-1-yl]acetate

[9-Hydroxy-6-methyl-3-(5-phenylpentan-2-yl)oxy-5,6,6a,7,8,9,10,10a-octahydrophenanthridin-1-yl] acetate

Simpler cases exhibit the same behaviour: hexylacetate vs hexyl acetate.

@dan2097
Owner Author

dan2097 commented May 20, 2011

Original comment by Daniel Lowe (Bitbucket: dan2097, GitHub: dan2097).


Formally, in IUPAC nomenclature only the space-separated version is an ester.
In the version without a space I do not think there is sufficient information to determine that the ester interpretation was intended.
The absence of a counter ion does make the non-ester suspicious but ultimately if someone wants to talk about an "[9-Hydroxy-6-methyl-3-(5-phenylpentan-2-yl)oxy-5,6,6a,7,8,9,10,10a-octahydrophenanthridin-1-yl]acetate" ion they should be able to.
Hence I'm leaning towards working as intended. There probably is room for improvement in ester names with more than one substituent e.g. "ethyl2-aminoacetate" which clearly was intended to be an ester even though the space is missing.

@dan2097
Owner Author

dan2097 commented May 20, 2011

Original comment by Steve Chapman (Bitbucket: isomerdesign).


I agree with each of your points. The worry in this case, for instance, is that the substance is listed correctly in the Misuse of Drugs Act but incorrectly in the ACMD report that recommended its addition: http://www.homeoffice.gov.uk/publications/alcohol-drugs/drugs/acmd1/acmd-report-agonists?view=Binary, causing confusion.

I suppose what I'd like is a Google-type intervention of the "did you mean finite **state** machine" kind when one mistypes finite **stale** machine, or *some* indication the name is suspect.
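As a rough illustration of such a "did you mean" hint, the sketch below proposes spaced variants of a one-word name that ends in "ate" or "ite". It is not how OPSIN works: the function name `spaced_ester_candidates` and the splitting rule (split after a bracket-balanced prefix ending in "yl") are assumptions made for this example only.

```python
from typing import List

_OPEN = "([{"
_CLOSE = ")]}"


def _balanced(text: str) -> bool:
    """Return True if every bracket in text is properly opened and closed."""
    stack = []
    for ch in text:
        if ch in _OPEN:
            stack.append(ch)
        elif ch in _CLOSE:
            if not stack or _OPEN.index(stack.pop()) != _CLOSE.index(ch):
                return False
    return not stack


def spaced_ester_candidates(name: str) -> List[str]:
    """Suggest two-word (ester) spellings for a one-word 'ate'/'ite' name.

    Hypothetical heuristic: a split point is accepted when the left part
    ends in 'yl' (ignoring trailing closing brackets), the right part starts
    with a letter, digit or opening bracket, and both parts are balanced.
    """
    if " " in name or not name.lower().endswith(("ate", "ite")):
        return []
    candidates = []
    for i in range(1, len(name)):
        left, right = name[:i], name[i:]
        if not left.rstrip(_CLOSE).lower().endswith("yl"):
            continue
        if not (right[0].isalnum() or right[0] in _OPEN):
            continue
        if _balanced(left) and _balanced(right):
            candidates.append(f"{left} {right}")
    return candidates


print(spaced_ester_candidates("hexylacetate"))
```

A caller could show these candidates alongside the parse of the name as typed, leaving the choice to the user rather than silently reinterpreting the input.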

Another concern is the missing locant defaults to 2, e.g. phenyldecanoate = 2-phenyldecanoate. Omitting a locant seems increasingly frowned upon by IUPAC unless there is pretty much no possible ambiguity. Not so here. Consider the difference between 3-hexyl decanoate = hex-3-yl decanoate and 3-hexyldecanoate = 3-(hexyl)decanoate. Even if a missing locant does not defeat the parser, couldn't it whine a little about it?

@dan2097
Owner Author

dan2097 commented May 23, 2011

Original comment by Daniel Lowe (Bitbucket: dan2097, GitHub: dan2097).


Adding detection for ambiguity would be nice, although doing so rigorously is not completely straightforward, e.g. hexyl is not ambiguous even though there are non-equivalent carbons from which a carbon could be removed. If ambiguity detection were to be introduced, I would be keen to keep the number of false positives to an absolute minimum. A charge imbalance could be a good reason to produce a warning (although in some databases such structures do exist), but to actually suggest a cause/solution would require adding a rule to detect this particular problem.
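To make the charge-imbalance idea concrete, here is a minimal sketch that sums the formal charges written in the bracket atoms of a SMILES string and returns a warning when the total is non-zero. The helper names `net_charge` and `charge_warning` are hypothetical, the SMILES strings are hand-written for illustration, and the check assumes charges only ever appear inside bracket atoms, as in standard SMILES.

```python
import re
from typing import Optional

# One bracket atom, e.g. [O-], [NH4+], [Fe+3], [Cu++]
_BRACKET_ATOM = re.compile(r"\[([^\[\]]+)\]")
# Charge inside a bracket atom: a sign plus digits, or a run of identical signs
_CHARGE = re.compile(r"\+\d+|-\d+|\++|-+")


def net_charge(smiles: str) -> int:
    """Sum the formal charges of all bracket atoms in a SMILES string."""
    total = 0
    for atom in _BRACKET_ATOM.findall(smiles):
        atom = atom.split(":")[0]  # drop an atom class such as ":1"
        match = _CHARGE.search(atom)
        if match is None:
            continue
        token = match.group()
        sign = 1 if token[0] == "+" else -1
        digits = token[1:]
        magnitude = int(digits) if digits.isdigit() else len(token)
        total += sign * magnitude
    return total


def charge_warning(smiles: str) -> Optional[str]:
    """Return a warning message if the structure is not charge balanced."""
    charge = net_charge(smiles)
    if charge == 0:
        return None
    return f"structure has net charge {charge:+d}; was a space or counter ion omitted?"


# "hexylacetate" read as a substituted acetate anion (hand-written SMILES)
print(charge_warning("CCCCCCCC(=O)[O-]"))
# "hexyl acetate" read as a neutral ester (hand-written SMILES)
print(charge_warning("CCCCCCOC(C)=O"))
```

Because charged structures legitimately occur in some databases, a message like this would suit a warning channel better than a parse failure.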

While I would be happy to accept contributions to this area of the project I don't think I am going to be able to find the time to look into it personally (my PhD is currently focusing on the automatic extraction of chemical reactions).
I have started looking into the fused ring numbering problem you brought up and will update your original post when/if my generalisation of the code is successful.

@dan2097
Owner Author

dan2097 commented May 26, 2011

Original comment by Steve Chapman (Bitbucket: isomerdesign).


Thank you, Daniel. I agree it's not a pressing issue--I just felt it should be noted, really. The fused ring numbering problem is more important.

@dan2097
Owner Author

dan2097 commented May 26, 2011

Original comment by Daniel Lowe (Bitbucket: dan2097, GitHub: dan2097).


I'm not sure whether or not it's more important, but from a completionist point of view the deficiency in fused ring numbering is very annoying.
The version of fused ring numbering I am playing with currently works with 3-, 4-, 5- and 6-membered rings in all combinations, and with ring sizes >6 involved in 2 or fewer rings. The code for aligning the ring system in the direction with the most rings in a line does not yet seem to work quite right with 5-membered rings. Considering only two different variants of the 5-membered rings may also not be sufficient for systems where the 5-membered ring is not part of the row with the most rings.

@dan2097
Owner Author

dan2097 commented Oct 3, 2011

Original comment by Daniel Lowe (Bitbucket: dan2097, GitHub: dan2097).


I have added heuristics for treating cases where the space is missing as esters. This version is now up on the web service for testing.
The heuristics are:

  • The leftmost substituent in the wordrule must have no locant and the rightmost group must be an "ate" or "ite" group
  • If the leftmost substituent has no locant but other substituents do --> ester
  • If the substituent has the multiplier "mono" --> ester
  • If the parent group has no substitutable positions --> ester
  • If substitution is ambiguous --> ester
  • If the name is of the form alkyl(methanoate|ethanoate|formate|acetate) --> ester

The last rule is required because there is only one possible position for substitution on these structures, so the ambiguity rule never applies to them.

The detection of ambiguity is pretty good although not completely foolproof (due to things like double bonds not having been formally assigned yet, rather than problems with the atom environment perception algorithm). I'm a bit dubious about this heuristic as it can result in different interpretations of otherwise very similar names, e.g. diethylmalonate --> not ester, diethylsuccinate --> ester, but ethylsuccinate --> not ester (as the position for the ethyl is unambiguous).
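The heuristics listed above can be sketched as a single decision function. This is only an illustration of the rule order described in this comment, not OPSIN's actual code: the `Substituent` and `OneWordName` classes, their fields, and the approximation of "alkyl" as any substituent name ending in "yl" are all assumptions, and the ambiguity flag is supplied by hand rather than computed from a structure.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Substituent:
    name: str                         # e.g. "ethyl"
    has_locant: bool = False          # True for "2-amino", False for "ethyl"
    multiplier: Optional[str] = None  # e.g. "mono", "di"


@dataclass
class OneWordName:
    substituents: List[Substituent]  # left to right
    parent: str                      # rightmost group, e.g. "acetate"
    substitutable_positions: int     # positions on the parent open to substitution
    substitution_ambiguous: bool     # hand-supplied here; computed by a real parser


# Parents with only one possible position for substitution
SINGLE_POSITION_PARENTS = {"methanoate", "ethanoate", "formate", "acetate"}


def treat_as_ester(n: OneWordName) -> bool:
    """Decide whether a name written without a space should be read as an ester."""
    if not n.substituents:
        return False
    first = n.substituents[0]
    # Precondition: no locant on the leftmost substituent, "ate"/"ite" parent
    if first.has_locant or not n.parent.endswith(("ate", "ite")):
        return False
    if any(s.has_locant for s in n.substituents[1:]):
        return True
    if first.multiplier == "mono":
        return True
    if n.substitutable_positions == 0:
        return True
    if n.substitution_ambiguous:
        return True
    # Assumption: "alkyl" is approximated as a single substituent ending in "yl"
    return (
        len(n.substituents) == 1
        and first.name.endswith("yl")
        and n.parent in SINGLE_POSITION_PARENTS
    )


diethylmalonate = OneWordName([Substituent("ethyl", multiplier="di")], "malonate", 1, False)
diethylsuccinate = OneWordName([Substituent("ethyl", multiplier="di")], "succinate", 2, True)
print(treat_as_ester(diethylmalonate), treat_as_ester(diethylsuccinate))
```

Encoding the three examples from this comment as inputs reproduces the stated outcomes, which also shows how much the result depends on the quality of the ambiguity flag.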

@dan2097 dan2097 closed this as completed Oct 3, 2011
@dan2097 dan2097 added major enhancement New feature or request labels Apr 5, 2020