The drawbacks of backtracking #2
Comments
Original comment by Daniel Lowe (Bitbucket: dan2097, GitHub: dan2097). The common name is in my opinion formally ambiguous. Unfortunately OPSIN does not currently detect ambiguity in chemical names and instead tends to stop looking for possibilities as soon as it has found a sensible outcome. The heuristic that is employed is to start from the rightmost group in the bracket and work right to left checking whether the group has the desired locant. I have changed this heuristic to first check the adjacent group when the following criteria are satisfied:
In my regression sets, especially a set of polymer names, this change is a uniformly positive improvement, although it only affects names that are formally ambiguous. Thanks for the bug report. The fixed version is up on the web service. Let me know if it doesn't perform as expected. Daniel P.S. OPSIN only actually generates one parse for this name, as there is no ambiguity in tokenizing this name or assigning meaning to the tokens.
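For readers following along, here is a rough Python sketch of the right-to-left locant search described in this comment. It is a toy model, not OPSIN's code: the function name `find_target`, the `prefer_adjacent` flag and the simplified locant sets are all assumptions made for illustration. The actual criteria under which OPSIN checks the adjacent group first are not reproduced here.

```python
from typing import FrozenSet, List, Optional, Tuple

# Hypothetical toy model: a group is (name, locants it can accept).
Group = Tuple[str, FrozenSet[str]]


def find_target(groups: List[Group], locant: str,
                prefer_adjacent: bool = False) -> Optional[str]:
    """Pick the group that a locanted substituent attaches to.

    `groups` holds the groups to the right of the substituent inside the
    bracket, in left-to-right order, so groups[0] is the adjacent group.
    """
    # Assumed stand-in for the changed heuristic: try the adjacent group first.
    if prefer_adjacent and groups and locant in groups[0][1]:
        return groups[0][0]
    # Original heuristic: scan from the rightmost group towards the left
    # and stop at the first group that has the desired locant.
    for name, locants in reversed(groups):
        if locant in locants:
            return name
    return None


# Simplified locant sets (assumptions, not real OPSIN data).
PHENYL: Group = ("phenyl", frozenset(str(i) for i in range(2, 7)))
ACETYL: Group = ("acetyl", frozenset({"2"}))

# "2-methylphenylacetyl": the rightmost group with locant 2 is acetyl.
old_2 = find_target([PHENYL, ACETYL], "2")
# "3-methylphenylacetyl": acetyl has no locant 3, so the search falls back to phenyl.
old_3 = find_target([PHENYL, ACETYL], "3")
# Checking the adjacent group first sends the 2-methyl to phenyl.
new_2 = find_target([PHENYL, ACETYL], "2", prefer_adjacent=True)
```

In this sketch the plain right-to-left scan reproduces the reported asymmetry between the 2-methyl and 3-methyl names, while checking the adjacent group first gives the (2-methylphenyl)acetyl reading.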
Original comment by Daniel Lowe (Bitbucket: dan2097, GitHub: dan2097). 1-pentyl-3-(1-methylphenylacetyl)indole is probably interpreted incorrectly (in that it produces a structure rather than a valency error). OPSIN currently doesn't treat phenyl as being explicitly phen-1-yl, and it really should do.
Original comment by Anonymous. Outstanding! Fastest Debug Ever. Thanks for the explanation as well, Daniel. I had just read your paper and had alternative parses on my mind, but in future I'll stick to reporting the symptoms and leave the diagnosis to you. The problem is solved, but just fyi, here are a few more curious/degenerate cases:
All the best, and props to the OPSIN team for this extraordinarily useful service. /Steve
Original comment by Daniel Lowe (Bitbucket: dan2097, GitHub: dan2097). Yeah, alternative parses are an important part of OPSIN but, as you can see from the pie chart in that paper, multiple parses are fortunately fairly rare. Ideally multiple parses should only happen if there are multiple ways of interpreting something that are either non-trivial or impossible to disambiguate between, e.g. 2-methylthiophenyl, which could be 2-(methylthio)phenyl or 2-methyl-thiophen-yl.
The change which stops phenyl being broken down into phen and yl is now live. Thanks for the feedback
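To make the 2-methylthiophenyl example concrete, here is a small Python sketch that lists every way a name fragment can be split into tokens from a fixed vocabulary. It is only an illustration: the `segmentations` function and the five-entry `VOCAB` are assumptions, the locant is left out, and OPSIN's real tokenizer and grammar are far richer.

```python
from typing import List, Sequence


def segmentations(name: str, vocab: Sequence[str]) -> List[List[str]]:
    """Return every way to split `name` into consecutive tokens from `vocab`."""
    if not name:
        return [[]]  # exactly one way to split the empty string
    results: List[List[str]] = []
    for token in vocab:
        if name.startswith(token):
            # Split the remainder recursively and prepend this token.
            for rest in segmentations(name[len(token):], vocab):
                results.append([token] + rest)
    return results


# Toy vocabulary (an assumption). "phen" is deliberately absent, which mirrors
# the idea of no longer breaking phenyl down into phen and yl.
VOCAB = ["methyl", "thio", "thiophen", "phenyl", "yl"]

parses = segmentations("methylthiophenyl", VOCAB)
for p in parses:
    print("-".join(p))
```

With this vocabulary the fragment has two tokenizations, matching the two readings named above. Adding "phen" to the vocabulary introduces a third, methyl-thio-phen-yl.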
Original report by Steve Chapman (Bitbucket: isomerdesign).
Synthetic cannabinoid JWH-251 is commonly named "1-pentyl-3-(2-methylphenylacetyl)indole" which is parsed as "1-pentyl-3-(2-methyl-2-phenylacetyl)indole" rather than "1-pentyl-3-[(2-methylphenyl)acetyl]indole." The latter name evokes the correct depiction of JWH-251. (Credit Lee Fadness for this discovery)
However, "1-pentyl-3-(3-methylphenylacetyl)indole" is parsed as "1-pentyl-3-[(3-methylphenyl)acetyl]indole," presumably because the parse that worked for the prior name fails for this one. Pragmatic but inconsistent.
Also, this variant parses correctly and consistently regardless of the locant: "1-pentyl-3-(2-methylphenacyl)indole".