CxSMILES compatibility #7
Comments
What do you think is wrong with the first one?
|
Nothing - it's the only one in the set that works :-) |
Looking much better now - here's one that's totally broken after removing the lp part: Original: Hand edited: |
So ... getting there ... here's what it looks like in development (trying out a few different options) vs Depict now - I'm personally not a big fan of this third representation, but these are the only examples that still don't look quite the same (ignoring the shading for the moment - although in the 3rd case the missing shading means loss of information). Some SMILES:
|
"still don't look quite the same"? That's not the goal. My goal was to get them looking like they do in patents. All the information is captured, so it can always be rendered manually if desired. I don't like the shading, as it's not clear when the shaded regions overlap:
CAS have a better semantic depiction, but again, in the same way that we don't draw the electrons, the goal is not to draw exactly what is semantically captured underneath.
|
Not in CDK's handling, and I don't think in CXSMILES either. You can do constraints on R groups (R group logic), but I'm not sure about the repeat variation. It completely makes sense though, and you'd want it for a general Markush match. |
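Let's remove the junk first:
NC1=CC=C(O)C2=C1C(=O)C1=C(O)C=CC(N)=C1C2=O.Br* |m:21:13.14|
When it crosses a single bond it doesn't seem to do the check for which side to put it on:
NC1=CC=C(O)C2=C1C(=O)C1=C(O)C=CC(N)=C1C2=O.Br* |m:21:13.14.15|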
|
Not just the bond order? This also ends up on the wrong side:
NC1=CC=C(O)C2=C1C(=O)C1=C(O)C=CC(N)=C1C2=O.Br* |m:21:13.14.15.16|
(debugging on my phone so it's just a wild guess - which side is the issue? ChemAxon or CDK?)
On Wed, Jan 10, 2018 at 5:31 PM +0100, "John Mayfield" <notifications@github.com> wrote:
Let's remove the junk first:
NC1=CC=C(O)C2=C1C(=O)C1=C(O)C=CC(N)=C1C2=O.Br* |m:21:13.14|
when it crossed a single bond it doesn't seem to do the check for which side to put it:
NC1=CC=C(O)C2=C1C(=O)C1=C(O)C=CC(N)=C1C2=O.Br* |m:21:13.14.15|
|
CDK issue |
I believe I've come up with a fix for the sidedness of the positional variation. The algorithm currently looks at the atoms where the attachment can go and computes the centre from them. When there is a single bond there is no sidedness, and so the choice is arbitrary. You can see the 'centre of mass' marked in the following examples with an 'X'. I used the existing APIs to add the colors so we can see where it's attaching. |13.14| |13.14.15| |13.14.15.16| |
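A minimal sketch of the sidedness argument, written in Python rather than CDK's Java. The 2D coordinates and the helper names `centroid` and `side_of_bond` are hypothetical and are not CDK APIs. It only illustrates why the centre of two attachment atoms gives no preferred side, while a third atom does.

```python
def centroid(points):
    """Mean position of a list of (x, y) atom coordinates."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def side_of_bond(a, b, p):
    """2D cross product of (b - a) and (p - a).

    Positive and negative values mean opposite sides of the bond a-b;
    zero means p lies on the line through the bond.
    """
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


# Hypothetical layout coordinates for three candidate attachment atoms.
atom_a = (0.0, 0.0)
atom_b = (1.0, 0.0)
atom_c = (1.5, 0.866)

# Two atoms: the centre is the bond midpoint, so the side is undefined.
two_atom_side = side_of_bond(atom_a, atom_b, centroid([atom_a, atom_b]))  # 0.0

# Three atoms: the centre is pulled off the bond, so a side can be chosen.
three_atom_side = side_of_bond(atom_a, atom_b, centroid([atom_a, atom_b, atom_c]))
```

With only the two atoms of a single bond the value is exactly zero, which matches the "arbitrary" case described above; any tie-breaking rule for that case would be an extra design decision.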
I'm adding comments to this as we have a new case: which does not work in CDK Depict. I am also not yet a big fan of this style of representing the problem.
What are your thoughts/suggestions? Thanks! |
Yes, R groups are not supported. I have some internal code at NextMove that does handle them, but the issue is that the CDK has an explicit type (RGroupQuery) for handling this. Trying to convert the ChemAxon definitions into this is tricky as the concepts don't match exactly (RGroupQuery doesn't store attachments explicitly, for example), hence for my internal code I just store them on a property of a molecule. |
[H]OCCO.C* |m:6:3.2,Sg:n:1,2,3::ht| looks better |
I think this is possibly a bug in the ChemAxon export. Since the positional variation is part of the repeat, the whole thing needs to be included in the repeat, not just one of the atoms. Of course this depends on how you represent the attachment, but since it is present as a real node in the SMILES
I was talking to Greg about this at the ICCS, CXSMILES really is poorly designed. It can't for example differentiate spiro vs linear repeats because it only stores the atoms and not the crossing bonds: |
As far as I can tell there is one still broken - will open a separate issue for that. |
We're trying to get our CxSMILES compatible between ChemAxon and CDK, testing visually with CDK Depict. We've run into this - the top example works, but no others (of 27 we've tried) do. Below a selection of the smallest. All 26 SMILES that fail have this "lp" field in the extended SMILES bit, the one that works doesn't. If I select "title" the extended part is just printed out as the title, i.e. it's not recognising it somehow.
C*.CC1=CC=CC=C1 |c:4,6,t:2,m:1:4.5.6|
C*.S=C1NC2=CC=CC=C2N1 |c:6,8,t:4,lp:2:2,4:1,11:1,m:1:8.9|
CCC1=CC=CC=C1.Br*.Br* |c:4,6,t:2,lp:8:3,10:3,m:9:3.4,11:4.5.6.7|
Cl*.Cl*.ClCC1=CC=CC=C1 |c:6,8,t:4,lp:0:3,2:3,4:3,m:1:7.8,3:8.9.10.11|
*C=O.C1CC2C3CCC(C3)C2C1 |lp:2:2,m:0:3.4.5.6.7.8.9.10.11.12|
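To make the structure of the extended part easier to inspect, here is a rough Python sketch that splits lines like the ones above into the SMILES and its `|...|` extension, groups the comma-separated tokens by their leading label (`c`, `t`, `lp`, `m`), and reads the `m:` entries as positional variation. The function names are hypothetical, and this is neither the ChemAxon nor the CDK parser. It assumes that only tokens starting with a letter label open a new field, which holds for the examples in this issue but not for every CXSMILES feature.

```python
import re

# A token such as "lp:8:3" opens a new field; "10:3" continues the previous one.
FIELD_START = re.compile(r"^([A-Za-z]+):(.*)$")


def split_cxsmiles(line):
    """Split 'SMILES |ext|' into (smiles, ext); ext is '' when absent."""
    smiles, _, rest = line.strip().partition(" |")
    return smiles, rest.rstrip("|")


def group_fields(ext):
    """Group comma-separated tokens under the label that opened them."""
    fields = {}
    current = None
    for token in ext.split(","):
        match = FIELD_START.match(token)
        if match:
            current = match.group(1)
            fields.setdefault(current, []).append(match.group(2))
        elif current is not None:
            fields[current].append(token)
    return fields


def positional_variation(fields):
    """Map each 'm:' atom index to the atom indices it may attach to."""
    result = {}
    for entry in fields.get("m", []):
        atom, _, targets = entry.partition(":")
        result[int(atom)] = [int(t) for t in targets.split(".")]
    return result


smiles, ext = split_cxsmiles(
    "CCC1=CC=CC=C1.Br*.Br* |c:4,6,t:2,lp:8:3,10:3,m:9:3.4,11:4.5.6.7|"
)
fields = group_fields(ext)
variation = positional_variation(fields)
```

Grouping by label first keeps the `lp` entries in their own bucket, so they can be inspected or dropped without disturbing the `m:` entries that follow them.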