Byproduct generated by multiple R groups SMIRKS pattern #1077

Closed
biotech7 opened this issue May 16, 2024 · 8 comments

biotech7 commented May 16, 2024

Hi,
For the Sharpless allylic amination reaction, I compiled a SMIRKS pattern with multiple R groups, as follows:
[#6:3]-[#6:2]=[#6:1].[$([#6]-c1ccc(cc1)S(=O)(=O)[#7:6]=[Se:5]=[#7:4]S(=O)(=O)c1ccc(-[#6])cc1),$([#6]C([#6])([#6])[#7:6]=[Se:5]=[#7:4]C([#6])([#6])[#6])]>>[*:7]-[#7H:4]-[#6H0,#6H1:3]-[#6:2]=[#6:1]
Reaction scheme pattern: [image: 1715843799047]

Test reaction SMILES: C=C1CCCC1.N(S(C1C=CC(C)=CC=1)(=O)=O)=[Se]=NS(C1C=CC(C)=CC=1)(=O)=O
Result: C=C1[CH2](CCC1)N*.N(S(c1cc[c]cc1)(=O)=O)=[Se]=NS(c2ccc(C)cc2)(=O)=O
Running Smirks.apply(mol) generates a byproduct (or unreasonable product) such as: N(S(c1cc[c]cc1)(=O)=O)=[Se]=NS(c2ccc(C)cc2)(=O)=O

Is this due to a faulty SMIRKS pattern, or is it a bug?

CDK: v2.10 snapshot
Java: JDK 17.0.7
IDE: IntelliJ IDEA 2024.1

johnmay commented May 19, 2024

In SMIRKS, if you don't remove something in the pattern, it doesn't get removed. However, there is an option which I think does what you want:

SmirksOption.REMOVE_UNMAPPED_FRAGMENTS

https://github.com/cdk/cdk/blob/main/tool/smarts/src/main/java/org/openscience/cdk/smirks/SmirksOption.java#L75

This option is on if you switch to the "RDKit" flavour, but that flavour will also update your valences automatically, which can be problematic.

biotech7 commented

OK, I'll verify it later. In organic synthesis practice, unreasonable products in a reaction scheme can lead to significant misunderstandings.

johnmay commented May 20, 2024

SMIRKS is like a programming language: doing things you didn't ask for is dangerous. But yes, it is a common requirement that users want to write loose patterns that will then remove the parts they didn't care about.

biotech7 commented

In a production environment, converting some reactions to a proper SMIRKS pattern proves particularly challenging, e.g. the Shiina macrolactonization reaction. The reaction scheme is partially as follows:
[image: 1]
and its pattern is:
[$([#8:1]-[#6:2]-[#6]-[#6:3](-[#8:5])=[O:4]),$([#8:1]-[#6:2]-[#6]-[#6]-[#6:3](-[#8:5])=[O:4]),$([#8:1]-[#6:2]-[#6]-[#6]-[#6]-[#6:3](-[#8:5])=[O:4]),$([#8:1]-[#6:2]-[#6]-[#6]-[#6]-[#6]-[#6:3](-[#8:5])=[O:4]),$([#8:1]-[#6:2]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6:3](-[#8:5])=[O:4]),$([#8:1]-[#6:2]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6:3](-[#8:5])=[O:4]),$([#8:1]-[#6:2]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6:3](-[#8:5])=[O:4]),$([#8:1]-[#6:2]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6:3](-[#8:5])=[O:4]),$([#8:1]-[#6:2]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6:3](-[#8:5])=[O:4]),$([#8:1]-[#6:2]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6:3](-[#8:5])=[O:4]),$([#8:1]-[#6:2]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6:3](-[#8:5])=[O:4]),$([#8:1]-[#6:2]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6:3](-[#8:5])=[O:4]),$([#8:1]-[#6:2]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6:3](-[#8:5])=[O:4]),$([#8:1]-[#6:2]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6:3](-[#8:5])=[O:4]),$([#8:1]-[#6:2]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6:3](-[#8:5])=[O:4]),$([#8:1]-[#6:2]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6:3](-[#8:5])=[O:4]),$([#8:1]-[#6:2]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6:3](-[#8:5])=[O:4]),$([#8:1]-[#6:2]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6:3](-[#8:5])=[O:4]),$([#8:1]-[#6:2]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6:3](-[#8:5])=[O:4]),$([#8:1]-[#6:2]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6:3](-[#8:5])=[O:4]),$([#8:1]-[#6:2]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6:3](-[#8:5])=[O:4]),$([#8:1]-[#6:2]-[#6]-[#6]
-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6:3](-[#8:5])=[O:4]),$([#8:1]-[#6:2]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6:3](-[#8:5])=[O:4]),$([#8:1]-[#6:2]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6:3](-[#8:5])=[O:4]),$([#8:1]-[#6:2]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6:3](-[#8:5])=[O:4]),$([#8:1]-[#6:2]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6:3](-[#8:5])=[O:4]),$([#8:1]-[#6:2]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6:3](-[#8:5])=[O:4]),$([#8:1]-[#6:2]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6:3](-[#8:5])=[O:4]),$([#8:1]-[#6:2]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6:3](-[#8:5])=[O:4]),$([#8:1]-[#6:2]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6:3](-[#8:5])=[O:4])]>>[$([O:4]=[#6:3]-1-[#6]-[#6:2]-[#8:1]-1),$([O:4]=[#6:3]-1-[#6]-[#6]-[#6:2]-[#8:1]-1),$([O:4]=[#6:3]-1-[#6]-[#6]-[#6]-[#6:2]-[#8:1]-1),$([O:4]=[#6:3]-1-[#6]-[#6]-[#6]-[#6]-[#6:2]-[#8:1]-1),$([O:4]=[#6:3]-1-[#6]-[#6]-[#6]-[#6]-[#6]-[#6:2]-[#8:1]-1),$([O:4]=[#6:3]-1-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6:2]-[#8:1]-1),$([O:4]=[#6:3]-1-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6:2]-[#8:1]-1),$([O:4]=[#6:3]-1-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6:2]-[#8:1]-1),$([O:4]=[#6:3]-1-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#
6]-[#6]-[#6]-[#6:2]-[#8:1]-1),$([O:4]=[#6:3]-1-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6:2]-[#8:1]-1),$([O:4]=[#6:3]-1-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6:2]-[#8:1]-1),$([O:4]=[#6:3]-1-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6:2]-[#8:1]-1),$([O:4]=[#6:3]-1-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6:2]-[#8:1]-1),$([O:4]=[#6:3]-1-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6:2]-[#8:1]-1),$([O:4]=[#6:3]-1-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6:2]-[#8:1]-1),$([O:4]=[#6:3]-1-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6:2]-[#8:1]-1),$([O:4]=[#6:3]-1-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6:2]-[#8:1]-1),$([O:4]=[#6:3]-1-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6:2]-[#8:1]-1),$([O:4]=[#6:3]-1-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6:2]-[#8:1]-1),$([O:4]=[#6:3]-1-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6:2]-[#8:1]-1),$([O:4]=[#6:3]-1-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6:2]-[#8:1]-1),$([O:4]=[#6:3]-1-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6:2]-[#8:1]-1),$([O:4]=[#6:3]-1-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6:2]-[#8:1]-1),$([O:4]=[#6:3]-1-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6:2]-[#8:1]-1),$([O:4]=[#6:3]-1-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6:2]-[#8:1]-1),$([O:4]=[#6:3]-1-[#6]-[#6]
-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6:2]-[#8:1]-1),$([O:4]=[#6:3]-1-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6:2]-[#8:1]-1),$([O:4]=[#6:3]-1-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6:2]-[#8:1]-1),$([O:4]=[#6:3]-1-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6:2]-[#8:1]-1),$([O:4]=[#6:3]-1-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6:2]-[#8H0:1]-1)]
but it generates an unreasonable product for the reactant OC(CCCCCCCCCCCCCCO)=O.
Could the failure of the transformation be attributed to the unmapped atoms on the right-hand side (RHS)?

johnmay commented May 21, 2024

That is completely invalid as a transform. You generated it with ChemAxon Marvin, and there is no linkage between the left and right sides because the link atoms are handled with recursive SMARTS. This is a perfectly reasonable reaction SMARTS for matching substructures, but you can't use it as a transform (SMIRKS). The two are often confused; see also: https://efficientbits.blogspot.com/2018/04/rdkit-reaction-smarts.html.

To simplify why it doesn't work: what you intended was:

(L2>>L2) or (L3>>L3) or (L4>>L4) etc

but you got:

(L2 or L3 or L4)>>(L2 or L3 or L4)

There is nothing linking the left to the right, so if you match L3 it doesn't know to add on L3. In general, recursive definitions in the product are ignored. You need to split this out into separate patterns and, for good measure, make sure all atoms are mapped; otherwise it will delete them and add them back in.

[#8:1]-[#6:2]-[#6:3]-[#6:4](-[#8:5])=[O:6]>>[O:6]=[#6:4]-1-[#6:3]-[#6:2]-[#8:1]-1 L1
[#8:1]-[#6:2]-[#6:3]-[#6:4]-[#6:5](-[#8:6])=[O:7]>>[O:7]=[#6:5]-1-[#6:4]-[#6:3]-[#6:2]-[#8:1]-1 L2
[#8:1]-[#6:2]-[#6:3]-[#6:4]-[#6:5]-[#6:6](-[#8:7])=[O:8]>>[O:8]=[#6:6]-1-[#6:5]-[#6:4]-[#6:3]-[#6:2]-[#8:1]-1 L3
...

Make sense?
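
Since the per-ring-size patterns listed above follow a regular numbering, they could be generated rather than typed by hand. The following is only a sketch under that assumption: the class `LactoneSmirks` and the method `forLinkers` are hypothetical names, nothing here is part of CDK, and the strings it emits copy the atom typing of the patterns above without adding any valence or hydrogen-count constraints.

```java
/** Hypothetical helper: emits one fully mapped SMIRKS string per linker count. */
final class LactoneSmirks {

    /**
     * @param linkers number of carbons between the hydroxyl-bearing carbon (:2)
     *                and the carboxyl carbon; 1 reproduces the pattern labelled L1
     */
    static String forLinkers(int linkers) {
        if (linkers < 1) {
            throw new IllegalArgumentException("linkers must be >= 1");
        }
        int carbonyl = 3 + linkers;    // map number of the carboxyl carbon
        int leavingO = carbonyl + 1;   // acid hydroxyl oxygen, absent from the product
        int carbonylO = carbonyl + 2;  // carbonyl oxygen, kept in the product

        // Left side: open-chain hydroxy acid, every atom mapped.
        StringBuilder lhs = new StringBuilder("[#8:1]-[#6:2]");
        for (int i = 3; i < carbonyl; i++) {
            lhs.append("-[#6:").append(i).append(']');
        }
        lhs.append("-[#6:").append(carbonyl).append(']')
           .append("(-[#8:").append(leavingO).append("])")
           .append("=[O:").append(carbonylO).append(']');

        // Right side: ring closed between the carboxyl carbon and oxygen :1.
        StringBuilder rhs = new StringBuilder();
        rhs.append("[O:").append(carbonylO).append("]=[#6:").append(carbonyl).append("]-1");
        for (int i = carbonyl - 1; i >= 2; i--) {
            rhs.append("-[#6:").append(i).append(']');
        }
        rhs.append("-[#8:1]-1");

        return lhs + ">>" + rhs;
    }

    public static void main(String[] args) {
        // Print the first three patterns for comparison with L1 to L3 above.
        for (int n = 1; n <= 3; n++) {
            System.out.println(forLinkers(n));
        }
    }
}
```

Each generated string would then be parsed and applied as its own transform, so that a match on one ring size can only produce the product for that ring size.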

biotech7 commented

Got it. Separate link nodes and a mapped RHS are both necessary.
For the Shiina macrolactonization reaction, the number of link nodes and the atom types are both arbitrary, which makes it harder to compile a universal pattern for the intramolecular reaction. So, is there any simple algorithm or strategy for compiling this kind of multi-pattern intramolecular reaction?
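
The two requirements summarised above can be checked roughly before a pattern is handed to a SMIRKS parser. The sketch below is a purely lexical check written against the JDK only. `SmirksMapCheck` and its methods are hypothetical names, it does not parse SMARTS, and it only sees bracket atoms, so bare atoms such as `C` or `c` go unnoticed. A map number that appears only on the product side is reported as a warning, although that can be intentional when a transform adds atoms.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical lexical sanity check for atom maps in a SMIRKS string. */
final class SmirksMapCheck {
    // Innermost bracket atoms, e.g. [#6:3] or [O;H1+0,H0-:1].
    private static final Pattern ATOM = Pattern.compile("\\[[^\\[\\]]*\\]");
    // Trailing atom map, e.g. the ":3" in [#6:3].
    private static final Pattern MAP = Pattern.compile(":(\\d+)\\]$");

    /** Collects the atom map numbers used on one side of the reaction. */
    static Set<Integer> maps(String side) {
        Set<Integer> found = new TreeSet<>();
        Matcher atom = ATOM.matcher(side);
        while (atom.find()) {
            Matcher map = MAP.matcher(atom.group());
            if (map.find()) {
                found.add(Integer.parseInt(map.group(1)));
            }
        }
        return found;
    }

    /** Counts bracket atoms that carry no atom map. */
    static int unmapped(String side) {
        int count = 0;
        Matcher atom = ATOM.matcher(side);
        while (atom.find()) {
            if (!MAP.matcher(atom.group()).find()) {
                count++;
            }
        }
        return count;
    }

    /** Returns human-readable warnings; an empty list means nothing was flagged. */
    static List<String> check(String smirks) {
        List<String> warnings = new ArrayList<>();
        String[] sides = smirks.split(">>", -1);
        if (sides.length != 2) {
            warnings.add("expected exactly one '>>'");
            return warnings;
        }
        if (sides[1].contains("$(")) {
            warnings.add("recursive SMARTS on the product side");
        }
        if (unmapped(sides[0]) > 0) {
            warnings.add(unmapped(sides[0]) + " unmapped bracket atom(s) on the reactant side");
        }
        if (unmapped(sides[1]) > 0) {
            warnings.add(unmapped(sides[1]) + " unmapped bracket atom(s) on the product side");
        }
        Set<Integer> onlyRight = new TreeSet<>(maps(sides[1]));
        onlyRight.removeAll(maps(sides[0]));
        if (!onlyRight.isEmpty()) {
            warnings.add("map number(s) " + onlyRight + " appear only on the product side");
        }
        return warnings;
    }
}
```

Calling `SmirksMapCheck.check(...)` on a fully mapped, non-recursive pattern should return an empty list, while a pattern built from `$(...)` alternatives on both sides should be flagged.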

johnmay commented May 22, 2024

You should use more of the expressions (v/D/X/H) to constrain the pattern; then you can write a generalised one like this. Any atom which isn't present in the pattern is left alone.

You should also be modifying the hydrogen counts on oxygen, no?

Something like this perhaps:

([O;H1+0,H0-:1][Cv4H2+0:2].[Cv4H2+0:3][Cv4H0+0:4]([O;H1+0,H0-:5])=[OD1H0+0:6])>>[O:6]=[C:4]([OH0+0:1][C:2])[C:3]

biotech7 commented

Thanks, John. Summarizing a single algorithm for intramolecular reactions may be a challenging job.
