Making a template library
Internal documentation - Non-public API
This page briefly shows how to create ring templates for use with the StructureDiagramGenerator. When a molecule is laid out, its main ring system is first checked against a template library using an identity match. The identity match is very fast, allowing a large template library to be searched quickly. The primary use is to lay out rings in their de facto orientations.
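One way to picture why an identity match can stay fast as the library grows: if every template is stored under a canonical string for its ring system, a match is a single keyed lookup rather than a scan over all templates. The sketch below is only an illustration of that idea. The class and method names are hypothetical, and it is an assumption, not a description of how the CDK implements its library.

```python
from collections import defaultdict


class IdentityLookup:
    """Hypothetical stand-in for an identity-match template store."""

    def __init__(self):
        # canonical key -> list of coordinate sets (one key may have several layouts)
        self._templates = defaultdict(list)

    def add(self, canonical_key, coords):
        """Register one layout (a list of (x, y) pairs) under its canonical key."""
        self._templates[canonical_key].append(coords)

    def lookup(self, canonical_key):
        """Return all layouts stored under exactly this key, or an empty list."""
        return self._templates.get(canonical_key, [])


library = IdentityLookup()
# Key and coordinates taken from the five-membered ring entry in the
# example library listing on this page.
library.add(
    "[C]1[N][N][N][N]1",
    [(-1.214, 0.394), (-0.750, -1.032), (0.750, -1.032), (1.214, 0.394), (0.000, 1.276)],
)

hits = library.lookup("[C]1[N][N][N][N]1")
misses = library.lookup("[C]1[C][C][C][C]1")
print(len(hits), len(misses))
```

The lookup only succeeds when the key is identical, which is why the key must be generated the same way when the library is built and when it is queried.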
In the cdk-build-util project there are two runnable classes that assist in creating the template library.
# ExtractTemplates
mvn exec:java -Dexec.mainClass="org.openscience.cdk.layout.ExtractTemplates" -Dexec.classpathScope=compile
# SDfileToTemplateLibrary
mvn exec:java -Dexec.mainClass="org.openscience.cdk.layout.SDfileToTemplateLibrary" -Dexec.classpathScope=compile
The first class, ExtractTemplates, takes an input SDfile and strips molecules back to their core ring structure. Each ring extracted from a molecule is stored at three levels. These fragments are similar to those described by [Bemis G and Murcko M. (1996)](http://www.ncbi.nlm.nih.gov/pubmed/8709122) (see MurckoFragmenter), but their usage is closer to the templating of peeled rings described by Clark A (2005). The inclusion in the CDK was prompted by discussions with Roger Sayle.
- 1 - skeleton with stubs (substituents)
- 2 - skeleton
- 3 - anonymous
For example, given caffeine:
The skeleton with stubs entry is:
The skeleton entry is:
The anonymous entry is:
Note that the implicit hydrogen count is set to 0 for all atoms, and that in the skeleton with stubs depiction the methyls and carbonyls have been modified.
Each molecule in the input with exactly one ring system is processed like this. Each level is indexed, sorted by frequency, and written to an SDfile. The name of the output SDfile is derived from the input name.
mvn exec:java \
  -Dexec.mainClass="org.openscience.cdk.layout.ExtractTemplates" \
  -Dexec.classpathScope=compile \
  -Dexec.args=~/chebi-118.sdf
If the input is ~/chebi-118.sdf, the output will be ~/chebi-118-templates.sdf.
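The "indexed, sorted by frequency" step described above can be pictured with a small standalone sketch. It assumes each extracted fragment has already been reduced to a string key; the keys and the function name are made up for illustration, and the real tool works on CDK molecule objects rather than plain strings.

```python
from collections import Counter


def rank_by_frequency(keys):
    """Return (key, count) pairs, most frequent first."""
    return Counter(keys).most_common()


# Hypothetical keys, as if three input molecules had been reduced to
# their anonymous ring skeletons.
observed = [
    "[C]1[C][C][C][C][C]1",
    "[C]1[N][N][N][N]1",
    "[C]1[C][C][C][C][C]1",
]

ranked = rank_by_frequency(observed)
for key, count in ranked:
    print(count, key)
```

Ranking by frequency puts the most common ring systems at the top of the output, which is convenient when reviewing the templates by hand.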
This file should be checked to confirm that the templates are acceptable, and some may need reorienting. Viewed in MarvinView, it is clear that the atoms have abnormal valence. This is expected by the IdentityTemplateLibrary and should not be fixed.
The SDfile created in the previous step (or created manually) is convenient to manipulate but inefficient for loading and querying. To create a lightweight SMILES file that the IdentityTemplateLibrary can load, use SDfileToTemplateLibrary.
mvn exec:java \
  -Dexec.mainClass="org.openscience.cdk.layout.SDfileToTemplateLibrary" \
  -Dexec.classpathScope=compile \
  -Dexec.args="~/chebi-118-templates.sdf ~/chebi-118-templates.smi"
Each line of ~/chebi-118-templates.smi contains the canonical SMILES followed by the X and Y coordinates of each atom, listed in the same order as the atoms in the SMILES string.
[C]1[C][C][C]2O[C]3[C][C][C][C][C]3[N][C]2[C]1 3.897, -0.750, 3.897, 0.750, 2.598, 1.500, 1.299, 0.750, -0.000, 1.500, -1.299, 0.750, -2.598, 1.500, -3.897, 0.750, -3.897, -0.750, -2.598, -1.500, -1.299, -0.750, -0.000, -1.500, 1.299, -0.750, 2.598, -1.500
[C][C]1[C]([O])[C]2[C]3[C]4[C][N][C]5[C][C][C][C]([C][C]3C([C])([C])N2[C]1[O])[C]54 4.932, -2.304, 3.743, -1.400, 2.316, -1.864, 1.862, -3.290, 1.435, -0.650, 0.000, -0.191, -1.317, -0.944, -1.737, -2.392, -3.237, -2.441, -3.745, -1.027, -5.094, -0.371, -5.202, 1.126, -3.956, 1.961, -2.608, 1.301, -1.317, 2.048, 0.003, 1.307, 1.442, 1.773, 2.681, 2.628, 1.341, 3.274, 2.316, 0.563, 3.743, 0.100, 4.956, 0.957, -2.557, -0.164
[C]1[C][C][C]2[C]N3[C][C][C]4[C]5[C][C][C][C][C]5[N][C]4[C]3[C][C]2[C]1 5.719, 0.239, 5.719, 1.739, 4.420, 2.489, 3.121, 1.739, 1.822, 2.489, 0.523, 1.739, -0.776, 2.489, -2.075, 1.739, -2.075, 0.239, -3.189, -0.765, -4.681, -0.608, -5.563, -1.821, -4.953, -3.192, -3.461, -3.348, -2.579, -2.135, -1.087, -1.978, -0.776, -0.511, 0.523, 0.239, 1.822, -0.511, 3.121, 0.239, 4.420, -0.511
[C]1[C][C][C]2[C][C]3[C][C]4[C]([C][C][C]5[C][C]6[C][C][C][C][C]6[C][C]54)[C][C]3[C][C]2[C]1 -7.095, -1.962, -7.095, -0.462, -5.796, 0.288, -4.497, -0.462, -3.198, 0.288, -1.899, -0.462, -0.600, 0.288, 0.699, -0.462, 0.699, -1.962, 1.999, -2.712, 3.298, -1.962, 3.298, -0.462, 4.597, 0.288, 4.597, 1.788, 5.896, 2.538, 5.896, 4.038, 4.597, 4.788, 3.298, 4.038, 3.298, 2.538, 1.999, 1.788, 1.999, 0.288, -0.600, -2.712, -1.899, -1.962, -3.198, -2.712, -4.497, -1.962, -5.796, -2.712
[C]1[C][C]2[C][C][C]3[C][C][C]4[C][C][C]5[C][C][C]6[C][C][C]1[C]7[C]2[C]3[C]4[C]5[C]67 3.897, -0.750, 3.897, 0.750, 2.598, 1.500, 2.598, 3.000, 1.299, 3.750, -0.000, 3.000, -1.299, 3.750, -2.598, 3.000, -2.598, 1.500, -3.897, 0.750, -3.897, -0.750, -2.598, -1.500, -2.598, -3.000, -1.299, -3.750, -0.000, -3.000, 1.299, -3.750, 2.598, -3.000, 2.598, -1.500, 1.299, -0.750, 1.299, 0.750, -0.000, 1.500, -1.299, 0.750, -1.299, -0.750, -0.000, -1.500
[C]1[N][N][N][N]1 -1.214, 0.394, -0.750, -1.032, 0.750, -1.032, 1.214, 0.394, 0.000, 1.276
[O][C]1[C]2[C][C][C][C][C]2[C]3[C][C][C][C][C]13 -0.002, -3.116, -0.008, -1.608, -1.218, -0.723, -2.686, -1.026, -3.690, 0.101, -3.215, 1.527, -1.733, 1.832, -0.757, 0.701, 0.749, 0.701, 1.759, 1.810, 3.224, 1.493, 3.682, 0.073, 2.678, -1.038, 1.216, -0.729
[C]1[C][C][C]2[C]([C]1)[C]O[C][C]3[C]4[C][C][C][C]4[C][C][C]23 -4.578, -0.809, -4.136, 0.625, -2.674, 0.959, -1.653, -0.141, -2.096, -1.574, -3.558, -1.908, -1.250, -2.814, 0.245, -2.926, 1.266, -1.826, 1.042, -0.343, 2.341, 0.407, 3.768, -0.056, 4.649, 1.157, 3.768, 2.371, 2.341, 1.907, 1.042, 2.657, -0.257, 1.907, -0.257, 0.407
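To make the line format concrete, here is a minimal sketch of reading one such line outside the CDK. It assumes only what the listing above shows: the SMILES is the first space-delimited token, and the remainder is a comma-separated list of alternating X and Y values. The function name is hypothetical and this is not the loader used by the IdentityTemplateLibrary.

```python
def parse_template_line(line):
    """Split one library line into (smiles, [(x, y), ...])."""
    # The SMILES contains no spaces, so everything after the first space
    # is the coordinate list.
    smiles, _, rest = line.strip().partition(" ")
    values = [float(v) for v in rest.split(",")]
    if len(values) % 2 != 0:
        raise ValueError("expected an even number of coordinate values")
    # Pair up alternating X and Y values, one pair per atom.
    coords = list(zip(values[0::2], values[1::2]))
    return smiles, coords


line = (
    "[C]1[N][N][N][N]1 -1.214, 0.394, -0.750, -1.032, "
    "0.750, -1.032, 1.214, 0.394, 0.000, 1.276"
)
smiles, coords = parse_template_line(line)
print(smiles)
print(len(coords), "atoms; first atom at", coords[0])
```

For the five-membered ring entry, the ten values pair up into five coordinates, one for each atom in the order they appear in the SMILES string.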