Making a template library

johnmay edited this page Sep 10, 2014 · 9 revisions

Internal documentation - Non-public API

This page briefly shows how to create ring templates for use with the StructureDiagramGenerator. When a molecule is laid out, its main ring system is first checked against a template library using an identity match. The identity match is very fast, allowing a large template library to be searched quickly. The primary use is to lay out rings in their de facto orientations.
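To illustrate why an identity match scales to a large library, here is a minimal sketch of an exact-key lookup. It is not the IdentityTemplateLibrary implementation: the class name `IdentityLookupSketch` and its methods are hypothetical, and it assumes each ring system has already been reduced to a canonical string key (such as the canonical SMILES used in the template file described later on this page).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch, not CDK API: templates indexed by an exact string key. */
final class IdentityLookupSketch {

    // key -> every coordinate set stored for that key; coords[i] = {x, y}
    private final Map<String, List<double[][]>> templates = new HashMap<>();

    /** Store one set of 2D coordinates under a canonical key. */
    void add(String key, double[][] coords) {
        templates.computeIfAbsent(key, k -> new ArrayList<>()).add(coords);
    }

    /** A single hash lookup, independent of how many templates are stored. */
    List<double[][]> lookup(String key) {
        return templates.getOrDefault(key, Collections.emptyList());
    }

    public static void main(String[] args) {
        IdentityLookupSketch lib = new IdentityLookupSketch();
        lib.add("[C]1[N][N][N][N]1", new double[][]{
            {-1.214, 0.394}, {-0.750, -1.032}, {0.750, -1.032}, {1.214, 0.394}, {0.000, 1.276}});
        System.out.println(lib.lookup("[C]1[N][N][N][N]1").size() + " template(s) found");
        System.out.println(lib.lookup("[C]1[C][C]1").size() + " template(s) found");
    }
}
```

Because the match is on the whole key, a ring system either hits a template exactly or not at all; there is no substructure search involved in this sketch.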

In the cdk-build-util project there are two runnable classes that assist in creating the template library.

# ExtractTemplates
mvn exec:java -Dexec.mainClass="org.openscience.cdk.layout.ExtractTemplates" -Dexec.classpathScope=compile
# SDfileToTemplateLibrary
mvn exec:java -Dexec.mainClass="org.openscience.cdk.layout.SDfileToTemplateLibrary" -Dexec.classpathScope=compile

ExtractTemplates

The first class, ExtractTemplates, takes an input SDfile and strips molecules back to their core ring structure. Each ring system extracted from a molecule is stored at three levels. These fragments are similar to those described by Bemis G and Murcko M (1996) (http://www.ncbi.nlm.nih.gov/pubmed/8709122) (see MurckoFragmenter), but their usage is closer to the templating of peeled rings described by Clark A (2005). Their inclusion in the CDK was prompted by discussions with Roger Sayle.

  • 1 - skeleton with stubs (substituents)
  • 2 - skeleton
  • 3 - anonymous

For example, given caffeine

The skeleton with stubs entry is:

The skeleton entry is:

The anonymous entry is:

It is important to note that the implicit hydrogen count is set to '0' for all atoms. Note that in the skeleton with stubs depiction the methyls and carbonyls have been modified.

Each molecule in the input that has exactly one ring system is processed like this. Each level is indexed, sorted by frequency and written to an SDfile. The name of the output SDfile is derived from the input name.
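The indexing and frequency sort can be pictured with the following sketch. It is an illustration only, not the code in ExtractTemplates: the class `FrequencyIndex` is hypothetical, and it assumes each extracted ring system is represented by a string key so that repeated occurrences can be counted.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: count how often each ring-system key occurs. */
final class FrequencyIndex {

    private final Map<String, Integer> counts = new HashMap<>();

    /** Record one occurrence of a key. */
    void add(String key) {
        counts.merge(key, 1, Integer::sum);
    }

    /** Keys ordered by descending count; ties are broken alphabetically. */
    List<String> byFrequency() {
        List<String> keys = new ArrayList<>(counts.keySet());
        keys.sort((a, b) -> {
            int cmp = Integer.compare(counts.get(b), counts.get(a));
            return cmp != 0 ? cmp : a.compareTo(b);
        });
        return keys;
    }

    public static void main(String[] args) {
        FrequencyIndex index = new FrequencyIndex();
        index.add("[C]1[C][C][C][C][C]1");
        index.add("[C]1[N][N][N][N]1");
        index.add("[C]1[C][C][C][C][C]1");
        for (String key : index.byFrequency())
            System.out.println(key);
    }
}
```

Sorting by frequency puts the most common ring systems first, which makes it easier to review the templates that matter most.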

mvn exec:java \
  -Dexec.mainClass="org.openscience.cdk.layout.ExtractTemplates" \
  -Dexec.classpathScope=compile \
  -Dexec.args=~/chebi-118.sdf

If the input is ~/chebi-118.sdf, the output will be ~/chebi-118-templates.sdf.
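One naming rule that reproduces this example is sketched below. It is an assumption, not necessarily how ExtractTemplates derives the name: the helper `templateFileName` is hypothetical and simply replaces the file extension with `-templates.sdf`.

```java
/** Hypothetical sketch of the naming rule suggested by the example. */
final class OutputNameSketch {

    static String templateFileName(String input) {
        int dot = input.lastIndexOf('.');
        int sep = input.lastIndexOf('/');
        // Only strip the extension if the last dot belongs to the file name,
        // not to a directory earlier in the path.
        String base = dot > sep ? input.substring(0, dot) : input;
        return base + "-templates.sdf";
    }

    public static void main(String[] args) {
        System.out.println(templateFileName("~/chebi-118.sdf"));
    }
}
```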

This file should be reviewed to check that the templates are acceptable; some may need reorienting. Viewed in MarvinView, it is clear that the atoms have abnormal valence. This is expected by the IdentityTemplateLibrary and should not be fixed.

SDfileToTemplateLibrary

The SDfile created in the previous step (or created manually) is convenient to manipulate but inefficient for loading and querying. To create a lightweight SMILES file that the IdentityTemplateLibrary can load, the SDfileToTemplateLibrary is used.

mvn exec:java \
  -Dexec.mainClass="org.openscience.cdk.layout.SDfileToTemplateLibrary" \
  -Dexec.classpathScope=compile \
  -Dexec.args="~/chebi-118-templates.sdf ~/chebi-118-templates.smi"

Each line of ~/chebi-118-templates.smi contains the canonical SMILES followed by the X and Y coordinates of each atom, listed in the same order as the atoms in the SMILES string.

[C]1[C][C][C]2O[C]3[C][C][C][C][C]3[N][C]2[C]1 3.897, -0.750, 3.897, 0.750, 2.598, 1.500, 1.299, 0.750, -0.000, 1.500, -1.299, 0.750, -2.598, 1.500, -3.897, 0.750, -3.897, -0.750, -2.598, -1.500, -1.299, -0.750, -0.000, -1.500, 1.299, -0.750, 2.598, -1.500
[C][C]1[C]([O])[C]2[C]3[C]4[C][N][C]5[C][C][C][C]([C][C]3C([C])([C])N2[C]1[O])[C]54 4.932, -2.304, 3.743, -1.400, 2.316, -1.864, 1.862, -3.290, 1.435, -0.650, 0.000, -0.191, -1.317, -0.944, -1.737, -2.392, -3.237, -2.441, -3.745, -1.027, -5.094, -0.371, -5.202, 1.126, -3.956, 1.961, -2.608, 1.301, -1.317, 2.048, 0.003, 1.307, 1.442, 1.773, 2.681, 2.628, 1.341, 3.274, 2.316, 0.563, 3.743, 0.100, 4.956, 0.957, -2.557, -0.164
[C]1[C][C][C]2[C]N3[C][C][C]4[C]5[C][C][C][C][C]5[N][C]4[C]3[C][C]2[C]1 5.719, 0.239, 5.719, 1.739, 4.420, 2.489, 3.121, 1.739, 1.822, 2.489, 0.523, 1.739, -0.776, 2.489, -2.075, 1.739, -2.075, 0.239, -3.189, -0.765, -4.681, -0.608, -5.563, -1.821, -4.953, -3.192, -3.461, -3.348, -2.579, -2.135, -1.087, -1.978, -0.776, -0.511, 0.523, 0.239, 1.822, -0.511, 3.121, 0.239, 4.420, -0.511
[C]1[C][C][C]2[C][C]3[C][C]4[C]([C][C][C]5[C][C]6[C][C][C][C][C]6[C][C]54)[C][C]3[C][C]2[C]1 -7.095, -1.962, -7.095, -0.462, -5.796, 0.288, -4.497, -0.462, -3.198, 0.288, -1.899, -0.462, -0.600, 0.288, 0.699, -0.462, 0.699, -1.962, 1.999, -2.712, 3.298, -1.962, 3.298, -0.462, 4.597, 0.288, 4.597, 1.788, 5.896, 2.538, 5.896, 4.038, 4.597, 4.788, 3.298, 4.038, 3.298, 2.538, 1.999, 1.788, 1.999, 0.288, -0.600, -2.712, -1.899, -1.962, -3.198, -2.712, -4.497, -1.962, -5.796, -2.712
[C]1[C][C]2[C][C][C]3[C][C][C]4[C][C][C]5[C][C][C]6[C][C][C]1[C]7[C]2[C]3[C]4[C]5[C]67 3.897, -0.750, 3.897, 0.750, 2.598, 1.500, 2.598, 3.000, 1.299, 3.750, -0.000, 3.000, -1.299, 3.750, -2.598, 3.000, -2.598, 1.500, -3.897, 0.750, -3.897, -0.750, -2.598, -1.500, -2.598, -3.000, -1.299, -3.750, -0.000, -3.000, 1.299, -3.750, 2.598, -3.000, 2.598, -1.500, 1.299, -0.750, 1.299, 0.750, -0.000, 1.500, -1.299, 0.750, -1.299, -0.750, -0.000, -1.500
[C]1[N][N][N][N]1 -1.214, 0.394, -0.750, -1.032, 0.750, -1.032, 1.214, 0.394, 0.000, 1.276
[O][C]1[C]2[C][C][C][C][C]2[C]3[C][C][C][C][C]13 -0.002, -3.116, -0.008, -1.608, -1.218, -0.723, -2.686, -1.026, -3.690, 0.101, -3.215, 1.527, -1.733, 1.832, -0.757, 0.701, 0.749, 0.701, 1.759, 1.810, 3.224, 1.493, 3.682, 0.073, 2.678, -1.038, 1.216, -0.729
[C]1[C][C][C]2[C]([C]1)[C]O[C][C]3[C]4[C][C][C][C]4[C][C][C]23 -4.578, -0.809, -4.136, 0.625, -2.674, 0.959, -1.653, -0.141, -2.096, -1.574, -3.558, -1.908, -1.250, -2.814, 0.245, -2.926, 1.266, -1.826, 1.042, -0.343, 2.341, 0.407, 3.768, -0.056, 4.649, 1.157, 3.768, 2.371, 2.341, 1.907, 1.042, 2.657, -0.257, 1.907, -0.257, 0.407
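As a reading aid for the format above, the following sketch splits one line into its SMILES and per-atom coordinates. It is not the parser used by the IdentityTemplateLibrary: the class `TemplateLine` is hypothetical, and it assumes the SMILES contains no spaces and that the coordinates are comma-separated x, y pairs, as in the lines shown.

```java
import java.util.Arrays;

/** Hypothetical helper, not CDK API: one parsed line of a template file. */
final class TemplateLine {

    final String smiles;
    final double[][] coords; // coords[i] = {x, y} of the i-th atom in the SMILES

    private TemplateLine(String smiles, double[][] coords) {
        this.smiles = smiles;
        this.coords = coords;
    }

    static TemplateLine parse(String line) {
        // The SMILES has no whitespace, so the first space ends it.
        int split = line.indexOf(' ');
        if (split < 0)
            throw new IllegalArgumentException("no coordinates: " + line);
        String smiles = line.substring(0, split);
        String[] fields = line.substring(split + 1).trim().split("\\s*,\\s*");
        if (fields.length % 2 != 0)
            throw new IllegalArgumentException("odd number of coordinate values: " + line);
        double[][] coords = new double[fields.length / 2][];
        for (int i = 0; i < coords.length; i++) {
            coords[i] = new double[]{Double.parseDouble(fields[2 * i]),
                                     Double.parseDouble(fields[2 * i + 1])};
        }
        return new TemplateLine(smiles, coords);
    }

    public static void main(String[] args) {
        TemplateLine t = parse("[C]1[N][N][N][N]1 -1.214, 0.394, -0.750, -1.032, "
                               + "0.750, -1.032, 1.214, 0.394, 0.000, 1.276");
        System.out.println(t.smiles + " has " + t.coords.length + " atoms");
        System.out.println("first atom: " + Arrays.toString(t.coords[0]));
    }
}
```

The number of coordinate pairs should equal the number of atoms in the SMILES; a mismatch suggests the line was edited by hand incorrectly.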