MobleyLab/chemper

This repository contains a variety of tools that will be useful in automating the process of chemical perception for the new SMIRKS Native Open Force Field (SMIRNOFF) format as a part of the Open Force Field Initiative [1].

ChemPer can be used to automatically generate SMIRKS patterns to match clustered molecular fragments. For example, you may have calculated bond lengths and force constants for a variety of bonds in one group of molecules. You could use that data to cluster those bonds and then use ChemPer to generate SMIRKS patterns, which would allow you to apply those lengths and force constants to a new set of molecules. The algorithms implemented here were inspired by SMARTY and SMIRKY, which proved too inefficient for practical use in force field parameterization [2].

For a more extensive history and explanation, see our preprint [3].

Installation

Chemper is available via conda-forge:

conda install -c conda-forge chemper

This command installs all dependencies except a cheminformatics toolkit, which chemper needs for storing molecule information. Install RDKit and/or the OpenEye toolkits separately by running:

conda install -c conda-forge rdkit

and/or

conda install -c openeye openeye-toolkits

Supported Python versions

We test with whatever Python versions are found in .github/workflows/ci.yaml. Chemper may function on some older and/or newer versions as well.

Supported cheminformatics toolkits

We seek to keep this tool independent of any particular cheminformatics toolkit, but currently only RDKit and the OpenEye toolkits are supported. If you wish to add support for another toolkit, please feel free to submit a pull request. Make sure one of these toolkits is installed in your environment before installing chemper.
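
As a quick sanity check before installing, the sketch below reports which of the two supported toolkits can be imported in the current environment. It uses only the Python standard library; the helper name available_toolkits is hypothetical and is not part of chemper.

```python
import importlib.util

# Top-level import names of the two supported toolkits.
TOOLKIT_MODULES = {"RDKit": "rdkit", "OpenEye": "openeye"}


def available_toolkits():
    """Return the names of the supported toolkits importable here.

    Hypothetical helper for illustration; chemper does not provide it.
    """
    return [
        name
        for name, module in TOOLKIT_MODULES.items()
        if importlib.util.find_spec(module) is not None
    ]


found = available_toolkits()
if found:
    print("Found toolkit(s):", ", ".join(found))
else:
    print("No supported toolkit found; install RDKit or OpenEye first.")
```

Note that find_spec only confirms the package can be located. It does not check that an OpenEye license is present.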

Documentation

Below are some details on the tools provided in chemper. See the examples and documentation for more detailed usage.

SMIRKSifier

This is chemper's main tool. It takes groups of molecular fragments that should be typed together, given as a list of molecules and groups of atoms identified by index, and generates a hierarchical list of SMIRKS patterns that maintains this typing in just a few lines of code. In the general_smirks_for_clusters example, we cluster bonds in a set of simple hydrocarbons based on bond order; SMIRKSifier then turns these clusters into a list of SMIRKS patterns. The following tools make SMIRKSifier possible, but may also be useful on their own.
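
The clustering step described above can be sketched in plain Python without any toolkit. The hand-written molecule representation and the function name cluster_bonds_by_order are hypothetical; only the per-molecule lists of atom-index tuples mirror the layout used in the ClusterGraph example in this README. How such clusters are handed to SMIRKSifier is covered in the general_smirks_for_clusters example and is not shown here.

```python
from collections import defaultdict

# Hypothetical hand-written input: one bond list per molecule, each bond
# given as (atom index, atom index, bond order). Heavy atoms only.
molecules = [
    [(0, 1, 1)],             # ethane, CC
    [(0, 1, 2)],             # ethene, C=C
    [(0, 1, 2), (1, 2, 1)],  # propene, C=CC
    [(0, 1, 3), (1, 2, 1)],  # propyne, C#CC
]


def cluster_bonds_by_order(mols):
    """Group bonds by order, keeping one list of atom tuples per molecule."""
    clusters = defaultdict(lambda: [[] for _ in mols])
    for mol_idx, bonds in enumerate(mols):
        for atom_i, atom_j, order in bonds:
            clusters[order][mol_idx].append((atom_i, atom_j))
    return dict(clusters)


clusters = cluster_bonds_by_order(molecules)
for order in sorted(clusters):
    print(order, clusters[order])
# 1 [[(0, 1)], [], [(1, 2)], [(1, 2)]]
# 2 [[], [(0, 1)], [(0, 1)], []]
# 3 [[], [], [], [(0, 1)]]
```

Each cluster keeps an entry for every molecule, even when that molecule contributes no bonds, so the position in the list always identifies the molecule.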

ClusterGraph

The goal of this tool is to store all information about the atoms and bonds that could be in a SMIRKS pattern. ClusterGraph objects are created assuming you already have a clustered set of molecular subgraphs. As our primary goal is to determine chemical perception for force field parameterization, we imagine the input data being subgraphs clustered by the parameter we wish to assign to those atoms, such as equilibrium bond lengths and force constants. However, you could imagine other reasons for wanting to store how you clustered groups of atoms.

For more detailed examples and an illustration of how this works, see SMIRKS_from_molecules. Below is a brief example showing the SMIRKS for the bond between two carbon atoms in propane and pentane.

from chemper.mol_toolkits import mol_toolkit
from chemper.graphs.cluster_graph import ClusterGraph

mol1 = mol_toolkit.Mol.from_smiles('CCC')
mol2 = mol_toolkit.Mol.from_smiles('CCCCC')
smirks_atom_lists = [[(0,1)], [(0,1), (1,2)]]
graph = ClusterGraph([mol1, mol2], smirks_atom_lists)
print(graph.as_smirks())
# '[#6AH2X4x0r0+0,#6AH3X4x0r0+0:1]-;!@[#6AH2X4x0r0+0:2]'

The idea behind ClusterGraph objects is that they store all possible decorator information for each atom. In this case, the SMIRKS-indexed atoms for propane (mol1) are one terminal carbon and the middle carbon. In pentane (mol2), however, atom :1 can be either a terminal or a mid-chain carbon. This changes the number of hydrogen atoms (the Hn decorator) on the carbon, so there are two possible SMIRKS patterns for atom :1, #6AH2X4x0r0+0 or (indicated by the ",") #6AH3X4x0r0+0, whereas atom :2 has only one possibility, #6AH2X4x0r0+0.
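
The merging of decorators described above can be illustrated with a toy model in plain Python. This is not how chemper builds a ClusterGraph: it assumes linear alkanes only, hard-codes every decorator except the hydrogen count, and the names carbon_decorator and merged_bond_smirks are hypothetical.

```python
def carbon_decorator(index, n_carbons):
    """Toy decorator for a carbon in a linear alkane of two or more carbons."""
    hydrogens = 3 if index in (0, n_carbons - 1) else 2
    return f"#6AH{hydrogens}X4x0r0+0"


def merged_bond_smirks(chain_lengths, smirks_atom_lists):
    """Collect every decorator seen at SMIRKS indices :1 and :2, then OR them."""
    seen = {1: set(), 2: set()}
    for n_carbons, atom_tuples in zip(chain_lengths, smirks_atom_lists):
        for atoms in atom_tuples:
            for smirks_index, atom_index in enumerate(atoms, start=1):
                seen[smirks_index].add(carbon_decorator(atom_index, n_carbons))
    atom1 = ",".join(sorted(seen[1]))
    atom2 = ",".join(sorted(seen[2]))
    # Every bond in this toy case is single and acyclic, hence "-;!@".
    return f"[{atom1}:1]-;!@[{atom2}:2]"


# Propane (3 carbons) and pentane (5 carbons), same atom lists as above.
smirks = merged_bond_smirks([3, 5], [[(0, 1)], [(0, 1), (1, 2)]])
print(smirks)
# [#6AH2X4x0r0+0,#6AH3X4x0r0+0:1]-;!@[#6AH2X4x0r0+0:2]
```

Atom :1 sees both a terminal (H3) and a mid-chain (H2) carbon, so its two decorators are joined with a comma, while atom :2 only ever sees H2.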

SingleGraph

The goal of this tool was to create an example of how you could create a SMIRKS pattern from a molecule and a set of atom indices. SingleGraph objects are not ultimately useful for sampling chemical perception, as they only work for a single molecule; however, to the best of the authors' knowledge, no such tool existed before. For a detailed example, see the single_mol_smirks Jupyter notebook.

Here is a brief usage example that creates the SMIRKS pattern for the bond between the two carbon atoms in ethene, including atoms one bond away from the indexed atoms. The indexed atoms are the two carbon atoms at indices 0 and 1 in the molecule, which are assigned SMIRKS indices :1 and :2, respectively.

from chemper.mol_toolkits import mol_toolkit
from chemper.graphs.single_graph import SingleGraph

mol = mol_toolkit.Mol.from_smiles('C=C') # note this adds explicit hydrogens to your molecule
smirks_atoms = (0,1)
graph = SingleGraph(mol, smirks_atoms, layers=1)
print(graph.as_smirks())
# [#6AH2X3x0r0+0:1](-!@[#1AH0X1x0r0+0])(-!@[#1AH0X1x0r0+0])=!@[#6AH2X3x0r0+0:2](-!@[#1AH0X1x0r0+0])-!@[#1AH0X1x0r0+0]

mol_toolkits

As noted above, we seek to keep chemper independent of the underlying cheminformatics toolkits. mol_toolkits was created to isolate all toolkit-dependent code. It can create molecules from an RDKit or OpenEye molecule object or from a SMILES string. It includes a variety of functions for extracting information about atoms, bonds, and molecules. Also included here are substructure searches using indexed SMARTS (or SMIRKS) patterns.

Versions

0.1.0 Alpha Release

This is the first release of the alpha testing version of chemper. As you can follow in the issue tracker, there are still ongoing problems to resolve. This first release allows the concepts and algorithms for automated chemical perception included here to be referenced. However, the API is still in flux and nothing should be considered permanent at this time.

Version 1.0.0

This release matches the work published in our preprint [3]. While the code is stable and there are tests showing how it should work, the science it represents is still in its early stages, and big changes to the algorithms and API should be expected in future releases.

Version 1.0.1

This release includes non-behavior-breaking changes to support distribution on Python 3.10.

Contributors

Acknowledgments

CCB is funded by a fellowship from The Molecular Sciences Software Institute under NSF grant ACI-1547580.

References

  1. D.L. Mobley et al. JCTC 2018, 14(11), pp 6076-6092. (JCTC or bioRxiv)
  2. C. Zanette and C.C. Bannan et al. JCTC 2019, 15(1), pp 402-423. (JCTC or ChemRxiv)
  3. C.C. Bannan and D.L. Mobley. ChemRxiv 2019, doi:10.26434/chemrxiv.8304578.v1