

This is an implementation of the Eli Lilly Medchem Rules. They were published as "Rules for Identifying Potentially Reactive or Promiscuous Compounds" by Robert F. Bruns and Ian W. Watson, J. Med. Chem. 2012, 55, 9763–9772, under ACS AuthorChoice, i.e. open access, at doi 10.1021/jm301008n.

To quote the abstract, "[This approach] describes a set of 275 rules, developed over an 18-year period, used to identify compounds that may interfere with biological assays, allowing their removal from screening sets. Reasons for rejection include reactivity (e.g., acyl halides), interference with assay measurements (fluorescence, absorbance, quenching), activities that damage proteins (oxidizers, detergents), instability (e.g., latent aldehydes), and lack of druggability (e.g., compounds lacking both oxygen and nitrogen). The structural queries were profiled for frequency of occurrence in druglike and nondruglike compound sets and were extensively reviewed by a panel of experienced medicinal chemists. As a means of profiling the rules and as a filter in its own right, an index of biological promiscuity was developed. The 584 gene targets with screening data at Lilly were assigned to 17 subfamilies, and the number of subfamilies at which a compound was active was used as a promiscuity index."

Scrutinizing the SMILES string of a molecule, the program identifies patterns that disqualify a candidate outright, such as unwanted elements (e.g., Ag, Hg, Zn) or too low an atom count (fewer than 7 heavy atoms). More importantly, however, most of the rules are scaled relative to each other. Thus, the demerits of a butyl, pentyl, hexyl, heptyl, and cyclohexyl group in a molecule are 10, 25, 50, 100, and 170, respectively. Finally, the demerits of a molecule are summed and compared with an arbitrary threshold; the program's adjustable default cut-off is 100. (For details on adjusting the settings, see the program's documentation.)
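The summation and cut-off described above can be illustrated with a minimal Python sketch. It is not the program's implementation, which matches substructure queries against the SMILES input. The names `DEMERITS`, `total_demerits`, and `is_rejected` are hypothetical, and the groups of a molecule are assumed to be already identified. Only the five demerit values and the default cut-off of 100 are taken from the text.

```python
# Hypothetical illustration only: the real program finds these groups by
# substructure matching; here they are passed in as plain names.
DEMERITS = {
    "butyl": 10,
    "pentyl": 25,
    "hexyl": 50,
    "heptyl": 100,
    "cyclohexyl": 170,
}

DEFAULT_CUTOFF = 100  # adjustable default threshold quoted in the text


def total_demerits(groups):
    """Sum the demerits of the (already identified) groups of one molecule."""
    return sum(DEMERITS[group] for group in groups)


def is_rejected(groups, cutoff=DEFAULT_CUTOFF):
    """Reject a molecule whose summed demerits reach or exceed the cutoff."""
    return total_demerits(groups) >= cutoff


if __name__ == "__main__":
    examples = (["butyl"], ["butyl", "pentyl", "hexyl"], ["heptyl"])
    for groups in examples:
        verdict = "rejected" if is_rejected(groups) else "accepted"
        print(groups, total_demerits(groups), verdict)
```

In this sketch, a molecule carrying a butyl, a pentyl, and a hexyl group sums to 85 demerits and stays below the default cut-off, whereas a single heptyl group already reaches it.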

After downloading the software, there are several ways to install the program. These include compilation with make (e.g., in Cygwin or Ubuntu Linux) or use of a Docker file; the options are documented in dedicated .md files. In addition to C++, an installation of Ruby is required. A basic check is then run by

ruby Lilly_Medchem_Rules.rb input.smi > okmedchem.smi

where input.smi is your collection of SMILES strings of the structures to be checked. In file okmedchem.smi, the program lists the molecules whose summed demerits are below the threshold applied. If applicable, molecules with demerits equal to or greater than the threshold are reported in one or more additional files written by the program, e.g., bad0.smi.

The freely accessible publication and its supplementary material at ACS outline the structural patterns scrutinized and their demerits. For practice, this repository contains multiple reference .smi files in folder test to probe the program and its options. Simulating a screening with the default parameters, the collection of 24986 PubChem molecules in file example_molecules.smi yields a set of 4576 acceptable molecules reported in file okmedchem.correct.smi.
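One way to check a local run against the reference file is sketched below in Python. This is an assumption-laden helper, not part of the repository: the function names `read_records` and `compare_runs` are hypothetical, and the sketch assumes that a correct run reproduces the reference file line for line, irrespective of order.

```python
from pathlib import Path


def read_records(path):
    """Return the set of non-empty, stripped lines of a .smi file."""
    text = Path(path).read_text()
    return {line.strip() for line in text.splitlines() if line.strip()}


def compare_runs(produced_path, reference_path):
    """Report records missing from, or unexpected in, the produced file."""
    produced = read_records(produced_path)
    reference = read_records(reference_path)
    return {
        "missing": sorted(reference - produced),      # in reference only
        "unexpected": sorted(produced - reference),   # in produced only
    }
```

Calling `compare_runs("okmedchem.smi", "okmedchem.correct.smi")` with the paths adjusted to your setup returns two empty lists if both files hold the same records. If the program's output carries annotations that differ between runs, the comparison would need to be restricted to the SMILES and identifier columns.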

Test data table_S3.smi, retrieved from table S3 of the publication, and 200_prescriptions_2011.smi, retrieved from a cross-linked Wikipedia project, are provided to illustrate the outcome of this set of rules among drugs eventually marketed. These, of course, represent a stage of development much later than the screening stage the program targets.





