Welcome to LAST-Utils.
These are LAST utilities, available from
Requirements: ruby 1.8 with OpenBabel bindings (see
Two modes are available: conversion LAST->SMARTS and instantiation SMARTS/SMILES.
1) Mine LAST descriptors and convert the output to SMARTS (see the LAST README file for how to use the fminer frontend binary for LAST) using cpdbdata (see
/path/to/fminer -f14 /path/to/cpdbdata/salmonella_mutagenicity/salmonella_mutagenicity_alt.smi /path/to/cpdbdata/salmonella_mutagenicity/salmonella_mutagenicity_alt.class | ./last-utils.rb 1 "nls" > salm-last.smarts
Note: This should be called from the directory that contains last-utils.rb, since the command refers to it as ./last-utils.rb.
Note: Variants 'msa' and 'nls' produce LAST-SMARTS with optional parts of the structures (with recursive SMARTS), while 'nop' disallows optional parts of the structures (ambiguities only on the atom / edge level).
2) Find instantiations of molecules in a .smi file using the LAST descriptors we just mined:
/path/to/last-utils/last-utils.rb 2 /path/to/cpdbdata/salmonella_mutagenicity/salmonella_mutagenicity_alt.smi < salm-last.smarts > salm-last.inst
For a precise synopsis and further information, run last-utils.rb without arguments.
SMARTS are regular expressions for chemical fragments. The variant used here (LAST-SMARTS) is generated recursively by a depth-first traversal of each LAST graph, starting at node 0. Atoms are represented by their atomic number, e.g. '#6' for carbon. Bonds are represented by their order (1-3).
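The depth-first generation described above can be illustrated with a small sketch. It is not the code in last-utils.rb: the input format (a hash of atomic numbers plus an adjacency list of [neighbor, bond order] pairs), the name dfs_smarts, and the choice to write bond orders 1-3 with the usual SMARTS symbols '-', '=', '#' are all assumptions made for illustration. It handles tree-shaped graphs only and ignores optional parts.

```ruby
# Hypothetical input format, NOT the LAST output format:
#   atoms     : node id => atomic number
#   adjacency : node id => list of [neighbor id, bond order]
BOND_SYMBOL = { 1 => '-', 2 => '=', 3 => '#' }

# Depth-first traversal starting at node 0. All children except the last
# are written as parenthesized branches, the last one continues the chain.
def dfs_smarts(atoms, adjacency, node = 0, parent = nil)
  out = '[#' + atoms[node].to_s + ']'
  children = adjacency[node].reject { |neighbor, _order| neighbor == parent }
  children.each_with_index do |(neighbor, order), i|
    sub = BOND_SYMBOL[order] + dfs_smarts(atoms, adjacency, neighbor, node)
    out << (i < children.size - 1 ? '(' + sub + ')' : sub)
  end
  out
end

# N - C(-O) = C, numbered 0, 1, 3, 2
atoms = { 0 => 7, 1 => 6, 2 => 6, 3 => 8 }
adjacency = {
  0 => [[1, 1]],
  1 => [[0, 1], [3, 1], [2, 2]],
  2 => [[1, 2]],
  3 => [[1, 1]]
}
puts dfs_smarts(atoms, adjacency)   # prints [#7]-[#6](-[#8])=[#6]
```

Ring closures and atom or bond ambiguities would need extra bookkeeping that this sketch leaves out.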
For an introduction to SMARTS, see e.g.
For every node visited, we demand an explicit but arbitrary branch with '(~*)' IF AND ONLY IF there are n optional branches with n>1 (*).
In case of (*), for each branch bi, i in [1,n], we describe the 1-step ('local') neighborhood of the node by a recursive SMARTS pattern, including the node itself, its predecessor, bi, and its successor.
The pattern bi is itself a LAST-SMARTS, and the local neighborhoods are combined via disjunction (there must be at least two due to (*)).
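As a rough sketch of rule (*), the following Ruby helper assembles one node from its optional branches. The name render_node and its parameters are hypothetical and not taken from last-utils.rb. The order of the parts inside each $(...) (branch first, then predecessor, then successor) is read off the example pattern given later in this README, and returning the bare node when there are fewer than two optional branches is a simplifying assumption.

```ruby
# Hypothetical helper, not part of last-utils.rb.
#   atomic_number : Integer, e.g. 6 for carbon
#   predecessor   : SMARTS of the back link, e.g. '[#7]'
#   successor     : bond plus SMARTS of the forward link, e.g. '=[#6]'
#   branches      : SMARTS strings of the optional branches b1..bn
def render_node(atomic_number, predecessor, successor, branches)
  node = '[#' + atomic_number.to_s + ']'
  # Simplifying assumption: only the n > 1 case gets a recursive pattern.
  return node if branches.size < 2

  # One local neighborhood per optional branch.
  environments = branches.map do |branch|
    '$(' + node + '(' + branch + ')(' + predecessor + ')' + successor + ')'
  end
  # Disjunction of the neighborhoods, followed by the arbitrary branch.
  '[#' + atomic_number.to_s + ';' + environments.join(',') + '](~*)'
end

middle = render_node(6, '[#7]', '=[#6]', ['[#7]', '[#7][#6]'])
puts '[#7]' + middle + '=[#6]'
```

With the two branches '[#7]' and '[#7][#6]', the result has the same shape as the example pattern in this README, written without whitespace.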
Formally, LAST-SMARTS are defined as follows (EBNF):
AN := '17' | '35' | '5' | '6' | '7' | '8' | '15' | '16' | '9' | '53'
A := (AN ',' A) | AN
SB := '-' | '=' | '#' | ':'
E := (SB ',' E) | SB
N := '[#' A ']'
LR := (L ',' LR) | L
L := N '(' E LS ')' ('(' E N ')')+
BN := '[#' A ';$' '(' LR ')' ']' '(~*)'+
LS := (N | BN) | LS E (N | BN)
[#7][#6;$([#6]([#7])([#7])=[#6]),$([#6]([#7][#6])([#7])=[#6])](~*)=[#6]
This denotes a nitrogen connected to a carbon double-connected to a carbon. The middle carbon’s local environment is recursively described. It consists of back and forward links, but additionally specifies either a nitrogen or a nitrogen/carbon branch.
Since the SMARTS standard is “truly recursive”, i.e. nothing inside the $(...) is identified with the outside, we need (~*) after the node to enforce that at least one additional branch is actually attached.
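The (~*) requirement can be checked mechanically on a generated pattern. The sketch below is plain Ruby without OpenBabel. The helper names are hypothetical and not part of last-utils.rb, and the check is purely textual: it only looks at bracket atoms outside any $(...) and tests whether each one carrying a recursive environment is directly followed by '(~*)'.

```ruby
# Return every bracket atom at nesting depth 0 together with the text that
# follows it. Brackets nested inside $(...) are skipped via the depth count.
def top_level_atoms(smarts)
  atoms = []
  depth = 0
  start = nil
  smarts.each_char.with_index do |ch, i|
    if ch == '['
      start = i if depth == 0
      depth += 1
    elsif ch == ']'
      depth -= 1
      atoms << [smarts[start..i], smarts[(i + 1)..-1]] if depth == 0
    end
  end
  atoms
end

# True if every atom with a recursive environment is followed by '(~*)'.
def recursive_nodes_have_extra_branch?(smarts)
  top_level_atoms(smarts).all? do |atom, rest|
    !atom.include?(';$(') || rest.start_with?('(~*)')
  end
end

# The example pattern from this README, written without whitespace.
pattern = '[#7][#6;$([#6]([#7])([#7])=[#6]),$([#6]([#7][#6])([#7])=[#6])](~*)=[#6]'
puts recursive_nodes_have_extra_branch?(pattern)                 # prints true
puts recursive_nodes_have_extra_branch?(pattern.sub('(~*)', '')) # prints false
```

This says nothing about whether a pattern matches a molecule. For that, the pattern would have to be handed to a SMARTS matcher such as the one in OpenBabel.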
Andreas Maunz, 2010