LAST-Utils: Tools to process LAST (LAtent STructure mining) output

Welcome to LAST-Utils.

These are LAST utilities, available from http://github.com/amaunz/last-utils/tree/master. 
Requirements: Ruby 1.8 with OpenBabel bindings (see http://openbabel.org/wiki/Ruby).


= EXAMPLES:
Two modes are available: conversion (LAST->SMARTS) and instantiation (SMARTS/SMILES).

1) Mine LAST descriptors from cpdbdata (see http://github.com/amaunz/cpdbdata) and convert the output to SMARTS (see the LAST README file for how to use the fminer frontend binary for LAST):
/path/to/fminer -f14 /path/to/cpdbdata/salmonella_mutagenicity/salmonella_mutagenicity_alt.smi /path/to/cpdbdata/salmonella_mutagenicity/salmonella_mutagenicity_alt.class | ./last-utils.rb 1 "nls" > salm-last.smarts
Note: Call last-utils.rb from the directory that contains it (hence ./last-utils.rb).
Note: Variants 'msa' and 'nls' produce LAST-SMARTS in which parts of the structures may be optional (expressed with recursive SMARTS), while 'nop' disallows optional parts (ambiguities occur only at the atom/edge level).

2) Find instantiations of molecules in a .smi file using the LAST descriptors we just mined:
/path/to/last-utils/last-utils.rb 2 /path/to/cpdbdata/salmonella_mutagenicity/salmonella_mutagenicity_alt.smi < salm-last.smarts > salm-last.inst


= NOTE:
For a precise synopsis and further information, run last-utils.rb without arguments.


= TRANSFORMATION TO SMARTS:
SMARTS are regular expressions for chemical fragments. The variant used here (LAST-SMARTS) is generated recursively by a depth-first traversal of each LAST graph, starting at node 0. Atoms are represented by their atomic number, e.g. '#6' for carbon. Bonds are represented by their order (1-3).
For an introduction to SMARTS, see e.g. http://www.daylight.com/meetings/summerschool01/course/basics/smarts.html.
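
The atom and bond encoding can be illustrated with a minimal Ruby sketch. The helper names atom_token and bond_token are hypothetical and not taken from last-utils.rb; mapping bond orders 1-3 to the symbols '-', '=' and '#' is an assumption based on the SB production of the grammar given in this README.

```ruby
# Hypothetical helpers for illustration only; not part of last-utils.rb.

# Assumed mapping from bond order (1-3) to the SMARTS bond symbols
# listed in the grammar's SB production.
BOND_SYMBOLS = { 1 => '-', 2 => '=', 3 => '#' }

# Build an atom token from an atomic number, e.g. 6 -> '[#6]'.
def atom_token(atomic_number)
  '[#' + atomic_number.to_s + ']'
end

# Look up the bond symbol for a bond order; raises for orders outside 1-3.
def bond_token(order)
  BOND_SYMBOLS.fetch(order)
end

# A carbon double-bonded to an oxygen.
puts atom_token(6) + bond_token(2) + atom_token(8)
```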

For every node visited, we demand an explicit but arbitrary branch with '(~*)' if and only if the node has n optional branches with n>1 (condition (*)).
If (*) holds, then for each branch bi, i in [1,n], we describe the 1-step ('local') neighborhood of the node by a recursive SMARTS pattern that includes the node itself, its predecessor, bi, and its successor.
Each branch bi is itself a LAST-SMARTS, and the local neighborhoods are combined by disjunction (there are at least two, by (*)).
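
The construction above can be sketched in Ruby as follows. This is only an illustration under stated assumptions: the function names and argument layout are hypothetical, the ordering node, branch, predecessor, successor inside each $(...) is read off the example pattern in this README, and the code does not reproduce the actual traversal in last-utils.rb.

```ruby
# Hypothetical sketch; names and layout are assumptions, not last-utils.rb code.

# One local neighborhood: the node, one optional branch, the predecessor,
# and the successor (ordering taken from the README example).
def local_neighborhood(node, branch, predecessor, successor)
  node + '(' + branch + ')' + '(' + predecessor + ')' + successor
end

# A branching node: one recursive $(...) per optional branch, joined by ','
# (disjunction), followed by '(~*)' to demand one arbitrary attached branch.
def branch_node(atomic_number, branches, predecessor, successor)
  raise ArgumentError, 'need at least two optional branches' if branches.size < 2
  node = '[#' + atomic_number.to_s + ']'
  alternatives = branches.map do |b|
    '$(' + local_neighborhood(node, b, predecessor, successor) + ')'
  end
  '[#' + atomic_number.to_s + ';' + alternatives.join(',') + '](~*)'
end

# Middle carbon with two optional branches: a nitrogen, or a nitrogen/carbon chain.
middle = branch_node(6, ['[#7]', '[#7][#6]'], '[#7]', '=[#6]')
puts '[#7]' + middle + '=[#6]'
```

With a single optional branch the sketch raises an error, mirroring the requirement n>1 in (*).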

Formally, LAST-SMARTS are defined as follows (EBNF):

AN := '17' | '35' | '5' | '6' | '7' | '8' | '15' | '16' | '9' | '53'
A  := (AN ',' A) | AN
SB := '-' | '=' | '#' | ':'
E  := (SB ',' E) | SB
N  := '[#' A ']'
LR := (L ',' LR) | L
L  := N '(' E LS ')' ('(' E N ')')+
BN := '[#' A ';$' '(' LR ')' ']' '(~*)'+
LS := (N | BN) | LS E (N | BN)
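
As a small aid for working with the grammar, the following Ruby sketch checks that every atomic number in a pattern belongs to the AN alphabet (Cl, Br, B, C, N, O, P, S, F, I). The function name unknown_atomic_numbers is hypothetical, and the check covers only the terminal alphabet, not the full grammar.

```ruby
# Hypothetical checker for illustration; not part of last-utils.rb.

# Atomic numbers allowed by the AN production.
ALLOWED_ATOMIC_NUMBERS = %w[17 35 5 6 7 8 15 16 9 53]

# Return the atomic numbers in a pattern that are not in AN.
def unknown_atomic_numbers(smarts)
  # Each atom expression starts with '[#' followed by a comma-separated
  # list of numbers (production A).
  lists = smarts.scan(/\[#(\d+(?:,\d+)*)/).flatten
  numbers = lists.map { |list| list.split(',') }.flatten
  numbers.uniq - ALLOWED_ATOMIC_NUMBERS
end

# Silicon (14) is not in AN, so it is reported.
puts unknown_atomic_numbers('[#6]-[#14]').inspect
```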

Example:
[#7][#6;$([#6]([#7])([#7])=[#6]),$([#6]([#7][#6])([#7])=[#6])](~*)=[#6]
This denotes a nitrogen bonded to a carbon that is double-bonded to another carbon. The middle carbon's local environment is described recursively. It consists of the back and forward links, and additionally specifies either a nitrogen branch or a nitrogen/carbon branch.
Since recursive SMARTS are "truly recursive", i.e. nothing inside the $(...) is identified with atoms outside it, we need (~*) to enforce that at least one additional branch is actually attached.

Andreas Maunz, 2010