LAST-Utils: Tools to process LAST (LAtent STructure mining) output

Welcome to LAST-Utils.

These are LAST utilities, available from 
Requirements: ruby 1.8 with OpenBabel bindings (see 

Two modes are available: conversion of LAST output to SMARTS, and instantiation of SMARTS patterns on SMILES.

1) Mine LAST descriptors and convert the output to SMARTS (see the LAST README file for how to use the fminer frontend binary for LAST) using cpdbdata (see
/path/to/fminer -f14 /path/to/cpdbdata/salmonella_mutagenicity/salmonella_mutagenicity_alt.smi /path/to/cpdbdata/salmonella_mutagenicity/salmonella_mutagenicity_alt.class | ./last-utils.rb 1 "nls" > salm-last.smarts
Note: Run this from the last-utils directory, since the command invokes ./last-utils.rb by a relative path.
Note: Variants 'msa' and 'nls' produce LAST-SMARTS with optional parts of the structures (with recursive SMARTS), while 'nop' disallows optional parts of the structures (ambiguities only on the atom / edge level).

2) Find instantiations of molecules in a .smi file using the LAST descriptors we just mined:
/path/to/last-utils/last-utils.rb 2 /path/to/cpdbdata/salmonella_mutagenicity/salmonella_mutagenicity_alt.smi < salm-last.smarts > salm-last.inst

For a precise synopsis and further information, run last-utils.rb without arguments.

SMARTS are regular expressions for chemical fragments. The variant used here (LAST-SMARTS) is generated recursively by a depth-first traversal of each LAST graph, starting at node 0. Atoms are represented by their atomic number, e.g. '#6' for carbon. Bonds are represented by their order (1-3).
For an introduction to SMARTS, see e.g.
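
As a rough illustration of this notation, the following Ruby sketch writes a linear path as SMARTS text. The helper names (atom_token, bond_token, linear_smarts) are hypothetical and are not taken from last-utils.rb; only the '#<atomic number>' atom form and the standard SMARTS bond symbols for orders 1 to 3 are assumed.

```ruby
# Hypothetical helpers for illustration; not part of last-utils.rb.

# Standard SMARTS bond symbols for bond orders 1 to 3.
BOND_SYMBOLS = { 1 => '-', 2 => '=', 3 => '#' }.freeze

# An atom is written by its atomic number, e.g. 6 -> "[#6]".
def atom_token(atomic_number)
  '[#' + atomic_number.to_s + ']'
end

# A bond is written by the symbol for its order; unknown orders raise KeyError.
def bond_token(order)
  BOND_SYMBOLS.fetch(order)
end

# atoms: atomic numbers along the path; bonds: bond orders between neighbours.
def linear_smarts(atoms, bonds)
  raise ArgumentError, 'need one bond fewer than atoms' unless bonds.size == atoms.size - 1

  out = atom_token(atoms.first)
  atoms.drop(1).each_with_index do |atomic_number, i|
    out += bond_token(bonds[i]) + atom_token(atomic_number)
  end
  out
end

puts linear_smarts([7, 6, 6], [1, 2]) # prints [#7]-[#6]=[#6]
```

The sketch writes single bonds explicitly as '-'; the example further below leaves them implicit.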

For every node visited, we demand an explicit but arbitrary branch with '(~*)' IF AND ONLY IF there are n optional branches with n>1 (*). 
In case of (*), for each branch bi, i in [1,n], we describe the 1-step ('local') neighborhood of the node by a recursive SMARTS pattern, including the node itself, predecessor, bi, and successor.
The pattern bi is itself a LAST-SMARTS, and the local neighborhoods are combined via disjunction (there must be at least two due to (*)).

Formally, LAST-SMARTS are defined as follows (EBNF):

AN := '17' | '35' | '5' | '6' | '7' | '8' | '15' | '16' | '9' | '53'
A  := (AN ',' A) | AN
SB := '-' | '=' | '#' | ':'
E  := (SB ',' E) | SB
N  := '[#' A ']'
LR := (L ',' LR) | L
L  := N '(' E LS ')' ('(' E N ')')+
BN := '[#' A ';$' '(' LR ')' ']' '(~*)'+
LS := (N | BN) | LS E (N | BN)
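
The non-recursive rules (AN, A, SB, E and N) can be checked with plain regular expressions. The Ruby sketch below is only an illustration of the grammar as written above: the constant names and the matches_rule? helper are hypothetical, and the recursive rules (L, LR, BN, LS) are deliberately left out because they need a real parser.

```ruby
# Illustrative recognizers for the rules AN, A, SB, E and N.
# Names are hypothetical; the alternatives are copied from the grammar.
AN = Regexp.union(%w[17 35 5 6 7 8 15 16 9 53])
A  = /#{AN}(?:,#{AN})*/            # one or more atomic numbers, comma-separated
SB = Regexp.union(%w[- = # :])     # single, double, triple, aromatic
E  = /#{SB}(?:,#{SB})*/            # one or more bond symbols, comma-separated
N  = Regexp.new('\[#' + A.source + '\]')

# True if the whole text is derivable from the given rule.
def matches_rule?(rule, text)
  /\A#{rule}\z/.match?(text)
end

puts matches_rule?(N, '[#6]')   # prints true
puts matches_rule?(N, '[#1]')   # prints false (1 is not among the AN alternatives)
puts matches_rule?(E, '-,=')    # prints true
```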

[#7][#6;$([#6]([#7])([#7])=[#6]),$([#6]([#7][#6])([#7])=[#6])](~*)=[#6]
This denotes a nitrogen bonded to a carbon that is double-bonded to another carbon. The middle carbon’s local environment is recursively described. It consists of back and forward links, but additionally specifies either a nitrogen or a nitrogen/carbon branch.
Since the SMARTS standard is “truly recursive”, i.e. nothing matched inside the $(...) is identified with atoms outside it, we need (~*) to enforce that at least one additional branch is actually attached.
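
To make the construction concrete, the following Ruby sketch assembles such a branch node from its parts and reproduces the example pattern above (without embedded whitespace). It is a minimal illustration, not the code used by last-utils.rb: the function branch_node and its parameters are hypothetical, and the ordering inside each $(...) (branch first, then predecessor, then successor) is an assumption read off the example.

```ruby
# Hypothetical helper, for illustration only; not taken from last-utils.rb.
# center, predecessor, successor: atom tokens such as "[#6]".
# successor_bond: bond symbol towards the successor, e.g. "=".
# branches: SMARTS text of each optional branch (at least two, see (*)).
def branch_node(center, predecessor, successor_bond, successor, branches)
  raise ArgumentError, 'need at least two optional branches' if branches.size < 2

  # One recursive environment per optional branch.
  environments = branches.map do |branch|
    '$(' + center + '(' + branch + ')(' + predecessor + ')' +
      successor_bond + successor + ')'
  end

  # The environments are combined by disjunction (','), and '(~*)' demands
  # that some additional branch is really attached to the center atom.
  center.chomp(']') + ';' + environments.join(',') + '](~*)'
end

node = branch_node('[#6]', '[#7]', '=', '[#6]', ['[#7]', '[#7][#6]'])
puts '[#7]' + node + '=[#6]'
```

Without the trailing '(~*)', the pattern could match a middle carbon with only two neighbours: inside $(...) the branch nitrogen and the predecessor nitrogen may both match the same atom of the molecule.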

Andreas Maunz, 2010
