LAST-Utils: Tools to process LAST (LAtent STructure mining) output http://cs.maunz.de
Welcome to LAST-Utils.

These are the LAST utilities, available from http://github.com/amaunz/last-utils/tree/master.

Requirements: ruby 1.8 with OpenBabel bindings (see http://openbabel.org/wiki/Ruby).

= EXAMPLES:

Two modes are available: conversion LAST->SMARTS and instantiation SMARTS/SMILES.

1) Mine LAST descriptors and convert the output to SMARTS, using cpdbdata (see http://github.com/amaunz/cpdbdata). See the LAST README file for how to use the fminer frontend binary for LAST.

  /path/to/fminer -f14 /path/to/cpdbdata/salmonella_mutagenicity/salmonella_mutagenicity_alt.smi /path/to/cpdbdata/salmonella_mutagenicity/salmonella_mutagenicity_alt.class | ./last-utils.rb 1 "nls" > salm-last.smarts

Note: Call this from the directory that contains last-utils.rb, since the command refers to ./last-utils.rb.

Note: Variants 'msa' and 'nls' produce LAST-SMARTS with optional parts of the structures (using recursive SMARTS), while 'nop' disallows optional parts of the structures (ambiguities only on the atom / edge level).

2) Find instantiations of molecules in a .smi file using the LAST descriptors we just mined:

  /path/to/last-utils/last-utils.rb 2 /path/to/cpdbdata/salmonella_mutagenicity/salmonella_mutagenicity_alt.smi < salm-last.smarts > salm-last.inst

= NOTE:

For a precise synopsis and further information, run last-utils.rb without arguments.

= TRANSFORMATION TO SMARTS:

SMARTS are regular expressions for chemical fragments. The variant used here (LAST-SMARTS) is generated recursively by a depth-first traversal of each LAST graph, starting at node 0. Atoms are represented by their atomic number, e.g. '#6' for carbon. Bonds are represented by their order (1-3). For an introduction to SMARTS, see e.g. http://www.daylight.com/meetings/summerschool01/course/basics/smarts.html.

For every node visited, we demand an explicit but arbitrary branch with '(~*)' IF AND ONLY IF there are n optional branches with n>1 (*).
In case of (*), for each branch bi, i in [1,n], we describe the 1-step ('local') neighborhood of the node by a recursive SMARTS pattern, including the node itself, the predecessor, bi, and the successor. The pattern bi is itself a LAST-SMARTS, and the local neighborhoods are combined via disjunction (there must be at least two due to (*)).

Formally, LAST-SMARTS are defined as follows (EBNF):

  AN := '17' | '35' | '5' | '6' | '7' | '8' | '15' | '16' | '9' | '53'
  A  := (AN ',' A) | AN
  SB := '-' | '=' | '#' | ':'
  E  := (SB ',' E) | SB
  N  := '[#' A ']'
  LR := (L ',' LR) | L
  L  := N '(' E LS ')' ('(' E N ')')+
  BN := '[#' A ';$' '(' LR ')' ']' '(~*)'+
  LS := (N | BN) | LS E (N | BN)

Example:

  [#7][#6;$([#6]([#7])([#7])=[#6]),$([#6]([#7][#6])([#7])=[#6])](~*)=[#6]

This denotes a nitrogen connected to a carbon that is double-bonded to another carbon. The middle carbon's local environment is described recursively. It consists of the backward and forward links, and additionally specifies either a nitrogen branch or a nitrogen/carbon branch. Since the standard is "truly recursive", i.e. nothing inside the $(...) is identified with the outside, we need (~*) to enforce that at least one additional branch is actually attached.

Andreas Maunz, 2010
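
The following is a minimal sketch of how the example pattern above could be put together from its parts. It is not taken from last-utils.rb: the helper names (atom, local_env, branching_node) are hypothetical, the code covers only the single-atom, single-bond case of the example, and it is plain string assembly that needs no OpenBabel bindings.

```ruby
# Hypothetical helpers for illustration only; none of these names
# come from last-utils.rb.

# Atom pattern for a single atomic number, e.g. atom(6) -> "[#6]".
def atom(atomic_number)
  format("[#%d]", atomic_number)
end

# Local (1-step) neighborhood of a branching node:
# the node itself, one optional branch, the predecessor, and the successor.
def local_env(node, branch, pred, succ_bond, succ)
  atom(node) + "(" + branch + ")" + "(" + atom(pred) + ")" + succ_bond + atom(succ)
end

# Branching node: the local neighborhoods of all optional branches are
# combined by disjunction (',') inside a recursive SMARTS '$(...)', and
# '(~*)' demands that some additional branch is actually attached.
def branching_node(node, branches, pred, succ_bond, succ)
  # Assumption: mirrors condition (*), i.e. n > 1 optional branches.
  raise ArgumentError, "need at least two optional branches" if branches.size < 2
  envs = branches.map { |b| "$(" + local_env(node, b, pred, succ_bond, succ) + ")" }
  format("[#%d;%s](~*)", node, envs.join(","))
end

# Rebuild the example: N, then C with an optional N or N-C branch, then =C.
branches = [atom(7), atom(7) + atom(6)]
pattern = atom(7) + branching_node(6, branches, 7, "=", 6) + "=" + atom(6)
puts pattern
```

The disjunction lives inside the bracket of the branching atom, so each '$(...)' alternative is matched on its own starting from that atom. This is why the trailing '(~*)' is appended outside the bracket.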