Permalink
Switch branches/tags
Nothing to show
Find file Copy path
Fetching contributors…
Cannot retrieve contributors at this time
118 lines (77 sloc) 5.48 KB
Welcome to Fminer.
This is Fminer (frontend application), available with libraries from http://github.com/amaunz/fminer2/tree/master .
For installation and documentation see INSTALL.
For license information see LICENSE.
FILE FORMATS
============
Graphs are allowed in SMILES and gSpan format. This is to support both general graph mining, but to also make life more convenient when mining molecular data. For instance, for SMILES formatted input there is a switch to enable/disable aromatic ring perception, which, in gSpan format, requires re-formatting the input.
SMILES file line format is defined by
"ID\t SMILES", e.g.:
1 COC1=CC=C(C=C1)C2=NC(=C([NH]2)C3=CC=CC=C3)C4=CC=CC=C4
gSpan file block format (see also: http://illimine.cs.uiuc.edu/resources/readme_plain.php?program=gSpan) is defined by:
"t # ID" means the Nth graph,
"v M L" means that the Mth vertex in this graph has label L,
"e P Q L" means that there is an edge connecting the Pth vertex with the Qth vertex. The edge has label L.
Activity file line format is defined by the id, endpoint description, and target class. You can use maximally five different classes. Instead of classes, you can also provide numeric values. Target class or numeric value is collectively referred to as 'activity'. In case of numeric values, use the -g switch to enable regression (no automatic detection).
"ID\t endpoint\t activity", e.g.:
1 Salmonella Mutagenicity 1
The set of IDs in a gSpan or SMILES format file must be a subset of the set of activity IDs in the respective activity format file, i.e., not every Activity ID must be matched by a graph id, but vice versa.
OPTIONS
=======
General Options:
-f --minfreq _minfreq_ Set minimum frequency. Allowable values for _minfreq_: 1, 2, ... (default: 2).
-a --aromaticity Switch off aromatic ring perception when using smiles input format (default: on).
-o --no-output Switch off output (default: on).
-g --regression Switch on regression (default: off).
BBRC Mining exclusive options:
-l --level _level_ Set fragment type. Allowable values for _type_: 1 (paths) and 2 (trees) (default: 2).
-s --refine-singles Switch on refinement of fragments with frequency 1 (default: off).
-d --no-dynamic-ub Switch off dynamic adjustment of upper bound for backbone mining (default: on).
-b --no-bbr-classes Switch off mining for backbone refinement classes (default: on).
Upper bound pruning options (for performance benchmarking):
-u --no-upper-bound-pruning Switch off upper bound pruning (default: on).
-p --p-value _p_value_ Set p-value for chi-square significance. Allowable values for _p_value_: 0 <= _p_value_ <= 1.0 (default: 0.95).
LAST-PM exclusive options:
-m --max-hops Set maximum nr of hops, i.e. threshold for nr of ground features per patch. Note: There may occur latent structures with more hops (default: 25).
USAGE
=====
General Usage:
Usage 1: fminer <Library> <Options> <Graphs> <Activities>
Usage 2: fminer <Library> <Options> <Graphs>
File formats:
<Library> Plug-in library to use (/path/to/libbbrc.so or /path/to/liblast.so).
<Graphs> File should have suffix .smi or .gsp, indicating SMILES or gSpan format.
<Activities> File must be in Activity format (suffix not relevant).
Usage with LibBBRC:
Options for Usage 1 (BBRC mining using dynamic upper bound pruning):
[-f minfreq] [-l type] [-s] [-a] [-o] [-g] [-d [-b [-u]]] [-p p_value]
Options for Usage 2 (Frequent subgraph mining):
[-f minfreq] [-l type] [-s] [-a] [-o] [-n]
Usage with LibLAST:
Options for Usage 1 (LAtent STructure-Pattern Mining):
[-f minfreq] [-m maxhops] [-a] [-o] [-g] <graphs> <activities>
ENVIRONMENT VARIABLES (ONLY FOR BBRC MINING)
============================================
Affects the output. By default, output consists of blocks of gSpan graphs.
- FMINER_SMARTS : Produce output in SMARTS format (e.g. export FMINER_SMARTS=1).
- FMINER_LAZAR : Produce output in linfrag format which can be used as input to Lazar (e.g. export FMINER_LAZAR=1).
- FMINER_PVALUES : Produce p-values instead of chi-square values (e.g. export FMINER_PVALUES=1).
- FMINER_NO_AROMATIC_WC : Disallow aromatic wildcard bonds on aliphatic bonds, when aromatic perception was switched off ('-a') (e.g. export FMINER_NO_AROMATIC_WC=1).
- FMINER_SILENT : Redirect STDERR (debug output) of fminer to local file 'fminer-[bbrc|last]-debug.txt'
- FMINER_NR_HITS : Display number of times each fragment occurs in the output (only when FMINER_LAZAR is not set).
EXAMPLES
========
In any case, 1-frequent patterns are not refined further, unless the -s switch is used.
BBRC mining:
# BBRC representatives (min frequency 2, min significance 95%), using dynamic UB pruning
./fminer ../libbbrc/libbbrc.so <graphs> <activities>
# same as above, but much slower (explicitly no dynamic UB pruning)
./fminer ../libbbrc/libbbrc.so -d <graphs> <activities>
# ALL 2-frequent and 95%-significant subgraphs (frequent and correlated subgraph mining)
./fminer ../libbbrc/libbbrc.so -d -b <graphs> <activities>
# All 20-frequent patterns (frequent subgraph mining)
./fminer ../libbbrc/libbbrc.so -f 20 <graphs>
LAST-PM:
# LAST-PM descriptors (min frequency 5, max hops 20):
./fminer ../liblast/liblast.so -f 5 -m 20 <graphs> <activities>
(c) 2010 Andreas Maunz, feb 2010