Description: Fminer library
Homepage: http://cs.maunz.de
Clone URL: git://github.com/amaunz/libfminer.git
name age message
file AUTHORS Fri Sep 18 04:35:48 -0700 2009 ADDED AUTHORS [amaunz]
file Doxyfile Tue Apr 28 04:15:17 -0700 2009 Added custom Doxygen css and Clustermaps code [amaunz]
file INSTALL Fri Sep 18 04:35:48 -0700 2009 ADDED AUTHORS [amaunz]
file LICENSE Fri Mar 06 02:54:02 -0800 2009 Added License [amaunz]
file Mainpage.h Mon Nov 02 06:10:30 -0800 2009 Separator, example scrip [amaunz]
file Makefile Fri Sep 18 04:35:48 -0700 2009 ADDED AUTHORS [amaunz]
file README Tue Apr 28 05:45:03 -0700 2009 README with link [amaunz]
file TestFminer.java Wed Sep 16 04:38:51 -0700 2009 Improved Documentation [amaunz]
file closeleg.cpp Wed Mar 04 06:01:37 -0800 2009 Added License Notice [amaunz]
file closeleg.h Wed Mar 04 06:01:37 -0800 2009 Added License Notice [amaunz]
file constraints.cpp Wed Mar 04 06:01:37 -0800 2009 Added License Notice [amaunz]
file constraints.h Thu Jun 25 05:00:06 -0700 2009 Fixed bug with line nr output [amaunz]
file database.cpp Wed Mar 04 06:01:37 -0800 2009 Added License Notice [amaunz]
file database.h Wed Mar 04 06:01:37 -0800 2009 Added License Notice [amaunz]
file fminer.cpp Mon Aug 10 06:47:53 -0700 2009 Added p-value output capability [amaunz]
file fminer.h Thu Jun 25 00:56:29 -0700 2009 Switched off aromatic ring perception by default [amaunz]
file globals.h Mon Aug 10 06:47:53 -0700 2009 Added p-value output capability [amaunz]
file graphstate.cpp Wed Aug 12 02:28:10 -0700 2009 Fixed p output for result vector usage [amaunz]
file graphstate.h Mon Aug 10 06:47:53 -0700 2009 Added p-value output capability [amaunz]
file jfminer_wrap.i Fri Sep 18 04:35:48 -0700 2009 ADDED AUTHORS [amaunz]
file legoccurrence.cpp Wed Mar 04 06:01:37 -0800 2009 Added License Notice [amaunz]
file legoccurrence.h Wed Mar 04 06:01:37 -0800 2009 Added License Notice [amaunz]
file libfminer.css Tue Apr 28 04:15:17 -0700 2009 Added custom Doxygen css and Clustermaps code [amaunz]
file misc.h Wed Mar 04 06:01:37 -0800 2009 Added License Notice [amaunz]
file path.cpp Mon Nov 02 06:10:30 -0800 2009 Separator, example scrip [amaunz]
file path.h Wed Mar 04 06:01:37 -0800 2009 Added License Notice [amaunz]
file patterntree.cpp Mon Nov 02 06:10:30 -0800 2009 Separator, example scrip [amaunz]
file patterntree.h Wed Mar 04 06:01:37 -0800 2009 Added License Notice [amaunz]
file rfminer_wrap.i Fri Sep 18 04:35:48 -0700 2009 ADDED AUTHORS [amaunz]
file test.cpp Wed Apr 29 03:01:43 -0700 2009 Added test apps [amaunz]
file test.rb Mon Nov 02 06:10:30 -0800 2009 Separator, example scrip [amaunz]
README
Welcome to LibFminer.

This is the Fminer library, available from http://github.com/amaunz/libfminer/tree/master.
The Fminer application that uses this library is available from http://github.com/amaunz/fminer/tree/master.
The official website with documentation is http://www.maunz.de/libfminer-doc .

For installation and documentation see INSTALL.
For license information see LICENSE.

Abstract:
We present a new approach to large-scale graph mining based on so-called backbone refinement classes.
The method efficiently mines tree-shaped subgraph descriptors under minimum frequency and significance constraints, 
using classes of fragments to reduce feature set size and running times.
The classes are defined in terms of fragments sharing a common backbone.
Unlike open or closed fragment mining, which characteristically optimizes occurrences, the method is able to 
optimize structural inter-feature entropy.
In the experiments, the proposed method reduces feature set sizes by >90 % and >30 % compared to complete tree mining 
and open tree mining, respectively.
Evaluation using cross-validation runs shows that the classification accuracy of the reduced feature sets is similar 
to that of the complete set of trees but significantly better than that of open trees.
Compared to open or closed fragment mining, a large part of the search space can be pruned due to an improved 
statistical constraint (dynamic upper bound adjustment); the experiments confirm this with lower running 
times than ordinary (static) upper bound pruning.
Further analysis using large-scale datasets yields insight into important properties of the proposed descriptors, such 
as the dataset coverage and the class size represented by each descriptor. 
A final cross-validation run confirms that the novel descriptors make large training sets feasible that previously 
might have been intractable.

Andreas Maunz, 2008
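
Illustration (not part of the library):
The abstract mentions minimum frequency and significance constraints and ordinary (static) upper bound pruning.
The following is a minimal sketch of how such constraints could interact. It assumes a chi-square test on a 
two-class dataset and the well-known convexity-based upper bound for chi-square; the abstract names neither, so 
both are assumptions. All names (Counts, chi_square, upper_bound, can_prune, is_reported) are hypothetical and do 
not belong to the libfminer API. The dynamic upper bound adjustment described above is not shown.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical summary of a two-class graph database (not a libfminer type).
struct Counts {
    double n;  // number of graphs in the database
    double m;  // number of graphs in the target class (e.g. "active")
};

// Chi-square statistic of the 2x2 contingency table for a fragment that
// occurs in x graphs, y of which belong to the target class.
double chi_square(const Counts& c, double x, double y) {
    assert(0.0 <= y && y <= x && x <= c.n && y <= c.m);
    const double denom = x * (c.n - x) * c.m * (c.n - c.m);
    if (denom <= 0.0) return 0.0;  // degenerate table carries no signal
    const double d = y * c.n - x * c.m;
    return c.n * d * d / denom;
}

// Static upper bound on the chi-square value of any refinement of the
// fragment. A refinement can only lose occurrences, so its counts lie in a
// region whose extreme points are (y, y) and (x - y, 0); because chi-square
// is convex, the larger of the two corner values bounds all refinements.
double upper_bound(const Counts& c, double x, double y) {
    return std::max(chi_square(c, y, y), chi_square(c, x - y, 0.0));
}

// A branch can be abandoned if the fragment is too infrequent (frequency
// never increases under refinement) or if no refinement can become
// significant.
bool can_prune(const Counts& c, double x, double y,
               double min_freq, double min_chi) {
    return x < min_freq || upper_bound(c, x, y) < min_chi;
}

// A fragment is reported if it satisfies both constraints itself.
bool is_reported(const Counts& c, double x, double y,
                 double min_freq, double min_chi) {
    return x >= min_freq && chi_square(c, x, y) >= min_chi;
}
```

For example, with 10 graphs of which 5 are in the target class, a minimum frequency of 2 and a chi-square 
threshold of 3.84, a fragment occurring in 4 graphs (3 in the target class) is not reported itself but is kept 
for refinement, whereas a fragment occurring in 2 graphs (1 in the target class) can be pruned.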