We currently support the following learning systems:
- Aleph
- SWI Prolog port of Aleph
- DL-Learner
- the GILPS tools, i.e. FuncLog, ProGolem and TopLog
- Golem
- Progol
The benchmarking framework expects each tool to have its own folder in the learningsystems directory, named after the tool's identifier (ideally all lowercase, without whitespace). The currently available learning systems can be found there:
$ ls learningsystems/
aleph dllearner funclog golem progol progolem README.md toplog
To add a new tool, create a new directory under learningsystems named after your tool identifier, e.g. mytool. Inside the mytool directory there should be at least two executable files named run and validate. In addition, a file system.ini should be provided that specifies the language of the knowledge base and input examples. The purpose of the run executable is to run your inductive learning tool and write the learned hypotheses to a file. Currently, the expected parameters are:
./run <config_file>
The config file specifies where the tool should store its output and which example files it should read.
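The layout requirements above can be checked mechanically before a new tool is registered. The following is a minimal sketch, not part of the framework; the function name check_tool_dir and the wording of the messages are assumptions made for illustration.

```python
import os
from pathlib import Path


def check_tool_dir(tool_dir):
    """Return a list of problems found in a learning system directory.

    Hypothetical helper: it only checks the layout described above
    (executables run and validate, plus a system.ini file).
    """
    tool_dir = Path(tool_dir)
    problems = []
    # run and validate must exist and be executable
    for name in ("run", "validate"):
        path = tool_dir / name
        if not path.is_file():
            problems.append(f"missing file: {name}")
        elif not os.access(path, os.X_OK):
            problems.append(f"not executable: {name}")
    # system.ini specifies the language of the knowledge base and examples
    if not (tool_dir / "system.ini").is_file():
        problems.append("missing file: system.ini")
    return problems
```

Calling check_tool_dir("learningsystems/mytool") returns an empty list if the directory satisfies all three requirements.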
An example of the content of the output file generated by the Golem ILP tool would be:
active(A) :- carbon_5_aromatic_ring(A,[B,C,D,E,F]).
active(A) :- hetero_aromatic_5_ring(A,[B,C,D,E,F]), nitro(A,[F,G,H,I]).
active(A) :- nitro(A,[B,C,D,E]), phenanthrene(A,[[F,G,H,I,J,K],[L,M,N,O,P,Q],[R,S,T,U,V,W]]), bond(A,I,B,7), bond(A,X,I,7).
There is currently no fixed specification for how the learned results should be stored. However, the content of this file must be processable by the second executable, validate, which should be called as follows:
./validate <config_file>
The config file specifies where the tool should store its output, and which results file and example files it should read.
This executable reads the results, loads the background knowledge of the learning task, and checks how many of the positive and negative examples of the learning problem are covered. When learning on OWL knowledge bases, this means using an OWL reasoner to run instance checks on the learned DL concepts. For Prolog-based background knowledge, a Prolog interpreter has to be run to check how many of the positive and negative examples are covered. The output generated by the validate executable should be exactly four lines written to the <validation_output_file>: one line each for the number of true positives, false positives, true negatives and false negatives. An example of the content of <validation_output_file> would be:
tp: 10
fp: 3
tn: 29
fn: 0
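How the four numbers follow from example coverage can be sketched as below. This is an illustration only: it assumes the coverage check (OWL reasoner or Prolog interpreter) has already produced the set of covered examples, and the function names confusion_counts and format_validation_output are hypothetical.

```python
def confusion_counts(covered, positives, negatives):
    """Derive tp/fp/tn/fn from the set of examples covered by the hypothesis."""
    covered = set(covered)
    positives = set(positives)
    negatives = set(negatives)
    tp = len(positives & covered)  # positive examples that are covered
    fp = len(negatives & covered)  # negative examples that are covered
    return {
        "tp": tp,
        "fp": fp,
        "tn": len(negatives) - fp,  # negative examples not covered
        "fn": len(positives) - tp,  # positive examples not covered
    }


def format_validation_output(counts):
    """Render the counts in the four-line format shown above."""
    return "".join(f"{key}: {counts[key]}\n" for key in ("tp", "fp", "tn", "fn"))
```

A validate executable would write the string returned by format_validation_output to the validation output file named in its config file.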
Tool-specific configuration settings are defined per learning problem and should be held in a file named after the tool identifier with the file suffix .conf, e.g. aleph.conf. Such a configuration file should be placed inside the directory of the learning problem it applies to. For example, the tool-specific configuration files of the Prolog-based tools for learning problem 42 of the Mutagenesis learning task can be found here:
$ ls -1 learningtasks/mutagenesis/prolog/lp/42/*.conf
learningtasks/mutagenesis/prolog/lp/42/aleph.conf
learningtasks/mutagenesis/prolog/lp/42/funclog.conf
learningtasks/mutagenesis/prolog/lp/42/golem.conf
learningtasks/mutagenesis/prolog/lp/42/progol.conf
learningtasks/mutagenesis/prolog/lp/42/progolem.conf
learningtasks/mutagenesis/prolog/lp/42/toplog.conf
The framework will combine this file with the config file passed to the tool. The run executable is responsible for actually processing the settings made in such a configuration file.
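The naming convention for tool-specific configuration files can be expressed as a small path helper. This is a sketch that assumes the directory layout seen in the Mutagenesis listing (task, knowledge base language, lp, problem identifier) holds for other learning tasks as well; the function name and parameter names are hypothetical.

```python
from pathlib import Path


def tool_conf_path(task, kb_language, problem_id, tool, root="learningtasks"):
    """Build the expected location of a tool's .conf file for one learning problem.

    Assumption: the layout <root>/<task>/<kb_language>/lp/<problem_id>/<tool>.conf
    is inferred from the Mutagenesis example and is not a documented guarantee.
    """
    return Path(root) / task / kb_language / "lp" / str(problem_id) / f"{tool}.conf"
```

For example, tool_conf_path("mutagenesis", "prolog", 42, "aleph") points at the aleph.conf file from the listing above.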
If a learning task requires tool-specific data, e.g. specific mode declarations, these can be put into a directory named after the tool identifier inside the data directory of the corresponding learning task. An example of Aleph-specific data for the Mutagenesis task can be found here:
$ ls learningtasks/mutagenesis/prolog/data/aleph/
mode.pl
For Prolog-based learning tools, such data files must have the file suffix .pl. For OWL-based learning tools, this should be one of the standard file suffixes for the common serialization formats (.owl, .rdf, .xml, .nt, .ttl, ...).
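A run executable that wants to pick up tool-specific data could filter the tool's data directory by suffix, roughly as follows. This is a sketch under assumptions: the helper name tool_data_files is hypothetical, and the OWL suffix set contains only the suffixes named above, which the text marks as a non-exhaustive list.

```python
from pathlib import Path

PROLOG_SUFFIXES = {".pl"}
# Only the suffixes listed above; other serialization formats may apply.
OWL_SUFFIXES = {".owl", ".rdf", ".xml", ".nt", ".ttl"}


def tool_data_files(data_dir, tool, suffixes):
    """Return the tool-specific data files under <data_dir>/<tool>, sorted by path."""
    tool_dir = Path(data_dir) / tool
    if not tool_dir.is_dir():
        # No tool-specific data for this learning task
        return []
    return sorted(
        p for p in tool_dir.iterdir() if p.is_file() and p.suffix in suffixes
    )
```

With the Mutagenesis example, tool_data_files("learningtasks/mutagenesis/prolog/data", "aleph", PROLOG_SUFFIXES) would return the path to mode.pl.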