A subgroup discovery tool that can use ontological domain knowledge (RDF graphs) in the learning process. Subgroup descriptions contain terms from the given domain knowledge and enable potentially better generalizations.

Hedwig

A pattern mining tool that can exploit background knowledge in the form of RDF triples.

Installation

python setup.py install

Example

View all the options:

python -m hedwig --help

Running with default parameters and outputting the rules to a file:

python -m hedwig <path-to-folder-with-domain-rdf-files> <examples-file>.n3 -o rules

Running the included numbers mini-example:

python -m hedwig example/numbers/ontology/ example/numbers/data.n3 --output=rules --adjust=none --leaves --support=0 --beam=1

Simple hierarchy example with CSV data

If you want to use just simple hierarchies of features, you don't need to resort to RDF. Just run hedwig with the --format=csv flag, for example:

python -m hedwig --format=csv tests/data/csv/ontology/ tests/data/csv/Cities_clusters.csv -o rules

Hierarchy files must have the .tsv suffix, with the following structure:

class_1<tab>superclass_1_1; superclass_1_2; ...
class_2<tab>superclass_2_1; superclass_2_2; ...
...

If you provide proper URIs, they will be used. Otherwise, generic URIs will be constructed from the provided class names.
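
A hierarchy file in the layout above can be produced with a few lines of Python. The following is only a minimal sketch: the helper names (`format_hierarchy`, `parse_hierarchy`) and the class names are hypothetical and are not part of hedwig, and the `; ` separator simply mirrors the layout shown above.

```python
def format_hierarchy(hierarchy):
    """Render {class: [superclasses]} as 'class<tab>super_1; super_2' lines."""
    lines = []
    for cls, superclasses in hierarchy.items():
        lines.append(cls + "\t" + "; ".join(superclasses))
    return "\n".join(lines) + "\n"


def parse_hierarchy(text):
    """Read the same layout back into a {class: [superclasses]} dict."""
    hierarchy = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        cls, _, supers = line.partition("\t")
        hierarchy[cls.strip()] = [s.strip() for s in supers.split(";") if s.strip()]
    return hierarchy


# Hypothetical class names, used only for illustration.
hierarchy = {"capital": ["city"], "city": ["settlement", "place"]}
text = format_hierarchy(hierarchy)

# To use it with hedwig, save the text under a name ending in .tsv, e.g.:
# with open("ontology/cities.tsv", "w") as f:
#     f.write(text)
```

The round trip through `parse_hierarchy` is a quick way to check that a hand-written file has one class per line and a tab between the class and its superclasses.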

Data files must have the .csv suffix and the following structure:

example_uri_or_label; attr_uri_1; attr_uri_2; ...
http://example.org/uri_1; 0/1; 0/1; 0/1; 0/1; ...
http://example.org/uri_2; 0/1; 0/1; 0/1; 0/1; ...
...

See the tests/data/csv/ folder for an example input of this type.
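
As an illustration of the data layout described above, the following sketch builds such a file from a dictionary of binary feature vectors. The function name `format_data`, the example URIs, and the attribute names are assumptions made for this example and are not part of hedwig. The first header cell is copied from the placeholder above, and the files in tests/data/csv/ remain the authoritative reference for the exact format.

```python
def format_data(attributes, examples):
    """Render examples as 'uri; 0/1; 0/1; ...' rows under an attribute header.

    attributes: list of attribute URIs or labels (column names)
    examples:   dict mapping an example URI or label to a list of 0/1 values
    """
    rows = ["; ".join(["example_uri_or_label"] + list(attributes))]
    for uri, values in examples.items():
        if len(values) != len(attributes):
            raise ValueError("wrong number of values for " + uri)
        if any(v not in (0, 1) for v in values):
            raise ValueError("values must be 0 or 1 for " + uri)
        rows.append("; ".join([uri] + [str(int(v)) for v in values]))
    return "\n".join(rows) + "\n"


# Hypothetical attributes and examples, used only for illustration.
attributes = ["capital", "coastal"]
examples = {
    "http://example.org/uri_1": [1, 0],
    "http://example.org/uri_2": [0, 1],
}
data = format_data(attributes, examples)

# To use it with hedwig, save the text under a name ending in .csv, e.g.:
# with open("cities.csv", "w") as f:
#     f.write(data)
```

Checking the row length and the 0/1 values before writing catches the most likely mistakes in a hand-assembled data file.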

Note

Please note that this is a research project, and drastic changes are made fairly regularly. Changes are documented in CHANGELOG.md.

Pull requests and issues are welcome.