Once new knowledge has been generated, it is essential to ensure that the knowledge base K to which it is added remains consistent. To this end, we must not add any statement to K that violates its underlying axioms. The problem is that these axioms are often not stated explicitly in the knowledge bases of the LOD Cloud. We thus devise an approach to derive such axioms from instance data, i.e., we use a statistical analysis of how predicates are used across the knowledge base K. Moreover, we provide means to use RDFS inference to generate new knowledge from the new resources generated by our solution to Problem 2.
Sequentially applying the preceding steps of REX results in a set of triples <s, p, o> that might not be contained in K. Since we assume that K is consistent to begin with and the whole triple generation process up to this point is automatic, we need to ensure that K remains consistent after <s, p, o> is added. To this end, REX provides a data validation interface whose first implementation was based on the DL-Learner framework. Depending on the size of K, using a standard OWL reasoner for consistency checks can be infeasible. Our current implementation therefore applies the following set of rules, which are based on the schema of K, and adds a triple <s1, p, o1> only if all of them hold:
- If a class C is the domain of p, there exists no type D of s1 such that C and D are disjoint.
- If a class C is the range of p, there exists no type D of o1 such that C and D are disjoint.
- If p is declared to be functional, there exists no triple <s1, p, o2> in K such that o1 ≠ o2.
- If p is declared to be inverse functional, there exists no triple <s2, p, o1> in K such that s1 ≠ s2.
- If p is declared to be asymmetric, there exists no triple <o1, p, s1> in K.
- If p is declared to be irreflexive, it holds that s1 ≠ o1.
- There exists no negative property assertion axiom for <s1, p, o1> in K.
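Note that this approach is sound but incomplete: every violation it reports is a genuine inconsistency with respect to the schema of K, but it does not detect every inconsistency that a full OWL reasoner would find.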
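The rules above can be sketched as a single admissibility test over an in-memory copy of K. This is a minimal illustration under stated assumptions, not the REX implementation: the class `TripleValidatorSketch`, the `Schema` container, and all of its field names are hypothetical. Resources and classes are represented as plain strings, and the sketch assumes Java 16 or later for `record`.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory sketch of the rule-based check; not the REX API.
public class TripleValidatorSketch {

    record Triple(String s, String p, String o) {}

    // Schema of K; every field name here is an assumption made for illustration.
    static class Schema {
        final Map<String, String> domain = new HashMap<>();      // p -> class C
        final Map<String, String> range = new HashMap<>();       // p -> class C
        final Set<String> functional = new HashSet<>();
        final Set<String> inverseFunctional = new HashSet<>();
        final Set<String> asymmetric = new HashSet<>();
        final Set<String> irreflexive = new HashSet<>();
        final Set<Set<String>> disjointPairs = new HashSet<>();  // unordered pairs {C, D}
        final Map<String, Set<String>> types = new HashMap<>();  // resource -> asserted types
        final Set<Triple> negativeAssertions = new HashSet<>();

        // True if some asserted type D of the resource is disjoint with class c.
        boolean hasTypeDisjointWith(String resource, String c) {
            for (String d : types.getOrDefault(resource, Set.of())) {
                if (!d.equals(c) && disjointPairs.contains(Set.of(c, d))) return true;
            }
            return false;
        }
    }

    // Returns true only if none of the seven rules rejects the candidate triple.
    static boolean isAdmissible(Triple t, Set<Triple> k, Schema schema) {
        String p = t.p();
        String dom = schema.domain.get(p);
        if (dom != null && schema.hasTypeDisjointWith(t.s(), dom)) return false;
        String ran = schema.range.get(p);
        if (ran != null && schema.hasTypeDisjointWith(t.o(), ran)) return false;
        if (schema.irreflexive.contains(p) && t.s().equals(t.o())) return false;
        if (schema.negativeAssertions.contains(t)) return false;
        for (Triple x : k) {
            if (!x.p().equals(p)) continue;
            boolean sameS = x.s().equals(t.s());
            boolean sameO = x.o().equals(t.o());
            if (schema.functional.contains(p) && sameS && !sameO) return false;
            if (schema.inverseFunctional.contains(p) && sameO && !sameS) return false;
            if (schema.asymmetric.contains(p)
                    && x.s().equals(t.o()) && x.o().equals(t.s())) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        Schema schema = new Schema();
        schema.functional.add("birthPlace");
        schema.domain.put("birthPlace", "Person");
        schema.disjointPairs.add(Set.of("Person", "City"));
        schema.types.put("Paris", Set.of("City"));

        Set<Triple> k = new HashSet<>();
        k.add(new Triple("alice", "birthPlace", "Paris"));

        // Accepted: no rule is violated.
        System.out.println(isAdmissible(new Triple("bob", "birthPlace", "Rome"), k, schema));
        // Rejected: birthPlace is functional and alice already has a different value.
        System.out.println(isAdmissible(new Triple("alice", "birthPlace", "Rome"), k, schema));
        // Rejected: the subject is typed City, which is disjoint with the domain Person.
        System.out.println(isAdmissible(new Triple("Paris", "birthPlace", "Rome"), k, schema));
    }
}
```

The linear scan over K keeps the sketch short. Against a SPARQL endpoint, each rule would more plausibly be issued as a targeted query for the triples that share the subject or object of the candidate.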
Although an increasing number of RDF knowledge bases are published, many of them consist primarily of instance data and lack sophisticated schemata. To support the application of the rules defined above, we apply the lightweight and efficient schema creation approach of the DL-Learner framework. In particular, we check for:
- object/data property domain (:p rdfs:domain :dom)
- object/data property range (:p rdfs:range :ran)
- functionality of object/data properties (:p a owl:FunctionalProperty)
- inverse functionality of object properties (:p a owl:InverseFunctionalProperty)
- asymmetry of object properties (:p a owl:AsymmetricProperty)
- irreflexivity of object properties (:p a owl:IrreflexiveProperty)
- disjointness between the classes asserted for the subjects/objects of the triples and the domain/range of the triples' predicate
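To illustrate how one of these axioms might be suggested from instance data alone, the following sketch scores the functionality of a predicate by its usage in K. The scoring function and the threshold are assumptions made for illustration. The text does not specify how DL-Learner scores candidate axioms, and the names `FunctionalityScoreSketch`, `functionalityScore`, and `suggestFunctional` are hypothetical.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of a usage-based functionality check; not DL-Learner's scoring.
public class FunctionalityScoreSketch {

    record Triple(String s, String p, String o) {}

    // Fraction of the subjects using p that have exactly one distinct object for p.
    static double functionalityScore(Collection<Triple> k, String p) {
        Map<String, Set<String>> objectsBySubject = new HashMap<>();
        for (Triple t : k) {
            if (t.p().equals(p)) {
                objectsBySubject.computeIfAbsent(t.s(), x -> new HashSet<>()).add(t.o());
            }
        }
        if (objectsBySubject.isEmpty()) return 0.0;
        long single = objectsBySubject.values().stream()
                .filter(objects -> objects.size() == 1)
                .count();
        return (double) single / objectsBySubject.size();
    }

    // Predicates whose score reaches the (assumed) threshold are proposed as functional.
    static Set<String> suggestFunctional(Collection<Triple> k, double threshold) {
        Set<String> predicates = new TreeSet<>();
        for (Triple t : k) predicates.add(t.p());
        Set<String> suggested = new TreeSet<>();
        for (String p : predicates) {
            if (functionalityScore(k, p) >= threshold) suggested.add(p);
        }
        return suggested;
    }

    public static void main(String[] args) {
        List<Triple> k = List.of(
                new Triple("alice", "birthPlace", "Paris"),
                new Triple("bob", "birthPlace", "Rome"),
                new Triple("alice", "knows", "bob"),
                new Triple("alice", "knows", "carol"));
        // birthPlace scores 1.0 and knows scores 0.0, so only birthPlace is suggested.
        System.out.println(suggestFunctional(k, 0.9));
    }
}
```

A threshold below 1.0 tolerates some noise in the instance data, at the price of occasionally proposing an axiom that does not actually hold. The analogous check for inverse functionality would group subjects by object instead.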
The validation interface can then be used as follows:

    ...
    // the triples generated so far
    Set<Triple> triples = ...;
    // instantiate consistency checker
    ConsistencyChecker checker = new ConsistencyCheckerImpl(endpoint);
    // filter out triples that probably lead to inconsistency
    triples = checker.getConsistentTriples(triples);