Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RYA-292 Added owl:intersectionOf inference. #206

Closed

Conversation

ejwhite922
Copy link
Contributor

@ejwhite922 ejwhite922 commented Aug 15, 2017

Description

Inference applies owl:intersectionOf semantics for queries including statement patterns of the form "?x rdf:type :DefinedClass".

The owl:intersectionOf property links a class to a list of class descriptions. An owl:intersectionOf statement describes a class for which the class extension contains precisely those individuals that are members of the class extension of all class descriptions in the list.

The InferenceEngine, at refresh time, stores information about owl:intersectionOf. It stores a mapping of each type that has an intersection to its list of intersection sets (since each type can have multiple intersections) These mapped definitions can then be accessed by the method: getIntersectionsImplying(Resource). Also, :A owl:intersectionOf(:B, :C) implies that :A subClassOf :B and :A subClassOf :C so members of the intersection are added to the subClassOf graph.

IntersectionOfVisitor processes statement patterns of the form "?x rdf:type :T2", and if :T2 is the value type for any owl:intersectionOf restriction according to the inference engine, it replaces the statement pattern with a union: of 1) that same statement pattern (in case the type is explicitly asserted or can be inferred by some other rule); and 2) a nested join tree of all the combined intersections for that type.

RdfCloudTripleStoreConnection calls the visitor along with the other inference logic. Because the original statement pattern is preserved as one branch of the union, other visitors can still apply if there are other ways to derive the type.

Added a simple example of a query that relies on this inference to MongoRyaDirectExample.

Tests

Unit tests

Links

Jira

Checklist

  • Code Review
  • Squash Commits

People To Review

@jessehatfield
@meiercaleb
@isper3at
@pujav65

@asfgit
Copy link

asfgit commented Aug 15, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/incubator-rya-master-with-optionals-pull-requests/392/

if (predVar != null && objVar != null && objVar.getValue() != null && RDF.TYPE.equals(predVar.getValue()) && !EXPANDED.equals(conVar)) {
final List<Set<Resource>> intersections = inferenceEngine.getIntersectionsImplying((URI) objVar.getValue());
if (intersections != null && !intersections.isEmpty()) {
final Set<Resource> combinedIntersections = new TreeSet<>(new ResourceComparator());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there are multiple intersections associated with the same type, I think we want to treat each intersection as a sufficient condition for inferring the type, and union their separate join trees together, rather than combining all the types into one join tree. E.g. if we have:

A intersectionOf (B C)
and
A intersectionOf (D E F)

then I think the right interpretation is: A == (B intersect C) == (D intersect E intersect F). So an individual x is a member of A if: (x is an A) OR (x is a B and a C) OR (x is a D and an E and an F). So the query tree would be something like:

  • Union
    • Union
      • Join
        • (x is a B)
        • (x is a C)
      • Join
        • Join
          • (x is a D)
          • (x is an E)
        • (x is an F)
    • (x is an A)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Created a union tree of joins for each intersection.

// TODO refactor this. Wish I could execute sparql
try {
while (iter.hasNext()){
final Statement st = iter.next();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This block is still slightly offset

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

null, conf);
final List<URI> properties = new ArrayList<URI>();
URI previousBNode = propertyChainPropertiesToBNodes.get(propertyChainProperty);
if (iter2.hasNext()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This block is still slightly offset

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

}
}
else {
currentPropValue = null;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line is offset w.r.t. enclosing block

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

addSubClassOf(type, other);
for (final Set<Resource> intersection : intersectionList) {
if (!intersection.contains(other)) {
addIntersection(intersection, other);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Upon thinking about this, I wonder if it might be better to handle multiple intersections via the subclass relations we're adding, rather than by explicitly connecting each type/intersection pair like this. That is:

  1. If A equals (the intersection of B and C) and A equals (the intersection of D and E), then we want to make sure (B and C) doesn't just imply A but also implies D and E. (And, equivalently, that (D and E) implies both B and C.) This inner for loop applies this rule.

  2. It's also true that if A is a subclass of A2, and A equals (the intersection of B and C), then anything belonging to (B and C) is implied to belong to A2. (Because it belongs to A, and anything belonging to a subclass also belongs to its superclasses.)

Since we're explicitly adding the facts (A subClassOf D), (A subClassOf E), etc., then rule 2 is just a more general version of rule 1. So an alternative approach would be to replace the innermost addIntersection loop with an implementation of this second rule, which would have the benefit of being more powerful in cases where the intersection type is a subclass of some other type. (e.g. [ intersectionOf (X Y) ] subClassOf Z or more likely Z equivalentClass [ intersectionOf (X Y) ] ).

This could be done either at query time (in getIntersectionsImplying(type), get the subclasses of the type and include any of their intersections as well); or here at refresh time (after looping over the intersections and adding subclass edges, loop through again: for each (type, intersection list) pair, and for each superclass of that type, add the subclass's intersections to the superclass).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Including the intersection/subClassOf relationships within the graph would have interesting implications if an intersection that is equivalent to multiple classes: if statement A = (B and C) = D subClassOf E, then storing (B and C) subClassOf E, would also allow us to conclude that A subClassOf E. This is just a transitive implication of what you suggested above Jesse.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if that's the same. I can read "A = (B and C)" two ways -- either "A is the intersection of B and C":

<A> owl:intersectionOf (<B> <C>) .
<D> owl:intersectionOf (<B> <C>) .
<D> rdfs:subClassOf <E> .

or "A is equivalent to some anonymous class which is the intersection of B and C":

<A> owl:equivalentClass _:bnode
_:bnode owl:intersectionOf (<B> <C>) .
_:bnode owl:equivalentClass <D> .
<D> rdfs:subClassOf <E> .

In the latter case, we have A subClassOf E just from subclass and equivalent class transitivity, without touching the intersection.

In the former, I suppose you could infer that A and D are equivalent classes by comparing their intersections. That would require comparing the actual contents of all intersection lists to find any pairs that contain the same set of classes. That might be something to do in the future, but I don't think I'd try to do it at this point since we're not making that kind of check with any of the other class expressions (unions, property restrictions, etc).

addIntersection(intersection, type);
}
}
intersections.clear();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't this throw out the information that was just added in the preceding loop via addIntersection?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, good catch. I moved it above the loop.

* @return the {@link Set} of {@link URI} types that {@code type} is
* considered a subclassOf. Returns an empty set if nothing was found.
*/
public Set<URI> getSubClassOfTypes(final URI type) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found both the name and description of this method slightly confusing -- wouldn't "superclasses", "parent classes", "parent types", or something be clearer than "types that the type is a subclass of"? (Admittedly it matches "subClassOfGraph" but at the logical level it seems strangely indirect.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, I was just trying to indicate that it was referencing the subClassOfGraph. I renamed it to getSuperClasses to be less confusing.

* @param contexts the context {@link Resource}s to query for.
* @throws QueryEvaluationException
*/
public void queryAll(final Resource subject, final URI predicate, final Value object, final RyaDaoStatementIterHandler ryaDaoStatementIterHandler, final Resource... contexts) throws QueryEvaluationException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to use the org.openrdf.rio.RDFHandler interface (and RDFHandlerBase) instead of a new handler class? That would let someone directly pass in something like an RDFWriter, if that were ever useful for something.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I agree with Jesse here. We should avoid creating interfaces/classes that provide the same functionality as preexisting interfaces/classes. Use the RDFHandler here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to RDFHandler in this class and RDFHandlerBase in the InferenceEngine.

Copy link
Contributor

@meiercaleb meiercaleb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logic seems okay. There are a lot of inconsequential formatting changes that make it difficult to determine the new content of this PR. Nothing can really be done about it. But something to be aware of going forward. I'll sign off on this as soon as you Jesse come to an agreement on whether to include the intersection/superClass pairs.

* @param contexts the context {@link Resource}s to query for.
* @throws QueryEvaluationException
*/
public void queryAll(final Resource subject, final URI predicate, final Value object, final RyaDaoStatementIterHandler ryaDaoStatementIterHandler, final Resource... contexts) throws QueryEvaluationException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I agree with Jesse here. We should avoid creating interfaces/classes that provide the same functionality as preexisting interfaces/classes. Use the RDFHandler here.

final URI predicate = statement != null ? statement.getPredicate() : null;
final Value object = statement != null ? statement.getObject() : null;
final Resource context = statement != null ? statement.getContext() : null;
queryAll(subject, predicate, object, ryaDaoStatementIterHandler, context);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like you want to allow for the possibility of a null statement here (which would result in scanning the entire Rya instance). Given that we don't support such look ups, wouldn't it be cleaner to just require the Statement to be non-null here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are now checkNotNull's calls made on statement.

* @throws QueryEvaluationException
*/
public void queryFirst(final Statement statement, final RyaDaoStatementIterHandler ryaDaoStatementIterHandler) throws QueryEvaluationException {
final Resource subject = statement != null ? statement.getSubject() : null;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

/**
* Handles the statements returned from a {@link RyaDAO} query iterator.
*/
public abstract class RyaDaoStatementIterHandler {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use the RDFHandler interface.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

this.conf = conf;
}

@Override
public void meet(Filter node) throws Exception {
public void meet(final Filter node) throws Exception {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Were any substantive changes made to this file? Seems like you just added final everywhere.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed a warning. The rest was save actions.

@@ -178,11 +183,15 @@ public void refreshGraph() throws InferenceEngineException {

subPropertyOfGraph = graph; //TODO: Should this be synchronized?


refreshIntersectionOf();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome! Thanks for separating out the intersection refresh logic here.

final String indexedElement = st.getObject().stringValue();
log.info(indexedElement);
CloseableIteration<Statement, QueryEvaluationException> iter2 = RyaDAOHelper.query(ryaDAO, vf.createURI(st.getObject().stringValue()), RDF.FIRST,
null, conf);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ugh. I know I am guilty of making these sort of changes as well. But these reformatting changes are making it really tough to follow what substantive changes have been made. Nothing to be done at this point. Had to gripe about it a little tho.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, sorry. It does make it hard. Mostly refreshIntersectionOf() was the biggest change in here.

// _:bnode1 rdf:first <:B> .
// _:bnode1 rdf:rest _:bnode2 .
// _:bnode2 rdf:first <:C> .
// _:bnode2 rdf:rest rdf:nil .
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for documenting this. It really helps clarify the logic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're welcome

// Add this intersection for this type. There may be more
// intersections for this type so each type has a list of
// intersection sets.
intersectionsProp.get(type).add(intersection);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for clarifying the need for the list of sets.

addSubClassOf(type, other);
for (final Set<Resource> intersection : intersectionList) {
if (!intersection.contains(other)) {
addIntersection(intersection, other);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Including the intersection/subClassOf relationships within the graph would have interesting implications if an intersection that is equivalent to multiple classes: if statement A = (B and C) = D subClassOf E, then storing (B and C) subClassOf E, would also allow us to conclude that A subClassOf E. This is just a transitive implication of what you suggested above Jesse.

@ejwhite922 ejwhite922 force-pushed the RYA-292_IntersectionOfInference branch from f1d90f7 to 64fdc98 Compare August 16, 2017 21:57
@asfgit
Copy link

asfgit commented Aug 16, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/incubator-rya-master-with-optionals-pull-requests/402/

@ejwhite922 ejwhite922 force-pushed the RYA-292_IntersectionOfInference branch from 64fdc98 to 8edf039 Compare August 17, 2017 14:19
@ejwhite922 ejwhite922 force-pushed the RYA-292_IntersectionOfInference branch from 8edf039 to ed223f6 Compare August 18, 2017 17:15
@asfgit
Copy link

asfgit commented Aug 18, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/incubator-rya-master-with-optionals-pull-requests/413/

@ejwhite922 ejwhite922 force-pushed the RYA-292_IntersectionOfInference branch from ed223f6 to d10d7aa Compare August 18, 2017 19:13
@asfgit
Copy link

asfgit commented Aug 18, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/incubator-rya-master-with-optionals-pull-requests/414/

// intersectionOf[:B, :C] subclassOf :D
for (final Statement statement : typeStatements) {
final Resource intersectionOfBnode = (Resource) statement.getObject();
addSubClassOf(intersectionOfBnode, superClass);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is quite correct: this bnode (the object of <subject, owl:intersectionOf, object>) is going to be the head of a list, right? I don't think it represents a type of its own.

Is the goal here to let us answer queries for instances of D based on the subclass (A) and in turn the intersection (B and C)? We could do that by directly adding (B and C) to the list of intersections for D, instead of adding to the subclass graph. But if you take that approach, note that you'd want to get transitive superclasses of A (therefore superclasses of D, and so on), so 1) I think you'd want findChildren(subClassOfGraph, URI) instead of getSuperClasses; and 2) since we're still actively modifying the subclass graph a little earlier in this outer loop, I think you'd need to move the propagate-to-superclasses logic out to a second pass over the intersectionsProp entries.

Alternatively, if the getIntersectionsImplying method iterated through the (transitive) subclasses of the queried class, and added their intersections as well, we would get this behavior by doing the processing at query time.

All the other changes look good to me.

@ejwhite922 ejwhite922 force-pushed the RYA-292_IntersectionOfInference branch from d10d7aa to 18816c1 Compare August 21, 2017 20:39
@asfgit
Copy link

asfgit commented Aug 21, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/incubator-rya-master-with-optionals-pull-requests/420/

otherKeys.remove(type);
for (final Resource otherKey : otherKeys) {
if (intersectionsProp.get(otherKey).contains(intersection)) {
for (final URI superClass : superClasses) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if we get here we've found an equivalence between type and otherKey, not just a subclass relationship (if I'm following correctly, intersectionsProp has the direct definitions, as opposed to the indirect implications we're also going through, so at this point we have type and otherKey having the same definition). If that's right, then instead of looping through the superclasses and adding them explicitly, we could do both "addSubClassOf(otherKey, type) ; addSubClassOf(type, otherKey)", and leave it up to whatever needs the superclass logic whether to traverse the graph recursively for the indirect superclasses.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated this. otherKey and type are now inferred to be equivalentClasses so this changed some of the unit test expected output.

final Resource type = entry.getKey();
final List<Set<Resource>> intersectionList = entry.getValue();
for (final Set<Resource> intersection : intersectionList) {
addIntersection(intersection, type);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this last loop redundant since we're calling addIntersection(intersection, type) in the first pass (line 728)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. The second loop is gone now.

@ejwhite922 ejwhite922 force-pushed the RYA-292_IntersectionOfInference branch 2 times, most recently from a109272 to 8b687cf Compare August 22, 2017 20:11
@asfgit
Copy link

asfgit commented Aug 22, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/incubator-rya-master-with-optionals-pull-requests/424/

@asfgit
Copy link

asfgit commented Aug 22, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/incubator-rya-master-with-optionals-pull-requests/423/

Copy link
Contributor

@jessehatfield jessehatfield left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything looks right to me, should be good to squash+merge.

@ejwhite922 ejwhite922 force-pushed the RYA-292_IntersectionOfInference branch from 8b687cf to e371f3a Compare August 22, 2017 21:59
@asfgit asfgit closed this in e9488ff Aug 23, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants