Proposed enhancement - iterative rule evaluation #76

Closed
sa-bpelakh opened this issue May 25, 2021 · 7 comments · Fixed by #77

@sa-bpelakh

Currently, rule evaluation is single-pass, so even if triples asserted during evaluation could trigger the execution of additional rules, those triggers don't occur. For example, here is a set of rules which attaches a :hasDepth value to every Concept in a SKOS taxonomy, indicating its distance from the ConceptScheme:

:TopConceptRule
	a sh:NodeShape ;
	sh:property [
		sh:path skos:topConceptOf ;
		sh:minCount 1 ;
	] .

:DepthRule
	a sh:NodeShape ;
	sh:targetClass skos:Concept ;
	sh:rule [
		a sh:SPARQLRule ;
		sh:prefixes skos:, : ;
		sh:order 1 ;
		sh:condition :TopConceptRule ;
		sh:construct """
			CONSTRUCT {
				$this :hasDepth 0 .
			}
			WHERE {
			}
			""" ;
	] ;
	sh:rule [
		a sh:SPARQLRule ;
		sh:prefixes skos:, : ;
		sh:order 2 ;
		sh:construct """
			CONSTRUCT {
				$this :hasDepth ?plusOne .
			}
			WHERE {
				$this skos:broader ?parent .
				?parent :hasDepth ?depth .
				BIND(?depth + 1 AS ?plusOne)
			}
			""" ;
	] ;
.

Without iterative evaluation of rules, no concepts beyond the direct skos:narrower children of the top concepts are guaranteed to receive a depth.

The proposal is to track whether new triples were added to the data graph during rule execution, and if so, iterate through the rules again until a stable state is reached. This is the default functionality of the TopBraid rule engine.
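
As a rough illustration of the proposed behaviour, the fixpoint loop might look like the following (a minimal sketch in plain rdflib terms; the rules-as-callables shape and the function name are hypothetical, not PySHACL's actual API):

from rdflib import Graph

def apply_rules_until_stable(data_graph: Graph, rules, max_passes: int = 100):
    """Re-run every rule until a full pass adds no new triples.

    `rules` is assumed here to be an iterable of callables, each taking
    the data graph and returning an iterable of constructed triples
    (a hypothetical shape, not PySHACL's real internals).
    """
    for _ in range(max_passes):
        added = 0
        for rule in rules:
            for triple in rule(data_graph):
                if triple not in data_graph:  # only count genuinely new triples
                    data_graph.add(triple)
                    added += 1
        if added == 0:  # stable state: a full pass asserted nothing new
            return data_graph
    raise RuntimeError("rule evaluation did not reach a stable state")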

@sa-bpelakh
Author

@ashleysommer I will be submitting a PR for this in the next couple of days. Do you feel this needs a flag in validate(), and if so, what should its default value be?

@ashleysommer
Collaborator

Hi @sa-bpelakh
That's a good question regarding the default value. I can see pros and cons for having this behavior be the default. E.g. it's good to align with the TopBraid SHACL engine, but having this feature enabled by default has the potential to increase execution time substantially in some use cases.

There are also different ways this could be implemented. E.g. iterative rule evaluation could iteratively execute the ordered sh:rule entries from a single NodeShape until a stable state is found, or it could iteratively execute all applicable sh:rule constraints from the shapes graph until a stable state is found. Or a combination of both.

Additionally, there would need to be mitigations in place to protect against infinite loops, and potentially infinite-depth recursion.

@ashleysommer
Collaborator

I've been doing some experimentation and testing on my local copy of PySHACL to see how hard this will be for you to implement.
I've come across one problem: it seems it's not possible in rdflib, when running a CONSTRUCT query, to get back a count of triples added. E.g. you could iteratively execute a CONSTRUCT query, but you wouldn't know whether any new triples were being added to the graph, so you wouldn't know when you've reached a stable state.

You could count the triples in the graph before the CONSTRUCT, then count again after executing, but that is generally a bad idea. If the graph is not an in-memory graph, or if it contains billions of triples, rdflib can blow out the system memory or crash entirely when trying to count the triples in the graph.

@ashleysommer
Collaborator

ashleysommer commented May 25, 2021

My previous comment is partially incorrect. After doing some more research, I found that the RDFLib CONSTRUCT query does give access to the new triples created (before they are mixed into the data_graph).

However, we'd still need to check whether every triple in the new constructed graph is already in the data_graph (before adding it) to determine whether we're in a steady state.
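
A minimal sketch of that membership check using plain rdflib (the run_construct_once helper is hypothetical; PR #77 contains the real implementation):

from rdflib import Graph

def run_construct_once(data_graph: Graph, construct_query: str) -> int:
    """Execute one CONSTRUCT query and merge its results into the data
    graph, returning how many of the constructed triples were genuinely new."""
    result = data_graph.query(construct_query)  # iterating a CONSTRUCT result yields triples
    new_count = 0
    for triple in result:
        if triple not in data_graph:  # the steady-state test described above
            data_graph.add(triple)
            new_count += 1
    return new_count  # 0 means this query has reached a steady state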

ashleysommer added a commit that referenced this issue May 25, 2021
…ly execute until the data_graph reaches a steady state.

This works on SPARQL Rules, Triple Rules, and JS Rules.
Closes #76
@ashleysommer
Collaborator

ashleysommer commented May 25, 2021

@sa-bpelakh
I'm sorry to step on your toes here, but given my extensive knowledge of the codebase, I was able to knock out a simple implementation of this feature pretty quickly.
I've created a PR here: #77
You're still welcome to undertake an implementation of this yourself; otherwise, please look over the code in #77 to see if it will do the job for you.
My implementation works on SPARQLRules, TripleRules, and JSRules. It iterates at the sh:rule level until it reaches a steady state, and again at the NodeShape level, iterating each rule again until the whole shape reaches a steady state. Each loop has a maximum of 100 iterations before it throws an error, so we don't hit any infinite loops.
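
In outline, the two nested loops might look like this (a simplified sketch; rule.apply returning a count of newly added triples is an assumption of mine, see #77 for the actual code):

MAX_RULE_ITERATIONS = 100  # guard against infinite loops, as in PR #77

def execute_shape_rules(data_graph, shape_rules):
    """Iterate each sh:rule to a steady state, then repeat the whole
    ordered rule list until the NodeShape as a whole is stable."""
    for _ in range(MAX_RULE_ITERATIONS):          # NodeShape-level loop
        added_this_pass = 0
        for rule in shape_rules:                  # assumed sorted by sh:order
            for _ in range(MAX_RULE_ITERATIONS):  # sh:rule-level loop
                added = rule.apply(data_graph)    # hypothetical API: count of new triples
                added_this_pass += added
                if added == 0:                    # this rule is stable
                    break
            else:
                raise RuntimeError("sh:rule did not stabilise within 100 iterations")
        if added_this_pass == 0:                  # the whole shape is stable
            return
    raise RuntimeError("NodeShape did not stabilise within 100 iterations")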

@sa-bpelakh
Author

@ashleysommer It's all good; it was interesting to compare your implementation to the one I had working. I will leave a comment on the PR.

ashleysommer added a commit that referenced this issue May 26, 2021
Add the ability for SHACL rules to operate iteratively.
Closes #76
@ashleysommer
Collaborator

Update, almost a year later: I realized my testing of this issue was not quite right.
The test in /test/issues/076.py has some mistakes that produced false conformance results when testing.

Firstly, the given example used the old IRI for SKOS (http://www.w3.org/2008/05/skos#), which conflicted with the definition of SKOS in RDFLib 6.0+ (which uses the correct one, http://www.w3.org/2004/02/skos/core#).

Secondly, the example uses the "sh:prefixes" feature incorrectly. All prefixes listed on a SPARQLRule using the sh:prefixes feature should have a corresponding declaration in the ontology using sh:declare. That means the skos: entry in sh:prefixes on the SPARQLRule was ignored.

Fixing these two issues uncovered a small inconsistency in my implementation of the feature in #77. I've fixed that, and a new version with the corrected test and corrected feature will be released shortly.

ashleysommer added a commit that referenced this issue Jan 13, 2022
Added more type hinting, to conform to the new type hinting added by RDFLib 6.1.1
Subtle correction in the way `sh:prefixes` works with `sh:declare` on the given named ontology.
Bumped some min versions of libraries, to gain compatibility with Python 3.10
Fixed test for issue #76
Fixed #76 again (after fixing the test)
Bumped Version v0.18.0