SHACL advanced targeting #2017

Closed
rdstn opened this issue Mar 18, 2020 · 17 comments · Fixed by #2229
Labels
📶 enhancement issue is a new feature or improvement M1 Fixed in milestone 1 📦 SHACL affects the SHACL validator
Milestone

Comments

@rdstn
Contributor

rdstn commented Mar 18, 2020

At the moment, the SHACL targeting system has some limitations. Assume that we have the following data:

<http://example.org/resource/ObjA-1> example:type example:ObjA .
<http://example.org/resource/ObjA-1> example:name "Test Object A1" .
<http://example.org/resource/ObjA-2> example:type example:ObjA .
<http://example.org/resource/ObjA-2> example:name "Test Object A2" .
<http://example.org/resource/ObjB-1> example:type example:ObjB .
<http://example.org/resource/ObjB-1> example:name "Test Object B1" .

We want to be able to target ObjA-1 and ObjA-2.

  • sh:targetNode works for ObjA-1 and ObjA-2, but suppose we want to insert ObjA-3.
  • sh:targetClass is hardcoded to work with rdf:type and we are using example:type instead.
  • sh:targetSubjectsOf for example:type will also select ObjB-1.
  • Implicit class targeting wouldn't work since we are not using rdf:type.

The simple workaround is to force rdf:type to always be the class differentiator. It is possible, however, that some users may not want to do that - for example, they may be using types for two different systems and have to differentiate between them.

There are two potential resolutions I see:

  1. Support SHACL Advanced Features #1912 - SHACL advanced features have both SPARQL-based targets and SPARQL-based target types specified. This looks quite powerful, but might be less performant. However, so long as we are just looking for a predicate-object combination, I don't see it as a major issue, even with large repositories.
  2. Introduce a new target type focus node which can take a custom predicate-object pair. For example:
example:testShape
  a sh:NodeShape ;
  sh:predicateTarget example:type ;
  sh:objectTarget example:ObjA .

This is less powerful, but looks like it will be more performant. There is the added drawback that it is not part of the specification.
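
To make the difference concrete, here is a minimal Python sketch (not ShaclSail code) that treats the example data as plain tuples. The function names and the `res:` shorthand for http://example.org/resource/ are made up for illustration, and `target_predicate_object` only mimics what the proposed, non-standard predicate-object target might select.

```python
# Triples are plain (subject, predicate, object) tuples; prefixed names are
# kept as strings for brevity. "res:" is shorthand for the resource namespace.
DATA = {
    ("res:ObjA-1", "example:type", "example:ObjA"),
    ("res:ObjA-1", "example:name", "Test Object A1"),
    ("res:ObjA-2", "example:type", "example:ObjA"),
    ("res:ObjA-2", "example:name", "Test Object A2"),
    ("res:ObjB-1", "example:type", "example:ObjB"),
    ("res:ObjB-1", "example:name", "Test Object B1"),
}


def target_subjects_of(triples, predicate):
    """Mimics sh:targetSubjectsOf: every subject that uses the predicate."""
    return {s for s, p, _ in triples if p == predicate}


def target_predicate_object(triples, predicate, obj):
    """Hypothetical predicate-object target: subjects with this exact pair."""
    return {s for s, p, o in triples if p == predicate and o == obj}


print(sorted(target_subjects_of(DATA, "example:type")))
print(sorted(target_predicate_object(DATA, "example:type", "example:ObjA")))
```

The first call also returns ObjB-1, which is the over-selection described for sh:targetSubjectsOf; the second returns only ObjA-1 and ObjA-2. A newly inserted ObjA-3 with the same pair would be picked up without changing the shape.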

Any way to have non-standard targeting for the use case would be good.

Are there any other options that I'm missing? Or is SHACL Advanced and/or the extended spec workaround the way to go?

@rdstn
Contributor Author

rdstn commented Mar 18, 2020

@hmottestad, I believe you are also interested in the Advanced features, what's your take on this one - do you think we'll be able to get decent performance for SPARQL targeting?

@hmottestad
Contributor

I believe it will not be possible to get good performance from SPARQL-based targets when there are many targets in the underlying database. SPARQL queries can be arbitrarily complex, and this complexity will likely mean that the query will need to run against the entire database for every transaction.

@hmottestad hmottestad added 📶 enhancement issue is a new feature or improvement 📦 SHACL affects the SHACL validator labels Mar 18, 2020
@hmottestad
Contributor

Related to #1912

@hmottestad
Contributor

As for the option of extending the SHACL specification with our own features, I think that this is in general a bad idea. The issue is that there is a standard for SHACL, and anything we add to that standard will decrease interoperability for our users.

I have seen some use cases for extending the SHACL spec before. Coming from a relational database background I have noticed that SHACL doesn't have any way of specifying a "key" for something. A good example of this is: Every ex:User is unique by their ex:username.

Performance-wise, your solution of being able to specify which predicate represents "rdf:type" should bring no performance penalty.

Your example which uses example:type makes me think that a simple rdfs:subPropertyOf relation should handle that well. There is already a small backwards-chaining reasoner built into the ShaclSail, maybe it can be extended. Do you have an example where using ?pred rdfs:subPropertyOf rdf:type would not be advisable?

@rdstn
Contributor Author

rdstn commented Mar 18, 2020

I can't find a realistic example where it is inadvisable for the purposes of validation. There is a caveat for plain SPARQL, though.

<http://example.org/resource/ObjA-1> example:type example:ObjA .
<http://example.org/resource/ObjA-1> example:name "Test Object A1" .
<http://example.org/resource/ObjA-2> example:type example:ObjA .
<http://example.org/resource/ObjA-2> example:name "Test Object A2" .
<http://example.org/resource/ObjB-1> rdf:type example:ObjB .
<http://example.org/resource/ObjB-1> example:name "Test Object B1" .

Now, with inference, my understanding is that if we select objects by rdf:type, we'll get B-1, but also A-1 and A-2.

So, the subPropertyOf relation should be transaction-specific - inserted at transaction start, removed at transaction end - if we want to avoid that selection collision. There will be a performance penalty associated with that.
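
The collision described above can be reproduced with a small Python sketch. It applies only the single RDFS sub-property rule in one pass and is not how the ShaclSail reasoner is implemented; the helper names and the `res:` shorthand are assumptions for illustration.

```python
RDF_TYPE = "rdf:type"
SUB_PROPERTY_OF = "rdfs:subPropertyOf"

DATA = {
    ("res:ObjA-1", "example:type", "example:ObjA"),
    ("res:ObjA-2", "example:type", "example:ObjA"),
    ("res:ObjB-1", "rdf:type", "example:ObjB"),
}
SCHEMA = {("example:type", SUB_PROPERTY_OF, RDF_TYPE)}


def infer_sub_properties(data, schema):
    """Apply the RDFS rule: (s p o) and (p subPropertyOf q) imply (s q o).

    Single pass only, which is enough for a one-level property hierarchy.
    """
    supers = {}
    for p, rel, q in schema:
        if rel == SUB_PROPERTY_OF:
            supers.setdefault(p, set()).add(q)
    inferred = set(data)
    for s, p, o in data:
        for q in supers.get(p, ()):
            inferred.add((s, q, o))
    return inferred


def typed_subjects(triples):
    """Subjects that have any rdf:type statement."""
    return {s for s, p, _ in triples if p == RDF_TYPE}


print(sorted(typed_subjects(DATA)))
print(sorted(typed_subjects(infer_sub_properties(DATA, SCHEMA))))
```

Without the schema statement only ObjB-1 is selected by rdf:type; with it, ObjA-1 and ObjA-2 are selected as well, which is why the relation would have to be scoped to the validation step.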

@hmottestad
Contributor

hmottestad commented Mar 18, 2020

Yes, that would be true. So if you wanted two shapes, one for example:type and a different one for rdf:type, then this wouldn't work.

The scenario would then be:

_:1 example:type example:ObjA ;
     example:name "Test Object 1" .

_:2 rdf:type example:ObjA ;
     example:name "Test Object 2" .

With example:type rdfs:subPropertyOf rdf:type, a rule for sh:targetClass example:ObjA would match both _:1 and _:2.

@rdstn
Contributor Author

rdstn commented Mar 18, 2020

In theory, yes, it is a problem with validation as well, since we'll then select both _:1 and _:2.

In practice, for the particular use case which I'm working on, there are disambiguation tools which would block the same type example:ObjA from being used with multiple type discriminators - example:type and rdf:type in this case.

@hmottestad
Contributor

Have you tried posting your problem to the SHACL mailing list? https://lists.w3.org/Archives/Public/public-shacl/

Maybe they have some opinion or tips for how this should be validated.

@VladimirAlexiev

In my opinion, one quite often distinguishes node shapes not by rdf:type but by some other field, e.g. role, position, dct:type, or some other business type.
Declaring all of those as subproperties of rdf:type would be wrong.

hmottestad added a commit to HASMAC-AS/rdf4j that referenced this issue May 13, 2020
Signed-off-by: Håvard Ottestad <hmottestad@gmail.com>
hmottestad added a commit to HASMAC-AS/rdf4j that referenced this issue May 13, 2020
Signed-off-by: Håvard Ottestad <hmottestad@gmail.com>
hmottestad added a commit to HASMAC-AS/rdf4j that referenced this issue May 19, 2020
…cate-object-targets

Signed-off-by: Håvard Ottestad <hmottestad@gmail.com>
@hmottestad
Contributor

Here is another option that I figured out:

ex:Shape1
      a sh:NodeShape  ;
      sh:targetNode ex:CustomType;

      sh:property [
             sh:path [sh:inversePath dct:type ];
             sh:property [
               sh:path rdfs:label;
               sh:minCount 1 ;
               sh:maxCount 1 ;
             ]

      ] .

Would fail validation on this example data:

ex:data dct:type ex:CustomType;
    rdfs:label "a", "b".

At the moment this kind of construct is not supported by the ShaclSail because we don't allow nesting sh:property like this.
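
As a rough illustration of how that shape would be evaluated, here is a Python sketch over plain tuples. It is not ShaclSail code: the helper names are invented, and the second resource `ex:other` is added only so that there is a conforming node to compare against.

```python
DATA = {
    ("ex:data", "dct:type", "ex:CustomType"),
    ("ex:data", "rdfs:label", "a"),
    ("ex:data", "rdfs:label", "b"),
    # Not in the original example; added as a conforming counterpart.
    ("ex:other", "dct:type", "ex:CustomType"),
    ("ex:other", "rdfs:label", "only one"),
}


def inverse_values(triples, focus, predicate):
    """Value nodes of [ sh:inversePath predicate ] starting from focus."""
    return {s for s, p, o in triples if p == predicate and o == focus}


def count_values(triples, focus, predicate):
    """Number of distinct values the focus node has for the predicate."""
    return sum(1 for s, p, _ in triples if s == focus and p == predicate)


def violations(triples, target_node):
    """Nodes reached via the inverse path that break minCount 1 / maxCount 1."""
    bad = set()
    for node in inverse_values(triples, target_node, "dct:type"):
        if count_values(triples, node, "rdfs:label") != 1:
            bad.add(node)
    return bad


print(sorted(violations(DATA, "ex:CustomType")))
```

The target node is the type itself, and the inverse path walks back to every resource carrying that dct:type, so ex:data with its two labels is reported while ex:other is not.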

@hmottestad
Contributor

From some discussion on the SHACL mailing list I was told of a shape filter concept that was part of an earlier SHACL draft. https://www.w3.org/TR/2016/WD-shacl-20160814/#filterShape

The issue with that would be that it still requires a target.

@hmottestad
Contributor

This is what filter shapes look like:

ex:ExampleFilteredShape
	a sh:Shape ;
	sh:targetClass ex:Person ;
	sh:filterShape [
		a sh:Shape ; # Optional triple
		sh:property [
			sh:predicate ex:member ;
			sh:hasValue ex:W3c ;
		]
	] ;
	sh:property [
		sh:predicate ex:email ;
		sh:minCount 1 ;
	] .

I was thinking of using this concept, but combined with the SHACL Advanced custom targeting system.


ex:ExampleFilteredShape
	a sh:Shape ;
	sh:target [
		a sh:TargetShape ; 
		sh:property [
			sh:path dct:type;
			sh:hasValue ex:CustomType ;
		]
	] ;
	sh:property [
		sh:path rdfs:label;
		sh:minCount 1 ;
		sh:maxCount 1 ;
	] .

And for multiple types

ex:ExampleFilteredShape
	a sh:Shape ;
	sh:target [
		a sh:TargetShape ; 
		sh:property [
			sh:path dct:type;
			sh:in ( ex:CustomType1 ex:CustomType2 ) ;
		]
	] ;
	sh:property [
		sh:path rdfs:label;
		sh:minCount 1 ;
		sh:maxCount 1 ;
	] .

And we could even support aggregation.

ex:EveryoneWhoKnowsThreePeopleMustKnowSteve
	a sh:Shape ;
	sh:target [
		a sh:TargetShape ; 
		sh:property [
			sh:path foaf:knows;
			sh:minCount 3 ;
		]
	] ;
	sh:property [
		sh:path foaf:knows;
		sh:hasValue ex:Steve;
	] .

@hmottestad
Contributor

Equivalent to:

ex:EveryoneWhoKnowsThreePeopleMustKnowSteve
	a sh:Shape ;
	sh:targetSubjectsOf foaf:knows ;
	sh:or (
		[
			sh:path foaf:knows; 
			sh:minCount 3; 
			sh:hasValue ex:Steve;
		]
		[
			sh:path foaf:knows; 
			sh:maxCount 2;
		]
	) .
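
The claimed equivalence can be checked on a small data set with a Python sketch. Both functions are simplified stand-ins for the two shapes above, not validator code, and the sample people are invented for illustration.

```python
from collections import defaultdict

DATA = {
    # ex:a knows three people, none of them Steve: should fail.
    ("ex:a", "foaf:knows", "ex:b"),
    ("ex:a", "foaf:knows", "ex:c"),
    ("ex:a", "foaf:knows", "ex:d"),
    # ex:b knows three people including Steve: should pass.
    ("ex:b", "foaf:knows", "ex:Steve"),
    ("ex:b", "foaf:knows", "ex:c"),
    ("ex:b", "foaf:knows", "ex:d"),
    # ex:c knows only one person: not affected by the rule.
    ("ex:c", "foaf:knows", "ex:d"),
}


def knows_index(triples):
    """Map each subject to the set of nodes it foaf:knows."""
    index = defaultdict(set)
    for s, p, o in triples:
        if p == "foaf:knows":
            index[s].add(o)
    return index


def violations_target_shape(triples):
    """Target shape form: nodes with >= 3 foaf:knows values must know Steve."""
    return {s for s, known in knows_index(triples).items()
            if len(known) >= 3 and "ex:Steve" not in known}


def violations_or_shape(triples):
    """sh:or form: every subject of foaf:knows must satisfy one branch."""
    bad = set()
    for s, known in knows_index(triples).items():
        knows_three_and_steve = len(known) >= 3 and "ex:Steve" in known
        knows_at_most_two = len(known) <= 2
        if not (knows_three_and_steve or knows_at_most_two):
            bad.add(s)
    return bad


print(sorted(violations_target_shape(DATA)))
print(sorted(violations_or_shape(DATA)))
```

On this data both forms flag only ex:a. The practical difference is in how many nodes have to be visited: the sh:or form checks every subject of foaf:knows, while the target shape form narrows the set before the constraint is applied.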

@rdstn
Contributor Author

rdstn commented May 20, 2020

Branching off to have a proper targetShape on top of propertyShape and nodeShape will make some rather complicated scenarios both possible and efficient. It offers a lot of the functionality which an average SPARQL targeting method would have, without the major performance penalty. I'm surprised it wasn't left as part of the spec to begin with.

@hmottestad
Contributor

I did a bit of testing in the code base to figure out what the current semantics of the getQuery(...) methods in the target class are. As far as I can tell they are only used to run against what is effectively the base sail. This is important because it means that we would be able to create custom getPlanAddedStatements(...) and getPlanRemovedStatements(...) that can handle more complex targeting.

An example of how getPlanAddedStatements(...) could work:

For a target like this:

 ?target a ex:Paprika.
 ?target ex:color "red".

We would analyze the added statements for matching "sub" targets. E.g. we could end up matching ex:p1 a ex:Paprika and ex:car1 ex:color "red". This would then generate the following queries:

BIND(ex:p1 as ?target)
 ?target a ex:Paprika.
 ?target ex:color "red".

and

BIND(ex:car1 as ?target)
 ?target a ex:Paprika.
 ?target ex:color "red".

These queries would then have to run against the base sail. We could probably get away with running a single query and using VALUES.

The effective result could be that ex:p1 has ex:color "red" in the base sail, so it would be a target, but ex:car1 is not an ex:Paprika in the base sail, so it would be ignored.

This way we use knowledge from the transaction to reduce the number of targets to validate, and we wouldn't have to revalidate other red paprikas already in the base sail.
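
A minimal Python sketch of this narrowing idea follows. It is not the actual getPlanAddedStatements(...) implementation: the function names are invented, the stores are plain sets of tuples, and it assumes the confirming check sees the base data together with the statements added in the transaction.

```python
# Each pattern line is a (predicate, object) pair the target must have.
TARGET_PATTERN = [("rdf:type", "ex:Paprika"), ("ex:color", "red")]

BASE = {
    ("ex:p0", "rdf:type", "ex:Paprika"),
    ("ex:p0", "ex:color", "red"),  # existing red paprika, validated earlier
    ("ex:p1", "ex:color", "red"),
}
ADDED = {
    ("ex:p1", "rdf:type", "ex:Paprika"),
    ("ex:car1", "ex:color", "red"),
}


def candidates(added, pattern):
    """Subjects of added statements that match at least one pattern line."""
    wanted = set(pattern)
    return {s for s, p, o in added if (p, o) in wanted}


def confirmed_targets(store, candidate_nodes, pattern):
    """Keep candidates that satisfy every pattern line in the store.

    Plays the role of the single query restricted to the candidates.
    """
    return {n for n in candidate_nodes
            if all((n, p, o) in store for p, o in pattern)}


# Assumption: the confirming check runs over base data plus the transaction.
store = BASE | ADDED
cands = candidates(ADDED, TARGET_PATTERN)
targets = confirmed_targets(store, cands, TARGET_PATTERN)
print(sorted(cands))
print(sorted(targets))
```

Here ex:p1 and ex:car1 are the candidates, only ex:p1 is confirmed as a target, and ex:p0 is never looked at because nothing about it changed in the transaction.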

hmottestad added a commit to HASMAC-AS/rdf4j that referenced this issue May 24, 2020
…AllSubjectsTarget in order to support the filter structure

Signed-off-by: Håvard Ottestad <hmottestad@gmail.com>
hmottestad added a commit to HASMAC-AS/rdf4j that referenced this issue May 26, 2020
Signed-off-by: Håvard Ottestad <hmottestad@gmail.com>
@hmottestad
Contributor

To sum up this issue:

sh:filterShape is the chosen method to implement advanced targeting.

sh:filterShape was an early feature in the SHACL spec, but was removed. It allows the user to specify a shape to filter the target nodes. Combined with dash:AllSubjects this allows us to have most of the features that the SPARQL targets supply, but with much better performance.

@hmottestad hmottestad linked a pull request Jun 1, 2020 that will close this issue
4 tasks
hmottestad added a commit to HASMAC-AS/rdf4j that referenced this issue Jun 8, 2020
hmottestad added a commit to HASMAC-AS/rdf4j that referenced this issue Jun 11, 2020
hmottestad added a commit to HASMAC-AS/rdf4j that referenced this issue Jun 12, 2020
hmottestad added a commit that referenced this issue Jun 16, 2020
* GH-2017 initial test case and implementation

Signed-off-by: Håvard Ottestad <hmottestad@gmail.com>

* more test cases

Signed-off-by: Håvard Ottestad <hmottestad@gmail.com>

* GH-1912 initial sparql target support

Signed-off-by: Håvard Ottestad <hmottestad@gmail.com>

* better support for sparql targets

Signed-off-by: Håvard Ottestad <hmottestad@gmail.com>

* GH-1912 better support for sparql targets

Signed-off-by: Håvard Ottestad <hmottestad@gmail.com>

* test

Signed-off-by: Håvard Ottestad <hmottestad@gmail.com>

* more sparql target tests

Signed-off-by: Håvard Ottestad <hmottestad@gmail.com>

* fixed typo

Signed-off-by: Håvard Ottestad <hmottestad@gmail.com>

* change targetClass to targetObject for new compound targetting

Signed-off-by: Håvard Ottestad <hmottestad@gmail.com>

* benchmark

Signed-off-by: Håvard Ottestad <hmottestad@gmail.com>

* fix supported features docs

Signed-off-by: Håvard Ottestad <hmottestad@gmail.com>

* Trigger GitHub ci

* improved benchmarks

Signed-off-by: Håvard Ottestad <hmottestad@gmail.com>

* GH-2017 introduce support for dash:AllObjectsTarget and AllSubjectsTarget in order to support the filter structure

Signed-off-by: Håvard Ottestad <hmottestad@gmail.com>

* GH-2017 initial support for filter shapes

Signed-off-by: Håvard Ottestad <hmottestad@gmail.com>

* fix up tests

Signed-off-by: Håvard Ottestad <hmottestad@gmail.com>

* cleanup

Signed-off-by: Håvard Ottestad <hmottestad@gmail.com>

* fix dash vocab file

Signed-off-by: Håvard Ottestad <hmottestad@gmail.com>

* fix after merge

Signed-off-by: Håvard Ottestad <hmottestad@gmail.com>

* refactoring to better handle target filters

Signed-off-by: Håvard Ottestad <hmottestad@gmail.com>

* merge fixes

Signed-off-by: Håvard Ottestad <hmottestad@gmail.com>

* Support for dash constants for dash:AllSubjects and dash:AllObjects

Signed-off-by: Håvard Ottestad <hmottestad@gmail.com>

* refactored code

Signed-off-by: Håvard Ottestad <hmottestad@gmail.com>

* final code for toggle for experimental filter shapes

Signed-off-by: Håvard Ottestad <hmottestad@gmail.com>

* added test cases for sh:hasValue

Signed-off-by: Håvard Ottestad <hmottestad@gmail.com>

* more test cases for hasValue and also fix for empty sh:and, sh:or, sh:not

Signed-off-by: Håvard Ottestad <hmottestad@gmail.com>

* moving shapes back where they came from, will do the bigger refactor in the new AST branch

Signed-off-by: Håvard Ottestad <hmottestad@gmail.com>

* code cleanup

Signed-off-by: Håvard Ottestad <hmottestad@gmail.com>

* support for sh:hasValue

Signed-off-by: Håvard Ottestad <hmottestad@gmail.com>

* more javadocs

Signed-off-by: Håvard Ottestad <hmottestad@gmail.com>

* remove unused tests

Signed-off-by: Håvard Ottestad <hmottestad@gmail.com>

* some simplification and docs

Signed-off-by: Håvard Ottestad <hmottestad@gmail.com>

* fixed headers and more tests

Signed-off-by: Håvard Ottestad <hmottestad@gmail.com>

* fix tests and some code cleanup

Signed-off-by: Håvard Ottestad <hmottestad@gmail.com>

* code cleanup

Signed-off-by: Håvard Ottestad <hmottestad@gmail.com>

* initial support for dash:valuesIn

Signed-off-by: Håvard Ottestad <hmottestad@gmail.com>

* bechmark

Signed-off-by: Håvard Ottestad <hmottestad@gmail.com>

* migrate to sh:targetShape

Signed-off-by: Håvard Ottestad <hmottestad@gmail.com>

* a couple of tests for negation

Signed-off-by: Håvard Ottestad <hmottestad@gmail.com>

* test cases for value in

Signed-off-by: Håvard Ottestad <hmottestad@gmail.com>

* more tests for targetNode

Signed-off-by: Håvard Ottestad <hmottestad@gmail.com>

* added some benchmarks

Signed-off-by: Håvard Ottestad <hmottestad@gmail.com>

* implemented handling of node validation

Signed-off-by: Håvard Ottestad <hmottestad@gmail.com>

* functional implementation for valuesIn

Signed-off-by: Håvard Ottestad <hmottestad@gmail.com>

* more fixes

Signed-off-by: Håvard Ottestad <hmottestad@gmail.com>

* switched naming of new features

Signed-off-by: Håvard Ottestad <hmottestad@gmail.com>
@hmottestad
Contributor

Ended up back with sh:targetShape.

@hmottestad hmottestad added this to the 3.3.0 milestone Jun 16, 2020
@hmottestad hmottestad added the M1 Fixed in milestone 1 label Jun 23, 2020
hmottestad added a commit to HASMAC-AS/rdf4j that referenced this issue Jul 9, 2020
…agreed on the shacl mailing list

Signed-off-by: Håvard Ottestad <hmottestad@gmail.com>
hmottestad added a commit that referenced this issue Jul 10, 2020
… shacl mailing list (#2355)

Signed-off-by: Håvard Ottestad <hmottestad@gmail.com>
3 participants