Add support for Shapes Constraint Language (SHACL) #743

Closed
abrokenjester opened this issue Jan 30, 2017 · 12 comments

@abrokenjester
Contributor

SHACL is a W3C working draft. It is a language for validating RDF graphs against a set of conditions. These conditions are provided as shapes and other constructs expressed in the form of an RDF graph.

A SHACL implementation, available as a SAIL component, would enable automated validation of business constraints on RDF databases in RDF4J, which until now was only possible using ad-hoc approaches.

Possible approaches for implementation are either by implementing directly from scratch, or by extending/reusing the SPARQL engine. The former has a possible performance benefit while the latter is possibly easier (and would benefit from database query optimizations).

abrokenjester added the enhancement and help wanted labels Jan 30, 2017
@hmottestad
Contributor

hmottestad commented Feb 5, 2017

I would recommend limiting support to:

  • sh:targetClass
  • sh:property
  • Restrictions
    • sh:path
    • sh:minCount
    • sh:maxCount
    • sh:class
    • sh:datatype

This lets you model simple restrictions such as "Person must have exactly 1 age which is an integer".
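As a rough illustration of how little machinery this subset needs, here is a minimal sketch of the "exactly 1 age which is an integer" check in plain Java. The `Triple`, `PropertyShape` and `SimpleShapeCheck` types are hypothetical and invented for this example; they are not RDF4J classes, and prefixed names are kept as plain strings.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model for illustration only (not the RDF4J API).
// datatype is null for IRI-valued objects such as the rdf:type statement.
record Triple(String subject, String predicate, String value, String datatype) {}

// Mirrors a property shape using sh:targetClass, sh:path, sh:minCount, sh:maxCount and sh:datatype.
record PropertyShape(String targetClass, String path, int minCount, int maxCount, String datatype) {}

final class SimpleShapeCheck {
    static final String RDF_TYPE = "rdf:type";

    // Returns the names of the violated constraints for one focus node.
    static List<String> validate(String focusNode, List<Triple> data, PropertyShape shape) {
        List<String> violations = new ArrayList<>();

        // sh:targetClass: the shape only applies to instances of the target class.
        boolean isTarget = data.stream().anyMatch(t -> t.subject().equals(focusNode)
                && t.predicate().equals(RDF_TYPE)
                && t.value().equals(shape.targetClass()));
        if (!isTarget) {
            return violations;
        }

        // sh:path: collect the value statements of the restricted property.
        List<Triple> values = data.stream()
                .filter(t -> t.subject().equals(focusNode) && t.predicate().equals(shape.path()))
                .toList();

        if (values.size() < shape.minCount()) {
            violations.add("minCount");
        }
        if (values.size() > shape.maxCount()) {
            violations.add("maxCount");
        }
        for (Triple t : values) {
            if (!shape.datatype().equals(t.datatype())) {
                violations.add("datatype");
            }
        }
        return violations;
    }
}
```

With `new PropertyShape("ex:Person", "ex:age", 1, 1, "xsd:integer")`, a person with a single integer age yields an empty list, while a person with no age yields a `minCount` violation. Supporting sh:class would follow the same pattern as the datatype loop, checking the rdf:type of each value instead.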

@hmottestad
Contributor

hmottestad commented Feb 5, 2017

Probably simplest to implement by overriding the commit() method and implementing the SailConnectionListener to listen on changes.

Getting as close as possible to a stream-based approach, where all the rules can be checked in one pass, would be the most efficient. For insertions, one pass should be sufficient to determine sh:class, sh:datatype and sh:minCount. sh:maxCount is harder, but can at least be optimised to only run on resources where the restricted property has been added in the current commit.
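A minimal sketch of that idea, under heavy assumptions: the `ValidatingConnection` class below is a hypothetical stand-in written in plain Java, not the RDF4J `SailConnection` or `SailConnectionListener` interfaces. Its `add` method plays the role of the change listener, and `commit` checks sh:maxCount only for subjects that received a value for the restricted property in the current transaction.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a validating connection (not the RDF4J interfaces).
final class ValidatingConnection {
    // Committed data: (subject, predicate) -> set of values.
    private final Map<List<String>, Set<String>> store = new HashMap<>();
    // Statements added in the current transaction, as {subject, predicate, value}.
    private final List<String[]> pending = new ArrayList<>();
    private final String restrictedPath;
    private final int maxCount;

    ValidatingConnection(String restrictedPath, int maxCount) {
        this.restrictedPath = restrictedPath;
        this.maxCount = maxCount;
    }

    // Plays the role of the change listener: remember every added statement.
    void add(String subject, String predicate, String value) {
        pending.add(new String[] {subject, predicate, value});
    }

    // Validates the pending additions, then applies them; rejects the whole transaction on a violation.
    void commit() {
        // One pass over the additions: new values per subject, restricted property only.
        Map<String, Set<String>> addedValues = new HashMap<>();
        for (String[] st : pending) {
            if (st[1].equals(restrictedPath)) {
                addedValues.computeIfAbsent(st[0], k -> new HashSet<>()).add(st[2]);
            }
        }
        // sh:maxCount is checked only for subjects touched in this transaction.
        for (Map.Entry<String, Set<String>> e : addedValues.entrySet()) {
            Set<String> merged = new HashSet<>(valuesOf(e.getKey(), restrictedPath));
            merged.addAll(e.getValue());
            if (merged.size() > maxCount) {
                pending.clear(); // discard the transaction
                throw new IllegalStateException("sh:maxCount violated for " + e.getKey());
            }
        }
        for (String[] st : pending) {
            store.computeIfAbsent(List.of(st[0], st[1]), k -> new HashSet<>()).add(st[2]);
        }
        pending.clear();
    }

    Set<String> valuesOf(String subject, String predicate) {
        return store.getOrDefault(List.of(subject, predicate), Set.of());
    }
}
```

The sketch ignores removals and looks up existing values per touched subject, which is the only part that needs to read from the underlying store.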

@barthanssens
Contributor

So would it make sense to add an abstract ShaclSail listing the various constraints, and leave the actual implementation to e.g. ShaclSparqlSail and/or ShaclRawSail ?

Additional use case: the European Commission (or at least the DCAT-AP group) is "establishing a task group to develop an OWL expression of DCAT-AP, as well as a SHACL file that could be used for validation of DCAT-AP implementations" (quote from the mailing list, not sure if the archives are public, although the work itself of course is...)

@hmottestad
Contributor

I actually wrote a SHACL file for the Norwegian profile (DCAT-AP-NO) :)

As for the SHACL Sail: @heshanjse is writing a proposal for a Google Summer of Code project to develop it. Hopefully that'll get approved. I'm pushing him towards a database engine style implementation, so that we can optimise which rules need to run and how much data needs to be retrieved from the base sail. Essentially, if you have a native store with 100 million triples and you add 1 triple, the SHACL Sail shouldn't need to run all the SHACL validation rules on the entire 100 million + 1 triples.
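One way to picture the "which rules need to run" part is an index from property path to shapes, consulted with the predicates of the newly added statements. The sketch below is an assumption about how such a selection step could look, not a description of any existing implementation; `Shape` and `ShapeSelector` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

// Hypothetical shape descriptor: a name plus the property path it restricts.
record Shape(String name, String path) {}

final class ShapeSelector {
    // Index built once when the shapes are loaded: path -> shapes restricting that path.
    private final Map<String, List<Shape>> byPath = new HashMap<>();

    ShapeSelector(List<Shape> shapes) {
        for (Shape s : shapes) {
            byPath.computeIfAbsent(s.path(), k -> new ArrayList<>()).add(s);
        }
    }

    // Returns only the shapes whose path occurs among the predicates of the added statements.
    List<Shape> affectedBy(Collection<String> addedPredicates) {
        List<Shape> result = new ArrayList<>();
        for (String predicate : new LinkedHashSet<>(addedPredicates)) {
            result.addAll(byPath.getOrDefault(predicate, List.of()));
        }
        return result;
    }
}
```

This is deliberately simplified: adding an rdf:type statement can bring a node into the scope of a sh:targetClass shape, and removals can break sh:minCount, so a real selection step would have to account for both.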

@hmottestad
Contributor

Abstract class for the SHACL sail will probably be nice to extract once we have an implementation and realise what would potentially be common between two implementations. When I developed the new RDFS reasoner I started off using the abstract RDFS reasoner as a base, but found out in the end that it was not generic enough. So the abstract RDFS reasoner is a superfluous abstraction that just makes understanding the code more difficult and couldn't be reused for other RDFS reasoner implementations anyway.

@barthanssens
Contributor

OK, any objections if I already contribute a SHACL vocabulary class (org.eclipse.rdf4j.model.vocabulary.SHACL) ?
I probably won't have time to work on the sail itself, so it would be great if the GSoC proposal were accepted :-)

@hmottestad
Contributor

Vocabulary class is a great contribution:)

@tonyhammond

We think this is a really great idea.

Btw, our approach to SHACL shapes in publishing the datasets in the Springer Nature SciGraph project can be seen here:

https://github.com/springernature/scigraph/tree/master/shapes

@barthanssens
Contributor

Nice, I've created a pull request for the vocabulary class with constants for SHACL Classes / properties #802

heshanjse added a commit to heshanjse/rdf4j that referenced this issue Apr 23, 2017
Signed-off-by: Heshan Jayasinghe <shanujse@gmail.com>
heshanjse added a commit to heshanjse/rdf4j that referenced this issue Apr 23, 2017
Signed-off-by: Heshan Jayasinghe <shanujse@gmail.com>
@VladimirAlexiev

Hi folks! 2 days ago we submitted an EC H2020 Big Data Research proposal, see https://lists.w3.org/Archives/Public/public-rdf-shapes/2017Apr/0047.html for some info. One of the goals is:

  • efficient SHACL validation on big data (out of memory repos).
    Hopefully merge AKSW RDFUnit, TQ SHACL, WESO Shaclex, and improve them significantly.
    Implement SHACL core in optimized Java (for Jena and Sesame)

We'll know the outcome in 5 months.

@VladimirAlexiev

#695, a prerequisite for SHACL list unrolling rdf:rest*/rdf:first, was recently fixed
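For readers unfamiliar with the pattern, `rdf:rest*/rdf:first` means "follow zero or more rdf:rest links, then take rdf:first", which enumerates the members of an RDF list. A minimal sketch of the same traversal in plain Java follows; it assumes the list structure has already been loaded into two maps, and `RdfListUnroller` is a hypothetical helper rather than part of RDF4J.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper that walks an RDF list held in two in-memory maps.
final class RdfListUnroller {
    static final String NIL = "rdf:nil";

    // first: list node -> its rdf:first value; rest: list node -> its rdf:rest node.
    static List<String> unroll(String head, Map<String, String> first, Map<String, String> rest) {
        List<String> members = new ArrayList<>();
        Set<String> seen = new HashSet<>(); // guards against malformed, cyclic lists
        String node = head;
        while (node != null && !NIL.equals(node) && seen.add(node)) {
            String value = first.get(node);
            if (value != null) {
                members.add(value);
            }
            node = rest.get(node);
        }
        return members;
    }
}
```

For a two-element list whose nodes are `_:b1` and `_:b2`, calling `unroll("_:b1", first, rest)` returns the two rdf:first values in list order.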

heshanjse added a commit to heshanjse/rdf4j that referenced this issue Jun 27, 2017
heshanjse added a commit to heshanjse/rdf4j that referenced this issue Aug 27, 2017
…-SHACL

Signed-off-by: Heshan Jayasinghe <shanujse@gmail.com>
hmottestad added a commit to heshanjse/rdf4j that referenced this issue Aug 27, 2017
…e group by a range for which subset of the tuple to be added to the output

Signed-off-by: Håvard Ottestad <hmottestad@gmail.com>
heshanjse pushed a commit to heshanjse/rdf4j that referenced this issue Aug 27, 2017
…e group by a range for which subset of the tuple to be added to the output

Signed-off-by: Håvard Ottestad <hmottestad@gmail.com>
heshanjse pushed a commit to heshanjse/rdf4j that referenced this issue Aug 27, 2017
…e group by a range for which subset of the tuple to be added to the output

Signed-off-by: Håvard Ottestad <hmottestad@gmail.com>
heshanjse pushed a commit to heshanjse/rdf4j that referenced this issue Aug 27, 2017
…e group by a range for which subset of the tuple to be added to the output

Signed-off-by: Håvard Ottestad <hmottestad@gmail.com>
hmottestad added a commit to heshanjse/rdf4j that referenced this issue Dec 28, 2017
…evelop

# Conflicts:
#	core/pom.xml
#	testsuites/store/pom.xml
hmottestad added a commit to heshanjse/rdf4j that referenced this issue Dec 28, 2017
…-SHACL

# Conflicts:
#	core/pom.xml
#	testsuites/store/pom.xml
abrokenjester added this to the 2.4.0 milestone May 6, 2018
@abrokenjester
Contributor Author

@hmottestad can we call this issue done and log any further work in follow-up issues?

abrokenjester removed the help wanted label Jul 2, 2018
abrokenjester pushed a commit that referenced this issue Aug 22, 2019
Signed-off-by: Heshan Jayasinghe <heshanjse>
abrokenjester pushed a commit that referenced this issue Aug 22, 2019
…range for which subset of the tuple to be added to the output

Signed-off-by: Håvard Ottestad <hmottestad@gmail.com>
abrokenjester pushed a commit that referenced this issue Aug 22, 2019

5 participants