Fails on xsd:dateTimeStamp rdfs:subClassOf xsd:dateTime. Howto enforce inheritance? #167

Open
volkerjaenisch opened this issue Nov 19, 2022 · 5 comments

Comments

@volkerjaenisch

Dear RDFlib developers!

First of all, thank you so much!
Without your code our Open-Data Portal https://datenadler.de would never have taken flight. We do not use CKAN, we programmed from scratch. Currently we are the second most populated Open-Data portal in Germany.

But now we need SHACL. pySHACL generally runs fine, but it seems that we are running into some problems of detail.

This may be an exotic case. It does not work on other SHACL processors either, so it may be a limitation of SHACL itself.

So this is less a bug report than a statement about how SHACL behaves.

Given the following data

xsd:dateTimeStamp rdfs:subClassOf xsd:dateTime .

<https://geobasis-bb.de#dcat_Dataset_568978c5-fa73-48d1-a6f9-487aabdc1aef> a dcat:Dataset;
  dct:description "Für die Digitalen Topographischen Karten werden Vektordaten.."@de;
  dct:identifier "568978c5-fa73-48d1-a6f9-487aabdc1aef";
  dct:modified "2022-11-17T09:37:25.626789"^^xsd:dateTimeStamp;

and the following shape

:DateOrDateTimeDataType_Shape
    a sh:NodeShape ;
    rdfs:comment "Date time date disjunction shape checks that a datatype property receives a temporal value: date, dateTime, gYear or gYearMonth literal" ;
    rdfs:label "Date time date disjunction" ;
    sh:message "The values must be data typed as either xsd:date, xsd:dateTime, xsd:gYear or xsd:gYearMonth" ;
    sh:or ([
            sh:datatype xsd:date
        ]
        [
            sh:datatype xsd:dateTime
        ]
        [
            sh:datatype xsd:gYear
        ]
        [
            sh:datatype xsd:gYearMonth
        ]
    ) .

OK, the files are a bit more complex but it boils down to that.

We get the following result:

Validation Report
Conforms: False
Results (20):
Constraint Violation in NodeConstraintComponent (http://www.w3.org/ns/shacl#NodeConstraintComponent):
	Severity: sh:Violation
	Source Shape: :Dataset_Property_dct_issued
	Focus Node: <https://geobasis-bb.de#dcat_Dataset_568978c5-fa73-48d1-a6f9-487aabdc1aef>
	Value Node: Literal("2022-11-17T09:37:25.626872" = None, datatype=xsd:dateTimeStamp)
	Result Path: dct:issued
	Message: Value does not conform to Shape :DateOrDateTimeDataType_Shape

Is it possible with pySHACL to inject the inheritance information:

xsd:dateTimeStamp rdfs:subClassOf xsd:dateTime .

If so, how?
Any help appreciated.

Cheers,
Volker

@ashleysommer
Collaborator

Hi @volkerjaenisch
Thanks for your report.

It is stipulated in the SHACL specification that all ontological relationships required for the successful validation of a datagraph against a shape must exist within the datagraph at the time of validation. In other words, if the relationship [xsd:dateTimeStamp rdfs:subClassOf xsd:dateTime] must be known in order for the datagraph to validate against your shape, then that relationship must exist in the datagraph when it is validated.

I suspect you know this already, because you asked about injecting the inheritance information into PySHACL.

PySHACL does have some very basic relationships baked into the code (eg, those found in RDF and RDFS) but that does not extend to XSD subclasses.

If it is not possible for you to have ontological definitions in the datagraph (eg, if the data in the datagraph is pulled from a separate closed system), then you can use PySHACL's three-file method, where you have the SHACL shapes file, an extra ontology file, and the data graph.
In this mode, PySHACL will "mix in" the contents of the extra ontology file into the datagraph before validation. The feature is intended to solve exactly this problem.
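
For readers who call PySHACL from Python rather than from the command line, the three-file method can be sketched with `pyshacl.validate` and its `ont_graph` argument. This is only a minimal illustration under stated assumptions: the function name `validate_with_mixin` and the inline ontology snippet are made up for this example, and the sketch makes no claim about whether the mixed-in triple changes the outcome of an `sh:datatype` check, which is the open question of this issue.

```python
"""Sketch: PySHACL's three-graph mode from Python (helper names are illustrative)."""

# Hypothetical extra-ontology graph holding only the triple discussed in this issue.
ONTOLOGY_TTL = """
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

xsd:dateTimeStamp rdfs:subClassOf xsd:dateTime .
"""


def validate_with_mixin(data_ttl, shapes_ttl, ontology_ttl, inference=None):
    """Validate Turtle data against Turtle shapes, mixing in an extra ontology."""
    # Imported lazily so this module can be loaded without rdflib/pyshacl installed.
    from rdflib import Graph
    from pyshacl import validate

    data_graph = Graph().parse(data=data_ttl, format="turtle")
    shapes_graph = Graph().parse(data=shapes_ttl, format="turtle")
    ont_graph = Graph().parse(data=ontology_ttl, format="turtle")

    conforms, _report_graph, report_text = validate(
        data_graph,
        shacl_graph=shapes_graph,
        ont_graph=ont_graph,   # Python counterpart of the CLI's -e option
        inference=inference,   # optional: "rdfs", "owlrl", "both" or "none"
    )
    return conforms, report_text
```

Calling `validate_with_mixin(data_ttl, shapes_ttl, ONTOLOGY_TTL)` returns the conformance flag and the textual report, so the effect of the extra ontology can be compared with a run that passes an empty ontology string.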

Note, this issue has encouraged me to look into whether it would be wise to extend the hardcoded relationships in PySHACL to include some common XSD relationships, like this one, for convenience for users.

@volkerjaenisch
Author

Dear @ashleysommer !

Thank you so much for shedding light on all these topics.
There is also a GH discussion I initiated at SEMICeu/DCAT-AP#238 concerning this problem.

I already tried to inject the additional ontological information (into the data) but to no avail. Probably I did something wrong.

bin/pyshacl -s shapes/dcat-ap_2.1.1_shacl_shapes.ttl -im -e shapes/dcat-ap-de-imports.ttl data/datetimestamp.ttl

datetimestamp.ttl

@prefix dct: <http://purl.org/dc/terms/> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dcatde: <http://dcat-ap.de/def/dcatde/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix adms: <http://www.w3.org/ns/adms#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix schema: <http://schema.org/> .
@prefix spdx: <http://spdx.org/rdf/terms#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix vcard: <http://www.w3.org/2006/vcard/ns#> .
@prefix rdf4j: <http://rdf4j.org/schema/rdf4j#> .
@prefix sesame: <http://www.openrdf.org/schema/sesame#> .
@prefix fn: <http://www.w3.org/2005/xpath-functions#> .


xsd:dateTimeStamp rdfs:subClassOf xsd:dateTime .


<https://geobasis-bb.de#dcat_Dataset_568978c5-fa73-48d1-a6f9-487aabdc1aef> a dcat:Dataset;
  dct:description "Für die Digitalen Topographischen Karten werden Vektordaten aus dem Basis-DLM generalisiert und nach dem ATKIS-Signaturenkatalog bearbeitet. Die digitalen Daten können per Download oder auf anderen Medienträgern abgegeben werden. Sie liegen in max. 22 Inhaltsebenen (nach dem techn. Regelwerk der AdV) in drei Ausprägungen (Einzelebenen, Graukombination und Farbkombination) vor. Es gilt zu beachten, dass ein UTM-Gitter nur in den Einzelebenen ausgegeben wird. Die Standardauflösung beträgt 200L/cm = 508dpi. Eine Kartenausgabe gleichen Inhalts stellt die TK (ATKIS) als gedruckte Karte dar. Die Daten werden über automatisierte Verfahren oder durch Selbstentnahme kostenfrei bereitgestellt. Bei Nutzung der Daten sind die Lizenzbedingungen zu beachten."@de;
  dct:identifier "568978c5-fa73-48d1-a6f9-487aabdc1aef";
  adms:identifier "568978c5-fa73-48d1-a6f9-487aabdc1aef";
  dct:modified "2022-11-17T09:37:25.626789"^^xsd:dateTimeStamp;
  dct:publisher <https://geobasis-bb.de#foaf_Agent_568978c5-fa73-48d1-a6f9-487aabdc1aef>;
  dct:title "Digitale Topographische Karte 1 : 10 000 - 3846-SO Zossen - Neuhof"@de;
  dcat:contactPoint <https://geobasis-bb.de#vcard_Kind_568978c5-fa73-48d1-a6f9-487aabdc1aef>;
  dcat:theme <http://publications.europa.eu/resource/authority/data-theme/TECH>, <http://publications.europa.eu/resource/authority/data-theme/GOVE>,
    <http://publications.europa.eu/resource/authority/data-theme/REGI>, <http://publications.europa.eu/resource/authority/data-theme/ENVI>,
    <http://publications.europa.eu/resource/authority/data-theme/AGRI>, <http://inspire.ec.europa.eu/theme/lc>;
  dcat:distribution <https://geobasis-bb.de/lgb/de/geodaten/topographische-karten/top-karten-1-10000/>,
    <https://data.geobasis-bb.de/geobasis/information/legenden/legende_dtk10.pdf>, <https://geobroker.geobasis-bb.de/gbss.php?MODE=GetProductInformation&PRODUCTID=84579219-6849-4c89-90d0-aa7db3f26fa8>,
    <https://data.geobasis-bb.de/geobasis/daten/dtk/dtk10/ebenen/dtk10_ebenen_3846-so.zip>,
    <https://data.geobasis-bb.de/geobasis/daten/dtk/dtk10/kombination/dtk10_3846-so.zip>;
  dcatde:contributorID <http://dcat-ap.de/def/contributors/landBrandenburg>;
  dct:issued "2022-11-17T09:37:25.626872"^^xsd:dateTimeStamp;
  dcat:keyword "opendata"@de, "Vermessung"@de, "Karte"@de, "Verkehr"@de, "1:10.000"@de,
    "Bodenbedeckung"@de, "Rasterdaten"@de, "DTK10"@de, "DTK10FAR"@de, "DTK10GRA"@de, "3846-SO"@de;
  foaf:page <https://geobasis-bb.de/lgb/de/geodaten/topographische-karten/top-karten-1-10000/>,
    <https://data.geobasis-bb.de/geobasis/information/legenden/legende_dtk10.pdf>;
  <http://inqbus.de/nspriority> 30;
  dct:spatial <https://geobasis-bb.de#dct_Location_568978c5-fa73-48d1-a6f9-487aabdc1aef>;
  dct:accrualPeriodicity <http://publications.europa.eu/resource/authority/frequency/CONT>;
  dct:isPartOf <https://geobasis-bb.de#dcat_Dataset_84579219-6849-4c89-90d0-aa7db3f26fa8> .

The result:

Constraint Violation in NodeConstraintComponent (http://www.w3.org/ns/shacl#NodeConstraintComponent):
	Severity: sh:Violation
	Source Shape: :Dataset_Property_dct_issued
	Focus Node: <https://geobasis-bb.de#dcat_Dataset_568978c5-fa73-48d1-a6f9-487aabdc1aef>
	Value Node: Literal("2022-11-17T09:37:25.626872" = None, datatype=xsd:dateTimeStamp)
	Result Path: dct:issued
	Message: Value does not conform to Shape :DateOrDateTimeDataType_Shape

The shape that is violated:

:DateOrDateTimeDataType_Shape
    a sh:NodeShape ;
    rdfs:comment "Date time date disjunction shape checks that a datatype property receives a temporal value: date, dateTime, gYear or gYearMonth literal"@en ;
    rdfs:label "Date time date disjunction"@en ;
    sh:message "The values must be data typed as either xsd:date, xsd:dateTime, xsd:gYear or xsd:gYearMonth"@en ;
    sh:or ([
            sh:datatype xsd:date
        ]
        [
            sh:datatype xsd:dateTime
        ]
        [
            sh:datatype xsd:gYear
        ]
        [
            sh:datatype xsd:gYearMonth
        ]
    ) .

From the discussion at SEMICeu, I think I learned that the datatypes xsd:dateTime etc. are not RDF entities and are therefore not a concern of SHACL. But I think that this depends strongly on the implementation of the SHACL engine.
The SHACL engine of SEMICeu cannot handle xsd:dateTimeStamp either.

> Note, this issue has encouraged me to look into whether it would be wise to extend the hardcoded relationships in PySHACL to include some common XSD relationships, like this one, for convenience for users.

This would be really great.

Cheers,
Volker

@ajnelson-nist
Contributor

Apologies for butting in, but something caught my eye about @volkerjaenisch 's request.

My understanding was that there is not a way to declare a datatype a "Subclass" or "Subdatatype" of another datatype. OWL has a way of defining restricted datatypes (OWL 2 Syntax, 9.4; see the "Show RDF in Examples" button to see Turtle alongside the functional syntax), which is the closest I've seen to establishing a hierarchy.

I'd thought the only way to handle wanting to permit two datatypes using SHACL, where one looks like a "subdatatype" of another, was to use an sh:or. Aside from dateTime vs. dateTimeStamp, I'd seen this come up in another thread about rdf:langString vs. xsd:string, here.

@volkerjaenisch
Author

@ajnelson-nist
Thanks for your contribution; you are welcome to join in.

The problem that I face is that we have to use SHACL files provided by the EU, that we cannot alter. This is because the EU will use these SHACL files to validate the data files we provide the EU with. I have opened a GH issue at the EU to tackle this problem.

And you are right: the EU shape does indeed use sh:or to check for the dateTime types:

:DateOrDateTimeDataType_Shape
    a sh:NodeShape ;
    rdfs:comment "Date time date disjunction shape checks that a datatype property receives a temporal value: date, dateTime, gYear or gYearMonth literal"@en ;
    rdfs:label "Date time date disjunction"@en ;
    sh:message "The values must be data typed as either xsd:date, xsd:dateTime, xsd:gYear or xsd:gYearMonth"@en ;
    sh:or ([
            sh:datatype xsd:date
        ]
        [
            sh:datatype xsd:dateTime
        ]
        [
            sh:datatype xsd:gYear
        ]
        [
            sh:datatype xsd:gYearMonth
        ]
    ) .

But I think that we all here agree that this is a strategy with a limited future. Since someone will come up with the next type that is derived from xsd:dateTime. E.g. xsd:dateTimeJulianDay .

The idea behind this shape is to check that the given information is typed with an xsd:<date, time type>. So as far as the idea of the shape is concerned, any dateTime type fulfills it.

If inference or inheritance worked here, a simple check against the xsd:dateTime base class would suffice to implement the idea. This is quite common in other languages, e.g. in Python:

isinstance(obj, list) checks whether the given object is an instance of "list" or of a class derived from "list".

This is IMHO what is needed in SHACL.
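
The contrast between an inheritance-aware check and an exact-type check can be made concrete with a small Python sketch. The class `DateTimeStamp` below is hypothetical and only stands in for a type derived from a date-time base type; it is not part of any library mentioned in this thread.

```python
from datetime import datetime


class DateTimeStamp(datetime):
    """Hypothetical subclass standing in for a type derived from dateTime."""


stamp = DateTimeStamp(2022, 11, 17, 9, 37, 25)

# Inheritance-aware check: a DateTimeStamp is accepted wherever a datetime is.
inheritance_aware = isinstance(stamp, datetime)

# Exact-type check: only the class itself matches, subclasses do not.
exact_match = type(stamp) is datetime

print(inheritance_aware, exact_match)  # True False
```

The first check corresponds to the behaviour asked for here, while the second resembles a comparison that looks only at the literal's datatype IRI.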

Cheers,
Volker

@ajnelson-nist
Contributor

> But I think that we all here agree that this is a strategy with a limited future. Since someone will come up with the next type that is derived from xsd:dateTime. E.g. xsd:dateTimeJulianDay .

I disagree on this point, but fortunately I think I disagree in a way that causes us all less work at the end of the day.

The EU shape you cited uses a few of the time-relevant types used in XML Schema, defined in a document last updated in 2012, which had several rounds of editing from a version posted in 2004:

https://www.w3.org/TR/xmlschema11-2/

So, I think concerns of this being a "limited-future" strategy can be comfortably disregarded. New time types seem unlikely to appear at surprising speed. And, if they should appear upstream, the effect would likely propagate downstream at a measured pace through RDF, then SHACL, and then the EU shape maintainers, so everyone can comfortably plan to adapt.

For your use case with dateTimeStamp, unless I missed a detail between the 2004 derived datatypes and the 2012 derived datatypes, the SHACL shape looks like it can become comprehensive of the XML Schema time-relevant types by expanding the sh:or to also look for xsd:dateTimeStamp. Section 5.1 of RDF 1.1 Concepts can be reviewed for the EU shape's coverage.
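
The effect of adding one more branch to the sh:or can be modelled in a few lines of plain Python. This is a deliberately simplified sketch, not PySHACL's implementation: it treats sh:datatype as plain IRI equality and ignores the lexical well-formedness check that SHACL also performs, and the helper names are invented for this illustration.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"

# Datatypes accepted by the EU shape quoted in this thread.
EU_SHAPE_DATATYPES = [XSD + "date", XSD + "dateTime", XSD + "gYear", XSD + "gYearMonth"]

# The same list with xsd:dateTimeStamp added as one more sh:or branch.
EXTENDED_DATATYPES = EU_SHAPE_DATATYPES + [XSD + "dateTimeStamp"]


def datatype_matches(literal_datatype, required_datatype):
    """Simplified model of sh:datatype: IRI equality, no notion of derived types."""
    return literal_datatype == required_datatype


def conforms_to_or(literal_datatype, allowed_datatypes):
    """Simplified model of sh:or: conforms if at least one branch accepts the value."""
    return any(datatype_matches(literal_datatype, dt) for dt in allowed_datatypes)


issued_datatype = XSD + "dateTimeStamp"
print(conforms_to_or(issued_datatype, EU_SHAPE_DATATYPES))  # False
print(conforms_to_or(issued_datatype, EXTENDED_DATATYPES))  # True
```

Under this model the dct:issued value from the report fails the original list and passes once xsd:dateTimeStamp is listed explicitly.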

Again, I suggest augmenting the sh:or because although "rdfs:Datatype is both an instance of and a subclass of rdfs:Class" (RDF Schema 1.1, Section 2.4), and OWL defines custom datatypes with an "Equivalent class" mechanism, I haven't seen anything interpret a statement of the form ex:DT2 rdfs:subClassOf ex:DT1 to provide the class-inheritance you're looking for. In particular for dateTimeStamp, neither the RDF namespace's defining RDF file, the RDF Schema namespace's defining RDF file, nor XML Schema (lacking an RDF file) provides the triple xsd:dateTimeStamp rdfs:subClassOf xsd:dateTime. Since none of the standards provide that explicit triple, a downstream application providing that triple might introduce surprising widespread behaviors throughout the application's databases.

SHACL's datatype constraints cite a dependence on the SPARQL 1.1 datatype function. Between SHACL and SPARQL, all of the references to datatype IRIs seem to disregard the possibility of datatypes being subclasses (/subdatatypes) of one another.
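
The behaviour of the SPARQL datatype function can be observed directly with a small rdflib query. The sketch below assumes rdflib is installed; the example IRIs under `http://example.org/data#` and the helper name `subjects_with_exact_datetime` are invented for this illustration.

```python
DATA_TTL = """
@prefix dct: <http://purl.org/dc/terms/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ex: <http://example.org/data#> .

ex:a dct:issued "2022-11-17T09:37:25Z"^^xsd:dateTime .
ex:b dct:issued "2022-11-17T09:37:25Z"^^xsd:dateTimeStamp .
"""

QUERY = """
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT ?s WHERE {
    ?s dct:issued ?v .
    FILTER (datatype(?v) = xsd:dateTime)
}
"""


def subjects_with_exact_datetime(data_ttl=DATA_TTL, query=QUERY):
    """Return subjects whose dct:issued value is typed exactly xsd:dateTime."""
    # Imported lazily so this module can be loaded without rdflib installed.
    from rdflib import Graph

    graph = Graph().parse(data=data_ttl, format="turtle")
    # Each result row holds one binding for ?s; collect them as plain strings.
    return sorted(str(row[0]) for row in graph.query(query))
```

Because the filter compares datatype IRIs for equality, only the subject whose literal is typed xsd:dateTime is expected in the result; the xsd:dateTimeStamp literal is not treated as a kind of xsd:dateTime.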

I'm happy to be informed of counter-examples to any of my above points if someone can cite me a response, but to date that is what I have observed. So I think this issue is outside the scope of pySHACL and of SHACL.

One potential counter-example might come from this definition in the RDFS Entailment Rules appendix, where literals are assigned class-membership using rdf:type. However, that pattern assumes a graph environment where literals may be in the Subject (i.e. first) triple-position, which is not applicable in all RDF contexts.


3 participants