Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

It is possible to create invalid RDF with rdflib #895

Closed
nbaksalyar opened this issue Jan 23, 2019 · 12 comments
Closed

It is possible to create invalid RDF with rdflib #895

nbaksalyar opened this issue Jan 23, 2019 · 12 comments

Comments

@nbaksalyar
Copy link

I was able to successfully create the following RDF graph:

from rdflib import Graph, URIRef, Literal
g = Graph()
g.add((Literal('A'), Literal('B'), Literal('C')))

It is serialised into Turtle as follows:

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
<...>
"Hi" "Hello" "Bye" .

And according to the RDFLib code it is a valid triple. However, according to the RDF specification it is not:

An RDF triple consists of three components:

*    the subject, which is an IRI or a blank node
*    the predicate, which is an IRI
*    the object, which is an IRI, a literal or a blank node

And indeed when the output Turtle is validated with the W3C validator, it produces this error:

Expected subject but got literal at line 6.

Hence, this bit in the README file is also incorrect:

The components of the triples are URIs (resources) or Literals (values),

I think this issue deserves a further discussion, and please let me know if I can fix this and raise a PR.

@prince17080
Copy link

@nicholascar Is this issue still open? I'm thinking to solve this issue by adding assert statements in the the add function.

@white-gecko
Copy link
Member

white-gecko commented Mar 27, 2020 via email

@white-gecko
Copy link
Member

white-gecko commented Mar 27, 2020 via email

@tgbugs
Copy link
Contributor

tgbugs commented Mar 27, 2020

There isn't a good way to catch issues like this at runtime without taking a performance hit. If this is implemented in rdflib core, then maybe it could be as an add_safe function or similar, that could check for these kinds of things. There are even more insidious variants where some python type will sneak in to a literal and only cause issues when trying to sort entities somewhere else in the code base. Here is an example of I deal with this https://github.com/SciCrunch/sparc-curation/blob/f02a0cd1cbc27699a6525966b4d94b1111e14295/sparcur/export/triples.py#L90-L101 (this particular version doesn't catch the fact that literals are not valid subjects, but could be added).

One solution to these kinds of issues is a static type system. I'm not sure if python's type annotations would help here, but the might be a minimally obtrusive way to add this functionality for those who want it.

@gromgull
Copy link
Member

gromgull commented Mar 27, 2020

My feeling is maybe this should be caught at serialisation time, like we do with non rdf/xml serialisable URIs etc.

@nicholascar
Copy link
Member

@prince17080 obviously still open! Sorry, we didn't get to handle this for 5.0.0.

@gromgull not sure this should only be handles "on the way out" at serlialization time. I think this really should be handled "on the way in" at creation, my general feeling being, if the spec (RDF) doesn't allow something, we shouldn't allow it in the Python Graph object.

We don't tolerate this:

g.add((URIRef('A:'), URIRef('B:'), 25))

AssertionError: Object 25 must be an rdflib term

So we shouldn't tolerate:

g.add((Literal('A'), Literal('B'), Literal('C')))

Unless there's some very good code speed reason etc. that prevents us from not tolerating it.

Adding assert statements in production code is no good idea.

Clearly we have a bunch of them in there already...

@white-gecko
Copy link
Member

A generalized RDF triple is a triple having a subject, a predicate, and object, where each can be an IRI, a blank node or a literal. A generalized RDF graph is a set of generalized RDF triples. A generalized RDF dataset comprises a distinguished generalized RDF graph, and zero or more pairs each associating an IRI, a blank node or a literal to a generalized RDF graph. (https://www.w3.org/TR/rdf11-concepts/#dfn-generalized-rdf-dataset)

But there is also a note:

Any users of generalized RDF triples, graphs or datasets need to be aware that these notions are non-standard extensions of RDF and their use may cause interoperability problems. There is no requirement on the part of any RDF tool to accept, process, or produce anything beyond standard RDF triples, graphs, and datasets.

In SPARQL 1.1, Literals are allowed in the subject position.

https://www.w3.org/TR/sparql11-query/#construct
https://www.w3.org/TR/sparql11-query/#sparqlTriplePatterns

JSON-LD says:

However, JSON-LD also extends the RDF data model to optionally allow JSON-LD to serialize generalized RDF Datasets. (https://www.w3.org/TR/json-ld/#relationship-to-rdf)

They also put up a note:

The use of blank node identifiers to label properties is obsolete, and may be removed in a future version of JSON-LD, as is the support for generalized RDF Datasets.

As a conclusion we don't need to support it, but we can. As it is currently possible, whhat is a good reason to remove it from the Graph, if we can also ensure it for the serializers.

Adding assert statements in production code is no good idea.

Clearly we have a bunch of them in there already...

Maybe we should see if we can remove them in the future.

@prince17080
Copy link

@nicholascar @white-gecko @tgbugs So, I guess the conclusion from the above discussion can be adding a new function add_safe is the best solution with assert statements.

@nicholascar
Copy link
Member

@prince17080 indeed there is the suggestion to introduce add_safe and a supporter of that so you could perhaps create a PR for such a function.

Not sure I would use it much though - like @tgbugs I check my entries to a Graph() in my own application code rather than throwing inserts at add and then having it validate the values - but happy to have add_safe proposed if others have use cases motivating such a function.

@oohlaf
Copy link

oohlaf commented Apr 4, 2020

There's check_statement in util. A wrapper would be nothing more than a call to check_statement.

@nicholascar
Copy link
Member

Thanks @oohlaf, really useful tip!

@white-gecko
Copy link
Member

I close this issue now. Maybe someone finds a place for the hint of @oohlaf in the documentation or someone should ask and answer it on https://stackoverflow.com/questions/tagged/rdflib .

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants