Blank-nodes collisions #980

Closed
nleguillarme opened this issue Mar 19, 2020 · 11 comments · Fixed by #1495

@nleguillarme

Hi.

If I correctly understand the graph merging process explained here, the following piece of code should create a graph with two distinct blank nodes:

from rdflib import Graph

graph1 = """
_:0 <http://purl.obolibrary.org/obo/RO_0002350> <http://www.gbif.org/species/0000001> .
"""
graph2 = """
_:0 <http://purl.obolibrary.org/obo/RO_0002350> <http://www.gbif.org/species/0000002> .
"""

g = Graph()
g.parse(data=graph1, format="nt")
g.parse(data=graph2, format="nt")

for triple in g:
    print(triple)

However, when I execute the code, I get the following output:

(rdflib.term.BNode('Ne3fd8261b37741fca22d502483d88964'), rdflib.term.URIRef('http://purl.obolibrary.org/obo/RO_0002350'), rdflib.term.URIRef('http://www.gbif.org/species/0000002'))
(rdflib.term.BNode('Ne3fd8261b37741fca22d502483d88964'), rdflib.term.URIRef('http://purl.obolibrary.org/obo/RO_0002350'), rdflib.term.URIRef('http://www.gbif.org/species/0000001'))

Am I missing something? (Versions: rdflib 4.2.2, Python 3.7.5)

@white-gecko
Member

I think you understand it correctly. I think this is related to issue #892. RDFLib uses the blank node identifiers as they are.

Changing this behavior now would break some things, and since we are in the feature freeze for 5.x, I moved it to the 6.0.0 milestone.

Actually I have a use case where I need to parse multiple files within the same context of blank identifiers. When executing SPARQL queries I need to have individual contexts per query. Maybe it would be a good idea to introduce some blank context object which can be handed over to the parse method and the query method. We have to put this on the roadmap for 6.0.0.

@nleguillarme
Author

Thank you for your reply. However, I don't really understand. Does that mean that no graph merging mechanism is currently implemented in rdflib? That would contradict what the documentation says:

In RDFLib, blank nodes are given unique IDs when parsing, so graph merging can be done by simply reading several files into the same graph

https://rdflib.readthedocs.io/en/stable/merging.html

@sanyam19106

But both graphs have the same subject and predicate; only the object is different.

@vikash18086

vikash18086 commented May 31, 2020

We solved this issue as follows.
We use a new map to assign new IDs to the blank nodes of each graph. If two blank nodes with the same label come from the same graph, they are assigned the same ID.
You can download the updated code from #1101.
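
As a rough illustration of the mapping approach described above (this is not the code from #1101), the sketch below rewrites blank node labels in N-Triples text before parsing, using a fresh map for each document. The function name `relabel_bnodes` and the regular expression are assumptions made for this example. The pattern is deliberately naive: it only matches `_:` at the start of a whitespace-separated token, so it could still misfire on a literal that contains such a token.

```python
import re
import uuid

# Hypothetical pattern: a blank node label at the start of a token.
BNODE_LABEL = re.compile(r"(?<!\S)_:([A-Za-z0-9]+)")


def relabel_bnodes(ntriples_text):
    """Replace every blank node label with a fresh one.

    The same label inside one document always maps to the same new label,
    but two separate calls never share labels.
    """
    mapping = {}  # one map per document

    def fresh(match):
        label = match.group(1)
        if label not in mapping:
            mapping[label] = "b" + uuid.uuid4().hex
        return "_:" + mapping[label]

    return BNODE_LABEL.sub(fresh, ntriples_text)


graph1 = """
_:0 <http://purl.obolibrary.org/obo/RO_0002350> <http://www.gbif.org/species/0000001> .
"""
graph2 = """
_:0 <http://purl.obolibrary.org/obo/RO_0002350> <http://www.gbif.org/species/0000002> .
"""

relabeled1 = relabel_bnodes(graph1)
relabeled2 = relabel_bnodes(graph2)
```

The relabeled strings could then be passed to `g.parse(data=..., format="nt")` as in the original snippet, so that the two `_:0` subjects no longer collide.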

@white-gecko
Member

@vikash18086 thank you for contributing to RDFLib. I think this would not actually solve the issue. As I mentioned earlier:

Actually I have a use case where I need to parse multiple files within the same context of blank identifiers. When executing SPARQL queries I need to have individual contexts per query. Maybe it would be a good idea to introduce some blank context object which can be handed over to the parse method and the query method. We have to put this on the roadmap for 6.0.0.

So we need some way to:

  1. reference blank nodes across graphs within a dataset/conjunctive graph
  2. allow parsing multiple documents within the same context of blank nodes
  3. allow parsing files in different contexts
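
The requirements above can be sketched with a plain dictionary acting as the blank node context. This is a standard-library model of the idea only, not RDFLib's parser: `parse_simple_nt` is a hypothetical helper that handles just IRIs and blank nodes, and the `bnode_context` parameter name is borrowed from the way the idea is referred to later in this thread.

```python
import uuid


def parse_simple_nt(text, bnode_context=None):
    """Parse IRI/blank-node-only N-Triples lines into (s, p, o) tuples.

    bnode_context maps document labels to generated identifiers.
    Passing the same dict to several calls shares blank nodes between
    documents; omitting it gives each call its own isolated context.
    """
    if bnode_context is None:
        bnode_context = {}

    def resolve(term):
        if term.startswith("_:"):
            label = term[2:]
            if label not in bnode_context:
                bnode_context[label] = "N" + uuid.uuid4().hex
            return "_:" + bnode_context[label]
        return term

    triples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Drop the terminating dot, then split into exactly three terms.
        s, p, o = line.rstrip(".").split()
        triples.append((resolve(s), resolve(p), resolve(o)))
    return triples


doc1 = "_:0 <http://purl.obolibrary.org/obo/RO_0002350> <http://www.gbif.org/species/0000001> ."
doc2 = "_:0 <http://purl.obolibrary.org/obo/RO_0002350> <http://www.gbif.org/species/0000002> ."

# Same context: _:0 denotes the same node in both documents.
shared = {}
same_a = parse_simple_nt(doc1, bnode_context=shared)
same_b = parse_simple_nt(doc2, bnode_context=shared)

# Separate contexts: _:0 becomes two distinct nodes.
apart_a = parse_simple_nt(doc1)
apart_b = parse_simple_nt(doc2)
```

Making the context an explicit, caller-owned object keeps both use cases available: the caller decides whether two parses share a context or not, instead of the parser hard-coding one behavior.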

mwatts15 added a commit to mwatts15/rdflib that referenced this issue May 31, 2020
mwatts15 added a commit to mwatts15/rdflib that referenced this issue May 31, 2020
- Needed so you can access through Graph.parse which does not support
  passing args to Parser __init__
mwatts15 added a commit to mwatts15/rdflib that referenced this issue May 31, 2020
- Also making "py:obj" the default role for docs
nicholascar added a commit that referenced this issue Jun 1, 2020
…de-collisions

Allow distinct blank node contexts from one NTriples parser to the next (#980)
mwatts15 added a commit to mwatts15/rdflib that referenced this issue Jun 1, 2020
- Also, updating the shared context so it works properly with
  Graph.parse
mwatts15 added a commit to mwatts15/rdflib that referenced this issue Jun 1, 2020
- Also, writing out BNode reference to avoid Sphinx warning
- Deleting NTParser __init__ since it doesn't do anything
@white-gecko
Member

Cool, thank you @mwatts15 for #1107; this is the interface as I proposed it. I like it. We have to make sure that it also works across different serialization formats. I think it should not be a problem with Turtle; for RDF/XML the value of rdf:nodeID is the same as the bnodeLabel following _: in Turtle and NTriples, and JSON-LD also uses the _: syntax.

We also need a similar solution for #892.

@white-gecko
Member

I'm currently not able to test #1107 and #1108. But as far as I can see, the tests for #1108 do not yet cover using the same context across different serialization formats. We also need it for the other formats.

@mwatts15
Contributor

mwatts15 commented Jun 1, 2020

@white-gecko I'm only really interested in the N-Triples and N-Quads formats.

As for the other parsers, some already give you distinct blank nodes between different documents. I don't know if sharing them across documents makes as much sense for other formats. Turtle/N3 has more complicated handling of blank nodes: formulas define their own nested blank node contexts. What's the use-case for something like the bnode_context idea there? The RDF/XML parser gives you distinct IDs for each parse unless you use preserve_node_ids, which just means "use the node ID as the BNode identifier". TriX also has preserve_node_ids, although the TriX parser still creates BNodes like BNode(label) even when it's not "preserving" identifiers, which seems pretty useless.

JSON-LD looks like it would be more annoying in general, but also for this. I have less than zero interest in that.

@white-gecko
Copy link
Member

That is fine. I'm actually also just interested in this feature for NTriples. But for the sake of consistency of the parsing interface I think it would be good to have the blank node/blank id support handled in the same way for all parsers. Maybe there will be somebody who needs it at some time … ;-)

nicholascar added a commit that referenced this issue Jun 5, 2020
…ode-collisions

BNode context dicts for NT and N-Quads parsers
@ghost

ghost commented Dec 10, 2021

Looks like #1108 fixes this issue (“Address remainder #980. Also add similar behavior for N-Quads.”) and so it can be closed?

@nicholascar
Member

Closing this Issue since PR #1495 includes a test that shows that this particular Issue is solved (due to PR #1108). Thanks @gjhiggins!
