Add xfail test for TriG default graph handling #1796
This adds an xfail-marked failing test for the handling of the default graph identifier in TriG serialization. TriG serializes triples within the default graph as anonymous triples instead of labelled triples. This xfail test records a known bug and should help us notice if this bug is fixed, and should also help detect further regressions.
Other changes:
- Added `simple_quad` to variants tests with HexTuple and TriG format.
- Added an additional exact_match assert for variants which can be used to sidestep some of the known issues with isomorphic graph detection. This is useful for graphs with no BNodes.
- Also added round-tripping for `variants/simple_quad.trig`.
@pytest.mark.xfail(raises=AssertionError, reason="""
This should pass, but for some reason when the default identifier is
set, trig serializes quads inside this default identifier to an
anonymous graph.
I believe that this is the correct behaviour. For once, the docs are clear and explicit:
"A ConjunctiveGraph is an (unnamed) aggregation of all the named graphs in a store.
It has a `default` graph, whose name is associated with the graph throughout its life.
:meth:`__init__` can take an identifier to use as the name of this default graph or it
will assign a BNode."
`graph_id` is asserted to be the identifier of the ConjunctiveGraph and therefore the identifier of the default graph and, as the quad uses `graph_id` as the context identifier, the statement appears in the default graph, as described in the docstring, and the serialization is correct.
The quad after parsing is:
[
(
rdflib.term.URIRef('http://example.com/subject'),
rdflib.term.URIRef('http://example.com/predicate'),
rdflib.term.URIRef('http://example.com/object'),
rdflib.term.URIRef('http://example.com/graph')
)
]
AIUI, the isomorphic difference comes from the parsing of that (IMO) correct serialization, the result of which is:
[
(
rdflib.term.URIRef('http://example.com/subject'),
rdflib.term.URIRef('http://example.com/predicate'),
rdflib.term.URIRef('http://example.com/object'),
rdflib.term.BNode('N8b0826bb0fcb40cc88aeafe1a6964898')
)
]
which in turn is serialized as:
@prefix ns1: <http://example.com/> .
_:N8b0826bb0fcb40cc88aeafe1a6964898 {
ns1:subject ns1:predicate ns1:object .
}
It starts getting complicated at that point, because if a quad with a BNode context identifier is added:
quad2 = (EG["SUBJECT"], EG["predicate"], EG["object"], rdflib.BNode())
graph.add(quad2)
the quads become:
[
(
rdflib.term.URIRef('http://example.com/SUBJECT'),
rdflib.term.URIRef('http://example.com/predicate'),
rdflib.term.URIRef('http://example.com/object'),
rdflib.term.BNode('n3b8c44770ac14158bb50244d3caf978ab1')
),
(
rdflib.term.URIRef('http://example.com/subject'),
rdflib.term.URIRef('http://example.com/predicate'),
rdflib.term.URIRef('http://example.com/object'),
rdflib.term.BNode('N12d002dd5a6e459a955ffb38d326dcf9')
)
]
and the graph is trig-serialized as:
@prefix ns1: <http://example.com/> .
_:n3b8c44770ac14158bb50244d3caf978ab1 {
ns1:SUBJECT ns1:predicate ns1:object .
}
_:N12d002dd5a6e459a955ffb38d326dcf9 {
ns1:subject ns1:predicate ns1:object .
}
Whilst (AFAICT) the RDFLib TriG parser can successfully round-trip this serialization, I don't believe the semantics are exportable.
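The round-trip described in this comment can be sketched with a toy model in plain Python. This is not RDFLib's implementation: the helpers `new_container_id`, `serialize` and `parse` are hypothetical, quads are plain tuples of strings, and the only behaviour modelled is the one discussed above, namely that the context equal to the container's identifier is written without a label and that an unlabelled block is read into the receiving container's own default graph.

```python
from itertools import count

_fresh = count()


def new_container_id():
    """Stand-in for the fresh BNode a container gets when no identifier is passed."""
    return f"_:N{next(_fresh)}"


def serialize(quads, default_id):
    """Group quads by context; the context equal to default_id gets no label (None)."""
    blocks = {}
    for s, p, o, ctx in quads:
        label = None if ctx == default_id else ctx
        blocks.setdefault(label, []).append((s, p, o))
    return blocks


def parse(blocks, default_id):
    """Read blocks back; unlabelled triples land in the receiver's default graph."""
    quads = set()
    for label, triples in blocks.items():
        ctx = default_id if label is None else label
        quads.update((s, p, o, ctx) for s, p, o in triples)
    return quads


graph_id = "http://example.com/graph"
original = {("ex:subject", "ex:predicate", "ex:object", graph_id)}

# The source container was created with graph_id as its identifier.
blocks = serialize(original, default_id=graph_id)

# The receiving container was created without an identifier.
reparsed = parse(blocks, default_id=new_container_id())
```

In this model the triple survives the round trip but its context changes from the IRI to a freshly generated blank node label, which mirrors the difference between the two quad listings above.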
However, if you are keen to have a trig serialization xfail 😄, I can oblige:
data = """<http://example.org/alice> <http://purl.org/dc/terms/publisher> "Alice" .
<http://example.org/bob> <http://purl.org/dc/terms/publisher> "Bob" .
<http://example.org/harry> <http://purl.org/dc/terms/publisher> "Harry" .
_:b1 <http://xmlns.com/foaf/0.1/mbox> <mailto:bob@oldcorp.example.org> <http://example.org/bob> .
_:b1 <http://xmlns.com/foaf/0.1/knows> _:b2 <http://example.org/bob> .
_:b1 <http://xmlns.com/foaf/0.1/knows> _:b3 <http://example.org/bob> .
_:b1 <http://xmlns.com/foaf/0.1/name> "Bob" <http://example.org/bob> .
_:b2 <http://xmlns.com/foaf/0.1/mbox> <mailto:alice@work.example.org> <http://example.org/alice> .
_:b2 <http://xmlns.com/foaf/0.1/name> "Alice" <http://example.org/alice> .
_:b3 <http://xmlns.com/foaf/0.1/mbox> <mailto:harry@work.example.org> <http://example.org/harry> .
_:b3 <http://xmlns.com/foaf/0.1/name> "Harry" <http://example.org/harry> .
_:b3 <http://xmlns.com/foaf/0.1/knows> _:b1 <http://example.org/harry> .
"""
@pytest.mark.xfail(reason="TriG fails to serialize BNodes correctly")
def test_trig_serializer():
    graph = rdflib.ConjunctiveGraph()
    graph.parse(data=data, format="nquads")
    data_str = graph.serialize(format="trig")
    assert "[] foaf:knows [ ]" in data_str  # Whoa!
    parsed_graph = rdflib.ConjunctiveGraph()
    parsed_graph.parse(data=data_str, format="trig")
    GraphHelper.assert_quad_sets_equals(graph, parsed_graph)
The graph actually serializes as:
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix ns1: <http://example.org/> .
ns1:bob {
[] foaf:knows [ ],
[ ] ;
foaf:mbox <mailto:bob@oldcorp.example.org> ;
foaf:name "Bob" .
}
ns1:alice {
[] foaf:mbox <mailto:alice@work.example.org> ;
foaf:name "Alice" .
}
_:N8bab0d1ee5c547879f6c6b5ab3d4a6a8 {
ns1:alice dcterms:publisher "Alice" .
ns1:bob dcterms:publisher "Bob" .
ns1:harry dcterms:publisher "Harry" .
}
ns1:harry {
[] foaf:knows [ ] ;
foaf:mbox <mailto:harry@work.example.org> ;
foaf:name "Harry" .
}
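In the N-Quads input, `_:b1`, `_:b2` and `_:b3` each appear in more than one named graph, whereas every `[ ]` in the output above denotes a fresh blank node, so those cross-graph links cannot survive. A minimal sketch of how such blank nodes could be detected before serializing, assuming quads are plain tuples of strings and blank nodes are recognised by a `_:` prefix (the function name is hypothetical and not part of RDFLib):

```python
from collections import defaultdict


def bnodes_shared_across_graphs(quads):
    """Return the blank node labels that occur in more than one graph."""
    graphs_by_bnode = defaultdict(set)
    for s, p, o, g in quads:
        for term in (s, o):
            if term.startswith("_:"):
                graphs_by_bnode[term].add(g)
    return {b for b, graphs in graphs_by_bnode.items() if len(graphs) > 1}


# A subset of the quads from the N-Quads data above, with abbreviated IRIs.
quads = [
    ("_:b1", "foaf:knows", "_:b2", "ex:bob"),
    ("_:b1", "foaf:knows", "_:b3", "ex:bob"),
    ("_:b1", "foaf:name", '"Bob"', "ex:bob"),
    ("_:b2", "foaf:name", '"Alice"', "ex:alice"),
    ("_:b3", "foaf:name", '"Harry"', "ex:harry"),
    ("_:b3", "foaf:knows", "_:b1", "ex:harry"),
]

shared = bnodes_shared_across_graphs(quads)
```

Any label returned here would need to be written as an explicit `_:label` rather than `[ ]` for the graphs to stay connected after parsing.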
I believe that this is the correct behaviour. For once, the docs are clear and explicit:
"A ConjunctiveGraph is an (unnamed) aggregation of all the named graphs in a store.
It has a `default` graph, whose name is associated with the graph throughout its life.
:meth:`__init__` can take an identifier to use as the name of this default graph or it
will assign a BNode."
`graph_id` is asserted to be the identifier of the ConjunctiveGraph and therefore the identifier of the default graph and, as the quad uses `graph_id` as the context identifier, the statement appears in the default graph, as described in the docstring, and the serialization is correct.
Thank you very much for the thorough review. You are indeed correct: if this is the identifier for the default graph, then the serialization is correct, so the behaviour here is as intended. I got confused because I did not expect the identifier here to be associated with the concept of the default graph from the RDF abstract syntax, as that should not have an IRI, or any graph name at all (not even a BNode), as per:
https://www.w3.org/TR/rdf11-concepts/#section-dataset
- RDF Datasets
An RDF dataset is a collection of RDF graphs, and comprises:
In defense of the current behaviour, however, ConjunctiveGraph is not a Dataset, and it is not possible to assign a default graph ID to a Dataset, even though it has one which is internal to RDFLib:
Line 1872 in 1cba9d8
DATASET_DEFAULT_GRAPH_ID = URIRef("urn:x-rdflib:default")
This test actually passes for JSON-LD, though. I will rework this a bit and then see if I can fix JSON-LD to behave the same as TriG and hextuples which, after your explanation, is the correct behaviour.
I may also add a warning to the docs that quads in the identified graph will be serialized without a graph name, which is quite surprising, even though it is in line with the documentation.
However, if you are keen to have a trig serialization xfail smile, I can oblige:
I will investigate this a bit and add it as a variant test.
However, if you are keen to have a trig serialization xfail smile, I can oblige:
Where did you get this graph from? Is it one of the TriG examples? I would like to note the provenance of the file if possible.
Okay, it seems to be example 2 from TriG but with some extra bits.
`GraphHelper.assert_quad_sets_equals(graph, parsed_graph)` will only work if there are no blank nodes; `GraphHelper.assert_isomorphic(g1, g2)` is better for when there are blank nodes, but this has some issues (see #1797).
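The difference between the two kinds of assertion can be illustrated with a brute-force sketch in plain Python. It assumes triples are tuples of strings with blank nodes marked by a `_:` prefix, and it tries every relabelling of blank nodes, which is factorial in their number; it is not the algorithm used by `rdflib.compare`, and the helper names are hypothetical.

```python
from itertools import permutations


def bnodes(triples):
    """Sorted blank node labels used anywhere in a set of triples."""
    return sorted({t for triple in triples for t in triple if t.startswith("_:")})


def isomorphic(a, b):
    """True if some one-to-one renaming of blank nodes turns a into b."""
    a_nodes, b_nodes = bnodes(a), bnodes(b)
    if len(a) != len(b) or len(a_nodes) != len(b_nodes):
        return False
    for perm in permutations(b_nodes):
        mapping = dict(zip(a_nodes, perm))
        renamed = {tuple(mapping.get(t, t) for t in triple) for triple in a}
        if renamed == b:
            return True
    return False


first = {("_:a", "foaf:name", '"Bob"'), ("_:a", "foaf:knows", "_:b")}
second = {("_:x", "foaf:name", '"Bob"'), ("_:x", "foaf:knows", "_:y")}
third = {("_:x", "foaf:name", '"Bob"'), ("_:y", "foaf:knows", "_:x")}

exact_match = first == second           # labels differ, so plain set equality fails
same_shape = isomorphic(first, second)  # a renaming exists
different_shape = isomorphic(first, third)
```

Here `first` and `second` differ only in blank node labels, so exact comparison rejects them while the relabelling check accepts them; `third` has a different structure and is rejected by both.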
There is, however, a problem somewhere: if I add round-tripping for this, it fails:
# from example 2 in https://www.w3.org/TR/trig/#sec-graph-statements
# This document contains a default graph and two named graphs.
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix dc: <http://purl.org/dc/terms/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
# default graph
{
<http://example.org/bob> dc:publisher "Bob" .
<http://example.org/alice> dc:publisher "Alice" .
}
<http://example.org/bob>
{
_:a foaf:name "Bob" .
_:a foaf:mbox <mailto:bob@oldcorp.example.org> .
_:a foaf:knows _:b .
}
<http://example.org/alice>
{
_:b foaf:name "Alice" .
_:b foaf:mbox <mailto:alice@work.example.org> .
}
============================================================================ test session starts ============================================================================
platform linux -- Python 3.9.12, pytest-7.1.1, pluggy-1.0.0
rootdir: /home/iwana/sw/d/github.com/iafork/rdflib.cleanish, configfile: tox.ini
plugins: subtests-0.7.0, md-report-0.2.0, cov-3.0.0
collected 1 item
test/test_roundtrip.py F [100%]
================================================================================= FAILURES ==================================================================================
____________________________________________________________ test_extra[roundtrip_rdf11trig_eg2.trig_trig_trig] _____________________________________________________________
Traceback (most recent call last):
File "/home/iwana/sw/d/github.com/iafork/rdflib.cleanish/test/test_roundtrip.py", line 281, in test_extra
checker(*args)
File "/home/iwana/sw/d/github.com/iafork/rdflib.cleanish/test/test_roundtrip.py", line 200, in roundtrip
GraphHelper.assert_isomorphic(g1, g2)
File "/home/iwana/sw/d/github.com/iafork/rdflib.cleanish/test/testutils.py", line 223, in assert_isomorphic
assert rdflib.compare.isomorphic(lhs, rhs), format_report()
AssertionError: in both:
(rdflib.term.BNode('cbb5eb12b5dcf688537b0298cce144c6dd68cf047530d0b4a455a8f31f314244fd'), rdflib.term.URIRef('http://xmlns.com/foaf/0.1/mbox'), rdflib.term.URIRef('mailto:alice@work.example.org'))
(rdflib.term.URIRef('http://example.org/bob'), rdflib.term.URIRef('http://purl.org/dc/terms/publisher'), rdflib.term.Literal('Bob'))
(rdflib.term.BNode('cbb5eb12b5dcf688537b0298cce144c6dd68cf047530d0b4a455a8f31f314244fd'), rdflib.term.URIRef('http://xmlns.com/foaf/0.1/name'), rdflib.term.Literal('Alice'))
(rdflib.term.URIRef('http://example.org/alice'), rdflib.term.URIRef('http://purl.org/dc/terms/publisher'), rdflib.term.Literal('Alice'))
only in first:
(rdflib.term.BNode('cb0'), rdflib.term.URIRef('http://xmlns.com/foaf/0.1/knows'), rdflib.term.BNode('cbb5eb12b5dcf688537b0298cce144c6dd68cf047530d0b4a455a8f31f314244fd'))
(rdflib.term.BNode('cb0'), rdflib.term.URIRef('http://xmlns.com/foaf/0.1/mbox'), rdflib.term.URIRef('mailto:bob@oldcorp.example.org'))
(rdflib.term.BNode('cb0'), rdflib.term.URIRef('http://xmlns.com/foaf/0.1/name'), rdflib.term.Literal('Bob'))
only in second:
(rdflib.term.BNode('cb7be1d0397a49ddd4ae8aa96acc7b6135903c5f3fa5e47bf619c0e4b438aafcc1'), rdflib.term.URIRef('http://xmlns.com/foaf/0.1/mbox'), rdflib.term.URIRef('mailto:bob@oldcorp.example.org'))
(rdflib.term.BNode('cb7be1d0397a49ddd4ae8aa96acc7b6135903c5f3fa5e47bf619c0e4b438aafcc1'), rdflib.term.URIRef('http://xmlns.com/foaf/0.1/knows'), rdflib.term.BNode('cb0'))
(rdflib.term.BNode('cb7be1d0397a49ddd4ae8aa96acc7b6135903c5f3fa5e47bf619c0e4b438aafcc1'), rdflib.term.URIRef('http://xmlns.com/foaf/0.1/name'), rdflib.term.Literal('Bob'))
assert False
+ where False = <function isomorphic at 0x7f65b55b2160>(<Graph identifier=N4cecba01e82a4534bb72cd9c89d58c47 (<class 'rdflib.graph.ConjunctiveGraph'>)>, <Graph identifier=N8a93f485544c49e79b49ed0f3ffddcc4 (<class 'rdflib.graph.ConjunctiveGraph'>)>)
+ where <function isomorphic at 0x7f65b55b2160> = <module 'rdflib.compare' from '/home/iwana/sw/d/github.com/iafork/rdflib.cleanish/rdflib/compare.py'>.isomorphic
+ where <module 'rdflib.compare' from '/home/iwana/sw/d/github.com/iafork/rdflib.cleanish/rdflib/compare.py'> = rdflib.compare
============================================================================= warnings summary ==============================================================================
.venv/lib64/python3.9/site-packages/_pytest/fixtures.py:227
/home/iwana/sw/d/github.com/iafork/rdflib.cleanish/.venv/lib64/python3.9/site-packages/_pytest/fixtures.py:227: UserWarning: Code: _pytestfixturefunction is not defined in namespace XSD
fixturemarker: Optional[FixtureFunctionMarker] = getattr(
-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
========================================================================== short test summary info ==========================================================================
FAILED test/test_roundtrip.py::test_extra[roundtrip_rdf11trig_eg2.trig_trig_trig] - AssertionError: in both:
======================================================================= 1 failed, 1 warning in 0.11s ========================================================================
Closing this as it is indeed intended behaviour. I may make another PR with some of the other changes in here; they should have been in a separate PR anyway.
The first xfail occurs during round-tripping: TriG seems to be making some mistake when encoding blank nodes, as it is encoding that "Bob" knows someone who does not exist. This was reported by @gjhiggins in RDFLib#1796 (comment).
The second xfail seems to be related to hextuple parsing: when comparing the hextuple-parsed result of Example 2 with the TriG-parsed graph of Example 2, the graphs are not isomorphic more than 70% of the time, but sometimes they are isomorphic. I noticed this while adding the xfail for the issue @gjhiggins noticed.
Other changes:
- Added `simple_quad` to variants tests with HexTuple and TriG format.
- Added an additional exact_match assert for variants which can be used to sidestep some of the known issues with isomorphic graph detection. This is useful for graphs with no BNodes.
- Also added round-tripping for `variants/simple_quad.trig`.
- Various changes to ensure deterministic ordering so that it is easier to compare things visually and so that tests always do the exact same thing in the exact same order.