Add xfail test for TriG default graph handling #1796

Closed
wants to merge 1 commit into from

aucampia
Member

@aucampia aucampia commented Apr 9, 2022

This adds an xfail-marked failing test for the handling of the default
graph identifier in TriG serialization. TriG serializes triples within the
default graph as anonymous triples instead of labelled triples.

This xfail test records a known bug and should help us notice if this
bug is fixed, and should also help detect further regressions.

Other changes:

  • Added simple_quad to variants tests with HexTuple and TriG format.
  • Added an additional exact_match assert for variants which can be used
    to sidestep some of the known issues with isomorphic graph detection.
    This is useful for graphs with no BNodes.
  • Also added round-tripping for variants/simple_quad.trig.

@aucampia aucampia marked this pull request as ready for review April 9, 2022 12:55
@aucampia aucampia added bug Something isn't working review wanted This indicates that the PR is ready for review labels Apr 9, 2022
@pytest.mark.xfail(raises=AssertionError, reason="""
This should pass, but for some reason when the default identifier is
set, trig serializes quads inside this default identifier to an
anonymous graph.

I believe that this is the correct behaviour. For once, the docs are clear and explicit:

"A ConjunctiveGraph is an (unnamed) aggregation of all the named graphs in a store.

It has a default graph, whose name is associated with the graph throughout its life.
:meth:__init__ can take an identifier to use as the name of this default graph or it
will assign a BNode."

graph_id is asserted to be the identifier of the ConjunctiveGraph and therefore the
identifier of the default graph and, as the quad uses graph_id as the context identifier,
the statement appears in the default graph, as described in the docstring and the
serialization is correct.

The quad after parsing is:

[
    (
        rdflib.term.URIRef('http://example.com/subject'),
        rdflib.term.URIRef('http://example.com/predicate'),
        rdflib.term.URIRef('http://example.com/object'),
        rdflib.term.URIRef('http://example.com/graph')
    )
]

AIUI, the isomorphic difference comes from the parsing of that (IMO) correct serialization, the result of which is:

[
    (
        rdflib.term.URIRef('http://example.com/subject'),
        rdflib.term.URIRef('http://example.com/predicate'),
        rdflib.term.URIRef('http://example.com/object'),
        rdflib.term.BNode('N8b0826bb0fcb40cc88aeafe1a6964898')
    )
]

which in turn is serialized as:

@prefix ns1: <http://example.com/> .

_:N8b0826bb0fcb40cc88aeafe1a6964898 {
    ns1:subject ns1:predicate ns1:object .
}

It starts getting complicated at that point, because if a quad with a BNode context identifier is added:

    quad2 = (EG["SUBJECT"], EG["predicate"], EG["object"], rdflib.BNode())
    graph.add(quad2)

the quads become:

[
    (
        rdflib.term.URIRef('http://example.com/SUBJECT'),
        rdflib.term.URIRef('http://example.com/predicate'),
        rdflib.term.URIRef('http://example.com/object'),
        rdflib.term.BNode('n3b8c44770ac14158bb50244d3caf978ab1')
    ),
    (
        rdflib.term.URIRef('http://example.com/subject'),
        rdflib.term.URIRef('http://example.com/predicate'),
        rdflib.term.URIRef('http://example.com/object'),
        rdflib.term.BNode('N12d002dd5a6e459a955ffb38d326dcf9')
    )
]

and the graph is trig-serialized as:

@prefix ns1: <http://example.com/> .

_:n3b8c44770ac14158bb50244d3caf978ab1 {
    ns1:SUBJECT ns1:predicate ns1:object .
}

_:N12d002dd5a6e459a955ffb38d326dcf9 {
    ns1:subject ns1:predicate ns1:object .
}

Whilst (AFAICT) the RDFLib trig parser can successfully round-trip this serialization, I don't believe the semantics are exportable.
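
The round trip described in this comment can be modelled in a few lines of plain Python. This is a toy sketch, not RDFLib code: `toy_serialize` and `toy_parse` are hypothetical helpers, and the model assumes only what the comment states, namely that quads whose context equals the default graph identifier are written without a label, and that a newly constructed graph gets its own BNode as default identifier (the made-up `_:N1` below).

```python
# Toy model of the round trip described above; NOT RDFLib's implementation.
# Quads are plain (subject, predicate, object, context) tuples of strings.


def toy_serialize(quads, default_id):
    """Group triples by graph label; the default graph gets no label (None)."""
    blocks = {}
    for s, p, o, ctx in quads:
        label = None if ctx == default_id else ctx
        blocks.setdefault(label, set()).add((s, p, o))
    return blocks


def toy_parse(blocks, default_id):
    """Rebuild quads; unlabelled triples land in the parsing graph's default graph."""
    return {
        (s, p, o, default_id if label is None else label)
        for label, triples in blocks.items()
        for (s, p, o) in triples
    }


GRAPH_ID = "http://example.com/graph"
quad = (
    "http://example.com/subject",
    "http://example.com/predicate",
    "http://example.com/object",
    GRAPH_ID,
)

# The context equals the default identifier, so the label is dropped.
blocks = toy_serialize({quad}, default_id=GRAPH_ID)

# A fresh graph has its own default identifier (hypothetical BNode label here),
# so the reparsed quad ends up with a different context.
reparsed = toy_parse(blocks, default_id="_:N1")

# Re-using the original identifier recovers the original quad.
same_id = toy_parse(blocks, default_id=GRAPH_ID)
```

In this model the quad sets differ after the round trip only because the two graphs disagree about what their default identifier is, which matches the parsed results shown above.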

@ghost ghost Apr 10, 2022

However, if you are keen to have a trig serialization xfail 😄, I can oblige:

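import pytest
import rdflib
from test.testutils import GraphHelper  # helper from RDFLib's test suite (test/testutils.py)

data = """<http://example.org/alice> <http://purl.org/dc/terms/publisher> "Alice" .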
<http://example.org/bob> <http://purl.org/dc/terms/publisher> "Bob" .
<http://example.org/harry> <http://purl.org/dc/terms/publisher> "Harry" .
_:b1 <http://xmlns.com/foaf/0.1/mbox> <mailto:bob@oldcorp.example.org> <http://example.org/bob> .
_:b1 <http://xmlns.com/foaf/0.1/knows> _:b2 <http://example.org/bob> .
_:b1 <http://xmlns.com/foaf/0.1/knows> _:b3 <http://example.org/bob> .
_:b1 <http://xmlns.com/foaf/0.1/name> "Bob" <http://example.org/bob> .
_:b2 <http://xmlns.com/foaf/0.1/mbox> <mailto:alice@work.example.org> <http://example.org/alice> .
_:b2 <http://xmlns.com/foaf/0.1/name> "Alice" <http://example.org/alice> .
_:b3 <http://xmlns.com/foaf/0.1/mbox> <mailto:harry@work.example.org> <http://example.org/harry> .
_:b3 <http://xmlns.com/foaf/0.1/name> "Harry" <http://example.org/harry> .
_:b3 <http://xmlns.com/foaf/0.1/knows> _:b1 <http://example.org/harry> .
"""

@pytest.mark.xfail(reason="TriG fails to serialize BNodes correctly")
def test_trig_serializer():
    graph = rdflib.ConjunctiveGraph()
    graph.parse(data=data, format="nquads")
    data_str = graph.serialize(format="trig")

    assert "[] foaf:knows [ ]" in data_str  # Whoa!

    parsed_graph = rdflib.ConjunctiveGraph()
    parsed_graph.parse(data=data_str, format="trig")

    GraphHelper.assert_quad_sets_equals(graph, parsed_graph)

Actually serializes as:

@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix ns1: <http://example.org/> .

ns1:bob {
    [] foaf:knows [ ],
            [ ] ;
        foaf:mbox <mailto:bob@oldcorp.example.org> ;
        foaf:name "Bob" .
}

ns1:alice {
    [] foaf:mbox <mailto:alice@work.example.org> ;
        foaf:name "Alice" .
}

_:N8bab0d1ee5c547879f6c6b5ab3d4a6a8 {
    ns1:alice dcterms:publisher "Alice" .

    ns1:bob dcterms:publisher "Bob" .

    ns1:harry dcterms:publisher "Harry" .
}

ns1:harry {
    [] foaf:knows [ ] ;
        foaf:mbox <mailto:harry@work.example.org> ;
        foaf:name "Harry" .
}

Member Author

I believe that this is the correct behaviour. For once, the docs are clear and explicit:

"A ConjunctiveGraph is an (unnamed) aggregation of all the named graphs in a store.
It has a default graph, whose name is associated with the graph throughout its life.
:meth:__init__ can take an identifier to use as the name of this default graph or it
will assign a BNode."

graph_id is asserted to be the identifier of the ConjunctiveGraph and therefore the identifier of the default graph and, as the quad uses graph_id as the context identifier, the statement appears in the default graph, as described in the docstring and the serialization is correct.

Thank you very much for the thorough review. You are indeed correct: if this is the identifier for the default graph, then the serialization is correct, so the behaviour here is as intended. I got confused because I did not expect the identifier here to be associated with the concept of the default graph from the RDF abstract syntax, as that should not have an IRI, or any graph name at all (not even a BNode), as per:

https://www.w3.org/TR/rdf11-concepts/#section-dataset

  4. RDF Datasets

An RDF dataset is a collection of RDF graphs, and comprises:

  • Exactly one default graph, being an RDF graph. The default graph does not have a name and may be empty.
  • Zero or more named graphs. Each named graph is a pair consisting of an IRI or a blank node (the graph name), and an RDF graph. Graph names are unique within an RDF dataset.
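
As a rough illustration of that definition, a dataset can be modelled as one unnamed default graph plus a mapping from graph names to graphs. `ToyDataset` below is a hypothetical plain-Python class written for this comment, not RDFLib's `Dataset`, and `None` stands in for "no graph name".

```python
from dataclasses import dataclass, field


@dataclass
class ToyDataset:
    """Toy RDF dataset: one unnamed default graph, zero or more named graphs."""

    default_graph: set = field(default_factory=set)
    # Dict keys are the graph names, so names are unique by construction.
    named_graphs: dict = field(default_factory=dict)

    def add(self, triple, graph_name=None):
        """Add a triple to the default graph, or to a named graph if a name is given."""
        if graph_name is None:
            self.default_graph.add(triple)
        else:
            self.named_graphs.setdefault(graph_name, set()).add(triple)

    def quads(self):
        """Yield (s, p, o, graph_name); the default graph reports None as its name."""
        for triple in sorted(self.default_graph):
            yield (*triple, None)
        for name in sorted(self.named_graphs):
            for triple in sorted(self.named_graphs[name]):
                yield (*triple, name)


ds = ToyDataset()
ds.add(("ex:bob", "dc:publisher", '"Bob"'))  # default graph, no name
ds.add(("_:a", "foaf:name", '"Bob"'), graph_name="ex:bob")
ds.add(("_:a", "foaf:mbox", "<mailto:bob@oldcorp.example.org>"), graph_name="ex:bob")
quads = list(ds.quads())
```

Under this model there is no way to give the default graph an IRI, which is where it differs from a ConjunctiveGraph constructed with an identifier.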

In defense of the current behaviour, however, ConjunctiveGraph is not a Dataset, and it is not possible to assign a default graph ID to a Dataset, even though it has one which is internal to RDFLib:

DATASET_DEFAULT_GRAPH_ID = URIRef("urn:x-rdflib:default")

This test actually passes for JSON-LD though, but I will rework this a bit and then see if I can't fix JSON-LD to also behave the same as TriG and HexTuples, which after your explanation is the correct behaviour.

I will also maybe add a warning to the docs that quads in the identified graph will be serialized without a graph name, which is quite surprising, even though it is in line with the documentation.

Member Author

However, if you are keen to have a trig serialization xfail 😄, I can oblige:

I will investigate this a bit and add it as a variant test.

Member Author

However, if you are keen to have a trig serialization xfail 😄, I can oblige:

Where did you get this graph from? Is it one of the TriG examples? I would like to note the provenance of the file if possible.

Member Author

Okay it seems to be example 2 from TriG but with some extra bits.

GraphHelper.assert_quad_sets_equals(graph, parsed_graph) will only work if there are no blank nodes. GraphHelper.assert_isomorphic(g1, g2) is better when there are blank nodes, but it has some issues (see #1797).
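
To show why plain set equality is not enough once blank nodes are involved, here is a small stdlib-only sketch. It does not use `GraphHelper` or `rdflib.compare`; `toy_isomorphic` is a hypothetical brute-force check that tries every relabelling of blank nodes, which is only workable for tiny graphs. Terms are plain strings, and anything starting with `_:` is treated as a blank node.

```python
from itertools import permutations


def is_bnode(term):
    """Blank nodes are written with a leading '_:' in this toy encoding."""
    return term.startswith("_:")


def bnodes(triples):
    """Return the sorted blank node labels used anywhere in the triples."""
    return sorted({term for triple in triples for term in triple if is_bnode(term)})


def relabel(triples, mapping):
    """Rename blank nodes according to mapping; other terms are left alone."""
    return {tuple(mapping.get(term, term) for term in triple) for triple in triples}


def toy_isomorphic(g1, g2):
    """Brute force: is there a blank node relabelling that turns g1 into g2?"""
    b1, b2 = bnodes(g1), bnodes(g2)
    if len(g1) != len(g2) or len(b1) != len(b2):
        return False
    return any(
        relabel(g1, dict(zip(b1, perm))) == g2 for perm in permutations(b2)
    )


g1 = {
    ("_:a", "foaf:name", '"Bob"'),
    ("_:a", "foaf:knows", "_:b"),
    ("_:b", "foaf:name", '"Alice"'),
}
# Same graph, different blank node labels.
g2 = {
    ("_:y", "foaf:name", '"Bob"'),
    ("_:y", "foaf:knows", "_:x"),
    ("_:x", "foaf:name", '"Alice"'),
}
# Different graph: Bob now knows himself instead of Alice.
g3 = {
    ("_:y", "foaf:name", '"Bob"'),
    ("_:y", "foaf:knows", "_:y"),
    ("_:x", "foaf:name", '"Alice"'),
}

sets_equal = g1 == g2
same_shape = toy_isomorphic(g1, g2)
wrong_link = toy_isomorphic(g1, g3)
```

Here `g1` and `g2` differ as sets because their labels differ, yet they describe the same graph, while `g3` links the blank nodes differently and is rejected.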

There is, however, a problem somewhere: if I add round-tripping for this, it fails:

# from example 2 in https://www.w3.org/TR/trig/#sec-graph-statements

# This document contains a default graph and two named graphs.

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix dc: <http://purl.org/dc/terms/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

# default graph
    {
      <http://example.org/bob> dc:publisher "Bob" .
      <http://example.org/alice> dc:publisher "Alice" .
    }

<http://example.org/bob>
    {
       _:a foaf:name "Bob" .
       _:a foaf:mbox <mailto:bob@oldcorp.example.org> .
       _:a foaf:knows _:b .
    }

<http://example.org/alice>
    {
       _:b foaf:name "Alice" .
       _:b foaf:mbox <mailto:alice@work.example.org> .
    }
============================================================================ test session starts ============================================================================
platform linux -- Python 3.9.12, pytest-7.1.1, pluggy-1.0.0
rootdir: /home/iwana/sw/d/github.com/iafork/rdflib.cleanish, configfile: tox.ini
plugins: subtests-0.7.0, md-report-0.2.0, cov-3.0.0
collected 1 item                                                                                                                                                            

test/test_roundtrip.py F                                                                                                                                              [100%]

================================================================================= FAILURES ==================================================================================
____________________________________________________________ test_extra[roundtrip_rdf11trig_eg2.trig_trig_trig] _____________________________________________________________
Traceback (most recent call last):
  File "/home/iwana/sw/d/github.com/iafork/rdflib.cleanish/test/test_roundtrip.py", line 281, in test_extra
    checker(*args)
  File "/home/iwana/sw/d/github.com/iafork/rdflib.cleanish/test/test_roundtrip.py", line 200, in roundtrip
    GraphHelper.assert_isomorphic(g1, g2)
  File "/home/iwana/sw/d/github.com/iafork/rdflib.cleanish/test/testutils.py", line 223, in assert_isomorphic
    assert rdflib.compare.isomorphic(lhs, rhs), format_report()
AssertionError: in both:
    (rdflib.term.BNode('cbb5eb12b5dcf688537b0298cce144c6dd68cf047530d0b4a455a8f31f314244fd'), rdflib.term.URIRef('http://xmlns.com/foaf/0.1/mbox'), rdflib.term.URIRef('mailto:alice@work.example.org'))
    (rdflib.term.URIRef('http://example.org/bob'), rdflib.term.URIRef('http://purl.org/dc/terms/publisher'), rdflib.term.Literal('Bob'))
    (rdflib.term.BNode('cbb5eb12b5dcf688537b0298cce144c6dd68cf047530d0b4a455a8f31f314244fd'), rdflib.term.URIRef('http://xmlns.com/foaf/0.1/name'), rdflib.term.Literal('Alice'))
    (rdflib.term.URIRef('http://example.org/alice'), rdflib.term.URIRef('http://purl.org/dc/terms/publisher'), rdflib.term.Literal('Alice'))
  only in first:
    (rdflib.term.BNode('cb0'), rdflib.term.URIRef('http://xmlns.com/foaf/0.1/knows'), rdflib.term.BNode('cbb5eb12b5dcf688537b0298cce144c6dd68cf047530d0b4a455a8f31f314244fd'))
    (rdflib.term.BNode('cb0'), rdflib.term.URIRef('http://xmlns.com/foaf/0.1/mbox'), rdflib.term.URIRef('mailto:bob@oldcorp.example.org'))
    (rdflib.term.BNode('cb0'), rdflib.term.URIRef('http://xmlns.com/foaf/0.1/name'), rdflib.term.Literal('Bob'))
  only in second:
    (rdflib.term.BNode('cb7be1d0397a49ddd4ae8aa96acc7b6135903c5f3fa5e47bf619c0e4b438aafcc1'), rdflib.term.URIRef('http://xmlns.com/foaf/0.1/mbox'), rdflib.term.URIRef('mailto:bob@oldcorp.example.org'))
    (rdflib.term.BNode('cb7be1d0397a49ddd4ae8aa96acc7b6135903c5f3fa5e47bf619c0e4b438aafcc1'), rdflib.term.URIRef('http://xmlns.com/foaf/0.1/knows'), rdflib.term.BNode('cb0'))
    (rdflib.term.BNode('cb7be1d0397a49ddd4ae8aa96acc7b6135903c5f3fa5e47bf619c0e4b438aafcc1'), rdflib.term.URIRef('http://xmlns.com/foaf/0.1/name'), rdflib.term.Literal('Bob'))
assert False
 +  where False = <function isomorphic at 0x7f65b55b2160>(<Graph identifier=N4cecba01e82a4534bb72cd9c89d58c47 (<class 'rdflib.graph.ConjunctiveGraph'>)>, <Graph identifier=N8a93f485544c49e79b49ed0f3ffddcc4 (<class 'rdflib.graph.ConjunctiveGraph'>)>)
 +    where <function isomorphic at 0x7f65b55b2160> = <module 'rdflib.compare' from '/home/iwana/sw/d/github.com/iafork/rdflib.cleanish/rdflib/compare.py'>.isomorphic
 +      where <module 'rdflib.compare' from '/home/iwana/sw/d/github.com/iafork/rdflib.cleanish/rdflib/compare.py'> = rdflib.compare
============================================================================= warnings summary ==============================================================================
.venv/lib64/python3.9/site-packages/_pytest/fixtures.py:227
  /home/iwana/sw/d/github.com/iafork/rdflib.cleanish/.venv/lib64/python3.9/site-packages/_pytest/fixtures.py:227: UserWarning: Code: _pytestfixturefunction is not defined in namespace XSD
    fixturemarker: Optional[FixtureFunctionMarker] = getattr(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
========================================================================== short test summary info ==========================================================================
FAILED test/test_roundtrip.py::test_extra[roundtrip_rdf11trig_eg2.trig_trig_trig] - AssertionError: in both:
======================================================================= 1 failed, 1 warning in 0.11s ========================================================================

@aucampia
Member Author

Closing this as it is indeed intended behaviour. I may make another PR with some of the other changes in here; they should have been in a separate PR anyway.

@aucampia aucampia closed this Apr 10, 2022
aucampia added a commit to aucampia/rdflib that referenced this pull request Apr 11, 2022
The first xfail occurs during round tripping. TriG seems to be making
some mistake when encoding blank nodes, as it is encoding that "Bob"
knows someone who does not exist. This was reported by @gjhiggins in
RDFLib#1796 (comment)

The second xfail seems to be related to hextuple parsing. When comparing
the hextuple parsed result of Example 2 with the TriG parsed
graph of Example 2, the graphs are not isomorphic more than 70% of the time, but
sometimes they are isomorphic. I noticed this while adding the xfail for
the issue @gjhiggins noticed.

Other changes:
- Added `simple_quad` to variants tests with HexTuple and TriG format.
- Added an additional exact_match assert for variants which can be used
  to sidestep some of the known issues with isomorphic graph detection.
  This is useful for graphs with no BNodes.
- Also added round-tripping for `variants/simple_quad.trig`.
- Various changes to ensure deterministic ordering so that it is easier
  to compare things visually and so that tests always do the exact same
  thing in the exact same order.
@aucampia aucampia deleted the iwana-20220409T1407-trig_defaultid_xfail branch April 16, 2022 21:16