Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add test of ConjunctiveGraph operators #1647

Merged
merged 2 commits into from Jan 10, 2022
Merged

add test of ConjunctiveGraph operators #1647

merged 2 commits into from Jan 10, 2022

Conversation

ghost
Copy link

@ghost ghost commented Jan 6, 2022

Although this is a byproduct of the identifier-as-context PR, I'm splitting this off as a separate PR in order to clarify the situation w.r.t issue #225 and establish a pattern for a similar approach to Dataset operators.

Proposed Changes

  • Add test of ConjunctiveGraph operator combinations

gromgull's ruminations in #225 remain pertinent to ConjunctiveGraph in terms of a set-theoretic perspective as well as Pythonic perspective.

“Currently all operations are done on the default graph, i.e. if you add another graph, even if it's a conjunctive graph, all triples are added to the default graph. ... It may make sense to check if the other thing added is ALSO a conjunctive graph and merge the contexts:”

I suspect that we have to read “added” and “merged” in an “amenable-to-operator” sense as it doesn't seem to make sense for it to mean “copied” and really it can't because a dbpedia SPARQLStore can be a context. I think it also sheds a little light on the conceptual difference between ConjunctiveGraph and Dataset (in which adding doesn't imply adding to the default graph, except, of course, if your SPARQL endpoint has a different opinion).

Interestingly, the next set of observations (reformatted for re-presentaion) also seem to be based on an assumption that the contexts are all local :

"all methods work against this default graph" - sounds good, but isn't quite true.

len will give you the length of the conjunction of all graphs; contains and triples will check for triples in all graphs; remove will remove from all graphs; addN/quads work on quads directly.

In fact, the only method to make use of the default_context was add (and iadd gets delegated to add). So add works on a single graph only, remove does not.
So: conjgraph += g1 and conjgraph -= g1 may not be what you want (since isub will be delegated to remove which removes from ALL graphs).

I am not really sure what makes the most sense here. Adding iadd/isub that works for quads seems sensible, but breaks backwards compatibility in a fairly subtle way. I would postpone until rdflib 4.

I also do not like that len gives you the number of TRIPLES not of quads, but that is another issue.

Copy link
Member

@aucampia aucampia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, just a suggestion on using urn:example:

@ghost
Copy link
Author

ghost commented Jan 6, 2022

“Currently all operations are done on the default graph, i.e. if you add another graph, even if it's a conjunctive graph, all triples are added to the default graph.

The tests introduced in this PR include an example of the above, adding a ConjunctiveGraph to a ConjunctiveGraph but it's not a sufficiently thorough test to reveal what gromgull was concerned about:

def test_operators_with_two_conjunctivegraphs():
    cg1 = ConjunctiveGraph()
    cg1.add([tarek, likes, pizza])
    cg1.add([tarek, likes, michel])
    cg2 = ConjunctiveGraph()
    cg2.add([tarek, likes, pizza])
    cg2.add([tarek, likes, cheese])
    assert len(cg1 + cg2) == 3  # adds cheese as liking
    assert len(cg1 - cg2) == 1  # removes pizza from cg1
    assert len(cg1 * cg2) == 1  # only pizza
    assert len(cg1 + cg2) == 3  # adds cheese as liking
    assert len(cg1 ^ cg2) == 2  # removes pizza, adds cheese

In the course of neatening the Dataset tests I've just realised an implication from the above that sheds more light on the difference between ConjunctiveGraph and Dataset. It's true that for ConjunctiveGraph all operations are done on the default graph and “all triples are added to the default graph”

def test_cg_with_contexts_add_to_cg():
    cg1 = Dataset()
    assert len(list(cg1.contexts())) == 1
    assert len(list(cg1)) == 0
    cg2 = ConjunctiveGraph()
    cg2.parse(
        data="<urn:x-rdflib:tarek> <urn:x-rdflib:likes> <urn:x-rdflib:pizza> .",
        publicID="urn:x-rdflib:context-a",
        format="ttl",
    )
    cg2.parse(
        data="<urn:x-rdflib:michel> <urn:x-rdflib:likes> <urn:x-rdflib:cheese> .",
        publicID="urn:x-rdflib:context-b",
        format="ttl",
    )
    assert len(list(cg2.contexts())) == 2
    cg1 += cg2
    assert len(list(cg1.contexts())) == 1  # publicID contexts lost
    assert len(list(cg1)) == 2  # Two triples added to default graph

Now, according to the Dataset docstring, you get your working Graphs from a Dataset:

    >>> # Create a graph in the dataset, if the graph name has already been
    >>> # used, the corresponding graph will be returned
    >>> # (ie, the Dataset keeps track of the constituent graphs)
    >>> g = ds.graph(URIRef("http://www.example.com/gr"))

graph (synonym add_graph) returns a Graph object, nominally precluding the use of a ConjunctiveGraph object as a Dataset context. However, gromgull does describe the situation as "chaos" and I guess this is what he meant:

def test_cg_with_contexts_add_to_ds():
    ds = Dataset()
    assert len(list(ds.contexts())) == 1
    assert len(list(ds)) == 0
    cg = ConjunctiveGraph()
    cg.parse(
        data="<urn:x-rdflib:tarek> <urn:x-rdflib:likes> <urn:x-rdflib:pizza> .",
        publicID="urn:x-rdflib:context-a",
        format="ttl",
    )
    cg.parse(
        data="<urn:x-rdflib:michel> <urn:x-rdflib:likes> <urn:x-rdflib:cheese> .",
        publicID="urn:x-rdflib:context-b",
        format="ttl",
    )
    assert len(list(cg.contexts())) == 2
    ds += cg  # !!
    assert len(list(ds.contexts())) == 1  # publicID contexts lost
    assert len(list(ds)) == 2  # Two triples added to default graph

It's difficult to see this as desirable for Dataset (or ConjunctiveGraph for that matter) but any coherent implementation will need to resolve the set-theoretic aspects of Dataset (and ConjunctiveGraph) operators, in-place or otherwise.

thanks!

Co-authored-by: Iwan Aucamp <aucampia@gmail.com>
Copy link
Member

@aucampia aucampia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look good to me.

@nicholascar nicholascar self-requested a review January 10, 2022 08:23
@nicholascar nicholascar merged commit d957533 into RDFLib:master Jan 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants