Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bind prefixes choices #1686

Merged
merged 20 commits into from
Apr 15, 2022
Merged

Bind prefixes choices #1686

merged 20 commits into from
Apr 15, 2022

Conversation

nicholascar
Copy link
Member

@nicholascar nicholascar commented Jan 23, 2022

Fixes #1679

Proposed Changes

  • a bunch of reST documentation updates
  • updates to inline docc for term.py classes (RDF 1.1, not 1.0 explanations)
  • new param for Graph(): bind_namespaces
    • with options for none, core prefixes (default), all RDFLib Namespaces or use of prefix.cc

Since this PR is really needed - see the comments - I'm pushing it up to review before the prefix.cc option is implemented.

docs/index.rst Outdated Show resolved Hide resolved
docs/rdf_terms.rst Outdated Show resolved Hide resolved
Comment on lines 368 to 369
else: # bind_namespaces is None
pass # bind nothing
Copy link
Member

@aucampia aucampia Jan 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May be a bit better to rather distinguish between explicit None and invalid/unknown string so that sending the wrong string results in a failure, will make a PR against your branch tomorrow to show you what I mean.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I get it. I haven't finished all the implementation here of course, so was going to add checks for None and other strings too but just haven't got to it so far.

Feel free to put in a PR or commits to the branch!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR here: #1693

Co-authored-by: Iwan Aucamp <aucampia@gmail.com>
@wmelder
Copy link

wmelder commented Feb 25, 2022

Noticed this dependency in the PR: This is in DRAFT until I implement the prexif.cc lookup option
For me, that lookup option is a nice to have. It would already be an improvement if the options: none, core prefixes (default), all RDFLib Namespaces .. can be chosen. Using 'none' I can add all namespaces I need and no other defaults would bloat my RDF.

nicholascar and others added 2 commits February 26, 2022 02:27
Co-authored-by: Iwan Aucamp <aucampia@gmail.com>
@cmungall
Copy link
Contributor

I agree with @wmelder's comments - I don't think having prefix.cc lookup should be core functionality. It would be really awesome if we could get this PR merged as currently these additional prefixes are causing a lot of problems for us.

@nicholascar
Copy link
Member Author

nicholascar commented Mar 28, 2022

I'm held up in completing this PR for, due to my testing, I've think I'm seeing that bind with replace=True doesn't actually work! This test fails:

from rdflib import Graph
from rdflib.namespace import OWL

def test_rebinding():
    g = Graph()  # 'core' bind_namespaces (default)
    print()
    # 'owl' should be bound
    assert "owl" in [x for x, y in list(g.namespaces())]

    # replace 'owl' with 'sowa'
    # 'sowa' should be bound
    # 'owl' should not be bound
    g.bind("sowa", OWL, replace=True)

    assert "sowa" in [x for x, y in list(g.namespaces())]
    assert "owl" not in [x for x, y in list(g.namespaces())]

@ghost
Copy link

ghost commented Mar 28, 2022

I'm held up in completing this PR for, due to my testing, I've think I'm seeing that bind with replace=True doesn't actually work! This test fails:

    assert "owl" not in [x for x, y in list(g.namespaces())]

I believe that's a correctly-failing test according to the implementation, if not according to the API spec. It rather looks like replace should be read as "replace the namespace bound to this prefix" rather than "replace the prefix bound to this namespace".

In namespace/__init.py__::NamespaceManager.bind(), LoC#566 results in bound_namespace=None and so the if bound_namespace and bound_namespace != namespace: of LoC#573 fails and the execution skips to the else at Loc#598, where it skips to the else on Loc#674 - resulting in the new binding of sowa to the OWL namespace, that gets added to the trie and returned.

I'll hazard a guess that for replace to work as originally intended, the old binding needs to be deleted from the trie.

Um, wrong again. Turns out that the persistence-backed stores handle this (sort of) gracefully and automatically perform a replace: BerkeleyDB has:

    def bind(self, prefix, namespace):
        prefix = prefix.encode("utf-8")
        namespace = namespace.encode("utf-8")
        bound_prefix = self.__prefix.get(namespace)
        if bound_prefix:
            self.__namespace.delete(bound_prefix)
        self.__prefix[namespace] = prefix
        self.__namespace[prefix] = namespace

So, for persistence stores using that architecture, the test passes but the SimpleMemory and Memory implementation ignores any already-bound prefix and so the test fails:

    def bind(self, prefix, namespace):
        self.__prefix[namespace] = prefix
        self.__namespace[prefix] = namespace

Making the following (fairly straightforward) change to memory.py allows the test to succeed (with one small change)

diff --git a/rdflib/plugins/stores/memory.py b/rdflib/plugins/stores/memory.py
index 5a137b5a..135f2a22 100644
--- a/rdflib/plugins/stores/memory.py
+++ b/rdflib/plugins/stores/memory.py
@@ -149,6 +149,9 @@ class SimpleMemory(Store):
         return i
 
     def bind(self, prefix, namespace):
+        bound_prefix = self.__prefix.get(namespace)
+        if bound_prefix:
+            self.__namespace.delete(bound_prefix)
         self.__prefix[namespace] = prefix
         self.__namespace[prefix] = namespace
 
@@ -400,6 +403,9 @@ class Memory(Store):
                             yield triple, self.__contexts(triple)
 
     def bind(self, prefix, namespace):
+        bound_prefix = self.__prefix.get(namespace)
+        if bound_prefix:
+            del self.__namespace[bound_prefix]
         self.__prefix[namespace] = prefix
         self.__namespace[prefix] = namespace
 
def test_rebinding():
    g = Graph()  # 'core' bind_namespaces (default)

    # 'owl' should be bound
    assert "owl" in [x for x, y in list(g.namespaces())]

    NOWL = g.store.namespace('owl')  # see ¹

    # replace 'owl' with 'sowa'
    # 'sowa' should be bound
    # 'owl' should not be bound

    g.bind("sowa", NOWL)

    assert "sowa" in [x for x, y in list(g.namespaces())]
    assert "owl" not in [x for x, y in list(g.namespaces())]

¹ NOWL is necessary because OWL is "<class 'rdflib.namespace.DefinedNamespaceMeta'>" and so the Store's test of self.__prefix.get(namespace) fails, whereas NOWL evals to the actually-bound URIRef and succeeds.

@nicholascar
Copy link
Member Author

I'll hazard a guess that for replace to work as originally intended, the old binding needs to be deleted from the trie.
the SimpleMemory and Memory implementation ignores any already-bound prefix and so the test fails

That's what I though I was seeing, that it was Store-specific and that the Memory stores were failing! I was hoping to be able to restrict my PR changes to the namespace module but clearly it's not possible to sort this our without also touching the Memory stores.

OK, I'll make the updates as outlined above later today.

@aucampia
Copy link
Member

That's what I though I was seeing, that it was Store-specific and that the Memory stores were failing! I was hoping to be able to restrict my PR changes to the namespace module but clearly it's not possible to sort this our without also touching the Memory stores.

OK, I'll make the updates as outlined above later today.

Happy to help here if you don't find time, can try make a separate PR to fix it tomorrow if you want, just ping me.

@nicholascar
Copy link
Member Author

Thanks for the offer @aucampia. I think I'd like to give this a go initially. The fix for Memory stores is easy, even though I was trying to avoid having to fiddle with the Stores! - but I'm keen to test out other stores too, especially the SPARQL stores. So I'll give it a shot shortly and reach out to you if I need a hand.

@nicholascar
Copy link
Member Author

@aucampia any ideas as to why everything's failing? It looks like a system fault with errors such as ERROR: py38: commands failed rather than application code errors.

Running locally, everything passes so I think I'm done with the fixes for Stores.

@ghost ghost mentioned this pull request Mar 30, 2022
3 tasks
@ghost
Copy link

ghost commented Mar 31, 2022

Just FTR, there's a relevant extant issue from a while ago - #543 (which contains some informative observations by gromgull of the intended replace and override functionality) and an unfinished open PR #1094, both of which can be closed when this PR is merged.

@aucampia
Copy link
Member

@aucampia any ideas as to why everything's failing? It looks like a system fault with errors such as ERROR: py38: commands failed rather than application code errors.

Running locally, everything passes so I think I'm done with the fixes for Stores.

image

mypy is failing, will make a fix to your branch in a moment.

@aucampia
Copy link
Member

@nicholascar let me know if you want me to split things off from this PR, can easily use rebase to keep your commits the same just put them in a seperate branch with you as author, I think a lot of these doc changes are very helpful and very unobjectionable. Up to you though.

This was referenced Apr 14, 2022
@aucampia
Copy link
Member

Will look at tying up all loose ends here next on 2022-04-18 if there is no further feedback.

Copy link

@ghost ghost left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well in hand now.

@aucampia
Copy link
Member

Will rebase and fix the docs warning.

@nicholascar
Copy link
Member Author

pre-commit.ci autofix

@nicholascar
Copy link
Member Author

This is only failing the documentation building check, but that is being addressed elsewhere, so I'm going to merge this in now.

@nicholascar nicholascar merged commit b5ec096 into master Apr 15, 2022
@nicholascar nicholascar deleted the default_prefixes branch April 15, 2022 14:39
@aucampia
Copy link
Member

This is only failing the documentation building check, but that is being addressed elsewhere, so I'm going to merge this in now.

Happy with that, I figured out what the problem is and will make a PR to fix on master shortly.

@cmungall
Copy link
Contributor

it would be awesome to have a new release with this PR in - I'm currently blocked on upgrading rdflib in a few projects as the namespace pollution is causing issues

@nicholascar
Copy link
Member Author

@cmungall we're going to get a release out ASAP. Yes, I'm in the same boat as you!

@cmungall
Copy link
Contributor

cmungall commented Jun 6, 2022

Just a friendly ping to see if there was a timeline for a new release with the namespace pollution removed?

aucampia added a commit to aucampia/rdflib that referenced this pull request Mar 19, 2023
… back the `wgs` binding

<RDFLib#1686> inadvertently changed prefix for
`https://schema.org/` from `schema` to `sdo` and removed the `wgs` prefix. This
PR changes the `https://schema.org/` binding back to `schema` and adds the `wgs`
binding back.

- Closes <RDFLib#2196>.
aucampia added a commit to aucampia/rdflib that referenced this pull request Mar 19, 2023
… the `wgs` binding back

<RDFLib#1686> inadvertently changed the prefix
for `https://schema.org/` from `schema` to `sdo` and removed the `wgs` prefix.
This PR changes the `https://schema.org/` binding back to `schema` and adds the
`wgs` binding back.

- Closes <RDFLib#2196>.
aucampia added a commit to aucampia/rdflib that referenced this pull request Mar 19, 2023
… the `wgs` binding back

<RDFLib#1686> inadvertently changed the prefix
for `https://schema.org/` from `schema` to `sdo` and removed the `wgs` prefix.
This PR changes the `https://schema.org/` binding back to `schema` and adds the
`wgs` binding back.

- Closes <RDFLib#2196>.
aucampia added a commit to aucampia/rdflib that referenced this pull request Mar 19, 2023
… the `wgs` binding back

<RDFLib#1686> inadvertently changed the prefix
for `https://schema.org/` from `schema` to `sdo` and removed the `wgs` prefix.
This PR changes the `https://schema.org/` binding back to `schema` and adds the
`wgs` binding back.

- Closes <RDFLib#2196>.
aucampia added a commit to aucampia/rdflib that referenced this pull request Mar 19, 2023
… the `wgs` binding back

<RDFLib#1686> inadvertently changed the prefix
for `https://schema.org/` from `schema` to `sdo` and removed the `wgs` prefix.
This PR changes the `https://schema.org/` binding back to `schema` and adds the
`wgs` binding back.

- Closes <RDFLib#2196>.
aucampia added a commit to aucampia/rdflib that referenced this pull request Mar 20, 2023
<RDFLib#1686> inadvertently removed the `wgs` prefix.
This change adds it back.

- Closes <RDFLib#2196>.
aucampia added a commit that referenced this pull request Mar 21, 2023
<#1686> inadvertently removed the `wgs` prefix.
This change adds it back.

- Closes <#2196>.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Replace long list of namespaces with list of prefixes used when using serialize.
4 participants