Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement bind_namespaces strategy for Prefix.cc #2239

Open
wants to merge 11 commits into
base: main
Choose a base branch
from

Conversation

cthoyt
Copy link
Contributor

@cthoyt cthoyt commented Feb 27, 2023

Summary of changes

This PR implements a web-dependent loader for prefix-namespace definitions from Prefix.cc by parsing its context document (https://prefix.cc/context.jsonld)

Demo

The Gene Ontology is a biomedical ontology describing biological processes, cellular locations, and cellular components.
It is typically abbreviated with the prefix "go" and uses PURLs issued by the Open Biological and Biomedical Ontologies (OBO) Foundry. Prefix.cc has an entry for GO that uses its preferred OBO PURL.

import rdflib

graph = rdflib.Graph(bind_namespaces="cc")

prefix_map = {prefix: str(ns) for prefix, ns in graph.namespaces()}
assert "go" in prefix_map
assert prefix_map["go"] == "http://purl.obolibrary.org/obo/GO_"
assert graph.qname("http://purl.obolibrary.org/obo/GO_0032571") == "go:0032571"

Checklist

  • Checked that there aren't other open pull requests for
    the same change.
  • Added tests for any changes that have a runtime impact.
  • Checked that all tests and type checking passes.
  • For changes that have a potential impact on users of this project:
    • Updated relevant documentation to avoid inaccuracies.
    • Considered adding additional documentation.
    • Considered adding an example in ./examples for new features.
    • Considered updating our changelog (CHANGELOG.md).
  • Considered granting push permissions to the PR branch,
    so maintainers can fix minor issues and keep your PR up to date.

@cthoyt cthoyt marked this pull request as ready for review February 27, 2023 18:15
Comment on lines +729 to +733
def _get_prefix_cc():
"""Get the context from Prefix.cc."""
response = urlopen("https://prefix.cc/context.jsonld")
context = json.loads(response.read())
return context["@context"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One concern here is that there are some valid "@context" values [ref] that will not be handled correctly here, it may be that they don't appear in prefix.cc, but it may be good to confirm that somehow.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, it's the case that there are no special values in this context dictionary based on the way that the context is constructed (https://github.com/cygri/prefix.cc/blob/cbc85c00e59e00cf4fee697374109fdd9027231a/templates/format/jsonld.php) and the strict requirements on prefixes (though right now I am having a hard time finding where it's documented that these have to be lowercase strings of alphanumeric characters length <= 10)

Copy link
Member

@aucampia aucampia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You will need to add cc to

_NamespaceSetString = PyLiteral["core", "rdflib", "none"]

And also update

* cc:
* using prefix bindings from prefix.cc which is a online prefixes database
* not implemented yet - this is aspirational

@aucampia
Copy link
Member

Will try to get this into the next release, I will make any changes needed directly to your branch.

@cthoyt
Copy link
Contributor Author

cthoyt commented Mar 11, 2023

@aucampia thank you for the feedback! Feel free to make any edits you think are appropriate or request I make some updates.

@aucampia
Copy link
Member

@cthoyt I will likely make another patch release soon, so will hold back a bit on merging this.

Copy link
Member

@aucampia aucampia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me, though I may add a warning that indicates that the "cc" selector is offered as a best-effort feature and may stop working if prefix.cc stops working or if they change their interface.

@@ -418,11 +418,14 @@ def __init__(self, graph: "Graph", bind_namespaces: "_NamespaceSetString" = "cor
for prefix, ns in _NAMESPACE_PREFIXES_CORE.items():
self.bind(prefix, ns)
elif bind_namespaces == "cc":
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why use the TLD name for prefix.cc? cc is also the conventional prefix for creativecommons, so this might be confusing.

(I would also prefer having this as a utility (e.g. graph.bind_namespaces(util.get_prefix_cc()) rather than a flag to Graph. That would be more explicit and lets the user control network access and caching.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@niklasl I am happy to do whatever the RDFLib team decides on, but this interface and nomenclature was already predefined, so I just filled it in with an implementation as suggested.

I agree that since prefix.cc relies on a network connection that this is a bit of a tricky situation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, you can achieve the same effect using RDFLib as is, using just:

from rdflib import Graph

graph = Graph()
graph.parse('https://prefix.cc/context.jsonld')

for pfx, ns in graph.namespaces():
    print(pfx, ns)

This yields one very interesting difference though: the go prefix won't work, as it ends in a _, which is not treated as a namespace prefix in JSON-LD 1.1, since it does not en with a URI gen-delim (it has to be explicitly declared using "@prefix": true in the context to be treated as a prefix anyway).

@aucampia
Copy link
Member

aucampia commented Apr 7, 2023

Since we had this in the docstring, I'm going to merge it (possibly with some minor changes), though possibly it would have been better in rdflib._contrib, in future I will channel functionality like this there instead.

@aucampia
Copy link
Member

aucampia commented Apr 7, 2023

Since we had this in the docstring, I'm going to merge it (possibly with some minor changes), though possibly it would have been better in rdflib._contrib, in future I will channel functionality like this there instead.

Actually, I think it would be better to just move this into rdflib._contrib, some ways in which we can implement this is:

  • Change type of bind_namespaces to also accept a mapping of prefix to string (similar to rdflib.namespace._NAMESPACE_PREFIXES_RDFLIB), add a function to rdflib._contrib that returns such a mapping for prefix.cc, this would then be used as follow:

    from rdflib._contrib import get_prefixcc_mapping
    from rdflib import Graph
    
    g = Graph(bind_namespaces=rdflib._contrib.get_prefixcc_mapping())
  • Change type of bind_namespaces to also accept a function that returns a mapping of prefix to string

    from rdflib._contrib.namespace_providers import prefix_cc
    from rdflib import Graph
    
    g = Graph(bind_namespaces=prefix_cc)

    rdflib._contrib.namespace_providers.prefix_cc would then return the mapping when called.

Maybe something better can also be done, but I would rather have it in rdflib._contrib than explain in the doc string that we cannot cover this with normal guarantees for our public interface stability.

@aucampia aucampia added the needs more work The PR needs more work before being merged or further reviewed. label Apr 7, 2023
@aucampia
Copy link
Member

PRs to V6 is closed until further notice. See this for more details:

@aucampia aucampia added the on hold Progress on this issue is blocked by something. label May 21, 2023
@aucampia
Copy link
Member

PRs to V6 is closed until further notice. See this for more details:

We will be open for PRs again once this is resolved:

@aucampia aucampia removed the on hold Progress on this issue is blocked by something. label Jun 2, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs more work The PR needs more work before being merged or further reviewed.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants