Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incongruity in type signature of NamespaceManager #2004

Open
ajnelson-nist opened this issue Jul 1, 2022 · 5 comments
Open

Incongruity in type signature of NamespaceManager #2004

ajnelson-nist opened this issue Jul 1, 2022 · 5 comments
Labels
concept: namespace enhancement New feature or request type hints Relating to type hints or type aliases

Comments

@ajnelson-nist
Copy link
Contributor

Is there a reason rdflib.namespace.NamespaceManager.namespaces has the return signature Iterable[Tuple[str, URIRef]] - specifically, URIRef instead of Namespace?

@aucampia
Copy link
Member

@ajnelson-nist that is in primarily because it will return URIRef objects and not Namespace objects:

From monkeytype:

20220729T042949 iwana@teekai.zoic.eu.org:~/sw/d/github.com/iafork/rdflib.monkeytype
$ task run -- monkeytype stub rdflib.namespace | tee rdflib/namespace/__init__.pyi
task: [run] 
VIRTUAL_ENV="/home/iwana/sw/d/github.com/iafork/rdflib.monkeytype/.venv" PATH="/home/iwana/sw/d/github.com/iafork/rdflib.monkeytype/.venv/bin${PATH:+:${PATH}}" monkeytype stub rdflib.namespace


55 traces failed to decode; use -v for details
from rdflib.term import URIRef
from typing import (
    Any,
    Dict,
    Iterable,
    List,
    Optional,
    Tuple,
    Type,
    Union,
)

# ...elided...

class NamespaceManager:
    def __init__(self, graph: Graph, bind_namespaces: _NamespaceSetString = ...) -> None: ...
    def _store_bind(self, prefix: str, namespace: URIRef, override: bool) -> None: ...
    def absolutize(self, uri: str, defrag: int = ...) -> URIRef: ...
    def bind(self, prefix: Optional[str], namespace: Any, override: bool = ..., replace: bool = ...) -> None: ...
    def compute_qname(self, uri: str, generate: bool = ...) -> Tuple[str, URIRef, str]: ...
    def compute_qname_strict(self, uri: str, generate: bool = ...) -> Tuple[str, str, str]: ...
    def expand_curie(self, curie: str) -> Optional[URIRef]: ...
    def namespaces(self) -> Iterable[Tuple[str, URIRef]]: ...
    def normalizeUri(self, rdfTerm: str) -> str: ...
    def qname(self, uri: str) -> str: ...
    def qname_strict(self, uri: str) -> str: ...
    def reset(self) -> None: ...
    @property
    def store(self) -> Store: ...

# ...elided...

And from pytest-annotate

    {
        "path": "rdflib/namespace/__init__.py",
        "line": 707,
        "func_name": "NamespaceManager.namespaces",                                                                                                                          
        "type_comments": [
            "() -> Iterator",
            "() -> Iterator[Tuple[str, rdflib.term.URIRef]]"
        ],
        "samples": 2051
    },

@aucampia aucampia added the type hints Relating to type hints or type aliases label Jul 29, 2022
@ajnelson-nist
Copy link
Contributor Author

ajnelson-nist commented Jul 29, 2022

@aucampia That is an informative view of the inertia of that type signature. However, I believe there are good reasons to upset it.

In particular, I'm concerned with the interface between the .namespaces method and Graph.bind(). .bind hasn't received type review yet. Store.bind() types namespace as a URIRef. As I noted on another thread this morning, I'm not so familiar with the Store code tree. However, I do make frequent use of Graph.bind.

I personally expect .namespaces to give me objects that feed into Graph.bind, and I further expect the objects from .namespaces to be Namespaces because I would like to be able to use the [] or . operators to reference terms within a namespace.

I've seen significant confusion arise from these concepts being muddied, some back since the pre-RDF days of XML:

  • A namespace
  • A string prefix
  • An ontology IRI

Each of these is still a distinct concept. Some current technologies make confusions of them all come data-serialization time. FOAF ends its ontology IRI with a slash, which OWL permits and which allows a handy and inoffensive confusion of ontology IRI, namespace, and string prefix. But I'm particularly thinking of JSON-LD's context dictionary that uses string prefixes, which are not necessarily namespaces. I encountered this when dealing with the differences in forward-slash parsing between SPARQL, Turtle, and JSON-LD to handle IANA Media types and their slashes. (IANA Media Types don't have an IANA-provide ontology to my knowledge, but Dublin Core Terms provides http://purl.org/dc/terms/IMT, and a "namespace for MIME media types vocabulary" from the Publishing Metadata documnt: <http://purl.org/NET/mediatypes/>.)

So, I wish to argue for precision in the type designation for .namespaces(), and let the type effects bubble out from there.

If you agree with this way forward, I will put some effort into the type designation with a PR.

@aucampia
Copy link
Member

aucampia commented Aug 15, 2022

In particular, I'm concerned with the interface between the .namespaces method and Graph.bind(). .bind hasn't received type review yet. Store.bind() types namespace as a URIRef. As I noted on another thread this morning, I'm not so familiar with the Store code tree. However, I do make frequent use of Graph.bind.

I personally expect .namespaces to give me objects that feed into Graph.bind, and I further expect the objects from .namespaces to be Namespaces because I would like to be able to use the [] or . operators to reference terms within a namespace.

Store, NamespaceManager and Graph will have more or less complete type signatures with #2080 - the type signature for Graph.bind() is:

rdflib/rdflib/graph.py

Lines 1191 to 1197 in 97f88d3

def bind(
self,
prefix: Optional[str],
namespace: Any, # noqa: F811
override: bool = True,
replace: bool = False,
) -> None:

In here, the namespace has type Any, but for a very good reason which is more or less related to your concern here.

Currently we have namespaces occurring with the following types:

  • str
  • URIRef
  • Namespace
  • ClosedNamespace
  • DefinedNamespace

The class diagram for these types is:

class diagram

If not for DefinedNamespace, all classes used to represent namespaces would derive from str, but because DefinedNamespace does not there is no common superclass, other than object which is the base class for all new style classes.

However, all these classes do have __str__, and all values of namespace passed to Graph.bind() gets converted to URIRef by first calling str() on them (which calls __str__) and then by using that to construct a URIRef:

def bind(
self,
prefix: Optional[str],
namespace: Any,
override: bool = True,
replace: bool = False,
) -> None:
"""Bind a given namespace to the prefix
If override, rebind, even if the given namespace is already
bound to another prefix.
If replace, replace any existing prefix with the new namespace
"""
namespace = URIRef(str(namespace))

So really, anything can be passed as the namespace argument to Graph.bind() and it will work, maybe there are some corner cases where we will raise an exception for invalid URIs, but I think currently we just warn for invalid URIs.

But this is also why the the following three methods return URIRef, and not Namespace:

If you pass a Namespace object to Graph.bind(), it won't be stored as a Namespace, but as a URIRef, and if we change the typing of Graph.namespaces() to be Namespace instead of URIRef, then the outcome will be that code will pass mypy validation but cause exceptions at runtime, there is no way under normal operation that you will get a it to return a Namespace object. You could maybe get something if you add it to Store implementations directly, but that will only work for some store implementations, and won't work for example if you use berkelydb:

def namespaces(self) -> Generator[Tuple[str, "URIRef"], None, None]:
cursor = self.__namespace.cursor()
results = []
current = cursor.first()
while current:
prefix, namespace = current
results.append((prefix.decode("utf-8"), namespace.decode("utf-8")))
# Hack to stop 2to3 converting this to next(cursor)
current = getattr(cursor, "next")()
cursor.close()
for prefix, namespace in results:
yield prefix, URIRef(namespace)

It would be better if Graph.namespaces() did return a Namespace, but this would require changes in more than type hints, it would require that we change

namespace = URIRef(str(namespace))

to something like

        namespace = Namespace(str(namespace))

(and some additional changes)

We should look at doing this in a future version of RDFLib, however we need quite a big overhaul of our public interfaces in general, as shared in some other places, I think the right approach here is that we should design a new Graph interface addressing most of the problems that we know of in rdflib._experimental, and then once we are happy with the interface redesign we can proceed to release version 7. The alternative will be that every release becomes a major version release, as there is many things in the interface that needs fixing and doing it all in one PR will make for one very hard to review PR, and it is we will likely have some fluctuations in each version as we try and stick everything together.

We should also really have architectural decision records for interface breakages, and ideally there should be a deprecation notice or warning in at least one version with some option for adopting the new behaviour.

@ajnelson-nist
Copy link
Contributor Author

I had a sneaking feeling there would be some reason due to technology breadth.

Thanks to your thorough explanation---and thanks for it!---I agree with this needing to be in a 7.0.0 release.

Is there a development branch where such effects are being pooled, or is there some new module _experimental.py where these will be pooled in the master branch?

@aucampia aucampia added enhancement New feature or request backwards incompatible will affect backwards compatibility, this includes things like breaking our interface labels Aug 15, 2022
@aucampia
Copy link
Member

aucampia commented Aug 15, 2022

Is there a development branch where such effects are being pooled, or is there some new module _experimental.py where these will be pooled in the master branch?

I will try to find time to make an ADR this weekend to define _experimental and the approach to interface breaking changes in general, once that is in place I will make an ADR for new interfaces. In general we should try and approach the problem similar to collections.abc with fairly minimal base base classes that provide very specific functionality so we can easily mix and match them.

But even with this we should be as conservative as possible with interface breakages, one other option here is to change Namespace and DefinedNamespace to inherit from URIRef, and then we can actually change the code just fine without breaking interfaces.

Ideally there should be issues for all problems with the existing interfaces, and most of them should be labelled as backwards incompatible and enhancement, I did not label this with backwards incompatible though as there may be ways to fix this without breaking backwards compatibility.

@aucampia aucampia removed the backwards incompatible will affect backwards compatibility, this includes things like breaking our interface label Aug 15, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
concept: namespace enhancement New feature or request type hints Relating to type hints or type aliases
Projects
None yet
Development

No branches or pull requests

2 participants