Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Name for union of URIRef and BNode? #1526

Closed
ajnelson-nist opened this issue Dec 21, 2021 · 17 comments · Fixed by #1680
Closed

Name for union of URIRef and BNode? #1526

ajnelson-nist opened this issue Dec 21, 2021 · 17 comments · Fixed by #1680
Labels
enhancement New feature or request

Comments

@ajnelson-nist
Copy link
Contributor

ajnelson-nist commented Dec 21, 2021

I frequently need to do type checks for Nodes that are not Literals, mainly to catch programming errors where the "Thing vs String" logic may have been lax. I'm casting around for a name that either could be defined in the Node subclass hierarchy as an abstract class, or instead could be defined as a Union. My recollection is there isn't an obvious name from RDFS for this.

@aucampia suggested Subject in this comment:

Subject = typing.Union[rdflib.BNode, rdflib.URIRef]

I have a mental TODO item to review the definition of RDF Resource to see if that meets the need, but I'd like to see if someone has already found an appropriate term for "not-Literal".

@aucampia aucampia added the enhancement New feature or request label Dec 21, 2021
@nicholascar
Copy link
Member

SHACL's nodeKind property has an option of sh:BlankNodeOrIRI.

Unless we have alternative logic - I don't think we do - we should adopt parallel classes/categories

@nicholascar
Copy link
Member

Ah ha, I see you already referenced these SHACL classes in PR #1515!

So to answer more directly:

I'd like to see if someone has already found an appropriate term for "not-Literal"

RDF calls non-literal, local IDd, nodes blank nodes and thinsk with URIs named nodes of course.

Extending on this, HexTuples calls BN nodes localIds and URI nodes globalId nodes.

So I think we can go with IdentifiedNode then BNs are locally-scope identifiers, named nodes universally IDd (URN, URI. IRI etc., potentially even IPFS protocol name...)

@ajnelson-nist
Copy link
Contributor Author

IdentifiedNode looks like a good term among the above options. SHACL's BlankNodeOrIRI is my close second, but I could see not wanting to encode the dangling question "Wait, why are there URIRefs, but BlankNodeOrIRIs?

Once the name is settled, what is the maintainer preference for how to implement this - an abstract class, or a Union type?

@nicholascar
Copy link
Member

Yes, the URI/IRI etc. naming issues motivated IdentifiedNode, and the fact that Literals are completely no identified.

I prefer IdentifiedNode as a class name than the compounded BlankNodeOrIRI for aesthetic reasons for, I suppose, the BlankNodeOrIRI is perfectly practical!

I don't know all the pros and cons of Union types, so my natural inclination would be to go with a "normal" Python abstract class, but if you or anyone else can indicate potential benefits to using a Union class approach, I'm happy to hear those.

@aucampia
Copy link
Member

aucampia commented Dec 22, 2021

IdentifiedNode sounds good to me also. If it can be done meaningfully with an abstract base class that would be beneficial, as a Union is something only the type system understands, won't provide any meaning at runtime and can't be extended by users. But ideally the abstract base class should then present as a Node itself and I'm not sure how well that will work out, and if it is too much of a hassle it may be best to just go for a Union.

However, I think we should be careful in naming the Unions so that we don't consume names with Unions that we may later want to use for runtime types or normal classes, if we make IdentifiedNode a union, then later on changing it to an abstract base class may not be ideal. So maybe, for purely typing constructs such as Union, it is best to place them in rdflib.typing, e.g. rdflib.typing.IdentifiedNode.

So in summary, my preference:

  1. rdflib.term.IdentfiedNode as an abstract base class
  2. rdflib.typing.IdentifiedNode as a Union if the first option does not work.

@nicholascar
Copy link
Member

Let's go with @aucampia's suggestion above.

I think that all of the type checking that @ajnelson-nist wants can be done with isinstance(node_x, IdentifiedNode) right, as that will cover subclasses? So that will likely work. Then the hierarchy is something like:

            Node
          /     \
IdentifiedNode    Literal
     /    \
URIRef     BNode

But just a check: we already have a class Identifier as a superclass of URIRef, BNode and Literal! I can't really understand (from a 2 second look) why Literal should extend Identifier, so maybe we just need to change Literal to extending Node in the code base. If such an extension breaks lots of things, this change, if we really want it, can be for RDFlib 7.0.0, along with a few other breaking change. But perhaps it won't actually break much.

My initial test with Literal(Node) and Literal(str) failed quickly, but this is probably because tests are looking for those return type, not because there's something fundamentally wrong with Literal extending either Node or str.

@ajnelson-nist
Copy link
Contributor Author

One other point in favor of a subclass, instead of a Union type - we can put a docstring onto a subclass.

I'd also wondered why Identifier was a superclass of Literal. Is that actually a bug? If it is, my original request might become a bugfix instead of a feature addition.

@nicholascar
Copy link
Member

It’s not a bug but it is a strange - in my opinion - design choice! I suspect it was just easier to implement than what we are talking about now. There are many cases too in RDFlib where a patter was implemented and then broken by others later, so we may be seeing that here. We can definitely improve this!

@niklasl
Copy link
Member

niklasl commented Dec 27, 2021

I think Identifier is to distinguish those node types from Graph, which is a Node but not an Identifier. (At first I thought it was to distinguish them from Variable, but that is also an Identifier.)

@nicholascar
Copy link
Member

Thanks @niklasl. I think we should probably implement the change suggested here and definitely draw up the Python class model here in UML for RDFLib documentation. Better to have these structures more obvious to people

@ghost
Copy link

ghost commented Dec 27, 2021

definitely draw up the Python class model here in UML for RDFLib documentation

Couldn't resist ...
classes

ajnelson-nist added a commit to ajnelson-nist/rdflib that referenced this issue Jan 18, 2022
This patch adds and exposes an intermediary class `IdentifiedNode` as a
superclass of `URIRef` and `BNode`.  From review of the subclass methods
for identical implementations, two appeared to be able to move into this
superclass.

This patch addresses at least some of Issue 1526.

References:
* RDFLib#1526

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
ajnelson-nist added a commit to ajnelson-nist/rdflib that referenced this issue Jan 18, 2022
This patch adds and exposes an intermediary class `IdentifiedNode` as a
superclass of `URIRef` and `BNode`.  From review of the subclass methods
for identical implementations, two appeared to be able to move into this
superclass.

Thanks to Nicholas Car for finding this pragmatic name.

This patch addresses at least some of Issue 1526.

References:
* RDFLib#1526

Cc: Nicholas Car <nicholas.car@surroundaustralia.com>
Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
@ajnelson-nist
Copy link
Contributor Author

PR 1680 filed.

I don't quite understand how Variable should interface with this class. Should Variable also be a subclass of IdentifiedNode? I ask thanks to @gjhiggins 's UML diagram suggesting its sibling-class nature.

Also, if you want a UML diagram to be included in the PR, I'm not sure where you'd like that. Anyone, please feel free to push on top of the PR's branch. I have the "Allow edits and access to secrets by maintainers" box checked, so please let me know if that doesn't quite work out.

@aucampia
Copy link
Member

I think we can do the diagram separately, not sure variable should inherit from the identified node, but I may be wrong about it.

@niklasl
Copy link
Member

niklasl commented Jan 19, 2022

Yes, the point here is to single out the "real" URIRef:s and their "stand-in" local (on the paper) BNode (ID):s here. So Variable is out by definition (as it is an unbound / existential quantifier).

Some notes:

  1. URIRef exists because IRI:s wheren't standardized yet when RDF 1.0 was. I believe that they should just be called IRI:s nowadays.
  2. The RDF JS community has settled on NamedNode as the name for what many call IRI (and BlankNode for BNodes).

I personally did find the naming of "nodes" (NamedNode and BlankNode) a bit irksome in the triple view of RDF, as I see nodes as parts of graphs, i.e. something represented by a set of triples. But you can argue both ways about which level of abstraction RDFLib is, and that my pursuit here is of an "unattainable perfection". (Also, we do have the notion of rdflib.Resource for a more "tangible" or "object-oriented" view of the graph, above triples.)

(I may prefer the names IRI and BlankNodeID with a base like NodeID on this level, since the BNode in RDFLib carries a blank node identifier for the concrete syntaxes, but we shouldn't rename all the things in an historically long-lived code base. And I'm quite comfortable with rdflib.BNode, so I'm inconsequential here anyway.)

Possibly though, from this, I retain a slight preference on my part for NodeIdentifier over the suggested IdentifiedNode, not because it ryhmes better with BNode, but with the "Identifier" hiding in both URIref and IRI. But given the existence of the (perhaps misnamed) Identifier, and Node above that, it might not clear up the confusion that much.

Unfortunately though, nor does IdentifiedNode...

Perhaps ResourceIdentifier would be better (which is either (globally) unique or blank)? 🤷

@aucampia
Copy link
Member

Regarding the diagram, I think the best option is to add https://github.com/sphinx-contrib/kroki so that we can put this in rst file:

.. kroki::

    @startuml
    skinparam shadowing false
    skinparam packageStyle rectangle
    
    
    class Node
    
    class Identifier {
      eq(other)
      neq(other)
      startswith(prefix, start, end)
    }
    Identifier -up-|> Node
    
    
    class BNode {
     (...)
    }
    BNode -up-|> Identifier
    
    class Literal {
     (...)
    }
    Literal -up-|> Identifier
    
    class URIRef {
     (...)
    }
    URIRef -up-|> Identifier
    
    
    class Variable {
     (...)
    }
    Variable -up-|> Identifier
    
    
    class Genid
    Genid -up-|> URIRef
    
    
    class RDFLibGenid
    RDFLibGenid -up-|> Genid
    @enduml

And then get output like this in generated HTML:

_

@aucampia
Copy link
Member

I will add this in a separate PR

@nicholascar
Copy link
Member

I will add this in a separate PR

This looks like a nice enhancement @aucampia! I really do think a class diagram will make a big difference for understanding the toolkit and I'm surprise we didn't think of it before...

aucampia added a commit to aucampia/rdflib that referenced this issue Jan 22, 2022
The diagram is based on the diagram created by
Graham Higgins (@gjhiggins) in RDFLib#1526

This shows the class heirarchy of various terms such as Identifier,
IdentifiedNode, URIRef, Literal, etc.

The diagram is in [plantuml](https://plantuml.com/class-diagram) and
compiled to svg by the
[kroki extension for sphinx](https://github.com/sphinx-contrib/kroki).

Diagrams can be rendered from the plantuml at https://kroki.io/.

Other changes:
 - Some updates to the "Writing RDFLib Documentation" page.
ajnelson-nist added a commit to casework/CASE-Utilities-Python that referenced this issue Mar 2, 2022
A follow-on patch will regenerate the Make-managed file.

References:
* RDFLib/rdflib#1526

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants