Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Quirks of the /links endpoint #217

Closed
CasperWA opened this issue Dec 2, 2019 · 10 comments · Fixed by #273
Closed

Quirks of the /links endpoint #217

CasperWA opened this issue Dec 2, 2019 · 10 comments · Fixed by #273
Labels
misc/help wanted Extra attention is needed status/has-concrete-suggestion This issue has one or more concrete suggestions spelled out that can be brought up for consensus. status/no-consensus There is no consensus on this issue, it needs to be further discussed and developed. topic/info-endpoint topic/sub-databases Discusses topics related to having multiple (sub)-DBs
Milestone

Comments

@CasperWA
Copy link
Member

CasperWA commented Dec 2, 2019

I have found some quirks in the spec concerning the /links endpoint and the Index meta-database, while in the process of adding these to the test server in optimade-python-tools.

Quirky things the specification states about /links (either ex- or implicitly):

  • It is not considered an entry listing endpoint, but still has a list of resource objects.
  • The resource objects' types are not plural (like all other resource objects).
  • Only allowed query parameters are those equal to a single entry endpoint. This means that there is no filtering in the /links endpoint, nor is there a way to handle pagination. (This actually makes for some weird work-arounds in the implementation).
  • Should the providers from providers.json be supplied here for all OPTiMaDe implementations/databases? Or do we perhaps recommend to only show them in the Index meta-database?

For the Index meta-database:

  • Under /info one MAY provide a default database using relationships. However, this is done under the non-existing default relationship-type:

    // ...
    "relationships": {
      "default": {
        "data": {"type": "child", "id": "test_server"}
      }
    }
    // ...

    This should probably be renamed to reflect the endpoint where this related resource may be found, i.e., links, or it should be renamed to child to reflect the related type. I am not completely sure which is the most appropriate according to JSON API. However, I would suggest links.

@CasperWA CasperWA added misc/help wanted Extra attention is needed status/no-consensus There is no consensus on this issue, it needs to be further discussed and developed. status/has-concrete-suggestion This issue has one or more concrete suggestions spelled out that can be brought up for consensus. topic/info-endpoint topic/sub-databases Discusses topics related to having multiple (sub)-DBs labels Dec 2, 2019
@CasperWA CasperWA added this to To do in Road to OPTIMADE 1.0 via automation Dec 2, 2019
@rartino rartino added this to the 1.0 release milestone Dec 2, 2019
@rartino
Copy link
Contributor

rartino commented Dec 2, 2019

Good catch, this definitely needs a closer look.

It is not considered an entry listing endpoint, but still has a list of resource objects.

I realize our own definitions do not make this distinction, but in my mind, there is a division between (1) "data that originates from the underlying database and thus pertains to materials in some way" vs. (2) "other data, i.e., protocol and metadata". To me, Entries were meant to exclusively be about data of type (1), whereas the links endpoint serves data of type (2).

But now that I look at the definitions, I see that they do not explicitly back this model up. But, this model would explain why links isn't an Entry listing endpoint, because a links or link object isn't an Entry type.

I'm reluctant to require implementations to have to support filtering, pagination, etc. on the links endpoint for now, because for implementations that don't store the list of links in their backend database, this requires more work to implement. Right now, links can be served as static data. But I wouldn't be opposed to optionally allowing these operations.

Under /info one MAY provide a default database using relationships. However, this is done under the non-existing default relationship-type

I agree that it would be more consistent with links as the relationship-type.

@CasperWA
Copy link
Member Author

CasperWA commented Dec 3, 2019

It is not considered an entry listing endpoint, but still has a list of resource objects.

I realize our own definitions do not make this distinction, but in my mind, there is a division between (1) "data that originates from the underlying database and thus pertains to materials in some way" vs. (2) "other data, i.e., protocol and metadata". To me, Entries were meant to exclusively be about data of type (1), whereas the links endpoint serves data of type (2).

I agree with this sentiment, and merely wanted to highlight the inconsistencies/shortcomings of the spec in this regard.

I'm reluctant to require implementations to have to support filtering, pagination, etc. on the links endpoint for now, because for implementations that don't store the list of links in their backend database, this requires more work to implement. Right now, links can be served as static data. But I wouldn't be opposed to optionally allowing these operations.

This makes sense. In the optimade-python-tools implementation it would, however, be much simpler to simply support filtering, pagination, etc.
So having it be optional to use all query parameters vs. only the ones for a single-entry endpoint, which is currently the case, would be very helpful.

@rartino
Copy link
Contributor

rartino commented Dec 3, 2019

About your comment on the relationship example inside the info/ endpoint.

To my best understanding, the string you want to change is, in JSON API, meant to be a short identifier for the nature of the relationship, much like the field name of an attribute. I.e., I'm not sure "links" makes a lot of sense here, because then there wouldn't be any place to indicate that this indicates the default child database. (We do indeed use the type name for this field in relationships between database-related Entries, but it is my interpretation that it is short for saying the nature of the relationship is "a generic relationship to a references object".)

But now that you highlight this, "default" is very non-descriptive. Shouldn't it be, e.g., "default_child_database"?

Nevertheless, based on your comments here, maybe we need to revisit our use of object types with regards to links? Given our otherwise rather structured association between types and endpoints, it is annoying that one would have to just know that objects of type child refers to "links to child databases", and are looked up under links/.

Would it have been better to name the types, e.g., links:child and links:parent? Or, let all these objects be of type links, and indicate the nature of the link in the meta field instead?, i.e.,

{
  "data": [
    {
      "type": "links",
      "id": "index",
      "attributes": {
        "nature": "parent",
        "name": "Index",
        "description": "Index for example's OPTiMaDe databases",
        "base_url": "http://example.com/optimade/index"
      }
    }]
}

@CasperWA
Copy link
Member Author

Would it have been better to name the types, e.g., links:child and links:parent? Or, let all these objects be of type links, and indicate the nature of the link in the meta field instead?, i.e.,

{
  "data": [
    {
      "type": "links",
      "id": "index",
      "attributes": {
        "nature": "parent",
        "name": "Index",
        "description": "Index for example's OPTiMaDe databases",
        "base_url": "http://example.com/optimade/index"
      }
    }]
}

I like this - although there is some discrepancy between the text and your example here. Putting it in attributes doesn't seem like the right place to me, since it is indeed meta data about the link, not related to the actual content of the link, i.e., I suggest to change your example here to match your text:

{
  "data": [
    {
      "type": "links",
      "id": "index",
      "attributes": {
        "name": "Index",
        "description": "Index for example's OPTiMaDe databases",
        "base_url": "http://example.com/optimade/index"
      },
      "meta": {
        "nature": "parent"
      }
    }
  ]
}

The field/property name may be different from "nature" in the final form, but otherwise, I think this is a good solution.

@CasperWA
Copy link
Member Author

Concerning the last point in my OP:

  • Should the providers from providers.json be supplied here for all OPTiMaDe implementations/databases? Or do we perhaps recommend to only show them in the Index meta-database?

In Materials-Consortia/providers#18 a happy misunderstanding has led to a solution: Introduce another allowed value for the "nature" field (according to the example above) that is "provider_list" or similar.
This would make the total and only allowed values for the "nature" field: "parent", "child", "provider", "provider_list".

@rartino
Copy link
Contributor

rartino commented Feb 20, 2020

I had actually forgotten about this discussion when I wrote essentially the same suggestion in Materials-Consortia/providers#18. At least I'm consistent.

Looking at this, I'm seriously considering that it is worth it to try to push through a patch for this link-type mess before v1.0.0. Because changing these link types is not backward compatible, and will need a major version bump.

The naming we have here isn't super great; we are suggesting to introduce jsonapi resource objects of type links, while there simultaneously are jsonapi "links objects" which are something different. But we probably need to live with that, as we do not want to move the /links endpoint.

Also for nature, I'd be happy if we find a better name for that field.

But lets discuss the attributre vs. meta question. Why is nature not an attribute of the links resource object? If we think of links resource objects as representing a relationship with other databases, isn't the nature of that relationship an attribute? I should add here that my mental model of a links resource object is less in the URL sense and more in the mathematical sense in a connected graph.

@CasperWA
Copy link
Member Author

Looking at this, I'm seriously considering that it is worth it to try to push through a patch for this link-type mess before v1.0.0. Because changing these link types is not backward compatible, and will need a major version bump.

Fully agree that this will require a major version bump, i.e., better to get it in before the final release of v1.

The naming we have here isn't super great; we are suggesting to introduce jsonapi resource objects of type links, while there simultaneously are jsonapi "links objects" which are something different. But we probably need to live with that, as we do not want to move the /links endpoint.

Yeah. At least we would call them links resource objects - however, it is a bit worrisome how these two distinct concepts may easily be confused.

Also for nature, I'd be happy if we find a better name for that field.

How about relation, direction (graph-like, I guess), parentage, origin, link_type (although that would again name-clash a bit), heritage, ... hmm?

But lets discuss the attributes vs. meta question. Why is nature not an attribute of the links resource object? If we think of links resource objects as representing a relationship with other databases, isn't the nature of that relationship an attribute? I should add here that my mental model of a links resource object is less in the URL sense and more in the mathematical sense in a connected graph.

I think that's the issue. I see it not as representing the link (or edge) itself, but rather the databases (or nodes). I.e., attributes should only contain properties describing the database (or node), while the meta or relationship field may hold related data.
A node's edges are not an inherent attribute of the node (right?), they are separate entities, but may be related.
In terms of it being meta or relationship, I favour meta, since relationship implies the databases (or nodes) are related to other resources in the current implementation, which is not the case - it can only be related to other links resources, but such relationships are not considered in JSON API, if I understand it correctly. There relationships have to do with compound documents, where resources in one collection (or endpoint) are related to other resources in another collection (or endpoint).

@rartino
Copy link
Contributor

rartino commented Feb 21, 2020

Yeah. At least we would call them links resource objects - however, it is a bit worrisome how these two distinct concepts may easily be confused.

We have to carefully look over the language we use in these sections so that we distinctively phrase it as "jsonapi link objects" vs. "links resource objects", and maybe even add a clarifying note about this. Are we in agreement that links are not entries? (Because entries must stem from materials database data.)

How about

  • relation: leads to unnecessary term confusion with jsonapi relationships
  • direction: links represent more than the direction of edges
  • parentage: also too limited in scope
  • origin: say what? :-)
  • link_type: formally ok, looks awkward
  • heritage: too limited
  • ... hmm?: indeed...

I see it not as representing the link (or edge) itself, but rather the databases (or nodes). I.e., attributes should only contain properties describing the database (or node), while the meta or relationship field may hold related data. A node's edges are not an inherent attribute of the node (right?), they are separate entities, but may be related.

I would buy what you are saying if we had named the endpoint and objects databases/, base_urls, implementations or something like that. But we ended up with links which term-wise clearly (at least to me) includes the edges in what it refers to as objects.

If we go for the alternative design you seem to suggest, we really should rather have a ~ databases/ endpoint that lists all databases-type objects this API implementation knows and their attributes. Then, I would strongly argue the databases meta field is not the right place for properties related to the edges. Rather, all edge properties really should be part of jsonapi relationships (possibly using the relationship meta field.) It would then seem logical to mandate a 'self' databases object so that similar info as we now return from /links would be given by retriving /databases/self and look at the relationships returned to other databases objects. The other databases objects could then be included using the jsonapi included feature, or we could say that the client needs to fetch them using /databases/<id>.

This is a very consistent design. But it is also more complex than maintaining a list of links objects the way we have and I think it makes it more complicated for clients to deal with these responses, for API implementations to handle them, and to maintain an index meta-database.

So why don't we instead just accept that a links object represents an adge from the present database to another database, and its attributes include both information pertaining to the edge and the node the edge goes to? (i.e. a "link" formally represents the combination of the edge and the node it connects to.)

such relationships are not considered in JSON API [...] resources in one collection (or endpoint) [must be] related to other resources in another collection

I don't think so?... Can you cite the text you read to say that one is not allowed to include a jsonapi relationship with an object of the same type as the object having the relationship?

@CasperWA
Copy link
Member Author

Yeah. At least we would call them links resource objects - however, it is a bit worrisome how these two distinct concepts may easily be confused.

We have to carefully look over the language we use in these sections so that we distinctively phrase it as "jsonapi link objects" vs. "links resource objects", and maybe even add a clarifying note about this. Are we in agreement that links are not entries? (Because entries must stem from materials database data.)

Yes, I agree with this sentiment.

So why don't we instead just accept that a links object represents an adge from the present database to another database, and its attributes include both information pertaining to the edge and the node the edge goes to? (i.e. a "link" formally represents the combination of the edge and the node it connects to.)

You have convinced me in this. We should probably also include this "definition" in the spec.

such relationships are not considered in JSON API [...] resources in one collection (or endpoint) [must be] related to other resources in another collection

I don't think so?... Can you cite the text you read to say that one is not allowed to include a jsonapi relationship with an object of the same type as the object having the relationship?

Indeed, you're right, see json-api/json-api#1065.

@giovannipizzi
Copy link
Contributor

Some "final" agreement from the discussion held today:

  • change to type="links" for all links, adding a new attribute: link_type.
  • We define the following link_types:
    • child: links to internal (i.e. same-provider) OPTIMADE DBs ("child”). Requirement: within the same provider, used if you have sub-DBs.
    • root: links to the root of the same provider internal DBs ("root”) (for our list of providers at providers.optimade.org this will be an index meta-db, but from the point-of-view of the specification it doesn’t need to be, it just needs to be an OPTIMADE implementations). Only one per provider; at any level of children, you can point back to it.
    • external: link to an external OPTIMADE DBs; optional, you can link to any other implementation.
    • providers: link to a provider list, i.e. any provider list that indexes you, e.g. providers.optimade.org

Note: all URLs MUST always point to unversioned base_urls.

Note: provider lists will use external to link to (typically) the root index meta-DBs.

Note: we remove parent: it gives high risk of inconsistency and we do not have a use case of browsing internal DBs upwards. You can always go back to the root and go down with child again.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
misc/help wanted Extra attention is needed status/has-concrete-suggestion This issue has one or more concrete suggestions spelled out that can be brought up for consensus. status/no-consensus There is no consensus on this issue, it needs to be further discussed and developed. topic/info-endpoint topic/sub-databases Discusses topics related to having multiple (sub)-DBs
Projects
No open projects
Development

Successfully merging a pull request may close this issue.

3 participants