Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

release register - unexpected entity #159

Open
marqh opened this issue Jun 28, 2022 · 5 comments
Open

release register - unexpected entity #159

marqh opened this issue Jun 28, 2022 · 5 comments

Comments

@marqh
Copy link
Member

marqh commented Jun 28, 2022

Hello Registry folks

I've got a slightly unexpected behaviour from the registry-core that i'd like to explore with you if I may?

i am trying to create a companion register to another internal register within a registry
the aim of the companion is to release control a subset of a register

register-main contains all of the registered entites, with defining URIs

register-release-1 contains registerItems for a subset of the entites registered, which were included within release-1

to enable this, content payloads of the form:

@prefix rdfs:  <http://www.w3.org/2000/01/rdf-schema#> .
@prefix reg: <http://purl.org/linked-data/registry#> .
@prefix skos:  <http://www.w3.org/2004/02/skos/core#> .

</gr/gr2/m--4/4.2/2-0-192> a skos:Concept ;
	rdfs:label "primary deciduous trees"@en ;
.
<> a reg:RegisterItem ;
	reg:notation "2-0-192" ;
	reg:definition [ reg:entity </gr/gr2/m--4/4.2/2-0-192> ] ;
	.

are created, and then uploaded as POSTs to /gr/gr2/m--4/release-1/4.2/

this works neatly, as each registeItem is then defined as the entity in the main register, and 'download' dowloads all of the release-included entities from the main register. The URIs are always main reg URIs and no release-1 is included in entity URIs

this all seems pretty neat, useful and manageable
(all feedback on design intent welcome)

However, there's a behaviour that trips up content testing in its current form, which i'd be interested in some validation / feedback on

in POSTing the presented content to the registry, the RegisterItem /gr/gr2/m--4/release-1/4.2/_2-0-192 is created as expected

However, an entity also appears: /gr/gr2/m--4/release-1/4.2/2-0-192

the intent was that this URI would not exist, and return 404
we do not want these URI patterns with the release-1/ fragment inside them used to reference entities

we can't work out how this entity has appeared, it's not easy to find as the containing register and the registerItems link to the main register entities

we'd be really interested in any advice or insight into how we could deliver our aim of tagged release registers referencing a continuously updating main register and how we could upload relevant payloads without accidentally enabling hidden entity creation, which is then a problem for testing and for usage guidance

many thanks
mark

(@der @simonoakesepimorphics)

@marqh
Copy link
Member Author

marqh commented Jun 28, 2022

i have also tried uploading a payload f the form

@prefix rdfs:  <http://www.w3.org/2000/01/rdf-schema#> .
@prefix reg: <http://purl.org/linked-data/registry#> .
@prefix skos:  <http://www.w3.org/2004/02/skos/core#> .

</gr/gr2/m--4/4.2/2-0-192> a skos:Concept ;
	rdfs:label "primary deciduous trees"@en ;
.
<_2-0-192> a reg:RegisterItem ;
	reg:definition [ reg:entity </gr/gr2/m--4/4.2/2-0-192> ] ;
	.

which appears to exhibit the same behaviour as the payload above

der added a commit that referenced this issue Jun 30, 2022
There are two cases to handle.
This patch handles the case where the request has no flags
by validating entity graph to check external refs.

If the request is _view=with_metadata then we also have to run
a similar check. However, a number of existing test cases
currently fail because they (accidentally) use external items.

Partially addresses: #159
@der
Copy link
Contributor

der commented Jun 30, 2022

Hi @marqh

We understand the requirement and it seems like a reasonable approach.

We've done an initial investigation about what's going on (thanks @simonoakesepimorphics) and the behaviour is due to an unfortunate interaction with the Graph Registration feature.

The idea of that feature is to allow a complete entity graph to be registered and retrievable as a single item. In that case the whole graph is stored together and there is no separate entity in the default graph to retrieve.

So when retrieving an entity if there is no entity but a corresponding item which points to an entity graph then the API currently returns that, though no actual entity exists in the store, on the assumption it was a graph registration.

The current implementation of that feature doesn't check if the entity graph contains a matching entity description so it applies even in your case where the description graph exists but is of a different entity and wasn't a graph registration.

After some investigation fixing that in the simple case is straightforward (see #161).

With that in place then a plain GET on the entity will return a 404.

The problem comes if you GET {entity-url}?_view=with_metadata which is what the UI will do if visiting the entity url in a browser (or anything that conegs to html). That uses a slightly different code path. Fixing that is itself easy (see the comment in the PR). However, the problem that arises is that a rather large number of the existing test cases in TestAPI turn out to use registry URLs which don't match the registry base used in the tests. See for example number-eight-post-notation.ttl, whereas the base URL for the registry in the base test class is http://location.data.gov.uk/. This has not been apparent up till now because, with the current behaviour of returning the entity anyway, then expect values are returned so long as the relative path is correct.

Resolving all those test issues is no doubt possible but more work than we can easily squeeze in the gaps between current commitments.

So a key question is whether this partial fix would actually be sufficient for you? If so we could release that and create a separate ticket to work on the fuller solution when time/budgets permit.

Afterthought Perhaps all the uses of the dyndns alias can be regarded as legacy and a bulk rewrite of the entire test/expected data files might be possible.

@der
Copy link
Contributor

der commented Jul 1, 2022

After further investigation of the after thought it's not as simple as a bulk rewrite of http://ukgovld-registry.dnsalias.net/ in the test cases to http://location.data.gov.uk/. There are other assumptions about the target register structure in the test cases. Those tests work because these are being treated as external entities and the current "return the entity anyway" behaviour kicks in, allowing the test to pass without us realising the discrepancies. So a manual walk through of each of the test cases would be required.

@marqh
Copy link
Member Author

marqh commented Sep 22, 2022

many thanks @der @simonoakesepimorphics
(with usual apologies for the lag in response)

i have managed to evaluate the limitation on the partial solution within #161
this is already somewhat of an edge case, but a risk for confusion and further concern

the behaviour of the direct GET is the key response, if a direct GET 404s then the entity is recognised as not existing

on this basis, we'd be really please to have #161 merged and packaged into a release

the secondary fix for human viewing is worthwhile and would be valued by us, but we recognise that fixing up tests takes focused effort and that can wait for some more time

so, a merge of #161 and an aspiration to fix in future effort would be great for us
many thanks
mark

@simonoakesepimorphics
Copy link
Contributor

We have merged #161 and released version 2.3.11.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants