alt_identifiers need to be aggregated #522
Comments
Ah! Sorry. Both solutions work for us from the VizieR point of view, as long as the verbose option of describe() gets the information. BUT: There was a feature request during the last IVOA for a DOI criterion, if I remember well from the shared document. Maybe it's an occasion to add that in? Then alt_identifier would be an optional joined table when this criterion is selected and would be called like get_contact in every other situation?
On Thu, Feb 08, 2024 at 08:19:07AM -0800, Manon Marchand wrote:
> Both solutions work for us from VizieR point of view, as long as
> the verbose option of describe() gets the information.
Ok... I suppose I'd prefer an extra network request in
describe/verbose to hitting alt_identifier in every registry search
for now. We can always switch later when we have a better idea of
why people would look at the alternate ids.
> BUT: There was a feature request during the last IVOA for a DOI
> criterion, if I remember well from the shared document. Maybe it's
> an occasion to add that in? Then alt_identifier would be an
I'd say an AltIdentifier constraint class should go in a separate
PR. As usual, there are a few technicalities to ponder, in
particular as to identifier normalisation (i.e., turning
https?://(dx\.)?doi.org/(?<doi>.*) into doi:<doi> and *perhaps* doing
that the other way round for ORCIDs), but we can do that in a bug or
in the PR.
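A minimal sketch of the DOI half of that normalisation, under stated assumptions: `normalise_doi` is a hypothetical helper (not part of pyvo), the pattern is adapted to Python's `(?P<name>...)` named-group syntax with the dot in `doi.org` escaped, and the DOIs used are made up.

```python
import re

# Adapted from the pattern quoted above: Python spells named groups
# (?P<name>...), and the dot in "doi.org" is escaped here.
_DOI_URL = re.compile(
    r"https?://(?:dx\.)?doi\.org/(?P<doi>.*)", re.IGNORECASE)


def normalise_doi(identifier):
    """Return doi:<doi> for DOI resolver URLs.

    Anything else comes back with surrounding whitespace stripped.
    Hypothetical helper for illustration only.
    """
    identifier = identifier.strip()
    match = _DOI_URL.fullmatch(identifier)
    if match:
        return "doi:" + match.group("doi")
    return identifier
```

With this, both `https://doi.org/10.1234/abc` and `http://dx.doi.org/10.1234/abc` would compare equal to `doi:10.1234/abc`, while identifiers that are not resolver URLs pass through untouched.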
Oh, and since RegTAP 1.0 doesn't have alt_identifier, you should add
a check for the table's presence in get_search_condition as in, say,
rtcons.Temporal.
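A rough sketch of what such a presence check could look like. Everything here is an assumption for illustration: the `AltIdentifier` class does not exist yet, `RegTAPFeatureMissing` and `FakeService` are local stand-ins, the service is assumed to expose the available table names through a `tables` attribute, and the returned condition is only a guess at the eventual ADQL.

```python
class RegTAPFeatureMissing(Exception):
    """Local stand-in: the RegTAP service lacks a required table."""


class AltIdentifier:
    """Hypothetical constraint matching an alternate identifier."""

    def __init__(self, identifier):
        self.identifier = identifier

    def get_search_condition(self, service):
        # RegTAP 1.0 services have no rr.alt_identifier table, so fail
        # early rather than send a query that cannot work.
        tables = getattr(service, "tables", None)
        if tables is not None and "rr.alt_identifier" not in tables:
            raise RegTAPFeatureMissing(
                "rr.alt_identifier missing on current RegTAP service")
        # Double single quotes so the literal stays valid ADQL.
        escaped = self.identifier.replace("'", "''")
        return "alt_identifier = '{}'".format(escaped)


class FakeService:
    """Stand-in for a RegTAP service that only knows its table names."""

    def __init__(self, tables):
        self.tables = tables
```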
> optional table when this criterion is selected and would be called
> like get_contact in every other situation?
The constraint is independent of what columns are selected, and we
shouldn't change the column set based on the constraints; that would
make a really confusing API.
As long as we don't have strong indications that folks do request
alternate ids for a significant percentage of the records, I'd say
we're fine having a separate, network-requesting method to retrieve
them.
Hi,
On Fri, Feb 09, 2024 at 12:56:43AM -0800, Manon Marchand wrote:
> Do you want to do this and I work on #521?
Sounds good.
This is supposed to address bug astropy#522. The code as it was up to here would have needed aggregation of alt_identifiers (which are n:1 over resources), or else we would see duplicate capabilities. However, at least some registry operators prefer not to hit the rr.alt_identifier table by default as long as it's not clear who will actually look at these alternate identifiers. We still maintain the alt identifiers in describe(); to do that, there is now a get_alternate_identifiers method returning them. The downside: describe() now does an uncached network query. Perhaps we want to at least hide failures from there? On the other hand, once we are here we can also call get_contact() here; should we?
On Fri, Feb 09, 2024 at 04:27:11AM -0800, Renaud Savalle wrote:
> Indeed, I believe that such a constraint would be a useful feature
> to answer one of the use cases gathered for our thematic session in
> Tucson, cf slide 3 of
> https://wiki.ivoa.net/internal/IVOA/InterOpNov2023Registry/interop_202311_pyvo.pdf
Hm... thinking about this a bit I think I'd like to see a bit more
flesh on the use case sketched there with:
I read in a paper that a dataset I am interested in using
has DOI XXXX. How do I search the registry for that?
You see, it says "data set", and I suspect that's what whoever wrote
this is actually thinking about: DOIs on an image or whatever.
Leaving aside my personal opinion that that's not a good idea: That's
not something the Registry can solve (by itself). This would need
conventions where to put such DOIs (presumably in obscore) and then a
global dataset search for obscore records (say) mentioning the DOI.
The DOIs we have are DOIs for Data Collections. It *is* conceivable
that people will put such a collection DOI into a paper, but then I'd
be very surprised if folks went to the VO Registry to resolve that
DOI when they can use the various services on doi.org that they probably
already know *about* as well (I grant you that these won't point to
the VO services machine-readably, so that *might* be an argument).
All in all... I'd say that before we actually build something that
will probably mislead quite a few people, let's first make sure we
properly understand what it is that the original requester wanted.
Commit 93d264d started retrieving alt_identifiers in RegTAP queries. I missed the resulting problem, too, but that commit introduces a rather ugly regression, in particular in combination with the roughly simultaneous change to lax=False on get_service.
Consider:
This used to work without problems and still should work, because the resource only has one conesearch capability. Alas, the record has multiple alt_identifiers, and thus suddenly multiple capabilities appear to be present after 93d264d, which then causes a
There are various ways to fix this. One would be to remove the alt_identifier from the GROUP_BY clause and ivo_string_agg it. A bit of care is necessary so that the result ends up as a proper Python list in the RegistryResource attribute, but that's not hard.
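To illustrate both the duplication and the aggregation fix, here is a small self-contained sketch. It does not use a RegTAP service: SQLite's `group_concat` stands in for `ivo_string_agg`, the two tables are toy versions of the RegTAP ones, and the identifiers, the separator and the `split_aggregate` helper are all made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE capability (ivoid TEXT, standard_id TEXT);
    CREATE TABLE alt_identifier (ivoid TEXT, alt_identifier TEXT);
    INSERT INTO capability
        VALUES ('ivo://example/res', 'ivo://ivoa.net/std/conesearch');
    INSERT INTO alt_identifier
        VALUES ('ivo://example/res', 'doi:10.1234/a');
    INSERT INTO alt_identifier
        VALUES ('ivo://example/res', 'doi:10.1234/b');
""")

# Plain join: one row per (capability, alt_identifier) pair, so the
# single conesearch capability shows up once per alternate identifier.
joined = conn.execute("""
    SELECT c.ivoid, c.standard_id, a.alt_identifier
    FROM capability AS c
    LEFT JOIN alt_identifier AS a ON a.ivoid = c.ivoid
""").fetchall()

# Aggregated: alt_identifier is left out of GROUP BY and collapsed
# into one delimited string per capability.
SEPARATOR = ":::sep:::"  # assumed delimiter; must not occur in identifiers
aggregated = conn.execute("""
    SELECT c.ivoid, c.standard_id, group_concat(a.alt_identifier, ?)
    FROM capability AS c
    LEFT JOIN alt_identifier AS a ON a.ivoid = c.ivoid
    GROUP BY c.ivoid, c.standard_id
""", (SEPARATOR,)).fetchall()


def split_aggregate(value, separator=SEPARATOR):
    """Turn the aggregated string back into a Python list."""
    return value.split(separator) if value else []


alt_ids = split_aggregate(aggregated[0][2])
```

The `split_aggregate` step is the "bit of care" part: a resource without any alternate identifiers yields NULL from the aggregate, which should become an empty list rather than a list containing an empty string.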
However, on re-thinking the business I have to say that I think at least at this point alt_identifiers are of too little operational relevance to actually make it worth involving the alt_identifiers table for every discovery query. So, I'd actually prefer if we changed things so that there is a get_alt_identifiers method analogous to the current get_contact, i.e., the thing hits the Registry when people actually need the information and not every time. As Registry operator, I'd obviously like that a lot; I'd argue that the slightly reduced complexity and faster queries would make everyone else win, too.
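The on-demand variant could look roughly like the following sketch. `ResourceSketch` is a hypothetical stand-in rather than pyvo's actual record class, `run_query` is an assumed callable that takes an ADQL string and returns row tuples, and the fake query function and identifiers exist only to make the example runnable.

```python
class ResourceSketch:
    """Hypothetical stand-in for a registry record; not pyvo's class."""

    def __init__(self, ivoid, run_query):
        self.ivoid = ivoid
        # Assumed callable: ADQL string -> list of row tuples.
        self._run_query = run_query

    def get_alt_identifiers(self):
        """Fetch alternate identifiers only when someone asks for them."""
        escaped = self.ivoid.replace("'", "''")
        rows = self._run_query(
            "SELECT alt_identifier FROM rr.alt_identifier"
            " WHERE ivoid = '{}'".format(escaped))
        return [row[0] for row in rows]


queries_sent = []


def fake_run_query(adql):
    """Record the query and return canned rows instead of hitting a service."""
    queries_sent.append(adql)
    return [("doi:10.1234/a",), ("doi:10.1234/b",)]


# Creating the record sends no query; only an explicit call to
# get_alt_identifiers() would.
record = ResourceSketch("ivo://example/res", fake_run_query)
```

Discovery queries then never touch rr.alt_identifier; the cost is one extra request per call, since nothing is cached in this sketch.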
Sorry I've not noticed this earlier. @ManonMarchand, what do you (does CDS) think? I can implement both, but I'd much prefer if you'd be ok with the second option.