Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

alt_identifiers need to be aggregated #522

Closed
msdemlei opened this issue Feb 8, 2024 · 6 comments
Closed

alt_identifiers need to be aggregated #522

msdemlei opened this issue Feb 8, 2024 · 6 comments

Comments

@msdemlei
Copy link
Contributor

msdemlei commented Feb 8, 2024

Commit 93d264d started retrieving alt_identifiers in RegTAP queries. I missed the resulting problem, too, but that introduces a rather ugly regression, in particular with the roughly simultaneous change of lax=False on get_service.

Consider:

from pyvo import registry

rscs = registry.search(
    registry.Freetext("quasar"),
    registry.UCD("src.redshift"))
svc = rscs["III/175"]
svc.get_service("conesearch")

This used to work without problems and still should work, because the resource only has one conesearch capability. Alas, the record has multiple alt_identifiers, and thus suddenly multiple capabilities appear to be present after 93d264d, which then causes a

ValueError: Multiple matching interfaces found. If you know that they are all equivalent, use 'lax=True'.
Otherwise, the search can be constrained by specifying a 'service_type' or a 'keyword' to be found in the service description. You might also want to see all the available services with `pyvo.registry.regtap.RegistryResource.list_services()`.

There are various ways to fix this. One would be to remove the alt_identifier from the GROUP_BY clause and ivo_string_agg it. A bit of care is necessary so the thing ends up as a proper python list in the RegistryResource attribute, but that's not hard.

However, on re-thinking the business I have to say that I think at least at this point alt_identifiers are of too little operational relevance to actually make it worth involving the alt_identifiers table for every discovery query. So, I'd actually prefer if we changed things so that there is a get_alt_identifiers method analogous to the current get_contact, i.e., the thing hits the Registry when people actually need the information and not every time. As Registry operator, I'd obviously like that a lot; I'd argue that the slightly reduced complexity and faster queries would make everyone else win, too.

Sorry I've not noticed this earlier. @ManonMarchand, what do you (does CDS) think? I can implement both, but I'd much prefer if you'd be ok with the second option.

@ManonMarchand
Copy link
Member

ManonMarchand commented Feb 8, 2024

Ah! Sorry

Both solutions work for us from VizieR point of view, as long as the verbose option of describe() gets the information.

BUT: There was a feature request during the last IVOA for a DOI criteria, if I remember well from the shared document. Maybe it's an occasion to add that in? Then alt_identifier would be an optional joined table when this criteria is selected and would be called like get_contact in every other situation?

@msdemlei
Copy link
Contributor Author

msdemlei commented Feb 9, 2024 via email

@ManonMarchand
Copy link
Member

Hi Markus,

Got it, let's opt for a get_contact-style request then. The constraint feature can come later. I think it was asked by @rsav ?

Do you want to do this and I work on #521 ?

@rsav
Copy link

rsav commented Feb 9, 2024

Hi,
Indeed, I believe that such a constraint would be a useful feature to answer one of the use cases gathered for our thematic session in Tucson, cf slide 3 of https://wiki.ivoa.net/internal/IVOA/InterOpNov2023Registry/interop_202311_pyvo.pdf
Thanks for your work in developing this interface which continues to be a priority in the Registry WG roadmap !
Renaud.

@msdemlei
Copy link
Contributor Author

msdemlei commented Feb 9, 2024 via email

msdemlei added a commit to msdemlei/pyvo that referenced this issue Feb 12, 2024
This is supposed to address bug astropy#522.  The code as it was up to here
would have needed aggregation of alt_identifiers (which are n:1 over
resources), or else we see duplicate capabilities.

But at least some registry operators prefer to not hit the rr.alt_identifier
table by default as long as it's not clear who will actually look at these
alternate identifiers.

But we maintain the alt identifiers in describe(); to do that, there's now
get_alternate_identifiers method returning these.

The downside: describe() now does an uncached network query.  Perhaps
we want to at least hide failures from there?  On the other hand, once
we are here we can also call get_contact() here; should we?
@msdemlei
Copy link
Contributor Author

msdemlei commented Feb 12, 2024 via email

msdemlei added a commit to msdemlei/pyvo that referenced this issue Feb 15, 2024
This is supposed to address bug astropy#522.  The code as it was up to here
would have needed aggregation of alt_identifiers (which are n:1 over
resources), or else we see duplicate capabilities.

But at least some registry operators prefer to not hit the rr.alt_identifier
table by default as long as it's not clear who will actually look at these
alternate identifiers.

But we maintain the alt identifiers in describe(); to do that, there's now
get_alternate_identifiers method returning these.

The downside: describe() now does an uncached network query.  Perhaps
we want to at least hide failures from there?  On the other hand, once
we are here we can also call get_contact() here; should we?
@msdemlei msdemlei closed this as completed Mar 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants