
vectorize astroquery.esa.hsa & HSA.download_data #3004

Open
jkrick opened this issue May 8, 2024 · 3 comments
jkrick commented May 8, 2024

I would like to query the Herschel archive for ~thousands of spectra based on position (maybe a million one day??). Right now I have the SkyCoord positions for my sample in a table, but to call HSA.query_hsa_tap() I have to loop over them one at a time. It would be nice if table upload were supported, or any other way of vectorizing that query.
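For context, the per-position workaround today looks roughly like the sketch below. The cone-search helper is hypothetical, and the ADQL table/column names (`hsa.v_active_observation`, `ra`, `dec`) are assumptions about the HSA TAP schema, not verified against the service:

```python
# Sketch of the current one-query-per-position workaround (hypothetical helper).
# Table/column names (hsa.v_active_observation, ra, dec) are assumptions.

def cone_query(ra_deg, dec_deg, radius_deg=0.1):
    """Build an ADQL cone-search query for a single position (degrees)."""
    return (
        "SELECT * FROM hsa.v_active_observation "
        "WHERE 1=CONTAINS(POINT('ICRS', ra, dec), "
        f"CIRCLE('ICRS', {ra_deg}, {dec_deg}, {radius_deg}))"
    )

# With astroquery this would then be run in a loop, e.g.:
#   from astroquery.esa.hsa import HSA
#   results = [HSA.query_hsa_tap(cone_query(c.ra.deg, c.dec.deg))
#              for c in skycoords]

print(cone_query(10.68, 41.27))
```

This is the pattern the issue is asking to avoid: one TAP round-trip per source.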

Secondly, for downloading the data, it would be nice if HSA.download_data were vectorized to accept multiple observation_ids.
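From the caller's side, the requested vectorization could be sketched as a thin wrapper like the one below. The real call would be HSA.download_data(observation_id=...), which is stubbed out here (along with the example observation IDs, which are illustrative) so that only the batching logic is shown:

```python
# Sketch of a vectorized download wrapper. The real per-id call would be
# HSA.download_data(observation_id=obs_id, ...); it is stubbed here so the
# loop-and-collect logic is the only thing demonstrated.

def download_many(observation_ids, download_one):
    """Call download_one for each id, collecting results and failures."""
    results, failures = {}, {}
    for obs_id in observation_ids:
        try:
            results[obs_id] = download_one(obs_id)
        except Exception as exc:  # keep going; report failures at the end
            failures[obs_id] = exc
    return results, failures

# Example with a stand-in downloader and made-up ids:
paths, errors = download_many(
    ["1342189427", "1342189428"],
    download_one=lambda obs_id: f"{obs_id}.tar",
)
print(paths)  # {'1342189427': '1342189427.tar', '1342189428': '1342189428.tar'}
```

Collecting failures instead of raising on the first one matters at thousand-object scale, where a single bad id should not abort the whole batch.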

I see this is a more specific version of #682 . But don't worry, one day I'll ask for vectorizing the other archives too.

@keflavich (Contributor) commented:

@jkrick does the HSA archive allow multi-position queries? If it does, it is possible to support this, though we'd need help implementing it.

@bsipocz (Member) commented May 9, 2024

cc @jespinosaar

@bsipocz added the esa.hsa label May 9, 2024
@jespinosaar (Contributor) commented:

Dear @jkrick, many thanks for your feedback.

I have been checking the options available and, indeed, table_upload is not currently supported. In the Archive UI (https://archives.esac.esa.int/hsa/whsa/) you can upload a list of targets, but under the hood we simply resolve them, extract their coordinates, and generate a query with several OR clauses, one for each pair of coordinates.

On the other hand, I really like the idea of vectorizing the methods in the different modules, but please also bear in mind the server and database limitations on query length and request size (thinking of millions of elements). I think that searching for that amount of data is easier if you execute a for loop over a table of targets, so the results are extracted one by one: you can control what happens between iterations, and small requests are easier and faster for the server to handle.
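The OR-clause approach the UI uses, combined with the query-length limits mentioned above, suggests a middle ground: batching positions into fixed-size chunks, one query per chunk. A minimal sketch, assuming the same hypothetical schema names as before (`hsa.v_active_observation`, `ra`, `dec`) and an arbitrary chunk size:

```python
# Sketch: build one ADQL query per chunk of positions, using the OR-clause
# pattern the Archive UI uses. Table/column names are assumptions, and
# chunk_size would need tuning against the server's real query-length limit.

def batched_queries(coords, radius_deg=0.1, chunk_size=50):
    """Yield one ADQL query per chunk of (ra, dec) pairs in degrees."""
    for start in range(0, len(coords), chunk_size):
        chunk = coords[start:start + chunk_size]
        clauses = " OR ".join(
            f"1=CONTAINS(POINT('ICRS', ra, dec), "
            f"CIRCLE('ICRS', {ra}, {dec}, {radius_deg}))"
            for ra, dec in chunk
        )
        yield f"SELECT * FROM hsa.v_active_observation WHERE {clauses}"

# Three positions with chunk_size=2 -> two queries
queries = list(batched_queries([(10.0, 41.0), (83.8, -5.4), (201.4, -43.0)],
                               chunk_size=2))
print(len(queries))  # 2
```

Each yielded query could then be passed to HSA.query_hsa_tap in a short loop, trading a million round-trips for a few thousand while keeping each request small.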

Please let me know if you have further questions.
