Hsa support #2122

lvalerom · 2021-07-19T18:47:32Z

Added new astroquery submodule called:

astroquery.esa.hsa

This module allows access to the ESA Herschel mission.

pep8speaks · 2021-07-19T18:47:36Z

Hello @lvalerom! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2022-04-14 13:39:49 UTC

codecov · 2021-07-19T18:55:58Z

Codecov Report

Merging #2122 (9143453) into main (b787ee2) will increase coverage by 0.18%.
The diff coverage is 35.66%.

@@            Coverage Diff             @@
##             main    #2122      +/-   ##
==========================================
+ Coverage   62.98%   63.17%   +0.18%     
==========================================
  Files         131      132       +1     
  Lines       17059    17187     +128     
==========================================
+ Hits        10745    10858     +113     
- Misses       6314     6329      +15

Impacted Files	Coverage Δ
astroquery/esa/hsa/core.py	`35.66% <35.66%> (ø)`
astroquery/utils/tap/taputils.py	`67.85% <0.00%> (-11.69%)`	⬇️
astroquery/alma/utils.py	`18.12% <0.00%> (-11.29%)`	⬇️
astroquery/mast/missions.py	`76.54% <0.00%> (-2.91%)`	⬇️
astroquery/atomic/core.py	`35.77% <0.00%> (-1.12%)`	⬇️
astroquery/cadc/core.py	`77.38% <0.00%> (ø)`
astroquery/exceptions.py	`0.00% <0.00%> (ø)`
astroquery/esa/iso/core.py	`64.58% <0.00%> (ø)`
astroquery/mast/services.py	`79.56% <0.00%> (ø)`
astroquery/utils/tap/core.py	`42.69% <0.00%> (+0.10%)`	⬆️
... and 4 more

📣 Codecov can now indicate which changes are the most critical in Pull Requests. Learn more

keflavich

The code mostly looks fine, with a few small corrections here and there.

However, the documentation is targeted too much at a technical audience - not at astronomers. There need to be examples of how to find observation IDs based on a position on the sky or some other astronomical criteria. Can that be achieved through this interface?

Is there any way, through the HSA, to access large program data, like the HiGal data?

astroquery/hsa/core.py

keflavich · 2021-12-05T21:14:39Z

astroquery/hsa/core.py

+                        verbose=False,
+                        cache=True, **kwargs):
+        """
+        Download observation from Herschel


Could you say more here - what does it mean to 'download observations'? Are we downloading FITS images? Raw timestream data? Level X processed files? All of the above?

I wasn't sure if this function was going to be useful or just confusing.

You can achieve the same result using the download_data function with the retrieval_type='OBSERVATION' (which is the default value). Maybe we could delete this function completely.

Answering the question, it retrieves all the products associated with an observation (mostly FITS files).

This might be useful, but it's necessary to explain what "all the products associated with an observation" is likely to mean. Up to you, though.

lvalerom · 2022-01-06T13:18:24Z

Thank you @keflavich for your review.

The way to get observation IDs with this interface is through the TAP (query_hsa_tap). For example, you can make a query to the v_active_observation table and get the observation IDs from there. Once you have them, you can download the products using the download_data or get_observation functions.

I can provide an example of the entire process but I don't know where to put it. Should I add another section in the hsa.rst file?

As far as, I know you cannot access HiGAL data from the HSA. I don't have much experience with that so I might be mistaken.

keflavich · 2022-01-06T13:21:52Z

I can provide an example of the entire process but I don't know where to put it. Should I add another section in the hsa.rst file?

yes!

keflavich

Sorry it took me a while to get to the review.

This looks pretty good internally - I think the code is fine.

The user-facing documentation could use some improvement, though, and we would like an astroquery-compatible API - so, for the "query a region on the sky" TAP query, we want a query_region function.

keflavich · 2022-02-02T12:58:02Z

astroquery/hsa/core.py

+        -The auxiliary directory: contains all Herschel non-science spacecraft data
+        -The calibarion directory: contains the uplink and downlink calibration products
+        -<obs_id> directory: contains the science data distributed in sub-directories called level0/0.5/1/2/2.5/3.


check how this renders in the docs build; I think you might need different formatting for this to look nice. But I don't remember my .rst well enough to be sure - it might be OK too.

keflavich · 2022-02-02T12:58:58Z

docs/hsa/hsa.rst

+Herschel Science Archive (`astroquery.hsa`)
+*******************************************
+
+Herschel was the fourth cornerstone in ESA's Horizon 2000 science programme, designed to observe the 'cool' universe. 


Should there be a link to the Herschel mission page here?

keflavich · 2022-02-02T12:59:37Z

docs/hsa/hsa.rst

+-------------------------------
+2. Getting Observation Product
+-------------------------------


Suggested change

-------------------------------

2. Getting Observation Product

-------------------------------

-------------------------------

2. Getting Observation Products

-------------------------------

keflavich · 2022-02-02T13:00:28Z

docs/hsa/hsa.rst

+This will download the product of the observation '1342195355' with the instrument 'PACS' and 
+it will store them in a tar called '1342195355.tar'. The parameters available are detailed in the API.


it would be helpful to add a few words about what a user can expect to find in an observation tarball, or at least, a link to where one can find that information

Not a problem, I've added a link.

keflavich · 2022-02-02T13:01:23Z

docs/hsa/hsa.rst

+This will download the product of the observation '1342195355' with the instrument 'PACS' and 
+it will store them in a tar called '1342195355.tar'. The parameters available are detailed in the API.


This needs more explanation of how "Observation Products" are different from "data". Is it just the interface that's different, or are the products different?

I've added a few words explaining this.

keflavich · 2022-02-02T13:02:11Z

docs/hsa/hsa.rst

+		'http://archives.esac.esa.int/hsa/whsa/'
+
+-------------------------------
+3. Getting Herschel Postcard


This section also needs description - what is a postcard? Is it a "cutout" or "quicklook"? What does that mean? Is it usable for science? Again, this could be answered with a link to the right part of the archive, but it would be helpful to have descriptive text here

I've added a brief description.

keflavich · 2022-02-02T13:02:32Z

docs/hsa/hsa.rst

+------------------------------------------
+
+This function provides access to the Herschel Science Archive database using the Table Access Protocol (TAP) and via the Astronomical Data
+Query Language (ADQL).


Can we link to the description of ADQL?

keflavich · 2022-02-02T13:04:27Z

docs/hsa/hsa.rst

+
+  >>> from astroquery.hsa import HSA
+  >>>
+  >>> HSA.query_hsa_tap("select top 10 observation_id from hsa.v_active_observation where contains(point('ICRS', hsa.v_active_observation.ra, hsa.v_active_observation.dec),circle('ICRS', 100.2417,9.895, 1.1))=1", output_format='csv', output_file='results.cvs')


We should have a wrapper function implemented to do this, e.g.:

def query_observations(coordinate, radius, n_obs=10): ...

to match the general astroquery user API.

bsipocz

I have a mixture of comments, some are of things that need to be fixed, others are more advisory or nitpicky.

bsipocz · 2022-02-25T07:30:24Z

CHANGES.rst

+hsa
+^^^
+
+- New module to access ESA Herschel mission. [#]


needs the PR number

bsipocz · 2022-02-25T07:31:38Z

astroquery/hsa/__init__.py

@@ -0,0 +1,29 @@
+# Licensed under a 3-clause BSD style license - see LICENSE.rst
+"""
+HSA


Please move this module under the esa namespace

bsipocz · 2022-02-25T07:36:50Z

astroquery/hsa/core.py

+                      filename=None,
+                      verbose=False,
+                      cache=True,
+                      **kwargs):


please list in the signature all the kwargs you added in the docstring. I also wonder what else could be hidden under **kwargs? If it's not a huge list I think it's better to list them all.

Also, minor preference, I think it's nicer not to list everything in a new line, but I'm ok with if it stays as is.

I've added the parameters in the docstring to the signature. I wouldn't add the rest of them because IMO there are a lot see

bsipocz · 2022-02-25T07:39:27Z

astroquery/hsa/core.py

+            filename = os.path.splitext(filename)[0]
+
+        if retrieval_type is None:
+            retrieval_type = "OBSERVATION"


I think it's better to define the defaults in the signature, I don't see any practical use here for setting it to None

Makes sense. I've changed it.

bsipocz · 2022-02-25T07:43:25Z

astroquery/hsa/core.py

+        link += "".join("&{0}={1}".format(key, val)
+                        for key, val in kwargs.items())


These can be in one line, it's easier to read, and having ~100char long lines is totally fine

bsipocz · 2022-02-25T08:18:20Z

astroquery/hsa/tests/test_hsa_remote.py

+from .test_hsa import TestHSA
+
+
+class TestHSARemote(TestHSA):


Why do you need to subclass here?

It's not needed anymore since we're not using the dummyTapHandler anymore.

bsipocz · 2022-02-25T08:20:43Z

astroquery/hsa/tests/test_hsa_remote.py

+                      'observation_id': obs_id,
+                      'instrument_name': "PACS"}
+        expected_res = obs_id + ".tar"
+        hsa = HSAClass(self.get_dummy_tap_handler())


I don't exactly understand why you need the dummy here and below, these remote test should ideally test using direct connection to the archive, just as a user would do it.

You're right, I think I copy-pasted from the other tests and didn't change it. It's fixed now.

bsipocz · 2022-02-25T08:23:45Z

astroquery/hsa/tests/test_hsa_remote.py

+        res = hsa.download_data(**parameters)
+        assert res == expected_res
+        assert os.path.isfile(res)
+        os.remove(res)


Please use a temp_dir type pytest fixture, as these files are not cleaned up in case there is a test failure. There are examples for this approach is a few other modules, e.g in esasky

I'm using tempfile.TemporaryDirectory() now and I've added a download_dir parameter to all the functions.

bsipocz · 2022-02-25T08:26:45Z

astroquery/hsa/core.py

+        else:
+            self._tap = tap_handler
+
+    def download_data(self, *, retrieval_type=None,


users may want to be able to set the download directory (e.g download_dir in esasky), but still let the filenames figured out automatically. Would be nice if this method and a few below would support it.

Added download_dir

bsipocz · 2022-02-25T08:32:19Z

astroquery/hsa/tests/test_hsa_remote.py

@@ -0,0 +1,117 @@
+# Licensed under a 3-clause BSD style license - see LICENSE.rst


These remote tests are very slow (I have let them run for 30mins, 2 failed one success, and cancelled the remaining), I'm not exactly sure about the reasons. If possible please use obs_ids that belongs to smaller files, etc.
While these tests are not running in CI for pull requests, we run them weekly from CI cron, and also run them manually to check for regression, it's really important that they would run in a reasonably quick timescale.

I've changed the observations that were being downloaded to smaller ones and I've also changed the query so it only downloads one product_level. Now the tests take ~10 seconds.

bsipocz · 2022-02-25T08:34:29Z

Oh, and also, please rebase both to resolve the minor conflicts with the changelog and the docs index page, and also to pick up the latest CI matrix. Thanks!

Co-authored-by: Adam Ginsburg <keflavich@gmail.com>

bsipocz · 2022-04-13T18:49:05Z

@lvalerom - I've pushed commits to fix the docs build as well as the remote data doctests.

bsipocz

My only comment that would be great to be addressed in this PR is about the skipping of remote tests when the retries fail. I like that retry is built in, but I feel it would be better to not skip those tests, but I can be convinced otherwise if you would comment on the assumptions that went into the current setup.

The rest of my comments are nitpicks, that don't really need to be cleanup now, or point beyond this PR.

bsipocz · 2022-04-13T18:58:16Z

astroquery/esa/hsa/core.py

+
+        return filename
+
+    def query_hsa_tap(self, query, *, output_file=None,


This points outside the scope of this PR, but I this it would make sense API wise it would be better to rename all these query_xyz_tap methods to query_tap. What do you guys, @lvalerom @jespinosaar, think?

I think it makes sense. I would change it in another PR along with the other missions though.

Yes, that should definitely be done in a separate PR, as the other modules need to be renamed using proper deprecations.

bsipocz · 2022-04-13T18:59:25Z

astroquery/esa/hsa/tests/dummy_handler.py

+        if method == self._invokedMethod:
+            return
+        else:
+            raise ValueError("Method '{}' is not invoked. (Invoked method \


nitpick: no linebreaks are needed in these types of strings

I could be wrong but I think these ValueError were not properly formatted.

ValueError("Method '{}' is not invoked. (Invoked method \ is '{}'.)").format(method, self._invokedMethod) # Should be ValueError("Method '{}' is not invoked. (Invoked method " "is '{}'.)".format(method, self._invokedMethod)) # Notice the parenthesis

I've noticed that this is happening in other missions. I can open a PR to fix this if you would like.

Yes, a PR would be great. Also, these can be now changed to use f-strings, that way there are fewer parens, and a bit more easily legible message.

I've fixed the formatting in this module. I'll open a different PR to address the other modules.

bsipocz · 2022-04-13T18:59:51Z

astroquery/esa/hsa/core.py

+from pathlib import Path
+
+from astropy import units as u
+from ...utils import commons


no need for relative import here

bsipocz · 2022-04-13T19:01:15Z

astroquery/esa/hsa/core.py

+    timeout = conf.TIMEOUT
+
+    def __init__(self, tap_handler=None):
+        super(HSAClass, self).__init__()


nitpick: we only support recent enough pythons, so this can be simplified to super().__init__(). A ton of the modules hasn't been cleaned up, so no need to do it in this PR.

bsipocz · 2022-04-13T19:02:31Z

astroquery/esa/hsa/tests/setup_package.py

@@ -0,0 +1,14 @@
+# Licensed under a 3-clause BSD style license - see LICENSE.rst


There are no data files for this module, so no need for this file either.

bsipocz · 2022-04-13T19:04:41Z

astroquery/esa/hsa/tests/test_hsa_remote.py

+        hsa = HSAClass()
+        res = self.access_archive_with_retries(hsa.download_data, parameters)
+        if res is None:
+            pytest.skip("Archive broke the connection {} times, unable to test".format(self.retries))


Here, and everywhere below: I wonder whether it would be more useful to let this case break the test? Otherwise it would be difficult to pick up if there is a genuine issue (or would you always expect a non None return?)

While I was testing these last time I noticed that the connection with the archive was very unstable and, at least for large files, a lot times the connection was broken by the archive. The reason why I decided to skip the tests if the archive is unresponsive is to avoid flakiness in the tests. We always expect a non None return from these functions (the filename) so it should be ok to check for a None value to determine if the connection was broken.

Are there any smaller files maybe to work with?

Also, if this stays as is, maybe xfail is more appropriate than skip?

I agree, xfail suits better here. I've changed it.

Since the changes to make these tests faster I haven't noticed any connection problems. I would still keep the tests with xfail just in case (I don't think it hurts to have it 🙂)

bsipocz · 2022-04-15T15:50:29Z

I was going through @keflavich comments to see whether all got addressed and the only remaining one is about the lack of query_region.
I wonder whether in fact query_observations should/could be renamed, after all it takes a coordinate and a radius and returns a Table. Nothing else is specified in the API, e.g. those returned in the table can be either objects from a catalogue, but also observation ID as in this case.

On the other hand, I see that query_observations was in fact suggested by Adam. So I basically need you @keflavich to double-check whether this is in fact what you prefer. (note: while the naming is logical, we don't have query_observations method anywhere else)

Other than getting this clarified, the PR is ready for merge. Test coverage is up at 85% when the remote tests are run, so ignore the codecov complaint above.

keflavich · 2022-04-15T19:36:38Z

OK, reviewing my comments, I did indeed suggest query_observation and query_region both.

I think query_observation makes sense as-is: it is just getting observation IDs.

I suggeset we add query_region that does nearly the same thing, but instead of

"select top {n_obs} observation_id from hsa.v_active_observation "

it should do the default TAP thing of:

"select top {n_obs} * from hsa.v_active_observation "

i.e., query_region should be a coordinate-based query that defaults to *. I'll add a suggested commit for this.

keflavich · 2022-04-15T19:46:22Z

suggestion added in lvalerom#1

bsipocz · 2022-04-15T19:50:45Z

OK. So, In that case I would suggest we go ahead merge this and you open a PR against main, it would provide a cleaner branch structure, etc (I'm not able to merge the PR that is against the fork).

bsipocz · 2022-04-15T19:59:29Z

Thanks @lvalerom! I'll open issues as reminders for the ideas during the review that were pointing beyond this PR.

bsipocz added New Service to review labels Jul 19, 2021

keflavich requested changes Dec 5, 2021

View reviewed changes

lvalerom requested a review from keflavich January 28, 2022 18:16

keflavich reviewed Feb 2, 2022

View reviewed changes

bsipocz requested changes Feb 25, 2022

View reviewed changes

lvalerom and others added 14 commits March 12, 2022 22:24

hsa_support init

22976b4

hsa support: products and postcards download

6089362

hsa support: documentation

a676975

hsa support: fixing tabulation problem in docs

84bd49a

hsa_support: added documentation

e266db2

minor change in CHANGES.rst

47bf121

added two line after HSAClass definition to complain with pep8 E305

b9a0b8d

Update astroquery/hsa/core.py

4481c55

Co-authored-by: Adam Ginsburg <keflavich@gmail.com>

Update astroquery/hsa/core.py

23bbf48

Co-authored-by: Adam Ginsburg <keflavich@gmail.com>

Update astroquery/hsa/core.py

b1271bb

Co-authored-by: Adam Ginsburg <keflavich@gmail.com>

change InputWarning to ValueError

cc9d950

procedure example in documentation

0c7bd91

addressing review

753dbc8

addressing review

b3c3b03

lvalerom force-pushed the hsa_support branch from 6ece699 to b3c3b03 Compare April 2, 2022 19:47

lvalerom added 2 commits April 2, 2022 23:06

style

0382c12

fixing docs

c30e59e

bsipocz added this to the v0.4.7 milestone Apr 5, 2022

review comments

842f2da

lvalerom and others added 3 commits April 9, 2022 17:29

style

7419d9a

Fix doctests

4d6e353

Fix sphinx links and headings

b4daf03

bsipocz approved these changes Apr 13, 2022

View reviewed changes

review comments

aa4078c

lvalerom requested review from keflavich and bsipocz April 13, 2022 22:20

fix string formatting, and change skip to xfail

9143453

lvalerom mentioned this pull request Apr 15, 2022

Fix string formatting in "dummy handlers" #2361

Merged

bsipocz merged commit 82136f4 into astropy:main Apr 15, 2022

bsipocz mentioned this pull request Apr 15, 2022

Rename query_xyz_tap and query() methods to query_tap #2363

Open

4 tasks

		This will download the product of the observation '1342195355' with the instrument 'PACS' and
		it will store them in a tar called '1342195355.tar'. The parameters available are detailed in the API.

		link += "".join("&{0}={1}".format(key, val)
		for key, val in kwargs.items())

		@@ -0,0 +1,117 @@
		# Licensed under a 3-clause BSD style license - see LICENSE.rst


		return filename

		def query_hsa_tap(self, query, *, output_file=None,

		@@ -0,0 +1,14 @@
		# Licensed under a 3-clause BSD style license - see LICENSE.rst

Hsa support #2122

Hsa support #2122

Conversation

lvalerom commented Jul 19, 2021 • edited

pep8speaks commented Jul 19, 2021 • edited

Comment last updated at 2022-04-14 13:39:49 UTC

codecov bot commented Jul 19, 2021 • edited

Codecov Report

keflavich left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

lvalerom commented Jan 6, 2022

keflavich commented Jan 6, 2022

keflavich left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

lvalerom Feb 5, 2022 • edited

Choose a reason for hiding this comment

bsipocz left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

bsipocz commented Feb 25, 2022

bsipocz commented Apr 13, 2022

bsipocz left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

bsipocz commented Apr 15, 2022

keflavich commented Apr 15, 2022

keflavich commented Apr 15, 2022

lvalerom commented Jul 19, 2021 •

edited

pep8speaks commented Jul 19, 2021 •

edited

codecov bot commented Jul 19, 2021 •

edited

lvalerom Feb 5, 2022 •

edited