Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hsa support #2122

Merged
merged 22 commits into from
Apr 15, 2022
Merged

Hsa support #2122

merged 22 commits into from
Apr 15, 2022

Conversation

lvalerom
Copy link
Member

@lvalerom lvalerom commented Jul 19, 2021

Added new astroquery submodule called:

astroquery.esa.hsa

This module allows access to the ESA Herschel mission.

@pep8speaks
Copy link

pep8speaks commented Jul 19, 2021

Hello @lvalerom! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2022-04-14 13:39:49 UTC

@codecov
Copy link

codecov bot commented Jul 19, 2021

Codecov Report

Merging #2122 (9143453) into main (b787ee2) will increase coverage by 0.18%.
The diff coverage is 35.66%.

@@            Coverage Diff             @@
##             main    #2122      +/-   ##
==========================================
+ Coverage   62.98%   63.17%   +0.18%     
==========================================
  Files         131      132       +1     
  Lines       17059    17187     +128     
==========================================
+ Hits        10745    10858     +113     
- Misses       6314     6329      +15     
Impacted Files Coverage Δ
astroquery/esa/hsa/core.py 35.66% <35.66%> (ø)
astroquery/utils/tap/taputils.py 67.85% <0.00%> (-11.69%) ⬇️
astroquery/alma/utils.py 18.12% <0.00%> (-11.29%) ⬇️
astroquery/mast/missions.py 76.54% <0.00%> (-2.91%) ⬇️
astroquery/atomic/core.py 35.77% <0.00%> (-1.12%) ⬇️
astroquery/cadc/core.py 77.38% <0.00%> (ø)
astroquery/exceptions.py 0.00% <0.00%> (ø)
astroquery/esa/iso/core.py 64.58% <0.00%> (ø)
astroquery/mast/services.py 79.56% <0.00%> (ø)
astroquery/utils/tap/core.py 42.69% <0.00%> (+0.10%) ⬆️
... and 4 more

📣 Codecov can now indicate which changes are the most critical in Pull Requests. Learn more

Copy link
Contributor

@keflavich keflavich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code mostly looks fine, with a few small corrections here and there.

However, the documentation is targeted too much at a technical audience - not at astronomers. There need to be examples of how to find observation IDs based on a position on the sky or some other astronomical criteria. Can that be achieved through this interface?

Is there any way, through the HSA, to access large program data, like the HiGal data?

astroquery/hsa/core.py Outdated Show resolved Hide resolved
astroquery/hsa/core.py Outdated Show resolved Hide resolved
astroquery/hsa/core.py Outdated Show resolved Hide resolved
astroquery/hsa/core.py Outdated Show resolved Hide resolved
astroquery/hsa/core.py Outdated Show resolved Hide resolved
verbose=False,
cache=True, **kwargs):
"""
Download observation from Herschel
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you say more here - what does it mean to 'download observations'? Are we downloading FITS images? Raw timestream data? Level X processed files? All of the above?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't sure if this function was going to be useful or just confusing.

You can achieve the same result using the download_data function with the retrieval_type='OBSERVATION' (which is the default value). Maybe we could delete this function completely.

Answering the question, it retrieves all the products associated with an observation (mostly FITS files).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be useful, but it's necessary to explain what "all the products associated with an observation" is likely to mean. Up to you, though.

@lvalerom
Copy link
Member Author

lvalerom commented Jan 6, 2022

Thank you @keflavich for your review.

The way to get observation IDs with this interface is through the TAP (query_hsa_tap). For example, you can make a query to the v_active_observation table and get the observation IDs from there. Once you have them, you can download the products using the download_data or get_observation functions.

I can provide an example of the entire process but I don't know where to put it. Should I add another section in the hsa.rst file?

As far as, I know you cannot access HiGAL data from the HSA. I don't have much experience with that so I might be mistaken.

@keflavich
Copy link
Contributor

I can provide an example of the entire process but I don't know where to put it. Should I add another section in the hsa.rst file?

yes!

Copy link
Contributor

@keflavich keflavich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry it took me a while to get to the review.

This looks pretty good internally - I think the code is fine.

The user-facing documentation could use some improvement, though, and we would like an astroquery-compatible API - so, for the "query a region on the sky" TAP query, we want a query_region function.

Comment on lines 134 to 136
-The auxiliary directory: contains all Herschel non-science spacecraft data
-The calibarion directory: contains the uplink and downlink calibration products
-<obs_id> directory: contains the science data distributed in sub-directories called level0/0.5/1/2/2.5/3.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check how this renders in the docs build; I think you might need different formatting for this to look nice. But I don't remember my .rst well enough to be sure - it might be OK too.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

docs/hsa/hsa.rst Outdated
Herschel Science Archive (`astroquery.hsa`)
*******************************************

Herschel was the fourth cornerstone in ESA's Horizon 2000 science programme, designed to observe the 'cool' universe.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should there be a link to the Herschel mission page here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

docs/hsa/hsa.rst Outdated
Comment on lines 39 to 41
-------------------------------
2. Getting Observation Product
-------------------------------
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
-------------------------------
2. Getting Observation Product
-------------------------------
-------------------------------
2. Getting Observation Products
-------------------------------

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

docs/hsa/hsa.rst Outdated
Comment on lines 33 to 34
This will download the product of the observation '1342195355' with the instrument 'PACS' and
it will store them in a tar called '1342195355.tar'. The parameters available are detailed in the API.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it would be helpful to add a few words about what a user can expect to find in an observation tarball, or at least, a link to where one can find that information

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a problem, I've added a link.

docs/hsa/hsa.rst Outdated
Comment on lines 51 to 52
This will download the product of the observation '1342195355' with the instrument 'PACS' and
it will store them in a tar called '1342195355.tar'. The parameters available are detailed in the API.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs more explanation of how "Observation Products" are different from "data". Is it just the interface that's different, or are the products different?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a few words explaining this.

docs/hsa/hsa.rst Outdated
'http://archives.esac.esa.int/hsa/whsa/'

-------------------------------
3. Getting Herschel Postcard
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This section also needs description - what is a postcard? Is it a "cutout" or "quicklook"? What does that mean? Is it usable for science? Again, this could be answered with a link to the right part of the archive, but it would be helpful to have descriptive text here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a brief description.

docs/hsa/hsa.rst Outdated
------------------------------------------

This function provides access to the Herschel Science Archive database using the Table Access Protocol (TAP) and via the Astronomical Data
Query Language (ADQL).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we link to the description of ADQL?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

docs/hsa/hsa.rst Outdated

>>> from astroquery.hsa import HSA
>>>
>>> HSA.query_hsa_tap("select top 10 observation_id from hsa.v_active_observation where contains(point('ICRS', hsa.v_active_observation.ra, hsa.v_active_observation.dec),circle('ICRS', 100.2417,9.895, 1.1))=1", output_format='csv', output_file='results.cvs')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should have a wrapper function implemented to do this, e.g.:

def query_observations(coordinate, radius, n_obs=10):
     ...

to match the general astroquery user API.

Copy link
Member Author

@lvalerom lvalerom Feb 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On it 👍

Copy link
Member

@bsipocz bsipocz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a mixture of comments, some are of things that need to be fixed, others are more advisory or nitpicky.

CHANGES.rst Outdated
hsa
^^^

- New module to access ESA Herschel mission. [#]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

needs the PR number

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -0,0 +1,29 @@
# Licensed under a 3-clause BSD style license - see LICENSE.rst
"""
HSA
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please move this module under the esa namespace

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

filename=None,
verbose=False,
cache=True,
**kwargs):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please list in the signature all the kwargs you added in the docstring. I also wonder what else could be hidden under **kwargs? If it's not a huge list I think it's better to list them all.

Also, minor preference, I think it's nicer not to list everything in a new line, but I'm ok with if it stays as is.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added the parameters in the docstring to the signature. I wouldn't add the rest of them because IMO there are a lot see

filename = os.path.splitext(filename)[0]

if retrieval_type is None:
retrieval_type = "OBSERVATION"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's better to define the defaults in the signature, I don't see any practical use here for setting it to None

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. I've changed it.

Comment on lines 91 to 92
link += "".join("&{0}={1}".format(key, val)
for key, val in kwargs.items())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These can be in one line, it's easier to read, and having ~100char long lines is totally fine

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

from .test_hsa import TestHSA


class TestHSARemote(TestHSA):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you need to subclass here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not needed anymore since we're not using the dummyTapHandler anymore.

'observation_id': obs_id,
'instrument_name': "PACS"}
expected_res = obs_id + ".tar"
hsa = HSAClass(self.get_dummy_tap_handler())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't exactly understand why you need the dummy here and below, these remote test should ideally test using direct connection to the archive, just as a user would do it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, I think I copy-pasted from the other tests and didn't change it. It's fixed now.

res = hsa.download_data(**parameters)
assert res == expected_res
assert os.path.isfile(res)
os.remove(res)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use a temp_dir type pytest fixture, as these files are not cleaned up in case there is a test failure. There are examples for this approach is a few other modules, e.g in esasky

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm using tempfile.TemporaryDirectory() now and I've added a download_dir parameter to all the functions.

else:
self._tap = tap_handler

def download_data(self, *, retrieval_type=None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

users may want to be able to set the download directory (e.g download_dir in esasky), but still let the filenames figured out automatically. Would be nice if this method and a few below would support it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added download_dir

@@ -0,0 +1,117 @@
# Licensed under a 3-clause BSD style license - see LICENSE.rst
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These remote tests are very slow (I have let them run for 30mins, 2 failed one success, and cancelled the remaining), I'm not exactly sure about the reasons. If possible please use obs_ids that belongs to smaller files, etc.
While these tests are not running in CI for pull requests, we run them weekly from CI cron, and also run them manually to check for regression, it's really important that they would run in a reasonably quick timescale.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've changed the observations that were being downloaded to smaller ones and I've also changed the query so it only downloads one product_level. Now the tests take ~10 seconds.

@bsipocz
Copy link
Member

bsipocz commented Feb 25, 2022

Oh, and also, please rebase both to resolve the minor conflicts with the changelog and the docs index page, and also to pick up the latest CI matrix. Thanks!

@bsipocz bsipocz added this to the v0.4.7 milestone Apr 5, 2022
@bsipocz
Copy link
Member

bsipocz commented Apr 13, 2022

@lvalerom - I've pushed commits to fix the docs build as well as the remote data doctests.

Copy link
Member

@bsipocz bsipocz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My only comment that would be great to be addressed in this PR is about the skipping of remote tests when the retries fail. I like that retry is built in, but I feel it would be better to not skip those tests, but I can be convinced otherwise if you would comment on the assumptions that went into the current setup.

The rest of my comments are nitpicks, that don't really need to be cleanup now, or point beyond this PR.


return filename

def query_hsa_tap(self, query, *, output_file=None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This points outside the scope of this PR, but I this it would make sense API wise it would be better to rename all these query_xyz_tap methods to query_tap. What do you guys, @lvalerom @jespinosaar, think?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes sense. I would change it in another PR along with the other missions though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that should definitely be done in a separate PR, as the other modules need to be renamed using proper deprecations.

if method == self._invokedMethod:
return
else:
raise ValueError("Method '{}' is not invoked. (Invoked method \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: no linebreaks are needed in these types of strings

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could be wrong but I think these ValueError were not properly formatted.

ValueError("Method '{}' is not invoked. (Invoked method \
                   is '{}'.)").format(method, self._invokedMethod)
# Should be
ValueError("Method '{}' is not invoked. (Invoked method "
                  "is '{}'.)".format(method, self._invokedMethod))
# Notice the parenthesis

I've noticed that this is happening in other missions. I can open a PR to fix this if you would like.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, a PR would be great. Also, these can be now changed to use f-strings, that way there are fewer parens, and a bit more easily legible message.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've fixed the formatting in this module. I'll open a different PR to address the other modules.

from pathlib import Path

from astropy import units as u
from ...utils import commons
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need for relative import here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

timeout = conf.TIMEOUT

def __init__(self, tap_handler=None):
super(HSAClass, self).__init__()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: we only support recent enough pythons, so this can be simplified to super().__init__(). A ton of the modules hasn't been cleaned up, so no need to do it in this PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed.

@@ -0,0 +1,14 @@
# Licensed under a 3-clause BSD style license - see LICENSE.rst
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are no data files for this module, so no need for this file either.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.

hsa = HSAClass()
res = self.access_archive_with_retries(hsa.download_data, parameters)
if res is None:
pytest.skip("Archive broke the connection {} times, unable to test".format(self.retries))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, and everywhere below: I wonder whether it would be more useful to let this case break the test? Otherwise it would be difficult to pick up if there is a genuine issue (or would you always expect a non None return?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I was testing these last time I noticed that the connection with the archive was very unstable and, at least for large files, a lot times the connection was broken by the archive. The reason why I decided to skip the tests if the archive is unresponsive is to avoid flakiness in the tests. We always expect a non None return from these functions (the filename) so it should be ok to check for a None value to determine if the connection was broken.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there any smaller files maybe to work with?

Also, if this stays as is, maybe xfail is more appropriate than skip?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • I agree, xfail suits better here. I've changed it.
  • Since the changes to make these tests faster I haven't noticed any connection problems. I would still keep the tests with xfail just in case (I don't think it hurts to have it 🙂)

@bsipocz
Copy link
Member

bsipocz commented Apr 15, 2022

I was going through @keflavich comments to see whether all got addressed and the only remaining one is about the lack of query_region.
I wonder whether in fact query_observations should/could be renamed, after all it takes a coordinate and a radius and returns a Table. Nothing else is specified in the API, e.g. those returned in the table can be either objects from a catalogue, but also observation ID as in this case.

On the other hand, I see that query_observations was in fact suggested by Adam. So I basically need you @keflavich to double-check whether this is in fact what you prefer. (note: while the naming is logical, we don't have query_observations method anywhere else)

Other than getting this clarified, the PR is ready for merge. Test coverage is up at 85% when the remote tests are run, so ignore the codecov complaint above.

@keflavich
Copy link
Contributor

OK, reviewing my comments, I did indeed suggest query_observation and query_region both.

I think query_observation makes sense as-is: it is just getting observation IDs.

I suggeset we add query_region that does nearly the same thing, but instead of

"select top {n_obs} observation_id from hsa.v_active_observation "

it should do the default TAP thing of:

"select top {n_obs} * from hsa.v_active_observation "

i.e., query_region should be a coordinate-based query that defaults to *. I'll add a suggested commit for this.

@keflavich
Copy link
Contributor

suggestion added in lvalerom#1

@bsipocz
Copy link
Member

bsipocz commented Apr 15, 2022

OK. So, In that case I would suggest we go ahead merge this and you open a PR against main, it would provide a cleaner branch structure, etc (I'm not able to merge the PR that is against the fork).

@bsipocz bsipocz merged commit 82136f4 into astropy:main Apr 15, 2022
@bsipocz
Copy link
Member

bsipocz commented Apr 15, 2022

Thanks @lvalerom! I'll open issues as reminders for the ideas during the review that were pointing beyond this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants