List of mirror URLs for IERS download #20

Open
lpsinger opened this issue Jul 11, 2019 · 28 comments
Labels
enhancement New feature or request

Comments

@lpsinger

lpsinger commented Jul 11, 2019

Transferred from astropy/astropy

Both of the USNO hosts (maia.usno.navy.mil, toshi.nofs.navy.mil) for retrieving the IERS Bulletin A dataset are flaky. Instead of a primary URL specified by the configuration setting utils.iers.iers_auto_url and mirror URL by utils.iers.iers_auto_url_mirror, I suggest a single list of URLs.

Here is what I propose for backward compatibility:

  • Understand both iers_auto_url and iers_auto_url_mirror to be either a string or a list of strings.
  • Construct the list of URLs to try by concatenating all of the URLs from iers_auto_url and iers_auto_url_mirror and removing duplicates.
  • Deprecate the iers_auto_url_mirror configuration setting for removal in a future release.
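
The merging rule in the list above could be sketched as follows. This is only an illustration of the proposal, not astropy's implementation; `build_url_list` is a hypothetical name, and the `mirror.example.org` host is a placeholder.

```python
def build_url_list(iers_auto_url, iers_auto_url_mirror=()):
    """Merge two settings, each a string or a list of strings, into one
    ordered list of URLs with duplicates removed (first occurrence wins)."""

    def as_list(value):
        # A bare string is treated as a one-element list.
        if isinstance(value, str):
            return [value]
        return list(value)

    merged = as_list(iers_auto_url) + as_list(iers_auto_url_mirror)
    # dict keys keep insertion order, so later duplicates are dropped
    # while the priority order of the remaining URLs is preserved.
    return list(dict.fromkeys(merged))


urls = build_url_list(
    "https://maia.usno.navy.mil/ser7/finals2000A.all",
    [
        "https://mirror.example.org/finals2000A.all",  # placeholder mirror
        "https://maia.usno.navy.mil/ser7/finals2000A.all",  # duplicate
    ],
)
print(urls)
```

Because both settings accept either form, existing configurations that hold a single string would keep working unchanged.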

Here is a possible default URL list:

pllim added the enhancement label Jul 11, 2019
@pllim
Member

pllim commented Jul 11, 2019

Pinging @taldcroft, @mhvk, @eteq, and @adrn since iers is primarily used in table and coordinates.

Oopsie... turns out I did not have enough foresight when I implemented the mirror in astropy/astropy#8308 😬

@mhvk
Contributor

mhvk commented Jul 12, 2019

Seems like a good idea. My only question is whether we are already doing something similar elsewhere, just to be sure we don't end up with two different mechanisms for the same thing (but the suggested scheme sounds good).

@rtolesnikov

At least as of today, https://maia.usno.navy.mil/ser7/finals2000A.all is required for access. http:// (without the 's') no longer responds.

@pllim
Member

pllim commented Jul 16, 2019

@lpsinger, is it less flaky now with the latest astropy dev, after astropy/astropy#8993 was merged?

@lpsinger
Author

lpsinger commented Aug 8, 2019

@pllim, no, it's still really flaky. Our project (growth-astro/growth-too-marshal) has to download the IERS files in its CI tests, and even with the https URL, it's failing more often than it is succeeding. I think that we really do need a list of fallback URLs. Also, astroplan needs to be able to use the fallbacks.

@wtgee

wtgee commented Aug 18, 2019

Chiming in here as well (in addition to astropy/astroplan#356 (comment)). We have no problem providing our own mirror, mitigating the flakiness issue, but there still seem to be some consistency issues with how it is detected by the library. This might be specific to astroplan itself rather than astropy, but wanted to at least link the issues.

@lpsinger
Author

I don't think that you should provide your own mirror unless you can guarantee that you can update it whenever a new copy is issued.

@mhvk
Contributor

mhvk commented Aug 19, 2019

Agreed with @lpsinger. I am also not sure it is really necessary to download IERS-A in every CI run: it is a hefty file, and why test someone else's server (or add unnecessary pressure to it)? It seems to me that for testing one's own code, it is much more important to check that a given state leads to the right outcome.

Anyway, I'm still in favour of the simple solution proposed at the top, of just making a list of URLs, starting with just those sites that already claim to provide up-to-date copies.

@lpsinger
Author

The issue with Astroplan is that it directly checks for the existence of the file in the cache, and the filename in the cache would depend on which URL is successfully downloaded.

Can Astropy instead manage this?

@wtgee

wtgee commented Aug 19, 2019

> I don't think that you should provide your own mirror unless you can guarantee that you can update it whenever a new copy is issued.

The issue for us is described in astropy/astroplan#410, namely that the military domains are inaccessible from certain countries and the ftp download doesn't work with the astropy setup (because it redirects to ftps). The URL https://datacenter.iers.org/data/9/finals2000A.all should work, but this limits us to only one server, and as far as I can tell all of the URLs listed above are hosted on single-server instances that have proved to be far from reliable.

It is trivial to host a 3MB file in a cloud-based CDN and it is trivial to set up something that keeps that file up to date (indeed, it took me five minutes using a combination of Google Cloud Storage, Cloud Functions, and Cloud Scheduler).
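
A scheduled refresh job of that kind might look roughly like the sketch below. It is not the Cloud Functions setup described above: it uses only the Python standard library, and `download`, `refresh_mirror`, and the choice of upstream URL are illustrative assumptions.

```python
import os
import tempfile
import urllib.request

# Assumed upstream; any of the official locations could be used instead.
UPSTREAM = "https://datacenter.iers.org/data/9/finals2000A.all"


def download(url, timeout=30):
    """Return the raw bytes served at url."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def refresh_mirror(dest, url=UPSTREAM, fetch=download):
    """Replace dest with a fresh copy of url; return True if it changed."""
    data = fetch(url)
    if not data:
        raise ValueError("refusing to publish an empty file")
    if os.path.exists(dest):
        with open(dest, "rb") as existing:
            if existing.read() == data:
                return False  # already up to date
    # Write to a temporary file in the same directory, then rename it,
    # so readers never see a partially written copy.
    directory = os.path.dirname(os.path.abspath(dest))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
        os.replace(tmp_path, dest)
    except BaseException:
        os.unlink(tmp_path)
        raise
    return True
```

Calling `refresh_mirror(path)` from cron or a cloud scheduler would be enough for a single-file mirror. Whether the published copy stays current still depends on how often the job runs, which is the concern raised earlier in the thread.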

> The issue with Astroplan is that it directly checks for the existence of the file in the cache, and the filename in the cache would depend on which URL is successfully downloaded.

Yes, this is our bigger issue, along with the lack of a reliable global override of iers.IERS_A_URL. But perhaps I'm missing something?

@eteq
Member

eteq commented Aug 20, 2019

I'm also 👍 to the list of URLs solution as in the OP and @mhvk's comment.

@wtgee

> It is trivial to host a 3MB file in a cloud-based CDN and it is trivial to set up something that keeps that file up to date

In principle I agree but logistically there's a lot more to it if it's supposed to be permanent. "Easy right now" is not the same thing as "easy to keep running for the next 10 years". After all, that explicitly is the purpose of the IERS itself (the service, not the file)!

That said, combined with the list of URLs this isn't so bad, because it becomes one of several options and thus adds some fault tolerance.

@wtgee

wtgee commented Aug 20, 2019

@eteq agreed about the long-term viability. The real hope would be to host this as a public Google dataset. However, I'm not sure that's in keeping with the nature of Google public datasets, as the data doesn't seem entirely useful by itself. Tagging @mimming for input in that regard.

@wtgee

wtgee commented Aug 21, 2019

While I think this PR is good, it doesn't help my problem as the last two urls are invalid and the top two urls are still military domains.

How do I actually change the value of astropy.utils.iers.conf.iers_auto_url?

I've tried to follow the guides in the page on the config system but there is nothing about IERS in there and I can't seem to enter a value in any single file that will persist. The documentation is far from clear. Sorry for posting here, but we don't really need more IERS issues/PRs floating around.

@pllim
Member

pllim commented Aug 21, 2019

@wtgee, before I propose a solution for you, how do you envision you want to customize your URL? Is it per session? Are you okay with modifying ~/.astropy/config/astropy.cfg manually? I can think of at least two different ways.

Addendum: IERS_A can be changed using the config system, but not IERS_B. I don't know why.

@wtgee

wtgee commented Aug 21, 2019

> @wtgee, before I propose a solution for you, how do you envision you want to customize your URL? Is it per session? Are you okay with modifying ~/.astropy/config/astropy.cfg manually? I can think of at least two different ways.

This is to be used on some telescopes that run in a mostly automated fashion (https://projectpanoptes.org). We control the full install of the machine and the software itself is running on some docker images, so I have no problems modifying any of the set config files. We do mostly have a single session (i.e. control daemon) that is running the main telescope, but various other scripts also need to access this data (e.g. weather plotting, some processing, etc) separate from the main control. I can easily have a cron job that is running to update the entire system every week (or whatever). I just don't want to have to manually specify the url in every single script that imports astropy (mostly astroplan).

This could also be related to @lpsinger's comment about astroplan: astropy/astroplan#356 (comment), which @bmorris3 is aware of.

As mentioned, our other issue is that we already have placements in a few countries that cannot access US military domains, so the primary and mirror URLs are a no-go for us (although this will be a minority of installs in the long run). As also mentioned, we have no problem setting up our own mirror of the data. The issue is just getting this to work globally in the software in a consistent fashion (as below).

> Addendum: IERS_A can be changed using the config system, but not IERS_B. I don't know why.

I don't get how to do this permanently. I tried modifying the config file in various places but nothing seemed to persist.

Thanks!

@pllim
Member

pllim commented Aug 22, 2019

@wtgee, what version of astropy are you locked into? Can you deploy with the dev version, or does it have to be a released version? You are using Python 3, right? What version are you running right now?

@pllim
Member

pllim commented Aug 22, 2019

Custom IERS A URLs per-session example

Disclaimer: The "mirror" URLs here are not real mirrors. Do not use them for production.

>>> from astropy.utils import iers
>>> from astropy.utils.iers import conf as iers_conf
>>> iers_conf.iers_auto_url
'https://maia.usno.navy.mil/ser7/finals2000A.all'
>>> iers_conf.iers_auto_url = 'https://astroconda.org/aux/astropy_mirror/iers_a_1/finals2000A.all'
>>> iers_conf.iers_auto_url_mirror = 'https://astroconda.org/aux/astropy_mirror/iers_a_2/finals2000A.all'
>>> table = iers.IERS_Auto.open()  # Note the URL
Downloading https://astroconda.org/aux/astropy_mirror/iers_a_1/finals2000A.all

@pllim
Member

pllim commented Aug 22, 2019

Custom IERS A URLs from config file example

Disclaimer: The "mirror" URLs here are not real mirrors. Do not use them for production.

To test this properly, you may want to clear your cache first. The cache is in ~/.astropy/cache by default.

In your ~/.astropy/config/astropy.cfg (or wherever you have configured astropy to store it), add this:

[utils.iers.iers]
iers_auto_url = https://astroconda.org/aux/astropy_mirror/iers_a_1/finals2000A.all
iers_auto_url_mirror = https://astroconda.org/aux/astropy_mirror/iers_a_2/finals2000A.all

Then start a fresh session after modifying that file, or do the following to force a reload:

>>> from astropy.utils.iers import conf as iers_conf
>>> iers_conf.iers_auto_url
'https://maia.usno.navy.mil/ser7/finals2000A.all'
>>> iers_conf.reload()  # re-read the settings from the config file
>>> iers_conf.iers_auto_url
'https://astroconda.org/aux/astropy_mirror/iers_a_1/finals2000A.all'

Now table = iers.IERS_Auto.open() would read from your mirrors.
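
For scripted deployments (container images, for instance), the same section could be written programmatically. The sketch below uses only the standard library's `configparser`; `write_iers_urls` and the path passed to it are hypothetical, the section name follows the example above, and it assumes the target file is either absent or parseable by `configparser` (which does not preserve comments).

```python
import configparser
from pathlib import Path


def write_iers_urls(cfg_path, primary, mirror, section="utils.iers.iers"):
    """Set the IERS-A primary and mirror URLs in an astropy-style cfg file."""
    cfg_path = Path(cfg_path)
    # interpolation=None keeps any '%' in a URL from being interpreted.
    parser = configparser.ConfigParser(interpolation=None)
    if cfg_path.exists():
        parser.read(cfg_path)  # keep other settings (comments are lost)
    if not parser.has_section(section):
        parser.add_section(section)
    parser.set(section, "iers_auto_url", primary)
    parser.set(section, "iers_auto_url_mirror", mirror)
    cfg_path.parent.mkdir(parents=True, exist_ok=True)
    with cfg_path.open("w") as handle:
        parser.write(handle)
```

A build step could then call, for example, `write_iers_urls(Path.home() / ".astropy" / "config" / "astropy.cfg", primary_url, mirror_url)` once, so that individual scripts do not need to set the URLs themselves.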

@pllim
Member

pllim commented Aug 22, 2019

@lpsinger et al., as a stop-gap solution that can be backported, how about changing the primary IERS_A_URL to point to https://datacenter.iers.org/data/9/finals2000A.all? That one is a civilian domain accessible from the affected countries?

@wtgee

wtgee commented Aug 22, 2019

> @lpsinger et al., as a stop-gap solution that can be backported, how about changing the primary IERS_A_URL to point to https://datacenter.iers.org/data/9/finals2000A.all? That one is a civilian domain accessible from the affected countries?

While I'm all about this in theory, that particular domain isn't working because of the https redirect. Following your example and setting both URLs to that address to make sure it is used:

>>> from astropy.utils import iers
>>> from astropy.utils.iers import conf as iers_conf
>>> iers_conf.iers_auto_url
'https://datacenter.iers.org/data/9/finals2000A.all'
>>> iers_conf.iers_auto_url_mirror
'https://datacenter.iers.org/data/9/finals2000A.all'
>>> table = iers.IERS_Auto.open()
WARNING: failed to download https://datacenter.iers.org/data/9/finals2000A.all and https://datacenter.iers.org/data/9/finals2000A.all, using local IERS-B: HTTP Error 403: Forbidden;HTTP Error 403: Forbidden [astropy.utils.iers.iers]
>>> table  # Note it is IERS_B
<IERS_B length=20405>

Edit: just a note that the above was not run from one of the countries where the military domains are restricted.

The NASA ftp site used to have the same issues with an ftps redirect but that does appear to be working for me now.

@wtgee

wtgee commented Aug 22, 2019

> Custom IERS A URLs from config file example

This works great (with my custom mirrors), thanks for both examples. FWIW, I just had [utils.iers] in the config file. In hindsight the solution seems obvious but good to have it spelled out.

The remaining issues are astroplan specific and should be solved by @lpsinger's astropy/astroplan#418

@pllim
Member

pllim commented Aug 23, 2019

Great! Until someone has time to work on refactoring for List O' Mirrors support, at least this is not blocking your work anymore. 🤞

@aarchiba

astropy/astropy#9182 allows you to call download_file with remote_url set to the "authoritative" location of the data, and whatever is downloaded will be stored in the cache under this URL. But you can provide a list of URLs from which that data can actually be obtained, in order, and it need not include the "authoritative" location. Does this solve your problem?
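
The idea can be illustrated without astropy. The sketch below is not the code from astropy/astropy#9182; `fetch_with_fallback`, the in-memory `cache` dict, and the `fake_fetch` callable are hypothetical stand-ins that only show the lookup order and the choice of cache key.

```python
def fetch_with_fallback(remote_url, sources, fetch, cache):
    """Return the data for remote_url, trying each URL in sources in order.

    The result is cached under remote_url, the "authoritative" location,
    regardless of which source actually delivered it.
    """
    if remote_url in cache:
        return cache[remote_url]
    errors = []
    for url in sources:
        try:
            data = fetch(url)
        except OSError as exc:  # urllib's URLError and HTTPError are OSErrors
            errors.append(f"{url}: {exc}")
            continue
        cache[remote_url] = data
        return data
    raise OSError("all sources failed: " + "; ".join(errors))


def fake_fetch(url):
    # Stand-in for a real download; the first host is treated as "down".
    if "primary" in url:
        raise OSError("HTTP Error 403: Forbidden")
    return b"IERS-A data"


AUTHORITATIVE = "https://primary.example.org/finals2000A.all"
cache = {}
data = fetch_with_fallback(
    AUTHORITATIVE,
    [AUTHORITATIVE, "https://mirror.example.org/finals2000A.all"],
    fake_fetch,
    cache,
)
# The entry is keyed by the authoritative URL even though the mirror
# served the data, so a caller that checks for that key finds it.
print(AUTHORITATIVE in cache)
```

Keying the cache this way is what would let a downstream package check for one fixed URL in the cache, whichever mirror happened to succeed.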

It would be possible, in principle, to add a mechanism in the config file where users could specify translations for URLs - that is, a user who knew they couldn't, or might not be able to, access a certain URL could provide a list of backup URLs, and these would be used even for download calls internal to astropy.
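
Such a translation table might be consulted like this. No such setting is defined anywhere in this discussion; `URL_TRANSLATIONS` and `candidate_urls` are hypothetical names, and the `mirror.example.org` host is a placeholder.

```python
# Hypothetical user-supplied table: requested URL -> URLs to try instead.
URL_TRANSLATIONS = {
    "https://maia.usno.navy.mil/ser7/finals2000A.all": [
        "https://datacenter.iers.org/data/9/finals2000A.all",
        "https://mirror.example.org/finals2000A.all",
    ],
}


def candidate_urls(requested_url, translations=URL_TRANSLATIONS):
    """Return the URLs to try for requested_url, in order.

    URLs without an entry in the table are tried as-is.
    """
    return list(translations.get(requested_url, [requested_url]))


print(candidate_urls("https://maia.usno.navy.mil/ser7/finals2000A.all"))
```

Because the lookup happens on the requested URL, it would apply equally to downloads triggered inside the library and to explicit calls from user code.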

@bhazelton

I'm not sure if this is the right thread, but there's a big red warning on both http://maia.usno.navy.mil/ and http://toshi.nofs.navy.mil/ saying that they will be down for maintenance from Oct. 24 2019 until April 30 2020 (6 months!). Is the community already aware of this?

@bsipocz
Member

bsipocz commented Oct 21, 2019

@bhazelton - It was briefly mentioned in a Slack thread, and we're asking about USNO recommendations, but it definitely deserves its own issue. Would you be interested in opening one?

@bhazelton

Sure, I wrote it up in astropy/astropy#9427. Feel free to add anything I missed.

@pllim
Member

pllim commented Oct 27, 2023

Should we move this to https://github.com/astropy/astropy-iers-data ?

@mhvk
Contributor

mhvk commented Oct 28, 2023

Perhaps, though if we add something, it likely still has to be documented in astropy.utils.iers

pllim transferred this issue from astropy/astropy Oct 28, 2023