MAST download should efficiently fallback on existing downloads #2820

Open
pllim opened this issue Aug 29, 2023 · 6 comments

@pllim
Member

pllim commented Aug 29, 2023

Example use case:

from astroquery.mast import Observations

files = ['jw02727-o002_t062_nircam_clear-f090w_i2d.fits',
         'jw02727-o002_t062_nircam_clear-f277w_i2d.fits'
        ]

for fn in files:
    uri = f"mast:JWST/product/{fn}"
    result = Observations.download_file(uri, local_path=fn)

I already have cached copies of these files.

Problem: When MAST server performance is degraded, download_file takes a really long time to complete, when all I want is for it to grab the cached copy. Even when the request succeeds, it ends up just grabbing the cached copy anyway.

Workaround: I have to avoid using download_file altogether, hunt down where that file was downloaded to, and use it directly.

Desired solution: When the MAST server takes too long to respond (maybe the user can set this timeout?), automatically stop and grab the cached copy. Throw a warning if you like. Throw an error as before if no cached copy is found.
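
A rough sketch of that behaviour, for illustration only. The helper name `download_with_fallback` and its parameters are hypothetical and not part of astroquery; `download` stands for any callable that fetches `uri` into `local_path`, for instance a small wrapper around the `Observations.download_file` call above.

```python
import os
import threading
import warnings


def download_with_fallback(download, uri, local_path, timeout=10.0):
    """Try download(uri, local_path); reuse an existing file on timeout or error.

    Hypothetical helper, not astroquery API.
    """
    outcome = {}

    def worker():
        try:
            outcome["result"] = download(uri, local_path)
        except Exception as exc:  # remember the failure for the message below
            outcome["error"] = exc

    # Daemon thread, so a stalled request cannot keep the interpreter alive.
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout)  # wait at most `timeout` seconds

    if "result" in outcome:
        return outcome["result"]

    reason = outcome.get("error", f"no response within {timeout} s")
    if os.path.exists(local_path):
        # Warn and fall back to the copy that is already on disk.
        warnings.warn(f"Download of {uri} failed ({reason}); using existing {local_path}")
        return local_path
    # No local copy to fall back on: fail, as before.
    raise RuntimeError(f"Download of {uri} failed ({reason}) and no local copy exists")
```

One caveat with this approach: the abandoned thread is not cancelled, so a slow download may still finish in the background and write to `local_path` later.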

cc @bmorris3

Related:

@bsipocz
Member

bsipocz commented Aug 29, 2023

Let's not call it a "cached" copy; that is easily confused with the already existing cache, which is stored separately from the downloaded files.

@bsipocz changed the title from "MAST download should efficiently fallback on existing cache when server connection is degraded" to "MAST download should efficiently fallback on existing downloads" on Aug 29, 2023
@bsipocz
Member

bsipocz commented Aug 29, 2023

And I would say it should always fall back on the already existing downloads (unless opted out). If that's not the case, then this is clearly a bug.

@pllim
Member Author

pllim commented Aug 29, 2023

It does, but only after a looooong wait that I know is unnecessary.

Actually, I would also accept "just get the downloaded copy, don't connect to the internet at all" as a solution.

@bsipocz
Member

bsipocz commented Aug 29, 2023

"just get the downloaded copy, don't connect to internet at all"

Yes, and I think that should be the default. If no local copy is available, go to the internet. And have a keyword to opt out.
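
Something along these lines, as a minimal sketch. The function name `get_file`, the `force_download` keyword, and the status strings are made up for illustration and are not the actual astroquery or pyvo interface; `download` is any callable that fetches `uri` into `local_path`.

```python
import os


def get_file(download, uri, local_path, force_download=False):
    """Return (path, status), skipping the network when the file already exists.

    Hypothetical helper: `force_download=True` is the opt-out keyword.
    """
    if not force_download and os.path.isfile(local_path):
        # Local-first default: no connection is made at all.
        return local_path, "EXISTING"
    # No local copy (or the caller opted out): go to the server.
    download(uri, local_path)
    return local_path, "DOWNLOADED"
```

Note that this sketch trusts whatever is on disk; it does not check that the existing file is complete or up to date.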

@bsipocz
Member

bsipocz commented Aug 29, 2023

Btw, I'm refactoring the download utilities and upstreaming them to pyvo, so many parts of mast will need to be refactored to use that. (And this will be the default behaviour; I really don't want to see unnecessary repeated downloads.)

@aripollak
Member

aripollak commented Dec 27, 2023

Maybe an interim workaround for @pllim would be to lower the MAST timeout to something like 1? The default seems to be 600 seconds.

EDIT: Never mind, I see that downloading a file still requires the request to succeed, and I'm guessing a very low timeout would just cause an exception to be raised in that case rather than re-using the cache. I opened #2909 about that.
