Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Redo package download #234

Merged

Conversation

teutoburg
Copy link
Contributor

@teutoburg teutoburg commented Jun 21, 2023

This is basically a complete re-implementation of the IRDB package download functionality.

It's not completely done, mostly missing some merging of similar functions and, most importantly, unit test (test coverage is already at an acceptable level, but the tests could still be improved). But this should fix #223 as well as close AstarVienna/irdb#107 and address the original root cause of AstarVienna/irdb#105. Also the nightly notebook test runs should stop failing finally.

The plan is to see if it actually does fix these things and move on from there.

Renamed the one from urllib3 to HTTPError3 for clarity
Because we tried to catch the wrong HTTPError, this never evaluated.
…names

Add necessary typing stuff
Also fix get_server_folder_contents no actually using unique_str argument.
Also change get_server_folder_contents to return generator,
adapt accordingly wherever needed.
@codecov
Copy link

codecov bot commented Jun 21, 2023

Codecov Report

Patch coverage: 79.58% and project coverage change: +0.35 🎉

Comparison is base (da9286c) 75.21% compared to head (ff5f664) 75.57%.

Additional details and impacted files
@@              Coverage Diff               @@
##           dev_master     #234      +/-   ##
==============================================
+ Coverage       75.21%   75.57%   +0.35%     
==============================================
  Files             147      151       +4     
  Lines           15248    15501     +253     
==============================================
+ Hits            11469    11715     +246     
- Misses           3779     3786       +7     
Impacted Files Coverage Δ
scopesim/server/OLD_database.py 0.00% <0.00%> (ø)
scopesim/server/example_data_utils.py 37.87% <37.87%> (ø)
scopesim/server/github_utils.py 80.85% <80.85%> (ø)
scopesim/server/database.py 87.55% <86.50%> (+16.44%) ⬆️
scopesim/server/__init__.py 100.00% <100.00%> (ø)
scopesim/tests/tests_server/test_database.py 100.00% <100.00%> (ø)

... and 7 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

@hugobuddel
Copy link
Collaborator

A small note about the tests: I've added the webtest marker to indicate that a test requires the internet to run, by adding the @pytest.mark.webtest decorator. Could you also add that decorator?

AstarVienna/DevOps#6 splits up the CI: non-webtests would then be run on the full [linux, mac, windows] x [python 3.8, 3.9, 3.10, 3.11] matrix, but the webtests only on linux + latest python. The coverage would be performed on this webtest run.

@hugobuddel
Copy link
Collaborator

Oh, I noticed I've not yet added that webtest marker to ScopeSim, only to skycalc_ipy, so we can add them to ScopeSim later.

@teutoburg
Copy link
Contributor Author

I added the @pytest.mark.webtest decorator regardless. Doesn't seem to cause problems now, everything passes :)

Even if it currently has no effect, once we add the marker, this is one less module we need to care about.

@teutoburg teutoburg marked this pull request as ready for review June 21, 2023 19:21
Comment on lines 362 to 372
try:
tqdm_kwargs["desc"] = f"Downloading {pkg_name:<{padlen}}"
with tqdm.wrapattr(save_path.open("wb"), "write", miniters=1,
total=int(response.headers.get("content-length", 0)),
**tqdm_kwargs) as file:
for chunk in response.iter_content(chunk_size=chunk_size):
file.write(chunk)
except Exception as error:
raise error
finally:
file.close()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was surprised by this try/except/finally block, because "it shouldn't do anything due to the context manager". But after some investigation the finally tuns out to be necessary, because save_path.open() is not in a context manager.

It seems you could do this:

Suggested change
try:
tqdm_kwargs["desc"] = f"Downloading {pkg_name:<{padlen}}"
with tqdm.wrapattr(save_path.open("wb"), "write", miniters=1,
total=int(response.headers.get("content-length", 0)),
**tqdm_kwargs) as file:
for chunk in response.iter_content(chunk_size=chunk_size):
file.write(chunk)
except Exception as error:
raise error
finally:
file.close()
tqdm_kwargs["desc"] = f"Downloading {pkg_name:<{padlen}}"
total=int(response.headers.get("content-length", 0))
with (
save_path.open("wb") as file_outer,
tqdm.wrapattr(file_outer, "write", miniters=1, total=total, **tqdm_kwargs) as file_inner,
):
for chunk in response.iter_content(chunk_size=chunk_size):
file_inner.write(chunk)

which is 1 indentation level less, and 2 lines shorter (or 3 if you move the sad smiley to the previous line), but does behave the same, as both file_inner and file_outer are closed even if an exception occurs.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Testing code, in case you are interested:

from pathlib import Path
from tqdm import tqdm

try:
    save_path = Path('test.txt')
    with (
        save_path.open("w") as file_outer,
        tqdm.wrapattr(file_outer, "write", total=10, desc="hii") as file_inner
    ):
        for ii in range(10):
            file_inner.write(str(ii))
            assert ii < 5

except:
    print(f"{file_outer.closed=}")
    print(f"{file_inner.closed=}")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yes, the dark art of the double with-block! I originally ran into the problem that the zip file extraction failed, took me a few minutes to figure out the files were not properly released, so I added the close(). But your solution is clearly superior, and the way it should be done in Python.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems Python 3.8 doesn't understand this double with statement in parentheses. But it gets weirder: According to the docs, this is a new feature in Python 3.10. I'm on 3.9.16, but it works on my machine 🤓
Anyway, two nested with-blocks should work. Once we can drop support for 3.8 (or maybe 3.9?), this can be changed to the non-nested variant...

Comment on lines 39 to 45
width, _ = get_terminal_size((50, 20))
width = int(.8 * width)
bar_width = max(width - 50, 10)
tqdm_kwargs = {
"bar_format": f"{{l_bar}}{{bar:{bar_width}}}{{r_bar}}{{bar:-{bar_width}b}}",
"colour": "green"
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps put these in a function?

  • I frequently resize my terminal, which would now screw up the progress bars right?
  • If someone where to do from scopesim.server.database import *, they would now get width imported.
  • Now tqdm_kwargs is a global. If we would later parallelize the downloads, would that mess up the progress bars?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, you're right! I took these from a script in teutoburg/prepipy, where it doesn't really matter, because it's executed as a script, not imported as a module. But I totally agree that it's a different story here. Isolating this in a simple aux function returning tqdm_kwargs, which is called every time we need a progress bar, should solve this.

@hugobuddel
Copy link
Collaborator

It does seem to take a while before the download starts. That is,

scopesim.download_packages(['ELT'])

seems spends several seconds before it starts downloading (which is <1 second), so it seems slower than the original.

I suppose it is doing several requests to retrieve the equivalent of packages.yaml?

This shouldn't be a dealbreaker though, as the the packages.yaml introduced bugs.

Copy link
Collaborator

@hugobuddel hugobuddel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work @teutoburg ! Soo much better now, and I learned something.

The comments I made should not stop you from merging this.

@hugobuddel
Copy link
Collaborator

(I sent the comments one-by-one instead of in a review, because I wasn't sure when my battery would die, so you would at least have something. At 60% now, usually it drops to 0% already earlier, so I guess this is it!)

There is now some uptake before the actual download progress bar shows.
While this could be optimized a bit probably, it's good to inform the
user that there is actually something going on in the background.

Using plain print instead of logging, because progress bar prints anyway.
@teutoburg teutoburg merged commit 7ae144b into AstarVienna:dev_master Jun 22, 2023
7 checks passed
@teutoburg teutoburg deleted the 223-some-issues-with-package-download branch June 22, 2023 11:05
@hugobuddel hugobuddel mentioned this pull request Jul 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants