-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
global: refactor downloader code #118
Conversation
|
||
|
||
class Downloader(object): | ||
"""Downloader class for managing pycurl and requests related utilities functions.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we go for the object-oriented way, what about bringing it further:
- make Downloader class abstract, with common methods for showing progress etc, and with abstract method download() that will get implemented in each subclass
- create DownloaderPycurl and DownloaderRequests classes implementing download() function using each concrete downloader
This will also leave space for creating new classes:
- DownloaderXRootD class, using xrootdpyfs; this is currently enough. But later we could have another class using vanilla xrootd client (as opposed to xrootdpyfs).
Note: OT1H, the common Downloader() class could take care of both HTTP and XRootD protocols, if it makes sense to have some common functions, such as progress reporting or argument massaging. OTOH, we could also make two distinct class hierarchies, DownloaderHttp (with its DownloaderHttpPycurl, DownloaderHttpRequests subclasses) and DownloaderXrootd, which could make separation cleaner.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The option with making Downloader
class abstract and implementing DownloaderPycurl
and DownloaderRequests
, DownloaderXRootD
looks good to me.
I am not sure how the progress or any other methods will get implemented in Xrootd
case so making distinct class hierarchies, DownloaderHttp (with its DownloaderHttpPycurl, DownloaderHttpRequests subclasses) and DownloaderXrootd also makes sense.
Bit confused here for choosing any of these. 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After MM discussion, proceeded with the second option.
cernopendata_client/cli.py
Outdated
@@ -243,6 +243,13 @@ def get_file_locations(server, recid, doi, title, protocol, expand, verbose): | |||
DOWNLOAD_RETRY_SLEEP | |||
), | |||
) | |||
@click.option( | |||
"--download-lib", | |||
"dl_lib", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dl_lib
is not very descriptive. What about keeping the same variable name as the CLI option name. This will make searchers etc easier.
--download-lib
would be better spelled --download-library
. But if we keep this option also for XRootD and any other future methods, what about simply calling it --downloader
? Or perhaps --downloader-engine
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
--downloader-engine
sounds good to me.
@@ -0,0 +1,115 @@ | |||
# -*- coding: utf-8 -*- | |||
# This file is part of cernopendata-client. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need a new file? The class is relatively small, and if we introduce subclasses, the code will be nicely separated in that way. So I guess it can stay in the same downloader.py
file. It would be simpler and more consistent with the other files present in the cernopendata-client package.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I created the new file just to have the class separated to just show and have an idea myself how it will look like, will move it to downloader.py
.
c.close() | ||
|
||
|
||
class DownloaderXrootd: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new XRootD downloader does not work locally for me, the troubles are already with checking local file existence before attempting the first download... Here's a simple recipe:
$ git clean -d -ff -x
$ $ cernopendata-client download-files --recid 5500 --protocol xrootd
==> Downloading file 1 of 11
Traceback (most recent call last):
File ".../cernopendata-client/bin/cernopendata-client", line 8, in <module>
sys.exit(cernopendata_client())
File ".../cernopendata-client/lib/python3.9/site-packages/click/core.py", line 829, in __call__
return self.main(*args, **kwargs)
File ".../cernopendata-client/lib/python3.9/site-packages/click/core.py", line 782, in main
rv = self.invoke(ctx)
File ".../cernopendata-client/lib/python3.9/site-packages/click/core.py", line 1259, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File ".../cernopendata-client/lib/python3.9/site-packages/click/core.py", line 1066, in invoke
return ctx.invoke(self.callback, **ctx.params)
File ".../cernopendata-client/lib/python3.9/site-packages/click/core.py", line 610, in invoke
return callback(*args, **kwargs)
File ".../cernopendata-client/lib/python3.9/site-packages/cernopendata_client/cli.py", line 354, in download_files
check_error(
File ".../cernopendata-client/lib/python3.9/site-packages/cernopendata_client/downloader.py", line 211, in check_error
"size": os.path.getsize(file_dest),
File "/usr/lib64/python3.9/genericpath.py", line 50, in getsize
return os.stat(filename).st_size
FileNotFoundError: [Errno 2] No such file or directory: '5500/BuildFile.xml'
@@ -123,6 +123,24 @@ def test_download_files_https_requests(mocker): | |||
os.remove(test_file) | |||
|
|||
|
|||
def test_download_files_download_engine(mocker): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see troubles with Python-2.7 locally. The importlib-resources
was installed but is not Python-2.7 friendly, since it uses f-strings. You need to pin some package versions in setup.py
conditionally, so that they would not be installed for Python-2.7. Here is a test case to try out locally:
$ tox -e py27
...
File "/home/tibor/private/project/opendata/src/cernopendata-client/.eggs/importlib_resources-0.0.0-py2.7.egg/importlib_resources/readers.py", line 77
raise FileNotFoundError(f'{self} is not a file')
^
...
File "/home/tibor/private/project/opendata/src/cernopendata-client/.tox/py27/lib/python2.7/site-packages/pkg_resources/__init__.py", line 786, in resolve
raise DistributionNotFound(req, requirers)
pkg_resources.DistributionNotFound: The 'importlib-resources>=1.0' distribution was not found and is required by virtualenv
ERROR: InvocationError for command /home/tibor/private/project/opendata/src/cernopendata-client/.tox/py27/bin/python setup.py test (exited with code 1)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tested the $ tox -e py27
in a python2.7 env
. It worked fine without warnings.
log.txt
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Solved: this was observed with OS native python3-tox
package. Using tox
from PyPI fixes the troubles.
cernopendata_client/cli.py
Outdated
"download_engine", | ||
default="requests", | ||
type=click.Choice(["requests", "pycurl"]), | ||
help="Library to be used in downloading files [requests,pycurl, xrootd] [default=requests]", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please replace "Library by "Engine" here and elsewhere, since the option is called "engine".
cernopendata_client/downloader.py
Outdated
msg="File {} is incomplete. Resuming download.".format( | ||
file_name, | ||
msg_type="error", | ||
msg="{} is not installed on system. Please install it".format( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that there is a traceback coming after the message. We should exit without showing traceback. Also, please put a full stop after the error message.
$ cernopendata-client download-files --recid 5500 --download-engine=pycurl
==> Downloading file 1 of 11
==> ERROR: pycurl is not installed on system. Please install it
Traceback (most recent call last):
File ".../cernopendata-client/bin/cernopendata-client", line 8, in <module>
sys.exit(cernopendata_client())
File ".../cernopendata-client/lib/python3.9/site-packages/click/core.py", line 829, in __call__
return self.main(*args, **kwargs)
File ".../cernopendata-client/lib/python3.9/site-packages/click/core.py", line 782, in main
rv = self.invoke(ctx)
File ".../cernopendata-client/lib/python3.9/site-packages/click/core.py", line 1259, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File ".../cernopendata-client/lib/python3.9/site-packages/click/core.py", line 1066, in invoke
return ctx.invoke(self.callback, **ctx.params)
File ".../cernopendata-client/lib/python3.9/site-packages/click/core.py", line 610, in invoke
return callback(*args, **kwargs)
File ".../cernopendata-client/lib/python3.9/site-packages/cernopendata_client/cli.py", line 348, in download_files
download_single_file(
File ".../cernopendata-client/lib/python3.9/site-packages/cernopendata_client/downloader.py", line 322, in download_single_file
downloader.file_downloader()
File ".../cernopendata-client/lib/python3.9/site-packages/cernopendata_client/downloader.py", line 132, in file_downloader
c = pycurl.Curl()
NameError: name 'pycurl' is not defined
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another observation on the same track: is protocol option being checked against download engine option?
$ cernopendata-client download-files --recid 5500 --protocol xrootd --download-engine requests
==> Downloading file 1 of 11
One would expect an error message saying that protocol xrootd and engine requests are not compatible.
Please add some checks and test cases for that.
@@ -243,6 +243,12 @@ def get_file_locations(server, recid, doi, title, protocol, expand, verbose): | |||
DOWNLOAD_RETRY_SLEEP | |||
), | |||
) | |||
@click.option( | |||
"--download-engine", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are using download-engine
but elsewhere in the code we have downloader-engine
:
tests/test_cli_download_files.py
127: """Test download_files() command with downloader-engine option."""
145: """Test download_files() command with downloader-engine option and wrong protocol."""
157: """Test download_files() command with downloader-engine option and wrong protocol."""
Let's standardise this.
BTW what about the simplest version downloader
so that people type --downloader pycurl
? Seems good to go nicely with class names...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking about downloader
initially but then I thought download-engine
sounds to be more specific.
cernopendata_client/cli.py
Outdated
"--download-engine", | ||
"download_engine", | ||
type=click.Choice(["requests", "pycurl", "xrootdpyfs"]), | ||
help="Engine to be used in downloading files [requests, pycurl, xrootdpyfs] [default=requests(for HTTP), xrootdpyfs(for XRootD)]", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Download engine to use when downloading files. The available values are 'requests', 'pycurl', 'xrootdpyfs'. [default=requests (for HTTP protocol), xrootdpyfs (for XRootD protocol)]
docs/usage.rst
Outdated
@@ -247,6 +247,22 @@ xrootd`` command-line option to use that protocol instead of HTTP/HTTPS: | |||
-> File: ./5500/mass4l_combine.png | |||
==> Success! | |||
|
|||
**Downloading engine** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
*Select downloader engine*
to go better with the other subtitles that use the verb style such as "Filter by name..."
docs/usage.rst
Outdated
|
||
You can specify the download engine with ``--download-engine`` option. | ||
|
||
- ``requests`` and ``pycurl`` are supported download engine for **HTTP** protocol. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are two supported download engines for
(plural)
docs/usage.rst
Outdated
You can specify the download engine with ``--download-engine`` option. | ||
|
||
- ``requests`` and ``pycurl`` are supported download engine for **HTTP** protocol. | ||
- ``xrootdpyfs`` is supported download engine for **XRootD** protocol. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is the only supported
-> File: ./5500/BuildFile.xml | ||
-> Progress: 0/0 KiB (100%) | ||
==> Success! | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove extra empty line
cernopendata_client/downloader.py
Outdated
if download_engine not in download_engine_protocol_http_map: | ||
display_message( | ||
msg_type="error", | ||
msg="{} is not compatible with {} protocol. Please use requests/pycurl engine.".format( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Please use requests/pycurl" could be misunderstood. Better say "Please use 'requests' or 'pycurl'."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, in order to have this extensible, perhaps you can have a config.py
variable which protocol supports which download engine options, and print the message dynamically from that config variable?
(E.g. this will come handy when we shall add vanilla 'xrdcp' as opposed to 'xrootdpyfs'.)
Codecov Report
@@ Coverage Diff @@
## master #118 +/- ##
==========================================
- Coverage 90.01% 86.81% -3.21%
==========================================
Files 11 11
Lines 631 675 +44
==========================================
+ Hits 568 586 +18
- Misses 63 89 +26
|
setup.py
Outdated
@@ -27,6 +27,7 @@ | |||
"pytest>=2.8.0", | |||
'pytest-mock>=2.0,<3.0 ; python_version=="2.7"', | |||
'pytest-mock>=3.0 ; python_version>="3"', | |||
"tox>=3.21.0", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually we cannot leave it like this, since tox -e py27 would like to install tox, as part of the run, and it seems to fail for me for py27... So let's leave it out for simplicity. Otherwise we would have to do the python_version
dance, but it is probably not worth it.
Adds `--download-engine` option to select pycurl, requests or xrootdpyfs engines when downloading files.
global: refactor downloader code
cli: add --download-lib option in download-files