Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

global: refactor downloader code #118

Merged
merged 1 commit into from
Jan 12, 2021
Merged

global: refactor downloader code #118

merged 1 commit into from
Jan 12, 2021

Conversation

ParthS007
Copy link
Member

@ParthS007 ParthS007 commented Jan 5, 2021

global: refactor downloader code
cli: add --download-lib option in download-files



class Downloader(object):
"""Downloader class for managing pycurl and requests related utilities functions."""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we go for the object-oriented way, what about bringing it further:

  • make Downloader class abstract, with common methods for showing progress etc, and with abstract method download() that will get implemented in each subclass
  • create DownloaderPycurl and DownloaderRequests classes implementing download() function using each concrete downloader

This will also leave space for creating new classes:

  • DownloaderXRootD class, using xrootdpyfs; this is currently enough. But later we could have another class using vanilla xrootd client (as opposed to xrootdpyfs).

Note: OT1H, the common Downloader() class could take care of both HTTP and XRootD protocols, if it makes sense to have some common functions, such as progress reporting or argument massaging. OTOH, we could also make two distinct class hierarchies, DownloaderHttp (with its DownloaderHttpPycurl, DownloaderHttpRequests subclasses) and DownloaderXrootd, which could make separation cleaner.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The option with making Downloader class abstract and implementing DownloaderPycurl and DownloaderRequests, DownloaderXRootD looks good to me.

I am not sure how the progress or any other methods will get implemented in Xrootd case so making distinct class hierarchies, DownloaderHttp (with its DownloaderHttpPycurl, DownloaderHttpRequests subclasses) and DownloaderXrootd also makes sense.

Bit confused here for choosing any of these. 😄

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After MM discussion, proceeded with the second option.

@@ -243,6 +243,13 @@ def get_file_locations(server, recid, doi, title, protocol, expand, verbose):
DOWNLOAD_RETRY_SLEEP
),
)
@click.option(
"--download-lib",
"dl_lib",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dl_lib is not very descriptive. What about keeping the same variable name as the CLI option name. This will make searchers etc easier.

--download-lib would be better spelled --download-library. But if we keep this option also for XRootD and any other future methods, what about simply calling it --downloader? Or perhaps --downloader-engine?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

--downloader-engine sounds good to me.

@@ -0,0 +1,115 @@
# -*- coding: utf-8 -*-
# This file is part of cernopendata-client.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need a new file? The class is relatively small, and if we introduce subclasses, the code will be nicely separated in that way. So I guess it can stay in the same downloader.py file. It would be simpler and more consistent with the other files present in the cernopendata-client package.

Copy link
Member Author

@ParthS007 ParthS007 Jan 6, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I created the new file just to have the class separated to just show and have an idea myself how it will look like, will move it to downloader.py.

@ParthS007 ParthS007 changed the title global: refactor downloader code for pycurl and requests global: refactor downloader code Jan 11, 2021
c.close()


class DownloaderXrootd:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new XRootD downloader does not work locally for me, the troubles are already with checking local file existence before attempting the first download... Here's a simple recipe:

$ git clean -d -ff -x
$ $ cernopendata-client download-files --recid 5500  --protocol xrootd
==> Downloading file 1 of 11
Traceback (most recent call last):
  File ".../cernopendata-client/bin/cernopendata-client", line 8, in <module>
    sys.exit(cernopendata_client())
  File ".../cernopendata-client/lib/python3.9/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File ".../cernopendata-client/lib/python3.9/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File ".../cernopendata-client/lib/python3.9/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File ".../cernopendata-client/lib/python3.9/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File ".../cernopendata-client/lib/python3.9/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File ".../cernopendata-client/lib/python3.9/site-packages/cernopendata_client/cli.py", line 354, in download_files
    check_error(
  File ".../cernopendata-client/lib/python3.9/site-packages/cernopendata_client/downloader.py", line 211, in check_error
    "size": os.path.getsize(file_dest),
  File "/usr/lib64/python3.9/genericpath.py", line 50, in getsize
    return os.stat(filename).st_size
FileNotFoundError: [Errno 2] No such file or directory: '5500/BuildFile.xml'

@@ -123,6 +123,24 @@ def test_download_files_https_requests(mocker):
os.remove(test_file)


def test_download_files_download_engine(mocker):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see troubles with Python-2.7 locally. The importlib-resources was installed but is not Python-2.7 friendly, since it uses f-strings. You need to pin some package versions in setup.py conditionally, so that they would not be installed for Python-2.7. Here is a test case to try out locally:

$ tox -e py27
...
  File "/home/tibor/private/project/opendata/src/cernopendata-client/.eggs/importlib_resources-0.0.0-py2.7.egg/importlib_resources/readers.py", line 77
    raise FileNotFoundError(f'{self} is not a file')
                                                  ^
...
  File "/home/tibor/private/project/opendata/src/cernopendata-client/.tox/py27/lib/python2.7/site-packages/pkg_resources/__init__.py", line 786, in resolve
    raise DistributionNotFound(req, requirers)
pkg_resources.DistributionNotFound: The 'importlib-resources>=1.0' distribution was not found and is required by virtualenv
ERROR: InvocationError for command /home/tibor/private/project/opendata/src/cernopendata-client/.tox/py27/bin/python setup.py test (exited with code 1)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested the $ tox -e py27 in a python2.7 env. It worked fine without warnings.
log.txt

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Solved: this was observed with OS native python3-tox package. Using tox from PyPI fixes the troubles.

"download_engine",
default="requests",
type=click.Choice(["requests", "pycurl"]),
help="Library to be used in downloading files [requests,pycurl, xrootd] [default=requests]",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please replace "Library by "Engine" here and elsewhere, since the option is called "engine".

msg="File {} is incomplete. Resuming download.".format(
file_name,
msg_type="error",
msg="{} is not installed on system. Please install it".format(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that there is a traceback coming after the message. We should exit without showing traceback. Also, please put a full stop after the error message.

$ cernopendata-client download-files --recid 5500 --download-engine=pycurl
==> Downloading file 1 of 11
==> ERROR: pycurl is not installed on system. Please install it
Traceback (most recent call last):
  File ".../cernopendata-client/bin/cernopendata-client", line 8, in <module>
    sys.exit(cernopendata_client())
  File ".../cernopendata-client/lib/python3.9/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File ".../cernopendata-client/lib/python3.9/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File ".../cernopendata-client/lib/python3.9/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File ".../cernopendata-client/lib/python3.9/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File ".../cernopendata-client/lib/python3.9/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File ".../cernopendata-client/lib/python3.9/site-packages/cernopendata_client/cli.py", line 348, in download_files
    download_single_file(
  File ".../cernopendata-client/lib/python3.9/site-packages/cernopendata_client/downloader.py", line 322, in download_single_file
    downloader.file_downloader()
  File ".../cernopendata-client/lib/python3.9/site-packages/cernopendata_client/downloader.py", line 132, in file_downloader
    c = pycurl.Curl()
NameError: name 'pycurl' is not defined

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another observation on the same track: is protocol option being checked against download engine option?

$ cernopendata-client download-files --recid 5500 --protocol xrootd --download-engine requests
==> Downloading file 1 of 11

One would expect an error message saying that protocol xrootd and engine requests are not compatible.

Please add some checks and test cases for that.

@@ -243,6 +243,12 @@ def get_file_locations(server, recid, doi, title, protocol, expand, verbose):
DOWNLOAD_RETRY_SLEEP
),
)
@click.option(
"--download-engine",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are using download-engine but elsewhere in the code we have downloader-engine:

tests/test_cli_download_files.py
127:    """Test download_files() command with downloader-engine option."""
145:    """Test download_files() command with downloader-engine option and wrong protocol."""
157:    """Test download_files() command with downloader-engine option and wrong protocol."""

Let's standardise this.

BTW what about the simplest version downloader so that people type --downloader pycurl? Seems good to go nicely with class names...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about downloader initially but then I thought download-engine sounds to be more specific.

"--download-engine",
"download_engine",
type=click.Choice(["requests", "pycurl", "xrootdpyfs"]),
help="Engine to be used in downloading files [requests, pycurl, xrootdpyfs] [default=requests(for HTTP), xrootdpyfs(for XRootD)]",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Download engine to use when downloading files. The available values are 'requests', 'pycurl', 'xrootdpyfs'. [default=requests (for HTTP protocol), xrootdpyfs (for XRootD protocol)]

docs/usage.rst Outdated
@@ -247,6 +247,22 @@ xrootd`` command-line option to use that protocol instead of HTTP/HTTPS:
-> File: ./5500/mass4l_combine.png
==> Success!

**Downloading engine**
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

*Select downloader engine*

to go better with the other subtitles that use the verb style such as "Filter by name..."

docs/usage.rst Outdated

You can specify the download engine with ``--download-engine`` option.

- ``requests`` and ``pycurl`` are supported download engine for **HTTP** protocol.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are two supported download engines for

(plural)

docs/usage.rst Outdated
You can specify the download engine with ``--download-engine`` option.

- ``requests`` and ``pycurl`` are supported download engine for **HTTP** protocol.
- ``xrootdpyfs`` is supported download engine for **XRootD** protocol.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is the only supported

-> File: ./5500/BuildFile.xml
-> Progress: 0/0 KiB (100%)
==> Success!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove extra empty line

if download_engine not in download_engine_protocol_http_map:
display_message(
msg_type="error",
msg="{} is not compatible with {} protocol. Please use requests/pycurl engine.".format(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Please use requests/pycurl" could be misunderstood. Better say "Please use 'requests' or 'pycurl'."

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, in order to have this extensible, perhaps you can have a config.py variable which protocol supports which download engine options, and print the message dynamically from that config variable?

(E.g. this will come handy when we shall add vanilla 'xrdcp' as opposed to 'xrootdpyfs'.)

@codecov-io
Copy link

codecov-io commented Jan 12, 2021

Codecov Report

Merging #118 (f4c687c) into master (fb70a46) will decrease coverage by 3.20%.
The diff coverage is 66.93%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #118      +/-   ##
==========================================
- Coverage   90.01%   86.81%   -3.21%     
==========================================
  Files          11       11              
  Lines         631      675      +44     
==========================================
+ Hits          568      586      +18     
- Misses         63       89      +26     
Impacted Files Coverage Δ
cernopendata_client/downloader.py 68.42% <64.03%> (-9.51%) ⬇️
cernopendata_client/cli.py 96.91% <100.00%> (+0.11%) ⬆️
cernopendata_client/config.py 100.00% <100.00%> (ø)

setup.py Outdated
@@ -27,6 +27,7 @@
"pytest>=2.8.0",
'pytest-mock>=2.0,<3.0 ; python_version=="2.7"',
'pytest-mock>=3.0 ; python_version>="3"',
"tox>=3.21.0",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually we cannot leave it like this, since tox -e py27 would like to install tox, as part of the run, and it seems to fail for me for py27... So let's leave it out for simplicity. Otherwise we would have to do the python_version dance, but it is probably not worth it.

Adds `--download-engine` option to select pycurl, requests or xrootdpyfs engines when downloading files.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants