-
Notifications
You must be signed in to change notification settings - Fork 50
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Updated bosh pull to allow multiple zenodo ids to be pulled at once #438
Conversation
- Added tests to see if pulling multiple items worked - Added a test to show that if you duplicate a searched-for zenodo id it collapses it and gives results as if you did not send duplicates - Added a test to show that if only some of the zenodo ids were valid then it would raise an exception on the first invalid zenodo id found - Switched the mocks in test_pull to use `side_effects` because it's cooler and better, and I can more readily mock multiple different types of response - added verbose feedback about squashing duplicate zenodo ids - changed any places using pull to pass a list of ids and to expect a list as a response - ParseArgs kindly takes care of passing no zenodo ids, but there is no test for this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This all looks great to me! @erinb90, would you have time to look at it before we merge?
self.cached_fname = os.path.join(self.cache_dir, | ||
"zenodo-{0}.json".format(self.zid)) | ||
discarded_zids = zids | ||
zids = list(dict.fromkeys(zids)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is meant to remove duplicates, then perhaps add a comment for it.
tools/python/boutiques/puller.py
Outdated
+ downloaded[0]) | ||
json_files.append(downloaded[0]) | ||
else: | ||
raise_error(ZenodoError, "Seached-for descriptor \"{0}\" " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo: Searched
tools/python/boutiques/puller.py
Outdated
+ downloaded[0]) | ||
json_files.append(downloaded[0]) | ||
else: | ||
raise_error(ZenodoError, "Seacrhed-for descriptor \"{0}\" " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there's still a typo here 😛
print_info("Downloading descriptor %s" | ||
% file_name) | ||
downloaded = urlretrieve(file_path, entry["fname"]) | ||
if(self.verbose): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that the below line should be printed even in non-verbose mode. Without it the user has no idea where the descriptor was downloaded and whether the download was successful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, you don't need parentheses around self.verbose
(for this and everywhere else in the file)
Both the use of verbose and the format of the if are from the original. It is not unreasonable to formalize how we use verbose or to add something to what pycodestyles is checking for, but I think that's unrelated to this issue. |
return downloaded[0] | ||
raise_error(ZenodoError, "Descriptor not found") | ||
if not len(r.json()["hits"]["hits"]): | ||
raise_error(ZenodoError, "Descriptor \"{0}\" " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it might be nicer if, instead of raising an exception after the first invalid Zenodo ID, we simply printed an error message without terminating the program. That way, the valid ones would still be downloaded and the user would only have to rerun the command for the invalid ones. The logger has a print_error
function for this purpose.
In |
True, those parentheses were already there. I'm using Pycharm and it underlines everywhere that has redundant parentheses, so I can fix all of them at some point in the future. Pycodestyle doesn't care about it so it's not a big deal. However, the "Downloaded descriptor to..." line was intentionally printed in non-verbose mode before, so I'd like to keep it that way if others agree. |
I had considered that, but once again I went with an approach consistent
with the previous implementation. Specifically, raising an exception on
error. Since there is no cost to rerun the whole command again because
cached descriptors should be identical by definition, it seemed the intent
was to force the user to give a fully correct set of parameters, vs a tool
for validating zenodo ids. We expect all correct ids and complain to the
user when, exceptionally, this is not the case. I'm not opposed to a
refinement of the current intent, but unless I have misinterpreted, it may
be best to do that in another issue.
…On Tue, Mar 12, 2019, 10:53 Erin Benderoff ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In tools/python/boutiques/puller.py
<#438 (comment)>:
>
- for hit in r.json()["hits"]["hits"]:
- file_path = hit["files"][0]["links"]["self"]
- file_name = file_path.split(os.sep)[-1]
- if hit["id"] == int(self.zid):
- if not os.path.exists(self.cache_dir):
- os.makedirs(self.cache_dir)
- if(self.verbose):
- print_info("Downloading descriptor %s"
- % file_name)
- downloaded = urlretrieve(file_path, self.cached_fname)
- print("Downloaded descriptor to " + downloaded[0])
- return downloaded[0]
- raise_error(ZenodoError, "Descriptor not found")
+ if not len(r.json()["hits"]["hits"]):
+ raise_error(ZenodoError, "Descriptor \"{0}\" "
I think it might be nicer if, instead of raising an exception after the
first invalid Zenodo ID, we simply printed an error message without
terminating the program. That way, the valid ones would still be downloaded
and the user would only have to rerun the command for the invalid ones. The
logger has a print_error function for this purpose.
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#438 (review)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ABunQOOoFEtSzOC1CY45s1FFAKjRzjZHks5vV79SgaJpZM4bm--2>
.
|
Well it made sense to raise an exception when we were only pulling one ID at a time, but now that we accept a list, we need to decide whether to terminate after seeing an invalid one or letting it proceed through the whole thing. I personally think the latter is better, otherwise you might end up with some descriptors downloaded and others not, which can be confusing. @glatard what do you think? |
When checking for consistency in the searcher update I noted again that we only show things found in cache in verbose mode, but as you say, originally always posted the download URL. This seems inconsistent with the rest of puller.py. It's all a matter of the purpose of pull, currently given as I'd note that puller doesn't actually do what its description says it does. it |
Good points. Ok, so can you change the description of the puller to what you said? And I realize now that it's probably better not to show any info messages in non-verbose mode. When using |
Conversation is the best way to make good documentation... now I'm
gonna try to sneak in my change to the description of pull as I put it
back :D
…On Tue, Mar 12, 2019 at 2:04 PM Erin Benderoff ***@***.***> wrote:
However, the "Downloaded descriptor to..." line was intentionally printed in non-verbose mode before, so I'd like to keep it that way if others agree.
When checking for consistency in the searcher update I noted again that we only show things found in cache in verbose mode, but as you say, originally always posted the download URL. This seems inconsistent with the rest of puller.py. It's all a matter of the purpose of pull, currently given as Download a descriptor from Zenodo.
I'd note that puller doesn't actually do what its description says it does. it Ensures that Zenodo descriptors are locally cached, downloading them if needed. With that description it might be clearer why I think that we should be more consistent with the verbose output, either putting it all under verbose or always showing the found location. Even all that bugs me because what really matters is the end-location in cache, not whether it was downloaded or the URL of the download; I can see that being desirable if you wanted to be able to see and nuke the local cache to force a download, but we currently only have one remote source and descriptors on zenodo are immutable so this seems like over-engineering.
Good points. Ok, so can you change the description of the puller to what you said?
And I realize now that it's probably better not to show any info messages in non-verbose mode. When using bosh pull as part of another command (e.g. bosh exec launch with a Zenodo ID), we don't care about the location. We just want to use the descriptor, whether from cache or downloaded. Sorry for making you change it and then change it back
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub, or mute the thread.
--
Stuart Thiel, P. Eng.
|
This closes #435
it collapses it and gives results as if you did not send duplicates
then it would raise an exception on the first invalid zenodo id found
side_effects
because it'scooler and better, and I can more readily mock multiple different types
of response
list as a response
test for this