Updated bosh pull to allow multiple zenodo ids to be pulled at once #438

ramou · 2019-03-10T00:25:06Z

This closes #435

- Added tests to see if pulling multiple items worked
  - Added a test to show that if you duplicate a searched-for zenodo id,
    it collapses it and gives results as if you did not send duplicates
  - Added a test to show that if only some of the zenodo ids were valid,
    then it would raise an exception on the first invalid zenodo id found
- Switched the mocks in test_pull to use `side_effect` because it's
  cooler and better, and I can more readily mock multiple different types
  of response
- Added verbose feedback about squashing duplicate zenodo ids
- Changed any places using pull to pass a list of ids and to expect a
  list as a response
- ParseArgs kindly takes care of the case where no zenodo ids are passed,
  but there is no test for this
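
To illustrate the `side_effect` point, here is a minimal sketch of how one mock can return different responses depending on the request. `FakeResponse`, `fake_get`, the id `1472823`, and the `example.invalid` URLs are hypothetical names made up for this example; they are not taken from test_pull.

```python
from unittest import mock


class FakeResponse:
    """Hypothetical stand-in for an HTTP response object."""

    def __init__(self, hits):
        self._hits = hits

    def json(self):
        # Mirrors the {"hits": {"hits": [...]}} shape seen in the diff.
        return {"hits": {"hits": self._hits}}


def fake_get(url, *args, **kwargs):
    # Return a hit only for the one id this sketch treats as valid.
    if "1472823" in url:
        return FakeResponse([{"id": 1472823}])
    return FakeResponse([])


# side_effect lets a single mock answer each call differently.
mocked_get = mock.Mock(side_effect=fake_get)

found = mocked_get("https://example.invalid/search?q=1472823")
missing = mocked_get("https://example.invalid/search?q=0000000")
```

Because `side_effect` is a callable, the mock still records every call, so a test can assert on both the returned data and the call count.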

coveralls · 2019-03-10T00:35:55Z

Coverage decreased (-0.05%) to 93.527% when pulling 8b13cfe on ramou:i435 into f1c4cc7 on boutiques:develop.

glatard

This all looks great to me! @erinb90, would you have time to look at it before we merge?

glatard · 2019-03-11T21:29:28Z

tools/python/boutiques/puller.py

-        self.cached_fname = os.path.join(self.cache_dir,
-                                         "zenodo-{0}.json".format(self.zid))
+        discarded_zids = zids
+        zids = list(dict.fromkeys(zids))


If this is meant to remove duplicates, then perhaps add a comment for it.
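
For context, a small sketch of what the `dict.fromkeys` idiom does, and one way the dropped duplicates could be collected for the verbose message. `squash_duplicates` and the sample ids are hypothetical and not part of puller.py. Note that dict insertion order is only guaranteed by the language from Python 3.7 onward.

```python
def squash_duplicates(zids):
    """Return (unique ids in first-seen order, ids that were dropped)."""
    # dict keys are unique and keep insertion order (Python 3.7+).
    unique = list(dict.fromkeys(zids))

    # Collect every repeated occurrence so it can be reported to the user.
    seen = set()
    discarded = []
    for zid in zids:
        if zid in seen:
            discarded.append(zid)
        else:
            seen.add(zid)
    return unique, discarded


unique, discarded = squash_duplicates(["zenodo.1", "zenodo.2", "zenodo.1"])
```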

glatard · 2019-03-11T21:35:19Z

tools/python/boutiques/puller.py

+                                   + downloaded[0])
+                    json_files.append(downloaded[0])
+                else:
+                    raise_error(ZenodoError, "Seached-for descriptor \"{0}\" "


Typo: Searched

erinb90 · 2019-03-12T14:29:58Z

tools/python/boutiques/puller.py

+                                   + downloaded[0])
+                    json_files.append(downloaded[0])
+                else:
+                    raise_error(ZenodoError, "Seacrhed-for descriptor \"{0}\" "


there's still a typo here 😛

erinb90 · 2019-03-12T14:32:36Z

tools/python/boutiques/puller.py

+                        print_info("Downloading descriptor %s"
+                                   % file_name)
+                    downloaded = urlretrieve(file_path, entry["fname"])
+                    if(self.verbose):


I think that the below line should be printed even in non-verbose mode. Without it the user has no idea where the descriptor was downloaded and whether the download was successful.

Also, you don't need parentheses around self.verbose (for this and everywhere else in the file)

ramou · 2019-03-12T14:42:04Z

Both the use of verbose and the format of the if are from the original. It is not unreasonable to formalize how we use verbose or to add something to what pycodestyle is checking for, but I think that's unrelated to this issue.

erinb90 · 2019-03-12T14:46:21Z

tools/python/boutiques/puller.py

-                return downloaded[0]
-        raise_error(ZenodoError, "Descriptor not found")
+            if not len(r.json()["hits"]["hits"]):
+                raise_error(ZenodoError, "Descriptor \"{0}\" "


I think it might be nicer if, instead of raising an exception after the first invalid Zenodo ID, we simply printed an error message without terminating the program. That way, the valid ones would still be downloaded and the user would only have to rerun the command for the invalid ones. The logger has a print_error function for this purpose.
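
A rough sketch of the behaviour suggested here: report each invalid id and keep going, rather than stopping at the first failure. `pull_all`, `fake_fetch`, and the use of `LookupError` and plain `print` are assumptions for illustration; the real code would presumably use `ZenodoError` and the logger's `print_error`.

```python
def pull_all(zids, fetch):
    """Try every id; collect failures instead of stopping at the first."""
    pulled, failed = [], []
    for zid in zids:
        try:
            pulled.append(fetch(zid))
        except LookupError as err:
            # Stand-in for the logger's print_error mentioned above.
            print("[ ERROR ] {0}".format(err))
            failed.append(zid)
    return pulled, failed


def fake_fetch(zid):
    # Hypothetical fetcher: one id is treated as invalid.
    if zid == "zenodo.999":
        raise LookupError('Descriptor "{0}" not found'.format(zid))
    return "zenodo-{0}.json".format(zid.split(".")[-1])


pulled, failed = pull_all(["zenodo.1", "zenodo.999", "zenodo.2"], fake_fetch)
```

Returning the failed ids alongside the results lets the caller decide whether a partial success should still count as an error.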

erinb90 · 2019-03-12T14:48:22Z

In searcher.py, at the part where it prints [ INFO ] Search successful, could you add the search query to this info message? Now that we're doing multiple searches in a single bosh command, this would make things more clear.

erinb90 · 2019-03-12T14:53:25Z

> Both the use of verbose and the format of the if are from the original. It is not unreasonable to formalize how we use verbose or to add something to what pycodestyle is checking for, but I think that's unrelated to this issue.

True, those parentheses were already there. I'm using PyCharm and it underlines every place that has redundant parentheses, so I can fix all of them at some point in the future. pycodestyle doesn't care about it, so it's not a big deal.

However, the "Downloaded descriptor to..." line was intentionally printed in non-verbose mode before, so I'd like to keep it that way if others agree.

ramou · 2019-03-12T15:10:28Z

I had considered that, but once again I went with an approach consistent with the previous implementation. Specifically, raising an exception on error. Since there is no cost to rerun the whole command again because cached descriptors should be identical by definition, it seemed the intent was to force the user to give a fully correct set of parameters, vs a tool for validating zenodo ids. We expect all correct ids and complain to the user when, exceptionally, this is not the case. I'm not opposed to a refinement of the current intent, but unless I have misinterpreted, it may be best to do that in another issue.


erinb90 · 2019-03-12T15:21:55Z

> I had considered that, but once again I went with an approach consistent with the previous implementation. Specifically, raising an exception on error. Since there is no cost to rerun the whole command again because cached descriptors should be identical by definition, it seemed the intent was to force the user to give a fully correct set of parameters, vs a tool for validating zenodo ids. We expect all correct ids and complain to the user when, exceptionally, this is not the case. I'm not opposed to a refinement of the current intent, but unless I have misinterpreted, it may be best to do that in another issue.
Well, it made sense to raise an exception when we were only pulling one ID at a time, but now that we accept a list, we need to decide whether to terminate after seeing an invalid one or let it proceed through the whole list. I personally think the latter is better; otherwise you might end up with some descriptors downloaded and others not, which can be confusing. @glatard what do you think?

ramou · 2019-03-12T17:28:09Z

> However, the "Downloaded descriptor to..." line was intentionally printed in non-verbose mode before, so I'd like to keep it that way if others agree.

When checking for consistency in the searcher update, I noted again that we only show things found in cache in verbose mode but, as you say, the original always printed the download URL. This seems inconsistent with the rest of puller.py. It all comes down to the purpose of pull, currently given as "Download a descriptor from Zenodo".

I'd note that puller doesn't actually do what its description says it does. It "ensures that Zenodo descriptors are locally cached, downloading them if needed". With that description it might be clearer why I think that we should be more consistent with the verbose output, either putting it all under verbose or always showing the found location. Even all that bugs me, because what really matters is the end location in cache, not whether it was downloaded or the URL of the download. I can see that being desirable if you wanted to be able to see and nuke the local cache to force a download, but we currently only have one remote source and descriptors on Zenodo are immutable, so this seems like over-engineering.
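
The "ensure cached, download if needed" behaviour described above could be sketched roughly as follows. `ensure_cached` and `fake_download` are hypothetical names; only the `zenodo-{0}.json` file naming is taken from the diff, and the real puller's signature may well differ.

```python
import os
import tempfile


def ensure_cached(zid, cache_dir, download):
    """Return the cached path for zid, calling download only on a miss."""
    path = os.path.join(cache_dir, "zenodo-{0}.json".format(zid))
    if not os.path.exists(path):
        os.makedirs(cache_dir, exist_ok=True)
        download(zid, path)
    # The caller gets the same path whether or not a download happened.
    return path


calls = []


def fake_download(zid, path):
    # Hypothetical downloader that just writes an empty JSON object.
    calls.append(zid)
    with open(path, "w") as handle:
        handle.write("{}")


cache_dir = os.path.join(tempfile.mkdtemp(), "cache")
first = ensure_cached("1234", cache_dir, fake_download)
second = ensure_cached("1234", cache_dir, fake_download)
```

The second call returns the same path without invoking the downloader, which is why the caller only needs the cache location and not the download details.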

erinb90 · 2019-03-12T18:04:21Z

> However, the "Downloaded descriptor to..." line was intentionally printed in non-verbose mode before, so I'd like to keep it that way if others agree.

> When checking for consistency in the searcher update, I noted again that we only show things found in cache in verbose mode but, as you say, the original always printed the download URL. This seems inconsistent with the rest of puller.py. It all comes down to the purpose of pull, currently given as "Download a descriptor from Zenodo".

> I'd note that puller doesn't actually do what its description says it does. It "ensures that Zenodo descriptors are locally cached, downloading them if needed". With that description it might be clearer why I think that we should be more consistent with the verbose output, either putting it all under verbose or always showing the found location. Even all that bugs me, because what really matters is the end location in cache, not whether it was downloaded or the URL of the download. I can see that being desirable if you wanted to be able to see and nuke the local cache to force a download, but we currently only have one remote source and descriptors on Zenodo are immutable, so this seems like over-engineering.

Good points. Ok, so can you change the description of the puller to what you said?

And I realize now that it's probably better not to show any info messages in non-verbose mode. When using bosh pull as part of another command (e.g. bosh exec launch with a Zenodo ID), we don't care about the location. We just want to use the descriptor, whether from cache or downloaded. Sorry for making you change it and then change it back 😛

ramou · 2019-03-12T18:06:41Z

Conversation is the best way to make good documentation... now I'm gonna try to sneak in my change to the description of pull as I put it back :D

-- Stuart Thiel, P. Eng.

Stuart Thiel added 2 commits March 9, 2019 19:23

This didn't get staged and I didn't notice. Done now.

00e67fa

ramou changed the base branch from master to develop March 11, 2019 15:50

Merge branch 'develop' of https://github.com/boutiques/boutiques.git into i435

67b08e8

glatard approved these changes Mar 11, 2019

Thiels851 and others added 4 commits March 11, 2019 17:56

Made suggested corrections

3e9602f

Merge branch 'i435' of https://github.com/ramou/boutiques.git into i435

10df10c

Merge branch 'develop' into i435

8c1c190

Fixed pycodestyle

04bb3e1

erinb90 reviewed Mar 12, 2019

Fixed silly typo... again. Reverted download message to always print.

e478d50

Changed search to indicate the keywords in the verbose success string

14c108e

Updated pull description and changed download info to verbose mode

8b13cfe

glatard merged commit 1f9d722 into boutiques:develop Apr 1, 2019

ramou deleted the i435 branch May 28, 2019 19:17
