Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

addurls: Improve result displayed for subdataset creation #5689

Merged
merged 3 commits into from
Aug 5, 2021

Conversation

kyleam
Copy link
Contributor

@kyleam kyleam commented May 21, 2021

When addurls creates a subdataset, the create result, as shown by the default renderer, doesn't make it clear which subdataset was created:

script
set -eu

cd "$(mktemp -d "${TMPDIR:-/tmp}"/dl-XXXXXXX)"
datalad create >/dev/null 2>&1

datalad -lwarning \
        addurls -x '' --key='{key}' -t csv \
        - '{url}' '{_url_basename:.2}//{_url_basename}' <<EOF
key,url
SHA256E-s8669--a1954d14daa74af097d5cb1c65efaadb781f1924f42e31ac6dc5af57e4a8e955.svg,http://handbook.datalad.org/en/0.12/_images/install.svg
SHA256E-s49162--13c1bcb98945af4c51082b7431de291f509d2e25edf9e14631a27bc548d9eaa1.svg,http://handbook.datalad.org/en/0.12/_images/student.svg
EOF
create(ok): . (dataset)
fromkey(ok): /tmp/dl-JhgrkS0/in/install.svg (file) [registered URL]
create(ok): . (dataset)
fromkey(ok): /tmp/dl-JhgrkS0/st/student.svg (file) [registered URL]
save(ok): st (dataset)
save(ok): in (dataset)
add(ok): in (file)
add(ok): st (file)
add(ok): .gitmodules (file)
save(ok): . (dataset)
action summary:
  add (ok: 3)
  create (ok: 2)
  fromkey (ok: 2)
  save (ok: 3)

This series, in particular the last commit, defines a custom result renderer that changes that to

create(ok): in (dataset)
fromkey(ok): /tmp/dl-TF8m3S7/in/install.svg (file) [registered URL]
create(ok): st (dataset)
fromkey(ok): /tmp/dl-TF8m3S7/st/student.svg (file) [registered URL]
save(ok): st (dataset)
save(ok): in (dataset)
add(ok): in (file)
add(ok): st (file)
add(ok): .gitmodules (file)
save(ok): . (dataset)
action summary:
  add (ok: 3)
  create (ok: 2)
  fromkey (ok: 2)
  save (ok: 3)

Most of the other commits leading up to the last commit make it possible to easily display the default summary despite defining a custom_result_renderer method.


I think this at least partially addresses gh-4989, though the request there might be for more progress than just result records. (And I suspect no create results are shown in the output there because addurls was called from the Python API where result_renderer=None would be the default.)


@codecov
Copy link

codecov bot commented May 21, 2021

Codecov Report

Merging #5689 (597c53c) into master (258b5c5) will decrease coverage by 2.81%.
The diff coverage is 100.00%.

❗ Current head 597c53c differs from pull request most recent head 3e11103. Consider uploading reports for the commit 3e11103 to get more accurate results
Impacted file tree graph

@@            Coverage Diff             @@
##           master    #5689      +/-   ##
==========================================
- Coverage   90.63%   87.81%   -2.82%     
==========================================
  Files         309      306       -3     
  Lines       42404    42404              
==========================================
- Hits        38433    37239    -1194     
- Misses       3971     5165    +1194     
Impacted Files Coverage Δ
datalad/local/addurls.py 95.97% <100.00%> (-1.02%) ⬇️
datalad/local/tests/test_addurls.py 99.57% <100.00%> (+<0.01%) ⬆️
datalad/plugin/wtf.py 0.00% <0.00%> (-100.00%) ⬇️
datalad/plugin/addurls.py 0.00% <0.00%> (-100.00%) ⬇️
datalad/plugin/no_annex.py 0.00% <0.00%> (-100.00%) ⬇️
datalad/plugin/add_readme.py 0.00% <0.00%> (-100.00%) ⬇️
datalad/plugin/check_dates.py 0.00% <0.00%> (-100.00%) ⬇️
datalad/support/tests/utils.py 0.00% <0.00%> (-100.00%) ⬇️
datalad/plugin/export_archive.py 0.00% <0.00%> (-100.00%) ⬇️
datalad/plugin/export_to_figshare.py 0.00% <0.00%> (-100.00%) ⬇️
... and 104 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 258b5c5...3e11103. Read the comment docs.

@kyleam kyleam force-pushed the addurls-subdataset-feedback branch from e584dbf to 443f808 Compare May 25, 2021 21:02
@kyleam kyleam marked this pull request as ready for review May 25, 2021 21:02
@kyleam kyleam added the do not merge Not to be merged, will trigger WIP app "not passed" status label May 25, 2021
The next commit will change addurls() to use a tailored result
renderer.  As a consequence, calls to addurls() will not use
result_renderer=None by default.

Explicitly specify result_renderer=None to maintain the current output
noise level.
When addurls() creates a subdataset, an unhelpful

  create(ok): . (dataset)

is displayed.  That's because the result comes directly from
<subds>.create().

To display the subdataset path, one option would be to intercept the
result and adjust the refds field to point to the top-level addurls()
dataset.  However, this is a problematic change because it would break
any result hooks.  Aside from that, it's an artificial result that
could mislead and confuse those inspecting it.

Instead define a custom result renderer that limits the adjustment to
the rendered result.
@kyleam kyleam force-pushed the addurls-subdataset-feedback branch from 443f808 to c13cfb5 Compare May 26, 2021 14:30
@kyleam kyleam removed the do not merge Not to be merged, will trigger WIP app "not passed" status label May 28, 2021
@yarikoptic
Copy link
Member

create(ok): . (dataset)
to
create(ok): in (dataset)

I think for all those, #4454 was to provide a more generic and somewhat more ultimate solution. May be should stretch some 24h and resurrect that effort, and shame on me for not reviewing it in time, so may be Kyle could have looked there instead. Not feeling really happy about adding yet another magical handling for one command.

@mih
Copy link
Member

mih commented Jul 28, 2021

In principle I am in favor of systematic solutions. However, given that such solution did not emerge in full for a considerable time, and this addition is rather minimal I do not think that it is the most sensible strategy to reject this PR.

Minus the adjustments in the tests (which have no impact on the presence or absence of a generic solution, now or later), here is the relevant diff that remains wrt to present master

diff --git a/datalad/local/addurls.py b/datalad/local/addurls.py
index 75039ab01..10c39b809 100644
--- a/datalad/local/addurls.py
+++ b/datalad/local/addurls.py
@@ -1286,6 +1290,8 @@ class Addurls(Interface):
             rows."""),
     )
 
+    result_renderer = "tailored"
+
     @staticmethod
     @datasetmethod(name='addurls')
     @eval_results
@@ -1428,9 +1434,12 @@ class Addurls(Interface):
                     "Not creating subdataset at existing path: %s",
                     subds_path)
             else:
-                yield from subds.create(result_xfm=None,
+                for res in subds.create(result_xfm=None,
                                         cfg_proc=cfg_proc,
-                                        return_type='generator')
+                                        return_type='generator'):
+                    if res.get("action") == "create":
+                        res["addurls.refds"] = ds_path
+                    yield res
                 created_subds.append(subpath)
             repo = subds.repo  # "expensive" so we get it once
 
@@ -1542,3 +1551,16 @@ filename_format='{filenameformat}'"""
                 message=message_addurls,
                 jobs=jobs,
                 return_type='generator')
+
+    @staticmethod
+    def custom_result_renderer(res, **kwargs):
+        refds = res.get("addurls.refds")
+        if refds:
+            res = dict(res, refds=refds)
+        default_result_renderer(res)
+
+    custom_result_summary_renderer_pass_summary = True
+
+    @staticmethod
+    def custom_result_summary_renderer(_, action_summary):
+        render_action_summary(action_summary)

I'd say that is rather minimal and comprehensible. Moreover, looking at #4454 I see that it still inspects refds, and that property being wrong as what is worked around here. So I am not confident that #4454 is actually capable to fix #4989 out of the box.

@yarikoptic
Copy link
Member

thanks @mih for the analysis! Ok, let's then proceed as is. For the paranoid myself, I will push a fresh merge of master to see if no new developments emerged since original PR tests run.

…feedback

* origin/master: (97 commits)
  RF: Standardize logger object name to match code base
  RF: Normalize logging (levels) for process execution
  BF: Let ORA handle URL keys properly
  BF: Prevent needless call to 7z when there is no archive
  TST: Basic handling of URL keys in ORA code
  MNT: Reduce unused imports and harmonize imports within some modules
  Update datalad/tests/test_witless_runner.py
  Add a smoketest for newly added branch state information in WTF
  UX: Report dataset branches and states in dataset report of WTF
  Update test_witless_runner.py
  TST: Add test to check that identical remote and common-data-src names raise a ValueError
  UX: Validity-check that sibling name and auto-enabled special remote name are not identical
  Adjust wording of MissingExternalDependency error
  One import per line
  simplify writer thread
  support zero length stdin data
  add a writer thread to write data to subprocess-stdin
  BF: set DATALAD_LOG_EXC via envvar in case of tests run with GIT_HOME set (Debian)
  BF: setup_package - move import of _TEMP_PATHS_GENERATED
  RF(TST): remove DATALAD_LOG_EXC specification from CI workflows
  ...
@yarikoptic yarikoptic added this to the 0.15.0 milestone Aug 5, 2021
@yarikoptic yarikoptic merged commit c6a8838 into datalad:master Aug 5, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants