NF: foreach-dataset --o-s relpath - capture and pass-through prefixing with path to subds #7071
Conversation
Hm. I understand the problem and hence the desire for that kind of result rendering.
But I'm a bit confused by the approach. Why entangle a result rendering option with the output stream option? This kinda destroys a basic concept: that a datalad option like `-f` should not interfere with actual command behavior. But now it does. If I specify `-f`, I overrule `--output-stream`, which is far from obvious. It also means that if I disable result rendering altogether, I don't get any output "passed-through". The doc declaring that `relpath` essentially is like pass-through suggests otherwise and is technically a lie. It's `capture`+prefix+spit out again (as long as one doesn't specify a result renderer). That's a different concept.
I feel this is an abuse of the result renderer concept (including writing to sys.stdout/err directly instead of `ui`). And I agree - it's an overfit to `grep`.
Not happy with it, but it's also not instantly clear to me what the better alternative is. I think what you really want is to disable result rendering and read+prefix+write the actual output pipes as things are coming in. That would suggest a runner protocol for it (but another construct for python callables, I guess). Hm.
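To make the "read+prefix+write as things are coming in" idea concrete, here is a minimal sketch that uses only the Python standard library instead of datalad's runner. The helper name `run_prefixed` and its parameters are hypothetical and not part of datalad; it only illustrates forwarding each line as soon as the child process produces it.

```python
import subprocess
import sys


def run_prefixed(cmd, prefix, cwd=None, write=None):
    """Run `cmd`, forwarding each stdout line with `prefix` prepended.

    Hypothetical helper for illustration only (not a datalad API).
    Returns the exit code of the command.
    """
    emit = write if write is not None else sys.stdout.write
    with subprocess.Popen(
        cmd, cwd=cwd, stdout=subprocess.PIPE, text=True, bufsize=1
    ) as proc:
        # Iterating over the pipe yields lines as they arrive, so output
        # is not held back until the command has finished.
        for line in proc.stdout:
            emit(prefix + line)
    # Leaving the `with` block waits for the process and sets returncode.
    return proc.returncode
```

Because the prefix is applied while reading the pipe, no result renderer is involved, and a command that prints nothing simply produces no lines.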
An interesting analysis/concern @bpoldrack. But even with But I am open for any pragmatic suggestion on how to tune it up. As I have mentioned in the original description I was even considering smth like |
8819c57 to 200159f
Codecov Report
Base: 87.75% // Head: 90.81% // Increases project coverage by
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #7071      +/-   ##
==========================================
+ Coverage   87.75%   90.81%   +3.06%
==========================================
  Files         325      325
  Lines       44074    44142      +68
  Branches        0     5862    +5862
==========================================
+ Hits        38675    40089    +1414
+ Misses       5399     4038    -1361
- Partials        0       15      +15
On second thought, @yarikoptic: I think, what we need to do instead is properly enabling result filtering. What's missing from your I would like to not conflate the "output stream of a command" with its rendered results. Those are two different concepts. While rendered results can be part of the output stream of a command, that's not generally true and an output stream of a command can contain things that are not rendered results (as is the case with Another aspect of this is the writing to stdout directly from within the result renderer. Not going through |
what about needing to prefix each line in
We would need to differ On the other hand, the fact that everything centralizes via |
200159f to 0ed8f78
note: rebased on top of master (was on maint) since diff was poisoned with other maint diffs |
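Registering an idea here (no time yet to properly test it). I think something like this may be the solution:

```python
import os

from datalad.runner import WitlessProtocol


class PrefixStdout(WitlessProtocol):
    proc_out = True  # ask the runner to hand us the command's stdout

    def __init__(self, prefix, done_future=None, encoding=None):
        super().__init__(done_future, encoding)
        self._prefix = prefix  # must be bytes, since pipe data is bytes

    def pipe_data_received(self, fd, data):
        # splitlines() drops the line endings, so add a newline back
        for line in data.splitlines():
            os.write(1, self._prefix + line + b"\n")
```

and make the new parameter ( WDYT, @yarikoptic?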
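One practical detail for a protocol like `PrefixStdout`: if pipe data arrives in arbitrary chunks (typical for pipe reads, but an assumption here about how the runner delivers data), a chunk can end in the middle of a line, and a per-chunk `splitlines()` would then prefix the second half of that line again. A minimal sketch of the buffering that avoids this, independent of datalad (the class name `LinePrefixer` is hypothetical):

```python
class LinePrefixer:
    """Prefix complete lines of a byte stream that arrives in chunks.

    Hypothetical helper for illustration only (not a datalad API).
    """

    def __init__(self, prefix, write):
        self._prefix = prefix  # bytes prepended to every line
        self._write = write    # callable that accepts bytes
        self._pending = b""    # incomplete tail of the last chunk

    def feed(self, data):
        lines = (self._pending + data).split(b"\n")
        # The last element is either b"" (chunk ended on a newline)
        # or an incomplete line to keep until more data arrives.
        self._pending = lines.pop()
        for line in lines:
            self._write(self._prefix + line + b"\n")

    def close(self):
        # Emit a final line that had no trailing newline.
        if self._pending:
            self._write(self._prefix + self._pending + b"\n")
            self._pending = b""
```

In a protocol, `feed()` would be called from `pipe_data_received()` and `close()` once the process has exited. Note that `close()` adds a newline to an unterminated last line, which changes the output slightly compared to a pure pass-through.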
I like the idea, thank you @bpoldrack . My concerns are
|
… around context adjust test to patch correct/used logger. Which logger it is I do not think matters for anywhere but the test.
…g with path to subds

Motivation for this feature is frequent use of `foreach-dataset git grep PATTERN` on e.g. captured by con/tinuous logs as I did in e.g. datalad#6848 :

    $> datalad foreach-dataset "git grep 'FAILED .*test_nested_pushclone_cycle_allplatforms' | grep cron || :"
    foreach-dataset(ok): /mnt/datasets/datalad/ci/logs/2022 (dataset)
    03/cron/20220603T193647/173dc6f/travis-14100-failed/14.txt:FAILED ../core/distributed/tests/test_push.py::test_nested_pushclone_cycle_allplatforms
    foreach-dataset(ok): /mnt/datasets/datalad/ci/logs/2022/01 (dataset)
    10/cron/20220610T193717/029f839/travis-14136-failed/14.txt:FAILED ../core/distributed/tests/test_push.py::test_nested_pushclone_cycle_allplatforms
    20/cron/20220520T193556/84f979a/travis-14036-failed/14.txt:FAILED ../core/distributed/tests/test_push.py::test_nested_pushclone_cycle_allplatforms
    27/cron/20220527T193600/310fd9f/travis-14055-failed/14.txt:FAILED ../core/distributed/tests/test_push.py::test_nested_pushclone_cycle_allplatforms
    foreach-dataset(ok): /mnt/datasets/datalad/ci/logs/2022/06 (dataset)
    foreach-dataset(ok): /mnt/datasets/datalad/ci/logs/2022/05 (dataset)
    ...

As you could see - the default pass-through + default rendering there is "suboptimal" since
- main hurdle: paths from `grep` are relative to that subdataset and not readily cut-pasteable since we need to join with subdataset path first
- pollutes with default renderer 'ok' or not for a given dataset (hence needed || : to avoid complaints)
- prints all those (ok) which we do not quite care about -- we just want to see output as if this hierarchy was a single dataset, so if nothing hit -- we just carry forward.

With this PR and "relpath" mode, we adjust renderer so that output looks like:

    (git)smaug:/mnt/datasets/datalad/ci/logs/2022[master]git
    $> datalad foreach-dataset --o-s relpath "git grep 'FAILED .*test_nested_pushclone_cycle_allplatforms' | grep cron "
    06/03/cron/20220603T193647/173dc6f/travis-14100-failed/14.txt:FAILED ../core/distributed/tests/test_push.py::test_nested_pushclone_cycle_allplatforms
    06/10/cron/20220610T193717/029f839/travis-14136-failed/14.txt:FAILED ../core/distributed/tests/test_push.py::test_nested_pushclone_cycle_allplatforms
    05/20/cron/20220520T193556/84f979a/travis-14036-failed/14.txt:FAILED ../core/distributed/tests/test_push.py::test_nested_pushclone_cycle_allplatforms
    05/27/cron/20220527T193600/310fd9f/travis-14055-failed/14.txt:FAILED ../core/distributed/tests/test_push.py::test_nested_pushclone_cycle_allplatforms
    07/15/cron/20220715T194006/52e822a/travis-14276-failed/14.txt:FAILED ../core/distributed/tests/test_push.py::test_nested_pushclone_cycle_allplatforms

pros:
- readily usable path which melded relative path to the subdataset with output of grep
- since we ignore status -- no errors reported even though I had no ||:
- no useless ok's
- with "smart" treatment of either "dataset" argument is provided or not IMHO reported relative paths are quite neat despite C1 (below)
- I thought it could be sufficiently closely matched simply via `datalad -f '{path}{stdout}' foreach-dataset --o-s capture` but "not":

      $> datalad -f '{path}/{stdout}' foreach-dataset --o-s capture "git grep 'FAILED .*test_nested_pushclone_cycle_allplatforms' | grep cron ||:"
      /mnt/datasets/datalad/ci/logs/2022/
      /mnt/datasets/datalad/ci/logs/2022/01/
      /mnt/datasets/datalad/ci/logs/2022/06/03/cron/20220603T193647/173dc6f/travis-14100-failed/14.txt:FAILED ../core/distributed/tests/test_push.py::test_nested_pushclone_cycle_allplatforms
      10/cron/20220610T193717/029f839/travis-14136-failed/14.txt:FAILED ../core/distributed/tests/test_push.py::test_nested_pushclone_cycle_allplatforms
      /mnt/datasets/datalad/ci/logs/2022/05/20/cron/20220520T193556/84f979a/travis-14036-failed/14.txt:FAILED ../core/distributed/tests/test_push.py::test_nested_pushclone_cycle_allplatforms
      27/cron/20220527T193600/310fd9f/travis-14055-failed/14.txt:FAILED ../core/distributed/tests/test_push.py::test_nested_pushclone_cycle_allplatforms
      /mnt/datasets/datalad/ci/logs/2022/04/

  since we would get bunch of empty lines, those paths for datasets without hits etc. So - decided that it is worth adding this new mode

cons:
- C1: overfitting "grep" use case since we prefix with actual path and without any separation (like via :). So may be we should name it `grep` instead of `relpath`.
- C2: someone might want full path! But hey -- there is a reason why I named it "relpath"

both cons somewhat point to may be choosing another, better describing name.

Note: "relpath" idea in output rendering was more generally exercised but never finalized in other PRs
- datalad#2679
- datalad#4454

and argued in datalad#3994 to be "less useful" which I would disagree.
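The effect described for the "relpath" mode, joining the relative path of each subdataset with every line the command printed there, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code from this PR: the function name `prefix_with_relpath` and its parameters are hypothetical, and `posixpath` is used so the example behaves the same on every platform.

```python
import posixpath


def prefix_with_relpath(stdout, ds_path, ref_path):
    """Prefix each line of captured `stdout` with the path of the
    dataset `ds_path` relative to `ref_path`.

    Hypothetical helper for illustration only (not the PR's code).
    """
    rel = posixpath.relpath(ds_path, ref_path)
    # Output from the reference dataset itself needs no prefix.
    prefix = "" if rel == posixpath.curdir else rel + "/"
    # keepends=True preserves the original line endings.
    return "".join(prefix + line for line in stdout.splitlines(keepends=True))
```

With `ref_path="/mnt/datasets/datalad/ci/logs/2022"` and `ds_path` set to its `06` subdataset, a line starting with `03/cron/...` becomes `06/03/cron/...`. A dataset without hits has empty output and therefore contributes nothing, which matches the "no useless ok's" point above.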
I liked the idea of PrefixStdout (despite that would need to also add
well, we just With that it begs to question if it is a worthwhile approach? ATM I have been using this PR version successfully for my use case of invoking |
@yarikoptic |
Thanks @bpoldrack for the recommendation
0ed8f78 to 5b22b4d
Code Climate has analyzed commit 5b22b4d and detected 7 issues on this pull request. Here's the issue category breakdown:
View more on Code Climate. |
Ok, I linked the failing archive test in the issue - don't think it's related. Let's proceed then. Thx, @yarikoptic! |
PR released in |
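First commit is just a helper, 2nd is the actual changes with following ATM description:

Motivation for this feature is frequent use of `foreach-dataset git grep PATTERN` on e.g. captured by con/tinuous logs as I did in e.g. #6848 :

    $> datalad foreach-dataset "git grep 'FAILED .*test_nested_pushclone_cycle_allplatforms' | grep cron || :"
    foreach-dataset(ok): /mnt/datasets/datalad/ci/logs/2022 (dataset)
    03/cron/20220603T193647/173dc6f/travis-14100-failed/14.txt:FAILED ../core/distributed/tests/test_push.py::test_nested_pushclone_cycle_allplatforms
    foreach-dataset(ok): /mnt/datasets/datalad/ci/logs/2022/01 (dataset)
    10/cron/20220610T193717/029f839/travis-14136-failed/14.txt:FAILED ../core/distributed/tests/test_push.py::test_nested_pushclone_cycle_allplatforms
    20/cron/20220520T193556/84f979a/travis-14036-failed/14.txt:FAILED ../core/distributed/tests/test_push.py::test_nested_pushclone_cycle_allplatforms
    27/cron/20220527T193600/310fd9f/travis-14055-failed/14.txt:FAILED ../core/distributed/tests/test_push.py::test_nested_pushclone_cycle_allplatforms
    foreach-dataset(ok): /mnt/datasets/datalad/ci/logs/2022/06 (dataset)
    foreach-dataset(ok): /mnt/datasets/datalad/ci/logs/2022/05 (dataset)
    ...

As you could see - the default pass-through + default rendering there is "suboptimal" since
- main hurdle: paths from `grep` are relative to that subdataset and not readily cut-pasteable since we need to join with subdataset path first
- pollutes with default renderer 'ok' or not for a given dataset (hence needed || : to avoid complaints)
- prints all those (ok) which we do not quite care about -- we just want to see output as if this hierarchy was a single dataset, so if nothing hit -- we just carry forward.

With this PR and "relpath" mode, we adjust renderer so that output looks like:

    (git)smaug:/mnt/datasets/datalad/ci/logs/2022[master]git
    $> datalad foreach-dataset --o-s relpath "git grep 'FAILED .*test_nested_pushclone_cycle_allplatforms' | grep cron "
    06/03/cron/20220603T193647/173dc6f/travis-14100-failed/14.txt:FAILED ../core/distributed/tests/test_push.py::test_nested_pushclone_cycle_allplatforms
    06/10/cron/20220610T193717/029f839/travis-14136-failed/14.txt:FAILED ../core/distributed/tests/test_push.py::test_nested_pushclone_cycle_allplatforms
    05/20/cron/20220520T193556/84f979a/travis-14036-failed/14.txt:FAILED ../core/distributed/tests/test_push.py::test_nested_pushclone_cycle_allplatforms
    05/27/cron/20220527T193600/310fd9f/travis-14055-failed/14.txt:FAILED ../core/distributed/tests/test_push.py::test_nested_pushclone_cycle_allplatforms
    07/15/cron/20220715T194006/52e822a/travis-14276-failed/14.txt:FAILED ../core/distributed/tests/test_push.py::test_nested_pushclone_cycle_allplatforms

pros:
- readily usable path which melded relative path to the subdataset with output of grep
- since we ignore status -- no errors reported even though I had no ||:
- no useless ok's
- with "smart" treatment of either "dataset" argument is provided or not IMHO reported relative paths are quite neat despite C1 (below)

I thought it could be sufficiently closely matched simply via `datalad -f '{path}{stdout}' foreach-dataset --o-s capture` but "not":

    $> datalad -f '{path}/{stdout}' foreach-dataset --o-s capture "git grep 'FAILED .*test_nested_pushclone_cycle_allplatforms' | grep cron ||:"
    /mnt/datasets/datalad/ci/logs/2022/
    /mnt/datasets/datalad/ci/logs/2022/01/
    /mnt/datasets/datalad/ci/logs/2022/06/03/cron/20220603T193647/173dc6f/travis-14100-failed/14.txt:FAILED ../core/distributed/tests/test_push.py::test_nested_pushclone_cycle_allplatforms
    10/cron/20220610T193717/029f839/travis-14136-failed/14.txt:FAILED ../core/distributed/tests/test_push.py::test_nested_pushclone_cycle_allplatforms
    /mnt/datasets/datalad/ci/logs/2022/05/20/cron/20220520T193556/84f979a/travis-14036-failed/14.txt:FAILED ../core/distributed/tests/test_push.py::test_nested_pushclone_cycle_allplatforms
    27/cron/20220527T193600/310fd9f/travis-14055-failed/14.txt:FAILED ../core/distributed/tests/test_push.py::test_nested_pushclone_cycle_allplatforms
    /mnt/datasets/datalad/ci/logs/2022/04/

since we would get bunch of empty lines, those paths for datasets without hits etc.
So - decided that it is worth adding this new mode

cons:
- C1: overfitting "grep" use case since we prefix with actual path and without any separation (like via :). So may be we should name it `grep` instead of `relpath`.
- C2: someone might want full path! But hey -- there is a reason why I named it "relpath"

both cons somewhat point to may be choosing another, better describing name. Please suggest one!

Note: "relpath" idea in output rendering was more generally exercised but never finalized in other PRs
- #2679
- #4454

and argued in #3994 to be "less useful" which I would disagree.

NB code is based off `maint` but positioning against master since a new feature, it would be a sin to propose to `maint`.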