BF: don't write output from within sshrun #7072

bpoldrack · 2022-10-07T15:59:20Z

`datalad sshrun` explicitly calls SSH with `log_output=False`, which results in the use of the `NoCapture` protocol with the runner. This means the stdout/stderr of SSH is already written out directly. When SSH returned, `sshrun` nevertheless tried to write both to its own stdout/stderr, although it could not possibly have anything to write. That would not be an issue in and of itself, but `sshrun` is not necessarily used directly. In particular, it is called by `git` (due to `GIT_SSH_COMMAND=datalad sshrun`). This became a problem when `git` had apparently already closed the pipe to its ssh executable (`sshrun`) and we tried to write to it (although we really didn't have anything to write).

This ultimately led to issue #6599, where the actual `ssh ... git-upload-pack` execution succeeded and returned 0, but `datalad sshrun` itself produced a broken pipe error when trying to write to stdout and hence returned non-zero.
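
The failure mode described above can be reproduced in isolation with a plain POSIX pipe. This is a minimal sketch, not DataLad code: `write_to_closed_pipe` is a hypothetical helper, and the non-empty payload is an assumption made only so that the error triggers reliably. It relies on CPython ignoring SIGPIPE by default, so the failed write surfaces as an exception instead of killing the process.

```python
import os


def write_to_closed_pipe(payload: bytes = b"data") -> str:
    """Write to a pipe whose read end is already closed and report the outcome.

    Hypothetical helper for illustration; assumes a POSIX platform.
    """
    read_fd, write_fd = os.pipe()
    # The reader goes away first (the role git plays in the scenario above).
    os.close(read_fd)
    try:
        os.write(write_fd, payload)
        return "ok"
    except BrokenPipeError as exc:
        # With nobody left to read, the kernel reports EPIPE.
        return type(exc).__name__
    finally:
        os.close(write_fd)
```

A process that lets this exception propagate exits non-zero even though the work it wrapped already succeeded, which matches the symptom reported in #6599.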

It's not entirely clear when exactly this happens. When exactly the pipe is closed may depend on the git version, as the failing builds are running 2.35.1 (macOS on AppVeyor) whereas other builds have either newer or older versions of git. In any case:
There can't be anything to write out to begin with, so don't even try.

Closes #6599
Closes #7078

mih

I think a plain removal of these lines is not good. It is all but clear that there could not be any output to write from the code in this file. We are doomed to swallow something important at some point.

If this assertion holds (and needs to hold), an assertion should be in place.

But why not simply guard the write call and turn it into a best-effort attempt? If a pipe is closed, it won't matter; the output could not go anywhere anyway. But if it is open, and there is something, it will come out.

And this seems better than an assertion that does not hold, as it will leave the user with a crash.

WDYT?
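
A guarded, best-effort write along the lines suggested here could look roughly like the sketch below. `best_effort_write` is a hypothetical name, not something from `sshrun.py`, and the choice to swallow only `BrokenPipeError` is an assumption about what "the pipe is closed" should cover.

```python
import os


def best_effort_write(fd: int, data: bytes) -> bool:
    """Try to write all of data to fd; return False if the reader is gone.

    Hypothetical helper for illustration, not part of DataLad.
    """
    view = memoryview(data)
    try:
        # os.write may write only part of the buffer, so loop until done.
        while view:
            written = os.write(fd, view)
            view = view[written:]
    except BrokenPipeError:
        # Nobody is reading anymore; the output could not go anywhere anyway.
        return False
    return True
```

With an empty `data` the loop body never runs, so nothing is written and the function returns `True`.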

bpoldrack · 2022-10-07T16:16:00Z

@mih : Well, I could do that obviously, but I don't see why. log_output=False is unconditional and it should be. A possibly captured output of the SSH call is currently piped through already. Why would we want to change that eventually in favor of a delayed output at the end (same content)? It sabotages any stdin/stdout communication with the caller for no obvious benefit.

Edit: The only reason I can see for why it's "all but clear that there could not be any output to write from the code" is that the behavior of the `log_output` option to SSHConnection's `__call__` is not documented. That's certainly worth changing. Do you see another reason why it's not clear?
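
The difference between passing output through and capturing it can be illustrated with the standard library alone. This is only an analogy under stated assumptions: it uses `subprocess.run`, not DataLad's runner or its `NoCapture` protocol, and both function names are hypothetical.

```python
import subprocess
import sys


def run_without_capture(cmd):
    """Child inherits our stdout/stderr; its output appears immediately."""
    result = subprocess.run(cmd, check=False)
    # Nothing was captured, so both attributes are None.
    return result.returncode, result.stdout, result.stderr


def run_with_capture(cmd):
    """Child output is collected and only available after it exits."""
    result = subprocess.run(cmd, capture_output=True, check=False)
    return result.returncode, result.stdout, result.stderr


if __name__ == "__main__":
    cmd = [sys.executable, "-c", "print('hi')"]
    print(run_without_capture(cmd))
    print(run_with_capture(cmd))
```

In the pass-through case there is nothing left for the caller to write once the child has finished, which is the point made about `log_output=False` above.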

bpoldrack · 2022-10-07T16:47:16Z

@mih : How about this 2231ba2 ? (Will squash if you agree).
Thing is: If `out` and `err` exist, future debugging efforts can easily be set on the wrong track, assuming that there should be something in there, since the return value is stored (and the code even tries to write it out).

codecov · 2022-10-07T17:42:30Z

Codecov Report

Base: 74.76% // Head: 75.90% // Increases project coverage by +1.13% 🎉

Coverage data is based on head (2231ba2) compared to base (748e5c6).
Patch coverage: 50.00% of modified lines in pull request are covered.

❗ Current head 2231ba2 differs from pull request most recent head e0b357d. Consider uploading reports for the commit e0b357d to get more accurate results

Additional details and impacted files

@@            Coverage Diff             @@
##            maint    #7072      +/-   ##
==========================================
+ Coverage   74.76%   75.90%   +1.13%     
==========================================
  Files         354      354              
  Lines       58945    58940       -5     
  Branches     6310     6310              
==========================================
+ Hits        44072    44736     +664     
+ Misses      14858    14189     -669     
  Partials       15       15

Impacted Files	Coverage Δ
datalad/core/distributed/tests/test_clone.py	`76.31% <ø> (+0.82%)`	⬆️
datalad/support/sshrun.py	`90.24% <33.33%> (-7.26%)`	⬇️
datalad/support/sshconnector.py	`65.13% <100.00%> (-0.97%)`	⬇️
datalad/_version.py	`45.68% <0.00%> (-0.28%)`	⬇️
datalad/support/tests/test_annexrepo.py	`70.49% <0.00%> (+0.09%)`	⬆️
datalad/__init__.py	`86.00% <0.00%> (+2.50%)`	⬆️
datalad/tests/utils.py	`51.98% <0.00%> (+20.65%)`	⬆️
... and 1 more

mih · 2022-10-08T08:08:14Z

Added documentation is essential. Other than that I think the new implementation is as problematic re wrong assumptions about behavior as the one before. Your rationale is based on assertions that are not in the code.

Anyways, no need to convince me.

bpoldrack · 2022-10-08T08:44:53Z

> Your rationale is based on assertions that are not in the code.

I honestly don't understand what that assertion is. The now documented behavior of log_output=False is the implemented behavior of it and it's the entire point of the switch existing to begin with. I have no idea what "is not in the code". The switch says: "do not grab the output".

To me it's kinda the opposite. If, in six months' time, I see sshrun implementing something like this:

out, err = ssh(...)

try:
    os.write(1, out)
except OSError:
    pass

I wonder how long it will take me to realize again that, with the current implementation, there can't be anything in that out and err.

Maybe I'm not seeing something. I really don't know what it is.

WDYT, @datalad/developers ?

yarikoptic

my thinking about it -- I think we had better assert the assumptions of the code, especially if there was prior code thinking otherwise
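
Asserting the assumption explicitly could be sketched as follows. `check_nothing_captured` is a hypothetical helper and the message text is made up for illustration; an explicit `raise` is used here instead of an `assert` statement so the check is not stripped when Python runs with `-O`.

```python
def check_nothing_captured(out, err):
    """Fail loudly if a call that should not capture output returned some.

    Hypothetical helper for illustration, not taken from sshrun.py.
    """
    if out or err:
        raise AssertionError(
            "expected no captured output, got out=%r err=%r" % (out, err)
        )
```

Both `None` and empty strings pass the check, so it does not depend on which "empty" value the call happens to return.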

changelog.d/20221007_180414_benjaminpoldrack_fix_6599.md

datalad/support/sshrun.py

`datalad sshrun` explicitly calls SSH with `log_output=False`, which results in the use of the `NoCapture` protocol with the runner. This means the stdout/stderr of SSH is already written out directly. When SSH returned, `sshrun` nevertheless tried to write both to its own stdout/stderr, although it could not possibly have anything to write. That would not be an issue in and of itself, but `sshrun` is not necessarily used directly. In particular, it is called by `git` (due to `GIT_SSH_COMMAND=datalad sshrun`). This became a problem when `git` had apparently already closed the pipe to its ssh executable (`sshrun`) and we tried to write to it (although we really didn't have anything to write).

This ultimately led to issue datalad#6599, where the actual `ssh ... git-upload-pack` execution succeeded and returned 0, but `datalad sshrun` itself produced a broken pipe error when trying to write to stdout and hence returned non-zero.

It's not entirely clear when exactly this happens. When the pipe is closed may depend on the git version, as the failing builds are running 2.35.1 (macOS on AppVeyor) whereas other builds have either newer or older versions of git. In any case: There can't be anything to write out to begin with, so don't even try.

Also: Make it clear in the code that, and why, we don't expect any captured output from the SSH subprocess by not storing the empty return value, so future changes (and debuggers) don't falsely assume that

1. Output can simply be captured (with existing protocols), or
2. The returned value would currently be of any use simply b/c it's there.

(Closes datalad#6599)
(Closes datalad#7078)

yarikoptic · 2022-10-12T13:46:03Z

Thank you @bpoldrack - let's proceed!

yarikoptic-gitmate · 2022-10-14T18:45:59Z

PR released in 0.17.7

mih reviewed Oct 7, 2022

bpoldrack force-pushed the fix-6599 branch from 446405f to 4d26f92 Compare October 7, 2022 16:09

bpoldrack added the semver-patch label (Increment the patch version when merged) Oct 7, 2022

bpoldrack mentioned this pull request Oct 8, 2022

test_get_subdataset_direct_fetch is flaky on OSX on appveyor #7078

Closed

bpoldrack force-pushed the fix-6599 branch from 2231ba2 to 216bde5 Compare October 8, 2022 09:28

bpoldrack marked this pull request as ready for review October 8, 2022 09:28

yarikoptic requested changes Oct 11, 2022

changelog.d/20221007_180414_benjaminpoldrack_fix_6599.md (outdated)

datalad/support/sshrun.py (outdated)

bpoldrack force-pushed the fix-6599 branch from 216bde5 to 6a10b4f Compare October 11, 2022 14:21

bpoldrack requested a review from yarikoptic October 11, 2022 14:21

bpoldrack force-pushed the fix-6599 branch from 6a10b4f to e0b357d Compare October 11, 2022 15:00

yarikoptic merged commit 18f0bf9 into datalad:maint Oct 12, 2022

bpoldrack mentioned this pull request Oct 14, 2022

PR template: remove Changelog entries, refer to scriv #7081

Merged
