Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DOCfix: fix docstring GeneratorStdOutErrCapture to say that treats both stdout and stderr identically #6930

Merged
merged 1 commit into from
Aug 24, 2022

Conversation

yarikoptic
Copy link
Member

I am not sure though if it does capture really, although I guess it should due to
StdOutErrCapture... but that is a different story I guess. The main point is that
there is no stderr_buffer to be seen and we subclass StdOutErrCapture which
has Capture in name for both Out and Err

@christian-monch please approach this fix "critically" since I have little to no clue on how exactly all the layers of the runner work ATM so I could be entirely wrong and stderr_buffer is right there looking into my face somehow ;-)

…th stdout and stderr identically

I am not sure though if it does capture really, although I guess it should due to
StdOutErrCapture... but that is a different story I guess.  The main point is that
there is no stderr_buffer to be seen and we subclass StdOutErrCapture which
has Capture in name for both Out and Err
@yarikoptic yarikoptic added semver-documentation Changes only affect the documentation, no impact on version DX developer experience labels Aug 10, 2022
@codecov
Copy link

codecov bot commented Aug 10, 2022

Codecov Report

Merging #6930 (88586fd) into maint (6d74da1) will increase coverage by 0.97%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##            maint    #6930      +/-   ##
==========================================
+ Coverage   90.12%   91.09%   +0.97%     
==========================================
  Files         354      354              
  Lines       46250    46250              
  Branches     6597     6597              
==========================================
+ Hits        41682    42131     +449     
+ Misses       4552     4103     -449     
  Partials       16       16              
Impacted Files Coverage Δ
datalad/dataset/gitrepo.py 96.81% <ø> (ø)
datalad/downloaders/http.py 78.12% <0.00%> (-2.09%) ⬇️
datalad/_version.py 45.68% <0.00%> (-0.28%) ⬇️
datalad/tests/utils.py 65.10% <0.00%> (+11.03%) ⬆️
datalad/tests/test_tests_utils.py 92.34% <0.00%> (+92.34%) ⬆️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@yarikoptic
Copy link
Member Author

yarikoptic commented Aug 10, 2022

but then getting back to "if it does capture", looking at

            def pipe_data_received(self, fd, data):
                if fd in (1, 2):
                    self.send_result((fd, self.decode(fd, data, self.encoding)))
                else:
                    StdOutErrCapture.pipe_data_received(self, fd, data)

it seems it would capture everything but 1 and 2 in this case. if that is what is intended, then if CommandError is raised while processing output in runner/nonasyncrunner.py(273)_check_result()->None then that CommandError lacks any stdout/stderr content for analysis. That means that code outside must iterate and capture all the output by itself, and cannot rely that it would be captured in CommandError.

But likely it was not really meant to capture because there is code like https://github.com/datalad/datalad/blob/HEAD/datalad/dataset/gitrepo.py#L489 in call_git_items_ which uses git_ignore_check and wipes out stdout from the exception since None buffer is provided. So the Capture in the name might indeed not correspond to the purpose and may be name, not docstring/code to be adjusted ? Or should we fix up git_ignore_check to not wipe out a possible value present in the exception?

meanwhile I guess I would need to do that "capture output outside" like is done in the points of git_ignore_check usage... eh

yarikoptic added a commit to yarikoptic/datalad that referenced this pull request Aug 12, 2022
…present

A little more discussion at datalad#6930 and initially
identified while working on datalad#6932 and being
unable to get CommandError.stdout for matching while using call_git (but works
with _call_git using different code path, which does not have this git_ignore_check).

One of the two locations where this function is used has stdout=None, so this
lead to wiping out of e.stdout even if it was present.  I think it was not intentional
but @christian-monch would know better
@yarikoptic yarikoptic changed the title DOCfix: fix docstring GeneratorStdOutErrCapture to say that treats both stound and stderr identically DOCfix: fix docstring GeneratorStdOutErrCapture to say that treats both stdout and stderr identically Aug 18, 2022
@christian-monch
Copy link
Contributor

but then getting back to "if it does capture", looking at

            def pipe_data_received(self, fd, data):
                if fd in (1, 2):
                    self.send_result((fd, self.decode(fd, data, self.encoding)))
                else:
                    StdOutErrCapture.pipe_data_received(self, fd, data)

it seems it would capture everything but 1 and 2 in this case. if that is what is intended, then if CommandError is raised while processing output in runner/nonasyncrunner.py(273)_check_result()->None then that CommandError lacks any stdout/stderr content for analysis. That means that code outside must iterate and capture all the output by itself, and cannot rely that it would be captured in CommandError.

Let me quickly describe the dataflow in the basic protocols and the generator protocols:

All protocols

Level 0:
ThreadedRunner.process_queue() calls
runner.protocol.pipe_data_received(file_number, data)
for file_number 1 if 'protocol.proc_out' is True, for file_number 2 if 'protocol.proc_err' is True

Now there is a distinction between basic protocols and generator protocols. Let's handle the basic protocols first:

Basic protocols

Level 1 .. n:
Protocol.protocol.pipe_data_received() defines further handling

Generator protocols

Level 1:
protocol.pipe_data_received() can decide to either:

  • Call GeneratorProtocol.send_result() with the received data. That leads to yielding the result from the generator eventually.

  • Not call GeneratorProtocol.send_result(), the data will never be yielded by the generator, and might be processed differently.

datalad/dataset/gitrepo.py:305: defines a generator-protocol. All stdout and stderr data is sent to the generator queue and eventually yielded. Other file descriptors that 1 and 2 should actually not be passed to pipe_data_received, if they appear, their processing is delegated to StdOutErrCapture. This is the intended behavior (at least what I intended), stdout and stderr data is yielded as soon as possible. Consequently, they are not stored in the protocol entity and cannot be retrieved in the error case.

W.r.t. naming. The protocol might better be called GeneratorStdOutErrYielding.

The docstring is definitely wrong (there was an implementation before that captured stderr in a provided buffer and yielded stdout).

But likely it was not really meant to capture because there is code like https://github.com/datalad/datalad/blob/HEAD/datalad/dataset/gitrepo.py#L489 in call_git_items_ which uses git_ignore_check and wipes out stdout from the exception since None buffer is provided. So the Capture in the name might indeed not correspond to the purpose and may be name, not docstring/code to be adjusted ? Or should we fix up git_ignore_check to not wipe out a possible value present in the exception?

meanwhile I guess I would need to do that "capture output outside" like is done in the points of git_ignore_check usage... eh

Yes, we should do it as is done with git_ignore_check, i.e. assemble stderr buffer on the outside.

Copy link
Contributor

@christian-monch christian-monch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was an artifact from a test implementations. Thx

@yarikoptic yarikoptic merged commit 35e184a into datalad:maint Aug 24, 2022
@yarikoptic yarikoptic deleted the bf-docgen branch October 14, 2022 12:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
DX developer experience semver-documentation Changes only affect the documentation, no impact on version
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants