Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WitlessRunner() to replace gitpy.Repo.clone_from() with plain git-clone call, but keep progress reporting #4080

Merged
merged 44 commits into from Feb 7, 2020

Conversation

mih
Copy link
Member

@mih mih commented Jan 24, 2020

  • actually replace GitPython in clone()
  • implement a leaner runner that is capable of pulling out progress info from a git clone run
  • implement a progress info extractor for git-clone based on GitPython's approach, but simplified
  • bring back progress reporting for `GitRepo.clone()
  • document WitlessRunner
  • find answer to test WTF
  • show that this is sufficient for fetch/pull/push #4087 (concrete usage needs more work, but no structural limitation were identified, which was the aim within the context of this PR)
  • basic tests
  • decide whether to capture or kill all process output, instead of relaying it to the parent by default -- decided: better to relay then to hide by default.
  • works just as well on windows

@codecov
Copy link

@codecov codecov bot commented Jan 24, 2020

Codecov Report

Merging #4080 into master will decrease coverage by 0.05%.
The diff coverage is 94.84%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #4080      +/-   ##
==========================================
- Coverage   89.67%   89.61%   -0.06%     
==========================================
  Files         274      275       +1     
  Lines       36744    36997     +253     
==========================================
+ Hits        32951    33156     +205     
- Misses       3793     3841      +48
Impacted Files Coverage Δ
datalad/log.py 89.9% <ø> (ø) ⬆️
datalad/support/exceptions.py 82.45% <100%> (ø) ⬆️
datalad/support/tests/test_annexrepo.py 94.6% <100%> (-1.16%) ⬇️
datalad/core/distributed/clone.py 92.24% <100%> (-0.84%) ⬇️
datalad/cmd.py 91.49% <86.44%> (-0.8%) ⬇️
datalad/support/gitrepo.py 90.09% <96.07%> (+0.35%) ⬆️
datalad/tests/test_witless_runner.py 98.83% <98.83%> (ø)
datalad/support/tests/test_cookies.py 85.71% <0%> (-14.29%) ⬇️
datalad/support/tests/utils.py 84.61% <0%> (-11.54%) ⬇️
datalad/interface/tests/test_unlock.py 96.19% <0%> (-1.91%) ⬇️
... and 27 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e59f35c...f0b1f55. Read the comment docs.

@mih mih marked this pull request as ready for review Jan 25, 2020
@mih mih changed the title RF: Replace gitpy.Repo.clone_from() with plain git-clone call LeanRunner() to replace gitpy.Repo.clone_from() with plain git-clone call, but keep progress reporting Jan 25, 2020
@mih
Copy link
Member Author

@mih mih commented Jan 25, 2020

Peculiar test failures:

datalad.distribution.tests.test_install:test_install_subds_from_another_remote
datalad.distribution.tests.test_install:test_install_subds_with_space

[ERROR  ] Failed to clone from any candidate source URL. Encountered errors per each url were:
| - localhost:/tmp/datalad_temp_test_install_subds_with_space9hgjc1_9
  CommandError: command '['git', 'clone', '--progress', 'localhost:/tmp/datalad_temp_test_install_subds_with_space9hgjc1_9', '/tmp/datalad_temp_test_install_subds_with_spaceyjzaxonl']' failed with exitcode 128
Failed to run ['git', 'clone', '--progress', 'localhost:/tmp/datalad_temp_test_install_subds_with_space9hgjc1_9', '/tmp/datalad_temp_test_install_subds_with_spaceyjzaxonl'].
stdout=
stderr=Cloning into '/tmp/datalad_temp_test_install_subds_with_spaceyjzaxonl'...
datalad sshrun: 1: datalad: not found
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.

Don't even know why it would attempt to run ssrun... But I also see that locally.

@mih
Copy link
Member Author

@mih mih commented Jan 26, 2020

Don't even know why it would attempt to run ssrun... But I also see that locally.

Because we place ssrun as GIT_SSH_COMMAND in the environment -- and that makes a lot of sense. However, it turns out that Runner(env=) was effectively incremental, and inheriting the original environment. For LeanRunner() I thought this would limit its flexibility, hence in the clone() case we need to inherit explicitly when we aim to patch/ammend it with the SSH helper.

@mih mih added the merge-if-ok OP considers this work done, and requests review/merge label Jan 26, 2020
@mih
Copy link
Member Author

@mih mih commented Jan 26, 2020

Edit: all of this was resolved

The log monster travis run now exceeds maximum log length. Not sure how to mitigate without going against the purpose of its existence.

Otherwise all green. Will now look into using the same approach for fetch() to see if it generalizes sufficiently.

Update: Didn't go well, because I fail to make git fetch give progress info when not connected to the terminal:

% rm -rf dummy; mkdir -p dummy; git -C dummy init; git -C dummy remote add origin https://github.com/datalad/datalad.git; git -C dummy fetch --progress --all 2> fetch_log.txt && grep 'Receiving' fetch_log.txt || echo Nothing
Nothing

But it does report without --all. Hmmm...

% rm -rf dummy; mkdir -p dummy; git -C dummy init; git -C dummy remote add origin https://github.com/datalad/datalad.git; git -C dummy fetch --progress 2> fetch_log.txt && grep 'Receiving' fetch_log.txt ||echo "Nothing"
Receiving objects: 100% (66974/66974), 20.06 MiB | 4.35 MiB/s, done.

Maybe that is why gitpy doesn't support --all in fetch...

@mih
Copy link
Member Author

@mih mih commented Jan 26, 2020

This is all good now from my POV.

Copy link
Member

@yarikoptic yarikoptic left a comment

Sweet! tested locally -- seems to work. I think any "adopted" code should get some note about original copyright/license
Also note that

  • besides "testing in action" with clone, there is no single dedicated test to LeanRunner's functionality. And because of that - there is no coverage for some exception handling in the new code and the case where neither stdout/stderr handling is provided. So "surprises" might resurface later . It would be great to at least get the basic case of run without handling + the keyboardinterrupt (could patch mock subprocess.Popen to return an object which would raise KeyboardInterrupt upon .poll())
  • it calls itself Lean, but no performance comparison against regular Runner... may be it would be Bulkier but speedier (or not) -- it could be super useful to know as to guide further adoption of LeanRunner as a possible replacement for a Runner

datalad/support/gitrepo.py Show resolved Hide resolved
@mih
Copy link
Member Author

@mih mih commented Jan 27, 2020

Sweet! tested locally -- seems to work.

It also passes the tests!

They surface all the time, hence #4087

It would be great to at least get the basic case of run without handling + the keyboardinterrupt (could patch mock subprocess.Popen to return an object which would raise KeyboardInterrupt upon .poll())

I will look into porting some of the tests, once I have abandoned the plan to replace Runner() entirely with this code and use all of its tests instead .... hold my beer....

For now manual:

Traceback (most recent call last):                                                           
  File "cloneme.py", line 4, in <module>
    gr = GitRepo.clone(sys.argv[1], sys.argv[2])
  File "/home/mih/hacking/datalad/git/datalad/support/gitrepo.py", line 1083, in clone
    proc_stderr=progress,
  File "/home/mih/hacking/datalad/git/datalad/cmd.py", line 274, in run
    raise exc_info[1]
  File "/home/mih/hacking/datalad/git/datalad/cmd.py", line 207, in run
    timeout=self._poll_period,
  File "/usr/lib/python3.7/subprocess.py", line 964, in communicate
    stdout, stderr = self._communicate(input, endtime, timeout)
  File "/usr/lib/python3.7/subprocess.py", line 1715, in _communicate
    ready = selector.select(timeout)
  File "/usr/lib/python3.7/selectors.py", line 415, in select
    fd_event_list = self._selector.poll(timeout)
KeyboardInterrupt
  • it calls itself Lean, but no performance comparison against regular Runner... may be it would be Bulkier but speedier (or not) -- it could be super useful to know as to guide further adoption of LeanRunner as a possible replacement for a Runner

I will rename to LessCodeAndLessFeaturesRunner for clarity ;-)

@mih
Copy link
Member Author

@mih mih commented Jan 28, 2020

I have renamed the class to something that can no longer be mistaken for any ambition.

mih added 14 commits Jan 28, 2020
This temporarily removes progress reporting for clone operations.

Ping dataladgh-2970 dataladgh-2879 dataladgh-450
Some usage encodes these into the message to achieve the same affect,
for the price of duplication or omission from the structured report.
Would happen whenever the "update"-amount is zero, which is not
inconceivable for a progress report.
It aims to achieve the minimum necessary to drive a command, and to
process its output online (for progress reporting, etc.)
We need to inherit the rest of the environment to not cause unintended
breakage (virtual env setup etc.)
Also fix call to log() which takes substitutions as args, not as a
tuple.
To make clearer that no fixed frequency is maintained.
@mih
Copy link
Member Author

@mih mih commented Feb 5, 2020

ValueError: Invalid file object: <_io.BufferedReader name=4>

This crash is no longer happening with the current implementation. I also verified that manual input reaches SSH and a clone succeeds.

mih added 3 commits Feb 5, 2020
Too large API conflicts suggest that inheritence from WitlessRunner
is not appropriate.
In case stderr is not explicitly scanned for relevant content, this
ensures that '-l debug' would bring them up in any case.

Also classify lines starting with 'warning:' as non-progress
immediately.
@mih
Copy link
Member Author

@mih mih commented Feb 5, 2020

This might be working now :)

@mih mih requested review from yarikoptic and kyleam Feb 5, 2020
datalad/cmd.py Show resolved Hide resolved
@kyleam
Copy link
Contributor

@kyleam kyleam commented Feb 6, 2020

Guh, last night I posted to the wrong pr:

I haven't had a chance to review the updates, but I can confirm that the progress bars seem to be functioning nicely on my end now. Thanks.

@mih mih requested a review from kyleam Feb 6, 2020
datalad/cmd.py Outdated Show resolved Hide resolved
kyleam
kyleam approved these changes Feb 6, 2020
Copy link
Contributor

@kyleam kyleam left a comment

Thanks for the updates. I think this is in a good place, and the progress bars have shown up nicely for everything I've tried to clone.

@mih
Copy link
Member Author

@mih mih commented Feb 7, 2020

Thx @kyleam Will merge and continue in #4087

@mih mih merged commit 653406e into datalad:master Feb 7, 2020
17 checks passed
@mih mih deleted the rf-plain-gitclone branch Feb 7, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merge-if-ok OP considers this work done, and requests review/merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants