New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GitRepo.fetch/pull/push() without GitPython #4087
Conversation
Codecov Report
@@ Coverage Diff @@
## master #4087 +/- ##
==========================================
- Coverage 89.68% 85.49% -4.19%
==========================================
Files 275 276 +1
Lines 37084 37167 +83
==========================================
- Hits 33258 31777 -1481
- Misses 3826 5390 +1564
Continue to review full report at Codecov.
|
I haven't had a chance to review the updates, but I can confirm that the progress bars seem to be functioning nicely on my end now. Thanks. Edit: Sorry, this was supposed to go to gh-4080. |
I will stop here. The PR is already largish. Let's see what the tests will say. |
[ 2 cents from one reviewer ] I find series like gh-4080 to be unpleasant and hard to review due to having lots of fixup commits piled on top (i.e., the github-encouraged workflow where the end result that is merged is considered the unit of change, not the series itself). My opinion about this isn't new of course, and I in general don't make such comments because it conflicts with the workflow used by this project. But, given you're rebasing this PR on gh-4080 anyway, I would find it lovely if you would clean up some of the fixup commits (just at this initial stage). For example, the now non-existent LeanRunner is used instead of WitlessRunner until the 13th commit of the series. |
@kyleam Will do! Thx for the comment. |
@kyleam Condensed the history. Unfortunately, the changes were very interdependent, so I could not achieve the desired granularity without going into conflict resolution (will pay more attention to this in the future and squash while I am going). |
FTR: Drop in coverage are conditional in the Push/FetchInfo parser code that we do not exercise in our tests. |
git_options=git_options) | ||
|
||
# XXX Consider removing this method. It is only used in `update()`, | ||
# where it could be easily replaced with fetch+merge |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's no longer used in update()
after gh-4118.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
||
# commit handling, could be message or commit info | ||
old_commit = None | ||
if summary.startswith('['): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All this string matching of git's porcelain (and potentially translated) output makes me cringe, but this of course is taken from gitpython and is what already happens underneath, so we're not in any worse position. Plus, I can't think of a way to avoid it, aside from my unpopular opinion that we should relay git's output untouched.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was on the fence whether I should start arguing this way. I decided to provide this implementation to bridge the time between now and a decision that we should make.
If our approach with publish
and update
is that we make precise calls that do exactly (and only) what we need -- or -- that we just want to report to a user what was done -- I perfectly agree with you and we should stop with this business ASAP.
If, OTOH, we want to be able to "just push" and then be able to react on it at the datalad-level, we need to keep doing something like this.
This PR is helping to move towards a better publish()
, which in the future should be able to emit proper results that could be used with hooks. They would require appropriate machine-interpretable reporting.
@@ -220,7 +220,7 @@ def test_publish_plain_git(origin, src_path, dst_path): | |||
# amend and change commit msg in order to test for force push: | |||
source.repo.commit("amended", options=['--amend']) | |||
# push should be rejected (non-fast-forward): | |||
assert_raises(IncompleteResultsError, | |||
assert_raises(CommandError, | |||
publish, dataset=source, to='target', result_xfm='datasets') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change in behavior looks problematic both from a presentation standpoint, as well as a results handling standpoint. For example, consider this script that sets up a superdataset with two subdatasets, one where the push will fail.
script
#!/bin/sh
set -eux
cd "$(mktemp -d --tmpdir dl-XXXXXXX)"
datalad create origin
(
cd origin
datalad create z-fail
datalad create a-pass
datalad save --recursive
git config receive.denyCurrentBranch false
git -C a-pass config receive.denyCurrentBranch false
)
datalad install --recursive --source origin loc
(
cd loc
git -C z-fail commit --allow-empty -m c-z-fail
git -C a-pass commit --allow-empty -m c-a-pass
datalad save --recursive
datalad --on-failure=continue publish --recursive || :
git log --graph --decorate --oneline master origin/master
git -C z-fail log --graph --decorate --oneline master origin/master
git -C a-pass log --graph --decorate --oneline master origin/master
)
Before (specifically on 7888e04), datalad --on-failure=continue publish --recursive
would show this output:
[...]
+ datalad --on-failure=continue publish --recursive
[INFO] Publishing <Dataset path=/tmp/dl-FqwYUhV/loc/z-fail> to origin
[ERROR] failed to push to origin: master -> origin/master [remote rejected] (branch is currently checked out); pushed: ['58337eb..641a3f5'] [publish(/tmp/dl-FqwYUhV/loc/z-fail)]
[INFO] Publishing <Dataset path=/tmp/dl-FqwYUhV/loc/a-pass> to origin
[INFO] Publishing <Dataset path=/tmp/dl-FqwYUhV/loc> to origin
publish(error): z-fail (dataset) [failed to push to origin: master -> origin/master [remote rejected] (branch is currently checked out); pushed: ['58337eb..641a3f5']]
publish(ok): a-pass (dataset) [pushed to origin: ['1260c8d..31794ac', 'f1e353f..9d1579c']]
publish(ok): . (dataset) [pushed to origin: ['30562bc..eb62938', '8d86f4f..4a88467']]
action summary:
publish (error: 1, ok: 2)
+ :
+ git log --graph --decorate --oneline master origin/master
* 4a88467 (HEAD -> master, origin/master, origin/HEAD, synced/master) [DATALAD] Recorded changes
* 8d86f4f [DATALAD] Recorded changes
* 0e32a6d [DATALAD] new dataset
+ git -C z-fail log --graph --decorate --oneline master origin/master
* cace822 (HEAD -> master, synced/master) c-z-fail
* ccb080f (origin/master, origin/HEAD) [DATALAD] new dataset
+ git -C a-pass log --graph --decorate --oneline master origin/master
* 9d1579c (HEAD -> master, origin/master, origin/HEAD, synced/master) c-a-pass
* f1e353f [DATALAD] new dataset
It displayed a result error for the failing subdataset, and, due to the --on-failure=continue
, it published all the subdatasets that it could.
Now it dumps the command error and doesn't honor --on-failure=continue
:
[...]
+ datalad --on-failure=continue publish --recursive
[INFO] Publishing <Dataset path=/tmp/dl-yQVMqHj/loc/z-fail> to origin
[INFO] Start fetching remotes for <AnnexRepo path=/tmp/dl-yQVMqHj/loc/z-fail (<class 'datalad.support.annexrepo.AnnexRepo'>)>
[... 36 lines ...]
[INFO] Finished writing objects
[INFO] Finished pushing remotes for <AnnexRepo path=/tmp/dl-yQVMqHj/loc/z-fail (<class 'datalad.support.annexrepo.AnnexRepo'>)>
CommandError: command '['git', 'push', '--progress', '--porcelain', 'origin', 'master', 'git-annex']' failed with exitcode 1
Failed to run ['git', 'push', '--progress', '--porcelain', 'origin', 'master', 'git-annex'] at '/tmp/dl-yQVMqHj/loc/z-fail'.
stdout=To /tmp/dl-yQVMqHj/origin/z-fail
refs/heads/git-annex:refs/heads/git-annex 1cfffab..2657d54
! refs/heads/master:refs/heads/master [remote rejected] (branch is currently checked out)
[... 18 lines ...]
To /tmp/dl-yQVMqHj/origin/z-fail
refs/heads/git-annex:refs/heads/git-annex 1cfffab..2657d54
! refs/heads/master:refs/heads/master [remote rejected] (branch is currently checked out)
[... 16 lines ...]
error: failed to push some refs to '/tmp/dl-yQVMqHj/origin/z-fail'
+ :
+ git log --graph --decorate --oneline master origin/master
* a8bbb28 (HEAD -> master) [DATALAD] Recorded changes
* c016fc6 (origin/master, origin/HEAD) [DATALAD] Recorded changes
* cda0d7e [DATALAD] new dataset
+ git -C z-fail log --graph --decorate --oneline master origin/master
* f560fa2 (HEAD -> master, synced/master) c-z-fail
* 7f41170 (origin/master, origin/HEAD) [DATALAD] new dataset
+ git -C a-pass log --graph --decorate --oneline master origin/master
* 7f12cdd (HEAD -> master) c-a-pass
* ab068e5 (origin/master, origin/HEAD) [DATALAD] new dataset
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. I need to reconsider the error reporting. Thx!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, and pushed. be67cc3
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The outcome of the script I posted above is the same after 57cc112 (BF: Restore previous behavior on push errors, 2020-02-13). I haven't really looked into it, but I'd guess that is because a CommandError is raised unless a specific error is matched. When I initially raised the issue, I had assumed the publish._push
would need to catch the CommandError's in order to retain the previous behavior.
Following established conventions, GitRepo.fetch/pull/push_() yields, while GitRepo.fetch/pull/push() returns a list. Only the latter retains the ability to accept GitPython-style kwargs. Logic stays largely the same. Differences are: - no longer returns "fetch/push_info" structures, which are internal to GitPython, instead - use dict filled by parser based on a simplified version of GitPython's parser - no resolving/decoding against repository, just a straight-up parsing of the output - use of literal string labels (like elsewhere in DataLad), instead of bitwise codes - additional progressbar wrapping the fetch/push of multiple remotes - homogenize behavior and args across methods - disable test for presence of undesired GitPython internals - deprecated GitPython-style kwargs, instead of a call_git...()-like list of options, but kept functional. - no introspection of options, sanity-checks, workarounds are left to a caller.
OK, this seems to be complete. Unless there are objections or further requests, I'd merge this and move on to the next task towards 0.12.3 |
travis is not entirely happy. nose output also became noisy (is that all non-captured stderr since stdout should be captured by nose without -s):
here is a lesser noisiness of a recent master build for comparison1748.10s$ http_proxy= PATH=$PWD/../tools/coverage-bin:$PATH $NOSE_WRAPPER `which nosetests` $NOSE_OPTS -A "$NOSE_SELECTION_OP($NOSE_SELECTION)" --with-doctest --with-cov --cover-package datalad --logging-level=INFO $TESTS_TO_PERFORM
...............
.Traceback (most recent call last):
File "/home/travis/virtualenv/python3.7.1/lib/python3.7/site-packages/datalad/resources/procedures/cfg_yoda.py", line 39, in <module>
[s['path'] for s in dirty]
RuntimeError: Stopping, because to be modified dataset content was found dirty: ['/tmp/datalad_temp_tree_test_dirtyg188_9uf/README.md']
.....................................S......S.....S.........S...................................................SS...................S.SSSSSS..........................................S..S............S..S.............................................S..............................ssh: Could not resolve hostname localhost: Name or service not known
...CommandError: command '['ssh', '-o', 'ControlPath=/home/travis/.cache/datalad/sockets/d5d79049', 'localhost', 'export "PATH=/usr/lib/git-annex.linux:$PATH"; exit 42']' failed with exitcode 42
Failed to run ['ssh', '-o', 'ControlPath=/home/travis/.cache/datalad/sockets/d5d79049', 'localhost', 'export "PATH=/usr/lib/git-annex.linux:$PATH"; exit 42'] under None. Exit code=42.
stdout=
stderr=
.......S...........................................................................S.SSSS........................................S...S................................................S.................
----------------------------------------------------------------------
Ran 1 test in 0.004s
OK
.......SS.....................................................SS..S..............................................SS...S........S....... |
I dunno about the tests, but the actual interactive use looks as desired:
Test is fixed, should have been reverted to previous behavior of reporting an error instead of raising an exception -- like @kyleam had pointed out in #4087 (comment) |
Make uniform with fetch() too.
The stdout leakage happened in |
Alrighty, here it comes... |
Sit on top of #4080 to show that implementation can generalize, but is kept as a separate PR, because discussions might be needed.
Logic stays largely the same. Differences are:
GitPython -- unclear of a replacement is actually needed. Will be done
once it becomes necessary.
needed, or if all should be attempted
TODO
fetch
,pull
, andpush
(do at the end, when implementations are known to be complete), likely introduce a common helper method