Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GitRepo.fetch/pull/push() without GitPython #4087

Merged
merged 8 commits into from Feb 14, 2020
Merged

Conversation

mih
Copy link
Member

@mih mih commented Jan 26, 2020

Sit on top of #4080 to show that implementation can generalize, but is kept as a separate PR, because discussions might be needed.

Logic stays largely the same. Differences are:

  • no longer returns "fetch_info" structures, which are internal to
    GitPython -- unclear of a replacement is actually needed. Will be done
    once it becomes necessary.
  • additional progressbar wrapping the fetch of multiple remotes
  • kwargs are ignored for the moment. Needs discussion on which ones are
    needed, or if all should be attempted

TODO

  • needs SSH connection sharing setup
  • deduplicate code between fetch, pull, and push (do at the end, when implementations are known to be complete), likely introduce a common helper method
  • add support for arbitrary kwargs for fetch, pull, and push
  • needs result reporting(GitPython *_info replacement) -- current code base only uses it for error reporting.
    • for fetch
    • for push

@mih mih added this to In progress in Refurbish `publish` via automation Jan 26, 2020
@mih mih changed the title GitRepo.fetch() without GitPython GitRepo.fetch/pull() without GitPython Jan 26, 2020
@codecov
Copy link

@codecov codecov bot commented Jan 26, 2020

Codecov Report

Merging #4087 into master will decrease coverage by 4.18%.
The diff coverage is 88.31%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #4087      +/-   ##
==========================================
- Coverage   89.68%   85.49%   -4.19%     
==========================================
  Files         275      276       +1     
  Lines       37084    37167      +83     
==========================================
- Hits        33258    31777    -1481     
- Misses       3826     5390    +1564
Impacted Files Coverage Δ
datalad/core/distributed/clone.py 91.3% <ø> (+0.64%) ⬆️
datalad/distribution/publish.py 84.26% <100%> (-3.5%) ⬇️
datalad/support/tests/test_gitrepo.py 99.45% <100%> (-0.33%) ⬇️
datalad/support/gitrepo.py 90.44% <87.5%> (+0.91%) ⬆️
datalad/tests/utils_testdatasets.py 11.76% <0%> (-88.24%) ⬇️
datalad/interface/tests/test_rerun_merges.py 18.24% <0%> (-81.76%) ⬇️
datalad/interface/tests/test_annotate_paths.py 35.5% <0%> (-64.5%) ⬇️
datalad/plugin/tests/test_addurls.py 58.97% <0%> (-41.03%) ⬇️
datalad/metadata/tests/test_base.py 61.11% <0%> (-38.2%) ⬇️
datalad/distribution/tests/test_get.py 62.5% <0%> (-37.5%) ⬇️
... and 40 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 217f977...95dd723. Read the comment docs.

@mih mih changed the title GitRepo.fetch/pull() without GitPython GitRepo.fetch/pull/push() without GitPython Jan 26, 2020
@mih mih changed the title GitRepo.fetch/pull/push() without GitPython [WIP] GitRepo.fetch/pull/push() without GitPython Jan 26, 2020
@kyleam
Copy link
Contributor

@kyleam kyleam commented Feb 6, 2020

I haven't had a chance to review the updates, but I can confirm that the progress bars seem to be functioning nicely on my end now. Thanks.

Edit: Sorry, this was supposed to go to gh-4080.

@mih mih added this to the 0.12.3 milestone Feb 7, 2020
@mih
Copy link
Member Author

@mih mih commented Feb 7, 2020

I will stop here. The PR is already largish. Let's see what the tests will say.

@mih mih changed the title [WIP] GitRepo.fetch/pull/push() without GitPython GitRepo.fetch/pull/push() without GitPython Feb 7, 2020
@mih mih added merge-if-ok OP considers this work done, and requests review/merge and removed feedback-desired labels Feb 7, 2020
@kyleam
Copy link
Contributor

@kyleam kyleam commented Feb 7, 2020

[ 2 cents from one reviewer ] I find series like gh-4080 to be unpleasant and hard to review due to having lots of fixup commits piled on top (i.e., the github-encouraged workflow where the end result that is merged is considered the unit of change, not the series itself). My opinion about this isn't new of course, and I in general don't make such comments because it conflicts with the workflow used by this project.

But, given you're rebasing this PR on gh-4080 anyway, I would find it lovely if you would clean up some of the fixup commits (just at this initial stage). For example, the now non-existent LeanRunner is used instead of WitlessRunner until the 13th commit of the series.

@mih
Copy link
Member Author

@mih mih commented Feb 7, 2020

@kyleam Will do! Thx for the comment.

@mih
Copy link
Member Author

@mih mih commented Feb 7, 2020

@kyleam Condensed the history. Unfortunately, the changes were very interdependent, so I could not achieve the desired granularity without going into conflict resolution (will pay more attention to this in the future and squash while I am going).

@mih
Copy link
Member Author

@mih mih commented Feb 8, 2020

FTR: Drop in coverage are conditional in the Push/FetchInfo parser code that we do not exercise in our tests.

git_options=git_options)

# XXX Consider removing this method. It is only used in `update()`,
# where it could be easily replaced with fetch+merge
Copy link
Contributor

@kyleam kyleam Feb 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's no longer used in update() after gh-4118.

Copy link
Member Author

@mih mih Feb 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🥇

Copy link
Member Author

@mih mih Feb 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


# commit handling, could be message or commit info
old_commit = None
if summary.startswith('['):
Copy link
Contributor

@kyleam kyleam Feb 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All this string matching of git's porcelain (and potentially translated) output makes me cringe, but this of course is taken from gitpython and is what already happens underneath, so we're not in any worse position. Plus, I can't think of a way to avoid it, aside from my unpopular opinion that we should relay git's output untouched.

Copy link
Member Author

@mih mih Feb 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was on the fence whether I should start arguing this way. I decided to provide this implementation to bridge the time between now and a decision that we should make.

If our approach with publish and update is that we make precise calls that do exactly (and only) what we need -- or -- that we just want to report to a user what was done -- I perfectly agree with you and we should stop with this business ASAP.

If, OTOH, we want to be able to "just push" and then be able to react on it at the datalad-level, we need to keep doing something like this.

This PR is helping to move towards a better publish(), which in the future should be able to emit proper results that could be used with hooks. They would require appropriate machine-interpretable reporting.

@mih mih mentioned this pull request Feb 11, 2020
@@ -220,7 +220,7 @@ def test_publish_plain_git(origin, src_path, dst_path):
# amend and change commit msg in order to test for force push:
source.repo.commit("amended", options=['--amend'])
# push should be rejected (non-fast-forward):
assert_raises(IncompleteResultsError,
assert_raises(CommandError,
publish, dataset=source, to='target', result_xfm='datasets')
Copy link
Contributor

@kyleam kyleam Feb 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change in behavior looks problematic both from a presentation standpoint, as well as a results handling standpoint. For example, consider this script that sets up a superdataset with two subdatasets, one where the push will fail.

script
#!/bin/sh

set -eux

cd "$(mktemp -d --tmpdir dl-XXXXXXX)"
datalad create origin
(
    cd origin
    datalad create z-fail
    datalad create a-pass
    datalad save --recursive
    git config receive.denyCurrentBranch false
    git -C a-pass config receive.denyCurrentBranch false
)

datalad install --recursive --source origin loc

(
    cd loc
    git -C z-fail commit --allow-empty -m c-z-fail
    git -C a-pass commit --allow-empty -m c-a-pass
    datalad save --recursive
    datalad --on-failure=continue publish --recursive || :
    git log --graph --decorate --oneline master origin/master
    git -C z-fail log --graph --decorate --oneline master origin/master
    git -C a-pass log --graph --decorate --oneline master origin/master
)

Before (specifically on 7888e04), datalad --on-failure=continue publish --recursive would show this output:

[...]
+ datalad --on-failure=continue publish --recursive
[INFO] Publishing <Dataset path=/tmp/dl-FqwYUhV/loc/z-fail> to origin 
[ERROR] failed to push to origin: master -> origin/master [remote rejected] (branch is currently checked out); pushed: ['58337eb..641a3f5'] [publish(/tmp/dl-FqwYUhV/loc/z-fail)] 
[INFO] Publishing <Dataset path=/tmp/dl-FqwYUhV/loc/a-pass> to origin 
[INFO] Publishing <Dataset path=/tmp/dl-FqwYUhV/loc> to origin 
publish(error): z-fail (dataset) [failed to push to origin: master -> origin/master [remote rejected] (branch is currently checked out); pushed: ['58337eb..641a3f5']]
publish(ok): a-pass (dataset) [pushed to origin: ['1260c8d..31794ac', 'f1e353f..9d1579c']]
publish(ok): . (dataset) [pushed to origin: ['30562bc..eb62938', '8d86f4f..4a88467']]
action summary:
  publish (error: 1, ok: 2)
+ :
+ git log --graph --decorate --oneline master origin/master
* 4a88467 (HEAD -> master, origin/master, origin/HEAD, synced/master) [DATALAD] Recorded changes
* 8d86f4f [DATALAD] Recorded changes
* 0e32a6d [DATALAD] new dataset
+ git -C z-fail log --graph --decorate --oneline master origin/master
* cace822 (HEAD -> master, synced/master) c-z-fail
* ccb080f (origin/master, origin/HEAD) [DATALAD] new dataset
+ git -C a-pass log --graph --decorate --oneline master origin/master
* 9d1579c (HEAD -> master, origin/master, origin/HEAD, synced/master) c-a-pass
* f1e353f [DATALAD] new dataset

It displayed a result error for the failing subdataset, and, due to the --on-failure=continue, it published all the subdatasets that it could.

Now it dumps the command error and doesn't honor --on-failure=continue:

[...]
+ datalad --on-failure=continue publish --recursive
[INFO] Publishing <Dataset path=/tmp/dl-yQVMqHj/loc/z-fail> to origin 
[INFO] Start fetching remotes for <AnnexRepo path=/tmp/dl-yQVMqHj/loc/z-fail (<class 'datalad.support.annexrepo.AnnexRepo'>)> 
[... 36 lines ...]
[INFO] Finished writing objects 
[INFO] Finished pushing remotes for <AnnexRepo path=/tmp/dl-yQVMqHj/loc/z-fail (<class 'datalad.support.annexrepo.AnnexRepo'>)> 
CommandError: command '['git', 'push', '--progress', '--porcelain', 'origin', 'master', 'git-annex']' failed with exitcode 1
Failed to run ['git', 'push', '--progress', '--porcelain', 'origin', 'master', 'git-annex'] at '/tmp/dl-yQVMqHj/loc/z-fail'.
stdout=To /tmp/dl-yQVMqHj/origin/z-fail
 	refs/heads/git-annex:refs/heads/git-annex	1cfffab..2657d54
!	refs/heads/master:refs/heads/master	[remote rejected] (branch is currently checked out)
[... 18 lines ...]
To /tmp/dl-yQVMqHj/origin/z-fail
 	refs/heads/git-annex:refs/heads/git-annex	1cfffab..2657d54
!	refs/heads/master:refs/heads/master	[remote rejected] (branch is currently checked out)
[... 16 lines ...]
error: failed to push some refs to '/tmp/dl-yQVMqHj/origin/z-fail'
+ :
+ git log --graph --decorate --oneline master origin/master
* a8bbb28 (HEAD -> master) [DATALAD] Recorded changes
* c016fc6 (origin/master, origin/HEAD) [DATALAD] Recorded changes
* cda0d7e [DATALAD] new dataset
+ git -C z-fail log --graph --decorate --oneline master origin/master
* f560fa2 (HEAD -> master, synced/master) c-z-fail
* 7f41170 (origin/master, origin/HEAD) [DATALAD] new dataset
+ git -C a-pass log --graph --decorate --oneline master origin/master
* 7f12cdd (HEAD -> master) c-a-pass
* ab068e5 (origin/master, origin/HEAD) [DATALAD] new dataset

Copy link
Member Author

@mih mih Feb 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I need to reconsider the error reporting. Thx!

Copy link
Member Author

@mih mih Feb 13, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, and pushed. be67cc3

Copy link
Contributor

@kyleam kyleam Feb 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The outcome of the script I posted above is the same after 57cc112 (BF: Restore previous behavior on push errors, 2020-02-13). I haven't really looked into it, but I'd guess that is because a CommandError is raised unless a specific error is matched. When I initially raised the issue, I had assumed the publish._push would need to catch the CommandError's in order to retain the previous behavior.

mih added 5 commits Feb 13, 2020
Following established conventions, GitRepo.fetch/pull/push_() yields,
while GitRepo.fetch/pull/push() returns a list. Only the latter retains
the ability to accept GitPython-style kwargs.

Logic stays largely the same. Differences are:

- no longer returns "fetch/push_info" structures, which are internal to
  GitPython, instead
  - use dict filled by parser based on a simplified version of GitPython's
    parser
  - no resolving/decoding against repository, just a straight-up parsing
    of the output
  - use of literal string labels (like elsewhere in DataLad), instead of
    bitwise codes
- additional progressbar wrapping the fetch/push of multiple remotes
- homogenize behavior and args across methods
- disable test for presence of undesired GitPython internals
- deprecated GitPython-style kwargs, instead of a call_git...()-like
  list of options, but kept functional.
- no introspection of options, sanity-checks, workarounds are left
  to a caller.
@mih
Copy link
Member Author

@mih mih commented Feb 13, 2020

OK, this seems to be complete. Unless there are objections or further requests, I'd merge this and move on to the next task towards 0.12.3

@yarikoptic
Copy link
Member

@yarikoptic yarikoptic commented Feb 13, 2020

travis is not entirely happy. nose output also became noisy (is that all non-captured stderr since stdout should be captured by nose without -s):

2130.91s$ http_proxy= PATH=$PWD/../tools/coverage-bin:$PATH $NOSE_WRAPPER `which nosetests` $NOSE_OPTS -A "$NOSE_SELECTION_OP($NOSE_SELECTION)" --with-doctest --with-cov --cover-package datalad --logging-level=INFO $TESTS_TO_PERFORM
...............
...................................................S........S..............................................................Updating 444c732..c085d68
Fast-forward
 .gitmodules | 5 +++++
 subds1      | 1 +
 2 files changed, 6 insertions(+)
 create mode 100644 .gitmodules
 create mode 160000 subds1
......................................S...............................Updating afb8153..f86a22a
Fast-forward
 file.dat | 1 +
 1 file changed, 1 insertion(+)
 create mode 100644 file.dat
.Traceback (most recent call last):
  File "/home/travis/virtualenv/python3.7.1/lib/python3.7/site-packages/datalad/resources/procedures/cfg_yoda.py", line 39, in <module>
    [s['path'] for s in dirty]
RuntimeError: Stopping, because to be modified dataset content was found dirty: ['/tmp/datalad_temp_tree_test_dirty07ptwl_a/README.md']
.....................................S......S.....S.........S...................................................SS...................S.SSSSSS...........................F..............S..S............S..S.......Updating b74ece9..4037a37
Fast-forward
 " \"';a&b&c\316\224\320\231\327\247\331\205\340\271\227\343\201\202 `| " | 1 +
 1 file changed, 1 insertion(+)
 create mode 100644 " \"';a&b&c\316\224\320\231\327\247\331\205\340\271\227\343\201\202 `| "
Updating 273992f..4d4eba8
Fast-forward
 " \"';a&b&c\316\224\320\231\327\247\331\205\340\271\227\343\201\202 `| " | 1 +
 1 file changed, 1 insertion(+)
 create mode 100644 " \"';a&b&c\316\224\320\231\327\247\331\205\340\271\227\343\201\202 `| "
Updating cdb3e32..5a2efe2
Fast-forward
 " \"';a&b&c\316\224\320\231\327\247\331\205\340\271\227\343\201\202 `| " | 1 +
 1 file changed, 1 insertion(+)
 create mode 100644 " \"';a&b&c\316\224\320\231\327\247\331\205\340\271\227\343\201\202 `| "
Updating 8eab75b..7418535
Fast-forward
 " \"';a&b&c\316\224\320\231\327\247\331\205\340\271\227\343\201\202 `| " | 1 +
 1 file changed, 1 insertion(+)
 create mode 100644 " \"';a&b&c\316\224\320\231\327\247\331\205\340\271\227\343\201\202 `| "
Updating d7acdca..da3d73c
Fast-forward
 " \"';a&b&c\316\224\320\231\327\247\331\205\340\271\227\343\201\202 `| " | 1 +
 1 file changed, 1 insertion(+)
 create mode 100644 " \"';a&b&c\316\224\320\231\327\247\331\205\340\271\227\343\201\202 `| "
......................................S..............................ssh: Could not resolve hostname localhost: Name or service not known
...CommandError: command '['ssh', '-o', 'ControlPath=/home/travis/.cache/datalad/sockets/4f9e64b9', 'localhost', 'export "PATH=/usr/lib/git-annex.linux:$PATH"; exit 42']' failed with exitcode 42
Failed to run ['ssh', '-o', 'ControlPath=/home/travis/.cache/datalad/sockets/4f9e64b9', 'localhost', 'export "PATH=/usr/lib/git-annex.linux:$PATH"; exit 42'] under None. Exit code=42.
stdout=
stderr=
.......S...........................................................................S.SSSS........................................S...S................................................S.................
----------------------------------------------------------------------
Ran 1 test in 0.004s
OK
.......SS.....................................................SS..S..............................................SS...S........S.......
======================================================================
FAIL: datalad.support.tests.test_gitrepo.test_GitRepo_ssh_push
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/virtualenv/python3.7.1/lib/python3.7/site-packages/nose/case.py", line 198, in runTest
    self.test(*self.arg)
  File "/home/travis/virtualenv/python3.7.1/lib/python3.7/site-packages/datalad/tests/utils.py", line 161, in newfunc
    return func(*args, **kwargs)
  File "/home/travis/virtualenv/python3.7.1/lib/python3.7/site-packages/datalad/tests/utils.py", line 213, in newfunc
    return func(*args, **kwargs)
  File "/home/travis/virtualenv/python3.7.1/lib/python3.7/site-packages/datalad/tests/utils.py", line 732, in newfunc
    return t(*(arg + (filename,)), **kw)
  File "/home/travis/virtualenv/python3.7.1/lib/python3.7/site-packages/datalad/tests/utils.py", line 732, in newfunc
    return t(*(arg + (filename,)), **kw)
  File "/home/travis/virtualenv/python3.7.1/lib/python3.7/site-packages/datalad/support/tests/test_gitrepo.py", line 651, in test_GitRepo_ssh_push
    assert_raises(CommandError, repo.push, remote="ssh-remote", refspec="ssh-test")
AssertionError: CommandError not raised by push
here is a lesser noisiness of a recent master build for comparison

https://travis-ci.org/datalad/datalad/jobs/649863150?utm_medium=notification&utm_source=github_status

1748.10s$ http_proxy= PATH=$PWD/../tools/coverage-bin:$PATH $NOSE_WRAPPER `which nosetests` $NOSE_OPTS -A "$NOSE_SELECTION_OP($NOSE_SELECTION)" --with-doctest --with-cov --cover-package datalad --logging-level=INFO $TESTS_TO_PERFORM
...............
.Traceback (most recent call last):
  File "/home/travis/virtualenv/python3.7.1/lib/python3.7/site-packages/datalad/resources/procedures/cfg_yoda.py", line 39, in <module>
    [s['path'] for s in dirty]
RuntimeError: Stopping, because to be modified dataset content was found dirty: ['/tmp/datalad_temp_tree_test_dirtyg188_9uf/README.md']
.....................................S......S.....S.........S...................................................SS...................S.SSSSSS..........................................S..S............S..S.............................................S..............................ssh: Could not resolve hostname localhost: Name or service not known
...CommandError: command '['ssh', '-o', 'ControlPath=/home/travis/.cache/datalad/sockets/d5d79049', 'localhost', 'export "PATH=/usr/lib/git-annex.linux:$PATH"; exit 42']' failed with exitcode 42
Failed to run ['ssh', '-o', 'ControlPath=/home/travis/.cache/datalad/sockets/d5d79049', 'localhost', 'export "PATH=/usr/lib/git-annex.linux:$PATH"; exit 42'] under None. Exit code=42.
stdout=
stderr=
.......S...........................................................................S.SSSS........................................S...S................................................S.................
----------------------------------------------------------------------
Ran 1 test in 0.004s
OK
.......SS.....................................................SS..S..............................................SS...S........S.......

@mih
Copy link
Member Author

@mih mih commented Feb 14, 2020

I dunno about the tests, but the actual interactive use looks as desired:

% rm -rf dummy; mkdir -p dummy/src; mkdir dummy/target; git -C dummy/src init; git -C dummy/src remote add origin https://github.com/datalad/datalad.git; datalad -C dummy/src update -s origin --merge; git -C dummy/target init --bare; git -C dummy/src remote add pushtest localhost:$(readlink -f dummy/target); datalad -C dummy/src publish --to pushtest

Initialized empty Git repository in /tmp/dummy/src/dummy/src/.git/
[INFO   ] Fetching updates for <Dataset path=/tmp/dummy/src/dummy/src> 
[INFO   ] Applying updates to <Dataset path=/tmp/dummy/src/dummy/src>                                                        
update(ok): . (dataset)                                                                                                      
datalad -C dummy/src update -s origin --merge  5.68s user 0.78s system 62% cpu 10.291 total
Initialized empty Git repository in /tmp/dummy/src/dummy/target/
[INFO   ] Publishing <Dataset path=/tmp/dummy/src/dummy/src> to pushtest                                                     
publish(ok): . (dataset) [pushed to pushtest: ['[new branch]']]                                                              

Test is fixed, should have been reverted to previous behavior of reporting an error instead of raising an exception -- like @kyleam had pointed out in #4087 (comment)

Make uniform with fetch() too.
@mih
Copy link
Member Author

@mih mih commented Feb 14, 2020

The stdout leakage happened in GitRepo.pull(). I fixed it, although it is unused after gh-4118 and is likely going to be removed soon.

@mih
Copy link
Member Author

@mih mih commented Feb 14, 2020

Alrighty, here it comes...

@mih mih merged commit 2e179b5 into datalad:master Feb 14, 2020
15 of 17 checks passed
Refurbish `publish` automation moved this from In progress to Done Feb 14, 2020
@mih mih deleted the rf-fetch branch Feb 14, 2020
@mih mih mentioned this pull request Feb 14, 2020
15 tasks
@yarikoptic yarikoptic modified the milestones: 0.12.3, 0.12.x Feb 27, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merge-if-ok OP considers this work done, and requests review/merge
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

None yet

3 participants