New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BF/RF: GitRepo.push -- push first refspec separately first for github (or if remote.{remote}.datalad-push-default-first is set) #5010
Conversation
FTR: if I read this correctly, it will turn one push into two if more than one refspec needs pushing. This will double the effort for remote helpers like the OSF one (or the googledrive one), and essentially require two uploads of the entire repo per push. If that is true, it is expensive and better be worth the investment. |
Can you provide an example that triggers the undesired result? I tried with a range of repositories and failed create the situation that this change tries to prevent. |
I am not sure what other "examples" to provide beyond what I have provided (and linked to the original report by @jwodder, so I am not special really) in #5010 . FWIW: I thought that it might be somehow git version dependent, since that is the most relevant piece here. So I have tried a few (if we are still using the bundled with git annex by default, I have tried also with full dump of "examples"$> datalad wtf -S dependencies | grep git
- cmd:annex: 8.20200810+git143-gdee38c54d-1~ndall+1
- cmd:bundled-git: 2.24.0
- cmd:git: 2.24.0
- cmd:system-git: 2.27.0
- git: 3.1.0
- gitdb: 4.0.2
cached/staged changes:
submodules.txt | 104 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
(dev3) 1 28527.....................................:Sat 10 Oct 2020 07:09:22 PM EDT:.
(git)lena:~datalad/trash[master]git
$> DATALAD_USE_DEFAULT_GIT=1 datalad wtf -S dependencies | grep git
- cmd:annex: 8.20200810+git143-gdee38c54d-1~ndall+1
- cmd:bundled-git: 2.24.0
- cmd:git: 2.27.0
- cmd:system-git: 2.27.0
- git: 3.1.0
- gitdb: 4.0.2
cached/staged changes:
submodules.txt | 104 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
(dev3) 1 28528.....................................:Sat 10 Oct 2020 07:09:28 PM EDT:.
(git)lena:~datalad/trash[master]git
$> bash gh-default-branch
> set -eu
>> mktemp -d /home/yoh/.tmp/dl-XXXXXXX
> cd /home/yoh/.tmp/dl-inv8GCu
> datalad create testgh
[INFO ] Creating a new annex repo at /home/yoh/.tmp/dl-inv8GCu/testgh
[INFO ] Scanning for unlocked files (this may take some time)
create(ok): /home/yoh/.tmp/dl-inv8GCu/testgh (dataset)
> cd testgh
> datalad create-sibling-github -s github --existing replace testgh
repository "testgh" already exists on GitHub.
Do you really want to remove it? (choices: yes, [no]): yes
[WARNING] Authentication failed using a token.
Do you really want to remove it? (choices: yes, [no]): yes
.: github(-) [https://github.com/yarikoptic/testgh.git (git)]
'https://github.com/yarikoptic/testgh.git' configured as sibling 'github' for Dataset(/home/yoh/.tmp/dl-inv8GCu/testgh)
> datalad push --to github
publish(ok): /home/yoh/.tmp/dl-inv8GCu/testgh (dataset) [refs/heads/master->github:refs/heads/master [new branch]]
publish(ok): /home/yoh/.tmp/dl-inv8GCu/testgh (dataset) [refs/heads/git-annex->github:refs/heads/git-annex [new branch]]
> datalad clone git://github.com/yarikoptic/testgh ../dest
[INFO ] Scanning for unlocked files (this may take some time)
[INFO ] Remote origin uses a protocol not supported by git-annex; setting annex-ignore
install(ok): /home/yoh/.tmp/dl-inv8GCu/dest (dataset)
> git -C ../dest branch -a
* git-annex
remotes/origin/HEAD -> origin/git-annex
remotes/origin/git-annex
remotes/origin/master
bash gh-default-branch 9.84s user 4.00s system 53% cpu 25.639 total
cached/staged changes:
submodules.txt | 104 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
(dev3) 1 28529.....................................:Sat 10 Oct 2020 07:10:02 PM EDT:.
(git)lena:~datalad/trash[master]git
$> DATALAD_USE_DEFAULT_GIT=1 bash gh-default-branch
> set -eu
>> mktemp -d /home/yoh/.tmp/dl-XXXXXXX
> cd /home/yoh/.tmp/dl-0rQM0dS
> datalad create testgh
[INFO ] Creating a new annex repo at /home/yoh/.tmp/dl-0rQM0dS/testgh
[INFO ] Scanning for unlocked files (this may take some time)
create(ok): /home/yoh/.tmp/dl-0rQM0dS/testgh (dataset)
> cd testgh
> datalad create-sibling-github -s github --existing replace testgh
repository "testgh" already exists on GitHub.
Do you really want to remove it? (choices: yes, [no]): yes
[WARNING] Authentication failed using a token.
Do you really want to remove it? (choices: yes, [no]): yes
.: github(-) [https://github.com/yarikoptic/testgh.git (git)]
'https://github.com/yarikoptic/testgh.git' configured as sibling 'github' for Dataset(/home/yoh/.tmp/dl-0rQM0dS/testgh)
> datalad push --to github
publish(ok): /home/yoh/.tmp/dl-0rQM0dS/testgh (dataset) [refs/heads/master->github:refs/heads/master [new branch]]
publish(ok): /home/yoh/.tmp/dl-0rQM0dS/testgh (dataset) [refs/heads/git-annex->github:refs/heads/git-annex [new branch]]
> datalad clone git://github.com/yarikoptic/testgh ../dest
[INFO ] Scanning for unlocked files (this may take some time)
[INFO ] Remote origin uses a protocol not supported by git-annex; setting annex-ignore
install(ok): /home/yoh/.tmp/dl-0rQM0dS/dest (dataset)
> git -C ../dest branch -a
* git-annex
remotes/origin/HEAD -> origin/git-annex
remotes/origin/git-annex
remotes/origin/master
DATALAD_USE_DEFAULT_GIT=1 bash gh-default-branch 7.74s user 2.98s system 49% cpu 21.628 total
cached/staged changes:
submodules.txt | 104 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
(dev3) 1 28530.....................................:Sat 10 Oct 2020 07:10:30 PM EDT:.
(git)lena:~datalad/trash[master]git
$> apt-cache policy git-annex-standalone
git-annex-standalone:
Installed: 8.20200810+git143-gdee38c54d-1~ndall+1
Candidate: 8.20200908+git175-g95d02d6e2-1~ndall+1
Version table:
8.20200908+git175-g95d02d6e2-1~ndall+1 500
500 http://neuro.debian.net/debian-devel buster/main amd64 Packages
*** 8.20200810+git143-gdee38c54d-1~ndall+1 100
100 /var/lib/dpkg/status
7.20190819+git2-g908476a9b-1~ndall+1 500
500 http://neuro.debian.net/debian buster/main amd64 Packages
cached/staged changes:
submodules.txt | 104 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
(dev3) 1 28531.....................................:Sat 10 Oct 2020 07:10:36 PM EDT:.
(git)lena:~datalad/trash[master]git
$> sudo apt-get install git-annex-standalone=7.20190819+git2-g908476a9b-1~ndall+1
[sudo] password for yoh:
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following packages were automatically installed and are no longer required:
containernetworking-plugins libboost-log1.67.0 libboost-regex1.67.0 libconfig-simple-perl libdbd-sqlite3-perl
libexif-dev libexif-doc libfs6 libgdal27 libgdcm-dev libgdcm3.0 libghc-aeson-compat-dev libghc-erf-dev
libghc-fingertree-dev libghc-reducers-dev libghc-rio-dev libghc-tf-random-dev libgphoto2-dev libgsoap-2.8.91
libicu65 libilmbase-dev liblept5 libnetcdf-c++4 libnetcdf15 libnx-x11-6 libopencv-calib3d4.2
libopencv-contrib4.2 libopencv-core-dev libopencv-core4.2 libopencv-dnn-dev libopencv-dnn4.2
libopencv-features2d4.2 libopencv-flann-dev libopencv-flann4.2 libopencv-highgui4.2 libopencv-imgcodecs-dev
libopencv-imgcodecs4.2 libopencv-imgproc-dev libopencv-imgproc4.2 libopencv-ml-dev libopencv-ml4.2
libopencv-objdetect4.2 libopencv-photo-dev libopencv-photo4.2 libopencv-shape-dev libopencv-shape4.2
libopencv-stitching4.2 libopencv-superres4.2 libopencv-video-dev libopencv-video4.2 libopencv-videoio4.2
libopencv-videostab4.2 libopencv-viz-dev libopencv-viz4.2 libopencv4.2-java libopencv4.2-jni libopenexr-dev
libpocketsphinx3 libprotobuf22 libqhull8.0 librabbitmq4 libre2-6 libsocket++1 libsphinxbase3 libswitch-perl
libtbb-dev libtesseract4 libvtk6.3 libx2go-config-perl libx2go-log-perl libx2go-server-db-perl
libx2go-server-perl libx2go-utils-perl libxcompshad3 nx-x11-common nxagent pwgen x11-apps x11-session-utils
x11-xfs-utils x2goserver-common x2goserver-x2goagent xinit
Use 'sudo apt autoremove' to remove them.
Suggested packages:
xdot bup tor magic-wormhole tahoe-lafs uftp
The following packages will be DOWNGRADED:
git-annex-standalone
0 upgraded, 0 newly installed, 1 downgraded, 0 to remove and 2807 not upgraded.
Need to get 0 B/64.1 MB of archives.
After this operation, 4,447 kB disk space will be freed.
Do you want to continue? [Y/n]
On branch master
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git restore <file>..." to discard changes in working directory)
(commit or discard the untracked or modified content in submodules)
modified: ansible (modified content, untracked content)
no changes added to commit (use "git add" and/or "git commit -a")
warning: etckeeper failed to commit changes in /etc using git
dpkg: warning: downgrading git-annex-standalone from 8.20200810+git143-gdee38c54d-1~ndall+1 to 7.20190819+git2-g908476a9b-1~ndall+1
(Reading database ... 784916 files and directories currently installed.)
Preparing to unpack .../git-annex-standalone_7.20190819+git2-g908476a9b-1~ndall+1_amd64.deb ...
Unpacking git-annex-standalone (7.20190819+git2-g908476a9b-1~ndall+1) over (8.20200810+git143-gdee38c54d-1~ndall+1) ...
Setting up git-annex-standalone (7.20190819+git2-g908476a9b-1~ndall+1) ...
Processing triggers for mime-support (3.64) ...
Processing triggers for hicolor-icon-theme (0.17-2) ...
Processing triggers for gnome-menus (3.36.0-1) ...
Processing triggers for man-db (2.9.1-1) ...
Processing triggers for desktop-file-utils (0.24-1) ...
On branch master
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git restore <file>..." to discard changes in working directory)
(commit or discard the untracked or modified content in submodules)
modified: ansible (modified content, untracked content)
no changes added to commit (use "git add" and/or "git commit -a")
warning: etckeeper failed to commit changes in /etc using git
Scanning processes...
Scanning candidates...
Scanning processor microcode...
Scanning linux images...
Running kernel seems to be up-to-date.
The processor microcode seems to be up-to-date.
Restarting services...
Service restarts being deferred:
systemctl restart NetworkManager.service
/etc/needrestart/restart.d/dbus.service
systemctl restart docker.service
systemctl restart gdm.service
systemctl restart gdm3.service
systemctl restart systemd-logind.service
systemctl restart unattended-upgrades.service
No containers need to be restarted.
User sessions running outdated binaries:
Debian-gdm @ user manager service: at-spi-bus-laun[1438], gnome-shell[1463], ibus-daemon[1523], systemd[1290]
yoh @ session #2: gdm-session-wor[1642], gdm-x-session[1784], gnome-keyring-d[1708]
yoh @ user manager service: at-spi-bus-laun[2579], chrome-sandbox[6541], chromium[6527,6542,6544],
git-annex[3636,3710], gnome-session-b[2622], gnome-shell[2643], gvfsd[1715], ibus-daemon[3242],
pycharm[3591064], sh[386621], skypeforlinux[3408], systemd[1677], zsh[42034,411901]
sudo apt-get install git-annex-standalone=7.20190819+git2-g908476a9b-1~ndall+ 24.79s user 11.11s system 92% cpu 38.874 total
cached/staged changes:
submodules.txt | 104 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
(dev3) 1 28532.....................................:Sat 10 Oct 2020 07:11:24 PM EDT:.
(git)lena:~datalad/trash[master]git
$> datalad wtf -S dependencies | grep git
- cmd:annex: 7.20190819+git2-g908476a9b-1~ndall+1
- cmd:bundled-git: 2.20.1
- cmd:git: 2.20.1
- cmd:system-git: 2.27.0
- git: 3.1.0
- gitdb: 4.0.2
cached/staged changes:
submodules.txt | 104 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
(dev3) 1 28533.....................................:Sat 10 Oct 2020 07:11:35 PM EDT:.
(git)lena:~datalad/trash[master]git
$> bash gh-default-branch
> set -eu
>> mktemp -d /home/yoh/.tmp/dl-XXXXXXX
> cd /home/yoh/.tmp/dl-HdIDbtR
> datalad create testgh
[INFO ] Creating a new annex repo at /home/yoh/.tmp/dl-HdIDbtR/testgh
create(ok): /home/yoh/.tmp/dl-HdIDbtR/testgh (dataset)
> cd testgh
> datalad create-sibling-github -s github --existing replace testgh
repository "testgh" already exists on GitHub.
Do you really want to remove it? (choices: yes, [no]): yes
[WARNING] Authentication failed using a token.
Do you really want to remove it? (choices: yes, [no]): yes
.: github(-) [https://github.com/yarikoptic/testgh.git (git)]
'https://github.com/yarikoptic/testgh.git' configured as sibling 'github' for Dataset(/home/yoh/.tmp/dl-HdIDbtR/testgh)
> datalad push --to github
publish(ok): /home/yoh/.tmp/dl-HdIDbtR/testgh (dataset) [refs/heads/master->github:refs/heads/master [new branch]]
publish(ok): /home/yoh/.tmp/dl-HdIDbtR/testgh (dataset) [refs/heads/git-annex->github:refs/heads/git-annex [new branch]]
> datalad clone git://github.com/yarikoptic/testgh ../dest
install(ok): /home/yoh/.tmp/dl-HdIDbtR/dest (dataset)
> git -C ../dest branch -a
* git-annex
remotes/origin/HEAD -> origin/git-annex
remotes/origin/git-annex
remotes/origin/master
I have upgraded system one to 2.28.0-1 (and used with and without DATALAD_USE_DEFAULT_GIT=1) -- the same result :-/
That is why I also don't like this one (better solutions are most welcome), but it would be "in effect" only on the first |
"funny" thing: I have changed my reproducer to just use gh-default-branch-local#!/bin/bash
export PS4='> '
set -x
set -eu
cd "$(mktemp -d ${TMPDIR:-/tmp}/dl-XXXXXXX)"
datalad create testgh;
cd testgh;
datalad create-sibling -s github --existing replace ../dest
datalad push --to github;
#datalad publish --to github;
#datalad clone git://github.com/yarikoptic/testgh ../dest
git -C ../dest branch -a It doesn't show wrong behavior/result BUT according to the order of result records -- it pushes (well "publish"es) full output$> bash gh-default-branch-local
> set -eu
>> mktemp -d /home/yoh/.tmp/dl-XXXXXXX
> cd /home/yoh/.tmp/dl-Wtku2jl
> datalad create testgh
[INFO ] Creating a new annex repo at /home/yoh/.tmp/dl-Wtku2jl/testgh
create(ok): /home/yoh/.tmp/dl-Wtku2jl/testgh (dataset)
> cd testgh
> datalad create-sibling -s github --existing replace ../dest
[INFO ] Considering to create a target dataset /home/yoh/.tmp/dl-Wtku2jl/testgh at /home/yoh/.tmp/dl-Wtku2jl/dest of localhost
[INFO ] Fetching updates for Dataset(/home/yoh/.tmp/dl-Wtku2jl/testgh)
[INFO ] Adjusting remote git configuration
[INFO ] Running post-update hooks in all created siblings
create_sibling(ok): /home/yoh/.tmp/dl-Wtku2jl/testgh (dataset)
> datalad push --to github
publish(ok): /home/yoh/.tmp/dl-Wtku2jl/testgh (dataset) [refs/heads/git-annex->github:refs/heads/git-annex ce5da71..34b045e]
publish(ok): /home/yoh/.tmp/dl-Wtku2jl/testgh (dataset) [refs/heads/master->github:refs/heads/master [new branch]]
> git -C ../dest branch -a
git-annex
* master
And even if I swap order (push git-annex first) and use git push directly, I still get git-annex first$> cat gh-default-branch-git-swaporder
#!/bin/bash
export PS4='> '
set -x
set -eu
cd "$(mktemp -d ${TMPDIR:-/tmp}/dl-XXXXXXX)"
datalad create testgh;
cd testgh;
datalad create-sibling-github -s github --existing replace testgh
GIT_TRACE=1 git push github git-annex:git-annex master:master
datalad clone git://github.com/yarikoptic/testgh ../dest
git -C ../dest branch -a
.....
> GIT_TRACE=1
> git push github git-annex:git-annex master:master
19:35:02.179751 git.c:444 trace: built-in: git push github git-annex:git-annex master:master
19:35:02.180477 run-command.c:663 trace: run_command: unset GIT_PREFIX; ssh git@github.com 'git-receive-pack '\''yarikoptic/testgh.git'\'''
19:35:03.140376 run-command.c:663 trace: run_command: git pack-objects --all-progress-implied --revs --stdout --thin --delta-base-offset --progress
19:35:03.145795 git.c:444 trace: built-in: git pack-objects --all-progress-implied --revs --stdout --thin --delta-base-offset --progress
Enumerating objects: 11, done.
Counting objects: 100% (11/11), done.
Delta compression using up to 12 threads
Compressing objects: 100% (9/9), done.
Writing objects: 100% (11/11), 961 bytes | 320.00 KiB/s, done.
Total 11 (delta 1), reused 0 (delta 0), pack-reused 0
remote: Resolving deltas: 100% (1/1), done.
To github.com:yarikoptic/testgh.git
* [new branch] git-annex -> git-annex
* [new branch] master -> master
> datalad clone git://github.com/yarikoptic/testgh ../dest
install(ok): /home/yoh/.tmp/dl-97ReF8j/dest (dataset)
> git -C ../dest branch -a
* git-annex
remotes/origin/HEAD -> origin/git-annex
remotes/origin/git-annex
remotes/origin/master
bash gh-default-branch-git-swaporder 6.31s user 2.53s system 45% cpu 19.252 total |
FWIW, yes, but only on initial push. Ideal solution should find GitHub API calls to set default branch fire the repo if possible at all, or then check at user level and alert if it is main when local one is master and then push in two steps I guess. Unlikely we want to change default branch on user behalf |
my current thinking is that |
Setting the repo's default branch via a call to the github api (specifically this, I guess) was mentioned initially in gh-4997. (edit: I see you've mentioned it above too.) Making |
Hmm, using an api call from within
patchdiff --git a/datalad/support/github_.py b/datalad/support/github_.py
index d8771fd9f8..f912da4308 100644
--- a/datalad/support/github_.py
+++ b/datalad/support/github_.py
@@ -320,6 +320,8 @@ def _make_github_repos(
ncredattempts += 1
for ds, reponame in rinfo:
lgr.debug("Trying to create %s for %s", reponame, ds)
+ repo = ds.repo
+ branch = repo.get_corresponding_branch() or repo.get_active_branch()
try:
res_ = _make_github_repo(
github_login,
@@ -327,7 +329,8 @@ def _make_github_repos(
reponame,
existing,
access_protocol,
- dryrun)
+ dryrun,
+ branch)
# output will contain whatever is returned by _make_github_repo
# but with a dataset prepended to the record
res.append((ds,) + assure_tuple_or_list(res_))
@@ -364,7 +367,8 @@ def _make_github_repos(
raise RuntimeError("Did not even try to create a repo on github")
-def _make_github_repo(github_login, entity, reponame, existing, access_protocol, dryrun):
+def _make_github_repo(github_login, entity, reponame, existing, access_protocol,
+ dryrun, branch):
repo = None
try:
repo = entity.get_repo(reponame)
@@ -434,6 +438,9 @@ def _make_github_repo(github_login, entity, reponame, existing, access_protocol,
raise RuntimeError(
'something went wrong, we got no Github repository')
+ if branch:
+ repo.edit(default_branch=branch)
+
if dryrun:
return '{}:github/.../{}'.format(access_protocol, reponame), False
else: |
FWIW given that the api call can't be used on empty repos, I haven't been able to come up with a better option than your proposal. |
…te branches yet This is a central place for logic of pushing refspecs used by push and publish, so I decided to fix in a single spot instead of duplicating logic or providing yet another additional wrapper, and thus possibly of help to any other code which uses .push(). Note: - logic is in effect only if remote and refspecs are provided. So it will be of no effect if either of those not provided, but most likely there would be no relevant use case since it would require manual configuration first of the refspecs and default remote for a branch within config before submitting a push Q: may be logic should move even deeper into push_ generator? Closes: datalad#4997
…st refspec This is to not incure the run time penalty for other (than github) types of remotes which do (? at least push by git itself via ssh does) take the first pushed refspec to be the default branch. I also removed previously proposed checks on existing branches, since in case we reconfigure an existing remote (but we do not fetch --prune) we might still have old remote branches hanging around. So decided to act solely based on the config variable. Moreover, such variable could be set globally if so desired to e.g. overcome similar issue if detected with other provider without waiting for a fix in the codebase. We will unset only "local"ly set value
e3460c7
to
efde850
Compare
ok, rebased and redone in efde850 so if tests pass - candidate for merging. |
Codecov Report
@@ Coverage Diff @@
## maint #5010 +/- ##
==========================================
- Coverage 90.16% 84.73% -5.43%
==========================================
Files 296 293 -3
Lines 41868 41933 +65
==========================================
- Hits 37750 35534 -2216
- Misses 4118 6399 +2281
Continue to review full report at Codecov.
|
Restarted nfs run on Travis which failed with
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I said:
FWIW given that the api call can't be used on empty repos, I haven't been able to come up with a better option than your proposal.
Looking at this again, that's still my assessment. Not pretty but at least the workaround is fairly isolated and hidden, and I don't have any better ideas.
I've pushed a test. Of course feel free to drop or tweak it.
merging since got an approval and no objections, thanks @kyleam |
Released in 0.14.1 |
This is a central place for logic of pushing refspecs used by push and publish,
so I decided to fix in a single spot instead of duplicating logic or providing
yet another additional wrapper, and thus possibly of help to any other code which
uses .push().
Note:
of no effect if either of those not provided, but most likely there would be
no relevant use case since it would require manual configuration first of the
refspecs and default remote for a branch within config before submitting a push
Qs:
git-annex
branch the default #4997create-sibling-github
to do separate initial push(es) only in that case (not always upon initial push). I am not 100% satisfied in such approach since would make it github specific, and might be needed for other (gitlab?) if they start behaving the same wayCloses: #4997