create-sibling-github created repos make git-annex branch the default #4997
oh -- in above there was a partial screw up of mine -- I was in "incoming" branch, and used
but it is |
The branch that is first pushed being the default on github has been the case for as long as I can remember. I don't think we need an explicit API call to rectify this in DataLad. It is easily fixed by hand https://github.com/datalad/datalad/settings/branches and does not happen with the recommended |
I did use the combo in the example right above your comment (previous repo was removed from github manually first). And according to the It is not as "easily" fixed, when you are to create multiple datasets. |
Push order was an explicit devgoal for push. I tried again with a fresh repo and master was default. |
So order isn't enough and behavior seems to vary between people. |
Maybe there is some race condition on GitHub's side, and adding a slight delay after pushing master (if we see it is a new branch) would mitigate it.... Didn't try yet, but that explanation is the only one which makes sense to me. An alternative to the delay might be to follow the push of the branch with a forced fetch of it, if it is new. |
FWIW, using #5008 reproducer (assumes my github login, adjust for yours)
#!/bin/bash
export PS4='> '
set -x
set -eu
cd "$(mktemp -d ${TMPDIR:-/tmp}/dl-XXXXXXX)"
datalad create testgh;
cd testgh;
datalad create-sibling-github -s github --existing replace testgh
datalad push --to github;
datalad clone git://github.com/yarikoptic/testgh ../dest
git -C ../dest branch -a
output
$> bash gh-default-branch
> set -eu
>> mktemp -d /home/yoh/.tmp/dl-XXXXXXX
> cd /home/yoh/.tmp/dl-z8xaY6l
> datalad create testgh
[INFO ] Creating a new annex repo at /home/yoh/.tmp/dl-z8xaY6l/testgh
[INFO ] Scanning for unlocked files (this may take some time)
create(ok): /home/yoh/.tmp/dl-z8xaY6l/testgh (dataset)
> cd testgh
> datalad create-sibling-github -s github --existing replace testgh
.: github(-) [https://github.com/yarikoptic/testgh.git (git)]
'https://github.com/yarikoptic/testgh.git' configured as sibling 'github' for Dataset(/home/yoh/.tmp/dl-z8xaY6l/testgh)
> datalad push --to github
publish(ok): /home/yoh/.tmp/dl-z8xaY6l/testgh (dataset) [refs/heads/master->github:refs/heads/master [new branch]]
publish(ok): /home/yoh/.tmp/dl-z8xaY6l/testgh (dataset) [refs/heads/git-annex->github:refs/heads/git-annex [new branch]]
> datalad clone git://github.com/yarikoptic/testgh ../dest
[INFO ] Scanning for unlocked files (this may take some time)
[INFO ] Remote origin uses a protocol not supported by git-annex; setting annex-ignore
install(ok): /home/yoh/.tmp/dl-z8xaY6l/dest (dataset)
> git -C ../dest branch -a
* git-annex
remotes/origin/HEAD -> origin/git-annex
remotes/origin/git-annex
remotes/origin/master
bash gh-default-branch 8.21s user 3.42s system 69% cpu 16.748 total
and the thing is that we are pushing ALL refspecs at once in a single git push call. Breaking that apart and pushing one at a time (dirty patch)
diff --git a/datalad/core/distributed/push.py b/datalad/core/distributed/push.py
index 121bc5bcd..dc2711c86 100644
--- a/datalad/core/distributed/push.py
+++ b/datalad/core/distributed/push.py
@@ -670,11 +670,13 @@ def _append_branch_to_refspec_if_needed(ds, refspecs, branch):
 def _push_refspecs(repo, target, refspecs, force_git_push, res_kwargs):
-    push_res = repo.push(
+    push_res = sum([
+      repo.push(
         remote=target,
-        refspec=refspecs,
+        refspec=refspec,
         git_options=['--force'] if force_git_push else None,
-    )
+      ) for refspec in refspecs
+    ], [])
     # TODO maybe compress into a single message whenever everything is
     # OK?
     for pr in push_res:
seems to resolve the issue
$> bash gh-default-branch
> set -eu
>> mktemp -d /home/yoh/.tmp/dl-XXXXXXX
> cd /home/yoh/.tmp/dl-uuIWhQz
> datalad create testgh
[INFO ] Creating a new annex repo at /home/yoh/.tmp/dl-uuIWhQz/testgh
[INFO ] Scanning for unlocked files (this may take some time)
create(ok): /home/yoh/.tmp/dl-uuIWhQz/testgh (dataset)
> cd testgh
> datalad create-sibling-github -s github --existing replace testgh
repository "testgh" already exists on GitHub.
Do you really want to remove it? (choices: yes, [no]): yes
[WARNING] Authentication failed using a token.
Do you really want to remove it? (choices: yes, [no]): yes
.: github(-) [https://github.com/yarikoptic/testgh.git (git)]
'https://github.com/yarikoptic/testgh.git' configured as sibling 'github' for Dataset(/home/yoh/.tmp/dl-uuIWhQz/testgh)
> datalad push --to github
publish(ok): /home/yoh/.tmp/dl-uuIWhQz/testgh (dataset) [refs/heads/master->github:refs/heads/master [new branch]]
publish(ok): /home/yoh/.tmp/dl-uuIWhQz/testgh (dataset) [refs/heads/git-annex->github:refs/heads/git-annex [new branch]]
> datalad clone git://github.com/yarikoptic/testgh ../dest
[INFO ] Scanning for unlocked files (this may take some time)
[INFO ] Remote origin uses a protocol not supported by git-annex; setting annex-ignore
install(ok): /home/yoh/.tmp/dl-uuIWhQz/dest (dataset)
> git -C ../dest branch -a
git-annex
* master
remotes/origin/HEAD -> origin/master
remotes/origin/git-annex
remotes/origin/master
so the solution would be, if remote has no branches yet, first push the first refspec and then all the rest. |
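A minimal sketch of that idea in Python, for illustration only; it is not the actual DataLad patch. The names `push_in_batches`, `push_fn` and `remote_has_branches` are hypothetical: `push_fn` stands in for whatever performs a single `git push` call with a list of refspecs and returns a list of result records, and the caller is assumed to already know whether the remote has any branches.

```python
from typing import Callable, List, Sequence


def push_in_batches(
    push_fn: Callable[[List[str]], List[dict]],
    refspecs: Sequence[str],
    remote_has_branches: bool,
) -> List[dict]:
    """Push the first refspec on its own if the remote has no branches yet.

    `push_fn` is a hypothetical callable that performs ONE push call for the
    given refspecs and returns a list of result records.
    """
    refspecs = list(refspecs)
    if not refspecs:
        return []
    if remote_has_branches or len(refspecs) == 1:
        # Nothing special to do: a single push call with everything.
        batches = [refspecs]
    else:
        # Empty remote: first refspec alone, then all the rest in one call.
        batches = [refspecs[:1], refspecs[1:]]
    results: List[dict] = []
    for batch in batches:
        results.extend(push_fn(batch))
    return results
```

With `["refs/heads/master", "refs/heads/git-annex"]` and an empty remote, `push_fn` would be called twice, first with only the `master` refspec. Compared to the dirty patch above, this keeps the number of push calls at two at most, regardless of how many refspecs there are.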
…te branches yet This is a central place for the logic of pushing refspecs used by push and publish, so I decided to fix it in a single spot instead of duplicating logic or providing yet another wrapper; it may thus also help any other code which uses .push(). Note: - the logic is in effect only if both remote and refspecs are provided, so it has no effect if either of those is not provided. Most likely there is no relevant use case for that anyway, since it would require manually configuring the refspecs and the default remote for a branch in the config before submitting a push. Q: maybe the logic should move even deeper, into the push_ generator? Closes: datalad#4997
The reproducer fails to reproduce for me.
|
Lucky you! Maybe it has something to do with authenticating via password (in my case it was one of the tokens) |
Maybe this is of relevance here (i.e. a change that has happened at GitHub recently that might break behavior): https://github.blog/changelog/2020-08-26-set-the-default-branch-for-newly-created-repositories/ |
Maybe. So did you set it to master? (For me it is main) |
FYI - I have been noticing that behaviour lately with our CONP dataset automatically crawled. I ran some test locally as I suspected this could be due to Github renaming the default branch for new repos to main instead of master. In If the default branch is set to If the default branch is set to That made me suspect that Github just picks the first branch in alphabetical order amongst the branches available as the default branch, which would correspond to To confirm that, I created a third dataset with branch No idea if this is helpful to you but just in case, I figured I would share my little investigation of the day :-). |
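To make the suspicion above concrete, here is a toy model in Python. It is only a sketch of the hypothesis from this thread, not documented GitHub behavior: it assumes that GitHub uses the configured default branch name if that branch is among the branches arriving in the first push, and otherwise falls back to the alphabetically first branch. The function name `guess_default_branch` and its parameters are made up for this illustration.

```python
from typing import Iterable, Optional


def guess_default_branch(
    first_push: Iterable[str], preferred: Optional[str] = None
) -> Optional[str]:
    """Toy model of the hypothesis discussed in this thread.

    ASSUMPTION, not confirmed GitHub behavior: the configured default name
    (`preferred`) wins if it was pushed; otherwise the alphabetically first
    branch of the first push becomes the default.
    """
    branches = sorted(first_push)
    if not branches:
        return None
    if preferred in branches:
        return preferred
    return branches[0]


# Account default is "main", both branches arrive in one push.
print(guess_default_branch(["master", "git-annex"], preferred="main"))
# Account default is "master", both branches arrive in one push.
print(guess_default_branch(["master", "git-annex"], preferred="master"))
# Account default is "main", but master is pushed alone first.
print(guess_default_branch(["master"], preferred="main"))
```

Under this model the three calls give `git-annex`, `master` and `master`, which would be consistent with the observations reported in this thread, including the effect of pushing master on its own first.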
Thank you @cmadjar! Not sure whether the order of events was not the desired one, but I did try to change the default branch to master for @dandibot, then created the sibling and pushed, and it still ended up with git-annex as the default :-( https://github.com/dandisets/000003 |
ok -- found one more: there is also a setting at the organization level. Hopefully the next one will work better |
@yarikoptic I think it has to be changed in the organization-level settings for it to work. At least, this is the one I played with in my tests yesterday. Curious to see if your next test will work the same way it did in my tests. Questions:
At the moment, we are sticking with |
As far as I'm aware, it hasn't been discussed one way or the other. But, assuming git 2.28 or later, you can use
You could also set
I'm not sure, as there are probably lots of angles to consider. DataLad core should be in a good place with respect to not hard coding the default branch name (and we test with a custom |
FWIW, hopefully we still finalize #5010 one way or another (most likely making it github specific) so the issue would go away |
In datalad 0.15.5 with git 2.35.0, there is no such option --initial-branch. If anything changed there, it is not documented, the online manual still mentions --initial-branch in the FAQ. What is the current approach to solving this? |
I think this is just confusingly worded in the FAQ, I will fix this.
I checked Git 2.35.0's docs and it still lists the parameter. I'm running 0.35.1 locally and this works:
(handbook2) adina@muninn in /tmp
❱ datalad create abc --initial-branch lookatme 1 !
[INFO ] Creating a new annex repo at /tmp/abc
create(ok): /tmp/abc (dataset)
(handbook2) adina@muninn in /tmp
❱ cd abc
(handbook2) adina@muninn in /tmp/abc on git:lookatme
❱ git branch
git-annex
* lookatme
Does that work for you as well? Thanks for bringing the suboptimal wording to our attention, I can totally see how this is not clear at all. |
I think it was not happening before and might be due to changes in GitHub and getting away from master being a default...
Originally reported by @jwodder in the scope of dandi/dandi-cli#250 (comment)
I wanted to do a demonstration and also ran into it
Since we just say "new branch", I do not know yet whether it is due to the order in which we are pushing the branches -- if git-annex is first, it would make it the default.
DataLad 0.13.4.dev10 WTF (configuration, datalad, dataset, dependencies, environment, extensions, git-annex, location, metadata_extractors, python, system)
WTF
configuration <SENSITIVE, report disabled by configuration>