Problems using a remote not named 'origin' #4

Closed
billsacks opened this issue Nov 10, 2017 · 3 comments
Comments

@billsacks
Member

The tool seems to assume that the remote you want to work with is named ‘origin’. If I have something like this:

<repo_url>git@github.com:billsacks/clm-demo-externals.git</repo_url>
<branch>mybranch</branch>

and I have done:

git remote add billsacks git@github.com:billsacks/clm-demo-externals.git

I get:

ERROR: Invalid repository in /Users/sacks/cesm_code/cesm-demo-externals/components/clm, url = git@github.com:NCAR/clm-demo-externals.git, should be git@github.com:billsacks/clm-demo-externals.git

I have also encountered this error after merging in a branch that points to an external from a different remote.

My understanding (maybe wrong) is that the name of a remote in git (e.g., “origin”, or “billsacks” as above) is just a convenience, and that anywhere you could use a short remote name like this it’s also valid to give the full remote URL (like git@github.com:billsacks/clm-demo-externals.git). If that’s right, then I’m thinking that there’s no need for checkout_model.py to assume anything regarding remote alias names (“origin” or anything else): any time it needs to interact with the remote, it can just give the full URL of the remote.

@bandre-ucar
Contributor

@billsacks Update: It's not quite as simple as being 'just a convenience'. Remote names are more like keys in a key-value store that git uses internally. For instance:

$ cat .git/config 
...
...
[remote "ncar"]
	url = git@github.com:NCAR/manage_externals.git
	fetch = +refs/heads/*:refs/remotes/ncar/*
[branch "master"]
	remote = ncar
	merge = refs/heads/master
[remote "bja"]
	url = git@github.com:bandre-ucar/manage_externals.git
	fetch = +refs/heads/*:refs/remotes/bja/*
[remote "origin"]
	url = git@github.com:NCAR/manage_externals.git
	fetch = +refs/heads/*:refs/remotes/origin/*

 $ ls .git/refs/remotes/
bja/    ncar/   origin/

The references are stored by name rather than by URL.

One can fetch or pull directly from a URL, but the URL is not added to your remotes, so git does not keep tracking it.

The initial implementation of the git repo wrappers uses a bunch of plumbing commands that take these names. There doesn't appear to be a way to retrieve a name from the URL without scraping output, which is probably why origin is hardcoded in several places. I'm not sure any of that is necessary. It should be possible to do this from a higher level, but I need to research it and think about the edge cases a bit.
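One possible way to build the name-to-URL mapping is to read it from git's own configuration with `git config --get-regexp`, which prints one `key value` pair per line. This is only a sketch: it still parses command output, just in a machine-oriented format. The helper names `parse_remote_urls` and `read_remote_urls` are hypothetical and do not exist in the tool.

```python
import subprocess


def parse_remote_urls(config_output):
    """Map remote name -> URL from `git config --get-regexp` output.

    Each line is expected to look like: remote.<name>.url <url>
    If a remote lists several URLs, the last one wins (a simplification).
    """
    remote_urls = {}
    for line in config_output.splitlines():
        if not line.strip():
            continue
        key, _, url = line.partition(' ')
        if key.startswith('remote.') and key.endswith('.url'):
            # Slice off the fixed prefix and suffix so names containing
            # dots are preserved intact.
            name = key[len('remote.'):-len('.url')]
            remote_urls[name] = url.strip()
    return remote_urls


def read_remote_urls(repo_dir):
    """Run git in repo_dir and return its remote name -> URL mapping.

    Note: git exits with a non-zero status when no key matches, in which
    case check_output raises subprocess.CalledProcessError.
    """
    output = subprocess.check_output(
        ['git', 'config', '--get-regexp', r'^remote\..*\.url$'],
        cwd=repo_dir, universal_newlines=True)
    return parse_remote_urls(output)
```

Keeping the parsing separate from the subprocess call means the parsing can be tested without a real repository.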

I'll work on it more next week.

@billsacks
Member Author

@bandre-ucar thanks for giving this some thought. I see how this is harder than I imagined.

Here are some thoughts from a bit of looking... I could easily be missing something, though.

It looks like there are three places where origin is assumed:

(1) RE_REMOTEBRANCH: This is just used in _git_ref_type, which is only used by _git_remote, which is only used by _git_update, which isn't called by anyone. So maybe RE_REMOTEBRANCH - and all of those functions - can be deleted?

(2) _git_checkout

(3) _checkout_branch_command

It looks like (2) and (3) are somewhat related in their use of 'origin'. I'm thinking that we could replace the use of 'origin' in (2) and (3) with some code that loops through the different remotes defined in the given repository and checks the URLs of each, comparing against the expected url. Something like:

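import subprocess

# List the names of all remotes configured in the current repository
remotes = subprocess.check_output(['git', 'remote', 'show'], universal_newlines=True).splitlines()
remote_urls = {}
for remote in remotes:
    # Strip the trailing newline so the URL can be compared directly
    remote_urls[remote] = subprocess.check_output(
        ['git', 'remote', 'get-url', remote], universal_newlines=True).strip()
# Then determine which remote, if any, has the expected url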

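The final step of the sketch above, picking the remote whose URL matches the expected one, might look something like this. `normalize_url` and `find_remote_for_url` are hypothetical names, not functions in checkout_model.py, and treating URLs as equal after dropping a trailing slash and a `.git` suffix is an assumption. It also assumes the URLs are text strings rather than bytes.

```python
def normalize_url(url):
    """Strip whitespace, a trailing slash, and a trailing '.git'.

    Treating these forms as the same repository is an assumption.
    """
    url = url.strip().rstrip('/')
    if url.endswith('.git'):
        url = url[:-len('.git')]
    return url


def find_remote_for_url(remote_urls, expected_url):
    """Return the alphabetically first remote whose URL matches, else None."""
    expected = normalize_url(expected_url)
    for name in sorted(remote_urls):
        if normalize_url(remote_urls[name]) == expected:
            return name
    return None
```

Returning None rather than raising leaves the caller free to decide what to do when no remote matches, for example adding a new one.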
But actually, we probably shouldn't abort if the given remote url can't be found in the list of remotes: What about this use case?:

  1. Clone CESM, checkout tag cesm2.0.beta08, run checkout_externals

  2. checkout tag cesm2.0.beta09. This points to a version of CLM that uses a different remote from the one used in cesm2.0.beta08 (maybe because it's pointing to someone's branch tag rather than a tag of master, for example).

  3. Rerun checkout_externals

I think that, in this case, we'd want checkout_externals to add a remote corresponding to the new url, if such a remote doesn't already exist. We'd have to come up with a name for the remote, maybe by taking some part of the URL (e.g., for URL https://github.com/CESM-Development/cime, we'd call the remote CESM-Development).
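As a rough illustration of the naming idea, the owner or organization part of the URL could be extracted as below. `remote_name_from_url` is a hypothetical helper, and taking the second-to-last path component is an assumed rule. It does not handle name collisions with existing remotes or local filesystem paths.

```python
import re


def remote_name_from_url(url):
    """Guess a remote name from the owner part of a repository URL.

    Returns None when the URL has no recognizable owner/repo path.
    """
    url = url.strip().rstrip('/')
    # scp-like syntax, e.g. git@github.com:owner/repo.git
    match = re.match(r'^[^/@]+@[^:/]+:(?P<path>.+)$', url)
    if not match:
        # URL syntax, e.g. https://github.com/owner/repo
        match = re.match(
            r'^[a-zA-Z][a-zA-Z0-9+.-]*://[^/]+/(?P<path>.+)$', url)
    if not match:
        return None
    parts = [p for p in match.group('path').split('/') if p]
    if len(parts) < 2:
        return None
    # The component just before the repository name is the owner.
    return parts[-2]
```

For the URL mentioned above, this gives `CESM-Development`. For `git@github.com:billsacks/clm-demo-externals.git` it gives `billsacks`.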

To me this is feeling messy but probably doable. And I think it's important that we solve this multiple-remote problem in some fashion. But I'm understanding more what you were saying a few weeks ago about the challenges we'd run into managing remotes....

This was referenced Nov 21, 2017
@billsacks
Member Author

This seems to work right now - thank you @bandre-ucar !
