Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle checking out a branch when that branch is already checked out #34

Open
billsacks opened this issue Nov 27, 2017 · 22 comments
Open
Milestone

Comments

@billsacks
Copy link
Member

If I have (for example) a CESM repository that points to a git repo for CLM, and do the following:

  1. From components/clm: git checkout -b mybranch

  2. Make some changes in components/clm

  3. From components/clm: git push -u origin mybranch

  4. Change CESM.cfg to point to mybranch (modifying the repo_url and setting the branch appropriately)

  5. Run checkout_externals

then I end up in detached head state in components/clm.

In fact, any time you run checkout_externals pointing to a branch, you'll end up in detached head state - though @bandre-ucar has convinced me that this is okay (and probably preferable) in the case where you weren't already on the given branch.

However, in this relatively common case where you're already on the given branch, I'd like to end up in a state where I'm still on that branch. There are issues with what to do if the branch on the remote is ahead or behind the local branch. But for starters, I think it would be very helpful to have the following special-purpose logic:

(Solution 1) If the cfg file specifies a branch, and (after fetching from the remote) the sha of the remote branch matches the currently-checked-out sha, then do nothing. (Can this be done by comparing git rev-parse HEAD with git rev-parse remotename/branchname?)

I also like @bandre-ucar 's idea:

(Solution 2) If you give repo_url as "." for a git repository, then all operations will be done on local tags and branches.

However, I think that even if we have (Solution 2), I'd still like to have (Solution 1), because this will facilitate doing cross-component development on multiple machines.

@billsacks
Copy link
Member Author

For extra helpful output with (Solution 1), it could be nice (but not essential) to output a WARNING message if (after fetching from the given remote, but before doing the checkout) we detect that:

  1. We are currently on a branch that is tracking the specified remotename/branchname, AND

  2. Either:

    a. There are local commits that are not on remotename/branchname, OR

    b. There are commits on remotename/branchname that are not on HEAD

(1) could be determined by scraping the output of git status -sb

(2) could be determined by checking the output of git log --oneline @{u}..HEAD and git log --oneline HEAD..@{u}

In the case of (2a), we could suggest doing:

git checkout branchname
git push

In the case of (2b), we could suggest doing:

git checkout branchname
git merge remotename/branchname

(However, I'm also fine waiting to add this extra outputs until we see that this actually bites someone in practice.)

@gold2718
Copy link
Collaborator

For the record, I am in favor of working in real local branches when a branch is specified in the externals configuration file.
I think we need to define the semantics of what happens when checkout_externals encounters a branch entry instead of a tag. My suggestions:

  1. If the working copy is not clean, abort.
  2. If the working copy is clean, create/checkout a local branch tracking the listed branch.
    a. If a local branch is found that is tracking the listed remote branch. try to just check out that branch. Perhaps show the status so the user is aware of being ahead or behind the upstream.
    b. If no local branch is found, create a local tracking branch with the same name (e.g., origin/foo becomes foo) and check it out. If a non-tracking branch exists with the desired name, abort.

Some notes / questions:

Something easy to parse to check for tracking branches:

 git for-each-ref --shell --format='%(refname:short) %(upstream:short)' refs/heads

Do we ever want to attempt a merge? Is there any harm in running:

git merge --ff-only @{u}

@billsacks
Copy link
Member Author

Thanks for your thoughts @gold2718

What's tricky here is that

(1) Different people could have different expectations regarding the final state following running checkout_externals

(2) Different people could have different expectations regarding what (if anything) is done to any local branches.

@bandre-ucar has convinced me that it's probably best if we stick to the simplest, most consistent rules possible: This will help avoid surprises. Along those lines, I like @bandre-ucar 's simple-to-remember rule: When you run checkout_externals, it makes your local sandbox match the state given by the .cfg file. (I hope I'm paraphrasing this correctly.)

I would add a corollary of this rule: If two people run checkout_externals with the same .cfg file, they should end up with the same final state in terms of what SHAs are checked out (unless an error was encountered that causes the tool to abort). In particular: unless an error was encountered, the final state after running checkout_externals should not depend on the history of what a user has done in their sandbox (e.g., creating branches, adding commits, etc.). I believe the current behavior follows this rule (unless there are some edge cases I'm not aware of); I think preserving this rule is important to avoid surprises.

For this reason, I don't like the idea of just checking out an existing branch, which may be ahead of or behind the branch on the remote.

I also don't like the idea of having checkout_externals doing a merge into a local branch. I think we should avoid having checkout_externals ever mess with your local branches. Part of my reason for feeling this way is to allow us to have simple, consistent rules: I feel pretty strongly that we should not attempt a merge if a fast-forward merge is impossible: I feel this would be surprising, and could lead to conflicts that the user needs to resolve. Then, given that we're not doing merges in all cases, I feel we also should not do a merge even if a fast-forward merge is possible. Otherwise we need to explain to people when we do and don't do a merge; I feel this would be hard for new users to understand, and hard for even experienced users to remember. It's better to have a consistent rule: We don't mess with your local branches.

This leaves the question of whether to create a tracking branch. Given all of my other arguments in this comment, I'd say no: Given my argument that we should never update local branches, the problem comes down to what happens when you rerun checkout_externals. Consider this common scenario:

  1. You run checkout_externals with a branch listed in the .cfg file for the first time. Assume this creates a tracking branch for you.

  2. The remote branch is updated; you haven't made any changes locally.

  3. You rerun checkout_externals

Given my above arguments, the result of (3) should be that you're in detached head state, with a warning message issued. This state makes sense if you manually created the given tracking branch (presumably because you wanted to do some work on it). But for someone who didn't create the branch themselves, the warning message and resulting state will likely be confusing.

For this reason, I'd vote for maintaining the current behavior of not creating tracking branches for you. That is, any time you run checkout_externals, regardless of whether it's pointing at tags or branches, you are always left in detached head state; it's up to you to create a branch if you need one. (And so, in the above scenario, no warning would be issued, since there is no tracking branch.)

All that said: I'm open to being convinced that some other behavior would be better.

@gold2718
Copy link
Collaborator

(1) Different people could have different expectations regarding the final state following running checkout_externals

(2) Different people could have different expectations regarding what (if anything) is done to any local branches.

Isn't that what documentation is for?

@bandre-ucar has convinced me that it's probably best if we stick to the simplest, most consistent rules possible

It seems to me that the simplest, most consistent action is the no-branch (detached head) action. Did I understand correctly? Is the simplest really best? What about the desired workflows?

Maybe I'm misunderstanding you but it seems that according to your rules, if you:

  1. Run checkout_externals
  2. Create a local branch in one of the checked-out repos.
  3. Run checkout_externals again

then you are back where you were after step 1. Did I get that right? Would that not be surprising to users?

Under your proposed rules, how do I create a branch or fork of a repo with some of the tags replaced with branches? Do I have to manually create local branches every time I clone the repo?

@billsacks
Copy link
Member Author

@gold2718 I'll be honest, I'm struggling here myself. I was probably wrong to say "simplest", but I'll stand by "most consistent" as a goal, and I've been struggling to see how we can remain consistent if we sometimes create a local branch for people.

Regarding this:

Maybe I'm misunderstanding you but it seems that according to your rules, if you:

  1. Run checkout_externals
  2. Create a local branch in one of the checked-out repos.
  3. Run checkout_externals again

then you are back where you were after step 1. Did I get that right? Would that not be surprising to users?

Sorry, I wasn't clear in my recent, long comment. I still stand by my initial request in this issue:

(Solution 1) If the cfg file specifies a branch, and (after fetching from the remote) the sha of the remote branch matches the currently-checked-out sha, then do nothing. (Can this be done by comparing git rev-parse HEAD with git rev-parse remotename/branchname?)

Regarding this:

Under your proposed rules, how do I create a branch or fork of a repo with some of the tags replaced with branches? Do I have to manually create local branches every time I clone the repo?

I think you mean (for example) that you're creating a branch of CESM where (e.g.) CLM and MOSART point to branches. In this case, under my suggestions: You can clone the repo and use it (create a case, etc.) without creating local branches. But if you want to do further development (e.g., of CLM) in that new clone, then yes: you would need to manually create a local CLM branch.

I think the critical questions to answer are these. I'm giving my thoughts, but I'm not claiming that my thoughts are the only ones that matter :-)

  1. When you run checkout_externals, should you be able to expect that your local state is always updated to match the state of the given remote branch (unless errors are encountered)? i.e., should the locally-checked-out sha match the head of the remote branch?

    I argued 'yes' above, and I feel pretty strongly about this. We could buy ourselves some flexibility, though, through careful choice of when we throw an error.

  2. Should checkout_externals ever update a local branch?

    I argued 'no' above, though I don't feel as strongly here. Again, we could buy ourselves some flexibility through careful choice of when we throw an error.

(2) seems like the critical piece here for allowing checkout_externals to automatically checkout a branch for you. I could imagine the following alternative in order to allow this to happen:

  • checkout_externals automatically checks out a tracking branch for you if none exists

  • If a tracking branch already exists, checkout_externals does a fast-forward merge if one is possible, issuing a message that this is happening

  • If a fast-forward merge is impossible, then checkout_externals aborts with a message describing what you need to do (I'm not exactly sure what would be suggested in this case... I guess a manual merge followed by a push)

I'd be okay with this solution as long as that last (aborting) piece is in place. I could even be convinced that it's better than my original suggestions. @gold2718 and @bandre-ucar what do you think about this?

@billsacks
Copy link
Member Author

Short summary of my last post: I'd be fine with having checkout_externals automatically check out a tracking branch for you and do a fast-forward merge when possible, as long as it aborts with an error message whenever a fast-forward merge is impossible.

A couple more thoughts:

  • If we do that, I'd also like for checkout_externals to write the previously-checked-out sha to both stdout and the log file. This seems important in the case where you weren't expecting your local branch to get updated but it got updated for you, and you want to be able to temporarily back up, or at least review what has changed.

  • I'd personally be okay with our "going live" with this before all of the branch support is completely worked out - treating that as an unsupported, expert-only feature at first.

@mnlevy1981
Copy link
Contributor

Reviving an old discussion because some of my thoughts on #56 seem better suited in this ticket... reading through this thread, I agree with @billsacks that

If two people run checkout_externals with the same .cfg file, they should end up with the same final state in terms of what SHAs are checked out (unless an error was encountered that causes the tool to abort).

It seems like there are several scenarios where the .cfg file could be interpreted in many different ways, and I would propose the following solutions to try to keep with the stated goal above:

User requests a branch rather than a tag
Repository is left in a state where local branch is checked out and tracking the specified remote. As a developer, if I specify a branch rather than a tag it's because I want to work on that branch and ending up in a detached head would not be desirable.

User creates local copy of branch and then reruns checkout_externals
If the sha of local branch matches the sha of the remote branch then behave as previous scenario. If the sha doesn't match, report an error along the lines of local and remote branches are not synced [report sha info and maybe commit date?] and abort without changing anything in the repository. Key point: do not try to do any merging, leave that up to the user!

User creates a new branch locally but branch does not exist on remote
Report an error along the lines of [branch] does not exist on [remote]

User creates branch2 from sha of origin/branch and local copy is on branch2
Even though the sha matches, the tool should still switch from branch2 back to specified branch

So of @gold2718 suggestions:

  1. If the working copy is not clean, abort.
  2. If the working copy is clean, create/checkout a local branch tracking the listed branch.
    a. If a local branch is found that is tracking the listed remote branch. try to just check out that branch. Perhaps show the status so the user is aware of being ahead or behind the upstream.
    b. If no local branch is found, create a local tracking branch with the same name (e.g., origin/foo becomes foo) and check it out. If a non-tracking branch exists with the desired name, abort.

I agree with (1) and (2b). For (2a), the logic I would expect is

  1. Is the local branch currently checked out?
    a. If so, does the current sha match that of the remote (leave as-is if yes, abort if no)
    b. If not, is the repo in a detached head state at the sha matching the remote branch? (leave as-is if yes)
    c. If not (b), does the head of that branch match that of remote (check out local branch if yes, abort if no)

As a user, I would not expect this tool to change the state of the checkout if the local tracking branch is currently checked out or if I am already at a detached head with the sha of the remote branch. Note that if the local tracking branch is checked out at the correct sha but the working copy is not clean this should still trigger an abort -- it keeps with the stated goal of identical .cfg files producing identical results barring errors.

If a user runs checkout_externals and then updates a branch locally, he or she has two options to successfully run checkout_externals again:

  1. push local branch to remote
  2. intentionally put local copy in a detached head state by checking out the sha matching the remote branch

@mnlevy1981
Copy link
Contributor

Talking with @billsacks I realized that my previous comment is somewhat unclear and also very git-centric; to clarify, here is the decision tree I think checkout_externals should go through:

  1. Does branch exist in remote (origin)?
    -- NO => abort

  2. Is working copy clean?
    -- NO => abort

  3. Is current checkout detached HEAD at sha of remote branch?
    -- YES => END

  4. Does local copy of branch exist?
    -- NO => create it (tracking origin/branch) & check it out (END)

  5. Does local copy of branch track origin/branch?
    -- NO => abort (user created a branch manually and did not set upstream)

  6. Does sha of local copy of branch match sha of remote?
    -- NO => abort (either branch is behind remote and needs to be fast-forwarded or is ahead of remote and needs to be pushed)
    -- checkout_externals could have --allow-fast-forward option so if branch is behind remote then fast-forward but still abort if branch is ahead of remote)

  7. Is local branch checked out?
    -- NO => check out branch (END)


And most of the git concepts can be applied to svn as well

  1. svn status -q non-empty => dirty state & abort
  2. If local copy exists / is a checkout of requested branch: if local copy is at revision M and repository is at revision N (N >= M by definition), then see if branch has changed between M and N (abort if yes; note that given our svn hooks, this should never occur for a tag)
  3. If local copy exists and is not a checkout of requested branch: clobber with new checkout (note that you are already aborting if dirty state)

@billsacks
Copy link
Member Author

For the record: I'm happy with @mnlevy1981 's latest comment. The one thing I'm a little uncomfortable with is that it does NOT cleanly support the workflow of: You have a branch of cesm which has a CESM.cfg file that itself points to branches of multiple components; you periodically run checkout_externals in order to update all components to the latest versions on the remote (this is the 3-step "common scenario" that I presented in #34 (comment)). But I'm okay with requiring a flag like --allow-fast-forward for this to work smoothly.

@gold2718
Copy link
Collaborator

gold2718 commented Jan 4, 2018

I have one question about @mnlevy1981's decision tree.
How would we get to 5? To me, this goes back to 4 -- how do we decide if a local copy of the branch exists? Currently, I think that if a branch is tracked (has an upstream), we check to make sure we are tracking the correct branch.
However, for a local (no upstream) branch, we just compare names which I do not think is robust. Let's say the cfg points to branch, foo.

git checkout bad_branch
git branch --no-track foo
git checkout foo

I think my foo branch will pass the test but the code will actually be from bad_branch. Is this correct? Is that acceptable?

@mnlevy1981
Copy link
Contributor

@gold2718 asked

How would we get to 5? To me, this goes back to 4 -- how do we decide if a local copy of the branch exists?

If you are asking for specific git commands, then I don't know the answer. If you're asking a conceptual question, then I'll give you three scenarios based on the assumption that the cfg file looks like

[$COMPNAME]
branch = foo
protocol = git
repo_url = $REPO_URL
local_path = $LOCAL_PATH
required = True

Note that we've already checked that if origin points to $REPO_URL then origin/foo already exists (and also that our working directory is clean). So the three options are

  1. origin/foo exists but foo (local branch) does not: checkout_externals will create foo and it will track origin/foo

  2. foo and origin/foo both exist, and foo tracks origin/foo: if foo and origin/foo have the same sha then checkout_externals will checkout foo

  3. foo and origin/foo both exist, but foo does not track origin/foo: My initial comment was that this should result in an error (clearly stating that foo must track origin/foo) but I could see an argument for this resulting in a detached head checkout of the sha of origin/foo.

If we are in case (3) but the user has a branch foo2 that tracks origin/foo then I am okay with checkout_externals aborting. If others think that should result in foo2 being checked out, then I am okay with checking to see if any branch tracks origin/foo before checking to see if foo exists.


However, for a local (no upstream) branch, we just compare names which I do not think is robust. Let's say the cfg points to branch, foo.

git checkout bad_branch
git branch --no-track foo
git checkout foo

I think my foo branch will pass the test but the code will actually be from bad_branch. Is this correct? Is that acceptable?

I'm not sure what you mean... if repo_url is pointing to a remote then by my logic checkout_externals would abort (foo doesn't track origin/foo), and if repo_url is local_path (is that even allowed?) then foo and bad_branch both point to the same sha, but because foo is in the cfg file, the end state would be that foo is checked out (because that was the branch requested).

@gold2718
Copy link
Collaborator

gold2718 commented Jan 4, 2018

@mnlevy1981, thanks for the explanation. I did not follow your original logic correctly.
I think we are in agreement as to what should happen (your last post on 2017-01-03).
However, I do not think that sequence is currently being followed. If we agree on your procedure, I could file an issue with my edge case.

@mnlevy1981
Copy link
Contributor

@gold2718

However, I do not think that sequence is currently being followed. If we agree on your procedure, I could file an issue with my edge case.

Yeah, I was trying to present how I thought the code should handle branches (with no urgency; this issue seems to be low priority at the moment and I'm not trying to imply that should change). So I don't think you need a new issue for your edge case, this is probably the best place for that discussion.

billsacks added a commit to ESCOMP/CTSM that referenced this issue Jan 23, 2018
- Talk about this in the context of CTSM rather than CESM

- Cut documentation on modifying a component, because we currently don't
  have a great workflow for doing this with components in git (due to
  ESMCI/manage_externals#34)
@billsacks
Copy link
Member Author

@ekluzek points out this is important for FATES. We should discuss a reasonable timeline for implementing it.

@billsacks
Copy link
Member Author

In talking with @bandre-ucar he pointed out some things to consider in implementing the solution proposed in #34 (comment)

  • Your local copy could get polluted by branches that you need to clean up at some point.

  • The current operation is consistent in that it tries to make your local copy match what's in the config file. With rule (6) from the above comment, this is not the case if your local copy of the branch has moved ahead of the remote branch. For example, if you actually want checkout_externals to revert your local copy to what's on the remote, then this would require a manual git operation under the above proposal. (It's a matter of opinion which is the clearest solution here, but the new, proposed solution is less consistent with the general rule we've had, so would at least require some more careful documentation / explanations.)

  • We should also think about consistency with svn: If the behavior we come up with doesn't translate to svn, then there should be some sort of check (@mnlevy1981 did address this somewhat in his comment).

  • Does this:

    If local copy exists and is not a checkout of requested branch: clobber with new checkout (note that you are already aborting if dirty state)

    Mean delete (or move out of the way) the old one and checkout a new one? We've tried hard to avoid that, because (for example) it could mean deleting untracked files (or in the move out of the way case, ending up with a lot of versions of a directory). Instead, we should use an svn switch.

  • We should also be sure to think through some use cases where we run checkout_externals several times and make sure that we end up in the desired state in all of those cases.

For the record, I'm still okay (I think) with @mnlevy1981 's proposed solution, but I'd like to think through @bandre-ucar 's points a little more.

chrislxj pushed a commit to chrislxj/ctsm that referenced this issue Apr 2, 2018
- Talk about this in the context of CTSM rather than CESM

- Cut documentation on modifying a component, because we currently don't
  have a great workflow for doing this with components in git (due to
  ESMCI/manage_externals#34)
@billsacks
Copy link
Member Author

billsacks commented Apr 7, 2018

I imagine that some of the needs here can be implemented by modifying _current_ref_from_branch_command (which I'm planning to rename _current_ref in an upcoming PR): This function determines, for example, if we're currently on a tracking branch. We could modify this to return the type of reference (tag, hash, tracking branch, non-tracking branch, detached head, etc.) in addition to the name of the reference.

@billsacks
Copy link
Member Author

@johnpaulalex @jedwards4b @mnlevy1981 and I met to revive this issue. My understanding of the decisions is:

  1. We agree about the first-order desire: If you are on the correct local branch before running manage_externals, then you should still be on that branch after running it, not in detached head state.
  2. If your local branch is strictly behind the remote, manage_externals will fast-forward your local branch.
  3. If your local branch is strictly ahead of the remote, manage_externals will abort with an error message. Initially, this could just give some suggestions about what to do (like pushing your local branch to the remote). We could also consider adding (mutually exclusive) flags --stay-ahead or --rewind and suggesting specifying one of them (with --rewind meaning that you would end up in detached head state, NOT that your local branch would be reset).
  4. We will only work with local branches that both (a) have the same name as the specified remote branch and (b) have their upstream set to the specified remote branch.
    • If you have a local branch with the same name as the desired remote branch, but your local branch either has its upstream set differently from the remote or does not have an upstream set, then throw an error.
    • If you do not have a local branch with a matching name, then one will be created, even if you have a differently-named local branch with the appropriate upstream.
    • In comparing remotes (between the remote specified in your externals and the upstream branch associated with a local branch), we will treat two remotes as equivalent even if one is https and one is ssh form. Also note that the .git extension at the end of the URL is optional, so remotes should be treated as equivalent even if one has that extension and the other doesn't.

I think the rules we have specified result in behavior where, after running manage_externals, either (1) your locally-checked-out sha matches the head of the remote branch, or (2) an error was issued. I like that. (This seems consistent with the suggestions in #34 (comment) and #34 (comment)).

In #34 (comment), @mnlevy1981 suggested that, if you are starting in detached head state at the correct sha, then you should end in detached head state, rather than checking out a local branch and getting it set to the correct sha in this case. I don't have strong feelings one way or the other on this point. But that logic of remaining in detached head state in this situation would support the scenario where the user isn't ready to push their local changes to the remote, yet wants manage_externals to help them track the remote. I'm not sure how often this will come up in practice. If we wanted to follow this, I think we'd end up with the rules:

  • If there is no local branch tracking the given remote with a matching name, create it (as we discussed)
  • If you are on a local branch that tracks the given remote and with a matching name, then issue an error if your local branch is ahead of the remote (as we discussed)
  • But (new thing): if you are currently in detached head state and you have a local branch tracking the given remote with a matching name, then leave things in detached head state.

@johnpaulalex
Copy link
Contributor

Thanks Bill, this all seems right to me. I'll wait for Jim and Michael to chime in before looking into implementation.

As for that final note about detached head, I'm a bit nervous because it means the same config lands you in two different end states depending on your start state. What if we instead have a --detached_head flag to unconditionally cause that end state? (aka reproduce the current behavior of the script, iiuc) And without the flag, we point you to the local branch w/o detached head. No strong feelings here either.

@jedwards4b
Copy link
Collaborator

Since detached head is the current behavior I think it might be better to add a flag for the branch w/o detached head --no-detached-head.

@billsacks
Copy link
Member Author

@johnpaulalex - Good point that it might be confusing if the end state – at least in terms of whether you are in detached head state or on a branch – would depend on the starting state. I'm fine dropping the idea on those grounds. I think we should focus on specifying good default behavior for now, and we can come back to possible flags later.

@jedwards4b - I wasn't clear whether your suggestion refers just to this case (where you are starting with a detached head) or more generally. My feeling is that this use of branches in manage_externals is relatively non-standard, so it's probably okay to introduce a behavior change. I'm fine with your suggestion of --no-detached-head, I'm just not sure if it's really worth the maintenance cost to maintain both behaviors.

@johnpaulalex
Copy link
Contributor

(Trying to swap this back into my head)

To sum up, we currently do a git checkout $remote/$branch, and in some circumstances we want to do just a plain 'git checkout $branch' , yes? Where 'circumstances' are defined above - you don't have a conflicting local branch with the same name, and you're only fast-forwarding, at least.

If that sounds right, I can start trying to code it up.

@billsacks
Copy link
Member Author

@johnpaulalex thanks a lot. Yes, I agree with that high-order description, though I'll add that in what is probably the most common case - where someone is already on the correct branch and wants to stay on that branch - the new implementation will result in a no-op (though doing a git checkout $branch when you are already on $branch probably won't hurt).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants