Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[meta/GitHub workflow] A warning about CICE PRs that update Icepack #753

Closed
phil-blain opened this issue Aug 18, 2022 · 9 comments
Closed

Comments

@phil-blain
Copy link
Member

Hi everyone,

I noticed that a recent CICE PR (#733) was merged with a change to Icepack that references a commit that is not present on the main branch of Icepack.

We can see this on a fresh clone of CICE:

$ git clone --recurse-submodules https://github.com/CICE-Consortium/CICE
$ cd CICE
# get the hash of the squash merge for PR #733
$ git log --grep \#733
# show the change in the recorded submodule commit introduced by this CICE commit
$ git show --submodule=short 9be1c35ae62199ae0ea9c4f339105c7aec3192bf -- icepack/
commit 9be1c35ae62199ae0ea9c4f339105c7aec3192bf
Author: David A. Bailey <dbailey@ucar.edu>
Date:   Sun Jul 31 10:15:34 2022 -0600

    Deprecate CESM ponds (tr_pond_cesm) (#733)

    * Deprecate CESM ponds

    * Namelist changes for deprecating cesmponds

    * Update documentation

diff --git a/icepack b/icepack
index 76ecd41..595c00c 160000
--- a/icepack
+++ b/icepack
@@ -1 +1 @@
-Subproject commit 76ecd418d2efad7e74fe35c4ec85f0830923bda6
+Subproject commit 595c00cfe8121d4e2405282fa08c0907b22e8718
# try to use --submodule=log instead
$ git show --submodule=log 9be1c35ae62199ae0ea9c4f339105c7aec3192bf -- icepack/
commit 9be1c35ae62199ae0ea9c4f339105c7aec3192bf
Author: David A. Bailey <dbailey@ucar.edu>
Date:   Sun Jul 31 10:15:34 2022 -0600

    Deprecate CESM ponds (tr_pond_cesm) (#733)

    * Deprecate CESM ponds

    * Namelist changes for deprecating cesmponds

    * Update documentation

Submodule icepack 76ecd41...595c00c (commits not present)

Notice "(commits not present)" on the last line. If we cd into Icepack, we indeed are missing the new commit introduced by this PR:

$ cd icepack
$ git show 595c00cfe8121d4e2405282fa08c0907b22e8718
fatal: bad object 595c00cfe8121d4e2405282fa08c0907b22e8718

If we take a look at the icepack changes in PR 733, and click on the "25 files" link, we are taken to this page: CICE-Consortium/Icepack@76ecd41...595c00c which shows the Icepack commits
in the range 76ecd41...595c00c, i.e. the new Icepack commits introduced in the PR. We notice that these commits are authored and commited by @dabail10 , which means that they are not on the main branch because all commits on the main branch are squash merges and are committed by the person that merges the PR to main. If we click on the hash for each of the 4 commits, GitHub shows:

This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.

These four commits indeed originate in Dave's fork of Icepack, though the branch on which they were created is not on GitHub anymore, since we see the same message on his fork, see for example:
dabail10/Icepack@595c00c.

The CICE PR should not have been merged with an Icepack commit not present on Icepack's main branch, in my opinion. I missed it when I reviewed the PR.

This is a little bit concerning, because it means it is hard to checkout this earlier version of CICE and compile it:

git checkout --recurse-submodules 9be1c35ae62199ae0ea9c4f339105c7aec3192bf
fatal: failed to unpack tree object 595c00cfe8121d4e2405282fa08c0907b22e8718
error: Submodule 'icepack' could not be updated.
error: Cannot update submodule:
        icepack

Aborting

It is not impossible though, because the commit is still present on GitHub, so if we fetch it directly by its hash it succeeds:

$ cd icepack
$ git fetch origin 595c00cfe8121d4e2405282fa08c0907b22e8718
remote: Enumerating objects: 21, done.
remote: Counting objects: 100% (21/21), done.
remote: Compressing objects: 100% (17/17), done.
remote: Total 21 (delta 8), reused 16 (delta 4), pack-reused 0
Unpacking objects: 100% (21/21), 51.74 KiB | 413.00 KiB/s, done.
From github.com:cice-consortium/Icepack
 * branch            595c00cfe8121d4e2405282fa08c0907b22e8718 -> FETCH_HEAD
$ git checkout --recurse-submodules 9be1c35ae62199ae0ea9c4f339105c7aec3192bf
# works

Note that we can fetch it from CICE-Consortium/Icepack (and not necessarily dabail10/Icepack) because of the fact that GitHub uses the Git alternates mechanism to store all objects of all forks in the root repository of a fork network, which I've mentioned before.

There is virtually no risk that this commit becomes unavailable even if no branch points to it or its children, because it's known (a quick Google search will confirm it) that GitHub never runs git gc in their repositories and thus all commits are kept indefinitely.


So I'm just opening this issue as a warning for everyone to be more careful when merging CICE PR that update Icepack.

Additionnaly it shows how to get fetch a specific commit by its hash if ever someone needs to compile CICE at 9be1c35.

@apcraig
Copy link
Contributor

apcraig commented Aug 18, 2022

@phil-blain, thanks for raising this issue. From what I remember, when I merged PR #733, I noticed that Icepack was NOT correct. But my plan was to update Icepack on main right after that which is what I did. Had I not done that, I would have mentioned that the Icepack changes should be removed from PR #733. But I figured the version of CICE with a non-conformant Icepack would not be on the head for any duration and that it would not have any impact. Maybe that was wrong. The formal update to Icepack on main after PR #733 did bring in the same changes (I think) that are with #733. Of coarse, the history is different. Just to clarify,

  • the current head of main is fine, right?
  • are there likely any impacts of having this version of Icepack on CICE main in the long term?

In the bigger picture, I agree we should be more careful about not bringing in Icepack changes to CICE when we don't want them. In particular, if you have changes to Icepack and do "git commit -a", you are likely to add the Icepack changes to the CICE commit even if you don't intend to. When I do git status and I see "icepack" as changed, I then never do "git commit -a" and do "git add" for specifically just the files I want to add.

@phil-blain
Copy link
Member Author

The formal update to Icepack on main after PR #733 did bring in the same changes (I think) that are with #733.

Yes, the next commit, c6470cf (Update icepack to 3cb1746a202615044e (#743), 2022-07-31), updates Icepack:

$ git show --submodule=log c6470cf67acb90e13fe26e7bfc3d001dacd3fe91 icepack/
commit c6470cf67acb90e13fe26e7bfc3d001dacd3fe91
Author: Tony Craig <apcraig@users.noreply.github.com>
Date:   Sun Jul 31 14:55

    Update icepack to 3cb1746a202615044e (#743)

    Bring in deprecated 0-layer and cesmponds changes in Icepack.

Submodule icepack 595c00c...3cb1746:
  > Deprecate 0-layer thermodynamics (#394)
  > Deprecate CESM pond option (#393)
  < Update comments in documentation
  < change comment
  < change comment
  < Deprecate CESM pond option

and Icepack PR 393 (CICE-Consortium/Icepack#393) is the one that merged the four Icepack commits mentioned above.

the current head of main is fine, right?

Yes, it is fine since c6470cf.

are there likely any impacts of having this version of Icepack on CICE main in the long term?

If ever GitHub decides to start running git gc on their repositories, then it's possible that this commit would be deleted and then we won't be able to fetch it. So then we won't be able to compile CICE commit 9be1c35 easily. A good way to future-proof ourselves would be to push a hidden ref to Icepack that references this commit, maybe something like this, after fetching it as shown above:

cd icepack
git checkout 595c00cfe8121d4e2405282fa08c0907b22e8718
git push origin HEAD:refs/cice/9be1c35ae62199ae0ea9c4f339105c7aec3192bf

This pushes a ref named 9be1c35ae62199ae0ea9c4f339105c7aec3192bf to refs/cice, which is outside the standard Git refs refs/tags (for tags) and refs/heads (for branches), and so it won't show up anywhere on GitHub but will prevent 595c00cfe8121d4e2405282fa08c0907b22e8718 from being garbage collected if git gc is ran.

In general, I think that never using git commit -a is a best practice, indeed :),

@phil-blain
Copy link
Member Author

phil-blain commented Aug 18, 2022

Maybe a good follow-up action to this would be to update our CICE PR template and add a question like:

Is Icepack being updated in this PR ? If yes, does it reference a commit already merged to Icepack's main branch ?

@apcraig
Copy link
Contributor

apcraig commented Aug 18, 2022

I don't think having access to CICE 9be1c35 is important. For the sake of robustness, maybe we should push Icepack 595c00cfe, although I'm not that concerned.

One of the problems is that sometimes folks don't realize they are committing an update to Icepack, so having a questions on the PR template might not help :). You really don't see the changes except as a list of files in the PR, so it's easy to miss, especially as submodules tend to be a bit confusing.

@phil-blain
Copy link
Member Author

In my opinion it is really, really important for all commits on CICE's main branch to be buildable. This is why I raised this issue. If a commit is not buildable it greatly increases the difficulty of using git bisect, which is immensely helpful to find the commit that caused a regression.

One of the problems is that sometimes folks don't realize they are committing an update to Icepack, so having a questions on the PR template might not help

Well, the Icepack changes are listed in the "Files" tab of the PR, so if we point them there, it's easy to see if Icepack was changed or not, I would say. It won't be listed at all if it was not changed.

@apcraig
Copy link
Contributor

apcraig commented Aug 18, 2022

OK, this should now exist,

git push origin HEAD:refs/cice/9be1c35ae62199ae0ea9c4f339105c7aec3192bf

Thanks for the guidance @phil-blain. I will also update the CICE PR template.

@phil-blain
Copy link
Member Author

I checked if other older CICE commits also reference Icepack commits not found on Icepack's main branch. I've detailed this in phil-blain#44 (mostly in French).

I've pushed references for all those CICE commits to my Icepack fork, similar to what I suggested above for 9be1c35. So we might want to also push these refs over to CICE-Consortium/Icepack. This should do it:

# clone my fork
git clone https://github.com/phil-blain/Icepack icepack
cd icepack
# fetch the cice/* refs
git fetch origin 'refs/cice/*:refs/cice/*'
# push them upstream
git remote add upstream https://github.com/cice-consortium/Icepack
git push upstream  'refs/cice/*:refs/cice/*'

@apcraig
Copy link
Contributor

apcraig commented Aug 22, 2022

Thanks @phil-blain, I executed the commands above and it seems to have worked. Feel free to confirm that there aren't any missing references in CICE anymore. Hopefully that's true and we avoid this issue in the future. Thanks again!

@phil-blain
Copy link
Member Author

Thanks @apcraig. I confirmed the refs/cice/* refs in CICE-Consortium/Icepack are the same as in my fork.

Just to confirm: the missing references are still in CICE; changing that would involving changing all commit hashes (in CICE and Icepack!) which we do not want.

The refs/cice/* refs in Icepack are useful only to avoid these Icepack commits being garbage-collected (deleted) if ever GitHub changes their policy on unreachable commits (which is currently: nothing is ever deleted except if we get an explicit request to do so).

If we want to checkout an earlier CICE commit, and this commit references an Icepack commit not on main, we simply have to fetch it by its hash as I showed above.

I'll close. thanks again.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants