New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix detection of missing commits on checkout #2322
Conversation
I upgraded from 3.31 to 3.52 and started seeing 'reference is not a tree' errors in CI. Something in the Buildkite code was not catching this properly and causing the agent to recreate the repo. In my case, this was being caused by force pushes to branches with open PRs happening soon after a previous push. ``` $ git fetch -v --prune -- origin refs/pull/123/head ... From github.com:my/repo * branch refs/pull/123/head -> FETCH_HEAD $ git checkout -f 7890000000 fatal: reference is not a tree: 7890000000⚠️ Warning: Checkout failed! checking out commit "7890000000": exit status 128 (Attempt 1/3 Retrying in 2s) $ git clone -v -- git@github.com:my/repo.git . Cloning into '.'... ... $ git clean -fdqx -e /.local $ git fetch -v --prune -- origin refs/pull/123/head ``` Based on buildkite#2286, I assume something similar is happening - the output of the process isn't being properly captured.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, @goodspark. I had a play around on v3.31, and I doubt the old code ever detected the error 🤷♂️ But I could only simulate git writing this error message, I could not reproduce it, even when I force pushed on PRs builds.
So it seems there is some regression in the agent between v3.31 and now that causes git to get into this state. So, to help us debug that, can you link us to some builds - use support@buildkite.com if you would like you keep the links private.
The old code detected the error fine, as this often happened for me and the job would just exit with an error instead of trying to clean and reclone the repo (which, given the size of the repo I'm working with, started causing time outs and job backups because of increased job times). There is no corrupted state IMO - this is just caused by force pushes to a branch I think. |
Thanks for the links to the logs, and the additional context. I think I see what's going on. This code in v3.31: Lines 47 to 54 in eade03a
was not (and I suspect ever) detecting the that git exited with an error and printed "fatal: reference is not a tree: " . It did not return a gitError of type gitErrorCheckoutReferenceIsNotATree , but would instead return a one type gitErrorCheckout .
But, the code in Lines 1052 to 1063 in eade03a
did not wipe the checkout directory for either gitErrorCheckout or gitErrorCheckoutReferenceIsNotATree . So in v3.31 you were seeing that it did not wipe the checkout directory when you force pushed.
This situation changed in v3.48 which included this PR: https://github.com/buildkite/agent/pull/2137/files which introduced a new error type, Lines 58 to 70 in f7a91c2
As you can see, it was intended that an exit code of 128 with fatal: reference is not a tree: would be returned earlier if it was detected. But at the time, we were not aware that the mechanism to detect it did not work.
This PR will fix the detection mechanism, so I think the previous behaviour is restored. |
I upgraded from 3.31 to 3.52 and started seeing 'reference is not a tree' errors in CI. Something in the Buildkite code was not catching this properly and causing the agent to recreate the repo.
In my case, this was being caused by force pushes to branches with open PRs happening soon after a previous push.
Based on #2286, I assume something similar is happening - the output of the process isn't being properly captured.