Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

git: prune any deleted refs before fetching #9504

Merged

Conversation

KevinSnyderCodes
Copy link
Contributor

@KevinSnyderCodes KevinSnyderCodes commented May 25, 2022

This PR modifies nativeGitClient.Fetch() to call git remote prune origin before fetching refs.

In some cases, an old branch may exist that conflicts with the name of a new branch. The old branch will have been deleted from origin but still exist locally in the argocd-repo-server.

Example: an old branch feature/foo conflicts with a new branch feature/foo/bar

In these cases, syncing an application results in the error:

rpc error: code = Internal desc = Failed to fetch default: `git fetch origin --tags --force` failed exit status 1: error: cannot lock ref 'refs/remotes/origin/feature/foo/bar': 'refs/remotes/origin/feature/foo' exists; cannot create 'refs/remotes/origin/feature/foo/bar' From https://github.com/org/repo ! [new branch] feature/foo/bar -> origin/feature/foo/bar (unable to update local ref) error: some local refs could not be updated; try running 'git remote prune origin' to remove any old, conflicting branches

Adding git remote prune origin before fetching, as recommended by the error message, should fix this issue.

The current workaround is to restart the argocd-repo-server which should flush the local repository folder. This works when Argo CD is installed using the Helm chart.


Note on DCO:

If the DCO action in the integration test fails, one or more of your commits are not signed off. Please click on the Details link next to the DCO action for instructions on how to resolve this.

Checklist:

  • Either (a) I've created an enhancement proposal and discussed it with the community, (b) this is a bug fix, or (c) this does not need to be in the release notes.
    • This is a bug fix
  • The title of the PR states what changed and the related issues number (used for the release note).
    • To my knowledge, there are no related issues
  • I've included "Closes [ISSUE #]" or "Fixes [ISSUE #]" in the description to automatically close the associated issue.
    • To my knowledge, there are no related issues
  • I've updated both the CLI and UI to expose my feature, or I plan to submit a second PR with them.
    • This is a big fix
  • Does this PR require documentation updates?
    • No
  • I've updated documentation as required by this PR.
    • N/A
  • Optional. My organization is added to USERS.md.
    • Skipping
  • I have signed off all my commits as required by DCO
  • I have written unit and/or e2e tests for my change. PRs without these are unlikely to be merged.
  • My build is green (troubleshooting builds).

@KevinSnyderCodes KevinSnyderCodes changed the title git: prune any deleted refers before fetching git: prune any deleted refs before fetching May 25, 2022
Copy link
Collaborator

@crenshaw-dev crenshaw-dev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, great catch of an edge case! I wonder if a unit test would be possible here. Maybe using a file:// repo as "remote"?

@crenshaw-dev
Copy link
Collaborator

Considering this seems to be a fairly far-edge case, I also wonder if the "happy path" should skip the prune and then only run if there's an error on fetch.

@codecov
Copy link

codecov bot commented May 25, 2022

Codecov Report

Merging #9504 (c2c0927) into master (c4306da) will increase coverage by 0.02%.
The diff coverage is 61.53%.

@@            Coverage Diff             @@
##           master    #9504      +/-   ##
==========================================
+ Coverage   45.84%   45.86%   +0.02%     
==========================================
  Files         221      221              
  Lines       26300    26309       +9     
==========================================
+ Hits        12057    12067      +10     
+ Misses      12589    12586       -3     
- Partials     1654     1656       +2     
Impacted Files Coverage Δ
util/git/client.go 48.13% <61.53%> (+1.98%) ⬆️
util/settings/settings.go 48.16% <0.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 059f947...c2c0927. Read the comment docs.

@KevinSnyderCodes
Copy link
Contributor Author

@crenshaw-dev Thanks for the quick response 🙂

Interesting, great catch of an edge case! I wonder if a unit test would be possible here. Maybe using a file:// repo as "remote"?

Should be doable. I'll see what can be done.

Considering this seems to be a fairly far-edge case, I also wonder if the "happy path" should skip the prune and then only run if there's an error on fetch.

Good idea, I'll change the implementation to do this. Maybe even read the contents of the error and determine if a prune would be worthwhile.

@crenshaw-dev
Copy link
Collaborator

Good idea, I'll change the implementation to do this. Maybe even read the contents of the error and determine if a prune would be worthwhile.

I like that. A second unit test could check to make sure the git error string checking works (so if we upgrade git, we remember to update the string if it changes).

util/git/client.go Outdated Show resolved Hide resolved
@crenshaw-dev
Copy link
Collaborator

@KevinSnyderCodes thanks for the changes! Can you sign your commits to get DCO to pass?

@crenshaw-dev crenshaw-dev requested a review from leoluz May 26, 2022 18:27
KevinSnyderCodes and others added 6 commits May 26, 2022 11:46
This commit modifies `nativeGitClient.Fetch()` to call `git remote prune origin` before fetching refs.

In some cases, an old branch may exist that conflicts with the name of a new branch. The old branch will have been deleted from `origin` but still exist locally in the `argocd-repo-server`.

Example: an old branch `feature/foo` conflicts with a new branch `feature/foo/bar`

In these cases, syncing an application results in the error:

```
rpc error: code = Internal desc = Failed to fetch default: `git fetch origin --tags --force` failed exit status 1: error: cannot lock ref 'refs/remotes/origin/feature/foo/bar': 'refs/remotes/origin/feature/foo' exists; cannot create 'refs/remotes/origin/feature/foo/bar' From https://github.com/org/repo ! [new branch] feature/foo/bar -> origin/feature/foo/bar (unable to update local ref) error: some local refs could not be updated; try running 'git remote prune origin' to remove any old, conflicting branches
```

Adding `git remote prune origin` before fetching, as recommended by the error message, should fix this issue.

The current workaround is to restart the `argocd-repo-server` which should flush the local repository folder. This works when Argo CD is installed using the Helm chart.

Signed-off-by: Kevin Snyder <kevin.snyder.codes@gmail.com>
* fix: added extra protection to syncing app with replace

Signed-off-by: ciiay <yicai@redhat.com>

* Code clean up

Signed-off-by: ciiay <yicai@redhat.com>

* Updated logic for isAppOfAppsPattern

Signed-off-by: ciiay <yicai@redhat.com>

* Updated text strings as per comment

Signed-off-by: ciiay <yicai@redhat.com>

* Fixed lint issue

Signed-off-by: ciiay <yicai@redhat.com>
Signed-off-by: Kevin Snyder <kevin.snyder.codes@gmail.com>
* chore: Simplified GetRepoHTTPClient function

Signed-off-by: ls0f <lovedboy.tk@qq.com>

* simplified code and improve unit test coverage

Signed-off-by: ls0f <lovedboy.tk@qq.com>
Signed-off-by: Kevin Snyder <kevin.snyder.codes@gmail.com>
…d unit tests

Confirmed that `Test_nativeGitClient_Fetch_Prune` fails without the bug fix, succeeds with it.

Signed-off-by: Kevin Snyder <kevin.snyder.codes@gmail.com>
…roj#9434)

* fix: avoid k8s API call before authorization in k8s endpoint

Signed-off-by: Michael Crenshaw <michael@crenshaw.dev>

* check for bad project

Signed-off-by: Michael Crenshaw <michael@crenshaw.dev>

* lint

Signed-off-by: Michael Crenshaw <michael@crenshaw.dev>

* more logging

Signed-off-by: Michael Crenshaw <michael@crenshaw.dev>

* handle 404, return 500 instead of 400 for other errors

Signed-off-by: Michael Crenshaw <michael@crenshaw.dev>

* use user input

Signed-off-by: Michael Crenshaw <michael@crenshaw.dev>

* refactor validation

Signed-off-by: Michael Crenshaw <michael@crenshaw.dev>

* fix tests

Signed-off-by: Michael Crenshaw <michael@crenshaw.dev>

* fixes, tests

Signed-off-by: Michael Crenshaw <michael@crenshaw.dev>
Signed-off-by: Kevin Snyder <kevin.snyder.codes@gmail.com>
Signed-off-by: Kevin Snyder <kevin.snyder.codes@gmail.com>
Copy link
Collaborator

@crenshaw-dev crenshaw-dev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm - thanks again @KevinSnyderCodes!

@crenshaw-dev crenshaw-dev enabled auto-merge (squash) May 27, 2022 14:55
@crenshaw-dev crenshaw-dev merged commit 2a69038 into argoproj:master May 27, 2022
@sidewinder12s
Copy link

sidewinder12s commented May 31, 2022

Thanks for this, I think I've actually run into this edge case repeatedly with a large monorepo with the only solution be to just restart the repo-servers.

Though my error message surfaced is slightly different from this one.

[2022-05-31 19:17:45,071 INFO legacy:210] time="2022-05-31T19:17:45Z" level=fatal msg="rpc error: code = InvalidArgument desc = application spec for app is invalid: InvalidSpecError: Unable to get app details: rpc error: code = Internal desc = Failed to fetch default: `git fetch origin --tags --force` failed exit status 1: From https://github.com/REPO%0A   3ca76ebc810..d0650727103  branch_name -> origin/branch_name%0A   6fd5c9b25bf..c9a3ecb37eb  branch_name_2 -> branch_name_2%0Aerror: cannot lock ref 'refs/remotes/origin/duplicate_branch/test': 'refs/remotes/origin/duplicate_branch' exists; cannot create 'refs/remotes/origin/duplicate_branch/test'%0A ! [new branch]              duplicate_branch/test     -> origin/duplicate_branch/test  (unable to update local ref)%0A * [new branch]              branch_3 -> origin/branch_3%0A   Auto packing the repository in background for optimum performance.%0ASee \"git help gc\" for manual housekeeping."

(Full message edited a bit to remove real branch/repo names)

@kadamanas93
Copy link

Thanks for this, I think I've actually run into this edge case repeatedly with a large monorepo with the only solution be to just restart the repo-servers.

Though my error message surfaced is slightly different from this one.

[2022-05-31 19:17:45,071 INFO legacy:210] time="2022-05-31T19:17:45Z" level=fatal msg="rpc error: code = InvalidArgument desc = application spec for app is invalid: InvalidSpecError: Unable to get app details: rpc error: code = Internal desc = Failed to fetch default: `git fetch origin --tags --force` failed exit status 1: From https://github.com/REPO%0A   3ca76ebc810..d0650727103  branch_name -> origin/branch_name%0A   6fd5c9b25bf..c9a3ecb37eb  branch_name_2 -> branch_name_2%0Aerror: cannot lock ref 'refs/remotes/origin/duplicate_branch/test': 'refs/remotes/origin/duplicate_branch' exists; cannot create 'refs/remotes/origin/duplicate_branch/test'%0A ! [new branch]              duplicate_branch/test     -> origin/duplicate_branch/test  (unable to update local ref)%0A * [new branch]              branch_3 -> origin/branch_3%0A   Auto packing the repository in background for optimum performance.%0ASee \"git help gc\" for manual housekeeping."

(Full message edited a bit to remove real branch/repo names)

Your error log doesn't contain this string which the check relies on. Will it still run git remote prune origin then?

I am getting the same error log so I hope this fixes it correctly.

Also, would this PR be part of the next release?

@crenshaw-dev
Copy link
Collaborator

@kadamanas93 I can cherry-pick this onto 2.4 (or earlier if anyone needs it). I don't think this PR will fix your problem, since your error string is different.

Can you outline the steps to create a situation which produces your error message? I'd want to write a unit test along with the fix to make sure we don't accidentally regress from fixing your issue.

@kadamanas93
Copy link

kadamanas93 commented Jun 13, 2022

@kadamanas93 I can cherry-pick this onto 2.4 (or earlier if anyone needs it). I don't think this PR will fix your problem, since your error string is different.

Can you outline the steps to create a situation which produces your error message? I'd want to write a unit test along with the fix to make sure we don't accidentally regress from fixing your issue.

I really don't know how it's different than what OP described except the error message is different. I do have some extra log lines for my error message though:

rpc error: code = Internal desc = Failed to fetch <git-sha>: 
`git fetch origin --tags --force` failed exit status 1: Failed to add the ECDSA 
host key for IP address '<ip-address>' to the list of known hosts 
(/app/config/ssh/ssh_known_hosts). error: cannot lock ref 
'refs/remotes/origin/foo/bar': 
'refs/remotes/origin/foo' exists; cannot create 
'refs/remotes/origin/foo/bar' 
From github.com:***** ! [new branch] 
foo/bar -> origin/foo/bar (unable to update 
local ref) Auto packing the repository in background for optimum performance. 
See "git help gc" for manual housekeeping. warning: The last gc run reported 
the following. Please correct the root cause and remove .git/gc.log. Automatic 
cleanup will not be performed until the file is removed. warning: There are 
too many unreachable loose objects; run 'git prune' to remove them.

@sidewinder12s
Copy link

My guess was the difference was the old branch was in the namespace/path of the new name and then the old branch was removed upstream.

@crenshaw-dev
Copy link
Collaborator

That seems possible. @kadamanas93 if you can write out the git steps to reproduce the issue with that error message, I'd be happy to write up the fix and the unit test.

@kadamanas93
Copy link

That seems possible. @kadamanas93 if you can write out the git steps to reproduce the issue with that error message, I'd be happy to write up the fix and the unit test.

I am not sure if I managed to replicate the exact error I have but based on @sidewinder12s hypothesis, here is how I produced the error. I cloned a repo in two separate directories:

# clone repos
mkdir clone_1 clone_2
cd clone_1 && git clone git@github.com:kadamanas93/repo.git
cd ../clone_2 && git clone git@github.com:kadamanas93/repo.git

# create branch
cd clone_1/repo
git checkout -b foo
git push -u origin foo

# branch foo should be in repo clone 2
cd ../../clone_2/repo
git fetch

# delete the branch locally and remotely
cd ../../clone_1/repo
git branch -d foo
git push -d origin foo

# create new branch with conflicts
git checkout -b foo/bar
git push -u origin foo/bar

cd ../../clone_2/repo
git fetch origin --tags --force
error: cannot lock ref 'refs/remotes/origin/foo/bar': 'refs/remotes/origin/foo' exists; cannot create 'refs/remotes/origin/foo/bar'
From github.com:kadamanas93/repo
 ! [new branch]      foo/bar    -> origin/foo/bar  (unable to update local ref) 

@crenshaw-dev
Copy link
Collaborator

@kadamanas93 unfortunately I think that's the exact same process as what's tested in this PR. Which makes me wonder why the error message is different. :-/

@KevinSnyderCodes
Copy link
Contributor Author

@crenshaw-dev What are your thoughts on always running git remote prune origin regardless of the error message? I know we decided not to, but it should always be safe to run, and we wouldn't need to worry about covering the various error messages.

The main downside I can think of is adding an unnecessary operation and latency to the fetch() function.

Maybe we can put the behavior behind some global flag, if that's a pattern Argo CD likes to use.

@crenshaw-dev
Copy link
Collaborator

@KevinSnyderCodes fair... I think I'd be alright with an always-prune option behind a flag. I think I'd much, much prefer to have a test case and logic specifically tailored to the issue though. Argo CD is already very config-heavy, and I prefer to avoid more cognitive load.

@sidewinder12s
Copy link

I also wonder if running prune on every fetch may introduce a performance regression with monorepos/large git repositories. I am not exactly sure how to reproduce since I hadn't created the breaking branches on our monorepo, but if I run into it again I can try and collect more info.

Would performing a prune always during hard refreshes be another option? Or on some cadence?

@crenshaw-dev
Copy link
Collaborator

I also wonder if running prune on every fetch may introduce a performance regression with monorepos/large git repositories.

Yeah, I feel like it's probably a fairly light operation, but I'm just not sure.

Would performing a prune always during hard refreshes be another option? Or on some cadence?

I like the hard-refresh option. Cadence gets tricky, but theoretically would work. I think I'd want an API endpoint so users could schedule their own Cron to do the prune.

@sidewinder12s
Copy link

Yeah, the only other thing I wasn't sure if you only exposed this through a hard refresh was how did that behave if you have multiple app server/repo-server replicas defined. Would you need to hit an app on each broken repo-server?

I like the hard-refresh option. Cadence gets tricky, but theoretically would work. I think I'd want an API endpoint so users could schedule their own Cron to do the prune.

That and/or just have the repo-server prune on a cron as well with the api endpoint available.

I'd generally thought that hard refresh would hard refresh everything on an app, but found that there were lots of little things that wouldn't get refreshed, so might want to force the prune during that as well anyways.

@crenshaw-dev crenshaw-dev added the cherry-pick/2.4 Candidate for cherry picking into the 2.4 release branch label Jun 29, 2022
crenshaw-dev added a commit that referenced this pull request Jun 29, 2022
* git: prune any deleted refers before fetching

This commit modifies `nativeGitClient.Fetch()` to call `git remote prune origin` before fetching refs.

In some cases, an old branch may exist that conflicts with the name of a new branch. The old branch will have been deleted from `origin` but still exist locally in the `argocd-repo-server`.

Example: an old branch `feature/foo` conflicts with a new branch `feature/foo/bar`

In these cases, syncing an application results in the error:

```
rpc error: code = Internal desc = Failed to fetch default: `git fetch origin --tags --force` failed exit status 1: error: cannot lock ref 'refs/remotes/origin/feature/foo/bar': 'refs/remotes/origin/feature/foo' exists; cannot create 'refs/remotes/origin/feature/foo/bar' From https://github.com/org/repo ! [new branch] feature/foo/bar -> origin/feature/foo/bar (unable to update local ref) error: some local refs could not be updated; try running 'git remote prune origin' to remove any old, conflicting branches
```

Adding `git remote prune origin` before fetching, as recommended by the error message, should fix this issue.

The current workaround is to restart the `argocd-repo-server` which should flush the local repository folder. This works when Argo CD is installed using the Helm chart.

Signed-off-by: Kevin Snyder <kevin.snyder.codes@gmail.com>

* fix: added extra protection to syncing app with replace (#9187)

* fix: added extra protection to syncing app with replace

Signed-off-by: ciiay <yicai@redhat.com>

* Code clean up

Signed-off-by: ciiay <yicai@redhat.com>

* Updated logic for isAppOfAppsPattern

Signed-off-by: ciiay <yicai@redhat.com>

* Updated text strings as per comment

Signed-off-by: ciiay <yicai@redhat.com>

* Fixed lint issue

Signed-off-by: ciiay <yicai@redhat.com>
Signed-off-by: Kevin Snyder <kevin.snyder.codes@gmail.com>

* chore: Simplified GetRepoHTTPClient function (#9396)

* chore: Simplified GetRepoHTTPClient function

Signed-off-by: ls0f <lovedboy.tk@qq.com>

* simplified code and improve unit test coverage

Signed-off-by: ls0f <lovedboy.tk@qq.com>
Signed-off-by: Kevin Snyder <kevin.snyder.codes@gmail.com>

* Only prune if fetch error message indicates that it is worthwhile, add unit tests

Confirmed that `Test_nativeGitClient_Fetch_Prune` fails without the bug fix, succeeds with it.

Signed-off-by: Kevin Snyder <kevin.snyder.codes@gmail.com>

* fix: avoid k8s call before authorization for terminal endpoint (#9434)

* fix: avoid k8s API call before authorization in k8s endpoint

Signed-off-by: Michael Crenshaw <michael@crenshaw.dev>

* check for bad project

Signed-off-by: Michael Crenshaw <michael@crenshaw.dev>

* lint

Signed-off-by: Michael Crenshaw <michael@crenshaw.dev>

* more logging

Signed-off-by: Michael Crenshaw <michael@crenshaw.dev>

* handle 404, return 500 instead of 400 for other errors

Signed-off-by: Michael Crenshaw <michael@crenshaw.dev>

* use user input

Signed-off-by: Michael Crenshaw <michael@crenshaw.dev>

* refactor validation

Signed-off-by: Michael Crenshaw <michael@crenshaw.dev>

* fix tests

Signed-off-by: Michael Crenshaw <michael@crenshaw.dev>

* fixes, tests

Signed-off-by: Michael Crenshaw <michael@crenshaw.dev>
Signed-off-by: Kevin Snyder <kevin.snyder.codes@gmail.com>

* Match against "try running 'git remote prune origin'"

Signed-off-by: Kevin Snyder <kevin.snyder.codes@gmail.com>

Co-authored-by: Yi Cai <yicai@redhat.com>
Co-authored-by: ls0f <lovedboy.tk@qq.com>
Co-authored-by: Michael Crenshaw <michael@crenshaw.dev>
Signed-off-by: Michael Crenshaw <michael@crenshaw.dev>
@crenshaw-dev
Copy link
Collaborator

Yeah, the only other thing I wasn't sure if you only exposed this through a hard refresh was how did that behave if you have multiple app server/repo-server replicas defined. Would you need to hit an app on each broken repo-server?

That's fair, you'd have to hit each repo-server.

@crenshaw-dev
Copy link
Collaborator

Cherry-picked onto release-2.4 for 2.4.4.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cherry-pick/2.4 Candidate for cherry picking into the 2.4 release branch
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants