
Repo server unable to checkout commit #7898

Closed
alexmt opened this issue Dec 9, 2021 · 18 comments · Fixed by #11805
Labels
bug Something isn't working

Comments

@alexmt
Collaborator

alexmt commented Dec 9, 2021

Describe the bug

Repo server unable to checkout commit with the following error. Only workaround is to restart it:

ComparisonError: rpc error: code = Internal desc = Failed to checkout <sha>: `git checkout --force <sha>` failed exit status 128: fatal: Unable to create '/tmp/<redacted>/.git/index.lock': File exists. Another git process seems to be running in this repository, e.g. an editor opened by 'git commit'. Please make sure all processes are terminated then try again.

To Reproduce

No clear steps to reproduce

Expected behavior

Repo server should be able to self-recover from such an issue.

Version

v2.2.0-rc1
@alexmt alexmt added the bug Something isn't working label Dec 9, 2021
@RyanW8

RyanW8 commented Jul 14, 2022

We're running into the same issue running ArgoCD v2.3.3

@johnoct-au

We're running into the same issue running ArgoCD v2.4.6

@poenneby

poenneby commented Sep 2, 2022

Same issue as above on v2.3.3

@Jacobh2

Jacobh2 commented Sep 26, 2022

Same issue on v2.4.12 when using the Git generator for an ApplicationSet

@fbozic

fbozic commented Oct 4, 2022

Same issue as above on v2.4.11

@irizzant
Contributor

Same as above on the latest version, 2.4.14

@SomniVertix

Is there a workaround?

@arturhoo
Contributor

arturhoo commented Dec 14, 2022

From a recent discussion on: https://cloud-native.slack.com/archives/C01TSERG0KZ/p1670965919247029

@j3p0uk:

I believe we tracked down the root of this. We had a git checkout timeout, and this led the Argo util code to run a kill using SIGKILL, terminating the git process and leaving the lock file behind.
We think the util function should likely try a SIGTERM and wait for some time first, before the SIGKILL? This might allow a long-running process to clean up, and not poison the repo-server cache for future calls?

From myself:

"In particular, here are two log messages that exemplify the hypothesis above:

time="2022-12-14T18:31:53Z" level=error msg="`git checkout --force HASH` failed timeout after 1m30s" execID=7ddd4	
time="2022-12-14T18:31:54Z" level=error msg="`git checkout --force FETCH_HEAD` failed exit status 128: fatal: Unable to create '/src/app/_argocd-repo/28e9c5c3-0a51-41e8-90b4-bbc2a5998a89/.git/index.lock': File exists.\n\nAnother git process seems to be running in this repository, e.g.\nan editor opened by 'git commit'. Please make sure all processes\nare terminated then try again. If it still fails, a git process\nmay have crashed in this repository earlier:\nremove the file manually to continue." execID=39bf8

From https://github.com/argoproj/argo-cd/blob/master/reposerver/repository/repository.go#L2138-L2154 - with the first checkout timing out after 90s. Once that repo-server pod gets into this state, no new checkout operations are successful.

Following my initial plan, I'm going to reduce this timeout and try to repro it consistently locally."

@j3p0uk

j3p0uk commented Dec 14, 2022

It might be that increasing ARGOCD_EXEC_TIMEOUT from the 90s default will reduce occurrences of this. In our investigation we were seeing ranges of git checkouts all the way up to the timeout limit.

@arturhoo
Contributor

arturhoo commented Dec 15, 2022

I just managed to repro it locally on my mbp:

  1. Create app of apps: argocd app create guestbooks --repo https://github.com/arturhoo/argocd-example-apps-large.git --path apps --dest-server https://kubernetes.default.svc --dest-namespace default --server 127.0.0.1:8080 --insecure --plaintext
  2. Sync everything (allow namespaces to be created)
  3. Patch repo-server: 526c2a6
  4. Restart repo-server
  5. Get manifest for a random commit in the repo (~6000 commits, ~250MB size): argocd app manifests helm-guestbook2 --revision 3cf990c141922e61becdfd7d900902bdcb57c238 --server 127.0.0.1:8080 --insecure --plaintext

Result:

$ argocd app manifests helm-guestbook2 --revision 3cf990c141922e61becdfd7d900902bdcb57c238 --server 127.0.0.1:8080 --insecure --plaintext
FATA[0001] rpc error: code = Internal desc = Failed to checkout FETCH_HEAD: `git checkout --force FETCH_HEAD` failed exit status 128: fatal: Unable to create '/private..git/index.lock': File exists.

Another git process seems to be running in this repository, e.g.
an editor opened by 'git commit'. Please make sure all processes
are terminated then try again. If it still fails, a git process
may have crashed in this repository earlier:
remove the file manually to continue.

Application status can't be determined:

$ argocd app get helm-guestbook2 --server 127.0.0.1:8080 --insecure --plaintext
Name:               argocd/helm-guestbook2
Project:            default
Server:             https://kubernetes.default.svc
Namespace:          helm-guestbook2
URL:                http://127.0.0.1:8080/applications/helm-guestbook2
Repo:               https://github.com/arturhoo/argocd-example-apps-large
Target:             main
Path:               helm-guestbook2
SyncWindow:         Sync Allowed
Sync Policy:        <none>
Sync Status:        Unknown
Health Status:      Healthy

CONDITION        MESSAGE  LAST TRANSITION
ComparisonError  rpc error: code = Internal desc = Failed to checkout FETCH_HEAD: `git checkout --force FETCH_HEAD` failed exit status 128: fatal: Unable to create '/private..git/index.lock': File exists.

Another git process seems to be running in this repository, e.g.
an editor opened by 'git commit'. Please make sure all processes
are terminated then try again. If it still fails, a git process
may have crashed in this repository earlier:
remove the file manually to continue.  2022-12-15 13:27:35 +0000 GMT


GROUP  KIND        NAMESPACE        NAME             STATUS   HEALTH   HOOK  MESSAGE
       Namespace                    helm-guestbook2  Running  Synced         namespace/helm-guestbook2 created
       Service     helm-guestbook2  helm-guestbook2  Unknown  Healthy        service/helm-guestbook2 created
apps   Deployment  helm-guestbook2  helm-guestbook2  Unknown  Healthy        deployment.apps/helm-guestbook2 created

By killing the git checkout process with SIGTERM (instead of SIGKILL), repo-server is able to recover and continue to serve other requests (for example, app get):

Validated with:

Sample: https://gist.github.com/arturhoo/b3793d35908f932a7856aee17e5ab57f

crenshaw-dev added a commit that referenced this issue Feb 1, 2023
…7898) (#11805)

* Pull in new version of argoproj/pkg

Signed-off-by: Artur Rodrigues <artur.rodrigues@lacework.net>

* Allow timeout behavior to be specified in util/exec/exec

Signed-off-by: Artur Rodrigues <artur.rodrigues@lacework.net>

* Git processes receive SIGTERM when timedout

Signed-off-by: Artur Rodrigues <artur.rodrigues@lacework.net>

* Update util/exec/exec_test.go

Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com>

---------

Signed-off-by: Artur Rodrigues <artur.rodrigues@lacework.net>
Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com>
Co-authored-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com>
crenshaw-dev added a commit that referenced this issue Feb 2, 2023
…7898) (#11805)

* Pull in new version of argoproj/pkg

Signed-off-by: Artur Rodrigues <artur.rodrigues@lacework.net>

* Allow timeout behavior to be specified in util/exec/exec

Signed-off-by: Artur Rodrigues <artur.rodrigues@lacework.net>

* Git processes receive SIGTERM when timedout

Signed-off-by: Artur Rodrigues <artur.rodrigues@lacework.net>

* Update util/exec/exec_test.go

Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com>

---------

Signed-off-by: Artur Rodrigues <artur.rodrigues@lacework.net>
Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com>
Co-authored-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com>
schakrad pushed a commit to schakrad/argo-cd that referenced this issue Mar 14, 2023
…rgoproj#7898) (argoproj#11805)

* Pull in new version of argoproj/pkg

Signed-off-by: Artur Rodrigues <artur.rodrigues@lacework.net>

* Allow timeout behavior to be specified in util/exec/exec

Signed-off-by: Artur Rodrigues <artur.rodrigues@lacework.net>

* Git processes receive SIGTERM when timedout

Signed-off-by: Artur Rodrigues <artur.rodrigues@lacework.net>

* Update util/exec/exec_test.go

Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com>

---------

Signed-off-by: Artur Rodrigues <artur.rodrigues@lacework.net>
Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com>
Co-authored-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com>
Signed-off-by: schakrad <chakradari.sindhu@gmail.com>
@emmahsax

emmahsax commented Oct 25, 2023

We are seeing a very similar issue:

rpc error: code = Internal desc = Failed to checkout FETCH_HEAD: `git checkout --force FETCH_HEAD` failed exit status 128: fatal: Unable to create '..git/index.lock': File exists. Another git process seems to be running in this repository, e.g. an editor opened by 'git commit'. Please make sure all processes are terminated then try again. If it still fails, a git process may have crashed in this repository earlier: remove the file manually to continue.

We are running ArgoCD v2.7.9+0ee33e5.dirty, built in July 2023. Any suggestions?

@vgzclk

vgzclk commented Nov 6, 2023

Similar error at argocd v2.8.4+c279299

time="2023-11-05T01:43:59Z" level=error msg="finished unary call with code Unknown" error="error getting cached app resource tree: ComparisonError: Failed to load target state: failed to generate manifest for source 1 of 1: rpc error: code = Internal desc = Failed to checkout FETCH_HEAD: git checkout --force FETCH_HEAD failed timeout after 10m0s" grpc.code=Unknown grpc.method=ResourceTree grpc.service=application.ApplicationService grpc.start_time="2023-11-05T01:43:59Z" grpc.time_ms=1.619 span.kind=server system=grpc

@flightofthunder
Copy link

Hi, similar issue as @emmahsax as well, on v2.8.4+ebab8ec, using the OpenShift GitOps operator.

I noticed it seems to happen after the repo-server instances perform the daily hard refresh on all apps (500+). The resources used by the two argo-repo-server pods peak, and sometimes some apps stay blocked with the index.lock error. This was not always the case, and did not happen on a daily basis...

I was thinking it was possibly due to our use of a volume mount for the /tmp directory on the repo-server pods. Even deleting the pods and letting the operator recreate them did not fix the issue - it was necessary to manually delete the index.lock files by using the container terminal (not ideal), or finally to deactivate the volume. We are monitoring to see if the issue happens again.

@tomerle03

Hi, I'm also using the Red Hat GitOps operator with ArgoCD v2.8.4, and when I removed the PVC mounted at /tmp, the sync operation worked fine.

@ronmegini
Copy link

ronmegini commented Mar 14, 2024

A similar error happened to me on argocd version v2.10.1+a79e0ea.
Argocd applications were stuck in an unknown state with the following error:

Failed to load target state: failed to generate manifest for source 1 of 1: rpc error: code = Unknown desc = failed to initialize repository resources: rpc error: code = Internal desc = Failed to checkout FETCH_HEAD: `git checkout --force FETCH_HEAD` failed exit status 128: fatal: Unable to create '<path to cached source>/.git/index.lock': File exists. Another git process seems to be running in this repository, e.g. an editor opened by 'git commit'. Please make sure all processes are terminated then try again. If it still fails, a git process may have crashed in this repository earlier: remove the file manually to continue.

However, no git commit process was open, and the file .git/index.lock didn't exist.
Also noticed many similar logs in argocd repo-server pod logs for the last 24h.

Eventually fixed by deleting the repo-server pod and refreshing all argocd applications.

What is the root cause of this behavior? Is it a bug? Any fix?

@jeremysprofile

We also had this happen to us on v2.8.0+804d4b8. Rolling the repo-server deploy worked for us.

@danny-devops

We are experiencing the same issue on v2.10.7+b060053
Similarly to @jeremysprofile we workaround by rolling the repo-server.

Can this issue be reopened for further investigation, or can you provide an update if you are aware of one?

@pdeva

pdeva commented May 13, 2024

We are still seeing this issue in Argo CD 2.10.9.
