
runtime error: invalid memory address or nil pointer dereference #14098

Open
farcaller opened this issue Jun 16, 2023 · 14 comments
Labels
bug Something isn't working

Comments

@farcaller
Contributor

Checklist:

  • I've searched in the docs and FAQ for my answer: https://bit.ly/argocd-faq.
  • I've included steps to reproduce the bug.
  • I've pasted the output of argocd version.

Describe the bug

At some point the sync stops working and the sync status reports runtime error: invalid memory address or nil pointer dereference.

The following stacktrace can be observed:

Recovered from panic: runtime error: invalid memory address or nil pointer dereference
goroutine 206 [running]:
runtime/debug.Stack()
	/usr/local/go/src/runtime/debug/stack.go:24 +0x64
github.com/argoproj/argo-cd/v2/controller.(*ApplicationController).processAppRefreshQueueItem.func1()
	/go/src/github.com/argoproj/argo-cd/controller/appcontroller.go:1321 +0x50
panic({0x29cb4e0, 0x5cdd2f0})
	/usr/local/go/src/runtime/panic.go:890 +0x260
github.com/argoproj/gitops-engine/pkg/diff.structuredMergeDiff(0x400cfeb5f0)
	/go/pkg/mod/github.com/argoproj/gitops-engine@v0.7.1-0.20230512020822-b4dd8b8c3976/pkg/diff/diff.go:153 +0x90
github.com/argoproj/gitops-engine/pkg/diff.StructuredMergeDiff(0x40040f34c0?, 0x400cfeb8f0?, 0x5?, {0x2f85024?, 0x412c2f8?})
	/go/pkg/mod/github.com/argoproj/gitops-engine@v0.7.1-0.20230512020822-b4dd8b8c3976/pkg/diff/diff.go:133 +0x4c
github.com/argoproj/gitops-engine/pkg/diff.Diff(0x40040f3360, 0x40040f3368, {0x400cfeb8f0, 0x5, 0x5})
	/go/pkg/mod/github.com/argoproj/gitops-engine@v0.7.1-0.20230512020822-b4dd8b8c3976/pkg/diff/diff.go:101 +0x180
github.com/argoproj/gitops-engine/pkg/diff.DiffArray({0x40040f3378, 0x1, 0x0?}, {0x40040f3370, 0x1?, 0x0?}, {0x400cfeb8f0, 0x5, 0x5})
	/go/pkg/mod/github.com/argoproj/gitops-engine@v0.7.1-0.20230512020822-b4dd8b8c3976/pkg/diff/diff.go:646 +0x11c
github.com/argoproj/argo-cd/v2/util/argo/diff.StateDiffs({0x40040f3358?, 0x4022d3bad0?, 0x2fa68f0?}, {0x40040f3348?, 0x4022fb12d0?, 0xa?}, {0x413b8b8, 0x401e9fcc60?})
	/go/src/github.com/argoproj/argo-cd/util/argo/diff/diff.go:266 +0x3ec
github.com/argoproj/argo-cd/v2/controller.(*appStateManager).CompareAppState(0x400087d130, 0x40011b5c00, 0x400917f8c0, {0x400cbc5060, 0x1, 0x1}, {0x4005a7b960?, 0x1, 0x1}, 0x0, ...)
	/go/src/github.com/argoproj/argo-cd/controller/state.go:556 +0x24cc
github.com/argoproj/argo-cd/v2/controller.(*ApplicationController).processAppRefreshQueueItem(0x40007ebd40)
	/go/src/github.com/argoproj/argo-cd/controller/appcontroller.go:1428 +0xd00
github.com/argoproj/argo-cd/v2/controller.(*ApplicationController).Run.func3()
	/go/src/github.com/argoproj/argo-cd/controller/appcontroller.go:732 +0x2c
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x3a22000000006172?)
	/go/pkg/mod/k8s.io/apimachinery@v0.24.2/pkg/util/wait/wait.go:155 +0x40
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0x7b3a226e6f697461?, {0x40fd540, 0x40010b0d20}, 0x1, 0x40000bfb00)
	/go/pkg/mod/k8s.io/apimachinery@v0.24.2/pkg/util/wait/wait.go:156 +0x90
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0x7463657269643a66?, 0x3b9aca00, 0x0, 0x6a?, 0x7d7b3a222e227b3a?)
	/go/pkg/mod/k8s.io/apimachinery@v0.24.2/pkg/util/wait/wait.go:133 +0x80
k8s.io/apimachinery/pkg/util/wait.Until(0x66222c7d7b3a2268?, 0x4c52556f7065723a?, 0x3a66222c7d7b3a22?)
	/go/pkg/mod/k8s.io/apimachinery@v0.24.2/pkg/util/wait/wait.go:90 +0x28
created by github.com/argoproj/argo-cd/v2/controller.(*ApplicationController).Run
	/go/src/github.com/argoproj/argo-cd/controller/appcontroller.go:731 +0x59c

I'm pretty much reporting this only because a nil pointer dereference is a bad thing(tm) and it seems like I triggered some very non-trivial race condition. Looking through the code, I don't have an immediate clue as to what could have gone wrong there.
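For context, this is just the generic panic Go raises whenever a method is called through a nil pointer; a minimal standalone repro of the same runtime error (nothing Argo CD specific, all names made up):

package main

// Calling a method through a nil pointer receiver panics with
// "invalid memory address or nil pointer dereference" as soon as
// the receiver is dereferenced.

type thing struct{ n int }

func (t *thing) value() int { return t.n } // dereferences t

func main() {
	var t *thing  // nil pointer
	_ = t.value() // panic: runtime error: invalid memory address or nil pointer dereference
}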

To Reproduce

I wasn't able to repro this properly. It was 100% reproducible on the broken instance but went away after a restart (and has been fine since). Just before the issue happened, I had manually triggered a sync for a bunch of projects.

Expected behavior

Screenshots

[screenshot: the sync status error as shown in the Argo CD UI]

Version

2.7.3
@farcaller farcaller added the bug Something isn't working label Jun 16, 2023
@crenshaw-dev
Collaborator

@farcaller
Contributor Author

I can reproduce it pretty much weekly now, with the same stacktrace. If any debugging info would be useful, I can add extra logging around the relevant code.

@crenshaw-dev
Collaborator

I think either pt or p.live must be nil. Wanna add this?

// Log which side is nil, and the GVK involved, right before the code that panics.
if pt == nil || p.live == nil {
    log.Errorf("pt: %v, p.live: %v, gvk: %v", pt, p.live, gvk)
}

It would at least give us a start.
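Per the stack trace above, the panic surfaces in gitops-engine's structuredMergeDiff (pkg/diff/diff.go:153), so that's presumably where the check belongs.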

@Amr-Aly

Amr-Aly commented Aug 4, 2023

Facing the same issue on v2.7.7, any workaround for this?

@farcaller
Contributor Author

It looks like I figured out where this one is coming from.

I had Argo CD managing a VolumeSnapshotClass, specifically the snapshot.storage.k8s.io/v1beta1 version of it. But I'm running Kubernetes v1.27.3 now, which removed v1beta1 now that the API is stable.

Argo CD fails to diff it because the live object is empty at that version.

The simple fix on my side was to use the proper GVK for the resource, but practically it should fail in a more user-friendly way up the chain.
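To sketch what "fail in a more user-friendly way" could look like (hypothetical names below, not gitops-engine's actual API): when the per-GVK schema lookup comes back nil because the apiVersion is no longer served, return an error naming the GVK instead of dereferencing the nil pointer:

package main

import "fmt"

// parseableType stands in for the per-GVK typed-schema handle that the
// diff code resolves; nil models "the cluster no longer serves this
// apiVersion" (e.g. snapshot.storage.k8s.io/v1beta1 on Kubernetes 1.27).
type parseableType struct{ gvk string }

func (pt *parseableType) diff() string { return "diff for " + pt.gvk }

// lookup returns nil for the removed beta version, mimicking the failed
// schema resolution that precedes the panic in the trace above.
func lookup(gvk string) *parseableType {
	if gvk == "snapshot.storage.k8s.io/v1beta1/VolumeSnapshotClass" {
		return nil
	}
	return &parseableType{gvk: gvk}
}

func main() {
	gvk := "snapshot.storage.k8s.io/v1beta1/VolumeSnapshotClass"
	pt := lookup(gvk)
	if pt == nil {
		// The friendly failure: name the offending GVK instead of calling
		// pt.diff() and crashing the controller goroutine.
		fmt.Printf("cannot diff %s: apiVersion not served by the cluster\n", gvk)
		return
	}
	fmt.Println(pt.diff())
}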

@zswanson

Hit this in 2.8.0 - the issue seemed to stem from a PodDisruptionBudget from the Kyverno chart, which used a cluster-capabilities function to decide whether to create the new v1 or the deprecated v1beta1 PDB kind. I had to remove the offending PDB from the clusters manually before I could get a sync to work.

@blakebarnett
Contributor

We see this quite often too. Currently our workaround is to invalidate the cache for the affected cluster. We're running Crossplane, so this GVK issue is probably more extreme for us than usual.

ArgoCD: v2.6.7
k8s: v1.24.15 (EKS)

@bravosierrasierra

My problem was an API version change: HorizontalPodAutoscaler from autoscaling/v2beta1 to autoscaling/v2.

time="2023-10-17T12:08:21Z" level=info msg="Adding resource result, status: 'SyncFailed', phase: '', message: 'the server could not find the requested resource'" application=argocd/app-backend-feature-test kind=HorizontalPodAutoscaler name=app-hpa namespace=backend-feature-test phase=Sync syncId=00005-RAPRf

Message in the GUI after recreating the application:

KIND                                         NAMESPACE             NAME        STATUS      MESSAGE
autoscaling/v2beta1/HorizontalPodAutoscaler  backend-feature-test  celery-hpa  SyncFailed  the server could not find the requested resource
autoscaling/v2beta1/HorizontalPodAutoscaler  backend-feature-test  app-hpa     SyncFailed  the server could not find the requested resource

Counterintuitive problem :(

@ivan-cai

We see this quite often also. Currently our workaround is to invalidate the cache for the affected cluster. We're running crossplane so this GVK issue would probably be more extreme than usual.

ArgoCD: v2.6.7 k8s: v1.24.15 (EKS)

@blakebarnett how do you invalidate the cache?

@blakebarnett
Contributor

Settings -> Clusters -> (click cluster) -> Invalidate Cache (top-left)

@ivan-cai

Settings -> Clusters -> (click cluster) -> Invalidate Cache (top-left)

@blakebarnett Thx!

@prein

prein commented Apr 10, 2024

Settings -> Clusters -> (click cluster) -> Invalidate Cache (top-left)

Is there a way to do it using kubectl? In my UI, when I click the cluster, I get an error, and then another error in a popup (screenshots omitted).

EDIT: Restarting the argocd-application-controller pod is the way.
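(Assuming the default argocd namespace and a standard install, something like kubectl -n argocd rollout restart statefulset argocd-application-controller does the trick.)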

@prein

prein commented Apr 10, 2024

Observing this in 2.8.1

@MohammedNoureldin

MohammedNoureldin commented May 22, 2024

What can I do if the issue persists and the solutions above don't help?

runtime error: invalid memory address or nil pointer dereference

I invalidated the cache and removed the cluster. I even redeployed the whole cluster (the remote cluster, since I use ApplicationSet), and I still see the same issue.

In addition, I removed all of the Argo CD pods, but nothing helped.
