
Fix nil pointer dereference in kubernetesResolver.Stop() when k8smetadata extension is active #2106

Merged
musa-asad merged 1 commit into main from fix/nil-guard-safestopch-close
Apr 30, 2026

Conversation

Contributor

musa-asad commented Apr 30, 2026

Summary

Fix a nil pointer dereference (SIGSEGV) panic in kubernetesResolver.Stop() that occurs during agent shutdown when the k8smetadata extension is configured.

Fixes #1743

Problem

When the k8smetadata extension is present in the agent configuration, the kubernetesResolver is initialized via a code path that does not allocate a safeStopCh (the extension manages its own lifecycle independently). However, Stop() unconditionally calls e.safeStopCh.Close(), causing a nil pointer dereference panic during shutdown.

The panic stack trace looks like:

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x291a66f]

goroutine 1 [running]:
github.com/aws/amazon-cloudwatch-agent/plugins/processors/awsapplicationsignals/internal/resolver.(*kubernetesResolver).Stop(...)
    kubernetes.go:239
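
The failure mode can be reproduced in isolation. The sketch below is not the agent's code: `safeChannel` and `resolver` are hypothetical stand-ins for `SafeChannel` and `kubernetesResolver`, and the only behavior borrowed from the real trace is that `Close()` locks a mutex before doing anything else. Calling `Close()` through a nil pointer then faults as soon as the mutex field is accessed.

```go
package main

import (
	"fmt"
	"sync"
)

// safeChannel is a hypothetical stand-in for the agent's SafeChannel type.
type safeChannel struct {
	mu     sync.Mutex
	ch     chan struct{}
	closed bool
}

// Close locks the mutex first, so a nil receiver faults on the field access.
func (sc *safeChannel) Close() {
	sc.mu.Lock()
	defer sc.mu.Unlock()
	if !sc.closed {
		close(sc.ch)
		sc.closed = true
	}
}

// resolver is a simplified stand-in for kubernetesResolver.
type resolver struct {
	safeStopCh *safeChannel // left nil on the extension code path
}

// Stop mirrors the unguarded call described above.
func (r *resolver) Stop() {
	r.safeStopCh.Close()
}

// stopPanics reports whether calling Stop on r panics.
func stopPanics(r *resolver) (panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	r.Stop()
	return false
}

func main() {
	withExtension := &resolver{} // safeStopCh never allocated
	standalone := &resolver{safeStopCh: &safeChannel{ch: make(chan struct{})}}

	fmt.Println(stopPanics(withExtension)) // true
	fmt.Println(stopPanics(standalone))    // false
}
```

In the agent nothing recovers the panic, which is why the process exits with SIGSEGV instead of finishing shutdown.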

Fix

Add a nil check before calling Close() on safeStopCh. This matches the existing pattern already used in the k8smetadata extension's own Shutdown() method (extension/k8smetadata/extension.go).
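
A minimal sketch of the guard, under assumptions: the types are simplified stand-ins rather than the real `kubernetesResolver` and `SafeChannel`, the `Stop` signature is a guess, and `shutdown` is a hypothetical helper used only to exercise both configurations. The actual change lives in kubernetes.go.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// safeChannel is a hypothetical stand-in for the agent's SafeChannel type.
type safeChannel struct {
	mu     sync.Mutex
	ch     chan struct{}
	closed bool
}

// Close closes the underlying channel at most once.
func (sc *safeChannel) Close() {
	sc.mu.Lock()
	defer sc.mu.Unlock()
	if !sc.closed {
		close(sc.ch)
		sc.closed = true
	}
}

// resolver is a simplified stand-in for kubernetesResolver.
type resolver struct {
	// safeStopCh is nil when the k8smetadata extension owns the lifecycle.
	safeStopCh *safeChannel
}

// Stop closes safeStopCh only if it was allocated.
// The signature here is an assumption made for this sketch.
func (r *resolver) Stop(_ context.Context) error {
	if r.safeStopCh != nil {
		r.safeStopCh.Close()
	}
	return nil
}

// shutdown is a hypothetical helper that calls Stop with a background context.
func shutdown(r *resolver) error {
	return r.Stop(context.Background())
}

func main() {
	withExtension := &resolver{} // no safeStopCh allocated
	standalone := &resolver{safeStopCh: &safeChannel{ch: make(chan struct{})}}

	fmt.Println(shutdown(withExtension), shutdown(standalone), standalone.safeStopCh.closed)
}
```

Guarding at the call site keeps `Close()` unchanged for every other caller. Making `Close()` itself tolerate a nil receiver would be an alternative, but it is not what this PR describes.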

Testing

Unit Tests

All existing tests pass, including TestEksResolver/Test_Stop:

$ go test ./plugins/processors/awsapplicationsignals/internal/resolver/ -v -count=1
=== RUN   TestEksResolver
=== RUN   TestEksResolver/Test_getWorkloadAndNamespaceByIP
=== RUN   TestEksResolver/Test_Stop
=== RUN   TestEksResolver/Test_Process_when_useListPod_is_true
=== RUN   TestEksResolver/Test_Process_when_useListPod_is_false
=== RUN   TestEksResolver/Test_extension_flag
--- PASS: TestEksResolver (0.00s)
PASS
ok  github.com/aws/amazon-cloudwatch-agent/plugins/processors/awsapplicationsignals/internal/resolver  0.062s

End-to-End Before/After EKS Verification

Tested on an EKS 1.30 cluster (2 managed nodes, t3.medium) with the CloudWatch Observability addon (v5.3.1-eksbuild.1), which enables the k8smetadata extension + Application Signals pipeline — the exact configuration that triggers the bug.

Both before and after confirmed k8smetadata extension active:

2026-04-30T05:03:28Z I! {"caller":"resolver/kubernetes.go:152","msg":"k8smetadata extension is present"}

Shutdown was triggered by sending SIGTERM to PID 1 inside the agent container via kubectl debug, which exercises the exact Stop() → safeStopCh.Close() code path.

BEFORE (unfixed 1.300066.0b1367) — panic on every shutdown:

2026-04-30T05:08:04Z I! {"caller":"otelcol@v0.124.0/collector.go:358","msg":"Received signal from OS","signal":"terminated"}
2026-04-30T05:08:04Z I! {"caller":"service@v0.124.0/service.go:331","msg":"Starting shutdown..."}
2026-04-30T05:08:04Z I! Profiler is stopped during shutdown
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x28f668f]

goroutine 1 [running]:
internal/sync.(*Mutex).Lock(...)
	internal/sync/mutex.go:63
sync.(*Mutex).Lock(...)
	sync/mutex.go:46
github.com/aws/amazon-cloudwatch-agent/internal/k8sCommon/k8sclient.(*SafeChannel).Close(0x0)
	.../kubernetes_utils.go:208 +0x2f
github.com/aws/amazon-cloudwatch-agent/plugins/processors/awsapplicationsignals/internal/resolver.(*kubernetesResolver).Stop(0x3?, {0x321b60bb18f8?, 0x321b60bb1860?})
	.../kubernetes.go:239 +0x1a

Container crashed and restarted (restart count: 1).

AFTER (this PR) — clean shutdown, 3/3 tests:

2026-04-30T05:10:18Z I! {"caller":"otelcol@v0.124.0/collector.go:358","msg":"Received signal from OS","signal":"terminated"}
2026-04-30T05:10:18Z I! {"caller":"service@v0.124.0/service.go:331","msg":"Starting shutdown..."}
2026-04-30T05:10:18Z I! Profiler is stopped during shutdown
2026-04-30T05:10:18Z I! {"caller":"extensions/extensions.go:69","msg":"Stopping extensions..."}
2026-04-30T05:10:18Z I! {"caller":"entitystore/extension.go:138","msg":"Pod to Service Environment Mapping TTL Cache stopped"}
2026-04-30T05:10:18Z I! {"caller":"service@v0.124.0/service.go:345","msg":"Shutdown complete."}

No panic, no SIGSEGV. Container restarted cleanly (restart count: 1, from SIGTERM — not a crash).

| Test | Image | SIGTERM Result | Panic? |
| --- | --- | --- | --- |
| Before | 1.300066.0b1367 (unfixed) | SIGSEGV crash | YES (kubernetes.go:239) |
| After #1 | This PR (e3addeb) | Shutdown complete. | NO |
| After #2 | This PR (e3addeb) | Shutdown complete. | NO |
| After #3 | This PR (e3addeb) | Shutdown complete. | NO |

…data extension is active

When the k8smetadata extension is configured, the kubernetesResolver is
initialized without a safeStopCh (the extension manages its own lifecycle).
The Stop() method unconditionally called safeStopCh.Close(), causing a nil
pointer dereference (SIGSEGV) during shutdown.

Add a nil check before calling Close() to prevent the panic. This matches
the existing pattern used in the k8smetadata extension's own Shutdown()
method.

Fixes #1743
musa-asad requested a review from a team as a code owner April 30, 2026 04:18
musa-asad added the "ready for testing" label Apr 30, 2026
musa-asad self-assigned this Apr 30, 2026
JayPolanco self-requested a review April 30, 2026 16:00
JayPolanco removed the request for review from nathalapooja April 30, 2026 16:02
JayPolanco assigned JayPolanco and unassigned musa-asad Apr 30, 2026
musa-asad removed the "ready for testing" label Apr 30, 2026
JayPolanco removed their assignment Apr 30, 2026
musa-asad added the "skip testing" and "ready for testing" labels and removed the "ready for testing" label Apr 30, 2026
musa-asad merged commit 82cf464 into main Apr 30, 2026
667 of 726 checks passed
musa-asad deleted the fix/nil-guard-safestopch-close branch April 30, 2026 16:27
musa-asad self-assigned this Apr 30, 2026
Development

Successfully merging this pull request may close these issues.

cannot get pod from kubelet, err: call to /pods endpoint failed

3 participants