Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pkg/k8s: fix invalid memory address or nil pointer dereference #17642

Merged
merged 1 commit into from Oct 22, 2021

Conversation

aanm
Copy link
Member

@aanm aanm commented Oct 19, 2021

Since k8sMeta might return nil, we should check for it before accessing
the fields of that structure otherwise we will risk on panic for a nil
pointer dereference.

Fixes: 63c0b29 ("Checks k8s metadata for pod before removing IP from ipccahe")
Signed-off-by: André Martins andre@cilium.io

2021-10-18T14:33:44.882292070Z Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
2021-10-18T14:33:44.882354474Z goroutine 269 [running]:
2021-10-18T14:33:44.882361875Z k8s.io/apimachinery/pkg/util/runtime.logPanic({0x226b480, 0x4037bc0})
2021-10-18T14:33:44.882367475Z 	/go/src/github.com/cilium/cilium/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:74 +0x85
2021-10-18T14:33:44.882374176Z k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc00013ebe0})
2021-10-18T14:33:44.882382676Z 	/go/src/github.com/cilium/cilium/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:48 +0x75
2021-10-18T14:33:44.882388577Z panic({0x226b480, 0x4037bc0})
2021-10-18T14:33:44.882392477Z 	/usr/local/go/src/runtime/panic.go:1038 +0x215
2021-10-18T14:33:44.882396577Z github.com/cilium/cilium/pkg/k8s/watchers.(*K8sWatcher).endpointDeleted(0xc0009ec380, 0xc001f31bc0)
2021-10-18T14:33:44.882412178Z 	/go/src/github.com/cilium/cilium/pkg/k8s/watchers/cilium_endpoint.go:214 +0xc5
2021-10-18T14:33:44.882416979Z github.com/cilium/cilium/pkg/k8s/watchers.(*K8sWatcher).ciliumEndpointsInit.func3({0x25f7ce0, 0xc001f31bc0})
2021-10-18T14:33:44.882421779Z 	/go/src/github.com/cilium/cilium/pkg/k8s/watchers/cilium_endpoint.go:71 +0xa5
2021-10-18T14:33:44.882425479Z k8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnDelete(...)
2021-10-18T14:33:44.882429380Z 	/go/src/github.com/cilium/cilium/vendor/k8s.io/client-go/tools/cache/controller.go:245
2021-10-18T14:33:44.882434180Z github.com/cilium/cilium/pkg/k8s/informer.NewInformerWithStore.func1({0x22ae5a0, 0xc001e70708})
2021-10-18T14:33:44.882437880Z 	/go/src/github.com/cilium/cilium/pkg/k8s/informer/informer.go:114 +0x216
2021-10-18T14:33:44.882441380Z k8s.io/client-go/tools/cache.(*DeltaFIFO).Pop(0xc00013ebe0, 0xc0000b1300)
2021-10-18T14:33:44.882444981Z 	/go/src/github.com/cilium/cilium/vendor/k8s.io/client-go/tools/cache/delta_fifo.go:544 +0x2ea
2021-10-18T14:33:44.882448781Z k8s.io/client-go/tools/cache.(*controller).processLoop(0xc00041d440)
2021-10-18T14:33:44.882452581Z 	/go/src/github.com/cilium/cilium/vendor/k8s.io/client-go/tools/cache/controller.go:183 +0x36
2021-10-18T14:33:44.882456081Z k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x7f0cdf4d71e0)
2021-10-18T14:33:44.882459382Z 	/go/src/github.com/cilium/cilium/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:155 +0x67
2021-10-18T14:33:44.882462982Z k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xf579c8, {0x2a4b760, 0xc0004c6870}, 0x1, 0xc0000565a0)
2021-10-18T14:33:44.882466282Z 	/go/src/github.com/cilium/cilium/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:156 +0xb6
2021-10-18T14:33:44.882469682Z k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc00041d4a8, 0x3b9aca00, 0x0, 0x80, 0x7f0cdf2e81e8)
2021-10-18T14:33:44.882497484Z 	/go/src/github.com/cilium/cilium/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133 +0x89
2021-10-18T14:33:44.882501484Z k8s.io/apimachinery/pkg/util/wait.Until(...)
2021-10-18T14:33:44.882504985Z 	/go/src/github.com/cilium/cilium/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:90
2021-10-18T14:33:44.882508685Z k8s.io/client-go/tools/cache.(*controller).Run(0xc00041d440, 0xc0000565a0)
2021-10-18T14:33:44.882512485Z 	/go/src/github.com/cilium/cilium/vendor/k8s.io/client-go/tools/cache/controller.go:154 +0x2fb
2021-10-18T14:33:44.882515885Z created by github.com/cilium/cilium/pkg/k8s/watchers.(*K8sWatcher).ciliumEndpointsInit
2021-10-18T14:33:44.882519986Z 	/go/src/github.com/cilium/cilium/pkg/k8s/watchers/cilium_endpoint.go:87 +0x65

@aanm aanm added the release-note/misc This PR makes changes that have no direct user impact. label Oct 19, 2021
@aanm aanm requested review from a team and errordeveloper October 19, 2021 13:28
@aanm aanm added release-note/bug This PR fixes an issue in a previous release of Cilium. and removed release-note/misc This PR makes changes that have no direct user impact. labels Oct 19, 2021
@aanm aanm changed the title kg/k8s: fix invalid memory address or nil pointer dereference pkg/k8s: fix invalid memory address or nil pointer dereference Oct 19, 2021
Since k8sMeta might return nil, we should check for it before accessing
the fields of that structure otherwise we will risk on panic for a nil
pointer dereference.

Fixes: 63c0b29 ("Checks k8s metadata for pod before removing IP from ipccahe")
Signed-off-by: André Martins <andre@cilium.io>
Copy link
Member

@pchaigno pchaigno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we know if it's expected that we try to remove from the ipcache an IP address that isn't in the ipcache? /cc @Weil0ng

Copy link
Member

@nbusseneau nbusseneau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉

@Weil0ng
Copy link
Contributor

Weil0ng commented Oct 19, 2021

Do we know if it's expected that we try to remove from the ipcache an IP address that isn't in the ipcache? /cc @Weil0ng

So removal happens when we get an CEP delete event and the insertion should happen when we first get the add event for the CEP, correct? I could be missing something but if the k8s cache is synced, then we should always have the IP address in the ipcache before getting the delete event?

@aanm
Copy link
Member Author

aanm commented Oct 20, 2021

Do we know if it's expected that we try to remove from the ipcache an IP address that isn't in the ipcache? /cc @Weil0ng

So removal happens when we get an CEP delete event and the insertion should happen when we first get the add event for the CEP, correct? I could be missing something but if the k8s cache is synced, then we should always have the IP address in the ipcache before getting the delete event?

@Weil0ng I think we might have received the CEP delete event for a CEP that didn't have any metadata in the ipcache. That could have happened before the k8s cache was synced.

@Weil0ng
Copy link
Contributor

Weil0ng commented Oct 20, 2021

@Weil0ng I think we might have received the CEP delete event for a CEP that didn't have any metadata in the ipcache. That could have happened before the k8s cache was synced.

Yea makes sense.

@aanm
Copy link
Member Author

aanm commented Oct 20, 2021

/test

@aanm aanm added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Oct 22, 2021
@kkourt kkourt merged commit 5802e53 into cilium:master Oct 22, 2021
@codablock
Copy link
Contributor

Can this be backported to 1.10? I stumbled across this crash today and was forced to switch to a CI image.

@maintainer-s-little-helper maintainer-s-little-helper bot added this to Needs backport from master in 1.10.6 Nov 5, 2021
@pchaigno
Copy link
Member

pchaigno commented Nov 5, 2021

This is fixing a bug, so per our backporting guidelines should be backported to v1.10. I think it was just an oversight before. @codablock Thanks for reporting it! 🙏

@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Needs backport from master to Backport pending to v1.10 in 1.10.6 Nov 12, 2021
@aanm
Copy link
Member Author

aanm commented Nov 16, 2021

@codablock can you open a GH issue with the panic that you have seen? The bug this PR is fixing does not exist on v1.10 since the code was introduced in v1.11. That's why it was not marked as to be backported. /cc @pchaigno

@codablock
Copy link
Contributor

@aanm I don't have the logs anymore. But I later realised that I was on v1.11-rc0 due the Helm chart being on that version already. So I assume I got unlucky and was using a version that had the bug freshly introduced but the fix missing.

Copy link
Contributor

@Weil0ng Weil0ng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just realized, shouldn't this line need fix too? https://github.com/cilium/cilium/blob/master/pkg/k8s/watchers/pod.go#L828

But maybe this is superseded by #17909 and doesn't require a separate fix?

@maintainer-s-little-helper maintainer-s-little-helper bot removed this from Backport pending to v1.10 in 1.10.6 Nov 17, 2021
@joestringer
Copy link
Member

@aanm do you know the answer to @Weil0ng 's question above?

@Weil0ng
Copy link
Contributor

Weil0ng commented Dec 8, 2021

@aanm do you know the answer to @Weil0ng 's question above?

I think it is captured in #17909 :)

@aanm aanm deleted the pr/fix-panic branch December 9, 2021 02:40
@aanm
Copy link
Member Author

aanm commented Dec 9, 2021

@aanm do you know the answer to @Weil0ng 's question above?

I think it is captured in #17909 :)

Sorry @Weil0ng if people don't ping me directly on GH I will miss messages :-)

@Weil0ng
Copy link
Contributor

Weil0ng commented Dec 10, 2021

no worries, my bad :) Will make sure to tag folks when it's a directed question next time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/bug This PR fixes an issue in a previous release of Cilium.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants