clustermesh-apiserver: fixed nil pointer dereference #18957
Thanks for the PR @abocim! Added @cilium/sig-clustermesh to review this PR as I lack context.
Force-pushed: ee81dcc to f95e539
Force-pushed: f95e539 to 3023735
This pull request has been automatically marked as stale because it
/test
Force-pushed: 3023735 to 9c1d8fa
Rebase done.
/ci-external-workloads
/test
Hello, @jrajahalme, do you think the
@abocim can you rebase again against master? Sorry to keep asking for this, but the CI is currently failing because the branch is not rebased :-)
Force-pushed: 9c1d8fa to 5864ed6
/test
Job 'Cilium-PR-K8s-GKE' hit: #17628 (95.15% similarity)
/ci-external-workloads
This looks like a classic case where a watcher callback uses a value that is only assigned once the function that started the watcher returns. I also checked that the new `syncKVStoreKey()` properly does what `UpdateKeySync()` did before in this case.
External workload CI test also passed.
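The ordering problem described in this review can be sketched in a few lines of Go. This is a minimal illustration, not Cilium code: `store`, `observer`, `joinStore`, and `manager` are hypothetical names, and the replay of existing keys is done synchronously here (rather than from a watcher goroutine) so the result is deterministic. It shows that a field assigned from the return value is still nil while the callback runs, and that handing the store to the callback is one possible way around that. The PR itself uses a `syncKVStoreKey()` helper whose body is not shown in this conversation.

```go
package main

import "fmt"

// store is a hypothetical stand-in for a shared kvstore-backed store.
type store struct {
	prefix string
}

// observer is a hypothetical callback interface; it is not the Cilium API.
type observer interface {
	onUpdate(s *store, key string)
}

// joinStore replays every pre-existing key to obs before it returns,
// mimicking a watcher that lists keys already present at startup.
func joinStore(prefix string, existing []string, obs observer) *store {
	s := &store{prefix: prefix}
	for _, key := range existing {
		obs.onUpdate(s, key)
	}
	return s
}

// manager keeps a reference to the store, but that reference is only
// assigned after joinStore has returned.
type manager struct {
	st          *store
	fieldWasNil bool
	synced      []string
}

func (m *manager) onUpdate(s *store, key string) {
	// During the initial replay m.st has not been assigned yet, so
	// dereferencing it here would panic. Record that fact instead.
	if m.st == nil {
		m.fieldWasNil = true
	}
	// Use the store handed in by the caller rather than the field.
	m.synced = append(m.synced, s.prefix+"/"+key)
}

func main() {
	m := &manager{}
	m.st = joinStore("cilium/state", []string{"vm-1"}, m)
	fmt.Println(m.fieldWasNil, m.synced)
}
```

With an empty `existing` slice the callback never fires during startup, which matches the case where etcd is created fresh alongside clustermesh-apiserver and the problem stays hidden.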
@aanm GKE flake is unrelated to this change.
Thank you very much for the PR @abocim!
Fixes a nil pointer dereference that occurs when clustermesh-apiserver starts while a new CEW (CiliumExternalWorkload) with identity 0 already exists in etcd.
In the original implementation, clustermesh-apiserver and etcd run
in separate containers within one pod, so a new, empty etcd is created while
clustermesh-apiserver is starting.
In our use case, we use an existing etcd that is shared with cilium-agent.
The error below occurred when a new CEW resource had already been applied
to Kubernetes and cilium-agent had already been started on the related external
workload machine, while clustermesh-apiserver had not been deployed yet.
$ ./clustermesh-apiserver --cluster-id=12 --cluster-name=uacl-test --k8s-kubeconfig-path ../../uacl/uacl-test.kubeconfig --kvstore-opt etcd.config=../../uacl/etcd.config
level=info msg="Started gops server" address="127.0.0.1:9892" subsys=clustermesh-apiserver
level=info msg="Starting clustermesh-apiserver..." cluster-id=12 cluster-name=uacl-test subsys=clustermesh-apiserver
level=info msg="Establishing connection to apiserver" host="https://uacl-test-api.test:31243" subsys=k8s
level=info msg="Connected to apiserver" subsys=k8s
level=info msg="Waiting until all Cilium CRDs are available" subsys=k8s
level=info msg="All Cilium CRDs have been found and are available" subsys=k8s
level=info msg="Initializing identity allocator" subsys=identity-cache
level=info msg="Creating etcd client" ConfigPath=../../uacl/etcd.config KeepAliveHeartbeat=15s KeepAliveTimeout=25s RateLimit=20 subsys=kvstore
level=info msg="Started health API" subsys=clustermesh-apiserver
level=info msg="Connecting to etcd server..." config=../../uacl/etcd.config endpoints="[https://uacl-test-api.test:30108]" subsys=kvstore
level=info msg="Got lease ID 320f7d7b1f23bc32" subsys=kvstore
level=info msg="Got lock lease ID 320f7d7b1f23bc34" subsys=kvstore
level=info msg="Initial etcd session established" config=../../uacl/etcd.config endpoints="[https://uacl-test-api.test:30108]" subsys=kvstore
level=info msg="Successfully verified version of etcd endpoint" config=../../uacl/etcd.config endpoints="[https://uacl-test-api.test:30108]" etcdEndpoint="https://uacl-test-api.test:30108" subsys=kvstore version=3.4.16
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x1919aa3]
goroutine 213 [running]:
github.com/cilium/cilium/pkg/kvstore/store.(*SharedStore).keyPath(0x0, {0x7f349c122c38, 0xc000a7e240})
/home/abocim/go/src/github.com/cilium/cilium/pkg/kvstore/store/store.go:276 +0x43
github.com/cilium/cilium/pkg/kvstore/store.(*SharedStore).syncLocalKey(0x0, {0x2204f70, 0xc000050138}, {0x22051d8, 0xc000a7e240})
/home/abocim/go/src/github.com/cilium/cilium/pkg/kvstore/store/store.go:288 +0x87
github.com/cilium/cilium/pkg/kvstore/store.(*SharedStore).UpdateKeySync(...)
/home/abocim/go/src/github.com/cilium/cilium/pkg/kvstore/store/store.go:375
main.(*VMManager).OnUpdate(0xc0005f6080, {0x21e9e10, 0xc000a7e0c0})
/home/abocim/go/src/github.com/cilium/cilium/clustermesh-apiserver/vmmanager.go:183 +0x3f4
github.com/cilium/cilium/pkg/kvstore/store.(*SharedStore).onUpdate(...)
/home/abocim/go/src/github.com/cilium/cilium/pkg/kvstore/store/store.go:233
github.com/cilium/cilium/pkg/kvstore/store.(*SharedStore).updateKey(0xc000128780, {0xc00025e85d, 0x15}, {0xc0000f6a00, 0xf3, 0x100})
/home/abocim/go/src/github.com/cilium/cilium/pkg/kvstore/store/store.go:414 +0x102
github.com/cilium/cilium/pkg/kvstore/store.(*SharedStore).watcher(0xc000128780, 0xc000270a20)
/home/abocim/go/src/github.com/cilium/cilium/pkg/kvstore/store/store.go:482 +0x73c
created by github.com/cilium/cilium/pkg/kvstore/store.(*SharedStore).listAndStartWatcher
/home/abocim/go/src/github.com/cilium/cilium/pkg/kvstore/store/store.go:447 +0x89
Signed-off-by: Adam Bocim <adam.bocim@seznam.cz>
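In the trace above, the first argument printed for `keyPath` and `syncLocalKey` is `0x0`, which in a Go stack trace is the method receiver: the methods were called on a nil `*SharedStore`. Go allows calling a method on a nil pointer; the panic only happens once the method reads a field through it. The sketch below reproduces that behavior under stated assumptions: `sharedStore`, its `prefix` field, and `tryKeyPath` are hypothetical names for illustration and do not mirror the real `SharedStore` layout.

```go
package main

import (
	"fmt"
	"path"
)

// sharedStore is a hypothetical stand-in with a single field.
type sharedStore struct {
	prefix string
}

// keyPath reads s.prefix, so calling it on a nil receiver panics
// with a nil pointer dereference at the field access.
func (s *sharedStore) keyPath(name string) string {
	return path.Join(s.prefix, name)
}

// tryKeyPath calls keyPath and converts a panic into an error so the
// failure can be observed without crashing the program.
func tryKeyPath(s *sharedStore, name string) (p string, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	return s.keyPath(name), nil
}

func main() {
	// A nil store: the method call itself is legal, the field read is not.
	_, err := tryKeyPath(nil, "vm-1")
	fmt.Println("nil store:", err)

	// An initialized store works as expected.
	p, err := tryKeyPath(&sharedStore{prefix: "cilium/state"}, "vm-1")
	fmt.Println("valid store:", p, err)
}
```

Recovering from the panic is only used here to make the failure visible; it is not a suggested fix. The fix is to ensure the callback never runs against an unassigned store.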