agent: stop endpoints in parallel on exit #15447
Force-pushed from f8a9bf9 to 74db935.
Force-pushed from 74db935 to adb01e8.
I guess this should be a minor change now.
This is a nice improvement, but I don't see how this fixes the issue you've tagged.
If an agent crashes or is killed, wait groups are not going to add any protection anyway.
The way we take backup across 2 steps ( |
Yes, I've commented in the issue: #15446 (comment). I also changed the Fixes tag to Related in the commit message.
I think we just ignored backups while restoring endpoints. I'm not sure whether this is intended or just an oversight (cilium/pkg/endpoint/restore.go, lines 170 to 178 in 52ec2f6).
I think maps are GCed because the endpoint is not restored properly (cilium/pkg/datapath/maps/map.go, lines 113 to 119 in 52ec2f6).
We should probably fix that.
Force-pushed from 7301291 to d9818e2.
I'll try to fix that in a separate PR.
Endpoints are stopped on exit signals in an agent cleanup function. This patch stops them in goroutines to speed up the cleanup. That lowers the probability of the agent exit timing out and, in turn, the possibility of pod network disconnection caused by an interrupted regeneration.
Related: cilium#15446
Signed-off-by: Jaff Cheng <jaff.cheng.sh@gmail.com>
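For readers following along, here is a minimal sketch of the pattern the commit message describes: stopping each endpoint in its own goroutine and waiting for all of them before the cleanup function returns. The `endpoint` type, its `Stop` method, and `stopAll` are hypothetical stand-ins for illustration, not Cilium's actual types or functions, and the 50 ms sleep only simulates a slow stop.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// endpoint is a hypothetical stand-in for an agent endpoint; it is not
// Cilium's real endpoint type.
type endpoint struct {
	id      int
	stopped bool
}

// Stop simulates a slow shutdown step for a single endpoint.
func (e *endpoint) Stop() {
	time.Sleep(50 * time.Millisecond)
	e.stopped = true
}

// stopAll stops every endpoint in its own goroutine and returns only
// after all of them have finished. Each goroutine touches only its own
// endpoint, and wg.Wait orders those writes before any later read.
func stopAll(eps []*endpoint) {
	var wg sync.WaitGroup
	wg.Add(len(eps))
	for _, ep := range eps {
		go func(ep *endpoint) {
			defer wg.Done()
			ep.Stop()
		}(ep)
	}
	wg.Wait()
}

func main() {
	eps := make([]*endpoint, 20)
	for i := range eps {
		eps[i] = &endpoint{id: i}
	}

	start := time.Now()
	stopAll(eps)
	fmt.Printf("stopped %d endpoints in %v\n", len(eps), time.Since(start).Round(time.Millisecond))
}
```

With a sequential loop the total time grows with the number of endpoints; with one goroutine per endpoint it is bounded roughly by the slowest single stop. As noted earlier in this conversation, the wait group only helps on an orderly exit and adds no protection if the agent crashes or is killed.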
Force-pushed from d9818e2 to 2003901.
Rebased
test-me-please
Let's see if https://jenkins.cilium.io/job/Cilium-PR-K8s-1.16-net-next/72/testReport/junit/Suite-k8s-1/16/K8sServicesTest_Checks_service_across_nodes_Tests_NodePort_BPF_Tests_with_direct_routing_and_DSR/ is a flake (seems very likely).
test-1.16-netnext