daemon: add BackendSlot to Service6Key.String and Service4Key.String #29581
Conversation
Commit 0103bf3 does not match "(?m)^Signed-off-by:". Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin
/test
@xyz-li What is the effect of removing the service entry? Given that there are no backends to respond, is this a problem?
Deleting the service entry will not affect the datapath. But it is very confusing when using the cilium CLI.
/test
I think it might actually be useful to keep the service frontend in the datapath, so that when the LB decision is made, the datapath can emit drop notifications with
This does not look right at all. It's effectively undoing the delete for the obsolete backend slots, so we'll have entries in the service map that aren't used by the BPF datapath since slot > len(backends)
of the master service. Unless I'm missing something, to me it seems like this is just effectively leaving garbage entries in the service map and bloating it.
Looking at the code, the agent-side cache should still have the master service in there as it's not deleted even when there's 0 backends.
Could you validate the cache against the actual BPF map (cilium-dbg bpf lb list) and see if there's any difference? I'm wondering if there's a bug causing the cache to not be in sync.
At least on my kind environment everything looks OK:
root@kind-control-plane:/home/cilium# cilium-dbg map get cilium_lb4_services_v2
Key Value State Error
10.96.142.154:80 0 1 (9) [0x0 0x0] sync
...
root@kind-control-plane:/home/cilium# cilium-dbg bpf lb list
SERVICE ADDRESS BACKEND ADDRESS (REVNAT_ID) (SLOT)
10.96.142.154:80 0.0.0.0:0 (9) (0) [ClusterIP, non-routable]
10.244.1.21:9376 (9) (1)
After deleting the backend pod:
root@kind-control-plane:/home/cilium# cilium-dbg map get cilium_lb4_services_v2
Key Value State Error
10.96.142.154:80 0 0 (9) [0x0 0x0] sync
root@kind-control-plane:/home/cilium# cilium-dbg bpf lb list
SERVICE ADDRESS BACKEND ADDRESS (REVNAT_ID) (SLOT)
...
10.96.142.154:80 0.0.0.0:0 (9) (0) [ClusterIP, non-routable]
After scaling down my nginx workload, one endpoint is deleted. In the end the func at lines 883 to 890 in fa00376 is called, and then the String function at lines 250 to 257 in fa00376.
Ah. A short-term fix to this issue is to make sure
This commit adds the BackendSlot value to the Service6Key.String and Service4Key.String methods. This is to prevent the service key from being deleted when the backend endpoint is deleted.

Fixes: cilium#29580

Signed-off-by: xyz-li <hui0787411@163.com>
/test
Please ensure your pull request adheres to the following guidelines:
- Each commit contains a description and a "Fixes: #XXX" line if the commit addresses a particular GitHub issue.
- If your commit description contains a "Fixes: <commit-id>" tag, then please add the commit author[s] as reviewer[s] to this issue.

Fixes: #29580