
daemon: add BackendSlot to Service6Key.String and Service4Key.String #29581

Merged
merged 1 commit into cilium:main on Apr 3, 2024

Conversation

xyz-li
Contributor

@xyz-li xyz-li commented Dec 4, 2023

Please ensure your pull request adheres to the following guidelines:

  • For first time contributors, read Submitting a pull request
  • All code is covered by unit and/or runtime tests where feasible.
  • All commits contain a well written commit description including a title,
    description and a Fixes: #XXX line if the commit addresses a particular
    GitHub issue.
  • If your commit description contains a Fixes: <commit-id> tag, then
    please add the commit author[s] as reviewer[s] to this issue.
  • All commits are signed off. See the section Developer’s Certificate of Origin
  • Provide a title or release-note blurb suitable for the release notes.
  • Are you a user of Cilium? Please add yourself to the Users doc
  • Thanks for contributing!

Fixes: #29580

Fix overlapping keys in the agent-side service BPF map cache used for retries. In rare cases this bug may have caused the retry of a failed BPF map update for a service entry to be skipped, leading to a missing entry. This may have, for example, adversely affected recovery from a full BPF service map after excess services were removed.

@xyz-li xyz-li requested a review from a team as a code owner December 4, 2023 03:40
@xyz-li xyz-li requested a review from ldelossa December 4, 2023 03:40
@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Dec 4, 2023
@github-actions github-actions bot added the kind/community-contribution This was a contribution made by a community member. label Dec 4, 2023
@maintainer-s-little-helper

Commit 0103bf3 does not match "(?m)^Signed-off-by:".

Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. label Dec 4, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. label Dec 4, 2023
@aanm aanm added the release-note/bug This PR fixes an issue in a previous release of Cilium. label Dec 4, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Dec 4, 2023
@aanm
Member

aanm commented Dec 4, 2023

/test

@ti-mo
Contributor

ti-mo commented Dec 5, 2023

@xyz-li What is the effect of removing the service entry? Given there are no backends to respond, is this a problem?

@xyz-li
Contributor Author

xyz-li commented Dec 5, 2023

@xyz-li What is the effect of removing the service entry? Given there are no backends to respond, is this a problem?

Deleting the service entry does not affect the datapath, but it is very confusing when using the Cilium CLI: the output of cilium map get cilium_lb4_services_v2 does not contain my service IP. I spent a day tracking down the cause.

@xyz-li xyz-li marked this pull request as draft December 7, 2023 11:05
@ti-mo ti-mo changed the title daemon: service is deleted in cache when deleting endpints daemon: service is deleted in cache when deleting endpoints Dec 7, 2023
@xyz-li xyz-li marked this pull request as ready for review December 7, 2023 12:15
@aanm
Member

aanm commented Dec 8, 2023

/test

@aanm aanm enabled auto-merge December 8, 2023 08:07
@joestringer joestringer requested review from a team and joamaki and removed request for a team December 12, 2023 20:32
@joestringer
Member

I think it might actually be useful to keep the service frontend in the datapath, so that when the LB decision is made, the datapath can emit drop notifications with DROP_NO_SERVICE ("Service backend not found").

@joestringer joestringer added the dont-merge/discussion A discussion is ongoing and should be resolved before merging, regardless of reviews & tests status. label Dec 12, 2023
@joestringer joestringer requested a review from a team December 14, 2023 23:18
Contributor

@joamaki joamaki left a comment


This does not look right at all. It's effectively undoing the delete for the obsolete backend slots, so we'll have entries in the service map that aren't used by the BPF datapath, since slot > len(backends) of the master service. Unless I'm missing something, this just leaves garbage entries in the service map and bloats it.

Looking at the code, the agent-side cache should still have the master service in there as it's not deleted even when there's 0 backends.

Could you validate the cache against the actual BPF map (cilium-dbg bpf lb list) and see if there's any difference? I'm wondering if there's a bug causing the cache to not be in sync.

At least on my kind environment everything looks OK:

root@kind-control-plane:/home/cilium# cilium-dbg map get cilium_lb4_services_v2
Key                 Value                State   Error
10.96.142.154:80    0 1 (9) [0x0 0x0]    sync    
...

root@kind-control-plane:/home/cilium# cilium-dbg bpf lb list
SERVICE ADDRESS     BACKEND ADDRESS (REVNAT_ID) (SLOT)                           
10.96.142.154:80    0.0.0.0:0 (9) (0) [ClusterIP, non-routable]                  
                    10.244.1.21:9376 (9) (1) 

After deleting the backend pod:

root@kind-control-plane:/home/cilium# cilium-dbg map get cilium_lb4_services_v2
Key                 Value                State   Error
10.96.142.154:80    0 0 (9) [0x0 0x0]    sync    
root@kind-control-plane:/home/cilium# cilium-dbg bpf lb list
SERVICE ADDRESS     BACKEND ADDRESS (REVNAT_ID) (SLOT)
...                               
10.96.142.154:80    0.0.0.0:0 (9) (0) [ClusterIP, non-routable]                  

@xyz-li
Contributor Author

xyz-li commented Dec 17, 2023

  1. kubectl get svc
NAME         TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE
kubernetes   ClusterIP   10.96.0.1       <none>        443/TCP   36d
nginx        ClusterIP   10.96.158.224   <none>        80/TCP    12d
  2. cilium map get cilium_lb4_services_v2
10.96.114.15:443   0 1 (8) [0x0 0x10]   sync
10.96.0.10:53      0 2 (2) [0x0 0x0]    sync
10.96.0.1:443      0 1 (3) [0x0 0x0]    sync
10.96.158.224:80   0 2 (4) [0x0 0x0]    sync // 10.96.158.224 is here now.
10.96.0.10:9153    0 2 (1) [0x0 0x0]    sync
  3. kubectl scale deploy nginx --replicas=1 # now the service has one endpoint
  4. kubectl describe svc nginx
Name:              nginx
Namespace:         default
Labels:            app=nginx
Annotations:       <none>
Selector:          app=nginx
Type:              ClusterIP
IP Family Policy:  SingleStack
IP Families:       IPv4
IP:                10.96.158.224
IPs:               10.96.158.224
Port:              80-80  80/TCP
TargetPort:        80/TCP
Endpoints:         10.0.0.249:80
Session Affinity:  None
Events:            <none>
  5. cilium map get cilium_lb4_services_v2 # the entry is missing from the cache
Key                Value                State   Error
10.96.114.15:443   0 1 (8) [0x0 0x10]   sync
10.96.0.10:53      0 2 (2) [0x0 0x0]    sync
10.96.0.1:443      0 1 (3) [0x0 0x0]    sync
10.96.0.10:9153    0 2 (1) [0x0 0x0]    sync
  6. cilium bpf lb list
SERVICE ADDRESS    BACKEND ADDRESS (REVNAT_ID) (SLOT)
10.96.0.10:9153    0.0.0.0:0 (1) (0) [ClusterIP, non-routable]
                   10.0.0.39:9153 (1) (1)
                   10.0.1.179:9153 (1) (2)
10.96.0.10:53      10.0.0.39:53 (2) (1)
                   0.0.0.0:0 (2) (0) [ClusterIP, non-routable]
                   10.0.1.179:53 (2) (2)
10.96.114.15:443   0.0.0.0:0 (8) (0) [ClusterIP, InternalLocal, non-routable]
                   10.11.0.11:4244 (8) (1)
10.96.0.1:443      0.0.0.0:0 (3) (0) [ClusterIP, non-routable]
                   10.11.0.11:6443 (3) (1)
10.96.158.224:80   10.0.0.249:80 (4) (1)     # 10.96.158.224 exists 
                   0.0.0.0:0 (4) (0) [ClusterIP, non-routable]

After scaling down my nginx workload, one endpoint is deleted. Eventually the func deleteCacheEntry is called.

cilium/pkg/bpf/map_linux.go

Lines 883 to 890 in fa00376

func (m *Map) deleteCacheEntry(key MapKey, err error) {
    if m.cache == nil {
        return
    }
    k := key.String()
    if err == nil {
        delete(m.cache, k)
And then the String function.
func (k *Service4Key) String() string {
    kHost := k.ToHost().(*Service4Key)
    addr := net.JoinHostPort(kHost.Address.String(), fmt.Sprintf("%d", kHost.Port))
    if kHost.Scope == loadbalancer.ScopeInternal {
        addr += "/i"
    }
    return addr
}
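
Because String() leaves out BackendSlot, the slot-0 "master" entry and the per-backend-slot entries of the same service all stringify to the same cache key. Below is a minimal, self-contained sketch of that collision, using simplified stand-in types rather than the real Cilium structs:

package main

import (
    "fmt"
    "net"
)

// service4Key is a simplified stand-in for the real Service4Key.
type service4Key struct {
    Address     net.IP
    Port        uint16
    BackendSlot uint16
}

// String mirrors the pre-fix behavior: only address and port are used;
// BackendSlot is ignored.
func (k service4Key) String() string {
    return net.JoinHostPort(k.Address.String(), fmt.Sprintf("%d", k.Port))
}

func main() {
    cache := map[string]string{}

    master := service4Key{Address: net.ParseIP("10.96.158.224"), Port: 80, BackendSlot: 0}
    slot1 := service4Key{Address: net.ParseIP("10.96.158.224"), Port: 80, BackendSlot: 1}

    cache[master.String()] = "master entry"
    cache[slot1.String()] = "backend slot 1" // same key: silently overwrites the master entry

    // Deleting the obsolete backend slot also evicts the master service from the cache.
    delete(cache, slot1.String())
    fmt.Println(len(cache)) // 0 -- the service is gone from the cache
}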

@xyz-li xyz-li requested a review from joamaki December 17, 2023 02:49
@joamaki
Contributor

joamaki commented Dec 18, 2023

Ah, the String function is incorrect! It's not taking the BackendSlot into account, which causes multiple service entries to have the same key; hence we delete the "master" service when we delete the "backend slot" entries. What's worse, the "cache" is used to retry failed operations in resolveErrors, so we may fail to retry some BPF updates because we've overwritten the entry in the cache.

A short-term fix for this issue is to make sure Service4Key.String() fully reflects the Service4Key value, e.g. by appending, say, (N) at the end, where N is the BackendSlot. Could you try this and verify whether it results in the correct behavior?
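
For reference, one possible shape of such a fix (a sketch only, not necessarily the exact code that was merged) is to append the BackendSlot so every slot of a service stringifies to a distinct cache key:

func (k *Service4Key) String() string {
    kHost := k.ToHost().(*Service4Key)
    addr := net.JoinHostPort(kHost.Address.String(), fmt.Sprintf("%d", kHost.Port))
    if kHost.Scope == loadbalancer.ScopeInternal {
        addr += "/i"
    }
    // Include the BackendSlot so the master entry (slot 0) and the
    // per-backend entries no longer collide in the agent-side cache.
    return fmt.Sprintf("%s (%d)", addr, kHost.BackendSlot)
}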

@maintainer-s-little-helper maintainer-s-little-helper bot added this to Needs backport from main in 1.12.19 Mar 20, 2024
    This commit adds the BackendSlot value to the Service6Key.String
    and Service4Key.String methods. This prevents the service key from
    being deleted from the agent-side cache when a backend endpoint is deleted.

    Fixes: cilium#29580

Signed-off-by: xyz-li <hui0787411@163.com>
@joestringer
Member

/test

@joestringer joestringer added affects/v1.12 This issue affects v1.12 branch and removed dont-merge/discussion A discussion is ongoing and should be resolved before merging, regardless of reviews & tests status. needs-backport/1.12 labels Apr 2, 2024
@maintainer-s-little-helper maintainer-s-little-helper bot removed this from Needs backport from main in 1.12.19 Apr 2, 2024
@joestringer joestringer added this pull request to the merge queue Apr 3, 2024
Merged via the queue into cilium:main with commit ecf6ff1 Apr 3, 2024
62 checks passed
@nbusseneau nbusseneau mentioned this pull request Apr 10, 2024
8 tasks
@nbusseneau nbusseneau added backport-pending/1.13 The backport for Cilium 1.13.x for this PR is in progress. and removed needs-backport/1.13 This PR / issue needs backporting to the v1.13 branch labels Apr 10, 2024
@nbusseneau nbusseneau mentioned this pull request Apr 10, 2024
9 tasks
@nbusseneau nbusseneau added backport-pending/1.14 The backport for Cilium 1.14.x for this PR is in progress. and removed needs-backport/1.14 This PR / issue needs backporting to the v1.14 branch labels Apr 10, 2024
@nbusseneau nbusseneau mentioned this pull request Apr 10, 2024
13 tasks
@nbusseneau nbusseneau added backport-pending/1.15 The backport for Cilium 1.15.x for this PR is in progress. and removed needs-backport/1.15 This PR / issue needs backporting to the v1.15 branch labels Apr 10, 2024
@github-actions github-actions bot added backport-done/1.15 The backport for Cilium 1.15.x for this PR is done. backport-done/1.14 The backport for Cilium 1.14.x for this PR is done. and removed backport-pending/1.15 The backport for Cilium 1.15.x for this PR is in progress. backport-pending/1.14 The backport for Cilium 1.14.x for this PR is in progress. labels Apr 11, 2024
@github-actions github-actions bot added backport-done/1.13 The backport for Cilium 1.13.x for this PR is done. and removed backport-pending/1.13 The backport for Cilium 1.13.x for this PR is in progress. labels Apr 17, 2024
Labels
affects/v1.12 This issue affects v1.12 branch
area/daemon Impacts operation of the Cilium daemon.
area/loadbalancing Impacts load-balancing and Kubernetes service implementations
backport-done/1.13 The backport for Cilium 1.13.x for this PR is done.
backport-done/1.14 The backport for Cilium 1.14.x for this PR is done.
backport-done/1.15 The backport for Cilium 1.15.x for this PR is done.
kind/community-contribution This was a contribution made by a community member.
release-note/minor This PR changes functionality that users may find relevant to operating Cilium.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

daemon: service entry in cache is deleted when workload scales down
9 participants