
Introduce v2 backend map with u32 backend ID #17235

Merged: 1 commit merged into cilium:master from bemap on Sep 28, 2021

Conversation

@Weil0ng (Contributor) commented Aug 24, 2021

See commit msg.

Fixes: #16121 (Service/backend ID pool is not scaling with bpf lb map size)

Downgrade is tracked in #17262

@Weil0ng requested review from brb and a team on August 24, 2021 21:24
@Weil0ng requested a review from a team as a code owner on August 24, 2021 21:24
@maintainer-s-little-helper bot added the dont-merge/needs-release-note-label label ("The author needs to describe the release impact of these changes.") on Aug 24, 2021
@Weil0ng requested a review from ti-mo on August 24, 2021 21:24
@Weil0ng marked this pull request as draft on August 24, 2021 21:50
@Weil0ng force-pushed the bemap branch 4 times, most recently from 109e9c4 to 7590bb8, on August 25, 2021 22:56
@Weil0ng added the release-note/misc label ("This PR makes changes that have no direct user impact.") and removed the dont-merge/needs-release-note-label label on Aug 25, 2021
@Weil0ng marked this pull request as ready for review on August 25, 2021 23:35
@Weil0ng requested a review from a team as a code owner on August 25, 2021 23:35
@Weil0ng requested a review from nebril on August 25, 2021 23:35
@Weil0ng force-pushed the bemap branch 2 times, most recently from 54c848c to 5394afe, on August 25, 2021 23:54
@pchaigno self-requested a review on August 26, 2021 05:48
@pchaigno added the upgrade-impact ("This PR has potential upgrade or downgrade impact.") and sig/loader ("Impacts the loading of BPF programs into the kernel.") labels on Aug 26, 2021
@pchaigno (Member) left a comment

I don't understand why we still have references to the old map in this PR. Why are they needed? We can handle the downgrade by adding code to v1.10 for that (copy from v2 to v1).

> The 2/2 part will switch operation to v2 map completely

What is not already switched to v2 in this PR?

> introduce a --keep-legacy-service-backends flag to control whether we get rid of v1 map or keep it (for downgrade).

We usually just delete the maps after 3 releases, once nobody is supposed to be running a version with the old map.

@Weil0ng (Contributor, Author) commented Aug 26, 2021

> What is not already switched to v2 in this PR?

Datapath is still using v1 map in this PR.

> We usually just delete the maps after 3 releases, once nobody is supposed to be running a version with the old map.

I see, that works too. --keep-legacy-service-backends was suggested by @brb, so I'd love to hear his thoughts on this. AFAIU, whichever approach we take, we will have a few releases that carry both maps; the problem is how we know the user's intent, i.e. whether they want to use the v1 or the v2 map. I was thinking we'd have two versions, one with this PR (2 maps, uses v1) and another with 2/2 (2 maps, uses v2), so users could switch between the two by upgrading/downgrading?

@pchaigno (Member):

Do we really need to give control over that to the user? We could switch to v2 in this PR and add code for the v1->v2 migration. Then add code for the v2->v1 migration as a patch on top of v1.10. We leave the v1 map around if it exists (but it won't be created for new clusters) and we explicitly remove it in 3 minor releases. We took a similar approach for the policy tail call (after a rename from cilium_policy to cilium_call_policy).

The current approach implemented in this PR means we'll have wasted memory on the v1 map for new clusters. The flag also seems a bit confusing to me; why would the user need to care about such low-level implementation details?

@Weil0ng (Contributor, Author) commented Aug 27, 2021

> Do we really need to give control over that to the user? We could switch to v2 in this PR and add code for the v1->v2 migration. Then add code for the v2->v1 migration as a patch on top of v1.10. We leave the v1 map around if it exists (but it won't be created for new clusters) and we explicitly remove it in 3 minor releases. We took a similar approach for the policy tail call (after a rename from cilium_policy to cilium_call_policy).
>
> The current approach implemented in this PR means we'll have wasted memory on the v1 map for new clusters. The flag also seems a bit confusing to me; why would the user need to care about such low-level implementation details?

I'm sold :) I'm a bit confused when you say we add v2->v1 on top of v1.10; so this PR (v1->v2) would go in v1.11, correct?

@pchaigno (Member):

> A bit confused when you say we add v2->v1 on top of 1.10, so this PR (v1->v2) would go in 1.11, correct?

Yes, the present PR (map v1->map v2) would go in v1.11 and the reverse operation (map v2->map v1) would be implemented as a direct change on the v1.10 branch, as done for v1.7 in #13052 for example.

@Weil0ng (Contributor, Author) commented Sep 18, 2021

test-1.16-netnext

@Weil0ng (Contributor, Author) commented Sep 19, 2021

Managed to reproduce the Maglev Tests_NodePort test failure locally; here are a few observations:

  • The same curl command from k8s3 (outside the cluster) sometimes succeeds and sometimes times out.

Target service:

   root@k8s1:/home/cilium# cilium service list | grep 31751
   13   192.168.36.11:31751     NodePort       1 => 10.0.1.128:69           <------------- target service
   14   10.0.2.15:31751         NodePort       1 => 10.0.1.128:69        
   15   0.0.0.0:31751           NodePort       1 => 10.0.1.128:69

Curl tries:

vagrant@k8s3:~$ curl tftp://192.168.36.11:31751/hello -m 5

Hostname: testds-t5sqh

Request Information:
        client_address=192.168.36.11
        client_port=48980
        real path=/hello
        request_scheme=tftp

vagrant@k8s3:~$ curl tftp://192.168.36.11:31751/hello -m 5

Hostname: testds-mwvwz

Request Information:
        client_address=192.168.36.13
        client_port=57300
        real path=/hello
        request_scheme=tftp

vagrant@k8s3:~$ curl tftp://192.168.36.11:31751/hello -m 5
curl: (28) Operation timed out after 5000 milliseconds with 0 out of 0 bytes received
vagrant@k8s3:~$ curl tftp://192.168.36.11:31751/hello -m 5

Hostname: testds-t5sqh

Request Information:
        client_address=192.168.36.11
        client_port=54715
        real path=/hello
        request_scheme=tftp

vagrant@k8s3:~$ curl tftp://192.168.36.11:31751/hello -m 5

Hostname: testds-mwvwz

Request Information:
        client_address=192.168.36.13
        client_port=40563
        real path=/hello
        request_scheme=tftp
  • (From the above output) Both backends can be reached on different tries.
  • Checked the Maglev map for the target service; both the outer map and the inner map look good:
root@k8s1:/home/cilium# bpftool map dump id 1195          <---------- outer map
key: 00 0d  value: b1 04 00 00                             <---------- svc ID 13, inner map id 1201 (0x04b1)

Inner Maglev map dump:

root@k8s1:/home/cilium# bpftool map dump id 1201                                                                                                                                                                                                              
key:                                                                                                                                                                                                                                                          
00 00 00 00
value:
09 00 00 00 0b 00 00 00  09 00 00 00 0b 00 00 00
0b 00 00 00 09 00 00 00  0b 00 00 00 0b 00 00 00
09 00 00 00 0b 00 00 00  09 00 00 00 0b 00 00 00
09 00 00 00 09 00 00 00  0b 00 00 00 0b 00 00 00
09 00 00 00 09 00 00 00  0b 00 00 00 0b 00 00 00
09 00 00 00 0b 00 00 00  09 00 00 00 0b 00 00 00
0b 00 00 00 09 00 00 00  0b 00 00 00 0b 00 00 00
09 00 00 00 0b 00 00 00  0b 00 00 00 0b 00 00 00
09 00 00 00 0b 00 00 00  0b 00 00 00 0b 00 00 00
09 00 00 00 09 00 00 00  0b 00 00 00 0b 00 00 00
09 00 00 00 0b 00 00 00  09 00 00 00 0b 00 00 00
0b 00 00 00 09 00 00 00  0b 00 00 00 0b 00 00 00
09 00 00 00 0b 00 00 00  0b 00 00 00 0b 00 00 00
09 00 00 00 0b 00 00 00  0b 00 00 00 0b 00 00 00
0b 00 00 00 09 00 00 00  0b 00 00 00 09 00 00 00
09 00 00 00 0b 00 00 00  09 00 00 00 0b 00 00 00
09 00 00 00 09 00 00 00  0b 00 00 00 0b 00 00 00
09 00 00 00 09 00 00 00  0b 00 00 00 09 00 00 00
09 00 00 00 0b 00 00 00  09 00 00 00 09 00 00 00
0b 00 00 00 09 00 00 00  0b 00 00 00 09 00 00 00
09 00 00 00 0b 00 00 00  0b 00 00 00 09 00 00 00
09 00 00 00 09 00 00 00  0b 00 00 00 09 00 00 00
09 00 00 00 09 00 00 00  0b 00 00 00 09 00 00 00
09 00 00 00 0b 00 00 00  09 00 00 00 09 00 00 00
0b 00 00 00 09 00 00 00  0b 00 00 00 09 00 00 00
09 00 00 00 0b 00 00 00  0b 00 00 00 09 00 00 00
09 00 00 00 0b 00 00 00  0b 00 00 00 09 00 00 00
0b 00 00 00 09 00 00 00  0b 00 00 00 09 00 00 00
09 00 00 00 0b 00 00 00  09 00 00 00 09 00 00 00
0b 00 00 00 09 00 00 00  0b 00 00 00 09 00 00 00
09 00 00 00 0b 00 00 00  0b 00 00 00 09 00 00 00
09 00 00 00 0b 00 00 00  0b 00 00 00 09 00 00 00
0b 00 00 00 09 00 00 00  0b 00 00 00 0b 00 00 00
09 00 00 00 0b 00 00 00  0b 00 00 00 09 00 00 00
0b 00 00 00 09 00 00 00  0b 00 00 00 09 00 00 00
09 00 00 00 09 00 00 00  0b 00 00 00 09 00 00 00
09 00 00 00 0b 00 00 00  09 00 00 00 09 00 00 00
0b 00 00 00 09 00 00 00  0b 00 00 00 09 00 00 00
09 00 00 00 0b 00 00 00  0b 00 00 00 09 00 00 00
09 00 00 00 0b 00 00 00  0b 00 00 00 09 00 00 00
09 00 00 00 09 00 00 00  0b 00 00 00 09 00 00 00
09 00 00 00 0b 00 00 00  09 00 00 00 09 00 00 00
0b 00 00 00 09 00 00 00  0b 00 00 00 09 00 00 00
09 00 00 00 0b 00 00 00  0b 00 00 00 09 00 00 00
09 00 00 00 0b 00 00 00  0b 00 00 00 09 00 00 00
0b 00 00 00 09 00 00 00  0b 00 00 00 0b 00 00 00
09 00 00 00 0b 00 00 00  09 00 00 00 09 00 00 00
0b 00 00 00 09 00 00 00  0b 00 00 00 09 00 00 00
09 00 00 00 0b 00 00 00  0b 00 00 00 09 00 00 00
09 00 00 00 0b 00 00 00  0b 00 00 00 09 00 00 00
0b 00 00 00 09 00 00 00  0b 00 00 00 0b 00 00 00
09 00 00 00 0b 00 00 00  0b 00 00 00 09 00 00 00
0b 00 00 00 0b 00 00 00  0b 00 00 00 09 00 00 00
09 00 00 00 0b 00 00 00  0b 00 00 00 09 00 00 00
09 00 00 00 0b 00 00 00  0b 00 00 00 09 00 00 00
0b 00 00 00 09 00 00 00  0b 00 00 00 0b 00 00 00
09 00 00 00 0b 00 00 00  0b 00 00 00 09 00 00 00
09 00 00 00 0b 00 00 00  0b 00 00 00 09 00 00 00
0b 00 00 00 09 00 00 00  0b 00 00 00 09 00 00 00
09 00 00 00 0b 00 00 00  09 00 00 00 09 00 00 00
0b 00 00 00 09 00 00 00  0b 00 00 00 09 00 00 00
09 00 00 00 0b 00 00 00  0b 00 00 00 09 00 00 00
09 00 00 00 0b 00 00 00  0b 00 00 00
Found 1 element

New V2 backend map dump:

root@k8s1:/home/cilium# bpftool map dump pinned /sys/fs/bpf/tc/globals/cilium_lb4_backends_v2 
key: 07 00 00 00  value: 0a 00 01 c9 00 45 00 00
key: 02 00 00 00  value: 0a 00 01 55 00 35 00 00
key: 0f 00 00 00  value: 0a 00 00 15 00 45 00 00
key: 0a 00 00 00  value: 0a 00 00 d0 00 50 00 00
key: 0d 00 00 00  value: 0a 00 01 eb 00 45 00 00
key: 0e 00 00 00  value: 0a 00 00 15 00 50 00 00
key: 0c 00 00 00  value: 0a 00 01 eb 00 50 00 00
key: 06 00 00 00  value: 0a 00 01 c9 00 50 00 00
key: 04 00 00 00  value: 0a 00 00 1c 23 82 00 00
key: 05 00 00 00  value: 0a 00 00 56 0b b8 00 00
key: 03 00 00 00  value: 0a 00 01 55 23 c1 00 00
key: 01 00 00 00  value: c0 a8 24 0b 19 2b 00 00
key: 0b 00 00 00  value: 0a 00 00 d0 00 45 00 00            <------ be2, ip: 10.0.0.208
key: 09 00 00 00  value: 0a 00 01 80 00 45 00 00             <------ be1, ip: 10.0.1.128
key: 08 00 00 00  value: 0a 00 01 80 00 50 00 00
Found 15 elements
vagrant@k8s1:~$ kubectl get pods -A -o wide | grep testds
default             testds-mwvwz                      2/2     Running   0          57m   10.0.0.208      k8s1   <none>           <none>
default             testds-t5sqh                      2/2     Running   0          57m   10.0.1.128      k8s2   <none>           <none>
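
As a side note on reading this dump, the sketch below decodes the two annotated entries. It assumes the value layout is a 4-byte IPv4 address followed by a 2-byte port in network byte order plus flags/padding; that layout is inferred from the annotations above, not taken from the struct definition in the tree.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"net"
)

// decodeBackend interprets a value from the cilium_lb4_backends_v2 dump.
// Assumed layout: 4-byte IPv4 address, 2-byte port in network byte order,
// then flags/padding. This is inferred from the dump, not from the source.
func decodeBackend(val []byte) (net.IP, uint16) {
	ip := net.IPv4(val[0], val[1], val[2], val[3])
	port := binary.BigEndian.Uint16(val[4:6])
	return ip, port
}

func main() {
	// Backend ID 9 (be1): 0a 00 01 80 00 45 00 00
	ip, port := decodeBackend([]byte{0x0a, 0x00, 0x01, 0x80, 0x00, 0x45, 0x00, 0x00})
	fmt.Printf("%s:%d\n", ip, port) // 10.0.1.128:69 (tftp), pod testds-t5sqh

	// Backend ID 11 (be2): 0a 00 00 d0 00 45 00 00
	ip, port = decodeBackend([]byte{0x0a, 0x00, 0x00, 0xd0, 0x00, 0x45, 0x00, 0x00})
	fmt.Printf("%s:%d\n", ip, port) // 10.0.0.208:69 (tftp), pod testds-mwvwz
}
```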

@Weil0ng (Contributor, Author) commented Sep 20, 2021

It seems like backend selection is broken.
Running cilium monitor -t drop on the target node (192.168.36.11) gives the following when the curl fails:

level=info msg="Initializing dissection cache..." subsys=monitor
xx drop (Service backend not found) flow 0x0 to endpoint 0, identity world->unknown: 192.168.36.13:60421 -> 192.168.36.11:31751 udp

@Weil0ng (Contributor, Author) commented Sep 20, 2021

It seems like the backend id is not interpreted correctly:

From another run, here's the inner map:

root@k8s1:/home/cilium# bpftool map dump id 769                                                                                                                                                                                                               
key:                                                                                                                                                                                                                                                          
00 00 00 00                                                    
value:                                                         
08 00 00 00 0a 00 00 00  08 00 00 00 08 00 00 00
0a 00 00 00 08 00 00 00  0a 00 00 00 08 00 00 00
08 00 00 00 0a 00 00 00  08 00 00 00 08 00 00 00
08 00 00 00 08 00 00 00  0a 00 00 00 08 00 00 00
08 00 00 00 08 00 00 00  08 00 00 00 08 00 00 00
08 00 00 00 08 00 00 00  08 00 00 00 08 00 00 00
08 00 00 00 08 00 00 00  08 00 00 00 08 00 00 00
0a 00 00 00 0a 00 00 00  0a 00 00 00 0a 00 00 00
08 00 00 00 0a 00 00 00  0a 00 00 00 0a 00 00 00
0a 00 00 00 08 00 00 00  0a 00 00 00 0a 00 00 00
0a 00 00 00 0a 00 00 00  08 00 00 00 0a 00 00 00
0a 00 00 00 08 00 00 00  0a 00 00 00 08 00 00 00
0a 00 00 00 0a 00 00 00  08 00 00 00 0a 00 00 00
08 00 00 00 0a 00 00 00  0a 00 00 00 08 00 00 00
0a 00 00 00 08 00 00 00  08 00 00 00 0a 00 00 00
08 00 00 00 0a 00 00 00  08 00 00 00 08 00 00 00
0a 00 00 00 08 00 00 00  0a 00 00 00 08 00 00 00
08 00 00 00 0a 00 00 00  08 00 00 00 08 00 00 00
0a 00 00 00 08 00 00 00  0a 00 00 00 08 00 00 00
08 00 00 00 0a 00 00 00  08 00 00 00 08 00 00 00
08 00 00 00 08 00 00 00  0a 00 00 00 08 00 00 00
08 00 00 00 08 00 00 00  08 00 00 00 08 00 00 00
08 00 00 00 08 00 00 00  0a 00 00 00 0a 00 00 00
0a 00 00 00 0a 00 00 00  0a 00 00 00 0a 00 00 00
0a 00 00 00 0a 00 00 00  0a 00 00 00 0a 00 00 00
08 00 00 00 0a 00 00 00  0a 00 00 00 0a 00 00 00
0a 00 00 00 08 00 00 00  0a 00 00 00 0a 00 00 00
0a 00 00 00 0a 00 00 00  08 00 00 00 0a 00 00 00
0a 00 00 00 08 00 00 00  0a 00 00 00 08 00 00 00
0a 00 00 00 0a 00 00 00  08 00 00 00 0a 00 00 00
08 00 00 00 0a 00 00 00  0a 00 00 00 08 00 00 00
0a 00 00 00 08 00 00 00  08 00 00 00 0a 00 00 00
08 00 00 00 0a 00 00 00  08 00 00 00 08 00 00 00
0a 00 00 00 08 00 00 00  08 00 00 00 08 00 00 00
08 00 00 00 0a 00 00 00  08 00 00 00 08 00 00 00
08 00 00 00 08 00 00 00  0a 00 00 00 08 00 00 00
08 00 00 00 08 00 00 00  08 00 00 00 08 00 00 00
08 00 00 00 08 00 00 00  08 00 00 00 08 00 00 00
08 00 00 00 0a 00 00 00  0a 00 00 00 0a 00 00 00
0a 00 00 00 0a 00 00 00  0a 00 00 00 0a 00 00 00
08 00 00 00 0a 00 00 00  0a 00 00 00 0a 00 00 00
0a 00 00 00 08 00 00 00  0a 00 00 00 0a 00 00 00
0a 00 00 00 0a 00 00 00  08 00 00 00 0a 00 00 00
0a 00 00 00 08 00 00 00  0a 00 00 00 08 00 00 00
0a 00 00 00 0a 00 00 00  08 00 00 00 0a 00 00 00
08 00 00 00 08 00 00 00  0a 00 00 00 08 00 00 00
0a 00 00 00 08 00 00 00  08 00 00 00 0a 00 00 00
08 00 00 00 0a 00 00 00  08 00 00 00 08 00 00 00
0a 00 00 00 08 00 00 00  08 00 00 00 0a 00 00 00
08 00 00 00 0a 00 00 00  08 00 00 00 08 00 00 00
0a 00 00 00 08 00 00 00  08 00 00 00 08 00 00 00
08 00 00 00 0a 00 00 00  08 00 00 00 08 00 00 00
08 00 00 00 08 00 00 00  0a 00 00 00 08 00 00 00
08 00 00 00 08 00 00 00  08 00 00 00 08 00 00 00
0a 00 00 00 0a 00 00 00  0a 00 00 00 0a 00 00 00
0a 00 00 00 0a 00 00 00  0a 00 00 00 0a 00 00 00
0a 00 00 00 0a 00 00 00  0a 00 00 00 0a 00 00 00
08 00 00 00 0a 00 00 00  0a 00 00 00 0a 00 00 00
0a 00 00 00 08 00 00 00  0a 00 00 00 0a 00 00 00
0a 00 00 00 0a 00 00 00  08 00 00 00 0a 00 00 00
0a 00 00 00 08 00 00 00  0a 00 00 00 08 00 00 00
0a 00 00 00 0a 00 00 00  08 00 00 00 0a 00 00 00
08 00 00 00 08 00 00 00  0a 00 00 00
Found 1 element

From a debug print added inside lb4_select_backends() with Maglev:

root@k8s1:/home/vagrant# cat tracelog 
          <idle>-0       [001] d.s. 55273.630524: bpf_trace_printk: index: 233

          <idle>-0       [001] dNs. 55273.630546: bpf_trace_printk: backendID: 655360.    <---------- "000A0000"

          <idle>-0       [001] d.s. 55275.744112: bpf_trace_printk: index: 183

          <idle>-0       [001] dNs. 55275.744135: bpf_trace_printk: backendID: 655360

          <idle>-0       [001] d.s. 55276.766519: bpf_trace_printk: index: 183

          <idle>-0       [001] dNs. 55276.766539: bpf_trace_printk: backendID: 655360

   <idle>-0       [001] d.s. 55270.897225: bpf_trace_printk: index: 20

          <idle>-0       [001] dNs. 55270.897246: bpf_trace_printk: backendID: 8

          <idle>-0       [001] d.s. 55272.605415: bpf_trace_printk: index: 233

          <idle>-0       [001] dNs. 55272.605437: bpf_trace_printk: backendID: 655360

          <idle>-0       [001] dNs. 55277.722812: bpf_trace_printk: index: 75

          <idle>-0       [001] dNs. 55277.722840: bpf_trace_printk: backendID: 655360

          <idle>-0       [001] d.s. 55279.170761: bpf_trace_printk: index: 0

          <idle>-0       [001] dNs. 55279.170782: bpf_trace_printk: backendID: 8

@pchaigno removed the dont-merge/blocked label ("Another PR must be merged before this one.") on Sep 20, 2021
The V2 map key is u32 typed instead of u16.

Upon agent (re)start:
- we restore backends from v1 map and copy all
entries from v1 map to v2 map if v1 map exists
(this is the upgrade scenario).
- we then remove v1 map and operate on v2 map.

Signed-off-by: Weilong Cui <cuiwl@google.com>
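
To make the commit message concrete, here is a minimal sketch of the upgrade flow it describes, using plain Go maps as stand-ins for the pinned v1/v2 BPF maps; the type and helper names are hypothetical and do not reflect Cilium's actual lbmap API.

```go
package main

import "fmt"

// BackendValue is a hypothetical stand-in for the lb4 backend value
// (IPv4 address plus port); it is not the struct from the Cilium tree.
type BackendValue struct {
	Addr [4]byte
	Port uint16
}

// migrateBackends mirrors the upgrade path from the commit message: every
// entry of the legacy v1 map (u16 backend IDs) is copied into the v2 map
// (u32 backend IDs). In the agent these are pinned BPF maps; plain Go maps
// are used here purely for illustration.
func migrateBackends(v1 map[uint16]BackendValue, v2 map[uint32]BackendValue) {
	for id, be := range v1 {
		v2[uint32(id)] = be // widen the backend ID from u16 to u32
	}
	// After the copy, the v1 map is removed and the agent operates on the
	// v2 map only.
}

func main() {
	v1 := map[uint16]BackendValue{9: {Addr: [4]byte{10, 0, 1, 128}, Port: 69}}
	v2 := map[uint32]BackendValue{}
	migrateBackends(v1, v2)
	fmt.Println(v2[9]) // {[10 0 1 128] 69}
}
```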
@Weil0ng (Contributor, Author) commented Sep 20, 2021

Figured out what's going on: the offset multiplier needs to be doubled with the u16 -> u32 change.
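
To make the 655360 from the trace concrete, the sketch below (in Go rather than the BPF C, with a made-up helper) shows what happens when a 32-bit backend ID is read from the Maglev lookup table with the old 2-byte stride: an odd index straddles two u32 entries and yields 0x000a0000 = 655360.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// lutEntry reads the backend ID at position idx from a Maglev lookup table
// serialized as little-endian 32-bit IDs, using the given byte stride. The
// real lookup lives in the BPF datapath; this only reproduces the
// arithmetic behind the bogus backend ID seen in the trace above.
func lutEntry(lut []byte, idx, stride int) uint32 {
	return binary.LittleEndian.Uint32(lut[idx*stride:])
}

func main() {
	// Two consecutive entries from the inner map dump: backend IDs 8 and 10.
	lut := []byte{0x08, 0x00, 0x00, 0x00, 0x0a, 0x00, 0x00, 0x00}

	fmt.Println(lutEntry(lut, 1, 4)) // correct u32 stride -> 10
	fmt.Println(lutEntry(lut, 1, 2)) // stale u16 stride   -> 655360 (0x000a0000)
	fmt.Println(lutEntry(lut, 0, 2)) // even index stays 4-byte aligned -> 8
}
```

This would also explain why only some curls failed: with the stale 2-byte stride, even Maglev indexes still land on a 4-byte boundary and return a valid (if wrongly chosen) backend ID, while odd indexes straddle two entries and produce IDs like 655360 that do not exist in the backend map, hence the "Service backend not found" drops.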

@Weil0ng (Contributor, Author) commented Sep 20, 2021

test-me-please

Job 'Cilium-PR-K8s-GKE' failed and has not been observed before, so may be related to your PR:


Test Name: K8sServicesTest Checks ClusterIP Connectivity Checks service on same node

Failure Output: FAIL: Expected

If it is a flake, comment /mlh new-flake Cilium-PR-K8s-GKE so I can create a new GitHub issue to track it.

Job 'Cilium-PR-K8s-GKE' hit: #17270 (94.72% similarity)

@Weil0ng (Contributor, Author) commented Sep 21, 2021

GKE tests complain that a pod was not ready in time; seems like flakiness similar to #17307.

@Weil0ng (Contributor, Author) commented Sep 21, 2021

test-gke

@Weil0ng (Contributor, Author) commented Sep 21, 2021

test-gke

@Weil0ng (Contributor, Author) commented Sep 21, 2021

The past two GKE test runs report disjoint sets of failures; all seem like flakes unrelated to this PR.

@Weil0ng added the ready-to-merge label ("This PR has passed all tests and received consensus from code owners to merge.") on Sep 23, 2021
@jibi removed the ready-to-merge label on Sep 24, 2021
@jibi (Member) commented Sep 24, 2021

(removed the ready-to-merge label as this still needs a review from agent and CLI teams)

@Weil0ng (Contributor, Author) commented Sep 27, 2021

Still need someone from the CLI team to take a look; wondering if @twpayne can help out here :)

@Weil0ng (Contributor, Author) commented Sep 27, 2021

Got approval from all code owners, and CI is passing too (gke-stable reported disjoint flakes in the past two runs); marking ready-to-merge.

@Weil0ng added the ready-to-merge label on Sep 27, 2021
@jibi merged commit dee982f into cilium:master on Sep 28, 2021