socket-lb is broken on minikube in Docker mode #15769

Closed
1 of 2 tasks
brb opened this issue Apr 19, 2021 · 9 comments · Fixed by #16259
Labels
kind/bug This is a bug in the Cilium logic.

Comments

@brb
Member

brb commented Apr 19, 2021

When running minikube in Docker mode (aka k8s docker-in-docker), ClusterIP services cannot be reached with socket-lb enabled (aka host-reachable services; enabled by default on newer kernels because --kube-proxy-replacement defaults to probe). This is due to the same reason as documented in #14951.

  • In the minikube GSG, disable socket-lb or ask users to choose another minikube mode (release blocker); see the sketch after this list.
  • Fix socket-lb for minikube in the Docker mode (requires the host to run cgroup v2).
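
For the first item, a possible shape of the GSG workaround (a sketch only; the Helm values kubeProxyReplacement and hostServices.enabled are taken from the v1.10-era chart and should be double-checked against the released docs):

# Install Cilium on minikube with socket-lb (host-reachable services) disabled
$ helm install cilium cilium/cilium --version 1.10.0 \
    --namespace kube-system \
    --set kubeProxyReplacement=partial \
    --set hostServices.enabled=false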
@brb brb added the kind/bug (This is a bug in the Cilium logic.) and sig/loadbalancing labels Apr 19, 2021
@aditighag
Member

aditighag commented May 11, 2021

When running minikube in the Docker mode (aka k8s docker-in-docker), ClusterIP services cannot be reached with socket-lb enabled

@brb I wasn't able to reproduce the issue. I followed the steps in the GSG to create a minikube cluster on the dev VM (net-next kernel) and installed Cilium v1.10.0-rc1, except that I set the minikube driver to docker.

How did you discover that socket-lb was broken? Just to confirm: when you say docker-in-docker, you are referring to the setup where the k8s node is deployed as a Docker container, and the cilium pod then runs as a container inside that node, right? I also checked that cgroup v2 is NOT enabled.
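
For reference, one way to confirm which cgroup mode is in use on the host and inside the minikube node container (the container name minikube is the default and is an assumption here):

# cgroup2fs means the unified (v2) hierarchy is mounted at /sys/fs/cgroup; tmpfs indicates v1
$ stat -fc %T /sys/fs/cgroup/
$ docker exec minikube stat -fc %T /sys/fs/cgroup/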

docker seems to be the default driver (even though I didn't pass --driver=docker, minikube set it as the driver).

$ minikube version
minikube version: v1.20.0
commit: c61663e942ec43b20e8e70839dcca52e44cd85ae

$ minikube profile list
|----------|-----------|---------|--------------|------|---------|---------|-------|
| Profile  | VM Driver | Runtime |      IP      | Port | Version | Status  | Nodes |
|----------|-----------|---------|--------------|------|---------|---------|-------|
| minikube | docker    | docker  | 192.168.49.2 | 8443 | v1.20.2 | Running |     1 |

$ k get svc
NAME         TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)   AGE
kubernetes   ClusterIP   10.96.0.1      <none>        443/TCP   51m
nginx        ClusterIP   10.96.69.189   <none>        80/TCP    42m

$ k get ep
NAME         ENDPOINTS           AGE
kubernetes   192.168.49.2:8443   51m
nginx        10.0.0.90:80        42m

$ uname -r
5.12.0-rc4+

$ cilium status --verbose
KubeProxyReplacement Details:
  Status:                Probe
  Socket LB Protocols:   TCP, UDP

Per the monitor traces, service translation has already happened at the socket layer, so we don't see the service ClusterIP.

$ cilium monitor --related-to 301
Listening for events on 6 CPUs with 64x4096 of shared memory
Press Ctrl-C to quit
level=info msg="Initializing dissection cache..." subsys=monitor
<- endpoint 301 flow 0x0 identity 54206->unknown state new ifindex 0 orig-ip 0.0.0.0: 10.0.0.189:54726 -> 10.0.0.171:53 udp
-> endpoint 301 flow 0x0 identity 10740->54206 state reply ifindex lxcdad3c136542c orig-ip 10.0.0.171: 10.0.0.171:53 -> 10.0.0.189:54726 udp
<- endpoint 301 flow 0x0 identity 54206->unknown state new ifindex 0 orig-ip 0.0.0.0: 10.0.0.189:54726 -> 10.0.0.171:53 udp
-> endpoint 301 flow 0x0 identity 10740->54206 state reply ifindex lxcdad3c136542c orig-ip 10.0.0.171: 10.0.0.171:53 -> 10.0.0.189:54726 udp
<- endpoint 301 flow 0x9192118d identity 54206->unknown state new ifindex 0 orig-ip 0.0.0.0: 10.0.0.189:36738 -> 10.0.0.90:80 tcp SYN
-> endpoint 301 flow 0x785f40c0 identity 15303->54206 state reply ifindex lxcdad3c136542c orig-ip 10.0.0.90: 10.0.0.90:80 -> 10.0.0.189:36738 tcp SYN, ACK
<- endpoint 301 flow 0x9192118d identity 54206->unknown state new ifindex 0 orig-ip 0.0.0.0: 10.0.0.189:36738 -> 10.0.0.90:80 tcp ACK
<- endpoint 301 flow 0x9192118d identity 54206->unknown state new ifindex 0 orig-ip 0.0.0.0: 10.0.0.189:36738 -> 10.0.0.90:80 tcp ACK
-> endpoint 301 flow 0x785f40c0 identity 15303->54206 state reply ifindex lxcdad3c136542c orig-ip 10.0.0.90: 10.0.0.90:80 -> 10.0.0.189:36738 tcp ACK
-> endpoint 301 flow 0x785f40c0 identity 15303->54206 state reply ifindex lxcdad3c136542c orig-ip 10.0.0.90: 10.0.0.90:80 -> 10.0.0.189:36738 tcp ACK
-> endpoint 301 flow 0x785f40c0 identity 15303->54206 state reply ifindex lxcdad3c136542c orig-ip 10.0.0.90: 10.0.0.90:80 -> 10.0.0.189:36738 tcp ACK
<- endpoint 301 flow 0x9192118d identity 54206->unknown state new ifindex 0 orig-ip 0.0.0.0: 10.0.0.189:36738 -> 10.0.0.90:80 tcp ACK
<- endpoint 301 flow 0x9192118d identity 54206->unknown state new ifindex 0 orig-ip 0.0.0.0: 10.0.0.189:36738 -> 10.0.0.90:80 tcp ACK
<- endpoint 301 flow 0x9192118d identity 54206->unknown state new ifindex 0 orig-ip 0.0.0.0: 10.0.0.189:36738 -> 10.0.0.90:80 tcp ACK, FIN
-> endpoint 301 flow 0x785f40c0 identity 15303->54206 state reply ifindex lxcdad3c136542c orig-ip 10.0.0.90: 10.0.0.90:80 -> 10.0.0.189:36738 tcp ACK, FIN
<- endpoint 301 flow 0x9192118d identity 54206->unknown state new ifindex 0 orig-ip 0.0.0.0: 10.0.0.189:36738 -> 10.0.0.90:80 tcp ACK
<- endpoint 301 flow 0x0 identity 54206->unknown state new ifindex 0 orig-ip 0.0.0.0: 9a:6f:6a:36:09:94 -> e2:03:4f:3c:d4:ad ARP
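
For anyone reproducing this, the trace above can be generated with something along these lines (the client pod name is hypothetical; 10.96.69.189 is the nginx ClusterIP from the svc output above):

# Curl the nginx ClusterIP from a client pod while cilium monitor runs in another terminal
$ kubectl exec -it client-pod -- curl -sv http://10.96.69.189:80
# When socket-lb works, the monitor shows the backend IP (10.0.0.90) rather than the ClusterIP,
# because the translation happens at connect() time in bpf_sock.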

@rolinh
Member

rolinh commented May 11, 2021

@aditighag I believe the problem arises with kernel 5.10+. Which version do you run?

@aditighag
Member

aditighag commented May 11, 2021

@aditighag I believe the problem arises with kernel 5.10+. Which version do you run?

Yes, it is > 5.10 (mentioned in the previous comment).

$ uname -r
5.12.0-rc4+

@tgraf
Member

tgraf commented May 11, 2021

If we do any work on this before the release, it should be the doc change as mentioned in the top comment:

In the minikube GSG, disable socket-lb or ask users to choose another minikube mode (release blocker).

@aditighag
Member

Looks like the minikube GSG was removed in v1.10.0-rc1.

@brb
Member Author

brb commented May 12, 2021

Did some digging. When running Cilium with host-reachable services enabled on minikube with the Docker driver, bpf_sock is attached to the following cgroup hierarchy:

/sys/fs/cgroup/system.slice/docker-a67774d38f6403cb2103138f7b336c0ed69c53aed9449d1a41f1ddb160fa780b.scope/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podbf5a70ce_4e5f_4f39_b699_9135a484a4ea.slice/docker-d60878a0db148eec593b24ebdd9518985bd0f5e9d1fc6ef19918c75a0e13dc63.scope
    395      device          multi
    848      connect4
    844      connect6
    850      sendmsg4
    846      sendmsg6
    851      recvmsg4
    847      recvmsg6
    849      getpeername4
    845      getpeername6

All other pods are running in the following cgroup v2 hierarchy:

0::/system.slice/docker-a67774d38f6403cb2103138f7b336c0ed69c53aed9449d1a41f1ddb160fa780b.scope/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod0c89a37c_ae75_4efa_a6ab_92b3ddb7ff15.slice/docker-01f584a07421aa0e4a9bd4a245818932876a0e6e4b86c9735da7e6bdb5784b1d.scope

So the bpf_sock-based service translation is not active for those pods.

To fix the issue, we need to find the common root in the hierarchies.
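
For reference, a rough way to compare the two hierarchies on the minikube node (bpftool availability and the placeholder PIDs are assumptions):

# Sub-trees with attached cgroup BPF programs, relative to the cgroup root
$ bpftool cgroup tree /sys/fs/cgroup
# Actual cgroup of the cilium-agent vs. an application pod process
$ cat /proc/<agent-pid>/cgroup
$ cat /proc/<app-pid>/cgroup
# If the pod's cgroup is not nested under the attach point reported by bpftool,
# the connect4/connect6 hooks never see that pod's traffic.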

@brb
Member Author

brb commented May 16, 2021

More digging. After changing the Docker cgroup driver to cgroupfs (which is what we have on the Vagrant dev VMs), it still works on my laptop. Example of cilium-agent cgroups:

> cat /proc/305989/cgroup
0::/docker/7e50a71c256fc473a62382c1e08efec5e046f1fc063d000cc263c4a185d56488/kubelet/kubepods/burstable/pod7cfcf9c8-7afd-447e-81ec-06189684942f/a519ec8e4ed46d409787f1683c136ab0b6d2ea5afa3fd1cf15978690c01371fd
$~/tmp/kind [05:10:04]
> cat /proc/306506/cgroup
0::/docker/9326f9d808c871c1d464b3593b6ff59a02685116b308e54b6a563c08d56ca192/kubelet/kubepods/burstable/podcf46c98d-1933-41bb-b3a1-90b8eee4a1ac/6c08e7e4ddc7454540c7a011fbb92346aaff9ae45e522a86ee49d0b4433803d9

@brb
Member Author

brb commented Jul 2, 2021

@ti-mo Maybe you have a chance to validate the fix?

@ti-mo
Contributor

ti-mo commented Jul 7, 2021

Thanks, will do and report back. 👍
