CI: K8sDatapathConfig Encapsulation Check vxlan connectivity with per-endpoint routes #13774

Closed
nebril opened this issue Oct 27, 2020 · 12 comments · Fixed by #14913
Labels: area/CI (Continuous Integration testing issue or flake), ci/flake (This is a known failure that occurs in the tree. Please investigate me!)

Comments


nebril commented Oct 27, 2020

Every build on https://jenkins.cilium.io/job/cilium-master-K8s-all/ fails with:

/home/jenkins/workspace/cilium-master-K8s-all/1.14-gopath/src/github.com/cilium/cilium/test/ginkgo-ext/scopes.go:514
Kubernetes DNS did not become ready in time
/home/jenkins/workspace/cilium-master-K8s-all/1.14-gopath/src/github.com/cilium/cilium/test/ginkgo-ext/scopes.go:634
nebril added the area/CI and ci/flake labels Oct 27, 2020
nebril self-assigned this Oct 27, 2020

nebril commented Oct 27, 2020

A focused test run also fails in the same way on 1.14, so it's unlikely this is infra/CI related: https://jenkins.cilium.io/job/Cilium-PR-Ginkgo-Tests-Kernel-Focus/100/


pchaigno commented Jan 6, 2021

It looks like this test sometimes also fails in pipelines other than k8s-all. It failed before in #14097 with the same error message.

It also just failed in #14525. That last PR should only affect the IPsec code path, so I'm fairly confident it's a flake.
https://jenkins.cilium.io/job/Cilium-PR-K8s-1.20-kernel-4.9/316/testReport/junit/Suite-k8s-1/20/K8sDatapathConfig_Encapsulation_Check_vxlan_connectivity_with_per_endpoint_routes/
ae88713e_K8sDatapathConfig_Encapsulation_Check_vxlan_connectivity_with_per-endpoint_routes.zip

pchaigno changed the title from "CI: K8sDatapathConfig Encapsulation Check vxlan connectivity with per-endpoint routes fails on k8s-all" to "CI: K8sDatapathConfig Encapsulation Check vxlan connectivity with per-endpoint routes" Jan 11, 2021

jibi commented Feb 1, 2021

I managed to reproduce this locally with:

NFS=1 NETNEXT=1 KUBEPROXY=0 ginkgo -v --focus "K8sDatapathConfig.*Check vxlan connectivity with per-endpoint routes" -- -cilium.provision=false -cilium.holdEnvironment=true -cilium.skipLogs -cilium.runQuarantined

Although I'm seeing a different failure:

17:47:17 STEP: Applying policy /home/vagrant/go/src/github.com/cilium/cilium/test/k8sT/manifests/l3-policy-demo.yaml
17:47:25 STEP: Waiting for 4m0s for 5 pods of deployment demo_ds.yaml to become ready
17:47:25 STEP: WaitforNPods(namespace="202102011747k8sdatapathconfigencapsulationcheckvxlanconnectivit", filter="")
17:47:30 STEP: WaitforNPods(namespace="202102011747k8sdatapathconfigencapsulationcheckvxlanconnectivit", filter="") => <nil>
17:47:30 STEP: Checking pod connectivity between nodes
17:47:30 STEP: WaitforPods(namespace="202102011747k8sdatapathconfigencapsulationcheckvxlanconnectivit", filter="-l zgroup=testDSClient")
17:47:30 STEP: WaitforPods(namespace="202102011747k8sdatapathconfigencapsulationcheckvxlanconnectivit", filter="-l zgroup=testDSClient") => <nil>
17:47:30 STEP: WaitforPods(namespace="202102011747k8sdatapathconfigencapsulationcheckvxlanconnectivit", filter="-l zgroup=testDS")
17:47:30 STEP: WaitforPods(namespace="202102011747k8sdatapathconfigencapsulationcheckvxlanconnectivit", filter="-l zgroup=testDS") => <nil>

---
K8sDatapathConfig Encapsulation Check vxlan connectivity with per-endpoint routes
at /home/jibi/go/src/github.com/cilium/cilium/test/ginkgo-ext/scopes.go:514

[Connectivity test between nodes failed
Expected
    <bool>: false
to be true]


jibi commented Feb 4, 2021

Reproduced the actual failure:

K8sDatapathConfig Encapsulation Check vxlan connectivity with per-endpoint routes
at /home/jibi/go/src/github.com/cilium/cilium-test/test/ginkgo-ext/scopes.go:514

[Kubernetes DNS did not become ready in time]

The deployment of coredns seems to be failing due to:

15:16:24 STEP: Kubernetes DNS is not ready yet: unable to resolve service name kubernetes.default.svc.cluster.local with DNS server 10.96.0.10 by running 'dig +short kubernetes.default.svc.cluster.local @10.96.0.10' Cilium pod: Exitcode: 9
Err: Process exited with status 9
Stdout:
 	 ;; connection timed out; no servers could be reached
	
Stderr:
 	 command terminated with exit code 9

although coredns looks healthy:

vagrant@k8s1:~$ ks get pods -l k8s-app=kube-dns
NAME                      READY   STATUS    RESTARTS   AGE
coredns-cc45bff6b-7zqsw   1/1     Running   0          18m
vagrant@k8s1:~$ ks describe pod -l k8s-app=kube-dns
Name:               coredns-cc45bff6b-7zqsw
Namespace:          kube-system
Priority:           2000000000
PriorityClassName:  system-cluster-critical
Node:               k8s2/192.168.36.12
Start Time:         Thu, 04 Feb 2021 14:12:32 +0000
Labels:             k8s-app=kube-dns
                    pod-template-hash=cc45bff6b
Annotations:        seccomp.security.alpha.kubernetes.io/pod: docker/default
Status:             Running
IP:                 10.0.1.244
Controlled By:      ReplicaSet/coredns-cc45bff6b
Containers:
  coredns:
    Container ID:  docker://edc002608d440ee4f771b653f4aea55be12e55f989b0a307253d46583bc691bd
    Image:         k8s.gcr.io/coredns:1.3.1
    Image ID:      docker-pullable://k8s.gcr.io/coredns@sha256:02382353821b12c21b062c59184e227e001079bb13ebd01f9d3270ba0fcbf1e4
    Ports:         53/UDP, 53/TCP, 9153/TCP
    Host Ports:    0/UDP, 0/TCP, 0/TCP
    Args:
      -conf
      /etc/coredns/Corefile
    State:          Running
      Started:      Thu, 04 Feb 2021 14:12:49 +0000
    Ready:          True
    Restart Count:  0
    Limits:
      memory:  170Mi
    Requests:
      cpu:        100m
      memory:     70Mi
    Liveness:     http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
    Readiness:    http-get http://:8080/health delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:  <none>
    Mounts:
      /etc/coredns from config-volume (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from coredns-token-r4mkp (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      coredns
    Optional:  false
  coredns-token-r4mkp:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  coredns-token-r4mkp
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     CriticalAddonsOnly
                 node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  20m   default-scheduler  Successfully assigned kube-system/coredns-cc45bff6b-7zqsw to k8s2
  Normal  Pulling    19m   kubelet, k8s2      Pulling image "k8s.gcr.io/coredns:1.3.1"
  Normal  Pulled     19m   kubelet, k8s2      Successfully pulled image "k8s.gcr.io/coredns:1.3.1"
  Normal  Created    19m   kubelet, k8s2      Created container coredns
  Normal  Started    19m   kubelet, k8s2      Started container coredns

The IP of the resolver is correct:

vagrant@k8s1:~$ ks get svc kube-dns
NAME       TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)                  AGE
kube-dns   ClusterIP   10.96.0.10   <none>        53/UDP,53/TCP,9153/TCP   38m

But the dig command is failing:

vagrant@k8s1:~$ dig +short test.local @10.96.0.10
;; connection timed out; no servers could be reached
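
The same readiness check can be reproduced without dig. A minimal, hypothetical sketch using Go's stdlib resolver pointed at the kube-dns ClusterIP shown above (10.96.0.10); on the affected node it times out the same way:

// Hedged sketch: resolve the service name through the kube-dns ClusterIP,
// roughly what the test's readiness check does with dig.
package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

func main() {
	r := &net.Resolver{
		PreferGo: true,
		Dial: func(ctx context.Context, network, address string) (net.Conn, error) {
			d := net.Dialer{Timeout: 5 * time.Second}
			// Ignore the address from resolv.conf and talk to kube-dns directly.
			return d.DialContext(ctx, "udp", "10.96.0.10:53")
		},
	}

	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	addrs, err := r.LookupHost(ctx, "kubernetes.default.svc.cluster.local")
	if err != nil {
		// On the broken node this times out, matching the dig failure above.
		fmt.Println("lookup failed:", err)
		return
	}
	fmt.Println("resolved:", addrs)
}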

Looking at the coredns logs, we can see that it is receiving the requests:

vagrant@k8s1:~$ ks logs -l k8s-app=kube-dns | grep test
2021-02-04T14:36:40.884Z [INFO] 10.0.2.15:39865 - 19766 "A IN test.local. udp 51 false 4096" NXDOMAIN qr,rd,ra,ad 114 0.023345241s
2021-02-04T14:36:45.881Z [INFO] 10.0.2.15:39865 - 19766 "A IN test.local. udp 51 false 4096" NXDOMAIN qr,rd,ra,ad 114 0.020891984s
2021-02-04T14:36:50.886Z [INFO] 10.0.2.15:39865 - 19766 "A IN test.local. udp 51 false 4096" NXDOMAIN qr,rd,ra,ad 114 0.02507274s

So the responses are getting dropped for some reason.

Nothing interesting from tcpdump running on the hostns:

vagrant@k8s1:~$ sudo tcpdump -i any -n udp and port 53
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked), capture size 262144 bytes
14:44:07.769082 IP 10.0.2.15.55554 > 10.0.1.244.53: 49592+ [1au] A? test.local. (51)
14:44:12.771207 IP 10.0.2.15.55554 > 10.0.1.244.53: 49592+ [1au] A? test.local. (51)
14:44:17.771222 IP 10.0.2.15.55554 > 10.0.1.244.53: 49592+ [1au] A? test.local. (51)
^C
3 packets captured
3 packets received by filter
0 packets dropped by kernel

cilium monitor:

root@k8s1:/home/cilium# cilium monitor | grep :53
level=info msg="Initializing dissection cache..." subsys=monitor
-> overlay flow 0x0 identity remote-node->unknown state new ifindex cilium_vxlan orig-ip 0.0.0.0: 10.0.2.15:51549 -> 10.0.1.244:53 udp
-> overlay flow 0x0 identity remote-node->unknown state new ifindex cilium_vxlan orig-ip 0.0.0.0: 10.0.2.15:51549 -> 10.0.1.244:53 udp
-> overlay flow 0x0 identity remote-node->unknown state new ifindex cilium_vxlan orig-ip 0.0.0.0: 10.0.2.15:51549 -> 10.0.1.244:53 udp


jibi commented Feb 4, 2021

Running the same dig command on the other node (k8s2) works:

vagrant@k8s2:~$ dig +short kubernetes.default.svc.cluster.local @10.96.0.10
10.96.0.1

And looking at tcpdump on k8s2 (where coredns is running) I can also see the response when I run dig from the k8s1 node:

vagrant@k8s2:~$ sudo tcpdump -i any -n udp and port 53
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked), capture size 262144 bytes
15:15:50.480991 IP 10.0.2.15.50097 > 10.0.1.244.53: 20781+ [1au] A? google.com. (51)
15:15:50.481074 IP 10.0.2.15.50097 > 10.0.1.244.53: 20781+ [1au] A? google.com. (51)
15:15:50.481356 IP 10.0.1.244.33091 > 8.8.8.8.53: 20781+ [1au] A? google.com. (51)
15:15:50.481412 IP 10.0.2.15.33091 > 8.8.8.8.53: 20781+ [1au] A? google.com. (51)
15:15:50.506223 IP 8.8.8.8.53 > 10.0.2.15.33091: 20781 1/0/1 A 216.58.198.14 (55)
15:15:50.506330 IP 8.8.8.8.53 > 10.0.1.244.33091: 20781 1/0/1 A 216.58.198.14 (55)
15:15:50.506510 IP 10.0.1.244.53 > 10.0.2.15.50097: 20781 1/0/1 A 216.58.198.14 (65) <--

So the response is somehow getting lost while being tunneled from k8s2 to k8s1.

Edit: the response we are seeing above is from the lxc device. If we dump the traffic on the cilium_vxlan device, we only see the request:

vagrant@k8s2:~$ sudo tcpdump -i cilium_vxlan -n udp and port 53
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on cilium_vxlan, link-type EN10MB (Ethernet), capture size 262144 bytes
15:33:10.668417 IP 10.0.2.15.53680 > 10.0.1.244.53: 15943+ [1au] A? kubernetes.default.svc.cluster.local. (77)
^C
1 packet captured


jibi commented Feb 4, 2021

Restarted Cilium with monitor-aggregation: none and reran cilium monitor:

k8s1:

vagrant@k8s1:~$ ks exec $(cilium_pod k8s1) -it cilium monitor | grep :53
<- host flow 0x0 identity host->unknown state new ifindex 0 orig-ip 0.0.0.0: 10.0.2.15:34111 -> 10.0.1.244:53 udp
-> overlay flow 0x0 identity remote-node->unknown state new ifindex cilium_vxlan orig-ip 0.0.0.0: 10.0.2.15:34111 -> 10.0.1.244:53 udp

and on k8s2:

vagrant@k8s1:~$ ks exec $(cilium_pod k8s2) -it cilium monitor | grep :53
<- overlay flow 0x0 identity unknown->unknown state new ifindex cilium_vxlan orig-ip 0.0.0.0: 10.0.2.15:52390 -> 10.0.1.244:53 udp
-> endpoint 359 flow 0x0 identity remote-node->593 state new ifindex lxce71012cea7d0 orig-ip 10.0.2.15: 10.0.2.15:52390 -> 10.0.1.244:53 udp
<- stack flow 0x0 identity world->unknown state new ifindex cilium_vxlan orig-ip 0.0.0.0: 10.0.2.15:52390 -> 10.0.1.244:53 udp
-> endpoint 359 flow 0x0 identity world->593 state established ifindex 0 orig-ip 10.0.2.15: 10.0.2.15:52390 -> 10.0.1.244:53 udp
<- endpoint 359 flow 0x0 identity 593->unknown state new ifindex 0 orig-ip 0.0.0.0: 10.0.1.244:53 -> 10.0.2.15:52390 udp
-> stack flow 0x0 identity 593->host state reply ifindex 0 orig-ip 0.0.0.0: 10.0.1.244:53 -> 10.0.2.15:52390 udp


jibi commented Feb 4, 2021

Underlying problem: both nodes have the same IP for the enp0s3 interface:

vagrant@k8s1:~$ ip a s dev enp0s3 | grep inet
    inet 10.0.2.15/24 brd 10.0.2.255 scope global dynamic enp0s3
    inet6 fe80::a00:27ff:fe4e:92d0/64 scope link
vagrant@k8s2:~$ ip a s dev enp0s3 | grep inet
    inet 10.0.2.15/24 brd 10.0.2.255 scope global dynamic enp0s3
    inet6 fe80::a00:27ff:fe4e:92d0/64 scope link

So k8s2 is blackholing the traffic destined for k8s1.

Possible explanation for the flakiness: the test fails only when coredns is scheduled on k8s2.
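
To make the blackholing mechanism concrete: because 10.0.2.15 is also a local address on k8s2, the reply from coredns is delivered to k8s2's own stack instead of being tunneled back. A minimal, hypothetical Go sketch of that check (not part of the test suite):

// Hedged sketch: reports whether the reply's destination IP is assigned to a
// local interface. If it is (as with 10.0.2.15 on every Vagrant VM), the
// kernel delivers the reply locally and it never reaches the other node.
package main

import (
	"fmt"
	"net"
)

func isLocalAddress(ip net.IP) bool {
	addrs, err := net.InterfaceAddrs()
	if err != nil {
		return false
	}
	for _, a := range addrs {
		if ipNet, ok := a.(*net.IPNet); ok && ipNet.IP.Equal(ip) {
			return true
		}
	}
	return false
}

func main() {
	replyDst := net.ParseIP("10.0.2.15") // source IP the unmasqueraded query arrived with
	if isLocalAddress(replyDst) {
		fmt.Println(replyDst, "is local to this node: the reply is consumed here instead of being sent back")
	} else {
		fmt.Println(replyDst, "is not local: the reply would be routed off-node")
	}
}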


jibi commented Feb 5, 2021

Setting enable-endpoint-routes to false stops k8s2 from blackholing the response traffic for k8s1:

vagrant@k8s1:~$ dig +short kubernetes.default.svc.cluster.local @10.96.0.10
10.96.0.1


pchaigno commented Feb 8, 2021

I validated that the following diff fixes the flake locally:

$ git diff
diff --git a/pkg/datapath/iptables/iptables.go b/pkg/datapath/iptables/iptables.go
index 787a527b3..10a68b361 100644
--- a/pkg/datapath/iptables/iptables.go
+++ b/pkg/datapath/iptables/iptables.go
@@ -980,7 +980,7 @@ func (m *IptablesManager) installMasqueradeRules(prog, ifName, localDeliveryInte
                m.waitArgs,
                "-t", "nat",
                "-A", ciliumPostNatChain,
-               "!", "-o", localDeliveryInterface,
+               "!", "-o", ifName,
                "-m", "comment", "--comment", "exclude non-"+ifName+" traffic from masquerade",
                "-j", "RETURN"), false); err != nil {
                return err

Matching this iptables rule causes packets to bypass masquerading and leave the node with enp0s3's IP as the source. The change in the rule's interface matcher (from ! -o cilium_host to ! -o lxc+ when per-endpoint routes are enabled) was introduced by commit c496e25.
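
For illustration, the shape of that early-exit rule can be sketched in Go. This is a hedged approximation, not the actual installMasqueradeRules code: it only shows how the output-interface matcher and the comment are assembled, and how the matcher flips from cilium_host to lxc+ once per-endpoint routes change the local delivery interface.

// Illustrative sketch only, not the real cilium code: builds the arguments of
// the CILIUM_POST_nat early-exit rule for the two configurations discussed
// above. The comment is derived from ifName while the matcher uses
// localDeliveryInterface, which is why the installed rule keeps saying
// "exclude non-cilium_host ..." even when it matches lxc+.
package main

import (
	"fmt"
	"strings"
)

func returnRuleArgs(ifName, localDeliveryInterface string) []string {
	return []string{
		"-t", "nat",
		"-A", "CILIUM_POST_nat",
		"!", "-o", localDeliveryInterface,
		"-m", "comment", "--comment",
		"exclude non-" + ifName + " traffic from masquerade",
		"-j", "RETURN",
	}
}

func main() {
	// Without per-endpoint routes: early exit only for traffic not leaving
	// via cilium_host, so the host->cluster SNAT rules still apply.
	fmt.Println("iptables " + strings.Join(returnRuleArgs("cilium_host", "cilium_host"), " "))
	// With per-endpoint routes: early exit for everything not leaving via
	// lxc+, which lets host traffic routed to cilium_host skip the SNAT rules.
	fmt.Println("iptables " + strings.Join(returnRuleArgs("cilium_host", "lxc+"), " "))
}

With the lxc+ matcher in place, a host-sourced packet that leaves via cilium_host hits the RETURN rule and never reaches the SNAT rules further down the chain, which matches the unmasqueraded traffic observed above.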

pchaigno assigned jibi and pchaigno and unassigned nebril Feb 9, 2021
pchaigno added a commit that referenced this issue Feb 9, 2021
--- Analysis ---

In tunneling mode, our CILIUM_POST_nat chain is currently as follows.

1. -A CILIUM_POST_nat -s 10.0.1.0/24 ! -d 10.0.0.0/8 ! -o cilium_+ -m comment --comment "cilium masquerade non-cluster" -j MASQUERADE
2. -A CILIUM_POST_nat ! -o cilium_host -m comment --comment "exclude non-cilium_host traffic from masquerade" -j RETURN
3. -A CILIUM_POST_nat -m mark --mark 0xa00/0xe00 -m comment --comment "exclude proxy return traffic from masquarade" -j ACCEPT
4. -A CILIUM_POST_nat ! -s 10.0.1.6/32 ! -d 10.0.1.0/24 -o cilium_host -m comment --comment "cilium host->cluster masquerade" -j SNAT --to-source 10.0.1.6
5. -A CILIUM_POST_nat -s 127.0.0.1/32 -o cilium_host -m comment --comment "cilium host->cluster from 127.0.0.1 masquerade" -j SNAT --to-source 10.0.1.6

The second rule implements an early exit from the chain, as none of the
subsequent rules match on output interfaces other than cilium_host.

Once per-endpoint routes are enabled in addition to tunneling, the chain
changes. The second and fifth rules now match on lxc+ as the output
interface:

1. -A CILIUM_POST_nat -s 10.0.1.0/24 ! -d 10.0.0.0/8 ! -o cilium_+ -m comment --comment "cilium masquerade non-cluster" -j MASQUERADE
2. -A CILIUM_POST_nat ! -o lxc+ -m comment --comment "exclude non-cilium_host traffic from masquerade" -j RETURN
3. -A CILIUM_POST_nat -m mark --mark 0xa00/0xe00 -m comment --comment "exclude proxy return traffic from masquarade" -j ACCEPT
4. -A CILIUM_POST_nat ! -s 10.0.1.6/32 ! -d 10.0.1.0/24 -o cilium_host -m comment --comment "cilium host->cluster masquerade" -j SNAT --to-source 10.0.1.6
5. -A CILIUM_POST_nat -s 127.0.0.1/32 -o lxc+ -m comment --comment "cilium host->cluster from 127.0.0.1 masquerade" -j SNAT --to-source 10.0.1.6

Commit c496e25 ("eni: Support masquerading") implemented that change,
based on the fact that with per-endpoint routes, packets are routed
directly to lxc devices without going through cilium_host.

Nevertheless, the fourth rule still matches on cilium_host and therefore
becomes a no-op. At the time c496e25 was implemented, this change was
correct because the fourth rule is only present when tunneling is
enabled and per-endpoint routes were not compatible with tunneling.
Commit 3179a47 ("datapath: Support enable-endpoint-routes with
encapsulation") however made those options compatible and the above
chain possible.

--- Fix ---

Ideally, we would update the second rule when running with tunneling and
per-endpoint routes, to be '! -o lxc+ ! -o cilium_host'. Iptables
however doesn't support multiple output interface matchers. This commit
implements a different fix and drops the second rule. Since subsequent
SNATing rules already match on an output interface, the second rule is
unnecessary. With tunneling and per-endpoint routes, the table now looks
like:

1. -A CILIUM_POST_nat -s 10.0.1.0/24 ! -d 10.0.0.0/8 ! -o cilium_+ -m comment --comment "cilium masquerade non-cluster" -j MASQUERADE
2. -A CILIUM_POST_nat -m mark --mark 0xa00/0xe00 -m comment --comment "exclude proxy return traffic from masquerade" -j ACCEPT
3. -A CILIUM_POST_nat ! -s 10.0.1.6/32 ! -d 10.0.1.0/24 -o cilium_host -m comment --comment "cilium host->cluster masquerade" -j SNAT --to-source 10.0.1.6
4. -A CILIUM_POST_nat -s 127.0.0.1/32 -o lxc+ -m comment --comment "cilium host->cluster from 127.0.0.1 masquerade" -j SNAT --to-source 10.0.1.6

--- Bug Impact ---

This lack of masquerading can cause issues for example when trying to
connect to a VIP with a remote backend from the hostns in our test VMs:

1. DNS request is made to VIP 10.96.0.10.
2. 10.0.2.15, the IP of enp0s3 (default route), is assigned as source IP.
3. kube-proxy translates VIP to backend IP on different node, e.g.
   10.0.0.87.
4. Packet is sent to cilium_host as per the ip routes.
5. The packet is not masqueraded because it matches rule 2 in the bogus
   iptables chain (i.e., cilium_host != lxc+).
6. The packet arrives as 10.0.2.15 -> 10.0.0.87 on the second node.
7. Second node tries to answer to 10.0.2.15 unsuccessfully (all nodes
   have the same IP 10.0.2.15 for enp0s3; that IP isn't routable across
   nodes).

This bug is described in #13774.

Fixes: #13774
Co-authored-by: Gilberto Bertin <gilberto@isovalent.com>
Signed-off-by: Paul Chaignon <paul@cilium.io>
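
A quick way to check whether a node still carries the problematic early-exit rule described above is to dump the nat table and look for the lxc+ RETURN rule. A hedged diagnostic sketch (requires root and assumes iptables-save is on the PATH):

// Diagnostic sketch only: flags the CILIUM_POST_nat early-exit rule that,
// with per-endpoint routes and tunneling enabled, lets host traffic leave
// unmasqueraded.
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"os/exec"
	"strings"
)

func main() {
	out, err := exec.Command("iptables-save", "-t", "nat").Output()
	if err != nil {
		fmt.Println("failed to dump nat table:", err)
		return
	}
	scanner := bufio.NewScanner(bytes.NewReader(out))
	for scanner.Scan() {
		line := scanner.Text()
		if strings.HasPrefix(line, "-A CILIUM_POST_nat") &&
			strings.Contains(line, "! -o lxc+") &&
			strings.Contains(line, "-j RETURN") {
			fmt.Println("found early-exit rule that bypasses masquerading:")
			fmt.Println("  " + line)
		}
	}
}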

pchaigno added a commit that referenced this issue Feb 9, 2021 (with the same commit message as above)
pchaigno added a commit that referenced this issue Feb 9, 2021 (with the same commit message as above)
nathanjsweet pushed a commit that referenced this issue Feb 10, 2021 (with the same commit message as above)
lyveng pushed a commit to lyveng/cilium that referenced this issue Mar 4, 2021 (with the same commit message as above)