
CI: K8sDatapathConfig Transparent encryption DirectRouting Check connectivity with transparent encryption and direct routing #14959

Closed
pchaigno opened this issue Feb 12, 2021 · 7 comments
Labels: area/CI, area/encryption, ci/flake, sig/datapath, stale
Comments

@pchaigno (Member)

https://jenkins.cilium.io/job/Cilium-PR-K8s-1.14-kernel-4.9/31/testReport/junit/Suite-k8s-1/14/K8sDatapathConfig_Transparent_encryption_DirectRouting_Check_connectivity_with_transparent_encryption_and_direct_routing_with_bpf_host/
042b94d6_K8sDatapathConfig_Transparent_encryption_DirectRouting_Check_connectivity_with_transparent_encryption_and_direct_routing_with_bpf_host.zip

Stacktrace

/home/jenkins/workspace/Cilium-PR-K8s-1.14-kernel-4.9/src/github.com/cilium/cilium/test/ginkgo-ext/scopes.go:514
Connectivity test between nodes failed
Expected
    <bool>: false
to be true
/home/jenkins/workspace/Cilium-PR-K8s-1.14-kernel-4.9/src/github.com/cilium/cilium/test/k8sT/DatapathConfiguration.go:584

Standard Output

Number of "context deadline exceeded" in logs: 0
Number of "level=error" in logs: 0
Number of "level=warning" in logs: 0
Number of "Cilium API handler panicked" in logs: 0
Number of "Goroutine took lock for more than" in logs: 0
No errors/warnings found in logs
Number of "context deadline exceeded" in logs: 0
Number of "level=error" in logs: 0
Number of "level=warning" in logs: 0
Number of "Cilium API handler panicked" in logs: 0
Number of "Goroutine took lock for more than" in logs: 0
No errors/warnings found in logs
Number of "context deadline exceeded" in logs: 4
Number of "level=error" in logs: 0
⚠️  Number of "level=warning" in logs: 11
Number of "Cilium API handler panicked" in logs: 0
⚠️  Number of "Goroutine took lock for more than" in logs: 13
Top 5 errors/warnings:
BPF host reachable services for UDP needs kernel 4.19.57, 5.1.16, 5.2.0 or newer. If you run an older kernel and only need TCP, then specify: --host-reachable-services-protos=tcp and --kube-proxy-replacement=partial Disabling the feature.
SessionAffinity feature requires BPF LRU maps. Disabling the feature.
BPF masquerade requires NodePort (--enable-node-port=\
IPSec cannot be used with BPF NodePort. Disabling BPF NodePort feature.
BPF host reachable services for TCP needs kernel 4.17.0 or newer. Disabling the feature.
Cilium pods: [cilium-677gb cilium-hp6qc]
Netpols loaded: 
CiliumNetworkPolicies loaded: 202102121032k8sdatapathconfigtransparentencryptiondirectrouting::l3-policy-demo 
Endpoint Policy Enforcement:
Pod                          Ingress   Egress
testds-jjtt7                           
coredns-cc45bff6b-nt5vc                
test-k8s2-664d69b864-ht7j8             
testclient-hsdcs                       
testclient-zkbgg                       
testds-5s8jl                           
Cilium agent 'cilium-677gb': Status: Ok  Health: Ok Nodes "" ContinerRuntime:  Kubernetes: Ok KVstore: Ok Controllers: Total 36 Failed 0
Cilium agent 'cilium-hp6qc': Status: Ok  Health: Ok Nodes "" ContinerRuntime:  Kubernetes: Ok KVstore: Ok Controllers: Total 27 Failed 0

Standard Error

10:30:35 STEP: Deploying ipsec_secret.yaml in namespace kube-system
10:30:36 STEP: Installing Cilium
10:30:36 STEP: Waiting for Cilium to become ready
10:30:36 STEP: Cilium DaemonSet not ready yet: only 0 of 2 desired pods are ready
10:30:37 STEP: Cilium DaemonSet not ready yet: only 0 of 2 desired pods are ready
10:30:38 STEP: Cilium DaemonSet not ready yet: only 0 of 2 desired pods are ready
10:30:39 STEP: Cilium DaemonSet not ready yet: only 0 of 2 desired pods are ready
10:30:40 STEP: Cilium DaemonSet not ready yet: only 0 of 2 desired pods are ready
10:30:41 STEP: Cilium DaemonSet not ready yet: only 0 of 2 desired pods are ready
10:30:42 STEP: Cilium DaemonSet not ready yet: only 0 of 2 desired pods are ready
10:30:44 STEP: Cilium DaemonSet not ready yet: only 0 of 2 desired pods are ready
10:30:45 STEP: Cilium DaemonSet not ready yet: only 0 of 2 desired pods are ready
10:30:46 STEP: Cilium DaemonSet not ready yet: only 0 of 2 desired pods are ready
10:30:47 STEP: Cilium DaemonSet not ready yet: only 0 of 2 desired pods are ready
10:30:48 STEP: Cilium DaemonSet not ready yet: only 0 of 2 desired pods are ready
10:30:49 STEP: Cilium DaemonSet not ready yet: only 0 of 2 desired pods are ready
10:30:50 STEP: Cilium DaemonSet not ready yet: only 0 of 2 desired pods are ready
10:30:51 STEP: Cilium DaemonSet not ready yet: only 0 of 2 desired pods are ready
10:30:52 STEP: Cilium DaemonSet not ready yet: only 0 of 2 desired pods are ready
10:30:53 STEP: Cilium DaemonSet not ready yet: only 0 of 2 desired pods are ready
10:30:54 STEP: Cilium DaemonSet not ready yet: only 0 of 2 desired pods are ready
10:30:55 STEP: Cilium DaemonSet not ready yet: only 0 of 2 desired pods are ready
10:30:56 STEP: Cilium DaemonSet not ready yet: only 0 of 2 desired pods are ready
10:30:57 STEP: Cilium DaemonSet not ready yet: only 0 of 2 desired pods are ready
10:30:58 STEP: Cilium DaemonSet not ready yet: only 0 of 2 desired pods are ready
10:30:59 STEP: Cilium DaemonSet not ready yet: only 0 of 2 desired pods are ready
10:31:00 STEP: Cilium DaemonSet not ready yet: only 0 of 2 desired pods are ready
10:31:01 STEP: Cilium DaemonSet not ready yet: only 0 of 2 desired pods are ready
10:31:02 STEP: Cilium DaemonSet not ready yet: only 0 of 2 desired pods are ready
10:31:03 STEP: Cilium DaemonSet not ready yet: only 0 of 2 desired pods are ready
10:31:05 STEP: Cilium DaemonSet not ready yet: only 0 of 2 desired pods are ready
10:31:06 STEP: Cilium DaemonSet not ready yet: only 1 of 2 desired pods are ready
10:31:07 STEP: Cilium DaemonSet not ready yet: only 1 of 2 desired pods are ready
10:31:08 STEP: Cilium DaemonSet not ready yet: only 1 of 2 desired pods are ready
10:31:09 STEP: Cilium DaemonSet not ready yet: only 1 of 2 desired pods are ready
10:31:10 STEP: Cilium DaemonSet not ready yet: only 1 of 2 desired pods are ready
10:31:11 STEP: Cilium DaemonSet not ready yet: only 1 of 2 desired pods are ready
10:31:12 STEP: Cilium DaemonSet not ready yet: only 1 of 2 desired pods are ready
10:31:13 STEP: Cilium DaemonSet not ready yet: only 1 of 2 desired pods are ready
10:31:14 STEP: Cilium DaemonSet not ready yet: only 1 of 2 desired pods are ready
10:31:15 STEP: Cilium DaemonSet not ready yet: only 1 of 2 desired pods are ready
10:31:16 STEP: Cilium DaemonSet not ready yet: only 1 of 2 desired pods are ready
10:31:17 STEP: Cilium DaemonSet not ready yet: only 1 of 2 desired pods are ready
10:31:18 STEP: Cilium DaemonSet not ready yet: only 1 of 2 desired pods are ready
10:31:19 STEP: Cilium DaemonSet not ready yet: only 1 of 2 desired pods are ready
10:31:20 STEP: Number of ready Cilium pods: 2
10:31:20 STEP: Validating if Kubernetes DNS is deployed
10:31:20 STEP: Checking if deployment is ready
10:31:20 STEP: Checking if kube-dns service is plumbed correctly
10:31:20 STEP: Checking if DNS can resolve
10:31:20 STEP: Checking if pods have identity
10:31:21 STEP: Kubernetes DNS is up and operational
10:31:21 STEP: Validating Cilium Installation
10:31:21 STEP: Performing Cilium controllers preflight check
10:31:21 STEP: Performing Cilium status preflight check
10:31:21 STEP: Performing Cilium health check
10:31:22 STEP: Performing Cilium service preflight check
10:31:22 STEP: Performing K8s service preflight check
10:31:23 STEP: Cilium is not ready yet: connectivity health is failing: Cluster connectivity is unhealthy on 'cilium-hp6qc': Exitcode: 255 
Err: exit status 255
Stdout:
 	 
Stderr:
 	 Error: Cannot get status/probe: Put "http://%2Fvar%2Frun%2Fcilium%2Fhealth.sock/v1beta/status/probe": dial unix /var/run/cilium/health.sock: connect: no such file or directory
	 
	 command terminated with exit code 255
	 

10:31:26 STEP: Performing Cilium controllers preflight check
10:31:26 STEP: Performing Cilium health check
10:31:26 STEP: Performing Cilium status preflight check
10:31:27 STEP: Performing Cilium service preflight check
10:31:27 STEP: Performing K8s service preflight check
10:31:31 STEP: Performing Cilium status preflight check
10:31:31 STEP: Performing Cilium controllers preflight check
10:31:31 STEP: Performing Cilium health check
10:31:32 STEP: Performing Cilium service preflight check
10:31:32 STEP: Performing K8s service preflight check
10:31:36 STEP: Performing Cilium controllers preflight check
10:31:36 STEP: Performing Cilium status preflight check
10:31:36 STEP: Performing Cilium health check
10:31:37 STEP: Performing Cilium service preflight check
10:31:37 STEP: Performing K8s service preflight check
10:31:41 STEP: Performing Cilium controllers preflight check
10:31:41 STEP: Performing Cilium status preflight check
10:31:41 STEP: Performing Cilium health check
10:31:42 STEP: Performing Cilium service preflight check
10:31:42 STEP: Performing K8s service preflight check
10:31:46 STEP: Performing Cilium controllers preflight check
10:31:46 STEP: Performing Cilium status preflight check
10:31:46 STEP: Performing Cilium health check
10:31:47 STEP: Performing Cilium service preflight check
10:31:47 STEP: Performing K8s service preflight check
10:31:51 STEP: Performing Cilium controllers preflight check
10:31:51 STEP: Performing Cilium health check
10:31:51 STEP: Performing Cilium status preflight check
10:31:52 STEP: Performing Cilium service preflight check
10:31:52 STEP: Performing K8s service preflight check
10:31:53 STEP: Cilium is not ready yet: connectivity health is failing: Cluster connectivity is unhealthy on 'cilium-hp6qc': Exitcode: 255 
Err: exit status 255
Stdout:
 	 
Stderr:
 	 Error: Cannot get status/probe: Put "http://%2Fvar%2Frun%2Fcilium%2Fhealth.sock/v1beta/status/probe": dial unix /var/run/cilium/health.sock: connect: no such file or directory
	 
	 command terminated with exit code 255
	 

10:31:56 STEP: Performing Cilium controllers preflight check
10:31:56 STEP: Performing Cilium status preflight check
10:31:56 STEP: Performing Cilium health check
10:31:57 STEP: Performing Cilium service preflight check
10:31:57 STEP: Performing K8s service preflight check
10:31:59 STEP: Waiting for cilium-operator to be ready
10:31:59 STEP: WaitforPods(namespace="kube-system", filter="-l name=cilium-operator")
10:31:59 STEP: WaitforPods(namespace="kube-system", filter="-l name=cilium-operator") => <nil>
10:31:59 STEP: Making sure all endpoints are in ready state
10:32:00 STEP: Creating namespace 202102121032k8sdatapathconfigtransparentencryptiondirectrouting
10:32:00 STEP: Deploying demo_ds.yaml in namespace 202102121032k8sdatapathconfigtransparentencryptiondirectrouting
10:32:01 STEP: Applying policy /home/jenkins/workspace/Cilium-PR-K8s-1.14-kernel-4.9/src/github.com/cilium/cilium/test/k8sT/manifests/l3-policy-demo.yaml
10:32:07 STEP: Waiting for 4m0s for 5 pods of deployment demo_ds.yaml to become ready
10:32:07 STEP: WaitforNPods(namespace="202102121032k8sdatapathconfigtransparentencryptiondirectrouting", filter="")
10:32:07 STEP: WaitforNPods(namespace="202102121032k8sdatapathconfigtransparentencryptiondirectrouting", filter="") => <nil>
10:32:07 STEP: Checking pod connectivity between nodes
10:32:07 STEP: WaitforPods(namespace="202102121032k8sdatapathconfigtransparentencryptiondirectrouting", filter="-l zgroup=testDSClient")
10:32:07 STEP: WaitforPods(namespace="202102121032k8sdatapathconfigtransparentencryptiondirectrouting", filter="-l zgroup=testDSClient") => <nil>
10:32:07 STEP: WaitforPods(namespace="202102121032k8sdatapathconfigtransparentencryptiondirectrouting", filter="-l zgroup=testDS")
10:32:08 STEP: WaitforPods(namespace="202102121032k8sdatapathconfigtransparentencryptiondirectrouting", filter="-l zgroup=testDS") => <nil>
FAIL: Connectivity test between nodes failed
Expected
    <bool>: false
to be true
=== Test Finished at 2021-02-12T10:32:17Z====
10:32:17 STEP: Running JustAfterEach block for EntireTestsuite K8sDatapathConfig
===================== TEST FAILED =====================
10:32:17 STEP: Running AfterFailed block for EntireTestsuite K8sDatapathConfig
cmd: kubectl get pods -o wide --all-namespaces
Exitcode: 0 
Stdout:
 	 NAMESPACE                                                         NAME                              READY   STATUS    RESTARTS   AGE    IP              NODE   NOMINATED NODE   READINESS GATES
	 202102121032k8sdatapathconfigtransparentencryptiondirectrouting   test-k8s2-664d69b864-ht7j8        2/2     Running   0          20s    10.0.0.170      k8s2   <none>           <none>
	 202102121032k8sdatapathconfigtransparentencryptiondirectrouting   testclient-hsdcs                  1/1     Running   0          20s    10.0.1.46       k8s1   <none>           <none>
	 202102121032k8sdatapathconfigtransparentencryptiondirectrouting   testclient-zkbgg                  1/1     Running   0          20s    10.0.0.107      k8s2   <none>           <none>
	 202102121032k8sdatapathconfigtransparentencryptiondirectrouting   testds-5s8jl                      2/2     Running   0          20s    10.0.1.220      k8s1   <none>           <none>
	 202102121032k8sdatapathconfigtransparentencryptiondirectrouting   testds-jjtt7                      2/2     Running   0          20s    10.0.0.55       k8s2   <none>           <none>
	 cilium-monitoring                                                 grafana-b959498b4-9psl8           0/1     Running   0          68m    10.0.1.210      k8s2   <none>           <none>
	 cilium-monitoring                                                 prometheus-69fd5878c7-9tdvc       1/1     Running   0          68m    10.0.1.137      k8s2   <none>           <none>
	 kube-system                                                       cilium-677gb                      1/1     Running   0          104s   192.168.36.12   k8s2   <none>           <none>
	 kube-system                                                       cilium-hp6qc                      1/1     Running   0          104s   192.168.36.11   k8s1   <none>           <none>
	 kube-system                                                       cilium-operator-78f7576df-64bkp   1/1     Running   0          104s   192.168.36.12   k8s2   <none>           <none>
	 kube-system                                                       cilium-operator-78f7576df-n7dq9   1/1     Running   0          104s   192.168.36.11   k8s1   <none>           <none>
	 kube-system                                                       coredns-cc45bff6b-nt5vc           1/1     Running   0          62m    10.0.0.44       k8s2   <none>           <none>
	 kube-system                                                       etcd-k8s1                         1/1     Running   0          74m    192.168.36.11   k8s1   <none>           <none>
	 kube-system                                                       kube-apiserver-k8s1               1/1     Running   0          74m    192.168.36.11   k8s1   <none>           <none>
	 kube-system                                                       kube-controller-manager-k8s1      1/1     Running   0          74m    192.168.36.11   k8s1   <none>           <none>
	 kube-system                                                       kube-proxy-gpdjh                  1/1     Running   0          69m    192.168.36.12   k8s2   <none>           <none>
	 kube-system                                                       kube-proxy-wc5lg                  1/1     Running   0          75m    192.168.36.11   k8s1   <none>           <none>
	 kube-system                                                       kube-scheduler-k8s1               1/1     Running   0          74m    192.168.36.11   k8s1   <none>           <none>
	 kube-system                                                       log-gatherer-86rz5                1/1     Running   0          69m    192.168.36.12   k8s2   <none>           <none>
	 kube-system                                                       log-gatherer-j2v9x                1/1     Running   0          69m    192.168.36.11   k8s1   <none>           <none>
	 kube-system                                                       registry-adder-mx22m              1/1     Running   0          69m    192.168.36.11   k8s1   <none>           <none>
	 kube-system                                                       registry-adder-qfqrv              1/1     Running   0          69m    192.168.36.12   k8s2   <none>           <none>
	 
Stderr:
 	 

Fetching command output from pods [cilium-677gb cilium-hp6qc]
cmd: kubectl exec -n kube-system cilium-677gb -- cilium status
Exitcode: 0 
Stdout:
 	 KVStore:                Ok   Disabled
	 Kubernetes:             Ok   1.14 (v1.14.10) [linux/amd64]
	 Kubernetes APIs:        ["cilium/v2::CiliumClusterwideNetworkPolicy", "cilium/v2::CiliumEndpoint", "cilium/v2::CiliumNetworkPolicy", "cilium/v2::CiliumNode", "core/v1::Endpoint", "core/v1::Namespace", "core/v1::Node", "core/v1::Pods", "core/v1::Service", "networking.k8s.io/v1::NetworkPolicy"]
	 KubeProxyReplacement:   Probe   
	 Cilium:                 Ok      OK
	 NodeMonitor:            Listening for events on 3 CPUs with 64x4096 of shared memory
	 Cilium health daemon:   Ok   
	 IPAM:                   IPv4: 6/255 allocated from 10.0.0.0/24, IPv6: 6/255 allocated from fd00::/120
	 BandwidthManager:       Disabled
	 Host Routing:           Legacy
	 Masquerading:           IPTables
	 Controller Status:      36/36 healthy
	 Proxy Status:           OK, ip 10.0.0.110, 0 redirects active on ports 10000-20000
	 Hubble:                 Ok              Current/Max Flows: 605/4096 (14.77%), Flows/s: 7.58   Metrics: Disabled
	 Cluster health:         2/2 reachable   (2021-02-12T10:32:02Z)
	 
Stderr:
 	 

cmd: kubectl exec -n kube-system cilium-677gb -- cilium endpoint list
Exitcode: 0 
Stdout:
 	 ENDPOINT   POLICY (ingress)   POLICY (egress)   IDENTITY   LABELS (source:key[=value])                                                                       IPv6       IPv4         STATUS   
	            ENFORCEMENT        ENFORCEMENT                                                                                                                                            
	 99         Disabled           Disabled          20031      k8s:io.cilium.k8s.policy.cluster=default                                                          fd00::b0   10.0.0.44    ready   
	                                                            k8s:io.cilium.k8s.policy.serviceaccount=coredns                                                                                   
	                                                            k8s:io.kubernetes.pod.namespace=kube-system                                                                                       
	                                                            k8s:k8s-app=kube-dns                                                                                                              
	 773        Disabled           Disabled          6427       k8s:io.cilium.k8s.policy.cluster=default                                                          fd00::e5   10.0.0.107   ready   
	                                                            k8s:io.cilium.k8s.policy.serviceaccount=default                                                                                   
	                                                            k8s:io.kubernetes.pod.namespace=202102121032k8sdatapathconfigtransparentencryptiondirectrouting                                   
	                                                            k8s:zgroup=testDSClient                                                                                                           
	 832        Disabled           Disabled          1          k8s:cilium.io/ci-node=k8s2                                                                                                ready   
	                                                            reserved:host                                                                                                                     
	 1970       Disabled           Disabled          4          reserved:health                                                                                   fd00::9a   10.0.0.136   ready   
	 2773       Disabled           Disabled          14889      k8s:io.cilium.k8s.policy.cluster=default                                                          fd00::21   10.0.0.170   ready   
	                                                            k8s:io.cilium.k8s.policy.serviceaccount=default                                                                                   
	                                                            k8s:io.kubernetes.pod.namespace=202102121032k8sdatapathconfigtransparentencryptiondirectrouting                                   
	                                                            k8s:zgroup=test-k8s2                                                                                                              
	 2988       Enabled            Disabled          57077      k8s:io.cilium.k8s.policy.cluster=default                                                          fd00::1f   10.0.0.55    ready   
	                                                            k8s:io.cilium.k8s.policy.serviceaccount=default                                                                                   
	                                                            k8s:io.kubernetes.pod.namespace=202102121032k8sdatapathconfigtransparentencryptiondirectrouting                                   
	                                                            k8s:zgroup=testDS                                                                                                                 
	 
Stderr:
 	 

cmd: kubectl exec -n kube-system cilium-hp6qc -- cilium status
Exitcode: 0 
Stdout:
 	 KVStore:                Ok   Disabled
	 Kubernetes:             Ok   1.14 (v1.14.10) [linux/amd64]
	 Kubernetes APIs:        ["cilium/v2::CiliumClusterwideNetworkPolicy", "cilium/v2::CiliumEndpoint", "cilium/v2::CiliumNetworkPolicy", "cilium/v2::CiliumNode", "core/v1::Endpoint", "core/v1::Namespace", "core/v1::Node", "core/v1::Pods", "core/v1::Service", "networking.k8s.io/v1::NetworkPolicy"]
	 KubeProxyReplacement:   Probe   
	 Cilium:                 Ok      OK
	 NodeMonitor:            Listening for events on 3 CPUs with 64x4096 of shared memory
	 Cilium health daemon:   Ok   
	 IPAM:                   IPv4: 4/255 allocated from 10.0.1.0/24, IPv6: 4/255 allocated from fd00::100/120
	 BandwidthManager:       Disabled
	 Host Routing:           Legacy
	 Masquerading:           IPTables
	 Controller Status:      27/27 healthy
	 Proxy Status:           OK, ip 10.0.1.37, 0 redirects active on ports 10000-20000
	 Hubble:                 Ok              Current/Max Flows: 478/4096 (11.67%), Flows/s: 5.63   Metrics: Disabled
	 Cluster health:         2/2 reachable   (2021-02-12T10:31:59Z)
	 
Stderr:
 	 

cmd: kubectl exec -n kube-system cilium-hp6qc -- cilium endpoint list
Exitcode: 0 
Stdout:
 	 ENDPOINT   POLICY (ingress)   POLICY (egress)   IDENTITY   LABELS (source:key[=value])                                                                       IPv6        IPv4         STATUS   
	            ENFORCEMENT        ENFORCEMENT                                                                                                                                             
	 660        Disabled           Disabled          1          k8s:cilium.io/ci-node=k8s1                                                                                                 ready   
	                                                            k8s:node-role.kubernetes.io/master                                                                                                 
	                                                            reserved:host                                                                                                                      
	 828        Disabled           Disabled          4          reserved:health                                                                                   fd00::186   10.0.1.200   ready   
	 2312       Enabled            Disabled          57077      k8s:io.cilium.k8s.policy.cluster=default                                                          fd00::1d2   10.0.1.220   ready   
	                                                            k8s:io.cilium.k8s.policy.serviceaccount=default                                                                                    
	                                                            k8s:io.kubernetes.pod.namespace=202102121032k8sdatapathconfigtransparentencryptiondirectrouting                                    
	                                                            k8s:zgroup=testDS                                                                                                                  
	 4046       Disabled           Disabled          6427       k8s:io.cilium.k8s.policy.cluster=default                                                          fd00::122   10.0.1.46    ready   
	                                                            k8s:io.cilium.k8s.policy.serviceaccount=default                                                                                    
	                                                            k8s:io.kubernetes.pod.namespace=202102121032k8sdatapathconfigtransparentencryptiondirectrouting                                    
	                                                            k8s:zgroup=testDSClient                                                                                                            
	 
Stderr:
 	 

===================== Exiting AfterFailed =====================
10:32:32 STEP: Running AfterEach for block EntireTestsuite K8sDatapathConfig
10:32:32 STEP: Deleting deployment demo_ds.yaml
10:32:32 STEP: Deleting deployment ipsec_secret.yaml
10:32:33 STEP: Deleting namespace 202102121032k8sdatapathconfigtransparentencryptiondirectrouting
10:32:33 STEP: Deleting namespace 202102121032k8sdatapathconfigtransparentencryptiondirectrouting
10:32:46 STEP: Running AfterEach for block EntireTestsuite
@pchaigno pchaigno changed the title CI: K8sDatapathConfig Transparent encryption DirectRouting Check connectivity with transparent encryption and direct routing with bpf_host CI: K8sDatapathConfig Transparent encryption DirectRouting Check connectivity with transparent encryption and direct routing Apr 8, 2021
@tklauser tklauser self-assigned this May 17, 2021
@tklauser (Member)

Just dumping my analysis so far:

Some recent failures:

All three have "Unable to install direct node route {Ifindex: 0 Dst: fd02::100/120 Src: <nil> Gw: <nil> Flags: [] Table: 0}" among the top error/warning messages. Checking the logs for this message, we see something like:

2021-05-15T17:20:48.351381885Z level=debug msg="Updating direct route" allocCIDR="fd02::100/120" ipAddr="<nil>" subsys=linux-datapath
2021-05-15T17:20:48.351384415Z level=warning msg="Unable to install direct node route {Ifindex: 0 Dst: fd02::100/120 Src: <nil> Gw: <nil> Flags: [] Table: 0}" error="unable to lookup route for node <nil>: numerical result out of range" subsys=linux-datapath     

It seems the node IPv6 address (newIP6) passed here is nil:

n.updateDirectRoute(oldIP6Cidr, newNode.IPv6AllocCIDR, oldIP6, newIP6, firstAddition, n.nodeConfig.EnableIPv6)

newIP6 is defined here:

newIP6 = newNode.GetNodeIP(true)

Checking the log further, we eventually see a successful direct route update ~12 seconds later:

2021-05-15T17:21:00.885884009Z level=debug msg="Updating direct route" allocCIDR="fd02::100/120" ipAddr="fd04::12" subsys=linux-datapath 
2021-05-15T17:21:01.355145255Z level=debug msg="Upserting IP into ipcache layer" identity="{37204 custom-resource false}" ipAddr=10.0.1.201 k8sNamespace=kube-system k8sPodName=coredns-755cd654d4-fqnq5 key=6 namedPorts="map[dns:{53 17} dns-tcp:{53 6} metrics:{9153 6}]" subsys=ipcache

It looks like the remote node's IPv6 address only becomes known after some delay, which could leave state in a bad shape earlier on, while the IP is still unknown. That could be a red herring, though. Either way, it would probably be good to at least skip the direct node route installation (and thus the warning log) in that case.
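
For illustration only, here is a minimal, self-contained Go sketch of that suggestion: skip the direct route update (and the resulting warning) while the remote node's IP is still nil, and let a later node update install the route once the IP is known. The function names and the stubbed updateDirectRoute are assumptions for the sketch, not the actual linux-datapath code.

package main

import (
	"fmt"
	"net"
)

// updateDirectRoute stands in for the real datapath call quoted above; here it
// only reports what would be installed.
func updateDirectRoute(allocCIDR *net.IPNet, nodeIP net.IP) {
	fmt.Printf("installing direct route to %s via %s\n", allocCIDR, nodeIP)
}

// maybeUpdateDirectRoute skips the installation (and hence the "Unable to
// install direct node route" warning) while the remote node IP is still nil.
func maybeUpdateDirectRoute(allocCIDR *net.IPNet, nodeIP net.IP) {
	if nodeIP == nil {
		fmt.Printf("skipping direct route to %s: node IP not yet known\n", allocCIDR)
		return
	}
	updateDirectRoute(allocCIDR, nodeIP)
}

func main() {
	_, allocCIDR, _ := net.ParseCIDR("fd02::100/120")
	maybeUpdateDirectRoute(allocCIDR, nil)                     // node IP not yet propagated
	maybeUpdateDirectRoute(allocCIDR, net.ParseIP("fd04::12")) // IP known ~12 seconds later
}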

@jrajahalme (Member)

Happened again in #16271
Tunneling is disabled, so ingress policy enforcement also depends on ipcache propagation:

2021-05-22T00:32:52.930585469Z level=info msg="  --tunnel='disabled'" subsys=daemon

Policy is installed at 00:33:25:

time="2021-05-22T00:33:25Z" level=debug msg="running command: kubectl apply -f /home/jenkins/workspace/Cilium-PR-K8s-1.21-kernel-4.9/src/github.com/cilium/cilium/test/k8sT/manifests/l3-policy-demo.yaml -n 202105220033k8sdatapathconfigtransparentencryptiondirectrouting"

policy map update in the destination Cilium agent happens at 00:33:27:

time="2021-05-22T00:33:27Z" level=debug msg=addPolicyKey bpfMapKey="Identity=31458,DestPort=0,Nexthdr=0,TrafficDirection=0" bpfMapValue="ProxyPort=0" containerID=b3d3d9661d datapathPolicyRevision=1 desiredPolicyRevision=2 endpointID=2445 identity=19160 incremental=false ipv4=10.0.0.141 ipv6="fd02::ae" k8sPodName=202105220033k8sdatapathconfigtransparentencryptiondirectrouting/testds-c25qr subsys=endpoint

The test command, which waits up to 5 seconds for replies, starts running at 00:33:30:

time="2021-05-22T00:33:30Z" level=debug msg="running command: kubectl exec -n 202105220033k8sdatapathconfigtransparentencryptiondirectrouting testclient-rrrtr -- ping -W 5 -c 5 10.0.0.141"

The ipcache update on the destination node, where the ingress policy is enforced, happens at 00:33:36:

2021-05-22T00:33:36.097000183Z level=debug msg="Upserting IP into ipcache layer" identity="{31458 custom-resource false}" ipAddr=10.0.1.79 k8sNamespace=202105220033k8sdatapathconfigtransparentencryptiondirectrouting k8sPodName=testclient-rrrtr key=6 namedPorts="map[]" subsys=ipcache

The test is reported as failed at 00:33:40:

time="2021-05-22T00:33:40Z" level=error msg="Error executing command 'kubectl exec -n 202105220033k8sdatapathconfigtransparentencryptiondirectrouting testclient-rrrtr -- ping -W 5 -c 5 10.0.0.141'" error="exit status 1"
cmd: "kubectl exec -n 202105220033k8sdatapathconfigtransparentencryptiondirectrouting testclient-rrrtr -- ping -W 5 -c 5 10.0.0.141" exitCode: 1 duration: 9.220263797s stdout:
PING 10.0.0.141 (10.0.0.141) 56(84) bytes of data.

Based on this, it seems pretty conclusive that this test flakes because ipcache propagation sometimes takes longer than 4-5 seconds!
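
As a minimal sketch of that conclusion (assuming a hypothetical helper, not the actual Ginkgo test code), the connectivity check could retry until an overall deadline instead of relying on a single ~5-second ping window. Namespace, pod, and target IP below are the placeholders from the log above.

package main

import (
	"fmt"
	"os/exec"
	"time"
)

// pingOnce runs a single short ping from the client pod to the target IP.
func pingOnce(namespace, pod, target string) error {
	return exec.Command("kubectl", "exec", "-n", namespace, pod, "--",
		"ping", "-W", "2", "-c", "1", target).Run()
}

// waitForConnectivity retries until the ping succeeds or the deadline expires,
// so a 5-10 second ipcache propagation delay no longer fails the test outright.
func waitForConnectivity(namespace, pod, target string, deadline time.Duration) error {
	end := time.Now().Add(deadline)
	for time.Now().Before(end) {
		if err := pingOnce(namespace, pod, target); err == nil {
			return nil
		}
		time.Sleep(1 * time.Second)
	}
	return fmt.Errorf("no connectivity from %s/%s to %s within %s", namespace, pod, target, deadline)
}

func main() {
	err := waitForConnectivity("202105220033k8sdatapathconfigtransparentencryptiondirectrouting",
		"testclient-rrrtr", "10.0.0.141", 30*time.Second)
	fmt.Println("connectivity check result:", err)
}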

gandro added a commit to gandro/cilium that referenced this issue May 31, 2021
This increases the curl connection timeout from 5 to 15 seconds to avoid
issues with IPCache propagation delay. On Cilium master and 1.10, it
seems that IPCache updates in CI can take up to 4-8 seconds.

CI flakes likely caused by the increased IPCache propagation delay:

 - cilium#13839
 - cilium#14959
 - cilium#15103
 - cilium#16237

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
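
Purely as an illustrative sketch of the kind of change described in that commit message (the constant name and the exact curl invocation are assumptions, not the actual Cilium test helper):

package main

import "fmt"

// curlConnectTimeoutSeconds was effectively 5 before the change described in
// the commit message above; raising it gives slow ipcache propagation time to
// catch up before the connectivity check gives up.
const curlConnectTimeoutSeconds = 15

// curlCommand builds the probe command a test could run inside a client pod.
func curlCommand(target string) string {
	return fmt.Sprintf("curl --connect-timeout %d --max-time %d -s -o /dev/null http://%s/",
		curlConnectTimeoutSeconds, curlConnectTimeoutSeconds+5, target)
}

func main() {
	fmt.Println(curlCommand("10.0.0.141:80"))
}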
@tklauser tklauser removed their assignment Nov 10, 2021
@brb brb added area/encryption Impacts encryption support such as IPSec, WireGuard, or kTLS. sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. labels Feb 17, 2022
@ti-mo (Contributor) commented May 11, 2022

Hitting this in #19159 as well:

2022-05-10T19:14:31.668706938Z level=debug msg="Updating direct route" addedCIDRs="[10.0.0.0/24]" newIP=192.168.56.11 oldIP="<nil>" removedCIDRs="[]" subsys=linux-datapath
2022-05-10T19:14:31.668796333Z level=debug msg="Updating direct route" addedCIDRs="[fd02::/120]" newIP="<nil>" oldIP="<nil>" removedCIDRs="[]" subsys=linux-datapath
2022-05-10T19:14:31.668802725Z level=warning msg="Unable to install direct node route {Ifindex: 0 Dst: fd02::/120 Src: <nil> Gw: <nil> Flags: [] Table: 0 Realm: 0}" error="unable to lookup route for node <nil>: numerical result out of range" subsys=linux-datapath

In another related failure, there are a few additional warnings:

Mismatch of router IPs found during restoration. The Kubernetes resource contained fd02::17a, while the filesystem contained fd02::d9. Using the router IP from the filesystem. To change the router IP, specify --local-router-ipv4 and/or --local-router-ipv6.
Mismatch of router IPs found during restoration. The Kubernetes resource contained 10.0.0.148, while the filesystem contained 10.0.1.9. Using the router IP from the filesystem. To change the router IP, specify --local-router-ipv4 and/or --local-router-ipv6.
Mismatch of router IPs found during restoration. The Kubernetes resource contained fd02::51, while the filesystem contained fd02::148. Using the router IP from the filesystem. To change the router IP, specify --local-router-ipv4 and/or --local-router-ipv6.
Unable to install direct node route {Ifindex: 0 Dst: fd02::/120 Src: <nil> Gw: <nil> Flags: [] Table: 0 Realm: 0}

@github-actions

This comment was marked as outdated.

@github-actions github-actions bot added the stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale. label Jul 11, 2022
@pchaigno pchaigno removed the stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale. label Jul 11, 2022
@github-actions

This issue has been automatically marked as stale because it has not
had recent activity. It will be closed if no further activity occurs.

@github-actions github-actions bot added the stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale. label Sep 10, 2022
@github-actions

This issue has not seen any activity since it was marked stale.
Closing.
