
CI: K8sFQDNTest Restart Cilium validate that FQDN is still working: Error reaching kube-dns before test #16717

Closed
pchaigno opened this issue Jun 30, 2021 · 8 comments · Fixed by #16767 or #16835
Labels
area/CI (Continuous Integration testing issue or flake), ci/flake (This is a known failure that occurs in the tree. Please investigate me!)

@pchaigno
Member

https://jenkins.cilium.io/job/cilium-master-k8s-1.17-kernel-4.9/132/testReport/Suite-k8s-1/17/K8sFQDNTest_Restart_Cilium_validate_that_FQDN_is_still_working/
3205f837_K8sFQDNTest_Restart_Cilium_validate_that_FQDN_is_still_working.zip

Stacktrace

/home/jenkins/workspace/cilium-master-k8s-1.17-kernel-4.9/src/github.com/cilium/cilium/test/ginkgo-ext/scopes.go:465
Error reaching kube-dns before test: error looking up kube-dns.kube-system.svc.cluster.local from default/app2-5cc5d58844-nv6wr: ;; connection timed out; no servers could be reached

command terminated with exit code 1

Expected
    <*errors.errorString | 0xc0024981b0>: {
        s: "error looking up kube-dns.kube-system.svc.cluster.local from default/app2-5cc5d58844-nv6wr: ;; connection timed out; no servers could be reached\n\ncommand terminated with exit code 1\n",
    }
to be nil
/home/jenkins/workspace/cilium-master-k8s-1.17-kernel-4.9/src/github.com/cilium/cilium/test/k8sT/fqdn.go:89

Standard Output

Cilium pods: [cilium-mxkd7 cilium-zggcq]
Netpols loaded: 
CiliumNetworkPolicies loaded: 
Endpoint Policy Enforcement:
Pod                          Ingress   Egress
grafana-7fd557d749-qs865               
prometheus-d87f8f984-7rqc2             
app1-7b6ddb776f-9q5nl                  
app1-7b6ddb776f-n4vvx                  
app2-5cc5d58844-nv6wr                  
app3-6c7856c5b5-fs6n9                  
coredns-767d4c6dd7-jsfcl               
Cilium agent 'cilium-mxkd7': Status: Ok  Health: Ok Nodes "" ContinerRuntime:  Kubernetes: Ok KVstore: Ok Controllers: Total 36 Failed 0
Cilium agent 'cilium-zggcq': Status: Ok  Health: Ok Nodes "" ContinerRuntime:  Kubernetes: Ok KVstore: Ok Controllers: Total 35 Failed 0

Standard Error

15:21:47 STEP: Running BeforeAll block for EntireTestsuite K8sFQDNTest
15:21:47 STEP: Ensuring the namespace kube-system exists
15:21:47 STEP: WaitforPods(namespace="kube-system", filter="-l k8s-app=cilium-test-logs")
15:21:47 STEP: WaitforPods(namespace="kube-system", filter="-l k8s-app=cilium-test-logs") => <nil>
15:21:47 STEP: Installing Cilium
15:21:48 STEP: Waiting for Cilium to become ready
15:22:18 STEP: Validating if Kubernetes DNS is deployed
15:22:18 STEP: Checking if deployment is ready
15:22:18 STEP: Checking if kube-dns service is plumbed correctly
15:22:18 STEP: Checking if DNS can resolve
15:22:18 STEP: Checking if pods have identity
15:22:20 STEP: Kubernetes DNS is up and operational
15:22:20 STEP: Validating Cilium Installation
15:22:20 STEP: Performing Cilium controllers preflight check
15:22:20 STEP: Performing Cilium health check
15:22:20 STEP: Performing Cilium status preflight check
15:22:26 STEP: Performing Cilium service preflight check
15:22:26 STEP: Performing K8s service preflight check
15:22:26 STEP: Cilium is not ready yet: connectivity health is failing: Cluster connectivity is unhealthy on 'cilium-mxkd7': Exitcode: 1 
Err: exit status 1
Stdout:
 	 
Stderr:
 	 Error: Cannot get status/probe: Put "http://%2Fvar%2Frun%2Fcilium%2Fhealth.sock/v1beta/status/probe": dial unix /var/run/cilium/health.sock: connect: no such file or directory
	 
	 command terminated with exit code 1
	 

15:22:26 STEP: Performing Cilium controllers preflight check
15:22:26 STEP: Performing Cilium status preflight check
15:22:26 STEP: Performing Cilium health check
15:22:28 STEP: Performing Cilium service preflight check
15:22:28 STEP: Performing K8s service preflight check
15:22:28 STEP: Performing Cilium controllers preflight check
15:22:28 STEP: Performing Cilium health check
15:22:28 STEP: Performing Cilium status preflight check
15:22:30 STEP: Performing Cilium service preflight check
15:22:30 STEP: Performing K8s service preflight check
15:22:30 STEP: Performing Cilium controllers preflight check
15:22:30 STEP: Performing Cilium status preflight check
15:22:30 STEP: Performing Cilium health check
15:22:31 STEP: Performing Cilium service preflight check
15:22:31 STEP: Performing K8s service preflight check
15:22:31 STEP: Performing Cilium status preflight check
15:22:31 STEP: Performing Cilium controllers preflight check
15:22:31 STEP: Performing Cilium health check
15:22:34 STEP: Performing Cilium service preflight check
15:22:34 STEP: Performing K8s service preflight check
15:22:34 STEP: Performing Cilium controllers preflight check
15:22:34 STEP: Performing Cilium status preflight check
15:22:34 STEP: Performing Cilium health check
15:22:35 STEP: Performing Cilium service preflight check
15:22:35 STEP: Performing K8s service preflight check
15:22:35 STEP: Performing Cilium controllers preflight check
15:22:35 STEP: Performing Cilium health check
15:22:35 STEP: Performing Cilium status preflight check
15:22:38 STEP: Performing Cilium service preflight check
15:22:38 STEP: Performing K8s service preflight check
15:22:38 STEP: Cilium is not ready yet: connectivity health is failing: Cluster connectivity is unhealthy on 'cilium-mxkd7': Exitcode: 1 
Err: exit status 1
Stdout:
 	 
Stderr:
 	 Error: Cannot get status/probe: Put "http://%2Fvar%2Frun%2Fcilium%2Fhealth.sock/v1beta/status/probe": dial unix /var/run/cilium/health.sock: connect: no such file or directory
	 
	 command terminated with exit code 1
	 

15:22:38 STEP: Performing Cilium controllers preflight check
15:22:38 STEP: Performing Cilium status preflight check
15:22:38 STEP: Performing Cilium health check
15:22:40 STEP: Performing Cilium service preflight check
15:22:40 STEP: Performing K8s service preflight check
15:22:40 STEP: Performing Cilium controllers preflight check
15:22:40 STEP: Performing Cilium status preflight check
15:22:40 STEP: Performing Cilium health check
15:22:41 STEP: Performing Cilium service preflight check
15:22:41 STEP: Performing K8s service preflight check
15:22:41 STEP: Performing Cilium status preflight check
15:22:41 STEP: Performing Cilium controllers preflight check
15:22:41 STEP: Performing Cilium health check
15:22:43 STEP: Performing Cilium service preflight check
15:22:43 STEP: Performing K8s service preflight check
15:22:43 STEP: Performing Cilium controllers preflight check
15:22:43 STEP: Performing Cilium status preflight check
15:22:43 STEP: Performing Cilium health check
15:22:45 STEP: Performing Cilium service preflight check
15:22:45 STEP: Performing K8s service preflight check
15:22:45 STEP: Performing Cilium controllers preflight check
15:22:45 STEP: Performing Cilium health check
15:22:45 STEP: Performing Cilium status preflight check
15:22:47 STEP: Performing Cilium service preflight check
15:22:47 STEP: Performing K8s service preflight check
15:22:47 STEP: Performing Cilium controllers preflight check
15:22:47 STEP: Performing Cilium status preflight check
15:22:47 STEP: Performing Cilium health check
15:22:48 STEP: Performing Cilium service preflight check
15:22:48 STEP: Performing K8s service preflight check
15:22:48 STEP: Cilium is not ready yet: connectivity health is failing: Cluster connectivity is unhealthy on 'cilium-mxkd7': Exitcode: 1 
Err: exit status 1
Stdout:
 	 
Stderr:
 	 Error: Cannot get status/probe: Put "http://%2Fvar%2Frun%2Fcilium%2Fhealth.sock/v1beta/status/probe": dial unix /var/run/cilium/health.sock: connect: no such file or directory
	 
	 command terminated with exit code 1
	 

15:22:48 STEP: Performing Cilium controllers preflight check
15:22:48 STEP: Performing Cilium health check
15:22:48 STEP: Performing Cilium status preflight check
15:22:50 STEP: Performing Cilium service preflight check
15:22:50 STEP: Performing K8s service preflight check
15:22:50 STEP: Performing Cilium controllers preflight check
15:22:50 STEP: Performing Cilium status preflight check
15:22:50 STEP: Performing Cilium health check
15:22:51 STEP: Performing Cilium service preflight check
15:22:51 STEP: Performing K8s service preflight check
15:22:51 STEP: Performing Cilium controllers preflight check
15:22:51 STEP: Performing Cilium health check
15:22:51 STEP: Performing Cilium status preflight check
15:22:54 STEP: Performing Cilium service preflight check
15:22:54 STEP: Performing K8s service preflight check
15:22:54 STEP: Performing Cilium status preflight check
15:22:54 STEP: Performing Cilium controllers preflight check
15:22:54 STEP: Performing Cilium health check
15:22:57 STEP: Performing Cilium service preflight check
15:22:57 STEP: Performing K8s service preflight check
15:22:57 STEP: Performing Cilium controllers preflight check
15:22:57 STEP: Performing Cilium status preflight check
15:22:57 STEP: Performing Cilium health check
15:23:00 STEP: Performing Cilium service preflight check
15:23:00 STEP: Performing K8s service preflight check
15:23:00 STEP: Performing Cilium controllers preflight check
15:23:00 STEP: Performing Cilium health check
15:23:00 STEP: Performing Cilium status preflight check
15:23:01 STEP: Performing Cilium service preflight check
15:23:01 STEP: Performing K8s service preflight check
15:23:01 STEP: Cilium is not ready yet: connectivity health is failing: Cluster connectivity is unhealthy on 'cilium-mxkd7': Exitcode: 1 
Err: exit status 1
Stdout:
 	 
Stderr:
 	 Error: Cannot get status/probe: Put "http://%2Fvar%2Frun%2Fcilium%2Fhealth.sock/v1beta/status/probe": dial unix /var/run/cilium/health.sock: connect: no such file or directory
	 
	 command terminated with exit code 1
	 

15:23:01 STEP: Performing Cilium controllers preflight check
15:23:01 STEP: Performing Cilium health check
15:23:01 STEP: Performing Cilium status preflight check
15:23:02 STEP: Performing Cilium service preflight check
15:23:02 STEP: Performing K8s service preflight check
15:23:02 STEP: Performing Cilium controllers preflight check
15:23:02 STEP: Performing Cilium health check
15:23:02 STEP: Performing Cilium status preflight check
15:23:03 STEP: Performing Cilium service preflight check
15:23:03 STEP: Performing K8s service preflight check
15:23:03 STEP: Performing Cilium controllers preflight check
15:23:03 STEP: Performing Cilium status preflight check
15:23:03 STEP: Performing Cilium health check
15:23:05 STEP: Performing Cilium service preflight check
15:23:05 STEP: Performing K8s service preflight check
15:23:05 STEP: Performing Cilium controllers preflight check
15:23:05 STEP: Performing Cilium health check
15:23:05 STEP: Performing Cilium status preflight check
15:23:08 STEP: Performing Cilium service preflight check
15:23:08 STEP: Performing K8s service preflight check
15:23:08 STEP: Performing Cilium controllers preflight check
15:23:08 STEP: Performing Cilium status preflight check
15:23:08 STEP: Performing Cilium health check
15:23:10 STEP: Performing Cilium service preflight check
15:23:10 STEP: Performing K8s service preflight check
15:23:10 STEP: Performing Cilium controllers preflight check
15:23:10 STEP: Performing Cilium status preflight check
15:23:10 STEP: Performing Cilium health check
15:23:11 STEP: Performing Cilium service preflight check
15:23:11 STEP: Performing K8s service preflight check
15:23:11 STEP: Cilium is not ready yet: connectivity health is failing: Cluster connectivity is unhealthy on 'cilium-mxkd7': Exitcode: 1 
Err: exit status 1
Stdout:
 	 
Stderr:
 	 Error: Cannot get status/probe: Put "http://%2Fvar%2Frun%2Fcilium%2Fhealth.sock/v1beta/status/probe": dial unix /var/run/cilium/health.sock: connect: no such file or directory
	 
	 command terminated with exit code 1
	 

15:23:11 STEP: Performing Cilium controllers preflight check
15:23:11 STEP: Performing Cilium health check
15:23:11 STEP: Performing Cilium status preflight check
15:23:13 STEP: Performing Cilium service preflight check
15:23:13 STEP: Performing K8s service preflight check
15:23:13 STEP: Performing Cilium controllers preflight check
15:23:13 STEP: Performing Cilium health check
15:23:13 STEP: Performing Cilium status preflight check
15:23:16 STEP: Performing Cilium service preflight check
15:23:16 STEP: Performing K8s service preflight check
15:23:16 STEP: Performing Cilium status preflight check
15:23:16 STEP: Performing Cilium controllers preflight check
15:23:16 STEP: Performing Cilium health check
15:23:18 STEP: Performing Cilium service preflight check
15:23:18 STEP: Performing K8s service preflight check
15:23:19 STEP: Waiting for cilium-operator to be ready
15:23:19 STEP: WaitforPods(namespace="kube-system", filter="-l name=cilium-operator")
15:23:19 STEP: WaitforPods(namespace="kube-system", filter="-l name=cilium-operator") => <nil>
15:23:19 STEP: Applying demo manifest
15:23:19 STEP: WaitforPods(namespace="default", filter="-l zgroup=testapp")
15:23:29 STEP: WaitforPods(namespace="default", filter="-l zgroup=testapp") => <nil>
FAIL: Error reaching kube-dns before test: error looking up kube-dns.kube-system.svc.cluster.local from default/app2-5cc5d58844-nv6wr: ;; connection timed out; no servers could be reached

command terminated with exit code 1

Expected
    <*errors.errorString | 0xc0024981b0>: {
        s: "error looking up kube-dns.kube-system.svc.cluster.local from default/app2-5cc5d58844-nv6wr: ;; connection timed out; no servers could be reached\n\ncommand terminated with exit code 1\n",
    }
to be nil
===================== TEST FAILED =====================
15:27:29 STEP: Running AfterFailed block for EntireTestsuite K8sFQDNTest
cmd: kubectl get pods -o wide --all-namespaces
Exitcode: 0 
Stdout:
 	 NAMESPACE           NAME                               READY   STATUS    RESTARTS   AGE     IP              NODE   NOMINATED NODE   READINESS GATES
	 cilium-monitoring   grafana-7fd557d749-qs865           1/1     Running   0          72m     10.0.0.90       k8s2   <none>           <none>
	 cilium-monitoring   prometheus-d87f8f984-7rqc2         1/1     Running   0          72m     10.0.0.2        k8s2   <none>           <none>
	 default             app1-7b6ddb776f-9q5nl              2/2     Running   0          4m13s   10.0.1.84       k8s1   <none>           <none>
	 default             app1-7b6ddb776f-n4vvx              2/2     Running   0          4m13s   10.0.1.52       k8s1   <none>           <none>
	 default             app2-5cc5d58844-nv6wr              1/1     Running   0          4m13s   10.0.1.234      k8s1   <none>           <none>
	 default             app3-6c7856c5b5-fs6n9              1/1     Running   0          4m13s   10.0.1.159      k8s1   <none>           <none>
	 kube-system         cilium-mxkd7                       1/1     Running   0          5m45s   192.168.36.11   k8s1   <none>           <none>
	 kube-system         cilium-operator-5f99fccbd8-2dbr9   1/1     Running   0          5m44s   192.168.36.11   k8s1   <none>           <none>
	 kube-system         cilium-operator-5f99fccbd8-dvpg7   1/1     Running   0          5m44s   192.168.36.12   k8s2   <none>           <none>
	 kube-system         cilium-zggcq                       1/1     Running   0          5m44s   192.168.36.12   k8s2   <none>           <none>
	 kube-system         coredns-767d4c6dd7-jsfcl           1/1     Running   0          7m6s    10.0.0.251      k8s2   <none>           <none>
	 kube-system         etcd-k8s1                          1/1     Running   0          75m     192.168.36.11   k8s1   <none>           <none>
	 kube-system         kube-apiserver-k8s1                1/1     Running   0          75m     192.168.36.11   k8s1   <none>           <none>
	 kube-system         kube-controller-manager-k8s1       1/1     Running   0          75m     192.168.36.11   k8s1   <none>           <none>
	 kube-system         kube-proxy-2tcqm                   1/1     Running   0          75m     192.168.36.11   k8s1   <none>           <none>
	 kube-system         kube-proxy-dx6pr                   1/1     Running   0          73m     192.168.36.12   k8s2   <none>           <none>
	 kube-system         kube-scheduler-k8s1                1/1     Running   0          75m     192.168.36.11   k8s1   <none>           <none>
	 kube-system         log-gatherer-kcfvr                 1/1     Running   0          72m     192.168.36.11   k8s1   <none>           <none>
	 kube-system         log-gatherer-qcb6s                 1/1     Running   0          72m     192.168.36.12   k8s2   <none>           <none>
	 kube-system         registry-adder-26hfp               1/1     Running   0          72m     192.168.36.11   k8s1   <none>           <none>
	 kube-system         registry-adder-ldqph               1/1     Running   0          72m     192.168.36.12   k8s2   <none>           <none>
	 
Stderr:
 	 

Fetching command output from pods [cilium-mxkd7 cilium-zggcq]
cmd: kubectl exec -n kube-system cilium-mxkd7 -- cilium service list
Exitcode: 0 
Stdout:
 	 ID   Frontend             Service Type   Backend                   
	 1    10.111.116.89:9090   ClusterIP      1 => 10.0.0.2:9090        
	 2    10.96.66.132:80      ClusterIP      1 => 10.0.1.84:80         
	                                          2 => 10.0.1.52:80         
	 3    10.96.0.1:443        ClusterIP      1 => 192.168.36.11:6443   
	 4    10.96.0.10:53        ClusterIP      1 => 10.0.0.251:53        
	 5    10.96.0.10:9153      ClusterIP      1 => 10.0.0.251:9153      
	 6    10.106.214.25:3000   ClusterIP      1 => 10.0.0.90:3000       
	 7    10.96.66.132:69      ClusterIP      1 => 10.0.1.84:69         
	                                          2 => 10.0.1.52:69         
	 
Stderr:
 	 

cmd: kubectl exec -n kube-system cilium-mxkd7 -- cilium endpoint list
Exitcode: 0 
Stdout:
 	 ENDPOINT   POLICY (ingress)   POLICY (egress)   IDENTITY   LABELS (source:key[=value])                            IPv6        IPv4         STATUS   
	            ENFORCEMENT        ENFORCEMENT                                                                                                  
	 880        Disabled           Disabled          13212      k8s:appSecond=true                                     fd02::1af   10.0.1.234   ready   
	                                                            k8s:id=app2                                                                             
	                                                            k8s:io.cilium.k8s.policy.cluster=default                                                
	                                                            k8s:io.cilium.k8s.policy.serviceaccount=app2-account                                    
	                                                            k8s:io.kubernetes.pod.namespace=default                                                 
	                                                            k8s:zgroup=testapp                                                                      
	 1129       Disabled           Disabled          2894       k8s:id=app1                                            fd02::1d3   10.0.1.52    ready   
	                                                            k8s:io.cilium.k8s.policy.cluster=default                                                
	                                                            k8s:io.cilium.k8s.policy.serviceaccount=app1-account                                    
	                                                            k8s:io.kubernetes.pod.namespace=default                                                 
	                                                            k8s:zgroup=testapp                                                                      
	 1749       Disabled           Disabled          4          reserved:health                                        fd02::157   10.0.1.221   ready   
	 1868       Disabled           Disabled          2894       k8s:id=app1                                            fd02::16a   10.0.1.84    ready   
	                                                            k8s:io.cilium.k8s.policy.cluster=default                                                
	                                                            k8s:io.cilium.k8s.policy.serviceaccount=app1-account                                    
	                                                            k8s:io.kubernetes.pod.namespace=default                                                 
	                                                            k8s:zgroup=testapp                                                                      
	 2876       Disabled           Disabled          1          k8s:cilium.io/ci-node=k8s1                                                      ready   
	                                                            k8s:node-role.kubernetes.io/master                                                      
	                                                            reserved:host                                                                           
	 3081       Disabled           Disabled          48485      k8s:id=app3                                            fd02::19a   10.0.1.159   ready   
	                                                            k8s:io.cilium.k8s.policy.cluster=default                                                
	                                                            k8s:io.cilium.k8s.policy.serviceaccount=default                                         
	                                                            k8s:io.kubernetes.pod.namespace=default                                                 
	                                                            k8s:zgroup=testapp                                                                      
	 
Stderr:
 	 

cmd: kubectl exec -n kube-system cilium-zggcq -- cilium service list
Exitcode: 0 
Stdout:
 	 ID   Frontend             Service Type   Backend                   
	 1    10.96.0.10:53        ClusterIP      1 => 10.0.0.251:53        
	 2    10.96.0.10:9153      ClusterIP      1 => 10.0.0.251:9153      
	 3    10.106.214.25:3000   ClusterIP      1 => 10.0.0.90:3000       
	 4    10.111.116.89:9090   ClusterIP      1 => 10.0.0.2:9090        
	 5    10.96.66.132:69      ClusterIP      1 => 10.0.1.84:69         
	                                          2 => 10.0.1.52:69         
	 6    10.96.0.1:443        ClusterIP      1 => 192.168.36.11:6443   
	 7    10.96.66.132:80      ClusterIP      1 => 10.0.1.84:80         
	                                          2 => 10.0.1.52:80         
	 
Stderr:
 	 

cmd: kubectl exec -n kube-system cilium-zggcq -- cilium endpoint list
Exitcode: 0 
Stdout:
 	 ENDPOINT   POLICY (ingress)   POLICY (egress)   IDENTITY   LABELS (source:key[=value])                              IPv6       IPv4         STATUS   
	            ENFORCEMENT        ENFORCEMENT                                                                                                   
	 27         Disabled           Disabled          2560       k8s:app=grafana                                          fd02::1    10.0.0.90    ready   
	                                                            k8s:io.cilium.k8s.policy.cluster=default                                                 
	                                                            k8s:io.cilium.k8s.policy.serviceaccount=default                                          
	                                                            k8s:io.kubernetes.pod.namespace=cilium-monitoring                                        
	 112        Disabled           Disabled          1          k8s:cilium.io/ci-node=k8s2                                                       ready   
	                                                            reserved:host                                                                            
	 2245       Disabled           Disabled          4          reserved:health                                          fd02::81   10.0.0.201   ready   
	 2411       Disabled           Disabled          471        k8s:app=prometheus                                       fd02::7a   10.0.0.2     ready   
	                                                            k8s:io.cilium.k8s.policy.cluster=default                                                 
	                                                            k8s:io.cilium.k8s.policy.serviceaccount=prometheus-k8s                                   
	                                                            k8s:io.kubernetes.pod.namespace=cilium-monitoring                                        
	 3246       Disabled           Disabled          28075      k8s:io.cilium.k8s.policy.cluster=default                 fd02::51   10.0.0.251   ready   
	                                                            k8s:io.cilium.k8s.policy.serviceaccount=coredns                                          
	                                                            k8s:io.kubernetes.pod.namespace=kube-system                                              
	                                                            k8s:k8s-app=kube-dns                                                                     
	 
Stderr:
 	 

===================== Exiting AfterFailed =====================
15:27:50 STEP: Running AfterEach for block EntireTestsuite K8sFQDNTest
15:27:50 STEP: Running AfterEach for block EntireTestsuite
@pchaigno pchaigno added area/CI Continuous Integration testing issue or flake ci/flake This is a known failure that occurs in the tree. Please investigate me! labels Jun 30, 2021
@pchaigno pchaigno self-assigned this Jun 30, 2021
@pchaigno
Member Author

pchaigno commented Jun 30, 2021

Symptoms

All DNS lookups from default/app2-5cc5d58844-nv6wr seem to hang. We don't see the requests in the CoreDNS logs, so they probably never reached the CoreDNS pod.

As an example, we can trace one request. On the source node:

$ cat ~/Downloads/tmp/cilium-mxkd7-hubble_observe.json | ./hubble observe --from-port 55145 --to-port 53 
Jun 28 15:27:21.649: default/app2-5cc5d58844-nv6wr:55145 <> kube-system/coredns-767d4c6dd7-jsfcl:53 to-overlay FORWARDED (UDP)
Jun 28 15:27:26.651: default/app2-5cc5d58844-nv6wr:55145 <> kube-system/coredns-767d4c6dd7-jsfcl:53 to-overlay FORWARDED (UDP)

On the destination node:

$ cat ~/Downloads/tmp/cilium-zggcq-hubble_observe.json | ./hubble observe --from-port 55145 --to-port 53
Jun 28 15:27:21.656: default/app2-5cc5d58844-nv6wr:55145 -> kube-system/coredns-767d4c6dd7-jsfcl:53 to-endpoint FORWARDED (UDP)
Jun 28 15:27:26.658: default/app2-5cc5d58844-nv6wr:55145 -> kube-system/coredns-767d4c6dd7-jsfcl:53 to-endpoint FORWARDED (UDP)

Monitor aggregation is enabled, so only to-xxx traces are collected.
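
If the packet had been dropped inside the BPF datapath itself, the same dumps should contain a DROPPED verdict for it. As a minimal sketch of that check (assuming the --verdict filter available in recent hubble releases):

# Look for datapath drops of DNS traffic in the destination node's dump;
# no hits would be consistent with the drop happening outside of BPF.
$ cat ~/Downloads/tmp/cilium-zggcq-hubble_observe.json | ./hubble observe --verdict DROPPED --to-port 53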

Datapath Analysis

Endpoint routes are disabled, so these to-endpoint traces were emitted from the cilium_vxlan device after a tail call from bpf_overlay to bpf_lxc. Then, the packet should be redirected to the lxc device and enter the container. Let's confirm that.

We can first get the lxc device for the destination CoreDNS container:

$ jq '.[].status | select(."external-identifiers"."k8s-pod-name" == "coredns-767d4c6dd7-jsfcl").networking' cilium-zggcq-endpoint_list.txt
{
  "addressing": [
    {
      "ipv4": "10.0.0.251",
      "ipv6": "fd02::51"
    }
  ],
  "host-mac": "9a:b7:ea:b0:e5:82",
  "interface-index": 207,
  "interface-name": "lxc3e1dbd546d93",
  "mac": "8a:52:f4:b8:28:b8"
}

We can then check the BPF programs attached to the node:

$ cat bugtool-cilium-zggcq/cmd/bpftool-net-show.md 
Error: Netlink error reporting not supported
xdp:

tc:
cilium_net(9) clsact/ingress bpf_host_cilium_net.o:[to-host]
cilium_host(10) clsact/ingress bpf_host.o:[to-host]
cilium_host(10) clsact/egress bpf_host.o:[from-host]
lxc899cabc800bc(17) clsact/ingress bpf_lxc.o:[from-container]
lxcbea3fce3587e(19) clsact/ingress bpf_lxc.o:[from-container]
lxc3e1dbd546d93(207) clsact/ingress bpf_lxc.o:[from-container]
lxc3e1dbd546d93(207) clsact/egress bpf_lxc.o:[to-container]
cilium_vxlan(212) clsact/ingress bpf_overlay.o:[from-overlay]
cilium_vxlan(212) clsact/egress bpf_overlay.o:[to-overlay]
lxc_health(214) clsact/ingress bpf_lxc.o:[from-container]

flow_dissector:

Here we see that, unlike the other containers, the CoreDNS pod has two BPF programs attached, one at ingress and one at egress. That should only be the case when endpoint routes are enabled. Similarly, in the routing table, it is the only endpoint with a per-endpoint route:

$ grep lxc3e1dbd546d93 bugtool-cilium-zggcq/cmd/ip--4-r.md 
10.0.0.251 dev lxc scope link

Therefore, the DNS packet is passed to the stack. It flows through netfilter and hits the filter table's FORWARD chain:

Chain FORWARD (policy DROP 84 packets, 8322 bytes)
 pkts bytes target     prot opt in     out     source               destination         
 2162 3076K CILIUM_FORWARD  all  --  any    any     anywhere             anywhere             /* cilium-feeder: CILIUM_FORWARD */
   96 13716 KUBE-FORWARD  all  --  any    any     anywhere             anywhere             /* kubernetes forwarding rules */
   48  5280 KUBE-SERVICES  all  --  any    any     anywhere             anywhere             ctstate NEW /* kubernetes service portals */
[...]
Chain CILIUM_FORWARD (1 references)
 pkts bytes target     prot opt in     out     source               destination         
 1204 2923K ACCEPT     all  --  any    cilium_host  anywhere             anywhere             /* cilium: any->cluster on cilium_host forward accept */
    0     0 ACCEPT     all  --  cilium_host any     anywhere             anywhere             /* cilium: cluster->any on cilium_host forward accept (nodeport) */
  864  140K ACCEPT     all  --  lxc+   any     anywhere             anywhere             /* cilium: cluster->any on lxc+ forward accept */
    0     0 ACCEPT     all  --  cilium_net any     anywhere             anywhere             /* cilium: cluster->any on cilium_net forward accept (nodeport) */

Since endpoint routes are disabled in the agent, the rules installed in CILIUM_FORWARD don't match the packet: they assume it will come out of cilium_host. Per FORWARD's default DROP policy, the packet is dropped.
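
A minimal way to confirm this on the destination node is to watch the FORWARD chain's default-policy counters while the lookup is retried. This is only a sketch: the pod name is taken from this run, and the availability of dig inside the app2 image is an assumption.

# On k8s2, note the chain header counters ("policy DROP N packets").
$ sudo iptables -L FORWARD -v -n | head -n 1

# Retry the lookup from the affected pod; it should time out again.
$ kubectl exec -n default app2-5cc5d58844-nv6wr -- dig +time=2 +tries=1 kube-dns.kube-system.svc.cluster.local

# Check the counters again; an increase matching the retried DNS packets
# points at the FORWARD chain's default DROP policy.
$ sudo iptables -L FORWARD -v -n | head -n 1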

Root Cause

This is actually a known limitation of Cilium since #16227: toggling endpoint routes on an existing Cilium installation is not supported, even though we do it in CI. If endpoint routes are enabled or disabled in the agent, the new setting is not reflected in existing endpoints (including the CoreDNS endpoint in our case). We usually work around it by deleting the existing pods so that their routes are reinstalled from scratch. That would be a short-term solution here.
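
That workaround boils down to something like the following sketch; either form recreates the CoreDNS endpoint so that Cilium re-plumbs its route and BPF programs with the agent's current endpoint-routes setting:

# Delete the kube-dns/CoreDNS pods so the ReplicaSet recreates them.
$ kubectl -n kube-system delete pod -l k8s-app=kube-dns

# Or, on clusters where CoreDNS is managed as a Deployment:
$ kubectl -n kube-system rollout restart deployment/coredns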

@pchaigno
Member Author

This would only happen in our 4.9 CI job because of the following condition:

cilium/bpf/bpf_lxc.c

Lines 1149 to 1156 in e6f34c3

#if !defined(ENABLE_ROUTING) && defined(TUNNEL_MODE) && !defined(ENABLE_NODEPORT)
	/* See comment in IPv4 path. */
	ctx_change_type(ctx, PACKET_HOST);
#else
	ifindex = ctx_load_meta(ctx, CB_IFINDEX);
	if (ifindex)
		return redirect_ep(ctx, ifindex, from_host);
#endif /* ENABLE_ROUTING && TUNNEL_MODE && !ENABLE_NODEPORT */

So when ENABLE_NODEPORT is defined (it is only undefined on 4.9), we redirect the packet straight to the lxc device instead of passing it to the stack, and we therefore skip the FORWARD chain entirely. Only on 4.9 does the packet go up the stack and hit the default DROP policy.
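
A quick way to tell which side of this #if a given run takes is to check whether the agent is running with the kube-proxy replacement (and thus ENABLE_NODEPORT) enabled. A sketch, assuming the KubeProxyReplacement line printed by cilium status on recent releases:

# On the 4.9 job this is expected to show the replacement as disabled,
# i.e. ENABLE_NODEPORT is not compiled in.
$ kubectl exec -n kube-system cilium-zggcq -- cilium status --verbose | grep -i kubeproxyreplacement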

@joestringer
Member

I believe that K8sDatapathConfig is the only suite which may toggle endpoint routes mode on/off, and in this particular failure case the K8sDatapathConfig tests ran immediately prior to K8sFQDN. I wonder if we're just not cleaning up the environment thoroughly enough in the AfterAll / BeforeAll steps of one of these two contexts. This could explain why we don't see the failure more often: it requires particular groups of tests to be run in a particular order.

@pchaigno
Member Author

This could explain why we don't see the failure more often - it requires particular groups of tests to be run in a particular order.

Yep, but checking if DNS resolves is the first thing we do after any Cilium deployment AFAIK. So any test without endpoint routes running after a test with endpoint routes should fail.

@joestringer
Member

joestringer commented Jun 30, 2021

Seems like that should be established as part of this path:

DeployCiliumAndDNS(kubectl, ciliumFilename)

...

vm.RedeployKubernetesDnsIfNecessary()

...

err := kub.ValidateKubernetesDNS()

...

if err := kub.KubernetesDNSCanResolve("default", "kubernetes"); err != nil {

However, the failing line comes later than this, so whatever check we did above was functionally different from the actual DNS lookup.

Expect(err).Should(BeNil(), "Error reaching kube-dns before test: %s", err)

EDIT: Yep, we validate DNS from one of the hosts, not from pods:

cilium/test/helpers/kubectl.go

Lines 1747 to 1748 in eb9a5c4

cmd := fmt.Sprintf("dig +short %s @%s", serviceToResolve, kubeDnsService.Spec.ClusterIP)
res := kub.ExecInFirstPod(ctx, LogGathererNamespace, logGathererSelector(false), cmd)

I don't know off-hand how different host DNS resolution is, but it may provide some hints here.

@pchaigno
Member Author

I don't know off-hand how different host DNS resolution is but it may provide some hints here.

Hm. DNS resolution from a host-netns pod should behave the same, as long as it's on a different node than the CoreDNS pod. Once the request reaches the destination node via the tunnel, it's basically indistinguishable from a request from a pod (except from a policy point of view, but we're not concerned with that here).
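
To make the two paths concrete, here is a sketch of both lookups, reusing pod names and the kube-dns ClusterIP from this run:

# The kind of check the preflight validation performs: dig from a
# host-network log-gatherer pod, pointed directly at the kube-dns ClusterIP.
$ kubectl exec -n kube-system log-gatherer-kcfvr -- dig +short kube-dns.kube-system.svc.cluster.local @10.96.0.10

# The failing check: the same lookup from inside the application pod,
# going through the pod's own /etc/resolv.conf.
$ kubectl exec -n default app2-5cc5d58844-nv6wr -- dig +short kube-dns.kube-system.svc.cluster.local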

@joestringer
Member

joestringer commented Jun 30, 2021

I've looked into the following failures and also observed that there are per-endpoint routes for the DNS pod:

https://jenkins.cilium.io/job/cilium-master-k8s-1.21-kernel-4.9/544/testReport/junit/Suite-k8s-1/21/K8sDemosTest_Tests_Star_Wars_Demo/
https://jenkins.cilium.io/job/cilium-master-k8s-1.17-kernel-4.9/129/testReport/junit/Suite-k8s-1/17/K8sDemosTest_Tests_Star_Wars_Demo/
https://jenkins.cilium.io/job/cilium-master-k8s-1.21-kernel-4.9/546/testReport/junit/Suite-k8s-1/21/K8sDemosTest_Tests_Star_Wars_Demo/
https://jenkins.cilium.io/job/cilium-master-k8s-1.21-kernel-4.9/539/testReport/junit/Suite-k8s-1/21/K8sKafkaPolicyTest_Kafka_Policy_Tests_KafkaPolicies/

Here's a one-liner I've been using to establish which endpoints are configured with endpoint-routes mode in a CI sysdump:

$ for dir in $(find test_results/*/*/bugtool-cilium-*/cmd/state/[0-9]* -type d); do base64_decode $(grep BASE64 $dir/ep_config.h) | gron | grep -i -e req -e PodName | norg; done
{"K8sPodName":"xwing-6f56868789-gddzk"}
{"K8sPodName":"spaceship-6567c9b4bd-6gmg6"}
{"K8sPodName":"xwing-6f56868789-h8zzz"}
{"K8sPodName":""}
{"K8sPodName":""}
{"K8sPodName":"deathstar-595989bc5b-g92b4"}
{"K8sPodName":"xwing-6f56868789-cnvmc"}
{"K8sPodName":"spaceship-6567c9b4bd-9hm98"}
{"K8sPodName":"spaceship-6567c9b4bd-btd2x"}
{"K8sPodName":""}
{"K8sPodName":"deathstar-595989bc5b-l6w9h"}
{"K8sPodName":"spaceship-6567c9b4bd-9cxzj"}
{"DatapathConfiguration":{"require-egress-prog":true,"require-routing":false},"K8sPodName":"coredns-755cd654d4-6658j"}
{"K8sPodName":"prometheus-655fb888d7-dv5z4"}
{"K8sPodName":"deathstar-595989bc5b-phz25"}
{"K8sPodName":""}
{"K8sPodName":"grafana-5747bcc8f9-zqjbv"}

EDIT: Oh, and here are some useful pointers to get the repro above to work:

https://github.com/tomnomnom/gron

base64_decode()
{
    echo "$@" | sed -e 's/^.*://' | base64 -di | jq '.'
}
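
Putting the helper and the one-liner together, a self-contained variant that avoids the gron/norg aliases could look like this (a sketch; it assumes the same sysdump layout and BASE64 line format as above):

# For every endpoint state directory in the sysdump, decode the BASE64 blob
# embedded in ep_config.h and print the pod name plus any datapath overrides
# (endpoint-routes mode shows up under DatapathConfiguration).
for dir in $(find test_results/*/*/bugtool-cilium-*/cmd/state/[0-9]* -type d); do
    grep BASE64 "$dir/ep_config.h" | sed -e 's/^.*://' | base64 -di \
        | jq -c '{K8sPodName, DatapathConfiguration}'
done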

@pchaigno pchaigno removed their assignment Jul 2, 2021
@joestringer joestringer self-assigned this Jul 2, 2021
joestringer added a commit that referenced this issue Jul 2, 2021
In general up until now, Cilium has expected endpointRoutes mode to be
set to exactly one value upon deployment and for that value to stay the
same for the remainder of operation. Toggling it can lead to a mix of
endpoints in different datapath modes which is not well covered in CI.

In Github issue #16717 we observed that if the testsuite toggles this
setting then we can end up with kubedns pods remaining in endpoint
routes mode, even though the rest of the daemon (and other pods) are not
configured in this mode. This can lead to connectivity issues in DNS,
and a range of test failures in subsequent tests because DNS is broken.

Longer term to resolve this, we could improve on Cilium to ensure that
users can successfully toggle this setting on or off at runtime and
properly handle this case, or alternatively shift all logic over to
endpoint-routes mode by default and disable the other option.

Given that CI for the master branch is in a poor state due to this issue
today, and that part of the issue is CI reconfiguring the datapath state
of Cilium during the test setup in an unsupported manner, this commit
proposes to force DNS pod redeployment as part of setup any time a test
reconfigures the endpointRoutes mode. This should mitigate the testing
side issue while we mull over the right longer-term solution.

Signed-off-by: Joe Stringer <joe@cilium.io>
joestringer added a commit that referenced this issue Jul 6, 2021
joestringer added a commit that referenced this issue Jul 6, 2021
@pchaigno
Member Author

The fix at #16767 was not sufficient; #16835 should fix it.

@pchaigno pchaigno reopened this Jul 12, 2021
joestringer added a commit to joestringer/cilium that referenced this issue Jul 12, 2021
Commit a0e7712 ("test: Redeploy DNS after changing endpointRoutes")
didn't go quite far enough: It ensured that between individual tests in
a given file, the DNS pods would be redeployed during the next run if
there were significant enough datapath changes. However, the way it did
this was by storing state within the 'kubectl' variable, which is
recreated in each test file. So if the last test in one CI run enabled
endpoint routes mode, then the DNS pods would not be redeployed to
disable endpoint routes mode as part of the next test.

Fix it by redeploying DNS after removing Cilium from the cluster.
Kubernetes will remove the current DNS pods and reschedule them, but
they will not launch until the next test deploys a new version of
Cilium.

Reported-by: Chris Tarazi <chris@isovalent.com>
Fixes: 0e77127dcd7 ("test: Redeploy DNS after changing endpointRoutes")
Related: cilium#16717

Signed-off-by: Joe Stringer <joe@cilium.io>
kkourt pushed a commit that referenced this issue Jul 13, 2021
aanm pushed a commit to pchaigno/cilium that referenced this issue Jul 14, 2021
aanm pushed a commit that referenced this issue Jul 15, 2021
aanm pushed a commit that referenced this issue Jul 15, 2021
aanm pushed a commit that referenced this issue Jul 15, 2021
krishgobinath pushed a commit to krishgobinath/cilium that referenced this issue Oct 20, 2021
@pchaigno pchaigno self-assigned this Nov 28, 2021
nbusseneau pushed a commit that referenced this issue Dec 14, 2021
nbusseneau pushed a commit that referenced this issue Dec 14, 2021
tklauser pushed a commit that referenced this issue Dec 15, 2021
tklauser pushed a commit that referenced this issue Dec 15, 2021