Connectivity issues in Azure #12113

Closed
errordeveloper opened this issue Jun 16, 2020 · 12 comments · Fixed by #14452
@errordeveloper (Contributor)

There is something wrong with DNS in Azure; it's not yet clear what exactly - more details to follow.

One way it manifests itself is that pods deployed in kube-system, such as Hubble UI, fail to resolve $KUBERNETES_SERVICE_HOST. It turns out that in AKS the value of KUBERNETES_SERVICE_HOST is set to something like ilya-test--ilya-test-1-da2a1f-9923c925.hcp.westeurope.azmk8s.io for pods in kube-system, and to the more traditional service IP in all other namespaces.
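
For reference, the difference can be checked directly with something along these lines (a rough sketch; the pod name and busybox image are arbitrary choices):

$ kubectl run env-check --rm -i --restart=Never --image=busybox -n kube-system -- env | grep KUBERNETES_SERVICE_HOST
$ kubectl run env-check --rm -i --restart=Never --image=busybox -n default -- env | grep KUBERNETES_SERVICE_HOST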

Quite crucially, it appears that quite a few of the connectivity test pods are not reaching the ready state at all:

echo-a-558b9b6dc4-pmsh8                                  1/1     Running            0          5h42m
echo-b-59d5ff8b98-r4hx8                                  1/1     Running            0          5h42m
echo-b-host-f4bd98474-rbpgz                              1/1     Running            0          5h42m
host-to-b-multi-node-clusterip-7bb8b4f964-qgsf6          1/1     Running            52         5h42m
host-to-b-multi-node-headless-5c5676647b-56xbt           1/1     Running            50         5h42m
pod-to-a-646cccc5df-t8blr                                1/1     Running            101        5h42m
pod-to-a-allowed-cnp-56f4cfd999-fnppn                    0/1     CrashLoopBackOff   99         5h42m
pod-to-a-external-1111-7c5c99c6d9-gnmfk                  1/1     Running            0          5h42m
pod-to-a-l3-denied-cnp-5dc8d69b7f-q4nvb                  1/1     Running            0          5h42m
pod-to-b-intra-node-b9454c7c6-sc9lq                      0/1     CrashLoopBackOff   99         5h42m
pod-to-b-intra-node-nodeport-6cc56666dc-tmqt9            0/1     CrashLoopBackOff   100        5h42m
pod-to-b-multi-node-clusterip-754d5ff9d-9gzwg            0/1     CrashLoopBackOff   99         5h42m
pod-to-b-multi-node-headless-7876749b84-sz4zz            1/1     Running            46         5h42m
pod-to-b-multi-node-nodeport-6d8fc65c99-ld8hv            0/1     CrashLoopBackOff   99         5h42m
pod-to-external-fqdn-allow-google-cnp-6478db9cd9-d74xk   0/1     CrashLoopBackOff   99         5h42m
errordeveloper added the area/azure (Impacts Azure based IPAM) label on Jun 16, 2020
@errordeveloper (Contributor, Author)

I think this could be very much related to #11428, but the original report concerned only pod-to-external-fqdn-allow-google-cnp, so I think there is more going on.

@errordeveloper (Contributor, Author)

I'm using Hubble UI as a test right now, and I tried relocating the deployment to the default namespace to rule out the DNS issue. Now I'm seeing this in the logs:

{"name":"frontend","hostname":"hubble-ui-7d4fb6fb6c-n4cm7","pid":18,"req_id":"8247f0ed-f585-4c78-9642-2059227c2a03","user":"admin@localhost","level":50,"err":{"message":"Can't fetch namespaces via k8s api: Error: connect ETIMEDOUT 10.0.0.1:443","locations":[{"line":4,"column":7}],"path":["viewer","clusters"],"extensions":{"code":"INTERNAL_SERVER_ERROR"}},"msg":"","time":"2020-06-16T18:10:04.386Z","v":0}

So there is somehow an API server connectivity issue as well.

@errordeveloper (Contributor, Author) commented Jun 16, 2020

I was not able to kubectl run a container image and install all the tools needed, as package repo connectivity was broken, so I created an image of my own for this.

$ kubectl run -ti --image=errordeveloper/alpine-net-debug test-1 -n kube-system -- sh -l
If you don't see a command prompt, try pressing enter.
test-1:/# ping -c2 1.1.1.1
PING 1.1.1.1 (1.1.1.1) 56(84) bytes of data.
From 172.17.0.1 icmp_seq=1 Destination Host Unreachable
From 172.17.0.1 icmp_seq=2 Destination Host Unreachable

--- 1.1.1.1 ping statistics ---
2 packets transmitted, 0 received, +2 errors, 100% packet loss, time 1030ms
pipe 2
test-1:/# ping -c2 8.8.8.8
PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
From 172.17.0.1 icmp_seq=1 Destination Host Unreachable
From 172.17.0.1 icmp_seq=2 Destination Host Unreachable

--- 8.8.8.8 ping statistics ---
2 packets transmitted, 0 received, +2 errors, 100% packet loss, time 1002ms

test-1:/# cat /etc/resolv.conf 
nameserver 10.0.0.10
search kube-system.svc.cluster.local svc.cluster.local cluster.local xdbzucfb4y2ubdbavguxct3msh.ax.internal.cloudapp.net
options ndots:5
test-1:/# dig google.com

test-1:/# dig +short google.com
test-1:/# dig +short google.com
;; connection timed out; no servers could be reached
test-1:/# dig +short azure.microsoft.com
;; connection timed out; no servers could be reached
test-1:/# echo $KUBERNETES_SERVICE_HOST
ilya-test--ilya-test-1-da2a1f-9923c925.hcp.westeurope.azmk8s.io
test-1:/# dig +short $KUBERNETES_SERVICE_HOST
test-1:/# dig +short $KUBERNETES_SERVICE_HOST
test-1:/# dig +short kubernetes.default
test-1:/# curl https://10.0.0.1 # Kubernetes service IP
curl: (28) Failed to connect to 10.0.0.1 port 443: Operation timed out
test-1:/# ping -c2 10.240.0.64 # CoreDNS pod IP on another node
PING 10.240.0.64 (10.240.0.64) 56(84) bytes of data.
From 172.17.0.1 icmp_seq=1 Destination Host Unreachable
From 172.17.0.1 icmp_seq=2 Destination Host Unreachable

--- 10.240.0.64 ping statistics ---
2 packets transmitted, 0 received, +2 errors, 100% packet loss, time 1013ms
pipe 2
test-1:/# ping -c2 10.240.0.76  # CoreDNS pod IP on the same node as test pod
PING 10.240.0.76 (10.240.0.76) 56(84) bytes of data.
64 bytes from 10.240.0.76: icmp_seq=1 ttl=63 time=0.244 ms
64 bytes from 10.240.0.76: icmp_seq=2 ttl=63 time=0.071 ms

--- 10.240.0.76 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1002ms
rtt min/avg/max/mdev = 0.071/0.157/0.244/0.086 ms
test-1:/# ping -c2 10.240.0.4 # private node IP of a remote node
PING 10.240.0.4 (10.240.0.4) 56(84) bytes of data.
From 172.17.0.1 icmp_seq=1 Destination Host Unreachable
From 172.17.0.1 icmp_seq=2 Destination Host Unreachable

--- 10.240.0.4 ping statistics ---
2 packets transmitted, 0 received, +2 errors, 100% packet loss, time 1001ms

test-1:/# ping -c2 10.240.0.35 # private node IP of a remote node
PING 10.240.0.35 (10.240.0.35) 56(84) bytes of data.
From 172.17.0.1 icmp_seq=1 Destination Host Unreachable
From 172.17.0.1 icmp_seq=2 Destination Host Unreachable

--- 10.240.0.35 ping statistics ---
2 packets transmitted, 0 received, +2 errors, 100% packet loss, time 1012ms
pipe 2
test-1:/# ping -c2 10.240.0.66 # private node IP of the local node
PING 10.240.0.66 (10.240.0.66) 56(84) bytes of data.
64 bytes from 10.240.0.66: icmp_seq=1 ttl=64 time=0.120 ms
64 bytes from 10.240.0.66: icmp_seq=2 ttl=64 time=0.080 ms

--- 10.240.0.66 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1012ms
rtt min/avg/max/mdev = 0.080/0.100/0.120/0.020 ms
test-1:/# 

So it looks like only pods on the same node are reachable.

Here is what cilium-health reports:

$ kubectl exec -ti -n kube-system cilium-q99b5 -- cilium-health status --probe
Probe time:   2020-06-16T18:35:54Z
Nodes:
  aks-nodepool1-24840118-vmss000002 (localhost):
    Host connectivity to 10.240.0.66:
      ICMP to stack:   OK, RTT=1.115504ms
      HTTP to agent:   OK, RTT=183.4µs
    Endpoint connectivity to 10.240.0.95:
      ICMP to stack:   OK, RTT=313.001µs
      HTTP to agent:   OK, RTT=285.101µs
  aks-nodepool1-24840118-vmss000000:
    Host connectivity to 10.240.0.4:
      ICMP to stack:   OK, RTT=1.160904ms
      HTTP to agent:   OK, RTT=970.703µs
    Endpoint connectivity to 10.240.0.15:
      ICMP to stack:   OK, RTT=1.543006ms
      HTTP to agent:   OK, RTT=865.104µs
  aks-nodepool1-24840118-vmss000001:
    Host connectivity to 10.240.0.35:
      ICMP to stack:   OK, RTT=1.410105ms
      HTTP to agent:   OK, RTT=705.902µs
    Endpoint connectivity to 10.240.0.38:
      ICMP to stack:   OK, RTT=1.494305ms
      HTTP to agent:   OK, RTT=1.069204ms
$ kubectl exec -ti -n kube-system cilium-8q27l -- cilium-health status --probe
Probe time:   2020-06-16T18:36:25Z
Nodes:
  aks-nodepool1-24840118-vmss000000 (localhost):
    Host connectivity to 10.240.0.4:
      ICMP to stack:   OK, RTT=639.104µs
      HTTP to agent:   OK, RTT=496.804µs
    Endpoint connectivity to 10.240.0.15:
      ICMP to stack:   OK, RTT=592.404µs
      HTTP to agent:   OK, RTT=609.505µs
  aks-nodepool1-24840118-vmss000001:
    Host connectivity to 10.240.0.35:
      ICMP to stack:   OK, RTT=1.794313ms
      HTTP to agent:   OK, RTT=853.406µs
    Endpoint connectivity to 10.240.0.38:
      ICMP to stack:   OK, RTT=1.823113ms
      HTTP to agent:   OK, RTT=1.077109ms
  aks-nodepool1-24840118-vmss000002:
    Host connectivity to 10.240.0.66:
      ICMP to stack:   OK, RTT=1.721013ms
      HTTP to agent:   OK, RTT=646.805µs
    Endpoint connectivity to 10.240.0.95:
      ICMP to stack:   OK, RTT=1.953914ms
      HTTP to agent:   OK, RTT=904.207µs
$ kubectl exec -ti -n kube-system cilium-ttb5g -- cilium-health status --probe
Probe time:   2020-06-16T18:36:37Z
Nodes:
  aks-nodepool1-24840118-vmss000001 (localhost):
    Host connectivity to 10.240.0.35:
      ICMP to stack:   OK, RTT=254.501µs
      HTTP to agent:   OK, RTT=245.801µs
    Endpoint connectivity to 10.240.0.38:
      ICMP to stack:   OK, RTT=283.601µs
      HTTP to agent:   OK, RTT=270.401µs
  aks-nodepool1-24840118-vmss000000:
    Host connectivity to 10.240.0.4:
      ICMP to stack:   OK, RTT=1.142304ms
      HTTP to agent:   OK, RTT=1.010304ms
    Endpoint connectivity to 10.240.0.15:
      ICMP to stack:   OK, RTT=1.173504ms
      HTTP to agent:   OK, RTT=1.225004ms
  aks-nodepool1-24840118-vmss000002:
    Host connectivity to 10.240.0.66:
      ICMP to stack:   OK, RTT=1.097704ms
      HTTP to agent:   OK, RTT=610.002µs
    Endpoint connectivity to 10.240.0.95:
      ICMP to stack:   OK, RTT=1.152604ms
      HTTP to agent:   OK, RTT=977.904µs
$

@errordeveloper (Contributor, Author) commented Jun 16, 2020

Here is the sysdump: cilium-sysdump-20200616-194701.zip

More general info:

@christarazi (Member) commented Jun 17, 2020

Deployed another cluster to reproduce the issue @errordeveloper was observing. I ran the following command to see each node's routing table:

(Note the Azure CNI DS has not been touched).

❯ ./contrib/k8s/k8s-cilium-exec.sh bash -c "ip r show table main && hostname && echo"
default via 10.240.0.1 dev eth0
10.240.0.0/16 dev eth0 proto kernel scope link src 10.240.0.35
10.240.0.37 dev lxcdf3c1a24b218 scope link
10.240.0.45 dev lxcf8fa2a919766 scope link
10.240.0.55 dev lxc87bf1643715f scope link
10.240.0.56 dev lxc_health scope link
10.240.0.62 dev lxce6ef6a5e72b3 scope link
10.240.0.65 dev lxc6a0641eb0d0d scope link
168.63.129.16 via 10.240.0.1 dev eth0
169.254.169.254 via 10.240.0.1 dev eth0
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
aks-nodepool1-26249792-vmss000001

default via 10.240.0.1 dev eth0
10.240.0.0/16 dev eth0 proto kernel scope link src 10.240.0.4
10.240.0.6 dev lxc_health scope link
10.240.0.8 dev lxc97df74ed6c39 scope link
10.240.0.10 dev lxc077cf2fcd176 scope link
10.240.0.11 dev lxca8b7289fe1f6 scope link
10.240.0.22 dev lxc1e0b3888a2ed scope link
10.240.0.33 dev lxc458594a67dd7 scope link
168.63.129.16 via 10.240.0.1 dev eth0
169.254.169.254 via 10.240.0.1 dev eth0
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
aks-nodepool1-26249792-vmss000000

Unable to use a TTY - input is not a terminal or the right kind of file
default via 10.240.0.1 dev azure0
10.240.0.0/16 dev azure0 proto kernel scope link src 10.240.0.66
10.240.0.71 dev lxc165937946b54 scope link
10.240.0.73 dev lxc9ac55f96650f scope link
10.240.0.75 dev lxc3482a9784d06 scope link
10.240.0.81 dev lxc_health scope link
10.240.0.85 dev lxcc2aca6fe1928 scope link
10.240.0.88 dev lxc650b49ced0b8 scope link
10.240.0.92 dev lxcb5057f5a759e scope link
10.240.0.96 dev lxcc31671d0163a scope link
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
aks-nodepool1-26249792-vmss000002

As you can see, there are 3 nodes in this cluster. Only one of them has the azure0 device: the ...000002 node, let's call it C. The other nodes are A and B.

You can see in C's routing table that the azure0 device is used for the default route as well as for routing 10.240.0.0/16. What's interesting is that on nodes A and B, eth0 handles that CIDR and is also the device on the default route. So something is definitely borked with node C.

Anyway, my goal was to test communication between all the nodes. My finding is that pod-to-pod communication between A and B works fine, while any communication between C and A or B ends up with ICMP timing out (this is also where the unreachable-host errors come from). This is very likely the root cause of the issue. What we still don't know is why node C has azure0 and the others don't. Hopefully that's helpful. A quick way to check which device each node uses is sketched below.
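
A rough sketch for checking this on all nodes at once, reusing the same helper script (not verified against this exact cluster):

$ ./contrib/k8s/k8s-cilium-exec.sh bash -c "hostname; ip route show default; ip -o link show azure0 2>/dev/null || echo 'no azure0'"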

@errordeveloper (Contributor, Author)

The docs weren't marked as beta; I added that in #12108. I've spoken to @tgraf, and since this is a beta feature, it doesn't have to be a release blocker.

@jrajahalme (Member)

Still having some connectivity issues. I had to restart unmanaged pods twice to get kubectl exec to the cilium pods working. That seemed to fix the pod-to-external-fqdn-allow-google-cnp failure (either no connectivity to DNS, or a DNS failure). The remaining issues are with pod-to-a-allowed-cnp (verified no policy drops; the SYN/ACK never gets back to the source pod) and pod-to-b-multi-node-headless (no policy enforcement; the SYN/ACK never gets back to the source pod). A sketch of how such a drop check can be run is included after the pod listing below.

$ kubectl get pods --all-namespaces -o wide
NAMESPACE     NAME                                                     READY   STATUS             RESTARTS   AGE    IP            NODE                                NOMINATED NODE   READINESS GATES
cilium-test   echo-a-76c5d9bd76-mdhlw                                  1/1     Running            0          28m    10.240.0.62   aks-nodepool1-22410938-vmss000001   <none>           <none>
cilium-test   echo-b-795c4b4f76-vd9hv                                  1/1     Running            0          39m    10.240.0.8    aks-nodepool1-22410938-vmss000000   <none>           <none>
cilium-test   echo-b-host-6b7fc94b7c-8hwd5                             1/1     Running            0          65m    10.240.0.4    aks-nodepool1-22410938-vmss000000   <none>           <none>
cilium-test   host-to-b-multi-node-clusterip-85476cd779-k5r4l          1/1     Running            12         65m    10.240.0.35   aks-nodepool1-22410938-vmss000001   <none>           <none>
cilium-test   host-to-b-multi-node-headless-dc6c44cb5-wzsgb            1/1     Running            12         65m    10.240.0.35   aks-nodepool1-22410938-vmss000001   <none>           <none>
cilium-test   pod-to-a-79546bc469-gxdpd                                1/1     Running            0          39m    10.240.0.26   aks-nodepool1-22410938-vmss000000   <none>           <none>
cilium-test   pod-to-a-allowed-cnp-58b7f7fb8f-dlbdh                    0/1     CrashLoopBackOff   13         38m    10.240.0.17   aks-nodepool1-22410938-vmss000000   <none>           <none>
cilium-test   pod-to-a-denied-cnp-6967cb6f7f-4g4fq                     1/1     Running            0          37m    10.240.0.23   aks-nodepool1-22410938-vmss000000   <none>           <none>
cilium-test   pod-to-b-intra-node-nodeport-9b487cf89-mh5rx             1/1     Running            0          37m    10.240.0.11   aks-nodepool1-22410938-vmss000000   <none>           <none>
cilium-test   pod-to-b-multi-node-clusterip-7db5dfdcf7-4q2zc           1/1     Running            0          36m    10.240.0.55   aks-nodepool1-22410938-vmss000001   <none>           <none>
cilium-test   pod-to-b-multi-node-headless-7d44b85d69-z5ldz            0/1     CrashLoopBackOff   13         36m    10.240.0.47   aks-nodepool1-22410938-vmss000001   <none>           <none>
cilium-test   pod-to-b-multi-node-nodeport-7ffc76db7c-vqjcx            1/1     Running            1          36m    10.240.0.52   aks-nodepool1-22410938-vmss000001   <none>           <none>
cilium-test   pod-to-external-1111-d56f47579-hzkq7                     1/1     Running            0          36m    10.240.0.7    aks-nodepool1-22410938-vmss000000   <none>           <none>
cilium-test   pod-to-external-fqdn-allow-google-cnp-78986f4bcf-jc9xc   1/1     Running            0          35m    10.240.0.27   aks-nodepool1-22410938-vmss000000   <none>           <none>
kube-system   azure-cni-networkmonitor-jxmgn                           1/1     Running            0          164m   10.240.0.35   aks-nodepool1-22410938-vmss000001   <none>           <none>
kube-system   azure-cni-networkmonitor-tms59                           1/1     Running            0          165m   10.240.0.4    aks-nodepool1-22410938-vmss000000   <none>           <none>
kube-system   azure-ip-masq-agent-2kzj8                                1/1     Running            0          165m   10.240.0.4    aks-nodepool1-22410938-vmss000000   <none>           <none>
kube-system   azure-ip-masq-agent-9jtfq                                1/1     Running            0          164m   10.240.0.35   aks-nodepool1-22410938-vmss000001   <none>           <none>
kube-system   cilium-node-init-6sds9                                   1/1     Running            0          72m    10.240.0.35   aks-nodepool1-22410938-vmss000001   <none>           <none>
kube-system   cilium-node-init-wstkb                                   1/1     Running            0          72m    10.240.0.4    aks-nodepool1-22410938-vmss000000   <none>           <none>
kube-system   cilium-operator-6655dcd688-c5qbm                         1/1     Running            0          72m    10.240.0.4    aks-nodepool1-22410938-vmss000000   <none>           <none>
kube-system   cilium-operator-6655dcd688-k5ws2                         1/1     Running            0          72m    10.240.0.35   aks-nodepool1-22410938-vmss000001   <none>           <none>
kube-system   cilium-wjp44                                             1/1     Running            0          72m    10.240.0.35   aks-nodepool1-22410938-vmss000001   <none>           <none>
kube-system   cilium-wspxs                                             1/1     Running            0          72m    10.240.0.4    aks-nodepool1-22410938-vmss000000   <none>           <none>
kube-system   coredns-869cb84759-l4nf2                                 1/1     Running            0          35m    10.240.0.34   aks-nodepool1-22410938-vmss000000   <none>           <none>
kube-system   coredns-869cb84759-nt5s4                                 1/1     Running            0          35m    10.240.0.51   aks-nodepool1-22410938-vmss000001   <none>           <none>
kube-system   coredns-autoscaler-5b867494f-kkjfm                       1/1     Running            0          35m    10.240.0.63   aks-nodepool1-22410938-vmss000001   <none>           <none>
kube-system   dashboard-metrics-scraper-5ddb5bf5c8-f6hbl               1/1     Running            0          35m    10.240.0.43   aks-nodepool1-22410938-vmss000001   <none>           <none>
kube-system   kube-proxy-p2txn                                         1/1     Running            0          164m   10.240.0.35   aks-nodepool1-22410938-vmss000001   <none>           <none>
kube-system   kube-proxy-rcbtt                                         1/1     Running            0          165m   10.240.0.4    aks-nodepool1-22410938-vmss000000   <none>           <none>
kube-system   kubernetes-dashboard-5596bdb9f-g2bj9                     1/1     Running            0          35m    10.240.0.44   aks-nodepool1-22410938-vmss000001   <none>           <none>
kube-system   metrics-server-5f4c878d8-8rxt2                           1/1     Running            0          35m    10.240.0.42   aks-nodepool1-22410938-vmss000001   <none>           <none>
kube-system   tunnelfront-787b4b7fc-gz7kv                              1/1     Running            0          35m    10.240.0.38   aks-nodepool1-22410938-vmss000001   <none>           <none>
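
A rough sketch of how the drop check mentioned above can be run, using one of the cilium agent pods from the listing (which agent pod to pick depends on where the failing client pod is scheduled):

$ kubectl -n kube-system exec -ti cilium-wspxs -- cilium monitor --type drop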

@ti-mo (Contributor) commented Nov 18, 2020

I've seen similar connectivity issues with both CNI chaining and pure Cilium Azure IPAM. I've traced the root cause back to the default azure-vnet CNI plugin installing ebtables rules in the host netns. When Azure IPAM is enabled, azure-vnet is taken out of the active CNI chain completely, so CNI DEL events are no longer handled by azure-vnet, and these ebtables rules are not cleaned up when pods are removed. In my case, I also had some dangling routes for the affected addresses pointing to azure0.

Note: the versions of ebtables, ebtables-legacy and/or ebtables-nft (as well as their -save commands) that we ship with Cilium are incompatible with the current AKS kernel (4.15). You might need to SSH into the host and run ebtables-save there, or the nat and broute tables won't show up. Alternatively, ebtables-legacy -L -t nat (and -t broute) could work, but make sure it's the ioctl version, not the one that uses netlink.
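
For anyone inspecting this on a node, a rough sketch (run on the AKS host itself, e.g. over SSH, assuming the ioctl-based legacy tool is present there):

$ ebtables-legacy -t nat -L
$ ebtables-legacy -t broute -L
$ ip route show dev azure0    # look for dangling per-pod routes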

I'm currently gathering all the findings in these issues and trying to reproduce them, so we can work on a more watertight solution. I will link this into the overarching issue.

@ti-mo (Contributor) commented Nov 19, 2020

Another thing to add here: we might have to add tunnelfront to the list of Pods to recreate post-install, since all apiserver -> kubelet communication seems to pass through this service. I've had trouble running kubectl exec against Pods scheduled on nodes other than the one tunnelfront was scheduled on. The apiserver could only initiate new connections to the node where tunnelfront was running.
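
Assuming tunnelfront runs as a regular Deployment in kube-system (as the pod listings above suggest), recreating it could look something like:

$ kubectl -n kube-system rollout restart deployment tunnelfront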

@dctrwatson (Contributor)

Also ran into weird connectivity behavior when trying out the new AKS node image, AKSUbuntu-1804-2020.10.28, which is the new default in 1.18. The kernel it uses is 5.4.0-1031-azure, so most (all?) eBPF features are enabled by default.

Once I moved the tunnelfront pod back to a node running AKSUbuntu-1604-2020.09.23 / 4.15.0-1096-azure, kubectl exec/logs worked normally again too.

@ti-mo (Contributor) commented Nov 20, 2020

Another potential issue I've discovered that leads to unreachable Pods is addressed in #14105. By touching /var/run/azure-vnet.json, we trigger a behavioural change in azure-vnet that leads to static ARP entries not being removed on pod delete.
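
On an affected node, such leftover static entries should show up as permanent neighbours; a quick check (assuming they sit on azure0) might look like:

$ ip neigh show dev azure0 nud permanent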

The issue of leftover ebtables rules not being cleaned up is currently isolated to running with Azure IPAM and will be addressed separately.

@ti-mo (Contributor) commented Nov 20, 2020

Once I moved the tunnelfront pod back to a node running AKSUbuntu-1604-2020.09.23 / 4.15.0-1096-azure, kubectl exec/logs worked normally again too.

@dctrwatson Thanks for the report! I will create a separate issue about this. The failure is likely unrelated to the kernel version it's running on and is more likely due to the leftover state described above. I will troubleshoot this in isolation after some other fixes have gone in, although restarting tunnelfront is required anyway to get visibility into its traffic.
