Master to Pod communication is broken in kube-flannel #535

Closed
tamalsaha opened this issue Oct 22, 2016 · 22 comments

@tamalsaha

I am trying to set up a Kubernetes cluster using kube-flannel with the vxlan backend. Node-to-node communication is working, but master-to-pod networking is not. I am not a Linux networking expert, but I can see that the master's flannel.1 interface is assigned the network address, and this seems to be causing issues with ARP.

# cat /run/flannel/subnet.env
FLANNEL_NETWORK=10.244.0.0/16
FLANNEL_SUBNET=10.244.0.1/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=true
# ip route show
default via 159.203.160.1 dev eth0 
10.17.0.0/16 dev eth0  proto kernel  scope link  src 10.17.0.8 
10.132.0.0/16 dev eth1  proto kernel  scope link  src 10.132.22.4 
10.244.0.0/16 dev flannel.1  proto kernel  scope link  src 10.244.0.0
159.203.160.0/20 dev eth0  proto kernel  scope link  src 159.203.168.74 
172.17.0.0/16 dev docker0  proto kernel  scope link  src 172.17.0.1 
root@k-211935-master:~# tcpdump -e -i flannel.1 -n arp
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on flannel.1, link-type EN10MB (Ethernet), capture size 262144 bytes
07:03:26.552296 96:f0:7d:42:39:7c > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 10.244.0.1 tell 10.244.0.0, length 28
07:03:27.552313 96:f0:7d:42:39:7c > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 10.244.0.1 tell 10.244.0.0, length 28
07:03:27.552326 96:f0:7d:42:39:7c > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 10.244.1.0 tell 10.244.0.0, length 28
07:03:28.552290 96:f0:7d:42:39:7c > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 10.244.0.1 tell 10.244.0.0, length 28
07:03:28.552307 96:f0:7d:42:39:7c > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 10.244.1.0 tell 10.244.0.0, length 28
07:03:28.560535 12:05:88:6f:fb:01 > 96:f0:7d:42:39:7c, ethertype ARP (0x0806), length 42: Request who-has 10.244.0.1 tell 10.244.1.0, length 28
07:03:29.560472 12:05:88:6f:fb:01 > 96:f0:7d:42:39:7c, ethertype ARP (0x0806), length 42: Request who-has 10.244.0.1 tell 10.244.1.0, length 28
07:03:30.560456 12:05:88:6f:fb:01 > 96:f0:7d:42:39:7c, ethertype ARP (0x0806), length 42: Request who-has 10.244.0.1 tell 10.244.1.0, length 28

The problem seems to be that the master's flannel.1 is assigned the first IP of subnet zero (10.244.0.0), which is indistinguishable from the network address of 10.244.0.0/16. Can you please confirm that this breaks master-to-pod communication?

I am thinking about using the next Subnet of Node.Spec.PodCIDR in kubeSubnetManager. Will that fix this issue?
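To make the symptom concrete, here is a minimal check on the master, using the addresses already shown above (the commands themselves are illustrative, not part of the original report):

# What address did flannel give its vxlan device?
ip -4 addr show dev flannel.1
#   inet 10.244.0.0/...          <- the network address of 10.244.0.0/16
# What lease did flannel record for this host?
cat /run/flannel/subnet.env
#   FLANNEL_SUBNET=10.244.0.1/24 <- the first usable address of the master's /24 lease
# The suspicion: because flannel.1 carries 10.244.0.0, which is indistinguishable from the
# /16 network address, ARP resolution over flannel.1 never completes and master-to-pod traffic fails.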

cc: @mikedanese

@tamalsaha
Author

Things seem to be working after applying https://github.com/appscode/flannel/commit/b083788405ce2bf3c34b9d4df7b5d77afc865b4e

@tomdee
Contributor

tomdee commented Oct 26, 2016

@tamalsaha Can you share some of the Kubernetes commands you were using to repro this? Where are you pinging to and from: is it from your master node to a pod on a different node?

@tamalsaha
Author

@tomdee, I was pinging from master to a pod running on a different node.

@tamalsaha
Author

@tomdee I just ran an nginx pod, then tried to wget it from the master host directly.

kubectl run my-nginx --image=nginx --replicas=2 --port=80
kubectl expose deployment my-nginx --port=80 

wget http://<pod-ip>:80
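The pod IP used in the wget can be looked up with kubectl, as in the output later in this thread (the exact command here is illustrative):

# The IP column shows the flannel-assigned pod address that the wget above targets.
kubectl get pods -o wide | grep my-nginx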

@autostatic

@tamalsaha, I tried your fix but the pods still can't communicate properly with each other:
E1115 12:31:12.646494 1 reflector.go:214] pkg/dns/dns.go:155: Failed to list *api.Endpoints: Get https://10.96.0.1:443/api/v1/endpoints?resourceVersion=0: dial tcp 10.96.0.1:443: connect: network is unreachable

@tamalsaha
Author

@autostatic, can you explain your test case a bit more so that I can recreate it?

@autostatic

autostatic commented Nov 15, 2016

I tested this on a small bare-metal Ubuntu 16.04 cluster on OpenStack: one master, two nodes, K8s 1.4.6. I used kubeadm to deploy the cluster, so as long as there is no pod network the kube-dns pod will not start up completely. I then deployed the kube-flannel.yml from https://github.com/coreos/flannel/blob/master/Documentation/kube-flannel.yml with the necessary modifications, using a Flannel Docker image with your patch. After deploying it, the kube-dns pod still reports the errors I posted above.
If there are better ways to test this, I'd love to know. My main goal is to run plain Flannel as an add-on. I could use Canal, but for some setups I'd prefer plain Flannel since I don't always need Calico.
Here's the kube-flannel.yml I'm using:

kind: ConfigMap
apiVersion: v1
metadata:
  name: kube-flannel-cfg
  namespace: kube-system
  labels:
    tier: node
    app: flannel
data:
  cni-conf.json: |
    {
      "name": "flannelnet",
      "type": "flannel",
      "delegate": {
        "isGateway": true
      }
    }
  net-conf.json: |
    {
      "Network": "10.244.0.0/22",
      "SubnetLen": 24,
      "Backend": {
        "Type": "vxlan"
      }
    }
---
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: kube-flannel-ds
  namespace: kube-system
  labels:
    tier: node
    app: flannel
spec:
  template:
    metadata:
      labels:
        tier: node
        app: flannel
    spec:
      hostNetwork: true
      imagePullSecrets:
        - name: flanneld-registry
      containers:
      - name: kube-flannel
#        image: quay.io/coreos/flannel-git:latest
        image: my.private.gitlab.registry/autostatic/flanneld:20161114
        command: [ "/opt/bin/flanneld" ]
        args: [ "-ip-masq", "-kube-subnet-mgr" ]
        securityContext:
          privileged: true
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        volumeMounts:
        - name: run
          mountPath: /run
        - name: hosts
          mountPath: /etc/hosts
        - name: flannel-cfg
          mountPath: /etc/kube-flannel/
      - name: install-cni
        image: busybox
        command: [ "/bin/sh", "-c", "set -e -x; TMP=/etc/cni/net.d/.tmp-flannel-cfg; cp /etc/kube-flannel/cni-conf.json ${TMP}; mv ${TMP} /etc/cni/net.d/10-flannel.conf; while :; do sleep 3600; done" ]
        volumeMounts:
        - name: cni
          mountPath: /etc/cni/net.d
        - name: flannel-cfg
          mountPath: /etc/kube-flannel/
      volumes:
        - name: run
          hostPath:
            path: /run
        - name: hosts
          hostPath:
            path: /etc/hosts
        - name: cni
          hostPath:
            path: /etc/cni/net.d
        - name: flannel-cfg
          configMap:
            name: kube-flannel-cfg

@tamalsaha
Author

tamalsaha commented Nov 15, 2016

@autostatic, I am not sure you are having the same issue I was. The issue I faced was that pods running on the master (with hostNetwork: true in my case) could not connect to pods on regular nodes via their pod IPs.

From your log, it seems that the DNS pod running on a regular node can't connect to the kube-apiserver (https://10.96.0.1:443). If I were you, I would first confirm that the flannel network is actually working as intended. One way to check is to see whether you can ping the IP address of the flannel bridge on the master from the node running the DNS pod.
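A minimal way to run that check, assuming the 10.244.0.0/16 pod network used throughout this thread (interface names and addresses are illustrative):

# On the master: note the address of the flannel interface (and of the cbr0/cni0 bridge, if it exists).
ip -4 addr show dev flannel.1
ip -4 addr show dev cbr0 2>/dev/null || true

# On the node running the kube-dns pod: ping the address printed above, e.g.
ping -c 3 10.244.0.1    # substitute the master's actual flannel/bridge address
# If this fails, the overlay itself is broken and kube-dns will also fail to reach the apiserver service IP.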

FYI, I also had to make some changes to cni-conf.json. You can see my changes here: https://github.com/appscode/kubernetes/commit/ee660dc997f7ae5042033f226b4416d4513b5422. The important thing was ensuring Kubernetes uses the bridge created by flannel; without that, pods are disconnected from the flannel overlay network. It would be helpful to see the output of ifconfig from one of your regular nodes.
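For reference, a sketch of what the installed CNI config ends up looking like with that kind of change, assuming the isDefaultGateway delegate that later configs in this thread use (illustrative only, not the exact contents of the linked commit):

# The install-cni container copies the ConfigMap entry to this path on every host.
cat /etc/cni/net.d/10-flannel.conf
# {
#   "name": "cbr0",
#   "type": "flannel",
#   "delegate": {
#     "isDefaultGateway": true
#   }
# }
# With the flannel CNI plugin delegating to the bridge plugin this way, pod traffic goes through
# the flannel-managed bridge and onto the overlay.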

If you are unfamiliar with the CNI conf options, the CNI docs will be handy.

@autostatic

autostatic commented Nov 16, 2016

Hello @tamalsaha, thanks for the feedback. I made the changes to the CNI config, and then Flannel came up successfully, DNS started working, and I could deploy a working Dashboard. I don't have a flannel bridge on my master, though; could that be related to the hairpin setting?
So it indeed looks like I was facing a different issue. Many thanks for the pointers in the right direction!
FWIW, an ifconfig from one of the nodes now looks like this:

cbr0      Link encap:Ethernet  HWaddr 0a:58:0a:f4:03:01  
          inet addr:10.244.3.1  Bcast:0.0.0.0  Mask:255.255.255.0
          inet6 addr: fe80::7413:5eff:fec0:2743/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1450  Metric:1
          RX packets:39641 errors:0 dropped:0 overruns:0 frame:0
          TX packets:41818 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:7529077 (7.5 MB)  TX bytes:4309168 (4.3 MB)

docker0   Link encap:Ethernet  HWaddr 02:42:8c:ed:2b:2a  
          inet addr:172.17.0.1  Bcast:0.0.0.0  Mask:255.255.0.0
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

ens3      Link encap:Ethernet  HWaddr fa:16:3e:d3:c3:67  
          inet addr:172.16.172.101  Bcast:172.16.172.255  Mask:255.255.255.0
          inet6 addr: fe80::f816:3eff:fed3:c367/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:522692 errors:0 dropped:0 overruns:0 frame:0
          TX packets:562404 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:776695900 (776.6 MB)  TX bytes:132399137 (132.3 MB)

flannel.1 Link encap:Ethernet  HWaddr 16:bd:ac:f0:fb:59  
          inet addr:10.244.3.0  Bcast:0.0.0.0  Mask:255.255.252.0
          inet6 addr: fe80::14bd:acff:fef0:fb59/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1450  Metric:1
          RX packets:10212 errors:0 dropped:0 overruns:0 frame:0
          TX packets:6787 errors:0 dropped:8 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:429606 (429.6 KB)  TX bytes:1063174 (1.0 MB)

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:162 errors:0 dropped:0 overruns:0 frame:0
          TX packets:162 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1 
          RX bytes:13460 (13.4 KB)  TX bytes:13460 (13.4 KB)

veth673f5af5 Link encap:Ethernet  HWaddr aa:7e:f3:33:8b:1b  
          inet6 addr: fe80::a87e:f3ff:fe33:8b1b/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1450  Metric:1
          RX packets:39562 errors:0 dropped:0 overruns:0 frame:0
          TX packets:41781 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:8072264 (8.0 MB)  TX bytes:4297709 (4.2 MB)

vetha2245abc Link encap:Ethernet  HWaddr de:35:55:89:4c:e1  
          inet6 addr: fe80::dc35:55ff:fe89:4ce1/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1450  Metric:1
          RX packets:3 errors:0 dropped:0 overruns:0 frame:0
          TX packets:11 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:258 (258.0 B)  TX bytes:858 (858.0 B)

vetha99a743a Link encap:Ethernet  HWaddr 56:56:b5:b2:55:00  
          inet6 addr: fe80::5456:b5ff:feb2:5500/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1450  Metric:1
          RX packets:3 errors:0 dropped:0 overruns:0 frame:0
          TX packets:15 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:258 (258.0 B)  TX bytes:1158 (1.1 KB)

@tamalsaha
Author

@autostatic I am glad that your cluster is working. The flannel bridge gets created the first time the CNI plugin is called. Since Kubernetes does not run regular pods on the master, the cbr0 bridge has not been created there yet.
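A quick way to see this, assuming the interface names from the ifconfig output above (the commands are illustrative):

# On the master: the bridge does not exist until a pod is scheduled there through the CNI plugin.
ip link show cbr0 2>/dev/null || echo "cbr0 not created yet"
# On a regular node: the bridge exists and holds the node's pod-subnet gateway, e.g. 10.244.3.1/24 above.
ip -4 addr show dev cbr0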

It also seems that you don't need my patch. I needed it because we run an HAProxy-based ingress controller on the master that load balances across pods on regular nodes, so HAProxy on the master has to be able to connect to pods on regular nodes.

@autostatic

Hi @tamalsaha, I did some more tests, including a couple of fresh deployments, and without your patch the cluster is not functional: I can't ping the other nodes from the master. If I do a deployment with a patched Flannel, the cluster comes up properly.

@tamalsaha
Author

Yes, if you want to ping regular nodes from the master, you need this patch.

@autostatic

#560 fixes my issues.

@mattenklicker

I have the same problem: the first Kubernetes node gets the network address 10.244.0.0 of the 10.244.0.0/16 network assigned, so it is not reachable from the other nodes. NodePort services that I want to reach via the first node are unreachable when the service itself runs on another node. I can see packets leaving from 10.244.0.0 towards the other nodes, but no returning packets, because they are not routable.
The above patch (#535 (comment)) skips the network address, but networking didn't work for me after applying it, at least in an existing cluster, and there is no check for duplicate addresses.
Perhaps an explicit route like 10.244.0.0/32 dev flannel.1 on all the other nodes would work, but I did not test that, and it doesn't look nice when a 10.244.0.0/16 dev flannel.1 route already exists.
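For completeness, the untested workaround mentioned above would look roughly like this on each of the other nodes (a sketch of the idea only; as noted, it has to be maintained by hand and overlaps the existing overlay route):

# Host route for the first node's flannel address; as a /32 it is more specific than the
# existing 10.244.0.0/16 dev flannel.1 route, so it wins for that one destination.
ip route add 10.244.0.0/32 dev flannel.1
ip route show dev flannel.1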

@tamalsaha
Author

tamalsaha commented Jan 26, 2017

@mattenklicker, which version are you using? https://github.com/coreos/flannel/releases/tag/v0.7.0 is supposed to fix this issue.

@mattenklicker

@tamalsaha v0.7.0

@samarjit

samarjit commented Jan 28, 2017

Update: I created a GitHub project that sets up this environment: https://github.com/samarjit/vagrant-kubeadm

I am using v0.7.0 too, but I am having the same issue: master-to-slave node communication fails.

[root@kmaster ~]# kubectl get pods -o wide
NAME                                READY     STATUS    RESTARTS   AGE       IP              NODE
hello-deployment-1725651635-1nnnx   1/1       Running   0          10m       10.244.1.4      kslave
hello-deployment-1725651635-dh3r6   1/1       Running   0          10m       10.244.1.3      kslave
hello-deployment-1725651635-smtx8   1/1       Running   0          10m       10.244.0.2      kmaster
kube-flannel-ds-bklmr               2/2       Running   0          43m       192.168.33.10   kmaster
kube-flannel-ds-m0lbd               2/2       Running   2          35m       192.168.33.11   kslave
[root@kmaster ~]#

Pinging 10.244.0.2 -> 10.244.1.4 (master to slave) does not work.

On the master node, querying DNS directly fails:

[root@kmaster ~]# dig +short  @10.96.0.10 _http._tcp.hello-service.default.svc.cluster.local SRV
;; connection timed out; no servers could be reached
[root@kmaster ~]#
[root@kmaster ~]#  tcpdump -e -i flannel.1 -n arp
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on flannel.1, link-type EN10MB (Ethernet), capture size 65535 bytes
06:23:47.701820 0e:94:71:89:36:90 > 96:5a:33:93:7c:6f, ethertype ARP (0x0806), length 42: Request who-has 10.244.1.2 tell 10.244.0.0, length 28
06:23:47.701832 0e:94:71:89:36:90 > 96:5a:33:93:7c:6f, ethertype ARP (0x0806), length 42: Request who-has 10.244.1.2 tell 10.244.0.0, length 28
06:23:48.703932 0e:94:71:89:36:90 > 96:5a:33:93:7c:6f, ethertype ARP (0x0806), length 42: Request who-has 10.244.1.2 tell 10.244.0.0, length 28

If I run tcpdump on the slave node, no packets are received.

I followed the DNS testing steps described in https://kubernetes.io/docs/admin/dns/, and that works:

[root@kmaster ~]# kubectl exec -ti busybox -- nslookup kubernetes.default
Server:    10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local

Name:      kubernetes.default
Address 1: 10.96.0.1 kubernetes.default.svc.cluster.local
[root@kmaster ~]#

I am starting kubeadm using the following script.

kubeadm init --api-advertise-addresses=192.168.33.10 --token=7baee4.d576223cb4884c9b --pod-network-cidr="10.244.0.0/16"
jq \
   '.spec.containers[0].command |= .+ ["--advertise-address=192.168.33.10"]' \
   /etc/kubernetes/manifests/kube-apiserver.json > /tmp/kube-apiserver.json
mv /tmp/kube-apiserver.json /etc/kubernetes/manifests/kube-apiserver.json


kubectl -n kube-system get ds -l 'component=kube-proxy' -o json \
  | jq '.items[0].spec.template.spec.containers[0].command |= .+ ["--proxy-mode=userspace","--cluster-cidr=10.244.0.0/16"]' \
  |   kubectl apply -f - && kubectl -n kube-system delete pods -l 'component=kube-proxy'
  cp /etc/kubernetes/admin.conf /vagrant

kube-flannel.yml:

  cni-conf.json: |
    {
      "name": "cbr0",
      "type": "flannel",
      "delegate": {
        "isDefaultGateway": true
      }
    }
  net-conf.json: |
    {
      "Network": "10.244.0.0/16",
      "Backend": {
        "Type": "vxlan"
      }
    }
[root@kmaster ~]# ip route
default via 10.0.2.2 dev enp0s3  proto static  metric 100
10.0.2.0/24 dev enp0s3  proto kernel  scope link  src 10.0.2.15  metric 100
10.244.0.0/24 dev cni0  proto kernel  scope link  src 10.244.0.1
10.244.0.0/16 dev flannel.1
169.254.0.0/16 dev enp0s8  scope link  metric 1003
172.17.0.0/16 dev docker0  proto kernel  scope link  src 172.17.0.1
192.168.33.0/24 dev enp0s8  proto kernel  scope link  src 192.168.33.10
[root@kmaster ~]#
[root@kslave ~]# ip route
default via 10.0.2.2 dev enp0s3  proto static  metric 100
10.0.2.0/24 dev enp0s3  proto kernel  scope link  src 10.0.2.15  metric 100
10.244.0.0/16 dev flannel.1
10.244.1.0/24 dev cni0  proto kernel  scope link  src 10.244.1.1
169.254.0.0/16 dev enp0s8  scope link  metric 1003
172.17.0.0/16 dev docker0  proto kernel  scope link  src 172.17.0.1
192.168.33.0/24 dev enp0s8  proto kernel  scope link  src 192.168.33.11
[root@kslave ~]#
[root@kmaster ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 08:00:27:5a:e9:e7 brd ff:ff:ff:ff:ff:ff
    inet 10.0.2.15/24 brd 10.0.2.255 scope global dynamic enp0s3
       valid_lft 82857sec preferred_lft 82857sec
    inet6 fe80::a00:27ff:fe5a:e9e7/64 scope link
       valid_lft forever preferred_lft forever
3: enp0s8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 08:00:27:9b:03:a6 brd ff:ff:ff:ff:ff:ff
    inet 192.168.33.10/24 brd 192.168.33.255 scope global enp0s8
       valid_lft forever preferred_lft forever
    inet6 fe80::a00:27ff:fe9b:3a6/64 scope link tentative dadfailed
       valid_lft forever preferred_lft forever
4: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN
    link/ether 02:42:4d:51:23:5b brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 scope global docker0
       valid_lft forever preferred_lft forever
5: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN
    link/ether 0e:94:71:89:36:90 brd ff:ff:ff:ff:ff:ff
    inet 10.244.0.0/32 scope global flannel.1
       valid_lft forever preferred_lft forever
    inet6 fe80::c94:71ff:fe89:3690/64 scope link
       valid_lft forever preferred_lft forever
6: cni0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP
    link/ether 0a:58:0a:f4:00:01 brd ff:ff:ff:ff:ff:ff
    inet 10.244.0.1/24 scope global cni0
       valid_lft forever preferred_lft forever
    inet6 fe80::780b:a4ff:fe46:ab02/64 scope link
       valid_lft forever preferred_lft forever
7: veth481ad07c@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master cni0 state UP
    link/ether 7a:0b:a4:46:ab:02 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet6 fe80::780b:a4ff:fe46:ab02/64 scope link
       valid_lft forever preferred_lft forever
[root@kmaster ~]#
[root@kslave ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 08:00:27:5a:e9:e7 brd ff:ff:ff:ff:ff:ff
    inet 10.0.2.15/24 brd 10.0.2.255 scope global dynamic enp0s3
       valid_lft 82847sec preferred_lft 82847sec
    inet6 fe80::a00:27ff:fe5a:e9e7/64 scope link
       valid_lft forever preferred_lft forever
3: enp0s8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 08:00:27:45:d8:7e brd ff:ff:ff:ff:ff:ff
    inet 192.168.33.11/24 brd 192.168.33.255 scope global enp0s8
       valid_lft forever preferred_lft forever
    inet6 fe80::a00:27ff:fe45:d87e/64 scope link tentative dadfailed
       valid_lft forever preferred_lft forever
4: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN
    link/ether 02:42:c3:53:19:21 brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 scope global docker0
       valid_lft forever preferred_lft forever
5: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN
    link/ether 96:5a:33:93:7c:6f brd ff:ff:ff:ff:ff:ff
    inet 10.244.1.0/32 scope global flannel.1
       valid_lft forever preferred_lft forever
    inet6 fe80::945a:33ff:fe93:7c6f/64 scope link
       valid_lft forever preferred_lft forever
6: cni0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP
    link/ether 0a:58:0a:f4:01:01 brd ff:ff:ff:ff:ff:ff
    inet 10.244.1.1/24 scope global cni0
       valid_lft forever preferred_lft forever
    inet6 fe80::3052:1cff:fe84:193b/64 scope link
       valid_lft forever preferred_lft forever
7: veth33545403@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master cni0 state UP
    link/ether 32:52:1c:84:19:3b brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet6 fe80::3052:1cff:fe84:193b/64 scope link
       valid_lft forever preferred_lft forever
8: vethd5892a87@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master cni0 state UP
    link/ether 22:e8:13:6f:fe:ae brd ff:ff:ff:ff:ff:ff link-netnsid 1
    inet6 fe80::20e8:13ff:fe6f:feae/64 scope link
       valid_lft forever preferred_lft forever
9: vethaf799bc1@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master cni0 state UP
    link/ether 4a:27:e1:5a:41:39 brd ff:ff:ff:ff:ff:ff link-netnsid 2
    inet6 fe80::4827:e1ff:fe5a:4139/64 scope link
       valid_lft forever preferred_lft forever
10: veth84875acc@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master cni0 state UP
    link/ether 96:5c:ac:de:82:bb brd ff:ff:ff:ff:ff:ff link-netnsid 3
    inet6 fe80::945c:acff:fede:82bb/64 scope link
       valid_lft forever preferred_lft forever
[root@kslave ~]#

@samarjit

My issue is solved. It is a Vagrant-environment-specific issue.
Vagrant assigns the IP 10.0.2.15 to every machine, and flannel was using that address as its key, so it was creating only one subnet where there should be one subnet per node. The solution was to pass --iface=eth1 when launching flanneld. I noticed this after deploying etcd and flannel natively on clean VMs.

The same logic is applied to the flannel startup command in the Kubernetes manifest below.
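A sketch of how to spot this on a Vagrant VM and pick the right interface; the interface names match the ip a output earlier in this thread (the commands are illustrative):

# Every Vagrant VM's default route leaves via the NAT interface, so flannel auto-detects 10.0.2.15 on all nodes.
ip route get 8.8.8.8
#   8.8.8.8 via 10.0.2.2 dev enp0s3 src 10.0.2.15 ...
# The host-only network the nodes can actually reach each other on:
ip -4 addr show dev enp0s8
#   inet 192.168.33.10/24 ...   (192.168.33.11 on the slave)
# Hence flanneld is started with --iface=enp0s8 here (or --iface=eth1 with older interface naming).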

Kube-Flannel yaml:

---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: flannel
---
kind: ConfigMap
apiVersion: v1
metadata:
  name: kube-flannel-cfg
  labels:
    tier: node
    app: flannel
data:
  cni-conf.json: |
    {
      "name": "cbr0",
      "type": "flannel",
      "delegate": {
        "isDefaultGateway": true
      }
    }
  net-conf.json: |
    {
      "Network": "10.244.0.0/16",
      "Backend": {
        "Type": "vxlan"
      }
    }
---
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: kube-flannel-ds
  labels:
    tier: node
    app: flannel
spec:
  template:
    metadata:
      labels:
        tier: node
        app: flannel
    spec:
      hostNetwork: true
      nodeSelector:
        beta.kubernetes.io/arch: amd64
      serviceAccountName: flannel
      containers:
      - name: kube-flannel
        image: quay.io/coreos/flannel:v0.7.0
        command: [ "/opt/bin/flanneld", "--ip-masq", "--kube-subnet-mgr" , "--iface=enp0s8"]
        securityContext:
          privileged: true
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        volumeMounts:
        - name: run
          mountPath: /run
        - name: flannel-cfg
          mountPath: /etc/kube-flannel/
      - name: install-cni
        image: quay.io/coreos/flannel:v0.7.0
        command: [ "/bin/sh", "-c", "set -e -x; cp -f /etc/kube-flannel/cni-conf.json /etc/cni/net.d/10-flannel.conf; while true; do sleep 3600; done" ]
        volumeMounts:
        - name: cni
          mountPath: /etc/cni/net.d
        - name: flannel-cfg
          mountPath: /etc/kube-flannel/
      volumes:
        - name: run
          hostPath:
            path: /run
        - name: cni
          hostPath:
            path: /etc/cni/net.d
        - name: flannel-cfg
          configMap:
            name: kube-flannel-cfg

Note the --iface=enp0s8 argument in the flanneld command above.

[root@kmaster ~]#
[root@kmaster ~]# kubectl describe hello-service
the server doesn't have a resource type "hello-service"
[root@kmaster ~]# kubectl describe service hello-service
Name:                   hello-service
Namespace:              default
Labels:                 <none>
Selector:               app=hello
Type:                   ClusterIP
IP:                     10.104.194.162
Port:                   http    80/TCP
Endpoints:              10.244.0.2:8080,10.244.1.3:8080,10.244.1.4:8080
Session Affinity:       None
No events.
[root@kmaster ~]#

DNS resolution now works:

[root@kmaster ~]# dig +short  @10.96.0.10 _http._tcp.hello-service.default.svc.cluster.local SRV
10 100 80 hello-service.default.svc.cluster.local.
[root@kmaster ~]# dig +short  @10.96.0.10 hello-service.default.svc.cluster.local.
10.104.194.162

The service is reachable.

[root@kmaster ~]# curl http://10.104.194.162:80
Hello, "/"
HOST: hello-deployment-1725651635-pb9mv
ADDRESSES:
    127.0.0.1/8
    10.244.1.4/24
    ::1/128
    fe80::f067:16ff:fe96:7295/64
[root@kmaster ~]#
[root@kmaster ~]#
[root@kmaster ~]# curl http://10.104.194.162:80
Hello, "/"
HOST: hello-deployment-1725651635-0t8xx
ADDRESSES:
    127.0.0.1/8
    10.244.1.3/24
    ::1/128
    fe80::c59:b2ff:fe82:ee1a/64
[root@kmaster ~]#
[root@kmaster ~]# curl http://10.104.194.162:80
Hello, "/"
HOST: hello-deployment-1725651635-51df9
ADDRESSES:
    127.0.0.1/8
    10.244.0.2/24
    ::1/128
    fe80::c4a1:84ff:fe82:ec83/64
[root@kmaster ~]#

@rastislavs

@samarjit Thanks, I ran into the same issue; specifying --iface= works for me too.

@tamalsaha
Author

This issue is fixed for me with v0.7.0.

@linericyang

@samarjit I ran into the same issue in a Vagrant environment; specifying --iface= for the flannel daemon works for me. Thanks.

@pradeepkumarspk

I had the same issue on Vagrant; it is solved now with --iface. Thanks a lot.
