Master to Pod communication is broken in kube-flannel #535

Closed
tamalsaha opened this issue Oct 22, 2016 · 22 comments

@tamalsaha

I am trying to set up a Kubernetes cluster using kube-flannel with the vxlan backend. Node-to-node communication is working, but master-to-pod networking is not. I am not a Linux networking expert, but I can see that the master's flannel.1 interface is assigned the network address, and this seems to be causing issues with ARP.

# cat /run/flannel/subnet.env
FLANNEL_NETWORK=10.244.0.0/16
FLANNEL_SUBNET=10.244.0.1/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=true
# ip route show
default via 159.203.160.1 dev eth0 
10.17.0.0/16 dev eth0  proto kernel  scope link  src 10.17.0.8 
10.132.0.0/16 dev eth1  proto kernel  scope link  src 10.132.22.4 
10.244.0.0/16 dev flannel.1  proto kernel  scope link  src 10.244.0.0
159.203.160.0/20 dev eth0  proto kernel  scope link  src 159.203.168.74 
172.17.0.0/16 dev docker0  proto kernel  scope link  src 172.17.0.1 
root@k-211935-master:~# tcpdump -e -i flannel.1 -n arp
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on flannel.1, link-type EN10MB (Ethernet), capture size 262144 bytes
07:03:26.552296 96:f0:7d:42:39:7c > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 10.244.0.1 tell 10.244.0.0, length 28
07:03:27.552313 96:f0:7d:42:39:7c > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 10.244.0.1 tell 10.244.0.0, length 28
07:03:27.552326 96:f0:7d:42:39:7c > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 10.244.1.0 tell 10.244.0.0, length 28
07:03:28.552290 96:f0:7d:42:39:7c > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 10.244.0.1 tell 10.244.0.0, length 28
07:03:28.552307 96:f0:7d:42:39:7c > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 10.244.1.0 tell 10.244.0.0, length 28
07:03:28.560535 12:05:88:6f:fb:01 > 96:f0:7d:42:39:7c, ethertype ARP (0x0806), length 42: Request who-has 10.244.0.1 tell 10.244.1.0, length 28
07:03:29.560472 12:05:88:6f:fb:01 > 96:f0:7d:42:39:7c, ethertype ARP (0x0806), length 42: Request who-has 10.244.0.1 tell 10.244.1.0, length 28
07:03:30.560456 12:05:88:6f:fb:01 > 96:f0:7d:42:39:7c, ethertype ARP (0x0806), length 42: Request who-has 10.244.0.1 tell 10.244.1.0, length 28

The problem seems to be that the master's flannel.1 is assigned the first IP of subnet zero (10.244.0.0), which is indistinguishable from the network address of 10.244.0.0/16. Can you please confirm that this breaks master-to-pod communication?

I am thinking about using the next Subnet of Node.Spec.PodCIDR in kubeSubnetManager. Will that fix this issue?
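To make the symptom concrete, here is a minimal check on the master, using the addresses already shown above (the commands themselves are illustrative, not part of the original report):

# What address did flannel give its vxlan device?
ip -4 addr show dev flannel.1
#   inet 10.244.0.0/...          <- the network address of 10.244.0.0/16
# What lease did flannel record for this host?
cat /run/flannel/subnet.env
#   FLANNEL_SUBNET=10.244.0.1/24 <- the first usable address of the master's /24 lease
# The suspicion: because flannel.1 carries 10.244.0.0, which is indistinguishable from the
# /16 network address, ARP resolution over flannel.1 never completes and master-to-pod traffic fails.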

cc: @mikedanese

@tamalsaha
Author

Things seem to be working after applying https://github.com/appscode/flannel/commit/b083788405ce2bf3c34b9d4df7b5d77afc865b4e

@tomdee
Contributor

tomdee commented Oct 26, 2016

@tamalsaha Can you share some of the Kubernetes commands you were using to repro this? Where are you pinging to and from: is it from your master node to a pod on a different node?

@tamalsaha
Author

@tomdee, I was pinging from master to a pod running on a different node.

@tamalsaha
Author

@tomdee I just ran an nginx pod, then tried to wget it from the master host directly.

kubectl run my-nginx --image=nginx --replicas=2 --port=80
kubectl expose deployment my-nginx --port=80 

wget http://<pod-ip>:80
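The pod IP used in the wget can be looked up with kubectl, as in the output later in this thread (the exact command here is illustrative):

# The IP column shows the flannel-assigned pod address that the wget above targets.
kubectl get pods -o wide | grep my-nginx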

@autostatic

@tamalsaha, I tried your fix but the pods still can't communicate properly with each other:
E1115 12:31:12.646494 1 reflector.go:214] pkg/dns/dns.go:155: Failed to list *api.Endpoints: Get https://10.96.0.1:443/api/v1/endpoints?resourceVersion=0: dial tcp 10.96.0.1:443: connect: network is unreachable

@tamalsaha
Author

@autostatic, can you explain your test case a bit more so that I can recreate it?

@autostatic

autostatic commented Nov 15, 2016

I tested this on a small bare-metal Ubuntu 16.04 cluster on OpenStack: one master, two nodes, K8s 1.4.6. I used kubeadm to deploy the cluster, so as long as there is no pod network the kube-dns pod will not start up completely. I then deployed the kube-flannel.yml from https://github.com/coreos/flannel/blob/master/Documentation/kube-flannel.yml with the necessary modifications, using a Flannel Docker image with your patch. After deploying it, the kube-dns pod still reports the errors I posted above.
If there are better ways to test this, I'd love to know. My main goal is to run plain Flannel as an add-on. I could use Canal, but for some setups I'd prefer plain Flannel since I don't always need Calico.
Here's the kube-flannel.yml I'm using:

kind: ConfigMap
apiVersion: v1
metadata:
  name: kube-flannel-cfg
  namespace: kube-system
  labels:
    tier: node
    app: flannel
data:
  cni-conf.json: |
    {
      "name": "flannelnet",
      "type": "flannel",
      "delegate": {
        "isGateway": true
      }
    }
  net-conf.json: |
    {
      "Network": "10.244.0.0/22",
      "SubnetLen": 24,
      "Backend": {
        "Type": "vxlan"
      }
    }
---
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: kube-flannel-ds
  namespace: kube-system
  labels:
    tier: node
    app: flannel
spec:
  template:
    metadata:
      labels:
        tier: node
        app: flannel
    spec:
      hostNetwork: true
      imagePullSecrets:
        - name: flanneld-registry
      containers:
      - name: kube-flannel
#        image: quay.io/coreos/flannel-git:latest
        image: my.private.gitlab.registry/autostatic/flanneld:20161114
        command: [ "/opt/bin/flanneld" ]
        args: [ "-ip-masq", "-kube-subnet-mgr" ]
        securityContext:
          privileged: true
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        volumeMounts:
        - name: run
          mountPath: /run
        - name: hosts
          mountPath: /etc/hosts
        - name: flannel-cfg
          mountPath: /etc/kube-flannel/
      - name: install-cni
        image: busybox
        command: [ "/bin/sh", "-c", "set -e -x; TMP=/etc/cni/net.d/.tmp-flannel-cfg; cp /etc/kube-flannel/cni-conf.json ${TMP}; mv ${TMP} /etc/cni/net.d/10-flannel.conf; while :; do sleep 3600; done" ]
        volumeMounts:
        - name: cni
          mountPath: /etc/cni/net.d
        - name: flannel-cfg
          mountPath: /etc/kube-flannel/
      volumes:
        - name: run
          hostPath:
            path: /run
        - name: hosts
          hostPath:
            path: /etc/hosts
        - name: cni
          hostPath:
            path: /etc/cni/net.d
        - name: flannel-cfg
          configMap:
            name: kube-flannel-cfg

@tamalsaha
Author

tamalsaha commented Nov 15, 2016

@autostatic, I am not sure you are having the same issue I was. The issue I faced was that pods running on the master (with hostNetwork: true in my case) could not connect to pods on regular nodes via their pod IPs.

From your log, it seems that the DNS pod running on a regular node can't connect to the kube-apiserver (https://10.96.0.1:443). If I were you, I would first confirm that the flannel network is actually working as intended. One way to check is to see whether you can ping the IP address of the flannel bridge on the master from the node running the DNS pod.
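A minimal way to run that check, assuming the 10.244.0.0/16 pod network used throughout this thread (interface names and addresses are illustrative):

# On the master: note the address of the flannel interface (and of the cbr0/cni0 bridge, if it exists).
ip -4 addr show dev flannel.1
ip -4 addr show dev cbr0 2>/dev/null || true

# On the node running the kube-dns pod: ping the address printed above, e.g.
ping -c 3 10.244.0.1    # substitute the master's actual flannel/bridge address
# If this fails, the overlay itself is broken and kube-dns will also fail to reach the apiserver service IP.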

FYI, I also had to make some changes to cni-conf.json. You can see my changes here: https://github.com/appscode/kubernetes/commit/ee660dc997f7ae5042033f226b4416d4513b5422. The important thing was ensuring Kubernetes uses the bridge created by flannel; without that, pods are disconnected from the flannel overlay network. It would be helpful to see the output of ifconfig from one of your regular nodes.
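For reference, a sketch of what the installed CNI config ends up looking like with that kind of change, assuming the isDefaultGateway delegate that later configs in this thread use (illustrative only, not the exact contents of the linked commit):

# The install-cni container copies the ConfigMap entry to this path on every host.
cat /etc/cni/net.d/10-flannel.conf
# {
#   "name": "cbr0",
#   "type": "flannel",
#   "delegate": {
#     "isDefaultGateway": true
#   }
# }
# With the flannel CNI plugin delegating to the bridge plugin this way, pod traffic goes through
# the flannel-managed bridge and onto the overlay.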

If you are unfamiliar with the CNI conf options, the CNI docs will be handy.

@autostatic

autostatic commented Nov 16, 2016

Hello @tamalsaha, thanks for the feedback. I made the changes to the CNI config, and then Flannel came up successfully, DNS started working, and I could deploy a working Dashboard. I don't have a flannel bridge on my master, though; could that be related to the hairpin setting?
So it indeed looks like I was facing a different issue. Many thanks for the pointers in the right direction!
FWIW, an ifconfig from one of the nodes now looks like this:

cbr0      Link encap:Ethernet  HWaddr 0a:58:0a:f4:03:01  
          inet addr:10.244.3.1  Bcast:0.0.0.0  Mask:255.255.255.0
          inet6 addr: fe80::7413:5eff:fec0:2743/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1450  Metric:1
          RX packets:39641 errors:0 dropped:0 overruns:0 frame:0
          TX packets:41818 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:7529077 (7.5 MB)  TX bytes:4309168 (4.3 MB)

docker0   Link encap:Ethernet  HWaddr 02:42:8c:ed:2b:2a  
          inet addr:172.17.0.1  Bcast:0.0.0.0  Mask:255.255.0.0
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

ens3      Link encap:Ethernet  HWaddr fa:16:3e:d3:c3:67  
          inet addr:172.16.172.101  Bcast:172.16.172.255  Mask:255.255.255.0
          inet6 addr: fe80::f816:3eff:fed3:c367/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:522692 errors:0 dropped:0 overruns:0 frame:0
          TX packets:562404 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:776695900 (776.6 MB)  TX bytes:132399137 (132.3 MB)

flannel.1 Link encap:Ethernet  HWaddr 16:bd:ac:f0:fb:59  
          inet addr:10.244.3.0  Bcast:0.0.0.0  Mask:255.255.252.0
          inet6 addr: fe80::14bd:acff:fef0:fb59/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1450  Metric:1
          RX packets:10212 errors:0 dropped:0 overruns:0 frame:0
          TX packets:6787 errors:0 dropped:8 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:429606 (429.6 KB)  TX bytes:1063174 (1.0 MB)

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:162 errors:0 dropped:0 overruns:0 frame:0
          TX packets:162 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1 
          RX bytes:13460 (13.4 KB)  TX bytes:13460 (13.4 KB)

veth673f5af5 Link encap:Ethernet  HWaddr aa:7e:f3:33:8b:1b  
          inet6 addr: fe80::a87e:f3ff:fe33:8b1b/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1450  Metric:1
          RX packets:39562 errors:0 dropped:0 overruns:0 frame:0
          TX packets:41781 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:8072264 (8.0 MB)  TX bytes:4297709 (4.2 MB)

vetha2245abc Link encap:Ethernet  HWaddr de:35:55:89:4c:e1  
          inet6 addr: fe80::dc35:55ff:fe89:4ce1/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1450  Metric:1
          RX packets:3 errors:0 dropped:0 overruns:0 frame:0
          TX packets:11 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:258 (258.0 B)  TX bytes:858 (858.0 B)

vetha99a743a Link encap:Ethernet  HWaddr 56:56:b5:b2:55:00  
          inet6 addr: fe80::5456:b5ff:feb2:5500/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1450  Metric:1
          RX packets:3 errors:0 dropped:0 overruns:0 frame:0
          TX packets:15 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:258 (258.0 B)  TX bytes:1158 (1.1 KB)

@tamalsaha
Author

@autostatic I am glad that your cluster is working. The flannel bridge gets created the first time the CNI plugin is called. Since Kubernetes does not run regular pods on the master, the cbr0 bridge has not been created there yet.
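A quick way to see this, assuming the interface names from the ifconfig output above (the commands are illustrative):

# On the master: the bridge does not exist until a pod is scheduled there through the CNI plugin.
ip link show cbr0 2>/dev/null || echo "cbr0 not created yet"
# On a regular node: the bridge exists and holds the node's pod-subnet gateway, e.g. 10.244.3.1/24 above.
ip -4 addr show dev cbr0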

It also seems that you don't need my patch. I needed it because we run an HAProxy-based ingress controller on the master that load balances across pods on regular nodes, so HAProxy on the master has to be able to connect to pods on regular nodes.

@autostatic

Hi @tamalsaha, I did some more tests, including a couple of fresh deployments, and without your patch the cluster is not functional: I can't ping the other nodes from the master. If I do a deployment with a patched Flannel, the cluster comes up properly.

@tamalsaha
Author

Yes, if you want to ping regular nodes from the master, you need this patch.

@autostatic

#560 fixes my issues.

@mattenklicker

I have the same problem: the first Kubernetes node gets the network address 10.244.0.0 of the 10.244.0.0/16 network assigned, so it is not reachable from the other nodes. NodePort services that I want to reach via the first node are unreachable when the service itself runs on another node. I can see packets leaving from 10.244.0.0 towards the other nodes, but no returning packets, because they are not routable.
The above patch (#535 (comment)) skips the network address, but networking didn't work for me after applying it, at least in an existing cluster, and there is no check for duplicate addresses.
Perhaps an explicit route like 10.244.0.0/32 dev flannel.1 on all the other nodes would work, but I did not test that, and it doesn't look nice when a 10.244.0.0/16 dev flannel.1 route already exists.
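For completeness, the untested workaround mentioned above would look roughly like this on each of the other nodes (a sketch of the idea only; as noted, it has to be maintained by hand and overlaps the existing overlay route):

# Host route for the first node's flannel address; as a /32 it is more specific than the
# existing 10.244.0.0/16 dev flannel.1 route, so it wins for that one destination.
ip route add 10.244.0.0/32 dev flannel.1
ip route show dev flannel.1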

@tamalsaha
Author

tamalsaha commented Jan 26, 2017

@mattenklicker, which version are you using? https://github.com/coreos/flannel/releases/tag/v0.7.0 is supposed to fix this issue.

@mattenklicker

@tamalsaha v0.7.0

@samarjit

samarjit commented Jan 28, 2017

Update: I created a GitHub project that sets up this environment: https://github.com/samarjit/vagrant-kubeadm

I am using v0.7.0 too, but I am having the same issue: master-to-slave node communication fails.

[root@kmaster ~]# kubectl get pods -o wide
NAME                                READY     STATUS    RESTARTS   AGE       IP              NODE
hello-deployment-1725651635-1nnnx   1/1       Running   0          10m       10.244.1.4      kslave
hello-deployment-1725651635-dh3r6   1/1       Running   0          10m       10.244.1.3      kslave
hello-deployment-1725651635-smtx8   1/1       Running   0          10m       10.244.0.2      kmaster
kube-flannel-ds-bklmr               2/2       Running   0          43m       192.168.33.10   kmaster
kube-flannel-ds-m0lbd               2/2       Running   2          35m       192.168.33.11   kslave
[root@kmaster ~]#

Pinging 10.244.0.2 -> 10.244.1.4 (master to slave) does not work.

On the master node, querying DNS directly fails:

[root@kmaster ~]# dig +short  @10.96.0.10 _http._tcp.hello-service.default.svc.cluster.local SRV
;; connection timed out; no servers could be reached
[root@kmaster ~]#
[root@kmaster ~]#  tcpdump -e -i flannel.1 -n arp
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on flannel.1, link-type EN10MB (Ethernet), capture size 65535 bytes
06:23:47.701820 0e:94:71:89:36:90 > 96:5a:33:93:7c:6f, ethertype ARP (0x0806), length 42: Request who-has 10.244.1.2 tell 10.244.0.0, length 28
06:23:47.701832 0e:94:71:89:36:90 > 96:5a:33:93:7c:6f, ethertype ARP (0x0806), length 42: Request who-has 10.244.1.2 tell 10.244.0.0, length 28
06:23:48.703932 0e:94:71:89:36:90 > 96:5a:33:93:7c:6f, ethertype ARP (0x0806), length 42: Request who-has 10.244.1.2 tell 10.244.0.0, length 28

If I run tcpdump on the slave node, no packets are received.

I followed the DNS testing steps described in https://kubernetes.io/docs/admin/dns/, and that works:

[root@kmaster ~]# kubectl exec -ti busybox -- nslookup kubernetes.default
Server:    10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local

Name:      kubernetes.default
Address 1: 10.96.0.1 kubernetes.default.svc.cluster.local
[root@kmaster ~]#

I am starting kubeadm using the following script.

kubeadm init --api-advertise-addresses=192.168.33.10 --token=7baee4.d576223cb4884c9b --pod-network-cidr="10.244.0.0/16"
jq \
   '.spec.containers[0].command |= .+ ["--advertise-address=192.168.33.10"]' \
   /etc/kubernetes/manifests/kube-apiserver.json > /tmp/kube-apiserver.json
mv /tmp/kube-apiserver.json /etc/kubernetes/manifests/kube-apiserver.json


kubectl -n kube-system get ds -l 'component=kube-proxy' -o json \
  | jq '.items[0].spec.template.spec.containers[0].command |= .+ ["--proxy-mode=userspace","--cluster-cidr=10.244.0.0/16"]' \
  |   kubectl apply -f - && kubectl -n kube-system delete pods -l 'component=kube-proxy'
  cp /etc/kubernetes/admin.conf /vagrant

kube-flannel.yml:

  cni-conf.json: |
    {
      "name": "cbr0",
      "type": "flannel",
      "delegate": {
        "isDefaultGateway": true
      }
    }
  net-conf.json: |
    {
      "Network": "10.244.0.0/16",
      "Backend": {
        "Type": "vxlan"
      }
    }
[root@kmaster ~]# ip route
default via 10.0.2.2 dev enp0s3  proto static  metric 100
10.0.2.0/24 dev enp0s3  proto kernel  scope link  src 10.0.2.15  metric 100
10.244.0.0/24 dev cni0  proto kernel  scope link  src 10.244.0.1
10.244.0.0/16 dev flannel.1
169.254.0.0/16 dev enp0s8  scope link  metric 1003
172.17.0.0/16 dev docker0  proto kernel  scope link  src 172.17.0.1
192.168.33.0/24 dev enp0s8  proto kernel  scope link  src 192.168.33.10
[root@kmaster ~]#
[root@kslave ~]# ip route
default via 10.0.2.2 dev enp0s3  proto static  metric 100
10.0.2.0/24 dev enp0s3  proto kernel  scope link  src 10.0.2.15  metric 100
10.244.0.0/16 dev flannel.1
10.244.1.0/24 dev cni0  proto kernel  scope link  src 10.244.1.1
169.254.0.0/16 dev enp0s8  scope link  metric 1003
172.17.0.0/16 dev docker0  proto kernel  scope link  src 172.17.0.1
192.168.33.0/24 dev enp0s8  proto kernel  scope link  src 192.168.33.11
[root@kslave ~]#
[root@kmaster ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 08:00:27:5a:e9:e7 brd ff:ff:ff:ff:ff:ff
    inet 10.0.2.15/24 brd 10.0.2.255 scope global dynamic enp0s3
       valid_lft 82857sec preferred_lft 82857sec
    inet6 fe80::a00:27ff:fe5a:e9e7/64 scope link
       valid_lft forever preferred_lft forever
3: enp0s8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 08:00:27:9b:03:a6 brd ff:ff:ff:ff:ff:ff
    inet 192.168.33.10/24 brd 192.168.33.255 scope global enp0s8
       valid_lft forever preferred_lft forever
    inet6 fe80::a00:27ff:fe9b:3a6/64 scope link tentative dadfailed
       valid_lft forever preferred_lft forever
4: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN
    link/ether 02:42:4d:51:23:5b brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 scope global docker0
       valid_lft forever preferred_lft forever
5: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN
    link/ether 0e:94:71:89:36:90 brd ff:ff:ff:ff:ff:ff
    inet 10.244.0.0/32 scope global flannel.1
       valid_lft forever preferred_lft forever
    inet6 fe80::c94:71ff:fe89:3690/64 scope link
       valid_lft forever preferred_lft forever
6: cni0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP
    link/ether 0a:58:0a:f4:00:01 brd ff:ff:ff:ff:ff:ff
    inet 10.244.0.1/24 scope global cni0
       valid_lft forever preferred_lft forever
    inet6 fe80::780b:a4ff:fe46:ab02/64 scope link
       valid_lft forever preferred_lft forever
7: veth481ad07c@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master cni0 state UP
    link/ether 7a:0b:a4:46:ab:02 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet6 fe80::780b:a4ff:fe46:ab02/64 scope link
       valid_lft forever preferred_lft forever
[root@kmaster ~]#
[root@kslave ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 08:00:27:5a:e9:e7 brd ff:ff:ff:ff:ff:ff
    inet 10.0.2.15/24 brd 10.0.2.255 scope global dynamic enp0s3
       valid_lft 82847sec preferred_lft 82847sec
    inet6 fe80::a00:27ff:fe5a:e9e7/64 scope link
       valid_lft forever preferred_lft forever
3: enp0s8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 08:00:27:45:d8:7e brd ff:ff:ff:ff:ff:ff
    inet 192.168.33.11/24 brd 192.168.33.255 scope global enp0s8
       valid_lft forever preferred_lft forever
    inet6 fe80::a00:27ff:fe45:d87e/64 scope link tentative dadfailed
       valid_lft forever preferred_lft forever
4: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN
    link/ether 02:42:c3:53:19:21 brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 scope global docker0
       valid_lft forever preferred_lft forever
5: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN
    link/ether 96:5a:33:93:7c:6f brd ff:ff:ff:ff:ff:ff
    inet 10.244.1.0/32 scope global flannel.1
       valid_lft forever preferred_lft forever
    inet6 fe80::945a:33ff:fe93:7c6f/64 scope link
       valid_lft forever preferred_lft forever
6: cni0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP
    link/ether 0a:58:0a:f4:01:01 brd ff:ff:ff:ff:ff:ff
    inet 10.244.1.1/24 scope global cni0
       valid_lft forever preferred_lft forever
    inet6 fe80::3052:1cff:fe84:193b/64 scope link
       valid_lft forever preferred_lft forever
7: veth33545403@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master cni0 state UP
    link/ether 32:52:1c:84:19:3b brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet6 fe80::3052:1cff:fe84:193b/64 scope link
       valid_lft forever preferred_lft forever
8: vethd5892a87@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master cni0 state UP
    link/ether 22:e8:13:6f:fe:ae brd ff:ff:ff:ff:ff:ff link-netnsid 1
    inet6 fe80::20e8:13ff:fe6f:feae/64 scope link
       valid_lft forever preferred_lft forever
9: vethaf799bc1@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master cni0 state UP
    link/ether 4a:27:e1:5a:41:39 brd ff:ff:ff:ff:ff:ff link-netnsid 2
    inet6 fe80::4827:e1ff:fe5a:4139/64 scope link
       valid_lft forever preferred_lft forever
10: veth84875acc@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master cni0 state UP
    link/ether 96:5c:ac:de:82:bb brd ff:ff:ff:ff:ff:ff link-netnsid 3
    inet6 fe80::945c:acff:fede:82bb/64 scope link
       valid_lft forever preferred_lft forever
[root@kslave ~]#

@samarjit

My issue is solved. It is a Vagrant-environment-specific issue.
Vagrant assigns the IP 10.0.2.15 to every machine, and flannel was using that address as its key, so it was creating only one subnet where there should be one subnet per node. The solution was to pass --iface=eth1 when launching flanneld. I noticed this after deploying etcd and flannel natively on clean VMs.

The same logic is applied to the flannel startup command in the Kubernetes manifest below.
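A sketch of how to spot this on a Vagrant VM and pick the right interface; the interface names match the ip a output earlier in this thread (the commands are illustrative):

# Every Vagrant VM's default route leaves via the NAT interface, so flannel auto-detects 10.0.2.15 on all nodes.
ip route get 8.8.8.8
#   8.8.8.8 via 10.0.2.2 dev enp0s3 src 10.0.2.15 ...
# The host-only network the nodes can actually reach each other on:
ip -4 addr show dev enp0s8
#   inet 192.168.33.10/24 ...   (192.168.33.11 on the slave)
# Hence flanneld is started with --iface=enp0s8 here (or --iface=eth1 with older interface naming).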

Kube-Flannel yaml:

---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: flannel
---
kind: ConfigMap
apiVersion: v1
metadata:
  name: kube-flannel-cfg
  labels:
    tier: node
    app: flannel
data:
  cni-conf.json: |
    {
      "name": "cbr0",
      "type": "flannel",
      "delegate": {
        "isDefaultGateway": true
      }
    }
  net-conf.json: |
    {
      "Network": "10.244.0.0/16",
      "Backend": {
        "Type": "vxlan"
      }
    }
---
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: kube-flannel-ds
  labels:
    tier: node
    app: flannel
spec:
  template:
    metadata:
      labels:
        tier: node
        app: flannel
    spec:
      hostNetwork: true
      nodeSelector:
        beta.kubernetes.io/arch: amd64
      serviceAccountName: flannel
      containers:
      - name: kube-flannel
        image: quay.io/coreos/flannel:v0.7.0
        command: [ "/opt/bin/flanneld", "--ip-masq", "--kube-subnet-mgr" , "--iface=enp0s8"]
        securityContext:
          privileged: true
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        volumeMounts:
        - name: run
          mountPath: /run
        - name: flannel-cfg
          mountPath: /etc/kube-flannel/
      - name: install-cni
        image: quay.io/coreos/flannel:v0.7.0
        command: [ "/bin/sh", "-c", "set -e -x; cp -f /etc/kube-flannel/cni-conf.json /etc/cni/net.d/10-flannel.conf; while true; do sleep 3600; done" ]
        volumeMounts:
        - name: cni
          mountPath: /etc/cni/net.d
        - name: flannel-cfg
          mountPath: /etc/kube-flannel/
      volumes:
        - name: run
          hostPath:
            path: /run
        - name: cni
          hostPath:
            path: /etc/cni/net.d
        - name: flannel-cfg
          configMap:
            name: kube-flannel-cfg

Note the --iface=enp0s8 argument in the flanneld command above.

[root@kmaster ~]#
[root@kmaster ~]# kubectl describe hello-service
the server doesn't have a resource type "hello-service"
[root@kmaster ~]# kubectl describe service hello-service
Name:                   hello-service
Namespace:              default
Labels:                 <none>
Selector:               app=hello
Type:                   ClusterIP
IP:                     10.104.194.162
Port:                   http    80/TCP
Endpoints:              10.244.0.2:8080,10.244.1.3:8080,10.244.1.4:8080
Session Affinity:       None
No events.
[root@kmaster ~]#

DNS resolution now works:

[root@kmaster ~]# dig +short  @10.96.0.10 _http._tcp.hello-service.default.svc.cluster.local SRV
10 100 80 hello-service.default.svc.cluster.local.
[root@kmaster ~]# dig +short  @10.96.0.10 hello-service.default.svc.cluster.local.
10.104.194.162

The service is reachable.

[root@kmaster ~]# curl http://10.104.194.162:80
Hello, "/"
HOST: hello-deployment-1725651635-pb9mv
ADDRESSES:
    127.0.0.1/8
    10.244.1.4/24
    ::1/128
    fe80::f067:16ff:fe96:7295/64
[root@kmaster ~]#
[root@kmaster ~]#
[root@kmaster ~]# curl http://10.104.194.162:80
Hello, "/"
HOST: hello-deployment-1725651635-0t8xx
ADDRESSES:
    127.0.0.1/8
    10.244.1.3/24
    ::1/128
    fe80::c59:b2ff:fe82:ee1a/64
[root@kmaster ~]#
[root@kmaster ~]# curl http://10.104.194.162:80
Hello, "/"
HOST: hello-deployment-1725651635-51df9
ADDRESSES:
    127.0.0.1/8
    10.244.0.2/24
    ::1/128
    fe80::c4a1:84ff:fe82:ec83/64
[root@kmaster ~]#

@rastislavs

@samarjit Thanks, I ran into the same issue; specifying --iface= works for me too.

@tamalsaha
Author

This issue is fixed for me with v0.7.0.

@linericyang

@samarjit I ran into the same issue in a Vagrant environment; specifying --iface= for the flannel daemon works for me. Thanks.

@pradeepkumarspk

I had the same issue on Vagrant; it is solved now with --iface. Thanks a lot.
