flannel.1 link gets 2 ipv4 addresses on secondary nodes #883

Closed
d11wtq opened this Issue Nov 20, 2017 · 9 comments


d11wtq commented Nov 20, 2017

Using kubeadm and Kubernetes 1.8.3 to provision a 3-node cluster on HypriotOS (ARM), I can initialize the master fine, but when joining nodes into the cluster, kube-flannel crashes on each node with:

Error registering network: failed to configure interface flannel.1: link has incompatible addresses. Remove additional addresses and try again. &{{%!s(int=5) %!s(int=1450) %!s(int=0) flannel.1 76:dd:ec:c0:e6:86 up|broadcast|multicast %!s(uint32=69699) %!s(int=0) %!s(int=0) <nil>  %!s(*netlink.LinkStatistics=&{272198 273028 25973338 108272828 0 0 0 45 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0}) %!s(int=0) %!s(*netlink.LinkXdp=<nil>) ether <nil> unknown} %!s(int=1) %!s(int=3) 192.168.0.14 <nil> %!s(int=0) %!s(int=0) %!s(bool=false) %!s(bool=false) %!s(bool=false) %!s(bool=false) %!s(bool=false) %!s(bool=false) %!s(bool=false) %!s(bool=false) %!s(int=300) %!s(int=0) %!s(int=8472) %!s(int=0) %!s(int=0)}

When I run ip addr I can see that flannel.1 has been given two different IPs, from adjacent /24 subnets.

5: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default
    link/ether 76:dd:ec:c0:e6:86 brd ff:ff:ff:ff:ff:ff
    inet 10.244.2.0/32 scope global flannel.1
       valid_lft forever preferred_lft forever
    inet 10.244.1.0/32 scope global flannel.1
       valid_lft forever preferred_lft forever
    inet6 fe80::74dd:ecff:fec0:e686/64 scope link
       valid_lft forever preferred_lft forever

I haven't manually added these addresses; they just appear automatically when flannel starts.
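For quickly checking a node for this state, here is a small sketch (a hypothetical helper, not part of flannel or kubeadm) that counts the IPv4 addresses in `ip -o -4 addr show` output; a healthy flannel.1 should show exactly one:

```shell
# Hypothetical helper: count IPv4 addresses from `ip -o -4 addr show`
# output, which prints one line per address containing "inet <addr>/<len>".
count_ipv4_lines() {
  grep -c 'inet '
}

# On a node: ip -o -4 addr show dev flannel.1 | count_ipv4_lines
# A count of 2 or more matches the crash above.
```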

kube-flannel enters CrashLoopBackOff and never recovers from the error. Is this a bug, or something I need to update on my host OS? If removing the second address is a suitable temporary workaround, how do I do that?

The flannel manifest I applied is: https://raw.githubusercontent.com/coreos/flannel/v0.9.0/Documentation/kube-flannel.yml (with sed s/amd64/arm/g applied for ARM).

The entire log leading up to the error is:

$ kubectl -n kube-system logs kube-flannel-ds-tgs85
I1120 04:39:40.815542       1 main.go:470] Determining IP address of default interface
I1120 04:39:40.817983       1 main.go:483] Using interface with name wlan0 and address 192.168.0.14
I1120 04:39:40.818240       1 main.go:500] Defaulting external address to interface address (192.168.0.14)
I1120 04:39:41.137713       1 kube.go:130] Waiting 10m0s for node controller to sync
I1120 04:39:41.137945       1 kube.go:283] Starting kube subnet manager
I1120 04:39:42.138167       1 kube.go:137] Node controller sync successful
I1120 04:39:42.138306       1 main.go:235] Created subnet manager: Kubernetes Subnet Manager - jasmine
I1120 04:39:42.138370       1 main.go:238] Installing signal handlers
I1120 04:39:42.138873       1 main.go:348] Found network config - Backend type: vxlan
I1120 04:39:42.139127       1 vxlan.go:119] VXLAN config: VNI=1 Port=0 GBP=false DirectRouting=false
E1120 04:39:42.141224       1 main.go:280] Error registering network: failed to configure interface flannel.1: link has incompatible addresses. Remove additional addresses and try again. &{{%!s(int=5) %!s(int=1450) %!s(int=0) flannel.1 76:dd:ec:c0:e6:86 up|broadcast|multicast %!s(uint32=69699) %!s(int=0) %!s(int=0) <nil>  %!s(*netlink.LinkStatistics=&{272198 273028 25973338 108272828 0 0 0 45 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0}) %!s(int=0) %!s(*netlink.LinkXdp=<nil>) ether <nil> unknown} %!s(int=1) %!s(int=3) 192.168.0.14 <nil> %!s(int=0) %!s(int=0) %!s(bool=false) %!s(bool=false) %!s(bool=false) %!s(bool=false) %!s(bool=false) %!s(bool=false) %!s(bool=false) %!s(bool=false) %!s(int=300) %!s(int=0) %!s(int=8472) %!s(int=0) %!s(int=0)}
I1120 04:39:42.141346       1 main.go:328] Stopping shutdownHandler...

Expected Behavior

kube-flannel should start properly, not only on the master, but on all additional nodes that join the cluster. flannel.1 should automatically be allocated a single IPv4 address, rather than two addresses.

Current Behavior

kube-flannel starts correctly on the master, but additional nodes that join the cluster appear to allocate two IPv4 addresses to flannel.1, which causes kube-flannel to enter a crash loop.

Steps to Reproduce (for bugs)

Initialize a master node on HypriotOS with kubeadm init.
Join a secondary node on HypriotOS with kubeadm join.
Observe that kube-flannel runs fine on the master, but crashes on the worker node.

Context

This is preventing me from running Kubernetes right now. I previously had a working k8s 1.7 cluster with Flannel 0.7. Only since upgrading Kubernetes and Flannel has this issue started occurring.

Your Environment

  • Flannel version: 0.9.0
  • Backend used (e.g. vxlan or udp): vxlan
  • Etcd version: gcr.io/google_containers/etcd-arm:3.0.17 (?)
  • Kubernetes version (if used): 1.8.3
  • Operating System and version: HypriotOS (Debian GNU/Linux 8)
  • Link to your project (optional):

d11wtq commented Nov 20, 2017

A thought: is it possible that flannel allocates a single IP successfully, but believes the allocation failed and retries, resulting in the two addresses?


d11wtq commented Nov 20, 2017

Update: I fixed this by running ip link delete flannel.1 on the affected nodes. I'm not sure how it started, but flannel was then able to recreate the interface and work.

d11wtq closed this Nov 20, 2017

tomdee (Member) commented Nov 21, 2017

@d11wtq Interesting, I've not seen this before but certainly let me know if you see this again.


xukunfeng commented May 4, 2018

We see the same issue with flannel 0.7.1.

$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 00:50:56:a2:ab:59 brd ff:ff:ff:ff:ff:ff
    inet 10.10.14.187/23 brd 10.10.15.255 scope global eth0
       valid_lft forever preferred_lft forever
3: docker0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP
    link/ether 02:42:ad:b4:c5:7f brd ff:ff:ff:ff:ff:ff
    inet 172.30.79.1/24 scope global docker0
       valid_lft forever preferred_lft forever
46: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN
    link/ether 62:d6:6c:70:2a:94 brd ff:ff:ff:ff:ff:ff
    inet 172.30.73.0/32 scope global flannel.1
       valid_lft forever preferred_lft forever
    inet 172.30.79.0/32 scope global flannel.1
       valid_lft forever preferred_lft forever
[cloud@psd011 ~]$ etcdctl get /k8s/network/subnets/172.30.73.0-24
Error:  100: Key not found (/k8s/network/subnets/172.30.73.0-24) [1959]
[cloud@psd011 ~]$ etcdctl get /k8s/network/subnets/172.30.79.0-24
{"PublicIP":"10.10.14.187","BackendType":"vxlan","BackendData":{"VtepMAC":"76:ef:58:64:0d:91"}}

Your Environment

  • Flannel version: 0.7.1
  • Backend used (e.g. vxlan or udp): vxlan
  • Etcd version: gcr.io/google_containers/etcd-arm:3.0.17 (?)
  • Kubernetes version (if used): 1.9.3
  • Operating System and version: CentOS 7.3
  • Link to your project (optional):


adarsh1001 commented Jun 17, 2018

The same issue occurs with Flannel v0.10.0.
14: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default
    link/ether a6:62:08:a6:c2:f0 brd ff:ff:ff:ff:ff:ff
    inet 10.244.1.0/32 scope global flannel.1
       valid_lft forever preferred_lft forever
    inet 169.254.46.120/16 brd 169.254.255.255 scope global flannel.1
       valid_lft forever preferred_lft forever
    inet6 fe80::a462:8ff:fea6:c2f0/64 scope link
       valid_lft forever preferred_lft forever
Flannel goes into CrashLoopBackOff. Deleting the flannel.1 link does solve the issue.

OS: Raspbian Stretch Lite
Kubernetes version: 1.10.2


timchenxiaoyu commented Aug 13, 2018

I get the same problem.


wroney688 commented Oct 14, 2018

Same under K8s 1.12.0, flannel v0.10.0; however, sudo ip link delete flannel.1 did allow it to come up without error. (Hypriot 1.9.0 on ARM - Raspberry Pi 3 B+)


timchenxiaoyu commented Oct 14, 2018

Check your etcd data and local env file; flanneld will pick up the previous IP at startup.
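That suggestion can be sketched as a quick check. The path /run/flannel/subnet.env is flannel's usual env-file location but varies by deployment, and cached_subnet is a hypothetical helper:

```shell
# Hypothetical helper: read the subnet flannel cached locally from a
# previous run. If it differs from the subnet registered in etcd for
# this node (compare with etcdctl as shown earlier in the thread),
# the cached lease is stale.
cached_subnet() {
  # $1 = path to flannel's env file, e.g. /run/flannel/subnet.env
  sed -n 's/^FLANNEL_SUBNET=//p' "$1"
}
```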


ljfranklin commented Oct 19, 2018

+1, same setup as @wroney688:

  • k8s 1.12.0
  • flannel v0.10.0
  • Hypriot 1.9.0 on ARM - Raspberry Pi 3 B+

In my case all the flannel pods initially come up successfully, but after ~3 days one flannel pod gets stuck in CrashLoopBackOff with the following error (the other 4 workers are fine):

kubectl -n kube-system logs kube-flannel-ds-arm-fcm6r
I1019 03:16:15.097562       1 main.go:475] Determining IP address of default interface
I1019 03:16:15.099403       1 main.go:488] Using interface with name eth0 and address 192.168.1.246
I1019 03:16:15.099582       1 main.go:505] Defaulting external address to interface address (192.168.1.246)
I1019 03:16:15.734392       1 kube.go:131] Waiting 10m0s for node controller to sync
I1019 03:16:15.794863       1 kube.go:294] Starting kube subnet manager
I1019 03:16:16.995022       1 kube.go:138] Node controller sync successful
I1019 03:16:16.995104       1 main.go:235] Created subnet manager: Kubernetes Subnet Manager - k8s-worker4
I1019 03:16:16.995131       1 main.go:238] Installing signal handlers
I1019 03:16:16.995337       1 main.go:353] Found network config - Backend type: vxlan
I1019 03:16:16.995492       1 vxlan.go:120] VXLAN config: VNI=1 Port=0 GBP=false DirectRouting=false
E1019 03:16:16.996653       1 main.go:280] Error registering network: failed to configure interface flannel.1: failed to ensure address of interface flannel.1: link has incompatible addresses. Remove additional addresses and try again. &netlink.Vxlan{LinkAttrs:netlink.LinkAttrs{Index:16, MTU:1450, TxQLen:0, Name:"flannel.1", HardwareAddr:net.HardwareAddr{0x96, 0x73, 0x59, 0x4a, 0xa2, 0xd6}, Flags:0x13, RawFlags:0x11043, ParentIndex:0, MasterIndex:0, Namespace:interface {}(nil), Alias:"", Statistics:(*netlink.LinkStatistics)(0x13f340e4), Promisc:0, Xdp:(*netlink.LinkXdp)(0x14027100), EncapType:"ether", Protinfo:(*netlink.Protinfo)(nil), OperState:0x0}, VxlanId:1, VtepDevIndex:2, SrcAddr:net.IP{0xc0, 0xa8, 0x1, 0xf6}, Group:net.IP(nil), TTL:0, TOS:0, Learning:false, Proxy:false, RSC:false, L2miss:false, L3miss:false, UDPCSum:true, NoAge:false, GBP:false, Age:300, Limit:0, Port:8472, PortLow:0, PortHigh:0}
I1019 03:16:16.996768       1 main.go:333] Stopping shutdownHandler...

Here's the interfaces on the worker with the failing flannel pod:

$ ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether b8:27:eb:fa:0d:e4 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.246/24 brd 192.168.1.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::a775:2ad:bcca:1d44/64 scope link
       valid_lft forever preferred_lft forever
3: wlan0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast state DOWN group default qlen 1000
    link/ether b8:27:eb:af:58:b1 brd ff:ff:ff:ff:ff:ff
4: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
    link/ether 02:42:fc:53:bc:ad brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
       valid_lft forever preferred_lft forever
16: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default
    link/ether 96:73:59:4a:a2:d6 brd ff:ff:ff:ff:ff:ff
    inet 10.244.2.0/32 scope global flannel.1
       valid_lft forever preferred_lft forever
    inet 169.254.47.220/16 brd 169.254.255.255 scope global flannel.1
       valid_lft forever preferred_lft forever
    inet6 fe80::697a:866d:c96e:5849/64 scope link
       valid_lft forever preferred_lft forever

As with others, running sudo ip link delete flannel.1 resolves it at least temporarily.
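The recurring workaround can be wrapped in a small guard so the link is only deleted when it actually carries extra addresses. This is a sketch, not an official fix; the IP variable is only there so the counting logic can be exercised without root, and the delete itself still needs root on the node:

```shell
# Sketch: delete flannel.1 only when it carries more than one IPv4
# address; flannel recreates the link the next time the pod starts.
IP="${IP:-ip}"   # overridable for testing without root

flannel_ipv4_count() {
  # `ip -o -4 addr show` prints one line per IPv4 address on the link
  "$IP" -o -4 addr show dev flannel.1 2>/dev/null | grep -c 'inet '
}

if [ "$(flannel_ipv4_count)" -gt 1 ]; then
  echo "flannel.1 has multiple IPv4 addresses; deleting the link"
  "$IP" link delete flannel.1
fi
```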

Can we re-open this issue? Any logs I can grab next time to help debug?
