
Kubernetes IPv6 problem on Docker 17.x #531

Closed
nyren opened this Issue Jan 25, 2018 · 32 comments

nyren commented Jan 25, 2018

Hi, I am struggling with the following error when trying to deploy a Kubernetes 1.9.2 cluster with IPv6 networking.

kubelet[13358]: E0125 20:07:53.234698   13358 cni.go:259] Error adding network: failed to add IP addr {Version:6 Interface:0xc420014ae0 Address:{IP:fd00:1234:2000::13 Mask:ffffffffffffffff0000000000000000} Gateway:fd00:1234:2000::1} to "eth0": permission denied

The same error occurs on both the master (for kube-dns) and worker nodes (for any other pod). Deploying an IPv4 Kubernetes cluster with the CNI bridge driver works fine. I tried building the latest CNI plugins from master but still got the same error.

OS: CentOS 7.4.1708
Kubernetes: 1.9.2
CNI: 0.6.0
Docker: 17.12.0.ce-1.el7.centos

/etc/cni/net.d/10-bridge-v6.conf:

{
  "cniVersion": "0.3.0",
  "name": "mynet",
  "type": "bridge",
  "bridge": "cbr0",
  "isDefaultGateway": true,
  "ipMasq": false,
  "hairpinMode": true,
  "ipam": {
    "type": "host-local",
    "ranges": [
      [
        {
          "subnet": "fd00:1234:2000::/64",
          "gateway": "fd00:1234:2000::1"
        }
      ]
    ]
  }
}

If you have any ideas on things to try, it would be very much appreciated. Thanks!

squeed (Member) commented Jan 25, 2018

Interesting. I was able to add that network config manually using cnitool. Can you try using cnitool on your machine?

See https://github.com/containernetworking/cni/tree/master/cnitool for the basics.
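
For reference, cnitool is only a thin wrapper around libcni; doing the same thing from Go looks roughly like this (a sketch against a recent libcni API, whose signatures have varied between releases; the paths are the ones from this thread):

package main

import (
	"context"
	"fmt"
	"log"

	"github.com/containernetworking/cni/libcni"
)

func main() {
	// Load the same conf file kubelet uses, exec plugins from /opt/cni/bin.
	netconf, err := libcni.ConfFromFile("/etc/cni/net.d/10-bridge-v6.conf")
	if err != nil {
		log.Fatal(err)
	}
	cninet := libcni.NewCNIConfig([]string{"/opt/cni/bin"}, nil)

	rt := &libcni.RuntimeConf{
		ContainerID: "test",
		NetNS:       "/var/run/netns/testing", // pre-created, e.g. with ip netns add
		IfName:      "eth0",
	}
	result, err := cninet.AddNetwork(context.TODO(), netconf, rt)
	if err != nil {
		log.Fatal(err) // this is where "failed to add IP addr ..." would surface
	}
	fmt.Println(result)
}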

nyren (Author) commented Jan 25, 2018

Ah, cnitool, cool stuff :)

I was not able to solve the problem, but perhaps I got some more clues:

  • Ran CNI_PATH=/opt/cni/bin cnitool add mynet /var/run/netns/testing resulting in this error: failed to set bridge addr: could not set bridge's mac: invalid argument
  • Deleted the cbr0 and mynet0 interfaces
  • Re-ran cnitool and then it worked
  • Cleaned up the interfaces again and re-deployed Kubernetes (with kubeadm)
  • Same failed to add IP addr error as before :(
  • Ran kubeadm reset and stopped docker
  • Re-ran cnitool and got the set bridge addr ... error again

This was with the CNI plugins built from master (v0.6.0-58-g412b6d3).

squeed (Member) commented Jan 25, 2018

I think kubeadm is giving you older CNI plugins. However, I'm concerned that a newer CNI plugin isn't able to "adopt" an existing bridge. Hmm.

nyren (Author) commented Jan 26, 2018

Thank you for your help, cnitool was awesome while debugging this issue. I think I have found the cause now, and the problem is not within CNI but in Docker.

The problem is that the container created by Docker has the following sysctl settings (even though the host has disable_ipv6=0 for all interfaces):

/proc/sys/net/ipv6/conf/all/disable_ipv6: 1
/proc/sys/net/ipv6/conf/default/disable_ipv6: 1

This means that when CNI creates the interface in the container's network namespace it will get disable_ipv6=1. The IPv6 address assignment then fails (as it should), which gives the Error adding network: failed to add IP addr error.
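
To make the mechanism concrete: the address assignment boils down to a netlink call like the following (a sketch, not the actual plugin code, but the plugins do use the vishvananda/netlink package; interface and address taken from my config above):

package main

import (
	"log"

	"github.com/vishvananda/netlink"
)

func main() {
	// Run inside the container's network namespace.
	link, err := netlink.LinkByName("eth0")
	if err != nil {
		log.Fatal(err)
	}
	addr, err := netlink.ParseAddr("fd00:1234:2000::13/64")
	if err != nil {
		log.Fatal(err)
	}
	// With disable_ipv6=1 on the interface, the kernel rejects this with
	// "permission denied", the same error kubelet logs above.
	if err := netlink.AddrAdd(link, addr); err != nil {
		log.Fatal(err)
	}
}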

I must confess I did not run Docker with ipv6=true at first, but I have tried that now and unfortunately it made no difference. /proc/sys/net/ipv6/conf/default/disable_ipv6 is still 1 even when Docker runs with IPv6 enabled, i.e. any new interface created by anything other than Docker itself will have disable_ipv6=1 by default. The interface created by CNI is one such example.

Looks like there are some IPv6 regressions in Docker 17.x. Maybe the disable_ipv6 thing got introduced at the same time? I need to do some more digging.

A question: what do you think about having CNI set disable_ipv6=0 when creating a new interface designated for IPv6 use? At the very least it would prevent future container runtime defaults from causing this error again.

qrpike commented Jan 30, 2018

I am also getting this issue after upgrading to Kubernetes 1.9.x. I have IPv6 disabled at boot; these are the kinds of messages I am getting:

Jan 29 22:43:27 blade02 kubelet-wrapper[2062]: E0130 03:43:27.396405    2062 remote_runtime.go:92] RunPodSandbox from runtime service failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod "node-problem-detector-xqp5k_kube-system" network: open /proc/sys/net/ipv6/conf/eth0/accept_dad: no such file or directory
Jan 29 22:43:27 blade02 kubelet-wrapper[2062]: E0130 03:43:27.396456    2062 kuberuntime_sandbox.go:54] CreatePodSandbox for pod "node-problem-detector-xqp5k_kube-system(7d585c3f-fbee-11e7-b6ee-1eb535d7075a)" failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod "node-problem-detector-xqp5k_kube-system" network: open /proc/sys/net/ipv6/conf/eth0/accept_dad: no such file or directory
Jan 29 22:43:27 blade02 kubelet-wrapper[2062]: E0130 03:43:27.396485    2062 kuberuntime_manager.go:647] createPodSandbox for pod "node-problem-detector-xqp5k_kube-system(7d585c3f-fbee-11e7-b6ee-1eb535d7075a)" failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod "node-problem-detector-xqp5k_kube-system" network: open /proc/sys/net/ipv6/conf/eth0/accept_dad: no such file or directory
Jan 29 22:43:27 blade02 kubelet-wrapper[2062]: E0130 03:43:27.396551    2062 pod_workers.go:186] Error syncing pod 7d585c3f-fbee-11e7-b6ee-1eb535d7075a ("node-problem-detector-xqp5k_kube-system(7d585c3f-fbee-11e7-b6ee-1eb535d7075a)"), skipping: failed to "CreatePodSandbox" for "node-problem-detector-xqp5k_kube-system(7d585c3f-fbee-11e7-b6ee-1eb535d7075a)" with CreatePodSandboxError: "CreatePodSandbox for pod \"node-problem-detector-xqp5k_kube-system(7d585c3f-fbee-11e7-b6ee-1eb535d7075a)\" failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod \"node-problem-detector-xqp5k_kube-system\" network: open /proc/sys/net/ipv6/conf/eth0/accept_dad: no such file or directory"

Maybe CNI should check whether the file exists before trying to write to it?

nyren changed the title from "IPv6 CNI bridge error" to "Kubernetes IPv6 problem on Docker 17.x" on Jan 30, 2018

nyren (Author) commented Jan 30, 2018

I am also getting this issue after upgrading to Kubernetes 1.9.x. I have IPv6 disabled at boot; these are the kinds of messages I am getting:

[...]

Maybe CNI should check whether the file exists before trying to write to it?

I do not think this is the same problem I encountered, but thanks for chiming in nevertheless. In my case I want to use IPv6 but cannot, due to recent changes in Docker. I have updated the issue title now that I know more about the cause of the problem.

Regarding your issue, I think it should be fixed in the master branch, so it might be worth recompiling the CNI plugins from master. a124fb36e668aecd917d92aa7965a8332b3d5f74 should be the relevant commit; it changes the implementation to only set accept_dad if IPv6 addresses are present.
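
As I read it, the gist of that commit is a guard along these lines (a sketch, not the literal diff):

package main

import (
	"fmt"
	"io/ioutil"
	"net"
)

// setAcceptDadIfIPv6 only touches the accept_dad sysctl when an IPv6
// address is actually being configured, so hosts with IPv6 disabled at
// boot (where the sysctl file does not exist) are left alone.
func setAcceptDadIfIPv6(ifName string, addrs []*net.IPNet) error {
	hasV6 := false
	for _, a := range addrs {
		if a.IP.To4() == nil { // To4() is nil for IPv6 addresses
			hasV6 = true
			break
		}
	}
	if !hasV6 {
		return nil
	}
	path := fmt.Sprintf("/proc/sys/net/ipv6/conf/%s/accept_dad", ifName)
	return ioutil.WriteFile(path, []byte("0"), 0644)
}

func main() {
	_, ipnet, _ := net.ParseCIDR("fd00:1234:2000::13/64")
	if err := setAcceptDadIfIPv6("eth0", []*net.IPNet{ipnet}); err != nil {
		fmt.Println(err)
	}
}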

nyren (Author) commented Jan 30, 2018

Regarding the original issue, I wrote a small patch setting disable_ipv6 to 0, i.e. enabling IPv6, in the CNI plugins' pkg/ipam/ipam_linux.go:ConfigureIface(). This solved the issue of containers not starting in my IPv6 Kubernetes deployment.
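
The heart of the patch is only a few lines; roughly this (a sketch of my workaround, not the code as merged):

package main

import (
	"io/ioutil"
	"log"
	"path/filepath"
)

// enableIPv6 clears disable_ipv6 for one interface. It has to run while
// already inside the container's network namespace (as ConfigureIface
// does) and before the IPv6 address is added, undoing the "1" that
// Docker's default sysctls left behind.
func enableIPv6(ifName string) error {
	path := filepath.Join("/proc/sys/net/ipv6/conf", ifName, "disable_ipv6")
	return ioutil.WriteFile(path, []byte("0"), 0644)
}

func main() {
	if err := enableIPv6("eth0"); err != nil {
		log.Fatal(err)
	}
}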

However, I still had issues with the kube-dns pod, which kept restarting. It turned out to be the same root cause, but this time for the loopback lo interface. Docker also disables IPv6 for lo, which causes the health monitor for kube-dns to be unable to contact the service on [::1]:10053. Manually running the following command made kube-dns happy again:

ip netns exec <container-ns-name> sh -c 'echo 0 > /proc/sys/net/ipv6/conf/default/disable_ipv6'

abhijitherekar commented Jan 30, 2018

Hey nyren,

I am also hitting the same issue. I am glad that someone other than me has found it.

I also found that the pod has disable_ipv6=1 even though the host has disable_ipv6=0.

I then tried a workaround: getting into the pod netns and changing /proc/.../disable_ipv6 to 0, but that does not work, because the /proc filesystem in the pod is always read-only. How did you manage to work around it? Your patch should get an error saying /proc is read-only.

Please let me know.

Thanks,
Abhijit

bboreham (Member) commented Jan 31, 2018

Just wondering: has anyone having trouble here tried telling Docker they want to use IPv6? I.e. run the daemon with --ipv6

nyren (Author) commented Jan 31, 2018

Just wondering: has anyone having trouble here tried telling Docker they want to use IPv6? I.e. run the daemon with --ipv6

I have tried both with and without ipv6 "enabled" in the Docker daemon config file. Unfortunately it makes no difference, since Docker still sets /proc/sys/net/ipv6/conf/default/disable_ipv6 to 1, which means that any interface created by CNI will get disable_ipv6=1 as well.

The only thing that happens with ipv6=true in the Docker daemon config is that if Docker itself sets the IPv6 address of an interface it will set disable_ipv6=0 for it; any other interface (including lo) will still have disable_ipv6=1.

nyren (Author) commented Jan 31, 2018

I also found that the pod has disable_ipv6=1 even though the host has disable_ipv6=0.

I then tried a workaround: getting into the pod netns and changing /proc/.../disable_ipv6 to 0, but that does not work, because the /proc filesystem in the pod is always read-only. How did you manage to work around it? Your patch should get an error saying /proc is read-only.

Yes, you cannot change disable_ipv6 from "within" the container because those processes have had their capabilities dropped. Instead you need to enter only the container's network namespace and write to disable_ipv6 as root. Please see the ip netns exec ... command above for an example.

However, even if you do this, it is not possible to change disable_ipv6 in the small time gap between CNI creating the interface and CNI trying to set the IPv6 address.

I will try and upload a patch with a workaround in a couple of days.

abhijitherekar commented Feb 1, 2018

I also ran Docker with --ipv6, but Docker expects to create the interface itself and to assign the IPv6 address itself, which I don't want.

I want the CNI plugin to do the work of assigning the IP address.

Thanks

pmichali commented Feb 1, 2018

I see the same issue on Ubuntu 16.04 with Docker 17.12.0-ce. I'm going to try older versions, as this was working previously.

pmichali commented Feb 1, 2018

FYI: I downgraded to 17.03.2-ce and the IP address assignment is now working.

nyren (Author) commented Feb 1, 2018

Thanks, great to know. So not all 17.x releases were bad then.

I just tried spinning up a container on Docker 18.01.0-ce with ipv6=true and the issue still seems to be there, although loopback is OK now:

/proc/sys/net/ipv6/conf/all/disable_ipv6=1
/proc/sys/net/ipv6/conf/default/disable_ipv6=1

i.e. a new interface created by CNI would get disable_ipv6=1.

Btw, if you would like to run on a newer Docker release, please try the patch in the pull request above and see if it helps.

pmichali commented Feb 2, 2018

I heard that 17.09 should also work. Has anyone raised an issue with Docker about this regression?

bboreham (Member) commented Feb 2, 2018

The issue is linked above.
moby/moby#33099

SpComb commented Feb 8, 2018

I heard that 17.09 should also work. Has anyone raised an issue with Docker about this regression?

Testing with docker run --rm -it --net=none busybox sysctl -a | grep disable_ipv6 shows that Docker 17.06, 17.09 and 17.12 are all broken in this regard. 17.03 works, which corresponds with the IPv6 regression in moby/moby#33099.

Relevant commit in libnetwork that sets net.ipv6.conf.all.disable_ipv6=1: docker/libnetwork@947eb35#diff-f661b3057d4299c1f3c8a93ab19a81ff

Testing

17.03

$ docker info  -f '{{.ServerVersion}}'
17.03.2-ce
$ docker run --rm -it --net=none busybox sysctl -a | grep disable_ipv6
net.ipv6.conf.all.disable_ipv6 = 0
net.ipv6.conf.default.disable_ipv6 = 0
net.ipv6.conf.lo.disable_ipv6 = 0

17.06

$ docker info  -f '{{.ServerVersion}}'
17.06.2-ce
$ docker run --rm -it --net=none busybox sysctl -a | grep disable_ipv6
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1

17.09

$ docker info  -f '{{.ServerVersion}}'
17.09.1-ce
$ docker run --rm -it --net=none busybox sysctl -a | grep disable_ipv6
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1

17.12

$ docker info  -f '{{.ServerVersion}}'
17.12.0-ce
$ docker run --rm -it --net=none busybox sysctl -a | grep disable_ipv6
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1

nyren (Author) commented Feb 12, 2018

Thanks, great to know which Docker releases work and which do not. I can add the following as well:

18.01

$ docker info  -f '{{.ServerVersion}}'                                
18.01.0-ce
$ docker run --rm -it --net=none busybox sysctl -a | grep disable_ipv6
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1

Excellent that you found the Docker commit which introduced the issue.

I have updated the CNI plugins pull request with the workaround; I hope it will be merged.

nyren (Author) commented Feb 14, 2018

@abhijitherekar asked about how CNI can write the disable_ipv6 sysctl even though /proc/sys is read-only from "inside" the container.

What happens here is that CNI does not run "inside" the container (i.e. the CNI process has not dropped its capabilities) and only hooks into the network namespace of the container. Therefore /proc/sys in the container's network namespace will be writable by CNI.
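
In code, the pattern looks roughly like this (a sketch using the ns helper package from the plugins repo; the netns path is just an example):

package main

import (
	"io/ioutil"
	"log"

	"github.com/containernetworking/plugins/pkg/ns"
)

func main() {
	// Enter only the container's network namespace. The mount namespace
	// (and this process's capabilities) stay those of the host process,
	// so /proc/sys/net below shows the container's settings yet remains
	// writable.
	err := ns.WithNetNSPath("/var/run/netns/testing", func(_ ns.NetNS) error {
		return ioutil.WriteFile(
			"/proc/sys/net/ipv6/conf/default/disable_ipv6",
			[]byte("0"), 0644)
	})
	if err != nil {
		log.Fatal(err)
	}
}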

abhijitherekar commented Feb 14, 2018

@nyren I am a little new to this and have some basic questions. How do you check programmatically which namespace you are in?

I am currently having an issue with two IPv6 global addresses on my containers, which is due to autoconf=1 in my pod namespace. I am trying to disable it just before calling ipam.ConfigureIface, but it throws an error saying that /proc/sys is a read-only file system.

So I am trying to figure out why it failed. Please let me know.

euank commented Feb 20, 2018

@abhijitherekar You can check the network namespace of a given pid with readlink /proc/$pid/ns/net or use /proc/self/ns/net for whatever the current one is; the documentation in man 7 namespaces recommends this and has a lot more useful information related to working with namespaces.

The way CNI is run is quite similar to nsenter --net=/proc/$pid/ns/net /path/to/cni/plugin, which switches over the network namespace but keeps the mount namespace (and thus the host view of /sys/fs and the host view of CNI binaries). Typically, CNI actually uses namespace files in /var/run/netns/* which can exist independently of any given pid.
More information about this can be gleaned from the cnitool and scripts examples in this repository.
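
A small self-contained demonstration of that switch (a sketch; the netns path is hypothetical, and golang.org/x/sys/unix provides the raw setns call):

package main

import (
	"fmt"
	"log"
	"os"
	"runtime"

	"golang.org/x/sys/unix"
)

func main() {
	before, _ := os.Readlink("/proc/self/ns/net")
	fmt.Println("host netns:     ", before) // e.g. net:[4026531993]

	// setns(2) affects only the calling OS thread, so pin the goroutine.
	runtime.LockOSThread()
	defer runtime.UnlockOSThread()

	f, err := os.Open("/var/run/netns/testing") // hypothetical netns file
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()
	if err := unix.Setns(int(f.Fd()), unix.CLONE_NEWNET); err != nil {
		log.Fatal(err)
	}

	after, _ := os.Readlink("/proc/self/ns/net")
	fmt.Println("container netns:", after) // a different inode number
	// The mount namespace is untouched: the filesystem is still the
	// host's view, but /proc/sys/net now reflects the container.
}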

nyren (Author) commented Feb 21, 2018

CNI plugins v0.7.0 includes the disable_ipv6 sysctl workaround, so hopefully Kubernetes IPv6 will now be easier to get up and running on newer Docker releases. Thanks for the swift response from the CNI maintainers!

telmich commented Dec 22, 2018

For the next person running into this: yes, Docker 18.06 is also affected:

root@kube-node2:~# docker --version
Docker version 18.06.1-ce, build e68fc7a
root@kube-node2:~# docker run --rm -it --net=none busybox sysctl -a | grep disable_ipv6
Unable to find image 'busybox:latest' locally
latest: Pulling from library/busybox
90e01955edcd: Pulling fs layer
90e01955edcd: Verifying Checksum
90e01955edcd: Download complete
90e01955edcd: Pull complete
Digest: sha256:2a03a6059f21e150ae84b0973863609494aad70f0a80eaeb64bddd8d92465812
Status: Downloaded newer image for busybox:latest
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
root@kube-node2:~# 

wuerzelchen commented Jan 20, 2019

Docker 18.09 is on this list, as well (OS/Arch: linux/arm):
$ sudo docker info -f '{{.ServerVersion}}'
18.09.0
$ sudo docker run --rm -it --net=none busybox sysctl -a | grep disable_ipv6
Unable to find image 'busybox:latest' locally
latest: Pulling from library/busybox
c79064c22828: Pulling fs layer
c79064c22828: Verifying Checksum
c79064c22828: Download complete
c79064c22828: Pull complete
Digest: sha256:7964ad52e396a6e045c39b5a44438424ac52e12e4d5a25d94895f2058cb863a0
Status: Downloaded newer image for busybox:latest
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1

telmich commented Jan 20, 2019

@wuerzelchen Did you figure out any alternative way to get IPv6 running with k8s?

wuerzelchen commented Jan 20, 2019

Not yet. Docker itself runs with IPv6, but now I'm stuck with those permission denied issues.

wuerzelchen commented Feb 11, 2019

@telmich I'm back... did you upgrade CNI? I'm currently looking into my apt packages and see that kubernetes-cni is at v0.6.0-00 (not sure if that is the right place to check for the correct CNI version), and in this repo's master branch the most recent release is v0.6 as well. I assume that I somehow need to upgrade to a newer version, if this kubernetes-cni package is related to this repo. I'm currently quite lost, and it will take some effort to get where I want to.

apt-cache policy kubernetes-cni
kubernetes-cni:
  Installed: 0.6.0-00
  Candidate: 0.6.0-00
  Version table:
 *** 0.6.0-00 500
        500 https://apt.kubernetes.io kubernetes-xenial/main armhf Packages
        100 /var/lib/dpkg/status
     0.5.1-00 500
        500 https://apt.kubernetes.io kubernetes-xenial/main armhf Packages
     0.3.0.1-07a8a2-00 500
        500 https://apt.kubernetes.io kubernetes-xenial/main armhf Packages

bboreham (Member) commented Feb 12, 2019

@wuerzelchen this repo is vendored into Kubernetes as a copy of the Go code.
A package would be from the other repo https://github.com/containernetworking/plugins

telmich commented Feb 15, 2019

... so just for my understanding: when/how will 0.7 be available?

bboreham (Member) commented Feb 15, 2019

Releases are at https://github.com/containernetworking/plugins/releases; 0.7.4 is the latest.
The plugins are also packaged downstream, but you'd have to ask whoever does that for the specific downstream project.

wuerzelchen commented Feb 19, 2019

So, another try. Thank you @bboreham for the hint. I downloaded the arm tarball and extracted it to /opt/cni/bin:
sudo tar xvf cni-plugins-arm-v0.7.4.tgz -C /opt/cni/bin

pi@raspberrypi:~/dl $ ls -alh /opt/cni/bin/
total 44M
drwxrwxr-x 2 root root 4.0K Nov  8 13:43 .
drwxr-xr-x 3 root root 4.0K Jan  5 21:03 ..
-rwxr-xr-x 1 root root 3.5M Nov  8 13:42 bridge
-rwxr-xr-x 1 root root 8.5M Nov  8 13:43 dhcp
-rwxr-xr-x 1 root root 2.5M Nov  8 13:42 flannel
-rwxr-xr-x 1 root root 2.8M Nov  8 13:42 host-device
-rwxr-xr-x 1 root root 2.7M Nov  8 13:43 host-local
-rwxr-xr-x 1 root root 3.2M Nov  8 13:42 ipvlan
-rwxr-xr-x 1 root root 2.7M Nov  8 13:42 loopback
-rwxr-xr-x 1 root root 3.2M Nov  8 13:43 macvlan
-rwxr-xr-x 1 root root 3.1M Nov  8 13:42 portmap
-rwxr-xr-x 1 root root 3.5M Nov  8 13:43 ptp
-rwxr-xr-x 1 root root 2.3M Nov  8 13:43 sample
-rwxr-xr-x 1 root root 2.5M Nov  8 13:42 tuning
-rwxr-xr-x 1 root root 3.2M Nov  8 13:43 vlan

Then I restarted, and the same behavior persists:

pi@raspberrypi:~/dl $ sudo docker run --rm -it --net=none busybox sysctl -a | grep disable_ipv6
Unable to find image 'busybox:latest' locally
latest: Pulling from library/busybox
4f49904eebef: Pulling fs layer
4f49904eebef: Verifying Checksum
4f49904eebef: Download complete
4f49904eebef: Pull complete
Digest: sha256:061ca9704a714ee3e8b80523ec720c64f6209ad3f97c0ff7cb9ec7d19f15149f
Status: Downloaded newer image for busybox:latest
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1

What am I missing here? Documentation on how to debug such things on my own would be very much appreciated.

aojea added a commit to aojea/kind that referenced this issue Feb 25, 2019

Bump CNI version to support IPv6
Fixes Kubernetes IPv6 problem on Docker

containernetworking/cni#531

aojea added a commit to aojea/kind that referenced this issue Mar 4, 2019

Bump CNI version to support IPv6
Fixes Kubernetes IPv6 problem on Docker

containernetworking/cni#531