Kubernetes 1.12 and flannel does not work out of the box #1044

Open
outcoldman opened this Issue Sep 28, 2018 · 21 comments

@outcoldman
Contributor

outcoldman commented Sep 28, 2018

Seems like new behavior with kubeadm: after I create a master, I see two taints on the master node:

Taints:             node-role.kubernetes.io/master:NoSchedule
                    node.kubernetes.io/not-ready:NoSchedule

But https://raw.githubusercontent.com/coreos/flannel/v0.10.0/Documentation/kube-flannel.yml has a toleration only for

- key: node-role.kubernetes.io/master
  operator: Exists
  effect: NoSchedule

I added a toleration to kube-flannel.yml to solve the issue:

      - key: node-role.kubernetes.io/master
        operator: Exists
        effect: NoSchedule
      - key: node.kubernetes.io/not-ready
        operator: Exists
        effect: NoSchedule
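
For context, these tolerations sit under `spec.template.spec` of the flannel DaemonSet; a minimal sketch of the surrounding structure (most fields omitted; the DaemonSet name varies by manifest version, e.g. `kube-flannel-ds` or `kube-flannel-ds-amd64`):

```
# Sketch only: abbreviated, structure follows the upstream kube-flannel.yml
kind: DaemonSet
metadata:
  name: kube-flannel-ds        # name may differ per manifest version
  namespace: kube-system
spec:
  template:
    spec:
      tolerations:
        - key: node-role.kubernetes.io/master
          operator: Exists
          effect: NoSchedule
        - key: node.kubernetes.io/not-ready
          operator: Exists
          effect: NoSchedule
```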

Expected Behavior

The docs should work with flannel out of the box
https://kubernetes.io/docs/setup/independent/create-cluster-kubeadm/

Current Behavior

Possible Solution

Maybe instead it should use a toleration without a key?

tolerations:
  - effect: NoSchedule
    operator: Exists

Steps to Reproduce (for bugs)

  1. Bootstrap master node with kubeadm
  2. Apply https://raw.githubusercontent.com/coreos/flannel/v0.10.0/Documentation/kube-flannel.yml as suggested in the docs.
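
The steps above as commands (a sketch; the `--pod-network-cidr` value is flannel's default pod network, which is what the kubeadm docs suggest for flannel):

```
# Sketch of the repro on a fresh master
kubeadm init --pod-network-cidr=10.244.0.0/16
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/v0.10.0/Documentation/kube-flannel.yml
# The node.kubernetes.io/not-ready:NoSchedule taint never clears, because
# flannel is the thing that would make the node Ready:
kubectl describe node "$(hostname)" | grep -A1 Taints
```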

Context

Your Environment

  • Flannel version: v0.10.0
  • Backend used (e.g. vxlan or udp):
  • Etcd version:
  • Kubernetes version (if used): 1.12
  • Operating System and version: Linux master1 4.4.0-134-generic #160-Ubuntu SMP Wed Aug 15 14:58:00 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux, "Ubuntu 16.04.5 LTS"
  • Link to your project (optional):
@geerlingguy

geerlingguy commented Sep 28, 2018

I can confirm as well—on 1.11.3 the configuration applies correctly. On 1.12.0 it does not.

@Balonno

Balonno commented Sep 28, 2018

Using the toleration without a key worked for me. Would this be the solution?

@caseydavenport

caseydavenport commented Sep 28, 2018

> Using the toleration without a key worked for me. Would this be the solution?

That sounds fine to me - flannel should probably tolerate all NoSchedule taints, since it's a critical piece of infrastructure.

Anyone want to submit a PR?

@outcoldman

Contributor

outcoldman commented Sep 28, 2018

@caseydavenport I have submitted PR against master https://github.com/coreos/flannel/pull/1045/files

But it would be good to have the same fix for the tag v0.10.0, considering that in a lot of places there is a reference to this path https://raw.githubusercontent.com/coreos/flannel/v0.10.0/Documentation/kube-flannel.yml

Considering that this is just a configuration change, maybe make a release v0.10.1 and update the Kubernetes documentation?

@jmyung

jmyung commented Sep 29, 2018

thanks @outcoldman. it helps :)

alanpeng added a commit to wise2c-devops/breeze that referenced this issue Sep 30, 2018

alanpeng added a commit to wise2c-devops/breeze that referenced this issue Sep 30, 2018

schu added a commit to schu/kubedee that referenced this issue Sep 30, 2018

manifests/kube-flannel: ignore all taints (hotfix)
There seems to be an issue and deadlock with Flannel on v1.12 clusters
where Flannel pods don't start on unready nodes and nodes don't become
ready w/o Flannel / container networking.

Issue upstream, albeit with kubeadm:

coreos/flannel#1044

Follow up on commit or revert.
@adhipati-blambangan

adhipati-blambangan commented Oct 1, 2018

thanks @outcoldman ! it works like a charm. ;)

@mauilion

mauilion commented Oct 1, 2018

Flannel should probably set

tolerations:
        - operator: Exists

as the default tolerations set. This will ensure that the flannel ds tolerates all taints.
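
For a cluster where the manifest is already applied, the same default could be tried in place with a JSON patch (a sketch; the DaemonSet name is taken from the describe output later in this thread and may differ per manifest version):

```
# Replace the running DaemonSet's tolerations with a single tolerate-everything entry
kubectl -n kube-system patch daemonset kube-flannel-ds-amd64 --type=json \
  -p='[{"op": "replace", "path": "/spec/template/spec/tolerations", "value": [{"operator": "Exists"}]}]'
```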

@ReSearchITEng

ReSearchITEng commented Oct 5, 2018

For anyone willing to test the flannel fix for 1.12:
kubectl -n kube-system apply -f https://raw.githubusercontent.com/coreos/flannel/bc79dd1505b0c8681ece4de4c0d86c5cd2643275/Documentation/kube-flannel.yml

@NerdyShawn

NerdyShawn commented Oct 5, 2018

> For anyone willing to test the flannel fix for 1.12 ,
> kubeadm -n kube-system apply -f https://raw.githubusercontent.com/coreos/flannel/bc79dd1505b0c8681ece4de4c0d86c5cd2643275/Documentation/kube-flannel.yml

Trying on a Pi 2 B+ master:

```
HypriotOS/armv7: root@piNode01 in ~
$ kubeadm -n kube-system apply -f https://raw.githubusercontent.com/coreos/flannel/bc79dd1505b0c8681ece4de4c0d86c5cd2643275/Documentation/kube-flannel.yml
Error: unknown command "apply" for "kubeadm"
Run 'kubeadm --help' for usage.
error: unknown command "apply" for "kubeadm"
```

@cablespaghetti

cablespaghetti commented Oct 5, 2018

@rberg2

rberg2 commented Oct 5, 2018

@NerdyShawn

NerdyShawn commented Oct 5, 2018

So that got me closer, but still no dice. Here is the docker output; the apiserver container seems no bueno. Sorry, I'm struggling with text formatting, so here is the screenshot.

[screenshot: kubectl apply output on 1.12]

@cablespaghetti

cablespaghetti commented Oct 5, 2018

Hi @NerdyShawn,

I don't think you've got your kubectl configured correctly to connect to your cluster. As it seems like @rberg2 has managed to get this working, maybe it would be good to continue this on one of the support channels like slack rather than this issue.

@ReSearchITEng

ReSearchITEng commented Oct 5, 2018

Sorry, it was a typo, it's kubectl.

For those interested, k8s 1.12 deployment with all the goodies (ingress, dashboard, optional vsphere*, etc) automated with ansible and maintained here: github.com/ReSearchITEng/kubeadm-playbook/
The above has been scripted there as well.

@tallaxes

tallaxes commented Oct 6, 2018

@ReSearchITEng, confirmed working (1.12.1).
The link to the Ansible playbook is broken, though.

@hegdedarsh

hegdedarsh commented Oct 7, 2018

Hello,
Even with the tolerations it still fails; I used the below link to run flannel:

https://raw.githubusercontent.com/coreos/flannel/bc79dd1505b0c8681ece4de4c0d86c5cd2643275/Documentation/kube-flannel.yml

Please find the output of the pods:-

```
[user@darshan-p-hegde-89ca8c531 ~]$ kubectl get pods -n kube-system
NAME                                                                READY   STATUS              RESTARTS   AGE
coredns-576cbf47c7-9r27x                                            0/1     ContainerCreating   0          6m
coredns-576cbf47c7-qc4tm                                            0/1     ContainerCreating   0          6m
etcd-darshan-p-hegde-89ca8c531.mylabserver.com                      1/1     Running             0          4m54s
kube-apiserver-darshan-p-hegde-89ca8c531.mylabserver.com            1/1     Running             0          5m2s
kube-controller-manager-darshan-p-hegde-89ca8c531.mylabserver.com   1/1     Running             0          5m2s
kube-flannel-ds-amd64-gm5z7                                         0/1     CrashLoopBackOff    5          4m56s
kube-proxy-mbtcj                                                    1/1     Running             0          6m
kube-scheduler-darshan-p-hegde-89ca8c531.mylabserver.com            1/1     Running             0          5m13s
```

I have described the flannel pod and the output is below:-

```
Name:               kube-flannel-ds-amd64-gm5z7
Namespace:          kube-system
Priority:           0
PriorityClassName:  <none>
Node:               darshan-p-hegde-89ca8c531.mylabserver.com/172.31.42.12
Start Time:         Sun, 07 Oct 2018 06:37:31 +0000
Labels:             app=flannel
                    controller-revision-hash=6697bf5fc6
                    pod-template-generation=1
                    tier=node
Annotations:        <none>
Status:             Running
IP:                 172.31.42.12
Controlled By:      DaemonSet/kube-flannel-ds-amd64
Init Containers:
  install-cni:
    Container ID:  docker://b085e4a7d80b26730dc795d4a72b8a278ddc4ba71e5c463bfcd0172b793de349
    Image:         quay.io/coreos/flannel:v0.10.0-amd64
    Image ID:      docker-pullable://quay.io/coreos/flannel@sha256:88f2b4d96fae34bfff3d46293f7f18d1f9f3ca026b4a4d288f28347fcb6580ac
    Port:          <none>
    Host Port:     <none>
    Command:
      cp
    Args:
      -f
      /etc/kube-flannel/cni-conf.json
      /etc/cni/net.d/10-flannel.conflist
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Sun, 07 Oct 2018 06:37:33 +0000
      Finished:     Sun, 07 Oct 2018 06:37:33 +0000
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /etc/cni/net.d from cni (rw)
      /etc/kube-flannel/ from flannel-cfg (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from flannel-token-llwn4 (ro)
Containers:
  kube-flannel:
    Container ID:  docker://a8096a56009a0566b53e4b0aac09430b75120979e63dbe32eb8ed91053666a77
    Image:         quay.io/coreos/flannel:v0.10.0-amd64
    Image ID:      docker-pullable://quay.io/coreos/flannel@sha256:88f2b4d96fae34bfff3d46293f7f18d1f9f3ca026b4a4d288f28347fcb6580ac
    Port:          <none>
    Host Port:     <none>
    Command:
      /opt/bin/flanneld
    Args:
      --ip-masq
      --kube-subnet-mgr
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Sun, 07 Oct 2018 06:43:46 +0000
      Finished:     Sun, 07 Oct 2018 06:43:48 +0000
    Ready:          False
    Restart Count:  6
    Limits:
      cpu:     100m
      memory:  50Mi
    Requests:
      cpu:     100m
      memory:  50Mi
    Environment:
      POD_NAME:       kube-flannel-ds-amd64-gm5z7 (v1:metadata.name)
      POD_NAMESPACE:  kube-system (v1:metadata.namespace)
    Mounts:
      /etc/kube-flannel/ from flannel-cfg (rw)
      /run from run (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from flannel-token-llwn4 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  run:
    Type:          HostPath (bare host directory volume)
    Path:          /run
    HostPathType:
  cni:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/cni/net.d
    HostPathType:
  flannel-cfg:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      kube-flannel-cfg
    Optional:  false
  flannel-token-llwn4:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  flannel-token-llwn4
    Optional:    false
QoS Class:       Guaranteed
Node-Selectors:  beta.kubernetes.io/arch=amd64
Tolerations:     :NoSchedule
                 node.kubernetes.io/disk-pressure:NoSchedule
                 node.kubernetes.io/memory-pressure:NoSchedule
                 node.kubernetes.io/network-unavailable:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute
                 node.kubernetes.io/unreachable:NoExecute
                 node.kubernetes.io/unschedulable:NoSchedule
Events:
  Type     Reason     Age                    From                                                Message
  ----     ------     ----                   ----                                                -------
  Normal   Scheduled  6m57s                  default-scheduler                                   Successfully assigned kube-system/kube-flannel-ds-amd64-gm5z7 to darshan-p-hegde-89ca8c531.mylabserver.com
  Normal   Pulling    6m57s                  kubelet, darshan-p-hegde-89ca8c531.mylabserver.com  pulling image "quay.io/coreos/flannel:v0.10.0-amd64"
  Normal   Pulled     6m55s                  kubelet, darshan-p-hegde-89ca8c531.mylabserver.com  Successfully pulled image "quay.io/coreos/flannel:v0.10.0-amd64"
  Normal   Created    6m55s                  kubelet, darshan-p-hegde-89ca8c531.mylabserver.com  Created container
  Normal   Started    6m55s                  kubelet, darshan-p-hegde-89ca8c531.mylabserver.com  Started container
  Normal   Started    6m5s (x4 over 6m53s)   kubelet, darshan-p-hegde-89ca8c531.mylabserver.com  Started container
  Normal   Pulled     5m11s (x5 over 6m54s)  kubelet, darshan-p-hegde-89ca8c531.mylabserver.com  Container image "quay.io/coreos/flannel:v0.10.0-amd64" already present on machine
  Normal   Created    5m11s (x5 over 6m53s)  kubelet, darshan-p-hegde-89ca8c531.mylabserver.com  Created container
  Warning  BackOff    105s (x23 over 6m48s)  kubelet, darshan-p-hegde-89ca8c531.mylabserver.com  Back-off restarting failed container
```

Please find the output of the coredns pods:-

```
Warning  FailedCreatePodSandBox  7m50s  kubelet, darshan-p-hegde-89ca8c531.mylabserver.com  Failed create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "5f6770d9dfcb53738a0dd428b86e815d4d85e9b71a76d17b10b1f764f102fb61" network for pod "coredns-576cbf47c7-9r27x": NetworkPlugin cni failed to set up pod "coredns-576cbf47c7-9r27x_kube-system" network: open /run/flannel/subnet.env: no such file or directory
Warning  FailedCreatePodSandBox  7m49s  kubelet, darshan-p-hegde-89ca8c531.mylabserver.com  Failed create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "009e9e0099f993086300649a89995a28a0fdf1a128863f7a71e3ff1973788c26" network for pod "coredns-576cbf47c7-9r27x": NetworkPlugin cni failed to set up pod "coredns-576cbf47c7-9r27x_kube-system" network: open /run/flannel/subnet.env: no such file or directory
Warning  FailedCreatePodSandBox  7m48s  kubelet, darshan-p-hegde-89ca8c531.mylabserver.com  Failed create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "ea0ddaf5c411dd026cfd23366e49424526b7cc547652ca262a346f4c800f0c04" network for pod "coredns-576cbf47c7-9r27x": NetworkPlugin cni failed to set up pod "coredns-576cbf47c7-9r27x_kube-system" network: open /run/flannel/subnet.env: no such file or directory
```
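
For what it's worth, `/run/flannel/subnet.env` is written by `flanneld` only after it starts and obtains a subnet lease, so these coredns errors are a downstream symptom of the crash-looping flannel pod. A sketch of how to get at the root cause (pod name taken from the output above):

```
# The previous (crashed) container's logs usually state why flanneld exited,
# e.g. a missing pod CIDR on the node
kubectl -n kube-system logs kube-flannel-ds-amd64-gm5z7 -c kube-flannel --previous
# On the host, the file only appears once flanneld has a lease
cat /run/flannel/subnet.env
```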

@outcoldman

Contributor

outcoldman commented Oct 7, 2018

@hegdedarsh it's possible that this is a different problem, but I would suggest using the released version https://raw.githubusercontent.com/coreos/flannel/v0.10.0/Documentation/kube-flannel.yml, modifying the tolerations, and giving it a try.

@pkeuter

pkeuter commented Oct 11, 2018

This fixes the issue for me. Thanks for the PR!

@telecodani

telecodani commented Oct 11, 2018

Adding the toleration in the Flannel yaml works for me also. Tested on v1.12.1 Kubernetes. Thanks.

@benn0r

benn0r commented Oct 11, 2018

I am using the yaml file recommended in this issue, but for me NodePort and externalIPs no longer work unless the request comes from the same node the pods are located on. If I try to telnet via the master IP I get a timeout.
This started with the upgrade to Kubernetes 1.12.

Is this a problem with flannel?

@sarlacpit

sarlacpit commented Oct 11, 2018

I am on a fresh install of k8s 1.12 and have just tried downloading v0.10; the tolerations seem to exist already. So I applied the yml:

      - key: node-role.kubernetes.io/master
        operator: Exists
        effect: NoSchedule

It tried creating the flannel pod, but it came up with 'Error' and eventually "CrashLoopBackOff".
I'm still very new to k8s; let me know what debug info I can provide.
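
A common cause of this particular crash loop is the node having no pod CIDR assigned, which happens when `kubeadm init` was run without `--pod-network-cidr` (flanneld in `--kube-subnet-mgr` mode exits in that case). A sketch of how to check:

```
# Should print a CIDR such as 10.244.0.0/24 for each node
kubectl get nodes -o jsonpath='{.items[*].spec.podCIDR}'
# The previous container's logs state the exact exit reason
kubectl -n kube-system get pods -l app=flannel
kubectl -n kube-system logs <flannel-pod-name> --previous
```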
