
pkg/k8s: detect k8s mode if env variable K8S_NODE_NAME is set #11021

Merged
merged 1 commit into from Apr 16, 2020

Conversation

aanm
Member

@aanm aanm commented Apr 16, 2020

Since the k8s service is only created after the container is started,
kubelet is not fast enough to set `KUBERNETES_SERVICE_HOST` or
`KUBERNETES_SERVICE_PORT` in the container, which can result in Cilium
exhibiting unexpected behaviors such as: panicking upon initialization;
using an autogenerated IPv4 address because Cilium won't detect which
podCIDR the k8s node has set; re-allocating the cilium_host router IP
address, which can cause network disruption; and failing to restore
endpoints, since their IPs do not belong to the autogenerated CIDR.

As all Cilium DaemonSets have the `K8S_NODE_NAME` environment variable set,
we can detect whether Cilium is running in k8s mode by also checking
whether this variable is set, instead of depending on
`KUBERNETES_SERVICE_HOST` or `KUBERNETES_SERVICE_PORT` for this detection.

More info: kubernetes/kubernetes#40973

Signed-off-by: André Martins <andre@cilium.io>

Do not depend on `KUBERNETES_SERVICE_HOST` nor
`KUBERNETES_SERVICE_PORT` environment variables to detect if cilium is running in k8s mode

@aanm aanm added kind/bug This is a bug in the Cilium logic. pending-review priority/high This is considered vital to an upcoming release. release-note/bug This PR fixes an issue in a previous release of Cilium. labels Apr 16, 2020
@aanm aanm requested a review from a team April 16, 2020 15:39
@maintainer-s-little-helper maintainer-s-little-helper bot added this to Needs backport from master in 1.7.3 Apr 16, 2020
@maintainer-s-little-helper maintainer-s-little-helper bot added this to Needs backport from master in 1.6.9 Apr 16, 2020
@maintainer-s-little-helper maintainer-s-little-helper bot added this to In progress in 1.8.0 Apr 16, 2020
@aanm
Member Author

aanm commented Apr 16, 2020

test-me-please

@coveralls

Coverage Status

Coverage decreased (-0.005%) to 46.772% when pulling 97827ab on pr/add-k8s-mode-enabled-detector into e92fd24 on master.

@aanm aanm merged commit 604dab4 into master Apr 16, 2020
@aanm aanm deleted the pr/add-k8s-mode-enabled-detector branch April 16, 2020 18:15
1.8.0 automation moved this from In progress to Merged Apr 16, 2020
@joestringer
Member

joestringer commented Apr 16, 2020

This broke the dev VM for developers like @jrajahalme and me who aren't running k8s there; a fix will be out soon.

@aanm
Copy link
Member Author

aanm commented Apr 17, 2020

@joestringer @jrajahalme how? The /etc/cilium/sysconfig file contains K8S_NODE_NAME=k8s1 — ah, not running. But still, if you are not running k8s, why would Cilium detect that it was running with k8s?

raybejjani added a commit to raybejjani/cilium that referenced this pull request Apr 17, 2020
…t k8s

We've seen panics where it seems k8s isn't set up correctly but
CRD-related operations occur and segfault. This happens when the
kubernetes service is not ready by the time cilium starts up, so cilium
misses the KUBERNETES_SERVICE_{HOST,PORT} settings and ends up
misconfigured.
See kubernetes/kubernetes#40973
See cilium#11021

Signed-off-by: Ray Bejjani <ray@isovalent.com>
@joestringer
Member

We unconditionally write the hostname into the sysconfig file, for example K8S_NODE_NAME=runtime1. @jrajahalme said he was going to send a fix but I didn't look to see if it was out yet.

pchaigno pushed a commit that referenced this pull request Apr 21, 2020
jrajahalme added a commit that referenced this pull request Apr 21, 2020
Fixes: #11021
Signed-off-by: Jarno Rajahalme <jarno@covalent.io>
christarazi pushed a commit that referenced this pull request Apr 22, 2020
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Needs backport from master to Backport pending to v1.7 in 1.7.3 Apr 22, 2020
jrfastab pushed a commit that referenced this pull request Apr 23, 2020
jrfastab pushed a commit that referenced this pull request Apr 28, 2020
joestringer pushed a commit that referenced this pull request Apr 29, 2020
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Backport pending to v1.7 to Backport done to v1.7 in 1.7.3 Apr 29, 2020
christarazi pushed a commit that referenced this pull request Apr 30, 2020
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Needs backport from master to Backport pending to v1.6 in 1.6.9 Apr 30, 2020
joestringer pushed a commit that referenced this pull request May 4, 2020
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Backport pending to v1.6 to Backport done to v1.6 in 1.6.9 May 13, 2020
pchaigno pushed a commit that referenced this pull request Oct 5, 2020
[ upstream commit fff6d6c ]

Fixes: #11021
Signed-off-by: Jarno Rajahalme <jarno@covalent.io>
Signed-off-by: Paul Chaignon <paul@cilium.io>
gandro pushed a commit that referenced this pull request Oct 6, 2020
Labels
kind/bug This is a bug in the Cilium logic. priority/high This is considered vital to an upcoming release. release-note/bug This PR fixes an issue in a previous release of Cilium.
Projects
No open projects
1.6.9
Backport done to v1.6
1.7.3
Backport done to v1.7
1.8.0
Merged

6 participants