Cilium no longer runs on Bottlerocket Linux as of 1.11.3 #19256

Closed
2 tasks done
alex-berger opened this issue Mar 29, 2022 · 9 comments
Assignees
Labels
area/helm Impacts helm charts and user deployment experience integration/cloud Related to integration with cloud environments such as AKS, EKS, GKE, etc. kind/bug This is a bug in the Cilium logic. kind/community-report This was reported by a user in the Cilium community, eg via Slack. sig/agent Cilium agent related.

Comments

@alex-berger
Contributor

alex-berger commented Mar 29, 2022

Is there an existing issue for this?

  • I have searched the existing issues

What happened?

With the changes introduced in #18897 the Cilium agent no longer starts on Bottlerocket Linux nodes.

This is because the node-init DaemonSet cannot write the timestamp to the bootstrapFile ("/tmp/cilium-bootstrap-time") in https://github.com/cilium/cilium/blob/v1.11.3/install/kubernetes/cilium/files/nodeinit/startup.bash#L123 as there is no date command available on Bottlerocket Linux. Instead, it writes an empty bootstrapFile, which in turn causes the test for a non-empty bootstrapFile at https://github.com/cilium/cilium/blob/v1.11.3/install/kubernetes/cilium/templates/cilium-agent/daemonset.yaml#L386 to fail.

As a result, the agent waits forever for the bootstrapFile to become non-empty, which never happens.
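The failure mode can be reproduced outside the cluster. The following is a minimal sketch, not the actual code from startup.bash or daemonset.yaml: the function names, the temporary file, and the use of `[ -s ... ]` as the non-empty check are assumptions made for illustration. It shows that a shell redirection creates the file even when the command itself is missing, so the file exists but stays empty.

```shell
#!/usr/bin/env bash
# Hypothetical reproduction; names and structure are illustrative only.

bootstrap_file="$(mktemp)"   # stand-in for /tmp/cilium-bootstrap-time

# Write a timestamp using the given command name. The redirection creates
# (or truncates) the file before the command is looked up, so a missing
# command still leaves an empty file behind.
write_bootstrap_time() {
  local date_cmd="$1"
  "$date_cmd" > "$bootstrap_file" 2>/dev/null
}

# Assumed shape of the agent-side check: only a non-empty file counts.
node_init_done() {
  [ -s "$bootstrap_file" ]
}

# Case 1: simulate a host without a date command.
write_bootstrap_time "no-such-date-command"
if node_init_done; then
  result_missing="ready"
else
  result_missing="Waiting on node-init to run..."
fi
echo "$result_missing"

# Case 2: a host where date is available.
write_bootstrap_time date
if node_init_done; then
  result_present="ready"
else
  result_present="waiting"
fi
echo "$result_present"
```

In the first case the check never succeeds, which matches the repeated "Waiting on node-init to run..." lines in the log output below.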

Cilium Version

1.11.3

Kernel Version

5.10.102

Kubernetes Version

1.21

Sysdump

No response

Relevant log output

Waiting on node-init to run...
Waiting on node-init to run...
Waiting on node-init to run...
Waiting on node-init to run...
Waiting on node-init to run...
...

Anything else?

This looks related to #15393, and the same workaround applies: disabling the node-init DaemonSet.

Code of Conduct

  • I agree to follow this project's Code of Conduct
@alex-berger alex-berger added kind/bug This is a bug in the Cilium logic. needs/triage This issue requires triaging to establish severity and next steps. labels Mar 29, 2022
@pchaigno pchaigno added the kind/community-report This was reported by a user in the Cilium community, eg via Slack. label Mar 31, 2022
@pchaigno
Member

cc @aanm

@aanm
Member

aanm commented Mar 31, 2022

@alex-berger what's the output of ls -la /bin from a Bottlerocket Linux node?

@aanm aanm self-assigned this Mar 31, 2022
@alex-berger
Contributor Author

@aanm Bottlerocket has no tools on the host itself, not even ls, nor is there a shell. Everything runs inside a container on Bottlerocket.

@ubaniabalogun

Hey, is this now a known issue? I'm experiencing errors installing Cilium in a Bottlerocket EKS node group as well, and would like to know whether I should debug harder or stop debugging because it's broken. Thanks.

@DamiaPoquet

I'm experiencing the same issue. @aanm, is it already a confirmed bug?

@aanm
Member

aanm commented May 25, 2022

@ubaniabalogun @DamiaPoquet correct, this is a confirmed bug.

@carlosjgp

carlosjgp commented Aug 19, 2022

You don't need node-init (it mounts the BPF filesystem) when using Bottlerocket, so just disable it:

nodeinit:
  enabled: false # false when using bottlerocket

Edit: for completeness, see bottlerocket-os/bottlerocket#1405 (comment)

@christarazi
Member

We expect to ship a fix for this in v1.13.2.

@christarazi christarazi added area/helm Impacts helm charts and user deployment experience integration/cloud Related to integration with cloud environments such as AKS, EKS, GKE, etc. and removed needs/triage This issue requires triaging to establish severity and next steps. labels Mar 24, 2023
@nebril
Member

nebril commented May 11, 2023

This has been fixed as of 1.13.2. Please refer to this comment for details: #15393 (comment)

@nebril nebril closed this as completed May 11, 2023
@christarazi christarazi added the sig/agent Cilium agent related. label May 23, 2023

8 participants