nodeinit pods failing in 1.15.5 #32674

Open
2 of 3 tasks
dlahn opened this issue May 22, 2024 · 8 comments
Labels
  • info-completed: The GH issue has received a reply from the author
  • kind/bug: This is a bug in the Cilium logic.
  • kind/community-report: This was reported by a user in the Cilium community, eg via Slack.
  • kind/regression: This functionality worked fine before, but was broken in a newer release of Cilium.
  • sig/agent: Cilium agent related.

Comments

@dlahn

dlahn commented May 22, 2024

Is there an existing issue for this?

  • I have searched the existing issues

What happened?

We have recently tried to upgrade to 1.15.5 and the latest pre-release, but our nodeinit pods are failing with the following error:

nsenter: cannot open /proc/1/ns/ipc: Permission denied
!!! startup-script failed! exit code '1'

Reverting to 1.15.4 resolves the issue.

Cilium Version

1.15.5

Kernel Version

.

Kubernetes Version

v1.30.0-gke.145700

Regression

1.15.4

Sysdump

No response

Relevant log output

nsenter: cannot open /proc/1/ns/ipc: Permission denied
!!! startup-script failed! exit code '1'

Anything else?

No response

Cilium Users Document

  • Are you a user of Cilium? Please add yourself to the Users doc

Code of Conduct

  • I agree to follow this project's Code of Conduct
dlahn added the kind/bug (This is a bug in the Cilium logic.), kind/community-report (This was reported by a user in the Cilium community, eg via Slack.), and needs/triage (This issue requires triaging to establish severity and next steps.) labels on May 22, 2024
@dlahn
Author

dlahn commented May 22, 2024

lmb added the kind/regression (This functionality worked fine before, but was broken in a newer release of Cilium.) and sig/agent (Cilium agent related.) labels on May 23, 2024
@lmb
Contributor

lmb commented May 23, 2024

Do you have any custom helm config related to the nodeinit pod?

lmb added the need-more-info (More information is required to further debug or fix the issue.) label on May 23, 2024
@dlahn
Author

dlahn commented May 23, 2024

@lmb

  nodeinit:
    enabled: true
    reconfigureKubelet: true
    removeCbrBridge: true

github-actions bot added the info-completed (The GH issue has received a reply from the author) label and removed the need-more-info (More information is required to further debug or fix the issue.) label on May 23, 2024
@aanm
Member

aanm commented May 23, 2024

@dlahn can you provide the steps you used for both 1.15.4 and 1.15.5? Thank you

aanm added the need-more-info (More information is required to further debug or fix the issue.) label and removed the info-completed (The GH issue has received a reply from the author) label on May 23, 2024
@dlahn
Author

dlahn commented May 23, 2024

@aanm I think it may have happened here: https://github.com/cilium/cilium/pull/31641/files#diff-0ea42ad21164b19bec1732225e254d3096d1e4040481c00053669287d81015fe. So I misspoke, and I think the last working version was 1.15.3. If we simply upgrade the Helm chart to the newest version, we receive these errors.

The only way to get 1.15.4 to work is to add this to the nodeinit section:

  nodeinit:
    enabled: true
    reconfigureKubelet: true
    removeCbrBridge: true
    image:
      tag: "62093c5c233ea914bfa26a10ba41f8780d9b737f"

However, the same override doesn't work in 1.15.5.

github-actions bot added the info-completed (The GH issue has received a reply from the author) label and removed the need-more-info (More information is required to further debug or fix the issue.) label on May 23, 2024
@dlahn
Author

dlahn commented May 29, 2024

Any ideas here?

@jbmolle

jbmolle commented Jun 4, 2024

Hi!
I raised this issue on the Kubernetes GitHub: kubernetes/kubernetes#125069
I don't know if your error is related to that, but I couldn't start Cilium with version 1.15.5 either, because the pod annotations were removed and replaced by an appArmorProfile of type Unconfined.
But the Unconfined appArmorProfile doesn't work for me with containerd. So if you also use containerd, you can try re-adding the annotations as they were in 1.15.4:
container.apparmor.security.beta.kubernetes.io/cilium-agent: "unconfined"
container.apparmor.security.beta.kubernetes.io/clean-cilium-state: "unconfined"
container.apparmor.security.beta.kubernetes.io/mount-cgroup: "unconfined"
container.apparmor.security.beta.kubernetes.io/apply-sysctl-overwrites: "unconfined"
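
For anyone comparing the two forms, here is a rough sketch of how they look on a plain Pod manifest. This is not taken from the Cilium chart: the pod names and the image are placeholders, and only the `cilium-agent` container is shown. The first document uses the older per-container annotation, the second uses the `appArmorProfile` field in the container `securityContext`.

```yaml
# Sketch only: pod names and image are hypothetical placeholders.
# Form 1: annotation-based, keyed by container name (as in 1.15.4).
apiVersion: v1
kind: Pod
metadata:
  name: apparmor-annotation-example
  annotations:
    container.apparmor.security.beta.kubernetes.io/cilium-agent: "unconfined"
spec:
  containers:
    - name: cilium-agent
      image: example.invalid/placeholder:latest
---
# Form 2: field-based, set on the container's securityContext (as in 1.15.5).
apiVersion: v1
kind: Pod
metadata:
  name: apparmor-field-example
spec:
  containers:
    - name: cilium-agent
      image: example.invalid/placeholder:latest
      securityContext:
        appArmorProfile:
          type: Unconfined
```

The annotation key ends with the container name, so each container that needs to run unconfined gets its own annotation.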

@dlahn
Author

dlahn commented Jun 4, 2024

Adding these annotations seems to have resolved the issue for us.
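
A minimal sketch of how the annotations could be supplied through Helm values, assuming the chart exposes a top-level `podAnnotations` key for the agent pods. That key name is an assumption here and should be checked against the values.yaml of the chart version in use; the `nodeinit` block is the one posted earlier in this thread.

```yaml
# values.yaml fragment (sketch; the podAnnotations key is an assumption,
# verify it against the chart version you deploy)
podAnnotations:
  container.apparmor.security.beta.kubernetes.io/cilium-agent: "unconfined"
  container.apparmor.security.beta.kubernetes.io/clean-cilium-state: "unconfined"
  container.apparmor.security.beta.kubernetes.io/mount-cgroup: "unconfined"
  container.apparmor.security.beta.kubernetes.io/apply-sysctl-overwrites: "unconfined"

# nodeinit settings as posted above in this issue
nodeinit:
  enabled: true
  reconfigureKubelet: true
  removeCbrBridge: true
```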

ti-mo removed the needs/triage (This issue requires triaging to establish severity and next steps.) label on Jun 20, 2024
5 participants