Description
Is there an existing issue for this?
- I have searched the existing issues
Current Behavior
The agent expects each line of /proc/self/cgroup to split into exactly three colon-separated parts. On my Kubernetes instance, the cgroup path itself contains additional colons, which breaks that logic. As a result, the agent crashes and never connects back to the server. The agent was working correctly a few days ago. We recently installed updates on the nodes, so it's possible that an update changed the cgroup layout.
root ➜ / $ cat /proc/self/cgroup
0::/system.slice/kubepods-burstable-pod72e25f20_2e04_4dd4_95b8_dc297d70dd49.slice:cri-containerd:d24f9ccece3428c101a0aff4b8dc99f58c09e3de2ccef817c20eef3d54f604d7
coder-07468728-338a-4cd7-9f61-8106968baea8-6c4b96c7d6-cwm8c_dev.log
Relevant Log Output
2026-02-12 19:21:12.922 [warn] run exited with error ...
error= error in routine resources monitor:
github.com/coder/coder/v2/agent.(*apiConnRoutineManager).startAgentAPI.func1
/home/runner/work/coder/coder/agent/agent.go:2209
- failed to create resources fetcher:
github.com/coder/coder/v2/agent.(*agent).run.func5
/home/runner/work/coder/coder/agent/agent.go:1073
- get cgroup statter:
github.com/coder/clistat.New
/home/runner/go/pkg/mod/github.com/coder/clistat@v1.2.0/stat.go:207
- get current cgroup:
github.com/coder/clistat.(*Statter).getCgroupStatter
/home/runner/go/pkg/mod/github.com/coder/clistat@v1.2.0/cgroup.go:50
- parse entry /proc/self/cgroup: %!w(<nil>):
github.com/coder/clistat.currentProcCgroup
/home/runner/go/pkg/mod/github.com/coder/clistat@v1.2.0/cgroup.go:133
Expected Behavior
The agent should account for additional colons in /proc/self/cgroup.
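One possible approach, sketched below, is to split each line on the first two colons only, so that any further colons stay inside the path field. This relies on the `hierarchy-ID:controller-list:cgroup-path` layout documented in cgroups(7). The `cgroupEntry` type and `parseCgroupLine` function are hypothetical names used for illustration; this is not clistat's actual code, which may be structured differently.

```go
package main

import (
	"fmt"
	"strings"
)

// cgroupEntry is a hypothetical type for this sketch, not a clistat type.
type cgroupEntry struct {
	HierarchyID string // "0" for the cgroup v2 unified hierarchy
	Controllers string // empty for cgroup v2
	Path        string // may itself contain colons
}

// parseCgroupLine (hypothetical helper) parses one line of /proc/self/cgroup.
// strings.SplitN with n=3 stops after the second colon, so everything that
// follows, including extra colons, is kept as the path.
func parseCgroupLine(line string) (cgroupEntry, error) {
	parts := strings.SplitN(strings.TrimSpace(line), ":", 3)
	if len(parts) != 3 {
		return cgroupEntry{}, fmt.Errorf("malformed cgroup entry %q: got %d fields, want 3", line, len(parts))
	}
	return cgroupEntry{
		HierarchyID: parts[0],
		Controllers: parts[1],
		Path:        parts[2],
	}, nil
}
```

With the line from the report above, this sketch would return hierarchy ID `0`, an empty controller list, and the full `/system.slice/...:cri-containerd:...` string as the path, rather than an error.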
Steps to Reproduce
- Create a workspace using the Kubernetes devcontainer template
I'm not sure which components of our kubernetes environment are required to reproduce the issue.
Environment
- Coder server host OS: Ubuntu server 24.04
- rke2 node OS: Ubuntu server 24.04
- Coder version: Multiple versions, including v2.30.1
Additional Context
- The issue is new (previously worked fine)
- The issue occurs consistently
- The issue happens on multiple deployments
- I have tested this on the latest version