bug: Agent cgroup parsing error on Kubernetes #34

@sdunlap-afit

Description

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

The agent expects each line of /proc/self/cgroup to split into exactly three colon-separated fields. On my Kubernetes cluster, the cgroup path itself contains additional colons, which breaks that logic: the agent crashes and never connects back to the server. The agent was working correctly a few days ago. We recently installed updates on the nodes, so it's possible that something has changed how cgroups are named.

root ➜ / $ cat /proc/self/cgroup 
0::/system.slice/kubepods-burstable-pod72e25f20_2e04_4dd4_95b8_dc297d70dd49.slice:cri-containerd:d24f9ccece3428c101a0aff4b8dc99f58c09e3de2ccef817c20eef3d54f604d7

coder-07468728-338a-4cd7-9f61-8106968baea8-6c4b96c7d6-cwm8c_dev.log

Relevant Log Output

2026-02-12 19:21:12.922 [warn]  run exited with error ...
    error= error in routine resources monitor:
               github.com/coder/coder/v2/agent.(*apiConnRoutineManager).startAgentAPI.func1
                   /home/runner/work/coder/coder/agent/agent.go:2209
             - failed to create resources fetcher:
               github.com/coder/coder/v2/agent.(*agent).run.func5
                   /home/runner/work/coder/coder/agent/agent.go:1073
             - get cgroup statter:
               github.com/coder/clistat.New
                   /home/runner/go/pkg/mod/github.com/coder/clistat@v1.2.0/stat.go:207
             - get current cgroup:
               github.com/coder/clistat.(*Statter).getCgroupStatter
                   /home/runner/go/pkg/mod/github.com/coder/clistat@v1.2.0/cgroup.go:50
             - parse entry /proc/self/cgroup: %!w(<nil>):
               github.com/coder/clistat.currentProcCgroup
                   /home/runner/go/pkg/mod/github.com/coder/clistat@v1.2.0/cgroup.go:133

Expected Behavior

The agent should tolerate additional colons in /proc/self/cgroup entries and start normally.
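
One possible approach, sketched here under assumptions and not taken from clistat: each entry has the form `hierarchy-ID:controller-list:cgroup-path`, so splitting on only the first two colons keeps any later colons inside the path. The names `cgroupEntry` and `parseCgroupLine` are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// cgroupEntry holds the three fields of one /proc/self/cgroup line.
// Hypothetical type for illustration only.
type cgroupEntry struct {
	HierarchyID string
	Controllers string
	Path        string
}

// parseCgroupLine splits on the first two colons only, so any further
// colons remain part of Path.
func parseCgroupLine(line string) (cgroupEntry, error) {
	parts := strings.SplitN(strings.TrimSpace(line), ":", 3)
	if len(parts) != 3 {
		return cgroupEntry{}, fmt.Errorf("malformed cgroup entry %q: want 3 fields, got %d", line, len(parts))
	}
	return cgroupEntry{HierarchyID: parts[0], Controllers: parts[1], Path: parts[2]}, nil
}

func main() {
	// The entry from this report.
	line := "0::/system.slice/kubepods-burstable-pod72e25f20_2e04_4dd4_95b8_dc297d70dd49.slice:cri-containerd:d24f9ccece3428c101a0aff4b8dc99f58c09e3de2ccef817c20eef3d54f604d7"

	// An unbounded split yields five parts, not three.
	fmt.Println(len(strings.Split(line, ":")))

	entry, err := parseCgroupLine(line)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(entry.Path)
}
```

With the entry above, an unbounded split on `:` produces five parts, while the bounded split returns the full path including `:cri-containerd:` and the container ID.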

Steps to Reproduce

  1. Create a workspace using the Kubernetes devcontainer template

I'm not sure which components of our Kubernetes environment are required to reproduce the issue.

Environment

  • Coder server host OS: Ubuntu Server 24.04
  • RKE2 node OS: Ubuntu Server 24.04
  • Coder version: Multiple versions, including v2.30.1

Additional Context

  • The issue is new (previously worked fine)
  • The issue occurs consistently
  • The issue happens on multiple deployments
  • I have tested this on the latest version
