fix(sandbox): use cgroup pids controller instead of RLIMIT_NPROC on K8s #1325

Closed
Arnonrgo wants to merge 1 commit into NVIDIA:main from Arnonrgo:fix/cgroup-pids-controller

@Arnonrgo
Contributor

Summary

  • RLIMIT_NPROC is enforced per UID across the whole kernel, not per container. On dense K8s nodes, containers that share a UID exhaust the shared quota, which causes spurious EAGAIN errors from pthread_create
  • When the cgroup v2 pids controller is active, skip RLIMIT_NPROC entirely (the cgroup provides per-container fork-bomb protection)
  • Fall back to RLIMIT_NPROC on systems without cgroup pids support

Changes

  • Replace the inline RLIMIT_NPROC logic with limit_pids(), which detects the cgroup v2 pids controller
  • Add OPENSHELL_MAX_PIDS env var for configurable limit (default: 512)
  • Add cgroup_pids_active() detection via /sys/fs/cgroup/pids.max
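
The decision these changes describe can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the function names and the default of 512 come from the description, while the `PidLimit` enum, the `cgroup_root` and `raw_env` parameters (added so the logic can be exercised without touching the real environment), and the handling of whitespace are assumptions. The actual `setrlimit` call is left out.

```rust
use std::path::Path;

/// Default process limit named in the PR description.
const DEFAULT_MAX_PIDS: u64 = 512;

/// Hypothetical result type: which mechanism bounds the process count.
#[derive(Debug, PartialEq, Eq)]
enum PidLimit {
    /// The cgroup v2 pids controller is present; RLIMIT_NPROC is skipped.
    CgroupPids,
    /// No pids controller found; fall back to RLIMIT_NPROC with this value.
    RlimitNproc(u64),
}

/// Parse the raw OPENSHELL_MAX_PIDS value.
/// A missing or unparsable value yields the default.
fn max_pids(raw: Option<&str>) -> u64 {
    raw.and_then(|v| v.trim().parse::<u64>().ok())
        .unwrap_or(DEFAULT_MAX_PIDS)
}

/// True when `pids.max` exists under the given cgroup mount point
/// (the PR checks /sys/fs/cgroup/pids.max).
fn cgroup_pids_active(cgroup_root: &Path) -> bool {
    cgroup_root.join("pids.max").exists()
}

/// Choose the limiting mechanism. A real implementation would call
/// setrlimit(RLIMIT_NPROC, ...) in the fallback branch.
fn limit_pids(cgroup_root: &Path, raw_env: Option<&str>) -> PidLimit {
    if cgroup_pids_active(cgroup_root) {
        PidLimit::CgroupPids
    } else {
        PidLimit::RlimitNproc(max_pids(raw_env))
    }
}
```

A caller might pass `Path::new("/sys/fs/cgroup")` and `std::env::var("OPENSHELL_MAX_PIDS").ok().as_deref()`. Whether a value of 0 should be accepted is not stated in the description, so the sketch leaves it unhandled.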

Testing

  • Unit test: default max_pids returns 512
  • Unit test: OPENSHELL_MAX_PIDS env override works
  • Unit test: invalid env value falls back to default
  • Unit test: cgroup_pids_active() doesn't panic (Linux only)
  • cargo check -p openshell-sandbox passes

RLIMIT_NPROC is a per-UID kernel-wide limit, not per-container. In
Kubernetes without user namespace isolation, all containers sharing
the same UID on a node share the RLIMIT_NPROC quota. On dense nodes
this causes spurious EAGAIN from pthread_create when unrelated pods
exhaust the limit.
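
When diagnosing this failure mode, it can help to see which RLIMIT_NPROC values a process is actually running under. The sketch below parses the "Max processes" row of `/proc/<pid>/limits`; the helper name `max_processes` is hypothetical and not part of this PR.

```rust
/// Extract the soft and hard "Max processes" values from the text of
/// /proc/<pid>/limits. Values are returned as strings because the kernel
/// prints either a number or the word "unlimited".
fn max_processes(limits: &str) -> Option<(String, String)> {
    // Row layout: "Max processes  <soft>  <hard>  processes"
    let line = limits.lines().find(|l| l.starts_with("Max processes"))?;
    // Skip the two words of the row label, then read soft and hard limits.
    let mut fields = line.split_whitespace().skip(2);
    let soft = fields.next()?.to_string();
    let hard = fields.next()?.to_string();
    Some((soft, hard))
}
```

On Linux the input would typically come from `std::fs::read_to_string("/proc/self/limits")`.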

When a cgroup v2 pids controller is active (/sys/fs/cgroup/pids.max
exists), skip RLIMIT_NPROC entirely. Fall back to RLIMIT_NPROC on
systems without cgroup pids support. The limit is now configurable
via OPENSHELL_MAX_PIDS (default: 512).
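
The check described above looks only at whether `pids.max` exists. In cgroup v2 that file holds either the word `max` (no limit) or a number, so a caller that wants to know whether a limit is actually in force could also read the value. A minimal sketch, assuming hypothetical names `PidsMax` and `parse_pids_max` that are not part of this PR:

```rust
/// A cgroup v2 `pids.max` value: the literal `max` or a number.
#[derive(Debug, PartialEq, Eq)]
enum PidsMax {
    /// The file contains `max`: the controller imposes no limit.
    Unlimited,
    /// The file contains a numeric limit.
    Limit(u64),
}

/// Parse the contents of a `pids.max` file.
/// Returns None when the text is neither `max` nor a valid number.
fn parse_pids_max(contents: &str) -> Option<PidsMax> {
    match contents.trim() {
        "max" => Some(PidsMax::Unlimited),
        other => other.parse::<u64>().ok().map(PidsMax::Limit),
    }
}
```

The input would typically come from `std::fs::read_to_string("/sys/fs/cgroup/pids.max")`.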
@copy-pr-bot

copy-pr-bot Bot commented May 12, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions

Thank you for your submission! We ask that you sign our Developer Certificate of Origin before we can accept your contribution. You can sign the DCO by adding a comment below using this text:


I have read the DCO document and I hereby sign the DCO.


You can retrigger this bot by commenting recheck in this Pull Request. Posted by the DCO Assistant Lite bot.

@github-actions

Thank you for your interest in contributing to OpenShell, @Arnonrgo.

This project uses a vouch system for first-time contributors. Before submitting a pull request, you need to be vouched by a maintainer.

To get vouched:

  1. Open a Vouch Request discussion.
  2. Describe what you want to change and why.
  3. Write in your own words — do not have an AI generate the request.
  4. A maintainer will comment /vouch if approved.
  5. Once vouched, open a new PR (preferred) or reopen this one after a few minutes.

See CONTRIBUTING.md for details.

github-actions Bot closed this May 12, 2026