
Add PID + mount namespace isolation with GPU device restriction (#99 PR 2/6)#102

Merged
powderluv merged 1 commit into main from users/powderluv/isolation-pr2-namespaces
Apr 18, 2026

Conversation

Collaborator

@powderluv powderluv commented Apr 18, 2026

Summary

Second PR in the isolation series (#99). Adds PID and mount namespace isolation for both bare-metal and container execution paths.

Changes

  • PID namespace: unshare --pid --fork — jobs see only their own processes
  • Mount namespace: Private /tmp, /dev/shm per job; GPU /dev/dri restricted to allocated devices via selective bind-mount
  • Container path: Added --pid --fork to existing unshare --mount in container.rs
  • Bare-metal path: New namespace wrapper script in executor.rs
  • TrackedJob: has_pid_namespace flag for correct nsenter behavior in exec_in_job
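The bare-metal wrapper described above can be sketched as a small helper that assembles the `unshare` argv before spawning the job. This is an illustrative sketch only — the function name and structure are hypothetical, not the actual executor.rs code:

```rust
// Hypothetical sketch of the bare-metal namespace wrapper: build the
// unshare argv that gives the job private PID and mount namespaces.
fn namespace_wrapper(job_cmd: &[&str]) -> Vec<String> {
    let mut argv: Vec<String> = [
        "unshare",
        "--pid",   // new PID namespace: job sees only its own processes
        "--fork",  // fork before exec so the job becomes PID 1 in the namespace
        "--mount", // private mount namespace for /tmp, /dev/shm, /dev/dri
        "--",
    ]
    .iter()
    .map(|s| s.to_string())
    .collect();
    argv.extend(job_cmd.iter().map(|s| s.to_string()));
    argv
}

fn main() {
    // prints: unshare --pid --fork --mount -- ps aux
    println!("{}", namespace_wrapper(&["ps", "aux"]).join(" "));
}
```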

GPU access verification

  • Only allocated renderD* devices bind-mounted into private /dev/dri
  • /dev/kfd preserved for AMD GPU access
  • Unallocated GPU devices not visible in job's mount namespace
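The device-selection step behind the selective bind-mount can be sketched as a pure function mapping allocated GPU indices to render-node paths. This assumes the conventional DRM naming where render nodes start at `renderD128`; the function name is illustrative, not the actual container.rs/executor.rs code:

```rust
// Illustrative sketch: given the GPU indices allocated to a job, list
// the /dev/dri render nodes to bind-mount into its private /dev/dri.
// /dev/kfd would be bind-mounted unconditionally for the AMD compute stack.
fn allocated_dri_nodes(allocated: &[u32]) -> Vec<String> {
    allocated
        .iter()
        // DRM render nodes are conventionally numbered from 128.
        .map(|&idx| format!("/dev/dri/renderD{}", 128 + idx))
        .collect()
}

fn main() {
    // A 1-GPU job allocated GPU 0 sees exactly one render node.
    for node in allocated_dri_nodes(&[0]) {
        println!("{}", node); // prints: /dev/dri/renderD128
    }
}
```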


Test plan

  • Job with PID namespace: ps aux shows only job processes (not host)
  • Job with mount namespace: ls /tmp is empty (private tmpfs, not host)
  • GPU device restriction: ls /dev/dri/ inside 1-GPU job shows only 1 renderD device
  • GPU access after setuid (per @shiv-tyagi): non-root user with video/render group can access /dev/kfd and /dev/dri/renderD*. PR #100 (Add UID/GID enforcement, pids.max, and OOM isolation — #99 PR 1/6) needs initgroups() to inherit supplementary groups — without it, setuid to a non-root UID loses video/render group membership and GPU device access fails with EACCES.
  • Attach to namespaced job: nsenter --mount --pid enters correct namespaces
  • Container path: ps aux inside container shows only container processes
  • who command inside job: only shows current session (not host users)
  • rocm-smi inside namespaced job: shows only allocated GPU
  • HIP runtime test: hipGetDeviceCount returns correct count inside namespace
  • Full test suite passes (0 failures)

Note on supplementary groups

As @shiv-tyagi pointed out, Command::uid().gid() sets primary UID/GID but does NOT set supplementary groups. GPU devices are typically root:video or root:render. PR #100 (setuid PR) needs to call initgroups(username, gid) in pre_exec to ensure the job process inherits video, render, and other groups. Without this, GPU access will fail for non-root users.
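A minimal sketch of the proposed fix, with some deliberate hedging: here the entire privilege drop (setgid, initgroups, setuid) happens inside pre_exec rather than via Command::uid()/gid(), so initgroups() is guaranteed to run while the child is still root. The function name, target binary, and uid/gid values are illustrative, not the actual spurd code:

```rust
use std::ffi::CString;
use std::io;
use std::os::raw::{c_char, c_int};
use std::os::unix::process::CommandExt;
use std::process::Command;

// libc calls declared directly so the sketch needs no external crates.
extern "C" {
    fn setgid(gid: u32) -> c_int;
    fn setuid(uid: u32) -> c_int;
    // Initialize the supplementary group list (video, render, ...) for
    // `user`, also including `group`. Requires root.
    fn initgroups(user: *const c_char, group: u32) -> c_int;
}

// Build a job command that drops privileges in pre_exec, calling
// initgroups() before setuid() so the child keeps its supplementary
// groups and can still open /dev/kfd and /dev/dri/renderD*.
fn job_command(username: &str, uid: u32, gid: u32) -> io::Result<Command> {
    let user = CString::new(username)
        .map_err(|e| io::Error::new(io::ErrorKind::InvalidInput, e))?;
    let mut cmd = Command::new("rocm-smi"); // illustrative job binary
    unsafe {
        cmd.pre_exec(move || {
            // Order matters: group setup must precede setuid(), because
            // initgroups() fails once we are no longer root.
            let failed = unsafe {
                setgid(gid) != 0
                    || initgroups(user.as_ptr(), gid) != 0
                    || setuid(uid) != 0
            };
            if failed {
                return Err(io::Error::last_os_error());
            }
            Ok(())
        });
    }
    Ok(cmd)
}

fn main() {
    // Building the Command does not spawn it; pre_exec runs only at spawn.
    let _cmd = job_command("jobuser", 1000, 1000).expect("valid username");
    println!("prepared job command with initgroups-based privilege drop");
}
```

Doing the whole drop in one pre_exec closure also sidesteps any ordering questions between Command::uid()/gid() and the pre_exec callbacks.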

🤖 Generated with Claude Code

Member

@shiv-tyagi shiv-tyagi left a comment


In the test plan, we should also verify that a job process running under a non-root uid and gid can correctly access the GPU device nodes.

The GPU nodes are restricted to the video and render groups. We set the uid and gid for the process, but that doesn't appear to inherit the supplementary groups of the user running the commands; I suspect that may leave the GPUs inaccessible to the process.

@powderluv
Collaborator Author

Good catch @shiv-tyagi — you're right that Command::uid().gid() doesn't set supplementary groups. GPU devices are typically owned by root:video or root:render, so the job process needs to be in those groups.

I'll update PR #100 (setuid PR) to call initgroups() in pre_exec to inherit all supplementary groups for the target user. This ensures the process gets video, render, and any other groups the user belongs to.

Updated test plan for this PR:

  • Job with PID namespace: ps aux shows only job processes
  • Job with mount namespace: private /tmp, only allocated GPU in /dev/dri
  • GPU access after setuid: non-root user with video/render group membership can access /dev/kfd and /dev/dri/renderD*
  • Attach to namespaced job: nsenter with --pid flag works correctly

Will add initgroups() call in a follow-up to PR #100.

…PR 2/6)

Jobs now run in isolated PID and mount namespaces (when spurd is root):

- PID namespace: jobs see only their own processes
- Mount namespace: private /tmp, /dev/shm; GPU /dev/dri restricted to
  allocated devices via selective bind-mount
- Container path: added --pid --fork to existing unshare --mount
- TrackedJob: has_pid_namespace flag for correct nsenter in exec_in_job

Non-root fallback: skip namespaces, current behavior preserved.

Closes: part of #99

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@shiv-tyagi
Member


Submitted #107 to track this.
