fix: mount host /etc/os-release in privileged snapshot agent #113

Merged
mchmarny merged 2 commits into NVIDIA:main from yuanchen8911:fix/snapshot-agent-host-os-release on Feb 12, 2026

@yuanchen8911 yuanchen8911 commented Feb 12, 2026

Summary

  • The OS collector reads /etc/os-release inside the container, which returns the container image OS (Debian 12 from golang:bookworm) instead of the host OS
  • This causes OS constraint validation to fail on Ubuntu nodes
  • Mounts the host /etc/os-release into the privileged agent pod so the collector reads the correct host OS information
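The mismatch described above can be reproduced with a small parser for the os-release format. This is a minimal sketch, not the project's actual collector: `parseOSRelease`, `parseOSReleaseString`, and the two sample file contents are hypothetical and only mirror the `ID` and `VERSION_ID` values reported in this PR.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// parseOSRelease reads KEY=VALUE lines in os-release format into a map.
// Blank lines and '#' comments are skipped; surrounding quotes are stripped.
// Hypothetical helper for illustration, not the project's collector code.
func parseOSRelease(r io.Reader) (map[string]string, error) {
	fields := make(map[string]string)
	sc := bufio.NewScanner(r)
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		key, value, ok := strings.Cut(line, "=")
		if !ok {
			continue // not a KEY=VALUE line
		}
		fields[key] = strings.Trim(value, `"'`)
	}
	return fields, sc.Err()
}

// parseOSReleaseString is a convenience wrapper for in-memory content.
func parseOSReleaseString(s string) (map[string]string, error) {
	return parseOSRelease(strings.NewReader(s))
}

// Assumed sample contents: what the collector would see inside the
// container image versus in the file mounted from the host.
const containerOSRelease = "ID=debian\nVERSION_ID=\"12\"\n"
const hostOSRelease = "ID=ubuntu\nVERSION_ID=\"24.04\"\n"

func main() {
	samples := []struct{ name, content string }{
		{"container image", containerOSRelease},
		{"host mount", hostOSRelease},
	}
	for _, s := range samples {
		fields, err := parseOSReleaseString(s.content)
		if err != nil {
			fmt.Println("parse error:", err)
			continue
		}
		fmt.Printf("%s: ID=%s VERSION_ID=%s\n", s.name, fields["ID"], fields["VERSION_ID"])
	}
}
```

Because the parser takes an `io.Reader`, the same code works whether the host file is mounted over the container's `/etc/os-release` or at a separate path; this PR's description does not say which.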

Before (without host mount)

The validation ran successfully but found 2 constraint failures:

| Constraint | Expected | Actual | Status |
|---|---|---|---|
| K8s.server.version | >= 1.34 | v1.34.3-eks-ac2d5a0 | passed |
| OS.release.ID | ubuntu | debian | failed |
| OS.release.VERSION_ID | 24.04 | 12 | failed |
| OS.sysctl kernel | >= 6.8 | 6.14.0-1018-aws | passed |

After (with host mount)

All 4 constraints passed:

| Constraint | Expected | Actual | Status |
|---|---|---|---|
| K8s.server.version | >= 1.34 | v1.34.3-eks-ac2d5a0 | passed |
| OS.release.ID | ubuntu | ubuntu | passed |
| OS.release.VERSION_ID | 24.04 | 24.04 | passed |
| OS.sysctl kernel | >= 6.8 | 6.14.0-1018-aws | passed |
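For readers unfamiliar with how such rows could be evaluated, here is a rough sketch of a constraint check. It is an assumption about the general technique, not eidos's actual validator: `satisfies` and `majorMinor` are hypothetical names, only `>=` and exact equality are handled, and versions are compared on their numeric major.minor components.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// majorMinor extracts the leading major and minor integers from a version
// string such as "v1.34.3-eks-ac2d5a0" or "6.14.0-1018-aws".
func majorMinor(v string) (major, minor int, err error) {
	v = strings.TrimPrefix(strings.TrimSpace(v), "v")
	parts := strings.SplitN(v, ".", 3)
	if len(parts) < 2 {
		return 0, 0, fmt.Errorf("version %q has no minor component", v)
	}
	if major, err = strconv.Atoi(parts[0]); err != nil {
		return 0, 0, err
	}
	if minor, err = strconv.Atoi(parts[1]); err != nil {
		return 0, 0, err
	}
	return major, minor, nil
}

// satisfies reports whether actual meets expected. An expected value that
// starts with ">=" is compared numerically on major.minor; anything else
// is treated as exact string equality (an optional leading "=" is ignored).
func satisfies(expected, actual string) (bool, error) {
	expected = strings.TrimSpace(expected)
	if strings.HasPrefix(expected, ">=") {
		wantMajor, wantMinor, err := majorMinor(strings.TrimPrefix(expected, ">="))
		if err != nil {
			return false, err
		}
		gotMajor, gotMinor, err := majorMinor(actual)
		if err != nil {
			return false, err
		}
		return gotMajor > wantMajor || (gotMajor == wantMajor && gotMinor >= wantMinor), nil
	}
	want := strings.TrimSpace(strings.TrimPrefix(expected, "="))
	return want == strings.TrimSpace(actual), nil
}

func main() {
	// Rows taken from the "before" table in this PR.
	rows := []struct{ name, expected, actual string }{
		{"K8s.server.version", ">= 1.34", "v1.34.3-eks-ac2d5a0"},
		{"OS.release.ID", "ubuntu", "debian"},
		{"OS.release.VERSION_ID", "24.04", "12"},
		{"OS.sysctl kernel", ">= 6.8", "6.14.0-1018-aws"},
	}
	for _, r := range rows {
		ok, err := satisfies(r.expected, r.actual)
		if err != nil {
			fmt.Printf("%s: error: %v\n", r.name, err)
			continue
		}
		status := "failed"
		if ok {
			status = "passed"
		}
		fmt.Printf("%s: expected %s, actual %s, %s\n", r.name, r.expected, r.actual, status)
	}
}
```

Comparing components numerically matters for the kernel row: 6.14 satisfies `>= 6.8` as a version, whereas a plain string comparison would rank "6.14" below "6.8".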

Test plan

  • Tested on EKS cluster with H100 GPU nodes (Ubuntu 24.04)
  • eidos validate --phase readiness passes all 4 constraints
  • Unit tests pass (make test)

🤖 Generated with Claude Code

@yuanchen8911 yuanchen8911 requested a review from a team as a code owner February 12, 2026 22:39
@yuanchen8911 yuanchen8911 requested a review from xdu31 February 12, 2026 22:42
The OS collector reads /etc/os-release inside the container, which
returns the container image OS (Debian 12 from golang:bookworm)
instead of the host OS. This causes OS constraint validation to fail
on Ubuntu nodes.

Mount the host /etc/os-release into the privileged agent pod so the
collector reads the correct host OS information.

Signed-off-by: Yuan Chen <yuanchen97@gmail.com>
@yuanchen8911 yuanchen8911 force-pushed the fix/snapshot-agent-host-os-release branch from 83dd301 to ddc16a2 Compare February 12, 2026 22:48

@dims dims left a comment

LGTM

@mchmarny mchmarny merged commit b50dc73 into NVIDIA:main Feb 12, 2026
7 checks passed
@mchmarny mchmarny deleted the fix/snapshot-agent-host-os-release branch February 12, 2026 23:35
lockwobr pushed a commit that referenced this pull request Feb 12, 2026
Signed-off-by: Yuan Chen <yuanchen97@gmail.com>
lockwobr pushed a commit that referenced this pull request Feb 12, 2026
Signed-off-by: Yuan Chen <yuanchen97@gmail.com>