Conversation
shiv-tyagi
approved these changes
Apr 18, 2026
Member
shiv-tyagi
left a comment
Just one thought. We can merge though.
privileged: spec.privileged,
host_ipc: spec.host_ipc,
shm_size: spec.shm_size.clone().unwrap_or_default(),
extra_resources: spec.extra_resources.clone(),
Member
Note to self:
Would the extra device plugin resources field only make sense for k8s, or for spur in general? If the latter is not true, it might be worth keeping it within spur-k8s somehow.
I will take a look at this later.
…Resources (#85, #86, #87, #88)

Thread five new fields through the full CRD → core JobSpec → proto → pod creation pipeline to support GPU training workloads on Kubernetes.

Fixes:
- #85: hostNetwork field was in CRD but silently dropped; now applied to PodSpec
- #86: Add privileged mode: sets SecurityContext.privileged on container
- #87: Add hostIPC and shmSize: hostIPC enables IPC namespace sharing for NCCL, shmSize creates an emptyDir volume at /dev/shm with the specified size limit (e.g., "64Gi")
- #88: Add extraResources map: arbitrary device plugin resources (e.g., rdma/devices, nvidia.com/mig) added to pod resource requests/limits

Example SpurJob:

  spec:
    hostNetwork: true
    privileged: true
    hostIpc: true
    shmSize: "64Gi"
    extraResources:
      rdma/hca_shared_devices_a: "1"

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
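For readers skimming the commit message, here is a minimal Rust sketch of the pod-side effect of the five fields. It is not the PR's code: `JobSpec` only borrows its field names from the diff under review, `PodPlan` and `plan_pod` are hypothetical stand-ins for the real Kubernetes `PodSpec` types, and treating an empty `shm_size` as "not set" is an assumption.

```rust
use std::collections::BTreeMap;

/// Hypothetical core-side job spec; field names follow the diff under review.
#[derive(Debug, Clone, Default)]
struct JobSpec {
    host_network: bool,
    privileged: bool,
    host_ipc: bool,
    /// Assumption: an empty string means no shmSize was requested.
    shm_size: String,
    extra_resources: BTreeMap<String, String>,
}

/// Hypothetical, simplified view of the pod settings the commit describes.
#[derive(Debug, Default, PartialEq)]
struct PodPlan {
    host_network: bool,
    host_ipc: bool,
    privileged: bool,
    /// Size limit of the emptyDir volume mounted at /dev/shm, if requested.
    shm_size_limit: Option<String>,
    requests: BTreeMap<String, String>,
    limits: BTreeMap<String, String>,
}

/// Map the job spec onto pod settings, one field at a time.
fn plan_pod(spec: &JobSpec) -> PodPlan {
    let mut plan = PodPlan {
        host_network: spec.host_network,
        host_ipc: spec.host_ipc,
        privileged: spec.privileged,
        ..PodPlan::default()
    };
    // Only plan a /dev/shm volume when a size was actually given.
    if !spec.shm_size.is_empty() {
        plan.shm_size_limit = Some(spec.shm_size.clone());
    }
    // Device plugin resources go into both requests and limits.
    for (name, quantity) in &spec.extra_resources {
        plan.requests.insert(name.clone(), quantity.clone());
        plan.limits.insert(name.clone(), quantity.clone());
    }
    plan
}

fn main() {
    let mut spec = JobSpec {
        host_network: true,
        privileged: true,
        host_ipc: true,
        shm_size: "64Gi".to_string(),
        ..JobSpec::default()
    };
    spec.extra_resources
        .insert("rdma/hca_shared_devices_a".to_string(), "1".to_string());
    println!("{:#?}", plan_pod(&spec));
}
```

Keeping the extra resources as an opaque string map means the core spec does not need to know about any particular device plugin, which is one way to read the reviewer's question about where the field belongs.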
22dbe94 to e9d4bde
This was referenced Apr 18, 2026
Summary
Fixes #85, #86, #87, #88 — adds five K8s pod options required for GPU training workloads.
Changes
Thread five new fields through the CRD → core JobSpec → proto → pod creation pipeline:
- hostNetwork: PodSpec.host_network = true
- privileged: Container.security_context.privileged = true
- hostIpc: PodSpec.host_ipc = true
- shmSize: emptyDir volume at /dev/shm with sizeLimit
- extraResources: device plugin resources (e.g., rdma/devices: 1)

Example SpurJob
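As a rough illustration, the example values from the commit message can be written as Rust values and carried from a CRD-shaped spec into a core spec, in the style of the `unwrap_or_default()` line under review. `SpurJobSpec`, `JobSpec`, `to_job_spec`, and `example_spec` are hypothetical names for this sketch, and the field types are assumptions rather than the project's actual definitions.

```rust
use std::collections::BTreeMap;

/// Hypothetical CRD-side spec: shm_size may be absent from the manifest.
#[derive(Debug, Clone, Default)]
struct SpurJobSpec {
    host_network: bool,
    privileged: bool,
    host_ipc: bool,
    shm_size: Option<String>,
    extra_resources: BTreeMap<String, String>,
}

/// Hypothetical core-side spec: shm_size is a plain string, empty when unset.
#[derive(Debug, Clone, Default, PartialEq)]
struct JobSpec {
    host_network: bool,
    privileged: bool,
    host_ipc: bool,
    shm_size: String,
    extra_resources: BTreeMap<String, String>,
}

/// Field-by-field conversion mirroring the lines shown in the review.
fn to_job_spec(spec: &SpurJobSpec) -> JobSpec {
    JobSpec {
        host_network: spec.host_network,
        privileged: spec.privileged,
        host_ipc: spec.host_ipc,
        // A missing shmSize becomes an empty string.
        shm_size: spec.shm_size.clone().unwrap_or_default(),
        extra_resources: spec.extra_resources.clone(),
    }
}

/// The example values given in the commit message.
fn example_spec() -> SpurJobSpec {
    let mut extra_resources = BTreeMap::new();
    extra_resources.insert("rdma/hca_shared_devices_a".to_string(), "1".to_string());
    SpurJobSpec {
        host_network: true,
        privileged: true,
        host_ipc: true,
        shm_size: Some("64Gi".to_string()),
        extra_resources,
    }
}

fn main() {
    let job = to_job_spec(&example_spec());
    println!("{job:#?}");

    // With nothing set, every field falls back to its default.
    let bare = to_job_spec(&SpurJobSpec::default());
    println!("shm_size when unset: {:?}", bare.shm_size);
}
```

One consequence of `unwrap_or_default()` worth checking during review: downstream code can no longer tell an omitted `shmSize` from an explicitly empty one, so the pod-creation step has to treat the empty string as "no volume".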
Test plan
@amd-vpenumal — these are the fields you requested. Please verify the CRD schema matches your expected usage.
🤖 Generated with Claude Code