
Add K8s pod options: hostNetwork, privileged, hostIPC, shmSize, extraResources (#85, #86, #87, #88) #98

Merged
powderluv merged 1 commit into main from
users/powderluv/fix-85-host-network
Apr 18, 2026

@powderluv (Collaborator)

Summary

Fixes #85, #86, #87, #88 — adds five K8s pod options required for GPU training workloads.

Changes

Thread five new fields through the CRD → core JobSpec → proto → pod creation pipeline:

Field           Issue   Pod effect
hostNetwork     #85     PodSpec.host_network = true
privileged      #86     Container.security_context.privileged = true
hostIpc         #87     PodSpec.host_ipc = true
shmSize         #87     emptyDir volume at /dev/shm with sizeLimit
extraResources  #88     Added to pod resource requests/limits (e.g., rdma/devices: 1)
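
The following is a minimal sketch of the mapping in the table above. It uses small hypothetical structs in place of the real Kubernetes types; the actual pod-creation code presumably builds k8s-openapi's PodSpec and Container, which are not reproduced here.

The following are assumptions for illustration and do not come from the PR:
- the names `JobSpec`, `build_pod_spec` and the volume name `dshm`;
- the `Memory` medium on the emptyDir. The table only says the volume is an emptyDir at /dev/shm with a sizeLimit.

```rust
// Hypothetical, trimmed-down stand-ins for the core spec and the pod objects.
#[derive(Debug, Default)]
struct JobSpec {
    host_network: bool,
    privileged: bool,
    host_ipc: bool,
    shm_size: Option<String>,
}

#[derive(Debug, Default)]
struct Volume {
    name: String,
    medium: Option<String>,     // emptyDir.medium
    size_limit: Option<String>, // emptyDir.sizeLimit
}

#[derive(Debug, Default)]
struct VolumeMount {
    name: String,
    mount_path: String,
}

#[derive(Debug, Default)]
struct Container {
    privileged: Option<bool>, // securityContext.privileged
    volume_mounts: Vec<VolumeMount>,
}

#[derive(Debug, Default)]
struct PodSpec {
    host_network: Option<bool>,
    host_ipc: Option<bool>,
    volumes: Vec<Volume>,
    containers: Vec<Container>,
}

/// Applies hostNetwork (#85), privileged (#86), hostIpc and shmSize (#87).
/// Flags that are false are left unset rather than written as Some(false).
fn build_pod_spec(spec: &JobSpec) -> PodSpec {
    let mut container = Container {
        privileged: spec.privileged.then_some(true),
        ..Default::default()
    };
    let mut pod = PodSpec {
        host_network: spec.host_network.then_some(true),
        host_ipc: spec.host_ipc.then_some(true),
        ..Default::default()
    };
    // Only create the /dev/shm volume when a non-empty size was requested.
    if let Some(size) = spec.shm_size.as_deref().filter(|s| !s.is_empty()) {
        pod.volumes.push(Volume {
            name: "dshm".into(),
            medium: Some("Memory".into()), // assumption: RAM-backed tmpfs
            size_limit: Some(size.into()),
        });
        container.volume_mounts.push(VolumeMount {
            name: "dshm".into(),
            mount_path: "/dev/shm".into(),
        });
    }
    pod.containers.push(container);
    pod
}
```

Leaving false flags unset means jobs that do not use the new options produce the same pod as before. That is a choice made in this sketch, not something the PR states.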

Example SpurJob

apiVersion: spur.ai/v1alpha1
kind: SpurJob
metadata:
  name: gpu-training  # hypothetical name; Kubernetes objects require metadata.name
spec:
  image: nvcr.io/nvidia/pytorch:latest
  hostNetwork: true
  privileged: true
  hostIpc: true
  shmSize: "64Gi"
  extraResources:
    rdma/hca_shared_devices_a: "1"
  gpus:
    count: 8
    gpuType: mi300x
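
shmSize takes a Kubernetes quantity string ("64Gi" in the example above). To reject malformed values before they reach the API server, a helper along these lines could be used. `quantity_to_bytes` is hypothetical. It covers only a subset of the quantity grammar: an unsigned integer plus an optional binary (Ki to Ei) or decimal (k to E) suffix. Fractions and exponent forms are not handled.

```rust
/// Sketch: convert a Kubernetes-style quantity such as "64Gi" into bytes.
/// Returns None for unsupported syntax or on u64 overflow.
fn quantity_to_bytes(q: &str) -> Option<u64> {
    const SUFFIXES: [(&str, u64); 12] = [
        ("Ki", 1 << 10), ("Mi", 1 << 20), ("Gi", 1 << 30),
        ("Ti", 1 << 40), ("Pi", 1 << 50), ("Ei", 1 << 60),
        ("k", 1_000), ("M", 1_000_000), ("G", 1_000_000_000),
        ("T", 1_000_000_000_000), ("P", 1_000_000_000_000_000),
        ("E", 1_000_000_000_000_000_000),
    ];
    // Split off the first matching suffix; a bare number means plain bytes.
    let (digits, mult) = SUFFIXES
        .iter()
        .find_map(|&(s, m)| q.strip_suffix(s).map(|d| (d, m)))
        .unwrap_or((q, 1));
    // Reject empty numbers, signs, decimal points and exponents.
    if digits.is_empty() || !digits.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    digits.parse::<u64>().ok()?.checked_mul(mult)
}
```

With this helper, `quantity_to_bytes("64Gi")` yields `Some(68719476736)` and `quantity_to_bytes("1.5Gi")` yields `None`.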

Test plan

  • Full test suite passes (0 failures)
  • Existing CRD parse tests updated with new fields
  • K8s integration: deploy SpurJob with hostNetwork=true, verify pod has host networking

@amd-vpenumal — these are the fields you requested. Please verify the CRD schema matches your expected usage.

🤖 Generated with Claude Code

@shiv-tyagi (Member) left a comment

Just one thought. We can merge though.

privileged: spec.privileged,
host_ipc: spec.host_ipc,
shm_size: spec.shm_size.clone().unwrap_or_default(),
extra_resources: spec.extra_resources.clone(),
@shiv-tyagi (Member), Apr 18, 2026

Note to self:

Does the extra device plugin resources field only make sense for K8s, or for Spur in general? If the latter is not true, it might be worth keeping it within spur-k8s somehow.

I will take a look at this later.
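
The diff excerpt in the review shows `shm_size: spec.shm_size.clone().unwrap_or_default()`. This collapses an absent shmSize into an empty string on its way into the proto message.

Assuming the proto field is a plain proto3 `string` declared without `optional` (so it has no presence tracking), the pod-creation side must read "" back as "not set". The sketch below shows that round trip using hypothetical `CrdSpec` and `ProtoJobSpec` structs; neither is taken from the PR.

```rust
/// Hypothetical CRD-side spec: optional fields exactly as the user wrote them.
#[derive(Debug, Clone, Default, PartialEq)]
struct CrdSpec {
    privileged: bool,
    host_ipc: bool,
    shm_size: Option<String>,
}

/// Hypothetical proto message: a proto3 string without `optional` cannot be
/// "unset", so an absent shmSize travels as the empty string.
#[derive(Debug, Clone, Default, PartialEq)]
struct ProtoJobSpec {
    privileged: bool,
    host_ipc: bool,
    shm_size: String,
}

fn to_proto(spec: &CrdSpec) -> ProtoJobSpec {
    ProtoJobSpec {
        privileged: spec.privileged,
        host_ipc: spec.host_ipc,
        // Same conversion as the reviewed line: None becomes "".
        shm_size: spec.shm_size.clone().unwrap_or_default(),
    }
}

/// Pod-creation side: "" means "do not create a /dev/shm volume".
fn shm_size_for_pod(msg: &ProtoJobSpec) -> Option<&str> {
    Some(msg.shm_size.as_str()).filter(|s| !s.is_empty())
}
```

If the pod builder skipped this check and used the string directly, an empty sizeLimit could reach the generated volume. Keeping the empty-means-unset rule in one helper avoids that.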

…Resources (#85, #86, #87, #88)

Thread five new fields through the full CRD → core JobSpec → proto →
pod creation pipeline to support GPU training workloads on Kubernetes.

Fixes:
- #85: hostNetwork field was in CRD but silently dropped — now applied
  to PodSpec
- #86: Add privileged mode — sets SecurityContext.privileged on container
- #87: Add hostIPC and shmSize — hostIPC enables IPC namespace sharing
  for NCCL, shmSize creates an emptyDir volume at /dev/shm with the
  specified size limit (e.g., "64Gi")
- #88: Add extraResources map — arbitrary device plugin resources
  (e.g., rdma/devices, nvidia.com/mig) added to pod resource
  requests/limits

Example SpurJob:
  spec:
    hostNetwork: true
    privileged: true
    hostIpc: true
    shmSize: "64Gi"
    extraResources:
      rdma/hca_shared_devices_a: "1"

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
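
Item #88 above adds extraResources to the pod's resource requests and limits. Below is a hedged sketch of that merge. It assumes the GPU count is also expressed as an extended resource.

The resource name `amd.com/gpu` (the name exposed by AMD's Kubernetes device plugin) and the function `container_resources` are illustrative and are not taken from the PR. Kubernetes does not allow extended resources to be overcommitted: when both a request and a limit are given, they must be equal. The sketch therefore writes every entry to both maps.

```rust
use std::collections::BTreeMap;

/// Hypothetical container resources: resource name -> quantity string.
#[derive(Debug, Default, PartialEq)]
struct Resources {
    requests: BTreeMap<String, String>,
    limits: BTreeMap<String, String>,
}

/// Sketch: combine the GPU count with extraResources (#88).
/// Every entry is written to both requests and limits with the same quantity.
fn container_resources(
    gpu_resource: &str,
    gpu_count: u32,
    extra: &BTreeMap<String, String>,
) -> Resources {
    let mut res = Resources::default();
    let mut set = |name: &str, qty: String| {
        res.requests.insert(name.to_string(), qty.clone());
        res.limits.insert(name.to_string(), qty);
    };
    if gpu_count > 0 {
        set(gpu_resource, gpu_count.to_string());
    }
    // extraResources are applied last, so on a key collision they win.
    for (name, qty) in extra {
        set(name.as_str(), qty.clone());
    }
    res
}
```

Because extraResources entries are applied after the GPU entry, a key collision lets the user's value win. The real implementation may resolve collisions differently.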
@powderluv force-pushed the users/powderluv/fix-85-host-network branch from 22dbe94 to e9d4bde on April 18, 2026 16:57
@powderluv merged commit a94bbd8 into main on Apr 18, 2026
6 checks passed

Development

Successfully merging this pull request may close these issues.

CRD fields hostNetwork silently dropped — never applied to pods

2 participants