fix(spur-k8s): support SpurJobs across all namespaces#78

Merged
shiv-tyagi merged 2 commits into ROCm:main from shiv-tyagi:global-namespace-support
Apr 14, 2026

@shiv-tyagi
Member

The operator was previously scoped to a single namespace (--namespace, default spur). As a result, any SpurJob created outside that namespace was never picked up by the controller.


What was broken

  • The controller only watched one namespace, so jobs in other namespaces were ignored.
  • Cleanup logic used the operator’s namespace instead of the job’s namespace, causing deletions to fail or target the wrong resources.

Why this matters

Running all workloads in the operator’s namespace is a poor security model. User jobs should run in their own namespaces so that RBAC, resource quotas, and network policies apply to them. The previous setup forced every job pod into the controller’s namespace, with no isolation between users. If those pods use a ServiceAccount with elevated permissions in that namespace, the result can be unintended privilege escalation.

What’s changed

  • The operator now watches SpurJobs cluster-wide.
  • Jobs are created and managed in their own namespaces.
  • Cleanup always happens in the correct namespace.
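
To make the cleanup fix concrete, here is a small illustrative sketch in Python (not the operator's actual code, which is not shown in this description). It contrasts where a deletion would be aimed before and after the change. The helper names, the dict shape, and the idea that the deleted resource shares the job's name are all assumptions made for illustration.

```python
# Illustrative only: helper names and the resource-name convention are hypothetical.
OPERATOR_NAMESPACE = "spur"  # the former --namespace default from the description


def cleanup_target_before(spurjob: dict) -> tuple[str, str]:
    """Old behaviour: always aim deletions at the operator's namespace."""
    return (OPERATOR_NAMESPACE, spurjob["metadata"]["name"])


def cleanup_target_after(spurjob: dict) -> tuple[str, str]:
    """New behaviour: aim deletions at the namespace the SpurJob lives in."""
    meta = spurjob["metadata"]
    return (meta["namespace"], meta["name"])


# A hypothetical SpurJob created outside the operator's namespace.
job = {"metadata": {"name": "train-a", "namespace": "team-a"}}
```

For a job in `team-a`, the old target points at `spur`, where the resource does not exist (or where a different resource with the same name might), while the new target points at `team-a`.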

How it works

spurctld is namespace-agnostic and deals only with job IDs. To bridge that gap, the operator adds a spur.ai/job-id label to each SpurJob. When a job is dispatched, its namespace is looked up using this label, which makes Kubernetes the source of truth and avoids in-memory state.
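
The lookup described above can be sketched roughly as follows. This is a minimal Python illustration, not the operator's implementation: it assumes SpurJob objects shaped like the dicts a cluster-wide list call would return, and `job_id_selector` and `namespace_for_job` are hypothetical helper names. Only the `spur.ai/job-id` label key comes from the description.

```python
from typing import Optional

JOB_ID_LABEL = "spur.ai/job-id"  # label key named in the PR description


def job_id_selector(job_id: int) -> str:
    """Build an equality-based Kubernetes label selector for one job ID."""
    return f"{JOB_ID_LABEL}={job_id}"


def namespace_for_job(spurjobs: list[dict], job_id: int) -> Optional[str]:
    """Return the namespace of the SpurJob labelled with job_id, or None.

    `spurjobs` stands in for the result of listing SpurJobs across all
    namespaces (hypothetical data shape).
    """
    wanted = str(job_id)  # Kubernetes label values are strings
    for obj in spurjobs:
        meta = obj.get("metadata") or {}
        labels = meta.get("labels") or {}
        if labels.get(JOB_ID_LABEL) == wanted:
            return meta.get("namespace")
    return None


# Hypothetical cluster contents: two SpurJobs in different namespaces.
cluster = [
    {"metadata": {"name": "train-a", "namespace": "team-a",
                  "labels": {JOB_ID_LABEL: "41"}}},
    {"metadata": {"name": "train-b", "namespace": "team-b",
                  "labels": {JOB_ID_LABEL: "42"}}},
]
```

Because the mapping lives in the object's labels, it can be recomputed from the cluster at any time, so nothing needs to be rebuilt in memory after an operator restart. In a real client the filtering would normally be pushed to the API server by passing a selector such as `job_id_selector(42)` to a list call.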

Why cluster-wide scope

A single operator managing the whole cluster is the intended model (similar to Volcano or Argo). Running one operator per namespace would add unnecessary complexity, and the RBAC was already cluster-scoped anyway.

Before / After

Before (job not admitted outside spur):
Screenshot 2026-04-13 204651

After (job admitted in its own namespace):
Screenshot 2026-04-13 205140

@shiv-tyagi
Member Author

Will rebase and mark ready after #79 is merged.

@shiv-tyagi force-pushed the global-namespace-support branch 4 times, most recently from a07f4fe to 4df279d (April 14, 2026 09:00)
@shiv-tyagi
Member Author

This PR has a test file that would conflict with #81. I will rebase and merge once that lands. @powderluv Please approve if this looks okay to you. I will take care of merging once CI passes.

@shiv-tyagi marked this pull request as ready for review (April 14, 2026 09:15)
@shiv-tyagi requested a review from powderluv (April 14, 2026 09:15)
@shiv-tyagi force-pushed the global-namespace-support branch from 4df279d to 730a2a3 (April 14, 2026 13:39)
@shiv-tyagi
Member Author

Resolved the merge conflict with main. CI is looking good. Merging now. Thanks for the review @powderluv.

@shiv-tyagi merged commit 7b885e4 into ROCm:main (Apr 14, 2026)
5 checks passed