fix(controller): recreate InferenceService Deployment on immutable selector change (#606) #607
Merged
Defilan merged 1 commit on Jun 2, 2026
…lector change (defilantech#606)

A Deployment created by an older operator can carry a smaller pod selector than the current operator generates (pre-0.8 used {app: <name>}; 0.8.x also adds inference.llmkube.dev/service). Deployment.spec.selector is immutable, so the in-place Update in reconcileDeployment failed permanently with "field is immutable" and the controller hot-looped, unable to reconcile any pre-0.8 InferenceService Deployment (observed live on a cluster upgraded to 0.8.1: ~14 errors/min across every pre-0.8 service).

Detect the selector mismatch before the update and migrate by deleting and recreating the Deployment with the correct selector, requeuing if the old Deployment is still terminating. One-time, logged recreate.

Test: an envtest spec seeds a Deployment with the pre-0.8 {app}-only selector and asserts reconcileDeployment migrates it to the current selector without error. The real apiserver rejects the immutable in-place update, so the test reproduces the production failure faithfully (red before the fix).

Fixes defilantech#606

Signed-off-by: Christopher Maher <chris@mahercode.io>
Important
Upgrade impact (not an API break, but plan for it). On the first reconcile
after upgrading, each InferenceService Deployment created by a pre-0.8.0
operator is deleted and recreated to migrate its immutable selector. That
is a one-time pod restart + model reload for affected services (minutes of
unavailability for large models, once). 0.8.x-created Deployments are
unaffected.
What
Migrates InferenceService Deployments whose pod selector predates the current operator's selector label set, by recreating them instead of hot-looping on an immutable in-place update.
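To make the mismatch concrete, here is a minimal Go sketch of the check, assuming selectors are plain label maps. The helper names `desiredSelector` and `selectorNeedsMigration` are hypothetical and not taken from the operator's code, which works with Kubernetes API types rather than bare maps.

```go
package main

import "fmt"

// desiredSelector mirrors the label set this PR says 0.8.x generates.
// Hypothetical helper: the real operator builds its selector elsewhere.
func desiredSelector(name string) map[string]string {
	return map[string]string{
		"app":                           name,
		"inference.llmkube.dev/service": name,
	}
}

// selectorNeedsMigration reports whether the existing selector differs
// from the desired one in any key or value. Because the selector is
// immutable, any difference means the Deployment cannot be updated in place.
func selectorNeedsMigration(existing, desired map[string]string) bool {
	if len(existing) != len(desired) {
		return true
	}
	for k, v := range desired {
		if got, ok := existing[k]; !ok || got != v {
			return true
		}
	}
	return false
}

func main() {
	legacy := map[string]string{"app": "llama"} // pre-0.8 selector
	current := desiredSelector("llama")         // 0.8.x selector

	fmt.Println(selectorNeedsMigration(legacy, current))                   // true
	fmt.Println(selectorNeedsMigration(current, desiredSelector("llama"))) // false
}
```

A Deployment carrying only `{app: <name>}` is flagged for migration, while one already on the current label set is left alone.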
Why
Fixes #606
A Deployment created by an older operator carries a smaller selector than 0.8.x generates (pre-0.8: `{app: <name>}`; now also `inference.llmkube.dev/service: <name>`). `Deployment.spec.selector` is immutable, so `reconcileDeployment`'s wholesale `Spec` replace + `Update` failed forever with `field is immutable`, and the controller hot-looped, unable to reconcile any pre-0.8 InferenceService Deployment. Observed live after upgrading a cluster to 0.8.1 (~14 errors/min across every pre-0.8 service).
How
In `reconcileDeployment`, after fetching the existing Deployment, compare its selector against the desired one. On mismatch (immutable), delete and recreate the Deployment with the correct selector; if the old one is still terminating, requeue. One-time, logged recreate; the brief pod churn is unavoidable for an immutable-selector change. Service selectors are mutable and unaffected.
Testing
New envtest spec (`inferenceservice_selector_migration_test.go`) seeds a Deployment with the pre-0.8 `{app}`-only selector and calls `reconcileDeployment` with the model ready, asserting the selector is migrated to the current label set with no error. Because envtest runs a real apiserver, the immutable-selector rejection is genuine, so the test is red before the fix and green after.
Checklist
- `make test` passes locally (full controller suite green)
- `make lint` passes locally (native + `GOOS=linux`)
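
The reconcile flow described under How can be sketched as a pure decision function. This is an illustration under stated assumptions, not the code in this PR: the `deployment` struct, the `action` values, and `nextAction` are hypothetical stand-ins, and the sketch treats the delete and the later create as separate reconcile passes, which the real controller may not do.

```go
package main

import "fmt"

// deployment is a hypothetical stand-in for the few Deployment fields
// that matter to the migration decision.
type deployment struct {
	selector    map[string]string
	terminating bool // models a Deployment whose deletion is still in progress
}

type action string

const (
	actionCreate  action = "create"
	actionUpdate  action = "update in place"
	actionDelete  action = "delete and recreate"
	actionRequeue action = "requeue"
)

// sameSelector compares two label maps key by key.
func sameSelector(a, b map[string]string) bool {
	if len(a) != len(b) {
		return false
	}
	for k, v := range a {
		if w, ok := b[k]; !ok || w != v {
			return false
		}
	}
	return true
}

// nextAction picks the step for one reconcile pass. A nil existing
// Deployment means none was found.
func nextAction(existing *deployment, desired map[string]string) action {
	switch {
	case existing == nil:
		return actionCreate
	case sameSelector(existing.selector, desired):
		return actionUpdate // selector unchanged, in-place update is safe
	case existing.terminating:
		return actionRequeue // old Deployment not gone yet, try again later
	default:
		return actionDelete // immutable selector differs, so migrate
	}
}

func main() {
	desired := map[string]string{
		"app":                           "llama",
		"inference.llmkube.dev/service": "llama",
	}
	legacy := &deployment{selector: map[string]string{"app": "llama"}}

	fmt.Println(nextAction(legacy, desired)) // delete and recreate
	legacy.terminating = true
	fmt.Println(nextAction(legacy, desired)) // requeue
	fmt.Println(nextAction(nil, desired))    // create
	fmt.Println(nextAction(&deployment{selector: desired}, desired)) // update in place
}
```

Keeping the decision separate from the API calls makes the one-time nature of the migration easy to see: once the recreated Deployment carries the desired selector, every later pass takes the in-place update branch.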