feat: add scale sub resource by mircea-pavel-anton · Pull Request #474 · defilantech/LLMKube

mircea-pavel-anton · 2026-05-16T23:51:57Z

What

Scale-to-zero support: keep InferenceService.spec.replicas: 0 at idle, use a KEDA ScaledObject (HTTP or external trigger) to scale to 1 on demand, and scale back to 0 after a cooldown period. In practice, this lets me hot-swap models on limited hardware.

Why

This feature allows InferenceServices to be scaled up and down on demand, either via kubectl scale or via external scalers such as KEDA. It also makes scale-to-zero possible.

Fixes #473

How

Added scale sub-resource.

Checklist

Tests added/updated
make test passes locally
make lint passes locally
Commit messages follow conventional commits
All commits are signed off (git commit -s) per DCO
Documentation updated (if user-facing change)

codecov · 2026-05-17T00:33:09Z

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

Expose spec.replicas via the standard /scale subresource so that external autoscalers (KEDA, HPA) can target InferenceService directly. Without this, KEDA's operator immediately deletes any ScaledObject whose scaleTargetRef points at a CRD that does not implement /scale.

Changes:
- Add +kubebuilder:subresource:scale marker with specpath=.spec.replicas and statuspath=.status.replicas to InferenceService type
- Add status.Replicas int32 field (mirrors readyReplicas; this is the path the scale subresource reads to report current replica count)
- Populate status.Replicas = readyReplicas in updateStatusWithSchedulingInfo
- Regenerate config/crd/bases CRD YAML via make manifests generate
- Update charts/llmkube/templates/crds/inferenceservices.yaml to match

kubectl scale and KEDA ScaledObjects can now target InferenceService:

    kubectl scale inferenceservice/my-model -n ai --replicas=1

Signed-off-by: Mircea-Pavel ANTON <contact@mirceanton.com>
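For readers unfamiliar with the marker, the commit's first two changes might look roughly like the sketch below. The marker line uses the specpath and statuspath values quoted in the commit message; everything else is an assumption made for illustration: the structs are trimmed stand-ins for the real LLMKube types, the pointer type on `Spec.Replicas` is a guess, and the `marshal` helper exists only to show which JSON paths the subresource would read.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// InferenceServiceSpec is a trimmed, hypothetical stand-in for the real spec.
type InferenceServiceSpec struct {
	// Replicas is the desired replica count; 0 is valid for scale-to-zero.
	// Assumed to be a pointer here so that an explicit 0 survives omitempty.
	Replicas *int32 `json:"replicas,omitempty"`
}

// InferenceServiceStatus is a trimmed, hypothetical stand-in for the real status.
type InferenceServiceStatus struct {
	// Replicas is the field the scale subresource reads via statuspath.
	Replicas int32 `json:"replicas,omitempty"`
	// ReadyReplicas remains the separate readiness signal.
	ReadyReplicas int32 `json:"readyReplicas,omitempty"`
}

// The scale marker below tells controller-gen to emit a /scale subresource
// in the generated CRD, wired to the two JSON paths.
//
// +kubebuilder:object:root=true
// +kubebuilder:subresource:status
// +kubebuilder:subresource:scale:specpath=.spec.replicas,statuspath=.status.replicas
type InferenceService struct {
	Spec   InferenceServiceSpec   `json:"spec"`
	Status InferenceServiceStatus `json:"status"`
}

// marshal renders the object as JSON so the spec and status paths are visible.
func marshal(isvc InferenceService) string {
	b, err := json.Marshal(isvc)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	one := int32(1)
	isvc := InferenceService{
		Spec:   InferenceServiceSpec{Replicas: &one},
		Status: InferenceServiceStatus{Replicas: 1, ReadyReplicas: 1},
	}
	fmt.Println(marshal(isvc))
}
```

The marker only affects the generated CRD, which is why the commit also regenerates config/crd/bases and updates the Helm chart copy.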

Defilan

Thanks for this, and thanks for getting the commits signed off. A clean, well-scoped PR: clear What/Why/How, the CRD regen done correctly, and a sample, test, and docs included. CI is green.

One thing I'd like resolved before merge. status.replicas currently mirrors readyReplicas, but the scale subresource's statusReplicasPath is conventionally the total current replica count. HPA reads that field to compute its scaling ratio, so reporting "ready" instead of "total" can cause over-scaling during rollouts. The PR and the README mention HPA, but the subresource also lacks a selectorpath, which autoscaling/v2 HPA needs to resolve pods. Two clean ways forward: either (a) point status.replicas at the Deployment's replica count and add a status.selector plus selectorpath for real HPA support, or (b) scope this PR to KEDA, which needs neither, and adjust the README to say KEDA rather than KEDA/HPA, with HPA as a follow-up.

Also strongly suggested: a test that drives the /scale subresource directly (and a scale-to-0 case) would be more reassuring than the current field-copy assertion, and a one-line note that the scale subresource and spec.autoscaling are mutually exclusive would save users a footgun. Nice contribution overall, and happy to help land it.

Defilan · 2026-05-17T03:54:15Z

 	isvc.Status.Phase = phase
 	isvc.Status.ModelReady = modelReady
 	isvc.Status.ReadyReplicas = readyReplicas
+	isvc.Status.Replicas = readyReplicas


Status.Replicas is the path statusReplicasPath reads for the scale subresource. By convention it should report the total current replica count (so HPA computes a correct scaling ratio), not the ready count. Reporting readyReplicas here can cause over-scaling during rollouts when pods aren't yet Ready. Consider sourcing this from the Deployment's *Spec.Replicas instead, keeping ReadyReplicas as the separate ready signal. If KEDA-only support is the intent, that's acceptable but should be documented and the README's HPA claim dropped.

Something like this perhaps? d797121

If I understand correctly, this means that KEDA always sees the intended replica count rather than the ready replica count. During a rollout with pods not yet Ready, it reports 3 (desired) instead of 0 (ready), which prevents any false over-scaling signal.
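The change discussed in this thread can be sketched as follows. This is not the code from d797121: the `deployment` and `status` types are hypothetical stand-ins for the relevant fields of appsv1.Deployment and the InferenceService status, and `desiredReplicas` and `buildStatus` are illustrative names. The sketch assumes the desired count is read from the Deployment's spec, where a nil pointer means the API default of 1.

```go
package main

import "fmt"

// deployment is a hypothetical stand-in for the appsv1.Deployment fields used here.
type deployment struct {
	specReplicas  *int32 // mirrors Deployment.Spec.Replicas (nil means the default of 1)
	readyReplicas int32  // mirrors Deployment.Status.ReadyReplicas
}

// status is a hypothetical stand-in for the InferenceService status fields.
type status struct {
	Replicas      int32 // read by the scale subresource via statuspath
	ReadyReplicas int32 // separate readiness signal
}

// desiredReplicas returns the replica count the Deployment is asked to run.
func desiredReplicas(d deployment) int32 {
	if d.specReplicas == nil {
		return 1
	}
	return *d.specReplicas
}

// buildStatus reports the desired count in Replicas and keeps the ready
// count in ReadyReplicas, instead of copying readyReplicas into both.
func buildStatus(d deployment) status {
	return status{
		Replicas:      desiredReplicas(d),
		ReadyReplicas: d.readyReplicas,
	}
}

func main() {
	three := int32(3)
	// Mid-rollout: three pods requested, none Ready yet.
	s := buildStatus(deployment{specReplicas: &three, readyReplicas: 0})
	fmt.Printf("replicas=%d readyReplicas=%d\n", s.Replicas, s.ReadyReplicas)
}
```

For the mid-rollout case in the comment above, this reports 3 in Replicas while ReadyReplicas stays at 0.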

Defilan

Excellent work getting this done so fast! You addressed everything I mentioned. I'll get this merged momentarily. Welcome to LLMKube!

mircea-pavel-anton requested a review from Defilan as a code owner May 16, 2026 23:51

mircea-pavel-anton added 5 commits May 17, 2026 03:39

test: add tests

64c2fa7

Signed-off-by: Mircea-Pavel ANTON <contact@mirceanton.com>

docs: add a note in the readme

fbc4a6a

Signed-off-by: Mircea-Pavel ANTON <contact@mirceanton.com>

docs: add example yaml with doc comments

991a838

Signed-off-by: Mircea-Pavel ANTON <contact@mirceanton.com>

chore: cleanup

fae4f76

Signed-off-by: Mircea-Pavel ANTON <contact@mirceanton.com>

mircea-pavel-anton force-pushed the feat/scale-subresource branch from e545dd4 to fae4f76 Compare May 17, 2026 00:39

Defilan requested changes May 17, 2026

mircea-pavel-anton added 4 commits May 17, 2026 12:24

docs: remove blank line from README

2e40c88

Signed-off-by: Mircea-Pavel ANTON <contact@mirceanton.com>

chore: update example to be more complete; include doc comments

22b3860

Signed-off-by: Mircea-Pavel ANTON <contact@mirceanton.com>

fix: set Status.Replicas to desiredReplicas

d797121

Signed-off-by: Mircea-Pavel ANTON <contact@mirceanton.com>

test: update tests

7388137

Signed-off-by: Mircea-Pavel ANTON <contact@mirceanton.com>

mircea-pavel-anton requested a review from Defilan May 17, 2026 10:06

Defilan approved these changes May 17, 2026

Defilan merged commit 73419a5 into defilantech:main May 17, 2026
21 checks passed

github-actions Bot mentioned this pull request May 17, 2026

chore: release 0.7.9 #470

Merged

mircea-pavel-anton deleted the feat/scale-subresource branch May 17, 2026 16:36

Defilan mentioned this pull request May 17, 2026

[FEATURE] Tutorial: metrics-driven autoscaling for InferenceService #478

Open

8 tasks

Defilan merged 9 commits into defilantech:main from mircea-pavel-anton:feat/scale-subresource