fix(catalog): default phi-4-mini context to 8K (closes #386) #387
Merged
Codecov Report: ✅ All modified and coverable lines are covered by tests.
Closes #386.
The bug
`catalog/catalog.yaml` shipped `phi-4-mini` with `context_size: 128000` paired with a `memory: 4Gi` request. The KV cache for a 128K context on a 3.8B model exceeds 4Gi during model load, so the pod lands in `CrashLoopBackOff` with repeated `SystemOOM` events.
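Back-of-envelope on the numbers (the architecture figures here are assumptions for a 3.8B-class model with GQA, not taken from this PR): with roughly 32 layers, 8 KV heads of head dim 128, and an fp16 cache, the KV cache costs about 2 × 32 × 8 × 128 × 2 bytes ≈ 128 KiB per token. A 128K-token window is then ≈ 16 GiB of KV cache alone, roughly four times the pod's entire memory request before model weights are even counted.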
This is the first model the quickstart docs recommend deploying. A copy-pasted `llmkube deploy phi-4-mini` using the catalog defaults breaks in minute one of a new user's session. HN-killer for the launch window.
The fix
Drop the catalog default to `context_size: 8192`. Phi-4 still supports 128K when users explicitly set `spec.contextSize` on the InferenceService or pass `--context 128000` on the CLI. Other small models in the catalog (3B–7B class) already default to 8192–32768 context; phi-4-mini was the outlier.
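For concreteness, a sketch of the changed catalog entry. Only the `context_size` values and the 4Gi memory request come from this PR; the surrounding entry shape is illustrative:

```yaml
# catalog/catalog.yaml and pkg/cli/catalog.yaml (entry shape is illustrative)
- name: phi-4-mini
  memory: 4Gi          # unchanged
  context_size: 8192   # was 128000
```

Opting back into the full window stays explicit. `spec.contextSize` and the `--context` flag are named in this PR; the `apiVersion` and `model` field below are assumptions:

```yaml
apiVersion: llmkube.dev/v1alpha1   # assumed group/version
kind: InferenceService
metadata:
  name: phi-4-mini-128k
spec:
  model: phi-4-mini                # assumed field name
  contextSize: 128000              # explicit opt-in to the full window
```

or equivalently `llmkube deploy phi-4-mini --context 128000`. Note that anyone opting into 128K will also need a memory request well above 4Gi, for the same KV-cache reason as above.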
Test plan
Verified locally on a fresh kind cluster: with the fix, `llmkube deploy phi-4-mini` reaches Ready in 13 seconds and serves inference at the OpenAI-compatible endpoint as expected.
Note on the duplicate catalog files
This repo currently maintains two copies of `catalog.yaml`: one at the repo root (`catalog/catalog.yaml`) and one embedded into the CLI binary (`pkg/cli/catalog.yaml` via `go:embed`). Both are updated in this PR so the next binary build picks up the fix.
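For reference, the embedding presumably looks something like this (variable name and exact package layout are assumptions, not from this PR):

```go
package cli

import _ "embed"

// The CLI binary carries its own copy of the catalog, so a change to
// catalog/catalog.yaml only reaches CLI users once this copy is updated
// and the binary is rebuilt.
//
//go:embed catalog.yaml
var catalogYAML []byte
```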
The duplication is asking for drift. Worth a follow-up issue to consolidate to a single source of truth (probably `catalog/catalog.yaml` as canonical, with a build-time copy or symlink into `pkg/cli/`). Out of scope for this PR.