
fix(catalog): default phi-4-mini context to 8K (closes #386) #387

Merged
Defilan merged 1 commit into main from fix/issue-386-phi4mini-catalog-context
May 3, 2026
Conversation

Defilan (Member) commented May 3, 2026

Closes #386.

**The bug**

`catalog/catalog.yaml` shipped `phi-4-mini` with `context_size: 128000` paired with a `memory: 4Gi` request. The KV cache for a 128K context on a 3.8B model exceeds 4Gi during model load, so the pod lands in `CrashLoopBackOff` with repeated `SystemOOM` events.
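For scale, a rough back-of-envelope (the architecture numbers here are illustrative assumptions, not read from the model config): at fp16 the KV cache costs about 2 × layers × kv_heads × head_dim × 2 bytes per token. With 32 layers, 8 KV heads, and head dim 128, that is ~128 KiB per token, so a 128K-token window wants ~16 GiB of KV cache alone, roughly 4× the entire memory request before weights are even loaded. The shipped entry looked roughly like this (only `context_size` and `memory` are confirmed by this PR; the surrounding keys are a sketch):

```yaml
# catalog/catalog.yaml (before) -- sketch; only context_size and memory
# are taken from this PR, the surrounding structure is assumed.
- name: phi-4-mini
  context_size: 128000   # KV cache alone is ~16 GiB at fp16 for this window
  resources:
    memory: 4Gi          # pod OOMs during model load
```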

This is the first model the quickstart docs recommend deploying, so a copy-paste `llmkube deploy phi-4-mini` with the catalog defaults breaks in the first minute of a new user's session. That's an HN-killer for the launch window.

**The fix**

Drop the catalog default to `context_size: 8192`. Phi-4 still supports 128K when users explicitly set `spec.contextSize` on the InferenceService or pass `--context 128000` on the CLI. Other small models in the catalog (3B–7B class) already default to 8192–32768 context; phi-4-mini was the outlier.
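For anyone who does want the full window back, a minimal sketch of the explicit opt-in (the `apiVersion`/`kind` spelling and the memory figure are assumptions; `spec.contextSize` and the `--context` flag are the knobs named above):

```yaml
# Explicit 128K opt-in -- sketch only; apiVersion/kind and the memory
# value are assumed, spec.contextSize is the documented field.
apiVersion: llmkube.dev/v1alpha1
kind: InferenceService
metadata:
  name: phi-4-mini-longctx
spec:
  model: phi-4-mini
  contextSize: 128000
  resources:
    memory: 24Gi   # headroom for ~16 GiB of KV cache plus weights
```

Or equivalently `llmkube deploy phi-4-mini --context 128000` with a matching memory bump.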

**Test plan**

  • `llmkube catalog info phi-4-mini` reports `Context Size: 8,192 tokens`
  • `llmkube deploy phi-4-mini` on a fresh kind cluster reaches Ready in 13s (verified locally)
  • OpenAI-compatible API request returns 200 OK after deploy
  • `go test ./pkg/cli/... -run Catalog` passes
  • `go vet` clean

**Note on the duplicate catalog files**

This repo currently maintains two copies of `catalog.yaml`: one at the repo root (`catalog/catalog.yaml`) and one embedded into the CLI binary (`pkg/cli/catalog.yaml` via `go:embed`). Both are updated in this PR so the next binary build picks up the fix.

The duplication is asking for drift. Worth a follow-up issue to consolidate to a single source of truth (probably `catalog/catalog.yaml` as canonical with a build-time copy / symlink to `pkg/cli/`). Out of scope for this PR.

fix(catalog): default phi-4-mini context to 8K (closes #386)

The catalog shipped phi-4-mini with context_size: 128000 paired with a
4Gi memory request. The KV cache for 128K context on a 3.8B model
exceeds 4Gi during model load, so a `llmkube deploy phi-4-mini` (the
catalog default for our smallest model) lands the pod in
CrashLoopBackOff with SystemOOM events. This is the first model the
quickstart docs recommend, so the failure mode hits on minute one of a
new user's session.

Drop the catalog default to context_size: 8192. Phi-4 still supports
128K when users explicitly set spec.contextSize on InferenceService
or pass --context 128000 on the CLI.

Other catalog small-models (3B-7B class) already default to
8192-32768 context; phi-4-mini was the outlier.

Verified locally on a fresh kind cluster: with the fix,
`llmkube deploy phi-4-mini` reaches Ready in 13 seconds and serves
inference at the OpenAI-compatible endpoint as expected.

Note: this repo currently maintains two copies of catalog.yaml,
one at the repo root (catalog/catalog.yaml) and one embedded into the
CLI binary (pkg/cli/catalog.yaml). Both are updated here so the next
binary build picks up the fix. A follow-up issue should consolidate
these into a single source of truth.

Signed-off-by: Christopher Maher <chris@mahercode.io>
codecov bot commented May 3, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.


Defilan merged commit 7bcd685 into main May 3, 2026
19 checks passed
Defilan deleted the fix/issue-386-phi4mini-catalog-context branch May 3, 2026 19:22
github-actions bot mentioned this pull request May 3, 2026

Development

Successfully merging this pull request may close these issues.

[BUG] catalog: phi-4-mini ships with context_size: 128000, OOMs on the recommended 4Gi memory
