chore: repo polish — PROJECT path, ROADMAP/SECURITY refresh, hygiene sweep, Phase enum #309

Merged: Defilan merged 8 commits into main from chore/repo-polish on Apr 21, 2026

Defilan (Member) commented Apr 21, 2026

Summary

First pass of the v0.7.0 critical-issues audit — eight atomic commits that fix the metadata/hygiene items a new contributor or reviewer hits in the first minute. No behavior changes in the controller; one small agent bug fix.

Each commit is standalone and reviewable on its own:

  1. chore: correct PROJECT repo path — kubebuilder PROJECT file still referenced the pre-org-move path (github.com/defilan/llmkube) in three places; align to github.com/defilantech/llmkube so future scaffolding regenerates with the correct imports.
  2. docs: refresh ROADMAP and SECURITY for v0.7.0 baseline — ROADMAP was frozen at v0.5.0/March 2026 with Q1–Q4 sections already shipped or stale; rewrite to reflect the v0.7.0 reality (multi-runtime, HPA, hybrid offload, CUDA 13, agentic-coding flags) and replace dated quarterly buckets with direction-of-travel sections. SECURITY supported-versions matrix bumped to 0.7.x + 0.6.x with an explicit support policy.
  3. chore: tighten .gitignore — patterns for .env, .playwright-mcp/, root-level /*.png (keeping docs/images/*.png), and /home-snapshot.md so editor/MCP session artifacts don't accumulate in the working tree.
  4. docs: remove internal phase specs from public docs/. The files gpu-performance-phase0.md and hybrid-offload-phase2-spec.md read like internal RFCs and have been moved to the internal design-docs record (not in this repo). Public docs/ now contains only user-facing guides.
  5. docs(samples): add Qwen3 hybrid offload and agentic coding samples — ship two sample manifests paired with the v0.7.0 hybrid-offload and agentic-coding features so users have working starting points.
  6. feat(api): validate Status.Phase as enum — lock down both InferenceServiceStatus.Phase and ModelStatus.Phase with +kubebuilder:validation:Enum derived from every string currently assigned in the reconcilers. Silent drift is now impossible; regenerated CRDs include the constraint. First step toward the longer v1beta1 Phase → Conditions migration.
  7. chore: drop kubebuilder scaffolding TODO(user) comments — three TODO(user): comments from the original project scaffold (cert-manager hint in cmd/main.go, two test placeholders in model_controller_test.go). Replaced with a plain doc comment for the cert-manager reference; test placeholders deleted since the inline logic they hinted at already exists.
  8. fix(agent): log swallowed StopProcess error on health check failure — if StopProcess failed after a health-check failure, the process would linger on the Metal host with no log entry. Now surfaced via the executor logger at warn level; original health-check error is still wrapped and returned to the caller.
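
For item 6, the constraint is expressed with kubebuilder markers on the two status fields. The following is a minimal sketch, not the repo's actual types: the structs are trimmed to the Phase field, the `json` tags are assumed, and the lookup maps and `phaseAllowed` helper are hypothetical, added only so the example runs and approximates what the apiserver enforces once the regenerated CRDs are installed.

```go
package main

import "fmt"

// InferenceServiceStatus is a trimmed stand-in for the real status type.
// Only Phase is shown; the enum values are the ones listed in this PR.
type InferenceServiceStatus struct {
	// +kubebuilder:validation:Enum=Pending;Creating;Progressing;Ready;WaitingForGPU;Failed
	// +optional
	Phase string `json:"phase,omitempty"` // tag is an assumption
}

// ModelStatus is likewise trimmed to the Phase field.
type ModelStatus struct {
	// +kubebuilder:validation:Enum=Pending;Downloading;Copying;Ready;Failed
	// +optional
	Phase string `json:"phase,omitempty"` // tag is an assumption
}

// Hypothetical local mirrors of the two enums, used only by this demo.
var (
	inferenceServicePhases = map[string]bool{
		"Pending": true, "Creating": true, "Progressing": true,
		"Ready": true, "WaitingForGPU": true, "Failed": true,
	}
	modelPhases = map[string]bool{
		"Pending": true, "Downloading": true, "Copying": true,
		"Ready": true, "Failed": true,
	}
)

// phaseAllowed approximates the CRD check. An empty phase is treated as
// valid because omitempty drops it from the serialized status, so the
// enum is never evaluated for a resource that has no phase yet.
func phaseAllowed(allowed map[string]bool, phase string) bool {
	return phase == "" || allowed[phase]
}

func main() {
	typo := InferenceServiceStatus{Phase: "Redy"} // a reconciler typo
	model := ModelStatus{Phase: "Downloading"}

	fmt.Println("InferenceService phase accepted:", phaseAllowed(inferenceServicePhases, typo.Phase))
	fmt.Println("Model phase accepted:", phaseAllowed(modelPhases, model.Phase))
}
```

The markers themselves are only comments to the Go compiler; controller-gen reads them when `make manifests` regenerates the CRD schema, which is where the enum actually takes effect.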

Context

Output of the v0.7.0 repo maturity audit. Companion workstreams to follow on separate branches:

  • B: Supply-chain MVP: install.sh checksum verification, govulncheck in CI, gosec/bodyclose linters, codecov integration.
  • C: Controller split — decompose the 1567-line inferenceservice_controller.go across ~7 small pure-move PRs.

Test plan

  • make manifests generate vet — clean
  • make test — all packages pass; internal/controller coverage 83.1%
  • bin/golangci-lint run ./... — 0 issues
  • git check-ignore on .env, blog-*.png, .playwright-mcp/, docs/images/logo.png — ignore rules behave as expected
  • CRD diff verified: enum constraints appear in both inferenceservice and model CRDs under spec.versions[*].schema.openAPIV3Schema.properties.status.properties.phase.enum
  • CI: test, lint, e2e, helm-chart workflows green on this branch
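
The enum values in this PR were collected by grepping internal/controller/. A parser-based check that could run alongside the test plan is sketched below. This is not what the PR does: it swaps grep for Go's `go/ast` package, every function name and the sample source are hypothetical, and it only sees string literals assigned directly to a selector ending in `.Status.Phase` (values set through constants or helper functions would be missed).

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"sort"
	"strconv"
)

// allowedPhases mirrors the InferenceService enum listed in this PR.
var allowedPhases = map[string]bool{
	"Pending": true, "Creating": true, "Progressing": true,
	"Ready": true, "WaitingForGPU": true, "Failed": true,
}

// sampleSource is a made-up reconciler fragment; "Redy" is a deliberate typo.
const sampleSource = `package controller

type status struct{ Phase string }
type object struct{ Status status }

func reconcile(o *object, healthy bool) {
	o.Status.Phase = "Pending"
	if healthy {
		o.Status.Phase = "Redy"
	} else {
		o.Status.Phase = "Failed"
	}
}
`

// isStatusPhase reports whether e has the shape <expr>.Status.Phase.
func isStatusPhase(e ast.Expr) bool {
	sel, ok := e.(*ast.SelectorExpr)
	if !ok || sel.Sel.Name != "Phase" {
		return false
	}
	inner, ok := sel.X.(*ast.SelectorExpr)
	return ok && inner.Sel.Name == "Status"
}

// phaseLiterals returns the sorted, distinct string literals assigned
// directly to a .Status.Phase selector in src.
func phaseLiterals(src string) ([]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "reconciler.go", src, 0)
	if err != nil {
		return nil, err
	}
	seen := map[string]bool{}
	ast.Inspect(file, func(n ast.Node) bool {
		assign, ok := n.(*ast.AssignStmt)
		if !ok || len(assign.Lhs) != len(assign.Rhs) {
			return true
		}
		for i, lhs := range assign.Lhs {
			lit, isLit := assign.Rhs[i].(*ast.BasicLit)
			if !isStatusPhase(lhs) || !isLit || lit.Kind != token.STRING {
				continue
			}
			if s, err := strconv.Unquote(lit.Value); err == nil {
				seen[s] = true
			}
		}
		return true
	})
	out := make([]string, 0, len(seen))
	for s := range seen {
		out = append(out, s)
	}
	sort.Strings(out)
	return out, nil
}

// unexpectedPhases returns the literals that are not in the allowed set.
func unexpectedPhases(found []string, allowed map[string]bool) []string {
	var out []string
	for _, p := range found {
		if !allowed[p] {
			out = append(out, p)
		}
	}
	return out
}

func main() {
	found, err := phaseLiterals(sampleSource)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println("assigned:", found)
	fmt.Println("not in enum:", unexpectedPhases(found, allowedPhases))
}
```

With the sample fragment, only the misspelled value is reported as missing from the enum. A real check would walk the controller package's files instead of an in-memory string.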

Defilan added 8 commits April 21, 2026 09:16

The kubebuilder PROJECT metadata still referenced the pre-org-move path
(github.com/defilan/llmkube) in three places: the top-level repo field
and both Model/InferenceService resource paths. go.mod has been
github.com/defilantech/llmkube for some time; align PROJECT to match so
future kubebuilder scaffolding regenerates with the correct import
paths.

Signed-off-by: Christopher Maher <chris@mahercode.io>

ROADMAP was frozen at v0.5.0 / March 2026 with entire Q1-Q4 2026
sections that had already shipped or slipped. Rewrite to reflect the
v0.7.0 reality: multi-runtime support (vLLM, TGI, PersonaPlex, oMLX,
Ollama), HPA, hybrid CPU/GPU offload, CUDA 13, agentic-coding flags.
Replace dated quarterly buckets with direction-of-travel sections
(Near-term focus on v1beta1 API prep + supply-chain hardening +
operator decomposition; Medium-term on distributed inference, more
GPU vendors, edge). Past Releases backfilled through v0.7.0.

SECURITY supported-versions matrix was two minors behind. Bump to
0.7.x + 0.6.x and state the support policy explicitly ("latest minor
plus one prior").

Signed-off-by: Christopher Maher <chris@mahercode.io>

The repo root has been collecting untracked cruft from editor/MCP
sessions: playwright screenshots (blog-*.png, features-*.png, reddit-*.png,
home-mobile-*.png, etc.), an audit.log file, a home-snapshot.md
playwright artifact, a .env file, and .playwright-mcp/ session state.
None were tracked, but all show up as untracked in "git status".

Add patterns so future runs stay out of the working tree:
  - .env / .env.* (with !.env.example escape hatch)
  - .playwright-mcp/ local MCP state
  - /*.png (root-level PNGs only; docs/images/*.png still tracked)
  - /home-snapshot.md
Also sweep the existing stray artifacts out of the working tree.

Signed-off-by: Christopher Maher <chris@mahercode.io>

gpu-performance-phase0.md and hybrid-offload-phase2-spec.md read like
internal design RFCs — they reference "Phase 0" / "Phase 2", internal
issue numbers, and implementation plans rather than user guidance. They
belong with the rest of the internal design record, not in the public
docs tree where new users scanning docs/ will mistake them for
user-facing instructions.

Files have been moved to the internal llmkube-internal/design-docs/
folder (not in this repo). The public docs tree now contains only
user-facing guides.

Signed-off-by: Christopher Maher <chris@mahercode.io>

Ship the two Qwen3 sample manifests that pair with the v0.7.0 hybrid
GPU/CPU offload and agentic-coding flag features, so new users trying
those capabilities have working starting points:

  - qwen3-30b-hybrid-offload.yaml exercises MoE expert offloading to
    CPU RAM via the hybrid offloading spec introduced in #283.
  - qwen3-coder-30b.yaml shows an agentic-coding configuration suitable
    for the Qwen Code CLI.

Signed-off-by: Christopher Maher <chris@mahercode.io>

Status.Phase was declared as an unvalidated string on both CRDs. The
doc comments listed the allowed values, but nothing prevented a typo
in the reconciler from silently propagating to stored status. Lock the
allowed values down with kubebuilder:validation:Enum so the apiserver
rejects any unexpected phase value:

  InferenceService: Pending, Creating, Progressing, Ready, WaitingForGPU, Failed
  Model:            Pending, Downloading, Copying, Ready, Failed

The enum values are derived from every string literal currently
assigned to Status.Phase in the reconcilers (confirmed via grep over
internal/controller/). Fields remain +optional so fresh resources
without a set Phase still validate. Regenerated CRDs include the enum
constraint.

First step toward the longer v1beta1 direction of migrating Phase into
standard Conditions; until then, at least the enum shields us from
silent drift.

Signed-off-by: Christopher Maher <chris@mahercode.io>

Three kubebuilder-generated "TODO(user):" comments have been lingering
since the initial project scaffold:

  cmd/main.go                                  — cert-manager enable hint
  internal/controller/model_controller_test.go — two test scaffolding placeholders

Replace the cert-manager TODO with a plain documentation comment that
still points to the relevant kustomization markers. Drop the two test
TODOs — the cleanup logic they hinted at is already implemented inline.

Signed-off-by: Christopher Maher <chris@mahercode.io>

When the llama-server health check fails after StartProcess, the code
attempted to stop the process and discarded any error from StopProcess
with a blank identifier. If StopProcess itself failed (e.g., permission
denied sending SIGTERM, or the process already being gone), the orphaned
process would linger on the Metal host with no log entry to explain
why.

Surface the stop failure via the executor logger at warn level with
pid and port context. The original health-check error is still
returned wrapped to the caller so reconciliation flow is unchanged.

Signed-off-by: Christopher Maher <chris@mahercode.io>
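
The shape of the fix in the last commit can be sketched as follows. This is an illustration under stated assumptions rather than the agent's code: the `stopper` interface, `failingStopper`, `cleanupAfterFailedHealthCheck`, and the pid/port values are hypothetical, and the standard library's `log/slog` stands in for whatever logger the executor really uses.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"log/slog"
	"strings"
)

// stopper is a hypothetical stand-in for whatever owns StopProcess.
type stopper interface {
	StopProcess(pid int) error
}

// failingStopper simulates a StopProcess call that cannot signal the process.
type failingStopper struct{}

func (failingStopper) StopProcess(pid int) error {
	return errors.New("operation not permitted")
}

var errUnhealthy = errors.New("llama-server did not become healthy")

// cleanupAfterFailedHealthCheck tries to stop the process. A stop failure is
// logged at warn level with pid and port instead of being discarded; the
// caller still receives the health-check error, wrapped, so control flow
// is the same as before.
func cleanupAfterFailedHealthCheck(log *slog.Logger, s stopper, pid, port int, healthErr error) error {
	if stopErr := s.StopProcess(pid); stopErr != nil {
		log.Warn("failed to stop process after health check failure",
			"pid", pid, "port", port, "error", stopErr)
	}
	return fmt.Errorf("health check failed: %w", healthErr)
}

// runDemo exercises the failure path and returns the captured log text
// together with the error handed back to the caller.
func runDemo() (string, error) {
	var buf bytes.Buffer
	logger := slog.New(slog.NewTextHandler(&buf, nil))
	err := cleanupAfterFailedHealthCheck(logger, failingStopper{}, 4242, 8080, errUnhealthy)
	return buf.String(), err
}

func main() {
	logs, err := runDemo()
	fmt.Print(logs)
	fmt.Println("returned:", err)
	fmt.Println("warn logged:", strings.Contains(logs, "level=WARN"))
	fmt.Println("wraps health error:", errors.Is(err, errUnhealthy))
}
```

The key difference from the previous behavior is that the stop error is no longer assigned to the blank identifier; it is recorded, while the returned error stays the health-check failure so callers do not need to change.
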
Defilan merged commit 038fb6f into main on Apr 21, 2026
16 checks passed
Defilan deleted the chore/repo-polish branch April 21, 2026 17:27