feat(security): supply-chain MVP — checksum install, govulncheck, gosec, codecov #310

Merged
Defilan merged 7 commits into main from chore/supply-chain-mvp
Apr 21, 2026

@Defilan Defilan commented Apr 21, 2026

Summary

Second pass on the v0.7.0 audit: the supply-chain MVP. Four feature commits, each independently reviewable (three follow-up CI fixes landed later on the same branch):

  1. feat(install): verify sha256 against release checksums.txt — the curl -sSL .../install.sh | bash install path previously piped GitHub releases straight into tar with no integrity check. Now fetches the same checksums.txt goreleaser already publishes, computes sha256 of the downloaded archive, and aborts with a clear error on mismatch. LLMKUBE_SKIP_CHECKSUM=1 escape hatch for test environments. Also fixes two pre-existing bugs uncovered along the way: the script constructed lowercase llmkube_...tar.gz but goreleaser emits LLMKube_...tar.gz (every install attempt was downloading a 404 page and failing at tar), and curl -sSL without -f silently saved error pages on HTTP 4xx/5xx.

  2. feat(ci): add govulncheck security workflow — new .github/workflows/security.yml runs govulncheck ./... on push-to-main, every PR, and weekly (Monday 07:00 UTC). Uses default source mode so only reachable vulnerabilities fail the build. Closes the gap where the only dependency-vuln signal was Dependabot noticing after the fact, post-merge.

  3. feat(ci): enable gosec and bodyclose linters — two security-leaning linters added to .golangci.yml. gosec's five categorically-noisy-for-an-operator rules disabled in config (G107/G204/G301/G304/G306) with documented reasons. The remaining rules caught 9 G115 (integer overflow conversion) sites, each suppressed with //nolint:gosec // G115: <reason> citing why the conversion is safe (CRD validation, protocol bounds, explicit guards). bodyclose was clean on first run.

  4. feat(ci): upload coverage to Codecov — wires codecov-action@v5 after make test so coverage trend is tracked per PR. Tokenless public upload, fail_ci_if_error: false, no enforced threshold yet. Starts the data feedback loop so we can set a coverage floor once there are a few weeks of numbers. Current baseline: internal/controller 83.1%, internal/metrics 100%, pkg/gguf 61.9%, pkg/agent 38.4%, pkg/cli 36.2%, pkg/license 100%.

Context

Companion PR to #309 (repo polish). Both are independent workstreams of the v0.7.0 critical-issues audit. Deferred to a later hardening pass: cosign signing of binaries/images/chart, SBOM generation, SLSA provenance, CodeQL — each deserves its own focused PR.

Test plan

  • End-to-end install.sh run against v0.7.0 (positive path: downloads, verifies, extracts, binary reports correct version)
  • End-to-end install.sh run with tampered archive (negative path: aborts with "Checksum mismatch" + expected/actual hashes)
  • bin/golangci-lint run --max-issues-per-linter=0 --max-same-issues=0 ./... — 0 issues
  • make test — all packages pass
  • govulncheck ./... — runs; Go-stdlib vulns found are 1.26.0-only and will not trigger CI (go.mod pins 1.25.0)
  • CI: test + lint + security + e2e workflows green on this branch
  • Codecov dashboard ingests the first report

Defilan added 7 commits April 21, 2026 10:31

The install.sh script previously piped release tarballs directly from
GitHub into tar without any integrity check. A CDN compromise or
redirected download would deliver unverified binaries to users running
the recommended "curl -sSL .../install.sh | bash" flow. Close that gap
by fetching the same checksums.txt goreleaser already publishes and
comparing the downloaded archive before unpacking it.

Also fix two pre-existing issues uncovered while wiring this up:

  - The constructed archive filename used lowercase "llmkube_" but
    goreleaser emits "LLMKube_" for this project (resolved .ProjectName
    template), so every install attempt was actually downloading a 404
    HTML page and failing at the tar step. Split BINARY_NAME (the binary
    inside the archive, still "llmkube") from the new ARCHIVE_PREFIX
    ("LLMKube") used for the tarball filename.
  - curl calls used "-sSL" without "-f", so HTTP 4xx/5xx responses were
    silently saved to disk instead of aborting. Switch to "-fsSL" on
    both tarball and checksums downloads for fail-fast behaviour.

Behavioural notes:

  - LLMKUBE_SKIP_CHECKSUM=1 is an explicit escape hatch for CI/test
    environments; it logs a warning and proceeds without verification.
  - When checksums.txt is missing, lacks an entry, or the sha256 does
    not match, the script aborts with a clear error. No silent fallback.
  - Detects sha256sum (Linux) or shasum -a 256 (macOS); errors out if
    neither is present.
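
The script itself is shell, but the verification logic described above can be sketched in Go for illustration. This is a minimal sketch, not the install.sh implementation: it assumes checksums.txt uses the sha256sum-style layout (hex digest, whitespace, file name), and the function names and the archive name in main are hypothetical.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// lookupChecksum returns the hex digest listed for name in a
// sha256sum-style manifest ("<hex digest>  <file name>" per line).
// The manifest layout is an assumption of this sketch.
func lookupChecksum(manifest, name string) (string, bool) {
	for _, line := range strings.Split(manifest, "\n") {
		fields := strings.Fields(line)
		if len(fields) == 2 && strings.TrimPrefix(fields[1], "*") == name {
			return strings.ToLower(fields[0]), true
		}
	}
	return "", false
}

// verifyArchive fails closed: a missing manifest entry and a digest
// mismatch are both reported as errors.
func verifyArchive(archive []byte, manifest, name string) error {
	want, ok := lookupChecksum(manifest, name)
	if !ok {
		return fmt.Errorf("no checksum entry for %s", name)
	}
	sum := sha256.Sum256(archive)
	got := hex.EncodeToString(sum[:])
	if got != want {
		return fmt.Errorf("checksum mismatch for %s: expected %s, actual %s", name, want, got)
	}
	return nil
}

func main() {
	archive := []byte("pretend tarball bytes")
	sum := sha256.Sum256(archive)
	name := "example.tar.gz" // hypothetical archive name
	manifest := hex.EncodeToString(sum[:]) + "  " + name + "\n"

	fmt.Println(verifyArchive(archive, manifest, name)) // <nil>

	// Simulate the negative path: one extra byte in the archive.
	tampered := append(append([]byte{}, archive...), 0x00)
	fmt.Println(verifyArchive(tampered, manifest, name) != nil) // true
}
```

Both a missing entry and a mismatch return an error, which mirrors the no-silent-fallback behaviour described in the notes above.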

Verified end-to-end against the v0.7.0 release:
  - Positive path: downloads, verifies, extracts, binary runs version cmd.
  - Negative path: injected byte into downloaded tarball; script aborts
    with "Checksum mismatch" + expected/actual hashes.

Signed-off-by: Christopher Maher <chris@mahercode.io>

Run govulncheck on push-to-main, every PR, and weekly (Mondays 07:00
UTC). The scheduled run catches newly-disclosed vulnerabilities in
dependencies even when the repo is otherwise quiet.

govulncheck operates in its default source mode, so the call graph is
analyzed and we only fail on vulnerabilities that code actually
reaches — imports that are merely present but never invoked do not
block the build.

This closes one of the supply-chain gaps flagged in the v0.7.0 audit:
previously the only signal on dependency vulns was Dependabot noticing
after the fact, with no pre-merge gate.

Signed-off-by: Christopher Maher <chris@mahercode.io>

Extend the golangci-lint config with two security-leaning linters that
catch genuine bugs:

  - bodyclose: flags HTTP response bodies that are never closed,
    a quiet FD-leak class that unit tests rarely exercise.
  - gosec:     the standard Go static analysis suite for security
    issues (weak crypto, hardcoded creds, overflow conversions, etc.)

gosec on its own is famously noisy for operator/CLI-style code that
legitimately runs user-provided binaries and reads user-provided
paths. Rather than papering every call-site with #nosec annotations,
disable the rules that categorically do not apply:

  G107 (variable URL in HTTP request) — we intentionally GET user-provided model URLs.
  G204 (subprocess with variable args) — we intentionally exec runtime binaries.
  G301 (dir ≤0750)                     — 0755 is correct for shared cache dirs.
  G304 (file inclusion via variable)   — we intentionally read user-specified files.
  G306 (file ≤0600)                    — 0644 is correct for cache/report files.

The remaining rules still catch real bugs; G115 (integer overflow
conversion) surfaced nine sites where the conversion is safe because
the value is bounded positive by either CRD validation, protocol
constraints (TCP ports ≤65535), or explicit zero/positive guards.
Each site is annotated with //nolint:gosec // G115: <reason> so the
suppression is audited and documented, not blanket-silenced.
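
As an illustration of the annotation pattern, here is a hypothetical conversion guarded by an explicit range check. The function name and error message are made up for this sketch and are not taken from the LLMKube code; only the //nolint:gosec // G115: <reason> comment form comes from the commit.

```go
package main

import (
	"fmt"
	"math"
)

// portToInt32 is a hypothetical helper: it narrows an int to int32 only
// after checking that the value is a valid TCP port.
func portToInt32(port int) (int32, error) {
	if port < 1 || port > 65535 {
		return 0, fmt.Errorf("port %d outside 1-65535", port)
	}
	return int32(port), nil //nolint:gosec // G115: range-checked above, fits in int32
}

func main() {
	p, err := portToInt32(8080)
	fmt.Println(p, err) // 8080 <nil>

	_, err = portToInt32(70000)
	fmt.Println(err) // port 70000 outside 1-65535

	// Without a guard, a narrowing conversion silently wraps around.
	var big int64 = math.MaxInt32 + 1
	fmt.Println(int32(big)) // -2147483648
}
```

The last line of main shows the failure mode G115 exists to catch, which is why each suppression needs a stated reason rather than a bare nolint.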

bodyclose was already clean on first run — no changes needed. Any
future HTTP response body leak will now fail CI.
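
For readers unfamiliar with the leak class, a minimal sketch of the pattern bodyclose enforces follows. Everything here (fetch, the stub transport, the URL) is hypothetical and written for this example; the stub stands in for a real server so the snippet runs offline.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"strings"
)

// trackingBody records whether Close was called on the response body.
type trackingBody struct {
	io.Reader
	closed bool
}

func (b *trackingBody) Close() error { b.closed = true; return nil }

// stubTransport answers every request with a canned 200 response.
type stubTransport struct{ body *trackingBody }

func (t stubTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	return &http.Response{
		StatusCode: http.StatusOK,
		Status:     "200 OK",
		Header:     make(http.Header),
		Body:       t.body,
		Request:    req,
	}, nil
}

// fetch reads the whole response and always closes the body.
func fetch(client *http.Client, url string) (string, error) {
	resp, err := client.Get(url)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close() // dropping this line is the leak bodyclose reports

	data, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	return string(data), nil
}

// demo runs fetch against the stub and reports whether the body was closed.
func demo() (string, bool, error) {
	body := &trackingBody{Reader: strings.NewReader("ok")}
	client := &http.Client{Transport: stubTransport{body: body}}
	text, err := fetch(client, "http://example.invalid/model")
	return text, body.closed, err
}

func main() {
	fmt.Println(demo()) // ok true <nil>
}
```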

Signed-off-by: Christopher Maher <chris@mahercode.io>

make test already writes cover.out; wire up the codecov-action step so
coverage trend is tracked on every PR. This establishes the data
feedback loop that lets us set a coverage floor later once we have a
few weeks of numbers.

First-pass defaults intentionally conservative:
  - fail_ci_if_error: false  — a Codecov outage does not gate merges
  - no ratchet / target      — observation only, no blocking thresholds
  - tokenless public upload  — no secret required; a CODECOV_TOKEN can
    be added later if tokenless rate-limits hit

Current unit-test coverage for reference (from last make test run):
  internal/controller    83.1%
  internal/metrics       100.0%
  pkg/gguf               61.9%
  pkg/agent              38.4%
  pkg/cli                36.2%
  pkg/license            100.0%

Signed-off-by: Christopher Maher <chris@mahercode.io>

Running govulncheck in the security workflow added for the v0.7.0
audit revealed that CI was provisioning Go 1.25.0 (setup-go reads
go-version-file: go.mod) and hitting 20 stdlib vulnerabilities fixed
in 1.25.2 and 1.25.3:

  - crypto/tls GO-2025-4008 (ALPN info leak)
  - crypto/x509 GO-2025-4007 (quadratic name-constraint check; requires 1.25.3)
  - crypto/x509 GO-2025-4008 / GO-2025-4005
  - encoding/asn1 GO-2025-4011 (DER memory exhaustion)
  - encoding/pem GO-2025-4009 (quadratic PEM decode)
  - net/http / net/url / net/mail — several parsing CVEs
  - plus related entries fixed in 1.25.2

Bump the module floor to 1.25.3 so setup-go provisions a patched
toolchain for CI, goreleaser release builds, and any downstream
consumers. All 20 reachable stdlib CVEs should clear; govulncheck
still reports informational entries for imported-but-never-called
packages (not build-failing).

No code or dependency changes — go mod tidy produced no go.sum diff.

Signed-off-by: Christopher Maher <chris@mahercode.io>

The earlier bump to 1.25.3 cleared the GO-2025-* series, but govulncheck
in CI then surfaced a second wave of GO-2026-* vulnerabilities whose
fixes landed in Go stdlib releases 1.25.5 through 1.25.9:

  - crypto/x509 GO-2026-4947 / GO-2026-4946 (Fixed in 1.25.9)
  - crypto/tls  GO-2026-4870 (TLS 1.3 KeyUpdate DoS, Fixed in 1.25.9)
  - html/template GO-2026-4865 (JsBraceDepth XSS, Fixed in 1.25.9)
  - net/url GO-2026-4601 (IPv6 literal parsing, Fixed in 1.25.8)
  - crypto/tls GO-2026-4337 (unexpected session resumption, Fixed in 1.25.7)
  - …12 total, spanning 1.25.5 through 1.25.9

Bump the module floor to 1.25.9 so setup-go and goreleaser both
install the fully-patched toolchain. Follow-up: once this lands, add a
Dependabot rule for the Go toolchain directive so these rolls happen
automatically instead of whack-a-mole on CI failures.

Signed-off-by: Christopher Maher <chris@mahercode.io>

The codecov-action does not auto-detect secrets: GitHub Actions only
exposes secrets to a step when the workflow explicitly references
them. The first CI run against this workflow uploaded coverage with
Token length: 0 and was rejected by Codecov with "Token required
because branch is protected" despite the CODECOV_TOKEN secret being
set at repo level.

Pass the token through "with: token:" so the action can authenticate.
Uploads from forks will have an empty token (secrets are not exposed
to PRs from forks) which is expected; fail_ci_if_error: false keeps
those runs green.

Signed-off-by: Christopher Maher <chris@mahercode.io>

codecov Bot commented Apr 21, 2026

Welcome to Codecov 🎉

Once you merge this PR into your default branch, you're all set! Codecov will compare coverage reports and display results in all future pull requests.

ℹ️ You can also turn on project coverage checks and project coverage reporting on Pull Request comment

Thanks for integrating Codecov - We've got you covered ☂️

@Defilan Defilan merged commit f17f59d into main Apr 21, 2026
18 checks passed
@Defilan Defilan deleted the chore/supply-chain-mvp branch April 21, 2026 18:16
@github-actions github-actions Bot mentioned this pull request Apr 21, 2026