feat(evidence): emit Recipe Evidence v1 bundle from aicr validate#873
Introduce pkg/evidence/attestation, the library that builds, signs, and
publishes the Recipe Evidence Bundle v1 — a tamper-evident artifact
binding a hydrated recipe, the snapshot it was validated against,
the validator outcome, and (optionally) per-check logs.
The bundle is consumed by `aicr validate --emit-attestation/--bom/--push`
in a follow-up commit; this change is library-only and can be
imported without touching the CLI.
Public surface:
- Builder: canonicalize recipe, hash files, assemble manifest
- Fingerprint: sha256 over canonical inputs (recipe, snapshot,
validator output, optional logs)
- Manifest: v1 schema + JSON marshaling/validation
- Predicate: in-toto predicate wrapping the manifest
- Signer: Sigstore keyless (Fulcio + Rekor) signing
- OCI: push manifest + blobs to a registry
- Pointer: compact human-readable reference (digest, ref)
The bundle format itself is documented in
docs/spec/recipe-evidence-v1.md (added here, sidebar-linked from
docs/index.yml and site/.vitepress/config.ts).
Delete the stub fingerprint.go and fingerprint_test.go and substitute
the real types directly in the predicate. The on-the-wire schema is
unchanged because pkg/fingerprint types serialize to the same JSON/YAML
shape the spec already documented.
- pkg/evidence/attestation/fingerprint.go: deleted
- pkg/evidence/attestation/fingerprint_test.go: deleted (coverage in
  pkg/fingerprint)
- Predicate.Fingerprint -> fingerprint.Fingerprint
- Predicate.CriteriaMatch -> fingerprint.MatchResult
- PointerFingerprint: drops intent/platform (not cluster facts), adds
  region
- builder.go: calls fingerprint.FromMeasurements + fp.Match directly
- docs/spec/recipe-evidence-v1.md: example reflects the richer schema
Replace the duplicated DSSE/Fulcio/Rekor plumbing in
pkg/evidence/attestation/signer.go with a thin call to
bundler/attestation.SignStatement (the predicate-agnostic primitive
landed in NVIDIA#852). KeylessSigner.Sign now packages its OIDC token
and URLs into bundleattest.SignOptions and converts the bundler's
SignedAttestation back into the local SignResult shape that pointer
files and the builder consume.
Wins:
- ~150 LoC of duplicated signing ceremony removed (sign.Bundle,
  ephemeral keypair, Fulcio/Rekor wiring, protojson marshal,
  extractSignerClaims, extractIssuerExtension, extractRekorLogIndex).
- Inherits the bundler primitive's improvements automatically:
  SigstoreSignTimeout deadline, ErrCodeTimeout vs ErrCodeUnavailable
  classification, ASN.1-aware OIDC issuer extraction with
  current/legacy OID precedence, PII-safe Debug-only identity logging.
- DefaultFulcioURL/DefaultRekorURL are now consts pointing at the
  bundler's so the two packages cannot drift on the Sigstore
  public-good URLs.
KeylessSigner, NoOpSigner, SignBundle, and SignResult retain the same
public shape so the rest of the package (builder, pointer, OCI push) is
unaffected.
Extend AICRConfig with the recipe-evidence schema flagged in ADR-007:
spec.validate.evidence carries two sibling sections, one per evidence
kind aicr validate can emit.
spec.validate.evidence.cncf → --evidence-dir, --cncf-submission,
--feature
spec.validate.evidence.attestation → --emit-attestation, --bom,
--include-logs, --push, --push-logs,
--plain-http, --insecure-tls
Bool fields in the resolved view are pointers so an explicit-false in
config survives the CLI flag's default-true / default-false reading via
boolFlagOrConfig + derefBoolOr — same pattern ValidateExecutionSpec
already uses for FailOnError.
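A minimal sketch of the pointer-bool precedence described above. The helper names come from this commit message, but their signatures here are assumptions made for illustration, not the actual pkg/cli or pkg/config code:

```go
package main

import "fmt"

// derefBoolOr returns *p when p is non-nil, otherwise fallback.
// Signature is assumed; only the name appears in the commit message.
func derefBoolOr(p *bool, fallback bool) bool {
	if p == nil {
		return fallback
	}
	return *p
}

// boolFlagOrConfig applies CLI > config > default precedence.
// flagSet reports whether the user passed the flag explicitly
// (hypothetical parameter; a real CLI library exposes this differently).
func boolFlagOrConfig(flagSet, flagVal bool, cfg *bool, def bool) bool {
	if flagSet {
		return flagVal
	}
	return derefBoolOr(cfg, def)
}

func main() {
	off := false
	// Config says explicit false, flag untouched, default is true.
	fmt.Println(boolFlagOrConfig(false, true, &off, true)) // false
	// Config is silent, so the default applies.
	fmt.Println(boolFlagOrConfig(false, true, nil, true)) // true
	// An explicit flag wins over config.
	fmt.Println(boolFlagOrConfig(true, true, &off, true)) // true
}
```

With a plain bool in the resolved view, the first case could not be told apart from the second, which is the situation the pointer is meant to avoid.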
The SIGSTORE_ID_TOKEN secret stays out of the schema by design; tokens
are short-lived and must not be embedded in version-controlled config.
The CLI reads it from env at sign time.
Adds three pkg/defaults constants for the evidence pipeline:
EvidenceBundleBuildTimeout (60s, local I/O), EvidenceBundleSignTimeout
(aliased to SigstoreSignTimeout so the two can diverge later), and
EvidenceBundlePushTimeout (2m, for multi-blob ORAS upload).
Library only — the CLI wiring follows in the next commit.
Wire the evidence/attestation bundle into 'aicr validate':
--emit-attestation <path> write bundle (manifest + blobs) to a local
directory; safe to run without network.
--bom <path> write the validator BOM JSON alongside the
bundle for audit consumers (required when
--emit-attestation is set).
--include-logs embed validator logs in logs-bundle/.
--push <oci-ref> publish the bundle to an OCI registry; uses
pkg/oci scheme helpers and Sigstore keyless
signing.
--push-logs also push the logs bundle as <push>-logs.
--plain-http use HTTP on push (local registry tests).
--insecure-tls skip TLS verification on push.
Each flag falls through to spec.validate.evidence.attestation in the
loaded AICRConfig via the stringFlagOrConfig / boolFlagOrConfig
helpers, matching the precedence the rest of the validate command
already uses (CLI > config > flag Value: default). Existing CNCF
flags (--evidence-dir, --cncf-submission, --feature) are now also
plumbed through spec.validate.evidence.cncf.
emitRecipeEvidence runs even when phases failed — a failed validate is
still useful evidence for a contributor documenting hardware-specific
limitations.
Closes NVIDIA#754.
…e config
Add the --emit-attestation flag family (--bom, --include-logs, --push,
--push-logs, --plain-http, --insecure-tls) to the aicr validate flag
table and example block, with a worked example covering both the
unsigned local-only path (Sigstore not required) and the keyless
sign+push path (SIGSTORE_ID_TOKEN required).
Extend the 'Validate Config File Mode' YAML example with the new
spec.validate.evidence.{cncf,attestation} subtree so the file matches
what the CLI now accepts. Drop the prior 'evidence flags are CLI-only'
caveat — that ship has sailed with this PR.
Pass over the evidence wiring driven by a code-review sweep:
- Drop unjustified *bool ceremony in EvidenceCNCFResolved and
EvidenceAttestationResolved. The wire-form EvidenceCNCFSpec /
EvidenceAttestationSpec use bool with omitempty, so YAML cannot
distinguish 'absent' from 'explicit false' upstream; the pointer
indirection on the resolved side carried no extra signal.
- Delete the five evidenceAtt{String,Bool} / evidenceCNCF{String,Bool}
/ evidenceCNCFFeatures wrapper helpers and the seven getter
closures in pkg/cli/validate.go. With plain-bool resolved fields,
a single 'if att == nil { att = &Resolved{} }' zero-default guard
at the top of buildRecipeEvidenceConfig replaces them all and the
flag-or-config call sites become direct field reads.
- Defer the yaml.Marshal of recipe + snapshot until inside
emitRecipeEvidence, after the BOM/push/push-logs precondition
checks have passed. A misconfigured --emit-attestation run no
longer pays the (often multi-MB) marshal cost before failing.
Drop RecipeYAML/SnapshotYAML from recipeEvidenceConfig — they
were transient, not config.
- Drop the redundant os.MkdirAll on cfg.OutDir: attestation.Build
already creates the output tree.
- Pre-validate --push as an OCI reference before signing, so a
typo in the registry URL no longer burns a Fulcio cert + Rekor
inclusion proof on a push that will fail seconds later.
- Run the summary and logs OCI uploads concurrently via errgroup
when --push-logs is set. They are independent artifacts sharing
only the cancellation deadline; a failure in either aborts the
other through the shared gctx.
- Wrap the (signResult, summaryPush, logsPush, error) return of
signAndPushBundle into a signPushOutcome struct returned by value.
buildPointerInputs collapses from four parameters to two.
- Introduce attestation.SigstoreIDTokenEnv constant for the
'SIGSTORE_ID_TOKEN' env-var name; CLI uses it in both the
push-time check and the user-facing error message so the
contract name lives in one place.
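The concurrent-upload item above can be sketched as follows. This is not the PR's code: it swaps errgroup for a stdlib-only equivalent (sync.WaitGroup plus a cancellable context) so the example has no external dependencies, and pushBoth and the two upload callbacks are hypothetical names:

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
)

// pushBoth runs two independent uploads concurrently and returns the
// first error. A failure in either cancels the shared context so the
// sibling can stop early, mirroring what errgroup.WithContext provides.
func pushBoth(ctx context.Context, summary, logs func(context.Context) error) error {
	gctx, cancel := context.WithCancel(ctx)
	defer cancel()

	var (
		wg    sync.WaitGroup
		once  sync.Once
		first error
	)
	for _, push := range []func(context.Context) error{summary, logs} {
		wg.Add(1)
		go func(push func(context.Context) error) {
			defer wg.Done()
			if err := push(gctx); err != nil {
				once.Do(func() {
					first = err
					cancel() // abort the sibling upload
				})
			}
		}(push)
	}
	wg.Wait()
	return first
}

// demo pairs a failing upload with one that only stops on cancellation.
func demo() error {
	failing := func(context.Context) error { return errors.New("summary push failed") }
	waiting := func(ctx context.Context) error {
		<-ctx.Done() // simulates an upload that honors cancellation
		return ctx.Err()
	}
	return pushBoth(context.Background(), failing, waiting)
}

func main() {
	fmt.Println(demo())
}
```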
The recipe-evidence push path read SIGSTORE_ID_TOKEN directly via
os.Getenv, ignoring the OIDC source-precedence chain pkg/bundler/attestation
already implements: pre-fetched COSIGN_IDENTITY_TOKEN > ambient GitHub
Actions ACTIONS_ID_TOKEN_* > device-code flow (RFC 8628) > interactive
browser. That meant 'aicr validate --push' worked only when an operator
had manually exported a token — every other production path 'aicr bundle
--attest' supports was unavailable.
Changes:
- Extract ResolveOIDCToken(ctx, opts) in pkg/bundler/attestation/resolver.go.
Walks the same four-source precedence chain ResolveAttester does, but
returns the raw token instead of constructing the bundler's Attester.
ResolveAttester is now a thin wrapper.
- Add --identity-token (bound to COSIGN_IDENTITY_TOKEN env) and
--oidc-device-flow (bound to AICR_OIDC_DEVICE_FLOW env) flags to
aicr validate, mirroring the bundle command's surface byte-for-byte
so operators do not have to learn two different OIDC flag sets.
- Resolve the OIDC token in the Action body (only when --push is set)
via bundleattest.ResolveOIDCToken, and carry the resulting token
on recipeEvidenceConfig. signAndPushBundle uses it directly. Up-front
resolution means an interactive browser or device-code prompt fires
before the long-running validation begins instead of after.
- Drop the SigstoreIDTokenEnv constant I invented in pkg/evidence/attestation —
it was a non-standard name (Sigstore docs and cosign use
COSIGN_IDENTITY_TOKEN) and is no longer referenced after the
handoff to the bundler primitive.
- Update docs/user/cli-reference.md: flag table gains the two new
flags, the worked example drops 'export SIGSTORE_ID_TOKEN=...' in
favor of describing the precedence chain, and the config-file mode
section refers to --identity-token for token-acquisition specifics.
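A rough sketch of the four-source precedence chain this commit describes. The resolveInputs type and source function are hypothetical stand-ins; the real ResolveOIDCToken fetches a token rather than returning a label, and its option fields are not shown in this message:

```go
package main

import "fmt"

// resolveInputs is a hypothetical stand-in for the resolver's options.
type resolveInputs struct {
	IdentityToken string // --identity-token / COSIGN_IDENTITY_TOKEN
	AmbientURL    string // ACTIONS_ID_TOKEN_REQUEST_URL
	AmbientToken  string // ACTIONS_ID_TOKEN_REQUEST_TOKEN
	DeviceFlow    bool   // --oidc-device-flow / AICR_OIDC_DEVICE_FLOW
}

// source names which branch of the precedence chain would be taken:
// pre-fetched token > ambient GitHub Actions > device code > browser.
func source(in resolveInputs) string {
	switch {
	case in.IdentityToken != "":
		return "identity-token"
	case in.AmbientURL != "" && in.AmbientToken != "":
		return "github-actions"
	case in.DeviceFlow:
		return "device-code"
	default:
		return "browser"
	}
}

func main() {
	fmt.Println(source(resolveInputs{IdentityToken: "tok", DeviceFlow: true}))
	fmt.Println(source(resolveInputs{AmbientURL: "https://example.invalid", AmbientToken: "t"}))
	fmt.Println(source(resolveInputs{DeviceFlow: true}))
	fmt.Println(source(resolveInputs{}))
}
```

Treating a half-populated ambient pair as "not ambient" is a choice made for this sketch only; the source does not say how the real resolver handles that case.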
… auto-gen BOM
Four cleanups in response to PR feedback:
1. Make --bom optional; auto-generate a recipe-bound CycloneDX BOM
when --bom is unset. The synthesizer enumerates:
- Each enabled rec.ComponentRefs entry (helm chart / kustomize
source metadata: repository, chart, version, namespace).
- A synthetic 'validators' component carrying every container
image in the validator catalog that this aicr build ships.
The auto-generated BOM does not render individual container images
inside helm charts — doing so would require the helm binary in the
validate hot path, which is too heavy. Operators who need an
exhaustive BOM continue to pass 'make bom' output via --bom; the
path-supplied BOM always wins. Implementation uses pkg/bom's
existing BuildBOM + CycloneDX encoder, so the auto-generated BOM
and 'make bom' output share the same root-and-tree shape.
2. Drop docs/spec/recipe-evidence-v1.md. The spec was almost entirely
a slim restatement of ADR-007's 'V1 surface', 'Bundle anatomy',
'Predicate body', 'Pointer schema', and 'Verifier steps' sections.
For a brand-new V1 with no external consumers, the duplication
adds churn without adding signal. References in code comments and
docs now point at docs/design/007-recipe-evidence.md (the ADR is
the single source of truth until the format ships).
3. Extract recipe-evidence wiring from pkg/cli/validate.go into a
sibling pkg/cli/validate_evidence.go: recipeEvidenceConfig,
signPushOutcome, buildRecipeEvidenceConfig, emitRecipeEvidence,
signAndPushBundle, pushArtifact, buildPointerInputs,
loadOrGenerateBOM, buildAutoBOM, recipeBOMName. validate.go drops
from ~1100 LoC to ~800 and stays focused on the core validation
command surface (snapshot capture, phase parsing, agent config,
Action body); validate_evidence.go owns the optional
--emit-attestation pipeline.
4. Revert the site/.vitepress/config.ts and docs/index.yml additions
from the previous commit — both pointed at the now-deleted spec
doc and shouldn't have been touched by this branch in the first
place.
Both flags were misleading no-ops in the current shape:
- BuildOptions.PhaseLogs has a slot but no production caller ever
populates it; the only writers are builder_test.go fixtures.
- validator.PhaseResult carries {Phase, Status, Report, Duration}
only — no log file paths, no captured bytes.
- The validator engine runs containerized Jobs but does not preserve
pod logs to a path the CLI can pick up. pkg/k8s/pod.StreamLogs
exists but is not invoked during the per-job watch.
So today --include-logs produced an empty logs-bundle/ directory and
pre-committed zero hashes in the manifest, and --push-logs would push
an empty OCI artifact. Better to remove the flags than ship a no-op
that auditors will assume contains logs.
Surface changes:
- pkg/cli/validate.go: drop --include-logs and --push-logs flag
definitions.
- pkg/cli/validate_evidence.go: drop recipeEvidenceConfig.IncludeLogs/
PushLogs, the cfg.PushLogs precondition, the IncludeLogs argument
to attestation.Build (defaults to false), the logs-push goroutine
in signAndPushBundle (collapses to a single sequential push and
removes the now-unused errgroup import), and signPushOutcome.Logs.
- pkg/config/config.go: drop EvidenceAttestationSpec.IncludeLogs and
PushLogs.
- pkg/config/resolve.go: drop the matching fields on
EvidenceAttestationResolved + the Resolve() assignments.
- docs/user/cli-reference.md: remove the two flag rows and trim the
config-file example.
Library surface preserved (pkg/evidence/attestation.BuildOptions.PhaseLogs,
Bundle.LogsDir, Pointer.LogsBundle, PointerLogsBundle, LogsBundleDirName,
writeLogsBundle, the builder's IncludeLogs branching) so the future
log-capture work in pkg/validator can plug in without an API change.
ADR-007 still describes --include-logs / --push-logs as part of the
V1 surface; that's intentional — when log capture lands in pkg/validator,
the flags come back exactly as designed.
…in auto BOM
Two coupled cleanups:
* Export attestation.RecipeNameFor and reuse it from buildAutoBOM in
  pkg/cli so the BOM root component shares the same criteria-derived
  identifier the predicate uses (h100-eks-ubuntu-training etc.). The
  local recipeBOMName helper is gone; "aicr-recipe" stays as the
  fallback when the recipe has no resolvable name.
* The validator catalog lists one entry per validator-check, and most
  checks in a phase share a container image. The auto BOM was therefore
  listing the same image dozens of times under the "validators"
  dependency. Dedupe by image string before adding the component so
  consumers see each validator image exactly once.
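The dedupe step can be sketched as an order-preserving filter over image strings. dedupImages is a hypothetical name, and the sample image refs are made up for illustration:

```go
package main

import "fmt"

// dedupImages returns images with duplicates removed, keeping the
// first occurrence of each string so output order stays stable.
func dedupImages(images []string) []string {
	seen := make(map[string]struct{}, len(images))
	out := make([]string, 0, len(images))
	for _, img := range images {
		if _, ok := seen[img]; ok {
			continue
		}
		seen[img] = struct{}{}
		out = append(out, img)
	}
	return out
}

func main() {
	// Several checks in a phase typically share one validator image.
	catalog := []string{
		"registry.example/validator-deploy:v1",
		"registry.example/validator-deploy:v1",
		"registry.example/validator-perf:v1",
		"registry.example/validator-deploy:v1",
	}
	fmt.Println(len(dedupImages(catalog)))
}
```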
When --bom is unset, the BOM auto-generator now folds the snapshot's
K8s.image.* measurements into a synthetic 'observed-images' component
alongside the existing chart-refs and validator catalog entries.
For the typical post-deployment validate flow, this captures the
real container image:tag set running on the cluster — the practical
equivalent of rendered helm manifests, drawn from authoritative
observed state rather than speculation about what a chart would emit.
recipe-name
├── /gpu-operator (chart metadata: repo, chart, version)
├── /kubeflow-trainer
├── /validators (catalog images, full refs)
└── /observed-images (NEW: snapshot K8s.image.* readings)
├── img:gpu-operator:v25.10.1
├── img:k8s-driver-manager:v0.7.0
├── img:dcgm-exporter:3.3.10-3.6.1-ubuntu22.04
└── ...
Implementation notes:
- pkg/collector/k8s/image.go is unchanged. The constraint-evaluation
collector strips registries for measurement-key stability across
registry mirrors; the BOM consumer accepts that lossy form rather
than bending the upstream collector to serve a downstream audit
purpose. A more authoritative full-ref BOM still requires
'make bom' output via --bom.
- When snap is nil or carries no K8s.image.* readings (--no-cluster
runs, pre-deployment snapshots), the observed-images component is
omitted entirely — no empty entry pollutes the BOM.
- Reuses pkg/bom.BuildBOM's existing component+image+dependency
shape so the auto-generated BOM and 'make bom' output remain
structurally compatible. No new BOM schema.
Audit findings #2, #3, #4 against generated evidence bundles:
predicate.validatorImages was always null,
predicate.validatorCatalogVersion was always empty, and the pointer's
signer block emitted zero-valued identity/issuer + rekorLogIndex:0 for
unsigned bundles, making a real Rekor log index 0 indistinguishable
from 'no Rekor entry' and an unsigned bundle indistinguishable from one
whose signer fields failed to extract.
Changes:
- emitRecipeEvidence loads the validator catalog once and feeds it to
  both the BOM (chart refs + validator images) and the predicate's
  ValidatorCatalogVersion + ValidatorImages fields. Shared
  dedupValidatorImages helper collapses the catalog's
  one-entry-per-check duplication (deployment/perf/conformance all
  share images).
- PointerAttestation.Signer becomes *PointerSigner, omitempty. Unsigned
  bundles emit the attestation entry with no signer block at all;
  signed bundles always carry non-nil Signer with populated Identity
  and Issuer.
- PointerSigner.RekorLogIndex becomes *int64, omitempty. nil = no Rekor
  entry created (--no-rekor signing path); non-nil = real Rekor log
  index, even when zero (Rekor's first-ever entry occupies index 0 and
  that's a legitimate position a consumer should be able to verify).
- signAndPushBundle's no-push path returns signPushOutcome{} rather
  than signPushOutcome{Sign: &SignResult{}}, matching its existing
  'All fields are nil when --push is absent' comment and letting
  buildPointerInputs do the right nil-check.
Test coverage:
- pkg/evidence/attestation/pointer_test.go: unsigned bundles MUST omit
  the signer YAML block; signed-without-rekor bundles MUST omit
  rekorLogIndex. Both assertions check rendered YAML, not just the
  struct.
- pkg/cli/validate_evidence_test.go (new): catalogVersion,
  dedupValidatorImages, validatorImagesForPredicate helpers; and
  buildPointerInputs covers the three outcome shapes (unsigned,
  signed-with-rekor, signed-without-rekor).
Notes:
- ValidatorImages.Digest remains blank in this PR. The catalog records
  image refs by tag, not by digest; resolving each ref to a digest
  needs a registry round-trip per image, which validate's hot path
  avoids. Audit item #10 (BOM image digests) tracks the longer-term
  resolver path; the same resolver will populate this field when it
  lands.
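To illustrate why the signer and rekorLogIndex fields moved to pointers, here is a small sketch using the standard library's encoding/json. The pointer file itself is YAML and the struct names below are simplified stand-ins, so treat this as an illustration of the omitempty rule for pointers rather than the package's actual types:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// signer is a simplified stand-in for PointerSigner.
type signer struct {
	Identity      string `json:"identity"`
	Issuer        string `json:"issuer"`
	RekorLogIndex *int64 `json:"rekorLogIndex,omitempty"`
}

// attestation is a simplified stand-in for PointerAttestation.
type attestation struct {
	Bundle string  `json:"bundle"`
	Signer *signer `json:"signer,omitempty"`
}

// render marshals a and returns the JSON text.
func render(a attestation) string {
	b, err := json.Marshal(a)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	zero := int64(0)
	who := signer{Identity: "i", Issuer: "s"}
	withIndex := who
	withIndex.RekorLogIndex = &zero

	// Unsigned: no signer block at all.
	fmt.Println(render(attestation{Bundle: "b"}))
	// Signed without Rekor: signer present, rekorLogIndex omitted.
	fmt.Println(render(attestation{Bundle: "b", Signer: &who}))
	// Signed with Rekor index 0: the zero is emitted, not dropped.
	fmt.Println(render(attestation{Bundle: "b", Signer: &withIndex}))
}
```

With a plain int64 and omitempty, the second and third cases would render identically, which is the ambiguity the audit finding describes.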
Audit findings #5 and #6: the pointer file was duplicating
fingerprint, criteriaMatch, phaseSummary, and attestedAt, making it
nearly a copy of the predicate, with no Go consumer reading the
denormalized fields. Two sources of truth with no good answer for
which to trust on mismatch.
A pointer's job is to *locate* the signed bundle, not to summarize it.
A reviewer who wants fingerprint dimensions or per-phase pass/fail
counts fetches the bundle from PointerBundle.OCI and reads
predicate.json; that's the authoritative copy and the only one.
Before              After
──────              ─────
schemaVersion       schemaVersion
recipe              recipe
attestations[]:     attestations[]:
  bundle              bundle
  signer              signer (omitempty)
  attestedAt          attestedAt
  fingerprint         ──── DROPPED
  criteriaMatch       ──── DROPPED
  phaseSummary        ──── DROPPED (also resolves audit #6: pointer
                           phaseSummary block omitted .skipped
                           despite predicate carrying it)
  logsBundle          ──── DROPPED (feature deferred in 0277fec; will
                           return as omitempty when log capture lands)
Wire changes:
- PointerAttestation: 7 fields → 3.
- PointerFingerprint, PointerCriteriaMatch, PointerPhaseStat,
  PointerLogsBundle types deleted along with their build helpers
  (pointerFingerprintFrom, pointerPhaseSummaryFrom).
- PointerInputs.LogsBundle dropped; no caller ever populated it.
- Schema version stays at 1.0.0; no committed pointer files exist yet
  and the design doc's '2.0.0 reserved for multi-instance' plan is
  preserved.
Test coverage:
- TestBuildPointer_OmitsDenormalizedFields: asserts the rendered YAML
  has no fingerprint:, criteriaMatch:, phaseSummary:, or logsBundle:
  keys. Catches regressions where a future commit adds a field back
  without updating the design rationale.
- Existing fingerprint round-trip and logsBundle round-trip tests
  deleted (they asserted denormalization that no longer happens).
Design doc (docs/design/007-recipe-evidence.md):
- Pointer schema example shrunk to match.
- Added 'The pointer is a locator, not a denormalized cache' paragraph
  documenting the rationale.
- Verifier step 7 updated to read criteriaMatch from the predicate
  rather than the pointer.
- Verifier step 11 (logs bundle verification) marked deferred pending
  log capture; spelled out that V1 pointers omit the field entirely so
  verifiers don't error on absence.
The deferred-logs-bundle note in the verifier flow pointed at a commit
SHA. Commits get rewritten on rebase/squash/sign before merge, so the
reference would dangle after merge. Replaced it with a one-line
description of what was deferred and why.
Repro: 'aicr validate --emit-attestation <dir> --push <oci-ref>'
succeeds and pushes a valid OCI artifact, but leaves a stray file
named 'AICR Recipe Evidence' (with spaces) inside
<dir>/summary-bundle/. The pushed artifact is fine; only the
on-disk copy gets polluted.
Cause: preparePushDir had a shortcut returning sourceDir unchanged
when subDir was empty, so the oras file store ended up rooted in
the caller's directory. The OCI title annotation
('org.opencontainers.image.title') has been observed materializing
as a literal filename inside the file store's root, and that
filename then survives on disk.
Fix: drop the shortcut. Always create a temp directory and
hardlink the source contents into it, regardless of whether subDir
is set. Hardlinks keep this near-free (no extra disk space, no
copy time on the same filesystem) and keep the source directory
strictly read-only from the push path's perspective.
The existing subDir code path is now the common path; subDir == ''
just hardlinks the whole source tree to tempDir instead of joining
the subdir name onto src/dst.
Test: TestPreparePushDir/no_subdir_hardlinks_to_temp_dir,_leaves_source_untouched
asserts the invariant the bug violated — a snapshot of the source
directory's file set is identical before and after the call, and
the returned path is never equal to sourceDir.
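A minimal sketch of the always-stage-into-a-temp-dir fix, using only the standard library. stageForPush is a hypothetical name, not the real preparePushDir, and the demo keeps source and temp dir on the same filesystem because hardlinks cannot cross devices:

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// stageForPush hardlinks every file under src into a fresh temp dir
// and returns that dir, so the push path never writes into src.
func stageForPush(src string) (string, error) {
	dst, err := os.MkdirTemp("", "push-*")
	if err != nil {
		return "", err
	}
	err = filepath.WalkDir(src, func(path string, d fs.DirEntry, walkErr error) error {
		if walkErr != nil {
			return walkErr
		}
		rel, err := filepath.Rel(src, path)
		if err != nil {
			return err
		}
		target := filepath.Join(dst, rel)
		if d.IsDir() {
			return os.MkdirAll(target, 0o755)
		}
		return os.Link(path, target) // fails across filesystems
	})
	if err != nil {
		os.RemoveAll(dst)
		return "", err
	}
	return dst, nil
}

// demo reports whether the staged dir is the source dir and whether a
// stray file written during "push" leaked into the source.
func demo() (bool, bool, error) {
	src, err := os.MkdirTemp("", "bundle-*")
	if err != nil {
		return false, false, err
	}
	defer os.RemoveAll(src)
	if err := os.WriteFile(filepath.Join(src, "manifest.json"), []byte("{}"), 0o644); err != nil {
		return false, false, err
	}
	staged, err := stageForPush(src)
	if err != nil {
		return false, false, err
	}
	defer os.RemoveAll(staged)
	// Simulate the push tooling dropping a stray file in its root.
	if err := os.WriteFile(filepath.Join(staged, "AICR Recipe Evidence"), nil, 0o644); err != nil {
		return false, false, err
	}
	entries, err := os.ReadDir(src)
	if err != nil {
		return false, false, err
	}
	return staged == src, len(entries) != 1, nil
}

func main() {
	reused, polluted, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("reused source dir:", reused, "source polluted:", polluted)
}
```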
Repro: 'aicr validate --recipe <real-recipe> --push <oci-ref>' against
a real cluster. After ~3 min of phases, sign fails with
'Fulcio returned 400: There was an error processing the identity
token'. Same flow with --no-cluster (sub-10s gap) signs+pushes cleanly.
Cause: pkg/cli/validate.go resolved the Sigstore identity token as
part of command setup, before runValidation started. Fulcio binds
the token to a fresh nonce at issue time; by the time
attestation.SignBundle ran several minutes later, the nonce-bound
token was stale.
Fix: drop the early ResolveOIDCToken call from validate.go. In its
place, recipeEvidenceConfig now carries OIDCResolve, the resolve-time
*inputs* (--identity-token flag, ACTIONS_ID_TOKEN_REQUEST_URL/TOKEN
env captures, --oidc-device-flow flag). signAndPushBundle calls
ResolveOIDCToken right before SignBundle so the token is always
freshly nonced when Fulcio sees it.
UX softening: an info-level log line lands right before the
resolution call ('resolving OIDC token for cosign keyless signing
(may prompt)'), so an operator running an interactive flow isn't
surprised by a browser prompt 3 min into a validate run.
ResolveOptions precedence is unchanged: --identity-token > ambient
GitHub Actions > --oidc-device-flow > interactive browser. CI
flows (GitHub Actions ambient, COSIGN_IDENTITY_TOKEN env) keep
working because the env capture happens in buildRecipeEvidenceConfig
at command-setup time and the actual fetch reads the captured
values at sign time — same source, fresher fetch.
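The timing argument can be illustrated with a toy model. It treats staleness as a simple validity window, which is an assumption made purely for illustration and not a description of how Fulcio validates tokens; the 60s lifetime, mint, and usableAt are all hypothetical:

```go
package main

import (
	"fmt"
	"time"
)

// token is a hypothetical identity token with a short validity window.
type token struct{ expires time.Time }

// mint simulates fetching a token at instant now, valid for ttl.
func mint(now time.Time, ttl time.Duration) token {
	return token{expires: now.Add(ttl)}
}

// usableAt reports whether the token is still valid when signing runs.
func (t token) usableAt(signAt time.Time) bool {
	return signAt.Before(t.expires)
}

// demo compares resolving at command setup with resolving right
// before signing, given a validation run that takes three minutes.
func demo() (early, late bool) {
	start := time.Date(2025, 1, 1, 0, 0, 0, 0, time.UTC)
	validation := 3 * time.Minute // phases run before signing
	ttl := 60 * time.Second       // assumed lifetime, illustration only
	signAt := start.Add(validation)

	early = mint(start, ttl).usableAt(signAt) // resolved up front
	late = mint(signAt, ttl).usableAt(signAt) // resolved adjacent to sign
	return early, late
}

func main() {
	early, late := demo()
	fmt.Println("early-resolved token usable:", early)
	fmt.Println("late-resolved token usable:", late)
}
```

The same shape explains why the --no-cluster run worked: with a sub-10s gap, even the early-resolved token would still be inside any plausible window.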
The 'may prompt' log line was always-Info regardless of which
resolution path was taken. For CI/programmatic flows
(--identity-token / COSIGN_IDENTITY_TOKEN, GitHub Actions ambient
OIDC) the message was misleading and added noise to build logs.
Switch to a four-way conditional matching ResolveOIDCToken's own
precedence chain:
- identity-token / COSIGN_IDENTITY_TOKEN -> Debug
- ambient GitHub Actions OIDC -> Debug
- device-code flow -> Info (with code-entry note)
- browser flow -> Info (with browser note)
Programmatic paths now stay quiet at Info; interactive paths print a
specific, actionable heads-up so the operator knows what to expect when
the prompt fires several minutes into a validate run.
Today 'aicr validate --emit-attestation --push' packs the Sigstore
Bundle (attestation.intoto.jsonl) as a regular file *inside* the OCI
artifact, and the signed Statement's subject is the recipe digest.
cosign can't see signatures inside artifacts — it discovers them via
the OCI Distribution 1.1 Referrers API at /v2/<name>/referrers/<digest>
— so 'cosign verify-attestation <ref>' fails with 'no signatures
found' even after a successful push.
This reorders the push pipeline to pack-then-sign-then-attach, with
the pushed artifact's digest as the signed subject, and attaches the
Sigstore Bundle as an OCI Referrer of that digest. The recipe
identity is preserved in the predicate body so the chain back to
recipe content is still verifiable from the signed payload alone.
New sequence (cli/validate_evidence.go signAndPushBundle):
1. Push the bundle dir as an OCI artifact -> {digest, mediaType, size}.
2. Build an artifact-subject Statement
(subject.digest = artifactDigest, subject.name = oci ref).
3. Resolve OIDC token adjacent to sign (carried over from prior
fix; Fulcio binds the token to a fresh nonce at issue).
4. Sign the artifact-subject Statement -> Sigstore Bundle JSON.
5. Write the signed bytes into summary-bundle/ for local inspection
and attach the Sigstore Bundle as an OCI Referrer of the main
artifact (subject.digest = artifactDigest).
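Step 2 of the sequence above can be sketched as follows. This is an illustrative reconstruction, not the real attestation.BuildArtifactStatement: the function name, struct types, and validation rules are assumptions, while the predicate type URL is the one used in the acceptance command in this commit and the _type value is the in-toto Statement v1 identifier:

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// subject and statement follow the in-toto Statement v1 layout.
type subject struct {
	Name   string            `json:"name"`
	Digest map[string]string `json:"digest"`
}

type statement struct {
	Type          string          `json:"_type"`
	Subject       []subject       `json:"subject"`
	PredicateType string          `json:"predicateType"`
	Predicate     json.RawMessage `json:"predicate"`
}

// exampleDigest is a placeholder artifact digest for the demo.
var exampleDigest = "sha256:" + strings.Repeat("a", 64)

// buildArtifactStatement (hypothetical) binds the signed subject to
// the pushed artifact digest; recipe identity travels in the predicate.
func buildArtifactStatement(ociRef, artifactDigest string, predicate []byte) ([]byte, error) {
	hex, ok := strings.CutPrefix(artifactDigest, "sha256:")
	if !ok || len(hex) != 64 {
		return nil, errors.New("artifact digest must be sha256:<64 hex chars>")
	}
	if ociRef == "" || !json.Valid(predicate) {
		return nil, errors.New("oci ref and a valid JSON predicate are required")
	}
	return json.Marshal(statement{
		Type:          "https://in-toto.io/Statement/v1",
		Subject:       []subject{{Name: ociRef, Digest: map[string]string{"sha256": hex}}},
		PredicateType: "https://aicr.nvidia.com/recipe-evidence/v1",
		Predicate:     predicate,
	})
}

func main() {
	pred := []byte(`{"recipe":{"name":"example","digest":"sha256:..."}}`)
	out, err := buildArtifactStatement("registry.example/evidence:v1", exampleDigest, pred)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(string(out))
}
```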
Predicate schema (unchanged version 1.0.0; nothing has shipped):
- New additive field 'recipe: {name, digest}' on Predicate so
pushed bundles can identify the recipe even though the signed
in-toto subject now points at the artifact, not the recipe.
- Build() populates RecipeRef from the existing canonicalized
recipe digest + recipe name.
API additions:
- attestation.BuildArtifactStatement(ociRef, artifactDigest, pred)
-> []byte. The push-path equivalent of BuildStatement. Requires
pred.Recipe.{Name,Digest} populated; rejects short/empty inputs.
- attestation.AttachSigstoreBundleAsReferrer(ctx, opts) -> *PushResult.
Wraps the new generic oci.PushReferrer with the Sigstore Bundle
media type. Builds a subject descriptor from {Digest, MediaType,
Size} and pushes a single-layer manifest with that Subject set.
- attestation.WriteSignedAttestation(b, bytes) — extracted from
SignBundle so the push path can write the local copy after signing
a different (artifact-subject) statement.
- attestation.SigstoreBundleMediaType constant
(application/vnd.dev.sigstore.bundle.v0.3+json).
- oci.PushReferrer / oci.ReferrerOptions — generic primitive for
pushing a single-layer manifest with a Subject pointing at an
existing artifact. packReferrer is exposed as an internal helper
that handles the file store + manifest pack so tests can inspect
the manifest body without a real registry.
- oci.PackageAndPushResult / oci.PushResult / oci.PackageResult
now propagate MediaType + Size so callers can build a subject
descriptor without re-fetching the manifest.
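As a rough picture of what a referrer manifest looks like, the sketch below builds a single-layer OCI image manifest whose subject field points at an existing artifact. It hand-rolls the JSON with the standard library instead of using oras, and packReferrerSketch, describe, and the choice of artifactType are assumptions for illustration rather than the behavior of oci.PushReferrer:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"errors"
	"fmt"
)

const sigstoreBundleMediaType = "application/vnd.dev.sigstore.bundle.v0.3+json"

// descriptor is the OCI content descriptor triple.
type descriptor struct {
	MediaType string `json:"mediaType"`
	Digest    string `json:"digest"`
	Size      int64  `json:"size"`
}

// manifest is a trimmed OCI image manifest with the 1.1 subject field.
type manifest struct {
	SchemaVersion int          `json:"schemaVersion"`
	MediaType     string       `json:"mediaType"`
	ArtifactType  string       `json:"artifactType"`
	Config        descriptor   `json:"config"`
	Layers        []descriptor `json:"layers"`
	Subject       *descriptor  `json:"subject,omitempty"`
}

// describe computes a descriptor for an in-memory blob.
func describe(mediaType string, blob []byte) descriptor {
	sum := sha256.Sum256(blob)
	return descriptor{
		MediaType: mediaType,
		Digest:    "sha256:" + hex.EncodeToString(sum[:]),
		Size:      int64(len(blob)),
	}
}

// packReferrerSketch (hypothetical) wraps bundle as the only layer and
// sets Subject so registries can list it as a referrer of subject.
func packReferrerSketch(subject descriptor, bundle []byte) (manifest, error) {
	if subject.Digest == "" || subject.MediaType == "" || subject.Size <= 0 {
		return manifest{}, errors.New("subject digest, mediaType, and size are required")
	}
	if len(bundle) == 0 {
		return manifest{}, errors.New("bundle must not be empty")
	}
	return manifest{
		SchemaVersion: 2,
		MediaType:     "application/vnd.oci.image.manifest.v1+json",
		ArtifactType:  sigstoreBundleMediaType,
		Config:        describe("application/vnd.oci.empty.v1+json", []byte("{}")),
		Layers:        []descriptor{describe(sigstoreBundleMediaType, bundle)},
		Subject:       &subject,
	}, nil
}

func main() {
	subj := describe("application/vnd.oci.image.manifest.v1+json", []byte("main artifact manifest"))
	m, err := packReferrerSketch(subj, []byte(`{"placeholder":"sigstore bundle"}`))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	out, _ := json.MarshalIndent(m, "", "  ")
	fmt.Println(string(out))
}
```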
Local --emit-attestation <dir> (no --push) path is unchanged: the
unsigned recipe-subject Statement still lands in summary-bundle/
as statement.intoto.json. The push path additionally writes the
signed Sigstore Bundle into the same dir for inspection — the
pushed OCI artifact does NOT carry that file (it was signed after
push), but the local copy lets operators verify the signature
shape without round-tripping to the registry.
Tests:
- TestBuildArtifactStatement_SubjectIsArtifactDigest: the signed
Statement's subject.digest is the artifact digest, NOT the recipe
digest; predicate.recipe.{name,digest} carry the recipe identity.
- TestBuildArtifactStatement_RejectsBadInputs: validation for
every required input.
- TestPackReferrer_ManifestHasSubject: the cosign-discovery
regression guard. Inspects the packed manifest bytes and asserts
Subject is set with the correct digest/mediaType/size. If
packReferrer ever stops setting Subject, this fails loudly.
- TestPackReferrer_RejectsMissingFields: per-input validation.
Skipped: full e2e against cosign + a real registry:2 + ambient OIDC.
The Referrer mechanism is the bug-relevant surface and is unit-tested
above; the cosign exec + Sigstore root-of-trust path is exercised
by upstream cosign's own tests.
Acceptance:
cosign verify-attestation <ref> \
--type 'https://aicr.nvidia.com/recipe-evidence/v1' \
--certificate-identity <signer> --certificate-oidc-issuer <issuer>
now succeeds against a real registry (manual verification expected
on first --push after this commit lands).
…ry OIDC comment
- `--push` flag usage said keyless signing requires SIGSTORE_ID_TOKEN.
  The resolver chain has been --identity-token / COSIGN_IDENTITY_TOKEN
  / ambient GitHub Actions / device flow / interactive browser since
  the bundler primitive was reused for validate. Document the actual
  chain.
- emitRecipeEvidence's behavior-matrix docstring carried the same stale
  env-var name. Reword to point at bundleattest.ResolveOIDCToken.
- Drop the 9-line block explaining why the OIDC token is no longer
  resolved up front. The rationale belongs in the original fix's commit
  message, not the call site; what the call site needs to show is just
  the absence of the resolve call.
…on surface
Comment-shrinking pass across pkg/evidence/attestation and its callers.
The behavior is unchanged; net 421 deletions vs 154 insertions are
mostly multi-paragraph field documentation collapsed to the load-bearing
sentence (the why, not the what). Field names already say what each
field is for.
Touched: pkg/evidence/attestation/{builder,canonicalize,doc,manifest,
oci,pointer,predicate,signer,types}.go, pkg/cli/{validate,
validate_evidence}.go, pkg/config/{config,resolve}.go, pkg/defaults/
timeouts.go, pkg/bundler/attestation/resolver.go, pkg/oci/push.go,
plus aligned test/companion files.
Five small simplifications that fall out of the prior doc-trim:
- pkg/evidence/attestation.SignResult is now a type alias of
  bundleattest.SignedAttestation. The KeylessSigner.Sign method no
  longer needs to copy fields one-by-one between the two struct types.
- pkg/cli.recipeEvidenceConfig.OIDCResolve uses
  bundleattest.ResolveOptions directly instead of mirroring its fields
  in a local OIDCResolveInputs type. PromptWriter is set just before
  the resolver call.
- pkg/evidence/attestation.CleanOCIRef was a one-line wrapper around
  oci.TrimScheme. Drop it; callers use oci.TrimScheme directly.
- The recipeName == "" fallback in buildAutoBOM is redundant since
  attestation.RecipeNameFor already returns "recipe" for empty input
  via the defaultRecipeName const.
- pkg/oci.packReferrer returned a manifestDesc the caller immediately
  dropped (the tag and digest are equivalent post-copy). Stop returning
  the unused value.
Behavior unchanged. Tests + lint pass.
Brings pkg/cli/validate_evidence_test.go from the small predicate-input
coverage it shipped with up to the orchestration surface that the
--emit-attestation flag family actually runs through. Coverage on
pkg/cli rises from 58.9% to 64.1% and the project total clears the 75%
gate.
New cases:
* buildRecipeEvidenceConfig — table-driven over the four
flag/config permutations (none, flag-only, config-only,
flag-overrides-config) including the OIDC resolve-inputs.
* signAndPushBundle — the --push="" early-return branch.
* observedImagesFromSnapshot — nil/empty, non-K8s rejection,
non-image subtype rejection, happy path, dedup.
* loadOrGenerateBOM — explicit-path read, missing-path error,
auto-generation fallback when no path is supplied.
* buildAutoBOM — recipe + validator-catalog input produces a
CycloneDX BOM with gpu-operator + validators components and
omits disabled components.
* emitRecipeEvidence — happy-path (no push) end-to-end: writes
pointer.yaml plus the full summary-bundle file set; rejects a
malformed --push reference up front.
The --push-set branch of signAndPushBundle and the body of
pushArtifact remain uncovered: both require a live OCI registry plus
Sigstore Fulcio/Rekor and are exercised by the manual smoke tests
documented in the PR description.
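The four flag/config permutations listed for buildRecipeEvidenceConfig can be pictured with a minimal table-driven sketch. The resolveSetting helper, the sample paths, and the "empty string means unset" rule are assumptions made for illustration; they are not the PR's actual types or test code.

```go
package main

import "fmt"

// resolveSetting applies flag-over-config precedence for one string setting.
// Treating the empty string as "not set" is an assumption of this sketch.
func resolveSetting(flagVal, configVal string) string {
	if flagVal != "" {
		return flagVal
	}
	return configVal
}

// checkPermutations walks the four permutations as a table and reports
// the first case whose resolved value differs from the expectation.
func checkPermutations() error {
	cases := []struct {
		name, flagVal, configVal, want string
	}{
		{"none", "", "", ""},
		{"flag-only", "./out-flag", "", "./out-flag"},
		{"config-only", "", "./out-config", "./out-config"},
		{"flag-overrides-config", "./out-flag", "./out-config", "./out-flag"},
	}
	for _, tc := range cases {
		if got := resolveSetting(tc.flagVal, tc.configVal); got != tc.want {
			return fmt.Errorf("%s: got %q, want %q", tc.name, got, tc.want)
		}
	}
	return nil
}

func main() {
	if err := checkPermutations(); err != nil {
		fmt.Println("FAIL:", err)
		return
	}
	fmt.Println("all permutations resolve as expected")
}
```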
Whoa, that escalated quickly... 23 commits, 4618 additions, 28 files ;)
The K8s collector emits an "image" subtype carrying the unique container images observed across pods (one entry per <name>:<tag>). The recipe-evidence emit path in pkg/cli/validate_evidence.go matches on that subtype name to harvest the same set for the auto-generated BOM, and was inlining the literal "image" string, which is fragile and trivially out of sync if the collector ever renamed the subtype. Export the literal as pkg/collector/k8s.SubtypeImage and use it on both ends. The collector test's local constant becomes an alias to keep the call site shape; nothing else changes.
mchmarny
left a comment
Solid end-to-end work — bundle layout, in-toto/Sigstore plumbing, and the GB200/EKS test run all look right. Requesting changes on architectural seams that will be hard to walk back later: parallel attestation packages (pkg/evidence/attestation duplicating pkg/bundler/attestation signer surface), inconsistent OIDC token-resolution timing between aicr validate --push and aicr bundle --attest, and a 465-line orchestration file in pkg/cli carrying business logic that should live in functional packages per CLAUDE.md. Plus dead log-handling scaffolding with an incomplete trust boundary, and a brittle snapshot→BOM coupling without contract tests. Details inline.
Five spec-and-safety nits surfaced in coderabbit review:
* pkg/config: switch the optional bool fields on EvidenceCNCFSpec and EvidenceAttestationSpec (CNCFSubmission, PlainHTTP, InsecureTLS) to *bool. The wire form already used `omitempty` so an absent key and an explicit `false` looked identical at the spec layer; the resolved layer flattens to plain bool via a new boolPtrOrFalse helper. Lets future spec consumers distinguish absent from explicit-off without changing today's behavior (both flatten to false downstream).
* pkg/oci/push.go preparePushDir: reject non-local subDir values (absolute paths, parent-traversal, reserved Windows names) before filepath.Join can stitch them into a hardlink target outside the temp dir or a source outside the caller's sourceDir. Defense in depth: the only in-tree caller passes a constant, but the function is public and the cost of the check is one syscall.
* pkg/evidence/attestation/oci.go: wrap the three bare error returns from Push and AttachSigstoreBundleAsReferrer with PropagateOrWrap. Internal callers already got structured errors via the wrapped oci.PackageAndPush, but the bare Push() return path was leaking whatever pkg/oci handed back unwrapped. Aligns this surface with the rest of pkg/evidence/attestation.
* docs/design/007-recipe-evidence.md: mark --include-logs and --push-logs as deferred (log capture is not implemented in V1) so the design doc and the CLI surface agree. Fix a docs-side mention of a top-level predicate.json file; the predicate body lives nested inside statement.intoto.json.
The recipe-evidence orchestration (bundle build, sign, push, pointer
write, BOM auto-generation, OIDC token resolution) lived in
pkg/cli/validate_evidence.go alongside CLI plumbing. That violated
the project's split between user-interaction packages (pkg/cli,
pkg/api) and business-logic packages: the API server and any future
non-CLI caller couldn't reuse the orchestration without
re-implementing it.
Move the orchestration into pkg/evidence/attestation as a typed
library:
- attestation.Emit(ctx, opts) — full emit→sign→push→pointer pipeline.
- attestation.LoadOrGenerateBOM / BuildAutoBOM — BOM resolution
that was the same shape inside the CLI helper.
- attestation.ObservedImagesFromSnapshot, DedupValidatorImages,
CatalogVersion, ValidatorImagesForPredicate — BOM-input helpers
promoted to package surface.
pkg/cli/validate_evidence.go becomes a thin caller: parse flags,
build EmitOptions, call attestation.Emit. Tests for the moved
helpers move with them (pkg/evidence/attestation/{emit,bom}_test.go);
pkg/cli/validate_evidence_test.go shrinks to flag-plumbing and
pointer-input cases that genuinely belong in the CLI layer.
Side effects:
- pkg/bundler/attestation.ResolveOptions gains a helper for the
log-mode classification the emit pipeline needs; pkg/cli/bundle.go
picks up the same helper to keep the logging contract consistent
between aicr bundle --attest and aicr validate --emit-attestation.
- SignResult is removed as a type alias; callers use
bundleattest.SignedAttestation directly now that the orchestration
isn't in pkg/cli where the alias previously kept imports tidy.
No CLI surface change, no wire-format change. make qualify clean;
coverage gate holds at 75.1%.
…tester locking
Five small but real fixes:
* pkg/oci/push.go: when sourceDir and $TMPDIR are on different filesystems (tmpfs in containers, NFS-mounted workspaces, overlay inside CI runners), os.Link returns EXDEV and the push fails before oras even sees the layer. Add copyDir as a streaming fallback so the push still succeeds; the temp dir still gives the oras file store its own root, so the "annotation-as-filename" leak the always-isolate refactor was meant to prevent stays prevented.
* pkg/defaults + pkg/evidence/attestation/bom.go: cap the operator-supplied --bom file at 8 MiB via io.LimitReader. The previous os.ReadFile would happily allocate whatever a symlinked /proc or a hostile NFS mount handed it. 8 MiB covers the largest observed recipe BOMs with headroom; existing in-tree BOMs are a few hundred KiB.
* pkg/bundler/attestation/resolver.go: serialize LazyKeylessAttester lazy init and the Identity() read with a sync.Mutex. The bundler doesn't invoke Attest concurrently today, but the Attester interface is held across enough call sites that defensive locking is cheaper than the next data-race bug.
* pkg/config/config.go: drop the stale "Evidence intentionally not in ValidateSpec yet" paragraph. EvidenceSpec landed earlier in this branch; the docstring now points the reader at it.
* docs/design/007-recipe-evidence.md: tighten the wording around the evidence flag surface to match the shipped behavior.
No CLI or wire-format change. make qualify clean; coverage 75.1%.
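The 8 MiB cap from the second bullet could look roughly like the sketch below. The names maxBOMBytes, errTooLarge, readCapped, and readCappedString are hypothetical, and reading one byte past the cap so that oversize input is rejected rather than silently truncated is an assumption about the approach, not a description of bom.go.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"strings"
)

// maxBOMBytes mirrors the 8 MiB cap from the commit message (name is hypothetical).
const maxBOMBytes int64 = 8 << 20

var errTooLarge = errors.New("input exceeds size cap")

// readCapped reads at most limit bytes from r. It asks the LimitReader for
// limit+1 bytes so that oversize input is detected instead of being
// truncated without notice.
func readCapped(r io.Reader, limit int64) ([]byte, error) {
	data, err := io.ReadAll(io.LimitReader(r, limit+1))
	if err != nil {
		return nil, fmt.Errorf("read capped input: %w", err)
	}
	if int64(len(data)) > limit {
		return nil, errTooLarge
	}
	return data, nil
}

// readCappedString is a convenience wrapper for in-memory payloads.
func readCappedString(s string, limit int64) ([]byte, error) {
	return readCapped(strings.NewReader(s), limit)
}

func main() {
	data, err := readCappedString(`{"bomFormat":"CycloneDX"}`, maxBOMBytes)
	fmt.Println(len(data), err)

	_, err = readCappedString("abcdef", 4)
	fmt.Println(err)
}
```

Memory use is bounded by limit+1 bytes regardless of what the underlying file or mount returns.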
…d OIDC
Three coupled correctness fixes:
* pkg/oci/push.go: thread context.Context through preparePushDir, hardLinkDir, copyDir, and copyFile. The directory walks ran unbounded; a parent timeout (push timeout, server shutdown signal, user Ctrl-C) couldn't cancel staging mid-flight, so a hostile or pathological source tree could pin the staging temp dir indefinitely. Per-entry context checks let cancellation propagate; copyFile checks on entry only since a single file's copy is bounded by its own size, not by walk duration. Canceled walks return ErrCodeUnavailable so the caller can distinguish "I asked it to stop" from "internal failure."
* pkg/evidence/attestation/emit.go: wrap the interactive OIDC token resolve in defaults.OIDCAuthTimeout. Browser callback and device-code flows wait on the user; without a cap, a stalled browser tab holds the sign context open for the whole bundle-sign budget. Pre-fetched and ambient paths complete well under the bound, so the cap only kicks in on genuinely interactive paths.
* pkg/evidence/attestation/bom.go: BuildAutoBOM now rejects a nil *recipe.RecipeResult up front with an InvalidRequest error instead of panicking inside the ComponentRefs walk. Add a test to lock the contract.
No CLI or wire-format change. make qualify clean.
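The per-entry context check from the first bullet can be sketched with filepath.WalkDir as below. countFiles, countFilesWithin, demoTree, and demoTimeout are hypothetical names; the sketch wraps the context error directly, whereas the commit maps canceled walks to the project's ErrCodeUnavailable, which is not reproduced here.

```go
package main

import (
	"context"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
	"time"
)

// demoTimeout is an arbitrary generous bound for the example.
const demoTimeout = time.Minute

// countFiles walks root and counts non-directory entries. It checks ctx
// once per entry so a canceled or expired context stops the walk early.
func countFiles(ctx context.Context, root string) (int, error) {
	n := 0
	err := filepath.WalkDir(root, func(path string, d fs.DirEntry, walkErr error) error {
		if err := ctx.Err(); err != nil {
			return fmt.Errorf("walk stopped at %s: %w", path, err)
		}
		if walkErr != nil {
			return walkErr
		}
		if !d.IsDir() {
			n++
		}
		return nil
	})
	return n, err
}

// countFilesWithin bounds the whole walk with a timeout.
func countFilesWithin(root string, timeout time.Duration) (int, error) {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	return countFiles(ctx, root)
}

// demoTree builds a small temp tree (two files, one sub-directory) and
// returns its root plus a cleanup function.
func demoTree() (string, func(), error) {
	root, err := os.MkdirTemp("", "walk-demo-")
	if err != nil {
		return "", nil, err
	}
	cleanup := func() { os.RemoveAll(root) }
	if err := os.MkdirAll(filepath.Join(root, "sub"), 0o755); err != nil {
		cleanup()
		return "", nil, err
	}
	for _, name := range []string{"a.txt", filepath.Join("sub", "b.txt")} {
		if err := os.WriteFile(filepath.Join(root, name), []byte("x"), 0o644); err != nil {
			cleanup()
			return "", nil, err
		}
	}
	return root, cleanup, nil
}

func main() {
	root, cleanup, err := demoTree()
	if err != nil {
		fmt.Println("setup failed:", err)
		return
	}
	defer cleanup()

	n, err := countFilesWithin(root, demoTimeout)
	fmt.Println(n, err)

	// A zero timeout yields an already-expired context, so the walk
	// stops at the first entry.
	_, err = countFilesWithin(root, 0)
	fmt.Println(err)
}
```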
…aults.EvidenceBundlePushTimeout
Encode the network-bound contract on the public functions instead of trusting every caller to wrap them in context.WithTimeout. The current emit pipeline already imposes the same cap at the call site, so behavior is unchanged, but any future caller that passes a longer-lived ctx now gets an opinionated upper bound on the registry round-trip rather than open-ended hang potential.
Touches:
- attestation.Push: wrap oci.PackageAndPush in defaults.EvidenceBundlePushTimeout
- attestation.AttachSigstoreBundleAsReferrer: same for oci.PushReferrer
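A minimal sketch of the pattern, assuming a hypothetical pushTimeout constant in place of defaults.EvidenceBundlePushTimeout (whose real value is not given here) and a simulated registry call. Because context.WithTimeout keeps whichever deadline is earlier, a caller with a tighter deadline is unaffected while a caller with none still gets the cap.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// pushTimeout is a hypothetical stand-in for defaults.EvidenceBundlePushTimeout,
// kept short so the example finishes quickly.
const pushTimeout = 50 * time.Millisecond

// push applies its own upper bound regardless of what the caller passed in.
// If the caller's ctx already has an earlier deadline, that one still wins.
func push(ctx context.Context, op func(context.Context) error) error {
	ctx, cancel := context.WithTimeout(ctx, pushTimeout)
	defer cancel()
	return op(ctx)
}

// stalledRegistry simulates a round-trip that never completes on its own.
func stalledRegistry(ctx context.Context) error {
	<-ctx.Done()
	return ctx.Err()
}

// capTriggered reports whether a caller with no deadline is still cut off.
func capTriggered() bool {
	err := push(context.Background(), stalledRegistry)
	return errors.Is(err, context.DeadlineExceeded)
}

func main() {
	fmt.Println("cap triggered:", capTriggered())
}
```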
Summary
Adds Recipe Evidence Bundle v1 emission (ADR-007) to aicr validate: signed, in-toto / Sigstore-bundled, cosign-verifiable attestations of a validate run, optionally pushed to an OCI registry as an artifact + Referrers-attached signature.
Motivation / Context
Recipe-test evidence (ADR-007) is the missing piece between "this recipe was advertised as working on hardware X" and "someone signed off on actually proving it." This PR delivers the emit half: a contributor (or CI) running aicr validate against a real cluster can produce a tamper-evident artifact binding the resolved recipe, the captured snapshot, per-phase CTRF reports, a CycloneDX BOM, and a fingerprint of the cluster, signed by Sigstore keyless OIDC and discoverable on the registry via the OCI Referrers API.
Fixes: #754
Related: #859 (filed during testing — separate bundler deadlock surfaced on clean GB200/EKS clusters, not addressed here)
Type of Change
Component(s) Affected
- … (cmd/aicr, pkg/cli)
- … (pkg/validator)
- … (pkg/bundler, pkg/component/*): only pkg/bundler/attestation, reused as a primitive
- … (docs/, examples/)
- … pkg/evidence/attestation (new), pkg/config, pkg/oci
Implementation Notes
Surface
New flags on aicr validate:
- --emit-attestation <dir>
- --bom <path> (requires --emit-attestation; optional): … make bom).
- --push <oci-ref> (requires --emit-attestation)
- --identity-token, --oidc-device-flow, --plain-http, --insecure-tls (require --push): … aicr bundle --attest knobs. OIDC chain: --identity-token → COSIGN_IDENTITY_TOKEN env → ambient GH Actions OIDC → device flow → interactive browser.
A new spec.validate.evidence block in AICRConfig (pkg/config) wires the same surface through --config.
Bundle layout
When --push is set, the Sigstore Bundle v0.3 is attached as a separate OCI artifact whose manifest references the main artifact's digest via the OCI subject field, so cosign verify-attestation <ref> discovers it via /v2/<name>/referrers/<digest>. The signed Statement's subject[0] is the OCI artifact digest (anchoring cosign verification to the artifact ref); the recipe digest is preserved in the predicate body (predicate.recipe.{name,digest}) for portable provenance even if the artifact is later repackaged.
Notable design choices
- pkg/oci/push.go previously returned sourceDir directly when subDir == ""; that shortcut let the oras file store leak annotation-named manifest blobs back into the caller's source tree (regression test in pkg/oci/push_test.go). Every push, both the new evidence flow and the existing pkg/cli/bundle.go PackageAndPush, now copies through a tempdir via hardlink. Hardlink walk cost is negligible on typical bundles; reviewers touching the bundle path should note the behavior shift.
- … <image>.sig sibling tag) are not produced; Referrers covers all modern registries (ttl.sh, ghcr.io, ECR, Harbor, GHCR).
- No aicr verify-evidence yet: verification is via standard cosign verify-attestation. The dedicated wrapper command (aicr verify-evidence command for reviewers and CI #753) is an intentional out-of-scope follow-up.
- pointer.yaml is local-only metadata, deliberately unsigned. Every claim in it (artifact digest, signer identity, Rekor log index) is independently verifiable from the bundle it references. Contributors commit it to recipes/evidence/<recipe>.yaml to publish a reviewable reference.
- … pkg/bundler/attestation primitive. No new sign/verify code paths; aicr validate --push and aicr bundle --attest share the same precedence chain and the same Fulcio/Rekor URLs.
Out of scope (intentional)
- aicr verify-evidence CLI: aicr verify-evidence command for reviewers and CI #753 follow-up.
- --include-logs / --push-logs: flags were prototyped, then dropped pending validator log-capture rework. The manifest already pre-commits log file digests when logs are produced; the directory emission can be wired later without schema change.
- stdout redaction: CTRF stdout currently carries cluster-specific data (hostnames, pod UUIDs, GPU UUIDs). Worth scrubbing in a follow-up; not blocking for the audit + sign + verify flow.
Testing
End-to-end run on a real GB200/EKS cluster (6 nodes, 2 GB200, EKS 1.34.7, ubuntu 24.04):
Sign + push outcome:
- … sha256:80307d35… packed and pushed before OAuth (avoids token-staleness)
- … sha256:56bb53ab… → main artifact)
- cosign verify-attestation <ref> --type 'https://aicr.nvidia.com/recipe-evidence/v1' --certificate-identity ... --certificate-oidc-issuer ... → Verified OK (cosign claims + Rekor offline + cert chain)
- oras discover <ref> cleanly shows the Referrer relationship
make qualify # passes locally; coverage gate clean
Unit + integration tests cover:
- … registry:2
- NoOpSigner for offline tests
Risk Assessment
… pkg/bundler/attestation primitive (proven by aicr bundle --attest). The default behavior (no --emit-attestation) is completely unchanged. The --emit-attestation (no --push) path is fully offline and safe to ship.
Rollout notes: purely additive. No flag renames, no config-schema breaking changes. Operators not using --emit-attestation see no behavior change. The recipes/evidence/<recipe>.yaml convention is documented but no recipes ship a pointer file in this PR.
Checklist
- … make test with -race)
- … make lint)
- … docs/user/cli-reference.md, docs/design/007-recipe-evidence.md)
- … git commit -S)