feat(tools): add local Talos cluster + snapshot chainsaw test #762
🌿 Preview your docs: https://nvidia-preview-feat-talos-local-test.docs.buildwithfern.com/aicr
Actionable comments posted: 6
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@Makefile`:
- Around line 660-665: The Makefile target talos-dev-env passes TALOS_KUBECONFIG
into the script as KUBECONFIG_OUT which is correct but mismatches the README
variable name and can confuse users; update either the Makefile or the
documentation for consistency: either (A) change the environment var passed to
./tools/talos-test/up.sh to use TALOS_KUBECONFIG as the name consumed by the
script, or (B) add a one-line comment above the talos-dev-env target explaining
that TALOS_KUBECONFIG is forwarded into up.sh as KUBECONFIG_OUT, and optionally
update README.md to mention KUBECONFIG_OUT as the internal name; locate the
talos-dev-env target in the Makefile and the up.sh script to align names or add
the clarifying comment referencing KUBECONFIG_OUT, TALOS_KUBECONFIG, and
./tools/talos-test/up.sh.
In `@tests/chainsaw/snapshot/deploy-agent-talos/chainsaw-test.yaml`:
- Around line 78-91: The test uses a hardcoded
WORK="/tmp/chainsaw-snapshot-talos" which can collide in concurrent runs; change
the script that defines WORK to create a unique temp dir (e.g., assign
WORK="$(mktemp -d)" or similar) and update all uses of WORK in the "script"
block where chainsaw assert runs and in the "cleanup" block so the cleanup
removes that same unique directory (ensure mktemp failure is handled or the
script exits if WORK is empty).
In `@tools/talos-test/README.md`:
- Around line 66-84: Update the Troubleshooting section to use consistent
formatting by wrapping each problem statement/error condition in backticks
(e.g., `talosctl: command not found`, `localhost:5001 registry not reachable`,
`Image pull fails inside the Talos node with a TLS error`,
`localhost:5001/aicr:local not found inside the cluster`) and correct the
wording that currently reads "kind path" to "kind cluster" to avoid confusion;
ensure these edits are applied under the "Troubleshooting" heading and preserve
the explanatory text that follows each backticked problem.
- Around line 86-93: Replace the vague phrase "the kind path uses" in the "Why a
separate cluster?" paragraph with a clearer reference to the kind-based
development environment (e.g., "the path used by the kind-based dev environment"
or "the path used by KinD clusters"), so the sentence comparing Talos to kind
explicitly mentions KinD/kind-based dev environments; update the sentence
containing the string "the kind path uses" to use the clearer wording while
keeping the rest of the paragraph intact.
In `@tools/talos-test/up.sh`:
- Around line 47-52: The docker invocation in the up.sh script uses the floating
image tag alpine:3 which can change; update the command that contains "docker
run ... alpine:3 sh -c 'modprobe br_netfilter ...'" to reference a pinned Alpine
tag or digest (e.g., a specific version tag or `@sha256` digest) so the modprobe
check is reproducible in CI; replace the literal "alpine:3" string with the
chosen pinned image reference throughout the script and keep the rest of the
command unchanged.
- Around line 70-75: The talosctl invocation mistakenly includes the deprecated
"docker" subcommand; update the command in the script where talosctl cluster
create is called (referenced by variables CLUSTER_NAME, TALOS_VERSION,
PATCH_FILE, MIRROR_HOST) to remove the "docker" token so it uses the form
talosctl cluster create with flags --name "${CLUSTER_NAME}" --workers 1 --image
"ghcr.io/siderolabs/talos:${TALOS_VERSION}" --config-patch "@${PATCH_FILE}"
(keep the existing flags and variables intact, only remove the literal
"docker").
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Enterprise
Run ID: 5f48d4ab-cea7-425a-a581-a82c58392934
📒 Files selected for processing (12)
- Makefile
- docs/user/cli-reference.md
- pkg/cli/snapshot.go
- pkg/k8s/agent/job.go
- pkg/k8s/agent/job_test.go
- pkg/k8s/agent/types.go
- pkg/snapshotter/agent.go
- tests/chainsaw/snapshot/deploy-agent-talos/assert-snapshot-content.yaml
- tests/chainsaw/snapshot/deploy-agent-talos/chainsaw-test.yaml
- tools/talos-test/README.md
- tools/talos-test/down.sh
- tools/talos-test/up.sh
mchmarny left a comment
Solid PR — the Talos OS=talos pod-shape branch from #714 finally has end-to-end coverage, and the --requests/--limits plumbing is clean (separate types, merge-not-replace semantics, defaults preserved, non-mutating helper, unit tests exercising all three branches).
One medium concern on the --require-gpu vs --limits nvidia.com/gpu=N precedence — current behavior silently overrides the explicit limit and the CLI help text reads ambiguously. Worth deciding whether --require-gpu is "set if unset" or "always force to 1" and asserting it in tests. Plus a few nits on duplicate-key handling, image pinning in dev tooling, multi-arch find logic, and surfacing the namespace PSP relabel in the README. Nothing blocking; CI is still green where complete.
Coverage Report ✅
Merging this branch will increase overall coverage.
Coverage by file
Changed files (no unit tests)
Please note that the "Total", "Covered", and "Missed" counts above refer to code statements instead of lines of code. The value in brackets refers to the test coverage of that file in the old version of the code.
…re-gpu

PR #762 review (mchmarny): RequireGPU=true silently overwrote any nvidia.com/gpu key the caller put in --limits, because the GPU default was unconditionally written after mergeResourceList. A user invoking --require-gpu --limits nvidia.com/gpu=4 ended up with gpu=1.

Now RequireGPU only inserts the default of 1 when the merged limits don't already contain nvidia.com/gpu, so the caller override wins. Updated the agent.Config.Limits and snapshotter.AgentConfig.Limits doc comments, the --limits CLI flag help, and docs/user/cli-reference.md to match. Added a test case in TestApplyPrivilegedSettings_ResourceOverrides covering the override-wins path; renamed the existing case to make the default-when-absent path explicit.
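The precedence rule this commit describes can be illustrated with a minimal sketch. It is not the project's code: `applyGPUDefault` is a hypothetical helper, and a plain `map[string]string` stands in for `corev1.ResourceList` so the example runs with the standard library only.

```go
package main

import "fmt"

// gpuKey is the resource name quoted in the commit message.
const gpuKey = "nvidia.com/gpu"

// applyGPUDefault is a hypothetical helper (not from the PR). It returns a
// copy of limits and adds gpuKey=1 only when requireGPU is set and the
// caller has not already supplied a value for that key.
// map[string]string is an assumed stand-in for corev1.ResourceList.
func applyGPUDefault(limits map[string]string, requireGPU bool) map[string]string {
	out := make(map[string]string, len(limits)+1)
	for k, v := range limits {
		out[k] = v
	}
	if _, ok := out[gpuKey]; requireGPU && !ok {
		out[gpuKey] = "1"
	}
	return out
}

func main() {
	// Caller override wins: prints 4.
	fmt.Println(applyGPUDefault(map[string]string{gpuKey: "4"}, true)[gpuKey])
	// Default inserted when the key is absent: prints 1.
	fmt.Println(applyGPUDefault(map[string]string{"memory": "2Gi"}, true)[gpuKey])
}
```

Writing the default only when the key is absent keeps `--require-gpu` as a "set if unset" switch, which is the reading the review asked to be made explicit.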
PR #762 review (mchmarny): parseResourceList silently last-write-wins on duplicate keys (e.g. --requests cpu=1,cpu=2 produced cpu=2 with no warning). Easy footgun with shell-templated invocations and AICR_REQUESTS / AICR_LIMITS env-var composition. Now returns an explicit 'duplicate key %q' error. Added TestParseResourceList in pkg/cli/snapshot_test.go covering happy paths, all existing error paths, and both the bare and whitespace-normalized duplicate-key cases.
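As a rough illustration of the duplicate-key check, the sketch below parses a comma-separated `name=quantity` list and fails on a repeated name. `parseList` is a hypothetical, simplified stand-in for `parseResourceList`: it keeps quantities as strings and uses plain `fmt.Errorf`, so it shows neither real quantity parsing nor the project's error codes.

```go
package main

import (
	"fmt"
	"strings"
)

// parseList is a hypothetical, simplified stand-in for parseResourceList.
// Quantities stay as strings; real code would parse them into resource
// quantities and return structured errors.
func parseList(s string) (map[string]string, error) {
	out := map[string]string{}
	if strings.TrimSpace(s) == "" {
		return out, nil
	}
	for _, entry := range strings.Split(s, ",") {
		name, qty, ok := strings.Cut(entry, "=")
		name, qty = strings.TrimSpace(name), strings.TrimSpace(qty)
		if !ok || name == "" || qty == "" {
			return nil, fmt.Errorf("invalid entry %q: want name=quantity", entry)
		}
		// Reject repeats instead of silently keeping the last value.
		if _, dup := out[name]; dup {
			return nil, fmt.Errorf("duplicate key %q", name)
		}
		out[name] = qty
	}
	return out, nil
}

func main() {
	_, err := parseList("cpu=1, cpu=2")
	fmt.Println(err) // duplicate key "cpu"

	m, _ := parseList("cpu=500m,memory=1Gi")
	fmt.Println(m["memory"]) // 1Gi
}
```

Trimming whitespace before the lookup is what makes `cpu=1, cpu=2` collide, matching the "whitespace-normalized" duplicate case the commit mentions.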
PR #762 review (mchmarny): the previous 'find dist -maxdepth 1 -type d -name aicr_*' returned alphabetical-first, so on a multi-platform goreleaser build it could pick the wrong arch (e.g. aicr_darwin_amd64_v3 before aicr_darwin_arm64_v8.0). The chosen binary would then silently fail to exec on the host. Now the find pattern is anchored to 'go env GOOS' / 'go env GOARCH', so the lookup is deterministic regardless of which platforms a prior 'make build' produced. Error message also updated to surface the host-specific path it expected.
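The actual fix lives in the Makefile and uses `find` with `go env`; the same idea, picking a build directory by host OS and architecture rather than by alphabetical order, can be sketched in Go. `pickHostDir` and the sample directory names are illustrative assumptions, and `runtime.GOOS` / `runtime.GOARCH` stand in for `go env GOOS` / `go env GOARCH`.

```go
package main

import (
	"fmt"
	"path/filepath"
	"runtime"
	"strings"
)

// pickHostDir is a hypothetical helper. It returns the first directory name
// that belongs to the given OS/arch pair: either exactly aicr_<os>_<arch>
// or that prefix followed by "_" and a variant suffix (e.g. "_v8.0").
// Requiring the "_" separator keeps "arm" from matching "arm64".
func pickHostDir(names []string, goos, goarch string) (string, bool) {
	base := fmt.Sprintf("aicr_%s_%s", goos, goarch)
	for _, n := range names {
		if n == base || strings.HasPrefix(n, base+"_") {
			return n, true
		}
	}
	return "", false
}

func main() {
	// Sample names modeled on the ones quoted in the commit message.
	names := []string{"aicr_darwin_amd64_v3", "aicr_darwin_arm64_v8.0", "aicr_linux_amd64_v3"}
	if dir, ok := pickHostDir(names, runtime.GOOS, runtime.GOARCH); ok {
		fmt.Println("using", filepath.Join("dist", dir))
	} else {
		fmt.Printf("no build for %s/%s under dist/\n", runtime.GOOS, runtime.GOARCH)
	}
}
```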
PR #762 review (mchmarny): up.sh permanently labels the 'default' namespace with pod-security.kubernetes.io/enforce=privileged (plus audit/warn). That posture change persists past the snapshot test — anything else a developer schedules in 'default' afterward also runs at the privileged baseline. New 'Side effects' section in tools/talos-test/README.md surfaces both posture changes (the namespace relabel and the br_netfilter modprobe on the host VM), explains why each is required, and notes that 'talos-dev-env-clean' does not revert them — restart the host runtime VM to drop br_netfilter; recreate the namespace to drop the PSS labels.
@ayuskauskas this PR now has merge conflicts with |
…/AICR_LIMITS

PR #762 review (CodeRabbit, after fixup commits):

* parseResourceList previously emitted plain fmt.Errorf errors, which violates the project rule that CLI-path validation errors carry pkg/errors codes. Switch all four return sites to errors.New(ErrCodeInvalidRequest, ...) (or errors.Wrap for the resource.ParseQuantity case where preserving the inner error is useful), and switch the two call sites in snapshotCmd's Action from errors.Wrap to errors.PropagateOrWrap so the inner code propagates unmodified instead of being re-wrapped.
* docs/user/cli-reference.md's centralized 'Environment Variables' table omitted AICR_REQUESTS and AICR_LIMITS even though both are documented inline on the per-flag table. Add rows for both, pointing at the corresponding --requests / --limits flag and reproducing the --require-gpu precedence note.

Tests:
  go test -run TestParseResourceList ./pkg/cli/... -v   # all pass
  golangci-lint run -c .golangci.yaml ./pkg/cli/...     # 0 issues
  make lint-yaml                                        # pass
Four trivial-but-real cleanups, one per CodeRabbit thread:

* tools/talos-test/up.sh: pin alpine:3 -> alpine:3.21 for the br_netfilter modprobe sidecar (matches existing pinning convention in docs/integrator/validator-extension.md). Also updates the troubleshooting hint string to reflect the pinned tag.
  Resolves: 3190760718
* Makefile: add a one-line comment above the talos-dev-env recipe documenting the TALOS_KUBECONFIG (user-facing) -> KUBECONFIG_OUT (script-internal) variable indirection so a reader of the Makefile alone does not have to dive into up.sh to understand the rename.
  Resolves: 3190760664
* tests/chainsaw/snapshot/deploy-agent-talos/chainsaw-test.yaml: switch the assert step from a fixed /tmp/chainsaw-snapshot-talos path to a per-run mktemp -d directory with a trap-based cleanup so parallel CI shards on a shared runner cannot collide. Drops the old fixed-path rm -rf from the cleanup block.
  Resolves: 3190760666
* tools/talos-test/README.md: bold-format the four troubleshooting problem statements for consistency, fix two 'kind path' references to read 'kind-based dev cluster', and replace 'Bring it up' with 'Start it' in the registry section (the LanguageTool flag CodeRabbit surfaced).
  Resolves: 3190760688, 3190760707
Adds an opt-in, kind-equivalent harness for spinning up a local Talos Linux cluster on a developer laptop and exercising the AICR snapshot agent against it. Closes the integration-test gap left by #714 (Talos collector): the OS=talos pod-shape branch in pkg/k8s/agent/job.go was unit-tested only.

The harness has two independent pieces stitched together by make:

* tools/talos-test/{up.sh,down.sh,README.md}: cluster lifecycle. Wraps 'talosctl cluster create docker' (newer talosctl moved provisioners to subcommands), threads a containerd registry-mirror config patch so 'localhost:5001/aicr:local' resolves from inside Talos nodes, and on macOS Docker Desktop / Podman Machine loads br_netfilter via a one-shot privileged sidecar so the default Flannel CNI works. Streams 'talosctl health' progress, derives the apid endpoint and K8s API server URL from the docker-NAT'd host ports, rewrites the kubeconfig to the host-routable form, and relaxes Pod Security Standards on the default namespace so the privileged agent pod can be scheduled. Documented prerequisites and install commands for talosctl in the README.
* tests/chainsaw/snapshot/deploy-agent-talos/: chainsaw test. Invokes 'aicr snapshot --os talos -o cm://default/aicr-e2e-snapshot' from a script step; the CLI exercises pkg/k8s/agent.Deployer end-to-end (RBAC creation, OS=talos pod shape, log streaming, Job completion, ConfigMap write, cleanup). Asserts only the surviving ConfigMap content. Uses chainsaw JMESPath filters ((sort([*].type)), (@[?type == 'X'] | [0].subtypes | sort(...))) so the assertion is order-independent at both the measurement-type and per-type subtype levels.

Make targets (Makefile):
  talos-dev-env        spin up cluster
  talos-dev-env-clean  destroy cluster
  talos-snapshot-test  build + run chainsaw against the live cluster

None are wired into 'make qualify'; Talos is opt-in for now.
CLI: The agent container's resource requests and limits were hardcoded (Privileged: 4Gi req / 8Gi lim memory; Restricted: 256Mi / 512Mi). That sizing fits production GPU nodes but blocks scheduling on small dev clusters such as the talosctl Docker provisioner default (2Gi worker), where there is no node-level option to bypass. Adds:

  --requests 'cpu=500m,memory=1Gi,ephemeral-storage=1Gi'
  --limits   'cpu=1,memory=2Gi,ephemeral-storage=2Gi'

Each accepts a comma-separated 'name=quantity' list. Unspecified keys fall back to the existing per-mode defaults so the production path is unchanged. The nvidia.com/gpu limit injected by --require-gpu is preserved on top of any override. Plumbed through pkg/snapshotter.AgentConfig and pkg/k8s/agent.Config as corev1.ResourceList; applyPrivilegedSettings and applyRestrictedSettings merge overrides over the defaults via a new mergeResourceList helper.

Tests:
  pkg/k8s/agent: TestApplyPrivilegedSettings_ResourceOverrides (no override, partial override, RequireGPU + override), TestApplyRestrictedSettings_ResourceOverrides, TestMergeResourceList (input-immutability invariant).

Docs:
  docs/user/cli-reference.md   --requests / --limits flag table entries
  tools/talos-test/README.md   prerequisites, quickstart, customization, troubleshooting

Components Affected:
  CLI (pkg/cli)
  Agent (pkg/k8s/agent, pkg/snapshotter)
  Tests/CI (tests/chainsaw/snapshot/deploy-agent-talos, tools/talos-test)
  Build (Makefile)
  Docs (docs/user/cli-reference.md)

Tested locally end-to-end on macOS Podman Machine (Talos v1.9.0 + talosctl 1.13.0): bring-up, agent deployment, snapshot assertion, teardown all pass.
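The merge-over-defaults behaviour and its input-immutability invariant can be sketched as follows. `mergeOverrides` is a hypothetical stand-in for `mergeResourceList`, a `map[string]string` replaces `corev1.ResourceList`, and the `cpu` default is a placeholder; only the 8Gi privileged memory limit comes from the commit message.

```go
package main

import "fmt"

// mergeOverrides is a hypothetical stand-in for mergeResourceList. It
// builds a fresh map from defaults and then overlays overrides, so keys
// the caller did not set keep their default and neither input is mutated.
func mergeOverrides(defaults, overrides map[string]string) map[string]string {
	out := make(map[string]string, len(defaults)+len(overrides))
	for k, v := range defaults {
		out[k] = v
	}
	for k, v := range overrides {
		out[k] = v
	}
	return out
}

func main() {
	// "cpu": "2" is a placeholder default, not a value taken from the PR.
	defaults := map[string]string{"cpu": "2", "memory": "8Gi"}
	merged := mergeOverrides(defaults, map[string]string{"memory": "2Gi"})

	// The override applies, the untouched key survives, and the defaults
	// map still holds its original value: prints "2 2Gi 8Gi".
	fmt.Println(merged["cpu"], merged["memory"], defaults["memory"])
}
```

Returning a new map rather than writing into `defaults` matters when the defaults are shared between the privileged and restricted code paths, since an in-place merge would leak one call's overrides into the next.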
…CR_LIMITS env doc

PR #762 review (CodeRabbit, post-rebase batch):

* parseResourceList accepted negative quantities like '--requests cpu=-1' or '--limits memory=-1Gi'; the failure surfaced only later when the K8s API rejected the Job. Now rejected at parse time with a structured ErrCodeInvalidRequest carrying the offending entry. Zero is still accepted (legitimate for some resources). Test cases added to TestParseResourceList: negative cpu, negative memory with suffix, negative-in-second-entry, and a positive-zero case for the boundary.
  Resolves: 3203439877
* Environment Variables table's AICR_LIMITS row in docs/user/cli-reference.md was missing the 'Unspecified resources keep the built-in defaults' clause that the --limits flag's inline doc had. Added it; AICR_REQUESTS already had the analogous wording.
  Resolves: 3203439871

Tested:
  go test -run TestParseResourceList -v ./pkg/cli/...   # all pass
…ence

PR #762 review (CodeRabbit, post-rebase batch):

* up.sh's preflight 'for tool in talosctl docker kubectl chainsaw' loop did not include curl, but the next step calls 'curl -sf http://localhost:5001/v2/' for the registry check. On a host without curl that check would have failed and the script would have printed the misleading 'registry not reachable' remediation. Add curl to the preflight list and to the README prerequisites.
  Resolves: 3203439882
* up.sh writes the cluster's apid endpoint and node IP to the user's default talosctl config (~/.talos/config) via 'talosctl config endpoint/node'. This persists past 'make talos-dev-env-clean', leaving subsequent interactive talosctl commands pointed at a destroyed cluster. Promoted the Side effects section from two to three items, documented the TALOSCONFIG=$(mktemp) workaround for users who want to keep their default config untouched, and pointed at 'talosctl config remove-context aicr-talos' for cleanup after teardown.
  Resolves: 3203439886
858d3bc to c73dad1
Summary
Adds an opt-in, kind-equivalent harness for spinning up a local Talos
Linux cluster on a developer laptop (`tools/talos-test/`) and a chainsaw
test that exercises the AICR snapshot agent against it
(`tests/chainsaw/snapshot/deploy-agent-talos/`). Also adds `--requests`
and `--limits` flags to `aicr snapshot` so the agent can be right-sized
on resource-constrained dev clusters.
Motivation / Context
PR #714 added Talos OS support (`pkg/collector/talos/` + the `OS=talos`
pod-shape branch in `pkg/k8s/agent/job.go`), but the only coverage was
unit tests. There was no integration test running the snapshot agent
against an actual Talos node, and no easy way for a developer on macOS
to exercise that path locally without standing up a remote cluster.
The agent container's resource requests/limits were also hardcoded
(`Privileged: 4Gi req / 8Gi lim memory`). That sizing fits production
GPU nodes but blocks scheduling on the talosctl Docker provisioner's
default 2Gi worker, with no way to override at the node level.
Fixes: N/A
Related: #714, #565
Type of Change
Component(s) Affected
- `cmd/aicr`, `pkg/cli`
- `cmd/aicrd`, `pkg/api`, `pkg/server`
- `pkg/recipe`
- `pkg/bundler`, `pkg/component/*`
- `pkg/collector`, `pkg/snapshotter`
- `pkg/validator`
- `pkg/errors`, `pkg/k8s`
- `docs/`, `examples/`
- `tests/chainsaw`, `tools/`, `Makefile`
Implementation Notes
`tools/talos-test/{up.sh,down.sh,README.md}`: cluster lifecycle
- Wraps `talosctl cluster create docker` (newer talosctl moved provisioners to subcommands; the old `--provisioner` flag is gone).
- Threads a containerd registry-mirror config patch through `--config-patch` so `localhost:5001/aicr:local` resolves transparently from inside Talos node containers (the host registry started by `make dev-env`).
- `br_netfilter` preflight: the macOS host VM (Docker Desktop's LinuxKit or Podman Machine's Fedora CoreOS) ships the module but does not auto-load it, so flannel CNI fails until we modprobe it via a one-shot privileged sidecar with `/lib/modules` bind-mounted from the VM. No-op on Linux hosts.
- `talosctl health --wait-timeout 8m` streams readiness phases (etcd, apid, kubelet, control plane, all k8s nodes ready, kube-proxy, CoreDNS). On failure the script prints diagnostic commands the developer can run to drill into stuck services.
- The Talos node network (`10.5.0.0/24`) is not host-routable on macOS, so the script derives the docker-NAT'd apid host port (50000) and K8s API host port (6443) from `docker port`, passes them as `--endpoints`, and rewrites the kubeconfig server URL to `https://127.0.0.1:<host-port>` with `insecure-skip-tls-verify=true`.
- Labels the `default` namespace with `pod-security.kubernetes.io/enforce=privileged` so the snapshot agent's privileged pod can be scheduled (Talos enforces `restricted` cluster-wide by default).
`tests/chainsaw/snapshot/deploy-agent-talos/`: chainsaw test
- Invokes `aicr snapshot --os talos -o cm://default/aicr-e2e-snapshot` from a `script:` step. The CLI exercises `pkg/k8s/agent.Deployer` end-to-end (RBAC creation, OS=talos pod shape from `pkg/k8s/agent/job.go`, log streaming, Job completion, ConfigMap write, cleanup), so there are no hand-applied YAML fixtures to drift against `job.go` (the existing kind chainsaw test's drift risk is captured for follow-up in `issues/e2e-update.md`, kept local).
- Asserts only the surviving ConfigMap content, using chainsaw JMESPath filters (`(sort([*].type))`, `(@[?type == 'X'] | [0].subtypes | sort([*].subtype))`) so the assertion is order-independent at both the measurement-type and per-type subtype levels; collector goroutine completion order is non-deterministic.
- Pins the measurement-type set `{GPU, K8s, NodeTopology, OS, SystemD}` and the per-type subtype names. `data` values are intentionally not pinned to avoid Talos / kernel / kubernetes version churn.
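The test does its order-independent comparison with JMESPath `sort(...)` inside chainsaw. The underlying idea, sorting both sides before comparing so that collector completion order cannot affect the result, is sketched below in Go; `sameElements` is a hypothetical helper, not part of the test.

```go
package main

import (
	"fmt"
	"sort"
)

// sameElements is a hypothetical helper. It reports whether got and want
// hold the same strings regardless of order. It sorts copies so the
// callers' slices are left untouched.
func sameElements(got, want []string) bool {
	if len(got) != len(want) {
		return false
	}
	g := append([]string(nil), got...)
	w := append([]string(nil), want...)
	sort.Strings(g)
	sort.Strings(w)
	for i := range g {
		if g[i] != w[i] {
			return false
		}
	}
	return true
}

func main() {
	// Expected measurement types, as listed in the description above.
	want := []string{"GPU", "K8s", "NodeTopology", "OS", "SystemD"}
	// The same types in an arbitrary completion order.
	got := []string{"OS", "SystemD", "GPU", "NodeTopology", "K8s"}
	fmt.Println(sameElements(got, want)) // true
}
```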
`aicr snapshot`: `--requests` / `--limits`
Adds two CLI flags accepting kubectl-style comma-separated `name=quantity` lists, for example `--requests 'cpu=500m,memory=1Gi,ephemeral-storage=1Gi'` and `--limits 'cpu=1,memory=2Gi,ephemeral-storage=2Gi'`. Plumbed through `pkg/snapshotter.AgentConfig` and `pkg/k8s/agent.Config` as `corev1.ResourceList`. `applyPrivilegedSettings` and `applyRestrictedSettings` merge overrides on top of the existing hardcoded defaults via a new `mergeResourceList` helper, so unspecified keys keep the production defaults. `--require-gpu` adds an `nvidia.com/gpu` limit of 1 unless `--limits` already sets that key, in which case the caller's value wins.
Make targets
`talos-dev-env`, `talos-dev-env-clean`, `talos-snapshot-test`. None are wired into `make qualify`; Talos remains opt-in for now.
Testing
End-to-end against a live local cluster (macOS Podman Machine, Talos
v1.9.0, talosctl 1.13.0):

    make image IMAGE_REGISTRY=localhost:5001/aicr IMAGE_TAG=local
    make build
    make talos-dev-env
    make talos-snapshot-test   # PASS
    make talos-dev-env-clean

Coverage delta on touched packages:
- `pkg/k8s/agent`: 77.8% (baseline) → ~78% (added tests for resource-override paths)
- `pkg/cli`: ~52.9% (baseline) → comparable (added flag plumbing and `parseResourceList`; covered indirectly)
- `pkg/snapshotter`: ~52.2% (baseline) → comparable (1-line struct field + plumbing only)
Risk Assessment
Rollout notes: Production callers of `aicr snapshot` are unaffected;
both new flags default to empty, in which case the hardcoded
privileged/restricted defaults remain in force exactly as before. The
Talos test harness is a new opt-in surface and is not wired into
`make qualify` or any CI workflow.
Checklist
- `make test` with `-race`
- `make lint`: only `go: no such tool covdata` on `cmd/aicr` / `cmd/aicrd` (a pre-existing local toolchain issue, unrelated to this branch; does not affect CI)
- `git commit -S`