feat(commands): preserve META by default on talm reset#199
Conversation
Two independently-developed branches both introduced an injection-seam package var named stdinIsTTY — one for the init --update non-tty UX gate, one for the dashboard/edit TUI refusal. The branches landed in separate merges; git's 3-way merge accepted both because the declarations live in different files, but the resulting package fails to build with 'stdinIsTTY redeclared in this block'. Keep the tui_handler.go declaration as canonical (the godoc there spells out the function-type-injection rationale that applies equally to both callers) and drop the init.go copy. The remaining caller in resolveOverwritePolicy resolves against the package-level var unchanged. Signed-off-by: Aleksei Sviridkin <f@lex.la>
|
Warning Rate limit exceeded
You’ve run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (6)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request introduces a safer default behavior for the talm reset command, diverging from the upstream talosctl reset by preserving the Talos META partition by default. This change ensures that nodes can self-recover and rejoin the cluster after a reset without manual intervention. The implementation includes a new reset_handler.go to manage flag defaults and usage strings, corresponding unit tests in reset_handler_test.go, and updated documentation in the README and manual test plan. Additionally, unused terminal-related code was removed from init.go. I have no feedback to provide.
Flip the talm reset default away from upstream's destructive
--wipe-mode=all toward the META-preserving recipe
(--system-labels-to-wipe=STATE,EPHEMERAL). The flip only fires when
the operator passed neither --wipe-mode nor --system-labels-to-wipe
on the CLI; explicit operator intent on either axis is honored
unchanged.
- 'talm reset' (no wipe flags): wrapper populates
--system-labels-to-wipe=STATE,EPHEMERAL. The server-side reset
codepath, when SystemPartitionsToWipe is non-empty, takes the
label-driven path and keeps other partitions intact per the
upstream --system-labels-to-wipe flag doc in
cmd/talosctl/cmd/talos/reset.go. META survives; on the next
boot Talos rejoins the cluster from META without operator
intervention (verified on a 3-node v1.12.6 stand: node returns
to etcd with placeholder hostname and a new member id from
META within ~90s of the post-reboot apply).
- 'talm reset --wipe-mode=all' or '--wipe-mode=system-disk':
explicit destructive opt-in. Both values land in the
Mode-driven server-side branch when SystemPartitionsToWipe is
empty and wipe the full system disk including META. The
wrapper does NOT add the safety override; operator intent
wins. --wipe-mode=user-disks is safe — it doesn't touch system
partitions either way.
- 'talm reset --system-labels-to-wipe=STATE': operator's narrower
list is honored byte-for-byte. The wrapper does NOT silently
expand to STATE,EPHEMERAL — operators choosing a narrower
scope are doing so deliberately.
Help-text overrides on both flags spell out the divergence so
'talm reset --help' carries the operator-facing story. The
--wipe-mode override names both destructive opt-out values so
operators picking system-disk to 'be less destructive than all'
don't walk into the same trap.
Talm already diverges from upstream defaults where the friendly
path should be the default — kubeconfig --force=true is the
established precedent. This commit extends that pattern to reset.
Test coverage (pkg/commands/reset_handler_test.go):
- TestWrapResetCommand_NoFlags_AppliesSafeDefault — headline
behaviour change pin.
- TestWrapResetCommand_ExplicitWipeModeAll_SkipsDefault — opt-out
via --wipe-mode=all.
- TestWrapResetCommand_ExplicitWipeModeSystemDisk_SkipsDefault —
opt-out via --wipe-mode=system-disk (value-agnostic Changed()
gate; explicit test guards against a future refactor adding
value-conditional logic).
- TestWrapResetCommand_ExplicitLabels_PreservesOperatorChoice —
operator's narrower list honored byte-for-byte.
- TestWrapResetCommand_BothFlagsChanged_SkipsDefault — mixed
intent no-interference.
- TestWrapResetCommand_HelpTextMentionsMetaPreservation — help
text contract pin (substrings 'preserves META' +
resetSafeDefaultLabels literal + 'system-disk').
- TestWrapResetCommand_RealUpstreamResetDispatched — end-to-end
via taloscommands.Commands -> wrapTalosCommand so a dispatch
refactor that silently drops the wrapper fails this test
while the synthetic-cobra tests above stay green.
README gains a 'talm reset — META-preserving default' subsection
under 'Using talosctl commands' with all three call shapes (safe
default, explicit destructive opt-in, operator-specified narrower
scope) so the divergence is discoverable from operator docs.
docs/manual-test-plan.md section H is rewritten around the new
default. H2 covers the safe-default headline path; H2a-H2d add
boundary cases (explicit destructive --wipe-mode=all or
--wipe-mode=system-disk, narrower operator list,
--graceful=false, modeline-bearing project root). Each carries an
explicit regression-anchor line so future operators running the
plan know what a wrapper-side regression would look like.
Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
99393d9 to
a5d61ea
Compare
What
Flip the
talm resetdefault away from upstream's destructive--wipe-mode=alltoward the META-preserving recipe (--system-labels-to-wipe=STATE,EPHEMERAL). The flip only fires when the operator passed neither--wipe-modenor--system-labels-to-wipeon the CLI; explicit operator intent on either axis is honored unchanged.Why
Upstream
talosctl resetdefaults to--wipe-mode=all, which wipes the Talos META partition along with STATE and EPHEMERAL. With META gone the node cannot self-recover and comes up in maintenance mode requiring a full re-apply — one of the most common operator-confusion classes ("I reset the node, why doesn't it come back?"). The friendly recipe operators reach for anyway,--system-labels-to-wipe=STATE,EPHEMERAL, leaves META intact and the node rejoins from its META-stored bootstrap config within ~90s on the next boot.Talm already diverges from upstream defaults where the friendly path should be the default —
kubeconfig --force=trueis the established precedent. This PR extends that pattern toresetso the operator-friendly path is the default.Behaviour matrix
talm reset(no wipe flags)--system-labels-to-wipe=STATE,EPHEMERAL; META preserved; node self-recovers.talm reset --wipe-mode=alltalm reset --system-labels-to-wipe=STATEtalm reset --wipe-mode=X --system-labels-to-wipe=YServer-side, when
SystemPartitionsToWipeis non-empty the reset takes the label-driven path and "keep[s] other partitions intact" per upstream's flag doc atcmd/talosctl/cmd/talos/reset.go:193. The issue's manual reproduction confirms META survives empirically.Surface
pkg/commands/reset_handler.go(new) —wrapResetCommand, theresetSafeDefaultLabels="STATE,EPHEMERAL"const, and theresetCmdNameconst. PreRunE chain mirrors the crashdump / rotate-ca wrappers.pkg/commands/talosctl_wrapper.go— one dispatch entry after the TUI block.pkg/commands/reset_handler_test.go(new) — six tests: positive default-applies, negative skip-on-explicit-opt-out (three variations), help-text override pin, end-to-end real-upstream dispatch pin.README.md— operator-facing section under "Using talosctl commands" describing the divergence + three call shapes.docs/manual-test-plan.md— section H rewritten around the new default; H2a/H2b/H2c/H2d added for the boundary cases (explicit destructive, narrower operator list,--graceful=false, modeline-bearing project root).Drive-by
The first commit (
fix(commands): dedupe stdinIsTTY) clears a build break onmainintroduced by the interaction of two independently-developed PRs that both added a package-levelstdinIsTTYinjection-seam (init.go:968from #173 +tui_handler.go:39from #197). Git's 3-way merge accepted both because the declarations live in different files; the resulting package fails to build withstdinIsTTY redeclared in this block. The hot-fix removes theinit.gocopy (canonical lives intui_handler.go); callers inresolveOverwritePolicyresolve against the package-level var unchanged. Filed inline here rather than as a separate PR to unblock this branch's test run.Verification
Local CI (host):
go build ./...cleango test ./... -raceall packages greengolangci-lint run ./...0 issuesGOOS=windows go vet ./...cleanEmpirical smokes against a 3-node OCI Talos v1.12.6 stand follow the regression-anchor matrix in
docs/manual-test-plan.mdsection H2-H2d.Closes #185