
feat(commands): preserve META by default on talm reset #199

Merged
lexfrei merged 2 commits into main from feat/reset-meta-safe-default on May 12, 2026

@lexfrei (Contributor) commented May 12, 2026

What

Flip the talm reset default away from upstream's destructive --wipe-mode=all toward the META-preserving recipe (--system-labels-to-wipe=STATE,EPHEMERAL). The flip only fires when the operator passed neither --wipe-mode nor --system-labels-to-wipe on the CLI; explicit operator intent on either axis is honored unchanged.

Why

Upstream talosctl reset defaults to --wipe-mode=all, which wipes the Talos META partition along with STATE and EPHEMERAL. With META gone the node cannot self-recover and comes up in maintenance mode requiring a full re-apply — one of the most common operator-confusion classes ("I reset the node, why doesn't it come back?"). The friendly recipe operators reach for anyway, --system-labels-to-wipe=STATE,EPHEMERAL, leaves META intact and the node rejoins from its META-stored bootstrap config within ~90s on the next boot.

Talm already diverges from upstream defaults where the friendly path should be the default; kubeconfig --force=true is the established precedent. This PR extends that pattern to reset.

Behaviour matrix

| Invocation | Outcome |
|---|---|
| talm reset (no wipe flags) | Wrapper populates --system-labels-to-wipe=STATE,EPHEMERAL; META preserved; node self-recovers. |
| talm reset --wipe-mode=all (or --wipe-mode=system-disk) | Explicit destructive opt-in; wrapper does NOT add the safety override. --wipe-mode=all is upstream's default and was talm's previous behaviour. |
| talm reset --system-labels-to-wipe=STATE | Operator's narrower list honored byte-for-byte; wrapper does NOT silently expand it to STATE,EPHEMERAL. |
| talm reset --wipe-mode=X --system-labels-to-wipe=Y | Both flags stated; wrapper does not interfere with either. |
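The matrix reduces to a single gate: apply the safe label list only when neither wipe flag was set explicitly, whatever value it was given. Below is a dependency-free sketch of that gate, not the code in reset_handler.go. Talm wires the check into a cobra PreRunE using pflag's Changed(); this version substitutes Go's standard flag package, where FlagSet.Visit walks only the flags actually set on the command line. The name resolveResetFlags is hypothetical, resetSafeDefaultLabels mirrors the constant named in this PR, and the all default for --wipe-mode reflects upstream's default.

```go
package main

import (
	"flag"
	"fmt"
	"io"
)

// resetSafeDefaultLabels mirrors the constant this PR adds.
const resetSafeDefaultLabels = "STATE,EPHEMERAL"

// resolveResetFlags is a hypothetical, stdlib-only stand-in for the PreRunE
// that wrapResetCommand installs. It parses the two wipe flags and fills in
// the META-preserving label list only when the operator set neither.
func resolveResetFlags(args []string) (wipeMode, labels string, err error) {
	fs := flag.NewFlagSet("reset", flag.ContinueOnError)
	fs.SetOutput(io.Discard)
	fs.StringVar(&wipeMode, "wipe-mode", "all", "upstream default is all")
	fs.StringVar(&labels, "system-labels-to-wipe", "", "system partition labels to wipe")
	if err = fs.Parse(args); err != nil {
		return "", "", err
	}

	// Visit only walks flags that were set explicitly, regardless of their
	// value; this plays the role of pflag's value-agnostic Changed() gate.
	explicit := map[string]bool{}
	fs.Visit(func(f *flag.Flag) { explicit[f.Name] = true })

	if !explicit["wipe-mode"] && !explicit["system-labels-to-wipe"] {
		// Set writes through to labels because StringVar bound its address.
		err = fs.Set("system-labels-to-wipe", resetSafeDefaultLabels)
	}
	return wipeMode, labels, err
}

func main() {
	// One invocation per row of the behaviour matrix.
	rows := [][]string{
		{},
		{"--wipe-mode=all"},
		{"--system-labels-to-wipe=STATE"},
		{"--wipe-mode=system-disk", "--system-labels-to-wipe=EPHEMERAL"},
	}
	for _, args := range rows {
		mode, labels, err := resolveResetFlags(args)
		fmt.Printf("%v -> wipe-mode=%s labels=%q err=%v\n", args, mode, labels, err)
	}
}
```

In the no-flags row --wipe-mode still reads all; per the server-side behaviour described in the next paragraph, it is the non-empty label list that routes the reset onto the label-driven path.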

Server-side, when SystemPartitionsToWipe is non-empty, the reset takes the label-driven path and "keep[s] other partitions intact" per upstream's flag doc at cmd/talosctl/cmd/talos/reset.go:193. The manual reproduction in the issue confirms empirically that META survives.
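A deliberately simplified model of the two server-side branches may help when reading the matrix. wipedSystemLabels is hypothetical: it tracks only the three labels this PR discusses (META, STATE, EPHEMERAL), ignores user disks entirely, and encodes the behaviour as this PR describes it rather than as read from upstream's implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// wipedSystemLabels is a simplified, hypothetical model of which system
// partitions a reset wipes. It only knows META, STATE and EPHEMERAL; a real
// Talos system disk carries more partitions than that.
func wipedSystemLabels(wipeMode, labels string) []string {
	// Label-driven path: a non-empty list wipes exactly those labels and
	// keeps every other partition, so META survives unless it is listed.
	if labels != "" {
		return strings.Split(labels, ",")
	}
	// Mode-driven path: all and system-disk wipe the whole system disk,
	// META included. user-disks (and any other mode in this model) leaves
	// system partitions alone.
	switch wipeMode {
	case "all", "system-disk":
		return []string{"META", "STATE", "EPHEMERAL"}
	default:
		return nil
	}
}

func main() {
	fmt.Println(wipedSystemLabels("all", "STATE,EPHEMERAL")) // talm's new default
	fmt.Println(wipedSystemLabels("all", ""))                // explicit destructive opt-in
	fmt.Println(wipedSystemLabels("user-disks", ""))         // system partitions untouched
}
```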

Surface

  • pkg/commands/reset_handler.go (new) — wrapResetCommand, the resetSafeDefaultLabels="STATE,EPHEMERAL" const, and the resetCmdName const. PreRunE chain mirrors the crashdump / rotate-ca wrappers.
  • pkg/commands/talosctl_wrapper.go — one dispatch entry after the TUI block.
  • pkg/commands/reset_handler_test.go (new) — seven tests: positive default-applies, negative skip-on-explicit-opt-out (four variations: --wipe-mode=all, --wipe-mode=system-disk, a narrower label list, and both flags set), help-text override pin, end-to-end real-upstream dispatch pin.
  • README.md — operator-facing section under "Using talosctl commands" describing the divergence + three call shapes.
  • docs/manual-test-plan.md — section H rewritten around the new default; H2a/H2b/H2c/H2d added for the boundary cases (explicit destructive, narrower operator list, --graceful=false, modeline-bearing project root).

Drive-by

The first commit (fix(commands): dedupe stdinIsTTY) fixes a build break on main caused by two independently developed PRs that each added a package-level stdinIsTTY injection seam (init.go:968 from #173 and tui_handler.go:39 from #197). Git's 3-way merge accepted both because the declarations live in different files, but the resulting package fails to build with "stdinIsTTY redeclared in this block". The hot-fix removes the init.go copy (the canonical one lives in tui_handler.go); callers in resolveOverwritePolicy resolve against the package-level var unchanged. It is included here rather than filed as a separate PR to unblock this branch's test run.
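The injection seam at the center of the drive-by fix is a function-typed package var that production code calls and tests reassign. Go permits only one package-level declaration of a given name across all files of a package, so each PR built on its own but the merged package, containing both declarations, fails to compile. The sketch keeps a single declaration; its body (an os.Stdin character-device check) and the refuseWithoutTTY caller are assumptions for illustration, not talm's code.

```go
package main

import (
	"fmt"
	"os"
)

// stdinIsTTY is the seam: a function-typed package var, declared exactly
// once per package. This body is an assumed stdlib implementation.
var stdinIsTTY = func() bool {
	fi, err := os.Stdin.Stat()
	if err != nil {
		return false
	}
	return fi.Mode()&os.ModeCharDevice != 0
}

// refuseWithoutTTY is a hypothetical caller in the style of the
// dashboard/edit TUI refusal described in the commit message.
func refuseWithoutTTY(cmd string) error {
	if !stdinIsTTY() {
		return fmt.Errorf("%s needs an interactive terminal", cmd)
	}
	return nil
}

func main() {
	// Result depends on how the program is run (terminal vs pipe).
	fmt.Println(refuseWithoutTTY("dashboard"))

	// A test swaps the seam for a fake and restores it afterwards.
	orig := stdinIsTTY
	stdinIsTTY = func() bool { return true }
	fmt.Println(refuseWithoutTTY("dashboard"))
	stdinIsTTY = orig
}
```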

Verification

Local CI (host):

  • go build ./... clean
  • go test ./... -race all packages green
  • golangci-lint run ./... 0 issues
  • GOOS=windows go vet ./... clean

Empirical smoke tests against a 3-node OCI Talos v1.12.6 stand follow the regression-anchor matrix in docs/manual-test-plan.md, sections H2 through H2d.

Closes #185

fix(commands): dedupe stdinIsTTY

Two independently-developed branches both introduced an injection-seam
package var named stdinIsTTY — one for the init --update non-tty UX
gate, one for the dashboard/edit TUI refusal. The branches landed in
separate merges; git's 3-way merge accepted both because the
declarations live in different files, but the resulting package fails
to build with 'stdinIsTTY redeclared in this block'.

Keep the tui_handler.go declaration as canonical (the godoc there
spells out the function-type-injection rationale that applies
equally to both callers) and drop the init.go copy. The remaining
caller in resolveOverwritePolicy resolves against the package-level
var unchanged.

Signed-off-by: Aleksei Sviridkin <f@lex.la>
@coderabbitai (Bot) commented May 12, 2026

Warning

Rate limit exceeded

@lexfrei has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 14 minutes and 52 seconds before requesting another review.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 625baacf-3aaf-448d-a09c-9876a2d8605e

📥 Commits

Reviewing files that changed from the base of the PR and between a00581e and a5d61ea.

📒 Files selected for processing (6)
  • README.md
  • docs/manual-test-plan.md
  • pkg/commands/init.go
  • pkg/commands/reset_handler.go
  • pkg/commands/reset_handler_test.go
  • pkg/commands/talosctl_wrapper.go

@gemini-code-assist (Bot) left a comment

Code Review

This pull request introduces a safer default behavior for the talm reset command, diverging from the upstream talosctl reset by preserving the Talos META partition by default. This change ensures that nodes can self-recover and rejoin the cluster after a reset without manual intervention. The implementation includes a new reset_handler.go to manage flag defaults and usage strings, corresponding unit tests in reset_handler_test.go, and updated documentation in the README and manual test plan. Additionally, unused terminal-related code was removed from init.go. I have no feedback to provide.

Flip the talm reset default away from upstream's destructive
--wipe-mode=all toward the META-preserving recipe
(--system-labels-to-wipe=STATE,EPHEMERAL). The flip only fires when
the operator passed neither --wipe-mode nor --system-labels-to-wipe
on the CLI; explicit operator intent on either axis is honored
unchanged.

  - 'talm reset' (no wipe flags): wrapper populates
    --system-labels-to-wipe=STATE,EPHEMERAL. The server-side reset
    codepath, when SystemPartitionsToWipe is non-empty, takes the
    label-driven path and keeps other partitions intact per the
    upstream --system-labels-to-wipe flag doc in
    cmd/talosctl/cmd/talos/reset.go. META survives; on the next
    boot Talos rejoins the cluster from META without operator
    intervention (verified on a 3-node v1.12.6 stand: node returns
    to etcd with placeholder hostname and a new member id from
    META within ~90s of the post-reboot apply).
  - 'talm reset --wipe-mode=all' or '--wipe-mode=system-disk':
    explicit destructive opt-in. Both values land in the
    Mode-driven server-side branch when SystemPartitionsToWipe is
    empty and wipe the full system disk including META. The
    wrapper does NOT add the safety override; operator intent
    wins. --wipe-mode=user-disks is safe — it doesn't touch system
    partitions either way.
  - 'talm reset --system-labels-to-wipe=STATE': operator's narrower
    list is honored byte-for-byte. The wrapper does NOT silently
    expand to STATE,EPHEMERAL — operators choosing a narrower
    scope are doing so deliberately.

Help-text overrides on both flags spell out the divergence so
'talm reset --help' carries the operator-facing story. The
--wipe-mode override names both destructive opt-out values so
operators picking system-disk to 'be less destructive than all'
don't walk into the same trap.

Talm already diverges from upstream defaults where the friendly
path should be the default — kubeconfig --force=true is the
established precedent. This commit extends that pattern to reset.

Test coverage (pkg/commands/reset_handler_test.go):

  - TestWrapResetCommand_NoFlags_AppliesSafeDefault — headline
    behaviour change pin.
  - TestWrapResetCommand_ExplicitWipeModeAll_SkipsDefault — opt-out
    via --wipe-mode=all.
  - TestWrapResetCommand_ExplicitWipeModeSystemDisk_SkipsDefault —
    opt-out via --wipe-mode=system-disk (value-agnostic Changed()
    gate; explicit test guards against a future refactor adding
    value-conditional logic).
  - TestWrapResetCommand_ExplicitLabels_PreservesOperatorChoice —
    operator's narrower list honored byte-for-byte.
  - TestWrapResetCommand_BothFlagsChanged_SkipsDefault — mixed
    intent no-interference.
  - TestWrapResetCommand_HelpTextMentionsMetaPreservation — help
    text contract pin (substrings 'preserves META' +
    resetSafeDefaultLabels literal + 'system-disk').
  - TestWrapResetCommand_RealUpstreamResetDispatched — end-to-end
    via taloscommands.Commands -> wrapTalosCommand so a dispatch
    refactor that silently drops the wrapper fails this test
    while the synthetic-cobra tests above stay green.
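Why the end-to-end dispatch test catches what the synthetic tests miss can be shown with a toy dispatch path. cmd, dispatch and the Wrapped marker are hypothetical stand-ins; talm's real path runs cobra commands from taloscommands.Commands through wrapTalosCommand. The point is that a test driving commands through the dispatcher fails if the reset entry disappears, while a test calling wrapResetCommand directly would keep passing.

```go
package main

import "fmt"

// resetCmdName mirrors the constant named in this PR.
const resetCmdName = "reset"

// cmd is a hypothetical stand-in for an upstream talosctl command.
type cmd struct {
	Name    string
	Wrapped bool // set by the wrapper so a test can observe it
}

// wrapResetCommand here only marks the command; the real wrapper installs
// the PreRunE default and the help-text overrides.
func wrapResetCommand(c *cmd) { c.Wrapped = true }

// dispatch is a hypothetical per-command hook: every upstream command
// passes through it and name-specific wrappers are applied here. Deleting
// the reset branch is the regression an end-to-end test should catch.
func dispatch(c *cmd) *cmd {
	if c.Name == resetCmdName {
		wrapResetCommand(c)
	}
	return c
}

func main() {
	upstream := []*cmd{{Name: "dashboard"}, {Name: resetCmdName}, {Name: "version"}}
	for _, c := range upstream {
		fmt.Printf("%s wrapped=%v\n", c.Name, dispatch(c).Wrapped)
	}
}
```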

README gains a 'talm reset — META-preserving default' subsection
under 'Using talosctl commands' with all three call shapes (safe
default, explicit destructive opt-in, operator-specified narrower
scope) so the divergence is discoverable from operator docs.

docs/manual-test-plan.md section H is rewritten around the new
default. H2 covers the safe-default headline path; H2a-H2d add
boundary cases (explicit destructive --wipe-mode=all or
--wipe-mode=system-disk, narrower operator list,
--graceful=false, modeline-bearing project root). Each carries an
explicit regression-anchor line so future operators running the
plan know what a wrapper-side regression would look like.

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
@lexfrei force-pushed the feat/reset-meta-safe-default branch from 99393d9 to a5d61ea on May 12, 2026 18:14
@lexfrei self-assigned this May 12, 2026
@lexfrei merged commit 020d209 into main May 12, 2026
8 checks passed


Development

Successfully merging this pull request may close these issues.

talm reset default --wipe-mode=all destroys META; selective wipe is the friendlier default

1 participant