
feat: generic wizard + /config hub, TUI overhaul, structural refactor, e2e uplift + docs sync #66

Merged: agjs merged 58 commits into main from feat/generic-wizard on Jul 5, 2026

Conversation

agjs commented Jul 3, 2026

Owner

This PR grew well past its original scope (generic wizard + /config) — it now carries four related workstreams, each validated green (typecheck + lint + ~1,930 unit tests + 5 real-PTY e2e suites) before landing:

1. Generic wizard + /config settings hub (original scope)

  • Reusable wizard primitive (single/multi-select, text fields, review step) driving /setup and the new /config settings hub: every setting shows a one-line description + live value — model (switch/add), mode, gate, editable scope, web tools, TDD.
  • /help capability browser; palette-run commands no longer ghost their names into the input.

2. TUI overhaul

  • Gradient TSFORGE banner + clean-slate boot; compact startup hint.
  • Hybrid conversation UI: rounded user bubbles + left-rail agent cards; display-width-accurate rail wrapping (emoji/CJK), streaming-safe (makeAgentRail); persistent input gutter; styled plan-mode chips.
  • Resize debounce (no stranded status bars), plan-first mode by default with approval gating.

3. Structural refactor (external review, all 6 items)

  • detect-gate.ts (1,049 lines) → src/gate/* modules; staged web gate with per-stage ━━ label ━━ banners and stop-on-first-failure.
  • Web interfaces are bare PascalCase (core keeps I-prefix); settleGate decomposed into tested steps; ILoopCtx grouped into ctx.tool/ctx.gate; env-gated trace() on silent degrade paths.
  • Found en route: --version/--help used to boot a session with the flag as the task — now print-and-exit.

4. Cleanup wave: e2e uplift + god-file splits + docs sync

  • Shared Python e2e harnesses (scripts/lib/ptyharness.py, itermharness.py): dedupes ~180 lines across 7 scripts, buffer-aware polling instead of blind sleeps, guaranteed iTerm2 window teardown, TSFORGE_E2E_MODEL override.
  • New real-terminal coverage: e2e-editor-pty.py (typing/backspace/Alt-Enter, bracketed paste must-not-submit, @ picker filter→select→submit, long-line wrap) + two new live-iTerm2 scenarios (paste, @ picker interaction). iTerm2 suite verified live: 25/25 + 6/6.
  • Splits (behaviour-preserving, all external exports preserved): cli.ts 2,938→562 across 7 cli/* modules; Session sheds loop/staged-build.ts (host-interface seam, unit-tested) + loop/model-call.ts (plan mode's read-only tool filter now has direct tests); editor @-completion state machine extracted + tested without stdin.
  • Docs: flags.mdx now documents every env var (verified against read sites), /trace + /setup in the command table, staged-gate section, orphaned page re-linked; Astro build green (46 pages).

Testing: every task committed only after full bun run validate (includes the 5 PTY suites); the editor/REPL refactors additionally verified in real iTerm2 against a live model. Test count 1,890 → 1,931.

agjs added 5 commits July 3, 2026 15:17
…nal review, title)

render/wizard.ts was coupled to setup. Generalize it (its pure model + hardened
alt-screen/raw-mode driver stay):
- new `text` step kind: default/placeholder, secret masking, validate() that
  blocks confirm; char/erase in the pure reducer; caret + inline error render
- parameterized `title` (was hardcoded "tsforge setup")
- optional review screen (`review:false` applies on the last step's confirm)
- results now include `text` (+ `textValue` helper); overview shows text answers
- driver takes an options object {title, review, extra, out}; `b`/`q` are
  back/cancel except on a text field (where they're literal input)
…ons object

Both callers pass {title, extra} instead of positional args — setup keeps its
'tsforge setup' header; scaffold now gets a correct 'tsforge scaffold' header
(it previously inherited the hardcoded setup title). Behavior-preserving for setup.
b/q and printable chars now decode as text input ({char}); the driver maps b/q to
back/cancel only on non-text steps. Backspace decodes as erase.
… gate

Spawns the wizard in a real pty, picks a single-select, erases the default and
types into a text field, confirms; asserts frames + final {single, text}. Wired
into e2e:pty so it runs on every validate/CI.

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request generalizes the wizard primitive to support a new text step kind, parameterized titles, and an optional review screen, refactoring /setup and /scaffold to use it. It also adds a real-PTY end-to-end test harness. The review feedback highlights two key issues in the interactive driver: first, pressing the Spacebar on a text step is incorrectly handled as a toggle action instead of typing a space; second, the text step hints misleadingly suggest using b and q for navigation and cancellation, which are instead captured as literal text inputs.


Comment thread packages/core/src/render/wizard.ts Outdated
Comment thread packages/core/src/render/wizard.ts Outdated
agjs added 2 commits July 3, 2026 15:35
… review)

Gemini review:
- HIGH: space on a text step decoded to "toggle" (a no-op) — you couldn't type a
  space. On a text step, EVERY printable ASCII char (0x20–0x7e) is now literal
  input, incl. space/b/q. Bounded at 0x7e so DEL (backspace) still decodes as erase.
- MEDIUM: text-step hints falsely showed "b back / q cancel" (those are typed).
  Hints now read "type to edit · ← back · enter continue · esc cancel"; ← is a real
  back key for text steps (left-arrow is otherwise unused while editing).

Guards: reducer test (space types into a field), actionFor DEL boundary, and the
real-pty e2e now types a value WITH a space ("x y") to lock the regression.
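
The decode rule described above can be sketched roughly as follows. This is a minimal illustration, not the code in render/wizard.ts: the `StepKind` and `Action` shapes and the signature of `actionFor` are assumptions, and only the printable range (0x20 to 0x7e), the DEL boundary, and the left-arrow back key come from the commit message.

```typescript
// Hypothetical shapes; the real reducer/driver types may differ.
type StepKind = "single" | "multi" | "text";
type Action =
  | { kind: "char"; char: string }
  | { kind: "erase" }
  | { kind: "toggle" }
  | { kind: "back" }
  | { kind: "cancel" }
  | { kind: "none" };

function actionFor(key: string, step: StepKind): Action {
  const code = key.length === 1 ? key.charCodeAt(0) : -1;
  // DEL (0x7f) sits just above the printable range, so it still erases.
  if (code === 0x7f) return { kind: "erase" };
  if (key === "\x1b") return { kind: "cancel" };
  if (step === "text") {
    // Every printable ASCII char is literal input, including space, b and q.
    if (code >= 0x20 && code <= 0x7e) return { kind: "char", char: key };
    // Left-arrow is otherwise unused while editing, so it acts as "back".
    if (key === "\x1b[D") return { kind: "back" };
    return { kind: "none" };
  }
  // Non-text steps keep the shortcut keys.
  if (key === " ") return { kind: "toggle" };
  if (key === "b") return { kind: "back" };
  if (key === "q") return { kind: "cancel" };
  return { kind: "none" };
}
```

Keeping the upper bound at 0x7e is what lets backspace keep its meaning on a text step while space, b and q become ordinary characters.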
…c wizard

First consumer of the generic wizard: an in-harness settings menu so users never
hand-edit ~/.tsforge/models.json.

- cli/config-menu.ts: settings surface built from wizard steps — buildConfigMenu
  (switch/add), buildModelPickStep, buildAddModelSteps (name/baseUrl/model/apiKey;
  apiKey masked + optional; required-field validation), draftToEntry + addModel
  (pure). Persists via saveModelsConfig/setActiveModel; hot-swaps the provider via
  an injected reconfigure. Mode / feature-toggle groups slot in later.
- cli.ts: /config → runConfigCommand, extracted to handleConfig for the complexity
  cap; suspends the REPL editor's stdin around the wizard via a repl-scoped
  editorControl (mirrors resizeEditor); applies the result live.
- commands.ts: /config in the registry.

Tests: config-menu pure builders/validators/addModel; a real-pty e2e
(scripts/e2e-config-pty.py, in the gate) drives the add-model flow end to end and
asserts models.json persisted + active + provider hot-swapped.
agjs commented Jul 3, 2026

Owner Author

Added /config — the first consumer of this wizard (0248358).

  • In-harness settings menu so users never hand-edit ~/.tsforge/models.json: switch the active model, or add a model via the new text-step wizard (name / baseUrl / model / masked+optional apiKey, required-field validation). Persists via saveModelsConfig/setActiveModel and hot-swaps the running provider.
  • runConfigCommand suspends the REPL editor's stdin around the wizard (new repl-scoped editorControl, mirroring resizeEditor); extracted to handleConfig to stay under the complexity cap.
  • v1 scope = the Model group; mode + feature-toggles + docs-from-registry are follow-ups (registry pattern makes them one entry each).

Testing: pure builders/validators/addModel units + a real-pty e2e (scripts/e2e-config-pty.py, in the gate) that drives the actual add-model flow (menu → 4 fields → review → apply) and asserts models.json persisted + active + provider hot-swapped.

bun run validate: 1854 pass / 0 fail, all three pty e2e suites ALL PASS.

agjs changed the title from "feat(wizard): generic reusable wizard primitive (text steps, optional review, title)" to "feat(wizard+cli): generic wizard primitive + /config settings menu" on Jul 3, 2026
agjs added 20 commits July 3, 2026 16:47
…ncel

Rebuild /config as a single owned-stdin menu loop (no nested overlays), which
fixes the reported bugs and makes it a real settings hub.

Bug fixed (reported): a REPL-launched wizard called stdin.pause() on exit —
because the editor owns stdin via a `data` listener (no keypress listeners), the
wizard wrongly thought it owned raw mode. pause() emptied the event loop and
QUIT tsforge on cancel/back/apply. Fix: runWizard gains `manageInput` (default
true); REPL callers (/config, and /setup from the editor) pass false so they
never seize/pause stdin. Also removes the 'b'-leaks-into-input class (no nesting
+ clear the editor buffer on resume).

/config is now a hub (cli/config-menu.ts) — one keypress session, grouped
settings, each with a one-line description + live value, applied immediately:
- Model: switch active (cycles models.json) + add a model (inline text fields,
  masked optional apiKey, validation) -> saveModelsConfig + live reconfigure
- Behavior: mode (plan/normal), gate command, editable scope (session)
- Tools: web / TDD / script toggles (env, live for subsequent turns)
Pure helpers unit-tested; the interactive loop covered by a REAL-REPL pty e2e
(open /config via the palette; cancel-doesn't-quit, live mode toggle, add-model
persist). Removed the obsolete standalone harness. Docs updated. validate green
(1858 pass; all three pty suites pass).
…ive)

- The auto-detected gate command is a huge multi-line tsc+eslint+test string; it
  rendered verbatim and blew the whole /config menu layout out. Menu rows now clamp
  each value to one line (whitespace-collapsed, ellipsis) via oneLine().
- Web tools now default ON in the interactive REPL (an assistant that can't look
  things up is silly). Only a default — an explicit TSFORGE_WEB (incl. "0") wins,
  and one-shot/headless/eval never run repl() so they stay offline+deterministic.

Test: oneLine unit test (truncate + collapse newlines); validate green (1859 pass,
all pty suites).
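
A minimal sketch of the clamp described above, assuming a signature of `oneLine(value, max)`; the default width and the exact ellipsis handling are guesses rather than the project's values.

```typescript
// Collapse all whitespace runs (including newlines) to single spaces, then
// truncate to `max` characters with a trailing ellipsis. `max` is an assumed
// parameter; the real helper may derive the width from the menu layout.
function oneLine(value: string, max = 40): string {
  if (max <= 0) return "";
  const flat = value.replace(/\s+/g, " ").trim();
  if (flat.length <= max) return flat;
  return flat.slice(0, max - 1) + "…";
}
```

Length is counted in UTF-16 code units here, which is adequate for a shell command string but would need a display-width count for emoji or CJK values.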
/config is now the single home for what a human actually configures, each
setting with its own visible one-line description (the config screen IS the
docs). Removed nonsensical toggles — nobody disables code navigation or git
context — those stay env-only for eval/CI.

- config-menu: per-setting descriptions rendered under every row; add Update
  check toggle; Web tools default-on (interactive) surfaced live.
- Delete experimental ENV flags + all consumers/tests: TSFORGE_LEGACY_FEEDBACK,
  TSFORGE_NO_ASTGREP, TSFORGE_FORCE_TOOLS, TSFORGE_SIMPLICITY, TSFORGE_CONTRACT.
- Graduate to always-on (flag deleted): hashline, TTSR, LSP write feedback.
- Remove the now-dead yield_status machinery (only the deleted forced-tools
  experiment advertised it): tool spec, dispatch stub, policy class, session
  resolveYieldCalls.
- Eval sweep dims trimmed to live flags (git/script/web).
- Docs: flags.mdx points feature toggles at /config; purge deleted flags across
  uplift/eval/greenfield/lsp/quality/web docs.

Verified: bun run validate green (typecheck+lint+format, 1842 tests, 3 pty
suites) + real iTerm2 GUI drive of /config.
Fixes the double-typed text when entering values in /config (e.g. Add a model).
The palette launches /config via a fire-and-forget runLine then resume()s the
editor in its `finally`, re-activating it UNDERNEATH the overlay so every key was
echoed into the editor's pinned input row on top of the overlay's own render.
Add an `inert` input-gate to the editor that resume() does NOT clear; the /config
overlay sets it, so the stray resume can't re-activate the editor. Regression
tests in the real-PTY e2e: typed text renders once, and the editor works again
after /config closes.
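
The inert gate can be pictured as a second flag that `resume()` deliberately leaves alone. The class below is a hypothetical sketch of that idea, not the editor's actual API; the names `InputGate`, `setInert` and `accepts` are illustrative.

```typescript
// Two independent reasons the editor may ignore keys:
//  - suspended: toggled by suspend()/resume()
//  - inert: set by an overlay such as /config, and NOT cleared by resume()
class InputGate {
  private suspended = false;
  private inert = false;

  suspend(): void {
    this.suspended = true;
  }

  // A stray resume() (e.g. from a palette `finally`) cannot clear `inert`.
  resume(): void {
    this.suspended = false;
  }

  setInert(on: boolean): void {
    this.inert = on;
  }

  accepts(): boolean {
    return !this.suspended && !this.inert;
  }
}
```

With this shape, the overlay owns the `inert` flag for its whole lifetime, so a fire-and-forget caller resuming the editor underneath it has no effect until the overlay itself closes.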

Also trims /config to only genuine human choices:
- Remove Script tool + Update check toggles (eval/kill-switch knobs, not settings).
- Update check now ALWAYS runs (interactive, non-CI; respects NO_UPDATE_NOTIFIER);
  TSFORGE_NO_UPDATE_CHECK deleted. TSFORGE_NO_SCRIPT kept as an env kill-switch.
- e2e scripts switched from TSFORGE_NO_UPDATE_CHECK to NO_UPDATE_NOTIFIER offline.
- Docs updated: /config = model, mode, gate, scope, web, TDD; eval/CI-only knobs
  (NO_LSP_TOOLS/NO_GIT_TOOL/NO_SCRIPT) documented separately.

Verified: bun run validate green (typecheck+lint+format, tests, 3 pty suites) +
isolated pty repro (marker renders 1x, was 2x).
Audited all 43 doc pages against the current source. Fixes:
- plan-mode / interactive: `--plan` accurately described (forces plan for an
  interactive session, overriding a repo's autonomous policy.mode; ignored by
  one-shot/headless) — was overstated/ambiguous.
- model-agent: add the `script` tool (programmatic tool calling) to the tool table.
- spec-runner / commands: eval sweep examples used the removed `ttsr,hashline`
  dimensions → live dims (`git`/`script`).
- validation: web-build turn cap 180 → 400 (loop.constants.ts webMaxTurns).
- rule-packs: `generic-ts` is an always-on pack (core TS safety), not a
  "detection label only" — moved into the always-on table.
- flags: document TSFORGE_BOOT_URL/TIMEOUT defaults (http://localhost:3000/, 15000ms).
- roadmap: "shipped through 0.18" → 0.27; Road-to-1.0 sweep example uses live dims.

30+ pages verified clean. Docs build green (46 pages).
Design for making tsforge's capabilities discoverable in-session: /help becomes
an actionable capability browser over a self-describing registry; scaffold
(boringstack/astro/vite) + recipes brought into the REPL; an anti-drift test that
fails if a command or model tool ships without a discovery home.
…menus

Replace /config's alt-screen menu with a compact inline dropdown above the
input row (matching the @file picker pattern). The new `inline-menu.ts` module
provides a reusable FLAT menu driver + formatter:

- formatMenuRows(rows, cursor, columns, color) returns a complete overlay
  block: windowed ≤8 rows with scroll indicators, divider, selected row's
  description, and footer hint. No alt-screen, no raw-mode toggle.

- runInlineMenu(rows, deps) owns keypress and navigates ↑/↓, Enter to
  select, Esc to close. Resolves to row index or null.

- Config-menu migrated to use inline-menu via IConfigMenuView callbacks
  (render/close), injected by cli.ts handleConfig. Edit sub-view uses the
  same overlay pattern inline.

- All behavior preserved: toggles stay open + re-render (cursor keeps row),
  text fields inline with validation, editor suspend/resume + inert gate
  (no double-typed text), model persistence to models.json.

Tests: formatMenuRows windowing test added, config-menu 13 pass, e2e 15/15.
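
The windowing part of formatMenuRows might look roughly like the sketch below. `clampIndex` is named elsewhere in this PR, but its signature here is assumed, and `menuWindow` with its centering policy is purely illustrative.

```typescript
interface MenuWindow {
  start: number; // first visible row (inclusive)
  end: number; // one past the last visible row
  moreAbove: boolean; // show an "above" scroll indicator
  moreBelow: boolean; // show a "below" scroll indicator
}

function clampIndex(index: number, length: number): number {
  if (length <= 0) return 0;
  return Math.min(Math.max(index, 0), length - 1);
}

// Pick at most `maxRows` rows around the cursor, pinned to the list edges.
function menuWindow(total: number, cursor: number, maxRows = 8): MenuWindow {
  const size = Math.max(1, Math.min(maxRows, total));
  const at = clampIndex(cursor, total);
  const lastStart = Math.max(total - size, 0);
  const start = Math.min(Math.max(at - Math.floor(size / 2), 0), lastStart);
  const end = Math.min(start + size, total);
  return { start, end, moreAbove: start > 0, moreBelow: end < total };
}
```

A caller could pass a smaller `maxRows` on short terminals so the overlay never exceeds the available height.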
Two coordinated changes to the capability browser, plus the matching test updates:

1. Remove model-tools + passive machinery:
   - Delete toolCapabilities() and TOOL_METADATA entirely
   - Remove "passive" from CapabilityKind (now "command" | "wizard")
   - Remove detail field from ICapability
   - buildCapabilities returns only command + wizard rows (no tools)

2. Migrate /help to inline-menu (same as /config):
   - Replace owned-menu driver with inline-menu + formatMenuRows
   - capabilityRows now returns IMenuRowData (label, hint, describe)
   - Remove showDetail from ICapabilityMenuDeps
   - handleHelp follows handleConfig pattern: suspend→runCapabilityMenu→resume
   - Uses statusBar.setOverlay/clearOverlay for rendering

3. Tests updated:
   - Delete "every model tool has a discovery home" anti-drift test
   - Keep "every slash command has a discovery home"
   - capability-menu tests use formatMenuRows instead of owned-menu

Note: owned-menu.ts remains (still used by repl-recipe.ts).

All tests pass; e2e config script: 15/15 PASS.
…, add title

The inline menus (/help, /config, recipes) had three rendering bugs:
- STACKING: the overlay could exceed the terminal height, so the status bar's
  relative-redraw couldn't climb to the (scrolled-off) region top to clear it and
  each redraw left a copy. Now the visible-row count is bounded to the terminal
  height, and EVERY overlay line is clipped to the column width (an unclipped
  describe line wrapped, desyncing the row bookkeeping and compounding it).
- STYLING: every row was painted bold (then, worse, all-blue). Now only the
  SELECTED row is brand+bold; all other rows are plain default text (legible).
- LAYOUT: added a bold title at the top; the selected row's description stays at
  the bottom.

Verified in a REAL 14-row terminal (scripts/e2e-help-menu-pty.py, wired into
e2e:pty): no stacking, exactly one styled row, title on top. /config e2e 15/15.
…h on cancel

The / command palette was the last menu on the alternate screen. It now renders as
the same compact inline overlay as /help and the @ picker, reusing formatMenuRows:
command names as rows, the selected command's summary at the bottom, and the live
query as the overlay title (/co). No alt-screen.

Also fixes the reported bug where cancelling the palette (Esc or backspace-past-
empty) left the trigger '/' stuck in the editor — the cancel path now clears it.

clampIndex moved to inline-menu (the menu core) with a re-export from command-menu
so existing importers are untouched (avoids an import cycle).

Verified in a real terminal: inline (no ESC[?1049h), filters, Esc closes cleanly,
no stray '/', editor live after (7/7). config e2e 15/15, help e2e 6/6, unit green.
e2e palette-open markers updated for the inline title.
agjs added 25 commits July 5, 2026 09:22
The manageInput:false path (REPL-launched wizard) was untested — the exact
invariant that keeps a wizard from pausing stdin and quitting the process.
Extract the inline decision into a pure exported wizardOwnsRawMode() and unit
test it: manageInput:false / non-TTY / pre-existing keypress listeners / no
setRawMode all yield false; only a standalone TTY owns raw mode.

Regression: wizard.test.ts (wizardOwnsRawMode ownership rule).
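
The ownership rule listed above could be expressed as a pure predicate along these lines. The commit only names `wizardOwnsRawMode()`; the parameter list and the `StdinLike` interface are assumptions made for illustration.

```typescript
// Minimal view of the stdin surface the rule needs (hypothetical interface).
interface StdinLike {
  isTTY?: boolean;
  setRawMode?: (on: boolean) => unknown;
  listenerCount(event: string): number;
}

// Only a standalone TTY wizard owns raw mode; every other case yields false.
function wizardOwnsRawMode(stdin: StdinLike, manageInput = true): boolean {
  if (!manageInput) return false; // REPL-launched: the editor owns stdin
  if (stdin.isTTY !== true) return false; // piped / non-interactive
  if (typeof stdin.setRawMode !== "function") return false;
  if (stdin.listenerCount("keypress") > 0) return false; // someone else is reading keys
  return true;
}
```

Because the predicate takes a plain object, each of the four false cases can be pinned with a fake stdin rather than a real terminal.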
The greenfield section still described the removed contract feature
(contracts/<id>.md basename guard, TSFORGE_CONTRACT, greenfield-contract.test),
and the tools section still listed the removed yield_status tool. Bring the manifest
back in line with the code it describes.
Startup redesign:
- Replace the anvil emblem with a large ANSI-Shadow TSFORGE wordmark painted
  with a per-column cyan→indigo→violet gradient (new truecolor() helper).
- Clear the screen + scrollback before the banner so it never renders on top of
  leftover shell output (env dumps, prior command noise).
- Drop the cryptic cwd/scope/gate/session block (those live in /config); show a
  single compact hint bar + styled no-config / plan-mode nudges instead.

Input prompt:
- The › prompt now persists while typing: it's painted as a hanging gutter in
  front of the editor block (was only on the pre-typing placeholder row). The
  editor reserves PROMPT_COLS so wrapping matches the visible width and no row
  exceeds the terminal.

Tests: banner gradient + wordmark; status-bar prompt-in-editor-mode; editor-e2e
and render-e2e cursor/wrap math updated for the 2-col gutter.
Redesign the conversation transcript (the user asked for bubbles, not blue text
+ a left bar):
- USER messages render as a full rounded bubble (╭─ you ─╮ / │ … │ / ╰──╯),
  sized to content and capped at the terminal width (word-wrapped).
- AGENT messages render as a left-accent card: a rounded ╭ <model> cap, every
  body line prefixed with a │ rail, a ╰ cap when the turn ends. Streaming-
  friendly (any width; code blocks/tables render cleanly inside the rail).

Gap fix: the live stream previously stacked a label newline + the stream
separator + the model's own leading blanks, leaving a big empty block before
each answer. railAgentChunk now swallows leading blank lines until real content,
so the answer starts right under the cap.

Shared helpers (userBubble, agentCardTop/Bottom, agentBar, agentCardBody,
wrapToWidth) power both the settled/replay path (renderMessage) and the live
streaming path (cli.ts). Regression: tests/message-render.test.ts.
… rail

Two bugs in the bubble/prompt rendering:
- Plan-mode flow wrote its hints via process.stdout.write, bypassing the pinned
  StatusBar region — corrupting the input row and stranding a › in scrollback.
  Route all four plan-flow writes through echo() (→ statusBar.writeStream).
- Streamed agent text wrapped at the terminal edge, so continuation rows escaped
  the card's │ rail. railAgentChunk now soft-wraps at the card's inner width
  (columns − rail), ANSI-escape-aware, so text can never spill past the rail.
The soft-wrap used a naive 1-col-per-char count and filled the last terminal
column, so wide chars (emoji/CJK) and auto-margin terminals still wrapped the row
themselves — dropping the │ rail on the continuation. Wrap now:
- counts each char by displayWidth (emoji/CJK = 2 cols), and
- leaves the last column empty (columns − rail − 1) so the terminal never wraps.
Also guards a missing/zero stdout.columns with an 80-col fallback.
The rail-wrap logic was an untestable inline closure in cli.ts. Extract it to
render/agent-rail.ts as makeAgentRail(rail, innerWidth) — a stateful streaming
wrapper (state persists across token chunks) that prefixes every visual line with
the │ rail, swallows the leading gap, keeps the rail on interior blanks, and
soft-wraps at the card's inner width (display-width-accurate; ANSI escapes pass
through free). Content budget now leaves rail + 2 spare columns so no terminal
wraps a row and drops the rail.

Regression: tests/agent-rail.test.ts — rail on every wrapped row + width bound at
80/92/120 cols incl. emoji/CJK, gap-swallow, interior-blank rail, split-chunk.
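
A simplified sketch of the streaming rail wrapper described above. It keeps the `makeAgentRail(rail, innerWidth)` shape from the commit message, but everything else is an assumption: the width estimate is a crude range check rather than the project's displayWidth, ANSI escape handling is omitted, and the wrap is per character rather than per word.

```typescript
// Rough column estimate: common CJK and emoji ranges count as 2 columns.
function displayWidth(ch: string): number {
  const cp = ch.codePointAt(0) ?? 0;
  const wide =
    (cp >= 0x1100 && cp <= 0x115f) ||
    (cp >= 0x2e80 && cp <= 0xa4cf) ||
    (cp >= 0xac00 && cp <= 0xd7a3) ||
    (cp >= 0xf900 && cp <= 0xfaff) ||
    (cp >= 0xff00 && cp <= 0xff60) ||
    (cp >= 0x1f300 && cp <= 0x1faff);
  return wide ? 2 : 1;
}

// Returns a function that rails one streamed chunk; state persists across calls.
function makeAgentRail(rail: string, innerWidth: number): (chunk: string) => string {
  let started = false; // false until the first non-blank character arrives
  let atLineStart = true; // the next character begins a new visual row
  let col = 0; // columns used on the current row

  return (chunk: string): string => {
    let out = "";
    for (const ch of chunk) {
      if (!started) {
        if (ch.trim() === "") continue; // swallow the leading gap
        started = true;
      }
      if (ch === "\n") {
        if (atLineStart) out += rail; // keep the rail on interior blank lines
        out += "\n";
        atLineStart = true;
        col = 0;
        continue;
      }
      const width = displayWidth(ch);
      if (!atLineStart && col + width > innerWidth) {
        out += "\n"; // soft-wrap before the terminal would
        atLineStart = true;
        col = 0;
      }
      if (atLineStart) {
        out += rail;
        atLineStart = false;
      }
      out += ch;
      col += width;
    }
    return out;
  };
}
```

Because the column counter lives in the closure, a token split across two chunks wraps at the same place as the same text delivered in one piece.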
…breaks)

drive()'s finally now calls closeAgentTurn(), so the ╰ bottom cap is written the
moment streaming ends. Post-turn hints (plan-mode notice, PLAN review, folding
changes) then land BELOW the sealed card instead of inside it — which had left
the hint un-railed between the last body line and the cap, visually breaking the
│ rail. Idempotent with the existing close in runLine's finally.
The post-turn plan hint was plain full-width text that read like a debug line.
Replace it with a compact styled chip matching the startup plan line: brand ◆ plan
(or ◆ plan ready), dim helper text, green approve. Two variants driven by whether
the agent has proposed a plan yet.
…m 5)

Add src/lib/trace.ts — trace(scope, err) gated by TSFORGE_TRACE/TSFORGE_DEBUG
(file path or stderr; no-op when unset). Wire the 10 silent degrade catches in
turn.ts (6), session.ts, run.ts, detect-gate.ts (2) to trace the swallowed error
instead of vanishing — keeps the degrade behaviour, adds observability. Prod stays
silent; TSFORGE_TRACE=1 surfaces what quietly failed (e.g. buildTsService).

Regression: tests/trace-util.test.ts.
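
One way to picture the env-gated helper is the sketch below. Only `trace(scope, err)` and the two variable names come from the commit; the factory `makeTrace`, the injected sinks, the line format, and the rule that "1" or "true" means stderr while any other value is treated as a file path are all assumptions.

```typescript
// Injected sinks keep the sketch free of direct process/fs calls.
interface TraceSinks {
  stderr: (line: string) => void;
  appendFile: (path: string, line: string) => void;
}

function makeTrace(
  env: Record<string, string | undefined>,
  sinks: TraceSinks,
): (scope: string, err: unknown) => void {
  const target = env.TSFORGE_TRACE ?? env.TSFORGE_DEBUG;
  return (scope: string, err: unknown): void => {
    // Unset (or explicitly "0"): stay silent, exactly like the old catch.
    if (target === undefined || target === "" || target === "0") return;
    const detail = err instanceof Error ? (err.stack ?? err.message) : String(err);
    const line = `[trace:${scope}] ${detail}\n`;
    if (target === "1" || target === "true") sinks.stderr(line);
    else sinks.appendFile(target, line); // assumed: any other value is a path
  };
}
```

A degrade site would then keep its fallback and add a single call in the catch, for example `catch (err) { trace("turn", err); }`, so behaviour is unchanged when tracing is off.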
…tem 1)

Pure move, no logic change — behaviour-preserving (all existing gate/loop/session
tests pass unchanged). detect-gate.ts split into focused src/gate/ modules:
  types.ts        IGate, IFileLintProblem, FileLinter
  tool-paths.ts   resolveToolBin + the bundled BIN/CONFIG/CHECK paths
  tsconfig.ts     strict tsconfig overlays + ensureWebGateTsconfig + tscPart
  shell.ts        shSingleQuote + packEnvPrefix
  test-discovery.ts  discoverTestCommand + webTestProbe
  linter.ts       makeFileLinter + formatFile + prettierWriteCommand
  core-gate.ts    buildGate + buildCoreFix
  web-gate.ts     buildWebGate/TypeGate/TscCheck/Fix + WEB_FRAMEWORKS/WEB_PACKS
  index.ts        public barrel
Web scaffolding (scaffoldWeb/installWebDeps/webGuidance/BUILD_PREAMBLE) → src/scaffold/web-scaffold.ts.

Path constants moved a directory deeper, so import.meta.dir joins gained a level
(verified STRICT_CONFIG/STRICT_WEB_CONFIG/BROWSER_CHECK resolve to the package
root). Updated all 18 import sites (4 src, 3 scripts, 11 tests).
The web gate was one opaque `build && tsc && lint && stubs && format && tests &&
render` chain — a failure buried WHICH stage broke in a wall of mixed output.

Add scripts/staged-gate.ts: a bundled runner (mirrors browser-check/stub-check)
that takes a base64 JSON stage list, runs each stage sequentially via the shared
runShellCommand, prints a `━━ <label> ━━` banner + streams output live, and on the
first failure prints `✗ <label> FAILED (exit N)` and stops with that exit code.
buildWebGate now emits `bun staged-gate.ts <payload>` with the SAME commands in the
SAME order (type-aware lint is its own stage) — identical stop-on-first-failure
semantics, legible per-stage feedback. base64 keeps the quoted/&&/env-prefixed
stage commands intact through the shell with zero escaping; onChunk forwards both
stdout and stderr so the gate parser still sees every error.

Regression: tests/staged-gate.test.ts (banners, stop-on-fail, exit-code preserved,
stderr forwarded, malformed payload → exit 2). Web-gate tests decode the payload.

Verified end-to-end: a real web gate on a depless dir prints the vite-build banner
then ✗ vite build FAILED.
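
The runner's control flow can be sketched as below. This is an illustration under assumptions, not scripts/staged-gate.ts: the `Stage` shape, the synchronous `RunStage` callback standing in for runShellCommand, and the wording of the malformed-payload message are all hypothetical. The banner format, stop-on-first-failure, preserved exit code, and exit 2 on a malformed payload follow the commit message.

```typescript
interface Stage {
  label: string;
  command: string;
}

// Stand-in for the shared shell runner: returns the stage's exit code and
// forwards output (stdout and stderr alike) through onChunk.
type RunStage = (command: string, onChunk: (text: string) => void) => number;

function decodeStages(payload: string): Stage[] {
  const bytes = Uint8Array.from(atob(payload), (c) => c.charCodeAt(0));
  const parsed: unknown = JSON.parse(new TextDecoder().decode(bytes));
  if (!Array.isArray(parsed)) throw new Error("stage list must be an array");
  return parsed.map((s) => {
    if (typeof s?.label !== "string" || typeof s?.command !== "string") {
      throw new Error("each stage needs a label and a command");
    }
    return { label: s.label, command: s.command };
  });
}

function runStagedGate(payload: string, run: RunStage, out: (text: string) => void): number {
  let stages: Stage[];
  try {
    stages = decodeStages(payload);
  } catch {
    out("✗ malformed stage payload\n");
    return 2;
  }
  for (const stage of stages) {
    out(`━━ ${stage.label} ━━\n`);
    const code = run(stage.command, out);
    if (code !== 0) {
      out(`✗ ${stage.label} FAILED (exit ${code})\n`);
      return code; // stop on first failure, preserving the exit code
    }
  }
  return 0;
}
```

Passing the stage list as base64 means the commands never pass through shell quoting, which is the property the commit relies on for `&&` chains and env-prefixed commands.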
The web profile enforced I-prefixed interfaces (IButtonProps) — non-standard vs
React/shadcn/TanStack, so the model fought its training data every scaffold and
burned turns 'correcting' idiomatic names. Web interfaces now need only be
PascalCase: bare 'ButtonProps' AND 'IButtonProps' both pass. Core/library code is
unchanged — it still requires the I-prefix.

- namingRule (eslint-conventions.ts): web surface emits bare PascalCase (no prefix,
  no Register filter needed — bare already permits 'Register'). Covers both the gate
  and the write-time linter.
- strict.web.eslint.config.mjs: static fallback + header comment updated (resolves
  the review's contradiction — the comment claimed 'no I-prefix' while the rule
  enforced it; now both say bare).
- BUILD_PREAMBLE + web-templates guidance: instruct/illustrate bare names.

Regression: eslint-conventions.test.ts (web bare, core still I-prefix) +
gate-conventions.test.ts (real eslint: web accepts bare 'interface User', core
rejects the same file).
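
As a rough illustration of the split, the sketch below shows one possible per-surface option for the `@typescript-eslint/naming-convention` rule plus a plain predicate that mirrors the intent. The function names, the `Surface` type, and the simplified PascalCase regex are assumptions; the real namingRule in eslint-conventions.ts may be built differently.

```typescript
type Surface = "core" | "web";

interface InterfaceNamingOption {
  selector: "interface";
  format: ["PascalCase"];
  prefix?: string[];
}

// Web: bare PascalCase. Core: PascalCase with a required "I" prefix.
function interfaceNaming(surface: Surface): InterfaceNamingOption {
  return surface === "web"
    ? { selector: "interface", format: ["PascalCase"] }
    : { selector: "interface", format: ["PascalCase"], prefix: ["I"] };
}

// Plain predicate approximating what each option accepts (illustrative only).
function interfaceNameOk(name: string, surface: Surface): boolean {
  const pascal = /^[A-Z][A-Za-z0-9]*$/;
  if (surface === "web") return pascal.test(name);
  return name.startsWith("I") && pascal.test(name.slice(1));
}
```

Under this reading, `IButtonProps` still passes on the web surface simply because it is itself PascalCase, which matches the "both pass" behaviour described above.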
…iew item 4)

The ~220-line settleGate mixed auto-fix, gate execution, meta-rules, three
convergence guards, and feedback injection in one body. Extract each seam:
  autoFixStep(ctx) → string[]          (janitor fixers + what they changed)
  runGateStep(ctx, turn)               (validate + live-stream flush)
  runMetaRulesStep(ctx) → violations   (best-effort, change-scoped)
  checkStuck(ctx, state, errs, turn)   (the 3 guards; shared stuckResult shape)
  injectFeedback(...)                  (red-gate feedback + auto-fix notice)
settleGate is now a thin orchestrator; signature + IRunResult|null contract are
unchanged, so both drivers (run.ts / session.ts) are untouched.

Regression: tests/settle-steps.test.ts — checkStuck (converging run never trips,
persistent single error stops, unchanged whole set stops) + autoFixStep (no-op ⇒
[], a real task.fix rewrite ⇒ reported). Guard internals stay covered by
same-persist-guard.test.ts; existing loop/session suites pass unchanged.
ILoopCtx had grown into an 18-field grab-bag, and toolContextFor spread eight
fields one-by-one (...(x === undefined ? {} : {x})). Reshape:
  flat core     task, cwd, tsService, report, messages
  ctx.tool      signal, setupWeb, readOnly, policyMode, policyRules,
                interactive, mcpRegistry, touched   (ILoopCtxTool)
  ctx.gate      parse, lintFile, stackProfile, ruleOverrides, onGateChunk
                (ILoopCtxGate)
ctx.tool groups exactly the optional fields IToolContext threads through, so
toolContextFor is now { …identity, ...ctx.tool } — one spread, touched still
shared by reference. Sub-objects are always-present and mutable (the Session
flips policyMode/readOnly/signal/setupWeb/lintFile mid-run). Both construction
sites (session.ts, run.ts + policyCtxFields) nest the fields; write-guard and
all loop accessors updated.

Existing loop/session/tool-accounting suites pass unchanged — behaviour-preserving.
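
A cut-down sketch of the reshaped context, showing why one spread is enough once the optional fields live in `ctx.tool`. The field types are guesses, only a few of the listed fields are included, and `cwd` stands in for whatever identity fields the real toolContextFor sets.

```typescript
// Subset of the grouped optional fields (types are assumptions).
interface ILoopCtxTool {
  readOnly?: boolean;
  policyMode?: string;
  interactive?: boolean;
  touched?: Set<string>;
}

interface ILoopCtx {
  task: string;
  cwd: string;
  tool: ILoopCtxTool; // always present and mutable
}

interface IToolContext extends ILoopCtxTool {
  cwd: string;
}

// One spread replaces the per-field `...(x === undefined ? {} : { x })` chain.
function toolContextFor(ctx: ILoopCtx): IToolContext {
  return { cwd: ctx.cwd, ...ctx.tool };
}
```

The spread copies references, so a `touched` set placed in `ctx.tool` is the same object the tool sees, and a mid-run flip of `ctx.tool.readOnly` is picked up by the next call.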
…tor)

Neither flag was recognized: an unknown --flag fell through as a POSITIONAL, so
`tsforge --version` booted a session whose task was the literal string
"--version" — while install.sh's post-install message advertises `tsforge
--help  show flags`. Add --version/-V (prints `tsforge <version>` via the
existing currentVersion()) and --help/-h (a new pure cliUsage() kept next to the
flag tables in cli/args.ts), dispatched first in main().

Regression: cli.test.ts — both flags parse as flags (never a task); usage text
covers the advertised surface. Verified live: `tsforge --version` → tsforge 0.27.1.
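
The "dispatched first" behaviour could look roughly like this. `cliUsage()` is named in the commit, but its text here is a placeholder, and `handleInfoFlags` with its parameters is a hypothetical helper rather than the code in main().

```typescript
// Placeholder usage text; the real cliUsage() lives next to the flag tables.
function cliUsage(): string {
  return "usage: tsforge [flags] [task]";
}

// Returns true when an info flag was handled, so the caller exits before any
// session is booted and the flag is never treated as a task.
function handleInfoFlags(
  argv: string[],
  version: string,
  print: (line: string) => void,
): boolean {
  if (argv.includes("--version") || argv.includes("-V")) {
    print(`tsforge ${version}`);
    return true;
  }
  if (argv.includes("--help") || argv.includes("-h")) {
    print(cliUsage());
    return true;
  }
  return false;
}
```

Checking these flags before positional parsing is what prevents `--version` from falling through as a task string.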
Extract the duplicated read_until/stub-server/spawn/reap/alive blocks from the
four PTY e2e scripts into scripts/lib/ptyharness.py (~180 LOC of divergent
copies -> one module), and replace blind time.sleep() settles with buffer-aware
drain()/wait_for() so alive-checks fail fast and no redraw bytes are lost.

Scripts keep their scenarios verbatim; assertions unchanged. Flake gate:
bun run e2e:pty green 3 consecutive runs; full validate green (1890 pass).
… wrap)

New scripts/e2e-editor-pty.py (on ptyharness): the four editor surfaces that
previously had only in-process tests now run against the REAL binary in a real
pseudo-terminal — typing+backspace+Alt-Enter multiline into a submitted bubble,
bracketed paste with embedded CRs landing as ONE message (no per-line submits),
@ file-picker open→filter→select→submit, and 200-char wrap with exactly one
status bar in the final frame. Wired into e2e:pty (now 5 scripts).

20/20 checks, green 3 consecutive runs; full validate green.
…n scenarios

New scripts/lib/itermharness.py (osascript plumbing, model-aware BAR regex,
window() context manager) shared by the three iTerm2 scripts:
- every window is now closed via try/finally — no stranded GUI windows on a
  failing scenario
- model under test overridable via TSFORGE_E2E_MODEL (BAR regex derives from it)
- two NEW real-terminal scenarios in e2e-iterm-tui.py: bracketed paste with an
  embedded CR (must not submit), and @ picker filter+select landing the path in
  the input row

Verified live: 25/25 TUI checks + 6/6 plan-mode lifecycle against the real
model in real iTerm2; PTY suite unaffected (all green).
- staged-gate.test.ts resolved the runner via process.cwd(), so the suite
  failed when bun test ran from packages/core instead of the repo root —
  resolve relative to the test file instead (latent bug, found by running
  the suite from a different cwd)
- settle-steps' task.fix test sleeps 1s to move mtime; give it an explicit
  30s timeout so a loaded machine can't flake it past bun's 5s default
Pure move + re-import — same symbols, no logic changes:
- cli/repl.ts        the interactive REPL (repl, initReplSession, approvals)
- cli/model-setup.ts provider config/factory, /model, context-window probe
- cli/logging.ts     spinner, terminal Reporter, --log ledger, log paths
                     (module state now behind setInteractiveStream())
- cli/banner.ts      welcome banner, startup hint, plan chip, resume replay
- cli/gate-setup.ts  gate resolution (resumed > --accept > --web > auto);
                     BROWSER_CHECK now reused from gate/tool-paths
- cli/repl-commands.ts  /sessions /map /review /trace + metrics line
- cli/web-setup.ts   web scaffold + deps install progress

cli.ts keeps main(), runOnce(), and the one-shot modes (559 lines, was
2938). External import paths preserved via re-exports (providerConfig,
isApproval/isPlanApproval, spinner). Parity test now reads cli/repl.ts.

Full validate green (1890 pass + 5 PTY suites); live REPL boot smoke-tested
on the real binary.

- loop/staged-build.ts: the design→implement phase orchestration behind a
  narrow IStagedBuildHost seam (gate/tool swap, one send, one gate probe,
  one raw completion); Session methods become thin delegates. Now unit-
  tested against a fake host: gate+tools save/restore ordering, the
  interrupted short-circuit, the green-skip of phase 2, plan-note
  injection, and the from-disk type-contract re-injection.
- loop/model-call.ts: the two pure per-call decisions in askModel —
  selectThinking (forced > repairing > per-send > config precedence) and
  offeredToolsFor (plan mode's read-only tool filter + MCP ride-along) —
  extracted and pinned by direct unit tests (the read-only guarantee had
  none).
- Fixed en route: the staged-build design/implement prompts still told the
  model to I-prefix web interfaces, contradicting the scaffold guidance
  (web is bare PascalCase since a1bf032) — prompts now agree.
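
The two per-call decisions described for loop/model-call.ts can be sketched as pure functions. The function names come from the commit message; every type, field name and signature below is a hypothetical stand-in, and treating "MCP ride-along" as "MCP tools stay offered in plan mode" is an assumption.

```typescript
// Hypothetical stand-in types.
type ThinkingLevel = "off" | "low" | "high";

interface ThinkingInputs {
  forced?: ThinkingLevel; // hard override
  repairing?: ThinkingLevel; // level used while repairing
  perSend?: ThinkingLevel; // requested for this one send
  config: ThinkingLevel; // configured default
}

// Precedence: forced > repairing > per-send > config.
function selectThinking(inputs: ThinkingInputs): ThinkingLevel {
  return inputs.forced ?? inputs.repairing ?? inputs.perSend ?? inputs.config;
}

interface Tool {
  name: string;
  readOnly: boolean;
  source: "builtin" | "mcp";
}

// Plan mode offers only read-only tools, with MCP tools riding along (assumed).
function offeredToolsFor(mode: "plan" | "build", tools: readonly Tool[]): Tool[] {
  if (mode !== "plan") {
    return [...tools];
  }
  return tools.filter((tool) => tool.readOnly || tool.source === "mcp");
}
```

Keeping both decisions free of I/O is what makes the read-only guarantee cheap to pin with a direct unit test.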

session.ts 2118 → ~1930 lines. Full validate green (1921 tests + 5 PTY
suites).

editor/completion.ts owns the anchor/query tracking, dropdown navigation,
selection clamping, and accept-replaces-query-keeps-@ behaviour behind
createCompletion(); the controller feeds it key names and re-queries it
after edits. IEditorCompletionSource moved with it (re-exported from the
controller for existing importers).

Now directly unit-tested WITHOUT stdin (tests/editor-completion.test.ts,
9 tests over a real EditorBuffer): whitespace closes the mention, cursor-
before-anchor closes, clamped selection, empty-list accept is a no-op, and
the picked path lands as '@<path> '. The controller keeps the stdin loop,
key dispatch (already a data-driven table), history, and lifecycle.
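
The behaviours listed above (whitespace closes the mention, cursor-before-anchor closes, clamped selection, empty-list accept is a no-op, the picked path lands as '@<path> ') can be illustrated with a simplified model. This is a sketch under assumptions, not the createCompletion() implementation: all function names are hypothetical, and `anchor` is taken to be the index of the '@' that opened the mention.

```typescript
// Returns the text typed after '@', or null when the mention is closed.
function mentionQuery(text: string, anchor: number, cursor: number): string | null {
  if (cursor <= anchor) {
    return null; // cursor moved before the anchor
  }
  const query = text.slice(anchor + 1, cursor);
  return /\s/.test(query) ? null : query; // whitespace closes the mention
}

// Keeps the selection index inside the list bounds.
function clampSelection(index: number, length: number): number {
  if (length === 0) {
    return 0;
  }
  return Math.min(Math.max(index, 0), length - 1);
}

// Replaces the query (keeping the '@') with the picked path plus a trailing space.
function acceptCompletion(
  text: string,
  anchor: number,
  cursor: number,
  items: readonly string[],
  selected: number,
): { text: string; cursor: number } {
  if (items.length === 0) {
    return { text, cursor }; // empty list: accept is a no-op
  }
  const inserted = `@${items[clampSelection(selected, items.length)]} `;
  return {
    text: text.slice(0, anchor) + inserted + text.slice(cursor),
    cursor: anchor + inserted.length,
  };
}
```

Since these functions take plain strings and indices, they can be exercised without stdin, which is the property the new tests rely on.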

Note: the planned key-dispatch rework was based on a stale review claim of
a 387-line builder — the table is ~80 lines and already declarative, so it
stays. Full validate green (1930 tests + 5 PTY suites incl. the real-PTY
@-picker interaction).
…le-ops)

Wire the env-gated trace() into the silent catches the review flagged:
cli/model-setup (hostOf, detectContextWindow, warnDefaultModelOnRemote),
cli/logging (newestLogFile), runNotify, file-ops currentFileView, and the
editor's fire-and-forget palette/picker opens. Degrade behaviour unchanged
— TSFORGE_TRACE now shows WHAT quietly failed. Regression test: an
unreachable model endpoint leaves a scoped [cli.detectContextWindow] line
while still returning undefined.
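
An env-gated trace on a silent degrade path might look like the sketch below. Only the TSFORGE_TRACE variable and the bracketed `[cli.detectContextWindow]` scope come from the text above; the signatures, the `degrade` helper, the injectable `env`/`sink` parameters and the message format after the scope are assumptions made for illustration.

```typescript
type Env = Record<string, string | undefined>;
type Sink = (line: string) => void;

// Writes a scoped line only when TSFORGE_TRACE is set; otherwise stays silent.
function trace(
  scope: string,
  error: unknown,
  env: Env = process.env,
  sink: Sink = (line) => process.stderr.write(line + "\n"),
): void {
  if (!env.TSFORGE_TRACE) {
    return;
  }
  const message = error instanceof Error ? error.message : String(error);
  sink(`[${scope}] ${message}`);
}

// Hypothetical wrapper for a silent-degrade path: the failure is traced,
// and the caller still receives undefined, so behaviour is unchanged.
function degrade<T>(scope: string, fn: () => T, env?: Env, sink?: Sink): T | undefined {
  try {
    return fn();
  } catch (error) {
    trace(scope, error, env, sink);
    return undefined;
  }
}
```

Injecting `env` and `sink` keeps the regression test hermetic: it can assert on the emitted line without touching the real environment or stderr.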

Also fixed en route: agent.constants.ts still described the script tool as
'Opt-in (TSFORGE_SCRIPT)' — it has been ON by default with TSFORGE_NO_SCRIPT
as the kill-switch since 0.23.0.

Dropped from the plan (claims disproven while verifying): the hashline
parse swallow doesn't exist, and the snapshotMtimes/changedSince 'duplicate'
lives only in turn.ts (already traced) — no lib/file-diff.ts needed.

Full validate green (1931 tests + 5 PTY suites).

- reference/flags.mdx: document the missing env vars, each verified against
  its read site — TSFORGE_TRACE/TSFORGE_DEBUG/TSFORGE_EDITOR_DEBUG (new
  'Debug & tracing' section), TSFORGE_A11Y + TSFORGE_SCREENSHOTS (gate
  oracles), TSFORGE_WEB + TSFORGE_TDD env equivalents,
  TSFORGE_SCRIPT_MAX_CALLS/TIMEOUT_MS, TSFORGE_STATUS (--notify),
  TSFORGE_BASIC_INPUT, TSFORGE_PROPTEST_TIMEOUT_MS; clarify
  NO_UPDATE_NOTIFIER is the cross-tool standard. (TSFORGE_RPC_* stays
  undocumented: internal to the script tool's RPC bridge.)
- cli/interactive.mdx: add the missing /trace and /setup rows; note /quit.
- loop/gate-floor.mdx: new 'Staged gate progress & failures' section
  quoting the real runner strings (━━ banners, ✗ <label> FAILED (exit N),
  stop-on-first-failure).
- scaffold/web.mdx: web interfaces are bare PascalCase now, not I-prefixed.
- astro.config.mjs: input-editor.mdx was orphaned — added under Reference.
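
The staged-gate behaviour documented in loop/gate-floor.mdx (a ━━ banner per stage, a ✗ line on failure, stop-on-first-failure) can be sketched as below. The banner and failure strings follow the formats quoted above; the `Stage` shape and `runStaged` function are hypothetical, and real stages would spawn processes rather than call synchronous functions.

```typescript
// Hypothetical stage shape; `run` returns a process-style exit code.
interface Stage {
  label: string;
  run: () => number;
}

interface StagedResult {
  ok: boolean;
  failed?: string;
  output: string[];
}

// Runs stages in order, printing a banner before each one.
// The first non-zero exit code stops the run; later stages never start.
function runStaged(stages: readonly Stage[]): StagedResult {
  const output: string[] = [];
  for (const stage of stages) {
    output.push(`━━ ${stage.label} ━━`);
    const code = stage.run();
    if (code !== 0) {
      output.push(`✗ ${stage.label} FAILED (exit ${code})`);
      return { ok: false, failed: stage.label, output };
    }
  }
  return { ok: true, output };
}
```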

Docs build green (46 pages).

agjs force-pushed the feat/generic-wizard branch from 494a34e to af656e1 on July 5, 2026 07:23
agjs changed the title from "feat(wizard+cli): generic wizard primitive + /config settings menu" to "feat: generic wizard + /config hub, TUI overhaul, structural refactor, e2e uplift + docs sync" on Jul 5, 2026

agentCardBody joined rail-prefixed lines with no width awareness, so a long
replayed assistant line spilled past the rail on resume (the live stream was
already wrap-safe via makeAgentRail). It now feeds the settled text through
the SAME ANSI-aware, display-width wrapper the live path uses; renderMessage
threads the real terminal columns through. Tests: long replay wraps with
every body row railed and within width; CJK counts as 2 columns.
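
Display-width wrapping behind a rail can be illustrated with the simplified sketch below. It is not the wrapper used by makeAgentRail: the function names, the rail string and the approximate wide-character ranges are assumptions, it hard-wraps per character instead of per word, and it ignores ANSI escapes, emoji sequences and combining marks, which a production wrapper has to handle.

```typescript
// Approximate width: 2 columns for common CJK/fullwidth ranges, 1 otherwise.
function charWidth(ch: string): number {
  const cp = ch.codePointAt(0) ?? 0;
  const wide =
    (cp >= 0x1100 && cp <= 0x115f) || // Hangul Jamo
    (cp >= 0x2e80 && cp <= 0xa4cf) || // CJK radicals through Yi
    (cp >= 0xac00 && cp <= 0xd7a3) || // Hangul syllables
    (cp >= 0xf900 && cp <= 0xfaff) || // CJK compatibility ideographs
    (cp >= 0xff01 && cp <= 0xff60) || // fullwidth forms
    (cp >= 0xffe0 && cp <= 0xffe6); // fullwidth signs
  return wide ? 2 : 1;
}

function displayWidth(text: string): number {
  let width = 0;
  for (const ch of text) {
    width += charWidth(ch);
  }
  return width;
}

// Hard-wraps `text` so rail + body fits in `columns`
// (as long as `columns` leaves room for at least one wide character).
function wrapWithRail(text: string, columns: number, rail = "│ "): string[] {
  const budget = Math.max(columns - displayWidth(rail), 2);
  const rows: string[] = [];
  let row = "";
  let width = 0;
  for (const ch of text) {
    const w = charWidth(ch);
    if (width + w > budget && row !== "") {
      rows.push(rail + row);
      row = "";
      width = 0;
    }
    row += ch;
    width += w;
  }
  rows.push(rail + row);
  return rows;
}
```

Counting columns rather than code units is what keeps a CJK line from overshooting the terminal width: each ideograph consumes two cells of the budget.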

Full validate green (1933 tests + 5 PTY suites).

agjs merged commit fedf14f into main on Jul 5, 2026 (8 checks passed)
agjs deleted the feat/generic-wizard branch on July 5, 2026 08:14