plan: Add agent-friendly reliability baseline implementation plan #4
Merged
davidabram merged 30 commits into main on Mar 6, 2026
Add the sce-cli reliability-baseline plan with locked assumptions, success criteria and constraints for deterministic output, errors, and setup behavior.
Add a new config service and wire it into CLI parsing, dispatch, and help so sce config show|validate resolves runtime values with explicit precedence (flags > env > config file > defaults). Enforce strict config-file validation and provide stable text/json output contracts with parser, dispatch, and precedence-focused test coverage.
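The precedence rule in this commit (flags > env > config file > defaults) can be illustrated with a minimal sketch. The `Source` enum and `resolve` function are hypothetical names chosen for illustration and are not taken from the sce-cli codebase.

```rust
/// Where a resolved value came from.
/// Hypothetical type for illustration; not the project's actual API.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Source {
    Flag,
    Env,
    ConfigFile,
    Default,
}

/// Return the first value present in precedence order:
/// flags > env > config file > defaults.
pub fn resolve(
    flag: Option<&str>,
    env: Option<&str>,
    file: Option<&str>,
    default: &str,
) -> (String, Source) {
    if let Some(v) = flag {
        return (v.to_string(), Source::Flag);
    }
    if let Some(v) = env {
        return (v.to_string(), Source::Env);
    }
    if let Some(v) = file {
        return (v.to_string(), Source::ConfigFile);
    }
    (default.to_string(), Source::Default)
}
```

Returning the source alongside the value is one way to let a `show`-style command report where each setting was resolved from.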
Map top-level CLI failures to stable parse, validation, runtime, and dependency exit-code classes so automation can branch without parsing stderr text. Add targeted tests for each failure class and update task documentation to reflect the implemented contract.
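As a rough sketch of the exit-code idea, each failure class can map to its own nonzero code. The four class names come from the commit message; the `FailureClass` type and the numeric values below are placeholders and should not be read as the codes sce-cli actually uses.

```rust
/// The four failure classes named in the commit message.
/// The type itself is a hypothetical illustration.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FailureClass {
    Parse,
    Validation,
    Runtime,
    Dependency,
}

impl FailureClass {
    /// Placeholder exit codes (assumed values, not the project's contract).
    /// The only properties that matter for the sketch: nonzero and distinct.
    pub fn exit_code(self) -> i32 {
        match self {
            FailureClass::Parse => 2,
            FailureClass::Validation => 3,
            FailureClass::Runtime => 4,
            FailureClass::Dependency => 5,
        }
    }
}
```

A caller would typically finish with `std::process::exit(class.exit_code())`, so scripts can branch on `$?` instead of matching stderr text.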
Add an env-configurable logger with deterministic level/format parsing, stable lifecycle event IDs, and severity-based filtering in app dispatch paths. Preserve stdout command payload contracts by emitting observability records to stderr only, and align observability contract/plan docs with the implemented baseline.
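A minimal sketch of deterministic level parsing and severity filtering, assuming the raw level string has already been read from some environment variable (the variable name is not given here, so none is invented). The `Level` enum, the `Info` default, and the record layout are assumptions for illustration.

```rust
use std::io::Write;

/// Severity levels, ordered from most to least severe.
/// Hypothetical type; the real logger may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Level {
    Error,
    Warn,
    Info,
    Debug,
}

/// Parse a raw level string. `None` (unset) falls back to `Info`,
/// which is an assumed default. Unknown values are rejected.
pub fn parse_level(raw: Option<&str>) -> Result<Level, String> {
    match raw.map(|s| s.trim().to_ascii_lowercase()).as_deref() {
        None => Ok(Level::Info),
        Some("error") => Ok(Level::Error),
        Some("warn") => Ok(Level::Warn),
        Some("info") => Ok(Level::Info),
        Some("debug") => Ok(Level::Debug),
        Some(other) => Err(format!("invalid log level '{other}'")),
    }
}

/// Write a record only if it is at least as severe as the configured level.
/// In a CLI the sink would be stderr, keeping stdout free for payloads.
pub fn log(
    sink: &mut impl Write,
    configured: Level,
    level: Level,
    event_id: &str,
    message: &str,
) -> std::io::Result<()> {
    if level <= configured {
        writeln!(sink, "{:?} {} {}", level, event_id, message)?;
    }
    Ok(())
}
```

Passing the sink as a parameter keeps the filter testable; production code would pass `std::io::stderr()`.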
Introduce opt-in OpenTelemetry exporter bootstrap with deterministic env controls and validation, then run command dispatch under a tracing subscriber context so lifecycle events can be exported without changing stdout payload behavior. Mirror existing lifecycle logger events into tracing events and update CLI docs/contracts to document the new OTEL controls.
Resolve default config files from ${state_root}/sce/config.json
and .sce/config.json when no explicit config path is set,
applying deterministic global-then-local key overrides.
Expose loaded config paths and per-value config source metadata
in show/validate output, and add resolver coverage for merge precedence.
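The global-then-local override can be sketched as a two-pass merge in which the local layer is applied last and therefore wins. The `Layer` enum and `merge_layers` function are hypothetical, and real config values are unlikely to be plain strings.

```rust
use std::collections::BTreeMap;

/// Which config file supplied a value (hypothetical type for illustration).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Layer {
    Global,
    Local,
}

/// Merge two flat key/value layers. Global keys are inserted first,
/// then local keys overwrite them, so local takes precedence per key.
/// A BTreeMap keeps iteration order deterministic.
pub fn merge_layers(
    global: &BTreeMap<String, String>,
    local: &BTreeMap<String, String>,
) -> BTreeMap<String, (String, Layer)> {
    let mut merged = BTreeMap::new();
    for (key, value) in global {
        merged.insert(key.clone(), (value.clone(), Layer::Global));
    }
    for (key, value) in local {
        merged.insert(key.clone(), (value.clone(), Layer::Local));
    }
    merged
}
```

Keeping the layer next to each value is one way to produce the per-value source metadata mentioned above.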
Support optional SCE_LOG_FILE mirroring with deterministic truncate-by-default behavior and validated append mode. Enforce owner-only (0600) permissions on Unix, keep stdout contracts unchanged via stderr diagnostics on file-write failures, and add observability tests for mode/file-sink behavior.
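A minimal sketch of opening a log file with truncate-by-default behavior, opt-in append, and owner-only permissions on Unix, using only the standard library. `open_log_file` is a hypothetical helper; how the real implementation reads `SCE_LOG_FILE` or selects append mode is not shown here.

```rust
use std::fs::{File, OpenOptions};
use std::io;
use std::path::Path;

/// Open (or create) a log file.
/// - `append == false`: truncate any existing content (the default behavior).
/// - `append == true`: keep existing content and write at the end.
/// On Unix, a newly created file gets mode 0600 (owner read/write only).
pub fn open_log_file(path: &Path, append: bool) -> io::Result<File> {
    let mut opts = OpenOptions::new();
    opts.create(true);
    if append {
        opts.append(true);
    } else {
        opts.write(true).truncate(true);
    }
    #[cfg(unix)]
    {
        use std::os::unix::fs::OpenOptionsExt;
        // The mode applies only when the file is created by this call.
        opts.mode(0o600);
    }
    opts.open(path)
}
```

Because `mode` only takes effect at creation time, a pre-existing file with broader permissions would need an explicit `set_permissions` call to be tightened; the sketch leaves that out.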
Add a shared RetryPolicy/run_with_retry wrapper with bounded timeout and capped backoff so transient Turso operations fail deterministically with actionable guidance. Apply it to sync smoke checks and Agent Trace schema bootstrap to reduce flaky failures while preserving explicit retry observability.
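One way a bounded retry wrapper with capped exponential backoff could look is sketched below. The names `RetryPolicy` and `run_with_retry` come from the commit message, but the fields, the signature, and the doubling strategy are assumptions and may not match the actual implementation.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Retry limits. Field names are assumptions for this sketch.
#[derive(Debug, Clone, Copy)]
pub struct RetryPolicy {
    pub max_attempts: u32,
    pub initial_backoff: Duration,
    pub max_backoff: Duration,
    pub total_timeout: Duration,
}

/// Run `op` until it succeeds, attempts run out, or the next sleep would
/// exceed the total time budget. The closure receives the 1-based attempt
/// number so callers can log each retry.
pub fn run_with_retry<T, E>(
    policy: RetryPolicy,
    mut op: impl FnMut(u32) -> Result<T, E>,
) -> Result<T, E> {
    let start = Instant::now();
    let mut backoff = policy.initial_backoff;
    let mut attempt = 1;
    loop {
        match op(attempt) {
            Ok(value) => return Ok(value),
            Err(err) => {
                let out_of_budget = attempt >= policy.max_attempts
                    || start.elapsed() + backoff > policy.total_timeout;
                if out_of_budget {
                    return Err(err);
                }
                sleep(backoff);
                // Double the delay, but never beyond the cap.
                backoff = (backoff * 2).min(policy.max_backoff);
                attempt += 1;
            }
        }
    }
}
```

Returning the last error unchanged leaves it to the caller to attach guidance text, which keeps the wrapper independent of any particular error type.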
Apply shared sensitive-value redaction to app errors, observability output, and setup git diagnostics. Canonicalize and validate `sce setup --hooks --repo <path>` targets, and add deterministic directory write-permission probes before setup writes.
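To make the redaction idea concrete, here is a deliberately simple sketch that masks the value in `key=value` words whose key looks sensitive. The key list, the `[REDACTED]` marker, and the `redact` function are all assumptions; the shared redaction in the PR may use different rules entirely.

```rust
/// Substrings that mark a key as sensitive (assumed list, for illustration).
const SENSITIVE_KEYS: [&str; 3] = ["token", "secret", "password"];

/// Case-insensitive check against the assumed key list.
fn is_sensitive(key: &str) -> bool {
    let key = key.to_ascii_lowercase();
    SENSITIVE_KEYS.iter().any(|s| key.contains(*s))
}

/// Replace the value of every space-separated `key=value` word whose key
/// is sensitive. Splitting and joining on a single space preserves the
/// original spacing.
pub fn redact(message: &str) -> String {
    message
        .split(' ')
        .map(|word| match word.split_once('=') {
            Some((key, _)) if is_sensitive(key) => format!("{key}=[REDACTED]"),
            _ => word.to_string(),
        })
        .collect::<Vec<_>>()
        .join(" ")
}
```

Applying one such function at every output boundary (errors, log records, diagnostics) is what makes the redaction "shared" rather than per-call-site.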
Add a --non-interactive setup flag that fails fast unless exactly one target flag is provided, and update usage/TTY guidance to document the explicit non-interactive flow. Expand parser and setup-mode tests to cover valid/invalid combinations and preserve mutual-exclusion behavior.
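The fail-fast rule can be sketched as a small validation function over the number of target flags supplied. The target flag names are not listed here, so the sketch only counts them; `SetupMode`, `setup_mode`, and the error wording are hypothetical.

```rust
/// How setup should proceed (hypothetical type for illustration).
#[derive(Debug, PartialEq, Eq)]
pub enum SetupMode {
    /// No target given on an interactive run: prompt the user.
    Prompt,
    /// Exactly one target given: run without prompting.
    Direct,
}

/// Validate the flag combination.
/// - More than one target flag is always rejected (mutual exclusion).
/// - `--non-interactive` with no target fails fast instead of prompting.
pub fn setup_mode(non_interactive: bool, target_flags: usize) -> Result<SetupMode, String> {
    if target_flags > 1 {
        return Err("target flags are mutually exclusive. Try: sce setup --help".to_string());
    }
    if target_flags == 1 {
        return Ok(SetupMode::Direct);
    }
    if non_interactive {
        Err("--non-interactive requires exactly one target flag. Try: sce setup --help".to_string())
    } else {
        Ok(SetupMode::Prompt)
    }
}
```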
Return payload strings from command dispatch so app-level stream routing owns final output, keeping success text on stdout and redacted diagnostics on stderr with unchanged exit-code classes. Add stream-routing regression coverage and align reliability-baseline contract docs with the implemented behavior.
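A minimal sketch of app-level stream routing, assuming dispatch returns either a payload string or an already-redacted diagnostic string. The `route` function and its parameters are hypothetical; writers are passed in so the behavior can be tested without touching the real stdout and stderr.

```rust
use std::io::Write;

/// Route a dispatch result to the right stream and return the exit code.
/// Success payloads go to `out` (stdout in a real CLI) with code 0;
/// diagnostics go to `err` (stderr) with the supplied failure code.
pub fn route(
    result: Result<String, String>,
    out: &mut impl Write,
    err: &mut impl Write,
    failure_code: i32,
) -> std::io::Result<i32> {
    match result {
        Ok(payload) => {
            writeln!(out, "{payload}")?;
            Ok(0)
        }
        Err(diagnostic) => {
            writeln!(err, "{diagnostic}")?;
            Ok(failure_code)
        }
    }
}
```

With commands returning strings instead of printing directly, there is a single place where the stdout/stderr split is enforced.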
Route `doctor`, `mcp`, `hooks`, and `sync` through explicit help-command variants so `--help`/`-h` returns command-local usage output. Expand usage text examples for setup and service commands, add parser/runtime tests for help routing, and record task completion state updates.
…utput Wire version into command parsing/dispatch and command-surface help metadata. Add services::version parsing/rendering for --format <text|json> with stable runtime-identification fields. Extend tests for parser routing, help discoverability, and output contract stability. Update version-command contract and related architecture/overview/glossary plan-state documentation to match implemented behavior.
Implement stable class-based diagnostic codes for CLI failures and render errors as `Error [<code>]: ...` on stderr. Append class-default `Try:` remediation only when missing, preserve existing guidance when present, and lock behavior with parser/dependency diagnostics tests.
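The rendering rule can be sketched as follows: prefix the message with `Error [<code>]:` and append a class-default `Try:` hint only when the message does not already carry one. The `render_error` function and the example code string in the usage note are hypothetical.

```rust
/// Render a diagnostic line in the `Error [<code>]: ...` shape.
/// If the message already contains a `Try:` hint it is left untouched;
/// otherwise the class-default remediation is appended.
pub fn render_error(code: &str, message: &str, default_try: &str) -> String {
    let mut rendered = format!("Error [{code}]: {message}");
    if !message.contains("Try:") {
        rendered.push_str(" Try: ");
        rendered.push_str(default_try);
    }
    rendered
}
```

For example, `render_error("PARSE", "unknown command 'foo'.", "sce --help")` would yield `Error [PARSE]: unknown command 'foo'. Try: sce --help`, where `PARSE` is a made-up code used only for this illustration.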
Centralize `--format <text|json>` parsing in a shared `OutputFormat` service and reuse it across `config` and `version`. Standardize invalid-format validation to include command-specific `--help` guidance while preserving existing command output behavior.
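A shared format parser along these lines might look like the sketch below. `OutputFormat` is named in the commit message, but the `parse` signature and the exact error wording are assumptions.

```rust
/// Output formats accepted by `--format <text|json>`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum OutputFormat {
    Text,
    Json,
}

impl OutputFormat {
    /// Parse the raw flag value. `command` is only used to build
    /// command-specific `--help` guidance in the error message.
    pub fn parse(raw: &str, command: &str) -> Result<Self, String> {
        match raw {
            "text" => Ok(Self::Text),
            "json" => Ok(Self::Json),
            other => Err(format!(
                "invalid --format value '{other}' (expected text or json). Try: sce {command} --help"
            )),
        }
    }
}
```

Centralizing the parser means every command rejects bad values with the same message shape, differing only in the command name.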
Unify setup parsing and dispatch through SetupRequest so one invocation can run target config install and required hook install together. Allow --hooks with a single target flag, keep target flags mutually exclusive, make default interactive setup install hooks in the same run, and update CLI/setup tests and help text accordingly.
Route `sce doctor` through a typed request so output format is explicit and validated. Add deterministic JSON rendering (with stable top-level fields) while preserving existing text output and extend parser/shape tests.
Add request parsing and `--format <text|json>` support for `mcp` and `sync` so placeholder commands return deterministic machine-readable payloads while preserving stable text summaries. Route parsed requests through command dispatch, add JSON field-contract tests, and update context/task status to reflect the completed output-contract baseline.
Replace generic usage-only parse/validation failures with explicit Try: guidance across top-level parsing plus setup/hooks argument validation. This gives copyable recovery commands and keeps tests aligned with the new error contract.
Expand top-level and setup help text with copy-ready non-interactive setup, hook installation, and JSON-output flows for automation. Align README usage examples and help-contract tests with the updated guidance, and mark T18 complete in the reliability baseline plan.
…e and JSON services Lock parser error text and JSON output determinism for config, doctor, mcp, sync, and version paths. This reduces accidental contract drift across repeated runs and preserves agent-facing reliability guarantees.
Defines the canonical scenario matrix and deterministic assertion policy for setup integration tests executed via the compiled sce binary under a Nix entrypoint.
Add binary-driven Rust integration tests for setup targets and hook installation using isolated temp repos and scoped state-home paths. Wire a deterministic `nix run .#cli-integration-tests` app plus `cli-setup-integration` flake checks, and document the new local verification entrypoint in CLI docs.
Remove explicit `branches: [main]` filters under `push` in the staged workflow files. Remove `pull_request` trigger blocks from those workflows so they run under the simplified trigger configuration only.
Define plan sce-cli-setup-integration-test-improvements with nine tasks expanding setup integration coverage for --repo path canonicalization, failure contracts, PTY interactive flows, hook update/backup edge cases, permission scenarios, and cross-platform validation.
Add comprehensive integration test scenarios for the setup command. Extend the CI matrix to include macOS alongside Ubuntu for cross-platform validation. Add portable-pty, regex, and libc test dependencies to support PTY interaction and platform-aware assertions.
Canonicalize the harness repo root before asserting hooks and backup paths so expectations match resolved output paths. Use a shared helper to assert the Turso `agent-trace/local.db` location with OS-specific rules, improving cross-platform test determinism.
d93e167 to d7986ed
d7986ed to e5779f1
Run the integration test workflow on windows-latest using cargo test while keeping the existing Nix path for Unix runners. Update setup integration harness environment variables and expected local DB path to use Windows app-data locations consistently.
e5779f1 to e60f473