
Add exec-harness safety scenarios for Dogfood Parity 1 #407

@shiny-code-bot


Finish Line

The exec harness covers the first high-signal Every Code exec safety regressions on the Codex-base code exec --json path.

Current Status

State: Ready for implementation under #404.

Context:

  • PR #406 (Make exec harness a dogfood gate) established the just harness-smoke recipe as the deterministic gate that needs no live token.
  • The next Dogfood Parity 1 risk is not another docs pass; it is proving real tool execution behavior with fake Responses API turns.
  • Evidence from the old Every Code codebase identified stuck_exec.rs, exec_completion_test.rs, tool_hooks.rs, git_mutation_guard.rs, and the sandbox/write-denial probes as high-signal regression checks.

Acceptance Criteria

  • Add deterministic fake-response scenarios that make the model invoke the current shell tool and assert JSONL command events.
  • Cover at least one successful shell command and one write-denial/sandbox or git-safety behavior that can run hermetically.
  • Extend harness assertions only where needed, keeping them generic for future scenarios.
  • Include the new scenarios in just harness-smoke when they are deterministic and fast.
  • Validate with ./build-fast.sh and just harness-smoke.
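The criteria above ask for command-event assertions that stay generic across scenarios. Below is a minimal sketch of one way to do that: an order-aware matcher over code exec --json output. It assumes each output line is one JSON event and that command events carry fields such as "type", "command", and "exit_code"; those field names, the sample transcript, and the helper find_events_in_order are hypothetical, not taken from the harness or the confirmed event schema. It matches on substrings so it needs only the Rust standard library; a real harness would more likely parse each line with serde_json.

```rust
/// Hypothetical generic assertion helper (not from the harness): each
/// expected event is a list of fragments that must all appear on one JSONL
/// line, and expected events must match lines in order. Unrelated lines in
/// between are allowed, so the check tolerates extra events.
fn find_events_in_order(jsonl: &str, expected: &[&[&str]]) -> Result<Vec<usize>, String> {
    // Ignore blank lines; returned indices refer to the remaining lines.
    let lines: Vec<&str> = jsonl.lines().filter(|l| !l.trim().is_empty()).collect();
    let mut next = 0; // first line still available for matching
    let mut matched = Vec::with_capacity(expected.len());
    for (i, fragments) in expected.iter().enumerate() {
        let hit = lines[next..]
            .iter()
            .position(|line| fragments.iter().all(|f| line.contains(f)));
        match hit {
            Some(offset) => {
                matched.push(next + offset);
                // A line can satisfy at most one expected event.
                next += offset + 1;
            }
            None => {
                return Err(format!(
                    "expected event #{i} {fragments:?} not found at or after line {next}"
                ));
            }
        }
    }
    Ok(matched)
}

fn main() {
    // Illustrative success-scenario output. The event and field names are
    // assumptions, not the confirmed `code exec --json` schema.
    let transcript = [
        r#"{"type":"thread.started"}"#,
        r#"{"type":"item.started","item":{"type":"command_execution","command":"echo hello"}}"#,
        r#"{"type":"item.completed","item":{"type":"command_execution","command":"echo hello","exit_code":0}}"#,
        r#"{"type":"turn.completed"}"#,
    ]
    .join("\n");

    let expected: &[&[&str]] = &[
        &[r#""type":"item.started""#, r#""command":"echo hello""#],
        &[r#""type":"item.completed""#, r#""exit_code":0"#],
        &[r#""type":"turn.completed""#],
    ];
    match find_events_in_order(&transcript, expected) {
        // prints: matched lines [1, 2, 3]
        Ok(lines) => println!("matched lines {lines:?}"),
        Err(e) => panic!("{e}"),
    }
}
```

The same helper could cover a write-denial or git-safety scenario by listing fragments for the denied command's completion event. Which fields actually signal a denial in the real JSONL stream is not established here and would need to be checked against the harness before relying on it.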

Out Of Scope

  • Full Auto Drive, Code Bridge/browser, and Auto Review parity.
  • Porting old Every Code implementation modules wholesale.
  • Platform-specific keyboard/clipboard probes.

Metadata

Assignees: No one assigned

Labels: plan (Durable planning issue), plan:active (Current active plan)

Projects: No projects

Milestone: No milestone

Relationships: None yet

Development: No branches or pull requests