The next Dogfood Parity 1 risk is not another docs pass; it is proving real tool execution behavior with fake Responses API turns.
Old Every Code evidence named stuck_exec.rs, exec_completion_test.rs, tool_hooks.rs, git_mutation_guard.rs, and sandbox/write-denial probes as high-signal.
Acceptance Criteria
Add deterministic fake-response scenarios that make the model invoke the current shell tool and assert JSONL command events.
Cover at least one successful shell command and one write-denial/sandbox or git-safety behavior that can run hermetically.
Extend harness assertions only where needed, keeping them generic for future scenarios.
Include the new scenarios in just harness-smoke when they are deterministic and fast.
Validate with ./build-fast.sh and just harness-smoke.
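A minimal sketch of the kind of generic JSONL assertion the criteria above describe, assuming a hypothetical flat event shape. The event type command_completed, the fields command and exit_code, and the helpers field, command_events, and assert_command are illustrative names only, not the real code exec --json schema or harness API. The parser is deliberately naive and uses only the standard library; a real harness would more likely use a proper JSON parser.

```rust
// Hypothetical event shape; the real `code exec --json` schema may differ.
#[derive(Debug, PartialEq)]
struct CommandEvent {
    command: String,
    exit_code: i64,
}

/// Extract the raw value of a key from a flat JSON object line.
/// Naive on purpose: handles unescaped string and integer values, no nesting.
fn field<'a>(line: &'a str, key: &str) -> Option<&'a str> {
    let needle = format!("\"{}\":", key);
    let start = line.find(&needle)? + needle.len();
    let rest = line[start..].trim_start();
    if let Some(stripped) = rest.strip_prefix('"') {
        let end = stripped.find('"')?;
        Some(&stripped[..end])
    } else {
        let end = rest.find(|c: char| c == ',' || c == '}')?;
        Some(rest[..end].trim())
    }
}

/// Collect every (assumed) command-completion event from a JSONL stream.
fn command_events(jsonl: &str) -> Vec<CommandEvent> {
    jsonl
        .lines()
        .filter(|l| field(l, "type") == Some("command_completed"))
        .filter_map(|l| {
            Some(CommandEvent {
                command: field(l, "command")?.to_string(),
                exit_code: field(l, "exit_code")?.parse().ok()?,
            })
        })
        .collect()
}

/// Generic assertion: some command containing `needle` ran and its
/// success or failure matches what the scenario expects.
fn assert_command(events: &[CommandEvent], needle: &str, expect_success: bool) {
    let hit = events
        .iter()
        .find(|e| e.command.contains(needle))
        .unwrap_or_else(|| panic!("no command event containing {needle:?}"));
    assert_eq!(
        hit.exit_code == 0,
        expect_success,
        "unexpected exit status for {:?}",
        hit.command
    );
}

fn main() {
    // Canned stream standing in for the output of a fake Responses API turn.
    let stream = r#"{"type":"turn_started"}
{"type":"command_completed","command":"echo hello","exit_code":0}
{"type":"command_completed","command":"touch /etc/denied","exit_code":1}
{"type":"turn_completed"}"#;

    let events = command_events(stream);
    assert_eq!(events.len(), 2);
    assert_command(&events, "echo hello", true); // successful shell command
    assert_command(&events, "touch /etc/denied", false); // write-denial case
    println!("checked {} command events", events.len());
}
```

Keeping the assertion keyed on a command substring and an expected success flag, rather than on one scenario's exact output, is one way to keep it reusable for future scenarios.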
Out Of Scope
Full Auto Drive, Code Bridge/browser, and Auto Review parity.
Porting old Every Code implementation modules wholesale.
Finish Line
The exec harness covers the first high-signal Every Code exec safety regressions on the Codex-base code exec --json path.
Current Status
State: Ready for implementation under #404.
Context:
just harness-smoke serves as the deterministic no-live-token gate.