Harden e2e CI tests and copy job schema at MCP startup by nhorton · Pull Request #252 · Unsupervisedcom/deepwork

nhorton · 2026-03-04T00:08:10Z

Summary

- MCP server copies job.schema.json to .deepwork/job.schema.json on startup so agents always have a stable reference path regardless of install location. Overwrites stale copies. Added formal requirement (JOBS-REQ-001.1.13) and 6 tests.
- Added schema instructions to common_job_info in deepwork_jobs/job.yml: agents are now told to read the JSON schema before creating/editing any job.yml, with explicit callouts for commonly-misused fields (oneOf inputs, no type/path fields, etc.)
- Fixed 9 stale description field references in define.md and implement.md that referenced a removed root-level field (now common_job_info_provided_to_all_steps_at_runtime)
- Improved repair hint in tools.py to tell agents to fix files directly rather than starting the repair workflow
- Added .deepreview rule (job_schema_instruction_compatibility) to catch future schema-instruction drift
- Prompt clarity improvements in define.md and implement.md based on review feedback
- E2e CI hardening: --dangerously-skip-permissions, max turns lowered from 30 to 20, explicit output path guidance, model upgraded to Sonnet 4.6, conditional PR runs, stream-json debugging

Test plan

- 6 new unit tests for schema copy behavior (tests/unit/jobs/mcp/test_server.py)
- All existing tests pass (uv run pytest)
- Ruff lint + format clean
- DeepWork reviews pass (job_definition, prompt_best_practices, python_code_review, python_lint, schema_compatibility, doc_sync, suggest_new_reviews, requirements_traceability)
- E2e CI passes in merge queue

🤖 Generated with Claude Code

The claude-code-e2e job has been failing since Mar 2 — Claude exits after ~60s with no output, likely due to AskUserQuestion in --print mode. This adds stream-json output for tool call visibility, --max-turns 30 to prevent early exit, stronger prompt guardrails (no AskUserQuestion, must complete all tool calls), the missing go_to_step MCP permission, and PR-level runs when the workflow file itself changes. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The merge queue failure showed the agent hitting schema validation errors during job creation, then switching to the "repair" workflow instead of fixing the schema and resubmitting. Add explicit instruction to never start repair/learn workflows. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The old hint said "This project likely needs /deepwork:repair" first, which led agents to start the repair workflow even when they had just created the file and could fix it themselves. Reword to lead with "fix it directly if you edited it" and only suggest repair as a fallback. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
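The reordering described in this commit can be sketched roughly as follows. This is only an illustration: `build_validation_hint` is a hypothetical helper, and the actual function in tools.py and its exact wording are not shown in this PR description.

```python
from typing import List


def build_validation_hint(job_file: str, errors: List[str]) -> str:
    """Hypothetical helper: lead with the direct fix, keep repair as a fallback."""
    lines = [f"{job_file} failed schema validation:"]
    lines += [f"  - {error}" for error in errors]
    # Lead with the cheap path: the agent that just wrote the file can fix it.
    lines.append(
        "If you just created or edited this file, fix it directly "
        "and resubmit."
    )
    # Mention the repair workflow last, and only as a fallback.
    lines.append(
        "Only if the file predates your changes, consider running "
        "/deepwork:repair."
    )
    return "\n".join(lines)
```

Putting the direct-fix sentence first matters because, per the commit message, agents tended to act on whichever suggestion the hint led with.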

The define.md and implement.md instruction files still referenced a root-level "description" field that was replaced by "common_job_info_provided_to_all_steps_at_runtime" in the schema. This caused e2e CI agents to create invalid job.yml files with additionalProperties errors. Fixed 4 references in define.md and 5 in implement.md. Also added a .deepreview rule (job_schema_instruction_compatibility) that reviews deepwork_jobs instruction files against the job schema whenever either changes, preventing this drift from recurring. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Address prompt best practices review findings:
- Define <job_dir> and [job_dir] placeholders upfront before first use
- Consolidate overlapping guideline sections in implement.md
- Make completion checklist verifiable with concrete criteria
- Nest sub-sections under Step 2 (H4) to fix step numbering hierarchy
- Move mandatory review requirement to top of Step 4 in define.md
- Make "rich context" guideline specific about what to include
- Add bridging note connecting patterns to Q&A flow

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The example showed adding a changelog field to job.yml, but changelog was removed from the schema. The fix_jobs repair step already instructs removing changelog sections — this aligns the iterate example. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The MCP server now copies job.schema.json to .deepwork/job.schema.json on every startup, giving agents a stable reference path regardless of where DeepWork is installed. The common_job_info in deepwork_jobs/job.yml now includes rigorous instructions to read the schema before creating or editing any job.yml, with explicit callouts for commonly-misused fields. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Adds formal requirement that the MCP server copies job.schema.json to .deepwork/job.schema.json on startup (overwriting stale copies). Includes 6 tests covering: copy behavior, overwrite of existing files, directory creation, graceful failure with warning, content fidelity, and integration with create_server. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
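A minimal sketch of the startup copy behavior this requirement describes (create the directory, overwrite any stale copy, warn rather than crash on failure). The function name `copy_job_schema`, its parameters, and the logger setup are assumptions for illustration; the real implementation in the MCP server is not shown in this PR description.

```python
import logging
import shutil
from pathlib import Path
from typing import Optional

logger = logging.getLogger(__name__)


def copy_job_schema(source_schema: Path, project_root: Path) -> Optional[Path]:
    """Copy the bundled schema to <project_root>/.deepwork/job.schema.json.

    Hypothetical helper. Returns the destination path on success, or None
    if the copy failed (the failure is logged as a warning, not raised).
    """
    target = project_root / ".deepwork" / "job.schema.json"
    try:
        # Create .deepwork/ if it does not exist yet.
        target.parent.mkdir(parents=True, exist_ok=True)
        # copyfile replaces an existing file, so stale copies are overwritten.
        shutil.copyfile(source_schema, target)
    except OSError as exc:
        # Startup should not fail just because the reference copy is missing.
        logger.warning("Could not copy job schema to %s: %s", target, exc)
        return None
    return target
```

Returning `None` instead of raising keeps server startup alive when the project directory is read-only, which matches the "graceful failure with warning" case listed among the tests.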

Observed turn counts from last 2 successful runs: 10-12 for job creation, 7-10 for workflow execution. Lowering from 30 to 20 catches runaway agents faster while leaving headroom. Adding --dangerously-skip-permissions removes permission prompts that waste turns in CI. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
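The headroom arithmetic and the flags named in this PR can be sketched as below. The helper names are hypothetical, the exact invocation in the workflow file is not shown here, and `--output-format stream-json` is assumed to be the way stream-json output is selected; treat the argument list as illustrative rather than a copy of the CI command.

```python
from typing import Dict, List

MAX_TURNS = 20
# Upper ends of the observed ranges quoted in the commit message.
OBSERVED_MAX_TURNS: Dict[str, int] = {
    "job creation": 12,
    "workflow execution": 10,
}


def turn_headroom(max_turns: int = MAX_TURNS) -> int:
    """Turns left over after the longest observed successful run."""
    return max_turns - max(OBSERVED_MAX_TURNS.values())


def build_claude_args(prompt: str, max_turns: int = MAX_TURNS) -> List[str]:
    """Hypothetical helper assembling a non-interactive CLI invocation."""
    return [
        "claude",
        "--print",                         # non-interactive print mode
        "--output-format", "stream-json",  # assumed flag for stream-json output
        "--verbose",                       # a later commit notes stream-json needs this
        "--max-turns", str(max_turns),
        "--dangerously-skip-permissions",  # no permission prompts in CI
        prompt,
    ]
```

With the limit at 20 and the longest observed run at 12 turns, `turn_headroom()` leaves 8 spare turns.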

Agent was writing output files to .deepwork/jobs/fruits/ instead of ./fruits/ relative to project root. Add explicit instruction to write outputs relative to the working directory, not inside .deepwork/jobs/. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
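The fix in this commit is a prompt instruction, not code. Purely as an illustration of the path rule it states, a check like the following would accept ./fruits/ and reject anything under .deepwork/jobs/. The function `resolve_output_path` is hypothetical and not part of this PR.

```python
from pathlib import Path


def resolve_output_path(working_dir: Path, relative_output: str) -> Path:
    """Resolve an output path against the working directory.

    Hypothetical check: raises ValueError if the result would land inside
    <working_dir>/.deepwork/jobs/, where job definitions live.
    """
    candidate = (working_dir / relative_output).resolve()
    jobs_dir = (working_dir / ".deepwork" / "jobs").resolve()
    if candidate == jobs_dir or jobs_dir in candidate.parents:
        raise ValueError(
            f"{relative_output!r} resolves inside .deepwork/jobs/; "
            "write outputs relative to the working directory instead"
        )
    return candidate
```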

nhorton and others added 4 commits March 3, 2026 17:07

Update workflows README to reflect conditional PR e2e runs

a6f834a

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add --verbose flag required for stream-json output in print mode

ed9e49b

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Upgrade e2e CI model from Sonnet 4.5 to Sonnet 4.6

599cf65

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

nhorton added this pull request to the merge queue Mar 4, 2026

github-merge-queue bot removed this pull request from the merge queue due to failed status checks Mar 4, 2026

nhorton added this pull request to the merge queue Mar 4, 2026

github-merge-queue bot removed this pull request from the merge queue due to failed status checks Mar 4, 2026

nhorton and others added 6 commits March 4, 2026 14:21

nhorton force-pushed the claude/fix-e2e-ci-failures-stream-json branch from 2addf99 to 00b4be1 March 4, 2026 23:40

nhorton force-pushed the claude/fix-e2e-ci-failures-stream-json branch from 00b4be1 to afe3387 March 4, 2026 23:41

nhorton and others added 2 commits March 4, 2026 15:55

nhorton enabled auto-merge March 5, 2026 00:20

nhorton changed the title from "Fix failing e2e CI tests with debugging and prompt hardening" to "Harden e2e CI tests and copy job schema at MCP startup" Mar 5, 2026

nhorton added this pull request to the merge queue Mar 5, 2026

Merged via the queue into main with commit 10b2f58 Mar 5, 2026
5 checks passed

nhorton deleted the claude/fix-e2e-ci-failures-stream-json branch March 5, 2026 00:28

nhorton merged 13 commits into main from claude/fix-e2e-ci-failures-stream-json
