Harden e2e CI tests and copy job schema at MCP startup #252
Merged
The claude-code-e2e job has been failing since Mar 2 — Claude exits after ~60s with no output, likely due to AskUserQuestion in --print mode. This adds stream-json output for tool call visibility, --max-turns 30 to prevent early exit, stronger prompt guardrails (no AskUserQuestion, must complete all tool calls), the missing go_to_step MCP permission, and PR-level runs when the workflow file itself changes. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
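To illustrate the tool call visibility that stream-json output gives, here is a minimal sketch of filtering such a log for tool invocations. It assumes each line is one JSON event and that assistant events carry a `message.content` list containing `tool_use` blocks with a `name` field. That event shape, the helper name `tool_calls_from_stream`, and the sample lines are assumptions for illustration, not taken from this PR.

```python
import json
from typing import Iterable, List


def tool_calls_from_stream(lines: Iterable[str]) -> List[str]:
    """Return tool names, in order, found in newline-delimited JSON events.

    Assumed event shape (not confirmed by this PR):
    {"type": "assistant", "message": {"content": [{"type": "tool_use", "name": ...}]}}
    """
    names: List[str] = []
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        try:
            event = json.loads(raw)
        except json.JSONDecodeError:
            continue  # tolerate non-JSON noise in the log
        if not isinstance(event, dict) or event.get("type") != "assistant":
            continue
        message = event.get("message")
        content = message.get("content", []) if isinstance(message, dict) else []
        if not isinstance(content, list):
            continue
        for block in content:
            if isinstance(block, dict) and block.get("type") == "tool_use":
                names.append(block.get("name", "<unknown>"))
    return names


# Hypothetical sample log: one text-only turn, one tool call, one junk line.
sample = [
    '{"type": "assistant", "message": {"content": [{"type": "text", "text": "Starting"}]}}',
    '{"type": "assistant", "message": {"content": [{"type": "tool_use", "name": "go_to_step", "input": {}}]}}',
    "not json",
]
print(tool_calls_from_stream(sample))
```

An empty result from a real run would match the "exits with no output" symptom described above, which is the kind of failure this output mode makes easier to diagnose.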
The merge queue failure showed the agent hitting schema validation errors during job creation, then switching to the "repair" workflow instead of fixing the schema and resubmitting. Add explicit instruction to never start repair/learn workflows. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The old hint said "This project likely needs /deepwork:repair" first, which led agents to start the repair workflow even when they had just created the file and could fix it themselves. Reword to lead with "fix it directly if you edited it" and only suggest repair as a fallback. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The define.md and implement.md instruction files still referenced a root-level "description" field that was replaced by "common_job_info_provided_to_all_steps_at_runtime" in the schema. This caused e2e CI agents to create invalid job.yml files with additionalProperties errors. Fixed 4 references in define.md and 5 in implement.md. Also added a .deepreview rule (job_schema_instruction_compatibility) that reviews deepwork_jobs instruction files against the job schema whenever either changes, preventing this drift from recurring. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
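The additionalProperties failure can be reproduced with a small stdlib-only check. This is a sketch under stated assumptions: `SCHEMA_FRAGMENT` is a hypothetical two-property stand-in for the real job schema (the `name` property is invented for illustration), and `unexpected_root_keys` is an illustrative helper that only looks at `properties`, ignoring other keywords such as `patternProperties`.

```python
from typing import Any, Dict, List


def unexpected_root_keys(doc: Dict[str, Any], schema: Dict[str, Any]) -> List[str]:
    """List root-level keys a schema with additionalProperties: false would reject."""
    if schema.get("additionalProperties", True) is not False:
        return []  # extra keys are allowed, nothing to report
    allowed = set(schema.get("properties", {}))
    return sorted(key for key in doc if key not in allowed)


# Hypothetical fragment; the real job.schema.json defines more fields.
SCHEMA_FRAGMENT = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "common_job_info_provided_to_all_steps_at_runtime": {"type": "string"},
    },
    "additionalProperties": False,
}

# A job written from the stale instructions versus the corrected ones.
stale_job = {"name": "fruits", "description": "Sort fruits"}
fixed_job = {
    "name": "fruits",
    "common_job_info_provided_to_all_steps_at_runtime": "Sort fruits",
}

print(unexpected_root_keys(stale_job, SCHEMA_FRAGMENT))  # ['description']
print(unexpected_root_keys(fixed_job, SCHEMA_FRAGMENT))  # []
```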
Address prompt best practices review findings:
- Define <job_dir> and [job_dir] placeholders upfront before first use
- Consolidate overlapping guideline sections in implement.md
- Make completion checklist verifiable with concrete criteria
- Nest sub-sections under Step 2 (H4) to fix step numbering hierarchy
- Move mandatory review requirement to top of Step 4 in define.md
- Make "rich context" guideline specific about what to include
- Add bridging note connecting patterns to Q&A flow
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The example showed adding a changelog field to job.yml, but changelog was removed from the schema. The fix_jobs repair step already instructs removing changelog sections — this aligns the iterate example. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The MCP server now copies job.schema.json to .deepwork/job.schema.json on every startup, giving agents a stable reference path regardless of where DeepWork is installed. The common_job_info in deepwork_jobs/job.yml now includes rigorous instructions to read the schema before creating or editing any job.yml, with explicit callouts for commonly-misused fields. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2addf99 to 00b4be1
Adds formal requirement that the MCP server copies job.schema.json to .deepwork/job.schema.json on startup (overwriting stale copies). Includes 6 tests covering: copy behavior, overwrite of existing files, directory creation, graceful failure with warning, content fidelity, and integration with create_server. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
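The behaviors listed above (copy, overwrite, directory creation, graceful failure with a warning) could look roughly like the following. This is a minimal sketch, not the code from this PR: the function name `copy_job_schema`, its signature, the boolean return value, and the logger name are all assumptions.

```python
import logging
import shutil
import tempfile
from pathlib import Path

logger = logging.getLogger("schema_copy_sketch")  # hypothetical logger name


def copy_job_schema(schema_source: Path, project_root: Path) -> bool:
    """Copy the schema to <project_root>/.deepwork/job.schema.json.

    Overwrites any existing copy. On failure, logs a warning and returns
    False instead of raising, so server startup is not blocked.
    """
    target = project_root / ".deepwork" / "job.schema.json"
    try:
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(schema_source, target)  # replaces a stale copy
    except OSError as exc:
        logger.warning("Could not copy job schema to %s: %s", target, exc)
        return False
    return True


# Usage: overwrite a stale copy, then show the failure path.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    source = root / "job.schema.json"
    source.write_text('{"type": "object"}', encoding="utf-8")

    project = root / "project"
    stale = project / ".deepwork" / "job.schema.json"
    stale.parent.mkdir(parents=True)
    stale.write_text("stale", encoding="utf-8")

    copied = copy_job_schema(source, project)
    content_matches = stale.read_text(encoding="utf-8") == source.read_text(encoding="utf-8")
    missing_ok = copy_job_schema(root / "missing.json", project)
```

Returning a flag rather than raising is one way to get the "graceful failure with warning" behavior; the actual implementation may signal failure differently.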
00b4be1 to afe3387
Observed turn counts from last 2 successful runs: 10-12 for job creation, 7-10 for workflow execution. Lowering from 30 to 20 catches runaway agents faster while leaving headroom. Adding --dangerously-skip-permissions removes permission prompts that waste turns in CI. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Agent was writing output files to .deepwork/jobs/fruits/ instead of ./fruits/ relative to project root. Add explicit instruction to write outputs relative to the working directory, not inside .deepwork/jobs/. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
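A CI assertion for this could check that an output path resolves inside the project root but outside .deepwork/jobs/. The sketch below is illustrative only: `is_project_output_path`, the example root `/tmp/example-project`, and the file name `report.md` are hypothetical and not part of this PR.

```python
from pathlib import Path


def is_project_output_path(path: str, project_root: Path) -> bool:
    """True if path resolves inside project_root but not under .deepwork/jobs/."""
    root = project_root.resolve()
    resolved = (root / path).resolve()
    jobs_dir = root / ".deepwork" / "jobs"
    inside_root = resolved == root or root in resolved.parents
    inside_jobs = resolved == jobs_dir or jobs_dir in resolved.parents
    return inside_root and not inside_jobs


root = Path("/tmp/example-project")  # hypothetical project root
print(is_project_output_path("fruits/report.md", root))                 # True
print(is_project_output_path(".deepwork/jobs/fruits/report.md", root))  # False
```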
Summary
- MCP server copies job.schema.json to .deepwork/job.schema.json on startup so agents always have a stable reference path regardless of install location. Overwrites stale copies. Added formal requirement (JOBS-REQ-001.1.13) and 6 tests.
- Strengthened common_job_info in deepwork_jobs/job.yml: agents are now told to read the JSON schema before creating/editing any job.yml, with explicit callouts for commonly-misused fields (oneOf inputs, no type/path fields, etc.)
- Fixed stale description field references in define.md and implement.md that referenced a removed root-level field (now common_job_info_provided_to_all_steps_at_runtime)
- Reworded the schema error hint in tools.py to tell agents to fix files directly rather than starting the repair workflow
- Added a .deepreview rule (job_schema_instruction_compatibility) to catch future schema-instruction drift
- Improved the prompt structure of define.md and implement.md based on review feedback
- E2E CI hardening: --dangerously-skip-permissions, max turns lowered from 30 to 20, explicit output path guidance, model upgraded to Sonnet 4.6, conditional PR runs, stream-json debugging

Test plan
- Schema copy tests (tests/unit/jobs/mcp/test_server.py)
- Full test suite (uv run pytest)

🤖 Generated with Claude Code