Skip to content

software-factory: retire structured update tools — agent writes raw JSON:API#4652

Open
jurgenwerk wants to merge 3 commits intomainfrom
retire-structured-update-tools
Open

software-factory: retire structured update tools — agent writes raw JSON:API#4652
jurgenwerk wants to merge 3 commits intomainfrom
retire-structured-update-tools

Conversation

@jurgenwerk
Copy link
Copy Markdown
Contributor

Finishes the CS-10883 retirement that was deferred from PR #4633. The five wrapper tools that used to construct tracker-card JSON for the agent — update_project, update_issue, create_knowledge, create_catalog_spec, add_comment — are removed. The agent now writes those .json files via native Write (Claude) or write_file (OpenRouter) the same way it already writes .gts definitions and content-card instances. One workspace fs surface, no special cases for tracker cards.

What changed

The five tools are gone

buildFactoryTools no longer registers them; factory-tool-builder.ts loses ~330 lines (the five build*Tool functions, the readPatchDocument helper, the cardTypeSchemas config, per-type schema resolution, the now-orphaned buildCardDocument import).

Schemas live in the skill, not in tool params

software-factory-bootstrap/SKILL.md is rewritten to:

  • Spell out the JSON:API envelope (data / type: "card" / attributes / relationships / meta.adoptsFrom)
  • Give per-type adoptsFrom names (Project, Issue, KnowledgeArticle)
  • List required + optional attributes for each type with examples
  • Document Issue invariants the wrappers used to enforce silently — description is immutable; status may only transition to blocked or backlog; the orchestrator owns done / in_progress
  • Show a complete concrete example (Issue card)

software-factory-operations/SKILL.md adds matching guidance for the implementation phase: how to update tracker cards (Read → patch → Write back), how to append comments to an existing Issue, and the catalog-Spec shape (which adopts from https://cardstack.com/base/spec, not the tracker module).

Live tracker module URL flows into the prompt

AgentContext gains a darkfactoryModuleUrl? field. The wiring already computes it (it's <origin>/software-factory/darkfactory) and now plumbs it through ContextBuilder.buildForIssue into the system prompt template via prompts/system.md. The agent reads the URL from there and pastes it verbatim into meta.adoptsFrom.module.

Wiring slimmed

factory-issue-loop-wiring.ts drops the loadDarkFactorySchemas step at startup (one less prerender-pool roundtrip per factory boot, ~5 fewer seconds of cold-start). The cardTypeSchemas field on ToolBuilderConfig is gone.

Tests

  • factory-tool-builder.test.ts: drops the schemas-and-document-assembly module, the add_comment module, and the structured-tool path-arg tests. ~800 lines lighter. Adds a notOk(toolNames.includes(retired)) assertion so the regression doesn't silently come back.
  • factory-tool-executor.spec.ts: the Playwright spec was almost entirely test cases for the structured tools. Trimmed to its remaining unregistered-tool sanity check. ~390 lines lighter.

Verification

End-to-end factory run on a fresh realm (my-test-tier-1-1-new):

  • Outcome: all_issues_done
  • Outer cycles: 2 (bootstrap + sticky-note)
  • Validations: all 5 green (parse / lint / evaluate / instantiate / test)
  • QUnit: 11/11 tests passing
  • Bootstrap iteration: emits Project + 3 Knowledge Articles + Issue via Write with the right adoptsFrom. No structured tools invoked.
  • Sticky-note iteration: writes card def, test, 3 instances, Spec the same way.

pnpm lint (lint:js + lint:format + lint:types) clean. Unit suite green.

Tool surface after this PR

Claude backend MCP catalog: 7 tools (5 validators + 2 signals).
OpenRouter backend FactoryTool[]: read_file / write_file / edit_file (already present), search_realm, fetch_transpiled_module, run_command, 5 validators, 2 signals.

Known issue (separate ticket)

CS-11033 — the canUseTool workspace-scoping hook from PR #4633 is not actually enforced under permissionMode: 'dontAsk' + tools in allowedTools. The agent can Write to absolute paths outside the factory workspace and the SDK accepts it. Surfaced by this PR's E2E run when the model invented a /Users/jurgen/code/factory-tier-1/... path. The agent self-corrected after run_parse couldn't find the file, but that's accidental recovery, not enforcement. Tracking the fix separately.

Test plan

  • pnpm factory:go --agent claude --brief-url … --target-realm-url … --debug completes with all_issues_done
  • pnpm factory:go --agent openrouter --brief-url … --target-realm-url … --debug completes with all_issues_done
  • Bootstrap iteration writes Project / Knowledge Articles / Issue via Write (Claude) / write_file (OpenRouter) with data.meta.adoptsFrom.module matching the value named in the system prompt
  • Spec card adopts from https://cardstack.com/base/spec / Spec, with linkedExamples pointing at sample instances
  • No tool calls to update_project, update_issue, create_knowledge, create_catalog_spec, or add_comment in either run

🤖 Generated with Claude Code

jurgenwerk and others added 2 commits May 5, 2026 10:38
…SON:API

Drop the five structured wrapper tools that used to construct tracker-card
JSON for the agent: `update_project`, `update_issue`, `create_knowledge`,
`create_catalog_spec`, `add_comment`. CS-10883 always called for retiring
these once the skill teaches the JSON:API + adoptsFrom shape directly;
the partial PR (#4633) kept them while the skill rewrite caught up. This
finishes that work.

The agent now writes those `.json` files via native `Write` (Claude
backend) or `write_file` (OpenRouter backend) — same workspace fs surface
it already uses for `.gts` definitions and content-card instances. The
schemas the wrappers used to embed in tool parameters are documented in
the `software-factory-bootstrap` skill (envelope, attribute tables per
type, relationship shapes); the tracker module URL is plumbed through
`AgentContext.darkfactoryModuleUrl` into the system prompt so the agent
puts the right `meta.adoptsFrom.module` on each tracker card.

Issue invariants the wrappers used to enforce silently — `description`
immutability, restricted status transitions (`blocked` / `backlog` only)
— are now spelled out in the bootstrap skill as rules the agent must
follow itself. The validators (parse / instantiate / test) catch
malformed JSON; the immutability/transition rules don't have a
structural backstop yet, so they're prompt-enforced.

Wiring side effects:

- `factory-tool-builder.ts` loses ~330 lines (the five `build*Tool`
  functions, the `readPatchDocument` helper, `cardTypeSchemas` config,
  per-type schema resolution).
- `factory-issue-loop-wiring.ts` no longer fetches card-type schemas
  via `loadDarkFactorySchemas` at startup (one less prerender-pool
  roundtrip per factory boot, ~5 fewer seconds of warm-up).
- `factory-tool-builder.test.ts` drops the schemas-and-document-assembly
  module and the structured-tool path-arg tests (~800 lines).
- `factory-tool-executor.spec.ts` shrinks to its remaining
  unregistered-tool sanity check (~390 lines).

End-to-end run on a fresh realm: `outcome=all_issues_done` over 2 cycles,
all 5 validations green, 11/11 QUnit tests passing. Bootstrap iteration
emits Project + 3 Knowledge Articles + Issue via `Write` directly with
the right `adoptsFrom`; sticky-note iteration writes the card def, test,
3 instances, and Spec the same way.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…edTools

The `canUseTool` hook from PR #4633 was dead code: empirical testing
(scripts/canusetool-repro.ts) shows the SDK auto-approves any tool in
`allowedTools` and never invokes the hook. Because Write/Edit/etc were
in allowedTools under `permissionMode: 'dontAsk'`, the path-scoping
safety net never fired and the model could write to absolute paths
outside the factory workspace.

Fix:
  - permissionMode: 'dontAsk' → 'default' (only mode that actually
    invokes canUseTool for tools outside allowedTools)
  - Remove NATIVE_FS_TOOLS from allowedTools (they stay in `tools` so
    they remain available, just no longer auto-approved)
  - MCP factory tools stay in allowedTools — auto-approve is fine for
    our own in-process implementations

Verification:
  - scripts/canusetool-fix-verify.ts: hook fires for every Write call;
    in-workspace Write succeeds, absolute out-of-workspace Write is
    rejected
  - New unit test locks in both structural conditions (native fs out
    of allowedTools + permissionMode='default') so any future
    regression on either knob fails at unit-test time

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@jurgenwerk jurgenwerk marked this pull request as ready for review May 5, 2026 09:34
@jurgenwerk jurgenwerk requested review from a team, Copilot and habdelra May 5, 2026 09:34
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: fca56b85b7

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +152 to +154
// immutability, and so on. CS-10883 retired all five; the agent now
// writes those `.json` files directly via `Write` (Claude) /
// `write_file` (OpenRouter). The shapes and invariants are taught in
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Update Claude guidance when retiring tracker wrapper tools

Retiring the five tracker wrapper tools here leaves the Claude backend with conflicting instructions: buildSystemPrompt() still says “do NOT use Write or Edit for tracker-schema cards” and claims dedicated tools like update_issue/create_catalog_spec exist (packages/software-factory/src/factory-agent/claude-code.ts:430-436). After this change those tools are no longer registered, so the model is told both that it must not write tracker JSON directly and that it must not call non-existent tools, which can block bootstrap/issue updates in real runs.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in c0dfb5a.

The Claude-specific tool addendum in claude-code.ts:367-440 was stale — it still carried a "Critical rule — do NOT use Write or Edit for tracker-schema cards" + a list of retired tools (update_project / update_issue / etc.). Dropped the whole block; the addendum now explicitly says use native fs for every workspace file, including Projects/*.json, Issues/*.json, Knowledge Articles/*.json, Spec/*.json — pointing at the bootstrap and operations skills for the JSON:API shapes and Issue invariants. Verified with the 6 ClaudeCodeFactoryAgent tests + a real factory:go --agent claude end-to-end run that completed outcome=all_issues_done writing all tracker cards via native Write.

Comment on lines +98 to +101
"body": "...comment text, markdown allowed...",
"author": "factory-agent",
"createdAt": "<ISO timestamp>"
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Use the schema’s datetime field for appended comments

The new comment-shape example instructs agents to append createdAt, but the actual Comment card schema uses datetime (packages/software-factory/realm/darkfactory.gts:245-250). With this guidance, agent-written issue comments will miss the expected timestamp field, so comment metadata won’t match the tracker schema and UI/date rendering logic.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in c0dfb5a.

Confirmed against realm/darkfactory.gts:245-249 — the Comment field schema is { body, author, datetime }, no createdAt. Updated the operations skill to use datetime and added a note calling out that the Issue card itself has separate createdAt / updatedAt attributes (which the bootstrap skill correctly references) — the two are not the same field.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR completes the software-factory structured-tool retirement by removing the tracker-card wrapper tools and teaching the agent to write raw JSON:API tracker cards (Project / Issue / KnowledgeArticle / Spec) directly via the workspace filesystem, with the live darkfactoryModuleUrl surfaced in prompts.

Changes:

  • Removed the five structured tracker-card tools from buildFactoryTools and deleted the associated schema/document assembly logic.
  • Plumbed darkfactoryModuleUrl through the issue loop into the system prompt so the agent can set meta.adoptsFrom.module correctly.
  • Slimmed and updated test coverage to focus on the remaining tool surface and the canUseTool gating behavior.

Reviewed changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
packages/software-factory/tests/factory-tool-executor.spec.ts Removes integration coverage for retired structured tools; keeps unregistered-tool guard test.
packages/software-factory/tests/factory-tool-builder.test.ts Drops schema/tool tests for retired tools; adds regression assertion that retired tool names are absent.
packages/software-factory/tests/factory-agent-claude-code.test.ts Updates Claude agent options expectations for allowedTools/permissionMode and adds regression guard for CS-11033.
packages/software-factory/src/issue-loop.ts Adds darkfactoryModuleUrl to loop/context builder plumbing.
packages/software-factory/src/factory-tool-builder.ts Removes structured tracker update tool builders and schema-based parameter logic.
packages/software-factory/src/factory-prompt-loader.ts Passes darkfactoryModuleUrl into the system prompt template.
packages/software-factory/src/factory-issue-loop-wiring.ts Removes schema preloading step; forwards darkfactoryModuleUrl into the issue loop config.
packages/software-factory/src/factory-context-builder.ts Adds optional darkfactoryModuleUrl to AgentContext construction.
packages/software-factory/src/factory-agent/types.ts Extends AgentContext with darkfactoryModuleUrl.
packages/software-factory/src/factory-agent/openrouter.ts Includes darkfactoryModuleUrl in system prompt template variables.
packages/software-factory/src/factory-agent/claude-code.ts Changes allowedTools/permissionMode to ensure canUseTool gates native fs tools; includes darkfactoryModuleUrl in prompt vars.
packages/software-factory/scripts/smoke-tests/factory-tools-smoke.ts Updates smoke test tool-builder config after removing structured-tool requirements.
packages/software-factory/scripts/canusetool-repro.ts Adds a repro harness documenting canUseTool behavior across permission modes.
packages/software-factory/scripts/canusetool-fix-verify.ts Adds a faster verification harness for the CS-11033 gating fix.
packages/software-factory/prompts/system.md Adds explicit tracker schema module URL guidance for meta.adoptsFrom.module.
packages/software-factory/.agents/skills/software-factory-operations/SKILL.md Rewrites tracker-card workflow docs to “write JSON directly”; documents update/merge flow and spec shape.
packages/software-factory/.agents/skills/software-factory-bootstrap/SKILL.md Rewrites bootstrap docs to teach JSON:API envelope + adoptsFrom details and issue invariants.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

{
"body": "...comment text, markdown allowed...",
"author": "factory-agent",
"createdAt": "<ISO timestamp>"
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in c0dfb5a (same fix as the Codex P2 comment above) — the Comment field schema in darkfactory.gts:249 is datetime, not createdAt. Operations skill updated to use datetime, with a note distinguishing it from the Issue card's own createdAt / updatedAt attributes (those are correct in the bootstrap skill).

Comment on lines +35 to +45
# Tracker schema module URL

When you create a Project, Issue, or KnowledgeArticle JSON file, set
`data.meta.adoptsFrom.module` to **`{{darkfactoryModuleUrl}}`** — this
is the live module URL for the tracker schema in this run. The exact
`adoptsFrom.name` per file (`Project` / `Issue` / `KnowledgeArticle`)
and the JSON:API document shape are documented in the
software-factory-bootstrap and software-factory-operations skills.

Catalog Spec cards adopt from `https://cardstack.com/base/spec` /
`Spec` instead, regardless of the target realm.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in c0dfb5a — both halves:

  1. add_comment reference removed: prompts/system.md:21-22 no longer points at the retired tool. The new wording directs the agent to append a new entry to the issue's attributes.comments[] array (Read the issue, push a Comment, Write the document back), with a pointer to the operations skill for the comment shape.

  2. Empty darkfactoryModuleUrl now fails fast: added requireDarkfactoryModuleUrl(context) in factory-prompt-loader.ts that throws if the field is missing or empty. Wired it into assembleSystemPrompt, claude-code.ts:339, and openrouter.ts:334 — replacing the silent ?? '' fallback. In production the value is always set by factory-issue-loop-wiring.ts via inferDarkfactoryModuleUrl(targetRealmUrl); a missing value indicates a plumbing bug, so failing the system-prompt render is the correct response. Test fixtures updated to populate the field.

darkfactoryModuleUrl: context.darkfactoryModuleUrl ?? '',
skills,
});

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in c0dfb5a — same fix as the Codex P1 comment above. The Claude-specific addendum in claude-code.ts no longer says "do NOT use Write or Edit for tracker-schema cards" or references the retired structured tools. It now explicitly tells the agent to use native fs for every workspace file including Projects/*.json / Issues/*.json / Knowledge Articles/*.json / Spec/*.json, with pointers to the bootstrap and operations skills for the JSON:API shapes and Issue invariants the agent must enforce itself.

Comment on lines 391 to 395
return loader.load('system', {
targetRealmUrl: context.targetRealmUrl,
darkfactoryModuleUrl: context.darkfactoryModuleUrl ?? '',
skills,
});
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in c0dfb5a — same fix as the prompt-section comment above. The silent ?? '' fallback is gone. Replaced by requireDarkfactoryModuleUrl(context) in factory-prompt-loader.ts which throws if the field is missing or empty. All three call sites (assembleSystemPrompt, claude-code.ts, openrouter.ts) now go through the helper, so a missing value fails the prompt render with a clear error instead of producing tracker cards with empty adoptsFrom.module. Test fixtures populate the field with a stable URL so the wired guard doesn't break unit tests.

Codex + Copilot reviews flagged three real bugs introduced by the Tier 1
retirement:

1. Stale Claude-side prompt addendum in claude-code.ts still said "do
   NOT use Write or Edit for tracker-schema cards" and listed retired
   tools (update_project / update_issue / create_knowledge /
   create_catalog_spec / add_comment). Drop the entire structured-tool
   block; rewrite the addendum to direct the agent toward native fs for
   every workspace file (including tracker JSON).

2. prompts/system.md Rules section still pointed at the retired
   add_comment tool. Replace with the new comments[]-array workflow.

3. Comment shape doc in software-factory-operations skill used
   `createdAt`, but the Comment field schema in darkfactory.gts is
   `datetime`. Following that guidance produced invalid comment
   objects. Fix to `datetime` and call out the contrast with the Issue
   card's own createdAt/updatedAt fields.

Plus a fail-fast for `darkfactoryModuleUrl`: the prior `?? ''` fallback
silently produced bad `meta.adoptsFrom.module` values when the field
was missing. Add `requireDarkfactoryModuleUrl()` helper that throws,
and use it in claude-code.ts, openrouter.ts, and assembleSystemPrompt.
Test fixtures updated to populate the field.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@lukemelia lukemelia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there / should there be a tool to get the schema for a card def type?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants