diff --git a/packages/software-factory/.agents/skills-orchestrator/software-factory-bootstrap/SKILL.md b/packages/software-factory/.agents/skills-orchestrator/software-factory-bootstrap/SKILL.md new file mode 100644 index 00000000000..8d8319e22d6 --- /dev/null +++ b/packages/software-factory/.agents/skills-orchestrator/software-factory-bootstrap/SKILL.md @@ -0,0 +1,255 @@ +--- +name: software-factory-bootstrap +description: Use when processing a bootstrap issue — covers how to create Project, IssueTracker, KnowledgeArticle, and implementation Issue cards from a brief. +--- + +# Software Factory Bootstrap + +Use this skill when the current issue has `issueType: bootstrap`. Your job is +to read the brief, create project artifacts, and set up the issue backlog for +the implementation phase. + +## How to write tracker-schema cards + +Project, KnowledgeArticle, and Issue cards are plain `.json` files in +the workspace. Use the native `Write` tool with the exact JSON:API +document shape documented below for each card type. + +**The system prompt names the live tracker module URL** (the value you +should put in `data.meta.adoptsFrom.module` for Project / Board / Issue / +KnowledgeArticle cards). Use that URL verbatim — do not try to derive it. + +| File | adoptsFrom.name | +| ---------------------------------- | ------------------ | +| `Projects/<slug>.json` | `Project` | +| `Boards/<slug>.json` | `IssueTracker` | +| `Knowledge Articles/<slug>-*.json` | `KnowledgeArticle` | +| `Issues/<slug>-<card-name-slug>.json` | `Issue` | + +For each card, the document is: + +```json +{ + "data": { + "type": "card", + "attributes": { ... per the live schema (see below) ... }, + "relationships": { ... per the live schema (see below) ... }, + "meta": { + "adoptsFrom": { + "module": "<darkfactoryModuleUrl from system prompt>", + "name": "<adoptsFrom.name from the table above>" + } + } + } +} +``` + +### Fetch the live schema before writing + +Do **not** memorize attribute names, enum values, or relationship keys +for these cards — they evolve. 
Before writing a Project / Issue / +KnowledgeArticle JSON file, call: + +``` +get_card_schema({ module: "<darkfactoryModuleUrl>", name: "Project" }) +``` + +(and the same for `Issue` / `KnowledgeArticle`). The tool returns the +live `{ attributes, relationships? }` JSON Schema introspected from the +real `CardDef` — including the allowed enum values for fields like +`status` / `priority` / `issueType` / `articleType` / `projectStatus` +and the relationship keys (`project`, `relatedKnowledge`, +`knowledgeBase`, `blockedBy`, etc.). Use the field names, types, and +enums it returns verbatim. Schemas are cached per-process, so repeated +calls are cheap. + +Catalog Spec cards (`Spec/<card-name>.json`) are different — they adopt from +`https://cardstack.com/base/spec` / `Spec`. Fetch their schema the same +way: `get_card_schema({ module: "https://cardstack.com/base/spec", +name: "Spec" })`. The catalog spec workflow is documented in the +software-factory-operations skill. + +## Naming Conventions + +Derive names from the brief title: + +- **slug**: lowercase, replace non-alphanumeric runs with hyphens, strip leading/trailing hyphens + - `"Sticky Note"` → `sticky-note` +- **projectCode**: 2-4 uppercase initials from the title words + - `"Sticky Note"` → `SN` + - `"Employee Handbook"` → `EH` + - `"Customer Relationship Manager"` → `CRM` (first 3-4 words) + +## Card Authoring Guidance + +The field/relationship shapes for each card type are fetched at runtime +via `get_card_schema` (see "Fetch the live schema before writing" +above). This section covers what is **not** in the schema: what to put +in those fields and how to organize the bootstrap output. + +### Project Card + +**Path:** `Projects/<slug>.json` +**adoptsFrom.name:** `Project` + +Fetch the schema, then populate the attributes from the brief: + +- The project-name attribute → the brief's title (e.g. `"Sticky Note"`). +- The project-code attribute → 2–4 uppercase initials from the title (e.g. `"SN"`). 
+- The objective / scope / technical-context / success-criteria attributes → derive from the brief content. Use markdown. +- The status attribute → use one of the enum values returned by the schema for an active project (typically the "active" / starting state — the schema's enum is the source of truth, never guess). + +**Relationships:** the schema names the array relationship that links a +project to its knowledge articles. Populate one entry per article you +create (paths like `../Knowledge Articles/<slug>-<article-slug>`). + +- `board` → `{ links: { self: "../Boards/<slug>" } }` +- `knowledgeBase.0` → `{ links: { self: "../Knowledge Articles/<slug>-<article-slug>" } }` (one entry per article) + +### IssueTracker Card + +**Path:** `Boards/<slug>.json` +**adoptsFrom:** `{ module: "<darkfactoryModuleUrl>", name: "IssueTracker" }` + +Create one board per bootstrapped project. It is the canonical board for that +project's issues and should be linked both ways with the Project card. + +| Field | Type | Example | +| ------------------ | ------- | ----------------- | +| `boardTitle` | String | `"<title> Board"` | +| `hideEmptyColumns` | Boolean | `false` | + +**Relationships:** + +- `project` → `{ links: { self: "../Projects/<slug>" } }` + +### KnowledgeArticle Card + +**Paths:** `Knowledge Articles/<slug>-<article-slug>.json` (as many as needed) +**adoptsFrom.name:** `KnowledgeArticle` + +Always create at least two articles: + +- **Brief Context** (`<slug>-brief-context`) — full brief content and background. +- **Agent Onboarding** (`<slug>-agent-onboarding`) — how to work on this project. + +Add more as the brief warrants (e.g., detailed visual design, deep +domain knowledge). Keep each article cohesive with a clear guiding +principle. Use the `articleType` enum values returned by the schema to +classify each one (e.g., one for context, one for onboarding); use the +schema's enum literally. 
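
The naming rules under "Naming Conventions" and the two required article paths can be sketched in TypeScript. This is a minimal sketch, not the factory's own implementation: `toSlug`, `toProjectCode`, and `knowledgeArticlePaths` are hypothetical helper names, and the one-word fallback in `toProjectCode` is an assumption, since the conventions do not say what to do with a single-word title.

```typescript
// Hypothetical helpers for illustration only; these names are not factory tools.

// slug: lowercase, collapse non-alphanumeric runs into one hyphen, trim edge hyphens.
function toSlug(title: string): string {
  return title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// projectCode: uppercase initials of the first (up to) four title words.
function toProjectCode(title: string): string {
  const words = title.split(/[^A-Za-z0-9]+/).filter((w) => w.length > 0);
  if (words.length === 1) {
    // Assumption: a one-word title falls back to its first two letters.
    return words.join("").slice(0, 2).toUpperCase();
  }
  return words
    .slice(0, 4)
    .map((w) => w.charAt(0).toUpperCase())
    .join("");
}

// Workspace paths of the two articles every bootstrap creates.
function knowledgeArticlePaths(slug: string): string[] {
  return ["brief-context", "agent-onboarding"].map(
    (articleSlug) => `Knowledge Articles/${slug}-${articleSlug}.json`,
  );
}
```

With these helpers, `toSlug("Sticky Note")` gives `sticky-note` and `toProjectCode("Customer Relationship Manager")` gives `CRM`, matching the examples in the naming conventions.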
+ +### Issue Card — Organized by Entry-Point Card + +**Paths:** `Issues/<slug>-<card-name-slug>.json` (one per entry-point card, named after the card) +**adoptsFrom.name:** `Issue` + +Organize implementation issues around **entry-point cards** — the top-level cards users interact with directly and that should be discoverable in the catalog. Create **one issue per entry-point card**, named after that card. + +Each issue covers the full scope of its entry-point card: + +- Card definition (`.gts`) and any interior/support cards it depends on +- QUnit tests (`.test.gts`) for the entry-point card **and** all its support cards +- Catalog Spec (`Spec/<card-name>.json`) with realistic example instances via `linkedExamples` + +Interior cards (field cards, helper cards, linked supporting types) are +implemented as part of their entry-point card's issue. They need tests +but do **not** need their own catalog specs or separate issues. + +**Populating attributes** (consult the schema for the exact field names and enum values): + +- The issue-id attribute → `"<projectCode>-<N>"` (sequential). +- The summary attribute → `"Implement <card name> card"`. +- The description attribute → markdown describing the card to create, its fields, support cards, tests, spec, and examples. **Immutable after creation** — see Issue Invariants below. +- The issue-type / status / priority attributes → use the enum values returned by the schema. For a fresh bootstrap, all issues start in the "backlog" state with issue-type "feature". Mark the first issue as the highest non-critical priority and the rest a step lower. +- An `order` field (sequential integer 1, 2, 3, …) for the scheduler. +- The acceptance-criteria attribute → markdown checklist: card def, support cards, tests, spec, examples. +- Timestamp fields → ISO timestamps. Note that on Issues these are top-level attributes (e.g. 
`createdAt` / `updatedAt`) — distinct from the timestamp field inside individual `comments[]` entries (which is named `datetime`, see operations skill). The schema is the source of truth. + +**Relationships for each issue** (the schema names the keys): + +- A `project` link → `../Projects/<slug>`. +- Knowledge-article links — one entry per knowledge article you want loaded into the agent's context (typically the brief-context and agent-onboarding articles you created above). +- A blocked-by relationship for any issues that must complete first. + +**Dependency ordering:** If one entry-point card depends on another +(e.g., card B uses card A as a field type or linked card), order the +issues so the depended-upon card is implemented first. Set `order` +values accordingly (dependency-free cards get lower order numbers) and +wire the blocked-by relationship so consuming cards cannot start until +their dependencies are done. + +If the brief describes only one entry-point card, create one issue. If it describes multiple, create one per entry-point card ordered so dependency cards come first. + +## Issue Invariants — read carefully + +The orchestrator depends on three rules about Issue cards. Before this +skill rewrite they were enforced by a wrapper tool that stripped / +ignored disallowed fields automatically. Now that you write the JSON +directly, you must enforce them yourself: + +1. **`description` is immutable after creation.** Never modify an + Issue's `description` once the card exists. To add post-creation + context (blocked reasons, validation failures, progress notes), use + the `comments` array instead — see "Adding a comment to an existing + issue" in the operations skill. +2. **`status` transitions are restricted to the agent.** You may set + `status` to `"blocked"` (cannot proceed) or `"backlog"` (unblock). + Never set `status` to `"done"` or `"in_progress"` — those are owned + by the orchestrator based on `signal_done` + validation results. +3. 
**Read before write for updates.** When updating an existing Issue + (or any tracker card), `Read` the file first, modify only the + attributes you intend to change, then `Write` (or `Edit`) the merged + document back. Do not overwrite the whole file with only the new + fields — you'll silently drop existing attributes the file had. + +## Why Relationships Matter + +The `project` and `relatedKnowledge` relationships on implementation issues are +how the orchestrator loads context for the agent. When the agent picks up an +issue, `ContextBuilder.buildForIssue()` traverses these relationships to +load the Project card and Knowledge Articles into the agent's context. Without +these relationships, the agent would have no project scope or brief content. + +## Document Envelope + +All three card types share the same JSON:API envelope — only the +`attributes`, `relationships`, and `adoptsFrom.name` differ. The +attribute names, enum values, and relationship keys come from the +schema you fetched with `get_card_schema`; the envelope is fixed: + +```json +{ + "data": { + "type": "card", + "attributes": { + // populate per the schema returned by: + // get_card_schema({ module: "<darkfactoryModuleUrl>", name: "Issue" }) + }, + "relationships": { + // each relationship key from the schema points at a sibling + // card via a relative path; arrays use indexed keys (key.0, key.1, …) + "<project-relationship-key>": { + "links": { "self": "../Projects/<slug>" } + } + }, + "meta": { + "adoptsFrom": { + "module": "<darkfactoryModuleUrl from system prompt>", + "name": "Issue" + } + } + } +} +``` + +Use relative paths (`../`) for `links.self` since cards live in sibling +directories within the workspace. The same envelope applies to Project +and KnowledgeArticle — only the `adoptsFrom.name` and the +schema-derived attributes/relationships change. + +## Completion + +After creating all artifacts, call `signal_done()`. 
The orchestrator manages +issue status transitions — do NOT set the issue status to "done" yourself. +The orchestrator will mark the issue as done after validation passes. diff --git a/packages/software-factory/.agents/skills-orchestrator/software-factory-operations/SKILL.md b/packages/software-factory/.agents/skills-orchestrator/software-factory-operations/SKILL.md new file mode 100644 index 00000000000..bb9ed5d37a9 --- /dev/null +++ b/packages/software-factory/.agents/skills-orchestrator/software-factory-operations/SKILL.md @@ -0,0 +1,478 @@ +--- +name: software-factory-operations +description: Use when implementing cards in a target realm through the factory execution loop — covers the tool-use workflow for searching, writing, testing, and updating issues via factory tools. +--- + +# Software Factory Operations + +Use this skill when operating inside the factory execution loop. Workspace +files (card definitions, instances, tests) live in a **local workspace +mirror of the target realm** that the orchestrator syncs back to the realm +between iterations. Realm-side operations (search, host commands, runtime +validators) and control signals are **factory tools** the agent invokes +directly. + +## Realm Roles + +- **Source realm** (`packages/software-factory/realm`) + Publishes shared modules, briefs, templates, and tracker schema. Never write to this realm. +- **Target realm** (user-specified, passed to `factory:go`) + Receives all generated artifacts: Project, Issue, KnowledgeArticle, card definitions, card instances, Catalog Spec cards, and QUnit test files. + +## Workspace files (local mirror of target realm) + +The agent's working directory is the workspace — the local mirror of the +target realm that the orchestrator syncs back between iterations. Use the +**native** `Read`, `Write`, `Edit`, `Glob`, `Grep`, and `Bash` tools on +these files; the workspace `cwd` is set for you, so realm-relative paths +resolve directly. 
+ +These files live in the workspace: + +- Card definitions: `*.gts` +- Card tests: `*.test.gts` +- Content card instances under `<CardType>/<id>.json` (the user data the + cards represent — e.g. `StickyNote/note-1.json`) +- Tracker-schema cards: `Projects/<slug>.json`, `Issues/<slug>.json`, + `Knowledge Articles/<slug>.json`, `Spec/<slug>.json` + +`Bash` is also available for `boxel` CLI commands: + +- Read-only inspection: `boxel status`, `boxel history`, `boxel search`, + `boxel read-transpiled`. +- `boxel run-command` — dispatches to whatever host command you specify. + Most specifiers are read-only inspection commands (`get-card-type-schema`, + `evaluate-module`, `instantiate-card`), but the surface itself is generic; + treat it as "as safe as the named command." + +See the **Realm-side reads** section below for the full usage. + +**Inspect before writing.** Read or grep the file you plan to change, and +glob for sibling files (e.g. existing card definitions in the same +directory) before creating new ones. + +## Tracker-schema cards — write JSON directly + +Project, IssueTracker, Issue, KnowledgeArticle, and Spec cards are plain +`.json` files in the workspace. Use `Write` to create them; to update one, +`Read` it, then either `Edit` the relevant attributes or `Write` the +merged document back — same workspace fs surface as `.gts` files. 
+ +| File path | adoptsFrom | +| -------------------------------- | -------------------------------------------------------------- | +| `Projects/<slug>.json` | `{module: "<darkfactoryModuleUrl>", name: "Project"}` | +| `Boards/<slug>.json` | `{module: "<darkfactoryModuleUrl>", name: "IssueTracker"}` | +| `Issues/<slug>.json` | `{module: "<darkfactoryModuleUrl>", name: "Issue"}` | +| `Knowledge Articles/<slug>.json` | `{module: "<darkfactoryModuleUrl>", name: "KnowledgeArticle"}` | +| `Spec/<slug>.json` | `{module: "https://cardstack.com/base/spec", name: "Spec"}` | + +`<darkfactoryModuleUrl>` is named in the system prompt — use that value +verbatim. + +**Always fetch the live schema before writing.** Field names, enum values, +and relationship keys for each card type are introspected at runtime — +never hard-coded in this skill. Call +`get_card_schema({ module, name })` for the card you're about to write +and use the returned `{ attributes, relationships? }` JSON Schema to shape +the document. The bootstrap skill covers the bootstrap-specific attribute +population guidance; this skill covers the operational patterns +(read-before-write, comments, invariants) that layer on top. + +**Read before write.** When updating any tracker card, `Read` the file +first, change only the attributes you intend to update, then write the +merged document back. Don't overwrite the whole file with only your new +fields — you'll silently drop the existing attributes. + +**Issue invariants you must enforce yourself** (these used to be enforced +by a wrapper tool; they aren't anymore): + +- **`description` is immutable** after the issue is created. If you need + to add context — blocked reasons, progress notes, validation failures, + clarification requests — append to the `comments` array instead. See + "Adding a comment to an existing issue" below. +- **Status transitions are restricted.** You may set `status` to + `"blocked"` (cannot proceed) or `"backlog"` (unblock). 
Never set + `status` to `"done"` or `"in_progress"` — those are owned by the + orchestrator based on `signal_done` + validation results. + +### Adding a comment to an existing issue + +Issue cards carry a containsMany comments array on `attributes`. To +append a comment: + +1. Call `get_card_schema({ module: "<darkfactoryModuleUrl>", name: "Issue" })` + if you don't already have the Issue schema cached. The comments array + entry is itself an object with its own field shape — use the field + names returned by the schema (the body / author / timestamp fields) + verbatim. The timestamp field on a comment is **not** the same as the + Issue's own top-level `createdAt` / `updatedAt` attributes; the schema + disambiguates them. +2. `Read` the issue's `.json`. +3. Append a new entry to the comments array on `data.attributes`, + populating the body (markdown comment text), the author (e.g. + `"factory-agent"` or `"orchestrator"`), and the comment-timestamp + field (ISO timestamp). +4. `Write` (or `Edit`) the document back. **Do not modify the description + or any other attribute** — comments are append-only. + +### Catalog Spec card shape + +Spec cards (`Spec/<slug>.json`) adopt from +`https://cardstack.com/base/spec` / `Spec`, **not** from the tracker +module. Fetch the live schema before writing: + +``` +get_card_schema({ module: "https://cardstack.com/base/spec", name: "Spec" }) +``` + +Use the returned `{ attributes, relationships? }` to shape the document. +What the schema does **not** tell you and you must supply yourself for +entry-point cards: + +- A display title and short description suitable for the catalog. +- The spec-type field set to the enum value the schema returns for + card-style specs (vs. apps, fields, etc.). +- A code-ref attribute pointing at your `.gts` definition, formatted as + `{module: "../<slug>", name: "<PascalClass>"}` (relative path, no + `.gts` extension) so the spec resolves the definition relative to + itself. 
+- A markdown usage guide for the catalog page. +- A linked-examples relationship populated with one or more sample + instances: + + ```json + "<linked-examples-key>": [ + { "links": { "self": "../<CardType>/<instance-id>" } }, + ... + ] + ``` + + (The schema names the relationship key.) + +The full document envelope is the same as for tracker cards (`data` / +`type: "card"` / `attributes` / `relationships` / `meta.adoptsFrom`), +just with the `https://cardstack.com/base/spec` adoptsFrom. + +## Realm-side reads (via `boxel` CLI) + +For operations that need to reach the realm runtime — searching the +indexed cards, fetching transpiled JS, running host commands — shell out +via `Bash` to the `boxel` CLI. These never go through the workspace fs. + +- **Search the target realm** for cards using a structured query + (filter, sort, page). Use this to check for existing cards, find + duplicates, or inspect project state. + ``` + boxel search --realm <target-realm-url> --query '<json>' --json + ``` + Single-quote the entire JSON object so the shell does not expand or + split it; keep keys and string values double-quoted inside. Pipe + through `jq` to project. **For the full query syntax (filter / eq / + contains / range / every / any / not / sort / page, CodeRef matching, + common mistakes) see the `boxel-api` skill.** +- **Fetch the transpiled JavaScript** for a `.gts` module — used only + when an eval/instantiate error reports a line/column number, since + those numbers reference the transpiled output, not your `.gts` source. + ``` + boxel read-transpiled <realm-relative-path> --realm <target-realm-url> + ``` + The `.gts` extension is optional. Pipe through `sed -n '<line>p'` (or + wrap with `awk`) to inspect a single line. See the **Debugging + Runtime Evaluation Errors** section below for when to reach for this. 
+- **Run any other host command** in the realm's prerendered runtime + (module evaluation, card instantiation, anything else exposed at + `@cardstack/boxel-host/commands/<name>/default`): + ``` + boxel run-command <command-specifier> --realm <target-realm-url> --input '<json>' --json + ``` + Most agent tasks won't need this — the validators below already wrap + the common host commands. See the `boxel-command` skill for the + programmatic surface and failure modes. + +### Fetching live card-type schemas + +`get_card_schema({ module, name })` returns the live JSON Schema +(`{ attributes, relationships? }`) for any `CardDef`, introspected from +the actual class via the realm server's prerenderer (the same path the +AI Bot uses for its patch-tool schemas). Always call this before writing +a tracker card (Project / Issue / KnowledgeArticle), a Spec card, or any +other card whose shape you need to know. Schemas are cached per-process, +so repeated calls with the same code ref are free. + +## Self-Validation (optional, in-memory results) + +All five validators are factory tools, safe to call repeatedly mid-turn. +They return in-memory result objects and **do not persist any durable +validation cards** — the orchestrator still runs the full validation +pipeline (which persists `TestRun` / `LintResult` / `ParseResult` / +`EvalResult` / `InstantiateResult` cards) after `signal_done`, so calling +any of these mid-turn is optional. + +**Side effect to know about:** the realm-touching validators +(`run_evaluate`, `run_instantiate`, `run_tests`) sync your workspace to +the realm before invoking the prerenderer, so they push whatever you've +just written. That's the same write the orchestrator's between-iteration +sync would have done — it's not destructive, but it does mean calling +these tools is the moment your local writes hit the realm. The lighter +validators (`run_lint`, `run_parse`) run entirely in-process and don't +touch the realm. + +- `run_lint({ path? 
})` — Run ESLint + Prettier (with `@cardstack/boxel` + rules) and return an in-memory `RunLintResult` with `status`, + `filesChecked`, `filesWithErrors`, `errorCount`, `warningCount`, + `durationMs`, `lintableFiles`, and per-violation `{ rule, file, line, +column, message, severity }`. Without `path`, lints every `.gts` / + `.gjs` / `.ts` / `.js` file in the target realm. With `path` + (realm-relative file path), lints **only that one file** — prefer this + right after writing or editing a single file. +- `run_tests()` — Run the realm's QUnit suite and receive an in-memory + result object `{ status, passedCount, failedCount, skippedCount, +durationMs, testFiles, failures, errorMessage? }`. Use it when you + want feedback before signalling done. +- `run_parse({ path? })` — Parse and type-check files in the target + realm and return an in-memory `RunParseResult` with `status`, + `filesChecked`, `filesWithErrors`, `errorCount`, `durationMs`, + `parseableFiles`, and per-error `{ file, line, column, message }`. + Without `path`, runs glint (ember-tsc) over every `.gts` / `.gjs` / + `.ts` file in the realm AND validates every `.json` file listed as a + Spec `linkedExample` (same discovery as the parse validation step). + With `path` (realm-relative file path), parses **only that one file** + — `.gts` / `.gjs` / `.ts` runs through glint; `.json` is parsed and + checked for card document structure. The extension is required; + `parseableFiles` entries are always returned in the `.json` / `.gts` + / `.gjs` / `.ts` form, so you can feed any of them straight back into + `path`. Prefer the single-file form right after writing or editing one + file. +- `run_evaluate({ path? })` — Evaluate ESM modules (`.gts` / `.gjs` / + `.ts` / `.js`) in the target realm via the prerenderer sandbox and + return a `RunEvaluateResult` (status, module counts, per-failure + `{ path, error, stackTrace? }`). Without `path`, evaluates every + non-test evaluable module. 
With `path`, evaluates only that single + realm-relative file — handy for a quick self-check right after writing + one module. Test files (`*.test.*`) are rejected — the test runner + validates those. The tool bound-polls past the brief read-after-write + window where the realm has the source on disk but indexing hasn't + populated the module map yet, so a returned failure is a real failure + — don't retry on the agent side. When a failure reports a line/column, + those numbers refer to the transpiled module — pair with + `boxel read-transpiled` (see Realm-side reads above) to locate the + offending source construct, then fix the `.gts` source (never copy + transpiled patterns back into source). +- `run_instantiate({ path? })` — Instantiate card example instances in + the target realm via the prerenderer sandbox and return a + `RunInstantiateResult` (status, instance counts, per-failure `{ path, +cardName, error, stackTrace? }`). Without `path`, searches the realm + for Spec cards and instantiates every `linkedExample` on every + card/app Spec; specs with no `linkedExamples` still get a bare + instantiation to exercise the card class. With `path`, instantiates + only that single realm-relative `.json` example file — its + `meta.adoptsFrom` supplies the module + card name, and spec discovery + is skipped entirely so you can self-check one instance in isolation. + The `path` argument must end in `.json`. `instanceFiles` only contains + real `.json` example paths (bare-instantiation fallbacks are filtered + out) so any entry can be fed straight back into `path`. If a bare + instantiation fails, its failure entry has `path: ''` and a populated + `cardName` — identify the spec by `cardName` and do NOT pass the empty + path back into `path`. The tool bound-polls past the brief + read-after-write window where the realm has the source on disk but + indexing hasn't populated the module map yet, so a returned failure + is a real failure — don't retry on the agent side. 
When a failure + reports a line/column, those numbers refer to the transpiled module — + pair with `boxel read-transpiled` (see Realm-side reads above) to + locate the offending source construct, then fix the `.gts` source + (never copy transpiled patterns back into source). + +## Control Flow + +- `signal_done()` — Signal that the current issue is complete. Call this + only after all implementation and test files have been written. +- `request_clarification({ message })` — Signal that you cannot proceed + and need human input. Describe what is blocking. + +## Required Flow + +1. **Inspect before writing.** Search the target realm for existing + cards (`boxel search --realm <url> --query '<json>'` via `Bash` — + see Realm-side reads above, with full syntax in the `boxel-api` + skill). Read or grep the workspace files you plan to change (or + sibling files in the same directory) before creating or modifying + anything. +2. **Write card definitions** (`.gts`) into the workspace. +3. **Write `.test.gts` test files** co-located with card definitions. + Every issue must have at least one test file. **Write tests + immediately after the card definition, before any instances or + catalog specs.** +4. **Write card instances** (`.json`) into the workspace. +5. **Write a Catalog Spec card** (`Spec/<card-name>.json`) — adoptsFrom + `https://cardstack.com/base/spec` / `Spec`. Link sample instances via + `relationships.linkedExamples`. +6. **(Optional) Call `run_tests()`** to self-validate before signalling + done. This returns test results in-memory without writing any realm + artifacts. Iterating on your own work with `run_tests` is faster than + round-tripping through the orchestrator pipeline. +7. **Call `signal_done()`** when all implementation and test files are + written. The orchestrator runs the full validation pipeline (which + persists a `TestRun` card, among other artifacts) automatically after + this. +8. 
**If tests fail**, the orchestrator feeds failure details back. + Re-read the affected workspace files, fix them, and call + `signal_done()` again. +9. **Record progress** by appending to the issue's `comments` array + (Read + Edit the issue JSON). Never modify the issue's `description`. + +## Target Realm Artifact Structure + +``` +target-realm/ +├── card-name.gts # Card definition +├── card-name.test.gts # QUnit test (co-located) +├── CardName/ +│ └── sample-instance.json # Card instance +├── Spec/ +│ └── card-name.json # Catalog Spec card +├── Validations/ +│ ├── test_issue-slug-1.json # TestRun card (test results) +│ └── lint_issue-slug-1.json # Lint result card +├── Projects/ +│ └── project-name.json # Project card +├── Issues/ +│ └── issue-slug.json # Issue card +└── Knowledge Articles/ + └── article-name.json # KnowledgeArticle card +``` + +## Debugging Runtime Evaluation Errors + +Eval-step and instantiate-step validation failures surface line/column +references that point to the **transpiled** JavaScript output, not the +`.gts` source you wrote. The realm compiles `.gts` to JS before execution +and runtime errors reference the compiled output. + +When a validation error contains text like +`(error occurred in '/.../sticky-note.gts' @ line 66 : column 32)`, the +line number is for the transpiled module. Fetch the transpiled output +and read the reported line to see what compiled construct raised the +error — then reason back to the `.gts` source construct that produced +it. + +``` +boxel read-transpiled sticky-note.gts --realm <target-realm-url> +``` + +Pipe through `sed -n '60,70p'` (or similar) to focus on a window around +the reported line. + +For example, `" is not a valid character within attribute names: (error +occurred in '/.../sticky-note.gts' @ line 66 : column 32)` typically +points inside a `precompileTemplate(...)` block in the transpiled +output. 
The actual fault in the source is often in a CSS comment or a +template expression — line 66 in your `.gts` source is unrelated. +Reading the transpiled line is what connects the error back to the +source. + +### The transpiled output is for DEBUGGING ONLY — never for implementation + +**Scope:** the transpiled fetch (`boxel read-transpiled`) is only for +investigating **runtime errors in `.gts` modules you have already +written** — when an eval or instantiate validation failure points to a +line/column in the transpiled output and you need to map that +coordinate back to your source. It is not for learning how to write +cards, not for understanding Boxel patterns, and not a general +reference. + +- **Do not copy patterns, imports, or shapes from the transpiled output + into your `.gts` source.** The transpiler emits artifacts like + `setComponentTemplate(...)`, `precompileTemplate(...)`, wire-format + template arrays, base64 CSS imports (`./file.gts.CiAg...`), and other + compiler internals. None of those belong in source code. +- **Do not write `.gts` that "looks like" the compiled JS.** Always + write clean, idiomatic Ember / `<template>`-tag / CardDef / FieldDef + source. If you find yourself tempted to hand-write a + `setComponentTemplate(...)` call or a wire-format template, stop — + you're modeling the wrong layer. +- **Always edit the `.gts` source, never the transpiled output.** The + realm regenerates the transpiled JS on every write, so any edit + there is silently discarded. +- **When in doubt, favor idiomatic card development practices.** The + `boxel-development` skill and existing cards in the target realm are + the right references — not what the compiler happens to emit. + +Use the transpiled fetch the way a developer uses a source map: to +translate a runtime line number back to a source construct in the code +**you wrote**, then close the transpiled view and fix the source +idiomatically. 
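
The mapping from an error message to a window of transpiled lines can be sketched as below. This is a rough sketch under stated assumptions: `parseTranspiledLocation` and `sedWindow` are hypothetical helper names, and the regular expression assumes the error text follows the `(error occurred in '<path>' @ line N : column M)` shape quoted in the example above; other validators may format locations differently.

```typescript
// Hypothetical helpers for illustration only; these names are not factory tools.
interface TranspiledLocation {
  file: string;
  line: number;
  column: number;
}

// Extract file, line, and column from a validation error message.
// Returns null when the message carries no location in the assumed format.
function parseTranspiledLocation(message: string): TranspiledLocation | null {
  const pattern = /\(error occurred in '([^']+)' @ line (\d+) : column (\d+)\)/;
  const match = pattern.exec(message);
  if (match === null) {
    return null;
  }
  return {
    file: String(match[1]),
    line: Number(match[2]),
    column: Number(match[3]),
  };
}

// Build a `sed -n` range covering `radius` lines on each side of the reported line.
function sedWindow(line: number, radius: number = 5): string {
  const start = Math.max(1, line - radius);
  const end = line + radius;
  return `sed -n '${start},${end}p'`;
}
```

The returned range is meant to be piped after `boxel read-transpiled <realm-relative-path> --realm <target-realm-url>`. The reported line and column still refer to the transpiled module, so the fix itself belongs in the `.gts` source.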
+ +## Writing QUnit Card Tests + +Test files are `.test.gts` files co-located with card definitions in the +target realm. Each test file exports a `runTests()` function that +registers QUnit modules and tests. + +### Example Test + +```typescript +// sticky-note.test.gts — co-located with sticky-note.gts +import { module, test } from 'qunit'; +import { setupCardTest } from '@cardstack/host/tests/helpers'; +import { renderCard } from '@cardstack/host/tests/helpers/render-component'; +import { getService } from '@universal-ember/test-support'; + +let cardModuleUrl = new URL('./sticky-note', import.meta.url).href; + +export function runTests() { + module('StickyNote', function (hooks) { + setupCardTest(hooks); + + test('renders title in fitted view', async function (assert) { + let loader = getService('loader-service').loader; + let { StickyNote } = await loader.import(cardModuleUrl); + let note = new StickyNote({ title: 'Test Note', body: 'Hello' }); + await renderCard(loader, note, 'fitted'); + assert.dom('[data-test-title]').hasText('Test Note'); + }); + }); +} +``` + +### Key Points + +- Tests are `.test.gts` files co-located with the card definition (e.g., + `sticky-note.gts` and `sticky-note.test.gts`) +- Each test file must export a `runTests()` function +- Use `import.meta.url` to resolve card definitions relative to the test + file — never hardcode realm URLs +- Use `setupCardTest(hooks)` for rendering context, then + `renderCard(loader, card, format)` for DOM assertions +- No external realm writes during tests — all test data lives in browser + memory +- Use `data-test-*` attributes for DOM selectors when testing rendered + output +- Use QUnit assertions: `assert.dom()`, `assert.strictEqual()`, + `assert.ok()` +- **Never use `QUnit.skip()` or `QUnit.todo()`.** All tests must + actually execute. Skipped/todo tests are flagged as `skipped` in the + TestRun card and treated as a failure when no tests actually ran. 
The + orchestrator will reject a TestRun where every test is skipped. + +## Important Rules + +- **Never write to the source realm.** All generated artifacts go to the + target realm via the workspace mirror. +- **Stay inside the workspace.** Workspace fs operations are scoped to + the local mirror of the target realm. Use realm-relative paths + (`sticky-note.gts`, `StickyNote/note-1.json`) — never absolute paths + outside the workspace, never the user's home directory, never the + source realm. +- **Don't drive sync yourself.** The orchestrator owns `boxel sync` / + `boxel push`. Read-only `boxel` commands (`boxel status`, + `boxel history`) are fine for inspection, but never run sync, push, + or any command that mutates the realm directly. +- **Write source code, not compiled output.** When writing `.gts` files, + write clean idiomatic source — never compiled JSON blocks or base64- + encoded content. +- **Use absolute `adoptsFrom.module` URLs** when referencing definitions + that live in a different realm (e.g., the source realm's tracker + schema). +- **Start small and iterate.** Write the smallest working implementation + first, then add the test. If tests fail, read the failure output + carefully before making targeted fixes. diff --git a/packages/software-factory/.agents/skills/software-factory-bootstrap/SKILL.md b/packages/software-factory/.agents/skills/software-factory-bootstrap/SKILL.md index 8d8319e22d6..156d1c446bb 100644 --- a/packages/software-factory/.agents/skills/software-factory-bootstrap/SKILL.md +++ b/packages/software-factory/.agents/skills/software-factory-bootstrap/SKILL.md @@ -1,23 +1,136 @@ --- name: software-factory-bootstrap -description: Use when processing a bootstrap issue — covers how to create Project, IssueTracker, KnowledgeArticle, and implementation Issue cards from a brief. 
+description: Use when processing a bootstrap Issue inside an interactive Claude Code session — read the brief, create the target realm if needed, and write the Project, IssueTracker, Knowledge Article, and implementation Issue cards into the workspace. Pair with `software-factory-scheduling` (status transitions) and `software-factory-operations` (the per-issue workflow that picks up after bootstrap completes). --- # Software Factory Bootstrap -Use this skill when the current issue has `issueType: bootstrap`. Your job is -to read the brief, create project artifacts, and set up the issue backlog for -the implementation phase. +Use this skill when the Issue you just picked up has +`attributes.issueType: "bootstrap"`. Your job is to read the brief, +make sure the target realm exists, and seed the project artifacts +that drive the rest of the run: a Project card, an IssueTracker +board, Knowledge Article cards with the brief context, and one +implementation Issue per entry-point card the brief describes. + +## When you reach this skill + +The user has handed you (or the seed Issue carries) two URLs: + +- A **brief URL**, e.g. + `http://localhost:4201/software-factory/Wiki/sticky-note`. The + brief is a card in the source realm. +- A **target realm URL**, e.g. + `http://localhost:4201/<username>/<realm-name>/`. The target may + or may not exist yet. + +You start in `packages/software-factory/` (where Claude Code was +launched so the `.claude/skills` symlink is picked up). The local +workspace mirror of the target realm is **not** your cwd at +session start — you create it during bootstrap as a fresh temp +directory and `cd` into it. See "Set up the workspace" below. + +## First: verify `boxel` is installed + +The user is expected to have `@cardstack/boxel-cli` installed +(see the runbook prerequisites). 
Verify it's on PATH and exposes +the commands this skill needs: + +```bash +boxel --version +help_output="$(boxel --help)" +for cmd in lint parse test; do + echo "$help_output" | grep -qE "^[[:space:]]+$cmd[[:space:]]" || { + echo "boxel --help is missing the \`$cmd\` subcommand." + echo "Ask the user to install or upgrade @cardstack/boxel-cli:" + echo " pnpm i -g @cardstack/boxel-cli" + exit 1 + } +done +``` -## How to write tracker-schema cards +If verification fails, **stop and report**. Don't try to install +`boxel` yourself. + +## Creating the target realm + +If the target realm does not already exist, create it before +writing anything into the workspace. Confirm by attempting to read +the realm or its `_mtimes` endpoint; a 404 means "not yet created." + +The realm-creation command is a native boxel-cli subcommand +(not `boxel run-command`): + +```bash +boxel realm create <realm-name> "<Display Name>" +# e.g. boxel realm create factory-test-stickynote "Factory Test Sticky Note" +``` + +**`<realm-name>` is just the realm's slug** — must match +`^[a-z0-9-]+$` (lowercase letters, numbers, hyphens). **Do not pass +a path with a slash** (e.g. `user/my-realm`); the regex will reject +it. The realm server prepends the user-namespace segment +automatically based on the active profile's identity. Given the +target realm URL the user wants, derive the slug from the final +path segment: + +- target URL: `http://localhost:4201/user/factory-test-stickynote-2/` +- realm-name to pass: `factory-test-stickynote-2` +- server returns: `http://localhost:4201/user/factory-test-stickynote-2/` + +You may see a warning like +`⚠️ Detected local realm directories at legacy local paths`. +That's an informational notice about an unrelated directory layout +issue in your cwd — it does **not** affect realm creation. Ignore +it unless the command itself exits non-zero. + +See the `realm-sync` skill for the full surface (auth, icon URL, +etc.). 
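
The slug rule above (final path segment of the target URL, matching `^[a-z0-9-]+$`) can be checked before calling `boxel realm create`. The following is a minimal sketch; `realm_slug_from_url` is a hypothetical helper name introduced here, not something the CLI provides.

```bash
#!/bin/sh
# Hypothetical helper for illustration only; it is not part of boxel-cli.

# Derive the <realm-name> argument from a target realm URL and validate it.
# Prints the slug on success; prints an error to stderr and returns 1 otherwise.
realm_slug_from_url() {
  url="${1%/}"        # drop one trailing slash, if present
  slug="${url##*/}"   # keep only the final path segment
  # Accept only lowercase ASCII letters, digits, and hyphens.
  if printf '%s\n' "$slug" | LC_ALL=C grep -Eq '^[a-z0-9-]+$'; then
    printf '%s\n' "$slug"
  else
    echo "invalid realm name: '$slug'" >&2
    return 1
  fi
}

# Usage, assuming the target URL is in $TARGET_URL:
#   boxel realm create "$(realm_slug_from_url "$TARGET_URL")" "<Display Name>"
```

Validating locally gives a clearer failure than waiting for the realm server to reject the name.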
+ +## Set up the workspace + +Once the realm exists, create a fresh temp directory as the local +workspace mirror, pull the (empty) realm into it, and `cd` so +realm-relative paths resolve. After this step, every subsequent +file operation in this skill (and in `software-factory-operations`) +runs from inside the workspace: + +```bash +WORKSPACE="$(mktemp -d)" +boxel realm pull <target-realm-url> "$WORKSPACE" +cd "$WORKSPACE" +pwd # confirm cwd is the temp workspace +``` + +A freshly-created realm is empty, so the pull is a no-op except to +establish the local-dir ↔ realm mapping. All subsequent writes +happen in the workspace and propagate via +`boxel realm push <local-dir> <realm-url>` when you sync. + +## Discover the tracker module URL + +The tracker schema (Project / IssueTracker / Issue / +KnowledgeArticle) is published by the source realm at +**`<realm-server-origin>/software-factory/darkfactory`**. Build the +URL from the target realm's origin and confirm it's reachable: -Project, KnowledgeArticle, and Issue cards are plain `.json` files in -the workspace. Use the native `Write` tool with the exact JSON:API -document shape documented below for each card type. +```bash +boxel run-command @cardstack/boxel-host/commands/get-card-type-schema/default \ + --realm <target-realm-url> \ + --input '{"codeRef": {"module": "<tracker-module-url>", "name": "Project"}}' +``` + +If that returns a schema, you have the right URL. If it 404s, check +that the source realm is published at the same realm server (the +brief the user gave you is in a realm on the same origin). -**The system prompt names the live tracker module URL** (the value you -should put in `data.meta.adoptsFrom.module` for Project / Board / Issue / -KnowledgeArticle cards). Use that URL verbatim — do not try to derive it. +Cache the tracker module URL for the rest of the bootstrap — every +tracker card you write references it in `meta.adoptsFrom.module`. 
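
Building the tracker module URL from the target realm's origin can be sketched as below. The helper name `tracker_module_url` is hypothetical; the `/software-factory/darkfactory` path is the one stated in this section, and the sketch assumes the source realm is served from the same origin as the target realm.

```bash
#!/bin/sh
# Hypothetical helper for illustration only; it is not part of boxel-cli.

# Print "<origin>/software-factory/darkfactory" for any http(s) realm URL.
# Prints an error to stderr and returns 1 if no origin can be extracted.
tracker_module_url() {
  origin="$(printf '%s\n' "$1" | sed -nE 's#^(https?://[^/]+).*#\1#p')"
  [ -n "$origin" ] || { echo "not an http(s) URL: '$1'" >&2; return 1; }
  printf '%s/software-factory/darkfactory\n' "$origin"
}

# Usage, assuming the target realm URL is in $TARGET_URL:
#   TRACKER_MODULE_URL="$(tracker_module_url "$TARGET_URL")"
```

The schema fetch shown above remains the real confirmation that the derived URL is reachable.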
+ +## How to write tracker-schema cards + +Project, IssueTracker, KnowledgeArticle, and Issue cards are plain +`.json` files in the workspace. Use the native `Write` tool with the +JSON:API document shape below. | File | adoptsFrom.name | | ---------------------------------- | ------------------ | @@ -26,68 +139,73 @@ KnowledgeArticle cards). Use that URL verbatim — do not try to derive it. | `Knowledge Articles/<slug>-*.json` | `KnowledgeArticle` | | `Issues/<slug>-<card-slug>.json` | `Issue` | -For each card, the document is: +For each card, the document envelope is: ```json { "data": { "type": "card", - "attributes": { ... per the live schema (see below) ... }, - "relationships": { ... per the live schema (see below) ... }, + "attributes": { + /* per the live schema (see below) */ + }, + "relationships": { + /* per the live schema (see below) */ + }, "meta": { "adoptsFrom": { - "module": "<darkfactoryModuleUrl from system prompt>", - "name": "<Project | Issue | KnowledgeArticle>" + "module": "<tracker-module-url>", + "name": "Project" } } } } ``` -### Fetch the live schema before writing +### Fetch the live schema before writing each tracker card -Do **not** memorize attribute names, enum values, or relationship keys -for these cards — they evolve. Before writing a Project / Issue / -KnowledgeArticle JSON file, call: +Do **not** memorize attribute names, enum values, or relationship +keys for these cards — they evolve. Before writing a Project / +IssueTracker / Issue / KnowledgeArticle JSON, call: -``` -get_card_schema({ module: "<darkfactoryModuleUrl>", name: "Project" }) +```bash +boxel run-command @cardstack/boxel-host/commands/get-card-type-schema/default \ + --realm <target-realm-url> \ + --input '{"codeRef": {"module": "<tracker-module-url>", "name": "Project"}}' ``` -(and the same for `Issue` / `KnowledgeArticle`). The tool returns the -live `{ attributes, relationships? 
}` JSON Schema introspected from the -real `CardDef` — including the allowed enum values for fields like -`status` / `priority` / `issueType` / `articleType` / `projectStatus` -and the relationship keys (`project`, `relatedKnowledge`, -`knowledgeBase`, `blockedBy`, etc.). Use the field names, types, and -enums it returns verbatim. Schemas are cached per-process, so repeated -calls are cheap. +(Repeat per card type.) The schema returns the live +`{ attributes, relationships? }` JSON Schema introspected from the +actual `CardDef`, including the allowed enum values for fields like +`status` / `priority` / `issueType` / `articleType` / +`projectStatus`, and the relationship keys (`project`, +`relatedKnowledge`, `knowledgeBase`, `blockedBy`, etc.). Use the +field names, types, and enums it returns verbatim. Schemas are +cached on the realm server; repeated calls are cheap. -Catalog Spec cards (`Spec/<slug>.json`) are different — they adopt from -`https://cardstack.com/base/spec` / `Spec`. Fetch their schema the same -way: `get_card_schema({ module: "https://cardstack.com/base/spec", -name: "Spec" })`. The catalog spec workflow is documented in the -software-factory-operations skill. +Catalog Spec cards (`Spec/<slug>.json`) are different — they adopt +from `https://cardstack.com/base/spec` / `Spec`. Spec authoring is +covered in the `software-factory-operations` skill. -## Naming Conventions +## Naming conventions Derive names from the brief title: -- **slug**: lowercase, replace non-alphanumeric runs with hyphens, strip leading/trailing hyphens +- **slug**: lowercase, replace non-alphanumeric runs with hyphens, + strip leading/trailing hyphens. - `"Sticky Note"` → `sticky-note` -- **projectCode**: 2-4 uppercase initials from the title words +- **projectCode**: 2–4 uppercase initials from the title words. 
- `"Sticky Note"` → `SN` - `"Employee Handbook"` → `EH` - - `"Customer Relationship Manager"` → `CRM` (first 3-4 words) + - `"Customer Relationship Manager"` → `CRM` -## Card Authoring Guidance +## Card authoring guidance -The field/relationship shapes for each card type are fetched at runtime -via `get_card_schema` (see "Fetch the live schema before writing" -above). This section covers what is **not** in the schema: what to put -in those fields and how to organize the bootstrap output. +The field/relationship shapes for each card type come from +`get-card-type-schema`. This section covers what is **not** in the +schema: what to put in those fields and how to organize the +bootstrap output. -### Project Card +### Project card **Path:** `Projects/<slug>.json` **adoptsFrom.name:** `Project` @@ -97,126 +215,162 @@ Fetch the schema, then populate the attributes from the brief: - The project-name attribute → the brief's title (e.g. `"Sticky Note"`). - The project-code attribute → 2–4 uppercase initials from the title (e.g. `"SN"`). - The objective / scope / technical-context / success-criteria attributes → derive from the brief content. Use markdown. -- The status attribute → use one of the enum values returned by the schema for an active project (typically the "active" / starting state — the schema's enum is the source of truth, never guess). +- The status attribute → use the enum value the schema returns for an active project (typically the "active" or starting state — the schema's enum is the source of truth, never guess). -**Relationships:** the schema names the array relationship that links a -project to its knowledge articles. Populate one entry per article you -create (paths like `../Knowledge Articles/<slug>-<article-slug>`). +**Relationships:** the schema names the array relationship that +links a project to its knowledge articles. 
Populate: -- `board` → `{ links: { self: "../Boards/<slug>" } }` -- `knowledgeBase.0` → `{ links: { self: "../Knowledge Articles/<slug>-<article-slug>" } }` (one entry per article) +- `knowledgeBase.<n>` → `{ links: { self: "../Knowledge Articles/<slug>-<article-slug>" } }` (one entry per article) -### IssueTracker Card +The Project card itself does **not** carry a `board` relationship — +the link is one-way IssueTracker → Project (see below). Don't add a +`board` field to the Project document. + +### IssueTracker card **Path:** `Boards/<slug>.json` -**adoptsFrom:** `{ module: "<darkfactoryModuleUrl>", name: "IssueTracker" }` +**adoptsFrom:** `{ module: "<tracker-module-url>", name: "IssueTracker" }` -Create one board per bootstrapped project. It is the canonical board for that -project's issues and should be linked both ways with the Project card. +Create one board per bootstrapped project. It links **back** to the +Project via its `project` relationship; the Project does not link +forward to the board (see the Project section above). | Field | Type | Example | | ------------------ | ------- | ----------------- | | `boardTitle` | String | `"<title> Board"` | | `hideEmptyColumns` | Boolean | `false` | +(Confirm field names against the live schema before writing.) + **Relationships:** - `project` → `{ links: { self: "../Projects/<slug>" } }` -### KnowledgeArticle Card +### KnowledgeArticle cards **Paths:** `Knowledge Articles/<slug>-<article-slug>.json` (as many as needed) **adoptsFrom.name:** `KnowledgeArticle` Always create at least two articles: -- **Brief Context** (`<slug>-brief-context`) — full brief content and background. -- **Agent Onboarding** (`<slug>-agent-onboarding`) — how to work on this project. +- **Brief Context** (`<slug>-brief-context`) — full brief content + and background. +- **Agent Onboarding** (`<slug>-agent-onboarding`) — how to work on + this project, key conventions specific to the brief. 
Add more as the brief warrants (e.g., detailed visual design, deep domain knowledge). Keep each article cohesive with a clear guiding -principle. Use the `articleType` enum values returned by the schema to -classify each one (e.g., one for context, one for onboarding); use the -schema's enum literally. +principle. Use the `articleType` enum values returned by the schema +to classify each one (e.g., one for context, one for onboarding); +use the schema's enum literally. -### Issue Card — Organized by Entry-Point Card +### Issue cards — one per entry-point card -**Paths:** `Issues/<slug>-<card-name-slug>.json` (one per entry-point card, named after the card) +**Paths:** `Issues/<slug>-<card-name-slug>.json` (one per entry-point +card, named after the card) **adoptsFrom.name:** `Issue` -Organize implementation issues around **entry-point cards** — the top-level cards users interact with directly and that should be discoverable in the catalog. Create **one issue per entry-point card**, named after that card. +Organize implementation Issues around **entry-point cards** — the +top-level cards users interact with directly and that should be +discoverable in the catalog. Create **one Issue per entry-point +card**, named after that card. -Each issue covers the full scope of its entry-point card: +Each Issue covers the full scope of its entry-point card: - Card definition (`.gts`) and any interior/support cards it depends on - QUnit tests (`.test.gts`) for the entry-point card **and** all its support cards -- Catalog Spec (`Spec/<card-name>.json`) with realistic example instances via `linkedExamples` +- Catalog Spec (`Spec/<card-name>.json`) with realistic example + instances via `linkedExamples` -Interior cards (field cards, helper cards, linked supporting types) are -implemented as part of their entry-point card's issue. They need tests -but do **not** need their own catalog specs or separate issues. 
+Interior cards (field cards, helper cards, linked supporting types) +are implemented as part of their entry-point card's Issue. They +need tests but do **not** need their own Catalog Specs or separate +Issues. -**Populating attributes** (consult the schema for the exact field names and enum values): +**Populating attributes** (consult the schema for exact field names +and enum values): - The issue-id attribute → `"<projectCode>-<N>"` (sequential). - The summary attribute → `"Implement <card name> card"`. -- The description attribute → markdown describing the card to create, its fields, support cards, tests, spec, and examples. **Immutable after creation** — see Issue Invariants below. -- The issue-type / status / priority attributes → use the enum values returned by the schema. For a fresh bootstrap, all issues start in the "backlog" state with issue-type "feature". Mark the first issue as the highest non-critical priority and the rest a step lower. -- An `order` field (sequential integer 1, 2, 3, …) for the scheduler. -- The acceptance-criteria attribute → markdown checklist: card def, support cards, tests, spec, examples. -- Timestamp fields → ISO timestamps. Note that on Issues these are top-level attributes (e.g. `createdAt` / `updatedAt`) — distinct from the timestamp field inside individual `comments[]` entries (which is named `datetime`, see operations skill). The schema is the source of truth. - -**Relationships for each issue** (the schema names the keys): +- The description attribute → markdown describing the card, its + fields, support cards, tests, spec, and examples. **Immutable + after creation** — see Issue invariants below. +- The issue-type / status / priority attributes → use the enum + values the schema returns. For a fresh bootstrap, all Issues + start in the `backlog` state with `issueType` `feature`. Mark + the first Issue as the highest non-critical priority and the + rest a step lower. 
+- An `order` field (sequential integer 1, 2, 3, …) for the + scheduler. +- The acceptance-criteria attribute → markdown checklist: card def, + support cards, tests, spec, examples. +- Timestamp fields → ISO timestamps. On Issues these are top-level + attributes (e.g. `createdAt` / `updatedAt`) — distinct from the + timestamp field inside individual `comments[]` entries (which is + named `datetime`; see the operations skill). The schema is the + source of truth. + +**Relationships for each Issue** (the schema names the keys): - A `project` link → `../Projects/<slug>`. -- Knowledge-article links — one entry per knowledge article you want loaded into the agent's context (typically the brief-context and agent-onboarding articles you created above). -- A blocked-by relationship for any issues that must complete first. +- Knowledge-article links — one entry per knowledge article you + want loaded into the agent's context when this Issue is picked + up (typically the brief-context and agent-onboarding articles). +- A `blockedBy` relationship for any Issues that must complete + first. -**Dependency ordering:** If one entry-point card depends on another -(e.g., card B uses card A as a field type or linked card), order the -issues so the depended-upon card is implemented first. Set `order` -values accordingly (dependency-free cards get lower order numbers) and -wire the blocked-by relationship so consuming cards cannot start until -their dependencies are done. +**Dependency ordering.** If one entry-point card depends on another +(e.g., card B uses card A as a field type or linked card), order +the Issues so the depended-upon card is implemented first. Set +`order` values accordingly (dependency-free cards get lower order +numbers) and wire the `blockedBy` relationship so consuming cards +cannot start until their dependencies are done. The scheduling +skill describes how the next-issue picker uses these. -If the brief describes only one entry-point card, create one issue. 
If it describes multiple, create one per entry-point card ordered so dependency cards come first. +If the brief describes only one entry-point card, create one Issue. +If it describes multiple, create one per entry-point card, ordered +so dependency cards come first. -## Issue Invariants — read carefully +## Issue invariants — read carefully -The orchestrator depends on three rules about Issue cards. Before this -skill rewrite they were enforced by a wrapper tool that stripped / -ignored disallowed fields automatically. Now that you write the JSON -directly, you must enforce them yourself: +The orchestrator used to enforce these via a wrapper tool. You +enforce them yourself: 1. **`description` is immutable after creation.** Never modify an - Issue's `description` once the card exists. To add post-creation - context (blocked reasons, validation failures, progress notes), use - the `comments` array instead — see "Adding a comment to an existing - issue" in the operations skill. -2. **`status` transitions are restricted to the agent.** You may set - `status` to `"blocked"` (cannot proceed) or `"backlog"` (unblock). - Never set `status` to `"done"` or `"in_progress"` — those are owned - by the orchestrator based on `signal_done` + validation results. -3. **Read before write for updates.** When updating an existing Issue - (or any tracker card), `Read` the file first, modify only the - attributes you intend to change, then `Write` (or `Edit`) the merged - document back. Do not overwrite the whole file with only the new - fields — you'll silently drop existing attributes the file had. - -## Why Relationships Matter - -The `project` and `relatedKnowledge` relationships on implementation issues are -how the orchestrator loads context for the agent. When the agent picks up an -issue, `ContextBuilder.buildForIssue()` traverses these relationships to -load the Project card and Knowledge Articles into the agent's context. 
Without -these relationships, the agent would have no project scope or brief content. - -## Document Envelope - -All three card types share the same JSON:API envelope — only the + Issue's `description` once the card exists. To add + post-creation context (blocked reasons, validation failures, + progress notes), use the `comments` array instead — see "Adding + a comment to an existing Issue" in the operations skill. +2. **Status transitions are restricted** — see the + `software-factory-scheduling` skill for the rules. For the + Issues you create in bootstrap, leave `status` at `backlog`. The + agent that picks each one up will flip it to `in_progress` (and + eventually `done` or `blocked`). +3. **Read before write for updates.** When updating an existing + Issue (or any tracker card), `Read` the file first, modify only + the attributes you intend to change, then `Write` (or `Edit`) + the merged document back. Do not overwrite the whole file with + only the new fields — you'll silently drop the existing + attributes. + +## Why relationships matter + +The `project` and `relatedKnowledge` relationships on implementation +Issues are how the next agent that picks up the Issue loads context. +When you pick up an Issue (per `software-factory-scheduling`), you +traverse these relationships to load the Project card and Knowledge +Articles into your working context. **Without these relationships, +the Issue would arrive without any project scope or brief +content** — the agent would have to re-derive everything from the +brief URL. + +## Document envelope + +All four card types share the same JSON:API envelope — only the `attributes`, `relationships`, and `adoptsFrom.name` differ. 
The attribute names, enum values, and relationship keys come from the -schema you fetched with `get_card_schema`; the envelope is fixed: +schema you fetched with `get-card-type-schema`; the envelope is +fixed: ```json { @@ -224,7 +378,9 @@ schema you fetched with `get_card_schema`; the envelope is fixed: "type": "card", "attributes": { // populate per the schema returned by: - // get_card_schema({ module: "<darkfactoryModuleUrl>", name: "Issue" }) + // boxel run-command @cardstack/boxel-host/commands/get-card-type-schema/default \ + // --realm <target-realm-url> \ + // --input '{"codeRef":{"module":"<tracker-module-url>","name":"Issue"}}' }, "relationships": { // each relationship key from the schema points at a sibling @@ -235,7 +391,7 @@ schema you fetched with `get_card_schema`; the envelope is fixed: }, "meta": { "adoptsFrom": { - "module": "<darkfactoryModuleUrl from system prompt>", + "module": "<tracker-module-url>", "name": "Issue" } } @@ -243,13 +399,82 @@ schema you fetched with `get_card_schema`; the envelope is fixed: } ``` -Use relative paths (`../`) for `links.self` since cards live in sibling -directories within the workspace. The same envelope applies to Project -and KnowledgeArticle — only the `adoptsFrom.name` and the -schema-derived attributes/relationships change. +Use relative paths (`../`) for `links.self` since cards live in +sibling directories within the workspace. The same envelope applies +to Project, IssueTracker, and KnowledgeArticle — only the +`adoptsFrom.name` and the schema-derived attributes/relationships +change. + +## The bootstrap-seed Issue + +For visual parity with the SDK orchestrator's output and to give the +human a clear "this is where the factory started" anchor in the +realm UI, write **`Issues/bootstrap-seed.json`** alongside the +implementation Issues. This Issue represents the bootstrap step +itself, not the work it produced. 
+ +Attribute shape (introspect the live `Issue` schema; common fields +shown here): + +- `issueId`: `"<projectCode>-0"` (use sequence `0` so it sorts above + the SN-1 / SN-2 / … implementation Issues) +- `summary`: `"Bootstrap: read the brief and create project artifacts"` +- `description`: short markdown — the brief URL, what was created + (Project / IssueTracker / N Knowledge Articles / M Issues). + Immutable after creation, like any other Issue description. +- `issueType`: `"bootstrap"` (use the enum value the schema returns; + introspect if `"bootstrap"` isn't in the enum and pick the closest + match) +- `status`: `"done"` — set directly when you write the seed Issue, + since the work it represents (this bootstrap pass) is by + definition complete by the time you're persisting it +- `priority`: `"critical"` (puts it above the implementation Issues + even if a scheduler ever picks it up) +- `order`: `0` +- `createdAt` / `updatedAt`: now + +Relationships: + +- `project` → `../Projects/<slug>` +- knowledge-article links — the brief-context and agent-onboarding + articles (so the seed Issue carries the same context any + future-you would need to understand what the bootstrap saw) ## Completion -After creating all artifacts, call `signal_done()`. The orchestrator manages -issue status transitions — do NOT set the issue status to "done" yourself. -The orchestrator will mark the issue as done after validation passes. +When you've written every artifact (Project, IssueTracker, +Knowledge Articles, implementation Issues, and the bootstrap-seed +Issue): + +1. **Push the workspace** to the target realm. + + ```bash + boxel realm push <local-dir> <target-realm-url> + # e.g. boxel realm push . http://localhost:4201/user/my-realm/ + ``` + +2. 
**Continue with implementation.** Hand off to the + `software-factory-scheduling` skill: search the realm for the + next unblocked Issue (the bootstrap-seed Issue is already + `done`, so the scheduler picks one of the freshly-created + implementation Issues), flip its status to `in_progress`, then + follow `software-factory-operations` to write the card and run + validators. Loop until no eligible Issues remain. The + scheduling skill describes the full status lifecycle and the + loop-termination rules. + +You do not stop after bootstrap. Bootstrap is one phase of a single +end-to-end run; once the artifacts are pushed, scheduling takes +over in the same session. + +## See also + +- `software-factory-scheduling` — picking the next Issue once + bootstrap is done; status-transition rules. +- `software-factory-operations` — what the next agent does inside + each freshly-created implementation Issue (write `.gts` / + `.test.gts` / instances / Spec, run validators, fix failures, + mark done). +- `boxel-development` — `.gts` card authoring patterns. +- `realm-sync` — `boxel realm push` / `boxel realm pull` / `boxel realm sync` / realm-creation + surface. diff --git a/packages/software-factory/.agents/skills/software-factory-operations/SKILL.md b/packages/software-factory/.agents/skills/software-factory-operations/SKILL.md index bb9ed5d37a9..81fd62150a0 100644 --- a/packages/software-factory/.agents/skills/software-factory-operations/SKILL.md +++ b/packages/software-factory/.agents/skills/software-factory-operations/SKILL.md @@ -1,326 +1,501 @@ --- name: software-factory-operations -description: Use when implementing cards in a target realm through the factory execution loop — covers the tool-use workflow for searching, writing, testing, and updating issues via factory tools. 
+description: Use when implementing a software factory Issue inside an interactive Claude Code session — covers the per-issue workflow of inspecting realm state, writing `.gts` card definitions, tests, instances, and Catalog Spec cards, running the validator CLIs (`boxel lint` / `boxel parse` / `boxel test` / `boxel run-command evaluate-module` / `boxel run-command instantiate-card`), and recording progress on the Issue. Pair with `software-factory-scheduling` (how to pick the next Issue + status transitions) and `software-factory-bootstrap` (what to do when the Issue's `issueType` is `bootstrap`). --- # Software Factory Operations -Use this skill when operating inside the factory execution loop. Workspace -files (card definitions, instances, tests) live in a **local workspace -mirror of the target realm** that the orchestrator syncs back to the realm -between iterations. Realm-side operations (search, host commands, runtime -validators) and control signals are **factory tools** the agent invokes -directly. - -## Realm Roles - -- **Source realm** (`packages/software-factory/realm`) - Publishes shared modules, briefs, templates, and tracker schema. Never write to this realm. -- **Target realm** (user-specified, passed to `factory:go`) - Receives all generated artifacts: Project, Issue, KnowledgeArticle, card definitions, card instances, Catalog Spec cards, and QUnit test files. - -## Workspace files (local mirror of target realm) +Use this skill when you have **picked up an implementation Issue** +(per the `software-factory-scheduling` skill) and need to deliver +the artifacts the Issue describes — a card definition, tests, sample +instances, and a Catalog Spec. There is no orchestrator: you write +files in the workspace mirror of the target realm, push them, run +validators against the realm, fix anything they catch, and flip the +Issue status when you're done. 
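The write, push, validate loop described above can be sketched as a small shell helper. This is only a sketch: the function name `push_and_validate` and its two arguments are hypothetical, it assumes the dev `boxel` CLI is already wired up, and it assumes every command exits non-zero on failure (the skill states this only for `boxel lint`). The `evaluate-module` and `instantiate-card` validators are left out because they need the JSON input shapes documented in their own sections.

```bash
# Hypothetical helper (not part of the skill): push the workspace, then run
# the path-based validators cheapest first, stopping at the first failure.
# Usage: push_and_validate <target-realm-url> <changed-file>
push_and_validate() {
  local realm="$1" file="$2"

  # Validators read from the realm, so local writes must be pushed first.
  boxel realm push . "$realm" || return 1

  # Cheap to expensive: lint, then parse, then the full test run.
  boxel lint "$file" --realm "$realm" --json || return 1
  boxel parse "$file" --realm "$realm" --json || return 1
  boxel test --realm "$realm" --json || return 1
}
```

Calling `push_and_validate http://localhost:4201/user/my-realm/ sticky-note.gts` from the workspace root would run the four commands in order; recording each `--json` result as a validation card is still a separate step.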
+ +## Prerequisite: dev `boxel` CLI must be wired up + +Every command in this skill uses `boxel <subcommand>`. Phase 1 does +**not** assume `boxel` is on PATH — the CLI is run from the +in-monorepo dev source. Before running any `boxel <cmd>` below, +complete the wiring block in the `software-factory-scheduling` +skill (or the equivalent block in `software-factory-bootstrap`). +The block is idempotent; running it again here is safe and cheap. + +## Realm roles + +- **Source realm** (`packages/software-factory/realm/` published at + `<realm-server-origin>/software-factory/`) + Publishes the brief, the tracker schema (Project / IssueTracker + / Issue / KnowledgeArticle), and shared modules. **Never write + to this realm.** +- **Target realm** (user-specified, mirrored by your workspace) + Receives every artifact you generate: card definitions, instances, + tests, Catalog Specs, and the tracker cards (Project / Issue / + KnowledgeArticle). + +## Workspace files (local mirror of the target realm) + +Your `cwd` should be the local mirror of the target realm — a temp +directory the bootstrap step created via `mktemp -d` and `cd`'d +into. If you're picking up an Issue in a fresh session where +bootstrap already ran (so the workspace doesn't exist yet), set +it up now: + +```bash +WORKSPACE="$(mktemp -d)" +boxel realm pull <target-realm-url> "$WORKSPACE" +cd "$WORKSPACE" +``` -The agent's working directory is the workspace — the local mirror of the -target realm that the orchestrator syncs back between iterations. Use the -**native** `Read`, `Write`, `Edit`, `Glob`, `Grep`, and `Bash` tools on -these files; the workspace `cwd` is set for you, so realm-relative paths -resolve directly. +Use the **native** `Read`, `Write`, `Edit`, `Glob`, `Grep`, and +`Bash` tools on these files; realm-relative paths resolve directly. 
+After you write, push the workspace to the realm with +`boxel realm push <local-dir> <target-realm-url>` — the validators +that touch the realm need the realm to reflect your latest writes. -These files live in the workspace: +Files that live in the workspace: - Card definitions: `*.gts` -- Card tests: `*.test.gts` -- Content card instances under `<CardType>/<id>.json` (the user data the - cards represent — e.g. `StickyNote/note-1.json`) -- Tracker-schema cards: `Projects/<slug>.json`, `Issues/<slug>.json`, +- Card tests: `*.test.gts` (co-located with the definition) +- Card instances under `<CardType>/<id>.json` (the user-data the + cards represent, e.g. `StickyNote/note-1.json`) +- Tracker-schema cards: `Projects/<slug>.json`, + `Boards/<slug>.json`, `Issues/<slug>.json`, `Knowledge Articles/<slug>.json`, `Spec/<slug>.json` -`Bash` is also available for `boxel` CLI commands: - -- Read-only inspection: `boxel status`, `boxel history`, `boxel search`, - `boxel read-transpiled`. -- `boxel run-command` — dispatches to whatever host command you specify. - Most specifiers are read-only inspection commands (`get-card-type-schema`, - `evaluate-module`, `instantiate-card`), but the surface itself is generic; - treat it as "as safe as the named command." - -See the **Realm-side reads** section below for the full usage. - -**Inspect before writing.** Read or grep the file you plan to change, and -glob for sibling files (e.g. existing card definitions in the same -directory) before creating new ones. +**Inspect before writing.** Read or grep the file you plan to +change, and glob for sibling files (e.g. existing card definitions +in the same directory) before creating new ones. The +`boxel-development` skill has the authoring patterns; this skill +just covers the loop around them. ## Tracker-schema cards — write JSON directly -Project, IssueTracker, Issue, KnowledgeArticle, and Spec cards are plain -`.json` files in the workspace. 
Use `Write` to create them; to update one, -`Read` it, then either `Edit` the relevant attributes or `Write` the -merged document back — same workspace fs surface as `.gts` files. - -| File path | adoptsFrom | -| -------------------------------- | -------------------------------------------------------------- | -| `Projects/<slug>.json` | `{module: "<darkfactoryModuleUrl>", name: "Project"}` | -| `Boards/<slug>.json` | `{module: "<darkfactoryModuleUrl>", name: "IssueTracker"}` | -| `Issues/<slug>.json` | `{module: "<darkfactoryModuleUrl>", name: "Issue"}` | -| `Knowledge Articles/<slug>.json` | `{module: "<darkfactoryModuleUrl>", name: "KnowledgeArticle"}` | -| `Spec/<slug>.json` | `{module: "https://cardstack.com/base/spec", name: "Spec"}` | - -`<darkfactoryModuleUrl>` is named in the system prompt — use that value -verbatim. - -**Always fetch the live schema before writing.** Field names, enum values, -and relationship keys for each card type are introspected at runtime — -never hard-coded in this skill. Call -`get_card_schema({ module, name })` for the card you're about to write -and use the returned `{ attributes, relationships? }` JSON Schema to shape -the document. The bootstrap skill covers the bootstrap-specific attribute -population guidance; this skill covers the operational patterns -(read-before-write, comments, invariants) that layer on top. - -**Read before write.** When updating any tracker card, `Read` the file -first, change only the attributes you intend to update, then write the -merged document back. Don't overwrite the whole file with only your new -fields — you'll silently drop the existing attributes. - -**Issue invariants you must enforce yourself** (these used to be enforced -by a wrapper tool; they aren't anymore): - -- **`description` is immutable** after the issue is created. If you need - to add context — blocked reasons, progress notes, validation failures, - clarification requests — append to the `comments` array instead. 
See - "Adding a comment to an existing issue" below. -- **Status transitions are restricted.** You may set `status` to - `"blocked"` (cannot proceed) or `"backlog"` (unblock). Never set - `status` to `"done"` or `"in_progress"` — those are owned by the - orchestrator based on `signal_done` + validation results. - -### Adding a comment to an existing issue - -Issue cards carry a containsMany comments array on `attributes`. To -append a comment: - -1. Call `get_card_schema({ module: "<darkfactoryModuleUrl>", name: "Issue" })` - if you don't already have the Issue schema cached. The comments array - entry is itself an object with its own field shape — use the field - names returned by the schema (the body / author / timestamp fields) - verbatim. The timestamp field on a comment is **not** the same as the - Issue's own top-level `createdAt` / `updatedAt` attributes; the schema - disambiguates them. -2. `Read` the issue's `.json`. -3. Append a new entry to the comments array on `data.attributes`, - populating the body (markdown comment text), the author (e.g. - `"factory-agent"` or `"orchestrator"`), and the comment-timestamp - field (ISO timestamp). -4. `Write` (or `Edit`) the document back. **Do not modify the description - or any other attribute** — comments are append-only. +Project, IssueTracker, Issue, KnowledgeArticle, and Spec cards are +plain `.json` files. Use `Write` to create them; to update, `Read` +first, then `Edit` (or `Write` the merged document back) — see +"Read before write" below. 
+ +| File path | adoptsFrom | +| -------------------------------- | ------------------------------------------------------------ | +| `Projects/<slug>.json` | `{module: "<tracker-module-url>", name: "Project"}` | +| `Boards/<slug>.json` | `{module: "<tracker-module-url>", name: "IssueTracker"}` | +| `Issues/<slug>.json` | `{module: "<tracker-module-url>", name: "Issue"}` | +| `Knowledge Articles/<slug>.json` | `{module: "<tracker-module-url>", name: "KnowledgeArticle"}` | +| `Spec/<slug>.json` | `{module: "https://cardstack.com/base/spec", name: "Spec"}` | + +`<tracker-module-url>` is derived from the target realm's origin +(`<origin>/software-factory/darkfactory`) — see the +`software-factory-scheduling` skill for the discovery rule. + +### Always fetch the live schema before writing + +Field names, enum values, and relationship keys evolve. Before +writing any tracker JSON file (Project / Issue / KnowledgeArticle / +Board) or a Spec card, introspect the live schema: + +```bash +boxel run-command @cardstack/boxel-host/commands/get-card-type-schema/default \ + --realm <target-realm-url> \ + --input '{"codeRef": {"module": "<module-url>", "name": "<card-name>"}}' +``` + +The command returns `{ attributes, relationships? }` JSON Schema for +the card class. Use the field names, types, and enum values it +returns verbatim. Schemas are cached on the realm server, so +repeated calls with the same code ref are cheap. + +### Read before write + +When updating any tracker card (or any `.json` you didn't just +write), `Read` the file first, modify only the attributes you intend +to change, then `Write` (or `Edit`) the merged document back. Do +not overwrite the entire file with only the new fields — you'll +silently drop existing attributes, comments, and relationships. + +### Issue invariants you must enforce yourself + +The orchestrator used to enforce these via a wrapper tool that's no +longer in the loop. 
You enforce them: + +- **`description` is immutable after creation.** Never modify an + Issue's `description` once the card exists. To add context — + blocked reasons, progress notes, validation failures, + clarification requests — append to the `comments` array instead. + See "Adding a comment to an existing Issue" below. +- **Status transitions are restricted** to the values documented in + `software-factory-scheduling`. Briefly: set to `"in_progress"` on + pickup; `"done"` after validators pass and the sync is clean; + `"blocked"` (with a comment) when you can't make progress. The + Issue schema enumerates the allowed values — introspect it if you're unsure. + +### Adding a comment to an existing Issue + +Issue cards carry a `comments` array on `attributes`. To append: + +1. `Read` the Issue's `.json`. +2. Append a new entry to `data.attributes.comments[]`, populating + the body / author / timestamp fields the schema names. The + comment-timestamp field on an entry is **not** the same as the + Issue's top-level `createdAt` / `updatedAt`; the schema + disambiguates them — call `get-card-type-schema` once if you're + unsure, then reuse the field names. +3. `Write` (or `Edit`) the document back. **Do not modify + `description` or any other top-level attribute** — comments are + append-only. ### Catalog Spec card shape Spec cards (`Spec/<slug>.json`) adopt from -`https://cardstack.com/base/spec` / `Spec`, **not** from the tracker -module. Fetch the live schema before writing: +`https://cardstack.com/base/spec` / `Spec`. Fetch the live schema: -``` -get_card_schema({ module: "https://cardstack.com/base/spec", name: "Spec" }) +```bash +boxel run-command @cardstack/boxel-host/commands/get-card-type-schema/default \ + --realm <target-realm-url> \ + --input '{"codeRef": {"module": "https://cardstack.com/base/spec", "name": "Spec"}}' ``` -Use the returned `{ attributes, relationships? }` to shape the document. 
-What the schema does **not** tell you and you must supply yourself for +What the schema does **not** tell you and you must supply for entry-point cards: - A display title and short description suitable for the catalog. -- The spec-type field set to the enum value the schema returns for - card-style specs (vs. apps, fields, etc.). -- A code-ref attribute pointing at your `.gts` definition, formatted as - `{module: "../<slug>", name: "<PascalClass>"}` (relative path, no - `.gts` extension) so the spec resolves the definition relative to - itself. +- The spec-type field set to the enum value for card-style specs + (vs. apps, fields, etc.) — use the value the schema returns. +- A code-ref attribute pointing at your `.gts` definition, formatted + as `{module: "../<slug>", name: "<PascalClass>"}` (relative path, + no `.gts` extension) so the spec resolves the definition relative + to itself. - A markdown usage guide for the catalog page. - A linked-examples relationship populated with one or more sample instances: ```json "<linked-examples-key>": [ - { "links": { "self": "../<CardType>/<instance-id>" } }, - ... + { "links": { "self": "../<CardType>/<instance-id>" } } ] ``` (The schema names the relationship key.) -The full document envelope is the same as for tracker cards (`data` / -`type: "card"` / `attributes` / `relationships` / `meta.adoptsFrom`), +The full document envelope is the same as for tracker cards (`data` +/ `type: "card"` / `attributes` / `relationships` / `meta.adoptsFrom`), just with the `https://cardstack.com/base/spec` adoptsFrom. ## Realm-side reads (via `boxel` CLI) For operations that need to reach the realm runtime — searching the -indexed cards, fetching transpiled JS, running host commands — shell out -via `Bash` to the `boxel` CLI. These never go through the workspace fs. +indexed cards, fetching transpiled JS, running host commands — shell +out via `Bash` to `boxel`. These never go through the workspace +filesystem. 
- **Search the target realm** for cards using a structured query (filter, sort, page). Use this to check for existing cards, find - duplicates, or inspect project state. - ``` + duplicates, or inspect project state: + + ```bash boxel search --realm <target-realm-url> --query '<json>' --json ``` - Single-quote the entire JSON object so the shell does not expand or - split it; keep keys and string values double-quoted inside. Pipe - through `jq` to project. **For the full query syntax (filter / eq / - contains / range / every / any / not / sort / page, CodeRef matching, - common mistakes) see the `boxel-api` skill.** -- **Fetch the transpiled JavaScript** for a `.gts` module — used only - when an eval/instantiate error reports a line/column number, since - those numbers reference the transpiled output, not your `.gts` source. - ``` + + Single-quote the entire JSON so the shell doesn't expand or split + it; keep keys and string values double-quoted inside. Pipe + through `jq` to project. **For the full query syntax (filter / eq + / contains / range / every / any / not / sort / page, CodeRef + matching, common mistakes) see the `boxel-api` skill.** + +- **Fetch transpiled JavaScript** for a `.gts` module — only when an + eval/instantiate error reports a line/column number, since those + numbers reference the transpiled output, not your `.gts` source: + + ```bash boxel read-transpiled <realm-relative-path> --realm <target-realm-url> ``` - The `.gts` extension is optional. Pipe through `sed -n '<line>p'` (or - wrap with `awk`) to inspect a single line. See the **Debugging - Runtime Evaluation Errors** section below for when to reach for this. + + The `.gts` extension is optional. Pipe through `sed -n '<line>p'` + to inspect a single line. See "Debugging Runtime Evaluation + Errors" below for when to reach for this. 
+ - **Run any other host command** in the realm's prerendered runtime (module evaluation, card instantiation, anything else exposed at `@cardstack/boxel-host/commands/<name>/default`): + + ```bash + boxel run-command <command-specifier> \ + --realm <target-realm-url> \ + --input '<json>' --json ``` - boxel run-command <command-specifier> --realm <target-realm-url> --input '<json>' --json - ``` - Most agent tasks won't need this — the validators below already wrap - the common host commands. See the `boxel-command` skill for the - programmatic surface and failure modes. - -### Fetching live card-type schemas - -`get_card_schema({ module, name })` returns the live JSON Schema -(`{ attributes, relationships? }`) for any `CardDef`, introspected from -the actual class via the realm server's prerenderer (the same path the -AI Bot uses for its patch-tool schemas). Always call this before writing -a tracker card (Project / Issue / KnowledgeArticle), a Spec card, or any -other card whose shape you need to know. Schemas are cached per-process, -so repeated calls with the same code ref are free. - -## Self-Validation (optional, in-memory results) - -All five validators are factory tools, safe to call repeatedly mid-turn. -They return in-memory result objects and **do not persist any durable -validation cards** — the orchestrator still runs the full validation -pipeline (which persists `TestRun` / `LintResult` / `ParseResult` / -`EvalResult` / `InstantiateResult` cards) after `signal_done`, so calling -any of these mid-turn is optional. - -**Side effect to know about:** the realm-touching validators -(`run_evaluate`, `run_instantiate`, `run_tests`) sync your workspace to -the realm before invoking the prerenderer, so they push whatever you've -just written. That's the same write the orchestrator's between-iteration -sync would have done — it's not destructive, but it does mean calling -these tools is the moment your local writes hit the realm. 
The lighter -validators (`run_lint`, `run_parse`) run entirely in-process and don't -touch the realm. - -- `run_lint({ path? })` — Run ESLint + Prettier (with `@cardstack/boxel` - rules) and return an in-memory `RunLintResult` with `status`, - `filesChecked`, `filesWithErrors`, `errorCount`, `warningCount`, - `durationMs`, `lintableFiles`, and per-violation `{ rule, file, line, -column, message, severity }`. Without `path`, lints every `.gts` / - `.gjs` / `.ts` / `.js` file in the target realm. With `path` - (realm-relative file path), lints **only that one file** — prefer this - right after writing or editing a single file. -- `run_tests()` — Run the realm's QUnit suite and receive an in-memory - result object `{ status, passedCount, failedCount, skippedCount, -durationMs, testFiles, failures, errorMessage? }`. Use it when you - want feedback before signalling done. -- `run_parse({ path? })` — Parse and type-check files in the target - realm and return an in-memory `RunParseResult` with `status`, - `filesChecked`, `filesWithErrors`, `errorCount`, `durationMs`, - `parseableFiles`, and per-error `{ file, line, column, message }`. - Without `path`, runs glint (ember-tsc) over every `.gts` / `.gjs` / - `.ts` file in the realm AND validates every `.json` file listed as a - Spec `linkedExample` (same discovery as the parse validation step). - With `path` (realm-relative file path), parses **only that one file** - — `.gts` / `.gjs` / `.ts` runs through glint; `.json` is parsed and - checked for card document structure. The extension is required; - `parseableFiles` entries are always returned in the `.json` / `.gts` - / `.gjs` / `.ts` form, so you can feed any of them straight back into - `path`. Prefer the single-file form right after writing or editing one - file. -- `run_evaluate({ path? 
})` — Evaluate ESM modules (`.gts` / `.gjs` / - `.ts` / `.js`) in the target realm via the prerenderer sandbox and - return a `RunEvaluateResult` (status, module counts, per-failure - `{ path, error, stackTrace? }`). Without `path`, evaluates every - non-test evaluable module. With `path`, evaluates only that single - realm-relative file — handy for a quick self-check right after writing - one module. Test files (`*.test.*`) are rejected — the test runner - validates those. The tool bound-polls past the brief read-after-write - window where the realm has the source on disk but indexing hasn't - populated the module map yet, so a returned failure is a real failure - — don't retry on the agent side. When a failure reports a line/column, - those numbers refer to the transpiled module — pair with - `boxel read-transpiled` (see Realm-side reads above) to locate the - offending source construct, then fix the `.gts` source (never copy - transpiled patterns back into source). -- `run_instantiate({ path? })` — Instantiate card example instances in - the target realm via the prerenderer sandbox and return a - `RunInstantiateResult` (status, instance counts, per-failure `{ path, -cardName, error, stackTrace? }`). Without `path`, searches the realm - for Spec cards and instantiates every `linkedExample` on every - card/app Spec; specs with no `linkedExamples` still get a bare - instantiation to exercise the card class. With `path`, instantiates - only that single realm-relative `.json` example file — its - `meta.adoptsFrom` supplies the module + card name, and spec discovery - is skipped entirely so you can self-check one instance in isolation. - The `path` argument must end in `.json`. `instanceFiles` only contains - real `.json` example paths (bare-instantiation fallbacks are filtered - out) so any entry can be fed straight back into `path`. 
If a bare - instantiation fails, its failure entry has `path: ''` and a populated - `cardName` — identify the spec by `cardName` and do NOT pass the empty - path back into `path`. The tool bound-polls past the brief - read-after-write window where the realm has the source on disk but - indexing hasn't populated the module map yet, so a returned failure - is a real failure — don't retry on the agent side. When a failure - reports a line/column, those numbers refer to the transpiled module — - pair with `boxel read-transpiled` (see Realm-side reads above) to - locate the offending source construct, then fix the `.gts` source - (never copy transpiled patterns back into source). - -## Control Flow - -- `signal_done()` — Signal that the current issue is complete. Call this - only after all implementation and test files have been written. -- `request_clarification({ message })` — Signal that you cannot proceed - and need human input. Describe what is blocking. - -## Required Flow + + Most agent tasks won't need this directly — the validators below + already wrap the common ones. See the `boxel-command` skill for + the programmatic surface and failure modes. + +## Validators (CLI commands) + +Run these against the target realm after writing files. The CLI +itself just prints results — it does **not** write validation +cards into the realm. Persisting the audit trail (TestRun, +LintResult, ParseResult, EvalResult, InstantiateResult into the +`Validations/` folder) is **your** responsibility: see +"Validation artifact cards" below for the card types, file naming, +sequence numbers, and how to map `--json` output to the card's +attributes. Always run each validator with `--json` so you can +capture the structured result and convert it into the card. + +**Push first.** All five validators read from the realm (not your +local workspace). 
After writing files in the workspace, push them to +the realm before running any validator: + +```bash +boxel realm push <local-dir> <target-realm-url> +# e.g. boxel realm push . http://localhost:4201/user/my-realm/ + +# Or two-way sync (resolves conflicts via --prefer-local etc.): +boxel realm sync <local-dir> <target-realm-url> --prefer-local +``` + +See the `realm-sync` skill for the full surface (flags, conflict +resolution, watch mode). + +### `boxel lint [path] --realm <url>` + +ESLint + Prettier with the `@cardstack/boxel` rules. Without a +path, lints every `.gts` / `.gjs` / `.ts` / `.js` file in the realm. +With a realm-relative path, lints just that file — prefer this +right after writing or editing a single file. + +```bash +boxel lint sticky-note.gts --realm http://localhost:4201/alice/my-realm/ +boxel lint --realm http://localhost:4201/alice/my-realm/ # whole realm +``` + +Exit code is non-zero when any error-severity violation exists. +Pass `--json` for a structured result you can pipe through `jq`. + +### `boxel parse [path] --realm <url>` + +Type-checks `.gts` / `.gjs` / `.ts` via glint (template-aware +TypeScript) and validates the document structure of `.json` files +linked as Spec `linkedExamples`. Without a path, parses every +parseable file in the realm. With a path, parses just that file — +the extension is required (`.gts` / `.gjs` / `.ts` / `.json`). + +```bash +boxel parse sticky-note.gts --realm http://localhost:4201/alice/my-realm/ +boxel parse --realm http://localhost:4201/alice/my-realm/ +``` + +**Monorepo-only:** parse runs glint locally against the host app's +node_modules + monorepo paths. It will fail outside the boxel +monorepo with a clear error. + +### `boxel run-command @cardstack/boxel-host/commands/evaluate-module/default` + +Evaluates an ESM module in the realm's prerenderer sandbox. Use +right after writing a `.gts` to catch import errors, decorator +mishaps, or anything else that fails at module load time. 
+ +**Input shape** — `moduleIdentifier` is the _absolute_ module URL +(no `.gts` extension); `realmIdentifier` is the absolute target +realm URL (used for SSRF validation): + +```bash +MODULE="http://localhost:4201/user/my-realm/sticky-note" +REALM="http://localhost:4201/user/my-realm/" +boxel run-command @cardstack/boxel-host/commands/evaluate-module/default \ + --realm "$REALM" --json \ + --input "$(jq -nc --arg m "$MODULE" --arg r "$REALM" \ + '{moduleIdentifier:$m, realmIdentifier:$r}')" +``` + +`--json` returns the standard `run-command` wrapper: +`{"status":"ready"|"error","result":"<json-string>","error":...}`. +Parse `result` as JSON; the command's own fields live at +`data.attributes`: `passed` (bool), and on failure `error` + +`stackTrace`. A handy one-liner: + +```bash +boxel run-command ... --json | jq -r '.result | fromjson | .data.attributes.passed' +``` + +When a failure reports a line/column, those numbers refer to the +**transpiled** module — pair with `boxel read-transpiled` (see +above) to find the offending source construct, then fix the `.gts` +source. **Never copy transpiled patterns back into source.** + +Test files (`*.test.*`) are rejected — `boxel test` handles those. + +### `boxel run-command @cardstack/boxel-host/commands/instantiate-card/default` + +Instantiates a single card in the prerenderer sandbox. Use after +writing a `.json` instance to catch shape mismatches or runtime +errors in field initializers. + +**Input shape** — `moduleIdentifier`, `cardName`, and +`realmIdentifier` are required; `instanceData` is optional but +needed to exercise actual field values. **All three identifiers +must be absolute URLs.** If `instanceData` is passed, its +`data.meta.adoptsFrom.module` must already be the same absolute URL +(the relative form `../sticky-note` will be rejected with +"instanceData adoptsFrom (...) does not match moduleUrl/cardName"). 
+ +```bash +MODULE="http://localhost:4201/user/my-realm/sticky-note" +REALM="http://localhost:4201/user/my-realm/" +CARD_NAME="StickyNote" +# Instance folders are named exactly after the card type (singular), +# matching the convention used across the catalog/experiments realms. +INSTANCE_PATH="StickyNote/note-1.json" + +# Read the workspace JSON, rewrite adoptsFrom.module to the absolute URL, +# then feed it to instantiate-card as the instanceData string. +INSTANCE_DATA=$(jq -c --arg m "$MODULE" --arg name "$CARD_NAME" \ + '.data.meta.adoptsFrom = {module:$m, name:$name}' "$INSTANCE_PATH") + +boxel run-command @cardstack/boxel-host/commands/instantiate-card/default \ + --realm "$REALM" --json \ + --input "$(jq -nc --arg m "$MODULE" --arg n "$CARD_NAME" --arg r "$REALM" --arg d "$INSTANCE_DATA" \ + '{moduleIdentifier:$m, cardName:$n, realmIdentifier:$r, instanceData:$d}')" +``` + +Same `--json` shape as `evaluate-module`: parse the wrapper's +`result` field as JSON, then read `data.attributes.passed` / +`error` / `stackTrace`. + +**Do not** pass a `Spec/...json` path or any card whose +`adoptsFrom.module` is a base-realm URL +(`https://cardstack.com/base/...`). Specs adopt from the base +realm, and the prerender refuses cross-origin module loads with +"moduleUrl origin does not match realmUrl origin". To validate +Specs, run `boxel test` (which exercises the Spec's +`linkedExamples` against the card class). + +### `boxel test --realm <url>` + +Drives a headless Chromium against the host app's compiled test +bundle and runs every `*.test.gts` file in the realm. Returns +pass/fail counts plus per-failure details. + +```bash +boxel test --realm http://localhost:4201/alice/my-realm/ +boxel test --realm <url> --json | jq # machine-readable +boxel test --realm <url> --debug # stream browser console +``` + +**Monorepo-only:** test discovers the host app's `dist/` directory +relative to this CLI. The host app must be built first +(`pnpm --filter @cardstack/host build`). 
+ +A test run with zero tests, or with all tests skipped, returns +`status: "failed"` — **never use `QUnit.skip()` / `QUnit.todo()`** in +your `.test.gts` files. Tests must actually execute. See "Writing +QUnit card tests" below. + +## Required flow per Issue 1. **Inspect before writing.** Search the target realm for existing - cards (`boxel search --realm <url> --query '<json>'` via `Bash` — - see Realm-side reads above, with full syntax in the `boxel-api` - skill). Read or grep the workspace files you plan to change (or - sibling files in the same directory) before creating or modifying - anything. + cards (`boxel search`) and `Read`/`Glob` workspace files you plan + to change (or sibling files in the same directory). 2. **Write card definitions** (`.gts`) into the workspace. -3. **Write `.test.gts` test files** co-located with card definitions. - Every issue must have at least one test file. **Write tests - immediately after the card definition, before any instances or - catalog specs.** +3. **Write `.test.gts` test files** co-located with the card + definition. Every issue must include at least one test file. + Write tests **immediately** after the card definition, before + instances or Spec cards — it forces the API decisions early. 4. **Write card instances** (`.json`) into the workspace. -5. **Write a Catalog Spec card** (`Spec/<card-name>.json`) — adoptsFrom - `https://cardstack.com/base/spec` / `Spec`. Link sample instances via - `relationships.linkedExamples`. -6. **(Optional) Call `run_tests()`** to self-validate before signalling - done. This returns test results in-memory without writing any realm - artifacts. Iterating on your own work with `run_tests` is faster than - round-tripping through the orchestrator pipeline. -7. **Call `signal_done()`** when all implementation and test files are - written. The orchestrator runs the full validation pipeline (which - persists a `TestRun` card, among other artifacts) automatically after - this. -8. 
**If tests fail**, the orchestrator feeds failure details back. - Re-read the affected workspace files, fix them, and call - `signal_done()` again. -9. **Record progress** by appending to the issue's `comments` array - (Read + Edit the issue JSON). Never modify the issue's `description`. - -## Target Realm Artifact Structure +5. **Write a Catalog Spec card** (`Spec/<card-name>.json`) — adopts + from `https://cardstack.com/base/spec` / `Spec`. Link sample + instances via `relationships.linkedExamples`. +6. **Push the workspace** to the target realm. +7. **Run the validators and iterate until all five pass — or + until you hit a bail-out limit.** Each pass through the loop: + 1. Run each validator (in this order — cheap to expensive): + `boxel lint <changed-file>`, `boxel parse <changed-file>`, + `boxel run-command @cardstack/boxel-host/commands/evaluate-module/default --input '{"moduleIdentifier":"<absolute-module-url>","realmIdentifier":"<absolute-realm-url>"}'`, + `boxel run-command @cardstack/boxel-host/commands/instantiate-card/default --input '{"moduleIdentifier":"<absolute-module-url>","cardName":"<ClassName>","realmIdentifier":"<absolute-realm-url>","instanceData":"<json-string-with-absolute-adoptsFrom>"}'`, + `boxel test`. Use `--json` so the structured result is + available. + 2. After each validator, write a corresponding + `Validations/<type>_<issue-slug>-<n>.json` artifact card + (see "Validation artifact cards" below). The card captures + this run's result — `status: "passed"` or `"failed"` — + regardless of outcome. + 3. If any validator returned `"failed"` or `"error"`: fix the + relevant source files in the workspace, push, and re-run. + Write new artifact cards on the next iteration with the + next sequence number (`<type>_<issue-slug>-2.json`, + `-3.json`, …) — do NOT overwrite the previous ones. The + historical sequence is the audit trail. + 4. Stop iterating when every validator's most recent + artifact card has `status: "passed"`. 
A single fix-up + that resolves multiple validators can land in one + iteration; you don't have to re-fail before each retry. + + **Do not mark the Issue done until every validator passes.** + "Most passed but a few failed, good enough" is not the bar. + + **Bail-out limits — don't spiral.** The validator loop must + terminate. If you hit any of these, stop iterating and + proceed to "Bailing out" below instead of marking the Issue + done: + - **8 total iterations per Issue.** If after 8 passes through + the validator loop you still don't have all five validators + green, stop. (This matches the orchestrator's old + `maxIterationsPerIssue` default.) + - **3 consecutive failures of the same validator with the + same error.** Compare the latest 3 artifact cards for the + failing validator. If the failure message (or the first + line of the stack) is identical, your fix isn't working — + stop. Continuing past this point burns context for no gain. + - **5 distinct fix attempts on the same validator without a + single pass.** Look across the artifact-card sequence: if a + validator has 5+ failed cards and no passed card, the + problem is outside this Issue's scope. + +8. **Push the workspace** so the final validation cards land on + the realm. +9. **Either mark done or bail out.** + - If all five validators passed: edit + `Issues/<slug>.json:data.attributes.status` to `"done"` and + push. (See `software-factory-scheduling` for the full + status-transition rules.) + - If you hit a bail-out limit: see "Bailing out" below. + +## Bailing out (when validators won't go green) + +Some failures aren't fixable in one agent session — a brief +ambiguity, a runtime error rooted outside the workspace, a flaky +host-app dependency. Don't keep retrying past the limits in +step 7. + +When you stop: + +1. Set the Issue's `status` to `"blocked"` (not `"done"`, not + `"in_progress"`). +2. 
Append a comment to the Issue (see "Adding a comment to an + existing Issue") summarizing: + - Which validator(s) you couldn't get green. + - The most recent failure message(s), copied verbatim from + the artifact card. + - A brief enumeration of what you tried, keyed to the artifact + card sequence numbers (e.g. "Iteration 1: added missing + field. Iteration 2: changed type. Iteration 3: ..."). + - Which bail-out limit you hit (8 iterations / 3 identical + failures / 5 distinct attempts). +3. Push so the status flip + comment land on the realm. +4. Hand back to `software-factory-scheduling` — pick the next + eligible Issue and work it. **Do not** mark the Project + `projectStatus: completed` if any Issues are blocked; finish + what's still workable, then stop and report. + +The artifact cards under `Validations/` already capture the +detailed evidence (per-validator output, sequence of failures). +The Issue comment is the human-readable summary on top of that +audit trail. + +If you cannot make progress at any step, set the Issue's `status` +to `"blocked"`, append a comment explaining what's stuck, push, and +report back to the user. See `software-factory-scheduling`. 
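The three bail-out limits from step 7 can be checked mechanically against the artifact-card history. The following is a minimal sketch, not part of the skill or the `boxel` CLI: the `ArtifactRecord` shape, the `message` field, and the `checkBailOut` name are hypothetical and stand in for whatever the live `Validations/` card schema actually exposes. It assumes the caller invokes it only after a loop pass in which at least one validator is still failing.

```typescript
// Hypothetical record: one entry per Validations/<type>_<issue-slug>-<n>.json card.
type ValidatorType = 'lint' | 'parse' | 'eval' | 'instantiate' | 'test';

interface ArtifactRecord {
  type: ValidatorType;
  sequenceNumber: number; // the <n> in the file name
  status: 'passed' | 'failed' | 'error';
  message?: string; // assumed: failure message, or first line of the stack
}

type BailOutReason =
  | '8 iterations'
  | '3 identical failures'
  | '5 distinct attempts';

const VALIDATORS: ValidatorType[] = ['lint', 'parse', 'eval', 'instantiate', 'test'];

// Returns the limit that was hit, or null if another iteration is allowed.
// `iterations` is the number of completed passes through the validator loop.
function checkBailOut(
  iterations: number,
  history: ArtifactRecord[],
): BailOutReason | null {
  if (iterations >= 8) return '8 iterations';

  for (const type of VALIDATORS) {
    const runs = history
      .filter((r) => r.type === type)
      .sort((a, b) => a.sequenceNumber - b.sequenceNumber);

    // Limit: the latest 3 cards for this validator all failed with the same message.
    const lastThree = runs.slice(-3);
    const first = lastThree[0];
    if (
      lastThree.length === 3 &&
      first !== undefined &&
      first.message !== undefined &&
      lastThree.every((r) => r.status !== 'passed' && r.message === first.message)
    ) {
      return '3 identical failures';
    }

    // Limit: 5 or more non-passing cards and no passing card at all.
    const failures = runs.filter((r) => r.status !== 'passed');
    if (failures.length >= 5 && failures.length === runs.length) {
      return '5 distinct attempts';
    }
  }
  return null;
}
```

The returned string matches the wording the Issue comment is expected to name ("8 iterations / 3 identical failures / 5 distinct attempts"), so it can be copied into the comment as-is. Treating `"error"` like `"failed"` follows step 7.3; adjust if the live schema distinguishes them differently.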
+ +## Target realm artifact structure ``` target-realm/ @@ -330,85 +505,182 @@ target-realm/ │ └── sample-instance.json # Card instance ├── Spec/ │ └── card-name.json # Catalog Spec card -├── Validations/ -│ ├── test_issue-slug-1.json # TestRun card (test results) -│ └── lint_issue-slug-1.json # Lint result card ├── Projects/ │ └── project-name.json # Project card +├── Boards/ +│ └── project-name.json # IssueTracker card ├── Issues/ │ └── issue-slug.json # Issue card -└── Knowledge Articles/ - └── article-name.json # KnowledgeArticle card +├── Knowledge Articles/ +│ └── article-name.json # KnowledgeArticle card +└── Validations/ + ├── lint_issue-slug-1.json # LintResult card + ├── parse_issue-slug-1.json # ParseResult card + ├── eval_issue-slug-1.json # EvalResult card + ├── instantiate_issue-slug-1.json # InstantiateResult card + └── test_issue-slug-1.json # TestRun card ``` -## Debugging Runtime Evaluation Errors - -Eval-step and instantiate-step validation failures surface line/column -references that point to the **transpiled** JavaScript output, not the -`.gts` source you wrote. The realm compiles `.gts` to JS before execution -and runtime errors reference the compiled output. +## Validation artifact cards + +After each validator runs, write a corresponding artifact card +under `Validations/`. Together they form the audit trail the human +sees in the Boxel host UI — a sortable history of every validation +run for every Issue. + +Five card types, one per validator, all published from the source +realm. 
Build each module URL from the target realm's origin (same +pattern as the tracker module URL): + +| CLI | Card class | Source module | +| ------------------------------------------------ | ------------------- | ---------------------------------------------- | +| `boxel lint` | `LintResult` | `<origin>/software-factory/lint-result` | +| `boxel parse` | `ParseResult` | `<origin>/software-factory/parse-result` | +| `boxel run-command .../evaluate-module/default` | `EvalResult` | `<origin>/software-factory/eval-result` | +| `boxel run-command .../instantiate-card/default` | `InstantiateResult` | `<origin>/software-factory/instantiate-result` | +| `boxel test` | `TestRun` | `<origin>/software-factory/test-results` | + +### File naming + +`Validations/<type>_<issue-slug>-<n>.json` where: + +- `<type>` ∈ `lint`, `parse`, `eval`, `instantiate`, `test`. +- `<issue-slug>` is the Issue's slug (the part after `Issues/` in + its file path — e.g. `sticky-note-sticky-note` for + `Issues/sticky-note-sticky-note.json`). +- `<n>` is a per-issue sequence number: 1 on first run, increment + for retries. Before writing, glob + `Validations/<type>_<issue-slug>-*.json` to find the highest + existing number and use `n+1`. On the first iteration the folder + may not exist yet — that's fine, `<type>_<issue-slug>-1.json`. + +### Document shape + +For each artifact card: + +1. Run the validator with `--json` and capture the structured + output. The CLI's JSON shape is **not** the card's attribute + shape — they overlap but differ. +2. Introspect the live card schema before writing: + + ```bash + boxel run-command @cardstack/boxel-host/commands/get-card-type-schema/default \ + --realm <target-realm-url> \ + --input '{"codeRef":{"module":"<source-module-url>","name":"<ClassName>"}}' + ``` + +3. Map the validator's `--json` fields to the schema's attributes. + The naming usually matches closely (`status`, `errorCount`, + `durationMs`, etc.) 
but the schema is the source of truth for + field names, types, and enum values. Don't guess. +4. Write the card via `Write` with the standard JSON:API envelope: + + ```json + { + "data": { + "type": "card", + "attributes": { + /* mapped from the validator's --json output, per the schema */ + "status": "passed", + "runAt": "2026-05-15T10:42:00.000Z" + /* ...other schema attributes... */ + }, + "relationships": { + /* if the schema names an issue/project relationship, link + back to ../Issues/<slug> / ../Projects/<slug> */ + }, + "meta": { + "adoptsFrom": { + "module": "<source-module-url>", + "name": "<ClassName>" + } + } + } + } + ``` + +### Write artifact cards even on success + +The audit trail is the point — a green `LintResult` is just as +valuable as a red one for showing the human what was checked. +Always write the card after running the validator, regardless of +pass/fail. The `status` attribute (`"passed"` / `"failed"` / +`"error"`) carries the outcome. + +### Iteration semantics + +When fixing a validator failure, **don't overwrite the previous +artifact**. Write a new card with the next sequence number +(`<type>_<issue-slug>-2.json`, `-3.json`, …) so the history shows +the full path from failure to fix. The host UI sorts by +`sequenceNumber` (or `runAt`) and displays the latest at the top. + +## Debugging runtime evaluation errors + +`evaluate-module` and `instantiate-card` failures surface +line/column references that point to the **transpiled** JavaScript +output, not the `.gts` source you wrote. The realm compiles `.gts` +to JS before execution, and runtime errors reference the compiled +output. When a validation error contains text like -`(error occurred in '/.../sticky-note.gts' @ line 66 : column 32)`, the -line number is for the transpiled module. Fetch the transpiled output -and read the reported line to see what compiled construct raised the -error — then reason back to the `.gts` source construct that produced -it. 
+`(error occurred in '/.../sticky-note.gts' @ line 66 : column 32)`, +the line number is for the transpiled module. Fetch the transpiled +output and read the reported line: +```bash +boxel read-transpiled sticky-note.gts \ + --realm <target-realm-url> | sed -n '60,70p' ``` -boxel read-transpiled sticky-note.gts --realm <target-realm-url> -``` - -Pipe through `sed -n '60,70p'` (or similar) to focus on a window around -the reported line. -For example, `" is not a valid character within attribute names: (error -occurred in '/.../sticky-note.gts' @ line 66 : column 32)` typically -points inside a `precompileTemplate(...)` block in the transpiled -output. The actual fault in the source is often in a CSS comment or a -template expression — line 66 in your `.gts` source is unrelated. -Reading the transpiled line is what connects the error back to the -source. +For example, `" is not a valid character within attribute names` at +`line 66 : column 32` typically points inside a +`precompileTemplate(...)` block in the transpiled output. The +actual fault in the source is often in a CSS comment or a template +expression — line 66 in your `.gts` source is unrelated. Reading +the transpiled line is what connects the error back to the source +construct. ### The transpiled output is for DEBUGGING ONLY — never for implementation -**Scope:** the transpiled fetch (`boxel read-transpiled`) is only for -investigating **runtime errors in `.gts` modules you have already -written** — when an eval or instantiate validation failure points to a -line/column in the transpiled output and you need to map that -coordinate back to your source. It is not for learning how to write -cards, not for understanding Boxel patterns, and not a general -reference. 
- -- **Do not copy patterns, imports, or shapes from the transpiled output - into your `.gts` source.** The transpiler emits artifacts like - `setComponentTemplate(...)`, `precompileTemplate(...)`, wire-format - template arrays, base64 CSS imports (`./file.gts.CiAg...`), and other - compiler internals. None of those belong in source code. +**Scope:** `boxel read-transpiled` is only for investigating +**runtime errors in `.gts` modules you have already written** — +when an eval or instantiate failure points to a line/column in the +transpiled output and you need to map that coordinate back to your +source. It is **not** for learning how to write cards, not for +understanding Boxel patterns, and not a general reference. + +- **Do not copy patterns, imports, or shapes from the transpiled + output into your `.gts` source.** The transpiler emits artifacts + like `setComponentTemplate(...)`, `precompileTemplate(...)`, + wire-format template arrays, base64 CSS imports + (`./file.gts.CiAg...`), and other compiler internals. None of + those belong in source code. - **Do not write `.gts` that "looks like" the compiled JS.** Always - write clean, idiomatic Ember / `<template>`-tag / CardDef / FieldDef - source. If you find yourself tempted to hand-write a - `setComponentTemplate(...)` call or a wire-format template, stop — - you're modeling the wrong layer. -- **Always edit the `.gts` source, never the transpiled output.** The + write clean idiomatic Ember / `<template>`-tag / CardDef / + FieldDef source. If you find yourself tempted to hand-write a + `setComponentTemplate(...)` call or a wire-format template, stop + — you're modeling the wrong layer. +- **Always edit `.gts` source, never the transpiled output.** The realm regenerates the transpiled JS on every write, so any edit there is silently discarded. 
-- **When in doubt, favor idiomatic card development practices.** The - `boxel-development` skill and existing cards in the target realm are - the right references — not what the compiler happens to emit. +- **When in doubt, favor idiomatic card development practices.** + The `boxel-development` skill and existing cards in the target + realm are the right references — not what the compiler happens to + emit. Use the transpiled fetch the way a developer uses a source map: to -translate a runtime line number back to a source construct in the code -**you wrote**, then close the transpiled view and fix the source -idiomatically. +translate a runtime line number back to a source construct in the +code **you wrote**, then close the transpiled view and fix the +source idiomatically. -## Writing QUnit Card Tests +## Writing QUnit card tests -Test files are `.test.gts` files co-located with card definitions in the -target realm. Each test file exports a `runTests()` function that +Test files are `.test.gts` files co-located with card definitions +in the target realm. Each file exports a `runTests()` function that registers QUnit modules and tests. 
-### Example Test +### Example test ```typescript // sticky-note.test.gts — co-located with sticky-note.gts @@ -425,7 +697,8 @@ export function runTests() { test('renders title in fitted view', async function (assert) { let loader = getService('loader-service').loader; - let { StickyNote } = await loader.import(cardModuleUrl); + let { StickyNote } = + await loader.import<typeof import('./sticky-note')>(cardModuleUrl); let note = new StickyNote({ title: 'Test Note', body: 'Hello' }); await renderCard(loader, note, 'fitted'); assert.dom('[data-test-title]').hasText('Test Note'); @@ -434,45 +707,75 @@ export function runTests() { } ``` -### Key Points - -- Tests are `.test.gts` files co-located with the card definition (e.g., - `sticky-note.gts` and `sticky-note.test.gts`) -- Each test file must export a `runTests()` function -- Use `import.meta.url` to resolve card definitions relative to the test - file — never hardcode realm URLs +**Why `loader.import<typeof import('./sticky-note')>(...)`?** The +`loader.import()` return type is untyped by default — destructuring +`{ StickyNote }` from it would type-check as `any` in your +`.test.gts` and **`boxel parse` will fail with a type error** +(loader.import returns `{}` for a generic call). Always pass the +module's TypeScript shape via the type generic, using the same +relative path you'd use for a direct import. This is parse-step +table stakes; without it your tests don't get past validation. + +### Key points + +- Tests are `.test.gts` files co-located with the card definition + (e.g., `sticky-note.gts` and `sticky-note.test.gts`). +- Each test file must export a `runTests()` function. +- Use `import.meta.url` to resolve card definitions relative to the + test file — never hardcode realm URLs. 
- Use `setupCardTest(hooks)` for rendering context, then - `renderCard(loader, card, format)` for DOM assertions -- No external realm writes during tests — all test data lives in browser - memory -- Use `data-test-*` attributes for DOM selectors when testing rendered - output + `renderCard(loader, card, format)` for DOM assertions. +- No external realm writes during tests — all test data lives in + browser memory. +- Use `data-test-*` attributes for DOM selectors when testing + rendered output. - Use QUnit assertions: `assert.dom()`, `assert.strictEqual()`, - `assert.ok()` + `assert.ok()`. +- **Wrap every `test(...)` in a QUnit `module('<name>', function +(hooks) { ... })` block.** The TestRun UI (and any future + TestRun cards you write) group results by module name; top-level + tests collapse into a "default" bucket and become hard to read. - **Never use `QUnit.skip()` or `QUnit.todo()`.** All tests must - actually execute. Skipped/todo tests are flagged as `skipped` in the - TestRun card and treated as a failure when no tests actually ran. The - orchestrator will reject a TestRun where every test is skipped. - -## Important Rules - -- **Never write to the source realm.** All generated artifacts go to the - target realm via the workspace mirror. -- **Stay inside the workspace.** Workspace fs operations are scoped to - the local mirror of the target realm. Use realm-relative paths - (`sticky-note.gts`, `StickyNote/note-1.json`) — never absolute paths - outside the workspace, never the user's home directory, never the - source realm. -- **Don't drive sync yourself.** The orchestrator owns `boxel sync` / - `boxel push`. Read-only `boxel` commands (`boxel status`, - `boxel history`) are fine for inspection, but never run sync, push, - or any command that mutates the realm directly. -- **Write source code, not compiled output.** When writing `.gts` files, - write clean idiomatic source — never compiled JSON blocks or base64- - encoded content. 
-- **Use absolute `adoptsFrom.module` URLs** when referencing definitions - that live in a different realm (e.g., the source realm's tracker - schema). -- **Start small and iterate.** Write the smallest working implementation - first, then add the test. If tests fail, read the failure output - carefully before making targeted fixes. + actually execute. A run with zero tests, or with every test + skipped, is reported as `failed`. + +## Important rules + +- **Never write to the source realm.** All generated artifacts go + to the target realm via the workspace mirror. +- **Stay inside the workspace.** Workspace fs operations are scoped + to the local mirror of the target realm. Use realm-relative paths + (`sticky-note.gts`, `StickyNote/note-1.json`) — never absolute + paths outside the workspace, never the user's home directory, + never the source realm. +- **Push after every meaningful batch of writes.** The validators + read from the realm. Workspace writes are invisible to them until + the push lands. Don't run a validator and then act surprised at + stale results. +- **Write source code, not compiled output.** When writing `.gts` + files, write clean idiomatic source — never compiled JSON blocks + or base64-encoded content. +- **Use absolute `adoptsFrom.module` URLs** when referencing + definitions that live in a different realm (e.g., the source + realm's tracker schema or `https://cardstack.com/base/spec`). +- **Start small and iterate.** Write the smallest working + implementation first, then add the test. If tests fail, read the + failure output carefully before making targeted fixes — don't + pile speculative changes on top of a failure you haven't + understood. + +## See also + +- `software-factory-scheduling` — picking the next Issue, + status-transition rules, building per-issue context from + relationships. 
+- `software-factory-bootstrap` — what to do when the Issue's + `issueType` is `bootstrap` (create Project / IssueTracker / + Knowledge Articles / implementation Issues from a brief). +- `boxel-development` — `.gts` card authoring patterns + (CardDef / FieldDef, fields, formats, templates, common + pitfalls). The agent-facing reference for "what does the + `.gts` actually look like". +- `boxel-api` — full `boxel search` query syntax. +- `boxel-command` — programmatic surface for `boxel run-command`. +- `realm-sync` — `boxel realm push` / `boxel realm pull` / `boxel realm sync` / workspace sync. diff --git a/packages/software-factory/.agents/skills/software-factory-scheduling/SKILL.md b/packages/software-factory/.agents/skills/software-factory-scheduling/SKILL.md new file mode 100644 index 00000000000..8abe198f5f5 --- /dev/null +++ b/packages/software-factory/.agents/skills/software-factory-scheduling/SKILL.md @@ -0,0 +1,229 @@ +--- +name: software-factory-scheduling +description: Use when working a software factory run from inside an interactive Claude Code session — picking the next unblocked Issue from a target realm, transitioning its status through the lifecycle, and recording progress as the agent works through the issue backlog. Replaces the orchestrator's `issue-scheduler` + status-transition logic now that the agent drives the loop directly. +--- + +# Software Factory Scheduling + +Use this skill whenever you need to **pick the next issue to work on** +in a software-factory target realm, or when you need to **transition +an issue's status** as you start, finish, or get stuck on it. This is +the loop control logic the orchestrator used to own; in the +interactive Claude Code flow the agent owns it. + +## When you use this skill + +The user pointed you at a target realm and asked you to work the +factory backlog. You are about to: + +- Search the realm for `Issue` cards. +- Decide which one to pick up next. 
+- Set its status to `in_progress`, work it (per the + `software-factory-bootstrap` or `software-factory-operations` + skill, depending on `issueType`), then mark it `done` or `blocked`. +- Repeat until no eligible issues remain. + +## First: verify `boxel` is installed + +The user is expected to have `@cardstack/boxel-cli` installed +(see the runbook prerequisites). Verify it before any `boxel +<cmd>` invocation: + +```bash +boxel --version +help_output="$(boxel --help)" +for cmd in lint parse test; do + echo "$help_output" | grep -qE "^[[:space:]]+$cmd[[:space:]]" || { + echo "boxel --help is missing the \`$cmd\` subcommand." + echo "Ask the user to install or upgrade @cardstack/boxel-cli:" + echo " pnpm i -g @cardstack/boxel-cli" + exit 1 + } +done +``` + +If verification fails, stop and report. Don't try to install +`boxel` yourself. + +## Picking the next issue + +1. **Search the target realm for every Issue card.** The Issue + card-type is published by the source realm's `darkfactory` + module. Construct the search query with the live tracker module + URL (see "Discovering the tracker module URL" below): + + ```bash + boxel search --realm <target-realm-url> --query '{ + "filter": { + "type": { "module": "<tracker-module-url>", "name": "Issue" } + } + }' --json + ``` + +2. **Filter to eligible issues.** An Issue is eligible if **all** of + the following are true: + - `attributes.status` is `"backlog"` **or** `"in_progress"`. + - **Every** Issue listed in `relationships.blockedBy.*.links.self` + has `attributes.status === "done"`. Resolve each `blockedBy` + `links.self` against the parent issue's `id` (it's a relative + path like `../Issues/foo`); the resulting URL matches the + blocker's `id` from the search index. + - You haven't already tried this issue and exhausted your turn + budget on it in the current run. + +3. 
**Order the eligible set** and take the first: + - `in_progress` before `backlog` (resume semantics — finish what + you started before starting something new). + - Then by `attributes.priority`: `critical` > `high` > `medium` + > `low`. + - Then by `attributes.order` ascending (lower order first). + +4. **If no Issue is eligible**, you're done. Tell the user, then: + - If every Issue in the realm has `status === "done"`, set the + **Project** card's `projectStatus` to `"completed"` and push + the workspace. + - Otherwise some Issues are stuck (blocked, or blocked-by-blocked). + Report which ones and why. + +## Status transitions you perform + +The agent owns the full status lifecycle in this flow. There is no +orchestrator to flip statuses for you. + +| Transition | When | +| ------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | +| `backlog` → `in_progress` | The moment you pick the issue up, **before** doing any work on it. | +| `in_progress` → `done` | All required validators have passed AND the workspace has synced cleanly. | +| `in_progress` → `blocked` | You cannot make progress: ambiguous brief, missing dependency, or you've hit one of the validator-loop bail-out limits documented in `software-factory-operations` ("Bailing out" section — 8 iterations per Issue, 3 identical consecutive validator failures, or 5 distinct fix attempts on the same validator without a pass). Always append a comment explaining which limit you hit and what you tried. | +| `blocked` → `backlog` | The user (or a future you) decides to retry — out of this skill's scope. 
| + +**Always push the workspace after a status change.** The +status flip is local until `boxel realm push <local-dir> +<target-realm-url>` lands it on the realm. Running another search +before the push would read stale state. + +**Never set `status` to a value not listed above** (e.g. `"running"`, +`"completed"`, custom strings). The Issue schema enforces an enum; +introspect it with +`boxel run-command @cardstack/boxel-host/commands/get-card-type-schema/default --realm <url> --input '{"codeRef": {"module": "<tracker-module-url>", "name": "Issue"}}'` +if you need the exact allowed values for the realm you're working +against. + +## Updating an Issue card + +Issue cards live at `Issues/<slug>.json` in the workspace (the local +mirror of the target realm). + +**Read before write.** Always `Read` the file first, mutate only the +attributes you intend to change, then `Write` (or `Edit`) the merged +document back. Do not overwrite the entire file with just the +new fields — you'll silently drop the existing attributes (summary, +description, relationships, prior comments). + +```jsonc +// Issues/sticky-note-define-card.json — after status flip +{ + "data": { + "type": "card", + "attributes": { + "issueId": "SN-1", + "summary": "Implement StickyNote card", + "description": "…", // <- DO NOT touch + "status": "in_progress", + "priority": "high", + "order": 1, + "updatedAt": "2026-05-15T10:42:00.000Z", + "comments": [ ... existing comments preserved ... ] + }, + "relationships": { ... preserved ... }, + "meta": { ... preserved ... } + } +} +``` + +**`description` is immutable after creation.** Never modify an +Issue's `description`. To add context — blocked reasons, progress +notes, validation failures — append to `attributes.comments[]` +instead. The comment shape is documented in the +`software-factory-operations` skill. + +## Discovering the tracker module URL + +The orchestrator used to inject this URL into the agent's system +prompt. 
You now find it yourself: + +- The tracker schema (Project / IssueTracker / Issue / + KnowledgeArticle) is published at + **`<target-realm-server-origin>/software-factory/darkfactory`**. +- Build it from the target realm URL: take the URL's origin, append + `/software-factory/darkfactory`. Example: + - target realm: `http://localhost:4201/alice/my-realm/` + - origin: `http://localhost:4201` + - tracker module URL: `http://localhost:4201/software-factory/darkfactory` +- Verify the URL is reachable by introspecting one of its exports + before relying on it: + ```bash + boxel run-command @cardstack/boxel-host/commands/get-card-type-schema/default \ + --realm <target-realm-url> \ + --input '{"codeRef": {"module": "<tracker-module-url>", "name": "Issue"}}' + ``` + If this returns a schema, you have the right URL. If it 404s or + returns "module not found", check the source realm — the brief + the user gave you is in a realm published from the same realm + server, so the source realm is reachable at the same origin under + `/software-factory/`. + +Cache the tracker module URL for the rest of the run; it doesn't +change between issues. + +## Building issue context + +Before working an issue, load its surrounding context so you're not +operating blind: + +1. **Project card** — follow the issue's `relationships.project.links.self` + (relative to the issue's `id`). Read the resulting `.json` file + from the workspace; pull `attributes.objective`, + `attributes.successCriteria`, `attributes.technicalContext`, and + any other guiding fields the schema returns. +2. **Knowledge articles** — follow each + `relationships.relatedKnowledge.<n>.links.self` link the same way. + Read each `Knowledge Articles/<slug>.json` and load its + `attributes.content` / `attributes.body` (the schema names the + field) into your working context. +3. 
**The issue itself** — `attributes.summary`, + `attributes.description`, `attributes.acceptanceCriteria`, and + the existing `attributes.comments[]` (which carry orchestrator + /agent feedback from previous attempts, if any). + +If a relationship link can't be resolved (workspace miss, 404), surface +that to the user — don't silently proceed with partial context. + +## Common pitfalls + +- **Forgetting to push between status flips.** If you flip + `backlog → in_progress`, do your work, flip `in_progress → done`, + and only push once at the end, an observer who searched the realm + mid-flight would never see `in_progress`. Push immediately after + the pickup flip too, so the realm reflects what you're working on. +- **Treating a stale search index as ground truth.** The realm's + source POST returns once writes are durable, but the search index + settles asynchronously. After a sync, the next search may briefly + see the prior state. If a status you just wrote isn't reflected on + the first search, wait a few hundred milliseconds and retry. +- **Picking a `done` or `blocked` issue.** Re-confirm the eligibility + filter (`status ∈ {backlog, in_progress}` AND all blockers `done`) + every cycle. Don't trust an in-memory list from a previous turn. +- **Working a blocked issue without addressing the blocker first.** A + blocker that's `in_progress` or `blocked` isn't done. Skip the + dependent issue until the blocker resolves. + +## See also + +- `software-factory-bootstrap` — what to do **inside** an Issue + whose `issueType` is `bootstrap` (read the brief, create the + Project / IssueTracker / Knowledge Articles / implementation + Issues). +- `software-factory-operations` — what to do **inside** a regular + implementation Issue (write `.gts` / `.test.gts` / instances / + Spec, run validators, fix failures, sync). 
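The eligibility filter and ordering from "Picking the next issue", together with the URL derivation from "Discovering the tracker module URL", can be sketched as plain functions. This is an illustrative sketch under stated assumptions, not code from the factory: `IssueSummary`, `pickNextIssue`, `resolveLink`, and `deriveTrackerModuleUrl` are hypothetical names, and the flattened `blockedBy: string[]` stands in for the `relationships.blockedBy.*.links.self` values returned by `boxel search`.

```typescript
// Hypothetical, flattened view of one Issue card from the search results.
interface IssueSummary {
  id: string; // absolute card URL from the search index
  status: string; // e.g. "backlog", "in_progress", "done", "blocked"
  priority: string; // "critical" | "high" | "medium" | "low"
  order: number;
  blockedBy: string[]; // relative links such as "../Issues/foo"
}

const PRIORITY_RANK: Record<string, number> = {
  critical: 0,
  high: 1,
  medium: 2,
  low: 3,
};

// Origin of the target realm plus the fixed tracker path.
function deriveTrackerModuleUrl(targetRealmUrl: string): string {
  return new URL(targetRealmUrl).origin + '/software-factory/darkfactory';
}

// Resolve a relative blockedBy link against the parent issue's id.
function resolveLink(issueId: string, link: string): string {
  return new URL(link, issueId).href;
}

// `exhausted` holds ids already tried and abandoned in the current run.
function pickNextIssue(
  issues: IssueSummary[],
  exhausted: Set<string> = new Set(),
): IssueSummary | undefined {
  const statusById = new Map<string, string>();
  for (const issue of issues) statusById.set(issue.id, issue.status);

  const eligible = issues.filter(
    (issue) =>
      (issue.status === 'backlog' || issue.status === 'in_progress') &&
      !exhausted.has(issue.id) &&
      issue.blockedBy.every(
        (link) => statusById.get(resolveLink(issue.id, link)) === 'done',
      ),
  );

  // Unknown priorities sort after "low" (an assumption of this sketch).
  const rank = (p: string): number => PRIORITY_RANK[p] ?? 4;

  eligible.sort((a, b) => {
    const resume =
      Number(b.status === 'in_progress') - Number(a.status === 'in_progress');
    if (resume !== 0) return resume; // in_progress before backlog
    const byPriority = rank(a.priority) - rank(b.priority);
    if (byPriority !== 0) return byPriority;
    return a.order - b.order; // lower order first
  });
  return eligible[0];
}
```

A blocker that is missing from the search results is treated as not done, so the dependent issue stays ineligible until the link resolves. Re-run the search and rebuild the list every cycle rather than reusing a previous result.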
diff --git a/packages/software-factory/.claude/CLAUDE.md b/packages/software-factory/.claude/CLAUDE.md index cf9aeb8957e..2645d5f3ffb 100644 --- a/packages/software-factory/.claude/CLAUDE.md +++ b/packages/software-factory/.claude/CLAUDE.md @@ -22,20 +22,35 @@ pnpm factory:go --brief-url <url> --target-realm <url> ## Skill loading -The agent's instructions live in `.agents/skills/`. The factory loader -(`src/factory-skill-loader.ts`) walks three directories: - -1. `packages/software-factory/.agents/skills/` — factory-specific skills - (`software-factory-bootstrap`, `software-factory-operations`). -2. `packages/boxel-cli/plugin/skills/` — boxel-cli Claude Code plugin - skills (`boxel-api`, `boxel-command`); same directory the plugin - distributes to end users. -3. monorepo root `.agents/skills/` — general domain skills - (`boxel-development`, `boxel-file-structure`, `ember-best-practices`). - -`packages/software-factory/.claude/skills` is a symlink to -`.agents/skills/` so Claude Code and the factory loader read the same -files. +Two parallel skill paths exist, one per factory run mode: + +- **SDK orchestrator** (`pnpm factory:go`): the loader at + `src/factory-skill-loader.ts` reads from + **`.agents/skills-orchestrator/`** first. Those skills describe the + factory-MCP-tool surface (`signal_done`, `get_card_schema`, + `run_lint`, …) that `ToolUseFactoryAgent` actually provides at + runtime. +- **Interactive Claude Code** (paste the prompt from + `docs/runbook.md`): Claude Code reads + **`.agents/skills/`** via the `.claude/skills` symlink. Those + skills describe the `boxel` CLI surface and the agent-owned + status lifecycle. The interactive flow has no orchestrator + process; the agent drives the loop directly. + +Fallback dirs for both modes (skills that aren't software-factory +specific): + +1. `packages/boxel-cli/plugin/skills/` — boxel-cli Claude Code + plugin skills (`boxel-api`, `boxel-command`); same directory + the plugin distributes to end users. +2. 
monorepo root `.agents/skills/` — general domain skills + (`boxel-development`, `boxel-file-structure`, + `ember-best-practices`). + +The two software-factory skill sets diverged during CS-11149. They +stay separated until the SDK orchestrator is retired; at that +point the orchestrator code and `.agents/skills-orchestrator/` get deleted +together. ## Architectural principle diff --git a/packages/software-factory/.gitignore b/packages/software-factory/.gitignore index 64b78757e2f..c6281eaaa12 100644 --- a/packages/software-factory/.gitignore +++ b/packages/software-factory/.gitignore @@ -2,3 +2,8 @@ playwright-report/ test-results/ playwright/ .software-factory-cache/ + +# Local workspace mirrors created by `boxel realm pull` during test runs. +# These are sync copies of the target realm; the canonical data lives in +# the realm itself. +factory-test-*/ diff --git a/packages/software-factory/AGENTS.md b/packages/software-factory/AGENTS.md index 513a58d128c..86dc3d8718c 100644 --- a/packages/software-factory/AGENTS.md +++ b/packages/software-factory/AGENTS.md @@ -7,7 +7,16 @@ each issue using native fs tools (`Read` / `Write` / `Edit` / `Glob` / flow). See [README.md](./README.md) for architecture. The agent's loaded -instructions live in `.agents/skills/` (root + this package + `boxel-cli`). +instructions live in two parallel directories: + +- `.agents/skills-orchestrator/` — consumed by `pnpm factory:go` (the SDK + orchestrator). Describes the factory-MCP tool surface. +- `.agents/skills/` — consumed by interactive Claude Code (via the + `.claude/skills` symlink). Describes the `boxel` CLI surface and the + agent-owned status lifecycle. + +Both modes also fall back to `packages/boxel-cli/plugin/skills/` and +the monorepo-root `.agents/skills/` for shared domain skills. ## Commands @@ -24,9 +33,11 @@ instructions live in `.agents/skills/` (root + this package + `boxel-cli`). creates the seed issue, runs the loop. 
- `src/issue-loop.ts` — inner/outer issue scheduling loop. - `src/factory-skill-loader.ts` — resolves and loads skills from - `packages/software-factory/.agents/skills/` (primary), - `packages/boxel-cli/plugin/skills/` (fallback), and monorepo root - `.agents/skills/` (fallback). + `packages/software-factory/.agents/skills-orchestrator/` (primary — + consumed by `pnpm factory:go`), `packages/boxel-cli/plugin/skills/` + (fallback), and monorepo root `.agents/skills/` (fallback). The + interactive Claude Code path reads `.agents/skills/` directly via + `.claude/skills`. - `src/workspace-fs.ts` — local-filesystem mirror of the target realm; the agent reads/writes here, the orchestrator syncs. - `src/factory-agent/opencode.ts` — agent backend (opencode in passthrough diff --git a/packages/software-factory/docs/runbook.md b/packages/software-factory/docs/runbook.md new file mode 100644 index 00000000000..17858607105 --- /dev/null +++ b/packages/software-factory/docs/runbook.md @@ -0,0 +1,182 @@ +# Interactive Factory Runbook + +A user runs the software factory entirely from inside a logged-in +Claude Code session by pasting **one prompt** that drives the +agent through bootstrap, per-issue implementation, validation, and +project completion in a single end-to-end loop. There is no +orchestrator process; the agent does every step itself. + +## Prerequisites + +The user has done the following outside Claude Code: + +- Installed `@cardstack/boxel-cli` globally and run `boxel profile add` + so the active profile points at the target realm server. + `boxel profile list` shows the active profile. + + ```bash + pnpm i -g @cardstack/boxel-cli + boxel --version + ``` + + `boxel --help` should list `lint`, `parse`, and `test` among the + subcommands. If not, upgrade: `pnpm i -g @cardstack/boxel-cli@latest`. + +- Installed Claude Code and run `/login` so the session is + subscription-billed. 
`ANTHROPIC_API_KEY` is **not** set in the shell + Claude Code launched from (it would override subscription auth — see + the auth precedence in the Claude Code docs). +- A running realm server reachable at the URL they intend to use. For + local work that is `mise run dev-all` in the boxel monorepo; for + staging/prod they have credentials in their profile. +- `boxel parse` and `boxel test` need a few extra bits available + (monorepo-only, until the realm-server `_parse` endpoint and the + built-in QUnit harness ship in follow-up tickets): + - The host app's `dist/` is built: `pnpm --filter @cardstack/host build`. + - Playwright's headless Chromium is installed: `npx playwright install chromium`. One-time per machine. +- Claude Code is launched from `packages/software-factory/` so the + `.claude/skills` symlink (→ `.agents/skills/`) is discovered. + The agent creates its own scratch workspace inside `mktemp -d` + during the run — you don't need to pre-create one. + +The user also knows: + +- The **brief URL** — a card in the source realm describing what to + build (e.g. `http://localhost:4201/software-factory/Wiki/<brief-slug>`). +- The **target realm URL** they want the factory to create + (e.g. `http://localhost:4201/<username>/<realm-name>/`). + +## The prompt + +A single prompt drives the entire run. The agent bootstraps the +realm, then loops through the implementation Issues it created, +implementing each one through validators + validation cards until +no eligible Issues remain. The user pastes it once and lets the +session run. + +``` +Run the software factory on this brief: + + Brief URL: <BRIEF_URL> + Target realm: <TARGET_REALM_URL> + +Use a fresh temp directory (`mktemp -d`) as the workspace. + +Follow docs/runbook.md end-to-end: + +1. Wire up the dev `boxel` CLI (per the software-factory-bootstrap + or software-factory-scheduling skill — first action). +2. Create the target realm at the URL above and pull it into the + workspace. +3. 
Read the brief and follow software-factory-bootstrap to write + the Project, IssueTracker, Knowledge Articles, one Issue per + entry-point card the brief describes, plus the bootstrap-seed + Issue. Push the workspace. +4. Hand off to software-factory-scheduling: pick the next + unblocked Issue, follow software-factory-operations to + implement it (.gts + .test.gts + sample instances + Catalog + Spec), then iterate the validator loop: + a. Run all five validators against the realm. + b. After each one, write a + `Validations/<type>_<issue-slug>-<n>.json` artifact card + capturing the result (passed or failed). + c. If any validator failed, fix the source, push, and + re-run the failing validator(s). Write new artifact + cards with the next sequence number — do NOT overwrite + the previous ones. + d. Repeat (a)–(c) until every validator returns + `status: "passed"`. Only then mark the Issue `done`. + Honor the bail-out limits in software-factory-operations' + "Bailing out" section — stop iterating if you hit 8 total + iterations on the Issue, or 3 consecutive identical failures + from the same validator, or 5 distinct fix attempts without + a single pass. When you bail out: set the Issue to + `blocked` with a comment summarizing what failed and what + you tried, push, then move on. Do NOT keep grinding past + those limits. +5. Loop step 4 until no eligible Issues remain (either all + `done`, or the remaining ones are `blocked` per step 4). +6. When the backlog is empty: if every Issue is `done`, set + the Project's `projectStatus` to `completed` and push. If + some Issues are `blocked`, leave `projectStatus` as + `active` and report which ones are stuck and why. + +Stop and report only if you hit a bail-out limit on every +remaining Issue. Otherwise, complete the project in this one +session. 
+``` + +> **Working through multiple prompts instead.** If you'd rather +> drive the run interactively — pausing between bootstrap and +> implementation, or between each Issue — paste the single prompt +> above and ask the agent to stop after each phase. The skills +> support both modes; the single-prompt form is the default +> because the recipe brief proved it works end-to-end without +> intervention. + +## What the agent calls (and from where) + +| Capability | How the agent invokes it | +| ----------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| Realm creation | `boxel realm create <slug> "<display-name>"` (native subcommand; `<slug>` must match `^[a-z0-9-]+$`) | +| Workspace pull / push | `boxel realm pull <url> <dir>` / `boxel realm push <dir> <url>` (realm-sync skill) | +| Federated search | `boxel search --realm <url> --query '<json>'` (boxel-api skill) | +| Card-type schema | `boxel run-command @cardstack/boxel-host/commands/get-card-type-schema/default --realm <url> --input '{"codeRef":{"module":"...","name":"..."}}'` | +| Lint | `boxel lint [path] --realm <url>` (whole-realm or single-file) | +| Parse / type-check | `boxel parse [path] --realm <url>` (monorepo-only — glint + JSON validation) | +| Evaluate module | `boxel run-command @cardstack/boxel-host/commands/evaluate-module/default --realm <url> --input '{"moduleIdentifier":"<abs-url>","realmIdentifier":"<abs-url>"}'` | +| Instantiate card | `boxel run-command @cardstack/boxel-host/commands/instantiate-card/default --realm <url> --input '{"moduleIdentifier":"<abs-url>","cardName":"...","realmIdentifier":"<abs-url>","instanceData":"<json>"}'` | +| Run QUnit tests | `boxel test --realm <url>` (monorepo-only — drives headless Chromium) | +| Read transpiled output | `boxel read-transpiled <path> --realm <url>` (for debugging 
eval/instantiate errors) | +| Write files | native `Write` / `Edit` | +| Read / search workspace | native `Read` / `Glob` / `Grep` | + +There are no factory MCP tools. `signal_done` and +`request_clarification` are replaced by writing the issue's `status` +field directly (`done` / `blocked`) and appending to its `comments[]` +array. + +## Follow-up work + +A few rough edges remain — not blocking, but worth tracking: + +- **`/factory-run` slash command** — wrap the single prompt above + as a slash command (or a `boxel factory run` CLI that spawns + `claude` with the prompt baked in) so the user doesn't paste + prose every time. +- **Realm-server `_parse` endpoint** — replace the monorepo-bound + `boxel parse` with a server-side endpoint mirroring `_lint`, so + the published `boxel-cli` can lint AND parse without checking + out the repo. +- **Dev `boxel-cli` setup automation** — the ~30 lines of bash the + agent runs at the start of every session (find monorepo, rename + stale `dist/`, symlink to PATH) should collapse into a single + per-machine prerequisite, not per-session boilerplate. +- **Orchestrator retirement** — once this runbook has shipped and + run in production for long enough, delete the SDK orchestrator + code (`issue-loop.ts`, `factory-agent/`, etc.). + +## Expected output + +A successful run produces, in the target realm: + +- `Projects/<slug>.json` — the Project card (`projectStatus` ends + up at `completed`). +- `Boards/<slug>.json` — the IssueTracker board linked back to the + Project. +- `Knowledge Articles/<slug>-brief-context.json` and + `<slug>-agent-onboarding.json` — at minimum; more if the brief + warrants. +- `Issues/bootstrap-seed.json` — the bootstrap anchor Issue + (status `done`, issueType `bootstrap`). +- `Issues/<slug>-<card-name>.json` — one Issue per entry-point + card the brief describes (status `done` once implemented). +- `<card-name>.gts` and `<card-name>.test.gts` — the card + definition and its co-located QUnit tests. 
+- `<CardType>/<instance>.json` — one or more sample instances. +- `Spec/<card-name>.json` — the Catalog Spec card linking the + sample instances via `linkedExamples`. +- `Validations/lint_<issue-slug>-<n>.json`, + `parse_…`, `eval_…`, `instantiate_…`, `test_…` — one validation + artifact card per validator per iteration (sequence numbers + increment on retry; old cards are kept). diff --git a/packages/software-factory/src/factory-skill-loader.ts b/packages/software-factory/src/factory-skill-loader.ts index c789c7d1699..599804efc7b 100644 --- a/packages/software-factory/src/factory-skill-loader.ts +++ b/packages/software-factory/src/factory-skill-loader.ts @@ -9,7 +9,18 @@ import type { IssueData, ProjectData, ResolvedSkill } from './factory-agent'; const PACKAGE_ROOT = resolve(__dirname, '..'); const MONOREPO_ROOT = resolve(PACKAGE_ROOT, '../..'); -const DEFAULT_SKILLS_DIR = join(PACKAGE_ROOT, '.agents', 'skills'); +/** + * The SDK orchestrator and the new interactive Claude Code path each get + * their own copies of `software-factory-bootstrap` / `software-factory-operations`. + * The orchestrator loads from `.agents/skills-orchestrator/`; its skills still describe + * the factory-MCP-tool surface (`signal_done`, `get_card_schema`, `run_lint`, …) + * that the orchestrator's `ToolUseFactoryAgent` actually provides. Interactive + * Claude Code reads from `.agents/skills/` via the `.claude/skills` symlink; + * those skills describe the `boxel` CLI surface and the agent-owned status + * lifecycle. The two diverged during CS-11149 and need to stay separated + * until the orchestrator is retired. + */ +const DEFAULT_SKILLS_DIR = join(PACKAGE_ROOT, '.agents', 'skills-orchestrator'); /** * Additional skill search directories, checked in order when a skill is not