Skip to content

Agentic loop stability fixes from repeated e2e testing#4394

Open
habdelra wants to merge 5 commits intomainfrom
agentic-loop-stability-fixes
Open

Agentic loop stability fixes from repeated e2e testing#4394
habdelra wants to merge 5 commits intomainfrom
agentic-loop-stability-fixes

Conversation

@habdelra
Copy link
Copy Markdown
Contributor

@habdelra habdelra commented Apr 13, 2026

Summary

Addresses state transition issues, testing infrastructure bugs, and consistency problems discovered through repeated end-to-end cycling of the software factory flow (factory:go → bootstrap → implement → validate). Each fix was validated by running the full e2e loop (10 runs).

Orchestrator owns issue status transitions

The agent can no longer set issue status to done or in_progress — the loop manages these transitions:

  • backlogin_progress: when the loop picks up an issue
  • in_progressdone: when the agent calls signal_done AND validation passes
  • Agent can only set blocked (can't proceed) or backlog (unblock)

Previously the agent could mark issues done before validation passed, causing the loop to exit with failing tests.

Card update tools use read-patch-write

update_issue, update_project, and create_knowledge now read the existing card, merge provided attributes, and write back — instead of creating a fresh document that clobbers existing attributes. This was a critical bug: the bootstrap-seed issue was reduced to just status and updatedAt after an update.

Fix darkfactory module URL in buildCardDocument

buildCardDocument() was constructing adoptsFrom.module from the target realm URL instead of the software-factory realm. This caused refreshIssue() searches (which filter by type module) to not find updated cards, making the loop think issues were still in backlog after being marked done.

Fix Playwright waitForFunction timeout

The timeout was passed as the second argument (treated as arg) instead of the third (options). Playwright fell back to its 30s default, causing test runs that took 30-50s to silently time out instead of completing.

Remove run_tests from agent tools

The validation pipeline runs tests automatically after every agent turn. Giving the agent the run_tests tool caused it to waste iterations running tests manually (sometimes before writing test files), creating duplicate TestRun instances, and even attempting to run Node.js scripts via run_command.

Other fixes

  • ensureJsonExtension: Card tool paths normalized with .json (realm API card+source requires full path with extension)
  • Project auto-completion: Project status set to completed when all issues done
  • Status badges: Issue and Project cards show colored status badges in fitted + isolated templates (matching TestRun pattern)
  • TestRun 0/0 empty state: Shows gray "empty" badge instead of green "passed" when no tests ran
  • Fix uneven red border: Failing test module group border is now uniform
  • Issue linked-card: Project in Issue isolated view uses @format='embedded' to prevent layout occlusion
  • Drop "MVP" suffix: Project naming convention simplified
  • run_command description: Explicitly states "Boxel host commands ONLY"
  • Updated phase-2-plan.md: Documents all validated design decisions

Test plan

  • pnpm lint passes (lint:js, lint:types, lint:format)
  • pnpm test — 352 unit tests pass (17 new tests covering status transitions, auto-mark, revert, project completion, read-patch-write, status stripping)
  • 10 end-to-end runs (smoke-test-1 through smoke-test-10) validating the full factory:go flow

🤖 Generated with Claude Code

Addresses state transition issues, testing infrastructure bugs, and
consistency problems discovered through repeated end-to-end cycling of
the software factory flow (factory:go → bootstrap → implement → validate).

Key changes:

- Orchestrator owns issue status transitions (backlog → in_progress → done)
  rather than letting the agent set status directly, which caused premature
  exits when validation was still failing
- Card update tools (update_issue, update_project, create_knowledge) use
  read-patch-write instead of full document replacement, preventing attribute
  clobbering on partial updates
- Fix buildCardDocument to use correct darkfactoryModuleUrl from
  software-factory realm instead of target realm, which broke issue refresh
  searches by type
- Fix Playwright waitForFunction timeout argument position (was being treated
  as arg instead of options, causing 30s default instead of configured 5min)
- Remove run_tests from agent tool set — validation pipeline handles test
  execution automatically after each agent turn
- Add ensureJsonExtension helper for card tool paths (card+source requires
  full file path with extension)
- Auto-mark project as completed when all issues are done
- Set issue to in_progress when loop picks it up
- Status badges on Issue and Project cards (fitted + isolated templates)
- TestRun 0/0 passed shows gray "empty" badge instead of green
- Fix uneven red border on failing test module groups
- Issue linked-card in isolated view uses embedded format to prevent occlusion
- Drop "MVP" from project naming convention
- Update prompts, skills, and phase-2-plan.md with validated design decisions
- 352 unit tests passing, all covering new behaviors

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c31b802e40

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Stabilizes the software-factory agentic issue loop based on repeated end-to-end runs by tightening ownership of status transitions, preventing card-clobbering updates, fixing darkfactory module URL usage, and improving validation/test execution reliability.

Changes:

  • Move issue status transitions (“in_progress”, “done”) to the orchestrator (loop) and restrict agent status updates to “blocked/backlog” + signal_done intent.
  • Update card-writing tools (update_issue, update_project, create_knowledge) to read-patch-write and normalize JSON paths with .json.
  • Fix infra/UI reliability issues (Playwright waitForFunction timeout arg position, darkfactory module URL threading, test result/status badge rendering).

Reviewed changes

Copilot reviewed 19 out of 19 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
packages/software-factory/tests/issue-loop.test.ts Adds/adjusts tests for orchestrator-owned transitions and project completion behavior.
packages/software-factory/tests/factory-tool-executor.spec.ts Threads darkfactoryModuleUrl into tool building in tests.
packages/software-factory/tests/factory-tool-builder.test.ts Updates tool builder tests for read-patch-write behavior and removes run_tests coverage.
packages/software-factory/src/test-run-execution.ts Fixes Playwright waitForFunction timeout handling and increases timeout.
packages/software-factory/src/realm-operations.ts Adds ensureJsonExtension() helper for realm card paths.
packages/software-factory/src/issue-scheduler.ts Uses .json normalization for issue updates; adds project status update support.
packages/software-factory/src/issue-loop.ts Implements orchestrator-owned in_progress/done transitions + project completion hook.
packages/software-factory/src/factory-tool-builder.ts Adds darkfactoryModuleUrl, removes run_tests tool, enforces status stripping + read-patch-write + .json normalization.
packages/software-factory/src/factory-issue-loop-wiring.ts Threads darkfactoryModuleUrl into tool-builder config.
packages/software-factory/src/darkfactory-schemas.ts Fixes buildCardDocument to use the software-factory darkfactory module URL (not target realm).
packages/software-factory/scripts/smoke-tests/factory-tools-smoke.ts Updates smoke test config for darkfactoryModuleUrl.
packages/software-factory/realm/test-results.gts Improves status display (adds “empty” state) and refines module group styling.
packages/software-factory/realm/darkfactory.gts Adds status badges and adjusts Issue embedded Project rendering.
packages/software-factory/prompts/ticket-implement.md Updates agent workflow ordering + clarifies signal_done and status ownership.
packages/software-factory/prompts/system.md Removes mention of run_tests and documents post-signal_done validation pipeline.
packages/software-factory/prompts/bootstrap-implement.md Updates bootstrap instructions (paths/naming + signal_done instead of status updates).
packages/software-factory/docs/phase-2-plan.md Documents validated design decisions and new loop/tool semantics.
packages/software-factory/.agents/skills/software-factory-operations/SKILL.md Updates skills to match orchestrator-owned testing/status semantics.
packages/software-factory/.agents/skills/software-factory-bootstrap/SKILL.md Updates bootstrap skill docs (paths/naming + signal_done).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

habdelra and others added 2 commits April 13, 2026 16:22
The test was passing status: 'in_progress' to update_issue, but the
tool now only allows 'blocked' and 'backlog' (orchestrator owns other
transitions). Updated test to use 'blocked'.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Gate project completion on all realm issues being done (not just processed ones)
- Read-patch-write: only create fresh doc on confirmed 404, error on other failures
- Extract shared readPatchDocument helper for update_project/update_issue/create_knowledge
- Copy attributes before mutating to avoid side effects on caller's args
- Fix test comment about removed buildRunTestsTool
- Add note that we expect one Project per realm

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
habdelra and others added 2 commits April 13, 2026 17:23
Remove -mvp suffix from example path to match updated naming convention.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
listIssues() returns [] on search failures, and [].every() is true,
which would falsely mark the project completed. Require non-empty list.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants