Skip to content

Retry mechanism for failed submission workflows#4631

Merged
richardhjtan merged 4 commits intomainfrom
CS-10975-retry-on-submission-failure
May 5, 2026
Merged

Retry mechanism for failed submission workflows#4631
richardhjtan merged 4 commits intomainfrom
CS-10975-retry-on-submission-failure

Conversation

@richardhjtan
Copy link
Copy Markdown
Contributor

@richardhjtan richardhjtan commented May 4, 2026

Summary

Adds a user-driven Retry on the submission workflow card so failed submissions can be re-run without losing the original Matrix room, workflow card, or the already-created PrCard.

The trigger was the failure mode shown below: bot-runner timeout / fatal worker error during PR creation / PR creation error, which left the workflow card in a red error state with no recovery path.

Example
Screenshot 2026-05-04 at 3 53 37 PM

What changed

New event type & flow

  • pr-listing-retry — separate bot trigger event, dispatched to its own handler in command-runner.ts. The bot-runner reads the workflow card to recover roomId, listingId, persisted branchName, and any already-linked PrCard.
  • If a PrCard is already linked, retry skips collect-submission-files + create-pr-card, fetches the existing PrCard via fetch-card-json, and resumes at the GitHub branch / commit / open-PR step. No duplicate PrCard, no expensive re-collect.
  • If no PrCard yet, retry runs the full flow from scratch.
  • Both create and retry paths now flow through one shared orchestrator (runWorkflow) with each step in its own method (collectFiles, createPrCard, loadExistingPrCard, pushToGitHub, linkPrCardOnWorkflow). StepError tags failures with which step blew up — replaces the previous mutable failedStep variable.

Workflow card

  • New failedStep field surfaces which step failed (collect-files / lint / create-pr-card / github-pr), so the UI can show step-specific status text instead of a generic "PR creation failed".
  • roomId and branchName are now persisted on the card at creation time so retry can re-emit the trigger event into the same Matrix room and target the same GitHub branch as the original attempt.
  • New Retry button in the create-PR step block, gated on prCreationError / failedStep / lintStatus === 'failed' and the presence of commandContext + roomId. Optimistically clears the error attributes on click for instant UI feedback.

Host

Bot setup

  • setup-submission-bot.ts registers a new pr-listing-retry row in bot_commands so retry events pass the bot's content-type allowlist. Existing environments must re-run this script.

GitHub handler

  • getCreateListingPRContext prefers explicit input.branchName over recomputing from listingName. Guarantees retry targets the same branch as the original create.

Cross-realm safety

  • Bot-runner resolves relative links.self URLs (e.g. "../CardListing/abc") against the workflow card URL when building the retry context, so listingId reaches collect-submission-files as an absolute URL.

Bot-runner refactor (review feedback)

Per @habdelra: command-runner.ts is meant to be command-agnostic but had hardcoded knowledge of pr-listing-* events.

  • New BotCommandHandler interface in command-runner.ts; CommandRunner now takes handlers[] at construction and dispatches to the first match.
  • All listing-PR specifics (constants, types, step methods, GitHub/lint deps) moved to a new PrListingWorkflowHandler implementing BotCommandHandler.
  • Both listing-PR files grouped under lib/pr-listing/ so the orchestrator + GitHub leaf are co-located.
  • timeline-handler.ts wires the handler in via makeEnqueueRunCommand(...).

command-runner.ts is ~150 lines with zero references to any specific command. Deleting lib/pr-listing/ tomorrow leaves it compiling unchanged.

Tests

  • New: pr-listing-retry with existing PrCard skips collect-files + create-pr-card — verifies the skip-step path and asserts the GitHub branch matches the persisted branchName (regression for the title-vs-listingName branch-name bug).
  • New: pr-listing-create failure tags workflow card with failedStep.
  • Updated: existing pr-listing-create happy-path assertion now expects the success patch to also clear prCreationError / failedStep.

All 38 bot-runner tests pass.

Test plan

  • Re-run pnpm run setup-submission-bot to register pr-listing-retry.
  • Happy retry path (skip steps 1+2): invalidate SUBMISSION_BOT_GITHUB_TOKEN → submit → see workflow card fail at github-pr step with PrCard linked → restore token → click Retry → bot-runner logs pr-listing-retry: reusing existing PrCard … → PR opens.
  • Idempotency: double-click Retry — second click is a no-op while first is in flight.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a user-driven retry path for failed submission workflows so a submission can be re-run without losing the original Matrix room/workflow card (and without duplicating an already-created PrCard). This extends the existing bot-triggered PR creation pipeline with a dedicated pr-listing-retry event and shared workflow orchestration in the bot-runner.

Changes:

  • Introduces pr-listing-retry support end-to-end (Matrix bot allowlist + host retry command + bot-runner handler/orchestrator).
  • Persists roomId + branchName on the SubmissionWorkflowCard and adds a UI Retry button with in-flight state.
  • Refactors bot-runner PR workflow into step methods with failedStep tagging and adds tests for retry + failure tagging.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
packages/matrix/scripts/setup-submission-bot.ts Registers pr-listing-retry in bot command allowlist.
packages/host/app/commands/retry-submission-workflow.ts New host command to clear failure state and emit retry trigger into the original room.
packages/host/app/commands/index.ts Wires the new retry command into shims and command class list.
packages/host/app/commands/create-submission-workflow.ts Creates Matrix room earlier and persists roomId/branchName; passes branchName to bot trigger.
packages/catalog-realm/submission-workflow-card/submission-workflow-card.gts Adds failedStep, step-specific failure text, Retry button + retry UX state; changes “View listing” navigation behavior.
packages/bot-runner/tests/command-runner-test.ts Adds retry-with-existing-PrCard test and failedStep tagging test; updates happy-path patch expectations.
packages/bot-runner/lib/create-listing-pr-handler.ts Prefers explicit input.branchName over recomputing from listing name.
packages/bot-runner/lib/command-runner.ts Adds retry handler, shared orchestrator with step tagging, fetch-card-json support, and failure recording with failedStep.
packages/base/command.gts Adds RetrySubmissionWorkflowInput type.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread packages/bot-runner/lib/command-runner.ts Outdated
Comment thread packages/bot-runner/lib/command-runner.ts Outdated
Comment thread packages/bot-runner/lib/command-runner.ts Outdated
Comment thread packages/host/app/commands/create-submission-workflow.ts
Comment thread packages/catalog-realm/submission-workflow-card/submission-workflow-card.gts Outdated
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 4, 2026

Preview deployments

Host Test Results

    1 files  ±0      1 suites  ±0   2h 1m 37s ⏱️ + 1m 7s
2 564 tests ±0  2 549 ✅ +14  15 💤 ±0  0 ❌ ± 0 
2 583 runs  ±0  2 568 ✅ +28  15 💤 ±0  0 ❌  - 14 

Results for commit d7f7456. ± Comparison against earlier commit 1c669a5.

Realm Server Test Results

    1 files  ±0      1 suites  ±0   18m 53s ⏱️ + 1m 40s
1 234 tests ±0  1 234 ✅ ±0  0 💤 ±0  0 ❌ ±0 
1 306 runs  ±0  1 306 ✅ ±0  0 💤 ±0  0 ❌ ±0 

Results for commit d7f7456. ± Comparison against earlier commit 1c669a5.

@richardhjtan richardhjtan force-pushed the CS-10975-retry-on-submission-failure branch from 51b3229 to 67b7163 Compare May 4, 2026 09:03
@richardhjtan richardhjtan requested a review from Copilot May 4, 2026 09:03
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 5 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread packages/bot-runner/lib/command-runner.ts Outdated
Comment thread packages/bot-runner/lib/command-runner.ts Outdated
Comment thread packages/bot-runner/lib/command-runner.ts Outdated
Comment thread packages/host/app/commands/create-submission-workflow.ts Outdated
Comment thread packages/catalog-realm/submission-workflow-card/submission-workflow-card.gts Outdated
@richardhjtan richardhjtan force-pushed the CS-10975-retry-on-submission-failure branch from 67b7163 to 479aba8 Compare May 4, 2026 09:26
@richardhjtan richardhjtan requested a review from Copilot May 4, 2026 09:30
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread packages/bot-runner/lib/command-runner.ts Outdated
Comment thread packages/bot-runner/lib/command-runner.ts Outdated
Comment thread packages/bot-runner/lib/command-runner.ts Outdated
@richardhjtan richardhjtan force-pushed the CS-10975-retry-on-submission-failure branch from 479aba8 to 12f6cb3 Compare May 4, 2026 11:39
@richardhjtan richardhjtan requested a review from Copilot May 4, 2026 11:40
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread packages/host/app/commands/create-submission-workflow.ts Outdated
@richardhjtan richardhjtan force-pushed the CS-10975-retry-on-submission-failure branch from 12f6cb3 to c6195cf Compare May 4, 2026 11:59
@richardhjtan richardhjtan marked this pull request as ready for review May 4, 2026 13:02
@richardhjtan richardhjtan requested review from a team May 4, 2026 13:02
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c6195cfd3b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread packages/bot-runner/lib/command-runner.ts Outdated
Comment on lines +130 to +137
// Inline orchestrators for the listing-PR workflow. These are workarounds
// until user-based proxy requests exist (would let users register auth
// tokens for external API calls instead of routing through the bot).
if (eventContent.type === PR_LISTING_CREATE) {
return await this.handlePrListingCreate(runAs, eventContent);
}
if (eventContent.type === PR_LISTING_RETRY) {
return await this.handlePrListingRetry(runAs, eventContent);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this here and not in the individual command handler?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We really need to be careful that this module doesn't end up as a dumping ground for specific command logic. this module is meant to be command agnostic. any logic that you have for a specific command should not live here.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@habdelra Agreed. This is pre-existing tech debt (there's already a // TODO: inline handling for 'pr-listing-create' is a workaround on main), but adding pr-listing-retry does make it worse. I'll handle it in this PR

'@cardstack/boxel-host/commands/fetch-card-json/default';

const PR_LISTING_CREATE = 'pr-listing-create';
const PR_LISTING_RETRY = 'pr-listing-retry';
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are we hard coding knowledge about specific commands here?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need to refactor this module and get rid of ALL these consts for specific commands

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactored in the latest commit, hope the approach matches the vision!

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 3 comments.

Comments suppressed due to low confidence (1)

packages/host/app/commands/create-submission-workflow.ts:153

  • SendBotTriggerEventCommand can throw (e.g. missing Matrix userId, invalid payload, or matrixService.sendEvent failure). If this happens after the room + workflow card are created, the workflow card won’t be marked failed and the user may be left with an orphaned workflow card/room and no in-card retry path. Consider wrapping the send in try/catch and patching the workflow card with prCreationError (and possibly failedStep) when the trigger dispatch fails.
    await new SendBotTriggerEventCommand(this.commandContext).execute({
      roomId,
      realm,
      type: 'pr-listing-create',
      input: {

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread packages/host/app/commands/retry-submission-workflow.ts Outdated
Comment thread packages/host/app/commands/create-submission-workflow.ts
Comment thread packages/bot-runner/lib/pr-listing/pr-listing-workflow-handler.ts
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (1)

packages/host/app/commands/create-submission-workflow.ts:153

  • If SendBotTriggerEventCommand fails after the workflow card is created/opened, the card will remain without a recorded error state and (per the PR description) may not present a Retry path. Consider wrapping the bot-trigger send in a try/catch and patching the workflow card with a failure message (and/or failedStep) when the send fails so the UI can surface recovery.
    await new SendBotTriggerEventCommand(this.commandContext).execute({
      roomId,
      realm,
      type: 'pr-listing-create',
      input: {

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 91 to 93
@@ -70,14 +92,12 @@ export class CommandRunner {
return;
}
Comment on lines +261 to +270
private async runFreshPrCardFlow(ctx: WorkflowContext): Promise<PrCardData> {
let { textFiles, binaryFiles, totalCount } = await runStep(
'collect-files',
() => this.collectFiles(ctx),
);
await runStep('lint', () => this.applyLintSkip(ctx, totalCount));
let { prCardResult, prCardUrl } = await runStep('create-pr-card', () =>
this.createPrCard(ctx, textFiles, totalCount),
);
return { prCardResult, prCardUrl, binaryFiles };
@richardhjtan richardhjtan force-pushed the CS-10975-retry-on-submission-failure branch 4 times, most recently from b6cd864 to 1c669a5 Compare May 5, 2026 09:15
Address review feedback that command-runner.ts is meant to be
command-agnostic but had hardcoded knowledge of pr-listing-create
and pr-listing-retry (constants, types, step methods, GitHub/lint
dependencies).

- Add BotCommandHandler interface and makeEnqueueRunCommand factory
  to command-runner.ts; CommandRunner now takes handlers[] and
  dispatches to the first match (or falls back to generic enqueue).
- Move all listing-PR specifics — command strings, event types,
  WorkflowContext, StepError, step methods, helpers — into a new
  pr-listing-workflow-handler.ts implementing BotCommandHandler.
- Wire PrListingWorkflowHandler into timeline-handler.ts.
- Test setup goes through a small makeRunner() helper.
@richardhjtan richardhjtan force-pushed the CS-10975-retry-on-submission-failure branch from 1c669a5 to d7f7456 Compare May 5, 2026 12:48
@richardhjtan richardhjtan merged commit 02f35fb into main May 5, 2026
73 of 74 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants