Skip to content

ci(llm-gate): mirror conclusion to PR head SHA on workflow_dispatch#288

Merged
stevenobiajulu merged 1 commit into
mainfrom
tweak-ci-llm-gate-mirror-check-run-on-workflow-dispatch
May 31, 2026
Merged

ci(llm-gate): mirror conclusion to PR head SHA on workflow_dispatch#288
stevenobiajulu merged 1 commit into
mainfrom
tweak-ci-llm-gate-mirror-check-run-on-workflow-dispatch

Conversation

@stevenobiajulu
Copy link
Copy Markdown
Member

Summary

Ports the LLM-gate check-mirror fix from open-agreements (open-agreements/open-agreements#393, commit 60582ad) to safe-docx. Three surgical edits to .github/workflows/llm-based-quality-gate.yml:

  1. Add checks: write to the aggregate job's permissions: block.
  2. Add id: compose to the "Compose and post checklist comment" step.
  3. Append a "Mirror conclusion to PR head SHA (workflow_dispatch)" step that POSTs an explicit check_run to /repos/.../check-runs.

Why

When the LLM gate is re-fired via workflow_dispatch (the documented manual re-trigger after pushing fixes for a WARN), GitHub's PR statusCheckRollup — which branch protection consults — does not include workflow_dispatch-triggered check_runs. The check appears green in the Checks API but the PR stays mergeStateStatus: BLOCKED. The only workaround today is toggling PR draft status (or the llm-gate/override label) to force a pull_request event.

The explicit check_run POST DOES appear in statusCheckRollup, so branch protection sees it.

Living evidence in open-agreements

PR open-agreements/open-agreements#360 has two "Aggregate and post review" entries on its head SHA — one auto-generated, one mirrored. That's the in-the-wild proof the same fix works.

Behavior under existing knobs

  • Normal PR triggers (opened, ready_for_review, label toggle): mirror step is skipped by its github.event_name == 'workflow_dispatch' guard — single check entry, unchanged behavior.
  • skip_gate (package-lock-only PRs): the compose step still succeeds with a SKIPPED comment, so the mirror posts a passing check_run — same semantics branch protection would see from a normal skip_gate PR-triggered run.
  • WARN rows in blocking mode: compose calls core.setFailed(), mirror's always() guard runs it anyway, and COMPOSE_OUTCOME != "success" produces a failing check_run — fails closed on WARN.

Test plan

  • Wait for CI on this PR to confirm no regression on the pull_request path (single "Aggregate and post review" check as today).
  • After merge, on any open safe-docx PR, run gh workflow run llm-based-quality-gate.yml -R UseJunior/safe-docx -f pr_number=<N> and verify:
    • Two "Aggregate and post review" check_runs appear on the PR head SHA (one auto, one mirrored).
    • gh pr view <N> --json statusCheckRollup shows the mirrored check (the auto-generated one will not appear in the rollup — that's the bug being worked around).
    • mergeStateStatus is CLEAN (assuming other required checks pass), not BLOCKED.
  • Negative test on a future WARN: confirm the mirrored check is a failure (not success) when the gate finds WARN rows in blocking mode.

When the LLM-Based Quality Gate is re-fired via `workflow_dispatch`
(e.g. after pushing fixes for a WARN, since the gate intentionally
does not re-run on `synchronize` for cost reasons), the auto-generated
check_run for the "Aggregate and post review" job is attached to the
PR's head SHA and appears green in the Checks API — but GitHub's PR
`statusCheckRollup` (which branch protection consults) does NOT
include workflow_dispatch-triggered check_runs.

Result: the required check appears green but the PR stays
`mergeStateStatus: BLOCKED`. The only workaround today is toggling
PR draft status (or the `llm-gate/override` label) to force a
`pull_request` event.

This ports the fix already shipped to the open-agreements LLM gate
(open-agreements/open-agreements#393, where open-agreements PR 360
has the in-the-wild "2× Aggregate and post review" entries as living
proof). Three surgical edits to the aggregate job:

1. Add `checks: write` permission so the mirror step can POST to
   /repos/.../check-runs without 403.
2. Add `id: compose` to the "Compose and post checklist comment"
   step so its outcome is addressable from the mirror step.
3. Append a "Mirror conclusion to PR head SHA (workflow_dispatch)"
   step that POSTs an explicit check_run with the resolved head SHA.
   This explicit POST DOES appear in `statusCheckRollup`, so branch
   protection sees the check.

The mirror step is guarded by
`if: always() && github.event_name == 'workflow_dispatch' && needs.parse-checklist.outputs.head_sha != ''`
so it has no effect on normal `pull_request` triggers. `always()` is
critical: when the gate finds WARN rows and the github-script step
calls core.setFailed(), the default `if: success()` would skip the
mirror — the conclusion is derived from `steps.compose.outcome`, so
a failed gate still produces a failed mirror (fails closed on WARN).

The mirror's behavior under safe-docx's `skip_gate` (package-lock-only
PRs) is correct as-is: when skip_gate is true the compose step still
succeeds with a SKIPPED comment, so the mirror posts a passing
check_run — same semantics branch protection would see from a normal
skip_gate PR-triggered run.
@vercel
Copy link
Copy Markdown

vercel Bot commented May 31, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
site Ready Ready Preview, Comment May 31, 2026 12:27am

Request Review

@github-actions github-actions Bot added the ci label May 31, 2026
@github-actions
Copy link
Copy Markdown
Contributor

LLM-Based Quality Gate

Overall: ✅ PASS (14 pass · 0 warn · 14 total)

Check Verdict
read_file response metadata parity The PR only modifies the GitHub workflow in .github/workflows/llm-based-quality-gate.yml and does not touch the read_file tool or response metadata.
Live DOM namespace-safe OOXML writes The PR only modifies the GitHub actions workflow in .github/workflows/llm-based-quality-gate.yml and does not touch comments.ts or write prefixed OOXML attributes/elements.
Deleted field markup keeps w:fldChar outside w:del The PR only modifies the GitHub actions workflow file in .github/workflows/llm-based-quality-gate.yml and does not touch field atomization or OOXML manipulation logic.
Field validation per story, not global The PR only modifies the GitHub workflow in .github/workflows/llm-based-quality-gate.yml and does not touch any field validation or story splitting logic.
Revision IDs seeded from all revision-bearing side parts The PR only modifies '.github/workflows/llm-based-quality-gate.yml' and does not touch 'packages/docx-mcp/src/session/manager.ts', revision-ID allocation, or MCP tools.
Accept/reject sweep side parts and caches The PR only modifies the LLM-based quality gate workflow in .github/workflows/llm-based-quality-gate.yml and does not touch any of the revision-tracking, accept/reject, or side-part logic.
DocumentViewNode.heading stays canonical The PR only modifies the workflow configuration in .github/workflows/llm-based-quality-gate.yml and does not touch document_view.ts, heading heuristics, or heading normalization.
AI-author parity across entry points The PR only modifies the GitHub actions workflow in .github/workflows/llm-based-quality-gate.yml and does not touch session management, docx-mcp source files, or SessionManager entry points.
Property-change wrapper discipline The PR only modifies .github/workflows/llm-based-quality-gate.yml and does not touch any source files related to property-change wrappers or formatting.
SUPPORT.md Table A drift vs. implementation The PR only modifies the GitHub Actions workflow in .github/workflows/llm-based-quality-gate.yml and does not touch OOXML revision emission behavior or packages/docx-core/SUPPORT.md.
Table A / Table B boundary on side-part revisions The PR only modifies workflow configuration in .github/workflows/llm-based-quality-gate.yml and does not touch comments, footnotes, or other side-part primitives.
Canonical-emission surface completeness The PR only modifies the GitHub workflow in .github/workflows/llm-based-quality-gate.yml and does not touch any tracked-edit surface in packages/docx-core/src/primitives/ or packages/docx-mcp/src/tools/.
Lean predicate drift against engine semantics (asymmetric) The PR only modifies the LLM-based quality gate workflow in .github/workflows/llm-based-quality-gate.yml and does not touch field-wrapper semantics, the proof boundary, or atomizer behavior.
Unit-test quality (avoid tautological / change-detector tests) The PR only modifies the GitHub workflow file '.github/workflows/llm-based-quality-gate.yml' and does not add or modify any test files.
Full checklist questions
  1. read_file response metadata parity: If this PR touches packages/docx-mcp/src/tools/read_file.ts, budgeted pagination returns, or additive response metadata like warnings / comment_load_error, do every successful return path (default budgeted early return, non-budget fallthrough, explicit limit/node_ids) preserve the same additive diagnostic fields? read_file has multiple success exits; diagnostics have already disappeared on one path before. Reference: fix(docx-core): declare xmlns:w14/w15 on comments root before writing prefixed attributes (#154) #180 surfaced comment_load_error, fix(docx-mcp): warn when read_file budget is exceeded by a single node (closes #184) #186 added an early budget return + warnings, fix(docx-mcp): surface comment_load_error on the default budgeted read path (closes #189) #191 fixed the missing comment_load_error on the default budgeted path.

  2. Live DOM namespace-safe OOXML writes: If this PR touches packages/docx-core/src/primitives/comments.ts or writes prefixed OOXML attributes/elements (w14:*, w15:*, xmlns:*, comments.xml, commentsExtended.xml, people.xml), are prefixed OOXML names written with namespace-aware APIs — root aliases bound with setAttributeNS(XMLNS_NS, ...), prefixed attributes with setAttributeNS(W14_NS/W15_NS, ...), and is there a test that proves the live DOM works before serialization/reparse? String-prefixed attributes can serialize plausibly while the live DOM still throws namespace errors. Reference: fix(docx-core): declare xmlns:w14/w15 on comments root before writing prefixed attributes (#154) #180 (xmlns:w14/w15 declared on comments root before writing prefixed attrs).

  3. Deleted field markup keeps w:fldChar outside w:del: If this PR touches field atomization, validateFieldStructure, hasFldCharInsideDel, w:fldChar, w:instrText, w:delInstrText, or collapsed field comparison logic, does deleted field output stay ECMA-376-conformant — w:fldChar sibling-level (never inside w:del), deleted instructions use w:delInstrText only inside valid delete wrappers, accept/reject safety checks still reject malformed combined output? Word treats deleted field-state markup in the wrong container as document-corrupting. References: fix(docx-core): validate w:delInstrText placement and reject w:fldChar inside <w:del> #211, fix(docx-core): partition field-closure validation by ECMA-376 story (#212) #225, fix(docx-core): fragment w:fldChar outside w:del per ECMA-376 Part 4 #228.

  4. Field validation per story, not global: If this PR touches packages/docx-core/src/baselines/atomizer/pipeline.ts, splitStories, validateFieldStructure, side-part merge logic, or footnote/endnote field handling, is field validation run independently per ECMA story (document.xml, each footnote, each endnote), with sidecars from both original and revised archives considered, and global counter balance not treated as sufficient? A document can be globally balanced but have an invalid field sequence inside one story. References: fix(docx-core): partition field-closure validation by ECMA-376 story (#212) #225, fix(docx-core): fragment w:fldChar outside w:del per ECMA-376 Part 4 #228, feat(docx-core): sweep side-part revisions on accept/reject #218.

  5. Revision IDs seeded from all revision-bearing side parts: If this PR touches packages/docx-mcp/src/session/manager.ts (especially getRevisionContextForSession or FIXED_REVISION_ID_SEED_PARTS), createRevisionContext, revision-ID allocation, or MCP tools that create tracked changes/comments/footnotes, does revision-ID allocation scan all relevant package parts before issuing new IDs — comments, footnotes, endnotes, glossary, headers, footers — ignore non-revision w:id values (comment IDs, bookmarks), and handle malformed optional parts gracefully? Revision IDs are package-wide; document-only seeding collides with existing side-part revisions. Reference: fix(docx-mcp): seed revision ids from side parts #216 (seed revision ids from side parts).

  6. Accept/reject sweep side parts and caches: If this PR touches DocxDocument.acceptChanges, DocxDocument.rejectChanges, REVISION_STORY_PART_PATHS, accept_changes, reject_changes, or side-part revision markup, does accept/reject process every revision-bearing story — updating document.xml + footnotes.xml + endnotes.xml + comments.xml, writing back only changed side parts while refreshing cached XML, and pruning orphan footnotes without deleting reserved separator entries? Accepting only in the main document leaves stale revisions and dangling references in the package. References: feat(docx-core): sweep side-part revisions on accept/reject #218, fix(docx-mcp): seed revision ids from side parts #216, fix(docx-core): partition field-closure validation by ECMA-376 story (#212) #225.

  7. DocumentViewNode.heading stays canonical: If this PR touches packages/docx-core/src/primitives/document_view.ts, HeadingValue, heading heuristics, ListMetadata.header_style, or Google Docs document-view heading normalization, does node.heading remain a structural heading signal — exact Word styles Heading1Heading6 win, heuristic sources suppressed inside table cells while real Word heading styles still pass, ordinary body paragraphs omit the heading key? Consumers use node.heading != null as a structural test; heuristic false positives break downstream navigation. References: fix(docx-core): harden heading detection (#157 Phase 1) #178, fix(docx-core): suppress non-sectional false-positive headings (closes #187) #188, feat(docx-core): add derived heading object to DocumentViewNode (closes #179) #190.

  8. AI-author parity across entry points: If this PR touches packages/docx-mcp/src/server.ts, packages/docx-mcp/src/cli/tool_runner.ts, packages/docx-mcp/src/cli/commands/**, or adds any new new SessionManager(...) call site in docx-mcp, does every entry point that constructs a SessionManager resolve SAFE_DOCX_AI_AUTHOR with the same three-way semantics (set → use it; empty string → opt out to untracked; unset → defaultAiAuthor), or has a new entry path silently bypassed tracked emission? Each entry path looks locally correct while diverging from another; tracked emission has gone dark in one path before anyone noticed. References: feat(docx-mcp): wire configurable AI author through MCP layer (#142) #172 (production MCP wiring would have kept tracked emission dark), fix(docx-mcp): honor SAFE_DOCX_AI_AUTHOR in CLI entry points (#181) #182 (CLI runners constructing bare SessionManager() silently produced untracked edits).

  9. Property-change wrapper discipline: If this PR touches packages/docx-core/src/primitives/layout.ts, packages/docx-core/src/primitives/text.ts, packages/docx-mcp/src/tools/clear_formatting.ts, or packages/docx-core/src/primitives/track-changes-emitter.ts, do tracked formatting/property edits emit exactly one correct *PrChange wrapper (pPrChange / rPrChange / trPrChange / tcPrChange) carrying a snapshot of the prior live properties — not stacking stale wrappers, not stripping valid historical children (cellIns/cellDel/cellMerge), and not omitting the snapshot when the operation is formatting-aware? Emitted OOXML is visually plausible but subtle snapshot mistakes only surface during later accept/reject or in Word's tracked-changes UI. References: feat(docx-core): emit pPrChange/trPrChange/tcPrChange from layout setters (#140) #167 (duplicate pPrChange/trPrChange/tcPrChange stacking + over-broad tcPr exclusion), feat(docx-mcp): emit rPrChange from clear_formatting MCP tool (#141) #170 (clear_formatting failing to strip stale rPrChange), feat(docx-core): emit rPrChange for formatted paragraph replacements #215 (rPrChange for formatted paragraph replacements + filtering nested stale records).

  10. SUPPORT.md Table A drift vs. implementation: If this PR modifies OOXML revision emission behavior (w:ins, w:del, w:rPrChange, etc.) in packages/docx-core/src/primitives/**, or touches packages/docx-core/SUPPORT.md, does the PR symmetrically update Table A in SUPPORT.md when the supported revision-emission surface in primitives changed — added, removed, or weakened — or is the documented contract now lying about what's supported? Reviewers focus on TS AST correctness and golden tests; Markdown contract tables get treated as an afterthought, so the documented surface drifts from the actual surface. Reference: [120.8] Regression suite for canonical revision emission across the surface #143 review caught replaceParagraphTextRange should emit w:rPrChange when run formatting changes #173 (formatting mismatch in Table A) and addCommentReply should emit body revision markup OR SUPPORT.md should be softened #174 (comment body revision omission forcing a Table A softening) late in peer review.

  11. Table A / Table B boundary on side-part revisions: If this PR touches packages/docx-core/src/primitives/comments.ts, packages/docx-core/src/primitives/footnotes.ts, or other side-part primitives, and adds/changes revision markup (w:ins, w:del), does tracked-change revision logic stay scoped to Table A (document-body content inside the side part) without leaking revision markup into Table B (the side-part package bootstrap — comments.xml/footnotes.xml element registration itself)? Body runs and side-part package elements share nearly identical XML namespace schemas; revisions emitted in the wrong table corrupt the package contract while looking plausible. References: [120.3] Emit w:ins/w:del for comment body anchors #138 (comment-body straddle constraints), [120.4] Emit w:ins/w:del for footnote reference and text #139 (footnote-reference straddle constraints).

  12. Canonical-emission surface completeness: If this PR adds or changes a tracked-edit surface in packages/docx-core/src/primitives/** or packages/docx-mcp/src/tools/**, are the paired artifacts updated together — packages/docx-core/src/integration/canonical-emission-regression.test.ts, packages/docx-mcp/src/integration/canonical-emission-mcp.test.ts, and the documented emitter surface (Table A) — or is the rollout only partially wired? The primitive change looks done before the MCP path, regression matrix, and documented contract are wired through; partial rollouts ship undocumented surface that drifts. References: feat(docx-mcp): wire configurable AI author through MCP layer (#142) #172 (RevisionContext threaded through every Table A MCP tool), test(docx-core,docx-mcp): final regression suite for canonical emission (#143) #175 (24-test regression suite + verified write-time emitter rows), feat(docx-core): emit rPrChange for formatted paragraph replacements #215 (re-enabled rPrChange regression + updated support surface for replaceParagraphTextRange).

  13. Lean predicate drift against engine semantics (asymmetric): If this PR changes field-wrapper semantics, the proof boundary, or atomizer behavior — packages/docx-core/src/baselines/atomizer/**, verification/lean/LeanSpike/Spec.lean, verification/lean/Tier2/**, or packages/docx-core/src/integration/lean-spec-bridge.test.ts — and if the TS engine semantics shifted, did the PR also update the Lean residual predicate and bridge tests, or is the proof now pinned to a stale stronger/weaker assumption? Asymmetric: a TS change without a corresponding Lean update is WARN; a Lean-only change without a TS update should not fire. The Lean side can still compile while the abstraction boundary is subtly wrong for the next engine refactor. References: feat(verification): close inv_field_001 with Tier 2 OoxmlDoc subset #208 (closed inv_field_001 using stronger recursivelyWellformed), refactor(verification): weaken inv_field_001 axiom to document-level preservationFriendly (rebased follow-up to #208) #220 (weakened the axiom to document-level preservationFriendly to avoid breakage when field fragmentation lands).

  14. Unit-test quality (avoid tautological / change-detector tests): If this PR adds or modifies any **/*.test.ts (or other test files), are the test assertions independent of the system under test — expected values constructed from first principles rather than re-derived from the function under test, mocks limited to external boundaries (filesystem, network, clocks) rather than mocking the SUT itself, assertions making concrete semantic claims rather than just snapshotting current behavior or asserting non-null, and any test added alongside a bug fix actually exercising the bug? Tests that re-implement the production code as the "expected" value, or mock out the system under test, pass green while providing no regression protection.

Estimated cost (this run): $0.0135 — 39,565 input + 595 output tokens (≈4 chars/token) on gemini-3.5-flash. Char-count estimate, not provider telemetry.

@codecov
Copy link
Copy Markdown

codecov Bot commented May 31, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

@stevenobiajulu stevenobiajulu merged commit fb9b2f9 into main May 31, 2026
40 checks passed
@stevenobiajulu stevenobiajulu deleted the tweak-ci-llm-gate-mirror-check-run-on-workflow-dispatch branch May 31, 2026 00:34
@stevenobiajulu
Copy link
Copy Markdown
Member Author

Post-merge smoke PASSED

Merged: fb9b2f9e756c8dda8e7861e8e189997e48c11314
Built from: main @ fb9b2f9
Smoke: end-to-end workflow_dispatch re-fire against an open PR

What was tested

The whole point of this PR is the workflow_dispatch mirror behavior — there is no JS code change, so the npm test suite has zero coverage value for it. The realistic smoke is firing the workflow on a real open PR and confirming the mirrored check_run appears in statusCheckRollup.

Target: PR #277 (chore/gitignore-office-locks, head 93e7b7b) — chosen because it already had a pre-fix "Aggregate and post review" entry from its original 2026-05-27 PR-trigger run.

Run: gh workflow run llm-based-quality-gate.yml --repo UseJunior/safe-docx -f pr_number=277run 26699040024 — completed successfully, including the new "Mirror conclusion to PR head SHA (workflow_dispatch)" step.

Results

PR #277 head SHA 93e7b7b after the dispatch — two Aggregate and post review check_runs:

ID Completed Conclusion URL pattern
78134314845 2026-05-27 17:24 success /actions/runs/.../job/... (auto-generated, original PR trigger)
78688577890 2026-05-31 00:39 success /runs/... (mirrored, posted by new step)

statusCheckRollup for PR #277 — the mirrored check is present:

[
  {"completedAt":"2026-05-27T17:24:25Z","conclusion":"SUCCESS","detailsUrl":".../actions/runs/26527086610/job/78134314845"},
  {"completedAt":"2026-05-31T00:39:07Z","conclusion":"SUCCESS","detailsUrl":".../runs/78688577890"}
]

mergeStateStatus: CLEAN — the rollup sees the mirrored check, exactly as designed.

This matches the open-agreements PR #360 evidence pattern (the "2× Aggregate and post review" entries the PR description called out). Fix is working in the wild on safe-docx.

What was NOT tested

  • Negative path (WARN row → failing mirrored check). Will be exercised the next time the gate finds a real WARN; no synthetic reproduction here.
  • skip_gate path (package-lock-only PR). Logic was reviewed: compose step still succeeds with SKIPPED comment, so the mirror posts a passing check_run — same semantics as a PR-triggered skip.

Cleanup

  • ⚠️ remote branch already deleted (repo has auto-delete-on-merge enabled)
  • ✅ removed worktree /Users/stevenobiajulu/Projects/safe-docx-tweak-llm-gate-mirror
  • ✅ deleted local branch tweak-ci-llm-gate-mirror-check-run-on-workflow-dispatch

Log: /tmp/automerge-smoke-288-20260530-203958.log (local).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant