fix(docx-core): harden heading detection (#157 Phase 1) by stevenobiajulu · Pull Request #178 · UseJunior/safe-docx

stevenobiajulu · 2026-05-08T17:14:55Z

Summary

Phase 1 of #157: tighten the existing heading-detection signals in read_file(format=\"json\"), document the canonical precedence rule, and add a centered-title detector. Phase 2 (derived is_heading / heading_level schema fields) is deferred to a follow-up issue.

Three concrete bugs fixed against tests/test_documents/nvca-regression/source.docx:

False negative: the SPA's center-aligned, ALL-CAPS, bold title (SERIES […] PREFERRED STOCK PURCHASE AGREEMENT) now surfaces as header_style: 'title_caps_centered' instead of producing a null heading signal.
False positives in extractHeaderInfo(): the long-paragraph regex branch no longer returns title_bare when no explicit . or : terminator is matched — body prose like Termination of Section 2.2(d)(i) shall not affect … and Except as described in Section 2.2(d)(i), … no longer surface a leading-word header_text.
False positives in detectRunInHeader(): a formatting-transition invariant (headerCharCount < totalVisibleChars) ensures the bold/underline prefix is followed by non-header body. Whole-paragraph bold/underline blocks (signature labels, defined-term lead-in sentences) no longer get classified as run-in headers.

Smoke results across the 367-paragraph NVCA SPA

Header style	Main	This PR	Net
`run_in_header`	12 (all false positives, `header_text` = whole body prose)	0	-12 false positives
`title_with_period`	12	12	unchanged
`title_with_colon`	9	12	+3 (recovered from former `run_in_header` mis-classification)
`title_bare`	90	88	-2 (Termination/Except now correctly null)
`title_caps_centered`	0	3	+3 NEW true positives (SPA title + 2 others)

Why Phase 1 only (not Phase 2)

Peer-reviewed with both Gemini and Codex CLI before implementation. Codex flagged that:

The existing signals are too noisy to bless with a top-level verdict field today (the 12 false-positive run_in_header entries on main are exactly that noise).
The existing header string field is wired into toon header-stripping (document_view.ts:363) and grep locators (packages/docx-mcp/src/tools/grep.ts:56) — repurposing it would be more breaking than adding new fields later.
The Google Docs renderer is structurally cast to DocumentViewNode at packages/docx-mcp/src/tools/gdocs/read_file.ts:21, so any new top-level field is immediately a cross-renderer contract.

Phase 1 lands the cleaner signals and the documented precedence rule first; the Phase 2 schema (heading: null | { text, source, level } per Codex's preference, vs. nested heading: { is_heading, tier, level, source, source_detail } per Gemini's) will be designed against stable signals in a follow-up issue.

Documentation

skills/docx-editing/SKILL.md — new "Heading Detection (read_file JSON output)" subsection with the canonical precedence table.
packages/docx-mcp/README.md — brief paragraph documenting list_metadata.header_text / header_style (previously undocumented), pointing to SKILL.md.

Out of scope (tracked in follow-up)

Top-level is_heading / heading_level schema work.
Refining Heading 1–Heading 6 style detection (e.g., the HeadingPara1 / HeadingPara2 footgun in the NVCA fixture where styles inherit from heading styles but behave like body).
Mirroring on the Google Docs renderer (its structural cast already auto-mirrors any new top-level fields, so no Phase 1 changes break it).

Test plan

npm -w @usejunior/docx-core test — 1137 pass, 1 skipped, 0 fail (5 new branch tests added).
npm -w @usejunior/docx-mcp test — 675 pass, 0 fail (1 new end-to-end JSON-shape test against the SPA fixture).
npm run lint:workspaces — 0 errors (1 pre-existing warning in edit.test.ts unchanged).
Manual smoke against the actual NVCA SPA fixture: SPA title detected as title_caps_centered, Termination of Section … and Except as described in Section … no longer surface a non-null header_text, all 12 main-branch run_in_header false positives suppressed.
Governing Law and Venue: … style inline headers still detected as title_with_colon (no regression in document_view.branch.test.ts:150).
New regression test for a real bold-prefix + body-suffix paragraph confirms run_in_header still fires on legitimate cases.

Ref: #157

Three concrete bugs in `read_file(format="json")` heading signals, verified against `tests/test_documents/nvca-regression/source.docx`: 1. False negative on the SPA's center-aligned ALL-CAPS bold title (`SERIES […] PREFERRED STOCK PURCHASE AGREEMENT`). It produced no heading signal because none of the existing detectors targeted centered-title formatting. 2. False positives in `extractHeaderInfo()`: the long-paragraph regex branch returned `title_bare` for body prose that began with a capitalized word, surfacing the leading word as `header_text` (e.g. `Termination` from `Termination of Section 2.2(d)(i) shall not affect …`). 3. False positives in `detectRunInHeader()`: paragraphs whose visible runs were entirely bold/underline (signature labels, defined-term lead-in sentences) were classified as `run_in_header` because the detector never required a real header-prefix → body transition. Why this scope (Phase 1 only — no `is_heading` schema yet): peer review (gemini + codex) flagged that the existing signals were too noisy to bless with a top-level verdict field. Across the 367-paragraph NVCA SPA, the `run_in_header` count drops from 12 (all false positives, each carrying a body-prose `header_text`) to 0 with the transition guard. With the signals now correct, derived `is_heading` / `heading_level` fields can be designed against a stable foundation in a follow-up issue. Changes: - `extractHeaderInfo()`: long-paragraph regex branch returns null when no explicit `.` or `:` terminator was matched. Short-paragraph branch (≤ 50 chars) and the explicit-terminator paths are unchanged — the existing `Governing Law and Venue: …` test still passes. - `detectRunInHeader()`: adds a formatting-transition invariant (`headerCharCount < totalVisibleChars`). Whole-paragraph bold/underline blocks are no longer surfaced as run-in headers; a real bold-prefix → non-bold-body transition still works. - New `detectTitleCapsCentered()`: final fallback for centered, ALL-CAPS, bold standalone titles. Strict gates (no lowercase, no list label, not in a table cell, ≤ 60 chars). Emits `header_style: 'title_caps_centered'`. - Documents the canonical heading-detection precedence rule in `skills/docx-editing/SKILL.md` and adds a brief pointer in `packages/docx-mcp/README.md`. - Regression tests: 5 new branch-coverage tests in `document_view.branch.test.ts` (synthetic XML) and 1 end-to-end JSON-shape test in `nvca_spa_regression.test.ts` against the actual SPA fixture. The Google Docs renderer (`packages/google-docs-core`) is structurally cast to `DocumentViewNode`; its schema is unchanged. Phase 2 design (the deferred `is_heading` / `heading_level` fields) will be tracked in a follow-up issue. Ref: #157

vercel · 2026-05-08T17:15:02Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
site	Ready	Preview, Comment	May 10, 2026 8:13pm

codecov · 2026-05-08T17:20:59Z

Codecov Report

❌ Patch coverage is 86.66667% with 10 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
packages/docx-core/src/primitives/document_view.ts	86.66%	6 Missing and 4 partials ⚠️

📢 Thoughts on this report? Let us know!

Peer review of #178 (gemini + codex CLI) and a smoke against ~/Downloads docx fixtures surfaced four false-positive / false-negative cases that slip past the Phase 1 gates. Address all four in this commit: 1. detectTitleCapsCentered: add a content gate so single-token bracketed placeholders ([COMPANY], [___], <NAME>) and underscore-only signature lines no longer classify as titles. Require >= 3 ASCII letters AND >= 2 whitespace-separated word-tokens. (Both Codex and the smoke independently reported real fixture hits.) 2. detectTitleCapsCentered: split the length cap into its own constant MAX_CENTERED_TITLE_LENGTH = 120. Long corporate titles like `AMENDED AND RESTATED CERTIFICATE OF INCORPORATION OF FOO INC.` were rejected by the 60-char body-prose cap, producing real false negatives on the bundled NVCA COI fixture. 3. buildNodesForDocumentView: run detectTitleCapsCentered BEFORE extractHeaderInfo so short ALL-CAPS centered titles like `SCHEDULE OF PURCHASERS` surface as title_caps_centered (matching the documented precedence in SKILL.md) instead of as title_bare. No production consumers depend on the old emission ordering. 4. detectRunInHeader: replace the headerCharCount >= totalVisibleChars guard with an explicit body-text accumulator so trailing-whitespace-only "body" runs no longer defeat the whole-paragraph-bold rejection. Real false positive observed in Downloads/Co-Founder Agreement.docx (`Compensation & Benefits.` + trailing space-only run, all bold). Add five regression tests covering each fix. Update SKILL.md and the mcp README to reflect the new gates and 120-char cap. All tests still pass: docx-core 1147 (1146 + 1 skip), docx-mcp 675.

stevenobiajulu · 2026-05-10T20:13:51Z

Pushed 1672f55 addressing all four findings from the gemini + codex peer review and the local smoke against ~/Downloads:

detectTitleCapsCentered content gate — require ≥ 3 ASCII letters AND ≥ 2 word-tokens. Rejects [COMPANY] placeholders (smoke) and underscore signature lines like ____________________________________________ (Codex flagged on bundled ILPA fixtures).
MAX_CENTERED_TITLE_LENGTH = 120 (split from MAX_HEADER_TEXT_LENGTH). Lets long titles like AMENDED AND RESTATED CERTIFICATE OF INCORPORATION OF FOO HOLDINGS INC. classify (Codex flagged the NVCA COI fixture being silently rejected at 60).
Reorder: detectTitleCapsCentered before extractHeaderInfo — short ALL-CAPS centered titles like SCHEDULE OF PURCHASERS now surface as title_caps_centered instead of title_bare, matching the documented precedence in SKILL.md. No production consumers depend on the old emission ordering (only the emitter itself references the enum values).
detectRunInHeader whole-paragraph guard, take 2 — replaced headerCharCount >= totalVisibleChars with an explicit body-text accumulator that requires non-whitespace body. The old guard failed open when trailing whitespace lived in a separate non-bold run (real false positive observed in Co-Founder Agreement.docx for Compensation & Benefits. — fully bold + trailing space-only run).

Five new branch tests added covering each gate; SKILL.md and the MCP README updated to describe the new gates and the 120-char cap.

Test status: docx-core 1146 pass + 1 skip, docx-mcp 675 pass, lint clean (1 pre-existing warning unchanged).

Smoke distribution after the fix (sampled from ~/Downloads):

File	run_in_header	title_caps_centered
Co-Founder Agreement	5 → 4	1 (`[COMPANY]`) → 1 (`TERMS OF EMPLOYMENT`)
City Office REIT LPA	0	0 → 4 (real titles)
Blackstone PE LPA	0	0 → 1 (`TABLE OF CONTENTS`)

Confirmed conclusions kept: headerCharCount >= totalVisibleChars was the right invariant direction (just needed the body-text refinement); removing the long-paragraph bare-title return path is fine; Phase 2 (is_heading / heading_level) deferral still right.

Review artifacts:

prompt: /tmp/peer-review-pr178-1762720000.md
gemini: /tmp/gemini-pr178-review.txt
codex: /tmp/codex-pr178-review.txt

stevenobiajulu · 2026-05-10T20:26:14Z

Post-merge smoke passed

Merged: 725f44d
Built from: main @ 725f44d (origin/main HEAD after pull)
Smoke: npm install && npm run build && npm run lint:workspaces && npm test plus a feature-specific run of scripts/smoke_pr178.mjs against ~/Downloads fixtures.

Steps

install — ok
build — ok (workspaces: docx-core, docx-mcp, google-docs-core, safe-docx, safe-docx-mcpb)
lint — ok (1 pre-existing warning in edit.test.ts, unchanged)
tests — ok across all workspaces:
- docx-core: 1218 passed + 1 skipped
- docx-mcp: 724 passed
- google-docs-core: 113 passed + 35 skipped
- safe-docx-mcpb: 3 passed
feature smoke — ok. Re-ran heading-detection smoke against 6 ~/Downloads fixtures from main HEAD. Confirmed post-merge behavior matches pre-push:
- [COMPANY] placeholder no longer classifies as title_caps_centered
- Compensation & Benefits. (whole-bold + trailing space) no longer mis-classifies as run_in_header (count 5 → 4)
- City Office REIT LPA picks up 4 real centered titles (AMENDED AND RESTATED, AGREEMENT OF LIMITED PARTNERSHIP, CITY OFFICE REIT OPERATING PARTNERSHIP, L.P., TABLE OF CONTENTS) previously missed
- Blackstone PE LPA picks up TABLE OF CONTENTS previously missed

Log: /tmp/automerge-smoke-178-1778444682.log (local).

…#187) (#188) * fix(docx-core): suppress non-sectional false-positive headings (#187) Two cross-fixture residual error classes after PR #178: 1. **Table cells**: `extractHeaderInfo()` ran on every paragraph regardless of `tableContext`, even though `detectTitleCapsCentered()` already explicitly skipped table cells. Bonterms NDA was 13/13 `title_bare` and 10/10 `title_with_colon` table-cell false positives (100%); Mutual NDA was 17/18; Common Paper NDA was 16/17. One-line fix adds the `&& !tableContext` gate to mirror the existing one on the centered-title path. 2. **Signature/field-label clusters**: NVCA SPA paragraphs 312–321 (`COMPANY:` / `By:` / `Name:` / `Title:` / `Address:` / `Email:` / `PURCHASER:` / `By:` / `Name:` / `Title:`) are body-level (not in tables) and were each individually classified as `title_with_colon` or `title_bare`. New `suppressSignatureClusters()` post-pass scans the assembled `nodes` array for runs of ≥4 consecutive paragraphs where ≥75% match a signature-label regex (mixed-case `Name:` form OR all-caps `COMPANY: …` form), then nulls `header_text` / `header_style` / `header_formatting` on every node in the cluster. Why these two specifically: a second peer review (Gemini + Codex CLI) measured 80/251 `title_bare` (32%) and 22/56 `title_with_colon` (39%) in-table false positives across 8 fixtures. None of PR #178's four fixes added a `!tableContext` guard to `extractHeaderInfo`, and the signature blocks live outside tables so the table fix alone wouldn't catch them. Cluster heuristic notes: - No "near document end" anchor (Gemini: MSAs and amended docs have signature blocks throughout). - Doesn't require all to be colon-suffixed (Codex: NVCA's `COMPANY: [Insert Name]` lacks colon-only suffix). - ≥4-paragraph minimum prevents single-label false suppression. Cross-fixture verification (8 fixtures): - Bonterms NDA: 13/13 + 10/10 → 0/0 + 0/0 - Mutual NDA: 16/15 + 0/0 → 1/0 + 0/0 - Common Paper NDA: 15/14 + 0/0 → 1/0 + 0/0 - NVCA SPA signature block (paragraphs 312–321): all `header_style: null` - ILPA LPAs (×2): unchanged at 20/0 + 0/0 — real section headings preserved Tests: 4 new (2 branch-coverage with synthetic XML, 2 end-to-end MCP JSON-shape against actual fixtures). Suites green: docx-core 1220 pass + 1 skip; docx-mcp 726 pass. Out of scope (deferred): - Placeholder/bracket rule for `title_bare` path (waits for footnote marker normalization first). - Confidence tiering. Peer review (Gemini + Codex) pending — see Step 8. Closes #187. * fix(docx-core): preserve adjacent headings + extend table gate to run-in detection Two issues caught by Codex peer review on PR #188: 1. **HIGH: cluster post-pass erased real headings adjacent to signature blocks.** suppressSignatureClusters() suppressed every node covered by any qualifying ≥4-paragraph window with ≥75% match density, including nodes that themselves did NOT match a signature-label regex. Codex reproduced: 4 signature labels followed by a Heading1 `Product` paragraph — the window [0..4] has 4/5 = 80% density, so the Heading1 was silently nulled. This is a production-shape failure because signature pages are routinely followed by the next section heading. Fix: only suppress nodes that themselves match a signature-label regex. The window is the density signal that authorizes suppression; adjacent non-matching nodes are preserved. 2. **NORMAL: detectRunInHeader still ran inside table cells**, despite SKILL.md saying heading detection is suppressed there. A table cell containing bold "Purpose:" plus body text would still emit `header_style: "run_in_header"`. Mirror the existing !tableContext gates already present on detectTitleCapsCentered and extractHeaderInfo. Two new regression tests: - All-caps `COMPANY:` / `PURCHASER:` cluster (NVCA-style mixed prefix + label rows, 9 paragraphs) all suppressed. - Boundary-preservation: 4 signature labels + adjacent Heading1 paragraph — labels suppressed, Heading1 preserved. Suites green: docx-core 1222 pass + 1 skip (was 1220, +2 new tests), docx-mcp 726 unchanged. Ref: #187 * fix(docx-core): suppress dense signature blocks; preserve only Word-styled headings adjacent Codex re-review found the previous fix was too lax. After narrowing suppression to "only nodes that match the signature-label regex," common signature-block shapes like: ACME CORP By: Name: Title: still leaked `ACME CORP` as `header_style: "title_bare"` (heuristic short-paragraph fallback fires; the regex doesn't match all-caps company names without a colon terminator). The right rule is in between the two extremes: - The original "any covering window" suppressed too much (erased adjacent Heading1 paragraphs). - "Only matching nodes" suppressed too little (leaked company names). Refined: inside a cluster window, suppress every node EXCEPT real Word-styled headings (`paragraph_style_id` matching `/^Heading[1-6]$/`). The density signal authorizes blanket suppression; only an explicit Word heading style is strong enough to overrule it. This handles both reproductions: - `By:`/`Name:`/`Title:`/`Address:`/`Product[Heading1]` → labels suppressed, Heading1 preserved (Codex's first repro). - `ACME CORP`/`By:`/`Name:`/`Title:` → ACME CORP suppressed (no Word style; only heuristic title_bare) (Codex's second repro). New regression test asserts the company-name lead-in case. Suites green: docx-core 1223 pass + 1 skip (was 1222, +1 test); docx-mcp 726 unchanged. Out of scope (deferred to a follow-up issue): - Broader regex coverage for `INVESTOR 1:`, `PURCHASER(S):`, `BORROWER[S]:` (digits, parens, brackets). Both reviewers flagged these as nice-to-have; they don't block this fix. Ref: #187 * fix(docx-core): narrow signature-cluster suppression to label-matching nodes Address Gemini + Codex peer review of PR #188 (issue #187 follow-up). Codex empirically reproduced the over-suppression bug: a window meeting the 4-paragraph / 75%-label density gate also covers non-label neighbors, and the previous code cleared `header_*` on every paragraph in the window (saving only `/^Heading[1-6]$/`-styled paragraphs). Empirically confirmed false negatives: - `[By:, Name:, Title:, Address:, Product]` — `Product` was cleared even though it doesn't match any signature-label regex. - `Heading7`, `Title`, `Subtitle`, `Article5L1` (all present in the fixture corpus) classified as `title_bare` when alone but were cleared when adjacent to a 4-label cluster. Fix: in the final clear loop, only mutate paragraphs that themselves satisfy `isSignatureClusterLabel`. The density gate still authorizes suppression of labels inside the window; non-label neighbors keep their detected heading metadata regardless of paragraph style. This obsoletes the `Heading1-6` exact-id escape hatch — real headings are preserved by virtue of not matching the label predicate. Tests: - Rewrite the "ACME CORP lead-in" test to assert the corrected behavior: the three label paragraphs are suppressed; ACME CORP keeps its heuristic `title_bare` classification (it's not a label). - Add a regression test for non-label body neighbor (`Product` after four labels). - Add a regression test for custom-style headings (`Heading7`, `Title`, `Subtitle`, `Article5L1`) adjacent to a signature cluster. - Add a regression test locking in correct WHEREAS-recital handling (no false suppression). Verification: - `npm -w @usejunior/docx-core test`: 1226 passed + 1 skipped (was 1220 + 1 skipped — 6 net new test assertions). - `npm -w @usejunior/docx-mcp test`: 726 passed (unchanged). - `npm run lint:workspaces`: 0 errors (1 pre-existing warning). - Cross-fixture spot check: in-table residual count remains 0 across all 7 fixtures. Some legitimate non-cluster headings now correctly preserved (Letter of Intent went from 6 → 11 residuals, all `Heading1`/`Heading2`-styled real section headings). Refs: peer-review feedback at #188 (comment)

#179) (#190) * feat(docx-core): add derived heading object to DocumentViewNode (#179) Phase 2 of #157: expose a sparse top-level `heading` field on every paragraph node, locking the precedence rule into the wire schema so consumers stop re-implementing it. Why now (after #157 / #178 / #188 detector hardening): the underlying signals are now stable enough to bless. Without this, every consumer re-implements the 5-step precedence ladder from `paragraph_style_id` + `list_metadata.header_style`, and every new heuristic added to the detector silently breaks every consumer's hardcoded `if/else` (as just happened with `title_caps_centered` in #178). Centralizing the verdict in the schema makes future detector changes non-breaking for downstream code. Schema (sparse — field omitted entirely for body paragraphs): heading?: { text: string; source: 'word_style' | 'run_in_header' | 'title_with_period' | 'title_with_colon' | 'title_caps_centered' | 'title_bare'; level: number | null; // 1..6 only when source === 'word_style' } Derivation rules (locked in #179): - `level` is set to 1..6 ONLY when `paragraph_style_id` matches /^Heading([1-6])$/ exactly. Style inheritance does NOT count — `HeadingPara1` / `HeadingPara2` styles in the NVCA fixture inherit from heading styles but behave like body paragraphs (`next: Normal`), so they MUST NOT produce a level. - `level` is `null` for all heuristic sources. - Word built-in heading wins over heuristics when both signals fire. - Field is OMITTED (not `null`) for body paragraphs — keeps payload small and gives a clean jq idiom: `.[] | select(.heading)`. Cross-renderer: the Google Docs renderer (packages/google-docs-core/src/document-view.ts) is structurally cast to DocumentViewNode at packages/docx-mcp/src/tools/gdocs/read_file.ts. Mirrored the derivation there but kept it Word-style-only (Google Docs has built-in heading styles; heuristic detectors do not run on the GDocs path). Bonus typing improvement: narrowed `header_style` from `string | null` to `HeuristicHeadingSource | null` (an exported type union) so consumers get autocomplete on the enum values. Tests: 8 new tests (7 in `document_view.branch.test.ts`, 1 e2e in `nvca_spa_regression.test.ts`): - Heading1 → `{source: 'word_style', level: 1}` - Heading6 → `{source: 'word_style', level: 6}` - HeadingPara1 (basedOn Heading1) → no `heading` key - title_caps_centered → `{source: 'title_caps_centered', level: null}` - run_in_header → `{source: 'run_in_header', level: null}` - Body paragraph → no `heading` key - Precedence: Heading2 + title_with_colon → `{source: 'word_style', level: 2}` - E2E: NVCA SPA title → `{source: 'title_caps_centered', level: null}` Suites: docx-core 1224 pass + 1 skip (was 1220, +4 net new tests after the helper-typing test churn). docx-mcp 733 pass; 1 pre-existing failure in `read_file_comments.test.ts:1028` unchanged from main — unrelated to this PR (introduced by PR #180's comments-namespace work). Documentation: SKILL.md and packages/docx-mcp/README.md updated to recommend `node.heading != null` as the canonical "is this a heading" check; raw `header_style` becomes the per-detector explanation layer. GDocs asymmetry documented explicitly. Peer review (Gemini + Codex) pending — see Step 8. Closes #179. * fix(docx-core): suppress heuristic heading promotion in table cells (peer review #190) Both Gemini and Codex independently flagged that heuristic detectors (run_in_header, title_with_period, title_with_colon, title_bare) still fired inside table cells, so the new canonical predicate `node.heading != null` was promoting ordinary label/value cell content ("Notice Address:", "Closing.", "Indemnification. ...") into structural headings. word_style still fires inside cells (legitimate Word Heading1..6 inside a table is still a heading). Other peer-review follow-ups: - Document HeadingValue.text semantic split (word_style = whole para, heuristic sources = extracted prefix) in JSDoc. - Export HeuristicHeadingSource so external consumers can name the narrowed ListMetadata.header_style union. - Fix CachedParagraph.paragraphId JSDoc — it stores namedStyleType (HEADING_1, NORMAL_TEXT), not the Google native paragraph ID. - Add GDocs renderer tests covering HEADING_n normalization, NORMAL_TEXT/TITLE/SUBTITLE rejection, and table-cell word_style. - Add docx-core regression tests for title_with_period, title_bare, table-cell suppression, and word_style-in-cell still firing. Test deltas: docx-core 1224 → 1227, google-docs-core 113 → 116. Pre-existing read_file_comments.test.ts:1028 failure unchanged.

github-actions Bot added the fix label May 8, 2026

stevenobiajulu mentioned this pull request May 8, 2026

Heading detection Phase 2: derived is_heading / heading_level (deferred from #157) #179

Closed

6 tasks

vercel Bot deployed to Preview May 10, 2026 20:13 View deployment

stevenobiajulu merged commit 725f44d into main May 10, 2026
23 checks passed

stevenobiajulu mentioned this pull request May 10, 2026

Reduce non-sectional false positives in heuristic heading detection #187

Closed

5 tasks

This was referenced May 10, 2026

fix(docx-core): suppress non-sectional false-positive headings (closes #187) #188

Merged

feat(docx-core): add derived heading object to DocumentViewNode (closes #179) #190

Merged

This was referenced May 26, 2026

epic: ship LLM-Based Quality Gate (3-phase rollout) #252

Open

feat(ci): add LLM-Based Quality Gate (Phase 1, advisory) #253

Merged

This was referenced May 27, 2026

docs(scripts): align conformance script headers with canonical-README contract #271

Merged

fix(ci): harden LLM-gate against npm config-load failures (closes #273) #274

Merged

fix(ci): pass --repo to gh pr view in workflow_dispatch path (closes #275) #276

Merged

stevenobiajulu deleted the 157-heading-detection-20260505 branch May 27, 2026 17:07

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(docx-core): harden heading detection (#157 Phase 1)#178

fix(docx-core): harden heading detection (#157 Phase 1)#178
stevenobiajulu merged 2 commits into
mainfrom
157-heading-detection-20260505

stevenobiajulu commented May 8, 2026

Uh oh!

vercel Bot commented May 8, 2026 •

edited

Loading

Uh oh!

codecov Bot commented May 8, 2026 •

edited

Loading

Uh oh!

stevenobiajulu commented May 10, 2026

Uh oh!

Uh oh!

stevenobiajulu commented May 10, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

stevenobiajulu commented May 8, 2026

Summary

Smoke results across the 367-paragraph NVCA SPA

Why Phase 1 only (not Phase 2)

Documentation

Out of scope (tracked in follow-up)

Test plan

Uh oh!

vercel Bot commented May 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

codecov Bot commented May 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

stevenobiajulu commented May 10, 2026

Uh oh!

Uh oh!

stevenobiajulu commented May 10, 2026

Post-merge smoke passed

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

vercel Bot commented May 8, 2026 •

edited

Loading

codecov Bot commented May 8, 2026 •

edited

Loading