fix(docs-sync): awk-based frontmatter extraction (#19) #20

Merged
stackbilt-admin merged 1 commit into main from fix/docs-sync-frontmatter-greedy
May 2, 2026
Conversation

@stackbilt-admin
Member

Summary

Replaces the greedy sed range pattern in scripts/docs-sync.sh with an awk single-state-machine that captures only the first ^---$ block. Closes #19.

Bug recap

sed -n '/^---$/,/^---$/p' is a range expression. Once the first range ends, sed resumes looking for the start pattern, so if the body contains additional ^---$ lines (markdown horizontal-rule separators between sections), each subsequent pair opens and closes another range. The captured "frontmatter" was then prepended to the upstream content, duplicating large parts of the body. api-reference.md upstream has 65 HR separators, so it grew from 1861 to 2906 lines on resync.
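
The restart behaviour is easy to reproduce on a small file. The sketch below is a minimal illustration only: the sample content and variable names are made up for the demo, and only the sed expression comes from this PR.

```shell
#!/usr/bin/env bash
# Hypothetical sample: a 4-line frontmatter block, then a body that
# contains two markdown horizontal rules (a second ^---$ pair).
sample=$(mktemp)
cat > "$sample" <<'EOF'
---
title: Demo
section: api
---
# Heading
intro text
---
## Section A
text a
---
## Section B
text b
EOF

# The old extractor: the range re-arms after each closing ---,
# so the HR pair in the body is captured as well.
sed_out=$(sed -n '/^---$/,/^---$/p' "$sample")
sed_lines=$(printf '%s\n' "$sed_out" | wc -l | tr -d ' ')

echo "sed captured $sed_lines lines (the real frontmatter is 4 lines)"
rm -f "$sample"
```

On this sample the sed expression returns 8 lines: the 4 frontmatter lines plus the 4 lines spanning the HR pair around "Section A".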

Fix

- existing_fm=$(sed -n '/^---$/,/^---$/p' "$existing")
+ existing_fm=$(awk '/^---$/{n++; print; if(n==2)exit} n==1 && !/^---$/{print}' "$existing")

awk increments n on each --- line, prints it, and exits after the second. While n is 1 it also prints every non-fence line, which inside the frontmatter are the key/value pairs. The state machine stops at the closing fence and never restarts.
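
A small sketch of the new extractor in isolation may help when reviewing. The awk program is the one from the diff above; the wrapper name extract_frontmatter and both sample files are hypothetical and exist only for this demo.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the awk program from the diff.
extract_frontmatter() {
  awk '/^---$/{n++; print; if(n==2)exit} n==1 && !/^---$/{print}' "$1"
}

# Sample 1: frontmatter followed by a body with an HR pair.
sample=$(mktemp)
cat > "$sample" <<'EOF'
---
title: Demo
section: api
---
# Heading
---
## Section A
---
tail
EOF
awk_out=$(extract_frontmatter "$sample")
awk_lines=$(printf '%s\n' "$awk_out" | wc -l | tr -d ' ')
echo "with frontmatter: $awk_lines lines"

# Sample 2: no frontmatter at all, but an HR pair in the body.
nofm=$(mktemp)
printf '%s\n' '# Title' '---' 'body' '---' 'more' > "$nofm"
nofm_lines=$(extract_frontmatter "$nofm" | wc -l | tr -d ' ')
echo "without frontmatter: $nofm_lines lines"

rm -f "$sample" "$nofm"
```

The first sample yields only the 4 frontmatter lines. The second shows a property worth keeping in mind: the program does not require the opening fence to be the first line of the file, so a file with no frontmatter would have its first HR pair captured instead (3 lines here). Whether that case can occur in this repo is not stated in the PR.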

Verification

Direct test on api-reference.md (which has 65 HR separators in body):

| Extractor | Captured lines |
| --- | --- |
| Old buggy sed | 1053 |
| New awk | 8 |

Files without body HR separators (mcp.md, platform.md, ecosystem.md): both extractors return 8 lines, so behavior is unchanged.
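
The same side-by-side comparison can be sketched on throwaway fixtures. The helper names count_sed and count_awk and the two fixture files are assumptions made for illustration; the real check was run against the repo's own docs.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: count the lines each extractor captures.
count_sed() { sed -n '/^---$/,/^---$/p' "$1" | wc -l | tr -d ' '; }
count_awk() {
  awk '/^---$/{n++; print; if(n==2)exit} n==1 && !/^---$/{print}' "$1" \
    | wc -l | tr -d ' '
}

workdir=$(mktemp -d)
# plain.md: frontmatter only, no HR separators in the body.
printf '%s\n' '---' 'title: Plain' '---' '# Body' 'no separators here' \
  > "$workdir/plain.md"
# rules.md: frontmatter plus one HR pair in the body.
printf '%s\n' '---' 'title: Rules' '---' '# Body' '---' 'section' '---' 'tail' \
  > "$workdir/rules.md"

plain_sed=$(count_sed "$workdir/plain.md")
plain_awk=$(count_awk "$workdir/plain.md")
rules_sed=$(count_sed "$workdir/rules.md")
rules_awk=$(count_awk "$workdir/rules.md")

printf '%-9s sed=%s awk=%s\n' plain.md "$plain_sed" "$plain_awk"
printf '%-9s sed=%s awk=%s\n' rules.md "$rules_sed" "$rules_awk"
rm -rf "$workdir"
```

Both extractors agree on plain.md (3 lines each), while on rules.md sed returns 6 lines and awk returns 3, mirroring the 1053 versus 8 result reported for api-reference.md.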

End-to-end sync run after the fix:

[10:55:01] ~ No changes: platform.md
[10:55:02] ~ No changes: mcp.md
[10:55:02] ~ No changes: api-reference.md
[10:55:03] ~ No changes: ecosystem.md
Sync complete: 0 updated, 4 unchanged, 0 failed, 0 generated

Idempotent: re-running the sync against the already-repaired files reports zero changes. Before the fix, this run would have written 1045 duplicated lines back into api-reference.md.
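
The idempotency property can be sketched outside the real script. This assumes, based on the bug recap, that a sync rebuilds each page as the existing file's frontmatter followed by the upstream content. The resync function, the fm_awk and fm_sed helpers, and the fixture files are hypothetical stand-ins, not code from docs-sync.sh.

```shell
#!/usr/bin/env bash
fm_awk() { awk '/^---$/{n++; print; if(n==2)exit} n==1 && !/^---$/{print}' "$1"; }
fm_sed() { sed -n '/^---$/,/^---$/p' "$1"; }

# Hypothetical model of one sync pass.
# usage: resync <extractor> <existing-file> <upstream-file>
resync() {
  { "$1" "$2"; cat "$3"; } > "$2.new" && mv "$2.new" "$2"
}

workdir=$(mktemp -d)
# Upstream body with one HR pair; the page is frontmatter + that body.
printf '%s\n' '# Body' '---' 'section a' '---' 'section b' > "$workdir/upstream.md"
{ printf '%s\n' '---' 'title: Demo' '---'; cat "$workdir/upstream.md"; } \
  > "$workdir/page.md"
cp "$workdir/page.md" "$workdir/original.md"
cp "$workdir/page.md" "$workdir/page_sed.md"

# Two passes with the awk extractor leave the page byte-identical.
resync fm_awk "$workdir/page.md" "$workdir/upstream.md"
resync fm_awk "$workdir/page.md" "$workdir/upstream.md"
if cmp -s "$workdir/page.md" "$workdir/original.md"; then
  awk_stable=yes
else
  awk_stable=no
fi

# A single pass with the sed extractor already grows the page.
resync fm_sed "$workdir/page_sed.md" "$workdir/upstream.md"
orig_lines=$(wc -l < "$workdir/original.md" | tr -d ' ')
sed_lines=$(wc -l < "$workdir/page_sed.md" | tr -d ' ')

echo "awk stable: $awk_stable; sed pass: $orig_lines -> $sed_lines lines"
rm -rf "$workdir"
```

Under this model the awk-based pass is a fixed point, while one sed-based pass takes the 8-line page to 11 lines, the small-scale analogue of 1861 growing to 2906.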

Test plan

  • Unit-style: awk '...' file | wc -l on api-reference.md returns 8 (was 1053 with sed)
  • Integration: full --sync-only --source stackbilt-web reports 0 updated / 4 unchanged
  • Pre-commit hook: astro build passes (9 pages built)
  • Future: when an upstream file is actually modified, the sync writes ~1862 lines (frontmatter + new body), not ~2900 — confirm on next real upstream change

Note

This is a stopgap fix. A discussion is under way to migrate the docs site to read from the AEGIS wiki as the source of truth (SoT) instead of from N product repos via gh-API pull, which would retire docs-sync.sh entirely. Landing this fix now keeps the current sync stable for the weeks of overlap during that migration.

🤖 Generated with Claude Code

The sed range pattern '/^---$/,/^---$/p' restarts after termination,
capturing every '^---$' pair in the file. Files with markdown HR
separators in the body (api-reference.md upstream has 65) had their
content doubled on each sync, then re-doubled on the next.

Replaced with an awk single-state-machine: increments n on each ---,
prints the line, exits at the second ---; in between, prints body
lines. Idempotent for files with body HR separators; behavior
unchanged for files without (mcp.md, platform.md, etc.).

Verified by re-running the sync after the fix:
  Sync complete: 0 updated, 4 unchanged, 0 failed
api-reference.md stays at 1861 lines (was doubling to 2906 with the
sed pattern). Closes #19.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@stackbilt-admin stackbilt-admin merged commit 2863efd into main May 2, 2026
1 check passed
@stackbilt-admin stackbilt-admin deleted the fix/docs-sync-frontmatter-greedy branch May 2, 2026 16:07
Development

Successfully merging this pull request may close these issues.

bug(docs-sync): greedy frontmatter extraction doubles content for files with markdown HR separators
