fix(docs-sync): awk-based frontmatter extraction (#19) #20
Merged
The sed range pattern '/^---$/,/^---$/p' restarts after termination, capturing every '^---$' pair in the file. Files with markdown HR separators in the body (api-reference.md upstream has 65) had their content doubled on each sync, then re-doubled on the next.

Replaced with an awk single-state-machine: increments n on each ---, prints the line, exits at the second ---; in between, prints body lines. Idempotent for files with body HR separators; behavior unchanged for files without (mcp.md, platform.md, etc.).

Verified by re-running the sync after the fix:

    Sync complete: 0 updated, 4 unchanged, 0 failed

api-reference.md stays at 1861 lines (was doubling to 2906 with the sed pattern).

Closes #19.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Replaces the repeating sed range pattern in `scripts/docs-sync.sh` with an awk single-state-machine that captures only the first `^---$` block. Closes #19.

Bug recap
`sed -n '/^---$/,/^---$/p'` is a range expression. After the first range terminates, sed goes back to looking for the start pattern; if the body contains additional `^---$` lines (markdown horizontal-rule separators between sections), each pair captures another range. The captured "frontmatter" was prepended to the upstream content, duplicating swaths of the body. `api-reference.md` upstream has 65 HR separators, so it grew from 1861 to 2906 lines on resync.

Fix
awk increments `n` on each `---`, prints the line, and exits after the second; in between, it prints body lines (which inside the frontmatter are key/value pairs). The state machine terminates at the closing fence and doesn't restart.

Verification
Direct test on `api-reference.md` (which has 65 HR separators in body): the awk extractor returns 8 lines, where the sed pattern returned 1053.

Files without body HR separators (mcp.md, platform.md, ecosystem.md): both extractors return 8 lines, so there is no behavior change.
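The difference between the two extractors can be reproduced on a small synthetic file. The sketch below is illustrative only: the function names `extract_sed` and `extract_awk`, the sample file, and the exact awk program are assumptions, since the actual one-liner in `scripts/docs-sync.sh` is not quoted in this description.

```shell
# Sketch only: names and sample content are hypothetical, not taken from
# scripts/docs-sync.sh.

# Old approach: sed range expression. Once a range closes, sed looks for the
# start pattern again, so later '---' lines in the body open new ranges.
extract_sed() {
  sed -n '/^---$/,/^---$/p' "$1"
}

# Assumed form of the new approach: count '---' fences in n, print the fences
# and everything between the first two, then exit at the second fence.
extract_awk() {
  awk '/^---$/ { n++; print; if (n == 2) exit; next } n == 1 { print }' "$1"
}

# Synthetic document: 3 lines of frontmatter, then a body with two HR lines.
sample=$(mktemp)
cat > "$sample" <<'EOF'
---
title: Example
---
Intro paragraph.
---
Section one.
---
Section two.
EOF

# tr strips the padding some wc implementations add around the count.
sed_lines=$(extract_sed "$sample" | wc -l | tr -d ' ')
awk_lines=$(extract_awk "$sample" | wc -l | tr -d ' ')

echo "sed range: $sed_lines lines"
echo "awk fence: $awk_lines lines"

rm -f "$sample"
```

On this sample the sed range prints 6 lines (the 3-line frontmatter plus the `---` pair in the body and the line between them), while the awk version prints only the 3 frontmatter lines.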
End-to-end sync run after the fix: `Sync complete: 0 updated, 4 unchanged, 0 failed`.

Idempotent: re-running the sync against the existing repaired files reports zero changes. Pre-fix, this would have written 1045 duplicated lines back into `api-reference.md` again.

Test plan
- `awk '...' file | wc -l` on api-reference.md returns 8 (was 1053 with sed)
- `--sync-only --source stackbilt-web` reports 0 updated / 4 unchanged
- `astro build` passes (9 pages built)

Note
This is a stopgap fix. A discussion is in flight to migrate the docs site to read from the AEGIS wiki as the source of truth (SoT) instead of from N product repos via gh-API pull, which would retire `docs-sync.sh` entirely. Landing this fix now keeps the current sync stable for the weeks of overlap during that migration.

🤖 Generated with Claude Code