Summary
scripts/docs-sync.sh line 241 extracts existing frontmatter with:
existing_fm=$(sed -n '/^---$/,/^---$/p' "$existing")
This sed pattern is a range match. After the first range terminates, sed restarts looking for the start pattern; if the body contains additional ^---$ lines (used as markdown horizontal-rule section separators), each pair captures another range. The captured "frontmatter" is then prepended to the upstream content — duplicating swaths of body each time the script runs against a file that uses HR separators.
Side effect: each successive sync makes the file worse, because the pollution is written back to the local file and gets re-captured on the next run.
Reproduction (just observed today)
Stackbilt-dev/stackbilt-web/docs/api-reference.md upstream has 65 ^---$ lines (HR separators between API sections). After re-syncing on 2026-05-02:
| File | Upstream | Expected (upstream + ~9-line frontmatter) | Actual after sync |
|---|---|---|---|
| mcp.md | 108 lines, 0 HR | 117 | 117 ✓ |
| ecosystem.md | 190 lines, 0 HR | 199 | 199 ✓ |
| platform.md | 110 lines, 0 HR | 119 | 119 ✓ |
| api-reference.md | 1852 lines, 65 HR | ~1862 | 2906 ⚠️ doubled |
Actual ^---$ count in the synced api-reference.md: 132 (2 fences from the 8-line frontmatter + 65 × 2 from the doubled body). Confirmed two # Stackbilder Platform API Reference headings in the file, at lines 1 (within the prepended fragment) and 1055 (start of the upstream body proper).
Root cause
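sed -n '/A/,/B/p' prints every range from A to B, not just the first: once a range closes, sed starts looking for A again. With a single A and a single B, that's one range. With 2 + 2N ^---$ lines (1 frontmatter open + 1 close + N HR pairs), sed prints 1 + N ranges, and the script captures every one of them as existing_fm. If the HR count is odd, as with the 65 in api-reference.md, the last HR opens a range that never closes, so sed prints from there to the end of the file.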
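The behaviour is easy to see on a tiny file. The following is a minimal sketch using a throwaway fixture (the file contents are invented for illustration, not taken from the docs repo); only the sed expression is the one from the script.

```shell
# Throwaway fixture: 2 frontmatter fences plus 3 HR separators in the body.
fixture=$(mktemp)
printf '%s\n' '---' 'title: Demo' '---' 'intro' \
  '---' 'section A' '---' 'section B' '---' 'section C' > "$fixture"

# Same range expression as the script: the range re-arms after each close.
captured=$(sed -n '/^---$/,/^---$/p' "$fixture")
printf '%s\n' "$captured"

rm -f "$fixture"
```

On this fixture only "intro" and "section B" are left out of the capture: "section A" sits inside the second range, and "section C" is swept up by the third range, which never closes.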
The fix needs to capture exactly the first matched range, then stop. Three options:
Option 1: awk (cleanest)
existing_fm=$(awk '/^---$/{if(++n==2){print;exit}} n>=1' "$existing")
Prints the first --- and every line after it; on the second --- it prints that line and exits. It stops at the closing fence regardless of what follows.
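Option 2: sed with quit
existing_fm=$(sed -n '1{/^---$/!q;}; p; 1!{/^---$/q;}' "$existing")
Less readable, and it assumes the opening fence is on line 1: if line 1 is not ---, it quits without printing anything. Otherwise it prints each line and quits at the next ---.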
Option 3: awk with an explicit counter
existing_fm=$(awk '/^---$/{n++} n<=2 {print} n==2 && /^---$/ {exit}' "$existing")
Also explicit. Unlike Option 1, it also prints any lines that precede the opening fence.
I'd recommend Option 1 — it's the shortest, most readable, and uses awk's natural state machine.
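To sanity-check Option 1 before wiring it into the script, it can be wrapped in a small function and run against a fixture that has HR separators in the body. This is only a sketch: the function name extract_frontmatter and the fixture contents are hypothetical, not taken from docs-sync.sh.

```shell
# Hypothetical helper (the name is illustrative, not from docs-sync.sh).
extract_frontmatter() {
  # Print from the first '---' line through the second one, then stop reading.
  awk '/^---$/ { if (++n == 2) { print; exit } } n >= 1' "$1"
}

# Throwaway fixture: frontmatter followed by a body that uses HR separators.
fixture=$(mktemp)
printf '%s\n' '---' 'title: Demo' '---' 'intro' \
  '---' 'section A' '---' 'section B' > "$fixture"

existing_fm=$(extract_frontmatter "$fixture")
printf '%s\n' "$existing_fm"

rm -f "$fixture"
```

The capture ends at the closing fence, so the HR separators further down never reach existing_fm.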
Workaround
Currently: only sync files whose body doesn't contain ^---$ HR separators. Or repair the polluted output by hand:
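# Write to a temp file first: redirecting straight into "$existing_local" would truncate it before head reads it.
{ head -8 "$existing_local"; echo; cat "$upstream_canonical"; } > "$existing_local.tmp" && mv "$existing_local.tmp" "$existing_local"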
Worked around manually for api-reference.md on 2026-05-02 (commit 2bcd7cc in this repo): the api-reference.md change was reverted to the previous-sync state, since that state was already current; only the cleaner files (mcp.md, ecosystem.md, platform.md) were committed.
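Until the fix lands, at-risk files can be flagged mechanically by counting fence lines. A minimal sketch, assuming every synced file starts with a normal two-fence frontmatter block; the helper name has_body_hr and the fixtures are hypothetical.

```shell
# Hypothetical helper (not part of docs-sync.sh): succeeds when a file has
# more '---' lines than the two frontmatter fences, i.e. the body uses HRs.
has_body_hr() {
  # grep -c exits non-zero on zero matches, so guard it for 'set -e' callers.
  fence_count=$(grep -c '^---$' "$1" || true)
  [ "${fence_count:-0}" -gt 2 ]
}

# Throwaway fixtures: one clean file, one with an HR in the body.
clean=$(mktemp)
risky=$(mktemp)
printf '%s\n' '---' 'title: Clean' '---' 'body' > "$clean"
printf '%s\n' '---' 'title: Risky' '---' 'body' '---' 'more body' > "$risky"

for f in "$clean" "$risky"; do
  if has_body_hr "$f"; then
    echo "skip: body contains HR separators"
  else
    echo "ok to sync"
  fi
done
```

The same loop could be pointed at the local docs directory to list which files need the manual repair instead of a sync.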
Severity
- Doubles content silently, and is easy to miss in a --dry-run since dry-run still calls the same code path
- Each subsequent sync compounds the problem: pollution is written back, then re-captured on the next run
- Caught here only because api-reference.md jumped from 1861 → 2906 lines and triggered curiosity
- Files without HR separators in the body are unaffected, which explains why mcp.md / ecosystem.md / platform.md synced cleanly
Test plan after fix
- Run ./scripts/docs-sync.sh --dry-run --source stackbilt-web; expected: 0 files marked for update if local matches upstream
- Touch a single line in stackbilt-web/docs/api-reference.md upstream; re-run the sync
- Verify the resulting src/content/docs/api-reference.md is ~1860 lines (frontmatter + upstream body), not ~2900
- Re-run the sync immediately — should report 0 changes (idempotent), not double again
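The idempotency step can be automated with a small wrapper that runs a command twice and compares checksums of the target file. This is a sketch only: check_idempotent and fake_sync are hypothetical names, and the demo uses a stand-in for the real sync so that it runs anywhere.

```shell
# Hypothetical helper: run a command twice and succeed only if the target
# file is byte-identical after the first and the second run.
check_idempotent() {
  ci_target=$1
  shift
  "$@" || return 2
  ci_first=$(cksum < "$ci_target")
  "$@" || return 2
  ci_second=$(cksum < "$ci_target")
  [ "$ci_first" = "$ci_second" ]
}

# Self-contained demo: a stand-in "sync" that always writes the same content.
demo=$(mktemp)
fake_sync() { printf '%s\n' '---' 'title: Demo' '---' 'body' > "$demo"; }

if check_idempotent "$demo" fake_sync; then
  echo "idempotent"
else
  echo "NOT idempotent"
fi
```

Against the real script the call would look something like check_idempotent src/content/docs/api-reference.md ./scripts/docs-sync.sh --source stackbilt-web, assuming that running without --dry-run performs the actual sync.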
Related
- Surfaced 2026-05-02 during the post-consolidation docs-site update (8952ffb + 2bcd7cc).
🤖 Generated with Claude Code