
[docs] sanitize JSX attribute quotes in auto-translated MDX #247

Merged: NiveditJain merged 2 commits into main from luv-247 on Apr 30, 2026

NiveditJain (Member) commented Apr 30, 2026

Summary

  • The CI / docs / Validate docs job on main is red after PR #246 ([auto] update translations; docs: update translations for changed English sources) because the German translator emitted <Tab title="Tab „Richtlinien""> and <Tab title="Tab „Aktivität""> in docs/de/dashboard.mdx. The opening „ (U+201E) is fine, but the LLM closed the typographic pair with an ASCII " (U+0022) instead of the proper “ (U+201C). That ASCII " terminates the JSX attribute, leaving the real attribute close as a stray character before >, which trips mintlify validate with Unexpected character ".
  • This is the second time this exact regression has hit main: PR #229 ([docs] fix mintlify parse error in de/dashboard.mdx) fixed it once by hand-editing the file, but the next auto-translation cycle regenerated the same broken markup. User directive: "this is failing a lot... lets fix this for once all".
  • Durable fix in three layers, plus an immediate file fix to unblock CI today:
    • scripts/translate-docs/mdx-translator.ts: new sanitizeJsxAttributes(content) strips a stray trailing ASCII " after a JSX attribute close, and drops only the surplus unmatched typographic opening quotes („, «, ‹, 「, 『) inside the same value, scanning from the right so a matched pair earlier in the same value is preserved (e.g. „Foo“ und „Bar → „Foo“ und Bar). Wired into translateMdxPage ahead of rewriteInternalLinks so every translated page is sanitized before it is written.
    • scripts/translate-docs/translator.ts: rule #2 of the system prompt now explicitly forbids ASCII " inside JSX attribute values, so the LLM is less likely to produce the pattern in the first place. The rule is cached as part of the system-prompt prefix.
    • __tests__/scripts/translate-docs/mdx-translator.test.ts: 9 new tests covering the exact de/dashboard.mdx:65 failure, self-close, multi-attribute, matched typographic pairs, empty values, multiple malformed attributes on one line, and the mixed matched-pair + stray-opener case CodeRabbit flagged.
    • docs/de/dashboard.mdx: strip the inner German quotes from the two <Tab title> attributes (mirrors #229) so the failing docs / Validate docs check passes immediately on this PR rather than waiting for the next cache-invalidating translation run.
  • Why English curly “…” is intentionally not in the openings list: U+201C is both the German closer and the English curly opener, so processing the English pair after the German pair would strip the German closer. The remaining pairs (German, French ×2, Japanese ×2) all have unambiguous openers.
  • CHANGELOG entry added under ## Unreleased → ### Fixes.
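A minimal TypeScript sketch of the sanitizer described above. It is not the code in scripts/translate-docs/mdx-translator.ts: the MALFORMED_ATTR regex, the countOf helper, and the shape of the OPENINGS list are assumptions for illustration. It looks for an attribute value followed immediately by a second ASCII " (the stray close), re-closes the value once, and then drops only the surplus openers for each unambiguous typographic pair, scanning from the right.

```typescript
// Assumed [opener, closer] pairs; English “…” is deliberately left out (see above).
const OPENINGS: ReadonlyArray<readonly [string, string]> = [
  ["\u201E", "\u201C"], // German „ … “
  ["\u00AB", "\u00BB"], // French « … »
  ["\u2039", "\u203A"], // French ‹ … ›
  ["\u300C", "\u300D"], // Japanese 「 … 」
  ["\u300E", "\u300F"], // Japanese 『 … 』
];

// Hypothetical pattern: ` name="value""` followed by whitespace, `/`, or `>`.
const MALFORMED_ATTR = /(\s[A-Za-z_][\w.:-]*=")([^"<>\n]*)""(?=[\s/>])/g;

function countOf(s: string, ch: string): number {
  return s.split(ch).length - 1;
}

// Drop only `opens - closes` openers per pair, right-most first,
// so a matched pair earlier in the value survives.
function dropSurplusOpeners(value: string): string {
  let cleaned = value;
  for (const [open, close] of OPENINGS) {
    let surplus = countOf(cleaned, open) - countOf(cleaned, close);
    while (surplus > 0) {
      const idx = cleaned.lastIndexOf(open); // always >= 0 while surplus > 0
      cleaned = cleaned.slice(0, idx) + cleaned.slice(idx + open.length);
      surplus--;
    }
  }
  return cleaned;
}

// Re-close each malformed attribute with a single `"` and clean its value.
function sanitizeJsxAttributes(content: string): string {
  return content.replace(
    MALFORMED_ATTR,
    (_match: string, prefix: string, value: string) =>
      `${prefix}${dropSurplusOpeners(value)}"`,
  );
}
```

On the failing line, `<Tab title="Tab „Richtlinien"">` becomes `<Tab title="Tab Richtlinien">`, the same result as the hand fix in docs/de/dashboard.mdx. Well-formed attributes, including empty ones like `title=""`, never match the regex and pass through untouched.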

Fixes failing run: https://github.com/exospherehost/failproofai/actions/runs/25147733926

CodeRabbit follow-up (commit 897ee50)

CodeRabbit raised two issues on the first commit and both are addressed:

  1. Don't strip every opener when counts are imbalanced. The first version did cleaned.split(open).join(""), which removed all occurrences of an opener whenever opens > closes, breaking matched pairs. Fixed: now drops only surplus = opens - closes openers, scanning from the right with lastIndexOf so the leftmost matched pair is preserved.
  2. Add a mixed-regression test. Added "drops only the surplus opener when a matched pair is also present", covering <Tab title="„Foo“ und „Bar""> → <Tab title="„Foo“ und Bar">.
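The three removal strategies that come up in this thread behave differently on the mixed value „Foo“ und „Bar. A minimal sketch, assuming a single German pair (U+201E / U+201C) and a hypothetical countOf helper: dropAllOpeners mirrors the first commit's split/join, dropLeftmostSurplus mirrors the replace(open, "")-in-a-loop option CodeRabbit mentioned, and dropRightmostSurplus mirrors the lastIndexOf fix.

```typescript
const OPEN = "\u201E"; // „ German opener
const CLOSE = "\u201C"; // “ German closer

function countOf(s: string, ch: string): number {
  return s.split(ch).length - 1;
}

// First commit: once opens > closes, every opener is removed.
function dropAllOpeners(value: string): string {
  return countOf(value, OPEN) > countOf(value, CLOSE)
    ? value.split(OPEN).join("")
    : value;
}

// String.prototype.replace with a string pattern removes only the first
// occurrence, so this strips the left-most surplus openers.
function dropLeftmostSurplus(value: string): string {
  let out = value;
  for (let n = countOf(out, OPEN) - countOf(out, CLOSE); n > 0; n--) {
    out = out.replace(OPEN, "");
  }
  return out;
}

// Follow-up commit: strip only the surplus, right-most first.
function dropRightmostSurplus(value: string): string {
  let out = value;
  for (let n = countOf(out, OPEN) - countOf(out, CLOSE); n > 0; n--) {
    const i = out.lastIndexOf(OPEN);
    out = out.slice(0, i) + out.slice(i + OPEN.length);
  }
  return out;
}
```

Only the right-to-left version keeps the matched pair: dropAllOpeners yields Foo“ und Bar, dropLeftmostSurplus yields Foo“ und „Bar (still unbalanced), and dropRightmostSurplus yields „Foo“ und Bar.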

Test plan

  • bun run test:run __tests__/scripts/translate-docs/mdx-translator.test.ts — 18 tests pass (9 existing + 9 new sanitizer cases including the CodeRabbit mixed case).
  • bun run test:run — full unit suite stays green (1177 passed). The [failproofai:hook] WARN policy "thrower" threw: … lines in the output are intentional fixture coverage from __tests__/hooks/policy-evaluator.test.ts:146-161 (testing the evaluator's fail-open behavior), not regressions from this change.
  • bun run lint — only pre-existing <img> warning in app/components/log-viewer/tool-input-output.tsx.
  • bunx tsc --noEmit — clean.
  • Spot-check: grep -n "Tab title" docs/de/dashboard.mdx shows Tab Richtlinien / Tab Aktivität (no stray quotes).
  • Sanitizer dry-run on the original broken content reproduces the corrected output.
  • First commit's CI: 8 checks green (quality, build, docs, test (default), test (log-debug, debug), test (hook-log-file, 1), test-e2e, CodeRabbit); 1 skipped (Mintlify Deployment).
  • Run gh run watch until the second commit's docs / Validate docs job goes green too.

🤖 Generated with Claude Code

The German translator periodically emits `<Tab title="Tab „Richtlinien"">`
where it intends `„…“` typographic quotes but uses ASCII `"` for the
closing — the inner straight `"` terminates the JSX attribute and the
real attribute close becomes a stray `"` before `>`, which trips
`mintlify validate` with `Unexpected character "`.

PR #229 fixed this once by hand on `docs/de/dashboard.mdx`. The next
auto-translation run regenerated the same broken markup, so the same
parse error landed on `main` again after #246.

Make it stick:

- `scripts/translate-docs/mdx-translator.ts` adds `sanitizeJsxAttributes`,
  which strips stray trailing ASCII `"` after a JSX attribute close and
  drops unmatched typographic opening quotes (`„`, `“`, `«`, `‹`, `「`,
  `『`) inside the same value. Matched pairs (e.g. `「ポリシー」`) are
  preserved. Wired into `translateMdxPage` ahead of `rewriteInternalLinks`.
- `scripts/translate-docs/translator.ts` extends rule #2 of the system
  prompt to forbid ASCII `"` inside JSX attribute values entirely, so
  the LLM is less likely to produce the pattern in the first place.
- `__tests__/scripts/translate-docs/mdx-translator.test.ts` covers the
  exact `de/dashboard.mdx` failure plus self-close, multi-attribute,
  matched typographic pairs, empty-value, and multiple-on-one-line cases.
- `docs/de/dashboard.mdx` drops the inner German quotes from the two
  `<Tab title>` attributes (mirrors #229) so CI on `main` goes green
  immediately rather than waiting for the next translation cycle.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
coderabbitai Bot commented Apr 30, 2026

📝 Walkthrough


This PR fixes MDX translation validation failures by introducing a sanitizeJsxAttributes post-processor that removes malformed ASCII quotes from JSX attribute values, updating the translator system prompt to prohibit ASCII quotes within attribute values, and applying the fix to the affected German dashboard MDX file.

Changes

| Cohort | File(s) | Summary |
| --- | --- | --- |
| Changelog and Documentation | `CHANGELOG.md`, `docs/de/dashboard.mdx` | Documentation of the fix for the MDX validation failure and removal of German quotation marks from tab title labels in the German dashboard. |
| Translation Pipeline Core | `scripts/translate-docs/mdx-translator.ts` | Added sanitizeJsxAttributes function to detect and remove stray ASCII/typographic quotes following JSX attribute closing quotes, integrated as a post-processing step before internal-link rewriting. |
| Translator System Prompt | `scripts/translate-docs/translator.ts` | Enhanced MDX-preservation rules to explicitly forbid ASCII double quotes within JSX attribute values and instruct omission of language-native quotation marks within attributes. |
| Test Coverage | `__tests__/scripts/translate-docs/mdx-translator.test.ts` | Comprehensive test suite for sanitizeJsxAttributes covering malformed trailing quotes, matched/unmatched typographic quote pairs, empty attributes, and multiple corrupted attributes on the same line. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes


Poem

🐰 A rabbit hops through JSX trails,
Where stray quotes break and parsing fails,
With a sanitizing spell so tight,
We've banished those ASCII-marks from sight,
Translation's cleaner, attributes bright! ✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately and concisely summarizes the main change: sanitizing JSX attribute quotes in auto-translated MDX documents to fix a recurring docs validation failure. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description check | ✅ Passed | The PR description is comprehensive and well-structured, covering problem statement, root cause analysis, durable fix strategy across multiple layers, immediate fix, test plan with verification results, and rationale for design choices. |


coderabbitai Bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@__tests__/scripts/translate-docs/mdx-translator.test.ts`:
- Around line 101-112: Add a new unit test that exercises sanitizeJsxAttributes
with an attribute value containing one correctly matched typographic quote pair
plus an extra stray opening typographic quote (e.g., a Japanese or German opener
alongside an ASCII or different closer) so the sanitizer must preserve the
matched pair and remove the stray opener; call sanitizeJsxAttributes with that
mixed input (use the same <Tab title="..."> pattern as existing tests) and
assert the returned string keeps the balanced pair intact while stripping only
the unmatched opener (i.e., expected string equals the attribute with the
matched quote pair preserved and the stray opener removed).

In `@scripts/translate-docs/mdx-translator.ts`:
- Around line 47-52: The loop that currently removes all occurrences of an
opener when opens > closes (using cleaned = cleaned.split(open).join("")) should
instead remove only the surplus unmatched opener(s); compute surplus = opens -
closes and remove that many occurrences (e.g., run a loop that calls cleaned =
cleaned.replace(open, "") surplus times or remove the last/first N instances
based on desired behavior), keeping matched pairs intact; update the block that
iterates over openings/open/close and use the variables opens, closes, cleaned
and prefix to perform limited removals (or replace this pass with a proper
JSX/MDX parse-based transform if you prefer).

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3da2ab8f-3545-4039-8c5a-d8aee4da55d1

📥 Commits

Reviewing files that changed from the base of the PR and between c13d811 and 97ba9a4.

📒 Files selected for processing (5)
  • CHANGELOG.md
  • __tests__/scripts/translate-docs/mdx-translator.test.ts
  • docs/de/dashboard.mdx
  • scripts/translate-docs/mdx-translator.ts
  • scripts/translate-docs/translator.ts

Two findings on PR #247:

1. mdx-translator.ts:50 — `cleaned.split(open).join("")` removed *every*
   occurrence of an opener when `opens > closes`, so a value containing
   one matched typographic pair plus one stray opener (e.g.
   `„Foo“ und „Bar`) lost the matched pair too. Fix: drop only the
   surplus = opens - closes openers, scanning from the right with
   `lastIndexOf` so the leftmost matched pair is preserved.

2. mdx-translator.test.ts — add a regression test for that mixed case
   (one matched „…“ pair + one dangling „) so the bug above can't
   recur.

Also drop the English curly “…” pair from the openings list. U+201C
is both the German closer and the English-curly opener, so processing
the English pair after the German pair would strip the very German
closer we just preserved. The remaining pairs (German, French ×2,
Japanese ×2) all have unambiguous openers.
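A small demonstration of why the English pair was dropped, assuming the same count-then-strip-from-the-right pass is applied to each pair in list order. GERMAN, ENGLISH, dropSurplus, and sanitizeValue are illustrative names, not identifiers from the repo.

```typescript
type Pair = readonly [open: string, close: string];

const GERMAN: Pair = ["\u201E", "\u201C"]; // „ … “
const ENGLISH: Pair = ["\u201C", "\u201D"]; // “ … ” (opener == German closer)

function countOf(s: string, ch: string): number {
  return s.split(ch).length - 1;
}

// Strip only the surplus openers for one pair, right-most first.
function dropSurplus(value: string, [open, close]: Pair): string {
  let out = value;
  for (let n = countOf(out, open) - countOf(out, close); n > 0; n--) {
    const i = out.lastIndexOf(open);
    out = out.slice(0, i) + out.slice(i + open.length);
  }
  return out;
}

// Apply each pair in order, as an openings list would.
function sanitizeValue(value: string, pairs: readonly Pair[]): string {
  let out = value;
  for (const pair of pairs) out = dropSurplus(out, pair);
  return out;
}
```

With only the German pair, a balanced „Foo“ is left alone. Adding the English pair after it sees one “ with no matching ”, treats it as a stray opener, and strips the German closer, leaving „Foo.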

1177 unit tests pass (was 1176 — the new mixed-case test is the +1).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
NiveditJain merged commit 90856dc into main on Apr 30, 2026
9 checks passed
NiveditJain added a commit that referenced this pull request May 5, 2026
…ion-bumps policy (#285)

* [luv-cut-0.0.10-beta.0] chore: cut 0.0.10-beta.0 release

Bumps package.json from 0.0.9-beta.3 to 0.0.10-beta.0 and rolls the
## Unreleased changelog section into ## 0.0.10-beta.0 — 2026-05-04.

Why 0.0.10-beta.0 and not 0.0.9-beta.3:
0.0.9 is already published as `latest` on npm. Per semver,
0.0.9-beta.3 < 0.0.9 — publishing it would point the `beta` dist-tag
at a version semver-older than the released 0.0.9, while shipping
*more* features than 0.0.9 ever had. The next pre-release after a
shipped 0.0.9 must live in the 0.0.10 line.
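The ordering argument follows SemVer 2.0.0 precedence (section 11): a pre-release sorts below the release with the same major.minor.patch, so every 0.0.9-beta.N < 0.0.9 < 0.0.10-beta.0. Below is a minimal comparator sketch under those rules. It ignores build metadata and leading-zero validation, and npm itself relies on the semver package rather than anything like this.

```typescript
interface Version {
  major: number;
  minor: number;
  patch: number;
  pre: string[]; // dot-separated pre-release identifiers, e.g. ["beta", "3"]
}

function parseSemver(v: string): Version {
  const m = /^(\d+)\.(\d+)\.(\d+)(?:-([0-9A-Za-z.-]+))?$/.exec(v);
  if (m === null) throw new Error(`not a semver string: ${v}`);
  return {
    major: Number(m[1]),
    minor: Number(m[2]),
    patch: Number(m[3]),
    pre: m[4] ? m[4].split(".") : [],
  };
}

// Numeric identifiers compare numerically and sort before alphanumeric ones.
function compareIdent(a: string, b: string): number {
  const an = /^\d+$/.test(a);
  const bn = /^\d+$/.test(b);
  if (an && bn) return Math.sign(Number(a) - Number(b));
  if (an) return -1;
  if (bn) return 1;
  return a < b ? -1 : a > b ? 1 : 0;
}

// Returns -1, 0, or 1.
function compareSemver(x: string, y: string): number {
  const a = parseSemver(x);
  const b = parseSemver(y);
  const core = a.major - b.major || a.minor - b.minor || a.patch - b.patch;
  if (core !== 0) return Math.sign(core);
  // A release (no pre-release part) outranks any pre-release of the same core.
  if (a.pre.length === 0 || b.pre.length === 0) {
    return Math.sign(b.pre.length - a.pre.length);
  }
  for (let i = 0; i < Math.min(a.pre.length, b.pre.length); i++) {
    const c = compareIdent(a.pre[i], b.pre[i]);
    if (c !== 0) return c;
  }
  return Math.sign(a.pre.length - b.pre.length);
}
```

For example, compareSemver("0.0.9-beta.3", "0.0.9") is -1, which is exactly why publishing 0.0.9-beta.3 after 0.0.9 would move the beta dist-tag backwards.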

Why the version had drifted to 0.0.13-beta.1 before #284 reset it:
PRs #266 (OpenCode) and #267 (Pi) each speculatively bumped
package.json in their feature branches even though no release was
being cut. When unified into #270, the bumps stacked
(0.0.10-beta.1 → .2 → 0.0.11-beta.1 → 0.0.12-beta.1 → 0.0.13-beta.1).
Going forward, feature PRs should leave package.json alone — only
release-cut PRs touch the version.

Adds since v0.0.9:

Features:
- Add Gemini CLI integration (beta) (#277)
- Add OpenCode (sst/opencode) integration (beta) (#270)
- Add Pi (pi-coding-agent) integration (beta) (#270)
- Add GitHub Copilot CLI integration (beta) (#236)
- Add Cursor Agent CLI integration (beta) (#245)
- Project page lists Copilot and Cursor sessions (#245)

Fixes:
- Pi integration: cache sessionId in shim (#284)
- Cursor integration: support cursor-agent 2026-04+ layout (#283)
- block-read-outside-cwd: deny message for all 6 CLIs (#270)
- require-ci-green-before-stop: scope to current HEAD (#266)
- failproofai policies --uninstall: correct selector wording (#236)
- README: replace broken Copilot and Cursor logos (#236, #257)
- Auto-translated MDX: sanitize JSX attribute quotes (#247)

Docs:
- README: drop "more coming soon" tagline (#281)
- README: add Gemini, Pi, Cursor to supported-CLIs list (#277, #264, #245)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* chore: add block-version-bumps custom policy

Prevents the kind of drift that caused this very release. PRs #266
(OpenCode) and #267 (Pi) each speculatively bumped package.json in
their feature branches, and when unified into #270 the bumps stacked
all the way to 0.0.13-beta.1. PR #284 then over-corrected to
0.0.9-beta.3 — older than the already-published 0.0.9.

The policy lives at .failproofai/policies/block-version-bumps.mjs
(auto-loaded by failproofai's project-scope hooks). It blocks:
- Edit/Write/MultiEdit on package.json that touches the "version" key
- Bash:  npm|yarn|pnpm|bun (pm) version <args>
- Bash:  sed|awk|jq mutating package.json referencing "version"

Allowed when on a `luv-cut-*` branch — the established release-cut
branch convention. Branch detection is a best-effort `git rev-parse`
that fails open (returns false) so a missing/unusable git tree never
blocks a legitimate edit.
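A minimal sketch of the package-manager check and the fail-open branch detection, under loud assumptions: PM_VERSION_RE, bashBumpsVersion, gitBranch, and shouldEnforce are hypothetical names, not the contents of .failproofai/policies/block-version-bumps.mjs, and "fails open" is read here as "do not enforce when git cannot report a branch". The branch getter is injectable so the fail-open path can be exercised without a git checkout. The luv-cut- prefix test mirrors this commit's loose match, which the CodeRabbit follow-up in the same squash later tightens.

```typescript
import { execFileSync } from "node:child_process";

// Assumed pattern for `npm version`, `yarn version`, `pnpm version`, `bun pm version`.
const PM_VERSION_RE = /\b(?:npm|yarn|pnpm|bun(?:\s+pm)?)\s+version\b/;

function bashBumpsVersion(command: string): boolean {
  return PM_VERSION_RE.test(command);
}

// Best-effort branch lookup; null when git is missing or not in a repo.
function gitBranch(): string | null {
  try {
    return execFileSync("git", ["rev-parse", "--abbrev-ref", "HEAD"], {
      encoding: "utf8",
      stdio: ["ignore", "pipe", "ignore"],
    }).trim();
  } catch {
    return null;
  }
}

// Fail open: if the branch is unknown, return false (do not block).
function shouldEnforce(getBranch: () => string | null = gitBranch): boolean {
  const branch = getBranch();
  if (branch === null) return false;
  return !branch.startsWith("luv-cut-");
}
```

A version-bumping command would then be blocked only when bashBumpsVersion(command) && shouldEnforce() both hold.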

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* chore: address CodeRabbit review on block-version-bumps

Three valid findings, all fixed:

1. sed/awk/jq detection (line 26): regex required `package.json`
   to appear before `version`, missing forms like
   `jq '.version="x"' package.json`. Switched to two non-consuming
   lookaheads so either ordering matches within a shell segment.

2. Value-only Edit/MultiEdit bypass (lines 74-84): an agent could
   issue `Edit { old_string: '"0.0.9-beta.3"', new_string:
   '"0.0.10-beta.0"' }` — neither string contains the literal
   `"version"` key, so the previous check let it through. Added
   STANDALONE_SEMVER_VALUE_RE plus an editTouchesVersion() helper
   that catches a value-only swap when both sides are bare
   semver-quoted strings that differ. The anchors (^ / $) and
   leading-digit requirement intentionally exclude
   range-prefixed dep entries (`"^1.2.3"`) and key-prefixed ones
   (`"react": "18.2.0"`), so dep-version Edits aren't false-positive.

3. Loose cut-branch match (line 36): `^luv-cut-/` allowed any
   suffix (e.g. `luv-cut-feature`). Tightened to require a
   semver-shape suffix:
   `^luv-cut-\d+\.\d+\.\d+(?:-[0-9A-Za-z.-]+)?$`.
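The three review fixes can be sketched as standalone checks. Only the names STANDALONE_SEMVER_VALUE_RE and editTouchesVersion and the cut-branch pattern come from the commit message; SED_LIKE_RE, its [^;&|] segment bound, the exact STANDALONE_SEMVER_VALUE_RE pattern, and the body of editTouchesVersion are assumptions showing one plausible anchored, leading-digit form.

```typescript
// 1. Two lookaheads scan the rest of the current shell segment (up to ; & |),
//    so `package.json` and `version` may appear in either order.
const SED_LIKE_RE = /\b(?:sed|awk|jq)\b(?=[^;&|]*package\.json)(?=[^;&|]*version)/;

// 2. A whole string that is just a quoted semver starting with a digit.
//    Excludes range prefixes ("^1.2.3") and key-prefixed entries ("react": "18.2.0").
const STANDALONE_SEMVER_VALUE_RE = /^"\d+\.\d+\.\d+(?:-[0-9A-Za-z.-]+)?"$/;

function editTouchesVersion(oldString: string, newString: string): boolean {
  if (oldString.includes('"version"') || newString.includes('"version"')) return true;
  const a = oldString.trim();
  const b = newString.trim();
  // Value-only swap: both sides are bare quoted semvers and they differ.
  return a !== b && STANDALONE_SEMVER_VALUE_RE.test(a) && STANDALONE_SEMVER_VALUE_RE.test(b);
}

// 3. Release-cut branches must carry a semver-shaped suffix.
const CUT_BRANCH_RE = /^luv-cut-\d+\.\d+\.\d+(?:-[0-9A-Za-z.-]+)?$/;
```

The segment bound is a simplification: a sed expression using | as its delimiter, or a jq filter containing a pipe, would end the segment early.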

Verified via 16 regex test cases (sed orderings, dep edits with
keys, range prefixes, cut branch shapes).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>