test(docs): add code example extraction and syntax validation by kovtcharov-amd · Pull Request #1250 · amd/gaia

kovtcharov-amd · 2026-05-29T03:47:35Z

Doc code examples had no automated syntax checking — broken Python snippets could ship to users without anyone noticing until they copy-pasted from the docs. Now util/check_doc_code.py extracts fenced code blocks from all MDX/MD files and runs compile() on Python blocks, reporting file and line number for each syntax error. Handles real-world MDX patterns: indented blocks inside <Tabs>/<Steps>/<Accordion>, top-level await (async doc examples), return/yield (method body excerpts), and ... placeholders. Pseudo-code in design docs (docs/spec/, docs/plans/) and non-runnable patterns (function signatures, flow diagrams, mixed-language blocks) are detected and skipped. CI runs it automatically on doc changes alongside the existing link checker.

Test plan

python -m pytest tests/unit/test_doc_code_validation.py -xvs — 47 tests pass
python util/check_doc_code.py --lang python — 464 blocks validated, 0 errors, exits 0
python util/check_doc_code.py (no filter) — shows correct OK/skipped counts even without --verbose

Closes #244

Doc code examples had no automated syntax checking — broken Python snippets could ship to users without anyone noticing until they copy-pasted. Now `util/check_doc_code.py` extracts fenced code blocks from all MDX/MD docs and runs `compile()` on Python blocks, reporting file/line for each error. Handles MDX patterns: indented blocks inside <Tabs>/<Steps>/<Accordion>, top-level `await` (async doc examples), `return`/`yield` (method body excerpts), and `...` placeholders. Closes #244

github-actions · 2026-05-29T03:51:19Z

Good infrastructure addition — compile()-based Python syntax checking is lightweight and the MDX normalization handles the real patterns (indented component nesting, ... placeholders, top-level await). Three issues need addressing before this is safe to merge: the CI job will fail immediately on the existing docs, the statistics output is wrong without --verbose, and one error handler silently eats failures.

Issues

🟡 CI job will fail on every doc-touching PR after merge

The new check-code-examples step runs python util/check_doc_code.py --lang python which exits 1 because 104 existing blocks in docs/spec/ and docs/plans/ are pseudo-code with intentional syntax violations. Merging this PR as-is means every subsequent PR that touches docs/** will have a failing required check until all 104 are resolved.

Three viable fixes — pick one:

Option A — Exclude spec/plans directories (scope the check to user-facing docs only):

          echo "Validating Python code examples in docs..."
          python util/check_doc_code.py --lang python --exclude docs/spec docs/plans

(Requires adding --exclude support to main() and find_doc_files().)

Option B — Mark the step continue-on-error with a documented reason (CLAUDE.md explicitly permits this for non-fatal CI steps):

      - name: Check documentation code examples
        continue-on-error: true  # 104 pseudo-code blocks in spec/plans fail; tracked in #244
        run: |
          echo "Validating Python code examples in docs..."
          python util/check_doc_code.py --lang python

Option C — Fix the 104 existing issues before or alongside this PR (preferred long-term, but likely a separate PR given the scope).

🟡 "Code blocks checked" count is wrong without `--verbose` (`util/check_doc_code.py:861`)

ok results are only appended to results when verbose=True (check_code_blocks, line 812–814). Without --verbose, ok_count = 0, so the summary prints:

Code blocks checked: 104      ← looks like only 104 blocks exist
  OK:       0                  ← misleads author into thinking nothing passed
  Errors:   104

The actual numbers (confirmed locally) are 1157 blocks checked, 1053 OK, 104 errors. The check counter needs to be maintained independently of the results list:

def check_code_blocks(
    repo_root: str,
    check_imports_flag: bool = False,
    verbose: bool = False,
    lang_filter: Optional[str] = None,
) -> List[CodeResult]:
    """Check all code blocks in documentation files."""
    root = Path(repo_root)
    results: List[CodeResult] = []
    _ok_total = 0  # track totals independent of verbosity

    for filepath in find_doc_files(repo_root):
        for block in extract_code_blocks(filepath, root):
            if lang_filter and block.lang != lang_filter.lower():
                continue

            if block.lang in PYTHON_LANGS:
                err = check_python_syntax(block.source, f"{block.file}:{block.line}")
                if err:
                    results.append(CodeResult(
                        block.file, block.line, block.lang,
                        "error", err, block.title,
                    ))
                else:
                    _ok_total += 1
                    if check_imports_flag:
                        for w in check_imports(block.source):
                            results.append(CodeResult(
                                block.file, block.line, block.lang,
                                "warning", w, block.title,
                            ))
                    if verbose:
                        results.append(CodeResult(
                            block.file, block.line, block.lang,
                            "ok", "syntax ok", block.title,
                        ))
            ...
    # attach totals so format_results can use them
    # (simplest approach: return a second value, or attach as an attribute)

Simplest fix without changing the return type: carry the verbose-independent ok count through a wrapper or a dataclass. Alternatively, always append ok results and filter them in format_results/the summary counter — that changes memory behaviour for large doc trees but is simpler to reason about.

🟡 Silent exception in `extract_code_blocks` (`util/check_doc_code.py:687`)

except Exception: return blocks silently swallows file-read failures — CLAUDE.md explicitly prohibits this pattern. A corrupted or permission-denied file produces no diagnostic.

    try:
        content = filepath.read_text(encoding="utf-8", errors="replace")
    except OSError as e:
        raise RuntimeError(f"Could not read {filepath}: {e}") from e

If soft failure is intentional (e.g. for network-mounted docs dirs), at minimum:

    try:
        content = filepath.read_text(encoding="utf-8", errors="replace")
    except OSError as e:
        print(f"WARNING: skipping {filepath}: {e}", file=sys.stderr)
        return blocks

🟢 `format_results` `verbose` parameter is dead code (`util/check_doc_code.py:833`)

The parameter is accepted but never read inside the function — the filtering already happened at the check_code_blocks level.

def format_results(results: List[CodeResult]) -> str:

(And update the two call sites accordingly.)

🟢 `_setup_docs` duplicated in `TestCheckCodeBlocks` and `TestMain`

Identical helper appears in both test classes. Extract once at module level or into a shared fixture in conftest.py.

Strengths

The MDX normalization in _normalize_python_source is well-thought-out: ... → pass, top-level await wrapping, and textwrap.dedent for indented component blocks are exactly the patterns that would cause false positives without them.
Test coverage is solid — extraction, normalization, syntax checking, integration, formatting, and CLI exit codes all have dedicated test classes. The test_mixed_valid_and_invalid and test_indented_code_in_mdx_component cases cover the tricky real-world patterns.
_KNOWN_MODULES allowlist prevents the --check-imports mode from producing false positives in a bare CI environment where GAIA's dependencies aren't installed — good defensive design.

Verdict

Request changes. The CI job will fail immediately after merge (104 existing issues in spec/plans), and the statistics bug means the default output is misleading. Both are quick fixes. Address the CI strategy decision (exclude dirs, continue-on-error, or a companion cleanup PR) and the ok_count counting before merging.

…ntation The validator was flagging 104 blocks that are function signatures, flow diagrams, mixed-language blocks, or spec pseudo-code — not runnable Python. Now it detects and skips: bare signatures without bodies, arrow notation, TOML/shell mixed blocks, trailing-ellipsis placeholders, and indented fragments. Design docs (docs/spec/, docs/plans/) are skipped entirely. Also fixes a real indentation bug in the chat-agent playbook where the "# Use it" section was at column 0 instead of the code block's indent level. Result: 464 Python blocks validated, 0 errors — CI passes.

github-actions · 2026-05-29T04:20:07Z

Good addition to the CI toolchain — doc-code validation closes a real gap (broken snippets shipping to users undetected). The pseudo-code heuristics cover the tricky MDX patterns well, and the test suite is thorough. Two issues worth fixing before merge.

Issues

🟡 Silent exception swallow in `extract_code_blocks` (`util/check_doc_code.py:770–771`)

    except Exception:
        return blocks

This violates CLAUDE.md's "no silent fallbacks" rule. A file that can't be read (permission error, OS-level race after directory scan) is silently skipped — the CI scanner continues and exits 0, giving a false "all clear". Should print to stderr and re-raise or skip with a logged warning:

    except OSError as e:
        print(f"warning: could not read {filepath}: {e}", file=sys.stderr)
        return blocks

The errors="replace" on read_text already handles encoding faults, so OSError is the only realistic subclass left.

🟢 `global` anti-pattern in autouse fixture (`tests/unit/test_doc_code_validation.py:23`)

    global check_doc_code
    check_doc_code = importlib.import_module("check_doc_code")

Module-level global in a fixture makes test isolation fragile — if two test files ever share a session, the reference can bleed. The sys.path manipulation + importlib.import_module dance is also unusual when a simpler approach works. Since util/ only needs to be on the path once, this is cleaner at module scope:

import importlib
import sys
from pathlib import Path

_util_dir = str(Path(__file__).resolve().parents[2] / "util")
if _util_dir not in sys.path:
    sys.path.insert(0, _util_dir)

check_doc_code = importlib.import_module("check_doc_code")

Then remove the autouse=True fixture entirely — it's no longer needed.

🟢 Duplicated `_setup_docs` helper (`tests/unit/test_doc_code_validation.py:411–419` and `633–640`)

TestCheckCodeBlocks._setup_docs and TestMain._setup_docs are identical. Extract to a module-level fixture:

@pytest.fixture()
def setup_docs(tmp_path):
    def _inner(files: dict) -> str:
        docs_dir = tmp_path / "docs"
        docs_dir.mkdir()
        for name, content in files.items():
            p = docs_dir / name
            p.parent.mkdir(parents=True, exist_ok=True)
            p.write_text(textwrap.dedent(content), encoding="utf-8")
        return str(tmp_path)
    return _inner

Affected: all test methods in TestCheckCodeBlocks and TestMain that call self._setup_docs(tmp_path, ...).

🟢 `warnings` variable name in `check_imports` (`util/check_doc_code.py:969`)

warnings: List[str] = [] shadows the stdlib warnings module name. Not a runtime error here (it isn't imported), but it confuses readers and linters.

    issues: List[str] = []

Update all references in the function accordingly.

🟢 Test plan description vs. actual tool behavior

The PR description says --lang python "surfaces ~104 genuine syntax issues in specs/plans (pseudo-code), exits non-zero". But PSEUDO_CODE_DIRS skips docs/spec and docs/plans entirely — those 104 issues would come from other directories (guides, SDK docs), or would exit 0 if all non-spec pseudo-code is caught by _is_pseudo_code(). The test plan checkbox wording creates confusion about what "verified" means. Worth updating the PR description to clarify, even though it doesn't affect the code.

Strengths

Heuristic design is solid. The layered _is_pseudo_code checks — arrows, TOML headers, signature stubs, all-indented fragments, mixed-shell blocks — cover the real MDX patterns without false-positiving on normal code. The priority order (pseudo-code check → normalize → compile) is the right pipeline.
Self-contained stdlib-only tool. No pip install step needed in CI, and the existing actions/checkout@v6 + actions/setup-python@v6 versions used throughout the repo are correctly reused.
Playbook fix is a genuine improvement. Replacing the triple-quoted f-string (indentation-sensitive in MDX) with explicit string concatenation removes a real fragility, and the "Use it" block indentation fix corrects broken indentation the validator itself would have caught.
Test coverage. 46 tests cover extraction, normalization, pseudo-code detection, integration, output formatting, and the CLI exit codes. Edge cases like top-level await, return outside a function, and ... placeholders all have dedicated tests.

Verdict

Approve with suggestions. The silent exception swallow is the only issue worth blocking on; everything else is style/maintainability. Fix the OSError handler before merge; the global fixture and duplicated helper can follow in a cleanup commit.

- Replace silent `except Exception` with `OSError` + stderr warning (CLAUDE.md prohibits swallowing errors) - Fix statistics bug: ok_count and skipped_count now tracked independently of --verbose, so default output shows correct totals (was showing 0/0) - Return CheckSummary NamedTuple instead of bare list to carry counts - Remove dead `verbose` param from format_results() - Rename `warnings` variable to `issues` in check_imports() to avoid shadowing the stdlib module - Replace `global` anti-pattern in test fixture with module-level import - Extract duplicated `_setup_docs` helper into shared pytest fixture - Add test_ok_count_without_verbose to verify the statistics fix

github-actions · 2026-05-29T06:52:29Z

This PR closes a real gap: broken Python snippets in MDX docs could ship to users with no automated catch. The implementation is solid across the board — clean data modelling, correct handling of real-world MDX quirks, and a thorough test suite. Two minor nits worth fixing before this becomes the pattern other utilities follow.

Issues

🟢 Minor — `_is_pseudo_code` angle-bracket check produces false negatives (`util/check_doc_code.py:817`)

if re.search(r"<\w+[^>]*>", stripped) and "f'" not in stripped and 'f"' not in stripped:
    return True

Any code block containing a non-f-string literal with an XML/HTML-like pattern — e.g. text.replace("<thinking>", "") or response = "<answer>" — will be silently classified as pseudo-code and skipped. This is a false-negative (valid syntax goes unchecked), not a false-positive (CI won't break). It's an acceptable pragmatic trade-off for now, but worth a comment so future readers know why the check is there and what its limits are:

    # Heuristic: XML/HTML-like tags (<Tag>, <response>) signal template pseudo-code.
    # False-negative: string literals like text.replace("<tag>", "") are also skipped.
    if re.search(r"<\w+[^>]*>", stripped) and "f'" not in stripped and 'f"' not in stripped:
        return True

🟢 Minor — Windows backslash paths in `PSEUDO_CODE_DIRS` are dead code in CI (`util/check_doc_code.py:717`)

PSEUDO_CODE_DIRS = {"docs/spec", "docs/plans", "docs\\spec", "docs\\plans"}

CI always runs on Ubuntu where pathlib produces /-separated relative paths, so docs\\spec and docs\\plans never match. If the intent is local Windows support, add a comment; if not, drop the dead entries.

# Use both separators so the check works on Windows local dev too
PSEUDO_CODE_DIRS = {"docs/spec", "docs/plans", "docs\\spec", "docs\\plans"}

Or, normalize at the point of comparison so the set stays clean:

PSEUDO_CODE_DIRS = {"docs/spec", "docs/plans"}
...
# in check_code_blocks:
normalized_file = block.file.replace("\\", "/")
if any(normalized_file.startswith(d) for d in PSEUDO_CODE_DIRS):

Strengths

Test suite is genuinely thorough. The 47 tests exercise the edge cases that matter in practice — top-level await, method-body excerpts with return/yield, MDX component indentation, mixed-language blocks, empty blocks, ... placeholders. These are exactly the cases where naive compile() would produce spurious failures.
Pseudo-code heuristics avoid the CI-breaks-on-docs trap. _is_pseudo_code + _is_signature_only_block handle the common doc patterns (function signatures, flow diagrams, TOML-mixed blocks) that look like Python but aren't runnable. Getting this right is the hardest part of this kind of tool, and the coverage here is solid.
Clean architecture. NamedTuple data classes (CodeBlock, CodeResult, CheckSummary) make each stage composable and independently testable. The playbook fix (aligning the "Use it" block to 4-space indent so textwrap.dedent() can normalise the whole snippet) is correct and necessary.

Verdict

Approve. No blocking issues. The two nits above are cosmetic/documentation-level — they don't affect correctness or CI behaviour.

itomek

The three blockers from the first review are resolved in head — extract_code_blocks now catches OSError and warns to stderr (no silent swallow), check_code_blocks returns a verbose-independent ok_count so the default summary is accurate, and the spec/plans pseudo-code blocks are skipped so CI won't fail the repo on merge. Approving. Minor, non-blocking: the description's "~104 genuine syntax issues" line is stale relative to the current skip behavior — worth a one-line edit so the test-plan wording matches what the tool now does.

Generated by Claude Code

github-actions Bot added devops DevOps/infrastructure changes tests Test changes labels May 29, 2026

github-actions Bot added the documentation Documentation changes label May 29, 2026

kovtcharov-amd self-assigned this May 29, 2026

kovtcharov-amd added the p2 low priority label May 29, 2026

itomek approved these changes May 29, 2026

View reviewed changes

kovtcharov-amd added this pull request to the merge queue May 29, 2026

Merged via the queue into main with commit 6d4711d May 29, 2026
30 checks passed

kovtcharov-amd deleted the feat/doc-code-validation-244 branch May 29, 2026 16:29

kovtcharov-amd mentioned this pull request Jun 1, 2026

Release v0.20.0 #1334

Open

6 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

test(docs): add code example extraction and syntax validation#1250

test(docs): add code example extraction and syntax validation#1250
kovtcharov-amd merged 3 commits into
mainfrom
feat/doc-code-validation-244

kovtcharov-amd commented May 29, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented May 29, 2026

Uh oh!

github-actions Bot commented May 29, 2026

Uh oh!

github-actions Bot commented May 29, 2026

Uh oh!

itomek left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

kovtcharov-amd commented May 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Test plan

Uh oh!

github-actions Bot commented May 29, 2026

Issues

🟡 CI job will fail on every doc-touching PR after merge

🟡 "Code blocks checked" count is wrong without --verbose (util/check_doc_code.py:861)

🟡 Silent exception in extract_code_blocks (util/check_doc_code.py:687)

🟢 format_results verbose parameter is dead code (util/check_doc_code.py:833)

🟢 _setup_docs duplicated in TestCheckCodeBlocks and TestMain

Strengths

Verdict

Uh oh!

github-actions Bot commented May 29, 2026

Issues

🟡 Silent exception swallow in extract_code_blocks (util/check_doc_code.py:770–771)

🟢 global anti-pattern in autouse fixture (tests/unit/test_doc_code_validation.py:23)

🟢 Duplicated _setup_docs helper (tests/unit/test_doc_code_validation.py:411–419 and 633–640)

🟢 warnings variable name in check_imports (util/check_doc_code.py:969)

🟢 Test plan description vs. actual tool behavior

Strengths

Verdict

Uh oh!

github-actions Bot commented May 29, 2026

Issues

🟢 Minor — _is_pseudo_code angle-bracket check produces false negatives (util/check_doc_code.py:817)

🟢 Minor — Windows backslash paths in PSEUDO_CODE_DIRS are dead code in CI (util/check_doc_code.py:717)

Strengths

Verdict

Uh oh!

itomek left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

kovtcharov-amd commented May 29, 2026 •

edited

Loading

🟡 "Code blocks checked" count is wrong without `--verbose` (`util/check_doc_code.py:861`)

🟡 Silent exception in `extract_code_blocks` (`util/check_doc_code.py:687`)

🟢 `format_results` `verbose` parameter is dead code (`util/check_doc_code.py:833`)

🟢 `_setup_docs` duplicated in `TestCheckCodeBlocks` and `TestMain`

🟡 Silent exception swallow in `extract_code_blocks` (`util/check_doc_code.py:770–771`)

🟢 `global` anti-pattern in autouse fixture (`tests/unit/test_doc_code_validation.py:23`)

🟢 Duplicated `_setup_docs` helper (`tests/unit/test_doc_code_validation.py:411–419` and `633–640`)

🟢 `warnings` variable name in `check_imports` (`util/check_doc_code.py:969`)

🟢 Minor — `_is_pseudo_code` angle-bracket check produces false negatives (`util/check_doc_code.py:817`)

🟢 Minor — Windows backslash paths in `PSEUDO_CODE_DIRS` are dead code in CI (`util/check_doc_code.py:717`)