OpenHands · xingyaoww · Oct 21, 2025 · Oct 17, 2025
diff --git a/.github/scripts/README.md b/.github/scripts/README.md
@@ -0,0 +1,135 @@
+# Documentation Code Block Sync
+
+This directory contains scripts for automatically syncing code blocks in documentation files with their corresponding source files from the agent-sdk repository.
+
+## Overview
+
+The `sync_code_blocks.py` script ensures that code examples in the documentation always match the actual source code in the agent-sdk `examples/` directory. This prevents documentation drift and ensures users always see accurate, working examples.
+
+## How It Works
+
+1. **Scans MDX Files**: The script recursively scans all `.mdx` files in the docs repository
+2. **Finds Code Blocks**: It looks for Python code blocks with file references using the pattern:
+   ````markdown
+   ```python icon="python" expandable examples/01_standalone_sdk/02_custom_tools.py
+   <code content>
+   ```
+   ````
+3. **Extracts File Path**: The file path is extracted from the code block metadata (e.g., `examples/01_standalone_sdk/02_custom_tools.py`)
+4. **Reads Source File**: The actual source file is read from the checked-out agent-sdk repository
+5. **Compares Content**: The code block content is compared with the actual file content
+6. **Updates Docs**: If there are differences, the documentation file is automatically updated
+
+## Usage
+
+### Via GitHub Actions
+
+The workflow `.github/workflows/sync-docs-code-blocks.yml` automatically runs:
+- **Daily at 2 AM UTC** to catch any changes
+- **Manually** via workflow dispatch (allows specifying a custom agent-sdk branch/tag)
+
+On each run, the workflow:
+1. Checks out the docs repository
+2. Checks out the agent-sdk repository into the `agent-sdk/` subdirectory
+3. Runs the sync script
+4. Creates a pull request with the updates if the script reports differences
+
+### Manual Run
+
+To test locally:
+
+```bash
+cd docs
+
+# Clone agent-sdk if not already present
+git clone https://github.com/All-Hands-AI/agent-sdk.git agent-sdk
+
+# Run the script
+python .github/scripts/sync_code_blocks.py
+
+# Clean up
+rm -rf agent-sdk
+```
+
+## Code Block Format
+
+For the script to detect and sync code blocks, they must follow this format:
+
+````markdown
+```python icon="python" expandable examples/01_standalone_sdk/02_custom_tools.py
+<code will be synced from agent-sdk/examples/01_standalone_sdk/02_custom_tools.py>
+```
+````
+
+The file reference must:
+- Start with `examples/`
+- End with `.py`
+- Be a valid path relative to the agent-sdk repository root
+
+Examples:
+- `examples/01_standalone_sdk/02_custom_tools.py`
+- `examples/02_remote_agent_server/01_convo_with_local_agent_server.py`
+- `examples/03_github_workflows/01_basic_action/action.py`
+
+## Features
+
+- **Automatic Detection**: Finds all code blocks with file references
+- **Smart Comparison**: Normalizes content (trailing whitespace, line endings) for accurate comparison
+- **Batch Updates**: Can update multiple files in a single run
+- **GitHub Integration**: Automatically creates PRs when changes are needed
+- **Safe Operation**: Only updates files with actual differences
+- **Clear Logging**: Provides detailed output about what's being processed and updated
+- **Flexible Scheduling**: Daily automatic runs plus manual trigger option
+
+## Configuration
+
+### Workflow Configuration
+
+Edit `.github/workflows/sync-docs-code-blocks.yml` to customize:
+- Schedule timing (cron expression)
+- Default agent-sdk branch
+- PR title/body templates
+
+### Script Behavior
+
+The script:
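+- Looks for agent-sdk, in order, at `AGENT_SDK_PATH` (if set), `$GITHUB_WORKSPACE/agent-sdk`, `agent-sdk/` inside the docs repository root, and `agent-sdk/` next to the docs repository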
+- Scans all `.mdx` files recursively
+- Updates files in-place
+- Sets GitHub Actions output for PR creation
+
+## Troubleshooting
+
+### "Source file not found" warnings
+
+This means the script found a code block reference but couldn't locate the corresponding source file. This can happen if:
+- The file reference doesn't match an actual file in `agent-sdk/examples/`
+- The file has been moved or renamed in agent-sdk
+- The agent-sdk checkout is incomplete or at the wrong ref
+
+### No changes detected when you expect them
+
+Check that:
+1. The code block format matches the expected pattern (with full `examples/` path)
+2. The file path in the code block is correct and includes `.py` extension
+3. The difference is not limited to trailing whitespace or line endings, which are normalized away before comparison
+4. The agent-sdk repository is checked out at the correct branch/tag
+
+### Script fails with path errors
+
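+Ensure one of the following (the script locates the docs root from its own file path, so the working directory does not matter):
+- The agent-sdk repository is checked out in `agent-sdk/` inside the docs repository root, or
+- The `AGENT_SDK_PATH` environment variable points to an agent-sdk checkout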
+
+## Maintenance
+
+- Keep the regex patterns updated if the code block format changes
+- Update the workflow if the repository structure changes
+- Monitor workflow runs to catch any issues early
+- Review PRs to ensure changes are expected
+
+## Related Files
+
+- `.github/workflows/sync-docs-code-blocks.yml` - GitHub Actions workflow
+- `agent-sdk/examples/` - Source files that are synced to documentation
+- `sdk/guides/` - Documentation files containing code blocks
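As a rough illustration of steps 2, 3, and 5 in "How It Works", the sketch below applies the same regular expression that `sync_code_blocks.py` uses to a small in-memory MDX string and compares the block body with a stand-in source string. The sample MDX text, the `FENCE` helper, and the `source` value are made up for this example; only the pattern and the normalization rule are taken from the script.

```python
import re

# Three backticks, built at runtime so this example can itself live in a fenced block.
FENCE = "`" * 3

# Same pattern as in sync_code_blocks.py, assembled from FENCE.
PATTERN = FENCE + r'python[^\n]*\s+(examples/[^\s]+\.py)\n(.*?)' + FENCE

# Hypothetical MDX snippet, for illustration only.
mdx = (
    "Intro text.\n\n"
    f'{FENCE}python icon="python" expandable examples/01_standalone_sdk/02_custom_tools.py\n'
    "print('hello')   \n"
    f"{FENCE}\n"
)


def normalize(text: str) -> str:
    """Strip trailing whitespace per line and unify line endings."""
    return "\n".join(line.rstrip() for line in text.splitlines())


match = re.search(PATTERN, mdx, re.DOTALL)
assert match is not None
file_ref, body = match.group(1), match.group(2)

# Stand-in for the file contents read from agent-sdk (hypothetical).
source = "print('hello')\r\n"

print(file_ref)
print(normalize(body) == normalize(source))
```

Here the block body and the stand-in source differ only in trailing spaces and line endings, so the comparison treats them as equal and no update would be written.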
diff --git a/.github/scripts/sync_code_blocks.py b/.github/scripts/sync_code_blocks.py
@@ -0,0 +1,226 @@
+#!/usr/bin/env python3
+"""
+Sync code blocks in documentation files with their corresponding source files.
+
+This script:
+1. Scans MDX files for code blocks with file references (e.g., ```python expandable examples/01_standalone_sdk/02_custom_tools.py)
+2. Extracts the file path from the code block metadata
+3. Reads the actual content from the source file in agent-sdk/
+4. Compares the code block content with the actual file content
+5. Updates the documentation if there are differences
+"""
+
+import os
+import re
+import sys
+from pathlib import Path
+
+
+def find_mdx_files(docs_path: Path) -> list[Path]:
+    """Find all MDX files in the docs directory."""
+    mdx_files: list[Path] = []
+    for root, _, files in os.walk(docs_path):
+        for file in files:
+            if file.endswith(".mdx"):
+                mdx_files.append(Path(root) / file)
+    return mdx_files
+
+
+def extract_code_blocks(content: str) -> list[tuple[str, str, int, int]]:
+    """
+    Extract code blocks that reference source files.
+
+    Returns list of tuples: (file_reference, code_content, start_pos, end_pos)
+
+    Pattern matches blocks like:
+    ```python icon="python" expandable examples/01_standalone_sdk/02_custom_tools.py
+    <code content>
+    ```
+    """
+    # Captures the examples/...py reference that ends the fence's opening line,
+    # then the block body up to the closing ```
+    pattern = r'```python[^\n]*\s+(examples/[^\s]+\.py)\n(.*?)```'
+    matches: list[tuple[str, str, int, int]] = []
+    for match in re.finditer(pattern, content, re.DOTALL):
+        file_ref = match.group(1)
+        code_content = match.group(2)
+        start_pos = match.start()
+        end_pos = match.end()
+        matches.append((file_ref, code_content, start_pos, end_pos))
+    return matches
+
+
+def read_source_file(agent_sdk_path: Path, file_ref: str) -> str | None:
+    """
+    Read the actual source file content.
+
+    Args:
+        agent_sdk_path: Path to agent-sdk repository
+        file_ref: File reference like "examples/01_standalone_sdk/02_custom_tools.py"
+    """
+    source_path = agent_sdk_path / file_ref
+    if not source_path.exists():
+        print(f"Warning: Source file not found: {source_path}")
+        return None
+    try:
+        return source_path.read_text(encoding="utf-8")
+    except Exception as e:
+        print(f"Error reading {source_path}: {e}")
+        return None
+
+
+def normalize_content(content: str) -> str:
+    """Normalize content for comparison (remove trailing whitespace, normalize line endings)."""
+    return "\n".join(line.rstrip() for line in content.splitlines())
+
+
+def resolve_paths() -> tuple[Path, Path]:
+    """
+    Determine docs root and agent-sdk path robustly across CI and local layouts.
+    Priority for agent-sdk path:
+      1) AGENT_SDK_PATH (env override)
+      2) $GITHUB_WORKSPACE/agent-sdk
+      3) docs_root/'agent-sdk'
+      4) docs_root.parent/'agent-sdk' (legacy)
+    """
+    # docs repo root (script is at docs/.github/scripts/sync_code_blocks.py)
+    script_file = Path(__file__).resolve()
+    docs_root = script_file.parent.parent.parent
+
+    candidates: list[Path] = []
+
+    # 1) Explicit env override
+    env_override = os.environ.get("AGENT_SDK_PATH")
+    if env_override:
+        candidates.append(Path(env_override).expanduser().resolve())
+
+    # 2) Standard GitHub workspace sibling
+    gh_ws = os.environ.get("GITHUB_WORKSPACE")
+    if gh_ws:
+        candidates.append(Path(gh_ws).resolve() / "agent-sdk")
+
+    # 3) Sibling inside the docs repo root
+    candidates.append(docs_root / "agent-sdk")
+
+    # 4) Legacy parent-of-docs-root layout
+    candidates.append(docs_root.parent / "agent-sdk")
+
+    print(f"🔍 Scanning for MDX files in {docs_root}")
+    print("🔎 Trying agent-sdk paths (in order):")
+    for p in candidates:
+        print(f"   - {p}")
+
+    for p in candidates:
+        if p.exists():
+            print(f"📁 Using Agent SDK path: {p}")
+            return docs_root, p
+
+    # If none exist, fail with a helpful message
+    print("❌ Agent SDK path not found in any of the expected locations.")
+    print("   Set AGENT_SDK_PATH, or check out the repo to one of the tried paths above.")
+    sys.exit(1)
+
+
+def update_doc_file(
+    doc_path: Path,
+    content: str,
+    code_blocks: list[tuple[str, str, int, int]],
+    agent_sdk_path: Path,
+) -> bool:
+    """
+    Update documentation file with correct code blocks.
+
+    Returns True if changes were made, False otherwise.
+    """
+    changes_made = False
+    new_content = content
+    offset = 0  # Track offset due to content changes
+
+    for file_ref, old_code, start_pos, end_pos in code_blocks:
+        actual_content = read_source_file(agent_sdk_path, file_ref)
+        if actual_content is None:
+            continue
+
+        old_normalized = normalize_content(old_code)
+        actual_normalized = normalize_content(actual_content)
+
+        if old_normalized != actual_normalized:
+            print(f"\n📝 Found difference in {doc_path.name} for {file_ref}")
+            print("   Updating code block...")
+
+            adj_start = start_pos + offset
+            adj_end = end_pos + offset
+
+            opening_line_match = re.search(
+                r"```python[^\n]*\s+" + re.escape(file_ref),
+                new_content[adj_start:adj_end],
+            )
+            if opening_line_match:
+                opening_line = opening_line_match.group(0)
+                # Preserve trailing newline behavior
+                if actual_content.endswith("\n"):
+                    new_block = f"{opening_line}\n{actual_content}```"
+                else:
+                    new_block = f"{opening_line}\n{actual_content}\n```"
+                old_block = new_content[adj_start:adj_end]
+
+                new_content = new_content[:adj_start] + new_block + new_content[adj_end:]
+                offset += len(new_block) - len(old_block)
+                changes_made = True
+
+    if changes_made:
+        try:
+            doc_path.write_text(new_content, encoding="utf-8")
+            print(f"✅ Updated {doc_path}")
+            return True
+        except Exception as e:
+            print(f"❌ Error writing {doc_path}: {e}")
+            return False
+
+    return False
+
+
+def main() -> None:
+    docs_root, agent_sdk_path = resolve_paths()
+
+    # Find all MDX files
+    mdx_files = find_mdx_files(docs_root)
+    print(f"📄 Found {len(mdx_files)} MDX files")
+
+    total_changes = 0
+    files_changed: list[str] = []
+
+    for mdx_file in mdx_files:
+        try:
+            content = mdx_file.read_text(encoding="utf-8")
+            code_blocks = extract_code_blocks(content)
+            if not code_blocks:
+                continue
+
+            print(f"\n📋 Processing {mdx_file.relative_to(docs_root)}")
+            print(f"   Found {len(code_blocks)} code block(s) with file references")
+
+            if update_doc_file(mdx_file, content, code_blocks, agent_sdk_path):
+                total_changes += 1
+                files_changed.append(str(mdx_file.relative_to(docs_root)))
+        except Exception as e:
+            print(f"❌ Error processing {mdx_file}: {e}")
+            continue
+
+    print("\n" + "=" * 60)
+    if total_changes > 0:
+        print(f"✅ Updated {total_changes} file(s):")
+        for file in files_changed:
+            print(f"   - {file}")
+        if "GITHUB_OUTPUT" in os.environ:
+            with open(os.environ["GITHUB_OUTPUT"], "a", encoding="utf-8") as f:
+                f.write("changes=true\n")
+    else:
+        print("✅ All code blocks are in sync!")
+        if "GITHUB_OUTPUT" in os.environ:
+            with open(os.environ["GITHUB_OUTPUT"], "a", encoding="utf-8") as f:
+                f.write("changes=false\n")
+    print("=" * 60)
+
+
+if __name__ == "__main__":
+    main()
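The offset bookkeeping in `update_doc_file` is easy to get wrong, so here is a minimal standalone sketch of the same idea: spans are recorded against the original string, and a running offset shifts later spans once an earlier replacement has changed the length. `replace_spans` and the sample data are hypothetical names for this illustration and are not part of the script.

```python
def replace_spans(text: str, edits: list[tuple[int, int, str]]) -> str:
    """Apply (start, end, replacement) edits given in original coordinates.

    Edits must be sorted by start position and must not overlap.
    """
    offset = 0  # net length change from the edits applied so far
    for start, end, replacement in edits:
        adj_start, adj_end = start + offset, end + offset
        text = text[:adj_start] + replacement + text[adj_end:]
        offset += len(replacement) - (end - start)
    return text


original = "aaa BBB ccc DDD"
# Replace "BBB" (shrinks the text) and then "DDD" (grows it).
edits = [(4, 7, "x"), (12, 15, "yyyyy")]
print(replace_spans(original, edits))
```

The approach relies on the spans arriving in ascending order, which `re.finditer` provides in the script. Applying the edits from last to first would avoid the offset entirely, at the cost of iterating in reverse.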