@victlue victlue commented Oct 3, 2025

This PR makes the dependencies for extract clearer for readers who haven't used zod or pydantic before.

changeset-bot bot commented Oct 3, 2025

⚠️ No Changeset found

Latest commit: b2cdc2d

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

@greptile-apps greptile-apps bot left a comment

Greptile Overview

Summary

This PR enhances the documentation by adding missing import statements to all code examples in the extract documentation. The changes add `import { z } from 'zod';` to TypeScript examples and `from pydantic import BaseModel` (plus `HttpUrl` where needed) to Python examples. This improvement makes the code examples complete and runnable out of the box, which is particularly beneficial for developers who are new to these validation libraries.

The extract functionality in Stagehand relies on schema validation libraries (zod for TypeScript and pydantic for Python) to define the structure of data being extracted from web pages. Previously, the documentation showed usage of these libraries without the corresponding import statements, so users copying the examples would hit errors because names such as `z` and `BaseModel` were undefined (a `NameError` in Python, for instance). This change aligns with documentation best practices by providing self-contained, executable code snippets that don't assume prior knowledge of the required dependencies.
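
As a rough illustration of what a self-contained Python schema looks like once the imports are present, here is a minimal sketch. The `Product` model and its fields are hypothetical and are not taken from the Stagehand docs; the Stagehand call itself is omitted, and only the pydantic side is shown.

```python
# Imports the PR adds to the Python examples; without them,
# BaseModel and HttpUrl would be undefined names.
from pydantic import BaseModel, HttpUrl, ValidationError


class Product(BaseModel):
    """Hypothetical schema describing the data to extract from a page."""

    name: str
    price: float
    url: HttpUrl  # pydantic checks that this is a well-formed http(s) URL


# Data shaped like the schema validates and becomes a typed object.
product = Product(name="Widget", price=19.99, url="https://example.com/widget")

# Data that does not match the schema raises ValidationError.
try:
    Product(name="Widget", price=19.99, url="not-a-url")
    invalid_url_rejected = False
except ValidationError:
    invalid_url_rejected = True
```

Deleting the import line and running the same file fails at the `class Product(BaseModel)` line with a `NameError`, which is the situation the PR is meant to prevent.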

Changed Files
| Filename | Score | Overview |
|---|---|---|
| docs/basics/extract.mdx | 5/5 | Added import statements to TypeScript and Python code examples to make them complete and runnable |

Confidence score: 5/5

  • This PR is safe to merge with minimal risk
  • Score reflects simple additive changes that improve documentation without modifying any functionality
  • No files require special attention

Sequence Diagram

sequenceDiagram
    participant User
    participant StagehandPage as "Stagehand Page"
    participant ExtractHandler as "Extract Handler"
    participant DOM as "DOM/Browser"
    participant LLM as "LLM Client"

    User->>StagehandPage: "page.extract(instruction, schema)"
    StagehandPage->>ExtractHandler: "Create extract handler instance"
    ExtractHandler->>ExtractHandler: "Initialize with stagehand, logger, page"

    alt Text Extraction Path
        ExtractHandler->>DOM: "Wait for DOM to settle"
        DOM-->>ExtractHandler: "DOM ready"
        ExtractHandler->>DOM: "Store original DOM state"
        ExtractHandler->>DOM: "Process DOM to create selector mapping"
        DOM-->>ExtractHandler: "Selector mappings"
        ExtractHandler->>DOM: "Collect text annotations with bounding boxes"
        DOM-->>ExtractHandler: "Text annotations array"
        ExtractHandler->>ExtractHandler: "Deduplicate annotations"
        ExtractHandler->>DOM: "Restore original DOM state"
        ExtractHandler->>LLM: "Send formatted text for extraction"
        LLM-->>ExtractHandler: "Structured data response"
    else DOM Extraction Path
        ExtractHandler->>DOM: "Wait for DOM to settle"
        DOM-->>ExtractHandler: "DOM ready"
        ExtractHandler->>DOM: "Retrieve accessibility tree"
        DOM-->>ExtractHandler: "Accessibility tree data"
        ExtractHandler->>ExtractHandler: "Transform schema"
        ExtractHandler->>LLM: "Send DOM data for extraction"
        LLM-->>ExtractHandler: "Structured data response"
    end

    ExtractHandler->>ExtractHandler: "Validate response against schema"
    ExtractHandler->>ExtractHandler: "Update metrics and log response"
    ExtractHandler-->>StagehandPage: "Return extracted data"
    StagehandPage-->>User: "Return structured data object"
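
The "Validate response against schema" step in the diagram can be pictured with a small sketch. This is not Stagehand's implementation; it only assumes the LLM response arrives as a JSON string, and the `Headline` and `Headlines` models and the `validate_response` helper are hypothetical names used for illustration.

```python
import json
from typing import List, Optional

from pydantic import BaseModel, ValidationError


class Headline(BaseModel):
    """Hypothetical item schema."""

    title: str
    score: int


class Headlines(BaseModel):
    """Hypothetical top-level schema for the extracted data."""

    items: List[Headline]


def validate_response(raw: str) -> Optional[Headlines]:
    """Parse a JSON string and check it against the schema.

    Returns the typed object on success, or None when the text is not
    valid JSON, is not a JSON object, or does not match the schema.
    """
    try:
        return Headlines(**json.loads(raw))
    except (json.JSONDecodeError, TypeError, ValidationError):
        return None


good = validate_response('{"items": [{"title": "A", "score": 3}]}')
missing_field = validate_response('{"items": [{"title": "A"}]}')
not_json = validate_response("not json")
```

Returning `None` on failure is just one design choice for the sketch; a real handler might instead log the error or retry the request.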

1 file reviewed, 1 comment

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
@victlue victlue merged commit 0791404 into main Oct 4, 2025
1 check passed
@victlue victlue deleted the victor/docs-pr-2 branch October 4, 2025 03:41
2 participants