
feat(verifier): add evaluator backend facade #2129

Open
miguelg719 wants to merge 2 commits into main from miguelgonzalez/verifier-01-evaluator-compat

miguelg719 (Collaborator) commented May 15, 2026

Why

The verifier rewrite needs to coexist with the legacy v3 evaluator so existing evals and callers do not silently change behavior while the new pipeline is reviewed. This PR creates the compatibility boundary that lets us select the evaluator backend explicitly.

What Changed

  • Extracted the current v3 evaluator behavior into LegacyV3Evaluator.
  • Kept V3Evaluator as the public facade.
  • Added backend selection through the STAGEHAND_EVALUATOR_BACKEND=legacy|verifier environment variable and a matching backend constructor option.
  • Defaulted the backend to legacy to preserve current ask() and batchAsk() semantics.
  • Added public API tests for backend selection and invalid backend handling.
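The backend selection described in the bullets above can be sketched as a small facade. This is a minimal TypeScript sketch under stated assumptions, not the code in this PR: EvaluatorBackend, isEvaluatorBackend, resolveEvaluatorBackend, EvaluatorFacade and LegacyEvaluatorStub are hypothetical names, and the precedence rule (constructor option first, then STAGEHAND_EVALUATOR_BACKEND, then legacy) is assumed rather than confirmed by the description.

```typescript
// Hypothetical sketch: these names are illustrative, not Stagehand's actual exports.

type EvaluatorBackend = "legacy" | "verifier";

interface EvaluationResult {
  evaluation: "YES" | "NO" | "INVALID";
  reasoning: string;
}

interface EvaluatorImpl {
  ask(question: string): Promise<EvaluationResult>;
}

type Env = Record<string, string | undefined>;

function isEvaluatorBackend(value: string): value is EvaluatorBackend {
  return value === "legacy" || value === "verifier";
}

// Assumption: an explicit constructor option takes precedence over
// STAGEHAND_EVALUATOR_BACKEND, and "legacy" is the default when neither is set.
function resolveEvaluatorBackend(option: string | undefined, env: Env): EvaluatorBackend {
  const raw = option ?? env["STAGEHAND_EVALUATOR_BACKEND"] ?? "legacy";
  if (!isEvaluatorBackend(raw)) {
    // A plain Error stands in for Stagehand's own error type here.
    throw new Error(`Invalid evaluator backend "${raw}"; expected legacy|verifier`);
  }
  return raw;
}

// Stand-in for LegacyV3Evaluator; the real class calls an LLM.
class LegacyEvaluatorStub implements EvaluatorImpl {
  async ask(question: string): Promise<EvaluationResult> {
    return { evaluation: "YES", reasoning: `legacy backend answered: ${question}` };
  }
}

// The facade keeps one public surface and delegates to the selected backend.
class EvaluatorFacade {
  readonly backend: EvaluatorBackend;
  private readonly impl?: EvaluatorImpl;

  constructor(opts: { backend?: string } = {}, env: Env = {}) {
    this.backend = resolveEvaluatorBackend(opts.backend, env);
    // The verifier backend is not wired up yet, so only "legacy" gets an implementation.
    if (this.backend === "legacy") {
      this.impl = new LegacyEvaluatorStub();
    }
  }

  async ask(question: string): Promise<EvaluationResult> {
    if (!this.impl) {
      throw new Error(`Evaluator backend "${this.backend}" is not available yet`);
    }
    return this.impl.ask(question);
  }
}
```

In Node you would pass process.env as the second constructor argument so the environment variable is honored; the sketch defaults to an empty environment to stay deterministic. Validating the backend name in the constructor surfaces typos immediately, while a selected-but-unavailable verifier backend only fails when ask() is called, which matches the architecture diagram.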

Tests

  • pnpm --filter @browserbasehq/stagehand run typecheck
  • pnpm --filter @browserbasehq/stagehand run test:core -- packages/core/dist/esm/tests/unit/public-api/v3-core.test.js
  • git diff --check

changeset-bot commented May 15, 2026

🦋 Changeset detected

Latest commit: 513b9d9

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 4 packages:

  • @browserbasehq/stagehand (Patch)
  • @browserbasehq/stagehand-evals (Patch)
  • @browserbasehq/stagehand-server-v3 (Patch)
  • @browserbasehq/stagehand-server-v4 (Patch)


cubic-dev-ai (Contributor) left a comment

No issues found across 4 files

Confidence score: 5/5

  • Automated review surfaced no issues in the provided summaries.
  • No files require special attention.
Architecture diagram
sequenceDiagram
    participant Client as Caller Code
    participant Facade as V3Evaluator (Facade)
    participant Legacy as LegacyV3Evaluator
    participant LLM as LLM Client
    participant V3 as V3 Instance
    participant Page as Page (Browser)

    Note over Client,Page: Evaluator Backend Selection Flow

    Client->>Facade: new V3Evaluator(v3, { backend })
    alt backend = "verifier"
        Facade->>Facade: Store backend = "verifier"
        Note over Facade: Verifier backend not yet available
    else backend = "legacy" (default)
        Facade->>Facade: Read STAGEHAND_EVALUATOR_BACKEND env
        Facade->>Legacy: Create LegacyV3Evaluator instance
        Note over Facade,Legacy: NEW: Delegates all calls to LegacyV3Evaluator
    end

    Note over Client,Page: ask() - Legacy Backend Flow

    Client->>Facade: ask(options)
    Facade->>Facade: Check backend
    alt backend = "legacy"
        Facade->>Legacy: ask(options)
        Legacy->>Legacy: Validate question & answer/screenshot
        alt screenshot provided as array
            Legacy->>Legacy: _evaluateWithMultipleScreenshots()
            Legacy->>LLM: createChatCompletion() with multiple images
        else screenshot = true (single)
            Legacy->>Page: awaitActivePage()
            Page-->>Legacy: Page object
            Legacy->>Page: screenshot({ fullPage: false })
            Page-->>Legacy: imageBuffer
            Legacy->>LLM: createChatCompletion() with question + image + answer
        else screenshot = false
            Legacy->>LLM: createChatCompletion() with question + agentReasoning
        end
        LLM-->>Legacy: Parsed response (YES/NO + reasoning)
        Legacy-->>Facade: EvaluationResult
        Facade-->>Client: EvaluationResult
    else backend = "verifier"
        Facade->>Facade: Throw StagehandInvalidArgumentError
        Facade-->>Client: Error: "verifier backend not available"
    end

    Note over Client,Page: batchAsk() - Legacy Backend Flow

    Client->>Facade: batchAsk(options)
    Facade->>Legacy: batchAsk(options)
    Legacy->>Legacy: Validate questions array
    alt screenshot = true
        Legacy->>Page: awaitActivePage()
        Page-->>Legacy: Page object
        Legacy->>Page: screenshot()
        Page-->>Legacy: imageBuffer
    end
    Legacy->>Legacy: Format questions into text
    Legacy->>LLM: createChatCompletion() with formatted questions + screenshot
    LLM-->>Legacy: Parsed batch response
    Legacy-->>Facade: EvaluationResult[]
    Facade-->>Client: EvaluationResult[]

    Note over Client,Page: Error Handling - LLM Failure

    Client->>Facade: ask() / batchAsk()
    Facade->>Legacy: Delegate call
    Legacy->>LLM: createChatCompletion()
    alt LLM returns invalid data
        LLM-->>Legacy: Malformed response
        Legacy->>Legacy: Catch parsing error
        Legacy-->>Facade: { evaluation: "INVALID", reasoning: error }
        Facade-->>Client: Fallback result
    else LLM client throws
        LLM-->>Legacy: Error thrown
        Legacy->>Legacy: Catch error
        Legacy-->>Facade: { evaluation: "INVALID", reasoning: error }
        Facade-->>Client: Fallback result
    end
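The legacy ask() branch of the diagram (three screenshot modes plus the INVALID fallback) can be sketched roughly as follows. This is an illustrative TypeScript sketch, not LegacyV3Evaluator itself: AskOptions, ChatFn, ScreenshotFn, parseVerdict and legacyAsk are hypothetical names, the LLM client and the page screenshot are injected as plain functions, and the JSON reply format and prompt wording are assumptions. The real class goes through Stagehand's LLM client and awaitActivePage().

```typescript
// Hypothetical sketch of the legacy ask() flow; names and reply format are assumptions.

interface AskOptions {
  question: string;
  answer?: string;
  agentReasoning?: string;
  screenshot?: boolean | Uint8Array[];
}

interface EvaluationResult {
  evaluation: "YES" | "NO" | "INVALID";
  reasoning: string;
}

// Injected stand-ins for the LLM client and the active page's screenshot call.
type ChatFn = (prompt: string, images: Uint8Array[]) => Promise<string>;
type ScreenshotFn = () => Promise<Uint8Array>;

// Assumption: the model replies with JSON like {"evaluation":"YES","reasoning":"..."}.
function parseVerdict(raw: string): EvaluationResult {
  const obj = JSON.parse(raw) as { evaluation?: unknown; reasoning?: unknown } | null;
  const evaluation =
    obj?.evaluation === "YES" ? "YES" : obj?.evaluation === "NO" ? "NO" : undefined;
  const reasoning = obj?.reasoning;
  if (evaluation === undefined || typeof reasoning !== "string") {
    throw new Error(`Malformed evaluator response: ${raw}`);
  }
  return { evaluation, reasoning };
}

async function legacyAsk(
  opts: AskOptions,
  chat: ChatFn,
  takeScreenshot: ScreenshotFn,
): Promise<EvaluationResult> {
  // Input validation happens before any LLM call and is not turned into INVALID.
  if (!opts.question) {
    throw new Error("question is required");
  }

  // Build the request for the three screenshot modes in the diagram.
  let prompt: string;
  let images: Uint8Array[];
  if (Array.isArray(opts.screenshot)) {
    // Several pre-captured screenshots go out in a single request.
    prompt = opts.question;
    images = opts.screenshot;
  } else if (opts.screenshot) {
    // screenshot === true: capture the current viewport once.
    prompt = `${opts.question}\nAnswer: ${opts.answer ?? ""}`;
    images = [await takeScreenshot()];
  } else {
    // No screenshot: judge from the agent's reasoning text alone.
    prompt = `${opts.question}\nAgent reasoning: ${opts.agentReasoning ?? ""}`;
    images = [];
  }

  // LLM client errors and malformed replies both degrade to an INVALID result.
  try {
    return parseVerdict(await chat(prompt, images));
  } catch (err) {
    return {
      evaluation: "INVALID",
      reasoning: err instanceof Error ? err.message : String(err),
    };
  }
}
```

Injecting the LLM call and the screenshot function keeps the fallback path testable without a browser or an API key.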