URLs not parsed properly by gpt-4.1 #214

@LarsHanegraaf

Environment Information

Stagehand:

  • Language/SDK: Python
  • Stagehand version: 0.5.4

AI Provider:

  • Provider: OpenAI
  • Model: gpt-4.1-mini (fails). Extraction seems to work when no model is specified, which probably falls back to gpt-4o.

Issue Description

As described in the documentation, Stagehand uses internal IDs to map URLs to the ExtractedResult. However, with gpt-4.1-mini my final result doesn't include the URLs for all items found on the page.
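To make the suspected failure mode concrete, here is a conceptual sketch of ID-based URL mapping in general. It is not Stagehand's actual code: the function names, the `[link:N]` placeholder format, and the `url_id` field are all hypothetical. It only shows how a URL can go missing when the model returns an ID that is absent or unknown, which would be consistent with the symptom above but is not confirmed as the cause.

```python
def encode_links(links):
    """Replace each URL with a small integer ID before prompting the model.

    `links` is a list of (name, url) pairs. Hypothetical helper for illustration.
    """
    id_to_url = {i: url for i, (_, url) in enumerate(links)}
    prompt_lines = [f"{name} [link:{i}]" for i, (name, _) in enumerate(links)]
    return prompt_lines, id_to_url


def decode_links(model_items, id_to_url):
    """Map IDs returned by the model back to URLs.

    A missing or unknown ID resolves to None, i.e. the URL is lost.
    """
    return [
        {"name": item["name"], "url": id_to_url.get(item.get("url_id"))}
        for item in model_items
    ]


# Made-up page content, not taken from the site in this report.
links = [("Agency A", "https://example.com/a"), ("Agency B", "https://example.com/b")]
prompt_lines, id_to_url = encode_links(links)

# Simulated model output: the second item comes back without a usable ID.
model_items = [
    {"name": "Agency A", "url_id": 0},
    {"name": "Agency B", "url_id": None},
]
decoded = decode_links(model_items, id_to_url)
print(decoded)
```

If something like this is happening, the difference between models could come down to how reliably each one echoes the IDs back, but that is a guess on my part.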

Steps to Reproduce

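  1. Run the code below as-is, with model_name="gpt-4.1-mini" (URLs are missing from the result).
  2. Comment out the model_name line so the default model is used (URLs are extracted correctly).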

Minimal Reproduction Code

from stagehand import Stagehand
import asyncio
from dotenv import load_dotenv
from pydantic import BaseModel, HttpUrl
from typing import Optional
import json

load_dotenv()

class Agency(BaseModel):
    name: str
    url: HttpUrl
    description: Optional[str] = None
    location: Optional[str] = None

class Agencies(BaseModel):
    agencies: list[Agency]

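# NOTE: has_next_page was not included in the original report. This stand-in is an
# assumption: treat any non-empty observe() result as "a next page exists".
def has_next_page(observe_results) -> bool:
    return bool(observe_results)


async def main():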
    stagehand = Stagehand(
        env="LOCAL",
        headless=False,
        model_name="gpt-4.1-mini"
    )
    all_agencies = []
    await stagehand.init()
    page = stagehand.page
    await page.goto("<url I want to scrape>")
    should_continue = True
    while should_continue:
        agencies = await page.extract("Extract all agencies from the page", schema=Agencies)
        all_agencies.extend([agency.model_dump(mode='json') for agency in agencies.agencies])
        next_page_button = await page.observe("Find the next page button")
        should_continue = has_next_page(next_page_button)
        if should_continue:
            await page.click(next_page_button[0].selector)
    
    print(all_agencies)
    with open("agencies.json", "w") as f:
        json.dump(all_agencies, f)


if __name__ == "__main__":
    asyncio.run(main())
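To quantify the problem when comparing models, a small standard-library check like the one below could be run over the dumped records (for example the `all_agencies` list from the script above). This is only a sketch: the helper names and the sample records are made up for illustration, and "usable" here simply means an http(s) URL with a host.

```python
from urllib.parse import urlparse


def is_usable_url(value):
    """Return True only for strings that parse as http(s) URLs with a host."""
    if not isinstance(value, str):
        return False
    parsed = urlparse(value)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def split_by_url(records):
    """Split dict records into (with usable URL, without usable URL)."""
    with_url = [r for r in records if is_usable_url(r.get("url"))]
    without_url = [r for r in records if not is_usable_url(r.get("url"))]
    return with_url, without_url


# Made-up records standing in for the extracted agencies.
sample = [
    {"name": "Agency A", "url": "https://example.com/a"},
    {"name": "Agency B", "url": None},
    {"name": "Agency C", "url": "3"},  # e.g. a leftover placeholder instead of a URL
]
ok, missing = split_by_url(sample)
print(f"{len(missing)} of {len(sample)} records lack a usable URL")
```

Running the same check on the output from each model would give a simple count to compare.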

Error Messages / Log trace

Image

Screenshots / Videos

Image

Related Issues

No related issues or PRs found (none in the TypeScript version either).
