@jeremypress
Contributor

why

Let's say you're trying to automate buying peelers (a common occurrence), so you tell your trusty LLM to navigate to the peeler page and buy one. But on this peeler website, the add-to-cart call to action is disjoint from the copy describing the peeler, so the LLM doesn't know which checkout button adds the peeler!

Previously, stagehand only looked at interactive elements, but this breaks down pretty easily. The new approach is to include both interactive elements and leaf elements (things that hold text). While this brings way more content to the LLM, we still have plenty of tools in our toolbox to slim and chunk the DOM down further as needed.
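
For reviewers who want a feel for the selection rule, here is a rough sketch of "interactive or text-holding leaf". It is not the code in this PR: `SketchNode`, `INTERACTIVE_TAGS`, and `collectCandidates` are hypothetical names, the tag list is an assumption, and it walks a plain object tree instead of real DOM nodes so it can run on its own.

```typescript
// Hypothetical, simplified node shape used only for this sketch.
interface SketchNode {
  tag: string;
  text?: string;
  children?: SketchNode[];
}

// Assumed set of interactive tags; the real cleaner may use a different list.
const INTERACTIVE_TAGS = new Set(["a", "button", "input", "select", "textarea"]);

function isInteractive(node: SketchNode): boolean {
  return INTERACTIVE_TAGS.has(node.tag);
}

// A leaf is a node with no children that holds non-empty text.
function isTextLeaf(node: SketchNode): boolean {
  const hasChildren = (node.children?.length ?? 0) > 0;
  return !hasChildren && (node.text ?? "").trim().length > 0;
}

// Depth-first walk in document order, keeping interactive nodes and text leaves.
function collectCandidates(node: SketchNode, out: SketchNode[] = []): SketchNode[] {
  if (isInteractive(node) || isTextLeaf(node)) {
    out.push(node);
  }
  for (const child of node.children ?? []) {
    collectCandidates(child, out);
  }
  return out;
}

// Example page: the product copy and the button live in different subtrees.
const peelerPage: SketchNode = {
  tag: "div",
  children: [
    { tag: "p", text: "Sharp peeler" },
    {
      tag: "div",
      children: [
        { tag: "span", text: "$5" },
        { tag: "button", text: "Add to cart" },
      ],
    },
    { tag: "div" }, // empty wrapper, dropped by the filter
  ],
};

const candidates = collectCandidates(peelerPage);
```

With only interactive elements, the LLM would see the button but not the "Sharp peeler" copy that tells it which product the button belongs to; keeping text leaves puts both in the same context.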

what changed

  1. moved DOM cleaning to its own file
  2. added some HTML to peeler.html to make the case more complicated and to make parsing issues easier to debug
  3. the cleaner logic now uses js dom, which is orders of magnitude faster than doing JS eval in Playwright. We'll use chunking and filtering strategies if/when memory usage becomes a problem
  4. small prompt tweaks, as we now pass a comma-separated list of DOM elements to the LLM
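
To illustrate item 4, a minimal sketch of turning the kept elements into a comma-separated string for the prompt. The exact format stagehand sends is not shown in this description, so the `index:<tag>text</tag>` shape and the names `ElementSummary` and `formatForPrompt` are assumptions made for illustration.

```typescript
// Hypothetical summary of a kept element; not the PR's actual type.
interface ElementSummary {
  tag: string;
  text: string;
}

// Join elements into one comma-separated string, prefixing each with its index
// so the LLM can refer back to a specific element. The format is an assumption.
function formatForPrompt(elements: ElementSummary[]): string {
  return elements
    .map((el, i) => `${i}:<${el.tag}>${el.text.trim()}</${el.tag}>`)
    .join(", ");
}

const promptList = formatForPrompt([
  { tag: "p", text: "Sharp peeler" },
  { tag: "button", text: " Add to cart " },
]);
```

One caveat with any comma-separated format: element text can itself contain commas, so an index or tag wrapper like the one above helps the LLM tell where one element ends and the next begins.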

test plan

Run evals. I'll run them more once we upgrade on Braintrust.

@jeremypress jeremypress merged commit bfaf985 into main May 22, 2024
@pkiv pkiv deleted the jp-peeler4 branch October 29, 2024 11:23