# Broker Agent Experimental Code
This is a notebook for exploring the codebase and testing ideas.
The goal is to make sure we can connect to a website and scrape, browse, and interact with the page.

In [32]:
import re
import asyncio
import traceback

from langchain_ollama import ChatOllama
from playwright.async_api import async_playwright

from broker_agent.config.config import BrokerAgentConfig

In [33]:
config = BrokerAgentConfig.from_yaml_and_env()

In [34]:
ollama = ChatOllama(base_url=config.OLLAMA_BASE_URL, model=config.llm)

In [35]:
async def extract_playwright_script(script_content):
    """
    Extract the playwright script from between the ```playwright and ``` markers

    Args:
        script_content (str): The content containing the playwright script

    Returns:
        str or None: The extracted script if found, None otherwise
    """
    playwright_code = re.search(r"```playwright\n(.*?)```", script_content, re.DOTALL)
    if playwright_code:
        return playwright_code.group(1)
    return None
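The regex above only matches a fence labelled exactly `playwright`. The model used here emits a `<think>...</think>` preamble (visible in the run output further down), and models do not always use the fence label they were asked for. A slightly more lenient, synchronous variant might look like this sketch. The name `extract_script_lenient` and the set of accepted labels are assumptions, not part of the codebase.

```python
import re
from typing import Optional

# Reasoning models may prefix their answer with a <think>...</think> block;
# remove it so code fences inside the reasoning are not picked up by mistake.
THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)

# Accept ```playwright (what the prompt asks for) as well as ```python / ```py,
# since the model does not always use the requested label.
CODE_FENCE = re.compile(r"```(?:playwright|python|py)[ \t]*\r?\n(.*?)```", re.DOTALL)


def extract_script_lenient(text: str) -> Optional[str]:
    """Return the body of the first recognised code fence, or None if there is none."""
    without_thinking = THINK_BLOCK.sub("", text)
    match = CODE_FENCE.search(without_thinking)
    return match.group(1) if match else None
```

An unclosed `<think>` block (for example, a truncated response) is left in place, so a fence inside it could still match; that trade-off may or may not be acceptable.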

In [36]:
async def generate_playwright_script(website: str, dom: str, a11y_tree: str) -> str:
    generate_playwright_script_prompt = """
    Generate a Python Playwright script to navigate to the following website: {website} and search for a studio apartment
    in the $3000-$4000 range.
    The search script should handle clicking the 'Price' button, filling out the min and max price, 
    filling out the 'Bedrooms' dropdown and selecting 'Studio', and clicking the 'Search' button.
    Make sure that the generated playwright code uses the async playwright API i.e. (from playwright.async_api import async_playwright).
    
    The following is the DOM content and accessibility tree of the website:
    <dom_content>
    {dom_content}
    </dom_content>

    <a11y_tree>
    {a11y_tree}
    </a11y_tree>

    Generate an executable playwright script that will navigate to the website, search for a studio apartment
    in the $3000-$4000 range, and click the 'Search' button. Make sure that the playwright script is written in Python.
    Make sure that the generated playwright code uses the async playwright API i.e. (from playwright.async_api import async_playwright).
    Return the playwright script in markdown format like the following:
    ```playwright
    ...
    ```
    """
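    # Fill in the placeholders; without .format() the model receives the literal
    # {website}, {dom_content} and {a11y_tree} text instead of the page data.
    prompt = generate_playwright_script_prompt.format(
        website=website, dom_content=dom, a11y_tree=a11y_tree
    )
    return (await ollama.ainvoke(prompt)).content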
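Because the prompt is a plain string, its `{website}`, `{dom_content}` and `{a11y_tree}` placeholders are only substituted if `.format()` (or an equivalent) is called; in the first run of the output further down, the model remarks that it received "placeholder values". The page HTML is also large (over a million characters for one site in that output), which may not fit in the model's context window. The following is a minimal sketch of filling a trimmed-down template under the assumption that naive truncation is acceptable. `build_prompt`, `PROMPT_TEMPLATE` and `MAX_DOM_CHARS` are hypothetical names, and the 50,000-character cap is an arbitrary placeholder.

```python
import json
from typing import Any, Optional

# Hypothetical cap on how much page HTML is sent to the model; the right value
# depends on the model's context window.
MAX_DOM_CHARS = 50_000

PROMPT_TEMPLATE = """
Website: {website}
<dom_content>
{dom_content}
</dom_content>
<a11y_tree>
{a11y_tree}
</a11y_tree>
"""


def build_prompt(
    website: str,
    dom: str,
    a11y_tree: Optional[dict[str, Any]],
    max_dom_chars: int = MAX_DOM_CHARS,
) -> str:
    """Fill the template with a truncated DOM and a JSON rendering of the a11y tree."""
    dom_excerpt = dom[:max_dom_chars]
    # The accessibility snapshot is a dict (or None); serialise it readably.
    tree_text = json.dumps(a11y_tree or {}, indent=1)
    # str.format only interprets braces in the template itself, so braces that
    # appear inside the substituted HTML or JSON are inserted verbatim.
    return PROMPT_TEMPLATE.format(
        website=website, dom_content=dom_excerpt, a11y_tree=tree_text
    )
```

Cutting from the front keeps the `<head>` and early markup, which may not contain the search controls; stripping `<script>` and `<style>` elements first could make better use of the budget.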

## Python vs. JS Playwright
If the model has seen too few Python Playwright examples to generate working scripts, try the JS Playwright API instead.

In [None]:
async def main():
    playwright = await async_playwright().start()
    browser = await playwright.chromium.launch(headless=False)

    try:
        for website in config.websites:
            print(f"Navigating to {website}")
            page = await browser.new_page()

            try:
                await page.goto(website)
                dom_content = await page.content()
                print(f"DOM content length: {len(dom_content)} characters")

                # snapshot() may return None, so guard before calling .get()
                a11y_tree = await page.accessibility.snapshot()
                root_count = len((a11y_tree or {}).get("children", []))
                print(f"Accessibility tree obtained with {root_count} root elements")

                print("Generating Playwright script...")
                script = await generate_playwright_script(website, dom_content, a11y_tree)
                print(script)

                # Extract and execute the playwright script
                extracted_script = await extract_playwright_script(script)
                if extracted_script:
                    print("Executing generated Playwright script...")
                    try:
                        # Names the generated script can use to drive the already-open page
                        exec_globals = {
                            "page": page,
                            "asyncio": asyncio,
                            "playwright": playwright,
                        }
                        # TODO: Is there a more secure way to do this than exec?
                        # TODO: How to handle dynamic playwright execution?
                        # exec() is synchronous and returns None, so awaiting it raises a
                        # TypeError. It only runs module-level statements: any `async def`
                        # the script defines is never awaited, and an `asyncio.run(...)`
                        # call fails because the notebook's event loop is already running.
                        exec(extracted_script, exec_globals)
                        print("Script execution completed")
                    except Exception as script_error:
                        print(f"Error executing generated script: {script_error}")
                        print(f"Script execution stack trace:\n{traceback.format_exc()}")
                else:
                    print("Could not extract Playwright script from generated content")

            except Exception as e:
                print(f"Error processing {website}: {e}")
                print(f"Call stack:\n{traceback.format_exc()}")
            finally:
                # Close the page whether or not processing succeeded
                await page.close()

            # TODO: Implement exponential backoff for politeness when visiting
            await asyncio.sleep(2)
    finally:
        # Release the browser even if a page-level error escapes the loop
        await browser.close()
        await playwright.stop()


await asyncio.gather(main())

Navigating to https://streeteasy.com/for-rent/nyc
DOM content length: 1154000 characters
Accessibility tree obtained with 386 root elements
Generating Playwright script...
<think>
Alright, I need to generate a Playwright script using the async API in Python. The user wants to navigate to a website, search for studio apartments within $3000-$4000, and handle specific elements.

First, I'll start by importing the necessary modules from playwright.async_api. That includes async_playwright, Browser, BrowserContext, Page, FrameLocator, Locator, Download, Dialog, JSHandle, and APIResponse. Even though some might not be used immediately, it's safer to include them for potential future needs.

Next, I'll set up the basic structure using async def main() and await async_playwright(). This is standard practice when working with Playwright's async functionality.

The user provided placeholder values for the website URL and DOM content. Since I don't have the actual URL or DOM details, I'll use a 

  print(f"Script execution stack trace:\n{traceback.format_exc()}")


Navigating to https://www.apartments.com/new-york-ny/
DOM content length: 890649 characters
Accessibility tree obtained with 47 root elements
Generating Playwright script...
<think>
Okay, I need to generate a Python Playwright script using the async API. The task is to navigate to a website, search for a studio apartment priced between $3000 and $4000. Let me break down what needs to be done.

First, I'll start by importing the necessary modules. Since we're using async Playwright, I'll import async_playwright from playwright.async_api. I'll also need to use asyncio for handling asynchronous functions.

Next, I'll set up an async function, maybe called search_studio_apartment, where all the browser actions will take place inside a try block. Using async with async_playwright() as p ensures that Playwright is properly set up and torn down when done.

Within this function, I'll create a new browser context and a page. The page will navigate to the provided website using page.goto(). Then

  print(f"Script execution stack trace:\n{traceback.format_exc()}")


[None]
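The exponential-backoff TODO in `main()` could start from a helper like this one, which implements exponential backoff with "full jitter". This sketch assumes backoff is meant for consecutive failed or rate-limited visits to a site, with the fixed two-second pause kept as the baseline. `backoff_delay` and its `base` and `cap` defaults are hypothetical choices.

```python
import random


def backoff_delay(attempt: int, base: float = 2.0, cap: float = 60.0) -> float:
    """Seconds to wait before retry number `attempt` (0-based), with full jitter."""
    # Exponential growth of the upper bound: base, 2*base, 4*base, ... up to `cap`.
    ceiling = min(cap, base * (2 ** attempt))
    # Full jitter: choose uniformly in [0, ceiling] so retries do not line up.
    return random.uniform(0.0, ceiling)
```

In the loop this might be used as `await asyncio.sleep(backoff_delay(failures))`, where `failures` is a hypothetical per-site counter that resets after a successful visit.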