Conversation

@shivammittal274
Contributor

No description provided.

Contributor

@greptile-apps greptile-apps bot left a comment

Greptile Overview

Greptile Summary

This PR updates the system prompt for the Claude SDK agent to reflect an architectural shift from Chrome DevTools Protocol (CDP) page-based operations to BrowserOS controller tab-based operations. The key changes are:

  • The implicit page selection workflow (list_pages/select_page) is replaced with explicit tab ID management via browser_list_tabs/browser_switch_tab.
  • All CDP-style tools (take_snapshot, click, fill) are renamed to controller-style tools (browser_get_interactive_elements, browser_click_element, browser_type_text).
  • Element targeting switches from uid-based to nodeId-based.
  • tabId becomes a required parameter for most browser operations.

The updated prompt documents 15+ tools across categories including Tab Management, Navigation & Content, Interaction, Scrolling, and Advanced operations. This aligns the agent's understanding with the new controller-extension-based architecture, in which a browser extension communicates with the agent over WebSocket, replacing direct CDP control with a more explicit, tab-centric model.
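To make the renames in the summary concrete, here is a minimal TypeScript sketch of a lint step that flags tool calls still written in the old CDP style. The old-to-new pairing follows the order in which the summary lists the names. `LEGACY_TO_CONTROLLER` and `lintCall` are hypothetical helpers invented for illustration and are not part of the PR.

```typescript
// Old CDP-style tool name -> controller-style replacement, paired in the
// order the summary lists them. The pairing is an assumption drawn from
// that ordering, not from the actual tool definitions.
const LEGACY_TO_CONTROLLER: ReadonlyMap<string, string> = new Map([
  ["list_pages", "browser_list_tabs"],
  ["select_page", "browser_switch_tab"],
  ["take_snapshot", "browser_get_interactive_elements"],
  ["click", "browser_click_element"],
  ["fill", "browser_type_text"],
]);

// Hypothetical helper: returns a list of problems for one tool call.
function lintCall(tool: string, args: Record<string, unknown>): string[] {
  const problems: string[] = [];
  const replacement = LEGACY_TO_CONTROLLER.get(tool);
  if (replacement !== undefined) {
    problems.push(`"${tool}" is a CDP-style name; use "${replacement}"`);
  }
  if ("uid" in args) {
    // The summary says element targeting moved from uid to nodeId.
    problems.push(`"uid" targeting was replaced by "nodeId"`);
  }
  return problems;
}

// A legacy call trips both checks; a controller-style call trips neither.
console.log(lintCall("click", { uid: "example-uid" }));
console.log(lintCall("browser_click_element", { tabId: 1, nodeId: 5 }));
```

The sketch deliberately does not enforce the presence of `tabId`, because the summary says it is required for most operations rather than all of them.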

Important Files Changed

Changed Files

| Filename | Score | Overview |
|---|---|---|
| packages/agent/src/agent/ClaudeSDKAgent.prompt.ts | 4/5 | Updated Claude agent system prompt from CDP page-based workflow to controller tab-based workflow with explicit tab ID management and nodeId-based element targeting |

Confidence score: 4/5

  • This PR is generally safe to merge but requires verification that the documented tool signatures match actual implementations
  • Score reflects the critical dependency on alignment between this prompt and the actual tool implementations in packages/tools/src/controller-based/ - any mismatch will cause agent failures when Claude attempts to call tools with incorrect parameters or non-existent tool names
  • Pay close attention to packages/agent/src/agent/ClaudeSDKAgent.prompt.ts to ensure all documented tool signatures (especially parameter requirements like tabId being required vs optional) match the actual tool definitions in the controller-based tools package
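The verification asked for above could be sketched as a small comparison between the signatures the prompt documents and those the tools implement. The `ToolSignature` shape, the `findMismatches` helper, and the sample data are all hypothetical. The actual definitions in packages/tools/src/controller-based/ are not shown in this review and may be structured differently.

```typescript
// Hypothetical, simplified description of one tool's parameters.
interface ToolSignature {
  name: string;
  required: string[];
  optional: string[];
}

// Reports every place where the documented signature disagrees with the
// implemented one. An empty result means the two lists agree.
function findMismatches(
  documented: ToolSignature[],
  implemented: ToolSignature[],
): string[] {
  const actual = new Map<string, ToolSignature>();
  for (const tool of implemented) actual.set(tool.name, tool);

  const issues: string[] = [];
  for (const doc of documented) {
    const impl = actual.get(doc.name);
    if (impl === undefined) {
      issues.push(`${doc.name}: documented but not implemented`);
      continue;
    }
    for (const param of doc.required) {
      if (impl.optional.includes(param)) {
        issues.push(`${doc.name}: "${param}" documented as required but implemented as optional`);
      } else if (!impl.required.includes(param)) {
        issues.push(`${doc.name}: "${param}" documented but not implemented`);
      }
    }
    for (const param of impl.required) {
      if (!doc.required.includes(param)) {
        issues.push(`${doc.name}: "${param}" required by the implementation but not documented as required`);
      }
    }
  }
  return issues;
}

// Invented sample data, only to exercise the check.
const documentedTools: ToolSignature[] = [
  { name: "browser_click_element", required: ["tabId", "nodeId"], optional: [] },
  { name: "browser_send_keys", required: ["tabId", "key"], optional: [] },
];
const implementedTools: ToolSignature[] = [
  { name: "browser_click_element", required: ["nodeId"], optional: ["tabId"] },
];

const issues = findMismatches(documentedTools, implementedTools);
for (const issue of issues) console.log(issue);
```

With the invented data, the check reports the required-versus-optional disagreement on `tabId` and the documented tool that has no implementation, which are the two failure modes the review warns about.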

Sequence Diagram

sequenceDiagram
    participant User
    participant Agent as ClaudeSDKAgent
    participant Browser as BrowserOS
    
    User->>Agent: "Request browser automation task"
    
    rect rgb(240, 248, 255)
        Note over Agent,Browser: Tab Identification
        Agent->>Browser: "browser_list_tabs() or browser_get_active_tab()"
        Browser-->>Agent: "Return tab information with IDs"
        
        opt If different tab needed
            Agent->>Browser: "browser_switch_tab(tabId)"
            Browser-->>Agent: "Tab switched confirmation"
        end
    end
    
    rect rgb(255, 250, 240)
        Note over Agent,Browser: Navigation (if needed)
        Agent->>Browser: "browser_navigate(url, tabId)"
        Browser-->>Agent: "Navigation complete"
    end
    
    rect rgb(240, 255, 240)
        Note over Agent,Browser: Content Analysis
        Agent->>Browser: "browser_get_interactive_elements(tabId)"
        Browser-->>Agent: "Return elements with nodeIds"
        
        opt If visual context needed
            Agent->>Browser: "browser_get_screenshot(tabId)"
            Browser-->>Agent: "Return screenshot with bounding boxes"
        end
        
        opt If text extraction needed
            Agent->>Browser: "browser_get_page_content(tabId, type)"
            Browser-->>Agent: "Return page content"
        end
    end
    
    rect rgb(255, 240, 245)
        Note over Agent,Browser: Interaction
        alt Click action
            Agent->>Browser: "browser_click_element(tabId, nodeId)"
            Browser-->>Agent: "Click executed"
        else Type action
            Agent->>Browser: "browser_type_text(tabId, nodeId, text)"
            Browser-->>Agent: "Text typed"
        else Clear input
            Agent->>Browser: "browser_clear_input(tabId, nodeId)"
            Browser-->>Agent: "Input cleared"
        else Scroll to element
            Agent->>Browser: "browser_scroll_to_element(tabId, nodeId)"
            Browser-->>Agent: "Scrolled to element"
        end
    end
    
    rect rgb(245, 245, 245)
        Note over Agent,Browser: Advanced Operations
        opt Execute JavaScript
            Agent->>Browser: "browser_execute_javascript(tabId, code)"
            Browser-->>Agent: "Execution result"
        end
        
        opt Send keyboard keys
            Agent->>Browser: "browser_send_keys(tabId, key)"
            Browser-->>Agent: "Keys sent"
        end
    end
    
    Agent-->>User: "Return task results"
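
The flow in the diagram can be sketched in TypeScript as follows: identify the tab, navigate, read the interactive elements, then act on one by nodeId. The `BrowserTools` interface is a hypothetical synchronous facade written for illustration, with argument order taken from the diagram. The real tools are invoked by the agent over WebSocket and would be asynchronous. The result shapes and the `label` field are assumptions.

```typescript
// Hypothetical result shape; the real element data is not shown in this review.
interface InteractiveElement {
  nodeId: number;
  label: string;
}

// Hypothetical facade over four of the tools named in the diagram.
interface BrowserTools {
  browser_get_active_tab(): { tabId: number };
  browser_navigate(url: string, tabId: number): void;
  browser_get_interactive_elements(tabId: number): InteractiveElement[];
  browser_click_element(tabId: number, nodeId: number): void;
}

// Tab identification -> navigation -> content analysis -> interaction.
// Returns false when no element with the given label is found.
function clickByLabel(tools: BrowserTools, url: string, label: string): boolean {
  const { tabId } = tools.browser_get_active_tab();
  tools.browser_navigate(url, tabId);
  const elements = tools.browser_get_interactive_elements(tabId);
  const target = elements.find((element) => element.label === label);
  if (target === undefined) return false;
  tools.browser_click_element(tabId, target.nodeId);
  return true;
}

// In-memory fake that records each call, so the ordering can be inspected.
const callLog: string[] = [];
const fakeTools: BrowserTools = {
  browser_get_active_tab: () => {
    callLog.push("browser_get_active_tab()");
    return { tabId: 7 };
  },
  browser_navigate: (url, tabId) => {
    callLog.push(`browser_navigate(${url}, ${tabId})`);
  },
  browser_get_interactive_elements: (tabId) => {
    callLog.push(`browser_get_interactive_elements(${tabId})`);
    return [
      { nodeId: 11, label: "Cancel" },
      { nodeId: 12, label: "Submit" },
    ];
  },
  browser_click_element: (tabId, nodeId) => {
    callLog.push(`browser_click_element(${tabId}, ${nodeId})`);
  },
};

const clicked = clickByLabel(fakeTools, "https://example.com", "Submit");
console.log(clicked, callLog);
```

Every call after tab identification carries the same explicit tabId, which is the main behavioral difference from the implicit page selection of the CDP-based workflow.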

Context used:

  • Context from dashboard - CLAUDE.md (source)

1 file reviewed, no comments

@shadowfax92 shadowfax92 merged commit 007aa91 into main Oct 22, 2025
1 check failed

3 participants