Merged
86 changes: 86 additions & 0 deletions agent-server/README.md
@@ -210,6 +210,92 @@ Get HTML or text content of a page.
}
```

#### `POST /page/execute`

Execute JavaScript code in the context of a specific browser tab via Chrome DevTools Protocol.

**Request:**
```json
{
"clientId": "9907fd8d-92a8-4a6a-bce9-458ec8c57306",
"tabId": "482D56EE57B1931A3B9D1BFDAF935429",
"expression": "document.title",
"returnByValue": true,
"awaitPromise": false
}
```

**Parameters:**
- `clientId` (required): The client ID from `/v1/responses` metadata
- `tabId` (required): The tab ID from `/v1/responses` metadata
- `expression` (required): JavaScript code to execute (string)
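- `returnByValue` (optional, default: `true`): When `true`, the result is serialized and returned by value; when `false`, CDP returns a remote object reference instead
- `awaitPromise` (optional, default: `false`): When `true` and the expression evaluates to a Promise, wait for it to settle and return the resolved value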
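
As a minimal sketch of how a caller might assemble this request body, the helper below applies the documented defaults and rejects missing required fields. The name `buildExecuteRequest` is hypothetical and not part of the server; only the field names and defaults come from the parameter list above.

```javascript
// Hypothetical client-side helper; not part of agent-server.
function buildExecuteRequest({ clientId, tabId, expression, returnByValue = true, awaitPromise = false }) {
  // clientId, tabId and expression are the three required fields.
  for (const [name, value] of Object.entries({ clientId, tabId, expression })) {
    if (typeof value !== 'string' || value.length === 0) {
      throw new Error(`${name} is required and must be a non-empty string`);
    }
  }
  return { clientId, tabId, expression, returnByValue, awaitPromise };
}

const body = buildExecuteRequest({
  clientId: '9907fd8d-92a8-4a6a-bce9-458ec8c57306',
  tabId: '482D56EE57B1931A3B9D1BFDAF935429',
  expression: 'document.title',
});
console.log(JSON.stringify(body, null, 2));
```

Spelling out the defaults on the client side keeps the request explicit, at the cost of having to track any future change to the server defaults.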

**Response:**
```json
{
"clientId": "9907fd8d-92a8-4a6a-bce9-458ec8c57306",
"tabId": "482D56EE57B1931A3B9D1BFDAF935429",
"result": {
"type": "string",
"value": "Example Page Title"
},
"exceptionDetails": null,
"timestamp": 1234567890
}
```

**Response Fields:**
- `clientId`: Base client ID (without tab suffix)
- `tabId`: The tab ID where JavaScript was executed
- `result`: CDP `Runtime.evaluate` result object containing:
  - `type`: Result type (string, number, object, etc.)
  - `value`: The actual value (if `returnByValue: true`)
- `exceptionDetails`: Error details if execution failed, otherwise `null`
- `timestamp`: Unix timestamp in milliseconds
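
The fields above suggest a simple pattern for consumers: check `exceptionDetails` first, then read `result.value`. The sketch below assumes the response shape documented here; `unwrapExecuteResponse` is a hypothetical name, and the `text` and `exception.description` properties are taken from CDP's `ExceptionDetails` type rather than from this README.

```javascript
// Hypothetical helper for consuming a /page/execute response body.
function unwrapExecuteResponse(response) {
  const { result, exceptionDetails } = response;
  if (exceptionDetails) {
    // CDP ExceptionDetails has a short `text` and, optionally, an `exception`
    // RemoteObject whose `description` usually holds the message and stack.
    const detail = exceptionDetails.exception?.description ?? exceptionDetails.text ?? 'unknown error';
    throw new Error(`Page script failed: ${detail}`);
  }
  // `value` is only populated when the request used returnByValue: true.
  return result?.value;
}

const ok = {
  clientId: '9907fd8d-92a8-4a6a-bce9-458ec8c57306',
  tabId: '482D56EE57B1931A3B9D1BFDAF935429',
  result: { type: 'string', value: 'Example Page Title' },
  exceptionDetails: null,
  timestamp: 1234567890,
};
console.log(unwrapExecuteResponse(ok)); // prints "Example Page Title"
```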

**Example Usage:**

```bash
# Get page title
curl -X POST http://localhost:8080/page/execute \
-H "Content-Type: application/json" \
-d '{
"clientId": "9907fd8d-92a8-4a6a-bce9-458ec8c57306",
"tabId": "482D56EE57B1931A3B9D1BFDAF935429",
"expression": "document.title"
}'

# Count elements
curl -X POST http://localhost:8080/page/execute \
-H "Content-Type: application/json" \
-d '{
"clientId": "9907fd8d-92a8-4a6a-bce9-458ec8c57306",
"tabId": "482D56EE57B1931A3B9D1BFDAF935429",
"expression": "document.querySelectorAll(\"button\").length"
}'

# Execute async code and wait for the returned Promise to resolve
curl -X POST http://localhost:8080/page/execute \
-H "Content-Type: application/json" \
-d '{
"clientId": "9907fd8d-92a8-4a6a-bce9-458ec8c57306",
"tabId": "482D56EE57B1931A3B9D1BFDAF935429",
"expression": "fetch(\"https://api.example.com/data\").then(r => r.json())",
"awaitPromise": true
}'
```
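
For callers written in JavaScript, the same request can be sketched with `fetch` (built into Node 18+). This is an illustration rather than a shipped client: `executeInPage` and its injectable `fetchImpl` parameter are hypothetical, and the base URL is the one used in the curl examples above.

```javascript
// Hypothetical wrapper around POST /page/execute.
// `fetchImpl` is injectable so the function can be exercised without a server.
async function executeInPage(baseUrl, body, fetchImpl = globalThis.fetch) {
  const res = await fetchImpl(`${baseUrl}/page/execute`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(body),
  });
  if (!res.ok) {
    throw new Error(`/page/execute returned HTTP ${res.status}`);
  }
  return res.json();
}

// Usage (needs a running server, so it is left commented out):
// executeInPage('http://localhost:8080', {
//   clientId: '9907fd8d-92a8-4a6a-bce9-458ec8c57306',
//   tabId: '482D56EE57B1931A3B9D1BFDAF935429',
//   expression: 'document.querySelectorAll("button").length',
// }).then((r) => console.log(r.result.value));
```

Because the body goes through `JSON.stringify`, quotes inside `expression` do not need the manual `\"` escaping that the shell examples require.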

**Use Cases:**
- Extract specific data from the page (e.g., element counts, text content)
- Verify JavaScript state/variables for evaluations
- Check DOM state programmatically
- Execute custom validation logic
- Interact with page APIs directly

This endpoint complements `/page/content` by allowing precise JavaScript execution rather than just fetching full HTML/text content.
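
For custom validation logic that is longer than a one-liner, writing the `expression` string by hand gets awkward. One possible approach, sketched below, is to write the page-side logic as a normal function and serialize it into an immediately invoked expression. `toExpression` is a hypothetical helper; it assumes the function does not close over outer variables and that its arguments are JSON-serializable.

```javascript
// Hypothetical helper: turn a function plus JSON-serializable arguments
// into a string suitable for the `expression` field.
function toExpression(fn, ...args) {
  const serializedArgs = args.map((arg) => JSON.stringify(arg)).join(', ');
  return `(${fn.toString()})(${serializedArgs})`;
}

// Page-side logic. `document` is resolved inside the tab, never in Node,
// because the function is only stringified here and not called.
const countElements = (selector) => document.querySelectorAll(selector).length;

const expression = toExpression(countElements, 'button');
console.log(expression);
```

The resulting string can be sent as `expression` unchanged; if the function is `async`, set `awaitPromise: true` so the resolved value is returned.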

#### `POST /tabs/open`

Open a new browser tab.
120 changes: 118 additions & 2 deletions agent-server/nodejs/CLAUDE.md
@@ -39,7 +39,7 @@ The eval-server is a **thin HTTP API wrapper for Browser Operator**. It provides
### HTTP API Server (src/api-server.js)
- Exposes REST endpoints for external callers (e.g., Python evals)
- Main endpoint: `POST /v1/responses` - Send task to agent
-- CDP endpoints: screenshot, page content, tab management
+- CDP endpoints: screenshot, page content, JavaScript execution, tab management
- Returns metadata (clientId, tabId) for subsequent operations

### RPC Client (src/rpc-client.js)
@@ -57,6 +57,7 @@ The eval-server is a **thin HTTP API wrapper for Browser Operator**. It provides
- Direct Chrome DevTools Protocol communication
- Screenshot capture via `Page.captureScreenshot`
- Page content access via `Runtime.evaluate`
- JavaScript execution via `Runtime.evaluate` (with configurable options)
- Tab management via `Target.createTarget` / `Target.closeTarget`

### Logger (src/logger.js)
@@ -208,10 +209,124 @@ Get HTML or text content of a page.
{
"clientId": "9907fd8d-92a8-4a6a-bce9-458ec8c57306",
"tabId": "482D56EE57B1931A3B9D1BFDAF935429",
-"format": "html"
+"format": "html",
+"includeIframes": true
}
```

**Parameters:**
- `clientId` (required): The client ID from `/v1/responses` metadata
- `tabId` (required): The tab ID from `/v1/responses` metadata
- `format` (optional, default: `"html"`): Content format - either `"html"` or `"text"`
- `includeIframes` (optional, default: `false`): Whether to include HTML content from iframes. When `true`, recursively captures content from all iframe elements on the page.

**Response:**
```json
{
"clientId": "9907fd8d-92a8-4a6a-bce9-458ec8c57306",
"tabId": "482D56EE57B1931A3B9D1BFDAF935429",
"content": "<html>...</html>",
"format": "html",
"length": 12345,
"frameCount": 3,
"timestamp": 1234567890
}
```

**Response fields:**
- `frameCount` (number, optional): Number of frames included in the content. Only present when `includeIframes: true` is used.
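
Since `frameCount` is optional, consumers should not assume it exists. A minimal sketch of reading the response defensively is shown below; `describePageContent` is a hypothetical helper, and the sample objects reuse the values from the response example above.

```javascript
// Hypothetical helper for summarizing a /page/content response.
function describePageContent(response) {
  const { format, length, frameCount } = response;
  // frameCount is absent unless the request set includeIframes: true.
  const frames = frameCount === undefined
    ? 'iframes not captured'
    : `${frameCount} frame(s) captured`;
  return `${format}, length ${length}, ${frames}`;
}

console.log(describePageContent({ format: 'html', length: 12345, frameCount: 3 }));
console.log(describePageContent({ format: 'text', length: 800 }));
```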

### POST /page/execute

Execute JavaScript code in the context of a specific browser tab via Chrome DevTools Protocol.

**Request:**
```json
{
"clientId": "9907fd8d-92a8-4a6a-bce9-458ec8c57306",
"tabId": "482D56EE57B1931A3B9D1BFDAF935429",
"expression": "document.title",
"returnByValue": true,
"awaitPromise": false
}
```

**Parameters:**
- `clientId` (required): The client ID from `/v1/responses` metadata
- `tabId` (required): The tab ID from `/v1/responses` metadata
- `expression` (required): JavaScript code to execute (string)
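- `returnByValue` (optional, default: `true`): When `true`, the result is serialized and returned by value; when `false`, CDP returns a remote object reference instead
- `awaitPromise` (optional, default: `false`): When `true` and the expression evaluates to a Promise, wait for it to settle and return the resolved value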

**Response:**
```json
{
"clientId": "9907fd8d-92a8-4a6a-bce9-458ec8c57306",
"tabId": "482D56EE57B1931A3B9D1BFDAF935429",
"result": {
"type": "string",
"value": "Example Page Title"
},
"exceptionDetails": null,
"timestamp": 1234567890
}
```

**Response Fields:**
- `clientId`: Base client ID (without tab suffix)
- `tabId`: The tab ID where JavaScript was executed
- `result`: CDP `Runtime.evaluate` result object containing:
  - `type`: Result type (string, number, object, etc.)
  - `value`: The actual value (if `returnByValue: true`)
- `exceptionDetails`: Error details if execution failed, otherwise `null`
- `timestamp`: Unix timestamp in milliseconds

**Implementation:**
- Uses CDP `Runtime.evaluate` via `browserAgentServer.evaluateJavaScript()`
- Executes code in the page's main JavaScript context
- The first 100 characters of the expression are logged for debugging
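
The handler itself is not shown in this section, so the following is only a sketch of the steps listed above, modeled on the validation style of `getPageContent` in `src/api-server.js`. `prepareExecuteCall`, the `Tab ID` and `Expression` error messages, and the exact arguments passed on to `evaluateJavaScript()` are assumptions; the `cdpParams` keys are the standard `Runtime.evaluate` parameter names.

```javascript
// Sketch only: the real handler in api-server.js may differ.
function prepareExecuteCall(payload) {
  const { clientId, tabId, expression, returnByValue = true, awaitPromise = false } = payload;

  if (!clientId) throw new Error('Client ID is required');
  if (!tabId) throw new Error('Tab ID is required'); // assumed message
  if (typeof expression !== 'string' || expression === '') {
    throw new Error('Expression is required'); // assumed message
  }

  return {
    // Strip any ":tab" suffix, as getPageContent does.
    baseClientId: clientId.split(':')[0],
    tabId,
    // Only this truncated preview would be written to the log.
    logPreview: expression.substring(0, 100),
    // Parameters for the CDP Runtime.evaluate command.
    cdpParams: { expression, returnByValue, awaitPromise },
  };
}

const call = prepareExecuteCall({
  clientId: '9907fd8d-92a8-4a6a-bce9-458ec8c57306:482D56EE57B1931A3B9D1BFDAF935429',
  tabId: '482D56EE57B1931A3B9D1BFDAF935429',
  expression: 'document.title',
});
console.log(call.baseClientId); // prints "9907fd8d-92a8-4a6a-bce9-458ec8c57306"
```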

**Example Usage:**

```bash
# Get page title
curl -X POST http://localhost:8080/page/execute \
-H "Content-Type: application/json" \
-d '{
"clientId": "9907fd8d-92a8-4a6a-bce9-458ec8c57306",
"tabId": "482D56EE57B1931A3B9D1BFDAF935429",
"expression": "document.title"
}'

# Count elements
curl -X POST http://localhost:8080/page/execute \
-H "Content-Type: application/json" \
-d '{
"clientId": "9907fd8d-92a8-4a6a-bce9-458ec8c57306",
"tabId": "482D56EE57B1931A3B9D1BFDAF935429",
"expression": "document.querySelectorAll(\"button\").length"
}'

# Execute async code and wait for the returned Promise to resolve
curl -X POST http://localhost:8080/page/execute \
-H "Content-Type: application/json" \
-d '{
"clientId": "9907fd8d-92a8-4a6a-bce9-458ec8c57306",
"tabId": "482D56EE57B1931A3B9D1BFDAF935429",
"expression": "fetch(\"https://api.example.com/data\").then(r => r.json())",
"awaitPromise": true
}'
```

**Use Cases:**
- Extract specific data from the page (e.g., element counts, text content)
- Verify JavaScript state/variables for evaluations
- Check DOM state programmatically
- Execute custom validation logic
- Interact with page APIs directly

This endpoint complements `/page/content` by allowing precise JavaScript execution rather than just fetching full HTML/text content.

### POST /tabs/open, POST /tabs/close

Tab management via CDP.
@@ -412,5 +527,6 @@ Removed dependencies:
- ✅ HTTP REST API endpoints
- ✅ CDP screenshot capture
- ✅ CDP page content retrieval
- ✅ CDP JavaScript execution
- ✅ CDP tab management
- ✅ Return metadata (clientId, tabId) for screenshot capture
17 changes: 12 additions & 5 deletions agent-server/nodejs/src/api-server.js
@@ -284,7 +284,7 @@ class APIServer {
}

async getPageContent(payload) {
-const { clientId, tabId, format = 'html' } = payload;
+const { clientId, tabId, format = 'html', includeIframes = false } = payload;

if (!clientId) {
throw new Error('Client ID is required');
@@ -300,21 +300,28 @@ class APIServer {

const baseClientId = clientId.split(':')[0];

-logger.info('Getting page content', { baseClientId, tabId, format });
+logger.info('Getting page content', { baseClientId, tabId, format, includeIframes });

// Call appropriate method based on format
const result = format === 'html'
-? await this.browserAgentServer.getPageHTML(tabId)
-: await this.browserAgentServer.getPageText(tabId);
+? await this.browserAgentServer.getPageHTML(tabId, { includeIframes })
+: await this.browserAgentServer.getPageText(tabId, { includeIframes });

-return {
+const response = {
clientId: baseClientId,
tabId: result.tabId,
content: result.content,
format: result.format,
length: result.length,
timestamp: Date.now()
};
+
+// Include frame count if iframes were captured
+if (result.frameCount !== undefined) {
+response.frameCount = result.frameCount;
+}
+
+return response;
}

async getScreenshot(payload) {