[Claimed #1892] Support gpt 5.4 cua upd #2022
This mirrored PR has been merged into |
2 issues found across 5 files
Confidence score: 3/5
- There is moderate merge risk because both findings are medium severity (5–6/10) with high confidence (8–9/10), indicating a likely policy and maintainability regression rather than a cosmetic issue.
- In `packages/core/lib/v3/agent/OpenAICUAClient.ts`, adding a hardcoded `gpt-5` model-name gate can block valid models and create user-facing failures as model catalogs evolve.
- In `packages/core/lib/v3/types/public/agent.ts`, extending a hardcoded CUA allowlist repeats the same anti-pattern, increasing the chance of drift and inconsistent model acceptance behavior.
- Pay close attention to `packages/core/lib/v3/agent/OpenAICUAClient.ts` and `packages/core/lib/v3/types/public/agent.ts`: hardcoded allowed-model checks may cause avoidable model rejection and future regressions.
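One possible direction for addressing both findings, sketched under assumptions: instead of matching on model names, the client could read a small capability description that the caller or provider configuration supplies, and fall back to the legacy flow when nothing is specified. The names `CuaCapabilities`, `LEGACY_DEFAULTS`, and `resolveCuaCapabilities` are hypothetical and do not exist in the repository.

```typescript
// Hypothetical sketch: none of these names are part of the actual codebase.
type CuaToolType = "computer" | "computer_use_preview";

interface CuaCapabilities {
  toolType: CuaToolType;
  supportsBatchedActions: boolean;
}

// Defaults describe the legacy flow, so an unknown model is never rejected outright.
const LEGACY_DEFAULTS: CuaCapabilities = {
  toolType: "computer_use_preview",
  supportsBatchedActions: false,
};

// Callers pass explicit overrides; no model-name string is inspected here.
function resolveCuaCapabilities(
  overrides: Partial<CuaCapabilities> = {},
): CuaCapabilities {
  return { ...LEGACY_DEFAULTS, ...overrides };
}
```

With this shape, a new model would opt into the newer tool through configuration, for example `resolveCuaCapabilities({ toolType: "computer", supportsBatchedActions: true })`, rather than through a code change to an allowlist.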
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="packages/core/lib/v3/types/public/agent.ts">
<violation number="1" location="packages/core/lib/v3/types/public/agent.ts:452">
P2: Custom agent: **Ensure we never check against hardcoded lists of allowed LLM model names**
New code adds another hardcoded allowed model name to CUA allowlist, violating the rule to avoid hardcoded allowed-model checks.</violation>
</file>
<file name="packages/core/lib/v3/agent/OpenAICUAClient.ts">
<violation number="1" location="packages/core/lib/v3/agent/OpenAICUAClient.ts:60">
P2: Custom agent: **Ensure we never check against hardcoded lists of allowed LLM model names**
New code hardcodes a `gpt-5` model-name gate, violating the rule against hardcoded allowed-model checks.</violation>
</file>
Architecture diagram
sequenceDiagram
participant App as User Application
participant Prov as AgentProvider
participant Client as OpenAICUAClient
participant API as OpenAI API
participant Page as Browser Page
Note over App,Prov: Initialization
App->>Prov: Request agent (model: "gpt-5.4")
Prov->>Prov: NEW: Map gpt-5.4 to "openai" provider
Prov-->>Client: Create OpenAICUAClient
Note over Client,API: Execution Loop
App->>Client: execute(instruction)
loop Agent Steps
Client->>Client: NEW: check usesNewComputerTool (model ~ gpt-5)
Client->>API: Send history + tools
Note right of Client: NEW: uses "computer" tool for gpt-5<br/>Legacy: uses "computer_use_preview"
API-->>Client: Return tool call (ComputerCallItem)
alt NEW: Batched Actions (GPT-5.4)
loop for action in actions[]
Client->>Page: executeAction(action)
Page-->>Client: success/fail
end
else Legacy Single Action
Client->>Page: executeAction(action)
Page-->>Client: success/fail
end
Client->>Page: captureScreenshot()
Page-->>Client: base64 string
alt NEW: GPT-5.4 Response Format
Client->>Client: Build "computer_screenshot" output
Note over Client: Sets detail: "original"
else Legacy Response Format
Client->>Client: Build "input_image" output
Note over Client: CHANGED: current_url excluded from new tool format
end
Client->>API: POST tool_outputs (screenshot + call_id)
end
Client-->>App: Return final result message
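A single iteration of the loop in the diagram can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the `Page` interface, `collectActions`, and `runComputerCall` are hypothetical names, and the field shapes (including where `current_url` sits on the legacy output) are assumptions based only on the labels in the diagram.

```typescript
// Illustrative types only; field names follow the diagram, not a verified API schema.
type Action = { type: string; [key: string]: unknown };

interface ComputerCallItem {
  call_id: string;
  action?: Action; // legacy single action
  actions?: Action[]; // batched actions (GPT-5.4 path in the diagram)
}

// Hypothetical stand-in for the browser page used by the client.
interface Page {
  executeAction(action: Action): Promise<boolean>;
  captureScreenshot(): Promise<string>; // base64-encoded image
  url(): string;
}

interface CallOutput {
  type: "computer_call_output";
  call_id: string;
  output: {
    type: "computer_screenshot" | "input_image";
    image_url: string;
    detail?: string;
  };
  current_url?: string; // assumed placement; legacy format only
}

// Normalize both response shapes into one list so the executor has a single path.
function collectActions(item: ComputerCallItem): Action[] {
  if (item.actions && item.actions.length > 0) return item.actions;
  return item.action ? [item.action] : [];
}

async function runComputerCall(
  item: ComputerCallItem,
  page: Page,
  usesNewComputerTool: boolean,
): Promise<CallOutput> {
  // Execute every action in order, then take one screenshot for the whole batch.
  for (const action of collectActions(item)) {
    await page.executeAction(action);
  }
  const imageUrl = `data:image/png;base64,${await page.captureScreenshot()}`;

  if (usesNewComputerTool) {
    return {
      type: "computer_call_output",
      call_id: item.call_id,
      output: { type: "computer_screenshot", image_url: imageUrl, detail: "original" },
    };
  }
  return {
    type: "computer_call_output",
    call_id: item.call_id,
    output: { type: "input_image", image_url: imageUrl },
    current_url: page.url(),
  };
}
```

Normalizing `action` and `actions[]` up front keeps the legacy and batched branches from duplicating the execution logic; only the output format differs between them.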
pirate
left a comment
this is kinda gnarly, I wish aisdk would handle differences like this for us but 🤷
Mirrored from external contributor PR #1892 after approval by @miguelg719.
Original author: @alexcarv318
Original PR: #1892
Approved source head SHA: `0c6031748e17eebfd6ef570e15c13fd1f4d25022`
@alexcarv318, please continue any follow-up discussion on this mirrored PR. When the external PR gets new commits, this same internal PR will be marked stale until the latest external commit is approved and refreshed here.
Original description
why
The original GPT-5.4 native CUA support work from #1792 could not be merged cleanly due to conflicts with current `main`. This PR carries that work forward on top of the latest code so the feature can be reviewed and merged.
All implementation credit goes to @Kylejeong2. I only resolved merge conflicts and migrated the changes onto the current branch.
what changed
- … `main`.
- … `OpenAICUAClient` while preserving both: …
- … `computer_call.actions` handling.
test plan
- `corepack pnpm install`
- `corepack pnpm run test:core -- packages/core/dist/esm/tests/unit/public-api/llm-and-agents.test.js`
- `corepack pnpm --filter @browserbasehq/stagehand run example -- gpt54-cua-example` (… `OPENAI_API_KEY` for full runtime success)
Summary by cubic
Adds native Computer Use for OpenAI `gpt-5.4` using the new `computer` tool with batched actions, while keeping the legacy `computer_use_preview` flow for compatibility.
New Features
- … `gpt-5.4` to the OpenAI provider and add `openai/gpt-5.4` to `AVAILABLE_CUA_MODELS`.
- … `gpt-5.x`, accept `action` or `actions[]`, execute all in a batch, and reply with `computer_screenshot` (with `detail`).
- … `action`, `input_image` outputs, and `current_url` on outputs.
- … `gpt5-4-cua-example.ts` and update types/tests for batched actions and `computer_screenshot` outputs.
Bug Fixes
- … `computer_call`/batch.
Written for commit 610cbc7. Summary will update on new commits.