feat: expose OpenWork UI control plane and MCP bridge #1638
Closed
benjaminshafii wants to merge 9 commits into dev from
Conversation
…ion, and inline transcript panel

Add app-native voice control via OpenAI Realtime WebRTC so users can drive visible UI actions hands-free through microphone input.

- Provider-neutral control surface (window.__openworkControl) with snapshot, listActions, execute, setEnabled, and subscribe
- OpenAI Realtime WebRTC bridge with mic input, server VAD, text output, and tool calling (snapshot, list_actions, execute_action, set_input, list_sessions, open_session)
- Server endpoint POST /remote/session mints ephemeral client secrets with the key from the env store; no secrets in the browser
- Feature Preview settings tab with Realtime toggle, OpenAI key entry, mic selector/test, and transcript panel toggle
- Inline right-side voice transcript pane (not an overlay) showing user speech, assistant responses, and tool-call lifecycle
- Session list/open control actions so voice can navigate by name
- Electron mic-permission plumbing and macOS entitlements
- Stale mic device fallback (OverconstrainedError → system default)
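The `window.__openworkControl` surface described above might look roughly like this. Only the five method names come from the PR description; the `ControlAction` and `Snapshot` shapes are illustrative:

```typescript
// Hypothetical shapes; only the method names (snapshot, listActions,
// execute, setEnabled, subscribe) appear in the PR description.
type ControlAction = {
  id: string; // e.g. "composer.send"
  description: string;
  run: (args?: Record<string, unknown>) => Promise<unknown> | unknown;
};

type Snapshot = { enabled: boolean; actions: string[] };

class OpenworkControl {
  private actions = new Map<string, ControlAction>();
  private listeners = new Set<(s: Snapshot) => void>();
  private enabled = true;

  register(action: ControlAction): void {
    this.actions.set(action.id, action);
    this.notify();
  }

  snapshot(): Snapshot {
    return { enabled: this.enabled, actions: Array.from(this.actions.keys()) };
  }

  listActions(): ControlAction[] {
    return Array.from(this.actions.values());
  }

  async execute(id: string, args?: Record<string, unknown>): Promise<unknown> {
    if (!this.enabled) throw new Error("control surface disabled");
    const action = this.actions.get(id);
    if (!action) throw new Error(`unknown action: ${id}`);
    return action.run(args);
  }

  setEnabled(value: boolean): void {
    this.enabled = value;
    this.notify();
  }

  // Returns an unsubscribe function.
  subscribe(fn: (s: Snapshot) => void): () => void {
    this.listeners.add(fn);
    return () => this.listeners.delete(fn);
  }

  private notify(): void {
    for (const fn of this.listeners) fn(this.snapshot());
  }
}
```

Keeping the registry provider-neutral like this is what lets the Realtime driver (or any other client) stay a thin layer on top.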
The following comment was generated by an LLM and may be inaccurate:
Add voice-accessible controls for renaming and deleting sessions, scrolling the current session to the top or bottom, and reading the latest visible message. Extend the Realtime tool surface so the model can use these actions directly while requiring explicit confirmation for deletion.
Reorganize the realtime voice-control PR so the generic OpenWork control surface lives independently from the OpenAI Realtime driver. Move session-owned control actions into the session domain, move the OpenAI browser driver and activity/status UI into a driver folder, and move backend Realtime session/tool setup out of server.ts.
…tatus bar, better mic test
Activity panel:
- Rename header from "Voice" to "Control" (generic surface, not voice-specific)
- Replace colored role bubbles with softer tints: structure before effects
- Add proper role labels ("You", "Assistant", "Tool") instead of raw role names
- Add relative timestamps on entries ("now", "12s ago", etc.)
- Add pending-dot animation for in-flight entries
- Add dismiss (X) button to hide the panel inline
- Add empty-state icon + descriptive copy
- Reduce width from 300px to 280px for tighter proportion
- Remove shell shadow on the aside (flat-first per DESIGN-LANGUAGE.md)
Status bar control:
- Replace round pill + text label with minimal icon-only button
- Show compact state text ("Listening", "Connecting…", "Error") without truncation
- Use MicOff icon for disconnect affordance
- Remove background fills; use text color only for state (flatter)
Feature Preview settings:
- Thinner mic level bar (1.5px → cleaner)
- Color-coded level: gray idle → accent low → green strong
- Show numeric percentage during test
- Remove Volume2 icon from test description
- Tighter copy for mic test prompt
…ession

The voice controller could list/open/rename sessions but couldn't read the content of the currently active session. "What's the last message?" would fail because the model didn't know it had access.

Changes:
- Add session.read_transcript control action (returns the last N messages as readable text with session ID, title, and message count)
- Add read_transcript tool to the OpenAI Realtime tool schema
- Add controller handler for read_transcript dispatching to the action
- Improve system instructions: tell the model it CAN see session content and should always call read_transcript/get_latest_message before saying it cannot see the session
- Better tool label for transcript reads in the activity panel
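A sketch of how `session.read_transcript` might format its return value. The message shape and helper name are illustrative; only the returned fields (session ID, title, message count, last N messages as readable text) come from the commit:

```typescript
// Hypothetical message shape for illustration.
type TranscriptMessage = { role: "user" | "assistant"; text: string };

function readTranscript(
  session: { id: string; title: string; messages: TranscriptMessage[] },
  lastN = 10,
): string {
  // Take only the most recent N messages so the model gets a bounded context.
  const recent = session.messages.slice(-lastN);
  const lines = recent.map(
    (m) => `${m.role === "user" ? "You" : "Assistant"}: ${m.text}`,
  );
  return [
    `Session ${session.id} ("${session.title}", ${session.messages.length} messages)`,
    ...lines,
  ].join("\n");
}
```

Returning plain readable text rather than structured JSON matches the goal here: the Realtime model only needs something it can read aloud or summarize.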
…on composer
When the user says something like "tell them I'll be there at 3" or
"reply that looks good", the intent is to type and send that as a message
in the active OpenWork session — not to get a response from the voice
controller itself.
Add REPLY INTENT instructions that tell the model to:
1. read_transcript to understand the on-screen conversation
2. compose the reply from the user's spoken words
3. set_input → composer.set_text with the reply
4. execute_action → composer.send
Direct commands to the controller ("list sessions", "open settings")
still get handled directly. When ambiguous, default to treating spoken
input as a session reply — that's the most common intent when the user
is looking at a conversation.
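The routing rule above can be illustrated with a toy classifier. In the PR this logic lives in the model's system instructions, not in application code, and the cue lists here are invented for illustration:

```typescript
type Intent = "controller_command" | "session_reply";

// Illustrative cue phrases only; the real decision is made by the model.
const COMMAND_CUES = ["list sessions", "open settings", "open session", "rename"];
const REPLY_CUES = ["tell them", "reply", "say that", "respond"];

function classify(utterance: string): Intent {
  const text = utterance.toLowerCase();
  if (COMMAND_CUES.some((c) => text.includes(c))) return "controller_command";
  if (REPLY_CUES.some((c) => text.includes(c))) return "session_reply";
  // Ambiguous input defaults to a session reply, matching the PR's guidance
  // that this is the most common intent when a conversation is on screen.
  return "session_reply";
}
```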
New standalone Electron menubar app at apps/pilot/ that controls macOS via voice. Pilot is the top-level control surface; OpenWork and other apps are connectable targets.

What's included:
- Electron main process: menubar tray, floating always-on-top panel, global hotkeys (⌘⇧; toggle panel, ⌘⇧L toggle listening)
- System control via AppleScript IPC:
  - list/activate/launch apps
  - frontmost-app detection
  - keystroke/key-combo injection
  - clipboard read/write
  - open URL
- Preload bridge: window.__PILOT__.system.* for the UI and the future Realtime driver
- Floating panel UI: dark vibrancy glass, transcript area, status, mic button, empty state with hotkey hints
- macOS entitlements: microphone + AppleScript automation
- LSUIElement: true (no dock icon, menubar-only)
- electron-builder config for packaging

Verified: panel shows, detects the frontmost app via AppleScript, counts 18 running apps. System IPC bridge functional.

Next: wire up the OpenAI Realtime driver with system tools, add an OpenWork app-connector protocol.
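The AppleScript IPC calls reduce to shelling out to `osascript`. A minimal sketch of the main-process side, assuming a `frontmostApp` helper name that is not in the PR text:

```typescript
import { execFile } from "node:child_process";

// Build osascript arguments for an inline AppleScript; kept pure so the
// command construction is easy to test separately from execution.
function osascriptArgs(script: string): string[] {
  return ["-e", script];
}

// Hypothetical helper behind an ipcMain handler; runs only on macOS and
// requires the AppleScript-automation entitlement mentioned above.
function frontmostApp(): Promise<string> {
  const script =
    'tell application "System Events" to get name of first process whose frontmost is true';
  return new Promise((resolve, reject) => {
    execFile("osascript", osascriptArgs(script), (err, stdout) =>
      err ? reject(err) : resolve(stdout.trim()),
    );
  });
}
```

Keystroke injection, clipboard access, and app launching follow the same pattern with different one-line scripts.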
Pilot now owns the Realtime voice driver as the standalone macOS control app.

What's included:
- Main-process OpenAI Realtime session creation with local API key persistence so long-lived OpenAI keys never enter the renderer
- Tool schema for macOS control: snapshot, list/frontmost apps, activate/launch app, type text, press key combo, clipboard read/write, and open URL
- Renderer WebRTC Realtime driver with microphone capture, SDP exchange, data-channel tool-call handling, transcript logging, and tool results
- Panel settings UI for saving the OpenAI key locally
- Panel states for ready/connecting/listening/error and Realtime transcript/tool activity
- Vite config so Pilot packages the static panel correctly

Verified:
- pnpm --filter @openwork/pilot build:ui
- pnpm --filter @openwork/pilot package:dir
- Launched packaged/dev Pilot panel; AppleScript frontmost-app and list-apps calls still work.
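The main-process session mint and the renderer SDP exchange can be sketched as pure request builders. Endpoint paths and the model name follow OpenAI's published Realtime WebRTC flow but should be treated as assumptions here; the actual WebRTC wiring is shown only in comments since it needs a browser:

```typescript
// Main process: exchange the locally persisted long-lived key for an
// ephemeral client secret. Only the secret crosses into the renderer.
function mintSessionRequest(apiKey: string, model: string) {
  return {
    url: "https://api.openai.com/v1/realtime/sessions",
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ model }),
    },
  };
}

// Renderer: post the local WebRTC offer SDP, authorized with the ephemeral
// secret, and apply the returned answer SDP.
function realtimeSdpRequest(offerSdp: string, ephemeralSecret: string, model: string) {
  return {
    url: `https://api.openai.com/v1/realtime?model=${encodeURIComponent(model)}`,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${ephemeralSecret}`,
        "Content-Type": "application/sdp",
      },
      body: offerSdp,
    },
  };
}

// Renderer wiring (browser-only, not runnable here):
//   const pc = new RTCPeerConnection();
//   pc.addTrack(micTrack);                         // microphone capture
//   const dc = pc.createDataChannel("oai-events"); // tool calls + transcripts
//   const offer = await pc.createOffer();
//   await pc.setLocalDescription(offer);
//   const { url, init } = realtimeSdpRequest(offer.sdp!, secret, "gpt-realtime");
//   const answer = await (await fetch(url, init)).text();
//   await pc.setRemoteDescription({ type: "answer", sdp: answer });
```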
Summary
openwork-ui-mcp package so external MCP clients can use ui_status, ui_snapshot, ui_list_actions, and ui_execute_action.

OpenWork improvements
Semantic UI control plane
window.__openworkControl registry with snapshot(), listActions(), execute(), setEnabled(), and subscribe().

Session and composer actions
session.create_task, session.list_sessions, session.open, session.rename, session.delete, session.latest_message, and session.read_transcript.
composer.set_text, composer.send, and composer.stop.

MCP-facing OpenWork bridge
userData.
packages/openwork-ui-mcp stdio MCP server proxies that bridge as MCP tools: ui_status, ui_snapshot, ui_list_actions, ui_execute_action.
docs/mcp-ui-control-profile.md documents the intended semantic MCP profile for OpenWork UI control.

Optional Realtime preview driver
shell/control-drivers/openai-realtime/.
apps/server/src/remote-control/openai-realtime.ts; long-lived OpenAI API keys do not go to the browser.

Architecture intent
This PR is not about making voice or OpenAI the foundation of OpenWork control. The durable layer is OpenWork-owned: the window.__openworkControl registry, the session/composer actions, and the MCP bridge. The OpenAI Realtime driver is an optional preview sitting on top of that layer.
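A rough sketch of how the stdio server in packages/openwork-ui-mcp might wrap a bridge call as an MCP tool result. The bridge transport and the `callBridge` helper are assumptions; the result shape follows the MCP tool-result convention of text content blocks:

```typescript
// MCP tool results carry an array of content blocks; text is the simplest.
type McpToolResult = { content: Array<{ type: "text"; text: string }> };

function toToolResult(payload: unknown): McpToolResult {
  return { content: [{ type: "text", text: JSON.stringify(payload, null, 2) }] };
}

// Hypothetical proxy for the ui_snapshot tool: the transport behind
// `callBridge` (the userData bridge file, IPC, etc.) is not specified here.
async function uiSnapshot(
  callBridge: (method: string) => Promise<unknown>,
): Promise<McpToolResult> {
  return toToolResult(await callBridge("ui_snapshot"));
}
```

ui_status, ui_list_actions, and ui_execute_action would be the same pattern with different bridge methods and arguments.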
Screenshots
Verification
Previously run on this branch:
- pnpm --filter @openwork/app typecheck ✅
- pnpm --filter openwork-server typecheck ✅
- pnpm --filter openwork-server build:bin ✅
- pnpm --filter @openwork/desktop package:electron:dir ✅
- session.list_sessions returned 30 sessions ✅

Latest extraction/MCP sanity checks:
- pnpm install --lockfile-only ✅
- node --check packages/openwork-ui-mcp/index.mjs ✅
- node --check apps/desktop/electron/main.mjs ✅