Computer Use

scarecr0w12 edited this page Jun 18, 2026 · 1 revision

Computer Use (GUI Automation)

CortexPrism's computer use system enables AI agents to interact with graphical user interfaces through virtual displays, mouse control, keyboard input, and screenshots.

Requirements (Linux Only)

# Debian/Ubuntu
sudo apt-get install xvfb xdotool scrot x11-utils

# Fedora/RHEL
sudo dnf install xorg-x11-server-Xvfb xdotool scrot xorg-x11-utils

# Arch Linux
sudo pacman -S xorg-server-xvfb xdotool scrot xorg-xdpyinfo xorg-xwd
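
After installing the packages, it can be useful to confirm that the expected binaries are actually on PATH. The following is a minimal sketch, not part of CortexPrism: the tool list and the `missing_tools` helper are assumptions made for illustration.

```python
import shutil

# Binaries the packages above are expected to provide (assumed list).
REQUIRED_TOOLS = ("Xvfb", "xdotool", "scrot", "xdpyinfo")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing required tools: " + ", ".join(missing))
    else:
        print("All required tools found.")
```

The `which` parameter is injectable so the check can be exercised without the tools installed.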

Architecture

┌──────────────────────────────────────────────────────┐
│                Computer Use System                    │
│                                                       │
│  ┌─────────────────┐  ┌─────────────────┐           │
│  │  Display Manager │  │  Screenshot     │           │
│  │  (Xvfb lifecycle)│  │  Capture        │           │
│  │  - Alloc display │  │  - scrot        │           │
│  │  - Health check  │  │  - ImageMagick  │           │
│  │  - Graceful stop │  │  - xwd fallback │           │
│  └─────────────────┘  └─────────────────┘           │
│                                                       │
│  ┌─────────────────┐  ┌─────────────────┐           │
│  │  Mouse Control   │  │  Keyboard       │           │
│  │  (xdotool)       │  │  Control        │           │
│  │  - Coord move    │  │  (xdotool)      │           │
│  │  - Click types   │  │  - Text typing  │           │
│  │  - Drag ops      │  │  - Key combos   │           │
│  │  - Scroll        │  │  - Key holding  │           │
│  └─────────────────┘  └─────────────────┘           │
│                                                       │
│  ┌──────────────────────────────────────┐            │
│  │        Action Executor                │            │
│  │  - Orchestrates all controllers       │            │
│  │  - Configurable timeouts              │            │
│  │  - Error handling                     │            │
│  │  - Policy validation gate             │            │
│  └──────────────────────────────────────┘            │
│                                                       │
│  Docker Support:                                      │
│  ┌──────────────────────────────────────┐            │
│  │  Ubuntu 22.04 + XFCE + Firefox       │            │
│  │  + Chromium + LibreOffice            │            │
│  │  docker/computer-use.Dockerfile       │            │
│  └──────────────────────────────────────┘            │
└──────────────────────────────────────────────────────┘
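
To make the Display Manager and Screenshot Capture boxes more concrete, here is a hedged sketch of the command lines such components might build. The function names are hypothetical and the fallback order simply follows the diagram; CortexPrism's real implementation may differ.

```python
import shutil


def xvfb_command(display_num, width, height, depth=24):
    """Build the argv that starts Xvfb on display :<display_num>."""
    return ["Xvfb", f":{display_num}", "-screen", "0",
            f"{width}x{height}x{depth}"]


def screenshot_command(path, which=shutil.which):
    """Pick the first available capture tool, in the diagram's order."""
    candidates = [
        ("scrot", ["scrot", path]),
        # ImageMagick's `import` can capture the root window.
        ("import", ["import", "-window", "root", path]),
        # xwd writes XWD format, so the output would need converting.
        ("xwd", ["xwd", "-root", "-silent", "-out", path]),
    ]
    for binary, argv in candidates:
        if which(binary) is not None:
            return argv
    raise RuntimeError("no screenshot tool found (tried scrot, import, xwd)")
```

Each capture command would need to run with `DISPLAY` pointing at the virtual display (for example `:99`), so that it targets Xvfb rather than the host display.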

Available Actions (15)

| Action | Description | Parameters |
|---|---|---|
| screenshot | Capture current display state | format (png/jpeg), quality |
| left_click | Click at coordinates | x, y |
| right_click | Right-click at coordinates | x, y |
| middle_click | Middle-click at coordinates | x, y |
| double_click | Double-click at coordinates | x, y |
| triple_click | Triple-click at coordinates | x, y |
| mouse_move | Move cursor to coordinates | x, y |
| left_click_drag | Drag from point A to point B | startX, startY, endX, endY |
| left_mouse_down | Press left mouse button | x, y |
| left_mouse_up | Release left mouse button | x, y |
| type | Type text string | text, delay (ms between chars) |
| key | Press key or combination | key (e.g. "ctrl+s", "alt+tab") |
| hold_key | Hold key for duration | key, duration (ms) |
| scroll | Scroll in direction | direction (up/down/left/right), amount |
| wait | Pause between actions | duration (ms) |
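
As an illustration of how these actions could map onto xdotool, the sketch below translates a subset of them into argv lists. The `to_xdotool` function and its defaults are assumptions, not CortexPrism's actual executor. Key names are passed through unchanged; xdotool expects X keysym names, so a real implementation may need to normalize them.

```python
# X11 button numbers; 4-7 are the conventional scroll-wheel buttons.
SCROLL_BUTTONS = {"up": "4", "down": "5", "left": "6", "right": "7"}

# action name -> (button, number of clicks)
CLICKS = {
    "left_click": ("1", 1),
    "middle_click": ("2", 1),
    "right_click": ("3", 1),
    "double_click": ("1", 2),
    "triple_click": ("1", 3),
}


def to_xdotool(action, **p):
    """Translate one action and its parameters into an xdotool argv list."""
    if action in CLICKS:
        button, repeat = CLICKS[action]
        return ["xdotool", "mousemove", str(p["x"]), str(p["y"]),
                "click", "--repeat", str(repeat), button]
    if action == "mouse_move":
        return ["xdotool", "mousemove", str(p["x"]), str(p["y"])]
    if action == "left_click_drag":
        return ["xdotool",
                "mousemove", str(p["startX"]), str(p["startY"]),
                "mousedown", "1",
                "mousemove", str(p["endX"]), str(p["endY"]),
                "mouseup", "1"]
    if action == "type":
        # 12 ms is xdotool's own default inter-key delay.
        return ["xdotool", "type", "--delay", str(p.get("delay", 12)),
                "--", p["text"]]
    if action == "key":
        return ["xdotool", "key", p["key"]]
    if action == "hold_key":
        seconds = p["duration"] / 1000  # table gives duration in ms
        return ["xdotool", "keydown", p["key"],
                "sleep", str(seconds), "keyup", p["key"]]
    if action == "scroll":
        return ["xdotool", "click", "--repeat", str(p.get("amount", 1)),
                SCROLL_BUTTONS[p["direction"]]]
    raise ValueError(f"unsupported action: {action}")
```

For example, `to_xdotool("double_click", x=10, y=20)` yields a `mousemove 10 20 click --repeat 2 1` command line. Returning argv lists rather than shell strings avoids quoting problems when the text to type contains spaces or shell metacharacters.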

Security

  • All actions require user approval by default; this is configurable.
  • Every action is validated against security policies before it executes.
  • Sensitive-data detection blocks typing of passwords and API keys.
  • All operations are logged in the Cortex Lens audit system.
  • Actions run in an isolated virtual display, not on the host display.
  • There is no direct filesystem access; use the separate file tools instead.
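
The approval and sensitive-data rules above could be combined into a single gate that runs before any action executes. This is a rough sketch under stated assumptions: the regular expressions, the set of read-only actions, and the `check_action` function are all hypothetical and are not taken from CortexPrism's policy engine.

```python
import re

# Assumption: actions that only observe or pause count as read-only.
READ_ONLY_ACTIONS = {"screenshot", "wait"}

# Illustrative patterns only; a real detector would be broader.
SECRET_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9]{20,}"),   # generic "sk-" style API key
    re.compile(r"AKIA[0-9A-Z]{16}"),      # AWS access key ID shape
    re.compile(r"(?i)\b(password|passwd|api[_-]?key|secret)\s*[:=]\s*\S+"),
]


def looks_sensitive(text):
    """Return True if the text matches any secret-like pattern."""
    return any(p.search(text) for p in SECRET_PATTERNS)


def check_action(action, params, require_approval=True,
                 auto_approve_read_only=False):
    """Return 'deny', 'needs_approval' or 'allow' for one action."""
    if action == "type" and looks_sensitive(params.get("text", "")):
        return "deny"
    if not require_approval:
        return "allow"
    if auto_approve_read_only and action in READ_ONLY_ACTIONS:
        return "allow"
    return "needs_approval"
```

The sensitive-data check runs first so that disabling approval cannot bypass it.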

Configuration

{
  "computerUse": {
    "enabled": true,
    "display": {
      "width": 1920,
      "height": 1080
    },
    "runtime": "native",
    "screenshot": {
      "format": "png",
      "quality": 90
    },
    "timeout": {
      "action": 5000,
      "display": 10000
    },
    "approval": {
      "requireApproval": true,
      "autoApproveReadOnly": false
    },
    "docker": {
      "image": "cortexprism/computer-use:latest"
    }
  }
}
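
A configuration like the one above can be sanity-checked before use. The sketch below is an assumption-laden example: the resolution bounds come from the Web UI section, while the accepted `runtime` values, the 1-100 quality range, and the `validate_computer_use` function itself are guesses made for illustration.

```python
def validate_computer_use(config):
    """Return a list of human-readable problems; empty means valid."""
    cu = config.get("computerUse", {})
    errors = []

    display = cu.get("display", {})
    width, height = display.get("width"), display.get("height")
    if not isinstance(width, int) or not 640 <= width <= 3840:
        errors.append("display.width must be an integer in 640-3840")
    if not isinstance(height, int) or not 480 <= height <= 2160:
        errors.append("display.height must be an integer in 480-2160")

    # Assumption: "docker" is the only alternative to "native".
    if cu.get("runtime") not in ("native", "docker"):
        errors.append("runtime must be 'native' or 'docker'")

    shot = cu.get("screenshot", {})
    if shot.get("format") not in ("png", "jpeg"):
        errors.append("screenshot.format must be 'png' or 'jpeg'")
    quality = shot.get("quality")
    if not isinstance(quality, int) or not 1 <= quality <= 100:
        errors.append("screenshot.quality must be an integer in 1-100")

    for name, value in cu.get("timeout", {}).items():
        if not isinstance(value, int) or value <= 0:
            errors.append(f"timeout.{name} must be a positive integer (ms)")

    return errors
```

Collecting every problem into a list, rather than raising on the first one, lets a caller report all configuration mistakes at once.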

Docker Support

The Docker image built from docker/computer-use.Dockerfile includes:

  • Ubuntu 22.04 base
  • XFCE desktop environment
  • Firefox + Chromium browsers
  • LibreOffice suite
  • Xvfb + xdotool + scrot
  • Automatic Xvfb startup

To build the image:

docker build -t cortexprism/computer-use -f docker/computer-use.Dockerfile .

Web UI

  • Screenshot gallery: browse captured screenshots with timestamps
  • Action log: full history of all computer use actions with their results
  • Display configuration: resolution (width 640-3840, height 480-2160 pixels) and runtime selection
  • Approval settings: require-approval toggle and auto-approve for read-only actions
