# Computer Use
scarecr0w12 edited this page Jun 18, 2026 · 1 revision
CortexPrism's computer use system enables AI agents to interact with graphical user interfaces through virtual displays, mouse control, keyboard input, and screenshots.
## Prerequisites

Install Xvfb, xdotool, scrot, and the X11 utilities:

```bash
# Debian/Ubuntu
sudo apt-get install xvfb xdotool scrot x11-utils

# Fedora/RHEL
sudo dnf install xorg-x11-server-Xvfb xdotool scrot xorg-x11-utils

# Arch Linux
sudo pacman -S xorg-server-xvfb xdotool scrot xorg-utils
```

## Architecture

```
┌──────────────────────────────────────────────────────┐
│                 Computer Use System                  │
│                                                      │
│  ┌──────────────────────┐  ┌──────────────────────┐  │
│  │ Display Manager      │  │ Screenshot Capture   │  │
│  │ (Xvfb lifecycle)     │  │ - scrot              │  │
│  │ - Alloc display      │  │ - ImageMagick        │  │
│  │ - Health check       │  │ - xwd fallback       │  │
│  │ - Graceful stop      │  │                      │  │
│  └──────────────────────┘  └──────────────────────┘  │
│                                                      │
│  ┌──────────────────────┐  ┌──────────────────────┐  │
│  │ Mouse Control        │  │ Keyboard Control     │  │
│  │ (xdotool)            │  │ (xdotool)            │  │
│  │ - Coord move         │  │ - Text typing        │  │
│  │ - Click types        │  │ - Key combos         │  │
│  │ - Drag ops           │  │ - Key holding        │  │
│  │ - Scroll             │  │                      │  │
│  └──────────────────────┘  └──────────────────────┘  │
│                                                      │
│  ┌────────────────────────────────────────────────┐  │
│  │ Action Executor                                │  │
│  │ - Orchestrates all controllers                 │  │
│  │ - Configurable timeouts                        │  │
│  │ - Error handling                               │  │
│  │ - Policy validation gate                       │  │
│  └────────────────────────────────────────────────┘  │
│                                                      │
│  Docker Support:                                     │
│  ┌────────────────────────────────────────────────┐  │
│  │ Ubuntu 22.04 + XFCE + Firefox                  │  │
│  │ + Chromium + LibreOffice                       │  │
│  │ docker/computer-use.Dockerfile                 │  │
│  └────────────────────────────────────────────────┘  │
└──────────────────────────────────────────────────────┘
```
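The Display Manager allocates a display number and starts Xvfb before the other controllers run. A minimal TypeScript sketch of those two steps as pure functions follows; the function names, the starting display number 99, and the 24-bit color depth are assumptions for illustration, not CortexPrism's actual code.

```typescript
// Hypothetical Display Manager helpers; names are illustrative only and are
// not CortexPrism's actual API.

interface DisplaySize {
  width: number;
  height: number;
}

// Return the lowest display number >= start that is not already in use.
// The in-use set could be derived from existing /tmp/.X<n>-lock files.
function allocateDisplay(inUse: ReadonlySet<number>, start = 99): number {
  let n = start;
  while (inUse.has(n)) n++;
  return n;
}

// Build the Xvfb argv: one 24-bit screen at the requested size, with TCP
// listening disabled so only local clients can connect.
function xvfbArgs(display: number, size: DisplaySize): string[] {
  return [
    "Xvfb",
    `:${display}`,
    "-screen",
    "0",
    `${size.width}x${size.height}x24`,
    "-nolisten",
    "tcp",
  ];
}

// Environment that points X clients such as xdotool and scrot at the display.
function displayEnv(display: number): Record<string, string> {
  return { DISPLAY: `:${display}` };
}

const display = allocateDisplay(new Set<number>([99, 100]));
console.log(xvfbArgs(display, { width: 1920, height: 1080 }).join(" "));
// Xvfb :101 -screen 0 1920x1080x24 -nolisten tcp
console.log(displayEnv(display));
```

Keeping argv construction separate from process spawning makes the logic easy to test. A health check could then run `xdpyinfo -display :101` (from `x11-utils`) and treat a zero exit status as a live display.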
## Available Actions

| Action | Description | Parameters |
|---|---|---|
| `screenshot` | Capture the current display state | `format` (`png`/`jpeg`), `quality` |
| `left_click` | Left-click at coordinates | `x`, `y` |
| `right_click` | Right-click at coordinates | `x`, `y` |
| `middle_click` | Middle-click at coordinates | `x`, `y` |
| `double_click` | Double-click at coordinates | `x`, `y` |
| `triple_click` | Triple-click at coordinates | `x`, `y` |
| `mouse_move` | Move the cursor to coordinates | `x`, `y` |
| `left_click_drag` | Drag from a start point to an end point | `startX`, `startY`, `endX`, `endY` |
| `left_mouse_down` | Press the left mouse button | `x`, `y` |
| `left_mouse_up` | Release the left mouse button | `x`, `y` |
| `type` | Type a text string | `text`, `delay` (ms between characters) |
| `key` | Press a key or key combination | `key` (e.g. `"ctrl+s"`, `"alt+tab"`) |
| `hold_key` | Hold a key for a duration | `key`, `duration` (ms) |
| `scroll` | Scroll in a direction | `direction` (`up`/`down`/`left`/`right`), `amount` |
| `wait` | Pause between actions | `duration` (ms) |
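Most of these actions map directly onto xdotool commands. The following is one hypothetical mapping, assuming the parameters listed in the table; `toXdotool` and the union types are illustrative names, and `screenshot` and `wait` are left out because they do not go through xdotool.

```typescript
// Hypothetical translation of computer-use actions into xdotool argv.
// This mapping is an assumption, not CortexPrism's implementation.

type PointAction =
  | "left_click"
  | "right_click"
  | "middle_click"
  | "double_click"
  | "triple_click"
  | "mouse_move"
  | "left_mouse_down"
  | "left_mouse_up";

type Action =
  | { action: PointAction; x: number; y: number }
  | { action: "left_click_drag"; startX: number; startY: number; endX: number; endY: number }
  | { action: "type"; text: string; delay?: number }
  | { action: "key"; key: string }
  | { action: "hold_key"; key: string; duration: number }
  | { action: "scroll"; direction: "up" | "down" | "left" | "right"; amount: number };

// X11 reports wheel scrolling as button presses 4-7.
const SCROLL_BUTTON = { up: 4, down: 5, left: 6, right: 7 } as const;

// Move the pointer; --sync waits until the move has taken effect.
const move = (x: number, y: number): string[] => ["mousemove", "--sync", String(x), String(y)];

function toXdotool(a: Action): string[] {
  switch (a.action) {
    case "mouse_move":
      return ["xdotool", ...move(a.x, a.y)];
    case "left_click": // xdotool buttons: 1 = left, 2 = middle, 3 = right
      return ["xdotool", ...move(a.x, a.y), "click", "1"];
    case "right_click":
      return ["xdotool", ...move(a.x, a.y), "click", "3"];
    case "middle_click":
      return ["xdotool", ...move(a.x, a.y), "click", "2"];
    case "double_click":
      return ["xdotool", ...move(a.x, a.y), "click", "--repeat", "2", "1"];
    case "triple_click":
      return ["xdotool", ...move(a.x, a.y), "click", "--repeat", "3", "1"];
    case "left_mouse_down":
      return ["xdotool", ...move(a.x, a.y), "mousedown", "1"];
    case "left_mouse_up":
      return ["xdotool", ...move(a.x, a.y), "mouseup", "1"];
    case "left_click_drag":
      return ["xdotool", ...move(a.startX, a.startY), "mousedown", "1",
        ...move(a.endX, a.endY), "mouseup", "1"];
    case "type":
      // 12 ms is xdotool's default delay; "--" ends option parsing so text
      // that starts with "-" is typed literally.
      return ["xdotool", "type", "--delay", String(a.delay ?? 12), "--", a.text];
    case "key":
      return ["xdotool", "key", a.key];
    case "hold_key":
      // xdotool's chained "sleep" command takes (fractional) seconds.
      return ["xdotool", "keydown", a.key, "sleep", String(a.duration / 1000), "keyup", a.key];
    case "scroll":
      return ["xdotool", "click", "--repeat", String(a.amount), String(SCROLL_BUTTON[a.direction])];
  }
}

console.log(toXdotool({ action: "double_click", x: 200, y: 150 }).join(" "));
```

Returning an argv array instead of a shell string avoids quoting problems when `type` receives text containing spaces or shell metacharacters.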
## Security

- All actions require user approval by default (configurable via `approval.requireApproval`).
- Actions are validated against security policies before execution.
- Sensitive-data detection prevents typing passwords and API keys.
- All operations are logged in the Cortex Lens audit system.
- Actions run in an isolated virtual display, never on the host display.
- There is no direct filesystem access; use the separate file tools instead.
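The approval and sensitive-data rules can be pictured as a single gate in front of the Action Executor. This is a hedged sketch: `gate`, the treatment of `screenshot` and `wait` as read-only, and the regex patterns are all assumptions, and real password detection needs far more context than pattern matching provides.

```typescript
// Hypothetical pre-execution gate; not CortexPrism's policy engine.

interface ApprovalSettings {
  requireApproval: boolean;
  autoApproveReadOnly: boolean;
}

type GateResult = "allow" | "ask_user" | "deny";

// Assumption: these actions only observe or pause, so they count as read-only.
const READ_ONLY = new Set(["screenshot", "wait"]);

// Example patterns for a few well-known credential formats; not exhaustive.
const SECRET_PATTERNS: RegExp[] = [
  /\bAKIA[0-9A-Z]{16}\b/, // AWS access key ID
  /\bghp_[A-Za-z0-9]{36}\b/, // GitHub classic personal access token
  /-----BEGIN [A-Z ]*PRIVATE KEY-----/, // PEM-encoded private key
];

function looksSensitive(text: string): boolean {
  return SECRET_PATTERNS.some((re) => re.test(text));
}

function gate(action: string, params: { text?: string }, s: ApprovalSettings): GateResult {
  // Never type text that matches a credential pattern, even if approved.
  if (action === "type" && params.text !== undefined && looksSensitive(params.text)) {
    return "deny";
  }
  if (!s.requireApproval) return "allow";
  if (s.autoApproveReadOnly && READ_ONLY.has(action)) return "allow";
  return "ask_user";
}

const defaults: ApprovalSettings = { requireApproval: true, autoApproveReadOnly: false };
console.log(gate("left_click", {}, defaults)); // ask_user
```

Checking for secrets before the approval step means a user cannot accidentally approve typing a credential into a GUI field.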
## Configuration

```json
{
  "computerUse": {
    "enabled": true,
    "display": {
      "width": 1920,
      "height": 1080
    },
    "runtime": "native",
    "screenshot": {
      "format": "png",
      "quality": 90
    },
    "timeout": {
      "action": 5000,
      "display": 10000
    },
    "approval": {
      "requireApproval": true,
      "autoApproveReadOnly": false
    },
    "docker": {
      "image": "cortexprism/computer-use:latest"
    }
  }
}
```

## Docker

The pre-built Docker image (`docker/computer-use.Dockerfile`) includes:
- Ubuntu 22.04 base
- XFCE desktop environment
- Firefox + Chromium browsers
- LibreOffice suite
- Xvfb + xdotool + scrot
- Automatic Xvfb startup
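When the agent targets the container instead of the host, each X11 command has to run inside it. A possible wrapper, assuming a `runtime` value of `"docker"` (only `"native"` appears in the configuration example) and an already-running container started from the image; `wrapCommand` and `RuntimeTarget` are hypothetical names:

```typescript
// Hypothetical runtime wrapper; not CortexPrism's actual runtime code.

type Runtime = "native" | "docker"; // "docker" is an assumed value

interface RuntimeTarget {
  runtime: Runtime;
  display: string; // X display inside the target, e.g. ":99"
  container?: string; // container name or ID, required for "docker"
}

interface WrappedCommand {
  argv: string[];
  env: Record<string, string>;
}

function wrapCommand(argv: string[], t: RuntimeTarget): WrappedCommand {
  if (t.runtime === "native") {
    // Run on the host, pointing the X client at the virtual display.
    return { argv, env: { DISPLAY: t.display } };
  }
  if (!t.container) {
    throw new Error("docker runtime requires a container name or ID");
  }
  // `docker exec -e VAR=value` sets the variable inside the container only.
  return {
    argv: ["docker", "exec", "-e", `DISPLAY=${t.display}`, t.container, ...argv],
    env: {},
  };
}

const wrapped = wrapCommand(["xdotool", "key", "ctrl+s"], {
  runtime: "docker",
  display: ":99",
  container: "computer-use",
});
console.log(wrapped.argv.join(" "));
```

Because every controller emits plain argv, switching runtimes only changes this one wrapping step rather than each controller.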
Build the image from the repository root:

```bash
docker build -t cortexprism/computer-use -f docker/computer-use.Dockerfile .
```

## Dashboard

- Screenshot gallery: browse captured screenshots with timestamps
- Action log: full history of all computer use actions and their results
- Display configuration: resolution (width 640–3840 px, height 480–2160 px), runtime selection
- Approval settings: require-approval toggle, auto-approve for read-only actions
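The dashboard's resolution bounds double as validation rules for the `computerUse` configuration block. A sketch of such a validator, assuming the field names from the JSON example; `validateConfig` is hypothetical and the 1-100 range for `screenshot.quality` is an assumption:

```typescript
// Hypothetical validator for the computerUse configuration block.

interface ComputerUseConfig {
  enabled: boolean;
  display: { width: number; height: number };
  runtime: string;
  screenshot: { format: "png" | "jpeg"; quality: number };
  timeout: { action: number; display: number };
  approval: { requireApproval: boolean; autoApproveReadOnly: boolean };
  docker?: { image: string };
}

// True when v is an integer within [min, max].
function inRange(v: number, min: number, max: number): boolean {
  return Number.isInteger(v) && v >= min && v <= max;
}

// Collect every problem instead of stopping at the first one.
function validateConfig(c: ComputerUseConfig): string[] {
  const errors: string[] = [];
  const { width, height } = c.display;
  if (!inRange(width, 640, 3840)) errors.push(`display.width ${width} is outside 640-3840`);
  if (!inRange(height, 480, 2160)) errors.push(`display.height ${height} is outside 480-2160`);
  const q = c.screenshot.quality;
  if (!inRange(q, 1, 100)) errors.push(`screenshot.quality ${q} is outside 1-100`); // assumed range
  if (!(c.timeout.action > 0) || !(c.timeout.display > 0)) errors.push("timeouts must be positive");
  return errors;
}

const example: ComputerUseConfig = {
  enabled: true,
  display: { width: 1920, height: 1080 },
  runtime: "native",
  screenshot: { format: "png", quality: 90 },
  timeout: { action: 5000, display: 10000 },
  approval: { requireApproval: true, autoApproveReadOnly: false },
  docker: { image: "cortexprism/computer-use:latest" },
};

console.log(validateConfig(example)); // []
console.log(validateConfig({ ...example, display: { width: 320, height: 1080 } }));
```

Returning a list rather than throwing on the first error lets a settings form highlight every invalid field at once.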
## See Also

- Built-in Tools: `computer` tool reference
- Security: policy validation and approval gates
- Security Supervisor: LLM-based access control
CortexPrism — Open-source agentic AI harness · Discord · MIT License · Built with Deno 2.x + TypeScript