
Port ComputerUseMixin workflow learning and replay from gaia6 #544

@kovtcharov

Summary

Port ComputerUseMixin from gaia6 (agents/base/computer_use.py, ~1,176 lines). The mixin lets agents learn browser workflows by recording their actions and then replay those workflows with parameter substitution.
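Parameter substitution is the step that turns a one-off recording into a reusable skill. The sketch below assumes recorded steps are plain dicts whose string values may contain `{name}` placeholders; the step format, the placeholder syntax, and `substitute_params` are hypothetical illustrations, not the gaia6 implementation.

```python
import re
from typing import Any

# Hypothetical placeholder syntax: {param_name} inside recorded step values.
_PLACEHOLDER = re.compile(r"\{(\w+)\}")


def substitute_params(steps: list[dict[str, Any]], params: dict[str, str]) -> list[dict[str, Any]]:
    """Return a copy of the recorded steps with {name} placeholders filled from params."""

    def fill(value: Any) -> Any:
        # Only string fields can carry placeholders; leave everything else untouched.
        if not isinstance(value, str):
            return value

        def repl(match: re.Match) -> str:
            key = match.group(1)
            if key not in params:
                # Fail loudly instead of replaying with a literal "{key}".
                raise KeyError(f"missing workflow parameter: {key}")
            return params[key]

        return _PLACEHOLDER.sub(repl, value)

    # Build new dicts so the stored recording is never mutated.
    return [{k: fill(v) for k, v in step.items()} for step in steps]


steps = [
    {"action": "goto", "url": "https://example.com/search?q={query}"},
    {"action": "click", "selector": "#submit"},
]
print(substitute_params(steps, {"query": "gaia"}))
```

Raising on a missing parameter is a deliberate choice in this sketch: a replay that silently types `{query}` into a form is harder to debug than one that refuses to start.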

Tools Provided

  • learn_workflow(name, url) — Record browser actions as a replayable skill
  • replay_workflow(name, params) — Execute a learned skill with parameter substitution
  • list_workflows(filter) — List all learned skills
  • test_workflow(name) — Replay in visible mode for verification
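A minimal in-memory sketch of how the four tools could fit together. It assumes recorded steps are passed to `learn_workflow` directly (the real tool records them from a live browser), that `replay_workflow` takes a `headless` flag, and that `test_workflow` accepts optional params; `Workflow`, `WorkflowTools`, and the returned "replay plan" dict are hypothetical, and no browser is driven here.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Workflow:
    name: str
    url: str
    # Each step is a dict such as {"action": "fill", "selector": "#q", "value": "{query}"}.
    steps: list[dict[str, str]] = field(default_factory=list)


class WorkflowTools:
    """Hypothetical in-memory stand-in for the four tools (not the gaia6 code)."""

    def __init__(self) -> None:
        self._workflows: dict[str, Workflow] = {}

    def learn_workflow(self, name: str, url: str, steps: list[dict[str, str]]) -> Workflow:
        # Assumption: steps arrive pre-recorded; copy them so callers can't mutate the skill.
        wf = Workflow(name=name, url=url, steps=[dict(s) for s in steps])
        self._workflows[name] = wf
        return wf

    def list_workflows(self, filter: Optional[str] = None) -> list[str]:
        # `filter` mirrors the issue's parameter name; it shadows the builtin only here.
        return [n for n in sorted(self._workflows) if filter is None or filter in n]

    def replay_workflow(
        self, name: str, params: dict[str, str], headless: bool = True
    ) -> dict[str, Any]:
        # Fill {param} placeholders; str.format raises KeyError if a parameter is missing.
        wf = self._workflows[name]
        steps = [{k: v.format(**params) for k, v in step.items()} for step in wf.steps]
        return {"name": name, "url": wf.url.format(**params), "headless": headless, "steps": steps}

    def test_workflow(self, name: str, params: Optional[dict[str, str]] = None) -> dict[str, Any]:
        # Same replay path, but with a visible browser so a human can watch it run.
        return self.replay_workflow(name, params or {}, headless=False)
```

Routing `test_workflow` through `replay_workflow` keeps a single execution path, so a skill verified visually is exercised by the same code that later runs headless.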

Architecture

  • Uses PlaywrightBridge for browser automation (abstracted for testability)
  • Skills stored in KnowledgeDB as category="skill" with metadata.type="replay"
  • Screenshots stored in ~/.gaia/skills/{insight_id}/step_N.png
  • Depends on MemoryMixin (KnowledgeDB for skill storage)
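The storage layout above can be sketched as a record shape plus a path helper. Only `category="skill"`, `metadata.type="replay"`, and the `~/.gaia/skills/{insight_id}/step_N.png` layout come from this issue; the other record fields, the helper names, and whether `N` starts at 0 or 1 are assumptions, and no real KnowledgeDB API is called.

```python
from pathlib import Path
from typing import Any, Optional

# Default root taken from the layout described in this issue.
SKILLS_ROOT = Path.home() / ".gaia" / "skills"


def skill_record(name: str, url: str, steps: list[dict[str, Any]]) -> dict[str, Any]:
    """Build a skill entry; "content" and the extra metadata keys are hypothetical."""
    return {
        "category": "skill",
        "content": name,
        "metadata": {"type": "replay", "url": url, "steps": steps},
    }


def is_replay_skill(record: dict[str, Any]) -> bool:
    # A record counts as a replayable skill only when both markers are present.
    return record.get("category") == "skill" and record.get("metadata", {}).get("type") == "replay"


def screenshot_path(insight_id: str, step: int, root: Optional[Path] = None) -> Path:
    # Mirrors ~/.gaia/skills/{insight_id}/step_N.png; `root` lets tests avoid the home dir.
    base = root if root is not None else SKILLS_ROOT
    return base / insight_id / f"step_{step}.png"
```

Keying screenshots by the KnowledgeDB `insight_id` ties each image folder to exactly one stored skill, so deleting a skill can also remove its folder.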

Reconciliation with Existing CUA Issues

Existing issues (#224, #458-#461) approach CUA from an MCP perspective, whereas this gaia6 approach is mixin-based and centered on workflow learning. The two approaches need to be reconciled.

Source

Port from gaia6/src/gaia/agents/base/computer_use.py

Acceptance Criteria

  • Workflow learning records browser actions
  • Replay executes with parameter substitution
  • Skills persist across sessions via KnowledgeDB
  • Screenshots captured per step
  • Unit tests (port from gaia6: test_computer_use.py)
  • Integration tests (port from gaia6: test_computer_use_e2e.py)
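Because PlaywrightBridge is abstracted for testability, unit tests can swap in a fake that records calls instead of opening a browser. The sketch below shows that pattern for the "screenshots captured per step" criterion; the `Bridge` protocol's method names (modeled loosely on Playwright's `Page.goto`/`click`/`fill`/`screenshot`), `FakeBridge`, and `run_steps` are hypothetical and are not the gaia6 test code.

```python
from typing import Protocol


class Bridge(Protocol):
    # Hypothetical subset of a PlaywrightBridge-like interface.
    def goto(self, url: str) -> None: ...
    def click(self, selector: str) -> None: ...
    def fill(self, selector: str, value: str) -> None: ...
    def screenshot(self, path: str) -> None: ...


class FakeBridge:
    """Records every call instead of driving a browser."""

    def __init__(self) -> None:
        self.calls: list[tuple[str, ...]] = []

    def goto(self, url: str) -> None:
        self.calls.append(("goto", url))

    def click(self, selector: str) -> None:
        self.calls.append(("click", selector))

    def fill(self, selector: str, value: str) -> None:
        self.calls.append(("fill", selector, value))

    def screenshot(self, path: str) -> None:
        self.calls.append(("screenshot", path))


def run_steps(bridge: Bridge, steps: list[dict[str, str]], shot_dir: str) -> None:
    # Dispatch each recorded step, then capture one screenshot per step.
    for i, step in enumerate(steps):
        action = step["action"]
        if action == "goto":
            bridge.goto(step["url"])
        elif action == "click":
            bridge.click(step["selector"])
        elif action == "fill":
            bridge.fill(step["selector"], step["value"])
        else:
            raise ValueError(f"unknown action: {action}")
        bridge.screenshot(f"{shot_dir}/step_{i}.png")


def test_replay_captures_screenshot_per_step() -> None:
    bridge = FakeBridge()
    steps = [
        {"action": "goto", "url": "https://example.com"},
        {"action": "fill", "selector": "#q", "value": "gaia"},
        {"action": "click", "selector": "#submit"},
    ]
    run_steps(bridge, steps, "/tmp/skills/abc")
    shots = [c for c in bridge.calls if c[0] == "screenshot"]
    assert len(shots) == len(steps)
    assert bridge.calls[0] == ("goto", "https://example.com")
```

The integration tests would then run the same `run_steps`-style code against the real Playwright-backed bridge.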

Metadata


Labels

  • agent
  • cua: Computer Use Agent
  • domain:multimodal: Voice (ASR/TTS), Vision (VLM), Image gen (SD), CUA
  • enhancement: New feature or request
  • p1: medium priority
  • track:consumer-app: Hermes-competitor consumer product; mobile-first, voice + messaging + memory + skills
