Add progressive agent skills for reusable terminal/browser workflows #1

@Cheggin

Context

Browser Use Desktop now has a provider-neutral skills pattern that should be adapted for browser-use-terminal.

The important Desktop idea is not "browser-only skills" and not automatic URL memory. It is progressive procedural memory:

  • Keep reusable instructions as local skill files.
  • Inject only a compact metadata index into the agent prompt.
  • Require the agent to explicitly search/view a skill before relying on full instructions.
  • Let the agent create, patch, validate, or delete persistent user skills only, and only under controlled rules.
  • Treat skills as broad procedural memory: browser workflows, terminal workflows, debugging recoveries, repo conventions, output/reporting preferences, and recurring user processes.

Gregor's old per-task skill-memory experiment is useful mainly as prior art for the post-task reflection checklist. After a task, look for failure recovery, retries, uncertainty, stable selectors, auth quirks, API shapes, CLI commands, config paths, data formats, and verification steps that future agents should not have to rediscover. We should not copy its browser-only URL memory architecture.

browser-use-terminal already has adjacent pieces:

  • prompts/browser-agent-system.md explains the agent contract and browser-harness workflow.
  • prompts/interaction-skills/ contains read-only browser mechanics guidance.
  • crates/browser-use-core/src/tools/mod.rs owns the provider-neutral tool registry.
  • crates/browser-use-store owns the state dir and session event stream.
  • crates/browser-use-tui renders session events in the terminal UI.

Proposal

Add progressive skills to browser-use-terminal as reusable procedural memory.

1. Storage layout

Use the existing state-dir model:

.browser-use-terminal/
  skills/
    workflow/<name>/SKILL.md
    browser/<name>/SKILL.md
    debugging/<name>/SKILL.md
    repo/<name>/SKILL.md

Bundled prompt skills stay read-only. User-created skills live under the state dir so they persist across sessions and are not overwritten by app updates.
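
As a rough illustration of how the layout above could map to index ids, here is a minimal sketch. The function name `user_skill_id` is hypothetical, and the `user/` id prefix is an assumption taken from the example index entry later in this proposal, not from existing code.

```rust
use std::path::Path;

/// Hypothetical helper: derive an index id such as "user/workflow/crm-triage"
/// from a SKILL.md path under `<state_dir>/skills`.
/// Returns None if the path is outside the skills root, is not valid UTF-8,
/// or does not match `<category>/<name>/SKILL.md`.
pub fn user_skill_id(skills_root: &Path, skill_file: &Path) -> Option<String> {
    let rel = skill_file.strip_prefix(skills_root).ok()?;
    // Collect path components as &str; bail out on non-UTF-8 names.
    let parts: Vec<&str> = rel.iter().map(|p| p.to_str()).collect::<Option<_>>()?;
    match parts.as_slice() {
        [category, name, "SKILL.md"] => Some(format!("user/{}/{}", category, name)),
        _ => None,
    }
}
```

A directory walk over the state dir could call this for each file it finds and skip anything that returns `None`.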

A user SKILL.md should use simple frontmatter:

---
name: crm-triage
summary: Reusable CRM queue triage workflow after repeated account checks
---

# CRM Triage

Use when...

## Steps
...

## Verification
...

## Gotcha
X What seems obvious but fails
V What actually works

The X/V section is optional. It is useful for sharp lessons learned from a run, but the skill should stay one coherent reusable procedure rather than one tiny skill per gotcha.
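
A minimal sketch of parsing this frontmatter, assuming only flat `key: value` lines and Unix line endings. `SkillMeta` and `parse_skill` are hypothetical names; a real implementation might prefer a proper YAML parser.

```rust
/// Metadata parsed from a SKILL.md frontmatter block (hypothetical shape).
#[derive(Debug, PartialEq)]
pub struct SkillMeta {
    pub name: String,
    pub summary: String,
}

/// Parse the leading `---` ... `---` block and return (metadata, body).
/// Errors on missing delimiters or a missing/empty `name` or `summary`.
pub fn parse_skill(text: &str) -> Result<(SkillMeta, &str), String> {
    let rest = text
        .strip_prefix("---\n")
        .ok_or_else(|| "missing opening '---'".to_string())?;
    let (front, body) = rest
        .split_once("\n---\n")
        .ok_or_else(|| "missing closing '---'".to_string())?;

    let mut name = None;
    let mut summary = None;
    for line in front.lines() {
        // Only flat `key: value` pairs are handled; this is not a YAML parser.
        if let Some((key, value)) = line.split_once(':') {
            match key.trim() {
                "name" => name = Some(value.trim().to_string()),
                "summary" => summary = Some(value.trim().to_string()),
                _ => {} // ignore unknown keys
            }
        }
    }

    let name = name.filter(|s| !s.is_empty()).ok_or("missing name")?;
    let summary = summary.filter(|s| !s.is_empty()).ok_or("missing summary")?;
    Ok((SkillMeta { name, summary }, body.trim_start()))
}
```

Rejecting a skill with no summary at parse time keeps the index from ever showing an entry the agent cannot judge for relevance.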

2. Compact index injection

Inject a compact skill index into the prompt, not full skill bodies and not hidden runtime reminders after every browser action.

The index should include:

  • bundled prompts/interaction-skills/ metadata
  • optional future bundled prompts/domain-skills/ metadata
  • user state-dir skills/**/SKILL.md metadata

Example prompt section:

## Available Skills
Compact metadata index only. If a skill looks relevant, load full instructions with `skill_view` before using it.

### User skills
- user/workflow/crm-triage: CRM Triage - Reusable CRM queue triage workflow...

### Interaction skills
- interaction/screenshots: Screenshots - Capture and verify visual state...

This is the only "injection" proposed for the first version: normal prompt-time metadata injection. It should not auto-inject URL-matched tips into tool outputs.
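
One way the index rendering could look, as a sketch only. `IndexEntry`, `truncate`, and `render_group` are hypothetical names, and the character-based truncation limit is an assumption; the proposal does not specify how summaries are shortened.

```rust
/// One row of the compact index (hypothetical shape).
pub struct IndexEntry {
    pub id: String,      // e.g. "user/workflow/crm-triage"
    pub title: String,   // e.g. "CRM Triage"
    pub summary: String, // from the skill's frontmatter
}

/// Truncate to at most `max_chars` characters, appending "..." when cut.
fn truncate(s: &str, max_chars: usize) -> String {
    if s.chars().count() <= max_chars {
        return s.to_string();
    }
    let kept: String = s.chars().take(max_chars).collect();
    format!("{}...", kept.trim_end())
}

/// Render one index group as a heading plus one bullet per skill.
/// Skill bodies are never included; only id, title, and summary.
pub fn render_group(heading: &str, entries: &[IndexEntry], max_summary: usize) -> String {
    let mut out = format!("### {}\n", heading);
    for e in entries {
        out.push_str(&format!(
            "- {}: {} - {}\n",
            e.id,
            e.title,
            truncate(&e.summary, max_summary)
        ));
    }
    out
}
```

Because `IndexEntry` has no body field, the "index only, never full bodies" rule is enforced by the type rather than by convention.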

3. Provider-neutral skill tools

Because browser-use-terminal has a shared Rust tool registry, prefer native tools over a shell CLI wrapper:

  • skill_list
  • skill_search(query, limit?)
  • skill_view(id)
  • skill_create(id, summary, body)
  • skill_patch(id, old, new, replace_all?)
  • skill_delete(id) for user skills only
  • skill_validate(id)
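
The `old`/`new`/`replace_all?` parameters of skill_patch suggest exact-text replacement. A minimal sketch of that behavior follows. `apply_patch` is a hypothetical name, and the rule that an ambiguous match is an error unless `replace_all` is set is an assumption, not something this proposal specifies.

```rust
/// Replace `old` with `new` in a skill body.
/// Assumed rules: `old` must be non-empty and present; without `replace_all`
/// it must match exactly once, so a patch never lands in the wrong place.
pub fn apply_patch(
    body: &str,
    old: &str,
    new: &str,
    replace_all: bool,
) -> Result<String, String> {
    if old.is_empty() {
        return Err("`old` must not be empty".to_string());
    }
    let hits = body.matches(old).count();
    match (hits, replace_all) {
        (0, _) => Err("`old` text not found".to_string()),
        (_, true) => Ok(body.replace(old, new)),
        (1, false) => Ok(body.replacen(old, new, 1)),
        (n, false) => Err(format!(
            "`old` matches {} times; pass replace_all or add more context",
            n
        )),
    }
}
```

Returning an error on ambiguous matches gives the agent a clear signal to re-view the skill and supply a longer `old` string.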

Implementation surface:

  • add handler kinds in crates/browser-use-core/src/tools/mod.rs
  • add filesystem-backed implementation alongside tools/files.rs
  • keep path traversal protection and restrict writes/deletes to state-dir user skills
  • emit skill.used for view/search hits and skill.written for create/patch/delete
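
The traversal protection could be sketched as below. `resolve_user_skill` is a hypothetical name, and the id format (`<category>/<name>` relative to the user skills root) is an assumption. This lexical check does not handle symlinks; a real implementation may also want to canonicalize and re-check the resolved path.

```rust
use std::path::{Component, Path, PathBuf};

/// Resolve a user skill id such as "workflow/crm-triage" to a SKILL.md path
/// under `<state_dir>/skills`. Rejects empty ids and any id containing a
/// root, prefix, or parent-directory component, so writes and deletes
/// cannot escape the user skills root.
pub fn resolve_user_skill(state_dir: &Path, id: &str) -> Result<PathBuf, String> {
    if id.is_empty() {
        return Err("empty skill id".to_string());
    }
    let rel = Path::new(id);
    for component in rel.components() {
        match component {
            Component::Normal(_) => {}
            // RootDir, Prefix, ParentDir, or a leading CurDir.
            _ => return Err(format!("invalid skill id: {}", id)),
        }
    }
    Ok(state_dir.join("skills").join(rel).join("SKILL.md"))
}
```

Routing skill_create, skill_patch, and skill_delete through one resolver like this keeps the write restriction in a single place that the tests can target.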

4. Prompt lifecycle rules

Update prompts/browser-agent-system.md with broad, non-browser-only guidance:

  • Search/view relevant skills before inventing browser, repo, terminal, debugging, or workflow-specific steps.
  • After a successful nontrivial task, create or patch a user skill only if the new procedure is likely to repeat, is long-running enough to justify reuse, or applies beyond the current session.
  • Do not write skills for one-off facts/calculations, temporary page state, secrets/tokens, private account details, failed/speculative workflows, or content that belongs in the task output.

Add a lightweight post-task reflection checklist:

  • Did the run discover a repeatable procedure?
  • Did it recover from an error in a way future agents should know?
  • Did it learn a stable selector, API shape, auth flow, CLI command, config path, file layout, data format, or verification step?
  • Did an existing skill help, fail, or need patching?

5. Non-goals

Do not copy the old cloud skill-memory architecture for this first version:

  • No DB-backed URL-prefix memories.
  • No one-gotcha-per-skill auto-generation.
  • No automatic hidden injection after every navigation/tool call.
  • No creating active skills from failed/speculative workflows.
  • No free-text URL/tag dedupe as the only quality gate.

6. Tests and acceptance criteria

Add focused tests for:

  • frontmatter parsing and invalid/missing summaries
  • index construction and truncation
  • search ranking over id/title/summary/body
  • skill_view returns full instructions and emits skill.used
  • create/patch/delete are restricted to user skills and block traversal
  • prompt contains compact index but not full user skill bodies
  • no-write guidance is present for one-offs, secrets, failed/speculative workflows
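
To make the search-ranking test concrete, here is one possible ranking scheme, sketched under assumptions. `SkillDoc`, `score`, and `search` are hypothetical names, and the field weights (id 8, title 4, summary 2, body 1) are illustrative only; the proposal does not define a scoring rule.

```rust
/// Searchable fields of one skill (hypothetical shape).
pub struct SkillDoc {
    pub id: String,
    pub title: String,
    pub summary: String,
    pub body: String,
}

/// Score one skill: each query term adds the weight of the first
/// (highest-weighted) field that contains it, case-insensitively.
fn score(doc: &SkillDoc, query: &str) -> u32 {
    let fields: [(&str, u32); 4] = [
        (doc.id.as_str(), 8),
        (doc.title.as_str(), 4),
        (doc.summary.as_str(), 2),
        (doc.body.as_str(), 1),
    ];
    query
        .split_whitespace()
        .map(|term| {
            let term = term.to_lowercase();
            fields
                .iter()
                .find(|(text, _)| text.to_lowercase().contains(&term))
                .map_or(0, |(_, weight)| *weight)
        })
        .sum()
}

/// Return ids of matching skills, best score first, ties broken by id,
/// capped at `limit`. Skills with a zero score are dropped.
pub fn search(docs: &[SkillDoc], query: &str, limit: usize) -> Vec<String> {
    let mut hits: Vec<(u32, &str)> = docs
        .iter()
        .map(|d| (score(d, query), d.id.as_str()))
        .filter(|(s, _)| *s > 0)
        .collect();
    hits.sort_by(|a, b| b.0.cmp(&a.0).then(a.1.cmp(b.1)));
    hits.into_iter().take(limit).map(|(_, id)| id.to_string()).collect()
}
```

A ranking test can then assert that an id or title match outranks a body-only match, which is the property the acceptance criteria care about regardless of the exact weights.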

Add a small eval/smoke script:

  • find an existing interaction skill
  • create a new reusable workflow skill after a successful task
  • patch an existing skill after learning a better step
  • refuse to write a skill for a one-off calculation
  • refuse to write a skill for secret/private data

For terminal UI changes, run the repo's existing verification path:

scripts/verify-terminal-ui.sh

Why this shape

This is primarily the Browser Use Desktop skills model adapted to terminal. The only thing borrowed from Gregor's per-task work is the idea that a completed task can reveal reusable lessons. The system shape should stay conservative: broad procedural skills, explicit loading, compact metadata injection, validation, and write restrictions.

Labels: enhancement
