
Linxc-gituser/gui-ai-bridge


GUI AI Bridge (Windows First)


中文指引

1. 概述

GUI AI Bridge(建议命令名为 guib)是一个本地 GUI 自动化 CLI 工具:

  • get:读取当前窗口语义结构(无障碍树)
  • do:执行点击、输入、热键、滚动、拖拽
  • do instruction:将自然语言指令拆解为动作链并执行

当前版本以 Windows 为主。

2. 推荐安装方式(预编译 EXE)

这条路线最简单:不需要 Python,下载后加入 PATH 即可在任意命令行使用。

2.1 运行前环境准备

  • 系统:Windows 10/11
  • 建议在普通桌面会话运行(避免受限远程会话)
  • 建议目标应用和 guib 使用相同权限级别(都普通权限,或都管理员权限)

2.2 下载打包后的 EXE

  1. 打开项目仓库
  2. 进入 Releases
  3. 下载以下任一文件:
    • guib.exe(推荐,命令短)
    • gui-ai-bridge.exe(完整命名)

如果没有可用发布包,请直接使用第 4 节“源码安装与本地打包”。

2.3 放置 EXE 到固定目录

建议统一放在 C:\Tools\guib

New-Item -ItemType Directory -Path C:\Tools\guib -Force | Out-Null
Copy-Item "$HOME\Downloads\guib.exe" "C:\Tools\guib\guib.exe" -Force

2.4 添加到系统 PATH(持久生效)

PowerShell(当前用户,推荐):

$target = "C:\Tools\guib"
$current = [Environment]::GetEnvironmentVariable("Path", "User")
if ([string]::IsNullOrWhiteSpace($current)) {
    [Environment]::SetEnvironmentVariable("Path", $target, "User")
} elseif ($current -notlike "*$target*") {
    [Environment]::SetEnvironmentVariable("Path", "$current;$target", "User")
}
if ($env:Path -notlike "*$target*") { $env:Path = "$env:Path;$target" }

CMD(当前用户,持久生效):

setx PATH "%PATH%;C:\Tools\guib"

说明:setx 不会刷新当前终端,会在新开的终端中生效。

2.5 验证安装

where guib
guib env check --json
guib get scan

where guib 显示 C:\Tools\guib\guib.exe,说明 PATH 配置成功。

3. 常用命令

3.1 读取界面

guib get scan
guib get screen --compact-tree --a11y-backend auto
guib get screen --json

get screen 输出包含:

  • Snapshot Summary:节点概况
  • Backend Quality:后端质量评分
  • Semantic Digest:页面语义摘要
  • Actionable Targets Top:可操作目标候选

3.2 执行动作

guib do click "创建仓库" --window-target "edge" --a11y-backend auto
guib do type "hello" --window-target "edge"
guib do hotkey "ctrl+l" --window-target "edge"
guib do swipe down --window-target "edge" --distance 500
guib do drag 240 300 240 700 --window-target "edge"

3.3 自然语言指令执行

guib do instruction "点击搜索框 输入 \"Python GUI\" 并回车" --window-target "msedge.exe"
guib do instruction "press ctrl+l" --window-target "msedge.exe"

4. 源码安装与本地打包

当没有发布包,或你需要自行构建时,使用本节。

4.1 所需环境

  • Python 3.10+
  • Windows 10/11

4.2 安装依赖

cd d:\python_code\PythonApplication_guibridge
python -m venv .venv
.\.venv\Scripts\Activate.ps1
python -m pip install --upgrade pip
pip install -r requirements-windows.txt

如遇 externally-managed-environment,请使用 .venv,不要直接改系统 Python。

4.3 运行测试

pytest -q

4.4 打包 EXE

python -m PyInstaller --noconfirm --clean guib.spec

产物路径:

  • dist\guib.exe

打包后请回到第 2.3 和 2.4 节,将 dist\guib.exe 复制到固定目录并加入 PATH。

5. 故障排查

  • 命令不存在(guib 不是内部或外部命令)

    • 运行 where guib
    • 确认 C:\Tools\guib 已加入 PATH
    • 关闭并重开终端
  • E_PERMISSION

    • 当前构建仅支持 Windows
    • 检查权限级别是否不一致(例如目标应用管理员启动,而 guib 普通启动)
  • E_NOT_FOUND

    • 先执行 guib get screen --compact-tree --a11y-backend auto
    • 再按后端顺序重试:cdp -> ia2 -> msaa -> uia -> hwnd
  • 输出信息太少

    • 先切换后端重读
    • 必要时改用 --full-tree

6. 使用建议

  • 复杂页面优先使用“读屏 -> 小步动作 -> 再读屏”的闭环。
  • Electron 或浏览器场景不要固定单后端反复失败。
  • 详细执行规范请参考 guide.md

English Guide

1. Overview

GUI AI Bridge (recommended command name: guib) is a local GUI automation CLI:

  • get: read semantic GUI structure (accessibility tree)
  • do: execute click, type, hotkey, swipe, and drag actions
  • do instruction: convert natural-language instructions into executable action steps

The current version is Windows-focused.

2. Recommended Installation (Prebuilt EXE)

This is the easiest path: no Python required. Download the EXE and add it to PATH.

2.1 Prerequisites

  • OS: Windows 10/11
  • Recommended: run in a normal desktop session (avoid restricted remote sessions)
  • Recommended: run target apps and guib at the same privilege level (both normal, or both admin)

2.2 Download the packaged EXE

  1. Open the repository page
  2. Go to Releases
  3. Download one of the following assets:
    • guib.exe (recommended, shorter command)
    • gui-ai-bridge.exe (full name)

If no release asset is available, use Section 4 (source install and local packaging).

2.3 Place EXE in a fixed directory

Recommended location: C:\Tools\guib

New-Item -ItemType Directory -Path C:\Tools\guib -Force | Out-Null
Copy-Item "$HOME\Downloads\guib.exe" "C:\Tools\guib\guib.exe" -Force

2.4 Add to the user PATH (persistent)

PowerShell (current user, recommended):

$target = "C:\Tools\guib"
$current = [Environment]::GetEnvironmentVariable("Path", "User")
if ([string]::IsNullOrWhiteSpace($current)) {
    [Environment]::SetEnvironmentVariable("Path", $target, "User")
} elseif ($current -notlike "*$target*") {
    [Environment]::SetEnvironmentVariable("Path", "$current;$target", "User")
}
if ($env:Path -notlike "*$target*") { $env:Path = "$env:Path;$target" }

CMD (current user, persistent):

setx PATH "%PATH%;C:\Tools\guib"

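Note: setx does not refresh the current terminal session, so open a new terminal window. Because %PATH% expands to the combined system and user PATH, this command also copies system entries into the user PATH, and setx truncates values longer than 1024 characters. If your PATH is long, prefer the PowerShell method.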
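The PowerShell snippet in this section checks for the directory with a wildcard match, so an existing entry such as C:\Tools\guib-old would also count as "already present". As a minimal sketch of a stricter check, the hypothetical helper below (append_path_entry is not part of guib) compares whole PATH entries, ignoring case and trailing slashes:

```python
def append_path_entry(current: str, target: str, sep: str = ";") -> str:
    """Return `current` with `target` appended unless an equal entry exists.

    Hypothetical helper for illustration; entries are compared whole,
    case-insensitively and without trailing slashes, as Windows treats them.
    """
    def norm(path: str) -> str:
        return path.strip().rstrip("\\/").lower()

    # Drop empty entries left behind by stray separators.
    entries = [e for e in current.split(sep) if e.strip()]
    if any(norm(e) == norm(target) for e in entries):
        return sep.join(entries)
    return sep.join(entries + [target])


# A substring match would skip this case; a whole-entry match appends.
print(append_path_entry(r"C:\Windows;C:\Tools\guib-old", r"C:\Tools\guib"))
# prints: C:\Windows;C:\Tools\guib-old;C:\Tools\guib
```

The function only computes the new value; writing it back to the user environment is still done with the commands shown in this section.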

2.5 Verify installation

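where.exe guib
guib env check --json
guib get scan

If where.exe guib prints C:\Tools\guib\guib.exe, PATH is configured correctly. In PowerShell, plain where is an alias for Where-Object, so use where.exe there; in CMD, where guib works as well.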
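The same check can be scripted. The sketch below uses Python's standard shutil.which to resolve the command through PATH and, only when it is found, runs the env check command from this section. The helper name locate_cli is hypothetical, and nothing is assumed about what guib prints or which exit codes it uses.

```python
import shutil
import subprocess


def locate_cli(name: str = "guib") -> dict:
    """Resolve `name` through PATH, like `where.exe` does on Windows."""
    path = shutil.which(name)
    return {"name": name, "found": path is not None, "path": path}


if __name__ == "__main__":
    info = locate_cli()
    print(info)
    if info["found"]:
        # Command taken from this section; its output format is not assumed here.
        completed = subprocess.run([info["path"], "env", "check", "--json"])
        print("exit code:", completed.returncode)
```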

3. Common Commands

3.1 Read GUI state

guib get scan
guib get screen --compact-tree --a11y-backend auto
guib get screen --json

get screen output includes:

  • Snapshot Summary: node-level summary
  • Backend Quality: backend quality scores
  • Semantic Digest: page semantic digest
  • Actionable Targets Top: high-priority actionable targets
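When calling these read commands from a script, it can help to build the argument list in one place. The sketch below is an illustration only: get_screen_args and capture are hypothetical helpers, the flags are the ones shown in this section, and combining --json with the other flags in a single call is an assumption that this README does not confirm.

```python
import subprocess


def get_screen_args(compact: bool = True, backend: str = "auto",
                    as_json: bool = False) -> list[str]:
    """Build the argv for `guib get screen` from the flags shown above."""
    args = ["guib", "get", "screen"]
    if compact:
        args.append("--compact-tree")
    args += ["--a11y-backend", backend]
    if as_json:
        # Assumption: --json may be combined with the other flags.
        args.append("--json")
    return args


def capture(args: list[str], runner=subprocess.run) -> str:
    """Run the command and return its raw standard output as text."""
    result = runner(args, capture_output=True, text=True)
    return result.stdout
```

Passing runner as a parameter keeps the helper testable without a real desktop session.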

3.2 Execute actions

guib do click "Create repository" --window-target "edge" --a11y-backend auto
guib do type "hello" --window-target "edge"
guib do hotkey "ctrl+l" --window-target "edge"
guib do swipe down --window-target "edge" --distance 500
guib do drag 240 300 240 700 --window-target "edge"

3.3 Natural-language instruction execution

guib do instruction "click search box then type \"Python GUI\" and press enter" --window-target "msedge.exe"
guib do instruction "press ctrl+l" --window-target "msedge.exe"

4. Source Installation and Local Packaging

Use this section when no release binary is available or when you need to build locally.

4.1 Requirements

  • Python 3.10+
  • Windows 10/11

4.2 Install dependencies

cd d:\python_code\PythonApplication_guibridge  # replace with the path to your local clone
python -m venv .venv
.\.venv\Scripts\Activate.ps1
python -m pip install --upgrade pip
pip install -r requirements-windows.txt

If pip reports externally-managed-environment, install into the .venv virtual environment instead of modifying the system Python.

4.3 Run tests

pytest -q

4.4 Package EXE

python -m PyInstaller --noconfirm --clean guib.spec

Output binary:

  • dist\guib.exe

Then return to Sections 2.3 and 2.4 to copy dist\guib.exe to the fixed directory and add that directory to PATH.

5. Troubleshooting

  • Command not found (guib is not recognized)

    • Run where.exe guib (in PowerShell, plain where is an alias for Where-Object)
    • Confirm C:\Tools\guib is in PATH
    • Restart terminal
  • E_PERMISSION

    • Current build is Windows-only
    • Check privilege mismatch between target app and guib
  • E_NOT_FOUND

    • Run guib get screen --compact-tree --a11y-backend auto first
    • Retry backends in order: cdp -> ia2 -> msaa -> uia -> hwnd
  • Too little semantic output

    • Switch backend and capture again
    • Use --full-tree when needed
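The backend retry order listed under E_NOT_FOUND can be automated. The following is a minimal sketch under two assumptions that this README does not state explicitly: that --a11y-backend accepts each of the listed backend names, and that a zero exit code means the action succeeded. The helper names are hypothetical.

```python
import subprocess
from typing import Callable, Optional, Sequence

# Order taken from the troubleshooting list above.
BACKENDS = ("cdp", "ia2", "msaa", "uia", "hwnd")


def run_cli(args: Sequence[str]) -> int:
    """Run a command and return its exit code."""
    return subprocess.run(list(args)).returncode


def click_with_fallback(label: str, window: str,
                        run: Callable[[Sequence[str]], int] = run_cli,
                        backends: Sequence[str] = BACKENDS) -> Optional[str]:
    """Try `guib do click` with each backend in order.

    Returns the first backend whose run exits with 0 (assumed to mean
    success), or None if every backend fails.
    """
    for backend in backends:
        args = ["guib", "do", "click", label,
                "--window-target", window, "--a11y-backend", backend]
        if run(args) == 0:
            return backend
    return None
```

A call such as click_with_fallback("Create repository", "edge") stops at the first backend that works instead of repeating a failing one.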

6. Practical Tips

  • For complex pages, use the loop: capture -> small action -> capture again.
  • For browser and Electron apps, do not stay pinned to a single backend that keeps failing; switch to another one.
  • For a detailed execution playbook, see guide.md.
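The capture -> small action -> capture again loop from the first tip can be sketched as follows. The capture and act callables are placeholders supplied by the caller (for example, thin wrappers around guib get screen and guib do ...); run_steps itself is a hypothetical helper, not something guib provides.

```python
from typing import Callable, Iterable


def run_steps(steps: Iterable[str],
              capture: Callable[[], str],
              act: Callable[[str], None]) -> list[tuple[str, bool]]:
    """Run each step and record whether the captured state changed.

    A step whose capture is identical to the previous one is a signal
    to stop and re-plan rather than continue blindly.
    """
    results = []
    before = capture()
    for step in steps:
        act(step)            # one small action at a time
        after = capture()    # re-read the screen after every action
        results.append((step, after != before))
        before = after
    return results
```

Keeping each step small makes it easy to tell which action had no visible effect.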

Notes: Snapshot Summary & Semantic Digest

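  • Snapshot Summary: brief node statistics (total nodes, named nodes, actionable nodes, input nodes) plus a short hash. Use it to judge quickly whether the current capture is meaningful and whether it matches the previous capture. It is useful for fast deduplication and replay verification.
  • Semantic Digest: semantic lines extracted from readable nodes (for example buttons, labels, and row entries) so that an AI model can grasp the key information on the page quickly. It does not replace the full tree; it is meant for prompting and prioritization.

Usage tips:

  • When Snapshot Summary shows very few named nodes or very few actionable_nodes, switch backends or use --full-tree first; consider maximizing the window only if those retries do not help.
  • The number of Semantic Digest lines helps gauge page complexity: if there are very few lines, the page exposes weak semantic signals and needs manual handling or a coordinate-based strategy.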
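To make the Snapshot Summary idea concrete, the sketch below computes the same kind of statistics and a short hash from a list of node dictionaries, then applies the retry order from the usage tips. Everything here is an assumption for illustration: the node keys (name, actionable, is_input), the SHA-256 based hash, and the thresholds are hypothetical and are not guib's actual data format or algorithm.

```python
import hashlib
import json


def summarize(nodes: list[dict]) -> dict:
    """Count nodes and derive a short, stable hash of the capture."""
    payload = json.dumps(nodes, sort_keys=True, ensure_ascii=False).encode("utf-8")
    return {
        "total_nodes": len(nodes),
        "named_nodes": sum(1 for n in nodes if n.get("name")),
        "actionable_nodes": sum(1 for n in nodes if n.get("actionable")),
        "input_nodes": sum(1 for n in nodes if n.get("is_input")),
        "hash": hashlib.sha256(payload).hexdigest()[:8],
    }


def next_step(summary: dict, tried: set,
              min_named: int = 3, min_actionable: int = 1) -> str:
    """Pick the next retry; thresholds are illustrative guesses."""
    weak = (summary["named_nodes"] < min_named
            or summary["actionable_nodes"] < min_actionable)
    if not weak:
        return "proceed"
    # Prefer cheaper retries; maximizing the window is the last resort.
    for option in ("switch_backend", "full_tree"):
        if option not in tried:
            return option
    return "maximize_window"
```

Two captures with equal hashes can be treated as duplicates, which is the deduplication use mentioned in the notes.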

Better Error Hints

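  • E_PERMISSION: advise the user to run whoami /priv and check whether the current user's privileges match those of the target app; if they differ, rerun guib at the same privilege level (both administrator or both standard user).
  • E_NOT_FOUND: advise running guib get screen --full-tree --a11y-backend auto first to collect more debugging information, then trying the backends one by one in order (cdp -> ia2 -> msaa -> uia -> hwnd). The full tree can be exported with --json for offline analysis.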

About

A CLI tool that lets AI read GUI state and perform GUI actions.
