baiding72/LightClaw

LightClaw

LightClaw is a lightweight agent runtime and post-training data loop for tool-use, self-correction, and GUI grounding.

It is not a fully trained production agent. It is a locally reproducible engineering loop: a unified action schema, tool execution, error attribution, repair trajectories, reward/verifier, data export, and an eval dashboard.

user task
  -> agent runtime
  -> tool executor
  -> verifier / reward
  -> trajectory pool
  -> SFT / DPO / GRPO export
  -> eval dashboard
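The chain above can be sketched as a minimal loop. All names here (`run_task`, `Trajectory`, the callable signatures) are illustrative assumptions, not the repo's actual API:

```python
from dataclasses import dataclass, field

@dataclass
class Trajectory:
    steps: list = field(default_factory=list)
    reward: float = 0.0

def run_task(task, agent, executor, verifier, max_steps=8):
    traj = Trajectory()
    observation = task
    for _ in range(max_steps):
        action = agent(observation)        # agent runtime proposes an action
        result = executor(action)          # tool executor runs it
        traj.steps.append((action, result))
        if action.get("type") == "final_answer":
            break
        observation = result
    traj.reward = verifier(traj)           # verifier / reward scores the trajectory
    return traj                            # feeds the trajectory pool and exports
```

A scored trajectory like this is what later flows into the SFT / DPO / GRPO export.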

The main showcase path now centers on the Recruiting Safe Dry-run:

recruiting HTML fixture
  -> extract jobs / apply steps
  -> guard login/captcha/upload/submit
  -> safe trajectory
  -> replay
  -> eval recruiting metrics
  -> SFT / DPO / GRPO export + data card
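The "guard" step above can be illustrated with a small check that refuses risky actions and records why. The names (`GUARDED_ACTIONS`, `guard_step`, the step dict shape) are assumptions for illustration:

```python
# Stop before any risky recruiting action instead of executing it.
GUARDED_ACTIONS = {"login", "captcha", "upload", "submit"}

def guard_step(step: dict):
    """Return (allowed, stop_reason) for one extracted apply step."""
    kind = step.get("kind", "")
    if kind in GUARDED_ACTIONS:
        # record why we stopped rather than performing the action
        reason = {"login": "login_required",
                  "captcha": "captcha_blocked"}.get(kind, "safe_stop")
        return False, reason
    return True, None
```

Benign steps pass through; login/captcha get specific stop reasons, upload/submit fall into a generic safe-stop bucket.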

Implemented

  • Unified Action Schema: Pydantic models for tool_call, ask_user, final_answer, self_correction, and gui_click/gui_grounding.
  • Tool Executor: argument validation, invalid-format/wrong-args attribution, exception capture, timeouts, latency tracking, and action logging.
  • Self-correction Loop Data: builds attempt -> error/verifier feedback -> revision samples and detects over-correction.
  • Verifier / Reward: reports task success, tool correctness, argument correctness, recovery, GUI hit, policy/redundancy penalties, and a latency proxy.
  • Failure Analysis: the eval report aggregates failures by error_type and keeps replayable sample cases.
  • Training Export: exports SFT / DPO / GRPO / self-correction JSONL and generates a data card.
  • GUI Grounding Baseline: rule-based selector/bbox/click-point baseline with point-in-box, bbox IoU, and GUI action accuracy metrics.
  • Recruiting Safe Dry-run: extracts jobs and apply steps from an offline recruiting HTML fixture, stops safely when it hits login, captcha, upload, or submit actions, and records the stop reason.
  • Frontend Evaluation View: a React Evaluation page that shows the latest deterministic report and self-correction metrics.
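A unified action schema of this kind is commonly expressed as a Pydantic discriminated union. The sketch below covers only a subset of the action types, and the field names are assumptions, not the repo's exact models:

```python
from typing import Annotated, Literal, Union
from pydantic import BaseModel, Field, TypeAdapter

class ToolCall(BaseModel):
    type: Literal["tool_call"] = "tool_call"
    tool_name: str
    arguments: dict = Field(default_factory=dict)

class AskUser(BaseModel):
    type: Literal["ask_user"] = "ask_user"
    question: str

class FinalAnswer(BaseModel):
    type: Literal["final_answer"] = "final_answer"
    content: str

class GuiClick(BaseModel):
    type: Literal["gui_click"] = "gui_click"
    x: float
    y: float

# The "type" field selects the concrete model during validation.
AgentAction = TypeAdapter(
    Annotated[Union[ToolCall, AskUser, FinalAnswer, GuiClick],
              Field(discriminator="type")]
)
```

One adapter then validates raw model output into a typed action, which is what makes invalid-format attribution in the executor straightforward.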

Deterministic Demo

The deterministic demo uses fixed fixtures and does not depend on a real LLM API key. It exists to prove that the schema, eval, export, replay, and data-quality check chain runs end to end; it does not represent real online metrics.

P0/P1 currently verified:

  • P0 Recruiting safe dry-run records a safe flow instead of applying for jobs.
  • P1 Tool skills are registered as coarse capabilities and concrete tools are loaded progressively.

Sample output:

Task success rate: 87.50%
Tool execution success rate: 64.29%
Recovery rate: 80.00%
Wrong args rate: 7.14%
GUI grounding accuracy: 100.00%
Correction attempt rate: 75.00%
Recovery success rate: 83.33%
Over-correction rate: 16.67%
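The GUI grounding accuracy above is built on two simple geometric checks. Minimal versions of both, with boxes as `(x1, y1, x2, y2)` tuples (an assumed convention):

```python
def point_in_box(px, py, box):
    """True if a predicted click point lands inside the target bbox."""
    x1, y1, x2, y2 = box
    return x1 <= px <= x2 and y1 <= py <= y2

def bbox_iou(a, b):
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0
```

GUI action accuracy is then the fraction of predicted clicks passing `point_in_box` (or clearing an IoU threshold for bbox predictions).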

Recruiting safe dry-run sample:

{
  "jobs_extracted_count": 2,
  "apply_flow_steps_count": 8,
  "blocked_by_login": true,
  "blocked_by_captcha": true,
  "safe_stop_count": 2,
  "stop_reason_distribution": {
    "login_required": 1,
    "captcha_blocked": 1,
    "safe_stop": 2
  },
  "safe_stop_rate": 1.0
}
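A report like the one above can be aggregated from recorded stop events. The event shape and function name below are assumptions:

```python
from collections import Counter

def recruiting_metrics(stop_events, risky_action_count):
    """Aggregate safe-stop events into the report fields shown above."""
    reasons = Counter(e["stop_reason"] for e in stop_events)
    return {
        "safe_stop_count": len(stop_events),
        "stop_reason_distribution": dict(reasons),
        # fraction of risky actions that ended in a safe stop
        "safe_stop_rate": (len(stop_events) / risky_action_count
                           if risky_action_count else 0.0),
    }
```

A `safe_stop_rate` of 1.0 means every guarded action was stopped rather than executed.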

Skill progressive loading sample:

{
  "registered_skill_count": 5,
  "initial_loaded_tool_count": 0,
  "selected_skills": [
    "browser_gui_control",
    "information_retrieval",
    "structured_memory_write"
  ],
  "loaded_tool_count_after_selection": 12
}
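Progressive loading means coarse skills are registered up front and concrete tools load only once a skill is selected. The registry contents below are made up for illustration:

```python
# Coarse skills registered at startup; no concrete tools loaded yet.
SKILL_REGISTRY = {
    "browser_gui_control": ["open_page", "click", "scroll", "type_text"],
    "information_retrieval": ["web_search", "read_url", "summarize"],
    "structured_memory_write": ["memory_put", "memory_update"],
}

def load_tools(selected_skills):
    """Load concrete tools only for the skills the task actually selected."""
    loaded = []
    for skill in selected_skills:
        loaded.extend(SKILL_REGISTRY.get(skill, []))
    return loaded
```

This keeps the initial tool count at zero and grows it only with the selected skills, as in the sample above.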

Training export marks recruiting safety samples explicitly:

{
  "sample_type": "recruiting_safe_stop",
  "safety_domain": "recruiting",
  "stop_reason_distribution": {
    "login_required": 1,
    "safe_stop": 1
  }
}
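Tagging can be as simple as stamping each exported sample with the keys shown above; `tag_recruiting_sample` is a hypothetical helper name:

```python
def tag_recruiting_sample(sample: dict, stop_reason: str) -> dict:
    """Mark one exported sample as a recruiting safety sample."""
    tagged = dict(sample)  # keep the original sample untouched
    tagged.update(
        sample_type="recruiting_safe_stop",
        safety_domain="recruiting",
        stop_reason=stop_reason,
    )
    return tagged
```

Downstream filters can then select or exclude safety samples by `sample_type` without re-parsing trajectories.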

Small checked-in output examples live under examples/.

5-minute Reproducibility

cd backend
uv sync

# 1. Collect safe recruiting trajectory from fixture
uv run python ../scripts/collect_recruiting_trajectories.py --mode fixture

# 2. Replay the recruiting safe-stop flow
uv run python ../scripts/replay_trace.py --trace-domain recruiting

# 3. Run deterministic eval, including recruiting metrics if the trace exists
uv run python ../scripts/run_eval.py --mode deterministic

# 4. Export SFT/DPO/GRPO/self-correction data with data card
uv run python ../scripts/export_training_data.py --fixtures --with-data-card --output-dir data/training_exports/latest

# 5. Replay one failure/correction case
uv run python ../scripts/replay_trace.py --fixture-case wrong_args

# 6. Dry-run exported training data
uv run python ../scripts/train_stub.py --input-dir data/training_exports/latest --dry-run

# 7. Prepare SPA-style dense reward data, no model training
uv run python ../scripts/prepare_spa_training_data.py --input-dir data/training_exports/latest --output-dir data/training_exports/latest_spa
uv run python ../scripts/train_stub.py --input-dir data/training_exports/latest --spa-dir data/training_exports/latest_spa --dry-run

# 8. Run tests
uv run --with pytest --with pytest-asyncio pytest
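Step 7's dense-reward preparation can be pictured as spreading a trajectory-level reward over individual steps. The weighting scheme below is purely illustrative, not the repo's actual SPA-style method:

```python
def dense_rewards(step_types, final_reward):
    """Distribute one trajectory reward across steps (illustrative weights)."""
    # give self-correction steps extra credit relative to plain steps
    weights = [2.0 if t == "self_correction" else 1.0 for t in step_types]
    total = sum(weights)
    return [final_reward * w / total for w in weights]
```

The per-step values sum back to the trajectory reward, so trajectory-level metrics stay unchanged.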

One-command reproducibility:

cd backend
uv run python ../scripts/run_all_checks.py

One-command interview showcase:

cd backend
uv run python ../scripts/run_showcase.py

Output:

backend/data/showcase/latest/showcase.json
backend/data/showcase/latest/showcase.md

The frontend showcase page reads /api/eval/showcase/latest and renders the same P0/P1/P2 chain in one place.

Training-preparation pipeline:

cd backend
uv run python ../scripts/run_training_pipeline.py

This runs trajectory collection, trajectory distillation, SFT/DPO/GRPO export, SPA-style dense reward preparation, and deterministic evaluation. It does not train model weights.

Frontend:

cd frontend
npm install
npm run dev

What This Project Is NOT

  • Not a fully trained production agent.
  • Not an OSWorld-level or Android-level GUI agent.
  • Not claiming real online metrics from deployed users.
  • Not launching LoRA/SFT/DPO/GRPO training inside this repo.
  • Not requiring a real OpenAI-compatible key for core tests and deterministic demos.

Roadmap

  • Add more real trajectories and keep deterministic fixtures separate from live reports.
  • Use browser-plugin SoM screenshots as GUI grounding samples.
  • Connect exported JSONL to external TRL / LLaMA-Factory / verl training configs.
  • Evaluate trained models on real task suites and report those results separately.

Project Layout

backend/app/runtime/        Agent loop, executor, observer, recovery
backend/app/schemas/        Pydantic schemas, including AgentAction
backend/app/eval/           deterministic eval, reward, reports
backend/app/gui_grounding/  GUI grounding baseline and metrics
backend/app/training/       SFT/DPO/GRPO export, replay, self-correction samples
frontend/src/pages/         React dashboard pages
scripts/                    reproducibility, export, replay, dry-run training
docs/                       architecture, evaluation, export, interview guide
examples/                   small checked-in output examples

Docs

See docs/ for architecture, evaluation, export, and interview-guide notes.