Bundle2 Sprint W1→W5-1: anti-mock closure + path guards + LLM smoke + CLI hardening by Wool-xing · Pull Request #37 · Wool-xing/Test-Agent

Wool-xing · 2026-05-12T20:47:31Z

Summary

14 commits covering bundle2 sprint W1 through W5-1 of the V1.14.0-alpha calibration program. Bundles together the file-by-file fix list locked into project memory on 2026-05-13 (~210 findings, 26 sections, CP1-CP6 checkpoints).

Three thematic axes:

Anti-mock closure (W2/W3) · 3-layer wiring (RunnerResult.degraded source → orchestrator upstream_meta transfer → test_lead/bug_manager/report_generator decision consumers). LLM failures now surface as verdict: conditional with explicit warnings instead of silent green.
Path traversal / safe-by-default (W2-4 / W3-1 / W5-1) · resolve() + relative_to(allowed_root) guard for MCP servers (evidence_vault, test_orchestrator), env-var gate + try/finally cleanup for chaos_helper offensive utils.
First-mile UX (W4) · tagent doctor --llm-smoke single round-trip probe, tagent demo --real-llm with cost confirm + pre-flight smoke, REGISTRY auto-registration for backends + gateway, qwen provider prefix fix.

Commit map

| Commit | Topic |
| --- | --- |
| `7d72d3c` | bundle1 trust + legal cleanup (LICENSE / NOTICE / SECURITY normalized) |
| `6b250c1` | W1 install.sh glob + 4 code bugs (mobile_driver / mq_helper / media_validator / push_test) + dynamic registry test |
| `355fe0a` | W2 anti-mock closure (RunnerResult.degraded, EXPERT_IMPL_STATUS, test_lead 4-state consumer) |
| `6dbe504` | W2-4 evidence_vault path traversal guard |
| `925c5e7` | W2-5/W2-6 retroactive: httpx[http2] + conftest sys.path |
| `9529c3a` | W3-1 MCP path-traversal scan + test_orchestrator guard |
| `0a9be6d` | W3-2 bug-manager + report-generator consume `_degraded_upstream` |
| `798e6d5` | W3-3 test_lead.user_prompt LLM path consumes upstream_meta |
| `9b215b0` | W4-1 router qwen prefix (`dashscope`, not `openai`) |
| `f84d32d` | W4-4 backends REGISTRY auto-registration |
| `d34fdd1` | W4-4b gateway REGISTRY (same-pattern fix) |
| `7c2544d` | W4-2+W4-3 doctor --llm-smoke + demo --real-llm pre-flight |
| `63d9980` | W4-6 tagent.yml.example vaporware trim + pentest scope contract |
| `80ccfa0` | W5-1 chaos_helper env-var gate + cleanup + path/host validation |

Behavior change highlights

Anti-mock: RunnerResult.ok is no longer hardcoded True; semantics now ok=True/degraded=False (success) vs ok=True/degraded=True (mock fallback) vs ok=False (raised). Consumers (test_lead verdict, bug_manager priority, report-generator section) read _degraded_upstream and surface warnings to the user.
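A minimal sketch of how a consumer might read these states. Only the field names `ok`, `degraded`, and `error` come from this PR; the helper names `classify` and `verdict_for` are hypothetical, and the real RunnerResult likely carries more fields.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class RunnerResult:
    # Field names follow the PR text; everything else here is illustrative.
    ok: bool
    degraded: bool = False
    error: Optional[str] = None


def classify(result: RunnerResult) -> str:
    """Map the ok/degraded pair onto the three cases named above."""
    if not result.ok:
        return "failed"
    return "mock-fallback" if result.degraded else "success"


def verdict_for(upstream: Dict[str, RunnerResult]) -> str:
    """Return "go" only when every upstream result is a genuine success."""
    clean = all(classify(r) == "success" for r in upstream.values())
    return "go" if clean else "conditional"
```

With this shape, a single mock-fallback result anywhere upstream is enough to keep the verdict away from "go".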

Path traversal: MCP servers accepting file path arguments now require resolve().relative_to(allowed_root); out-of-root paths raise immediately.
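The guard can be sketched as follows, assuming a hypothetical helper named `ensure_under_root`; the exception type and message are also assumptions, since the PR only states that out-of-root paths raise immediately.

```python
from pathlib import Path


def ensure_under_root(candidate: str, allowed_root: Path) -> Path:
    """Resolve the candidate and reject it unless it sits under allowed_root."""
    # resolve() collapses ".." segments and follows symlinks, so the
    # comparison below is made on the real filesystem location.
    resolved = Path(candidate).expanduser().resolve()
    root = allowed_root.resolve()
    try:
        resolved.relative_to(root)
    except ValueError as exc:
        raise PermissionError(f"path_blocked: {resolved} is outside {root}") from exc
    return resolved
```

Resolving before the `relative_to` check is what defeats `../` sequences and symlink escapes; comparing raw strings would not.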

Safe-by-default for offensive utils: chaos_helper refuses all destructive ops unless TAGENT_CHAOS_AUTHORIZED=1; shift_clock requires the additional TAGENT_ALLOW_CLOCK_DRIFT=1. iptables / temp-file / process-kill ops now have try/finally cleanup, owner checks, and identifier validation. Same pattern queued for W5-2..W5-5 (api_security_scanner / db_test_helper / ai_adversarial / desktop_driver).
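A hedged sketch of the gate-plus-cleanup pattern. The env var name is the one from this PR; `require_chaos_authorized`, `temporary_rule`, and the `apply`/`revert` callables are hypothetical stand-ins for the real iptables and temp-file operations.

```python
import os
from contextlib import contextmanager


def require_chaos_authorized(env=os.environ) -> None:
    """Refuse destructive operations unless the operator opted in explicitly."""
    if env.get("TAGENT_CHAOS_AUTHORIZED") != "1":
        raise PermissionError(
            "destructive op refused: set TAGENT_CHAOS_AUTHORIZED=1 to opt in"
        )


@contextmanager
def temporary_rule(apply, revert, env=os.environ):
    """Apply a side effect and always revert it, even on KeyboardInterrupt."""
    require_chaos_authorized(env)  # the gate runs before anything is changed
    apply()
    try:
        yield
    finally:
        revert()  # cleanup runs on normal exit, exceptions, and interrupts
```

Checking the gate before `apply()` means a refused call leaves no state behind that would need cleaning up.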

New CLI surfaces:

tagent doctor --llm-smoke · single Hello → 你好 round-trip, reports provider/model/latency/tokens/cost. 5s post-install verification.
tagent demo --real-llm · warns ~$1-3 / 60-120s, requires typer.confirm, runs pre-flight llm_smoke (fail-fast on broken creds), -y / --skip-smoke escapes for CI.

Registry registration: runtime/backends/__init__.py and runtime/gateway/__init__.py now explicitly import all submodules so @register("name") decorators actually fire. Two of the four packages using this pattern had the bug (a 50% error rate across the project, documented in the inspiration library).
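The failure mode is easiest to see in a single-file sketch. `REGISTRY`, `register`, and `get_backend` are names used in this PR; their bodies and the `LocalBackend` class below are assumptions for illustration.

```python
from typing import Callable, Dict

REGISTRY: Dict[str, type] = {}


def register(name: str) -> Callable[[type], type]:
    """Class decorator that records the class in REGISTRY under name."""
    def decorator(cls: type) -> type:
        REGISTRY[name] = cls
        return cls
    return decorator


def get_backend(name: str) -> type:
    return REGISTRY[name]  # raises KeyError when nothing registered the name


# The decorator body only runs when the module containing the class is
# imported. In a package layout this class would live in its own submodule,
# and the package __init__ must import that submodule; otherwise the
# decorator never executes and REGISTRY stays empty.
@register("local")
class LocalBackend:
    pass
```

Because registration is an import side effect, a missing import fails silently until the first lookup.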

Known follow-ups (not in this PR)

W4-5: README hero CTA real-machine verification (git clone + install.sh + tagent demo) — requires user-driven Windows sandbox run.
W5-2..W5-5: remaining 4 HIGH utils (api_security_scanner / db_test_helper / ai_adversarial / desktop_driver) — same env-var gate + validate + cleanup pattern.
W7-1 (#48): unify the tagent.yml schema split-brain (init renderer schema ≠ safety.py schema). Surfaced during the W4-6 audit, tracked separately.

Test plan

7 required status checks pass
Existing tests still green (pytest runtime/tests/)
Anti-mock contract: provoke an LLM failure → verify verdict: conditional instead of green
Path traversal: send out-of-root path to evidence_vault / test_orchestrator → verify rejection
tagent doctor --llm-smoke works with stub provider (instant return) + with real provider (real round-trip)
tagent demo --real-llm -y --skip-smoke runs full 16-agent DAG without prompts
chaos_helper refuses without TAGENT_CHAOS_AUTHORIZED=1; accepts with it
tagent.yml.example parses with runtime/config/safety.py; pentest fields documented even though gate is V1.x

Notes for reviewer

Sprint locked in project memory; no findings rolled back. CP1-CP6 checkpoints honored: each anti-mock fix wired through source/transfer/consumer; each path-guard applied as a reusable pattern. Inspiration library cross-references at D:\项目文件\灵感库\ (degraded-anti-mock-3layer.md, path-traversal-guard.md, decorator-registry-init.md, config-schema-split-brain.md, hermes-gbrain-借鉴维度.md, 自动回检机制-V7.md).

- README/README.zh-CN: fix broken `pip install -e .` hero CTA (now uses install.sh); remove misleading "Self-test 100%" badge; rewrite numbers to "16 expert agents + 33 business skills + 3 meta-skills"; add Skill Lifecycle blockquote (A·B·C); drop charter fragment inline refs in product deliverables.
- NOTICE: row layout simplified; 3 MIT entries (local LICENSE files added to satisfy redistribution); private essence library entries removed; Python deps attribution completed.
- darwin-skill / karpathy-guidelines: local LICENSE files added (best-effort MIT per upstream README declaration).
- nuwa-skill: new subdir forked from alchaincyf/nuwa-skill (MIT); personal-promo assets and examples/ excluded; SKILL.md frontmatter renamed; X-link template footer removed; Modified-from-upstream notice added.
- darwin-skill: README/README_EN.md removed (50%+ personal promo); SKILL.md sample data replaced with project skill names; project-style language replaces upstream-author-specific phrasing.
- SECURITY: add "weaponized code usage boundary" section (5 attack-surface assets + operator authorization requirements + jurisdiction law refs); add "upstream attribution & idea-expression separation" section.
- discussions/HANDOFF*.md: git mv to discussions/internal/ + .gitignore + git rm --cached so .gitignore takes effect; future internal docs untracked.
- 01-测试主管: routing table extended with vertical experts (pentest-tester / automotive-tester); SECURITY.md authorization gate linked; ROADMAP-linked implementation-status blockquote added (anti-mock contract).
- Number alignment across 8 files: install.sh banner V1.0.0 -> V1.14.0-alpha; 02-/03- README; 00-项目导航 nine-section size summary; CONTRIBUTING; runtime/INDEX.
- NEW: ROADMAP.md, covering the V1.15-V1.20 alpha rollout for 6 LLM-driven minimum-viable expert implementations + V2.x Skill Lifecycle meta-tool adaptation + anti-mock commitment.

Status: 10 active experts (5 LLM-driven + 5 script-backed) + 6 in V1.x rollout. runtime/router anti-mock improvement scheduled for V1.15 sprint Day 0.

install.sh:
- replace hardcoded 14/13 agent/skill loops with find/glob → auto-pick all 16 agents + 33 business skills + 3 upstream-derived subdirs
- add section 8.5: deploy LICENSE / NOTICE / SECURITY / CONTRIBUTING / CODE_OF_CONDUCT / ROADMAP / README + README.zh-CN / CHANGELOG / VERSION to $PROJECT_ROOT (was missing: users installed but never got the legal + roadmap docs)

05-代码示例/mobile_driver.py:
- L88: dead ternary `if not use_cloud else _resolve_hub_url()` removed; use_cloud=True now raises if cloud credentials are absent (no silent fallback)
- L167 _parse_gfxinfo_fps: was counting PROFILEDATA section markers (0-3 per dumpsys, NOT frames); now parses framestats CSV rows under each PROFILEDATA section; TODO marker for V2.x real FPS from timestamp delta

05-代码示例/mq_helper.py:
- KafkaConsumerSimple.poll: was iterating the consumer and returning immediately, making the timeout `break` unreachable; switched to the KafkaConsumer.poll API with timeout_ms so the `timeout` parameter actually takes effect

05-代码示例/media_validator.py:
- L66 import: `from utils.visual_helper` → `from visual_helper` to align with the flat layout used by the rest of the 49 utils

05-代码示例/push_test.py:
- send_apns: replace requests (HTTP/1.1) with httpx.Client(http2=True); the APNs server requires HTTP/2, so the previous code was non-functional; docstring deps updated to require 'httpx[http2]'

runtime/tests/test_registry.py:
- baseline "14 experts + 13 skills" assertion removed
- now dynamically counts source files under 02-专家定义/ and 03-技能定义/ and asserts registry catalog size >= source files (regression-safe and growth-resilient: adding new experts/skills won't break the test)

Status: W1 done (6 files, bundle2 first batch). bundle1 + W1 committed locally, not yet pushed. Continue with W2 router/orchestrator anti-mock.

Three-tier anti-mock contract per ROADMAP.md V1.15 Day 0:

W2-1 runtime/orchestrator/agents/base.py: degraded signal source
- AgentRunner.run() ok=True hardcode removed
- 4-state ok/degraded semantics:
  * stub/mock mode → ok=True, degraded=True (selftest fallback allowed)
  * real LLM + JSON ok → ok=True, degraded=False (genuine output)
  * real LLM + JSON parse error → ok=False, degraded=True (LLM responded but malformed)
  * exec LLM exception fallback → ok=False, degraded=True (no longer silently green)
- RunnerResult adds degraded: bool field
- logger.warning → logger.error on LLM failure (no longer treated as routine)
- LLM error captured into RunnerResult.error for downstream consumption

W2-3 runtime/orchestrator/adapters/experts.py + runtime/router/router.py: anti-mock at routing layer
- EXPERT_IMPL_STATUS dict added (16 experts active/rollout per ROADMAP V1.15-V1.20)
  * active: 5 LLM-driven + 5 script-backed = 10
  * rollout: env/mobile/visual/system/pentest/automotive = 6
- EXPERT_SCRIPT_MAP extended with pentest-tester + automotive-tester (None)
- execute_node short-circuits rollout experts: returncode=2 + clear stderr pointing to ROADMAP.md; replaces the previous no-op fallback that was silently returning ok=True
- router._validate_against_catalog flags rollout experts in the issues list, triggering the existing confidence downgrade pathway

W2-2.5 runtime/orchestrator/agents/test_lead.py + base.py + experts.py: anti-mock at decision layer (closure)
- RunnerContext adds upstream_meta: dict[str, dict] field carrying ok/degraded/error per upstream expert
- experts.py adds _upstream_meta global cache; reset_upstream_cache clears both _upstream_outputs and _upstream_meta; execute_node populates _upstream_meta[name] alongside _upstream_outputs[name]
- test_lead.mock_output() now inspects ctx.upstream_meta:
  * any upstream degraded → verdict forced to "conditional" (never "go")
  * upstream errors enumerated into known_risks
  * _degraded_upstream field surfaced for downstream report/bug-manager

Outcome: when the LLM fails, the stub provider is engaged, or rollout experts are touched, the chain (degraded source → upstream_meta link → test-lead consumer) now guarantees the final verdict can no longer be a false "go". Skin-in-the-game contract (CHARTER §10) honored in code, not just docs.

Known gaps (W3 backlog):
- bug-manager / report-generator do not yet consume _degraded_upstream
- test_lead.user_prompt does not surface upstream degraded to real LLM context
- _upstream_meta cross-thread race (baseline T.1.8) still present
- W2-2 orchestrator no-op fallback no longer triggers post-W2-3 (deprioritize)
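The four states from W2-1 and the ok/degraded/error metadata handed to consumers can be sketched as below. This is an illustration under assumptions: `run_agent` and `to_upstream_meta` are hypothetical names, the LLM call is injected as a plain callable, and the real AgentRunner.run() is certainly more involved.

```python
import json
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class RunnerResult:
    ok: bool
    degraded: bool = False
    output: Optional[dict] = None
    error: Optional[str] = None


def run_agent(call_llm: Optional[Callable[[], str]], mock_output: dict) -> RunnerResult:
    """Produce one of the four ok/degraded states listed for W2-1."""
    if call_llm is None:
        # Stub/mock mode: usable output, but flagged as degraded.
        return RunnerResult(ok=True, degraded=True, output=mock_output)
    try:
        raw = call_llm()
    except Exception as exc:
        # LLM call failed: fall back to mock output, never report green.
        return RunnerResult(ok=False, degraded=True, output=mock_output,
                            error=f"llm_error: {exc}")
    try:
        return RunnerResult(ok=True, degraded=False, output=json.loads(raw))
    except json.JSONDecodeError as exc:
        # The LLM answered, but the payload was malformed.
        return RunnerResult(ok=False, degraded=True, output=mock_output,
                            error=f"json_error: {exc}")


def to_upstream_meta(results: Dict[str, RunnerResult]) -> Dict[str, dict]:
    """Build the per-expert ok/degraded/error mapping that consumers read."""
    return {
        name: {"ok": r.ok, "degraded": r.degraded, "error": r.error}
        for name, r in results.items()
    }
```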

runtime/mcp/evidence_vault/server.py:
- Add _validate_evidence_path() helper enforcing that the path must resolve under project_root; rejects /etc/passwd, ~/.ssh/id_rsa, ~/.aws/credentials etc.
- tool_upload_evidence_path() now calls the helper; on rejection it returns {"error": "path_blocked: ..."} and logs a warning (audit trail)
- Optional extension via TAGENT_EVIDENCE_EXTRA_DIRS env (comma-separated absolute paths) for CI scenarios that need /tmp/ci-artifacts uploads

Closes baseline T.1.11 / SA5 F7. Independent P0 security fix outside the W2 anti-mock chain. Pattern reusable for other MCP servers exposing file-path parameters (W3 backlog: scan defect_tracker, knowledge_base, protocol_adapter, test_orchestrator for similar paths).

Known gaps (W3):
- project_root is broad; .env / .git / id_rsa under project_root still reachable. Tighten to workspace/ subtree + sensitive-filename blacklist.

W2-5 04-配置文件/requirements.txt:
- Uncomment httpx + add [http2] extras
- W1-6 changed push_test.py send_apns() to use httpx.Client(http2=True), but httpx was commented out in requirements.txt, so a deployment would hit ImportError at import time. It is now actually installed, with HTTP/2 support.

W2-6 04-配置文件/conftest.py:
- Add _UTILS_CANDIDATES list injecting both deployed ($PROJECT_ROOT/utils/) and source-repo (../05-代码示例/) paths into sys.path
- Without this, the 49 utils' flat-style imports (e.g. `from api_retry_util import call_with_retry` in zentao_bug_manager / `from visual_helper import compare_images` in media_validator) all silently fail at runtime; pytest appeared green only because no test imported them
- This is the missing wiring that made W1-5's import-path fix meaningful: W1-5 alone touched 1 line; W2-6 makes the entire utils flat-import pattern work in both dev and deployed environments

Pattern: retroactive review (V6 automatic retro-check mechanism, 自动回检机制) caught two cross-link gaps that single-file fixes had introduced. Documented in 灵感库/通用准则/自动回检机制.md as a reusable pattern.
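The W2-6 wiring could look roughly like this. It is a sketch under assumptions: `inject_utils_paths` is a hypothetical helper, and the real conftest.py builds its `_UTILS_CANDIDATES` list from the two locations named above.

```python
import os
import sys
from pathlib import Path


def inject_utils_paths(candidates, path_list=None):
    """Prepend each existing candidate directory to sys.path exactly once."""
    target = sys.path if path_list is None else path_list
    added = []
    for candidate in candidates:
        resolved = str(Path(candidate).resolve())
        # Skip directories that are missing (e.g. the deployed layout when
        # running from the source repo) and ones that are already present.
        if os.path.isdir(resolved) and resolved not in target:
            target.insert(0, resolved)
            added.append(resolved)
    return added
```

Listing both layouts and skipping the missing one is what lets the same conftest serve the source repo and a deployed project.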

Scanned 5 MCP servers for path-receiving tools (evidence_vault already fixed by W2-4):
- defect_tracker/server.py: 0 file ops (DB only) ✓
- knowledge_base/server.py: 0 file ops (DB+pgvector only) ✓
- protocol_adapter/server.py: 0 file ops ✓
- compliance_checker/server.py: already has guard at L36-48 (regex + relative_to) ✓
- test_orchestrator/server.py: ❌ FIX HERE. _build_artifact(target) accepted arbitrary paths and called parse_path(p), which reads .md/.pdf/.docx bytes from disk. An LLM or external MCP client could pass '/etc/passwd' or '~/.ssh/id_rsa' and get the contents back via the artifact text.

Fix in test_orchestrator._build_artifact:
- After the p.exists() check, resolve(p) + relative_to(project_root)
- In-tree paths still go to parse_path (legitimate use case: PRD files under workspace/, docs/, etc.)
- Out-of-tree paths fall through to parse_text (treats target as a string, not a file) + logger.warning audit trail
- An exception in the guard logic also degrades to parse_text (fail-closed)

Pattern reused from W2-4 evidence_vault. Documented in 灵感库/工程模式/path-traversal-guard.md for cross-project reuse.

W3-1 outcome: 6 MCP servers audited, 1 fix landed (test_orchestrator), 1 pre-existing guard verified (compliance_checker), 4 confirmed no-op.
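Unlike the evidence_vault guard, this one degrades rather than raises. A minimal sketch of that routing decision, assuming a hypothetical `classify_target` helper that returns which parser the target should go to:

```python
import logging
from pathlib import Path

logger = logging.getLogger(__name__)


def classify_target(target: str, project_root: Path) -> str:
    """Return "file" for an existing in-tree path, otherwise "text"."""
    try:
        p = Path(target).expanduser()
        if not p.exists():
            return "text"  # not a file at all: treat the target as literal text
        p.resolve().relative_to(project_root.resolve())
        return "file"
    except ValueError:
        # Existing file outside the project tree: never read it from disk.
        logger.warning("out-of-tree path treated as text: %s", target)
        return "text"
    except Exception:
        # Any failure inside the guard also avoids the disk read.
        logger.warning("path guard error, treating target as text: %s", target)
        return "text"
```

Falling back to text keeps plain-string targets working while ensuring an out-of-tree file is never opened.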

…pstream

Complete the anti-mock closure first opened by W2-1/W2-3/W2-2.5:
- W2-1 added RunnerResult.degraded
- W2-3 EXPERT_IMPL_STATUS surfaced rollout experts
- W2-2.5 test_lead consumed upstream_meta degraded
- W3-2 (this): the remaining two consumers also honor the signal

runtime/orchestrator/agents/bug_manager.py:
- mock_output() inspects ctx.upstream_meta; if any upstream is degraded, it inserts a P0 "测试数据不完整" ("test data incomplete") warning bug at the head of the bug list, labeled `degraded` + `test-coverage-insufficient`, with summary.p0 set to 1. Downstream BugTracker (zentao/jira/etc.) now sees this warning bug before any functional bugs, alerting reviewers that the dataset is incomplete.
- user_prompt() (real-LLM path) prepends a "上游 degraded 警示" ("upstream degraded warning") block instructing the LLM to insert the same P0 warning bug. Both stub and real-LLM paths now produce the warning.
- Output exposes a _degraded_upstream list for downstream consumers.

05-代码示例/generate_report.py:
- generate_test_report() reads data["_degraded_upstream"]; if non-empty, it inserts a "⚠ 数据完整性警示" ("data integrity warning") section between the executive summary and the bug statistics: an orange warning, a bullet list of degraded experts, and a red "不应基于此报告直接发版" ("do not release directly on the basis of this report") note.
- _degraded_upstream comes through test_lead's mock_output (already in W2-2.5); the report renderer previously never expected the field, and now it does.

Anti-mock closure status across the LLM consumers:
- test_lead.mock_output ✓ (W2-2.5)
- test_lead.user_prompt: pending W3-3
- bug_manager.mock_output ✓ (W3-2a)
- bug_manager.user_prompt ✓ (W3-2a)
- generate_report.py ✓ (W3-2b)

Pattern documented in 灵感库/工程模式/degraded-anti-mock-3layer.md.
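A rough sketch of the bug_manager side. The title string, labels, and `_degraded_upstream` key come from the commit message; the function name `with_degraded_warning` and the bug-dict shape are assumptions. The commit says summary.p0 is set to 1; counting P0 entries here is an assumption that yields the same value when the warning is the only P0 bug.

```python
from typing import Dict, List


def with_degraded_warning(bugs: List[dict], upstream_meta: Dict[str, dict]) -> dict:
    """Put a P0 data-completeness warning at the head of the bug list."""
    degraded = sorted(n for n, m in upstream_meta.items() if m.get("degraded"))
    bug_list = list(bugs)  # copy so the caller's list is not mutated
    if degraded:
        bug_list.insert(0, {
            "priority": "P0",
            "title": "测试数据不完整",  # literal title from the commit message
            "labels": ["degraded", "test-coverage-insufficient"],
            "detail": "degraded upstream experts: " + ", ".join(degraded),
        })
    p0 = sum(1 for b in bug_list if b.get("priority") == "P0")
    return {
        "bugs": bug_list,
        "summary": {"p0": p0},
        "_degraded_upstream": degraded,
    }
```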

W2-2.5 closed the anti-mock loop on test_lead.mock_output (stub provider path). W3-3 closes the remaining gap: the real-LLM path goes through user_prompt(ctx), which previously did not surface upstream degraded state to the model, so a stub-failure cascade could still produce a "go" verdict via the LLM despite mock_output guarding against it.

runtime/orchestrator/agents/test_lead.py:
- user_prompt() now inspects ctx.upstream_meta for any degraded upstream
- When a degraded upstream exists, it prepends a "⚠ 上游 degraded 警示 (强制约束)" ("upstream degraded warning, hard constraints") block to the prompt with 5 hard constraints:
  1. verdict cannot be "go"
  2. verdict must be "conditional" or "no-go"
  3. known_risks must enumerate each degraded expert
  4. rationale must explain incomplete data
  5. fallback_plan must mention V1.x rollout

Combined with W2-1 (degraded signal source), W2-3 (router rollout filter), W2-2.5 (test_lead.mock_output consumer), and W3-2 (bug_manager + generate_report consumers), the anti-mock closure is now 100% wired across all 5 LLM consumers in both stub and real-LLM paths.

ROADMAP.md V1.15 Day 0 anti-mock commitment: COMPLETE.

Pattern documented in 灵感库/工程模式/degraded-anti-mock-3layer.md.
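The prompt change can be sketched as a pure function. The block title and the five constraints are taken from the commit message; `build_user_prompt` and `HARD_CONSTRAINTS` are hypothetical names, and the exact wording of the real block is not shown in this PR.

```python
from typing import Dict

HARD_CONSTRAINTS = (
    'verdict cannot be "go"',
    'verdict must be "conditional" or "no-go"',
    "known_risks must enumerate each degraded expert",
    "rationale must explain incomplete data",
    "fallback_plan must mention V1.x rollout",
)


def build_user_prompt(base_prompt: str, upstream_meta: Dict[str, dict]) -> str:
    """Prepend the hard-constraint block when any upstream expert is degraded."""
    degraded = sorted(n for n, m in upstream_meta.items() if m.get("degraded"))
    if not degraded:
        return base_prompt  # clean upstream: the prompt is left untouched
    lines = [
        "⚠ 上游 degraded 警示 (强制约束)",
        "Degraded upstream experts: " + ", ".join(degraded),
    ]
    lines += [f"{i}. {rule}" for i, rule in enumerate(HARD_CONSTRAINTS, start=1)]
    return "\n".join(lines) + "\n\n" + base_prompt
```

A prompt-level constraint is advisory to the model, so a real implementation would probably still validate the returned verdict afterwards.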

runtime/router/llm_client.py:16:
- PROVIDER_MODEL_MAP["qwen"] = "openai/qwen-plus" → "dashscope/qwen-plus"
- LiteLLM routes qwen models via dashscope/, not openai/. The wrong prefix caused LiteLLM to attempt the OpenAI-compatible endpoint with a Qwen model name, resulting in 401/404 from upstream.

Retroactive review of v1 SA4 F23 finding:
- F23 also flagged "claude-sonnet-4-6 doesn't exist". RE-VERIFIED: claude-sonnet-4-6 IS a real Anthropic model ID (current Claude Sonnet family). The v1 baseline misjudgment was based on an outdated model list. NOT changing the claude line.

settings.py:32 llm_model default = "claude-sonnet-4-6": kept (correct value; dual-source coupling with PROVIDER_MODEL_MAP is baseline F37 backlog, lower priority than this 1-line correctness fix).

Closes anti-blocker for users configuring DASHSCOPE_API_KEY. Without this, the first real-LLM call with provider=qwen fails with 401/404 from upstream.

Inspired-library: D:\项目文件\灵感库\Test-Agent_决策档案\2026-05-13_全局视角10条优化方向.md (W4-1 is first of 10).

@register

runtime/backends/__init__.py:
- Add explicit imports of 7 backend modules (local, docker, ssh, singularity, modal, daytona, vercel_sandbox) to trigger their @register("name") decorators at package import time.

Before: REGISTRY was permanently empty because the @register decorators on the backend subclasses never fired. base.py was the only module imported, and its REGISTRY dict was never populated.

After: REGISTRY contains all 7 backends; get_backend("local") / get_backend("docker") / etc. resolve correctly. The tagent.yml `backends: [local, docker]` configuration finally takes effect.

Closes baseline T.1.1 / v1 SA4 F1. First-mile blocker for users who configure backends in tagent.yml: without this, startup raised KeyError on the first backend lookup.

Inspired-library item #6 (Test-Agent_决策档案/2026-05-13_全局视角10条优化方向.md).

@register

…as W4-4)

runtime/gateway/__init__.py:
- Import runtime.gateway.platforms to trigger the @register decorators of the 8 platform subdirs on package import
- Also export get_platform from base (was missing from the public API per v1 SA4 F15)

Before: `from runtime.gateway import get_platform; get_platform("feishu")` returned None because REGISTRY was empty. platforms/__init__.py was never imported through the gateway package.

After: REGISTRY is populated with 8 platforms (dingtalk/discord/email/feishu/slack/telegram/webhook/wechat); get_platform() resolves; notification delivery (main charter 主宪章 §36, 6 channels) finally wires up end-to-end from configuration to runtime.

Discovered via V7 proactive scan after W4-4 fixed the same pattern in backends/__init__.py. Out of 4 packages using the @register pattern:
- agents/__init__.py ✓ already correct
- gateway/platforms/__init__ ✓ already correct
- backends/__init__.py ❌ W4-4 fixed
- gateway/__init__.py ❌ W4-4b fixed (this commit)

50% error rate on this idiom → documented as an anti-pattern in 灵感库/工程模式/decorator-registry-init.md for cross-project reuse.

Closes baseline v1 SA4 F15 + F39.

W4-2: new runtime/healthcheck/llm_smoke.py · single 'Hello → 你好' round-trip, reports provider/model/latency/tokens/cost via litellm.completion_cost. Exposed as the `tagent doctor --llm-smoke` flag (alongside existing --agents / --probe). Resolves first-mile pain: the user can verify LLM connectivity in 5s post-install instead of running the full 16-agent --probe (~$0.5).

W4-3: `tagent demo --real-llm` switch · default is still stub for the 0-config flow. With --real-llm it warns about cost (~$1-3 / 60-120s), requires typer.confirm (skip via -y), and runs a pre-flight llm_smoke probe before DAG execution (fail-fast if the LLM is unreachable, saving cost on broken creds). --skip-smoke escapes the pre-flight if needed. Step labels are also dynamic (real LLM · ~$1-3 vs stub LLM · 0 成本, i.e. zero cost).

Closes user-facing gap: stub demo passing ≠ real LLM working.

…scope contract

Audit removed config fields with 0 runtime readers:
- subagent.* section (pool_size / allow_aux_provider_override · neither read anywhere)
- profile.* section (industry / default_depth_level / compliance_profiles · 0 hits)
- curator.interval_hours / auto_archive_skills (only `enabled` is gated)
- scheduler.max_concurrent_jobs (not read)
- backends.allow_arbitrary_exec (not read)
- eval.max_captures_per_day (not read)
- tutor.* section (CLI flag controls i18n/verbosity; yml not consumed)

Kept only fields wired via runtime/config/safety.py gates: scheduler.enabled / cron_jobs_allowed, curator.enabled, eval.capture, backends.allowed, gateway.enabled_platforms, bug_tracker.default / enabled, destructive_ops.*.

Added pentest.* section as a legal contract placeholder (authorized / scope_in / scope_out / authorization_record / evidence_dir). EXPERT_IMPL_STATUS already blocks the pentest expert at rollout; the yml gate wires up in V1.x.

Top-of-file SECURITY block separates the legal contract (pentest) from technical switches. The user-first config no longer promises capabilities the project lacks.

W4-6 audit surfaced a split-brain between this example schema and the schema produced by `tagent init` (project/router/skills/...). Tracked as W7 task #48.

…lidation

Default refuse for all destructive ops; explicit opt-in via env vars:
- TAGENT_CHAOS_AUTHORIZED=1 · base gate for all chaos functions
- TAGENT_ALLOW_CLOCK_DRIFT=1 · separate gate for shift_clock (breaks TLS/Kerberos)

Per-function hardening:
- block_outbound: try/finally cleanup ensures the iptables DROP rule is deleted even on KeyboardInterrupt/timeout; a cleanup failure logs an explicit manual recovery cmd. sudo -n (no-prompt) so missing sudo fails fast instead of hanging. Host arg validated as IPv4 or RFC-compliant hostname (rejects shell metachars and 'foo -j ACCEPT' iptables-arg-injection style payloads).
- stress_disk: file_path defaults to tempfile.mkstemp; an explicit path must resolve under the system tempdir (path traversal guard). Always unlink in finally.
- kill_process: rejects PID < 100; psutil owner check refuses cross-user kill.
- kill_by_name: scans psutil with username + PID≥100 filter; only own processes.
- kill_pod: pod_name + namespace validated against k8s name regex; subprocess timeout=60s.
- shift_clock: double-gate (CHAOS_AUTHORIZED + CLOCK_DRIFT_ALLOWED) + ±86400s range cap.
- All functions gained input range validation (cores/duration/size bounds).

Closes CI-runner residual risk: the prior version could leave iptables DROP rules, killed system processes, drifted clocks, or files written to /etc/* paths after a single invocation.

The utils package must not import runtime.config.safety (utils independence rule), so the gate is enforced via env var rather than a yml gate. The same gate pattern (env var + validate + try/finally) will apply to the remaining W5-2..W5-5 HIGH utils (api_security_scanner / db_test_helper / ai_adversarial / desktop_driver).
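The host-argument check for block_outbound might look like the following. This is a sketch, not the helper's actual code: `is_valid_host` is a hypothetical name, and the hostname rule used here (RFC 1123 style labels, total length up to 253, non-numeric final label) is an assumption about what "RFC-compliant" means in the commit.

```python
import ipaddress
import re

# One DNS label: letters, digits, hyphens; no leading or trailing hyphen.
_LABEL = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def is_valid_host(host: str) -> bool:
    """Accept a dotted IPv4 address or a plain hostname, nothing else."""
    try:
        ipaddress.IPv4Address(host)
        return True
    except ValueError:
        pass
    if not host or len(host) > 253:
        return False
    labels = host.split(".")
    if labels[-1].isdigit():
        return False  # looks like a malformed IPv4 address, not a hostname
    # fullmatch (not match) so trailing newlines or spaces cannot slip through
    return all(_LABEL.fullmatch(label) for label in labels)
```

Since the value ends up as an argument to an iptables command, an allowlist of characters is safer than trying to enumerate dangerous ones.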

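Post-W5-sprint stage 3 documentation backstop: SECURITY.md gains a new section listing all 3 utils that are not weaponized but are gated (chaos / db_test / desktop), keeping them distinct from the "weaponized code usage boundary" section. It explains the authorization scope and how it relates to the responsibility for isolation from production environments.

PR references: chaos_helper #37 / db_test_helper #41 / desktop_driver #44.

Pattern: env gate + opt-in kwarg + platform gate + input allowlist (distilled from 5 PRs). A summary of the pattern has been synced privately to the inspiration library (D:\项目文件\灵感库\工程模式\utils-env-var-gate-pattern-v2.md, not committed to the repo).

Co-authored-by: xiaoxing0135 <706015750@qq.com>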

…, journey mapping, multi-region monitor

- #34: runtime/marketplace/discovery.py: importlib.metadata entry_points for third-party agent/skill/backend registration (group=tagent)
- #35: data_synthesizer.py: PII auto-detection (email/phone/id/ip/credit_card) + deterministic masking + random subset extraction
- #36: runtime/observability/apm_export.py: Datadog + Grafana dashboard JSON export (pass rate, MTTD/MTTR, expert health, flaky candidates)
- #37: runtime/intelligence/journey_mapper.py: failure→business journey impact mapping (Registration/Login/Payment/Profile/...)
- #38: .github/workflows/synthetic-monitor.yml: scheduled multi-region smoke test (every 6h, 4 regions)

155 tests pass. 9/9 DAG demo ok. 🎉 38/38 MASTER_PLAN items complete.

* fix: correct setuptools package discovery for editable install

  `where = ["."]` with `include = ["runtime*"]` couldn't find the runtime package because the runtime directory IS the package root (runtime/__init__.py is directly in .). Changed where to `[".."]` so setuptools scans the parent directory and finds `runtime/` as a package.

  Before: `pip install -e .` produced an empty MAPPING and `import runtime` failed.
  After: `import runtime` works, `tagent demo` completes all 4 steps.

* feat: add --version flag to tagent CLI

  Users expect `tagent --version` to print version info. Added a callback that prints "Test-Agent Runtime v1.32.0" when --version is passed.

* fix: auto-generate smoke PRD fixture when missing in demo

  Previously `tagent demo` step 3 would hard-fail with "fixture missing" if examples/_smoke_prd.md was deleted from disk. Now it auto-generates the fixture from an embedded template, showing a warning instead. This prevents demo breakage when the examples/ directory is accidentally cleaned or the user runs demo outside the repo root.

* feat: english-ify tagent CLI help text and user-facing output

  Converted all CLI command descriptions, option help text, and user-facing console output from Chinese to English for international accessibility. Internal code comments, fixture data, and workspace paths unchanged.

* feat: english-ify tagent CLI help text and user-facing output

  Convert CLI command descriptions, option help text, and user-facing console output from Chinese to English. Updated related tests. Includes: config subcommand help, demo flow output, selftest/doctor messages, init/export descriptions.
* chore: bump version 1.32.0 → 1.32.1 + fix CONTRIBUTING.md stale 33→32

  - Version number synced to 1.32.1 across the project (17 files)
  - CONTRIBUTING.md: 16/33/49 → 16/32/49 (skill count aligned with the actual `-eq 32` check in pre-commit/CI)
  - CHANGELOG: added v1.32.1 entry

* fix: security hardening (shell injection, hardcoded creds, API auth, silent failures)

  CRITICAL fixes:
  - backends/local.py: create_subprocess_shell → create_subprocess_exec (CWE-78)
  - backends/ssh.py: cat {path} → SFTP read; shlex.quote(cwd/env); known_hosts=()
  - config/settings.py: remove default db_url/password creds; api_host→127.0.0.1; add api_auth_token
  - api/main.py: bearer auth middleware (gated by TAGENT_API_AUTH_TOKEN); CORS restricted to localhost; file upload max 50MB + extension allowlist

  Silent failure fixes:
  - api/main.py: except Exception:continue → catch specific + logger.warning (list_history/dashboard); logger.exception in background thread; threading.Lock on _run_results
  - api/deps.py: persistence fail → logger.error; status persist DEBUG→WARNING; artifact read fail → [READ_ERROR] marker
  - api/parsers.py: PDF/DOCX extract fail → [PARSE_ERROR] marker
  - router/retrieval.py: retrieval fail DEBUG→WARNING
  - 05-代码示例/api_retry_util.py: bare except pass → logger.debug

  .gitignore hardening:
  - Add workspace/测试报告/, workspace/feedback/, workspace/自动化脚本/
  - Add runtime/workspace/, runtime/web/tsconfig.tsbuildinfo
  - Add docs/审查报告/, docs/参考库/, docs/decisions/, archive/
  - Remove 4 tracked test report .docx from git

* fix: utils security hardening (owner check, XML escape, WS leak, CI pin)

  - chaos_helper.py: kill_process with psutil absent now raises RuntimeError instead of skipping the owner check
  - i18n_checker.py: bare except Exception → specific (UnicodeDecodeError, PermissionError, OSError) + logger.warning
  - miniprogram_runner.py: WebSocket close wrapped in try/finally to prevent connection leak
  - protocol_helper.py: SOAP body_xml escaped with xml.sax.saxutils.escape() to prevent XML injection
  - ci.yml: pin ludeeus/action-shellcheck@master → @2.0.0
  - install.sh: add security note recommending git clone over curl|bash

* chore: fix pre-commit deprecated default_stages commit → pre-commit

* chore: bump version 1.32.1 → 1.32.2

  Version number synced across the project + CHANGELOG: added v1.32.2 security-hardening entry

* refactor: _stub_response dispatch table + fuzzer ALL_PAYLOADS hoist + bump 1.32.3

  - router/llm_client.py: 77-line if/elif chain → _STUB_TARGETS table (8 entries)
  - fuzzer.py: sum(PAYLOAD_LIBRARY.values(), []) hoisted to module-level ALL_PAYLOADS

* docs: honesty pass (remove marketing numbers, clarify vision skills, drop internal references)

  - README: 8640 combos → ~12 CI-validated; 95% aspirational → removed; 32 skills → 30 active + 2 vision
  - 00-项目导航: 9x main charter (主宪章) §X → plain descriptions (external contributors don't know charter section numbers)
  - ROADMAP: 3x main charter (主宪章) references removed

* refactor: split overlong functions: generate_report (143→30) + mobile_driver (107→55)

  - generate_report.py: extract _write_docx_header/_summary/_degraded_warning/_bugs/_performance/_risks helpers
  - mobile_driver.py: extract _build_monkey_cmd + _analyze_monkey_log helpers

* chore: bump version 1.32.3 → 1.32.4

  Phase 1+2 wrap-up: honest numbers + internal reference cleanup + long-function splitting

* refactor: split CLI/main.py (680→39 lines) into 8 command modules

  - runtime/cli/_shared.py: kernel, console, helpers, fixtures
  - runtime/cli/commands/run.py: run + plan
  - runtime/cli/commands/catalog.py: catalog
  - runtime/cli/commands/doctor.py: doctor
  - runtime/cli/commands/selftest.py: selftest
  - runtime/cli/commands/market.py: search + list + install + uninstall + verify
  - runtime/cli/commands/demo.py: demo
  - runtime/cli/commands/init.py: init
  - runtime/cli/commands/export.py: export

  Pure mechanical split with no logic changes. 128 tests pass.
* test: add 20 core smoke tests (CLI commands, API auth, build_artifact, catalog)

  - test_cli_commands.py (5): all 13 commands registered, --version, catalog, doctor, --help
  - test_api_auth.py (6): health public, auth middleware blocks/allows, CORS headers
  - test_build_artifact.py (4): url/file/text input parsing
  - test_catalog.py (5): expert/skill counts and field validation

* chore: bump version 1.32.4 → 1.32.5

  CLI split + 20 smoke tests + CHANGELOG

* fix: flaky test_execute_node_allows_production_skill (reset catalog/settings cache per test)

  conftest _env_isolation now calls get_catalog(refresh=True) + resets the settings cache to prevent cross-test state pollution from modules that create Kernel() at import time.

* fix: on_failure=skip now correctly excludes node from failure count

  - tasks.py: skip nodes set summary.skipped=True, no longer counted as failed
  - flows.py: track skipped list separately, include in summary.skipped
  - direct.py: same skip tracking for direct executor path

* feat: Phase 3 engine hardening (self-healing, retry, circuit breaker, skip fix, fixture isolation)

  - #9: runtime/self_healing/ (retry.py + locator_store.py): exponential-backoff retry wrapper for subprocess/LLM errors. scripts.py subprocess.run + direct.py _run_node both use with_retry().
  - #10: direct.py executor-level retry: resubmits _run_node up to 2 extra times with 2^attempt backoff on unexpected exceptions.
  - #11: on_failure=skip nodes now set skipped=True, excluded from failure count. flows.py + direct.py track skipped separately.
  - #12: 04-配置文件/conftest.py test_data + browser_context session→function scope. test_data uses tmp_path to avoid parallel file collisions.
  - #13: MAX_FAILURES=3 circuit breaker in flows.py + direct.py. DAG progress logging per node. tasks.py timeout_seconds=3600.

  148 tests pass. 9/9 DAG demo ok.
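The retry behaviour described in #9 and #10 can be sketched as below. `with_retry` is the name used in the commit, but this signature, the one-second base delay, and the injectable `sleep` are assumptions made so the sketch can be exercised without waiting.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retry(fn: Callable[[], T], max_retries: int = 2,
               base_delay: float = 1.0,
               sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, retrying up to max_retries extra times with 2^attempt backoff."""
    attempt = 0
    while True:
        try:
            return fn()
        except Exception:
            if attempt >= max_retries:
                raise  # retries exhausted: surface the last error unchanged
            sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
            attempt += 1
```

With the defaults a call is attempted at most three times, which matches the "up to 2 extra times" wording in #10.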
* feat: Phase 4 test intelligence: dashboard, readiness score, flaky trends, impact analysis, traceability

  - #14: runtime/observability/dashboard.py: 3-row layout (decision→diagnostic→action) with MTTD/MTTR, expert heatmap, flaky candidates, env health, action items. api/main.py /dashboard endpoint rewired to new builder.
  - #15: runtime/orchestrator/release_readiness.py: weighted scoring (smoke×0.4+regression×0.3+perf×0.2+security×0.1→GREEN/YELLOW/RED). CLI: tagent readiness. Does not modify test_lead.py.
  - #16: flaky_detector.py: detect_trends() (P-F-P/F-P-F patterns), generate_quarantine(), generate_pytest_markers().
  - #17: runtime/intelligence/impact_analyzer.py: AST import graph + git diff → impacted test list. Does not modify regression_scope.py.
  - #18: traceability_matrix.py: bidirectional Req↔TC↔Bug matrix with coverage stats, orphan detection, markdown export.

  148 tests pass. 9/9 DAG demo ok.

* feat: Phase 6 developer experience: bootstrap, debug mode, actionable errors, tutorial, shell completion

  - #24: tagent bootstrap: one-command check→configure→verify (Python/Git/pip/LLM)
  - #25: --debug CLI flag + TAGENT_LOG_LEVEL env + log_level setting
  - #26: Actionable error messages: "internal error" now includes run_id + log path + --debug hint. modal.py "not connected" → "call connect() first"
  - #27: docs/tutorial/TUTORIAL.md: 5-step interactive tutorial (10 min)
  - #28: tagent --install-completion (shell autocomplete) + --no-color flag

  148 tests pass.

* feat: Phase 5 enterprise readiness: RBAC, audit trail, multi-tenant, config validation, lifecycle hooks

  - #19: runtime/api/rbac.py: 4-role RBAC (admin/lead/tester/viewer) + require_role() decorator. Disabled by default (TAGENT_RBAC_ENABLED=0). Does not modify auth middleware.
  - #20: runtime/observability/audit.py: JSONL audit log (log_event / query_events). Thread-safe, append-only.
  - #21: runtime/api/tenancy.py: contextvars-based tenant propagation. Disabled by default. Does not modify DB schema.
  - #22: Settings.validate_startup(): checks LLM key, dirs, DB driver. Wired into tagent doctor.
  - #23: runtime/orchestrator/hooks.py: HookRegistry (before/after/on_error). Integrated into direct.py _run_node(). Hooks never break execution.

  148 tests pass.

* feat: Phase 7 methodology: branch coverage, static analysis, portability tests, risk matrix, classification tree

  - #29: pyproject.toml --cov-branch enabled
  - #30: pyproject.toml pylint + radon config (CC rank=B)
  - #31: 7 portability tests (ISO 25010: installability/coexistence/replaceability) + @pytest.mark.portability marker
  - #32: runtime/intelligence/risk_matrix.py: Bayesian calibrated risk matrix with mitigation tracking
  - #33: classification_tree.py: ISTQB CTM with pairwise generation + constraints

  155 tests pass (148 + 7 portability).

* feat: Phase 8 platform: plugin discovery, data synthesis, APM export, journey mapping, multi-region monitor

  - #34: runtime/marketplace/discovery.py: importlib.metadata entry_points for third-party agent/skill/backend registration (group=tagent)
  - #35: data_synthesizer.py: PII auto-detection (email/phone/id/ip/credit_card) + deterministic masking + random subset extraction
  - #36: runtime/observability/apm_export.py: Datadog + Grafana dashboard JSON export (pass rate, MTTD/MTTR, expert health, flaky candidates)
  - #37: runtime/intelligence/journey_mapper.py: failure→business journey impact mapping (Registration/Login/Payment/Profile/...)
  - #38: .github/workflows/synthetic-monitor.yml: scheduled multi-region smoke test (every 6h, 4 regions)

  155 tests pass. 9/9 DAG demo ok. 🎉 38/38 MASTER_PLAN items complete.

* fix: CI utils count 49→52 + remove --cov-branch from default pytest addopts

  - .github/workflows/ci.yml: expected utils count updated 49→52
  - runtime/pyproject.toml: removed --cov-branch from addopts (requires pytest-cov which is not installed in CI). Coverage flags should be passed explicitly: pytest --cov --cov-branch

* fix: CI pytest: add fastapi/python-multipart/httpx/pytest-cov deps, restore --cov-branch

* fix: resolve CodeQL review comments: URL substring sanitization + workflow permissions

---------

Co-authored-by: xiaoxing0135 <706015750@qq.com>
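The weighted release-readiness score from item #15 above (smoke×0.4 + regression×0.3 + perf×0.2 + security×0.1 → GREEN/YELLOW/RED) can be illustrated with a small sketch. Only the four weights come from the commit message. The function name `readiness`, the use of pass rates in the range 0 to 1 as inputs, and the 0.9 / 0.7 cut-offs are assumptions made for the example, not values taken from release_readiness.py.

```python
# Weights as stated in the commit message.
WEIGHTS = {"smoke": 0.4, "regression": 0.3, "perf": 0.2, "security": 0.1}

# Hypothetical cut-offs; the PR does not state the real thresholds.
GREEN_MIN = 0.9
YELLOW_MIN = 0.7


def readiness(pass_rates: dict[str, float]) -> tuple[float, str]:
    """Return (weighted score, verdict) for per-suite pass rates in [0, 1]."""
    score = sum(WEIGHTS[name] * pass_rates[name] for name in WEIGHTS)
    if score >= GREEN_MIN:
        verdict = "GREEN"
    elif score >= YELLOW_MIN:
        verdict = "YELLOW"
    else:
        verdict = "RED"
    return score, verdict


# Usage: strong smoke and security results, weaker regression and perf.
score, verdict = readiness(
    {"smoke": 1.0, "regression": 0.9, "perf": 0.8, "security": 1.0}
)
print(f"{score:.2f} {verdict}")
```

Because smoke carries the largest weight, a drop in smoke pass rate moves the verdict faster than the same drop in any other suite.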

xiaoxing0135 added 14 commits May 13, 2026 01:20

Wool-xing merged commit 1be0ab7 into main May 13, 2026
10 of 11 checks passed

Wool-xing merged 14 commits into main from bundle2-sprint-W1-W5-1
