Bundle2 Sprint W1→W5-1: anti-mock closure + path guards + LLM smoke + CLI hardening#37
Merged
Conversation
added 14 commits
May 13, 2026 01:20
- README/README.zh-CN: fix broken `pip install -e .` hero CTA (now uses install.sh); remove misleading "Self-test 100%" badge; rewrite numbers to "16 expert agents + 33 business skills + 3 meta-skills"; add Skill Lifecycle blockquote (A·B·C); drop charter fragment inline refs in product deliverables. - NOTICE: row layout simplified; 3 MIT entries (local LICENSE files added to satisfy redistribution); private essence library entries removed; Python deps attribution completed. - darwin-skill / karpathy-guidelines: local LICENSE files added (best-effort MIT per upstream README declaration). - nuwa-skill: new subdir forked from alchaincyf/nuwa-skill (MIT); personal-promo assets and examples/ excluded; SKILL.md frontmatter renamed; X-link template footer removed; Modified-from-upstream notice added. - darwin-skill: README/README_EN.md removed (50%+ personal promo); SKILL.md sample data replaced with project skill names; project-style language replaces upstream-author-specific phrasing. - SECURITY: add "weaponized code usage boundary" section (5 attack-surface assets + operator authorization requirements + jurisdiction law refs); add "upstream attribution & idea-expression separation" section. - discussions/HANDOFF*.md: git mv to discussions/internal/ + .gitignore + git rm --cached so .gitignore takes effect; future internal docs untracked. - 01-测试主管: routing table extended with vertical experts (pentest-tester / automotive-tester); SECURITY.md authorization gate linked; ROADMAP-linked implementation-status blockquote added (anti-mock contract). - Number alignment across 8 files: install.sh banner V1.0.0 -> V1.14.0-alpha; 02-/03- README; 00-项目导航 nine-section size summary; CONTRIBUTING; runtime/INDEX. - NEW: ROADMAP.md - V1.15-V1.20 alpha rollout for 6 LLM-driven minimum-viable expert implementations + V2.x Skill Lifecycle meta-tool adaptation + anti-mock commitment. Status: 10 active experts (5 LLM-driven + 5 script-backed) + 6 in V1.x rollout. runtime/router anti-mock improvement scheduled for V1.15 sprint Day 0.
install.sh: - replace hardcoded 14/13 agent/skill loops with find/glob → auto-pick all 16 agents + 33 business skills + 3 upstream-derived subdirs - add section 8.5: deploy LICENSE / NOTICE / SECURITY / CONTRIBUTING / CODE_OF_CONDUCT / ROADMAP / README + README.zh-CN / CHANGELOG / VERSION to $PROJECT_ROOT (was missing — users installed but never got the legal + roadmap docs) 05-代码示例/mobile_driver.py: - L88: dead ternary `if not use_cloud else _resolve_hub_url()` removed; use_cloud=True now raises if cloud credentials absent (no silent fallback) - L167 _parse_gfxinfo_fps: was counting PROFILEDATA section markers (0-3 per dumpsys, NOT frames), now parses framestats CSV rows under each PROFILEDATA section; TODO marker for V2.x real FPS from timestamp delta 05-代码示例/mq_helper.py: - KafkaConsumerSimple.poll: was iterating consumer + immediate return, making the timeout `break` unreachable; switch to KafkaConsumer.poll API with timeout_ms so the `timeout` parameter actually takes effect 05-代码示例/media_validator.py: - L66 import: `from utils.visual_helper` → `from visual_helper` to align with the rest of 49 utils flat layout 05-代码示例/push_test.py: - send_apns: replace requests (HTTP/1.1) with httpx.Client(http2=True); APNs server requires HTTP/2 so previous code was non-functional; docstring deps updated to require 'httpx[http2]' runtime/tests/test_registry.py: - baseline "14 experts + 13 skills" assertion removed - now dynamically counts source files under 02-专家定义/ and 03-技能定义/, asserts registry catalog size >= source files (regression-safe and growth-resilient: adding new experts/skills won't break the test) Status: W1 done (6 files, bundle2 first batch). bundle1 + W1 committed locally, not yet pushed. Continue with W2 router/orchestrator anti-mock.
Three-tier anti-mock contract per ROADMAP.md V1.15 Day 0: W2-1 runtime/orchestrator/agents/base.py — degraded signal source - AgentRunner.run() ok=True hardcode removed - 4-state ok/degraded semantics: * stub/mock mode → ok=True, degraded=True (selftest fallback allowed) * real LLM + JSON ok → ok=True, degraded=False (genuine output) * real LLM + JSON parse error → ok=False, degraded=True (LLM responded but malformed) * exec LLM exception fallback → ok=False, degraded=True (no longer silently green) - RunnerResult adds degraded: bool field - logger.warning → logger.error on LLM failure (no longer treated as routine) - LLM error captured into RunnerResult.error for downstream consumption W2-3 runtime/orchestrator/adapters/experts.py + runtime/router/router.py — anti-mock at routing layer - EXPERT_IMPL_STATUS dict added (16 experts active/rollout per ROADMAP V1.15-V1.20) * active: 5 LLM-driven + 5 script-backed = 10 * rollout: env/mobile/visual/system/pentest/automotive = 6 - EXPERT_SCRIPT_MAP extended with pentest-tester + automotive-tester (None) - execute_node short-circuits rollout experts: returncode=2 + clear stderr pointing to ROADMAP.md; replaces the previous no-op fallback that was silently returning ok=True - router._validate_against_catalog flags rollout experts in issues list, triggering existing confidence downgrade pathway W2-2.5 runtime/orchestrator/agents/test_lead.py + base.py + experts.py — anti-mock at decision layer (closure) - RunnerContext adds upstream_meta: dict[str, dict] field carrying ok/degraded/error per upstream expert - experts.py adds _upstream_meta global cache; reset_upstream_cache clears both _upstream_outputs and _upstream_meta; execute_node populates _upstream_meta[name] alongside _upstream_outputs[name] - test_lead.mock_output() now inspects ctx.upstream_meta: * any upstream degraded → verdict forced to "conditional" (never "go") * upstream errors enumerated into known_risks * _degraded_upstream field surfaced for downstream report/bug-manager Outcome: when LLM fails / stub provider engaged / rollout experts touched, the chain (degraded source → upstream_meta link → test-lead consumer) now guarantees the final verdict can no longer be a false "go". Skin-in-the-game contract (CHARTER §10) honored in code, not just docs. Known gaps (W3 backlog): - bug-manager / report-generator do not yet consume _degraded_upstream - test_lead.user_prompt does not surface upstream degraded to real LLM context - _upstream_meta cross-thread race (baseline T.1.8) still present - W2-2 orchestrator no-op fallback no longer triggers post-W2-3 (deprioritize)
runtime/mcp/evidence_vault/server.py:
- Add _validate_evidence_path() helper enforcing path must resolve under
project_root; rejects /etc/passwd, ~/.ssh/id_rsa, ~/.aws/credentials etc.
- tool_upload_evidence_path() now calls the helper; on rejection returns
{"error": "path_blocked: ..."} and logs warning (audit trail)
- Optional extension via TAGENT_EVIDENCE_EXTRA_DIRS env (comma-separated
absolute paths) for CI scenarios that need /tmp/ci-artifacts uploads
Closes baseline T.1.11 / SA5 F7. Independent P0 security fix outside the
W2 anti-mock chain. Pattern reusable for other MCP servers exposing
file-path parameters (W3 backlog: scan defect_tracker, knowledge_base,
protocol_adapter, test_orchestrator for similar paths).
Known gaps (W3):
- project_root is broad; .env / .git / id_rsa under project_root still
reachable. Tighten to workspace/ subtree + sensitive-filename blacklist.
W2-5 04-配置文件/requirements.txt: - Uncomment httpx + add [http2] extras - W1-6 changed push_test.py send_apns() to use httpx.Client(http2=True), but httpx was commented out in requirements.txt → deployment would ImportError at import time. Now real installed with HTTP/2 support. W2-6 04-配置文件/conftest.py: - Add _UTILS_CANDIDATES list injecting both deployed ($PROJECT_ROOT/utils/) and source-repo (../05-代码示例/) paths to sys.path - Without this, the 49 utils' flat-style imports (e.g. `from api_retry_util import call_with_retry` in zentao_bug_manager / `from visual_helper import compare_images` in media_validator) all silently fail at runtime; pytest appeared green only because no test imported them - This is the missing wiring that made W1-5's import-path fix meaningful — W1-5 alone touched 1 line; W2-6 makes the entire utils flat-import pattern work in both dev and deployed environments Pattern: retroactive review (V6 自动回检机制) caught two cross-link gaps that single-file fixes had introduced. Documented in 灵感库/通用准则/ 自动回检机制.md as a reusable pattern.
Scanned 5 MCP servers for path-receiving tools (evidence_vault already fixed by W2-4): - defect_tracker/server.py : 0 file ops (DB only) ✓ - knowledge_base/server.py : 0 file ops (DB+pgvector only) ✓ - protocol_adapter/server.py : 0 file ops ✓ - compliance_checker/server.py: already has guard at L36-48 (regex + relative_to) ✓ - test_orchestrator/server.py : ❌ FIX HERE — _build_artifact(target) accepted arbitrary paths and called parse_path(p) which reads .md/.pdf/ .docx bytes from disk. LLM/external MCP client could pass '/etc/passwd' or '~/.ssh/id_rsa' and get the contents back via the artifact text. Fix in test_orchestrator._build_artifact: - After p.exists() check, resolve(p) + relative_to(project_root) - In-tree paths still go to parse_path (legitimate use case: PRD files under workspace/, docs/, etc.) - Out-of-tree paths fall through to parse_text (treats target as a string, not a file) + logger.warning audit trail - Exception in guard logic also degrades to parse_text (fail-closed) Pattern reused from W2-4 evidence_vault. Documented in 灵感库/工程模式/path-traversal-guard.md for cross-project reuse. W3-1 outcome: 6 MCP servers audited, 1 fix landed (test_orchestrator), 1 pre-existing guard verified (compliance_checker), 4 confirmed no-op.
…pstream Complete the anti-mock closure first opened by W2-1/W2-3/W2-2.5: - W2-1 added RunnerResult.degraded - W2-3 EXPERT_IMPL_STATUS surfaced rollout experts - W2-2.5 test_lead consumed upstream_meta degraded - W3-2 (this): the remaining two consumers also honor the signal runtime/orchestrator/agents/bug_manager.py: - mock_output() inspects ctx.upstream_meta; if any upstream degraded, inserts a P0 "测试数据不完整" warning bug at the head of the bug list, labeled `degraded` + `test-coverage-insufficient`, summary.p0 set to 1. Downstream BugTracker (zentao/jira/etc.) now sees this warning bug before any functional bugs, alerting reviewers that the dataset is incomplete. - user_prompt() (real-LLM path) prepends a "上游 degraded 警示" block instructing the LLM to insert the same P0 warning bug. Both stub and real-LLM paths now produce the warning. - Output exposes _degraded_upstream list for downstream consumers. 05-代码示例/generate_report.py: - generate_test_report() reads data["_degraded_upstream"]; if non-empty, inserts a "⚠ 数据完整性警示" section between the executive summary and the bug statistics. Orange-colored warning + bullet list of degraded experts + red-colored "不应基于此报告直接发版" note. - _degraded_upstream comes through test_lead's mock_output (already in W2-2.5) → report renderer never assumed the field; now it does. Anti-mock closure complete across 4 LLM consumers: test_lead.mock_output ✓ (W2-2.5) test_lead.user_prompt pending W3-3 bug_manager.mock_output ✓ (W3-2a) bug_manager.user_prompt ✓ (W3-2a) generate_report.py ✓ (W3-2b) Pattern documented in 灵感库/工程模式/degraded-anti-mock-3layer.md.
W2-2.5 closed the anti-mock loop on test_lead.mock_output (stub provider path). W3-3 closes the remaining gap: the real-LLM path goes through user_prompt(ctx), which previously did not surface upstream degraded state to the model — so a stub-failure cascade could still produce a "go" verdict via the LLM despite mock_output guarding against it. runtime/orchestrator/agents/test_lead.py: - user_prompt() now inspects ctx.upstream_meta for any degraded upstream - When degraded upstream exists, prepends a "⚠ 上游 degraded 警示 (强制约束)" block to the prompt with 5 hard constraints: 1. verdict cannot be "go" 2. verdict must be "conditional" or "no-go" 3. known_risks must enumerate each degraded expert 4. rationale must explain incomplete data 5. fallback_plan must mention V1.x rollout Combined with W2-1 (degraded signal source), W2-3 (router rollout filter), W2-2.5 (test_lead.mock_output consumer), and W3-2 (bug_manager + generate_report consumers), the anti-mock closure is now 100% wired across all 5 LLM consumers in both stub and real-LLM paths. ROADMAP.md V1.15 Day 0 anti-mock commitment: COMPLETE. Pattern documented in 灵感库/工程模式/degraded-anti-mock-3layer.md.
runtime/router/llm_client.py:16: - PROVIDER_MODEL_MAP["qwen"] = "openai/qwen-plus" → "dashscope/qwen-plus" - LiteLLM routes qwen models via dashscope/, not openai/ — wrong prefix caused LiteLLM to attempt OpenAI-compatible endpoint with Qwen model name, resulting in 401/404 from upstream Retroactive review of v1 SA4 F23 finding: - F23 also flagged "claude-sonnet-4-6 doesn't exist" — RE-VERIFIED: claude-sonnet-4-6 IS a real Anthropic model ID (current Claude Sonnet family). v1 baseline misjudgment based on outdated model list. NOT changing claude line. settings.py:32 llm_model default = "claude-sonnet-4-6" — kept (correct value, dual-source coupling with PROVIDER_MODEL_MAP is baseline F37 backlog, lower priority than this 1-line correctness fix). Closes anti-blocker for users configuring DASHSCOPE_API_KEY. Without this, first real-LLM call with provider=qwen fails 401/404 from upstream. Inspired-library: D:\项目文件\灵感库\Test-Agent_决策档案\ 2026-05-13_全局视角10条优化方向.md (W4-1 is first of 10).
runtime/backends/__init__.py: - Add explicit imports of 7 backend modules (local, docker, ssh, singularity, modal, daytona, vercel_sandbox) to trigger their @register("name") decorators at package import time. Before: REGISTRY was permanently empty because @register decorators on the backend subclasses never fired — base.py was the only module imported, and its REGISTRY dict was never populated. After: REGISTRY contains all 7 backends; get_backend("local") / get_backend("docker") / etc. resolve correctly. tagent.yml `backends: [local, docker]` configuration finally takes effect. Closes baseline T.1.1 / v1 SA4 F1. First-mile blocker for users who configure backends in tagent.yml — without this, startup raised KeyError on the first backend lookup. Inspired-library item #6 (Test-Agent_决策档案/ 2026-05-13_全局视角10条优化方向.md).
…as W4-4) runtime/gateway/__init__.py: - Import runtime.gateway.platforms to trigger 8 platform subdirs' @register decorators on package import - Also export get_platform from base (was missing from public API per v1 SA4 F15) Before: from runtime.gateway import get_platform; get_platform("feishu") returned None because REGISTRY was empty — platforms/__init__.py was never imported through the gateway package. After: REGISTRY populated with 8 platforms (dingtalk/discord/email/ feishu/slack/telegram/webhook/wechat); get_platform() resolves; notification delivery (主宪章 §36 6 channels) finally wires up end-to-end from configuration to runtime. Discovered via V7 proactive scan after W4-4 fixed the same pattern in backends/__init__.py. Out of 4 packages using @register pattern: - agents/__init__.py ✓ already correct - gateway/platforms/__init__ ✓ already correct - backends/__init__.py ❌ W4-4 fixed - gateway/__init__.py ❌ W4-4b fixed (this commit) 50% error rate on this idiom → documented as anti-pattern in 灵感库/工程模式/decorator-registry-init.md for cross-project reuse. Closes baseline v1 SA4 F15 + F39.
W4-2: new runtime/healthcheck/llm_smoke.py · single 'Hello → 你好' round-trip, reports provider/model/latency/tokens/cost via litellm.completion_cost. Exposed as `tagent doctor --llm-smoke` flag (alongside existing --agents / --probe). Resolves first-mile pain: user can verify LLM connectivity in 5s post-install instead of running full 16-agent --probe (~$0.5). W4-3: `tagent demo --real-llm` switch · default still stub for 0-config flow. With --real-llm: warns cost (~$1-3 / 60-120s), requires typer.confirm (skip via -y), runs pre-flight llm_smoke probe before DAG execution (fail-fast if LLM unreachable, saves cost on broken creds). --skip-smoke escapes the pre-flight if needed. Step labels also dynamic (real LLM · ~$1-3 vs stub LLM · 0 成本). Closes user-facing gap: stub demo passing ≠ real LLM working.
…scope contract Audit removed config fields with 0 runtime reader: - subagent.* section (pool_size / allow_aux_provider_override · neither read anywhere) - profile.* section (industry / default_depth_level / compliance_profiles · 0 hits) - curator.interval_hours / auto_archive_skills (only `enabled` is gated) - scheduler.max_concurrent_jobs (not read) - backends.allow_arbitrary_exec (not read) - eval.max_captures_per_day (not read) - tutor.* section (CLI flag controls i18n/verbosity; yml not consumed) Kept only fields wired via runtime/config/safety.py gates: scheduler.enabled / cron_jobs_allowed, curator.enabled, eval.capture, backends.allowed, gateway.enabled_platforms, bug_tracker.default / enabled, destructive_ops.*. Added pentest.* section as legal contract placeholder (authorized / scope_in / scope_out / authorization_record / evidence_dir). EXPERT_IMPL_STATUS already blocks pentest expert at rollout; yml gate wires up in V1.x. Top-of-file SECURITY block separates legal contract (pentest) from technical switches. User-first config no longer promises capabilities project lacks. W4-6 audit surfaced split-brain between this example schema and the schema produced by `tagent init` (project/router/skills/...). Tracked as W7 task #48.
…lidation Default refuse for all destructive ops; explicit opt-in via env vars: - TAGENT_CHAOS_AUTHORIZED=1 · base gate for all chaos functions - TAGENT_ALLOW_CLOCK_DRIFT=1 · separate gate for shift_clock (breaks TLS/Kerberos) Per-function hardening: - block_outbound: try/finally cleanup ensures iptables DROP rule deletes even on KeyboardInterrupt/timeout; cleanup-failure logs explicit manual recovery cmd. sudo -n (no-prompt) so missing sudo fails fast instead of hanging. Host arg validated as IPv4 or RFC-compliant hostname (rejects shell metachars and 'foo -j ACCEPT' iptables-arg-injection style payloads). - stress_disk: file_path defaults to tempfile.mkstemp; explicit path must resolve under system tempdir (Path traversal guard). Always unlink in finally. - kill_process: rejects PID < 100; psutil owner check refuses cross-user kill. - kill_by_name: scans psutil with username + PID≥100 filter; only own processes. - kill_pod: pod_name + namespace validated against k8s name regex; subprocess timeout=60s. - shift_clock: double-gate (CHAOS_AUTHORIZED + CLOCK_DRIFT_ALLOWED) + ±86400s range cap. - All functions added input range validation (cores/duration/size bounds). Closes CI-runner residual risk: prior version could leave iptables DROP rules, killed system processes, drifted clocks, or written to /etc/* paths after a single invocation. Utils package must not import runtime.config.safety (utils independence rule), so gate is enforced via env var rather than yml gate. Same gate pattern (env var + validate + try/finally) will apply to remaining W5-2..W5-5 HIGH utils (api_security_scanner / db_test_helper / ai_adversarial / desktop_driver).
This was referenced May 13, 2026
Wool-xing
added a commit
that referenced
this pull request
May 13, 2026
W5 sprint 完结后阶段 3 文档兜底: SECURITY.md 加新节, 列全 3 个非武器化 但有 gate 的 utils (chaos / db_test / desktop), 与"武器化代码使用边界" 区分。说明授权范围 + 与生产环境隔离责任的关系。 PR 引用: chaos_helper #37 / db_test_helper #41 / desktop_driver #44。 范式: env gate + opt-in kwarg + platform gate + 输入白名单 (5 PR 沉淀)。 灵感库已私域同步范式总结 (D:\项目文件\灵感库\工程模式\ utils-env-var-gate-pattern-v2.md, 不入仓)。 Co-authored-by: xiaoxing0135 <706015750@qq.com>
Wool-xing
pushed a commit
that referenced
this pull request
May 17, 2026
…, journey mapping, multi-region monitor - #34: runtime/marketplace/discovery.py — importlib.metadata entry_points for third-party agent/skill/backend registration (group=tagent) - #35: data_synthesizer.py — PII auto-detection (email/phone/id/ip/credit_card) + deterministic masking + random subset extraction - #36: runtime/observability/apm_export.py — Datadog + Grafana dashboard JSON export (pass rate, MTTD/MTTR, expert health, flaky candidates) - #37: runtime/intelligence/journey_mapper.py — failure→business journey impact mapping (Registration/Login/Payment/Profile/...) - #38: .github/workflows/synthetic-monitor.yml — scheduled multi-region smoke test (every 6h, 4 regions) 155 tests pass. 9/9 DAG demo ok. 🎉 38/38 MASTER_PLAN items complete.
Wool-xing
added a commit
that referenced
this pull request
May 17, 2026
* fix: correct setuptools package discovery for editable install
`where = ["."]` with `include = ["runtime*"]` couldn't find the runtime
package because the runtime directory IS the package root (runtime/__init__.py
is directly in .). Changed where to `[".."]` so setuptools scans the parent
directory and finds `runtime/` as a package.
Before: `pip install -e .` produced empty MAPPING — `import runtime` failed.
After: `import runtime` works, `tagent demo` completes all 4 steps.
* feat: add --version flag to tagent CLI
Users expect `tagent --version` to print version info. Added callback
that prints "Test-Agent Runtime v1.32.0" when --version is passed.
* fix: auto-generate smoke PRD fixture when missing in demo
Previously `tagent demo` step 3 would hard-fail with "fixture missing"
if examples/_smoke_prd.md was deleted from disk. Now it auto-generates
the fixture from an embedded template, showing a warning instead.
This prevents demo breakage when the examples/ directory is accidentally
cleaned or the user runs demo outside the repo root.
* feat: english-ify tagent CLI help text and user-facing output
Converted all CLI command descriptions, option help text, and user-facing
console output from Chinese to English for international accessibility.
Internal code comments, fixture data, and workspace paths unchanged.
* feat: english-ify tagent CLI help text and user-facing output
Convert CLI command descriptions, option help text, and user-facing
console output from Chinese to English. Updated related tests.
Includes: config subcommand help, demo flow output, selftest/doctor
messages, init/export descriptions.
* chore: bump version 1.32.0 → 1.32.1 + fix CONTRIBUTING.md stale 33→32
- 全项目版本号同步至 1.32.1 (17 files)
- CONTRIBUTING.md: 16/33/49 → 16/32/49 (skill 数对齐 pre-commit/CI 实际 -eq 32)
- CHANGELOG 新增 v1.32.1 条目
* fix: security hardening — shell injection, hardcoded creds, API auth, silent failures
CRITICAL fixes:
- backends/local.py: create_subprocess_shell → create_subprocess_exec (CWE-78)
- backends/ssh.py: cat {path} → SFTP read; shlex.quote(cwd/env); known_hosts=()
- config/settings.py: remove default db_url/password creds; api_host→127.0.0.1; add api_auth_token
- api/main.py: bearer auth middleware (gated by TAGENT_API_AUTH_TOKEN); CORS restrict to localhost; file upload max 50MB + extension allowlist
Silent failure fixes:
- api/main.py: except Exception:continue → catch specific + logger.warning (list_history/dashboard); logger.exception in background thread; threading.Lock on _run_results
- api/deps.py: persistence fail → logger.error; status persist DEBUG→WARNING; artifact read fail → [READ_ERROR] marker
- api/parsers.py: PDF/DOCX extract fail → [PARSE_ERROR] marker
- router/retrieval.py: retrieval fail DEBUG→WARNING
- 05-代码示例/api_retry_util.py: bare except pass → logger.debug
.gitignore hardening:
- Add workspace/测试报告/, workspace/feedback/, workspace/自动化脚本/
- Add runtime/workspace/, runtime/web/tsconfig.tsbuildinfo
- Add docs/审查报告/, docs/参考库/, docs/decisions/, archive/
- Remove 4 tracked test report .docx from git
* fix: utils security hardening — owner check, XML escape, WS leak, CI pin
- chaos_helper.py: kill_process psutil absent now raises RuntimeError instead of skipping owner check
- i18n_checker.py: bare except Exception → specific (UnicodeDecodeError, PermissionError, OSError) + logger.warning
- miniprogram_runner.py: WebSocket close wrapped in try/finally to prevent connection leak
- protocol_helper.py: SOAP body_xml escaped with xml.sax.saxutils.escape() to prevent XML injection
- ci.yml: pin ludeeus/action-shellcheck@master → @2.0.0
- install.sh: add security note recommending git clone over curl|bash
* chore: fix pre-commit deprecated default_stages commit → pre-commit
* chore: bump version 1.32.1 → 1.32.2
全项目版本号同步 + CHANGELOG 新增 v1.32.2 安全加固条目
* refactor: _stub_response dispatch table + fuzzer ALL_PAYLOADS hoist + bump 1.32.3
- router/llm_client.py: 77-line if/elif chain → _STUB_TARGETS table (8 entries)
- fuzzer.py: sum(PAYLOAD_LIBRARY.values(), []) hoist to module-level ALL_PAYLOADS
* docs: honesty pass — remove marketing numbers, clarify vision skills, drop internal references
- README: 8640 combos → ~12 CI-validated; 95% aspirational → removed; 32 skills → 30 active + 2 vision
- 00-项目导航: 9x 主宪章 §X → plain descriptions (external contributors don't know charter section numbers)
- ROADMAP: 3x 主宪章 references removed
* refactor: split overlong functions — generate_report (143→30) + mobile_driver (107→55)
- generate_report.py: extract _write_docx_header/_summary/_degraded_warning/_bugs/_performance/_risks helpers
- mobile_driver.py: extract _build_monkey_cmd + _analyze_monkey_log helpers
* chore: bump version 1.32.3 → 1.32.4
Phase 1+2 收尾: 数字诚实化 + 内部引用清理 + 长函数拆分
* refactor: split CLI/main.py (680→39 lines) into 8 command modules
- runtime/cli/_shared.py: kernel, console, helpers, fixtures
- runtime/cli/commands/run.py: run + plan
- runtime/cli/commands/catalog.py: catalog
- runtime/cli/commands/doctor.py: doctor
- runtime/cli/commands/selftest.py: selftest
- runtime/cli/commands/market.py: search + list + install + uninstall + verify
- runtime/cli/commands/demo.py: demo
- runtime/cli/commands/init.py: init
- runtime/cli/commands/export.py: export
Pure mechanical split — no logic changes. 128 tests pass.
* test: add 20 core smoke tests — CLI commands, API auth, build_artifact, catalog
- test_cli_commands.py (5): all 13 commands registered, --version, catalog, doctor, --help
- test_api_auth.py (6): health public, auth middleware blocks/allows, CORS headers
- test_build_artifact.py (4): url/file/text input parsing
- test_catalog.py (5): expert/skill counts and field validation
* chore: bump version 1.32.4 → 1.32.5
CLI split + 20 smoke tests + CHANGELOG
* fix: flaky test_execute_node_allows_production_skill — reset catalog/settings cache per test
conftest _env_isolation now calls get_catalog(refresh=True) + resets settings cache
to prevent cross-test state pollution from modules that create Kernel() at import time.
* fix: on_failure=skip now correctly excludes node from failure count
- tasks.py: skip nodes set summary.skipped=True, no longer counted as failed
- flows.py: track skipped list separately, include in summary.skipped
- direct.py: same skip tracking for direct executor path
* feat: Phase 3 engine hardening — self-healing, retry, circuit breaker, skip fix, fixture isolation
- #9: runtime/self_healing/ (retry.py + locator_store.py) — exponential-backoff
retry wrapper for subprocess/LLM errors. scripts.py subprocess.run + direct.py
_run_node both use with_retry().
- #10: direct.py executor-level retry — resubmits _run_node up to 2 extra times
with 2^attempt backoff on unexpected exceptions.
- #11: on_failure=skip nodes now set skipped=True, excluded from failure count.
flows.py + direct.py track skipped separately.
- #12: 04-配置文件/conftest.py test_data + browser_context session→function scope.
test_data uses tmp_path to avoid parallel file collisions.
- #13: MAX_FAILURES=3 circuit breaker in flows.py + direct.py. DAG progress logging
per node. tasks.py timeout_seconds=3600.
148 tests pass. 9/9 DAG demo ok.
* feat: Phase 4 test intelligence — dashboard, readiness score, flaky trends, impact analysis, traceability
- #14: runtime/observability/dashboard.py — 3-row layout (decision→diagnostic→action)
with MTTD/MTTR, expert heatmap, flaky candidates, env health, action items.
api/main.py /dashboard endpoint rewired to new builder.
- #15: runtime/orchestrator/release_readiness.py — weighted scoring
(smoke×0.4+regression×0.3+perf×0.2+security×0.1→GREEN/YELLOW/RED).
CLI: tagent readiness. Does not modify test_lead.py.
- #16: flaky_detector.py — detect_trends() (P-F-P/F-P-F patterns),
generate_quarantine(), generate_pytest_markers().
- #17: runtime/intelligence/impact_analyzer.py — AST import graph +
git diff → impacted test list. Does not modify regression_scope.py.
- #18: traceability_matrix.py — bidirectional Req↔TC↔Bug matrix
with coverage stats, orphan detection, markdown export.
148 tests pass. 9/9 DAG demo ok.
* feat: Phase 6 developer experience — bootstrap, debug mode, actionable errors, tutorial, shell completion
- #24: tagent bootstrap — one-command check→configure→verify (Python/Git/pip/LLM)
- #25: --debug CLI flag + TAGENT_LOG_LEVEL env + log_level setting
- #26: Actionable error messages — "internal error" now includes run_id + log path + --debug hint.
modal.py "not connected" → "call connect() first"
- #27: docs/tutorial/TUTORIAL.md — 5-step interactive tutorial (10 min)
- #28: tagent --install-completion (shell autocomplete) + --no-color flag
148 tests pass.
* feat: Phase 5 enterprise readiness — RBAC, audit trail, multi-tenant, config validation, lifecycle hooks
- #19: runtime/api/rbac.py — 4-role RBAC (admin/lead/tester/viewer) + require_role()
decorator. Disabled by default (TAGENT_RBAC_ENABLED=0). Does not modify auth middleware.
- #20: runtime/observability/audit.py — JSONL audit log (log_event / query_events).
Thread-safe, append-only.
- #21: runtime/api/tenancy.py — contextvars-based tenant propagation.
Disabled by default. Does not modify DB schema.
- #22: Settings.validate_startup() — checks LLM key, dirs, DB driver.
Wired into tagent doctor.
- #23: runtime/orchestrator/hooks.py — HookRegistry (before/after/on_error).
Integrated into direct.py _run_node(). Hooks never break execution.
148 tests pass.
* feat: Phase 7 methodology — branch coverage, static analysis, portability tests, risk matrix, classification tree
- #29: pyproject.toml --cov-branch enabled
- #30: pyproject.toml pylint + radon config (CC rank=B)
- #31: 7 portability tests (ISO 25010: installability/coexistence/replaceability)
+ @pytest.mark.portability marker
- #32: runtime/intelligence/risk_matrix.py — Bayesian calibrated risk matrix
with mitigation tracking
- #33: classification_tree.py — ISTQB CTM with pairwise generation + constraints
155 tests pass (148 + 7 portability).
* feat: Phase 8 platform — plugin discovery, data synthesis, APM export, journey mapping, multi-region monitor
- #34: runtime/marketplace/discovery.py — importlib.metadata entry_points for
third-party agent/skill/backend registration (group=tagent)
- #35: data_synthesizer.py — PII auto-detection (email/phone/id/ip/credit_card)
+ deterministic masking + random subset extraction
- #36: runtime/observability/apm_export.py — Datadog + Grafana dashboard JSON
export (pass rate, MTTD/MTTR, expert health, flaky candidates)
- #37: runtime/intelligence/journey_mapper.py — failure→business journey impact
mapping (Registration/Login/Payment/Profile/...)
- #38: .github/workflows/synthetic-monitor.yml — scheduled multi-region smoke
test (every 6h, 4 regions)
155 tests pass. 9/9 DAG demo ok.
🎉 38/38 MASTER_PLAN items complete.
* fix: CI utils count 49→52 + remove --cov-branch from default pytest addopts
- .github/workflows/ci.yml: expected utils count updated 49→52
- runtime/pyproject.toml: removed --cov-branch from addopts (requires
pytest-cov which is not installed in CI). Coverage flags should be
passed explicitly: pytest --cov --cov-branch
* fix: CI pytest — add fastapi/python-multipart/httpx/pytest-cov deps, restore --cov-branch
* fix: resolve CodeQL review comments — URL substring sanitization + workflow permissions
---------
Co-authored-by: xiaoxing0135 <706015750@qq.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
14 commits covering bundle2 sprint W1 through W5-1 of the V1.14.0-alpha calibration program. Bundles together the file-by-file fix list locked into project memory on 2026-05-13 (~210 findings, 26 sections, CP1-CP6 checkpoints).
Three thematic axes:
verdict: conditionalwith explicit warnings instead of silent green.resolve() + relative_to(allowed_root)guard for MCP servers (evidence_vault, test_orchestrator), env-var gate + try/finally cleanup for chaos_helper offensive utils.tagent doctor --llm-smokesingle round-trip probe,tagent demo --real-llmwith cost confirm + pre-flight smoke, REGISTRY auto-registration for backends + gateway, qwen provider prefix fix.Commit map
7d72d3c6b250c1355fe0a6dbe504925c5e79529c3a0a9be6d_degraded_upstream798e6d59b215b0dashscope, notopenai)f84d32dd34fdd17c2544d63d998080ccfa0Behavior change highlights
Anti-mock:
RunnerResult.okis no longer hardcodedTrue; semantics nowok=True/degraded=False(success) vsok=True/degraded=True(mock fallback) vsok=False(raised). Consumers (test_lead verdict, bug_manager priority, report-generator section) read_degraded_upstreamand surface warnings to the user.Path traversal: MCP servers accepting file path arguments now require
resolve().relative_to(allowed_root); out-of-root paths raise immediately.Safe-by-default for offensive utils:
chaos_helperrefuses all destructive ops unlessTAGENT_CHAOS_AUTHORIZED=1;shift_clockrequires the additionalTAGENT_ALLOW_CLOCK_DRIFT=1. iptables / temp-file / process-kill ops now have try/finally cleanup, owner checks, and identifier validation. Same pattern queued for W5-2..W5-5 (api_security_scanner / db_test_helper / ai_adversarial / desktop_driver).New CLI surfaces:
tagent doctor --llm-smoke· singleHello → 你好round-trip, reports provider/model/latency/tokens/cost. 5s post-install verification.tagent demo --real-llm· warns ~$1-3 / 60-120s, requirestyper.confirm, runs pre-flightllm_smoke(fail-fast on broken creds),-y/--skip-smokeescapes for CI.Registry registration:
runtime/backends/__init__.pyandruntime/gateway/__init__.pynow explicitly import all submodules so@register("name")decorators actually fire (50% error rate in this pattern across the project — documented in inspiration library).Known follow-ups (not in this PR)
git clone + install.sh + tagent demo) — requires user-driven Windows sandbox run.tagent.ymlschema split-brain (init renderer schema ≠ safety.py schema) — surfaced during W4-6 audit, tracked separately.Test plan
pytest runtime/tests/)verdict: conditionalinstead of greentagent doctor --llm-smokeworks with stub provider (instant return) + with real provider (real round-trip)tagent demo --real-llm -y --skip-smokeruns full 16-agent DAG without promptschaos_helperrefuses withoutTAGENT_CHAOS_AUTHORIZED=1; accepts with ittagent.yml.exampleparses withruntime/config/safety.py; pentest fields documented even though gate is V1.xNotes for reviewer
Sprint locked in project memory; no findings rolled back. CP1-CP6 checkpoints honored: each anti-mock fix wired through source/transfer/consumer; each path-guard applied as a reusable pattern. Inspiration library cross-references at
D:\项目文件\灵感库\(degraded-anti-mock-3layer.md, path-traversal-guard.md, decorator-registry-init.md, config-schema-split-brain.md, hermes-gbrain-借鉴维度.md, 自动回检机制-V7.md).