Release What Changed · SafeRL-Lab/cheetahclaws

May 8, 2026 (v3.05.78): F-2/F-3 follow-ups + CI unblock (feature/fix-f2). Main has been red since 9c01237d (the trading-agent #99 merge) because tests/test_packaging.py::test_required_module_imports[modular.trading.ml] (the issue #97 regression test) caught that modular/trading/ml/features.py and modular/trading/portfolio.py import numpy at module top while numpy is only in the [trading] extra. As a result, pip install . shipped a broken wheel, and #100 / #101 inherited the red. This is a two-commit fix on top of #101.

(a) fix(ci): drop the dead numpy import from features.py; defer numpy to inside stacker.py:train() / predict_proba(), past their early-return paths; gate portfolio.py's numpy behind try/except; add pytest.mark.skipif on the optimizer / managed-portfolio / ML-training / factor-scan tests so lean-install CI skips them cleanly. Verified: a clean venv with only [web,autosuggest] (the exact CI install) gives 1075 passed, 11 skipped; with full extras, 1086 passed, no regressions.

(b) fix(daemon): five F-2/F-3 follow-ups. Move monitor.scheduler.start(...) past the listener bind in cc_daemon/cli.py:cmd_serve (so a misconfigured fetch/deliver can't fail before the daemon is reachable); add a _foreign_daemon_running() step-aside check at every scheduler loop tick to close the race where REPL /monitor start fires before the daemon writes its discovery file (both schedulers would otherwise race on last_run_at); flip cc_daemon/schema.py to PRAGMA synchronous=NORMAL (safe under WAL, 8× faster EventBus.publish: 305 μs/event → 39 μs/event, important for streaming agent output); clarify in jobs.py / monitor/store.py / docs/architecture.md that the JSON→SQLite migration is one-way (PR #101's wording implied a fallback read path that doesn't exist); update docs/RFC/0002-daemon-foundation-roadmap.md F-2/F-3 status from OPEN → MERGED. Branch: feature/fix-f2.
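The deferred-import pattern described in (a) can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function body, the `HAS_NUMPY` flag, and the logistic computation are all hypothetical stand-ins; only the idea (no numpy at module top, import past the early return) comes from the notes above.

```python
import importlib.util

# Record whether the optional dependency is installed without importing it,
# so importing this module never fails on a lean install.
# HAS_NUMPY is a hypothetical name used only for this sketch.
HAS_NUMPY = importlib.util.find_spec("numpy") is not None


def predict_proba(rows):
    """Hypothetical stand-in for a function whose numpy use is deferred."""
    if not rows:
        # Early-return path: numpy is never imported here.
        return []
    # Deferred import: only reached when there is real work to do.
    import numpy as np

    arr = np.asarray(rows, dtype=float)
    # Illustrative computation only: logistic squash of each row's sum.
    return (1.0 / (1.0 + np.exp(-arr.sum(axis=1)))).tolist()
```

Tests that need the extra could then be guarded with pytest's standard `@pytest.mark.skipif(not HAS_NUMPY, reason="requires the [trading] extra")`, so a lean install reports them as skipped instead of failing at import time.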
Research lab Phase A: autonomous multi-day research; WeChat smart-reply + /draft semi-auto reply; reliability + UX hardening across the lab pipeline. Two big surfaces shipped together.

(a) The research lab is no longer single-shot. /lab resume <run_id> [<stage>] reconstructs LabState from SQLite to continue or rewind a run; /lab iterate <run_id> runs a 3-reviewer self-review on the final report (novelty / rigor / clarity / evidence, 1-10), routes the lowest-scoring dimension to the corresponding stage (novelty→QUESTIONING, rigor→IMPLEMENTATION, clarity→DRAFTING, evidence→EXPERIMENT), rewinds + re-runs, and loops until target_score / max_iterations / plateau / budget; /lab backlog add <topic> --iterate --target=N --max=N --prio=N queues many topics, and /lab daemon start runs them 24/7 in a single-worker loop with crash-recovery (reset_running_backlog unsticks stale rows on next start); /lab models prints the effective per-role model + which API key drove each pick + warns when reviewers span <N families (homogeneous review = no meta-loop signal); /lab migrate-paths [--apply] renames legacy lab_xxx/ output dirs to the human-readable <date>_<time>_<topic-slug>_<run_id_short> form (e.g. 2026-05-08_14-30_post-transformer-architectures-survey_b16036de/).

(b) WeChat smart-reply panel: when a whitelisted contact sends an inbound message, an auxiliary cheap model drafts 3 candidate replies and pushes them as a panel to your filehelper (文件传输助手); reply with 1/2/3/AA 1 to send, freeform text to customise, x to skip, q for queue. State is SQLite-persisted at ~/.cheetahclaws/wx_smart_reply.db (in-memory fallback on init failure); contacts JSON at ~/.cheetahclaws/wx_contacts.json is mtime-hot-reloaded; the bot-owner self-uid is auto-recorded on first inbound and excluded from smart-reply unconditionally, so your own messages always reach the agent regardless of whitelist contents.
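The dimension-to-stage routing used by /lab iterate can be illustrated with a small sketch. The mapping is taken from the notes above; everything else is an assumption made for illustration: the function name, the input shape, averaging scores across reviewers, and breaking ties by mapping order are not confirmed by the source.

```python
from statistics import mean

# Dimension -> stage mapping as listed in the release notes.
ROUTE = {
    "novelty": "QUESTIONING",
    "rigor": "IMPLEMENTATION",
    "clarity": "DRAFTING",
    "evidence": "EXPERIMENT",
}


def pick_rewind_stage(reviews):
    """Pick the stage to rewind to from per-reviewer scores (1-10).

    `reviews` is a list of dicts, one per reviewer, mapping each dimension
    to a score. Averaging across reviewers is an assumption of this sketch.
    """
    averages = {dim: mean(r[dim] for r in reviews) for dim in ROUTE}
    # min() returns the first minimal key, so ties follow ROUTE's order.
    weakest = min(averages, key=averages.get)
    return weakest, ROUTE[weakest]


reviews = [
    {"novelty": 7, "rigor": 5, "clarity": 8, "evidence": 6},
    {"novelty": 8, "rigor": 4, "clarity": 7, "evidence": 6},
    {"novelty": 7, "rigor": 6, "clarity": 8, "evidence": 7},
]
weakest, stage = pick_rewind_stage(reviews)
```

With these hypothetical scores, rigor has the lowest average, so the run would rewind to IMPLEMENTATION. A full loop would repeat this step until one of the stop conditions named above (target score, iteration cap, plateau, or budget) is hit.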
(c) /draft <message> slash command: a semi-automatic reply suggestion path for cases where the bot can't intercept the inbound directly (bot account ≠ user main account on iLink ClawBot). 3 candidates are drafted via the auxiliary model, optionally tone-conditioned via @<contact_uid_or_label> against wx_contacts.json; when invoked from a bridge channel (WeChat / Telegram / Slack), candidates are also echoed back to the originating uid + stashed in bridges.draft_cache so a digit-only reply (1/2/3) consumes the chosen text one-shot, with no agent invocation and no smart-reply panel triggered.

Reliability hardening on top of #88's MCP work: research/http.py now uses 429-aware backoff (10/30/60/120s vs 0.5/1/2/4s for 5xx) and honours Retry-After headers (capped at 180s); the lab surveyor stage grounds in real research.aggregator.research() hits before invoking the LLM (top-30 academic+tech results passed as context, persisted as survey_search_hits artifact for replay), and the fabricated-citation rate drops sharply on tested topics; _dedupe_self_repeat() trims cheap-model degenerate sampling (text == text+text) before storage so reviewer prompts don't see doubled inputs; _extract_numbered dedupes by content (questioner emitting 1..5\n1..5 keeps 5, not 10); the citation verifier now has a per-citation 30s concurrent.futures hard wall-clock (kills slow-loris sockets that urllib's socket-timeout ignores) + a 5-min stage-level cap with progress callbacks surfaced to /lab logs (the 11-min hang we saw in the field is gone).
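The 429-aware backoff can be sketched as a pure delay function. The two schedules and the 180s cap come from the notes above; the function name, its signature, applying Retry-After to both 429 and 5xx, and ignoring the HTTP-date form of Retry-After are assumptions of this sketch, not the behaviour of research/http.py.

```python
# Delay schedules (seconds) as quoted in the release notes.
RATE_LIMIT_DELAYS = (10, 30, 60, 120)   # HTTP 429
SERVER_ERROR_DELAYS = (0.5, 1, 2, 4)    # HTTP 5xx
RETRY_AFTER_CAP = 180                   # upper bound on an honoured Retry-After


def retry_delay(status, attempt, retry_after=None):
    """Return seconds to wait before retry number `attempt` (0-based).

    Returns None when the status is not considered retryable.
    Hypothetical helper for illustration only.
    """
    if status == 429:
        schedule = RATE_LIMIT_DELAYS
    elif 500 <= status < 600:
        schedule = SERVER_ERROR_DELAYS
    else:
        return None

    if retry_after is not None:
        try:
            # Only the delta-seconds form is handled; clamp to [0, cap].
            return min(max(float(retry_after), 0.0), RETRY_AFTER_CAP)
        except ValueError:
            pass  # HTTP-date form: fall back to the fixed schedule

    # Past the end of the schedule, keep using the last (largest) delay.
    return schedule[min(attempt, len(schedule) - 1)]
```

A caller would sleep for the returned delay and retry, giving up once its own attempt limit is reached. Keeping the delay calculation free of I/O makes it straightforward to unit-test.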
REPL ergonomics: /lab daemon start and /lab start now print the eventual report.md path up front + live-stream stage transitions to the terminal as they happen; /lab status <run_id> shows both new + legacy paths so the user can find old reports too; /config parses JSON-style values (lists, dicts, signed numbers, quoted strings), so /config wechat_smart_reply_whitelist=["wxid_..."] is no longer silently saved as a literal string; leading whitespace before / is now stripped before slash-dispatch (so a paste with a stray space still hits the dispatcher, not the agent).

Tests: 884 passing (842 unit/integration + 22 e2e), zero regressions; ~80 new pytest cases covering iteration scoring, state reconstruction, backlog atomicity, verifier hard-timeout, slug edge cases, dedupe patterns, self-uid bypass.
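The /config value parsing and the leading-whitespace fix can be sketched together. This is an assumption-laden illustration: both function names are hypothetical, the use of json.loads with a raw-string fallback is one plausible way to get the described behaviour, and "wxid_example" is a made-up contact id.

```python
import json


def parse_config_value(raw):
    """Parse a JSON-style value, falling back to the raw string.

    Lists, dicts, negative numbers, and quoted strings go through
    json.loads; anything that is not valid JSON is kept as plain text.
    """
    text = raw.strip()
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return text


def split_config_command(line):
    """Return (key, value) for a '/config key=value' line, else None."""
    # Strip leading whitespace first so a pasted ' /config ...' still matches.
    line = line.lstrip()
    prefix = "/config "
    if not line.startswith(prefix):
        return None
    key, sep, raw = line[len(prefix):].partition("=")
    if not sep:
        return None
    return key.strip(), parse_config_value(raw)


result = split_config_command(' /config wechat_smart_reply_whitelist=["wxid_example"]')
```

Here `result` holds the key paired with a real Python list rather than the literal text of the brackets. One side effect of this approach is that bare JSON literals such as true or null are also converted, which may or may not match what the real command does.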
