
v0.1.0


@X-iZhang released this on 08 May 22:49

EvoScientist v0.1.0: Async Sub-Agents, Official Docker Image, Personal WeChat & Sessions DB Compaction

G'Day! v0.1.0 is the biggest release since launch. The headline is async sub-agents: writing-agent and data-analysis-agent now run on a managed langgraph dev subprocess, so 30-180s long-runners no longer block your chat, and you get proactive completion notifications without polling. We also ship an official multi-arch Docker image, a personal-WeChat (iLink) channel backend with QR login, the new /model-fallback command, a PruningCheckpointer that compacts multi-GB legacy sessions.db files in place, and a critical HITL fix that unblocks parallel sub-agents calling execute.


🚀 Async Sub-Agents (default-on)

writing-agent and data-analysis-agent typically run 30-180s. With sync delegation, your main chat was blocked the whole time. v0.1.0 transparently deploys the main agent and any sub-agent flagged async: true onto an auto-managed langgraph dev subprocess; the supervisor returns immediately with a task_id and keeps talking (#200).

  • Single source of truth: per-agent YAML files in EvoScientist/subagents/ replace the monolithic subagent.yaml, and both the in-process sync path and the deployed async path read from the same file. To add a new async sub-agent, set async: true, add a binding in langgraph_dev/graphs.py, and register it in langgraph_dev/langgraph.json.
  • Auto-managed subprocess: langgraph_dev/manager.py handles start / health-check / stop. It uses filelock to serialize concurrent CLI invocations and psutil to detect stale PID files. The default port is 6174 (Kaprekar's constant); onboarding has a "LangGraph Port" step that rejects already-occupied ports unless the port is held by EvoSci itself.
  • Workspace sync widget: cli/widgets/workspace_sync_widget.py shows live progress while CLI workspace files sync to the langgraph dev subprocess.
  • Graceful fallback: if langgraph dev fails to start, async sub-agents transparently fall back to in-process synchronous delegation.
  • Toggle: set enable_async_subagents: false in ~/.config/evoscientist/config.yaml to keep everything sync.
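The routing rules above (global toggle, per-agent flag, fallback when langgraph dev is unavailable) can be sketched as one decision function. This is a minimal sketch, not EvoScientist's code: `SubAgentSpec`, `delegate`, `run_async` and `run_sync` are hypothetical names. `run_async` stands in for "submit to the langgraph dev subprocess and return a task_id", and `run_sync` stands in for ordinary in-process delegation.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class SubAgentSpec:
    """Hypothetical stand-in for one per-agent YAML file in EvoScientist/subagents/."""
    name: str
    is_async: bool  # mirrors `async: true`; renamed because `async` is a Python keyword


def delegate(
    spec: SubAgentSpec,
    task: str,
    enable_async_subagents: bool,
    server_healthy: bool,
    run_async: Callable[[str, str], str],
    run_sync: Callable[[str, str], str],
) -> Tuple[str, str]:
    """Return ("async", task_id) for deployed runs or ("sync", answer) for in-process runs."""
    if enable_async_subagents and spec.is_async and server_healthy:
        try:
            # Returns immediately with a task id, so the supervisor keeps talking.
            return "async", run_async(spec.name, task)
        except ConnectionError:
            pass  # subprocess went away after the health check: fall back below
    # Global toggle off, agent not flagged async, or langgraph dev unavailable:
    # delegate in-process and block until the sub-agent returns.
    return "sync", run_sync(spec.name, task)


writer = SubAgentSpec("writing-agent", is_async=True)
print(delegate(writer, "draft the intro", True, True,
               run_async=lambda name, task: f"task-{name}",
               run_sync=lambda name, task: f"{name} done"))
# ('async', 'task-writing-agent')
```

The sync path is the default exit of the function. Any failure on the async side therefore degrades to the pre-v0.1.0 behaviour instead of failing the delegation.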

🔔 Async Sub-Agent Auto-Notifications

The main agent now learns about async completions automatically, with no more "check status" polling (#214).

╭── ✦ Agent Teams ✦ ──────────────────╮
│ ✔ writing-agent       success       │
│ ✗ data-analysis-agent error         │
╰─────────────────────────────────────╯

  • Watcher pattern: every async launch spawns a background asyncio task subscribed to client.runs.join_stream(thread_id, run_id) (the SDK-native SSE long-poll). When the run reaches a terminal state, the watcher pushes an AsyncTaskNotification onto a thread-safe queue. The CLI / TUI / serve poll loops drain, deduplicate, and batch these notifications, then inject them as a synthetic HumanMessage to wake the supervisor.
  • Token-efficient: the LLM input gets one compact JSON line per task ({"agent": ..., "status": ..., "task_id": ...}); the decorative frame with colored ✔/✗/⚠ icons lives in a separate render path.
  • Update support: update_async_task (continuing a conversation with a sub-agent) creates a new run_id on the same thread_id. A _watcher_by_thread dict with replace semantics ensures the new watcher takes over: the old watcher is cancelled before the new run is awaited, so there is no stale "success" race.
  • Per-thread routing: notifications are scoped to their originating CLI thread, so running /new between a sub-agent's launch and its completion no longer injects updates into the wrong thread.
  • False-positive watcher fix: in production, the SSE long-poll sometimes closed cleanly while the run was still alive on the server (httpx timeout, HTTP/2 GOAWAY, proxy idle keep-alive), which produced fake ✅ success notifications. The watcher now uses a bounded reconnect loop and verifies the run's status via runs.get after every clean close (#216).
  • /model propagation: /model switches now reach async sub-agents via RunnableConfig.configurable. Previously the deployed graphs froze the boot-time model, so async agents kept the old model forever (#217).
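Here is a standard-library sketch of the watcher with its bounded reconnect loop, plus the drain step with per-thread routing, deduplication and compact JSON. It is a minimal model under stated assumptions, not EvoScientist's implementation. `FakeRuns` is a hypothetical stand-in that only mimics the shape of `client.runs.join_stream` / `client.runs.get`: an async iterator of events, and a run dict with a `status` field. The `AsyncTaskNotification` fields, `max_reconnects`, the dedup key, and the choice of thread_id as task_id are all assumptions.

```python
import asyncio
import json
import queue
from dataclasses import dataclass

TERMINAL = {"success", "error", "timeout", "interrupted"}


@dataclass(frozen=True)
class AsyncTaskNotification:
    cli_thread: str  # originating CLI thread, used for per-thread routing
    agent: str
    task_id: str
    status: str


async def watch_run(runs, thread_id, run_id, agent, cli_thread, out_q, max_reconnects=3):
    """Wait for a run to finish; never trust a clean stream close on its own."""
    for _ in range(max_reconnects + 1):
        async for _event in runs.join_stream(thread_id, run_id):
            pass  # only the end of the stream matters here
        # The stream may close early (client timeout, GOAWAY, idle proxy),
        # so ask the server for the authoritative status before reporting.
        run = await runs.get(thread_id, run_id)
        if run["status"] in TERMINAL:
            # task_id assumed to equal the sub-agent's thread_id
            out_q.put(AsyncTaskNotification(cli_thread, agent, thread_id, run["status"]))
            return
    out_q.put(AsyncTaskNotification(cli_thread, agent, thread_id, "unknown"))


def drain(out_q, cli_thread):
    """Drain the queue, keep other threads' items queued, dedup, emit compact JSON."""
    batch, deferred, seen = [], [], set()
    while True:
        try:
            note = out_q.get_nowait()
        except queue.Empty:
            break
        if note.cli_thread != cli_thread:
            deferred.append(note)  # belongs to another CLI thread
        elif (note.task_id, note.status) not in seen:
            seen.add((note.task_id, note.status))
            batch.append(note)
    for note in deferred:
        out_q.put(note)
    return "\n".join(
        json.dumps({"agent": n.agent, "status": n.status, "task_id": n.task_id})
        for n in batch
    )


class FakeRuns:
    """Hypothetical stand-in: the first stream closes while the run is still running."""

    def __init__(self):
        self.statuses = ["running", "success"]

    async def join_stream(self, thread_id, run_id):
        await asyncio.sleep(0)
        yield {"event": "values"}

    async def get(self, thread_id, run_id):
        return {"status": self.statuses.pop(0)}


q = queue.Queue()
asyncio.run(watch_run(FakeRuns(), "t-1", "r-1", "writing-agent", "cli-A", q))
print(drain(q, "cli-A"))
# {"agent": "writing-agent", "status": "success", "task_id": "t-1"}
```

The first premature close is caught by the status check and followed by a reconnect, so only the verified `success` reaches the queue. Because `queue.Queue` is thread-safe, poll loops running outside the event loop can drain it directly.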

๐Ÿณ Official Docker Image

Multi-arch (amd64 / arm64) image at ghcr.io/evoscientist/evoscientist, built via GitHub Actions (#198 by @din0s, closes #175).

docker run -it --rm \
  -v evosci-data:/home/evosci/.evoscientist \
  -v "$(pwd)/workspace":/workspace \
  ghcr.io/evoscientist/evoscientist

  • Multi-stage build based on python:3.11-slim-bookworm, with a uv-managed venv; the all-channels extra is installed via uv sync --frozen --no-dev --no-editable.
  • Node.js 24 LTS (copied from node:24-bookworm-slim) so npx works for the majority of MCP servers.
  • uv binary so the MCP registry can install Python MCP servers on demand at runtime.
  • One-volume persistence: /home/evosci/.evoscientist holds the sessions DB, global skills, memories, and config (XDG_CONFIG_HOME points inside it; UV_TOOL_DIR / UV_TOOL_BIN_DIR redirect uv-tool artifacts there too). MCP servers installed during onboarding survive docker run --rm.
  • Non-root user evosci for shell sandboxing.
  • See the README's Docker section for derivation recipes for unbundled extras (stt, oauth, TinyTeX) and proxy / cert handling.
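One volume is enough because of the XDG base-directory convention. If the config path is derived from `XDG_CONFIG_HOME`, falling back to `$HOME/.config` when the variable is unset, empty or relative as the XDG Base Directory spec prescribes, then pointing that variable inside `/home/evosci/.evoscientist` moves `evoscientist/config.yaml` onto the volume. This is a minimal sketch that assumes EvoScientist resolves its config this way. `config_file` is a hypothetical helper, and the `/home/evosci/.evoscientist/config` value is made up for illustration, since the notes only say the variable points inside the volume.

```python
from pathlib import PurePosixPath
from typing import Mapping


def config_file(env: Mapping[str, str]) -> PurePosixPath:
    """<config home>/evoscientist/config.yaml, following the XDG Base Directory spec."""
    xdg = env.get("XDG_CONFIG_HOME", "")
    # The spec says unset, empty, or relative values are ignored in favour of ~/.config.
    root = PurePosixPath(xdg) if xdg.startswith("/") else PurePosixPath(env["HOME"]) / ".config"
    return root / "evoscientist" / "config.yaml"


# On a host this yields the familiar ~/.config/evoscientist/config.yaml:
print(config_file({"HOME": "/home/evosci"}))
# /home/evosci/.config/evoscientist/config.yaml

# In the container, with a hypothetical XDG_CONFIG_HOME inside the volume:
print(config_file({"HOME": "/home/evosci",
                   "XDG_CONFIG_HOME": "/home/evosci/.evoscientist/config"}))
# /home/evosci/.evoscientist/config/evoscientist/config.yaml
```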

💬 Personal WeChat (iLink) Channel Backend

A third WeChat backend, personal, rides Tencent's iLink Bot long-poll gateway, letting a personal WeChat account act as a bot alongside the existing WeCom and Official Account backends (#212 by @MuXinCG2004).

  • No app-id / secret: credentials come from a QR-code scan (python -m EvoScientist.channels.wechat.serve --qr-login) and are persisted under DATA_DIR/wechat_personal/accounts/. The wizard can also drive the QR scan inline.
  • Backend picker in onboard: first-time users choose WeCom / 微信公众号 (Official Account) / 个人微信 (Personal WeChat) without leaving the wizard.
  • Group delivery is opt-in and disabled by default, because iLink rarely delivers group messages to QR-login bot identities.
  • Media transfer implements the AES-128-ECB CDN media protocol against novac2c.cdn.weixin.qq.com.
  • New deps: qrcode>=7.4 and certifi>=2024.0 added to the wechat and all-channels extras.

Why personal vs the existing backends: WeCom and Official Account both require a registered organisation / app and issue stable credentials, which suits production bots but is awkward for hobbyists. The iLink personal backend lowers that barrier: scan a QR code with a personal WeChat account, get a session token, run the bot.


📱 QQ Bot QR-Code Onboarding

The config wizard can now auto-fill qq_app_id and qq_app_secret after the developer scans a QR code with a bound QQ account, so there is no more manual copying of credentials from q.qq.com (#213 by @MuXinCG2004).

  • qr_register() flow drives q.qq.com's create_bind_task / poll_bind_result APIs.
  • AES-256-GCM with a client-generated key, so the bot's client_secret never travels in plaintext.
  • Manual fallback on scan failure or cancel; no behavior change for users who'd rather paste credentials.
  • The bot must already be registered at q.qq.com; scanning binds your QQ account to the existing app.
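The scan-to-configure flow is a create-then-poll loop with a manual escape hatch. This is a minimal sketch built from injected callables. `create_bind_task` and `poll_bind_result` are hypothetical Python wrappers named after the q.qq.com APIs above. Their return shapes (`{"task_id", "qr_url"}` and `{"state", ...}`) and the state names `bound` / `cancelled` / `expired` are invented for illustration. Decryption of the AES-256-GCM payload is left out.

```python
import time
from typing import Callable, Optional, Tuple


def qr_register(
    create_bind_task: Callable[[], dict],
    poll_bind_result: Callable[[str], dict],
    show_qr: Callable[[str], None],
    timeout_s: float = 120.0,
    interval_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[Tuple[str, str]]:
    """Return (qq_app_id, qq_app_secret), or None to fall back to manual entry."""
    task = create_bind_task()   # hypothetical shape: {"task_id": ..., "qr_url": ...}
    show_qr(task["qr_url"])     # e.g. render the QR code in the terminal
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        result = poll_bind_result(task["task_id"])
        if result["state"] == "bound":
            # Assumed already decrypted with the client-generated AES-256-GCM key.
            return result["app_id"], result["app_secret"]
        if result["state"] in ("cancelled", "expired"):
            return None
        sleep(interval_s)
    return None  # timed out: the wizard asks for credentials by hand
```

Every failure path returns `None` rather than raising, which maps directly onto the manual-fallback bullet: the wizard simply shows its existing paste-your-credentials prompt.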

🔄 /model-fallback Command

A new /model-fallback command (#196 by @mooshroom4422, closes #67) appends a model to the session-level fallback chain. If the primary model errors out (rate limit, transient 5xx, etc.), EvoSci automatically retries with the next model in the chain.

  • Same picker as /model: pick fallback models from any registered provider.
  • /model-fallback save persists the chain to ~/.config/evoscientist/config.yaml.
  • /model-fallback help lists all sub-commands (add / remove / list / save).
  • Real-time status messages during fallback attempts.
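The retry-down-the-chain behaviour is a loop over models that stops at the first success. This is a minimal sketch with hypothetical names (`call_with_fallback`, `RetryableError`, `invoke`). For LangChain runnables, the built-in `Runnable.with_fallbacks(...)` implements the same idea, though these notes don't say whether /model-fallback uses it.

```python
from typing import Callable, Sequence


class RetryableError(Exception):
    """Hypothetical marker for rate limits, transient 5xx responses, etc."""


def call_with_fallback(
    chain: Sequence[str],
    invoke: Callable[[str, str], str],
    prompt: str,
    notify: Callable[[str], None] = print,
) -> str:
    """Try each model in order; re-raise the last error if every model fails."""
    if not chain:
        raise ValueError("fallback chain is empty")
    last_error = None
    for i, model in enumerate(chain):
        if i > 0:
            # Real-time status message before each fallback attempt.
            notify(f"{chain[i - 1]} failed ({last_error}); trying {model}")
        try:
            return invoke(model, prompt)
        except RetryableError as err:
            last_error = err  # non-retryable errors propagate immediately
    raise last_error
```

Only errors explicitly marked retryable move the call down the chain. A bad request or an auth failure still surfaces at once instead of being retried against every fallback model.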

๐Ÿ—œ๏ธ PruningCheckpointer โ€” Multi-GB sessions.db Compaction

LangGraph's AsyncSqliteSaver writes a full state snapshot per super-step. EvoScientist's resume / HITL / sub-agent paths only ever read the latest checkpoint per (thread_id, checkpoint_ns), so every intermediate snapshot is dead weight. In one real-world case, a single user's sessions.db grew to 2.6 GB with 8,337 checkpoints across 153 threads; one thread alone held 1,086 checkpoints (#194).

  • Automatic per-write pruning: PruningCheckpointer (a subclass of AsyncSqliteSaver) overrides aput() to retain at most checkpoint_keep_per_thread (default 10) rows per (thread_id, checkpoint_ns). An outer asyncio.Lock makes super().aput() plus the prune atomic, so the just-written row is always kept even under concurrent writers.
  • One-time migration sweep: gated by PRAGMA user_version and a 100 MB size threshold. It prunes every existing (thread, ns) pair down to checkpoint_keep_per_thread rows and schedules a VACUUM via atexit, with a live progress bar and ETA:
    · Compacting sessions DB (14.53 GB, 99 thread-namespace pairs, est. ~2m 15s)
    ⠹ Compacting 27% (27/99 pairs · 38s elapsed · ~1m 41s remaining)
    · ✓ Compaction done in 2m 18s
    
  • Safe across providers: the sweep filters by metadata.agent_name, so checkpoints belonging to co-located non-EvoSci agents are never touched.
  • Race fix (#195): the sweep now runs synchronously inside get_checkpointer() before the saver is yielded, so concurrent channel aput() calls no longer race the sweep for SQLite's EXCLUSIVE lock and trigger "Error: database is locked".
  • Diagnostic: a new EvoSci sessions stats CLI command surfaces DB state.
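The keep-the-newest-N rule and the user_version gate can be sketched with the standard-library sqlite3 module. This is a simplified model, not PruningCheckpointer itself. The table keeps only the key columns of a LangGraph-style SQLite `checkpoints` table, and the sketch assumes checkpoint ids sort chronologically as strings (LangGraph generates time-ordered ids, but verify against your version). It omits the companion `writes` table, the asyncio.Lock, the 100 MB threshold, the agent_name filter and the atexit VACUUM. `MIGRATION_VERSION`, `prune_pair` and `migrate_once` are hypothetical names.

```python
import sqlite3

MIGRATION_VERSION = 1  # hypothetical marker stored in PRAGMA user_version

SCHEMA = """
CREATE TABLE IF NOT EXISTS checkpoints (
    thread_id     TEXT NOT NULL,
    checkpoint_ns TEXT NOT NULL DEFAULT '',
    checkpoint_id TEXT NOT NULL,
    PRIMARY KEY (thread_id, checkpoint_ns, checkpoint_id)
)
"""

# Delete everything for one (thread_id, checkpoint_ns) pair except the newest `keep` rows.
PRUNE = """
DELETE FROM checkpoints
WHERE thread_id = ? AND checkpoint_ns = ?
  AND checkpoint_id NOT IN (
      SELECT checkpoint_id FROM checkpoints
      WHERE thread_id = ? AND checkpoint_ns = ?
      ORDER BY checkpoint_id DESC
      LIMIT ?
  )
"""


def prune_pair(conn: sqlite3.Connection, thread_id: str, ns: str, keep: int = 10) -> int:
    """Prune one pair; returns the number of rows deleted."""
    return conn.execute(PRUNE, (thread_id, ns, thread_id, ns, keep)).rowcount


def migrate_once(conn: sqlite3.Connection, keep: int = 10) -> int:
    """One-time sweep over every pair; returns the number of rows deleted."""
    (version,) = conn.execute("PRAGMA user_version").fetchone()
    if version >= MIGRATION_VERSION:
        return 0  # already migrated: later writes are pruned per aput() instead
    pairs = conn.execute("SELECT DISTINCT thread_id, checkpoint_ns FROM checkpoints").fetchall()
    deleted = sum(prune_pair(conn, t, ns, keep) for t, ns in pairs)
    # PRAGMA does not accept bound parameters, hence the f-string over a constant int.
    conn.execute(f"PRAGMA user_version = {MIGRATION_VERSION}")
    conn.commit()
    return deleted
```

Storing the migration marker in `PRAGMA user_version` keeps it inside the database file itself. A copied or restored sessions.db therefore carries its own "already compacted" state.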

๐Ÿ›ก๏ธ HITL Fix โ€” Parallel Sub-Agents No Longer Crash

When the main agent dispatched 2+ parallel sub-agents (via task) that each called execute, LangGraph used to raise:

When there are multiple pending interrupts, you must specify the interrupt id when resuming.

Root cause: DeepAgents 0.5.5 propagated the top-level interrupt_on={"execute": True} we passed to create_deep_agent(...) into every declarative sub-agent and the auto-injected general-purpose sub-agent. Parallel sub-agents then produced multiple pending interrupts in the same checkpoint, which a single resume decision could not disambiguate.

Fix (#202): stop passing interrupt_on= to create_deep_agent. Instead, append HumanInTheLoopMiddleware directly to the main agent's stack in both _get_default_agent() and create_cli_agent(). Sub-agent shell commands remain protected by CustomSandboxBackend static checks (blocks sudo / chmod / dd / path traversal, 300s timeout, 100KB cap). The resulting model: the user authorizes the high-level task goal, and the sub-agent then executes within sandbox limits.
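Here is a minimal sketch of the kind of static check that replaces per-command interrupts for sub-agents. The denylist, the `..` rule, the helper names, and the reading of "100KB cap" as an output cap are all illustrative assumptions, not CustomSandboxBackend's actual rules. A token denylist like this is easy to bypass (for example `echo x|sudo tee` without spaces), so treat it as one layer, not a sandbox on its own.

```python
import shlex
import subprocess
from typing import Optional

BLOCKED = {"sudo", "chmod", "dd"}  # illustrative subset of the blocked commands
TIMEOUT_S = 300                    # 300s timeout
OUTPUT_CAP = 100 * 1024            # 100KB, assumed here to cap combined output


def check_command(command: str) -> Optional[str]:
    """Return a rejection reason, or None if the command passes the static checks."""
    try:
        tokens = shlex.split(command)
    except ValueError:  # e.g. unbalanced quotes
        return "unparseable command"
    for token in tokens:
        if token.rsplit("/", 1)[-1] in BLOCKED:  # also catches /usr/bin/sudo
            return f"blocked command: {token}"
        if ".." in token.split("/"):
            return f"path traversal: {token}"
    return None


def run_checked(command: str, cwd: str) -> str:
    """Run a command only if it passes the checks, with a timeout and an output cap."""
    reason = check_command(command)
    if reason is not None:
        return f"rejected ({reason})"
    try:
        proc = subprocess.run(command, shell=True, cwd=cwd,
                              capture_output=True, timeout=TIMEOUT_S)
    except subprocess.TimeoutExpired:
        return f"timed out after {TIMEOUT_S}s"
    output = proc.stdout + proc.stderr  # bytes, so the cap is measured in bytes
    if len(output) > OUTPUT_CAP:
        return output[:OUTPUT_CAP].decode(errors="replace") + "\n[truncated]"
    return output.decode(errors="replace")
```

Because these checks need no human decision, parallel sub-agents never produce competing pending interrupts. Only the main agent's HumanInTheLoopMiddleware can pause the graph.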


🪟 Context Window Patch Table

A small per-model context-window patch table (llm/context_window.py) for the newest models that LangChain provider packages haven't registered profile data for yet: claude-opus-4-7 (1M), gpt-5.5 (1.05M), kimi-k2.6 (262K), deepseek-v4-pro (1.05M), qwen3.6-flash (1M), mimo-v2.5 (1.05M), glm-5 (203K) (#191).

Without this, three things were broken for those models:

  1. Status bar percentage showed <used>/200K instead of the real <used>/1M (the only fallback was DEFAULT_CONTEXT_WINDOW_FALLBACK = 200_000).
  2. deepagents SummarizationMiddleware triggered far too early: its hardcoded 170K-token fallback meant a 1M-context model summarized at ~17% of its window instead of ~85%.
  3. ContextEditingMiddleware clear-tool-uses trigger fell back to a fixed 100K, similarly pessimistic.

The table is meant to be trimmed over time โ€” once an entry shows up upstream in sst/models.dev, the corresponding row can be removed (the profile-reading layer wins automatically).
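The lookup order described above is provider profile first, then the patch table, then the 200K default. Here is a minimal sketch of it. `resolve_context_window`, `summarization_trigger`, the `max_input_tokens` profile key, and the 85% trigger (taken from the ~85% figure above) are assumptions. Only two table rows are shown, with "1M" written as 1_000_000 for illustration; the exact values live in llm/context_window.py.

```python
from typing import Mapping, Optional

DEFAULT_CONTEXT_WINDOW_FALLBACK = 200_000

# Illustrative subset of the patch table.
CONTEXT_WINDOW_PATCHES = {
    "claude-opus-4-7": 1_000_000,
    "qwen3.6-flash": 1_000_000,
}


def resolve_context_window(model: str, profile: Optional[Mapping] = None) -> int:
    """Upstream profile data wins; the patch table only fills gaps; 200K is the last resort."""
    if profile and profile.get("max_input_tokens"):
        return int(profile["max_input_tokens"])
    return CONTEXT_WINDOW_PATCHES.get(model, DEFAULT_CONTEXT_WINDOW_FALLBACK)


def summarization_trigger(window: int, percent: int = 85) -> int:
    """Token count at which summarization kicks in (integer math, no float rounding)."""
    return window * percent // 100


window = resolve_context_window("claude-opus-4-7")
print(window, summarization_trigger(window))  # 1000000 850000
# The old hardcoded 170K trigger fired at 170_000 * 100 // window == 17 percent.
```

Checking the profile before the table is what makes rows safe to delete once upstream catches up. Stale entries are simply never consulted.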


🔧 DeepSeek Cross-Provider Reasoning Fix

Fixes an HTTP 400 error from DeepSeek V4 Pro thinking mode (deepseek-chat) when the conversation history contains assistant messages from another provider, from DeepSeek Flash, or from a session that predates the v0.0.9 reasoning_content capture patch (#192).

Trigger scenarios (any one is enough to break):

  • User chats with Anthropic / OpenAI for a few turns, then switches to DeepSeek V4 Pro
  • /resume of a thread previously driven by a non-thinking model
  • Resuming a thread created on an older EvoSci version

Fix: drop the is_reasoner gate in _patch_deepseek_reasoning_passback. Empty-string fallback now applies to every assistant message that lacks reasoning_content, regardless of which DeepSeek model is making the call. Safe because the patch is mounted only when provider == "deepseek", so all callers are guaranteed DeepSeek endpoints.
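The fix boils down to a normalisation pass over outgoing messages. Here is a minimal sketch over OpenAI-style message dicts. `backfill_reasoning_content` is a hypothetical name, and the real `_patch_deepseek_reasoning_passback` operates on LangChain messages inside the provider integration.

```python
import copy
from typing import Dict, List


def backfill_reasoning_content(messages: List[Dict], provider: str) -> List[Dict]:
    """Give every assistant message a reasoning_content field (empty string if missing)."""
    if provider != "deepseek":
        return messages  # mirrors the patch being mounted only for DeepSeek endpoints
    patched = copy.deepcopy(messages)  # never mutate the stored history
    for msg in patched:
        if msg.get("role") == "assistant" and msg.get("reasoning_content") is None:
            # No is_reasoner gate: applies whichever DeepSeek model is being called.
            msg["reasoning_content"] = ""
    return patched
```

Assistant turns produced by Anthropic, OpenAI, DeepSeek Flash, or pre-v0.0.9 sessions all end up with the same shape. Turns that already carry real reasoning are left untouched.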


🧹 Refactors & Cleanup

  • ChannelRuntime replaces cli/channel.py module globals (#197 by @din0s, closes #182). Previously, _cli_agent / _cli_thread_id were poked directly by /model, /channel, _auto_start_channel, and the Rich/TUI session loops, and tests needed _reset_channel_globals autouse fixtures to keep state from leaking. Now a single ChannelRuntime() per session is threaded through CommandContext, and /model rebinds via ctx.channel_runtime.agent = new_agent. Behavior is unchanged.
  • Skill install manifest (#199 by @din0s). Onboarding's "install skills" step kept showing already-installed skills as installable. skills_manager now writes a per-tier .installed.yaml sidecar mapping each installed dir name → its original install source (URL, shorthand, or local path). _step_skills checks both the workspace and global tiers AND the manifest, so packs that expand into many child dirs (e.g. EvoScientist/EvoSkills@skills → paper-writing/, evo-memory/, …) are correctly detected.
  • Dead MessageBus.dispatch_outbound removed (#205 by @MuXinCG2004). Two implementations shared the same bus.outbound queue: MessageBus.dispatch_outbound (subscriber-based) and ChannelManager._dispatch_outbound (registry lookup). Only the latter was wired up in production; the former was a footgun for anyone reading MessageBus in isolation. Net change: +14 / −139 across 3 files. No behavior change.
  • Markdown ATX heading spacing (#201): ensures proper spacing for ATX headings in Markdown rendering so streaming output renders correctly.
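The skill-install fix can be sketched as a lookup through the per-tier manifests. This is a simplified model, not skills_manager's code. Each manifest is shown as an already-loaded dict mapping dir name to install source (the real sidecar is YAML). `is_source_installed` is a hypothetical name, and the rule that a manifest entry only counts while its directory still exists is an assumption. The directory names reuse the EvoScientist/EvoSkills@skills example above.

```python
from typing import Iterable, Mapping, Set, Tuple

# One tier = (directory names present on disk, manifest {dir name: install source}).
Tier = Tuple[Set[str], Mapping[str, str]]


def is_source_installed(source: str, tiers: Iterable[Tier]) -> bool:
    """True if any tier (workspace or global) has a live dir installed from `source`."""
    for present_dirs, manifest in tiers:
        # A pack like EvoScientist/EvoSkills@skills has no dir of its own name,
        # so matching by dir name alone misses it; the manifest maps children back.
        if any(src == source and d in present_dirs for d, src in manifest.items()):
            return True
    return False


workspace: Tier = ({"my-skill"}, {})
global_tier: Tier = (
    {"paper-writing", "evo-memory"},
    {"paper-writing": "EvoScientist/EvoSkills@skills",
     "evo-memory": "EvoScientist/EvoSkills@skills"},
)
print(is_source_installed("EvoScientist/EvoSkills@skills", [workspace, global_tier]))  # True
```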

📦 Dependency Bumps

  • deepagents 0.5.5 → 0.5.7 (#211)
    • 0.5.6: CompiledSubAgent now propagates lc_agent_name into metadata (langchain-ai/deepagents#3045). This directly improves our 6-level sub-agent name resolution chain (priority 1 reads metadata["lc_agent_name"]).
    • 0.5.7: the auto-added general-purpose (GP) sub-agent inherits parent permissions (langchain-ai/deepagents#3131), which prepares for future FilesystemPermission adoption.
  • langchain-openrouter 0.2.1 → 0.2.3: includes the upstream _merge_reasoning_details() (langchain-ai/langchain#36401), which consolidates streaming-fragmented reasoning_details before serialization. Our local _patch_openrouter_reasoning_details workaround is now removed (it dropped fragments lossily; upstream merges them losslessly).
  • Transitive bumps: langchain-anthropic 1.4.2 → 1.4.3, langsmith 0.7.38 → 0.8.0.

What's Changed

  • Feat/llm context window patch table by @X-iZhang in #191
  • fix(deepseek): add empty-string fallback for reasoning_content in cro… by @X-iZhang in #192
  • Implement PruningCheckpointer for efficient checkpoint management and… by @X-iZhang in #194
  • Fix/sessions migration sweep race by @X-iZhang in #195
  • refactor(cli): replace channel module globals with ChannelRuntime by @din0s in #197
  • fix(onboard): detect installed skill packs via install manifest by @din0s in #199
  • feat(docker): official image with all runtime deps pre-installed by @din0s in #198
  • fix/cli hitl by @X-iZhang in #202
  • fix(markdown): ensure proper spacing for ATX headings in Markdown ren… by @X-iZhang in #201
  • Refactor sub-agent architecture and introduce async support by @X-iZhang in #200
  • chore(deps): bump deepagents 0.5.7 and langchain-openrouter 0.2.3, drop redundant patch by @X-iZhang in #211
  • refactor(channels): remove dead MessageBus dispatcher by @MuXinCG2004 in #205
  • feat(cmd): add /model-fallback command by @mooshroom4422 in #196
  • feat(wechat): add personal-WeChat (iLink) backend with QR login by @MuXinCG2004 in #212
  • feat: Implement async sub-agent auto-notification system by @X-iZhang in #214
  • fix: Improve watcher logic to prevent false-positive notifications on… by @X-iZhang in #216
  • feat(qq): add QR-code scan-to-configure onboarding for QQ Bot by @MuXinCG2004 in #213
  • Fix/async subagent model switch by @X-iZhang in #217

Full Changelog: v0.0.9...v0.1.0