Fix JSON truncation causing "Unterminated string" errors (issue #12643) by Linux2010 · Pull Request #1 · Linux2010/hermes-agent

Linux2010 · 2026-04-19T18:46:25Z

Summary

Fixes NousResearch#12643 - vllm provider json error

The Bug

During compression/flush_memories, _prune_old_tool_results Pass 3 truncates large tool call arguments using raw string slicing:

args = args[:200] + "...[truncated]"

This breaks JSON validity when the slice position happens to be inside a JSON string value, creating invalid JSON like:

{"path": "file.txt", "content": "some long content here that got cut...[truncated]

(Note: missing closing quote and brace)

Downstream providers like vllm, MiniMax, and others strictly validate tool call arguments and reject malformed JSON with 400 "Unterminated string starting at" errors.

The Fix

Added _truncate_tool_call_args_json helper that:

Parses the JSON structure first
Shrinks long string values inside the parsed object (recursively)
Re-serializes with json.dumps(..., ensure_ascii=False)

This ensures truncated tool call arguments remain valid JSON that all providers can parse.

Test Coverage

Added TestTruncateToolCallArgsJson covering:

Short args unchanged
Long string values truncated properly
Nested dicts and lists handled
Non-string values (int, bool, float) preserved
Invalid JSON returned unchanged (graceful fallback)
CJK characters preserved (ensure_ascii=False)

Impact

Low risk: Changes only the truncation logic, no API signature changes
Hot path: Affects compression/flush_memories which users with custom vllm endpoints hit frequently
Clear reproduction: Issue [Bug]: vllm provider json error NousResearch/hermes-agent#12643 has detailed logs showing the error pattern

This is a surgical fix following the pattern from upstream PRs that addressed similar JSON truncation issues (MiniMax NousResearch#11762).

…ousResearch#8609) Three fixes for the (empty) response bug affecting open reasoning models: 1. Allow retries after prefill exhaustion — models like mimo-v2-pro always populate reasoning fields via OpenRouter, so the old 'not _has_structured' guard on the retry path blocked retries for EVERY reasoning model after the 2 prefill attempts. Now: 2 prefills + 3 retries = 6 total attempts before (empty). 2. Reset prefill/retry counters on tool-call recovery — the counters accumulated across the entire conversation, never resetting during tool-calling turns. A model cycling empty→prefill→tools→empty burned both prefill attempts and the third empty got zero recovery. Now counters reset when prefill succeeds with tool calls. 3. Strip think blocks before _truly_empty check — inline <think> content made the string non-empty, skipping both retry paths. Reported by users on Telegram with xiaomi/mimo-v2-pro and qwen3.5 models. Reproduced: qwen3.5-9b emits tool calls as XML in reasoning field instead of proper function calls, causing content=None + tool_calls=None + reasoning with embedded <tool_call> XML. Prefill recovery works but counter accumulation caused permanent (empty) in long sessions.

The backup validation checked for 'hermes_state.db' and 'memory_store.db' as telltale markers of a valid Hermes backup zip. Neither name exists in a real Hermes installation — the actual database file is 'state.db' (hermes_state.py: DEFAULT_DB_PATH = get_hermes_home() / 'state.db'). A fresh Hermes installation produces: ~/.hermes/state.db (actual name) ~/.hermes/config.yaml ~/.hermes/.env Because the marker set never matched 'state.db', a backup zip containing only 'state.db' plus 'config.yaml' would fail validation with: 'zip does not appear to be a Hermes backup' and the import would exit with sys.exit(1), silently rejecting a valid backup. Fix: replace the wrong marker names with the correct filename. Adds TestValidateBackupZip with three cases: - state.db is accepted as a valid marker - old wrong names (hermes_state.db, memory_store.db) alone are rejected - config.yaml continues to pass (existing behaviour preserved)

Centralize container detection in hermes_constants.is_container() with process-lifetime caching, matching existing is_wsl()/is_termux() patterns. Dedup _is_inside_container() in config.py to delegate to the new function. Add _run_systemctl() wrapper that converts FileNotFoundError to RuntimeError for defense-in-depth — all 10 bare subprocess.run(_systemctl_cmd(...)) call sites now route through it. Make supports_systemd_services() return False in containers and when systemctl binary is absent (shutil.which check). Add Docker-specific guidance in gateway_command() for install/uninstall/start subcommands — exit 0 with helpful instructions instead of crashing. Make 'hermes status' show 'Manager: docker (foreground)' and 'hermes dump' show 'running (docker, pid N)' inside containers. Fix setup_gateway() to use supports_systemd instead of _is_linux for all systemd-related branches, and show Docker restart policy instructions in containers. Replace inline /.dockerenv check in voice_mode.py with is_container(). Fixes NousResearch#7420 Co-authored-by: teknium1 <teknium1@users.noreply.github.com>

…Research#8661) These were already declared in pyproject.toml but missing from the lockfile.

…n install Skills.sh installs hit the GitHub API 45 times per install because the same repo tree was fetched 6 times redundantly. Combined with search (23 API calls), this totals 68 — exceeding the unauthenticated rate limit of 60 req/hr, causing 'Could not fetch' errors for users without a GITHUB_TOKEN. Changes: - Add _get_repo_tree() cache to GitHubSource — repo info + recursive tree fetched once per repo per source instance, eliminating 10 redundant API calls (6 tree + 4 candidate 404s) - _download_directory_via_tree returns {} (not None) when cached tree shows path doesn't exist, skipping unnecessary Contents API fallback - _check_rate_limit_response() detects exhausted quota and sets is_rate_limited flag - do_install() shows actionable hint when rate limited: set GITHUB_TOKEN or install gh CLI Before: 45 API calls per install (68 total with search) After: 31 API calls per install (54 total with search — under 60/hr) Reported by community user from Vietnam (no GitHub auth configured).

…or search/install Add a CI-built skills index served from the docs site. The index is crawled daily by GitHub Actions, resolves all GitHub paths upfront, and is cached locally by the client. When the index is available: - Search uses the cached index (0 GitHub API calls, was 23+) - Install uses resolved paths from index (6 API calls for file downloads only, was 31-45 for discovery + downloads) Total: 68 → 6 GitHub API calls for a typical search + install flow. Unauthenticated users (60 req/hr) can now search and install without hitting rate limits. Components: - scripts/build_skills_index.py: Crawl all sources (skills.sh, GitHub taps, official, clawhub, lobehub), batch-resolve GitHub paths via tree API, output JSON index - tools/skills_hub.py: HermesIndexSource class — search/fetch/inspect backed by the index, with lazy GitHubSource for file downloads - parallel_search_sources() skips external API sources when index is available (0 GitHub calls for search) - .github/workflows/skills-index.yml: twice-daily CI build + deploy - .github/workflows/deploy-site.yml: also builds index during docs deploy Graceful degradation: when the index is unavailable (first run, network down, stale), all methods return empty/None and downstream sources handle the request via direct API as before.

Add _is_openclaw_running() and _warn_if_openclaw_running() to detect OpenClaw processes (via pgrep/tasklist) before hermes claw migrate. Warns the user that messaging platforms only allow one active session per bot token, and lets them cancel or continue. Fixes NousResearch#7907

…ctive prompt - Use PowerShell to inspect node.exe command lines on Windows, since tasklist output does not include them. - Also check for dedicated openclaw.exe/clawd.exe processes. - Skip the interactive prompt in non-interactive sessions so the preview-only behavior is preserved. - Update tests accordingly. Relates to NousResearch#7907

…al (fixes NousResearch#8502)

…port Combines detection from both PRs into _detect_openclaw_processes(): - Cross-platform process scan (pgrep/tasklist/PowerShell) from PR NousResearch#8102 - systemd service check from PR NousResearch#8555 - Returns list[str] with details about what's found Fixes in cleanup warning (from PR NousResearch#8555): - print_warning -> print_error/print_info (print_warning not in import chain) - Added isatty() guard for non-interactive sessions - Removed duplicate _check_openclaw_running() in favor of shared function Updated all tests to match new API.

…essages (NousResearch#8665) Four fixes for the Weixin/WeChat adapter, synthesized from the best aspects of community PRs NousResearch#8407, NousResearch#8521, NousResearch#8360, NousResearch#7695, NousResearch#8308, NousResearch#8525, NousResearch#7531, NousResearch#8144, NousResearch#8251. 1. Streaming cursor (▉) stuck permanently — WeChat doesn't support message editing, so the cursor appended during streaming can never be removed. Add SUPPORTS_MESSAGE_EDITING = False to WeixinAdapter and check it in gateway/run.py to use an empty cursor for non-edit platforms. (Fixes NousResearch#8307, NousResearch#8326) 2. Media upload failures — two bugs in _send_file(): a) upload_full_url path used PUT (404 on WeChat CDN); now uses POST. b) aes_key was base64(raw_bytes) but the iLink API expects base64(hex_string); images showed as grey boxes. (Fixes NousResearch#8352, NousResearch#7529) Also: unified both upload paths into _upload_ciphertext(), preferring upload_full_url. Added send_video/send_voice methods and voice_item media builder for audio/.silk files. Added video_md5 field. 3. Markdown links stripped — WeChat can't render [text](url), so format_message() now converts them to 'text (url)' plaintext. Code blocks are preserved. (Fixes NousResearch#7617) 4. Blank message prevention — three guards: a) _split_text_for_weixin_delivery('') returns [] not [''] b) send() filters empty/whitespace chunks before _send_text_chunk c) _send_message() raises ValueError for empty text as safety net Community credit: joei4cm (NousResearch#8407), lyonDan (NousResearch#8521), SKFDJKLDG (NousResearch#8360), tomqiaozc (NousResearch#7695), joshleeeeee (NousResearch#8308), luoxiao6645(NousResearch#8525), longsizhuo (NousResearch#7531), Astral-Yang (NousResearch#8144), QingWei-Li (NousResearch#8251).

Users who set up Nous auth without explicitly selecting a model via `hermes model` were silently falling back to anthropic/claude-opus-4.6 (the first entry in _PROVIDER_MODELS['nous']), causing unexpected charges on their Nous plan. Move xiaomi/mimo-v2-pro to the first position so unconfigured users default to a free model instead.

Read tts.speed (global) or tts.<provider>.speed (provider-specific) from config. Provider-specific takes precedence over global. - Edge TTS: converts speed float to SSML prosody rate string - OpenAI TTS: passes speed param clamped to 0.25-4.0 - MiniMax: wired into global tts.speed fallback for consistency Co-authored-by: 0xbyt4 <0xbyt4@users.noreply.github.com>

12 tests covering: - Provider-specific speed overrides global speed - Global speed used as fallback - Default (no speed) preserves existing behavior - Edge SSML rate string conversion (positive/negative) - OpenAI speed clamping to 0.25-4.0 range

_check_compression_model_feasibility() called get_model_context_length() without passing config_context_length, so custom endpoints that do not support /models API queries always fell through to the 128K default, ignoring auxiliary.compression.context_length in config.yaml. Fix: read auxiliary.compression.context_length from config and pass it as config_context_length (highest-priority hint) so the user-configured value is always respected regardless of API availability. Fixes NousResearch#8499

- Test that auxiliary.compression.context_length from config is forwarded to get_model_context_length (positive case) - Test that invalid/non-integer config values are silently ignored - Fix _make_agent() to set config=None (cherry-picked code reads self.config)

aiohttp.ClientSession defaults to trust_env=False, ignoring HTTP_PROXY/ HTTPS_PROXY env vars. This causes QR login and all API calls to fail for users behind a proxy (e.g. Clash in fake-ip mode), which is common in China where Weixin and WeCom are primarily used. Added trust_env=True to all aiohttp.ClientSession instantiations that connect to external hosts (weixin: 3 places, wecom: 1, matrix: 1). WhatsApp sessions are excluded as they only connect to localhost. httpx-based adapters (dingtalk, signal, wecom_callback) are unaffected as httpx defaults to trust_env=True. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Adds a one-line entry for HermesClaw (community WeChat bridge) to the Community section. It lets users run Hermes Agent and OpenClaw on the same WeChat account.

Some OpenAI-compatible clients (Open WebUI, LobeChat, etc.) send message content as an array of typed parts instead of a plain string: [{"type": "text", "text": "hello"}] The agent pipeline expects strings, so these array payloads caused silent failures or empty messages. Add _normalize_chat_content() with defensive limits (recursion depth, list size, output length) and apply it to both the Chat Completions and Responses API endpoints. The Responses path had inline normalization that only handled input_text/output_text — the shared function also handles the standard 'text' type. Salvaged from PR NousResearch#7980 (ikelvingo) — only the content normalization; the SSE and Weixin changes in that PR were regressions and are not included. Co-authored-by: ikelvingo <ikelvingo@users.noreply.github.com>

…sResearch#8681) * feat: add `hermes debug share` — upload debug report to pastebin Adds a new `hermes debug share` command that collects system info (via hermes dump), recent logs (agent.log, errors.log, gateway.log), and uploads the combined report to a paste service (paste.rs primary, dpaste.com fallback). Returns a shareable URL for support. Options: --lines N Number of log lines per file (default: 200) --expire N Paste expiry in days (default: 7, dpaste.com only) --local Print report locally without uploading Files: hermes_cli/debug.py - New module: paste upload + report collection hermes_cli/main.py - Wire cmd_debug + argparse subparser tests/hermes_cli/test_debug.py - 19 tests covering upload, collection, CLI * feat: upload full agent.log and gateway.log as separate pastes hermes debug share now uploads up to 3 pastes: 1. Summary report (system info + log tails) — always 2. Full agent.log (last ~500KB) — if file exists 3. Full gateway.log (last ~500KB) — if file exists Each paste uploads independently; log upload failures are noted but don't block the main report. Output shows all links aligned: Report https://paste.rs/abc agent.log https://paste.rs/def gateway.log https://paste.rs/ghi Also adds _read_full_log() with size-capped tail reading to stay within paste service limits (~512KB per file). * feat: prepend hermes dump to each log paste for self-contained context Each paste (agent.log, gateway.log) now starts with the hermes dump output so clicking any single link gives full system context without needing to cross-reference the summary report. Refactored dump capture into _capture_dump() — called once and reused across the summary report and each log paste. * fix: fall back to .1 rotated log when primary log is missing or empty When gateway.log (or agent.log) doesn't exist or is empty, the debug share now checks for the .1 rotation file. This is common — the gateway rotates logs and the primary file may not exist yet. Extracted _resolve_log_path() to centralize the fallback logic for both _read_log_tail() and _read_full_log(). * chore: remove unused display_hermes_home import

@bot

Port from openclaw/openclaw#64796: Per MSC3952 / Matrix v1.7, the m.mentions.user_ids field is the authoritative mention signal. Clients that populate m.mentions but don't duplicate @bot in the body text were being silently dropped when MATRIX_REQUIRE_MENTION=true. Cherry-picked from PR NousResearch#8673.

Port from openclaw/openclaw#64586: users who copy .env.example without changing placeholder values now get a clear error at startup instead of a confusing auth failure from the platform API. Also rejects placeholder API_SERVER_KEY when binding to a network-accessible address. Cherry-picked from PR NousResearch#8677.

Adds /debug as a slash command available in CLI, Telegram, Discord, Slack, and all other gateway platforms. Uploads debug report + full logs to paste services and returns shareable URLs. - commands.py: CommandDef in Info category (no cli_only/gateway_only) - gateway/run.py: async handler with run_in_executor for blocking I/O - cli.py: dispatch in process_command to run_debug_share

…usResearch#8725) Port from nearai/ironclaw#2304: Telegram's 4096 character limit is measured in UTF-16 code units, not Unicode codepoints. Characters outside the Basic Multilingual Plane (emoji like 😀, CJK Extension B, musical symbols) are surrogate pairs: 1 Python char but 2 UTF-16 units. Previously, truncate_message() used Python's len() which counts codepoints. This could produce chunks exceeding Telegram's actual limit when messages contain many astral-plane characters. Changes: - Add utf16_len() helper and _prefix_within_utf16_limit() for UTF-16-aware string measurement and truncation - Add _custom_unit_to_cp() binary-search helper that maps a custom-unit budget to the largest safe codepoint slice position - Update truncate_message() to accept optional len_fn parameter - Telegram adapter now passes len_fn=utf16_len when splitting messages - Fix fallback truncation in Telegram error handler to use _prefix_within_utf16_limit instead of codepoint slicing - Update send_message_tool.py to use utf16_len for Telegram platform - Add comprehensive tests: utf16_len, _prefix_within_utf16_limit, truncate_message with len_fn (emoji splitting, content preservation, code block handling) - Update mock lambdas in reply_mode tests to accept **kw for len_fn

…esearch#8724) When resuming a session with --resume or -c, the last assistant response was truncated to 200 chars / 3 lines just like older messages in the recap. This forced users to waste tokens re-asking for the response. Now the last assistant message in the recap is shown in full with non-dim styling, so users can see exactly where they left off. Earlier messages remain truncated for compact display. Changes: - Track un-truncated text for the last assistant entry during collection - Replace last entry with full text after history trimming - Render last assistant entry with bold (non-dim) styling - Update existing truncation tests to use multi-message histories - Add new tests for full last response display (char + multiline)

…rch#8723) Three changes that address the poor WhatsApp experience reported by users: 1. Reclassify WhatsApp from TIER_LOW to TIER_MEDIUM in display_config.py — enables streaming and tool progress via the existing Baileys /edit bridge endpoint. Users now see progressive responses instead of minutes of silence followed by a wall of text. 2. Lower MAX_MESSAGE_LENGTH from 65536 to 4096 and add proper chunking — send() now calls format_message() and truncate_message() before sending, then loops through chunks with a small delay between them. The base class truncate_message() already handles code block boundary detection (closes/reopens fences at chunk boundaries). reply_to is only set on the first chunk. 3. Override format_message() with WhatsApp-specific markdown conversion — converts **bold** to *bold*, ~~strike~~ to ~strike~, headers to bold text, and [links](url) to text (url). Code blocks and inline code are protected from conversion via placeholder substitution. Together these fix the two user complaints: - 'sends the whole code all the time' → now chunked at 4K with proper formatting - 'terminal gets interrupted and gets cooked' → streaming + tool progress give visual feedback so users don't accidentally interrupt with follow-up messages

…bridge (NousResearch#8745) * fix(telegram): use UTF-16 code units for message length splitting Port from nearai/ironclaw#2304: Telegram's 4096 character limit is measured in UTF-16 code units, not Unicode codepoints. Characters outside the Basic Multilingual Plane (emoji like 😀, CJK Extension B, musical symbols) are surrogate pairs: 1 Python char but 2 UTF-16 units. Previously, truncate_message() used Python's len() which counts codepoints. This could produce chunks exceeding Telegram's actual limit when messages contain many astral-plane characters. Changes: - Add utf16_len() helper and _prefix_within_utf16_limit() for UTF-16-aware string measurement and truncation - Add _custom_unit_to_cp() binary-search helper that maps a custom-unit budget to the largest safe codepoint slice position - Update truncate_message() to accept optional len_fn parameter - Telegram adapter now passes len_fn=utf16_len when splitting messages - Fix fallback truncation in Telegram error handler to use _prefix_within_utf16_limit instead of codepoint slicing - Update send_message_tool.py to use utf16_len for Telegram platform - Add comprehensive tests: utf16_len, _prefix_within_utf16_limit, truncate_message with len_fn (emoji splitting, content preservation, code block handling) - Update mock lambdas in reply_mode tests to accept **kw for len_fn * fix: resolve npm audit vulnerabilities in browser tools and whatsapp bridge Browser tools (agent-browser): - Override lodash to 4.18.1 (fixes prototype pollution CVEs in transitive dep via node-simctl → @appium/logger). Not reachable in Hermes's code path but cleans the audit report. - basic-ftp and brace-expansion updated via npm audit fix. WhatsApp bridge: - file-type updated (fixes infinite loop in ASF parser + ZIP bomb DoS) - music-metadata updated (fixes infinite loop in ASF parser) - path-to-regexp updated (fixes ReDoS, mitigated by localhost binding) Both components now report 0 npm vulnerabilities. Ref: https://gist.github.com/jacklevin74/b41b710d3e20ba78fb7e2d42e2b83819

…l_preview_length is 0 (NousResearch#8735) When tool_preview_length is 0 (default for platforms without a tier default, like Session), verbose mode was truncating args JSON to 200 characters. Since the user explicitly opted into verbose mode, they expect full tool call detail — the 200-char cap defeated the purpose. Now: tool_preview_length=0 means no truncation in verbose mode. Positive values still cap as before. Platform message-length limits handle overflow naturally.

…ousResearch#8794) OpenCode Zen was in _DOT_TO_HYPHEN_PROVIDERS, causing all dotted model names (minimax-m2.5-free, gpt-5.4, glm-5.1) to be mangled. The fix: Layer 1 (model_normalize.py): Remove opencode-zen from the blanket dot-to-hyphen set. Add an explicit block that preserves dots for non-Claude models while keeping Claude hyphenated (Zen's Claude endpoint uses anthropic_messages mode which expects hyphens). Layer 2 (run_agent.py _anthropic_preserve_dots): Add opencode-zen and zai to the provider allowlist. Broaden URL check from opencode.ai/zen/go to opencode.ai/zen/ to cover both Go and Zen endpoints. Add bigmodel.cn for ZAI URL detection. Also adds glm-5.1 to ZAI model lists in models.py and setup.py. Closes NousResearch#7710 Salvaged from contributions by: - konsisumer (PR NousResearch#7739, NousResearch#7719) - DomGrieco (PR NousResearch#8708) - Esashiero (PR NousResearch#7296) - sharziki (PR NousResearch#7497) - XiaoYingGee (PR NousResearch#8750) - APTX4869-maker (PR NousResearch#8752) - kagura-agent (PR NousResearch#7157)

…g skill (NousResearch#8800) PR NousResearch#4654 replaced ml-paper-writing with research-paper-writing, preserving the writing philosophy and reference files but dropping the dedicated 'Sources Behind This Guidance' attribution table from the SKILL.md body. Re-adds: - The researcher attribution table (Nanda, Farquhar, Gopen & Swan, Lipton, Steinhardt, Perez, Karpathy) with affiliations and links to SKILL.md - Orchestra Research credit as original compiler of the writing philosophy - 'Origin & Attribution' section in sources.md documenting the full chain: Nanda blog → Orchestra skill → teknium integration → SHL0MS expansion

…rade, hardening Combined salvage of PRs NousResearch#9661, NousResearch#9663, NousResearch#9674, NousResearch#9677, NousResearch#9678 by kshitijk4poor. - Smart tool output collapse: informative 1-line summaries replace generic placeholder - Dedup identical tool results via MD5 hash, truncate large tool_call arguments - Anti-thrashing: skip compression after 2 consecutive <10% savings passes - Structured action-log summary template with numbered actions and Active State - Hardening: max_tokens 1.3x cap, multimodal safety, note idempotency, adaptive cooldown Follow-up fixes applied during salvage: - web_extract: reads 'urls' (list) not 'url' (original PR bug) - Multimodal list content guards in dedup and prune passes - Kept 'Relevant Files' section in template (original PR removed it) Skipped PRs NousResearch#9665 (user msg preservation — duplication risk) and NousResearch#9675 (dead code).

…usResearch#10090) The existing recovery block sanitized self.api_key and self._client_kwargs['api_key'] but did not update self.client.api_key. The OpenAI SDK stores its own copy of api_key and reads it dynamically via the auth_headers property on every request. Without this fix, the retry after sanitization would still send the corrupted key in the Authorization header, causing the same UnicodeEncodeError. The bug manifests when an API key contains Unicode lookalike characters (e.g. ʋ U+028B instead of v) from copy-pasting out of PDFs, rich-text editors, or web pages with decorative fonts. httpx hard-encodes all HTTP headers as ASCII, so the non-ASCII char in the Authorization header triggers the error. Adds TestApiKeyClientSync with two tests verifying: - All three key locations are synced after sanitization - Recovery handles client=None (pre-init) without crashing

…ls, compression model fallback (NousResearch#9051, NousResearch#8620, NousResearch#9400) (NousResearch#10093) Four independent fixes: 1. Reset activity timestamp on cached agent reuse (NousResearch#9051) When the gateway reuses a cached AIAgent for a new turn, the _last_activity_ts from the previous turn (possibly hours ago) carried over. The inactivity timeout handler immediately saw the agent as idle for hours and killed it. Fix: reset _last_activity_ts, _last_activity_desc, and _api_call_count when retrieving an agent from the cache. 2. Detect uv-managed virtual environments (NousResearch#8620 sub-issue 1) The systemd unit generator fell back to sys.executable (uv's standalone Python) when running under 'uv run', because sys.prefix == sys.base_prefix. The generated ExecStart pointed to a Python binary without site-packages. Fix: check VIRTUAL_ENV env var before falling back to sys.executable. uv sets VIRTUAL_ENV even when sys.prefix doesn't reflect the venv. 3. Nudge model to continue after empty post-tool response (NousResearch#9400) Weaker models sometimes return empty after tool calls. The agent silently abandoned the remaining work. Fix: append assistant('(empty)') + user nudge message and retry once. Resets after each successful tool round. 4. Compression model fallback on permanent errors (NousResearch#8620 sub-issue 4) When the default summary model (gemini-3-flash) returns 503 'model_not_found' on custom proxies, the compressor entered a 600s cooldown, leaving context growing unbounded. Fix: detect permanent model-not-found errors (503, 404, 'model_not_found', 'no available channel') and fall back to using the main model for compression instead of entering cooldown. One-time fallback with immediate retry. Test plan: 40 compressor tests + 97 gateway/CLI tests + 9 venv tests pass

…arch#10113) On some Python versions, argparse fails to route subcommand tokens when the parent parser has nargs='?' optional arguments (--continue). The symptom: 'hermes model' produces 'unrecognized arguments: model' even though 'model' is a registered subcommand. Fix: when argv contains a token matching a known subcommand, set subparsers.required=True to force deterministic routing. If that fails (e.g. 'hermes -c model' where 'model' is consumed as the session name for --continue), fall back to the default optional-subparsers behaviour. Adds 13 tests covering all key argument combinations. Reported via user screenshot showing the exact error on an installed version with the model subcommand listed in usage but rejected at parse time.

hermes doctor now checks whether the ~/.local/bin/hermes symlink exists and points to the correct venv entry point. With --fix, it creates or repairs the symlink automatically. Covers: - Missing symlink at ~/.local/bin/hermes (or $PREFIX/bin on Termux) - Symlink pointing to wrong target - Missing venv entry point (venv/bin/hermes or .venv/bin/hermes) - PATH warning when ~/.local/bin is not on PATH - Skipped on Windows (different mechanism) Addresses user report: 'python -m hermes_cli.main doesn't have an option to fix the local bin/install' 10 new tests covering all scenarios.

…ousResearch#10146) The /model picker called provider_model_ids() which fetches the FULL live API catalog (hundreds of models for Anthropic, Copilot, etc.) and only fell back to the curated list when the live fetch failed. This flips the priority: use the curated model list from list_authenticated_providers() (same lists as `hermes model` and gateway pickers), falling back to provider_model_ids() only when the curated list is empty (e.g. user-defined endpoints).

@user

…targets `_parse_target_ref` has explicit-reference branches for Telegram, Feishu, and numeric IDs, but none for Matrix. As a result, callers of `send_message(target="matrix:!roomid:server")` or `send_message(target="matrix:@user:server")` fall through to `(None, None, False)` and the tool errors out with a resolution failure — even though a raw Matrix room ID or MXID is the most unambiguous possible target. Three-line fix: recognize `!…` as a room ID and `@…` as a user MXID when platform is `matrix`, and return them as explicit targets. Alias-based targets (`#…`) continue to go through the normal resolve path.

…rch#6144

@user

…a examples (NousResearch#10151) Matrix room IDs contain ! and : which must be percent-encoded in URI path segments per the Matrix C-S spec. Without encoding, some homeservers reject the PUT request. Also adds 'matrix:!roomid:server.org' and 'matrix:@user:server.org' to the tool schema examples so models know the correct target format.

…se (NousResearch#10217) With store=False (our default for the Responses API), the API does not persist response items. When reasoning items with 'id' fields were replayed on subsequent turns, the API attempted a server-side lookup for those IDs and returned 404: Item with id 'rs_...' not found. Items are not persisted when store is set to false. The encrypted_content blob is self-contained for reasoning chain continuity — the id field is unnecessary and triggers the failed lookup. Fix: strip 'id' from reasoning items in both _chat_messages_to_responses_input (message conversion) and _preflight_codex_input_items (normalization layer). The id is still used for local deduplication but never sent to the API. Reported by @zuogl448 on GPT-5.4.

…sResearch#10231) /browser connect set BROWSER_CDP_URL to http://localhost:9222, but Chrome's --remote-debugging-port only binds to 127.0.0.1 (IPv4). On macOS, 'localhost' can resolve to ::1 (IPv6) first, causing both _resolve_cdp_override's /json/version fetch and agent-browser's --cdp connection to fail when Chrome isn't listening on IPv6. The socket check in the connect handler already used 127.0.0.1 explicitly and succeeded, masking the mismatch. Use 127.0.0.1 in the default CDP URL to match what Chrome actually binds to.

…lood control Three fixes for the duplicate reply bug affecting all gateway platforms: 1. base.py: Suppress stale response when the session was interrupted by a new message that hasn't been consumed yet. Checks both interrupt_event and _pending_messages to avoid false positives. (NousResearch#8221, NousResearch#2483) 2. run.py (return path): Remove response_previewed guard from already_sent check. Stream consumer's already_sent alone is authoritative — if content was delivered via streaming, the duplicate send must be suppressed regardless of the agent's response_previewed flag. (NousResearch#8375) 3. run.py (queued-message path): Same fix — already_sent without response_previewed now correctly marks the first response as already streamed, preventing re-send before processing the queued message. The response_previewed field is still produced by the agent (run_agent.py) but is no longer required as a gate for duplicate suppression. The stream consumer's already_sent flag is the delivery-level truth about what the user actually saw. Concepts from PR NousResearch#8380 (konsisumer). Closes NousResearch#8375, NousResearch#8221, NousResearch#2483.

…oid dropping valid 0.0 values

…cefully instead of crashing with KeyError

…kflow (NousResearch#10242) * fix(gateway): suppress duplicate replies on interrupt and streaming flood control Three fixes for the duplicate reply bug affecting all gateway platforms: 1. base.py: Suppress stale response when the session was interrupted by a new message that hasn't been consumed yet. Checks both interrupt_event and _pending_messages to avoid false positives. (NousResearch#8221, NousResearch#2483) 2. run.py (return path): Remove response_previewed guard from already_sent check. Stream consumer's already_sent alone is authoritative — if content was delivered via streaming, the duplicate send must be suppressed regardless of the agent's response_previewed flag. (NousResearch#8375) 3. run.py (queued-message path): Same fix — already_sent without response_previewed now correctly marks the first response as already streamed, preventing re-send before processing the queued message. The response_previewed field is still produced by the agent (run_agent.py) but is no longer required as a gate for duplicate suppression. The stream consumer's already_sent flag is the delivery-level truth about what the user actually saw. Concepts from PR NousResearch#8380 (konsisumer). Closes NousResearch#8375, NousResearch#8221, NousResearch#2483. * fix(cron): include job_id in delivery and guide models on removal workflow Users reported cron reminders keep firing after asking the agent to stop. Root cause: the conversational agent didn't know the job_id (not in delivery) and models don't reliably do the list→remove two-step without guidance. 1. Include job_id in the cron delivery wrapper so users and agents can reference it when requesting removal. 2. Replace confusing footer ('The agent cannot see this message') with actionable guidance ('To stop or manage this job, send me a new message'). 3. Add explicit list→remove guidance in the cronjob tool schema so models know to list first and never guess job IDs.

Previously send_message only supported media delivery for Telegram. Discord users received a warning that media was omitted. - Add media_files parameter to _send_discord() - Upload media via Discord multipart/form-data API (files[0] field) - Handle Discord in _send_to_platform() same way as Telegram block - Remove Discord from generic chunk loop (now handled above) - Update error/warning strings to mention telegram and discord

- Fix file handle closed before POST: nest session.post() inside the 'with open()' block so aiohttp can read the file during upload - Update warning text to include weixin (also supports media delivery) - Add 8 unit tests covering: text+media, media-only, missing files, upload failures, multiple files, and _send_to_platform routing

…usResearch#10276) Users are confused about the difference between `hermes model` (terminal command for full provider setup) and `/model` (session command for switching between already-configured providers). This distinction was not documented anywhere. Changes across 4 doc pages: - cli-commands.md: Added warning callout explaining the difference, added --global flag docs, added 'only see OpenRouter models?' info box - slash-commands.md: Added notes on both TUI and messaging /model entries that /model only switches between configured providers - providers.md: Added 'Two Commands for Model Management' comparison table near top of page, added warning callout in switching section - faq.md: Added new FAQ entry '/model only shows one provider' with quick reference table Prompted by user feedback in Discord — new users consistently hit this confusion when trying to add providers from inside a session.

…facing text (NousResearch#10285) Tool schema descriptions and tool return values contained hardcoded ~/.hermes paths that the model sees and uses. When HERMES_HOME is set to a custom path (Docker containers, profiles), the agent would still reference ~/.hermes — looking at the wrong directory. Fixes 6 locations across 5 files: - tools/tts_tool.py: output_path schema description - tools/cronjob_tools.py: script path schema description - tools/skill_manager_tool.py: skill_manage schema description - tools/skills_tool.py: two tool return messages - agent/skill_commands.py: skill config injection text All now use display_hermes_home() which resolves to the actual HERMES_HOME path (e.g. /opt/data for Docker, ~/.hermes/profiles/X for profiles, ~/.hermes for default). Reported by: Sandeep Narahari (PrithviDevs)

## What broke Multi-turn conversations using codex/OpenAI Responses provider fail after a few turns with HTTP 400 error: ``` string_above_max_length: 408 > 64 for 'id' ``` The reasoning logic works correctly, but API rejects the request because reasoning item `id` exceeds the 64-character limit. ## Root cause In run_agent.py:4002-4007, when capturing reasoning items from codex responses, we preserve the `id` field verbatim. Codex backends return 408-character opaque ids (UUID-like). These ids end up in `assistant_message.codex_reasoning_items`, persisted in session DB. On later turns, the replay path (3592-3597) strips the id, so that's fine. BUT: The capture path doesn't validate id length. If a short id is captured on turn 1, and a 408-char id on turn 2, both are preserved. On turn 3, when we construct the request, if there's any path that doesn't strip properly, the API rejects. OpenAI Responses API validates: `id` <= 64 chars. ## Why this fix is minimal Single line change: added `len(item_id) <= 64` condition before preserving the id field. Long ids are dropped; server derives identity from `encrypted_content` anyway. No changes to replay logic (already strips id correctly). No changes to other reasoning capture paths. No opportunistic refactoring. ## What I tested Added test suite tests/test_reasoning_item_id_length.py: - test_short_id_preserved: ids <= 64 preserved - test_long_id_dropped: ids > 64 dropped (408-char case) - test_64_char_id_preserved: exactly 64 preserved - test_65_char_id_dropped: exactly 65 dropped - test_no_id_gracefully_handled: missing id handled - test_empty_id_gracefully_handled: empty id handled - test_encrypted_content_always_preserved: core data preserved - test_replay_strips_id: replay path verified - test_replay_preserves_encrypted_content: replay correctness - test_replay_preserves_summary: summary preserved All tests verify id validation without breaking core functionality. ## What I intentionally did not change - No changes to replay path (already strips id via `{k: v for k, v in ri.items() if k != "id"}`) - No changes to encrypted_content handling - No changes to other providers (Anthropic, etc.) - No opportunistic refactoring ## Evidence Before: Capture preserves 408-char id → HTTP 400 "string_above_max_length" After: Capture drops >64-char id → API accepts request Fixes NousResearch#10788

…h#12643) The bug: _prune_old_tool_results Pass 3 used raw string slicing on tool call arguments JSON (args[:200] + '...[truncated]'), which breaks JSON validity when the slice position is inside a string value. Downstream providers like vllm, MiniMax reject this with 400 'Unterminated string' errors, causing compression loops to fail. Fix: Add _truncate_tool_call_args_json helper that: 1. Parses the JSON structure 2. Shrinks long string values inside the parsed object 3. Re-serializes with json.dumps (preserving CJK/emoji) This ensures truncated tool call arguments remain valid JSON that all providers can parse. Tests: Added TestTruncateToolCallArgsJson covering: - Short args unchanged - Long string values truncated properly - Nested dicts and lists handled - Non-string values (int, bool, float) preserved - Invalid JSON returned unchanged - CJK characters preserved (ensure_ascii=False)

github-actions · 2026-04-19T18:46:39Z

⚠️ Supply Chain Risk Detected

This PR contains patterns commonly associated with supply chain attacks. This does not mean the PR is malicious — but these patterns require careful human review before merging.

⚠️ WARNING: base64 encoding/decoding detected

Base64 has legitimate uses (images, JWT, etc.) but is also commonly used to obfuscate malicious payloads. Verify the usage is appropriate.

Matches (first 20):

8380:+        b64 = base64.b64encode(raw).decode("ascii")
9022:+        aes_key_for_api = base64.b64encode(aes_key.hex().encode("ascii")).decode("ascii")
25409:+        body = base64.urlsafe_b64decode(payload["body"]["data"]).decode("utf-8", errors="replace")
25413:+                body = base64.urlsafe_b64decode(part["body"]["data"]).decode("utf-8", errors="replace")
25418:+                    body = base64.urlsafe_b64decode(part["body"]["data"]).decode("utf-8", errors="replace")
33951:+        expected_aes = base64.b64encode(fake_hex_key.encode("ascii")).decode("ascii")

⚠️ WARNING: exec() or eval() usage

Dynamic code execution can hide malicious behavior, especially when combined with base64 or network fetches.

Matches (first 20):

7837:+            proc = await asyncio.create_subprocess_exec(
17653:+    "Quick commands support two types: exec (run shell command directly) and alias (redirect to another command).",
47238:+            parent_check = self._exec(
47242:+                ls_result = self._exec(
47288:+        result = self._exec(cmd_sorted, timeout=60)
47298:+            result = self._exec(cmd_plain, timeout=60)
54955:+  while ((match = pattern.exec(text)) !== null) {
60000:+  while ((match = regex.exec(snippet)) !== null) {

⚠️ WARNING: Outbound network calls (POST/PUT)

Outbound POST/PUT requests in new code could be data exfiltration. Verify the destination URLs are legitimate.

Matches (first 10):

11553:+                resp = httpx.post(
13765:+    with urllib.request.urlopen(req, timeout=30) as resp:
13801:+    with urllib.request.urlopen(req, timeout=30) as resp:
18320:+            with urllib.request.urlopen(req, timeout=_GATEWAY_HEALTH_TIMEOUT) as resp:
19245:+        with urllib.request.urlopen(req, timeout=20) as resp:
20976:+        with urllib.request.urlopen(req, timeout=15) as r:
21487:+        with urllib.request.urlopen(req, timeout=15) as r:
21547:+        with urllib.request.urlopen(url, timeout=10) as r:
36838:+            "hermes_cli.debug.urllib.request.urlopen",
40465:+        original_urlopen = ws.urllib.request.urlopen

⚠️ WARNING: Install hook files modified

These files can execute code during package installation or interpreter startup.

Files:

hermes_cli/memory_setup.py
hermes_cli/setup.py
tests/hermes_cli/test_setup.py

⚠️ WARNING: marshal/pickle/compile usage

These can deserialize or construct executable code objects.

Matches:

62236:+   - Unsafe deserialization (pickle.loads, yaml.load without SafeLoader)

⚠️ WARNING: CI/CD workflow files modified

Changes to workflow files can alter build pipelines, inject steps, or modify permissions. Verify no unauthorized actions or secrets access were added.

Files:

.github/workflows/contributor-check.yml
.github/workflows/deploy-site.yml
.github/workflows/docker-publish.yml
.github/workflows/docs-site-checks.yml
.github/workflows/nix.yml
.github/workflows/skills-index.yml
.github/workflows/supply-chain-audit.yml
.github/workflows/tests.yml

⚠️ WARNING: Container build files modified

Changes to Dockerfiles or compose files can alter base images, add build steps, or expose ports. Verify base image pins and build commands.

Files:

.dockerignore
Dockerfile

⚠️ WARNING: Dependency manifest files modified

Changes to dependency files can introduce new packages or change version pins. Verify all dependency changes are intentional and from trusted sources.

Files:

package.json
pyproject.toml
scripts/whatsapp-bridge/package.json
web/package.json

Automated scan triggered by supply-chain-audit. If this is a false positive, a maintainer can approve after manual review.

…ts (NousResearch#11745) Move moonshotai/kimi-k2.5 to position #1 in every model picker list: - OPENROUTER_MODELS (with 'recommended' tag) - _PROVIDER_MODELS: nous, kimi-coding, opencode-zen, opencode-go, alibaba, huggingface - _model_flow_kimi() Coding Plan model list in main.py kimi-coding-cn and moonshot lists already had kimi-k2.5 first.

Previously the breaker was only cleared when the post-reconnect retry call itself succeeded (via _reset_server_error at the end of the try block). If OAuth recovery succeeded but the retry call happened to fail for a different reason, control fell through to the needs_reauth path which called _bump_server_error — adding to an already-tripped count instead of the fresh count the reconnect justified. With fix #1 in place this would still self-heal on the next cooldown, but we should not pay a 60s stall when we already have positive evidence the server is viable. Move _reset_server_error(server_name) up to immediately after the reconnect-and-ready-wait block, before the retry_call. The subsequent retry still goes through _bump_server_error on failure, so a genuinely broken server re-trips the breaker as normal — but the retry starts from a clean count (1 after a failure), not a stale one.

…s reached After key #1 is marked exhausted the retry still called the API with key #1 due to env-var bias in _get_cached_client / resolve_api_key_provider_credentials. Fix: peek the pool and pass the active entry's key as explicit_api_key. Secondary: api_key_hint in mark_exhausted_and_rotate pins the correct entry under concurrent CLI+gateway calls; _is_payment_error matches GoUsageLimitError; extract_api_error_context parses "Resets in Xhr Ymin".

…ookies Mission-control style deploys reverse-proxy the dashboard at a path prefix (e.g. mission-control.tilos.com/hermes/* -> :9119) and inject X-Forwarded-Prefix: /hermes on every request. The SPA mount already honoured this for asset URLs and the bootstrap __HERMES_BASE_PATH__, but the OAuth gate didn't: 1. The gate's Location: header to /login and the 401 envelope's login_url were built bare ("/login?next=..."). Under a /hermes prefix the browser follows that to mission-control.tilos.com/login which the proxy doesn't route to the dashboard. 2. _redirect_uri (the OAuth callback URL handed to the IDP) used request.url_for() which doesn't honour X-Forwarded-Prefix (Starlette/uvicorn only proxy_headers Host + Proto + For). The IDP redirects back to /auth/callback instead of /hermes/auth/ callback → 404 in the user's browser. 3. Cookies were set with Path=/ which leaks them to other apps on the same origin and won't be sent back on requests under the prefix in the first place. Fix threads the normalised prefix through every boundary: * New hermes_cli/dashboard_auth/prefix.py — single source of truth for X-Forwarded-Prefix parsing. web_server._normalise_prefix becomes a re-export so the SPA mount, the gate, and the cookies helper all agree. * middleware._unauth_response builds login_url = f"{prefix}/login". * routes._redirect_uri splices the prefix into the path component of the IDP-bound URL (with full validation of the header). * cookies.{set,clear}_{session,pkce}_cookie now take prefix="". Path attribute switches to /hermes when set; cookie name switches name variant (see below). Every caller passes the request's normalised prefix. Cookie hardening (Teknium's lesser-note #1 in the PR review): adopt the __Host- / __Secure- cookie name prefixes per draft-west-cookie- prefixes. The variant is selected from (use_https, prefix): * Loopback HTTP → bare "hermes_session_at" (both prefixes require Secure, incompatible with HTTP). * HTTPS, direct deploy (Path=/) → "__Host-hermes_session_at". Strongest spec: bound to exact origin, no Domain attribute, Secure required. * HTTPS, behind a proxy prefix (Path=/hermes) → "__Secure-hermes_session_at". __Host- forbids Path != "/"; the explicit Path=/hermes covers same-origin app isolation. Setter and reader BOTH consult the prefix because the cookie *name* changes — a reader that looked up the bare name when the setter wrote __Secure- would never find the value. The reader falls back across all three variants so a request whose shape changed mid-session (e.g. post-deploy from no-prefix to /hermes) still picks up the existing cookie until it expires. Test coverage: - tests/hermes_cli/test_dashboard_auth_prefix.py — new file. 11 tests pinning: • Location: /hermes/login on the gate's HTML redirect • 401 envelope login_url carries the prefix • Malformed X-Forwarded-Prefix is ignored (header-injection defence; the script-tag value is normalised to empty string) • _redirect_uri splices /hermes into the path (the property that prevents the IDP-returns-to-404 failure) • PKCE cookie uses Path=/hermes + __Secure- when proxied • Session cookies use __Host- when direct, __Secure- when proxied, bare on loopback HTTP • End-to-end round trip with hand-managed PKCE cookie carriage (TestClient can't simulate a Path=/hermes cookie automatically) - tests/hermes_cli/test_dashboard_auth_cookies.py — rewritten to pin each (use_https, prefix) shape produces its expected cookie name, plus reader-side coverage that __Host- and __Secure- variants are both recognised. - Existing tests across middleware / 401-reauth / etc. updated to match the new cookie names (substring contains instead of startswith). Mutation-tested: reverting _unauth_response to build the bare "/login" URL trips exactly the two tests that pin the prefix carriage, confirming the suite discriminates the regression.

teknium1 and others added 30 commits April 12, 2026 15:38

chore: sync uv.lock with matrix extra deps (aiosqlite, asyncpg) (Nous…

4c6ebd0

…Research#8661) These were already declared in pyproject.toml but missing from the lockfile.

fix(claw): warn and prompt if OpenClaw is still running before archiv…

76f7411

…al (fixes NousResearch#8502)

docs: add HermesClaw to community ecosystem

e8385f6

Adds a one-line entry for HermesClaw (community WeChat bridge) to the Community section. It lets users run Hermes Agent and OpenClaw on the same WeChat account.

kshitijk4poor and others added 24 commits April 14, 2026 22:21

fix(cli): restore messaging toolset for gateway platforms

da8bab7

test: add _parse_target_ref Matrix coverage for salvaged PR NousResea…

180b144

…rch#6144

fix(skills): use is None check for coordinates in find-nearby to av…

8bc9b5a

…oid dropping valid 0.0 values

fix(skills): handle missing fields in Google Workspace token file gra…

dedc460

…cefully instead of crashing with KeyError

fix(gateway): fix matrix lingering typing indicator

33ae403

feat(discord): add native send_animation for inline GIF playback

41e2d61

Linux2010 closed this Apr 20, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix JSON truncation causing "Unterminated string" errors (issue #12643)#1

Fix JSON truncation causing "Unterminated string" errors (issue #12643)#1
Linux2010 wants to merge 292 commits into
mainfrom
fix/issue-12643-vllm-json-error

Linux2010 commented Apr 19, 2026

Uh oh!

github-actions Bot commented Apr 19, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

20 participants

Conversation

Linux2010 commented Apr 19, 2026

Summary

The Bug

The Fix

Test Coverage

Impact

Uh oh!

github-actions Bot commented Apr 19, 2026

⚠️ Supply Chain Risk Detected

⚠️ WARNING: base64 encoding/decoding detected

⚠️ WARNING: exec() or eval() usage

⚠️ WARNING: Outbound network calls (POST/PUT)

⚠️ WARNING: Install hook files modified

⚠️ WARNING: marshal/pickle/compile usage

⚠️ WARNING: CI/CD workflow files modified

⚠️ WARNING: Container build files modified

⚠️ WARNING: Dependency manifest files modified

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

20 participants