fix: Hermes provider symlink breaks after upgrade and bridge health can timeout during startup recovery #1744

@leavrcn

Bug Description

English

We found two related issues while using @memtensor/memos-local-plugin 2.0.4 with Hermes Agent v0.14.0:

  1. Hermes provider discovery breaks after a Hermes source-tree upgrade

    The MemOS Hermes installer links the Python provider into the Hermes source tree:

    ~/.hermes/hermes-agent/plugins/memory/memtensor -> ~/.hermes/memos-plugin/adapters/hermes/memos_provider
    

    After upgrading Hermes Agent, this symlink disappeared because it lives inside the Hermes checkout and is not part of upstream Hermes. The user config still had:

    memory:
      provider: memtensor
    plugins:
      enabled:
        - memtensor

    But Hermes could no longer discover/load the provider:

    hermes memory status
    Provider: memtensor
    Plugin: NOT installed ✗
    
    load_memory_provider('memtensor') => None
    

    Re-creating the symlink and copying plugin.yaml into the provider directory fixed discovery immediately:

    ln -s ~/.hermes/memos-plugin/adapters/hermes/memos_provider \
      ~/.hermes/hermes-agent/plugins/memory/memtensor
    cp ~/.hermes/memos-plugin/adapters/hermes/plugin.yaml \
      ~/.hermes/memos-plugin/adapters/hermes/memos_provider/plugin.yaml

    After that:

    hermes memory status
    Plugin: installed ✓
    Status: available ✓
    

    This suggests the current install location is fragile across Hermes upgrades. A more durable option would likely be to install into ~/.hermes/plugins/memtensor (the Hermes user-installed provider path), or to provide an official post-upgrade relink/check command.

  2. Bare bridge core.health can time out during startup dirty-episode recovery

    A direct bridge probe timed out:

    core.health did not respond within 45/60s
    session.open did not respond within 45/60s
    

    Running the bridge manually showed it got stuck before responding to core.health:

    [core.pipeline.memory-core] init.dirty_closed_episodes.rescore count=1
    

    While the bridge was doing this startup recovery, core.health did not respond. Later DB inspection showed dirty_count 0, and provider-level checks against the initialized bridge started working:

    core.health OK 0.01s
    memos_search OK
    memos_get OK
    

    This makes low-level health checks look like a hard bridge failure when the bridge is actually doing synchronous startup recovery. It would be helpful if core.health could answer before/while recovery is running, or if recovery ran asynchronously / emitted a clearer startup status.
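
One possible shape for the second suggestion, sketched in Python purely for illustration: recovery runs on a background thread, and the health handler only reads a status flag, so it can answer immediately with a recovering state. The Bridge class, the status strings, and the recover callable are hypothetical and are not taken from the MemOS bridge code.

```python
import threading


class Bridge:
    """Toy bridge whose health check never waits on startup recovery."""

    def __init__(self, recover):
        # `recover` is a hypothetical callable standing in for the
        # dirty-episode rescore work done at startup.
        self._state = "recovering"
        self._lock = threading.Lock()
        # Run recovery off the request path so health stays responsive.
        self._worker = threading.Thread(
            target=self._run, args=(recover,), daemon=True
        )
        self._worker.start()

    def _run(self, recover):
        try:
            recover()
            new_state = "ok"
        except Exception:
            new_state = "failed"
        with self._lock:
            self._state = new_state

    def health(self):
        # Only reads a flag, so it returns immediately in every state.
        with self._lock:
            return {"status": self._state}

    def wait_ready(self, timeout=None):
        # Block until recovery finishes (or the timeout passes).
        self._worker.join(timeout)
        return self.health()["status"] == "ok"
```

With this shape, a probe sent during recovery would get a recovering status back instead of timing out, and callers that need a fully initialized bridge can still wait explicitly via wait_ready.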


How to Reproduce

English

  1. Install MemOS Local Plugin for Hermes so the provider is linked into:

    ~/.hermes/hermes-agent/plugins/memory/memtensor
    
  2. Confirm config uses:

    memory:
      provider: memtensor
    plugins:
      enabled:
        - memtensor
  3. Upgrade Hermes Agent in a way that refreshes/replaces the Hermes source checkout.

  4. Check:

    hermes memory status
    python3 - <<'PY'
    from plugins.memory import load_memory_provider
    print(load_memory_provider('memtensor'))
    PY
  5. Provider is no longer found until the symlink is manually recreated.
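
The manual symlink repair could be automated as a post-upgrade check. The following is a minimal sketch, assuming the link and target paths from this report; ensure_provider_link is a hypothetical helper, not an existing MemOS or Hermes command, and it does not cover the plugin.yaml copy step.

```python
import os
from pathlib import Path


def ensure_provider_link(link: Path, target: Path) -> str:
    """Make `link` point at `target`; return 'ok', 'created' or 'repaired'."""
    if not target.is_dir():
        raise FileNotFoundError(f"provider directory missing: {target}")
    if link.is_symlink():
        # Join with the link's parent so relative link targets compare correctly.
        current = Path(os.path.join(link.parent, os.readlink(link)))
        if current.resolve() == target.resolve():
            return "ok"
        link.unlink()  # stale or dangling link: replace it
        state = "repaired"
    elif link.exists():
        # A real file or directory is in the way; do not overwrite it.
        raise FileExistsError(f"refusing to replace non-symlink: {link}")
    else:
        state = "created"
    link.parent.mkdir(parents=True, exist_ok=True)
    link.symlink_to(target, target_is_directory=True)
    return state
```

For the layout in this report it would be called with ~/.hermes/hermes-agent/plugins/memory/memtensor as the link and ~/.hermes/memos-plugin/adapters/hermes/memos_provider as the target, for example from a post-upgrade hook.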

For the bridge timeout part:

  1. Have at least one closed episode that matches MemOS dirty reward recovery conditions.

  2. Start the bridge directly and send a JSON-RPC core.health request.

  3. Observe that logs show:

    init.dirty_closed_episodes.rescore count=1
    

    and core.health may not respond until the synchronous recovery work finishes.
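
A probe along these lines can be used for step 2. This is a sketch under stated assumptions: the bridge is assumed to speak newline-delimited JSON-RPC 2.0 over stdin/stdout and to accept empty params, neither of which is confirmed here. The bridge launch command is not given in this report, so it is passed in as cmd.

```python
import json
import queue
import subprocess
import threading


def probe(cmd, method="core.health", timeout=5.0):
    """Send one JSON-RPC request over stdio; return the reply or None on timeout."""
    proc = subprocess.Popen(
        cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True
    )
    replies = queue.Queue()

    def reader():
        # Skip non-JSON lines (e.g. logs) and forward the first matching reply.
        for line in proc.stdout:
            try:
                msg = json.loads(line)
            except ValueError:
                continue
            if isinstance(msg, dict) and msg.get("id") == 1:
                replies.put(msg)
                return

    worker = threading.Thread(target=reader, daemon=True)
    worker.start()
    # Assumed framing: one JSON object per line, empty params.
    request = {"jsonrpc": "2.0", "id": 1, "method": method, "params": {}}
    try:
        proc.stdin.write(json.dumps(request) + "\n")
        proc.stdin.flush()
        return replies.get(timeout=timeout)
    except queue.Empty:
        # No reply in time: a hang and slow startup recovery look the same here.
        return None
    finally:
        proc.kill()
        proc.wait()
        worker.join(1)
```

A None result only says that no reply arrived within the timeout, which is exactly the ambiguity described in this issue: the probe cannot tell a dead bridge from one still running synchronous recovery.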


Environment

  • OS: Linux 6.17.0-23-generic
  • Node.js: v22.22.2
  • npm: 10.9.7
  • MemOS package: @memtensor/memos-local-plugin 2.0.4
  • Hermes Agent: v0.14.0 (2026.5.16)
  • Hermes memory config: memory.provider: memtensor
  • MemOS runtime home: ~/.hermes/memos-plugin
  • MemOS DB: ~/.hermes/memos-plugin/data/memos.db

Actual Behavior

English

  • After Hermes upgrade, config still says memtensor, but Hermes reports the plugin is not installed.
  • load_memory_provider('memtensor') returns None.
  • Direct bridge health probes can time out while startup dirty-episode recovery is running.

Expected Behavior

English

  • MemOS Hermes provider install should survive Hermes Agent upgrades, or the installer should provide a durable user-plugin path / repair command.
  • core.health should remain responsive during startup recovery, or return a clear starting/recovering status instead of timing out.

Additional Context

English

Manual repair and verification succeeded:

hermes memory status:
Plugin installed ✓ / Status available ✓

provider-level:
core.health OK 0.01s
memos_search OK
memos_get OK

CLI smoke:
store a keyword -> recall it from a fresh session -> DB traces have a non-null vec_summary

A shutdown warning still appears sometimes:

MemOS: bridge process ... did not exit after stdin close, sending SIGTERM

but it did not block store/search/recall in our verification.


Willingness to Implement

I can help test a proposed fix or provide more logs if needed.
