Skip to content

fix(server): different graph name share the same backend#3027

Open
JisoLya wants to merge 3 commits into
apache:masterfrom
JisoLya:fix-parse-err
Open

fix(server): different graph name share the same backend#3027
JisoLya wants to merge 3 commits into
apache:masterfrom
JisoLya:fix-parse-err

Conversation

@JisoLya
Copy link
Copy Markdown
Contributor

@JisoLya JisoLya commented May 15, 2026

Purpose of the PR

During backend initialization, the server creates two graphs — ~sys_graph and hugegraph (default) — that share the same store backend. This leads to a key collision:

  • ~sys_graph writes server node info with id server-1
  • ~sys_graph reads back server-1 successfully
  • hugegraph writes server node info with the same id server-1
  • hugegraph reads from the shared table — retrieves ~sys_graph's record instead → type mismatch → parse error

Main Changes

Change ServerInfoManager.selfNodeId() which returns "server-1" previously to "{graphname}/server-1"

Verifying these changes

  • Trivial rework / code cleanup without any test coverage. (No Need)
  • Already covered by existing tests, such as (please modify tests here).
  • Need tests and can be verified as follows:
    • xxx

Does this PR potentially affect the following parts?

Documentation Status

  • Doc - TODO
  • Doc - Done
  • Doc - No Need

Copy link
Copy Markdown
Member

@imbajin imbajin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

修复方向正确,用图命名空间前缀隔离 server ID 能解决跨图 key 碰撞问题。但存在一个阻塞性逻辑缺陷 + 测试缺失,建议修复后再合并。


‼️ 测试覆盖不足(阻塞)

PR 勾选了"已有测试覆盖",但全量搜索 hugegraph-server/hugegraph-test/ 后,没有任何测试覆盖"两图共享同一 backend + 相同 nodeId"的场景。

建议补充回归测试,至少验证:

  1. ~sys_graphhugegraph 使用相同 nodeId(如 "server-1")初始化时,selfNodeId() 返回值互不相同
  2. 两个图能并发启动,不抛出 parse error 或 "already in cluster" 异常

⚠️ 升级兼容性说明缺失

存量部署 backend 中的 ServerInfo vertex key 格式为 server-1;升级后代码改为以 DEFAULT-hugegraph/server-1 查询,旧记录查不到便重新写入新记录,旧的 server-1 记录以孤儿形式留在 backend 中不被清理。对功能影响有限,但建议在 PR 描述或 Release Notes 中注明,避免运维困惑。


🧹 小建议:spaceGraphName 字符集

spaceGraphName() 的值由用户配置拼接而来,没有对 / 字符的校验。若用户将 graphSpace 配置为含 / 的字符串,生成的 ID 会出现多个 /,可能引发后续解析歧义。建议加一行断言或在文档中说明 graphSpace 不允许包含 /

// Scope server id to graph to avoid cross-graph collision
return IdGenerator.of(this.graph.spaceGraphName() + "/" +
this.globalNodeInfo.nodeId().asString());
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

‼️ Critical:initServerInfo() 中读写路径 ID 不一致(阻塞性问题)

此处修改后 selfNodeId() 返回命名空间化的 ID(如 DEFAULT-hugegraph/server-1),但调用方 initServerInfo() 的存在性检查仍使用原始裸 ID,两条路径的 key 不同:

// initServerInfo()(本 PR 未修改这部分)
Id serverId = nodeInfo.nodeId();                        // ❌ 裸 ID:"server-1"
HugeServerInfo existed = this.serverInfo(serverId);    // ❌ 用 "server-1" 查询
// ...
this.globalNodeInfo = nodeInfo;                        // line 140,赋值后 selfNodeId() 才变
this.saveServerInfo(this.selfNodeId(), ...);            // ✅ 保存 "DEFAULT-hugegraph/server-1"

后果:

  • lines 109–122 的 alive 检查(防止同名节点重复启动的保护)永远查不到已命名空间化的旧记录,保护逻辑被静默跳过
  • 重启时旧的命名空间记录不会被正确识别,会直接写入新记录,旧记录成为孤儿数据积累在 backend 中

建议修法:initServerInfo() 入口先设置 globalNodeInfo,再统一用 selfNodeId() 做后续查询,保持读写 key 一致:

public synchronized void initServerInfo(GlobalMasterInfo nodeInfo) {
    E.checkArgument(nodeInfo != null, "The global node info can't be null");
    this.globalNodeInfo = nodeInfo;   // 先赋值,selfNodeId() 立即返回命名空间 ID
    Id serverId = this.selfNodeId();  // ✅ 与 save / remove 路径使用同一个 key
    HugeServerInfo existed = this.serverInfo(serverId);
    if (existed != null && existed.alive()) {
        // ... 存活检查 & 等待逻辑不变 ...
    }
    E.checkArgument(existed == null || !existed.alive(),
                    "The server with name '%s' already in cluster", serverId);
    // master 唯一性检查不变 ...
    this.saveServerInfo(serverId, this.selfNodeRole());
}

这样读取和写入始终使用同一个 ID,alive 检查和 master 唯一性保护才能真正生效。

@dosubot dosubot Bot added the size:S This PR changes 10-29 lines, ignoring generated files. label May 15, 2026
@codecov
Copy link
Copy Markdown

codecov Bot commented May 15, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 93.25%. Comparing base (e108076) to head (b4ede79).
⚠️ Report is 1 commits behind head on master.

Additional details and impacted files
@@              Coverage Diff              @@
##             master    #3027       +/-   ##
=============================================
+ Coverage     35.85%   93.25%   +57.40%     
+ Complexity      338       65      -273     
=============================================
  Files           802        9      -793     
  Lines         67995      267    -67728     
  Branches       8902       22     -8880     
=============================================
- Hits          24381      249    -24132     
+ Misses        41008        8    -41000     
+ Partials       2606       10     -2596     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@dosubot dosubot Bot added size:M This PR changes 30-99 lines, ignoring generated files. and removed size:S This PR changes 10-29 lines, ignoring generated files. labels May 16, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size:M This PR changes 30-99 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

hugegraph-server 启动不正常,查看日志有报错

2 participants