fix(mainnet): bind NyxIdSpecCatalog SpecFetchToken and skip refresh when absent#523

Merged
eanzhao merged 2 commits into dev from fix/2026-04-29_nyxid-spec-fetch-token
Apr 29, 2026

@eanzhao eanzhao commented Apr 29, 2026

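Problem

In the production aismart-app-mainnet/aevatar-console-backend logs, NyxIdSpecCatalog refreshes every 30 minutes. All 3 retries of every refresh return 401, so the catalog stays empty, which causes:

  • the nyxid_proxy LLM tool is completely unavailable (every operation_id lookup fails)
  • nyxid_search_capabilities takes the "catalog is empty" fallback branch, so it is effectively broken

See #522 for the detailed investigation.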

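Root cause

  1. NyxID's /api/v1/docs/openapi.json sits in the api_v1_human_only route group, which applies reject_service_account_tokens + reject_delegated_tokens and accepts only a real user's JWT or API key. It is no longer a public endpoint.
  2. The AddNyxIdTools call in MainnetHostBuilderExtensions binds only BaseUrl and never binds SpecFetchToken, so the catalog sends requests without an Authorization header → 401.
  3. The XML doc comment on NyxIdToolOptions.SpecFetchToken still assumes a "public endpoint" and is out of date.
  4. Even without a token, the catalog still starts the 30-minute timer and keeps hitting 401, which is pure noise.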

Changes

| File | Change |
| --- | --- |
| src/Aevatar.Mainnet.Host.Api/Hosting/MainnetHostBuilderExtensions.cs | Bind Aevatar:NyxId:SpecFetchToken in AddNyxIdTools |
| src/Aevatar.AI.ToolProviders.NyxId/NyxIdSpecCatalog.cs | If SpecFetchToken is empty or whitespace at construction, skip the initial fetch and the timer and log at Information level; FetchAndUpdateAsync always sends Bearer (the redundant empty check is removed) |
| src/Aevatar.AI.ToolProviders.NyxId/NyxIdToolOptions.cs | Update the SpecFetchToken XML doc: drop the "public endpoint" wording and state the human-only constraint |
| test/Aevatar.AI.Tests/NyxIdSpecCatalogTests.cs | Add 4 test cases: no BaseUrl / no token / whitespace token / normal token sent as Bearer |
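
For readers who do not open the diff, the NyxIdSpecCatalog change can be pictured roughly as below. This is a minimal sketch and not the merged code: SketchOptions, SpecFetchSketch, ShouldStartRefresh and BuildSpecRequest are hypothetical names, and only BaseUrl, SpecFetchToken and the /api/v1/docs/openapi.json path come from this PR.

```csharp
#nullable enable
using System;
using System.Net.Http;
using System.Net.Http.Headers;

// Hypothetical stand-in for NyxIdToolOptions; only the two properties named in this PR.
public sealed class SketchOptions
{
    public string? BaseUrl { get; set; }
    public string? SpecFetchToken { get; set; }
}

public static class SpecFetchSketch
{
    // True only when both the base URL and a non-blank token are configured.
    // When this is false, the sketch assumes the caller starts neither the
    // initial fetch nor the periodic timer.
    public static bool ShouldStartRefresh(SketchOptions options) =>
        !string.IsNullOrWhiteSpace(options.BaseUrl) &&
        !string.IsNullOrWhiteSpace(options.SpecFetchToken);

    // Builds the spec request. Callers are expected to have passed the guard
    // above, so the Bearer header is attached unconditionally.
    public static HttpRequestMessage BuildSpecRequest(SketchOptions options)
    {
        var url = options.BaseUrl!.TrimEnd('/') + "/api/v1/docs/openapi.json";
        var request = new HttpRequestMessage(HttpMethod.Get, url);
        request.Headers.Authorization =
            new AuthenticationHeaderValue("Bearer", options.SpecFetchToken);
        return request;
    }
}
```

Keeping the guard in one place is what lets the fetch path drop its own empty check: a request is only ever built after the token has been validated.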

Verification

dotnet build src/Aevatar.AI.ToolProviders.NyxId/Aevatar.AI.ToolProviders.NyxId.csproj  # 0 errors
dotnet build test/Aevatar.AI.Tests/Aevatar.AI.Tests.csproj                            # 0 errors
dotnet build src/Aevatar.Mainnet.Host.Api/Aevatar.Mainnet.Host.Api.csproj             # 0 errors

dotnet test test/Aevatar.AI.Tests/Aevatar.AI.Tests.csproj \
  --filter "FullyQualifiedName~NyxIdSpecCatalogTests"                                  # 4/4 passed
dotnet test test/Aevatar.AI.Tests/Aevatar.AI.Tests.csproj \
  --filter "FullyQualifiedName~NyxId|FullyQualifiedName~ConnectedService|FullyQualifiedName~OpenApiSpec"
                                                                                       # 205/205 passed

bash tools/ci/test_stability_guards.sh                                                 # passed

Deployment follow-up (out of scope for this PR)

After this PR merges, ops/SRE still need to:

  1. Issue a user-level API key on NyxID (read scope is enough) and store it in a secret/configmap on the mainnet cluster.
  2. Inject that key into console-backend through Aevatar:NyxId:SpecFetchToken in the helm values / appsettings.*.json.
  3. After restarting the pod, verify that:
    • the logs show NyxIdSpecCatalog updated: N operations (success) or SpecFetchToken not configured; skipping... (still not configured)
    • NyxIdSpecCatalog refresh attempt N/3 failed no longer appears

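Long-term plan (NyxID side)

The human-only constraint forces the catalog to be tied to a real user. A cleaner approach would be for NyxID to move the spec endpoint out of human_only and allow service account tokens, or to expose an unauthenticated "catalog only / no sensitive fields" subset spec endpoint. That is left for the NyxID team to evaluate and is out of scope for this PR.

Related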

NyxID's /api/v1/docs/openapi.json is human-only (rejects service-account
and delegated tokens), so unauthenticated fetches always return 401.
Production currently never wires SpecFetchToken, so the catalog churns
30-min 401 retries and stays empty — silently disabling nyxid_proxy and
nyxid_search_capabilities.

- Wire Aevatar:NyxId:SpecFetchToken in MainnetHostBuilderExtensions
- Skip the background refresh entirely when the token is missing
- Refresh the SpecFetchToken XML doc to drop the stale "public endpoint"
  claim
- Cover the new behavior with NyxIdSpecCatalogTests

Refs #522.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
codecov Bot commented Apr 29, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 71.65%. Comparing base (e8b44db) to head (78f3ef4).
⚠️ Report is 3 commits behind head on dev.

@@            Coverage Diff             @@
##              dev     #523      +/-   ##
==========================================
- Coverage   71.66%   71.65%   -0.01%     
==========================================
  Files        1240     1240              
  Lines       89709    89712       +3     
  Branches    11733    11733              
==========================================
- Hits        64287    64285       -2     
- Misses      20821    20822       +1     
- Partials     4601     4605       +4     
Flag Coverage Δ
ci 71.65% <100.00%> (-0.01%) ⬇️

Files with missing lines Coverage Δ
...Aevatar.AI.ToolProviders.NyxId/NyxIdSpecCatalog.cs 42.39% <100.00%> (+9.05%) ⬆️
...Aevatar.AI.ToolProviders.NyxId/NyxIdToolOptions.cs 100.00% <ø> (ø)
...t.Host.Api/Hosting/MainnetHostBuilderExtensions.cs 95.00% <100.00%> (+0.06%) ⬆️

... and 4 files with indirect coverage changes

@eanzhao eanzhao left a comment

Reviewed PR #523. I left inline comments on two actionable issues.

/// as human-only (rejects service-account and delegated tokens), so this
/// must be a real user's API key or access token. When unset the catalog
/// stays empty and the background refresh is skipped — generic capability
/// discovery (<c>nyxid_search_capabilities</c>, <c>nyxid_proxy</c>) is

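This should say nyxid_proxy_execute, not nyxid_proxy. nyxid_proxy proxies directly by slug/path and does not depend on NyxIdSpecCatalog; when the catalog is empty, what actually breaks is the operation_id lookup/execution chain (nyxid_search_capabilities -> nyxid_proxy_execute). The current comment would mislead troubleshooting by marking the direct proxy tool as unavailable too.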


using var catalog = new NyxIdSpecCatalog(options, http);

await handler.FirstRequestReceived.Task;

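This await has no timeout. If the constructor ever regresses to not triggering the initial fetch, the test will hang CI instead of failing. Consider a bounded wait such as await handler.FirstRequestReceived.Task.WaitAsync(TimeSpan.FromSeconds(2)); and keep asserting on the header.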
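
To make the suggestion concrete, here is a rough sketch of a bounded wait. It assumes .NET 6 or later (where Task.WaitAsync(TimeSpan) is available) and that the test's handler exposes FirstRequestReceived as a TaskCompletionSource; RecordingHandler and BoundedWaitSketch are hypothetical names, not the types in NyxIdSpecCatalogTests.

```csharp
#nullable enable
using System;
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical test double: completes FirstRequestReceived with the first request it sees.
public sealed class RecordingHandler : HttpMessageHandler
{
    public TaskCompletionSource<HttpRequestMessage> FirstRequestReceived { get; } =
        new(TaskCreationOptions.RunContinuationsAsynchronously);

    protected override Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        FirstRequestReceived.TrySetResult(request);
        return Task.FromResult(new HttpResponseMessage(HttpStatusCode.OK)
        {
            Content = new StringContent("{}")
        });
    }
}

public static class BoundedWaitSketch
{
    // Waits for the first request, but throws TimeoutException once the
    // timeout elapses instead of waiting forever.
    public static Task<HttpRequestMessage> WaitForFirstRequestAsync(
        RecordingHandler handler, TimeSpan timeout) =>
        handler.FirstRequestReceived.Task.WaitAsync(timeout);
}
```

With this shape, a regression that stops the initial fetch surfaces as a TimeoutException after the chosen bound, and the returned request can still be inspected for its Authorization header.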
