Add central auth pool load balancing #14

Draft
zzj3720 wants to merge 2 commits into main from codex/auth-pool-lb

zzj3720 commented Apr 28, 2026

Summary

  • add AuthPoolService for Auth Profile selection, sticky session assignment, per-profile refresh singleflight, cooldown state, and atomic profile writes
  • add app-server chatgptAuthTokens injection plus account/chatgptAuthTokens/refresh response handling so refresh stays centralized in the broker
  • wire AUTH_POOL_LB=off|shadow|on into combined/worker runtimes; in on mode, a profile is leased before each turn and turns are serialized while they use the single shared app-server auth state
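
As an illustration of the "per-profile refresh singleflight" item above, here is a minimal sketch, assuming the pool keys in-flight refreshes by profile name. `RefreshSingleflight`, `Tokens`, and `run` are hypothetical names for this example and are not taken from the PR's code.

```typescript
// Hypothetical sketch; none of these names are taken from the PR's code.
type Tokens = { accessToken: string };

class RefreshSingleflight {
  // At most one in-flight refresh promise per profile name.
  private readonly inflight = new Map<string, Promise<Tokens>>();

  run(profileName: string, refresh: () => Promise<Tokens>): Promise<Tokens> {
    const existing = this.inflight.get(profileName);
    if (existing !== undefined) return existing; // join the refresh in progress

    // Clear the slot on success and on failure so a later call can retry.
    const started = refresh().then(
      (tokens) => {
        this.inflight.delete(profileName);
        return tokens;
      },
      (err) => {
        this.inflight.delete(profileName);
        throw err;
      },
    );
    this.inflight.set(profileName, started);
    return started;
  }
}
```

Concurrent callers for the same profile share one refresh call and one result, while different profiles refresh independently.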

Tests

  • pnpm test -- test/auth-pool-service.test.ts test/app-server-client.test.ts test/config.test.ts
  • pnpm build

Note: the targeted vitest invocation ran the full suite in this repo: 33 files / 168 tests passed.

zzj3720 commented May 6, 2026

I re-reviewed this; please don't merge yet. Two sticky/auth blockers:

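  1. AuthPoolService.#selectProfile() reuses a session's assigned profile only when that profile is available. If the profile is in cooldown or has been deleted, it selects a new profile for the same session, which breaks the "same session always uses the same account" guarantee. When the assigned profile is unavailable, the call should fail, block, or require manual migration rather than switch automatically.
  2. refreshForPreviousAccount() looks up the profile only by previousAccountId, and falls back to a fresh select when nothing is found. This refresh is an external token refresh for an active turn, so it must refresh the profile bound to the current session/lease. account_id is not necessarily unique within one workspace or across multiple auth.json files, so the fallback can switch to a different account/credential set. Suggestion: have the broker layer bind the current lease/profileName to the refresh provider, and do not allow fallback on refresh.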
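
A minimal sketch of the behaviour these two points ask for, assuming assignments are keyed by session id and a lease carries the profile name. `StickySelector`, `Lease`, `Profile`, and the state names are hypothetical; the real AuthPoolService internals are not shown in this thread.

```typescript
// Hypothetical types; the real AuthPoolService shape is not shown in the thread.
type ProfileState = "available" | "cooldown" | "needs_relogin";
interface Profile { name: string; state: ProfileState }
interface Lease { sessionId: string; profileName: string }

class StickySelector {
  // sessionId -> profileName; once set, it is never silently changed.
  private readonly assignments = new Map<string, string>();

  constructor(private readonly profiles: Map<string, Profile>) {}

  select(sessionId: string): Profile {
    const assigned = this.assignments.get(sessionId);
    if (assigned !== undefined) {
      const profile = this.profiles.get(assigned);
      // Blocker 1: fail instead of picking a different profile.
      if (profile === undefined || profile.state !== "available") {
        throw new Error(`profile ${assigned} for session ${sessionId} is unavailable`);
      }
      return profile;
    }
    // First assignment: take the first available profile (selection policy is assumed).
    for (const profile of this.profiles.values()) {
      if (profile.state === "available") {
        this.assignments.set(sessionId, profile.name);
        return profile;
      }
    }
    throw new Error("no available profile");
  }

  // Blocker 2: resolve the refresh target from the lease, never from an
  // account id, and never fall back to a new selection.
  profileForRefresh(lease: Lease): Profile {
    const profile = this.profiles.get(lease.profileName);
    if (profile === undefined) {
      throw new Error(`leased profile ${lease.profileName} no longer exists`);
    }
    return profile;
  }
}
```

With this shape, a session whose profile enters cooldown gets an error that the caller can surface or retry later, and a refresh can only ever touch the profile named in the lease.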

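Also: if a logout on any side revokes the same set of auth, refresh failures such as invalid_grant/401 should mark the profile as needs_relogin (permanently quarantined) until a new auth.json is uploaded. Right now the profile is reused after the 60s cooldown and will fail repeatedly. Ideally the bot logs in separately with its own auth.json, so a human-side logout does not affect the bot pool.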
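
One way to express that distinction, as a sketch: classify each refresh failure as either a transient cooldown or a quarantine that only a re-upload clears. `RefreshFailure`, `Disposition`, and `classifyRefreshFailure` are hypothetical names; only the invalid_grant/401 cases, the needs_relogin state, and the 60s cooldown come from the comment above.

```typescript
// Hypothetical failure shape; the real error type is not shown in the thread.
type RefreshFailure = { status?: number; oauthError?: string };

type Disposition =
  | { kind: "cooldown"; retryAfterMs: number }
  | { kind: "needs_relogin" }; // stays quarantined until a new auth.json is uploaded

const COOLDOWN_MS = 60000; // the 60s cooldown mentioned in the comment

function classifyRefreshFailure(failure: RefreshFailure): Disposition {
  // A revoked or invalid credential will not recover by waiting.
  if (failure.oauthError === "invalid_grant" || failure.status === 401) {
    return { kind: "needs_relogin" };
  }
  // Anything else is treated as transient here (an assumption of this sketch).
  return { kind: "cooldown", retryAfterMs: COOLDOWN_MS };
}
```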

Validation: git diff --check origin/main...HEAD passes. Locally, pnpm test / pnpm build hung and timed out (the test process had started launching the broker), so I did not get complete results.

Co-authored-by: Peng Xiao <pengxiao@outlook.com>

zzj3720 commented May 6, 2026

Implemented admin OAuth device-code profile login in 8cc6e28635f995b514b73b1080e198231cdc66bc.

Summary:

  • Added an admin Auth Profiles OAuth flow using Codex chatgptDeviceCode login.
  • The temporary login app-server runs with an isolated CODEX_HOME and bootstrapAuth: false, so it does not reuse or mutate the broker/global auth.
  • On successful login, the generated auth.json is imported as a managed auth profile; failed/cancelled/expired attempts clean up the temp auth directory.
  • Added API routes for start/status/cancel and UI controls in the admin Auth Profiles section.
  • Added client coverage for start/cancel device-code login.
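
The isolation and cleanup described above could look roughly like the following sketch, which uses only Node's standard library. `loginInIsolatedHome`, `exists`, and both callbacks are hypothetical. Removing the temp directory after a successful import is an assumption of this sketch, since the comment only states cleanup for failed, cancelled, or expired attempts.

```typescript
import { access, mkdtemp, rm } from "node:fs/promises";
import { tmpdir } from "node:os";
import { join } from "node:path";

// True if the path can be accessed, false otherwise.
async function exists(path: string): Promise<boolean> {
  try {
    await access(path);
    return true;
  } catch (e) {
    return false;
  }
}

// Runs a login against a throwaway CODEX_HOME, then hands auth.json to the importer.
async function loginInIsolatedHome<T>(
  login: (env: { CODEX_HOME: string }) => Promise<void>, // hypothetical: drives the device-code flow
  importAuthJson: (authJsonPath: string) => Promise<T>, // hypothetical: stores the managed profile
): Promise<T> {
  const home = await mkdtemp(join(tmpdir(), "codex-login-"));
  try {
    await login({ CODEX_HOME: home });
    const authJson = join(home, "auth.json");
    if (!(await exists(authJson))) {
      throw new Error("login finished without producing auth.json");
    }
    return await importAuthJson(authJson);
  } finally {
    // Always remove the temp directory, whether the login succeeded or failed.
    await rm(home, { recursive: true, force: true });
  }
}
```

Because the importer runs before the `finally` block, it must copy what it needs out of auth.json rather than keep the path.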

Validation:

  • pnpm exec tsc -p tsconfig.json --noEmit --pretty false
  • targeted Vitest for the new app-server client device-code login test
  • git diff --check
  • admin inline script syntax check
  • GitHub CI Build and test is passing on this head commit.
