Agentic merge to main #1883
* add session server and trace store
* update module hierarchy
* decode arguments in case of jinja render exception
* remove split_fn argument for more compact trie
* remove tokenizer controller
* fix
* handle routed experts of string type
* fix filling value

…nternLM#1811)
* add routed api proxy
* add session server and trace store
* update module hierarchy
* decode arguments in case of jinja render exception
* remove split_fn argument for more compact trie
* remove tokenizer controller
* fix
* handle routed experts of string type
* update
* fix lint
---------
Co-authored-by: braisedpork1964 <497494458@qq.com>

…LM#1820) fix writing eof ahead of the response hook

* init sandbox agentic rollout execution
* update
* update
* remove unused env
* inject session id
* update
* mv rltask to sandbox agent loop

* init sandbox agentic rollout execution
* update
* update
* init sandbox agentic rollout execution
* update
* update
* remove unused env
* inject session id
* update
* update
* update
* update
* update
* update
* rename
---------
Co-authored-by: liukuikun <641417025@qq.com>

…nLM#1832) prevent trace loss in stream mode by deferring SSL termination

* support localhost agent
* inject session id and fix comment
* update action init
* fix detached running
* fix sessionserver
* fix tool call can not be jsonload
* fix stream proxy crash on client disconnect
* support session server timeout and sandbox retry
* adapter return logprob and reveal sandbox detach
---------
Co-authored-by: braisedpork1964 <497494458@qq.com>

* Add eval-mode sandbox rollouts and trajectory logging
  - Add TB2 eval dataloader and eval AgentInSandboxLoop config
  - Disable token/logprob/routed-expert returns for eval inference
  - Preserve text-only eval responses and tokenized response length stats
  - Separate eval replay buffer from training replay buffer
  - Add regression coverage for text-only eval trajectory saves
* Add eval trajectory grouping updates
* Simplify eval rollout trace handling
* Store eval trajectories as structured artifacts
---------
Co-authored-by: liukuikun <641417025@qq.com>

* fix lint and enable routed API proxy
* fix doc
def register_to_routedapiproxy(model_name: str, api_server_url: str) -> dict:
    url = "http://s-20260104203038-22bhb-decode.ailab-evalservice.svc:4000/v1/models/new"
self._global_train_step += len(workers_log_item[0]["train_metrics"])
def add_apiproxy(self):
Could this also go into misc.py, with the rollout controller passed in?
@claude review
}
states.append(normalized)
return states
Claude: Critical: Hardcoded internal service URLs and credentials.
These functions hardcode a cluster-specific URL (s-20260104203038-22bhb-decode.ailab-evalservice.svc:4000) and an API key (sk-admin). This makes the library unusable outside this specific internal deployment and leaks credentials into source control.
These should be parameters (or loaded from environment variables). The same issue applies to the hardcoded routed_url in add_apiproxy in rl_trainer.py.
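One way the suggestion above could look, as a minimal sketch rather than the PR's actual fix. The environment variable names `ROUTED_API_PROXY_URL` and `ROUTED_API_PROXY_KEY`, the `RoutedProxyConfig` class, and both helper functions are hypothetical; only the `/v1/models/new` path comes from the diff.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class RoutedProxyConfig:
    """Connection settings for the routed API proxy (hypothetical container)."""
    base_url: str
    api_key: str


def load_routed_proxy_config(env=None) -> RoutedProxyConfig:
    """Read proxy settings from the environment instead of hardcoding them.

    The variable names below are assumptions made for this sketch.
    """
    env = os.environ if env is None else env
    required = ("ROUTED_API_PROXY_URL", "ROUTED_API_PROXY_KEY")
    missing = [name for name in required if not env.get(name)]
    if missing:
        # Fail early with a clear message rather than falling back to a
        # cluster-specific default.
        raise RuntimeError(
            f"missing required environment variables: {', '.join(missing)}"
        )
    return RoutedProxyConfig(
        base_url=env["ROUTED_API_PROXY_URL"].rstrip("/"),
        api_key=env["ROUTED_API_PROXY_KEY"],
    )


def models_endpoint(cfg: RoutedProxyConfig) -> str:
    """Build the registration endpoint; the path is the one seen in the diff."""
    return f"{cfg.base_url}/v1/models/new"
```

Passing a `RoutedProxyConfig` into `register_to_routedapiproxy` and `add_apiproxy` would keep both the URL and the key out of source control and let other deployments supply their own values.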
self.dist_port = base_port
self.server_port = base_port + 1
self.nccl_port = base_port + 2
self.session_server_port = base_port + 3
Claude: Critical: Port collision on multi-GPU nodes.
base_port is computed as dist_port_base + local_rank * 3 (line 555), allocating a stride of 3 ports per rank. But now there are 4 ports per rank (dist, server, nccl, session_server). This means:
- local_rank 0: ports [base+0, base+1, base+2, base+3]
- local_rank 1: ports [base+3, base+4, base+5, base+6]
local_rank 0's session_server_port collides with local_rank 1's dist_port, causing Address already in use on any node with >1 GPU.
Fix: change the stride from local_rank * 3 to local_rank * 4.
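The arithmetic above can be checked with a small standalone sketch. It does not reuse the PR's code: `allocate_ports`, `find_collisions`, and the base port 30000 are hypothetical, and only the four port names and the `dist_port_base + local_rank * stride` formula are taken from the diff and the comment.

```python
PORT_NAMES = ("dist_port", "server_port", "nccl_port", "session_server_port")


def allocate_ports(dist_port_base: int, local_rank: int, stride: int) -> dict:
    """Assign consecutive ports to one rank, starting at its base port."""
    base_port = dist_port_base + local_rank * stride
    return {name: base_port + offset for offset, name in enumerate(PORT_NAMES)}


def find_collisions(dist_port_base: int, num_ranks: int, stride: int) -> list:
    """Return (port, first_owner, second_owner) for every port claimed twice."""
    owner = {}
    collisions = []
    for rank in range(num_ranks):
        for name, port in allocate_ports(dist_port_base, rank, stride).items():
            if port in owner:
                collisions.append((port, owner[port], (rank, name)))
            else:
                owner[port] = (rank, name)
    return collisions


# Stride 3 with four ports per rank: rank 0's last port is rank 1's first.
print(find_collisions(30000, num_ranks=2, stride=3))
# Stride 4: every rank gets a disjoint block of four ports.
print(find_collisions(30000, num_ranks=2, stride=4))
```

Deriving the stride from `len(PORT_NAMES)` instead of a literal would keep the two in sync if another port is added later.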
import socket
import time
import typing
import urllib.error
Claude: Warning: import requests at module top level, but requests is not a declared dependency.
requests is not listed in pyproject.toml (neither core nor [rl] optional). This will cause an ImportError in environments where requests is not installed transitively.
Either add requests to the [rl] optional dependency, or make the import lazy (inside the functions that use it).
Claude:
## Review Summary
This PR adds agentic RL support: sandbox/localhost agent loops, a SessionServer reverse proxy for trace recording, a Trie-based TraceStore for token caching, and trainer extensions for pre-tokenized sequences.

ProduceBatchResult impact: Not structurally affected. The eval replay buffer is now separated from training, and …

RoutedExperts impact: After transfer from LMDeploy, routed experts are stored as …

Issues
Critical
Warning
Nit

Verdict: REQUEST_CHANGES. The port collision (guaranteed …
No description provided.