Agentic merge to main by hhaAndroid · Pull Request #1883 · InternLM/xtuner

hhaAndroid · 2026-06-06T10:20:11Z

No description provided.

* add session server and trace store * update module hierarchy * decode arguments in case of jinja render exception * remove split_fn argument for more compact trie * remove tokenizer controller * fix * handle routed experts of string type * fix filling value

…nternLM#1811) * add routed api proxy * add session server and trace store * update module hierarchy * decode arguments in case of jinja render exception * remove split_fn argument for more compact trie * remove tokenizer controller * fix * handle routed experts of string type * update * fix lint --------- Co-authored-by: braisedpork1964 <497494458@qq.com>

…LM#1820) fix writing eof ahead of the response hook

* init sandbox agentic rollout execution * update * update * remove unused env * inject session id * update * mv rltask to sandbox agent loop

* init sandbox agentic rollout execution * update * update * init sandbox agentic rollout execution * update * update * remove unused env * inject session id * update * update * update * update * update * update * rename --------- Co-authored-by: liukuikun <641417025@qq.com>

…nLM#1832) prevent trace loss in stream mode by deferring SSL termination

* support localhost agent * inject session id and fix comment * update action init * fix detached running * fix sessionserver * fix tool call can not be jsonload * fix stream proxy crash on client disconnect * support session server timeout and sandbox retry * adapter return logprob and reveal sandbox detach --------- Co-authored-by: braisedpork1964 <497494458@qq.com>

* Add eval-mode sandbox rollouts and trajectory logging - Add TB2 eval dataloader and eval AgentInSandboxLoop config - Disable token/logprob/routed-expert returns for eval inference - Preserve text-only eval responses and tokenized response length stats - Separate eval replay buffer from training replay buffer - Add regression coverage for text-only eval trajectory saves * Add eval trajectory grouping updates * Simplify eval rollout trace handling * Store eval trajectories as structured artifacts --------- Co-authored-by: liukuikun <641417025@qq.com>

* fix lint and enable routed API proxy * fix doc

YanhuiDua · 2026-06-08T03:15:09Z

+
+
+def register_to_routedapiproxy(model_name: str, api_server_url: str) -> dict:
+    url = "http://s-20260104203038-22bhb-decode.ailab-evalservice.svc:4000/v1/models/new"


这个地址直接放到主分支是不是不好啊

YanhuiDua · 2026-06-08T03:24:08Z

        self._global_train_step += len(workers_log_item[0]["train_metrics"])


+def add_apiproxy(self):


这个是不是也能放到misc.py中去呀？把rollout controller传进来

hhaAndroid · 2026-06-08T03:42:17Z

@claude review

claude · 2026-06-08T03:54:04Z

        }
        states.append(normalized)
    return states
+


Claude: Critical: Hardcoded internal service URLs and credentials.

These functions hardcode a cluster-specific URL (s-20260104203038-22bhb-decode.ailab-evalservice.svc:4000) and an API key (sk-admin). This makes the library unusable outside this specific internal deployment and leaks credentials into source control.

These should be parameters (or loaded from environment variables). The same issue applies to the hardcoded routed_url in add_apiproxy in rl_trainer.py.

好的，下一个 pr 会整体重构

claude · 2026-06-08T03:54:53Z

        self.dist_port = base_port
        self.server_port = base_port + 1
        self.nccl_port = base_port + 2
+        self.session_server_port = base_port + 3


Claude: Critical: Port collision on multi-GPU nodes.

base_port is computed as dist_port_base + local_rank * 3 (line 555), allocating a stride of 3 ports per rank. But now there are 4 ports per rank (dist, server, nccl, session_server). This means:

local_rank 0: ports [base+0, base+1, base+2, base+3]

local_rank 1: ports [base+3, base+4, base+5, base+6]

local_rank 0's session_server_port collides with local_rank 1's dist_port, causing Address already in use on any node with >1 GPU.

Fix: change the stride from local_rank * 3 to local_rank * 4.

claude · 2026-06-08T03:55:38Z

 import socket
+import time
 import typing
+import urllib.error


Claude: Warning: import requests at module top level, but requests is not a declared dependency.

requests is not listed in pyproject.toml (neither core nor [rl] optional). This will cause an ImportError in environments where requests is not installed transitively.

Either add requests to the [rl] optional dependency, or make the import lazy (inside the functions that use it).

claude · 2026-06-08T03:56:49Z

Claude: ## Review Summary

This PR adds agentic RL support: sandbox/localhost agent loops, a SessionServer reverse proxy for trace recording, a Trie-based TraceStore for token caching, and trainer extensions for pre-tokenized sequences.

ProduceBatchResult impact: Not structurally affected. The eval replay buffer is now separated from training, and is_valid_for_training gains additional validation for input_ids/labels/logprobs lengths (potentially filtering more samples from rollout_states). Batch accounting semantics are tightened but not changed.

RoutedExperts impact: After transfer from LMDeploy, routed experts are stored as ray.ObjectRef values in trace store Trie nodes. They are freed by _release_trace_store() in finally blocks. The trainer handles a new list[RayObjectRef] case (one per trace segment), correctly concatenating and slicing with [1:] to remove the prepended dummy. No new leak paths identified under normal operation.

Issues

Critical

File	Issue
`rollout/worker.py:559`	Port collision — stride is `local_rank * 3` but 4 ports are now needed per rank. Session server port of rank N collides with dist port of rank N+1.
`rl/utils/misc.py:344`	Hardcoded internal cluster URLs and credentials (`sk-admin`, `ailab-evalservice.svc:4000`). Not portable and leaks credentials into source control.
`train/rl_trainer.py:1530`	Hardcoded routed URL in `add_apiproxy` (`pjh-service.org.cn`). Same issue as above.

Warning

File	Issue
`sandbox_agent_loop/sandbox.py:1027`	Shell injection — env var values interpolated into shell string without `shlex.quote()`.
`sandbox_agent_loop/hooks.py:181`	Shell injection — `remote_path` interpolated into shell commands without escaping.
`rl/utils/misc.py:7`	`import requests` at top level but not declared in `pyproject.toml` dependencies.
`rollout/session_server.py:221`	`assert` used for runtime validation in request handler — not safe with `-O` flag.
`train/rl_trainer.py:1556`	`_maybe_start_gateway` commented out unconditionally, breaking non-agentic configs. Debug remnants (`# time.sleep(1000000)`) left in.
`train/rl_trainer.py:1096`	Typo: `advatnages_val` → `advantages_val`.
`rollout/trace_store.py` (end of file)	`if __name__ == "__main__"` demo block with Chinese comments and emoji — should be in a test file or removed.
`rl/evaluator.py`	`pass@k` computed as `any(group[:k])` rather than the standard unbiased estimator from Chen et al. 2021. May be intentional but warrants documentation.

Nit

trace.py / session_server.py: Multiple assert statements that could crash Ray actors if violated — prefer explicit error handling in production paths.
sandbox.py:~695: Mutable default arguments (pre: list[Hook] = []).
sandbox.py:~1047: Fixed temp file path /tmp/_bundle.tar.gz — concurrent calls to same sandbox could race.
rl/utils/misc.py: Mix of requests and urllib.request — pick one for consistency.

Verdict

REQUEST_CHANGES — The port collision (guaranteed Address already in use on multi-GPU nodes) and hardcoded internal URLs/credentials must be fixed before merge.

braisedpork1964 and others added 15 commits June 6, 2026 10:10

Fix writing EOF ahead of the response hook for streaming call (Intern…

c1acde2

…LM#1820) fix writing eof ahead of the response hook

init sandbox agentic rollout execution (InternLM#1819)

f554062

* init sandbox agentic rollout execution * update * update * remove unused env * inject session id * update * mv rltask to sandbox agent loop

Prevent trace loss in stream mode by deferring SSL termination (Inter…

b49ac36

…nLM#1832) prevent trace loss in stream mode by deferring SSL termination

remove recipe (InternLM#1846)

8bb8a7f

add more info in sandbox (InternLM#1871)

3daa85c

sandbox_creates_per_sec (InternLM#1874)

d0f2c8c

fix lint and enable routed API proxy (InternLM#1882)

c8512e9

* fix lint and enable routed API proxy * fix doc

fix lint

d63fdb1

refine abort and timeout

c885e31

revert

f771687

YanhuiDua reviewed Jun 8, 2026

View reviewed changes

YanhuiDua approved these changes Jun 8, 2026

View reviewed changes

claude Bot reviewed Jun 8, 2026

View reviewed changes

Comment thread xtuner/v1/rl/agent_loop/sandbox_agent_loop/sandbox.py

claude Bot reviewed Jun 8, 2026

View reviewed changes

Comment thread xtuner/v1/rl/agent_loop/sandbox_agent_loop/hooks.py

claude Bot reviewed Jun 8, 2026

View reviewed changes

Comment thread xtuner/v1/rl/rollout/session_server.py

claude Bot reviewed Jun 8, 2026

View reviewed changes

Comment thread xtuner/v1/train/rl_trainer.py

fix

24e874c

hhaAndroid merged commit d7b7342 into InternLM:main Jun 8, 2026
5 of 6 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Agentic merge to main#1883

Agentic merge to main#1883
hhaAndroid merged 16 commits into
InternLM:mainfrom
hhaAndroid:agentic_merge_to_main

hhaAndroid commented Jun 6, 2026

Uh oh!

YanhuiDua Jun 8, 2026

Uh oh!

YanhuiDua Jun 8, 2026

Uh oh!

hhaAndroid commented Jun 8, 2026

Uh oh!

claude Bot Jun 8, 2026

Uh oh!

hhaAndroid Jun 8, 2026

Uh oh!

claude Bot Jun 8, 2026

Uh oh!

hhaAndroid Jun 8, 2026

Uh oh!

Uh oh!

Uh oh!

claude Bot Jun 8, 2026

Uh oh!

Uh oh!

Uh oh!

claude Bot commented Jun 8, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants



		def register_to_routedapiproxy(model_name: str, api_server_url: str) -> dict:
		url = "http://s-20260104203038-22bhb-decode.ailab-evalservice.svc:4000/v1/models/new"

		self._global_train_step += len(workers_log_item[0]["train_metrics"])


		def add_apiproxy(self):

Conversation

hhaAndroid commented Jun 6, 2026

Uh oh!

YanhuiDua Jun 8, 2026

Choose a reason for hiding this comment

Uh oh!

YanhuiDua Jun 8, 2026

Choose a reason for hiding this comment

Uh oh!

hhaAndroid commented Jun 8, 2026

Uh oh!

claude Bot Jun 8, 2026

Choose a reason for hiding this comment

Uh oh!

hhaAndroid Jun 8, 2026

Choose a reason for hiding this comment

Uh oh!

claude Bot Jun 8, 2026

Choose a reason for hiding this comment

Uh oh!

hhaAndroid Jun 8, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

claude Bot Jun 8, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

claude Bot commented Jun 8, 2026

Issues

Critical

Warning

Nit

Verdict

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants