Skip to content

Agentic merge to main#1883

Merged
hhaAndroid merged 16 commits into
InternLM:mainfrom
hhaAndroid:agentic_merge_to_main
Jun 8, 2026
Merged

Agentic merge to main#1883
hhaAndroid merged 16 commits into
InternLM:mainfrom
hhaAndroid:agentic_merge_to_main

Conversation

@hhaAndroid

Copy link
Copy Markdown
Collaborator

No description provided.

braisedpork1964 and others added 15 commits June 6, 2026 10:10
* add session server and trace store

* update module hierarchy

* decode arguments in case of jinja render exception

* remove split_fn argument for more compact trie

* remove tokenizer controller

* fix

* handle routed experts of string type

* fix filling value
…nternLM#1811)

* add routed api proxy

* add session server and trace store

* update module hierarchy

* decode arguments in case of jinja render exception

* remove split_fn argument for more compact trie

* remove tokenizer controller

* fix

* handle routed experts of string type

* update

* fix lint

---------

Co-authored-by: braisedpork1964 <497494458@qq.com>
* init sandbox agentic rollout execution

* update

* update

* remove unused env

* inject session id

* update

* mv rltask to sandbox agent loop
* init sandbox agentic rollout execution

* update

* update

* init sandbox agentic rollout execution

* update

* update

* remove unused env

* inject session id

* update

* update

* update

* update

* update

* update

* rename

---------

Co-authored-by: liukuikun <641417025@qq.com>
…nLM#1832)

prevent trace loss in stream mode by deferring SSL termination
* support localhost agent

* inject session id and fix comment

* update action init

* fix detached running

* fix sessionserver

* fix tool call can not be jsonload

* fix stream proxy crash on client disconnect

* support session server timeout and sandbox retry

* adapter return logprob and reveal sandbox detach

---------

Co-authored-by: braisedpork1964 <497494458@qq.com>
* Add eval-mode sandbox rollouts and trajectory logging

 - Add TB2 eval dataloader and eval AgentInSandboxLoop config
 - Disable token/logprob/routed-expert returns for eval inference
 - Preserve text-only eval responses and tokenized response length stats
 - Separate eval replay buffer from training replay buffer
 - Add regression coverage for text-only eval trajectory saves

* Add eval trajectory grouping updates

* Simplify eval rollout trace handling

* Store eval trajectories as structured artifacts

---------

Co-authored-by: liukuikun <641417025@qq.com>
* fix lint and  enable routed API proxy

* fix doc


def register_to_routedapiproxy(model_name: str, api_server_url: str) -> dict:
url = "http://s-20260104203038-22bhb-decode.ailab-evalservice.svc:4000/v1/models/new"

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个地址直接放到主分支是不是不好啊

self._global_train_step += len(workers_log_item[0]["train_metrics"])


def add_apiproxy(self):

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个是不是也能放到misc.py中去呀?把rollout controller传进来

@hhaAndroid

Copy link
Copy Markdown
Collaborator Author

@claude review

}
states.append(normalized)
return states

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude: Critical: Hardcoded internal service URLs and credentials.

These functions hardcode a cluster-specific URL (s-20260104203038-22bhb-decode.ailab-evalservice.svc:4000) and an API key (sk-admin). This makes the library unusable outside this specific internal deployment and leaks credentials into source control.

These should be parameters (or loaded from environment variables). The same issue applies to the hardcoded routed_url in add_apiproxy in rl_trainer.py.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

好的,下一个 pr 会整体重构

self.dist_port = base_port
self.server_port = base_port + 1
self.nccl_port = base_port + 2
self.session_server_port = base_port + 3

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude: Critical: Port collision on multi-GPU nodes.

base_port is computed as dist_port_base + local_rank * 3 (line 555), allocating a stride of 3 ports per rank. But now there are 4 ports per rank (dist, server, nccl, session_server). This means:

  • local_rank 0: ports [base+0, base+1, base+2, base+3]
  • local_rank 1: ports [base+3, base+4, base+5, base+6]

local_rank 0's session_server_port collides with local_rank 1's dist_port, causing Address already in use on any node with >1 GPU.

Fix: change the stride from local_rank * 3 to local_rank * 4.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

Comment thread xtuner/v1/rl/agent_loop/sandbox_agent_loop/sandbox.py
Comment thread xtuner/v1/rl/agent_loop/sandbox_agent_loop/hooks.py
import socket
import time
import typing
import urllib.error

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude: Warning: import requests at module top level, but requests is not a declared dependency.

requests is not listed in pyproject.toml (neither core nor [rl] optional). This will cause an ImportError in environments where requests is not installed transitively.

Either add requests to the [rl] optional dependency, or make the import lazy (inside the functions that use it).

Comment thread xtuner/v1/rl/rollout/session_server.py
Comment thread xtuner/v1/train/rl_trainer.py
@claude

claude Bot commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

Claude: ## Review Summary

This PR adds agentic RL support: sandbox/localhost agent loops, a SessionServer reverse proxy for trace recording, a Trie-based TraceStore for token caching, and trainer extensions for pre-tokenized sequences.

ProduceBatchResult impact: Not structurally affected. The eval replay buffer is now separated from training, and is_valid_for_training gains additional validation for input_ids/labels/logprobs lengths (potentially filtering more samples from rollout_states). Batch accounting semantics are tightened but not changed.

RoutedExperts impact: After transfer from LMDeploy, routed experts are stored as ray.ObjectRef values in trace store Trie nodes. They are freed by _release_trace_store() in finally blocks. The trainer handles a new list[RayObjectRef] case (one per trace segment), correctly concatenating and slicing with [1:] to remove the prepended dummy. No new leak paths identified under normal operation.

Issues

Critical

File Issue
rollout/worker.py:559 Port collision — stride is local_rank * 3 but 4 ports are now needed per rank. Session server port of rank N collides with dist port of rank N+1.
rl/utils/misc.py:344 Hardcoded internal cluster URLs and credentials (sk-admin, ailab-evalservice.svc:4000). Not portable and leaks credentials into source control.
train/rl_trainer.py:1530 Hardcoded routed URL in add_apiproxy (pjh-service.org.cn). Same issue as above.

Warning

File Issue
sandbox_agent_loop/sandbox.py:1027 Shell injection — env var values interpolated into shell string without shlex.quote().
sandbox_agent_loop/hooks.py:181 Shell injection — remote_path interpolated into shell commands without escaping.
rl/utils/misc.py:7 import requests at top level but not declared in pyproject.toml dependencies.
rollout/session_server.py:221 assert used for runtime validation in request handler — not safe with -O flag.
train/rl_trainer.py:1556 _maybe_start_gateway commented out unconditionally, breaking non-agentic configs. Debug remnants (# time.sleep(1000000)) left in.
train/rl_trainer.py:1096 Typo: advatnages_valadvantages_val.
rollout/trace_store.py (end of file) if __name__ == "__main__" demo block with Chinese comments and emoji — should be in a test file or removed.
rl/evaluator.py pass@k computed as any(group[:k]) rather than the standard unbiased estimator from Chen et al. 2021. May be intentional but warrants documentation.

Nit

  • trace.py / session_server.py: Multiple assert statements that could crash Ray actors if violated — prefer explicit error handling in production paths.
  • sandbox.py:~695: Mutable default arguments (pre: list[Hook] = []).
  • sandbox.py:~1047: Fixed temp file path /tmp/_bundle.tar.gz — concurrent calls to same sandbox could race.
  • rl/utils/misc.py: Mix of requests and urllib.request — pick one for consistency.

Verdict

REQUEST_CHANGES — The port collision (guaranteed Address already in use on multi-GPU nodes) and hardcoded internal URLs/credentials must be fixed before merge.

@hhaAndroid hhaAndroid merged commit d7b7342 into InternLM:main Jun 8, 2026
5 of 6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants