[ReplayBuffer] add ReplayBuffer with various StorageBackend #1490
Merged
YanhuiDua merged 3 commits into InternLM:rl_design on Feb 25, 2026
…aleness, or Database(implement in the future)
Pull request overview
This PR introduces a new ReplayBuffer abstraction in xtuner/v1/rl/base with pluggable storage backends (e.g., FIFO and staleness-priority), plus initial unit tests for basic FIFO and staleness ordering behavior.
Changes:
- Added `ReplayBuffer`, the `StorageBackend` interface, and multiple backend implementations (`FIFOStorageBackend`, `StalenessStorageBackend`, plus stub/pseudocode backends).
- Implemented `StorageIndices` to partition storage by `(task_name, group_status, tags)`.
- Added async unit tests covering FIFO behavior, staleness priority order, and multi-task isolation.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 12 comments.
| File | Description |
|---|---|
| `xtuner/v1/rl/base/replay_buffer.py` | Adds the replay buffer API and backend implementations (FIFO + staleness), with placeholder backends for future extensions. |
| `tests/ray/test_replay_buffer.py` | Adds async unit tests validating basic replay buffer behavior for FIFO and staleness backends. |
hhaAndroid
approved these changes
Feb 25, 2026
            async with self._lock:
                await self._storage.put(items, indices)

        async def get(self, batch_size: int, task_name: str, group_status: Status, **kwargs) -> list[RolloutState]:
Collaborator
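Can we compute group_state ourselves from the items? I feel it may be hard to leave this decision to the user. If this logic can be fixed, shouldn't we just handle it ourselves?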
jayhenry
reviewed
Feb 26, 2026
            return results


    class FIFOBackend(NaiveStorage):
Collaborator
The FIFO/Staleness logic sits alongside the SQL/Pandas/Naive logic; the two are not in an inheritance relationship and should be decoupled.
YanhuiDua added a commit that referenced this pull request Apr 27, 2026
* [ReplayBuffer] add ReplayBuffer with various StorageBackend: FIFO, Staleness, or Database(implement in the future)
* [ReplayBuffer] optimize implementation of ReplayBuffer
* fix comments: add NaiveStorage and take fifo/staleness as policy for getting item
ReplayBuffer design notes
StorageIndices
A data index class that gives the different backends one uniform way to index data.
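One way such an index class could look is a hashable key built from `(task_name, group_status, tags)`, the three fields named in the review overview. This is only a sketch under assumptions: the `Status` members, the `build` helper, and the tuple-of-pairs encoding of `tags` are hypothetical and are not taken from the PR.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    # Hypothetical members; the real Status enum is not shown on this page.
    COMPLETED = "completed"
    ABORTED = "aborted"


@dataclass(frozen=True)
class StorageIndices:
    """Hashable key naming one partition of the storage (illustrative only)."""

    task_name: str
    group_status: Status
    tags: tuple[tuple[str, str], ...] = ()

    @classmethod
    def build(cls, task_name: str, group_status: Status, **tags: str) -> StorageIndices:
        # Sort the tags so that keyword order does not change the key.
        return cls(task_name, group_status, tuple(sorted(tags.items())))


# A plain dict keyed by StorageIndices stands in for any backend here.
partitions: dict[StorageIndices, list[str]] = {}
key = StorageIndices.build("math", Status.COMPLETED, split="train")
partitions.setdefault(key, []).append("rollout-0")

# An equal key built later reaches the same partition; a different task does not.
same_key = StorageIndices.build("math", Status.COMPLETED, split="train")
other_key = StorageIndices.build("code", Status.COMPLETED, split="train")
```

Because the key is frozen and compared by value, a list-based, pandas-based, or SQL-based backend could each translate the same key into its own lookup (a dict key, a row filter, or a WHERE clause).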
Storage
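An abstract storage backend that supports different kinds of storage systems, such as native Python lists, pandas, and SQL. Only `NaiveStorage` is usable at the moment, but pseudocode for `PandasStorageBackend` and `SQLStorageBackend` is provided as a reference. On top of this, `FIFOBackend` and `StalenessBackend` define how data is put in and how it is taken out. For example, `FIFOBackend` makes the data first-in-first-out, while `StalenessBackend` orders the data by staleness, so the stalest data (largest staleness value) is dequeued first.
ReplayBuffer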
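How a buffer might combine a lock, per-key partitions, and an ordering policy can be sketched as below. The lock around `put` mirrors the snippet quoted in the review; everything else (`FIFOPolicy`, `StalenessPolicy`, the `policy_factory` argument, the dict items with a `staleness` field) is a hypothetical stand-in and not the PR's actual classes.

```python
import asyncio
import heapq
from collections import defaultdict, deque


class FIFOPolicy:
    """Hypothetical policy: the item inserted first leaves first."""

    def __init__(self):
        self._queue = deque()

    def push(self, item):
        self._queue.append(item)

    def pop(self):
        return self._queue.popleft()

    def __len__(self):
        return len(self._queue)


class StalenessPolicy:
    """Hypothetical policy: the item with the largest staleness leaves first."""

    def __init__(self):
        self._heap = []
        self._counter = 0  # tie-breaker: equal staleness keeps insertion order

    def push(self, item):
        # heapq is a min-heap, so the staleness is negated.
        heapq.heappush(self._heap, (-item["staleness"], self._counter, item))
        self._counter += 1

    def pop(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


class ReplayBuffer:
    """Sketch: one policy instance per partition key, guarded by a single lock."""

    def __init__(self, policy_factory):
        self._partitions = defaultdict(policy_factory)
        self._lock = asyncio.Lock()

    async def put(self, items, key):
        async with self._lock:
            for item in items:
                self._partitions[key].push(item)

    async def get(self, batch_size, key):
        async with self._lock:
            partition = self._partitions[key]
            count = min(batch_size, len(partition))
            return [partition.pop() for _ in range(count)]


async def demo():
    items = [
        {"id": "a", "staleness": 0},
        {"id": "b", "staleness": 2},
        {"id": "c", "staleness": 1},
    ]
    key = ("math", "completed")  # stands in for a StorageIndices value
    fifo, stale = ReplayBuffer(FIFOPolicy), ReplayBuffer(StalenessPolicy)
    await fifo.put(items, key)
    await stale.put(items, key)
    fifo_ids = [x["id"] for x in await fifo.get(2, key)]
    stale_ids = [x["id"] for x in await stale.get(2, key)]
    return fifo_ids, stale_ids


fifo_ids, stale_ids = asyncio.run(demo())
```

With the same three items, the FIFO buffer returns them in insertion order and the staleness buffer returns the largest staleness values first. Passing the policy in as a factory, rather than subclassing a storage class, follows the reviewer's suggestion to keep ordering logic separate from the storage system.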