[ReplayBuffer] add ReplayBuffer with various StorageBackend #1490

Merged
YanhuiDua merged 3 commits into InternLM:rl_design from YanhuiDua:dyh/rl_design
Feb 25, 2026

Conversation

YanhuiDua (Collaborator) commented Feb 11, 2026

ReplayBuffer design notes

StorageIndices

A data-index class that gives the different backends a single, unified way to index stored data.

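from dataclasses import dataclass, field

@dataclass
class StorageIndices:
    # Provides a unified indexing interface across storage backends. Only task_name and
    # group_status are used for now; the other fields will be extended later.
    task_name: str | None = None
    group_status: Status | None = None
    tags: dict = field(default_factory=dict)  # for non-equality conditions, use a form such as scores_gt > 0.8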
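
As a rough illustration of how such an index could partition storage, the sketch below derives a hashable bucket key from a `StorageIndices` instance. The `Status` members and the `bucket_key` helper are assumptions made for this example and are not taken from the PR; `Optional` is used only to keep the snippet runnable on older Python versions.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Status(Enum):
    # Hypothetical members; the real Status enum is defined elsewhere in xtuner.
    COMPLETED = "completed"
    ABORTED = "aborted"


@dataclass
class StorageIndices:
    task_name: Optional[str] = None
    group_status: Optional[Status] = None
    tags: dict = field(default_factory=dict)


def bucket_key(indices: StorageIndices) -> tuple:
    """Hypothetical helper: build a hashable key so a dict can hold one bucket per index."""
    # Sort the tag items so that dict insertion order does not change the key.
    return (indices.task_name, indices.group_status, tuple(sorted(indices.tags.items())))


a = StorageIndices("math", Status.COMPLETED, {"split": "train", "lang": "en"})
b = StorageIndices("math", Status.COMPLETED, {"lang": "en", "split": "train"})
c = StorageIndices("math", Status.ABORTED)
```

Under these assumptions, `a` and `b` map to the same bucket while `c` maps to a different one, which is the behaviour a backend would need for per-task, per-status isolation.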

Storage

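An abstract storage backend that supports different kinds of storage systems, such as a native Python list, pandas, or SQL. Only NaiveStorage is used for now, but pseudocode for PandasStorageBackend and SQLStorageBackend is provided for reference.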

from abc import ABC, abstractmethod

class Storage(ABC):
    @abstractmethod
    async def put(self, items: list[RolloutState], storage_indices: StorageIndices) -> None: ...
    @abstractmethod
    async def get(self, count: int, storage_indices: StorageIndices) -> list[RolloutState]: ...
    @abstractmethod
    def __len__(self) -> int: ...  # must be synchronous: len() raises TypeError if __len__ returns a coroutine

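On top of this, FIFOBackend and StalenessBackend define how data is put and how it is retrieved. For example, FIFOBackend returns data first-in, first-out, while StalenessBackend orders data by staleness: the stalest items (largest staleness value) are dequeued first.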
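
The two orderings can be sketched as follows. This is a minimal illustration, not the PR's implementation: `Item` stands in for `RolloutState`, the `staleness` field name is an assumption, and `FIFOQueue` / `StalenessQueue` are hypothetical names.

```python
import heapq
import itertools
from collections import deque
from dataclasses import dataclass


@dataclass
class Item:
    # Stand-in for RolloutState; `staleness` is an assumed field name.
    uid: str
    staleness: int


class FIFOQueue:
    def __init__(self):
        self._items = deque()

    def put(self, items):
        self._items.extend(items)

    def get(self, count):
        # Return at most `count` items in insertion order.
        n = min(count, len(self._items))
        return [self._items.popleft() for _ in range(n)]


class StalenessQueue:
    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker that preserves insertion order

    def put(self, items):
        for item in items:
            # heapq is a min-heap, so negate staleness to pop the largest value first.
            heapq.heappush(self._heap, (-item.staleness, next(self._counter), item))

    def get(self, count):
        n = min(count, len(self._heap))
        return [heapq.heappop(self._heap)[2] for _ in range(n)]


batch = [Item("a", 0), Item("b", 3), Item("c", 1)]
fifo, stale = FIFOQueue(), StalenessQueue()
fifo.put(batch)
stale.put(batch)
fifo_order = [i.uid for i in fifo.get(3)]    # insertion order
stale_order = [i.uid for i in stale.get(3)]  # largest staleness first
```

With this input the FIFO queue yields `a, b, c`, while the staleness queue yields `b, c, a`.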

ReplayBuffer

import asyncio

class ReplayBuffer:
    def __init__(self, storage_backend: Storage | None = None):
        # Default to FIFO ordering when no backend is supplied.
        self._storage = FIFOBackend() if storage_backend is None else storage_backend
        self._lock = asyncio.Lock()  # serializes put/get across coroutines
    async def put(self, items: list[RolloutState], task_name: str, **kwargs): ...
    async def get(self, batch_size: int, task_name: str, group_status: Status) -> list[RolloutState]: ...
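
A possible usage pattern is sketched below, assuming a trivial in-memory backend. `InMemoryStorage`, the tuple key, and the string value used for `group_status` are placeholders invented for this example; the PR itself uses `StorageIndices` and a `Status` enum.

```python
import asyncio
from collections import defaultdict, deque


class InMemoryStorage:
    """Hypothetical stand-in for a Storage backend, keyed by (task_name, group_status)."""

    def __init__(self):
        self._buckets = defaultdict(deque)

    async def put(self, items, key):
        self._buckets[key].extend(items)

    async def get(self, count, key):
        bucket = self._buckets[key]
        return [bucket.popleft() for _ in range(min(count, len(bucket)))]


class ReplayBuffer:
    def __init__(self, storage=None):
        self._storage = InMemoryStorage() if storage is None else storage
        self._lock = asyncio.Lock()

    async def put(self, items, task_name, group_status="completed"):
        async with self._lock:  # serialize access from concurrent coroutines
            await self._storage.put(items, (task_name, group_status))

    async def get(self, batch_size, task_name, group_status="completed"):
        async with self._lock:
            return await self._storage.get(batch_size, (task_name, group_status))


async def demo():
    buf = ReplayBuffer()
    # Two producers write concurrently into different task partitions.
    await asyncio.gather(
        buf.put(["m1", "m2"], task_name="math"),
        buf.put(["c1"], task_name="code"),
    )
    math_items = await buf.get(5, task_name="math")
    code_items = await buf.get(5, task_name="code")
    return math_items, code_items


math_items, code_items = asyncio.run(demo())
```

Each task only sees its own items, and asking for more items than are stored returns what is available rather than blocking (a design choice of this sketch, not necessarily of the PR).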

YanhuiDua changed the title from "[ReplayBuffer] add ReplayBuffer with various StorageBackend: FIFO, St…" to "[ReplayBuffer] add ReplayBuffer with various StorageBackend" on Feb 12, 2026
Copilot AI (Contributor) left a comment

Pull request overview

This PR introduces a new ReplayBuffer abstraction in xtuner/v1/rl/base with pluggable storage backends (e.g., FIFO and staleness-priority), plus initial unit tests for basic FIFO and staleness ordering behavior.

Changes:

  • Added ReplayBuffer, StorageBackend interface, and multiple backend implementations (FIFOStorageBackend, StalenessStorageBackend, plus stub/pseudocode backends).
  • Implemented StorageIndices to partition storage by (task_name, group_status, tags).
  • Added async unit tests covering FIFO behavior, staleness priority order, and multi-task isolation.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 12 comments.

| File | Description |
| --- | --- |
| xtuner/v1/rl/base/replay_buffer.py | Adds the replay buffer API and backend implementations (FIFO + staleness), with placeholder backends for future extensions. |
| tests/ray/test_replay_buffer.py | Adds async unit tests validating basic replay buffer behavior for FIFO and staleness backends. |

Comment thread xtuner/v1/rl/base/replay_buffer.py
        async with self._lock:
            await self._storage.put(items, indices)

    async def get(self, batch_size: int, task_name: str, group_status: Status, **kwargs) -> list[RolloutState]:
Collaborator

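Can we compute group_state ourselves from the items? I think it may be hard for users to decide this themselves. If this logic can be fixed, shouldn't we simply handle it on our side?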
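
One way the suggestion above could look is sketched here. The aggregation rule (a group takes the "worst" status among its items), the `Status` members, and their ordering are all assumptions for illustration; the PR does not specify how a group status would be derived.

```python
from enum import Enum


class Status(Enum):
    # Hypothetical members and severity ordering; the real Status enum may differ.
    COMPLETED = 1
    FAILED = 2
    ABORTED = 3


def derive_group_status(item_statuses):
    """Assumed rule: the group takes the most severe status among its items."""
    if not item_statuses:
        raise ValueError("cannot derive a status from an empty group")
    return max(item_statuses, key=lambda s: s.value)


all_done = derive_group_status([Status.COMPLETED, Status.COMPLETED])
one_aborted = derive_group_status([Status.COMPLETED, Status.ABORTED, Status.FAILED])
```

With a rule like this, `put` would no longer need the caller to pass a group status explicitly.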

@YanhuiDua YanhuiDua merged commit 366723b into InternLM:rl_design Feb 25, 2026
0 of 3 checks passed
return results


class FIFOBackend(NaiveStorage):
Collaborator

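The FIFO/Staleness logic sits alongside the SQL/Pandas/Naive logic rather than under it. This is not an inheritance relationship, so the two should be decoupled.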
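
The decoupling described in this comment could be approached with composition instead of inheritance, as in the sketch below. `ListStorage`, `fifo_policy`, `staleness_policy`, and the `Item` stand-in are hypothetical names, and the linear scan per retrieval is chosen for brevity rather than performance.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Item:
    # Stand-in for RolloutState; `staleness` is an assumed field name.
    uid: str
    staleness: int


# A policy picks the index of the next item to return from the stored list.
Policy = Callable[[List[Item]], int]


def fifo_policy(items: List[Item]) -> int:
    return 0  # oldest insertion first


def staleness_policy(items: List[Item]) -> int:
    # Largest staleness first; max() returns the earliest index on ties.
    return max(range(len(items)), key=lambda i: items[i].staleness)


class ListStorage:
    """Hypothetical storage that knows nothing about ordering; the policy decides."""

    def __init__(self, policy: Policy):
        self._items: List[Item] = []
        self._policy = policy

    def put(self, items: List[Item]) -> None:
        self._items.extend(items)

    def get(self, count: int) -> List[Item]:
        out = []
        while self._items and len(out) < count:
            out.append(self._items.pop(self._policy(self._items)))
        return out


data = [Item("a", 1), Item("b", 5), Item("c", 2)]
fifo_store = ListStorage(fifo_policy)
stale_store = ListStorage(staleness_policy)
fifo_store.put(data)
stale_store.put(data)
fifo_uids = [i.uid for i in fifo_store.get(2)]
stale_uids = [i.uid for i in stale_store.get(2)]
```

Because the policy is passed in, the same policy functions could in principle be paired with any storage class that exposes its items in a compatible way, without one class inheriting from the other.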

YanhuiDua added a commit that referenced this pull request Apr 27, 2026
* [ReplayBuffer] add ReplayBuffer with various StorageBackend: FIFO, Staleness, or Database(implement in the future)

* [ReplayBuffer] optimize implementation of ReplayBuffer

* fix comments: add NaiveStorage and take fifo/staleness as policy for getting item