
Update lerobot datasource to support multiple lerobot datasets #49

@shorbaji

Summary

Add multi-root support to LeRobotDatasource and read_lerobot. Both accept a single root or a list of roots. The output is a single dataset with a dataset_index column identifying which root each row came from; episode_index and index keep their per-root local values and are not offset across roots.

Key changes

  • root parameter widens to str | Path | list[str | Path]
  • One LeRobotDatasourceMetadata per root; validate that video_keys, fps, and feature names match across roots
  • _slice tags each range with a root_index; slicers themselves are unchanged
  • LeRobotReadTask takes segments, a list of (root_idx, start, end) tuples, plus the list of per-root metas
  • _read_fn loops over segments; each segment runs the existing pipeline independently
  • _build_batch appends dataset_index: int32
  • Single-root calls remain backward-compatible; dataset_index is always present in the output and is 0 for every row when there is only one root
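
The first two changes above (widening root and validating across roots) could be sketched roughly as follows. This is a minimal illustration, not the actual implementation: RootMeta is a hypothetical stand-in for LeRobotDatasourceMetadata, and normalize_roots and validate_compatible are made-up helper names. The comparison is order-sensitive for video_keys and feature_names, which is also an assumption.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Sequence, Union

RootArg = Union[str, Path, Sequence[Union[str, Path]]]


@dataclass(frozen=True)
class RootMeta:
    """Hypothetical stand-in for the per-root metadata object."""
    root: Path
    fps: int
    video_keys: tuple
    feature_names: tuple


def normalize_roots(root: RootArg) -> list:
    """Accept one root or a list of roots and always return a list of Paths."""
    if isinstance(root, (str, Path)):
        return [Path(root)]
    roots = [Path(r) for r in root]
    if not roots:
        raise ValueError("at least one root is required")
    return roots


def validate_compatible(metas: Sequence[RootMeta]) -> None:
    """Raise if any root disagrees with the first on fps, video keys, or feature names."""
    first = metas[0]
    for idx, meta in enumerate(metas[1:], start=1):
        for field in ("fps", "video_keys", "feature_names"):
            if getattr(meta, field) != getattr(first, field):
                raise ValueError(
                    f"root {idx} ({meta.root}) has {field}="
                    f"{getattr(meta, field)!r}, expected {getattr(first, field)!r}"
                )
```

Normalizing to a list up front would let the single-root path share the multi-root code, with the single root simply being index 0.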

Out of scope

No stats merging, no episode/row offset remapping, no new classes.
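
Because offsets are not remapped, index and episode_index values repeat across roots, and a row is only unique as the pair (dataset_index, index). The sketch below shows that behaviour with plain Python rows. The names tag_ranges and read_segments are hypothetical, the per-root rows are in-memory lists rather than real LeRobot data, and the end of each range is assumed to be exclusive.

```python
def tag_ranges(ranges_per_root):
    """Attach the root index to each (start, end) range produced for that root."""
    return [
        (root_idx, start, end)
        for root_idx, ranges in enumerate(ranges_per_root)
        for start, end in ranges
    ]


def read_segments(segments, rows_per_root):
    """Yield rows segment by segment, adding dataset_index and leaving local indices alone."""
    for root_idx, start, end in segments:
        for row in rows_per_root[root_idx][start:end]:
            # Copy the row so the source data is not mutated.
            yield {**row, "dataset_index": root_idx}


# Two toy roots; each numbers its rows from 0, as in the per-root local scheme.
rows_per_root = [
    [{"index": i, "episode_index": 0} for i in range(3)],
    [{"index": i, "episode_index": 0} for i in range(2)],
]
segments = tag_ranges([[(0, 3)], [(0, 2)]])
combined = list(read_segments(segments, rows_per_root))
```

In this toy run, index restarts at 0 for the second root, so downstream code that needs a global key would have to combine dataset_index with index itself.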

Acceptance criteria

  • read_lerobot(["/data/ds1", "/data/ds2"]) returns a single ray.data.Dataset
  • Each row has a dataset_index (int32) column
  • Single-root API unchanged
  • Tests cover multi-dataset round-trip
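
One possible shape for the multi-dataset round-trip check is sketched here. It operates on plain row dicts so it stays independent of the real reader; split_by_dataset_index and check_round_trip are hypothetical helpers, and the check assumes every root contributes at least one row and that each row carries an index field. Rows are sorted by index within each root because output order is not assumed.

```python
from collections import defaultdict


def split_by_dataset_index(rows):
    """Group combined rows by dataset_index, dropping that column from each row."""
    grouped = defaultdict(list)
    for row in rows:
        local = {k: v for k, v in row.items() if k != "dataset_index"}
        grouped[row["dataset_index"]].append(local)
    return dict(grouped)


def check_round_trip(rows, expected_per_root):
    """Assert that each root's rows come back unchanged under its own dataset_index."""
    grouped = split_by_dataset_index(rows)
    # Every root must be represented, and no unknown dataset_index may appear.
    assert sorted(grouped) == list(range(len(expected_per_root)))
    for root_idx, expected in enumerate(expected_per_root):
        got = sorted(grouped[root_idx], key=lambda r: r["index"])
        want = sorted(expected, key=lambda r: r["index"])
        assert got == want, f"root {root_idx} rows differ"
```

A real test would presumably build expected_per_root by reading each root on its own and then compare against the rows of the combined read.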
