feat: add script to segment lerobot dataset #127
Pull request overview
This pull request adds a new script to segment a LeRobot dataset episode into multiple smaller episodes in a new output dataset. The implementation supports both v2.0 and v2.1 input formats and always outputs v2.1 format.
Changes:
- Added `src/opentau/scripts/segment_lerobot_dataset.py` script to segment episodes by frame ranges
- Added comprehensive test suite in `tests/datasets/test_segment_lerobot_dataset.py` with three test cases covering v2.1 input, v2.0->v2.1 conversion, and edge cases
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| src/opentau/scripts/segment_lerobot_dataset.py | Main segmentation script with CLI argument parsing, dataset validation, parquet data slicing, metadata creation, and video/task handling |
| tests/datasets/test_segment_lerobot_dataset.py | Test suite with three comprehensive tests validating segmentation logic, version conversion, and edge cases like overlapping/non-consecutive segments |
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
WilliamYue37 left a comment
Code looks good to me, but I think it could be better to have the user input a file with the segment start:end. For long recordings, adding an extra argument on the command line can get tedious/overwhelming.
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
WilliamYue37 left a comment
Maybe we should have a segments.json example file in the examples folder.
What this does
Add functionality to segment one or more lerobot episodes into multiple episodes in a new dataset.
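As a rough illustration of the segments-file format used in the testing notes, the sketch below parses that JSON and plans one output episode per segment. It is not the code from `segment_lerobot_dataset.py`: the helper names `load_segments` and `plan_output_episodes` are hypothetical, and treating each `[start, end]` pair as a half-open frame range (end exclusive) is an assumption, since the description does not say whether `end` is inclusive.

```python
import json


def load_segments(text):
    """Parse a segments JSON string into {episode_index: [(start, end), ...]}.

    Hypothetical helper. Keys are episode indices stored as strings
    (JSON object keys are always strings), values are lists of [start, end] pairs.
    """
    raw = json.loads(text)
    segments = {}
    for episode_key, ranges in raw.items():
        episode_index = int(episode_key)
        parsed = []
        for start, end in ranges:
            # Assumption: a segment must be non-empty and start at a valid frame.
            if not (0 <= start < end):
                raise ValueError(
                    f"invalid segment [{start}, {end}] for episode {episode_index}"
                )
            parsed.append((start, end))
        segments[episode_index] = parsed
    return segments


def plan_output_episodes(segments):
    """Assign consecutive new episode indices, one per segment.

    Overlapping segments are allowed; each one becomes its own episode.
    """
    plan = []
    for episode_index in sorted(segments):
        for start, end in segments[episode_index]:
            plan.append(
                {
                    "new_episode": len(plan),
                    "source_episode": episode_index,
                    "start": start,
                    "end": end,
                    # Assumption: end is exclusive, so length is end - start.
                    "length": end - start,
                }
            )
    return plan


example = '{"0": [[5, 15], [10, 20]], "1": [[0, 30]]}'
plan = plan_output_episodes(load_segments(example))
for row in plan:
    print(row)
```

With the example file, source episode 0 yields two overlapping output episodes and source episode 1 yields one, so the plan contains three entries.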
How it was tested
where `/tmp/segments.json` reads `{"0": [[5, 15], [10, 20]], "1": [[0, 30]]}`
How to checkout & try? (for the reviewer)
Run CPU tests and
where `/tmp/segments.json` reads `{"0": [[5, 15], [10, 20]], "1": [[0, 30]]}`
Checklist