Release v1.5.3: VLA Ops Enhancements; Ray Repartition Pipeline; Scalability & Robustness
Major Updates
📊 Stats: 14 PRs merged, from 8 contributors
📈 Code diff: 172 files changed, with 19,085 insertions and 2,144 deletions
🤖 VLA ops enhancements: Expanded embodied-AI / Vision-Language-Action processing capabilities with 10+ new and renamed operators — including new camera calibration methods (DeepCalib, DroidCalib, MoGe), atomic action segmentation, hand action computation & motion smoothing, clip reassembly, trajectory overlay, and LeRobot export — plus a complete VLA pipeline demo for ego-hand action annotation. #931
🔄 Ray repartition pipeline: A new ray_repartition_pipeline enables dataset-level block repartitioning in Ray mode, giving users fine-grained control over data distribution across workers. #985
⚡ Scalable Ray Data reads: Wired override_num_blocks through the full call chain, allowing users to control Ray Data's block parallelism via CLI — essential for processing PB-scale datasets without overwhelming the scheduler. #984
🧪 Test coverage expansion: Added 409 new test cases across 18 test files covering utils, ops, format, config, download, and pipeline DAG modules. #990
New OPs
- export_to_lerobot_mapper: Exports processed data into the LeRobot dataset format for downstream robot learning. #931
- video_atomic_action_segment_mapper: Segments videos into atomic actions for fine-grained action annotation. #931
- video_camera_calibration_deepcalib_mapper (renamed from video_camera_calibration_static_deepcalib_mapper): Computes camera intrinsics and FOV using DeepCalib. #931
- video_camera_calibration_droidcalib_mapper: Computes camera intrinsics and FOV using DroidCalib. #931
- video_camera_calibration_moge_mapper (renamed from video_camera_calibration_static_moge_mapper): Computes camera intrinsics and FOV using MoGe-2. #931
- video_camera_pose_megasam_mapper (renamed from video_camera_pose_mapper): Extracts camera poses using MegaSaM and MoGe-2. #931
- video_clip_reassembly_mapper: Reassembles video clips for flexible clip-level data organization. #931
- video_hand_action_compute_mapper: Computes hand action data from video for manipulation tasks. #931
- video_hand_motion_smooth_mapper: Smooths hand motion trajectories for cleaner action signals. #931
- video_trajectory_overlay_mapper: Overlays trajectory visualizations onto video frames for debugging and presentation. #931
- ray_repartition_pipeline: A Ray-only pipeline for dataset-level block repartitioning, registered in config_all.yaml and operator docs. #985
Enhancements
- override_num_blocks CLI argument for Ray Data: Previously implemented only at the lowest layer (read_json_stream()), this parameter is now wired through the full call chain, making it accessible via CLI for controlling block parallelism on very large datasets. #984
- num_proc handling for vllm and Ray mode: TextTaggingByPromptMapper was unconditionally setting num_proc = 1, which broke parallelism in Ray mode. Now properly respects the configured value. #973
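To illustrate what "wired through the full call chain" means for override_num_blocks, here is a minimal sketch using only the standard library. Everything in it is hypothetical: the flag spelling, load_dataset, and read_json_stream_stub are illustrative stand-ins and not the project's actual CLI or functions. The point is only that each layer forwards the optional value instead of dropping it.

```python
import argparse
from typing import List, Optional


def read_json_stream_stub(path: str, override_num_blocks: Optional[int] = None) -> dict:
    # Hypothetical stand-in for the lowest-layer reader. A real reader would
    # hand the value to the underlying Ray Data read call.
    return {"path": path, "override_num_blocks": override_num_blocks}


def load_dataset(path: str, override_num_blocks: Optional[int] = None) -> dict:
    # Hypothetical middle layer: forwards the option rather than discarding it.
    return read_json_stream_stub(path, override_num_blocks=override_num_blocks)


def parse_args(argv: List[str]) -> argparse.Namespace:
    parser = argparse.ArgumentParser()
    parser.add_argument("--dataset_path", required=True)
    # None means "let the reader choose its own block count".
    parser.add_argument("--override_num_blocks", type=int, default=None)
    return parser.parse_args(argv)


def run(argv: List[str]) -> dict:
    # Top layer: the CLI value travels unchanged down to the reader.
    args = parse_args(argv)
    return load_dataset(args.dataset_path, override_num_blocks=args.override_num_blocks)
```

Leaving the default as None keeps existing invocations unchanged, so only users who pass the option affect block parallelism.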
Fixed Bugs
- JSONStreamDatasource schema mismatch across batches: The first batch's inferred schema was locked and reused for all subsequent batches. When an early batch inferred a field as null and a later batch introduced a concrete type (e.g., string), the forced cast failed with ArrowInvalid. Schema is now unified across batches. #972
- OP env LATEST strategy returning unpinned version: The conflict resolution strategy incorrectly fell back to an unpinned version when the union of two conflicting specifiers contained ranges without an upper bound (e.g., numpy>=2.0 vs numpy<1.5). Now correctly resolves to a pinned version. #992
- FUSE-safe rmtree fallback missing in PartitionedRayExecutor: PR #943 fixed shutil.rmtree() failures on FUSE-mounted OSS buckets in RayExecutor, but the same pattern was missing in ray_executor_partitioned.py. All three rmtree sites now have the fallback. #988
- Deprecated model names in tests, demos, and docs: Replaced deprecated model names (e.g., qwen2.5-72b-instruct, qwen2.5-vl-3b-instruct) with available alternatives across test files, demo configs, and docstrings. #994
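The JSONStreamDatasource schema fix above is easier to picture with a toy model. The sketch below is pure Python and is not the project's code: it represents a schema as a mapping from field name to a type name, treats "null" as "no concrete value seen yet", and shows how merging per-batch schemas lets a later concrete type replace an earlier null instead of forcing a cast. The function names are illustrative assumptions.

```python
from typing import Dict, Iterable, List

# Field name -> type name; "null" means no concrete value has been seen yet.
Schema = Dict[str, str]


def unify_schemas(schemas: Iterable[Schema]) -> Schema:
    """Merge schemas, promoting "null" to the first concrete type seen."""
    unified: Schema = {}
    for schema in schemas:
        for name, type_name in schema.items():
            current = unified.get(name, "null")
            if current == "null":
                unified[name] = type_name
            elif type_name not in ("null", current):
                # Two different concrete types are a genuine conflict.
                raise TypeError(f"field {name!r}: cannot unify {current} with {type_name}")
    return unified


def infer_schema(batch: List[dict]) -> Schema:
    """Infer a schema for one batch from the Python types of its values."""
    return unify_schemas(
        {key: "null" if value is None else type(value).__name__ for key, value in row.items()}
        for row in batch
    )


batches = [
    [{"id": 1, "note": None}],  # early batch: "note" looks like null
    [{"id": 2, "note": "ok"}],  # later batch: "note" is a string
]
first_only = infer_schema(batches[0])
unified = unify_schemas(infer_schema(batch) for batch in batches)
```

Locking in first_only corresponds to the old behavior, where the string in the second batch would not fit the null-typed field. The unified schema records the concrete type, so both batches conform to it.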
Acknowledgements
- @macroguo-ghy contributed the new ray_repartition_pipeline for Ray mode. #985
- @ArdalanM fixed num_proc handling for vllm and Ray mode. #973
- @CodyQin fixed the OP env LATEST strategy version resolution bug. #992
- @wenhaozhao011-cmd added FUSE-safe rmtree fallback to PartitionedRayExecutor. #988
New Contributors
- @ArdalanM made their first contribution in #973
- @CodyQin made their first contribution in #992
- @wenhaozhao011-cmd made their first contribution in #988
Full Changelog: v1.5.2...v1.5.3