Release v1.5.3: VLA Ops Enhancements; Ray Repartition Pipeline; Scalability & Robustness

Released by @cmgzn on 29 Jun 02:11 (commit 85e2e8e)

Major Updates

📊 Stats: 14 PRs merged, from 8 contributors

📈 Code diff: 172 files changed, with 19,085 insertions and 2,144 deletions

🤖 VLA ops enhancements: Expanded embodied-AI / Vision-Language-Action (VLA) processing capabilities with 10+ new and renamed operators, including camera calibration ops (DeepCalib, DroidCalib, MoGe), atomic action segmentation, hand action computation and motion smoothing, clip reassembly, trajectory overlay, and LeRobot export, plus a complete VLA pipeline demo for ego-hand action annotation. #931

🔄 Ray repartition pipeline: A new ray_repartition_pipeline enables dataset-level block repartitioning in Ray mode, giving users fine-grained control over data distribution across workers. #985
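Conceptually, repartitioning gathers rows from unevenly sized blocks and redistributes them into a target number of blocks so that workers receive comparable amounts of data. The pure-Python sketch below only illustrates that idea; the `repartition` function is hypothetical, and how `ray_repartition_pipeline` is configured is not shown here. For the distributed case, Ray Data itself provides `Dataset.repartition(num_blocks)`.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def repartition(blocks: Sequence[Sequence[T]], num_blocks: int) -> List[List[T]]:
    """Hypothetical helper: redistribute rows into `num_blocks` blocks whose
    sizes differ by at most one row, preserving the original row order."""
    if num_blocks < 1:
        raise ValueError("num_blocks must be >= 1")
    rows = [row for block in blocks for row in block]  # flatten, keep order
    base, extra = divmod(len(rows), num_blocks)
    out: List[List[T]] = []
    start = 0
    for i in range(num_blocks):
        size = base + (1 if i < extra else 0)  # first `extra` blocks get one more row
        out.append(list(rows[start:start + size]))
        start += size
    return out


# Skewed input: one large block and two tiny ones.
skewed = [list(range(10)), [10], [11]]
balanced = repartition(skewed, 4)
print([len(b) for b in balanced])  # [3, 3, 3, 3]
```

Ray's own implementation works on distributed blocks and does not necessarily guarantee this exact size split; the sketch just shows why repartitioning evens out per-worker load.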

Scalable Ray Data reads: Wired override_num_blocks through the full call chain, allowing users to control Ray Data's block parallelism via CLI — essential for processing PB-scale datasets without overwhelming the scheduler. #984
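"Wiring a parameter through the full call chain" means every layer between the CLI and the reader accepts the option and passes it down instead of silently dropping it. The sketch below assumes a three-layer chain; `main`, `load_dataset`, and the `--dataset_path` flag are hypothetical names, `read_json_stream` is modeled loosely on the function named in these notes, and the assumption that the option surfaces as `--override_num_blocks` on the command line is ours. Instead of calling Ray, the reader returns the keyword arguments it would forward.

```python
import argparse
from typing import Any, Dict, List, Optional


def read_json_stream(paths: List[str], override_num_blocks: Optional[int] = None) -> Dict[str, Any]:
    """Hypothetical lowest layer. Returns the kwargs it would hand to Ray Data."""
    read_kwargs: Dict[str, Any] = {"paths": paths}
    # Forward the option only when the user set it, so Ray Data keeps its
    # default block-count heuristic otherwise.
    if override_num_blocks is not None:
        read_kwargs["override_num_blocks"] = override_num_blocks
    # A real reader would call something like ray.data.read_json(**read_kwargs).
    return read_kwargs


def load_dataset(path: str, override_num_blocks: Optional[int] = None) -> Dict[str, Any]:
    """Hypothetical middle layer: passes the option down instead of dropping it."""
    return read_json_stream([path], override_num_blocks=override_num_blocks)


def main(argv: Optional[List[str]] = None) -> Dict[str, Any]:
    """Hypothetical CLI entry point exposing the option as a flag."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--dataset_path", required=True)
    parser.add_argument("--override_num_blocks", type=int, default=None)
    args = parser.parse_args(argv)
    return load_dataset(args.dataset_path, override_num_blocks=args.override_num_blocks)


print(main(["--dataset_path", "data.jsonl", "--override_num_blocks", "2000"]))
# {'paths': ['data.jsonl'], 'override_num_blocks': 2000}
```

Leaving the default as `None` and forwarding only explicit values keeps existing behavior unchanged for users who never set the flag.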

🧪 Test coverage expansion: Added 409 new test cases across 18 test files covering utils, ops, format, config, download, and pipeline DAG modules. #990

New OPs

  • export_to_lerobot_mapper: Exports processed data into the LeRobot dataset format for downstream robot learning. #931
  • video_atomic_action_segment_mapper: Segments videos into atomic actions for fine-grained action annotation. #931
  • video_camera_calibration_deepcalib_mapper (renamed from video_camera_calibration_static_deepcalib_mapper): Computes camera intrinsics and FOV using DeepCalib. #931
  • video_camera_calibration_droidcalib_mapper: Computes camera intrinsics and FOV using DroidCalib. #931
  • video_camera_calibration_moge_mapper (renamed from video_camera_calibration_static_moge_mapper): Computes camera intrinsics and FOV using MoGe-2. #931
  • video_camera_pose_megasam_mapper (renamed from video_camera_pose_mapper): Extracts camera poses using MegaSaM and MoGe-2. #931
  • video_clip_reassembly_mapper: Reassembles video clips for flexible clip-level data organization. #931
  • video_hand_action_compute_mapper: Computes hand action data from video for manipulation tasks. #931
  • video_hand_motion_smooth_mapper: Smooths hand motion trajectories for cleaner action signals. #931
  • video_trajectory_overlay_mapper: Overlays trajectory visualizations onto video frames for debugging and presentation. #931
  • ray_repartition_pipeline: A Ray-only pipeline for dataset-level block repartitioning, registered in config_all.yaml and operator docs. #985
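These notes do not say which smoothing method video_hand_motion_smooth_mapper uses. As a stand-in, a centered moving average illustrates the general idea of damping frame-to-frame jitter in a hand trajectory; the `smooth_trajectory` function and the per-frame (x, y, z) point layout are assumptions, not the op's actual interface.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]  # assumed per-frame hand position (x, y, z)


def smooth_trajectory(points: Sequence[Point], window: int = 5) -> List[Point]:
    """Centered moving average; the window shrinks near the start and end."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    n = len(points)
    out: List[Point] = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        seg = points[lo:hi]
        k = len(seg)
        # Average each coordinate over the frames inside the window.
        out.append((
            sum(p[0] for p in seg) / k,
            sum(p[1] for p in seg) / k,
            sum(p[2] for p in seg) / k,
        ))
    return out


# A single-frame spike in y gets spread out and attenuated.
noisy = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 9.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
smoothed = smooth_trajectory(noisy, window=3)
print([p[1] for p in smoothed])  # [0.0, 3.0, 3.0, 3.0, 0.0]
```

A moving average is the simplest choice; production smoothers often prefer filters that lag less (for example Savitzky-Golay or Kalman-style filters), which is one reason not to read this sketch as the op's implementation.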

Enhancements

  • override_num_blocks CLI argument for Ray Data: Previously implemented only at the lowest layer (read_json_stream()), this parameter is now wired through the full call chain, making it accessible via CLI for controlling block parallelism on very large datasets. #984
  • num_proc handling for vLLM and Ray mode: TextTaggingByPromptMapper was unconditionally setting num_proc = 1, which broke parallelism in Ray mode. It now respects the configured value. #973
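The difference between the old and fixed behavior for TextTaggingByPromptMapper comes down to whether the constructor overwrites the configured value. The class below is a hypothetical stand-in, not the real op; in particular, falling back to 1 when nothing is configured is an assumption.

```python
from typing import Optional


class PromptTaggingMapper:
    """Hypothetical stand-in for a vLLM-backed prompt tagging op."""

    def __init__(self, num_proc: Optional[int] = None, use_vllm: bool = False):
        self.use_vllm = use_vllm
        # Old pattern: `self.num_proc = 1` regardless of configuration,
        # which serialized the op even when Ray could run it in parallel.
        # Sketched fix: keep the configured value and fall back to 1 only
        # when nothing was configured (the fallback of 1 is an assumption).
        self.num_proc = num_proc if num_proc is not None else 1


mapper = PromptTaggingMapper(num_proc=8, use_vllm=True)
print(mapper.num_proc)  # 8
```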

Fixed Bugs

  • JSONStreamDatasource schema mismatch across batches: The first batch's inferred schema was locked and reused for all subsequent batches. When an early batch inferred a field as null and a later batch introduced a concrete type (e.g., string), the forced cast failed with ArrowInvalid. Schema is now unified across batches. #972
  • OP env LATEST strategy returning unpinned version: The conflict resolution strategy incorrectly fell back to an unpinned version when the union of two conflicting specifiers contained ranges without an upper bound (e.g., numpy>=2.0 vs numpy<1.5). It now correctly resolves to a pinned version. #992
  • FUSE-safe rmtree fallback missing in PartitionedRayExecutor: PR #943 fixed shutil.rmtree() failures on FUSE-mounted OSS buckets in RayExecutor, but the same pattern was missing in ray_executor_partitioned.py. All three rmtree call sites in that file now have the fallback. #988
  • Deprecated model names in tests, demos, and docs: Replaced deprecated model names (e.g., qwen2.5-72b-instruct, qwen2.5-vl-3b-instruct) with available alternatives across test files, demo configs, and docstrings. #994
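The JSONStreamDatasource fix replaces "lock in the first batch's schema" with merging schemas across batches, so a field first seen as null can later be promoted to a concrete type. The sketch below models schemas as plain dicts of field name to type name; `unify_schemas` here is a hypothetical pure-Python helper. PyArrow offers a comparable merge in `pyarrow.unify_schemas`, where by default only null-typed fields may be promoted, but these notes do not say whether the fix uses it.

```python
from typing import Dict, Iterable

Schema = Dict[str, str]  # field name -> type name; "null" = only nulls seen so far


def unify_schemas(schemas: Iterable[Schema]) -> Schema:
    """Merge per-batch schemas, promoting null-typed fields to concrete types."""
    unified: Schema = {}
    for schema in schemas:
        for field, typ in schema.items():
            prev = unified.get(field)
            if prev is None or prev == "null":
                unified[field] = typ  # first sighting, or promote null -> concrete
            elif typ != "null" and typ != prev:
                # Two different concrete types cannot be reconciled here.
                raise TypeError(f"field {field!r}: cannot unify {prev} with {typ}")
    return unified


# Batch 1 saw only nulls for "caption"; batch 2 has real strings.
batches = [
    {"id": "int64", "caption": "null"},
    {"id": "int64", "caption": "string"},
]
unified = unify_schemas(batches)
print(unified)  # {'id': 'int64', 'caption': 'string'}
```

Under the old behavior, batch 2 would have been forced into batch 1's `null` type for `caption`, which is the kind of cast that surfaces as ArrowInvalid.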

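A FUSE-safe rmtree fallback generally follows a "try the fast path, then delete manually" shape. The exact fallback from PR #943 is not described in these notes, so the `safe_rmtree` helper below is a hypothetical sketch: it tries `shutil.rmtree` first and, on any `OSError` (such as the failures seen on FUSE-mounted buckets), walks the tree bottom-up and removes entries one at a time, tolerating entries that have already disappeared.

```python
import os
import shutil
import tempfile


def safe_rmtree(path: str) -> None:
    """Hypothetical helper: rmtree with a manual bottom-up fallback."""
    try:
        shutil.rmtree(path)
        return
    except OSError:
        pass  # e.g. errors from a FUSE-mounted object-storage bucket
    # Fallback: remove files, then directories, deepest first.
    for root, dirs, files in os.walk(path, topdown=False):
        for name in files:
            try:
                os.remove(os.path.join(root, name))
            except FileNotFoundError:
                pass
        for name in dirs:
            full = os.path.join(root, name)
            try:
                if os.path.islink(full):
                    os.remove(full)  # never descend into symlinked directories
                else:
                    os.rmdir(full)
            except FileNotFoundError:
                pass
    try:
        os.rmdir(path)
    except FileNotFoundError:
        pass


# Usage: build a small nested tree and remove it.
tmp = tempfile.mkdtemp()
os.makedirs(os.path.join(tmp, "a", "b"))
with open(os.path.join(tmp, "a", "b", "part-0.json"), "w") as f:
    f.write("{}")
safe_rmtree(tmp)
print(os.path.exists(tmp))  # False
```

Errors other than a missing entry still propagate from the fallback, so a genuinely undeletable tree is reported rather than silently ignored.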
Acknowledgements

New Contributors

Full Changelog: v1.5.2...v1.5.3