Release v1.2.0

github-actions released this 05 Mar 03:40


619401a

Added

Composable pipeline API via CurationPhase and PipelineBuilder for declarative pipeline construction
OpenAI-compatible API captioning stage for using external LLM endpoints
LazyData for zero-copy split-field pipeline transport, reducing memory overhead
Automatic CPU and memory profiling for all pipeline stages
Stage replay for re-running individual stages without full pipeline re-execution
Unified write abstraction for local and remote storage
Multi-camera splitting pipeline (data model, task creation, download/remux, frame extraction, clip transcoding, clip writer, and summary writer)
ARM64 CLI and container build support
GB200 support for loading Qwen3-VL-235B
Optional Ray token authentication
Upgrade vLLM to 0.15.1
Upgrade cosmos-xenna to 0.2.0
Upgrade ffmpeg to 8.0.1
QwenVideoClassifier stage for video classification using Qwen VL
Remove flash-attn dependency in favor of PyTorch SDPA

Fixed

Critical: fix caption ordering bug in inflight batching. When inflight batching was enabled (the default), captions could be assigned to the wrong videos. The bug was introduced in v1.1.5, was dormant in v1.1.6 (inflight batching temporarily removed), and has been active in v1.1.7–v1.1.11. If you used VLM captioning with any of those releases, captions may be mismatched. Upgrade to v1.2.0 and re-run affected captioning jobs.
Enforce exact --limit semantics for storage listings and add num_input_videos_selected metric
Reset LazyData.nbytes on drop and eliminate tobytes copy in upload path
Update conda environment name from vllm to unified in Qwen filter stages
Harden NVCF split benchmark retries and count validation
Resolve Docker build failures from NVIDIA wheel timeouts and file permissions
Check for remote mounts in curator_submit
Handle clips with no stream
Pin setuptools<81 to preserve pkg_resources for ngcsdk
Add minimum version constraints for typer dependency
Ensure split_video_into_windows returns equal-length lists
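The caption ordering fix above reflects a general hazard of inflight batching: requests can finish in a different order than they were submitted, so each result must be matched to its input by a key or index rather than by position. The following is a minimal, hypothetical sketch of that failure mode using asyncio. The names fake_caption, caption_buggy, and caption_fixed are invented for illustration and do not reflect the project's actual code or the actual fix.

```python
import asyncio
import random


async def fake_caption(index: int, video_id: str) -> tuple[int, str]:
    """Hypothetical stand-in for a VLM captioning request.

    The random sleep makes requests complete out of submission order,
    which is what inflight batching permits.
    """
    await asyncio.sleep(random.uniform(0, 0.01))
    return index, f"caption for {video_id}"


async def caption_buggy(video_ids: list[str]) -> dict[str, str]:
    coros = [fake_caption(i, v) for i, v in enumerate(video_ids)]
    completed = []
    for fut in asyncio.as_completed(coros):
        _, text = await fut
        completed.append(text)
    # BUG: pairs inputs with outputs in completion order, not submission order,
    # so a caption can land on the wrong video.
    return dict(zip(video_ids, completed))


async def caption_fixed(video_ids: list[str]) -> dict[str, str]:
    coros = [fake_caption(i, v) for i, v in enumerate(video_ids)]
    captions = [None] * len(video_ids)
    for fut in asyncio.as_completed(coros):
        index, text = await fut
        # Place each result at the index of the request that produced it.
        captions[index] = text
    return dict(zip(video_ids, captions))


if __name__ == "__main__":
    ids = [f"video_{i}" for i in range(10)]
    print(asyncio.run(caption_buggy(ids)))  # order-dependent; often mismatched
    print(asyncio.run(caption_fixed(ids)))  # each caption matches its video
```

Carrying the request index through the call is one simple option; keying results by a stable video ID works equally well.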

Documentation

Add Ray Data runner design document
Update end user guide
