Building a Video Processing Pipeline with Ray

⏱️ Time to complete: 1 hr

In this workshop, we build a multimodal video curation pipeline with Ray Data on Anyscale. It turns raw videos into clean, semantically annotated clip datasets in a single streaming pipeline where CPU and GPU stages run concurrently with automatic backpressure.

Pipeline

Videos are streamed directly from the HuggingFaceFV/finevideo dataset, eliminating the need for local prefetching. Each video is split on the fly into multiple clips, which are then streamed, processed, and written to Parquet.

HF parquet (mp4 bytes)
    |
    +--flat_map(process_video_bytes)    # 1 video -> ~10 clips
    |     scene detect + quality filter + keyframe extraction (fused)
    |
    +--vLLMEngineProcessor              # 1:1, attaches category/is_safe/desc
    |     Qwen2.5-VL-3B, one replica per GPU
    |
    +--filter(is_safe)                  # drops unsafe rows
    |
    +--map_batches(CLIPEmbedder)        # 1:1, attaches 512-d embedding
    |     CLIP ViT-B/32 on CPU actor pool
    |
    +--write_parquet                    # /mnt/cluster_storage/...
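
The row counts in the diagram can be illustrated without Ray. The sketch below uses plain Python generators and is not the code in stages.py: split_into_clips, annotate, and embed are hypothetical stand-ins that return dummy values, and the three clips per video and the "odd clips are unsafe" rule are arbitrary choices made only so that each stage has a visible effect.

```python
from typing import Any, Dict, Iterable, Iterator, List

Row = Dict[str, Any]


def split_into_clips(video: Row, clips_per_video: int = 3) -> Iterator[Row]:
    """Hypothetical stand-in for the flat_map stage: one video in, several clips out."""
    for i in range(clips_per_video):
        yield {"video_id": video["video_id"], "clip_index": i}


def annotate(clip: Row) -> Row:
    """Hypothetical stand-in for the annotation stage: 1:1, adds fields to the row."""
    # Dummy rule so the filter has something to drop: odd-numbered clips are "unsafe".
    return {**clip, "is_safe": clip["clip_index"] % 2 == 0, "category": "demo"}


def embed(batch: List[Row]) -> List[Row]:
    """Hypothetical stand-in for the embedding stage: 1:1 over a batch of rows."""
    return [{**row, "embedding": [0.0] * 512} for row in batch]


def run(videos: Iterable[Row], batch_size: int = 2) -> Iterator[Row]:
    """Chain the stages lazily so each row flows through without materializing a stage."""
    clips = (clip for video in videos for clip in split_into_clips(video))
    annotated = (annotate(clip) for clip in clips)
    safe = (row for row in annotated if row["is_safe"])

    # Group surviving rows into batches before embedding, as map_batches would.
    batch: List[Row] = []
    for row in safe:
        batch.append(row)
        if len(batch) == batch_size:
            yield from embed(batch)
            batch = []
    if batch:
        yield from embed(batch)


rows = list(run([{"video_id": "a"}, {"video_id": "b"}]))
```

With these dummy rules, two videos produce six clips, and four rows remain after the filter, each carrying a 512-element embedding.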

The processing functions for every pipeline stage are defined in stages.py. The notebook imports them and constructs the Ray Data pipeline incrementally, adding one stage at a time.
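
As a rough sketch of how such a chain can be assembled one stage at a time, the function below applies the stages from the diagram to a dataset object. It is not the notebook's code: build_curation_pipeline and its parameter names are hypothetical, the stage callables are passed in rather than imported from stages.py, and the concurrency and batch size defaults are placeholders. It assumes `ds` is a ray.data.Dataset and that the annotation stage is a callable that takes a dataset and returns one, which is how a processor built with ray.data.llm's build_llm_processor is applied.

```python
from typing import Any, Callable, Dict, Iterable

Row = Dict[str, Any]


def build_curation_pipeline(
    ds: Any,
    process_video_bytes: Callable[[Row], Iterable[Row]],
    annotate_stage: Callable[[Any], Any],
    embedder_cls: type,
    num_embedders: int = 2,
    batch_size: int = 16,
) -> Any:
    """Chain the transformation stages onto `ds`, assumed to be a ray.data.Dataset.

    All arguments are supplied by the caller; nothing here is taken from stages.py.
    """
    # Stage 1: one video row in, many clip rows out.
    ds = ds.flat_map(process_video_bytes)

    # Stage 2: 1:1 annotation. The stage is applied by calling it on the dataset.
    ds = annotate_stage(ds)

    # Stage 3: keep only rows the annotation stage marked as safe.
    ds = ds.filter(lambda row: bool(row["is_safe"]))

    # Stage 4: 1:1 embedding. Passing a class runs it on a pool of long-lived
    # actors, so each actor can load the model once and reuse it across batches.
    ds = ds.map_batches(
        embedder_cls,
        concurrency=num_embedders,
        batch_size=batch_size,
    )
    return ds
```

Ray Data transformations are lazy, so calling this function only builds the plan. Execution starts when the result is consumed, for example by write_parquet.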

The key idea is streaming execution with heterogeneous resources. Traditional staged pipelines run one stage at a time, so GPUs sit idle during CPU stages. This pipeline chains all five stages so that CPU and GPU work run concurrently.

Ray Data executes each operation on the specified compute type, streams data block-by-block between operations, and applies backpressure automatically.
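
The principle behind backpressure can be shown with a bounded buffer between a fast producer and a slow consumer. This is a minimal standard-library sketch, not Ray Data's mechanism, which is more involved; run_bounded_pipeline, the buffer size of 4, and the sleep that simulates a slow downstream stage are all assumptions made for illustration.

```python
import queue
import threading
import time
from typing import List, Tuple


def run_bounded_pipeline(num_items: int = 20, max_buffered: int = 4) -> Tuple[List[int], int]:
    """Run a fast producer and a slow consumer joined by a bounded queue.

    Returns the consumed items and the largest buffer size observed.
    """
    buffer: queue.Queue = queue.Queue(maxsize=max_buffered)
    sentinel = object()  # marks the end of the stream

    def producer() -> None:
        for i in range(num_items):
            # put() blocks while the buffer is full, so the fast upstream
            # stage is throttled to the pace of the slow downstream stage.
            buffer.put(i)
        buffer.put(sentinel)

    thread = threading.Thread(target=producer)
    thread.start()

    consumed: List[int] = []
    peak_buffered = 0
    while True:
        peak_buffered = max(peak_buffered, buffer.qsize())
        item = buffer.get()
        if item is sentinel:
            break
        time.sleep(0.001)  # simulate a slow downstream stage
        consumed.append(item)

    thread.join()
    return consumed, peak_buffered


consumed, peak_buffered = run_bounded_pipeline()
```

Every item arrives in order, and the number of buffered items never exceeds max_buffered, however much faster the producer is. Without the bound, the producer would finish immediately and the whole stream would sit in memory.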

How to run

Open ray_data_video_curation_pipeline.ipynb in your Anyscale workspace and execute each cell sequentially from start to finish.

Running at Production Scale

This pipeline can be run as an Anyscale Job at production scale. Follow the step-by-step tutorial here:

Anyscale Job Tutorial: Streaming Video Curation

References & Resources

Ray Data Documentation: Learn about Ray Data, its features, and pipeline construction.
Ray Data LLM API Guide: Official documentation for running LLM-based data operations with Ray Data.
HuggingFaceFV/finevideo Dataset: The open video dataset used for this pipeline.
Qwen2.5-VL-3B-Instruct Model: Multimodal model for video understanding and annotation.
OpenAI CLIP ViT-B/32 Model: Used for generating high-dimensional video clip embeddings.
