# Video Pipeline Tutorial with NeMo Curator

This notebook demonstrates how to use NeMo Curator's video curation pipeline to process videos, extract clips, generate embeddings, and create captions.

## Table of Contents
1. [Installation and Setup](#installation-and-setup)
2. [Understanding the Video Pipeline](#understanding-the-video-pipeline)
3. [Running the Basic Example](#running-the-basic-example)
4. [Running the Advanced Example](#running-the-advanced-example)
5. [Pipeline Parameters Explained](#pipeline-parameters-explained)
6. [Example Usage Scenarios](#example-usage-scenarios)
7. [Interactive End-to-End Example](#interactive-end-to-end-example)
8. [Video Deduplication Pipeline](#video-deduplication-pipeline)
9. [Summary](#summary)

---


## Installation and Setup

### Prerequisites

Before running the video pipeline, ensure you have:

- **NVIDIA GPU** with Volta™ or higher (compute capability 7.0+)
- **CUDA 12 or above**
- **FFmpeg 7+** (will be installed using the provided script)

### System Requirements

- **Memory**: 16GB+ RAM for basic processing
- **GPU Memory**: 16GB+ VRAM recommended (up to 38GB for full pipeline with captions)
- **Storage**: Sufficient space for input videos and output clips

### Installation Steps

1. **Install FFmpeg:**
First, install FFmpeg using the provided installation script:
```bash
# Download and run the FFmpeg installation script
curl -O https://raw.githubusercontent.com/NVIDIA-NeMo/Curator/main/docker/common/install_ffmpeg.sh
chmod +x install_ffmpeg.sh
./install_ffmpeg.sh
```

2. **Install UV (if not already installed):**
UV is a Python package installer and resolver that is significantly faster than pip:
```bash
# Install UV package manager
curl -LsSf https://astral.sh/uv/install.sh | sh
# Or on Windows: powershell -c "irm https://astral.sh/uv/install.ps1 | iex"
```

3. **Create and activate a virtual environment with UV:**
```bash
uv venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
```

4. **Install NeMo Curator with video support using UV:**
```bash
uv pip install "nemo-curator[video,video_cuda]"
```

5. **Verify installation:**
```bash
python -c "import nemo_curator; print('Installation successful!')"
```
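
Beyond the import check, it can help to confirm the FFmpeg prerequisite from Python. The following is a minimal sketch using only the standard library. The helper names `parse_ffmpeg_major` and `check_ffmpeg` are illustrative and not part of NeMo Curator, and the parsing assumes a release-style version string such as `ffmpeg version 7.0.2`; development builds with other version formats are reported as unknown.

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_ffmpeg_major(version_line: str) -> Optional[int]:
    """Return the major version from the first line of `ffmpeg -version`, or None."""
    # Assumes a release-style string, e.g. "ffmpeg version 7.0.2 ..." or "ffmpeg version n7.1 ...".
    match = re.search(r"ffmpeg version n?(\d+)\.", version_line)
    return int(match.group(1)) if match else None


def check_ffmpeg(min_major: int = 7) -> bool:
    """Return True if ffmpeg is on PATH and its major version is at least min_major."""
    path = shutil.which("ffmpeg")
    if path is None:
        return False
    result = subprocess.run([path, "-version"], capture_output=True, text=True, check=False)
    first_line = result.stdout.splitlines()[0] if result.stdout else ""
    major = parse_ffmpeg_major(first_line)
    return major is not None and major >= min_major


if __name__ == "__main__":
    print("FFmpeg 7+ available:", check_ffmpeg())
```

If this prints `False`, re-run the FFmpeg installation step and check that the install location is on your `PATH`.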

### Download Required Models

The video pipeline requires several pre-trained models (e.g. [Cosmos Embed](https://huggingface.co/nvidia/Cosmos-Embed1-448p)). Models will be downloaded automatically based on the selected stages.


## Understanding the Video Pipeline

NeMo Curator's video pipeline is built on a **stage-based architecture** where each stage performs a specific processing step:

### Core Components

1. **Pipelines**: Ordered sequences of stages forming an end-to-end workflow
2. **Stages**: Individual processing units that perform single steps
3. **Tasks**: Data units that flow through the pipeline (`VideoTask` containing `Video` and `Clip` objects)
4. **Executors**: Components that run pipelines on distributed backends (Ray)

### Pipeline Stages

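The video pipeline includes the stages below. Only **VideoReader** is required; select the others based on your needs (include **ClipWriterStage** whenever you want clips and metadata saved):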

1. **VideoReader**: Reads video files and extracts metadata
2. **Splitting Algorithm**: 
   - **Fixed Stride**: Splits videos into fixed-length clips
   - **TransNetV2**: Uses AI to detect scene transitions for intelligent splitting ([GitHub](https://github.com/soCzech/TransNetV2))
3. **ClipTranscodingStage**: Converts clips to standardized format
4. **MotionFilterStage**: Filters clips based on motion content
5. **ClipAestheticFilterStage**: Filters clips based on aesthetic quality using a [CLIP](https://openai.com/research/clip) model
6. **Embedding Generation**: Creates vector embeddings for similarity search
   - **Cosmos-Embed1**: NVIDIA's state-of-the-art video embedding model (224p, 336p, 448p variants) ([Hugging Face](https://huggingface.co/nvidia/Cosmos-Embed1-448p))
   - **InternVideo2**: Advanced video understanding model for comprehensive embeddings ([GitHub](https://github.com/OpenGVLab/InternVideo2))
7. **Caption Generation**: Generates text descriptions of video content using the [Qwen-VL](https://huggingface.co/Qwen/Qwen-VL) model
8. **Caption Enhancement**: Refines generated captions using [Qwen-LM](https://huggingface.co/Qwen/Qwen2.5-7B)
9. **ClipWriterStage**: Saves processed clips and metadata

### Data Flow

```
Input Videos → VideoReader → Splitting → Transcoding → Filtering → Embeddings → Captions → Caption Enhancement → Output
```

**Note**: All stages except VideoReader are optional. Common configurations include:
- **Basic**: VideoReader → Splitting → Transcoding → Output
- **With Quality Control**: Add Motion/Aesthetic filtering
- **With AI Features**: Add Embedding generation and/or Caption generation
- **Full Pipeline**: Include all stages for comprehensive video processing


### Running the Basic Example

The basic example is available in [`video_read_example.py`](https://github.com/NVIDIA-NeMo/Curator/blob/main/tutorials/video/getting-started/video_read_example.py). To run it:

```bash
python video_read_example.py --video-folder /path/to/your/videos --video-limit 5 --verbose
```

**Parameters:**
- `--video-folder`: Path to directory containing video files
- `--video-limit`: Maximum number of videos to process (-1 for unlimited)
- `--verbose`: Enable detailed logging

**What it does:**
- Reads video files from the specified directory
- Extracts metadata (duration, framerate, resolution, etc.)
- Processes videos in parallel using Ray
- Provides detailed logging of the process


### Running the Advanced Example

The comprehensive video processing example is available in [`video_split_clip_example.py`](https://github.com/NVIDIA-NeMo/Curator/blob/main/tutorials/video/getting-started/video_split_clip_example.py).

Key features of the comprehensive pipeline:
- Video reading and metadata extraction
- Multiple splitting algorithms (Fixed Stride and TransNetV2)
- Clip transcoding with various encoders
- Motion and aesthetic filtering
- Embedding generation (Cosmos-Embed1, InternVideo2)
- Caption generation (Qwen)
- Preview generation
- Flexible output options

To run the comprehensive video processing pipeline, use the provided script:

```bash
python video_split_clip_example.py \
    --video-dir /path/to/your/videos \
    --model-dir /path/to/models \
    --output-clip-path /path/to/output/clips \
    --splitting-algorithm fixed_stride \
    --generate-embeddings \
    --video-limit 5 \
    --verbose
```

**Key Parameters:**
- `--video-dir`: Input video directory
- `--model-dir`: Model directory (uses cache if not specified)
- `--output-clip-path`: Output directory for processed clips
- `--splitting-algorithm`: Choose between "fixed_stride" or "transnetv2"
- `--generate-embeddings`: Enable embedding generation
- `--generate-captions`: Enable caption generation
- `--aesthetic-threshold`: Filter clips by aesthetic score (e.g., 3.5)
- `--motion-filter`: Motion filtering mode ("disable", "enable", "score-only")


## Pipeline Parameters Explained

### Splitting Algorithms

#### Fixed Stride Splitting
**What it does**: Splits videos into clips of fixed duration at regular intervals.
- **Parameters**:
  - `--fixed-stride-split-duration`: Duration of each clip in seconds (default: 10.0)
  - `--fixed-stride-min-clip-length-s`: Minimum clip length in seconds (default: 2.0)
  - `--limit-clips`: Maximum clips per video (0 = unlimited)
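
The interaction of these three parameters can be sketched in a few lines of Python. This is an illustration, not the library's implementation: the function name `fixed_stride_spans` is hypothetical, and the handling of the final partial clip (kept only if it meets the minimum length) is an assumption based on the parameter descriptions above.

```python
def fixed_stride_spans(duration_s, split_duration_s=10.0, min_clip_length_s=2.0, limit_clips=0):
    """Return (start, end) spans in seconds for a video of the given duration."""
    spans = []
    start = 0.0
    while start < duration_s:
        end = min(start + split_duration_s, duration_s)
        # Assumption: a trailing clip shorter than the minimum length is discarded.
        if end - start >= min_clip_length_s:
            spans.append((start, end))
        # limit_clips == 0 means unlimited.
        if limit_clips > 0 and len(spans) >= limit_clips:
            break
        start += split_duration_s
    return spans


print(fixed_stride_spans(25.0))  # three spans; the last one is 5 seconds long
print(fixed_stride_spans(21.0))  # two spans; the trailing 1-second remainder is dropped
```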

#### TransNetV2 Splitting
**What it does**: Uses AI to detect scene transitions and intelligently split videos at natural break points.
- **Parameters**:
  - `--transnetv2-threshold`: Probability threshold for scene transitions (default: 0.4)
  - `--transnetv2-min-length-s`: Minimum scene length in seconds (default: 2.0)
  - `--transnetv2-max-length-s`: Maximum scene length in seconds (default: 10.0)
  - `--transnetv2-max-length-mode`: How to handle long scenes ("truncate" or "stride")
  - `--transnetv2-crop-s`: Seconds to crop from start/end of scenes (default: 0.5)
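
Once scene boundaries have been detected, the length and crop parameters determine which clips are emitted. The sketch below shows one plausible reading of those parameters. It does not run TransNetV2 and is not the library's code: `postprocess_scenes` is a hypothetical helper, and the order of operations (crop first, then length checks) and the behavior of `"stride"` mode (split long scenes into chunks of at most the maximum length) are assumptions.

```python
def postprocess_scenes(scenes, min_length_s=2.0, max_length_s=10.0,
                       max_length_mode="stride", crop_s=0.5):
    """Turn detected (start, end) scenes in seconds into clip spans."""
    if max_length_mode not in ("truncate", "stride"):
        raise ValueError(f"unknown max_length_mode: {max_length_mode}")
    clips = []
    for start, end in scenes:
        # Crop both ends of the scene to move away from the transition frames.
        start, end = start + crop_s, end - crop_s
        if end - start < min_length_s:
            continue  # too short after cropping
        if end - start <= max_length_s:
            clips.append((start, end))
        elif max_length_mode == "truncate":
            # Keep only the first max_length_s seconds of a long scene.
            clips.append((start, start + max_length_s))
        else:
            # "stride": split a long scene into consecutive chunks.
            chunk_start = start
            while chunk_start < end:
                chunk_end = min(chunk_start + max_length_s, end)
                if chunk_end - chunk_start >= min_length_s:
                    clips.append((chunk_start, chunk_end))
                chunk_start = chunk_end
    return clips


scenes = [(0.0, 4.0), (10.0, 12.0), (20.0, 46.0)]
print(postprocess_scenes(scenes, max_length_mode="stride"))
print(postprocess_scenes(scenes, max_length_mode="truncate"))
```

With these inputs, the 2-second scene is dropped after cropping, and the 26-second scene becomes either one 10-second clip (`"truncate"`) or three clips (`"stride"`).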

### Transcoding Parameters

**What it does**: Converts video clips to a standardized format for consistent processing and storage.
- `--transcode-encoder`: Video encoder ("libopenh264", "h264_nvenc", "libx264")
- `--transcode-encoder-threads`: CPU threads per encoding operation
- `--transcode-ffmpeg-batch-size`: Number of clips to encode in parallel
- `--transcode-use-hwaccel`: Use GPU acceleration for decoding
- `--transcode-use-input-video-bit-rate`: Use input video's bit rate

### Filtering Parameters

#### Motion Filtering
**What it does**: Analyzes video motion content to filter out static or low-motion clips.
- `--motion-filter`: Mode ("disable", "enable", "score-only")
- `--motion-global-mean-threshold`: Global motion threshold (default: 0.00098)
- `--motion-per-patch-min-256-threshold`: Per-patch motion threshold (default: 0.000001)
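
The three modes differ only in what happens after scores are computed. The following sketch illustrates that difference with precomputed scores. It is not the `MotionFilterStage` implementation: `apply_motion_filter` and the dictionary keys `global_mean` and `per_patch_min_256` are hypothetical, and treating a clip as low-motion when either score falls below its threshold is an assumption.

```python
def apply_motion_filter(clips, mode="disable",
                        global_mean_threshold=0.00098,
                        per_patch_min_256_threshold=0.000001):
    """Apply a motion-filter mode to clips that already carry motion scores."""
    if mode not in ("disable", "enable", "score-only"):
        raise ValueError(f"unknown mode: {mode}")
    if mode == "disable":
        return list(clips)  # no motion analysis at all
    result = []
    for clip in clips:
        # Assumption: failing either threshold marks the clip as low-motion.
        low_motion = (clip["global_mean"] < global_mean_threshold
                      or clip["per_patch_min_256"] < per_patch_min_256_threshold)
        annotated = {**clip, "low_motion": low_motion}
        # "score-only" keeps every clip; "enable" drops low-motion clips.
        if mode == "score-only" or not low_motion:
            result.append(annotated)
    return result


clips = [
    {"name": "moving", "global_mean": 0.01, "per_patch_min_256": 0.001},
    {"name": "static", "global_mean": 0.0001, "per_patch_min_256": 0.0},
]
print([c["name"] for c in apply_motion_filter(clips, mode="enable")])
print([c["name"] for c in apply_motion_filter(clips, mode="score-only")])
```

Running in `score-only` mode first is a reasonable way to inspect the score distribution of your data before choosing thresholds for `enable`.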

#### Aesthetic Filtering
**What it does**: Uses AI to score video clips based on visual quality and aesthetic appeal.
- `--aesthetic-threshold`: Minimum aesthetic score (e.g., 3.5)
- `--aesthetic-reduction`: Score reduction method ("mean" or "min")
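
The reduction method matters because a clip is scored on several frames and those scores must be collapsed into one number before comparison with the threshold. A minimal sketch, assuming per-frame scores are already available (the function names are illustrative, not the library's API):

```python
from statistics import mean


def reduce_scores(frame_scores, reduction="mean"):
    """Collapse per-frame aesthetic scores into a single clip score."""
    if not frame_scores:
        raise ValueError("frame_scores must not be empty")
    if reduction == "mean":
        return mean(frame_scores)
    if reduction == "min":
        return min(frame_scores)
    raise ValueError(f"unknown reduction: {reduction}")


def passes_aesthetic_filter(frame_scores, threshold, reduction="mean"):
    """Return True if the reduced score meets the threshold."""
    return reduce_scores(frame_scores, reduction) >= threshold


scores = [4.0, 3.0, 5.0]
print(passes_aesthetic_filter(scores, threshold=3.5, reduction="mean"))
print(passes_aesthetic_filter(scores, threshold=3.5, reduction="min"))
```

Here the clip passes with `"mean"` (average 4.0) but fails with `"min"` (lowest frame 3.0), so `"min"` is the stricter choice: a single weak frame is enough to reject the clip.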

### Embedding Parameters

**What it does**: Generates vector embeddings from video clips for similarity search and clustering.
- `--embedding-algorithm`: Algorithm ("cosmos-embed1-224p", "cosmos-embed1-336p", "cosmos-embed1-448p", "internvideo2")
- `--embedding-gpu-memory-gb`: GPU memory allocation (default: 20.0)

### Captioning Parameters

**What it does**: Generates text descriptions of video content using AI vision-language models.
- `--generate-captions`: Enable caption generation
- `--captioning-algorithm`: Model variant ("qwen")
- `--captioning-batch-size`: Batch size for processing (default: 8)
- `--captioning-max-output-tokens`: Maximum tokens per caption (default: 512)
- `--captioning-sampling-fps`: Frames per second for sampling (default: 2.0)
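
To see what a sampling rate of 2.0 frames per second means in practice, the sketch below picks which source frames would be sampled from a clip. It illustrates the arithmetic only; `sample_frame_indices` is a hypothetical helper, and the captioning stage's actual frame selection may differ.

```python
def sample_frame_indices(num_frames, video_fps, sampling_fps=2.0):
    """Return indices of frames sampled at roughly sampling_fps from a clip."""
    if video_fps <= 0 or sampling_fps <= 0:
        raise ValueError("frame rates must be positive")
    # Never step by less than one frame, even if sampling_fps exceeds video_fps.
    step = max(1.0, video_fps / sampling_fps)
    indices = []
    position = 0.0
    while round(position) < num_frames:
        indices.append(round(position))
        position += step
    return indices


# A 3-second clip at 30 fps has 90 frames; sampling at 2 fps takes every 15th frame.
print(sample_frame_indices(num_frames=90, video_fps=30.0, sampling_fps=2.0))
```

Higher sampling rates give the captioning model more frames per clip, which increases the memory and compute needed per batch.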


## Example Usage Scenarios

### Scenario 1: Basic Video Splitting
For simple video splitting without advanced features:

```bash
python video_split_clip_example.py \
    --video-dir /path/to/videos \
    --model-dir /path/to/models \
    --output-clip-path /path/to/output \
    --splitting-algorithm fixed_stride \
    --fixed-stride-split-duration 15.0 \
    --video-limit 10
```

### Scenario 2: High-Quality Video Processing
For production-quality processing with all features:

```bash
python video_split_clip_example.py \
    --video-dir /path/to/videos \
    --model-dir /path/to/models \
    --output-clip-path /path/to/output \
    --splitting-algorithm transnetv2 \
    --transnetv2-threshold 0.3 \
    --transnetv2-min-length-s 3.0 \
    --transnetv2-max-length-s 15.0 \
    --generate-embeddings \
    --embedding-algorithm cosmos-embed1-336p \
    --generate-captions \
    --captioning-batch-size 4 \
    --aesthetic-threshold 3.5 \
    --motion-filter enable \
    --transcode-encoder h264_nvenc \
    --transcode-use-hwaccel \
    --video-limit 50
```

### Scenario 3: Quick Testing
For rapid testing with minimal resources:

```bash
python video_split_clip_example.py \
    --video-dir /path/to/videos \
    --model-dir /path/to/models \
    --output-clip-path /path/to/output \
    --splitting-algorithm fixed_stride \
    --fixed-stride-split-duration 5.0 \
    --transcode-encoder libopenh264 \
    --video-limit 3 \
    --dry-run
```
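
If you switch between scenarios often, it can be convenient to keep each one as a dictionary and build the command line from it. The sketch below is one way to do that with the standard library. `build_command` is a hypothetical helper, and the option names are the CLI flags already shown in this tutorial.

```python
import shlex
import sys


def build_command(script, options):
    """Build an argument list for the example script from a dict of options."""
    cmd = [sys.executable, script]
    for name, value in options.items():
        flag = f"--{name}"
        if value is True:
            cmd.append(flag)          # boolean switch, e.g. --generate-embeddings
        elif value is False or value is None:
            continue                  # switch turned off: omit it
        else:
            cmd.extend([flag, str(value)])
    return cmd


basic_splitting = {
    "video-dir": "/path/to/videos",
    "model-dir": "/path/to/models",
    "output-clip-path": "/path/to/output",
    "splitting-algorithm": "fixed_stride",
    "fixed-stride-split-duration": 15.0,
    "video-limit": 10,
    "generate-embeddings": False,
}

if __name__ == "__main__":
    cmd = build_command("video_split_clip_example.py", basic_splitting)
    print(shlex.join(cmd))
    # To execute it: import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell-quoting problems with paths that contain spaces.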


## Interactive End-to-End Example

Now let's put everything together! This section will walk you through a complete video processing pipeline from start to finish.

### What We'll Do

1. **Download sample videos** from the PE-Video dataset
2. **Process the videos** using NeMo Curator's video pipeline
3. **Explore the results** and understand the output structure

This hands-on example will help you understand how all the components work together in practice.

---


### Step 1: Download Sample Videos

First, let's download some sample videos from the [PE-Video](https://huggingface.co/datasets/facebook/PE-Video) dataset. This will give us real video content to work with.

The following code cell downloads 10 videos from the PE-Video dataset:


In [None]:
# Install required dependencies for this example
!pip install datasets

import os
from pathlib import Path
from datasets import load_dataset

# Create output directory for sample videos
output_dir = Path("./pe_video_samples")
output_dir.mkdir(exist_ok=True)

print(f"Downloading sample videos to: {output_dir.absolute()}")

# Load PE-Video dataset (streaming mode for efficiency)
dataset = load_dataset("facebook/PE-Video", split="train", streaming=True)

# Download 10 sample videos (adjust this number as needed)
count = 0
max_videos = 10

print(f"Downloading {max_videos} sample videos...")

for sample in dataset:
    if count >= max_videos:
        break
    
    video_data = sample.get('mp4')
    description = sample.get('json', {}).get('description', f'video_{count+1}')
    
    if video_data:
        # Create safe filename
        safe_name = "".join(c for c in description[:30] if c.isalnum() or c in (' ', '-', '_')).strip()
        filename = f"{safe_name}_{count+1}.mp4" if safe_name else f"video_{count+1}.mp4"
        
        # Save video
        with open(output_dir / filename, 'wb') as f:
            f.write(video_data)
        
        print(f"✓ Downloaded: {filename}")
        count += 1

print(f"Successfully downloaded {count} videos to {output_dir.absolute()}")
print("Video files:")
for video_file in output_dir.glob("*.mp4"):
    file_size = video_file.stat().st_size / (1024 * 1024)  # Size in MB
    print(f"   - {video_file.name} ({file_size:.1f} MB)")


### Step 2: Set Up Video Processing Pipeline

Now let's configure and run the video processing pipeline on our downloaded videos. We'll use a moderate configuration that demonstrates key features without requiring excessive resources.


### Command Breakdown

The following command runs the complete video processing pipeline. Here's what each parameter does:

**📁 Input/Output:**
- `--video-dir ./pe_video_samples` → Input directory containing our downloaded videos
- `--output-clip-path ./processed_clips` → Output directory where processed clips will be saved

**✂️ Video Splitting:**
- `--splitting-algorithm fixed_stride` → Split videos into clips using fixed time intervals
  - *Alternative: `transnetv2` for AI-based scene detection*
- `--fixed-stride-split-duration 8.0` → Each clip will be 8 seconds long
- `--fixed-stride-min-clip-length-s 2.0` → Discard clips shorter than 2 seconds

**🎥 Video Processing:**
- `--transcode-encoder libopenh264` → Use libopenh264 codec (good speed/quality balance)
  - *Alternatives: `h264_nvenc` (GPU, fastest), `libx264` (CPU, highest quality)*
- `--transcode-ffmpeg-batch-size 8` → Process 8 clips in parallel during transcoding

**🧠 AI Features:**
- `--generate-embeddings` → Generate vector embeddings for similarity search and clustering
- `--embedding-algorithm cosmos-embed1-224p` → Use NVIDIA's Cosmos-Embed1 model at 224p resolution
  - *Alternatives: `cosmos-embed1-336p`, `cosmos-embed1-448p`, `internvideo2`*
- `--embedding-gpu-memory-gb 8.0` → Allocate 8GB of GPU memory for embedding generation

**🔍 Quality Filtering:**
- `--motion-filter score-only` → Calculate motion scores but don't filter clips based on motion
  - *Alternatives: `enable` (filter low-motion clips), `disable` (no motion analysis)*
- `--aesthetic-threshold 3.0` → Filter out clips with aesthetic scores below 3.0 (1-5 scale)
  - *Higher values = more selective filtering*

**⚙️ Processing Control:**
- `--video-limit 3` → Process only 3 videos (for this example)
  - *Remove this parameter to process all videos*
- `--verbose` → Show detailed progress information during processing


In [None]:
!python video_split_clip_example.py \
    --video-dir ./pe_video_samples \
    --output-clip-path ./processed_clips \
    --splitting-algorithm fixed_stride \
    --fixed-stride-split-duration 8.0 \
    --fixed-stride-min-clip-length-s 2.0 \
    --transcode-encoder libopenh264 \
    --transcode-ffmpeg-batch-size 8 \
    --generate-embeddings \
    --embedding-algorithm cosmos-embed1-224p \
    --embedding-gpu-memory-gb 8.0 \
    --motion-filter score-only \
    --aesthetic-threshold 3.0 \
    --video-limit 3 \
    --verbose


### Step 3: Understanding the Output

The video pipeline produces several types of output:

#### 📁 Directory Structure
```
processed_clips/
├── clips/                    # Processed video clips (.mp4 files)
│   ├── video1_clip_0.mp4
│   ├── video1_clip_1.mp4
│   └── ...
├── metadata/                 # Metadata files (.json)
│   ├── video1_metadata.json
│   └── ...
└── iv2_embd/              # Embedding files, if generated (name shown is for InternVideo2; other models may use a different directory)
    └── ...
```

#### 📊 Metadata Fields
Each clip in the metadata includes:
- **Basic Info**: `clip_path`, `duration`, `fps`, `resolution`
- **Quality Scores**: `aesthetic_score`, `motion_score`
- **AI Features**: `embedding` (vector), `caption` (text description)
- **Processing Info**: `source_video`, `clip_index`, `timestamp`
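
A short script can load the metadata and select clips by score. The sketch below is written under stated assumptions: it expects the `.json` files in the `metadata/` directory shown above, accepts either one record or a list of records per file, and uses the `aesthetic_score` field name from the list above. Inspect one of your own metadata files first, because the actual schema may differ.

```python
import json
from pathlib import Path


def load_clip_metadata(metadata_dir):
    """Load all clip records from the *.json files in a directory."""
    records = []
    for path in sorted(Path(metadata_dir).glob("*.json")):
        with open(path, encoding="utf-8") as f:
            data = json.load(f)
        # Assumption: a file holds either a single record or a list of records.
        records.extend(data if isinstance(data, list) else [data])
    return records


def filter_by_aesthetic(records, min_score):
    """Keep records whose aesthetic_score is present and at least min_score."""
    return [
        r for r in records
        if r.get("aesthetic_score") is not None and r["aesthetic_score"] >= min_score
    ]


if __name__ == "__main__":
    records = load_clip_metadata("./processed_clips/metadata")
    good = filter_by_aesthetic(records, min_score=3.5)
    print(f"{len(good)} of {len(records)} clips have aesthetic_score >= 3.5")
```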

#### 🎯 Next Steps
Now that you've seen the complete pipeline in action, you can:

1. **Experiment with parameters** - Try different splitting algorithms, thresholds, or models
2. **Scale up** - Process more videos or use higher-quality settings
3. **Customize the pipeline** - Add or remove stages based on your needs
4. **Use the results** - Leverage embeddings for similarity search or captions for content analysis


## Video Deduplication Pipeline

After processing videos and generating embeddings, you may want to remove duplicate or very similar video clips from your dataset. NeMo Curator provides a powerful semantic deduplication pipeline that uses the generated embeddings to identify and remove near-duplicate content.

### What is Semantic Deduplication?

Semantic deduplication goes beyond simple hash-based deduplication by understanding the *content* of videos. It uses the embeddings generated in the previous steps to:

- **Identify similar content** even when videos have different encoding, resolution, or slight variations
- **Group similar clips** using clustering algorithms
- **Remove duplicates** while preserving the most representative examples
- **Maintain metadata** for all processed clips

### When to Use Deduplication

- **Large datasets** with potential duplicate content
- **Video collections** from multiple sources
- **Content curation** where quality over quantity matters
- **Storage optimization** by removing redundant clips
- **Training data preparation** for machine learning models

### Deduplication Pipeline Parameters

The semantic deduplication pipeline offers several key parameters:

- **`n_clusters`**: Number of clusters for grouping similar content (default: 100)
- **`distance_metric`**: Method for measuring similarity ("cosine", "euclidean", "manhattan")
- **`eps`**: Maximum distance threshold for considering clips as duplicates (lower = more strict)
- **`which_to_keep`**: Strategy for selecting which clip to keep from duplicates ("random", "first", "last")
- **`random_state`**: Seed for reproducible results
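
To build intuition for `eps`, the sketch below applies a cosine-distance threshold to a handful of embeddings in pure Python. It is a simplified greedy illustration, not the algorithm used by the semantic deduplication workflow (which clusters first and works at much larger scale). `cosine_distance` and `greedy_dedup` are hypothetical helpers, and keeping the first clip of each duplicate group is an arbitrary choice made for this example.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity for two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def greedy_dedup(embeddings, eps):
    """Return indices to keep: an item is dropped if it lies within eps of a kept item."""
    kept = []
    for i, emb in enumerate(embeddings):
        if all(cosine_distance(emb, embeddings[j]) > eps for j in kept):
            kept.append(i)
    return kept


embeddings = [
    [1.0, 0.0],     # clip 0
    [1.0, 0.001],   # clip 1: almost the same direction as clip 0
    [0.0, 1.0],     # clip 2: unrelated content
]
print(greedy_dedup(embeddings, eps=0.002))  # clip 1 is treated as a duplicate of clip 0
print(greedy_dedup(embeddings, eps=1e-9))   # threshold so strict that all clips are kept
```

Lowering `eps` shrinks the neighborhood in which two clips count as duplicates, so fewer clips are removed.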

### Running the Deduplication Pipeline

The following example shows how to run semantic deduplication on your processed video clips:


In [None]:
# Import required modules
from nemo_curator.pipeline import Pipeline
from nemo_curator.stages.deduplication.semantic import SemanticDeduplicationWorkflow
import os

# Configuration for deduplication
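# Update these paths to match your actual processed video output.
# The path below is for InternVideo2 embeddings; if you used another embedding
# model (for example Cosmos-Embed1), point this at the directory it produced.
input_embeddings_path = "./processed_clips/iv2_embd_parquet"  # Path to your embedding parquet files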
output_dedup_path = "./processed_clips/dedup_output"  # Path for deduplicated results

# Create output directory if it doesn't exist
os.makedirs(output_dedup_path, exist_ok=True)

# Create the deduplication pipeline
def create_video_dedup_pipeline():
    return SemanticDeduplicationWorkflow(
        input_path=input_embeddings_path,
        output_path=output_dedup_path,
        id_field="id",                    # Field containing unique clip identifiers
        embedding_field="embeddings",     # Field containing the vector embeddings
        metadata_fields=["id"],           # Additional metadata fields to preserve
        n_clusters=100,                   # Number of clusters for grouping similar content
        distance_metric="cosine",         # Distance metric for similarity calculation
        which_to_keep="random",           # Strategy for selecting which duplicate to keep
        random_state=42,                  # Random seed for reproducible results
        eps=0.002,                        # Maximum distance threshold for duplicates (lower = more strict)
        # Storage options for local filesystem
        read_kwargs={"storage_options": {}},
        write_kwargs={"storage_options": {}},
        verbose=True,                     # Enable detailed logging
    )

# Run the deduplication pipeline
print("Starting video deduplication pipeline...")
print(f"Input embeddings: {input_embeddings_path}")
print(f"Output directory: {output_dedup_path}")

# Create and run the pipeline
pipeline = create_video_dedup_pipeline()
pipeline.run()

print("Deduplication completed!")
print(f"Results saved to: {output_dedup_path}")


### Understanding Deduplication Results

After running the deduplication pipeline, you'll find:

#### 📁 Output Structure
```
dedup_output/
├── deduplicated_embeddings.parquet    # Deduplicated embedding data
├── cluster_assignments.parquet        # Cluster membership for each clip
└── duplicate_groups.parquet           # Groups of identified duplicates
```

#### 📊 Key Metrics
The pipeline provides several useful metrics:
- **Total clips processed**: Number of input clips
- **Duplicates found**: Number of clips identified as duplicates
- **Deduplication ratio**: Percentage of clips removed
- **Clusters created**: Number of similarity groups formed

#### 🎯 Customizing Deduplication

You can adjust the deduplication behavior by modifying these parameters:

**Strictness Control:**
- **`eps=0.001`**: Very strict (only nearly identical clips are considered duplicates)
- **`eps=0.005`**: Moderate (somewhat similar clips are considered duplicates)
- **`eps=0.01`**: Lenient (loosely similar clips are considered duplicates)

**Clustering Strategy:**
- **`n_clusters=50`**: Fewer, larger clusters (more aggressive deduplication)
- **`n_clusters=200`**: More, smaller clusters (more conservative deduplication)

**Distance Metrics:**
- **`"cosine"`**: Best for high-dimensional embeddings (recommended)
- **`"euclidean"`**: Good for normalized embeddings
- **`"manhattan"`**: Alternative for specific use cases

### Integration with Video Pipeline

The deduplication pipeline seamlessly integrates with the video processing pipeline:

1. **Process videos** → Generate embeddings using the video pipeline
2. **Run deduplication** → Remove duplicate clips using this pipeline
3. **Use results** → Apply deduplicated dataset for your specific use case

This two-step approach first produces filtered, processed clips and then removes near-duplicates from them.


## Summary

Now that you understand the basics of NeMo Curator's video pipeline, you can:

1. **Experiment with different parameters** to optimize for your specific use case
2. **Scale up processing** by increasing `--video-limit` and using more powerful hardware
3. **Customize the pipeline** by adding or removing stages based on your needs
4. **Integrate with other tools** by using the generated embeddings and metadata
5. **Explore advanced features** like caption enhancement and preview generation

### Additional Resources

- **Official Documentation**: [NeMo Curator Video Guide](https://docs.nvidia.com/nemo-curator/)
- **API Reference**: Detailed documentation of all stages and parameters
- **Examples**: More complex examples in the `tutorials/` directory
- **Community**: Join discussions and get help from the community

### Key Takeaways

- NeMo Curator provides a powerful, scalable framework for video curation
- The pipeline is modular and can be customized for different use cases
- GPU acceleration significantly improves performance for large-scale processing
- Proper parameter tuning is essential for optimal results
- The system handles distributed processing automatically through Ray

Happy video curating! 🎬✨