Merged

Dev #201

2 changes: 2 additions & 0 deletions Cargo.lock


5 changes: 3 additions & 2 deletions README.md
Original file line number Diff line number Diff line change
@@ -91,6 +91,7 @@ EmbedAnything is a minimalist, yet highly performant, modular, lightning-fast, l
- **AWS S3 Bucket** : Directly import files from an AWS S3 bucket.
- **Prebuilt Docker Image** : Just pull it: `starlightsearch/embedanything-server`
- **SearchAgent** : Example of how you can use the index for Searchr1 reasoning.
- **Video guide** : Quick start for frame sampling: https://embed-anything.com/guides/video/

## 💡What is Vector Streaming

@@ -478,7 +479,7 @@ We’re excited to share that we've expanded our platform to support multiple mo

- [x] Images

- [ ] Videos
- [x] Videos (frame sampling; enable the `video` feature)

- [ ] Graph

@@ -498,7 +499,7 @@ We now support both candle and Onnx backend<br/>
We have had multimodality in our infrastructure from day one. We already support websites, images, and audio, but we want to expand it further to:

➡️ Graph embedding -- build DeepWalk embeddings depth-first with word2vec <br />
➡️ Video Embedding <br/>
➡️ Video embedding improvements (temporal + audio) <br/>
➡️ Yolo Clip <br/>


86 changes: 86 additions & 0 deletions docs/guides/video.md
@@ -0,0 +1,86 @@
# Video Embeddings (Frame Sampling)

EmbedAnything supports video by sampling frames and embedding them with a vision model
(CLIP/SigLIP). This is opt-in via the `video` feature flag and requires the `ffmpeg`
CLI to be available on your system. If `ffmpeg` is not on `PATH`, set `FFMPEG_BIN`
to the full path of the executable.
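As a preflight check before embedding, you can verify that the `ffmpeg` binary is actually resolvable from Python. A minimal sketch using only the standard library, mirroring the resolution order described above (explicit `FFMPEG_BIN` first, then `PATH`); this helper is not part of the library API:

```python
import os
import shutil
from typing import Optional

def resolve_ffmpeg() -> Optional[str]:
    """Return the ffmpeg executable to use, or None if unavailable.

    Resolution order mirrors the docs: FFMPEG_BIN env var, then PATH lookup.
    """
    explicit = os.environ.get("FFMPEG_BIN")
    if explicit:
        # An explicit override must point at an existing file
        return explicit if os.path.isfile(explicit) else None
    return shutil.which("ffmpeg")

binary = resolve_ffmpeg()
print("ffmpeg found at:", binary or "<not found>")
```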

## Recommended Config

`VideoEmbedConfig` controls how frames are sampled:

- `frame_step`: sample every Nth frame. Default `30`.
- `max_frames`: maximum frames per video. Default `300`.
- `batch_size`: frames per embedding batch. Default `32`.

Suggested starting point:

```python
from embed_anything import VideoEmbedConfig

config = VideoEmbedConfig(frame_step=30, max_frames=300, batch_size=16)
```
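Given these parameters, the number of embedded frames follows directly: every `frame_step`-th frame is kept, and `max_frames` caps the total. A back-of-envelope helper (not part of the library API) for budgeting a config:

```python
import math

def estimated_frames(duration_s: float, fps: float, frame_step: int, max_frames: int) -> int:
    """Estimate how many frames will be embedded for one video."""
    total_frames = int(duration_s * fps)
    # Kept frames are 0, frame_step, 2*frame_step, ...
    sampled = math.ceil(total_frames / frame_step)
    return min(sampled, max_frames)

# A 10-minute 30 fps video with the suggested config:
print(estimated_frames(600, 30.0, frame_step=30, max_frames=300))  # 600 sampled -> capped at 300
```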

## Python Usage

```python
import embed_anything
from embed_anything import VideoEmbedConfig

model = embed_anything.EmbeddingModel.from_pretrained_hf(
model_id="openai/clip-vit-base-patch16"
)

config = VideoEmbedConfig(frame_step=30, max_frames=200, batch_size=16)

data = embed_anything.embed_video_file("path/to/video.mp4", embedder=model, config=config)
```

## Build with Video Support

You must enable the `video` feature and have the `ffmpeg` CLI installed.

### macOS

```bash
brew install ffmpeg
cargo build --features video
# Python (maturin)
maturin develop --features "extension-module,video"
```

### Linux (Debian/Ubuntu)

```bash
sudo apt-get update
sudo apt-get install -y ffmpeg
cargo build --features video
# Python (maturin)
maturin develop --features "extension-module,video"
```

### Windows (prebuilt FFmpeg)

1. Download a static build from https://www.gyan.dev/ffmpeg/builds/
2. Extract it and set:

```powershell
$env:FFMPEG_BIN = "C:\path\to\ffmpeg.exe"
```

Then build:

```powershell
cargo build --features video
# Python (maturin)
maturin develop --features "extension-module,video"
```

## Output Metadata

Each embedding includes:

- `video_path`: the source video file
- `frame_index`: the sampled frame index (0-based)
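A common pattern with this metadata is to score frame embeddings against a query vector and map the best hit back to its frame via `frame_index`. A plain-Python sketch; the dicts below are illustrative stand-ins for the returned `EmbedData` objects, not the exact API shape:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Illustrative stand-ins for frame embeddings (real vectors are model-sized)
frames = [
    {"embedding": [1.0, 0.0], "metadata": {"video_path": "clip.mp4", "frame_index": 0}},
    {"embedding": [0.6, 0.8], "metadata": {"video_path": "clip.mp4", "frame_index": 30}},
]
query = [0.0, 1.0]

best = max(frames, key=lambda f: cosine(f["embedding"], query))
print(best["metadata"]["frame_index"])  # 30: the frame most similar to the query
```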
6 changes: 3 additions & 3 deletions docs/index.md
@@ -74,7 +74,7 @@ EmbedAnything is a minimalist, yet highly performant, modular, lightning-fast, l
- **Candle Backend** : Supports BERT, Jina, ColPali, Splade, ModernBERT, Reranker, Qwen
- **ONNX Backend**: Supports BERT, Jina, ColPali, ColBERT, Splade, Reranker, ModernBERT, Qwen
- **Cloud Embedding Models:** Supports OpenAI, Cohere, and Gemini.
- **MultiModality** : Works with text sources like PDFs, txt, md, Images JPG and Audio, .WAV
- **MultiModality** : Works with text (PDFs, txt, md), images (JPG), audio (.WAV), and video (frame sampling; enable the `video` feature)
- **GPU support** : Hardware acceleration on GPU as well.
- **Chunking** : In-built chunking methods like semantic, late-chunking
- **Vector Streaming:** Separate file processing, Indexing and Inferencing on different threads, reduces latency.
@@ -339,7 +339,7 @@ We’re excited to share that we've expanded our platform to support multiple mo

- [x] Images

- [ ] Videos
- [x] Videos (frame sampling; enable the `video` feature)

- [ ] Graph

@@ -359,7 +359,7 @@ We now support both candle and Onnx backend<br/>
We have had multimodality in our infrastructure from day one. We already support websites, images, and audio, but we want to expand it further to:

➡️ Graph embedding -- build DeepWalk embeddings depth-first with word2vec <br />
➡️ Video Embedding <br/>
➡️ Video embedding improvements (temporal + audio) <br/>
➡️ Yolo Clip <br/>


4 changes: 2 additions & 2 deletions docs/roadmap/roadmap.md
@@ -17,7 +17,7 @@ We’re excited to share that we've expanded our platform to support multiple mo

- [x] Images

- [ ] Videos
- [x] Videos (frame sampling; enable the `video` feature)

- [ ] Graph

@@ -58,7 +58,7 @@ To address this, we’re excited to announce that we’re introducing Candle-ONN
We have had multimodality in our infrastructure from day one. We already support websites, images, and audio, but we want to expand it further to:

☑️ Graph embedding -- build DeepWalk embeddings depth-first with word2vec <br />
☑️Video Embedding <br/>
☑️ Video embedding improvements (temporal + audio) <br/>
☑️ Yolo Clip <br/>


37 changes: 37 additions & 0 deletions examples/video.py
@@ -0,0 +1,37 @@
import os
from pathlib import Path

import embed_anything
from embed_anything import EmbedData, VideoEmbedConfig

# Load a vision model (CLIP/SigLIP) for frame embeddings
model = embed_anything.EmbeddingModel.from_pretrained_hf(
model_id="openai/clip-vit-base-patch16"
)

# Sample every 30th frame (~1 fps for 30 fps videos), cap to 200 frames
config = VideoEmbedConfig(frame_step=30, max_frames=200, batch_size=16)

video_path = os.environ.get("VIDEO_PATH", "path/to/video.mp4")
if not Path(video_path).exists():
raise FileNotFoundError(
f"Video not found: {video_path}. Set VIDEO_PATH env var to a valid file."
)

# Embed a single video
video_embeddings: list[EmbedData] = embed_anything.embed_video_file(
video_path,
embedder=model,
config=config,
)
print(f"Embedded {len(video_embeddings)} frames from video.")

video_dir = os.environ.get("VIDEO_DIR")
if video_dir:
dir_embeddings = embed_anything.embed_video_directory(
video_dir,
embedder=model,
config=config,
)
if dir_embeddings is not None:
print(f"Embedded {len(dir_embeddings)} total frames from directory.")
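Since `frame_index` counts sampled frames in order, an embedding can be mapped back to an approximate source timestamp when the frame rate is known. A hedged sketch, assuming (as the frame-sampling design suggests) that sampled frame k came from source frame k * frame_step:

```python
def approx_timestamp_s(frame_index: int, frame_step: int, fps: float) -> float:
    """Approximate source timestamp of the k-th sampled frame.

    Assumption: sampled frame k corresponds to source frame k * frame_step.
    """
    return frame_index * frame_step / fps

# With frame_step=30 on a 30 fps video, sampled frames land ~1 second apart:
print(approx_timestamp_s(5, 30, 30.0))  # 5.0 seconds into the video
```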
1 change: 1 addition & 0 deletions mkdocs.yml
@@ -52,6 +52,7 @@ nav:
- Guides:
- guides/colpali.md
- guides/images.md
- guides/video.md
- guides/semantic.md
- guides/adapters.md
- guides/onnx_models.md
2 changes: 2 additions & 0 deletions processors/Cargo.toml
@@ -30,9 +30,11 @@ pdf2image = "0.1.3"
image = "0.25.6"
thiserror = "2.0.12"
tempfile = "3.19.1"
# Video processing (uses external ffmpeg CLI)

[dev-dependencies]
tempdir = "0.3.7"

[features]
default = []
video = []
4 changes: 4 additions & 0 deletions processors/src/lib.rs
@@ -15,3 +15,7 @@ pub mod html_processor;

/// This module contains the file processor for DOCX files.
pub mod docx_processor;

/// This module contains the file processor for video files.
#[cfg(feature = "video")]
pub mod video_processor;
145 changes: 145 additions & 0 deletions processors/src/video_processor.rs
@@ -0,0 +1,145 @@
#![cfg(feature = "video")]

use anyhow::{anyhow, Result};
use std::env;
use std::path::{Path, PathBuf};
use std::process::Command;
use std::{fs, path};

#[derive(Debug, Clone, Copy)]
pub enum VideoFrameFormat {
Jpeg,
Png,
}

impl VideoFrameFormat {
fn extension(self) -> &'static str {
match self {
VideoFrameFormat::Jpeg => "jpg",
VideoFrameFormat::Png => "png",
}
}
}

#[derive(Debug, Clone)]
pub struct VideoFrame {
pub index: usize,
pub path: PathBuf,
}

#[derive(Debug, Clone)]
pub struct VideoProcessor {
frame_step: usize,
max_frames: Option<usize>,
output_format: VideoFrameFormat,
ffmpeg_bin: Option<PathBuf>,
}

impl VideoProcessor {
pub fn new(frame_step: usize) -> Self {
Self {
frame_step: frame_step.max(1),
max_frames: None,
output_format: VideoFrameFormat::Jpeg,
ffmpeg_bin: None,
}
}

pub fn with_max_frames(mut self, max_frames: usize) -> Self {
self.max_frames = Some(max_frames);
self
}

pub fn with_output_format(mut self, output_format: VideoFrameFormat) -> Self {
self.output_format = output_format;
self
}

pub fn with_ffmpeg_bin<P: AsRef<Path>>(mut self, ffmpeg_bin: P) -> Self {
self.ffmpeg_bin = Some(ffmpeg_bin.as_ref().to_path_buf());
self
}

fn resolve_ffmpeg_bin(&self) -> Result<PathBuf> {
if let Some(bin) = &self.ffmpeg_bin {
return Ok(bin.clone());
}
if let Ok(bin) = env::var("FFMPEG_BIN") {
return Ok(PathBuf::from(bin));
}
Ok(PathBuf::from("ffmpeg"))
}

pub fn extract_frames_to_dir<P: AsRef<Path>, Q: AsRef<Path>>(
&self,
video_path: P,
output_dir: Q,
) -> Result<Vec<VideoFrame>> {
let output_dir = output_dir.as_ref();
fs::create_dir_all(output_dir)?;

let ffmpeg_bin = self.resolve_ffmpeg_bin()?;
let frame_step = self.frame_step.max(1);
let filter = format!("select=not(mod(n\\,{}))", frame_step);
let output_pattern = output_dir.join(format!(
"frame_%06d.{}",
self.output_format.extension()
));

let mut command = Command::new(ffmpeg_bin);
command
.arg("-hide_banner")
.arg("-loglevel")
.arg("error")
.arg("-i")
.arg(video_path.as_ref())
.arg("-vf")
.arg(filter)
.arg("-vsync")
.arg("vfr");

if let Some(max_frames) = self.max_frames {
command.arg("-vframes").arg(max_frames.to_string());
}

let status = command.arg(output_pattern).status()?;
if !status.success() {
return Err(anyhow!("ffmpeg failed with exit code {:?}", status.code()));
}

let mut frame_paths = fs::read_dir(output_dir)?
.filter_map(|entry| entry.ok())
.filter(|entry| entry.file_type().map(|t| t.is_file()).unwrap_or(false))
.map(|entry| entry.path())
.filter(|path| {
path.extension()
.and_then(|ext| ext.to_str())
.map(|ext| ext.eq_ignore_ascii_case(self.output_format.extension()))
.unwrap_or(false)
})
.collect::<Vec<path::PathBuf>>();

frame_paths.sort();

if frame_paths.is_empty() {
return Err(anyhow!("No frames extracted from video"));
}

let frames = frame_paths
.into_iter()
.enumerate()
.map(|(index, path)| VideoFrame { index, path })
.collect();

Ok(frames)
}

pub fn extract_frames_to_temp_dir<P: AsRef<Path>>(
&self,
video_path: P,
) -> Result<(tempfile::TempDir, Vec<VideoFrame>)> {
let temp_dir = tempfile::TempDir::new()?;
let frames = self.extract_frames_to_dir(video_path, temp_dir.path())?;
Ok((temp_dir, frames))
}
}
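The `select=not(mod(n\,N))` filter built above keeps exactly the source frames whose 0-based index is a multiple of `frame_step`, and `-vframes` then caps the count. A tiny Python model of which indices survive, useful for reasoning about the Rust code (not part of the library):

```python
def selected_indices(total_frames: int, frame_step: int, max_frames=None):
    """Model of ffmpeg's select=not(mod(n, step)) plus the -vframes cap."""
    kept = [n for n in range(total_frames) if n % frame_step == 0]
    return kept if max_frames is None else kept[:max_frames]

print(selected_indices(10, 3))       # [0, 3, 6, 9]
print(selected_indices(100, 30, 2))  # [0, 30]
```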
2 changes: 1 addition & 1 deletion python/Cargo.toml
@@ -26,4 +26,4 @@ cudnn = ["embed_anything/cudnn"]
metal = ["embed_anything/metal"]
ort = ["embed_anything/ort"]
audio = ["embed_anything/audio"]
aws = ["embed_anything/aws"]
aws = ["embed_anything/aws"]