Merged

Dev #201

2 changes: 2 additions & 0 deletions Cargo.lock


5 changes: 3 additions & 2 deletions README.md
Original file line number Diff line number Diff line change
@@ -91,6 +91,7 @@ EmbedAnything is a minimalist, yet highly performant, modular, lightning-fast, l
- **AWS S3 Bucket** : Directly import files from an AWS S3 bucket.
- **Prebuilt Docker Image** : Just pull it: `starlightsearch/embedanything-server`
- **SearchAgent** : Example of how you can use the index for Searchr1 reasoning.
- **Video guide** : Quick start for frame sampling: https://embed-anything.com/guides/video/

## 💡What is Vector Streaming

@@ -478,7 +479,7 @@ We’re excited to share that we've expanded our platform to support multiple mo

- [x] Images

- [ ] Videos
- [x] Videos (frame sampling; enable the `video` feature)

- [ ] Graph

@@ -498,7 +499,7 @@ We now support both candle and Onnx backend<br/>
We have had multimodality in our infrastructure from day one. We already support websites, images, and audio, but we want to expand it further to:

➡️ Graph embedding -- build DeepWalk embeddings depth-first with word2vec <br />
➡️ Video Embedding <br/>
➡️ Video embedding improvements (temporal + audio) <br/>
➡️ Yolo Clip <br/>


86 changes: 86 additions & 0 deletions docs/guides/video.md
@@ -0,0 +1,86 @@
# Video Embeddings (Frame Sampling)

EmbedAnything supports video by sampling frames and embedding them with a vision model
(CLIP/SigLIP). This is opt-in via the `video` feature flag and requires the `ffmpeg`
CLI to be available on your system. If `ffmpeg` is not on `PATH`, set `FFMPEG_BIN`
to the full path of the executable.
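As a preflight check before embedding, you can verify that the `ffmpeg` binary is actually resolvable from Python. A minimal sketch using only the standard library, mirroring the resolution order described above (explicit `FFMPEG_BIN` first, then `PATH`); this helper is not part of the library API:

```python
import os
import shutil
from typing import Optional

def resolve_ffmpeg() -> Optional[str]:
    """Return the ffmpeg executable to use, or None if unavailable.

    Resolution order mirrors the docs: FFMPEG_BIN env var, then PATH lookup.
    """
    explicit = os.environ.get("FFMPEG_BIN")
    if explicit:
        # An explicit override must point at an existing file
        return explicit if os.path.isfile(explicit) else None
    return shutil.which("ffmpeg")

binary = resolve_ffmpeg()
print("ffmpeg found at:", binary or "<not found>")
```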

## Recommended Config

`VideoEmbedConfig` controls how frames are sampled:

- `frame_step`: sample every Nth frame. Default `30`.
- `max_frames`: maximum frames per video. Default `300`.
- `batch_size`: frames per embedding batch. Default `32`.

Suggested starting point:

```python
from embed_anything import VideoEmbedConfig

config = VideoEmbedConfig(frame_step=30, max_frames=300, batch_size=16)
```
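Given these parameters, the number of embedded frames follows directly: every `frame_step`-th frame is kept, and `max_frames` caps the total. A back-of-envelope helper (not part of the library API) for budgeting a config:

```python
import math

def estimated_frames(duration_s: float, fps: float, frame_step: int, max_frames: int) -> int:
    """Estimate how many frames will be embedded for one video."""
    total_frames = int(duration_s * fps)
    # Kept frames are 0, frame_step, 2*frame_step, ...
    sampled = math.ceil(total_frames / frame_step)
    return min(sampled, max_frames)

# A 10-minute 30 fps video with the suggested config:
print(estimated_frames(600, 30.0, frame_step=30, max_frames=300))  # 600 sampled -> capped at 300
```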

## Python Usage

```python
import embed_anything
from embed_anything import VideoEmbedConfig

model = embed_anything.EmbeddingModel.from_pretrained_hf(
model_id="openai/clip-vit-base-patch16"
)

config = VideoEmbedConfig(frame_step=30, max_frames=200, batch_size=16)

data = embed_anything.embed_video_file("path/to/video.mp4", embedder=model, config=config)
```

## Build with Video Support

You must enable the `video` feature and have the `ffmpeg` CLI installed.

### macOS

```bash
brew install ffmpeg
cargo build --features video
# Python (maturin)
maturin develop --features "extension-module,video"
```

### Linux (Debian/Ubuntu)

```bash
sudo apt-get update
sudo apt-get install -y ffmpeg
cargo build --features video
# Python (maturin)
maturin develop --features "extension-module,video"
```

### Windows (prebuilt FFmpeg)

1. Download a static build from https://www.gyan.dev/ffmpeg/builds/
2. Extract it and set:

```powershell
$env:FFMPEG_BIN = "C:\path\to\ffmpeg.exe"
```

Then build:

```powershell
cargo build --features video
# Python (maturin)
maturin develop --features "extension-module,video"
```

## Output Metadata

Each embedding includes:

- `video_path`: the source video file
- `frame_index`: the sampled frame index (0-based)
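A common pattern with this metadata is to score frame embeddings against a query vector and map the best hit back to its frame via `frame_index`. A plain-Python sketch; the dicts below are illustrative stand-ins for the returned `EmbedData` objects, not the exact API shape:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Illustrative stand-ins for frame embeddings (real vectors are model-sized)
frames = [
    {"embedding": [1.0, 0.0], "metadata": {"video_path": "clip.mp4", "frame_index": 0}},
    {"embedding": [0.6, 0.8], "metadata": {"video_path": "clip.mp4", "frame_index": 30}},
]
query = [0.0, 1.0]

best = max(frames, key=lambda f: cosine(f["embedding"], query))
print(best["metadata"]["frame_index"])  # 30: the frame most similar to the query
```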
6 changes: 3 additions & 3 deletions docs/index.md
@@ -74,7 +74,7 @@ EmbedAnything is a minimalist, yet highly performant, modular, lightning-fast, l
- **Candle Backend** : Supports BERT, Jina, ColPali, Splade, ModernBERT, Reranker, Qwen
- **ONNX Backend**: Supports BERT, Jina, ColPali, ColBERT, Splade, Reranker, ModernBERT, Qwen
- **Cloud Embedding Models:** Supports OpenAI, Cohere, and Gemini.
- **MultiModality** : Works with text sources like PDFs, txt, md, Images JPG and Audio, .WAV
- **MultiModality** : Works with text (PDFs, txt, md), images (JPG), audio (.WAV), and video (frame sampling; enable the `video` feature)
- **GPU support** : Hardware acceleration on GPU as well.
- **Chunking** : In-built chunking methods like semantic, late-chunking
- **Vector Streaming:** Separate file processing, Indexing and Inferencing on different threads, reduces latency.
@@ -339,7 +339,7 @@ We’re excited to share that we've expanded our platform to support multiple mo

- [x] Images

- [ ] Videos
- [x] Videos (frame sampling; enable the `video` feature)

- [ ] Graph

@@ -359,7 +359,7 @@ We now support both candle and Onnx backend<br/>
We have had multimodality in our infrastructure from day one. We already support websites, images, and audio, but we want to expand it further to:

➡️ Graph embedding -- build DeepWalk embeddings depth-first with word2vec <br />
➡️ Video Embedding <br/>
➡️ Video embedding improvements (temporal + audio) <br/>
➡️ Yolo Clip <br/>


4 changes: 2 additions & 2 deletions docs/roadmap/roadmap.md
@@ -17,7 +17,7 @@ We’re excited to share that we've expanded our platform to support multiple mo

- [x] Images

- [ ] Videos
- [x] Videos (frame sampling; enable the `video` feature)

- [ ] Graph

@@ -58,7 +58,7 @@ To address this, we’re excited to announce that we’re introducing Candle-ONN
We have had multimodality in our infrastructure from day one. We already support websites, images, and audio, but we want to expand it further to:

☑️ Graph embedding -- build DeepWalk embeddings depth-first with word2vec <br />
☑️Video Embedding <br/>
☑️ Video embedding improvements (temporal + audio) <br/>
☑️ Yolo Clip <br/>


37 changes: 37 additions & 0 deletions examples/video.py
@@ -0,0 +1,37 @@
import os
from pathlib import Path

import embed_anything
from embed_anything import EmbedData, VideoEmbedConfig

# Load a vision model (CLIP/SigLIP) for frame embeddings
model = embed_anything.EmbeddingModel.from_pretrained_hf(
model_id="openai/clip-vit-base-patch16"
)

# Sample every 30th frame (~1 fps for 30 fps videos), cap to 200 frames
config = VideoEmbedConfig(frame_step=30, max_frames=200, batch_size=16)

video_path = os.environ.get("VIDEO_PATH", "path/to/video.mp4")
if not Path(video_path).exists():
raise FileNotFoundError(
f"Video not found: {video_path}. Set VIDEO_PATH env var to a valid file."
)

# Embed a single video
video_embeddings: list[EmbedData] = embed_anything.embed_video_file(
video_path,
embedder=model,
config=config,
)
print(f"Embedded {len(video_embeddings)} frames from video.")

video_dir = os.environ.get("VIDEO_DIR")
if video_dir:
dir_embeddings = embed_anything.embed_video_directory(
video_dir,
embedder=model,
config=config,
)
if dir_embeddings is not None:
print(f"Embedded {len(dir_embeddings)} total frames from directory.")
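Since `frame_index` counts sampled frames in order, an embedding can be mapped back to an approximate source timestamp when the frame rate is known. A hedged sketch, assuming (as the frame-sampling design suggests) that sampled frame k came from source frame k * frame_step:

```python
def approx_timestamp_s(frame_index: int, frame_step: int, fps: float) -> float:
    """Approximate source timestamp of the k-th sampled frame.

    Assumption: sampled frame k corresponds to source frame k * frame_step.
    """
    return frame_index * frame_step / fps

# With frame_step=30 on a 30 fps video, sampled frames land ~1 second apart:
print(approx_timestamp_s(5, 30, 30.0))  # 5.0 seconds into the video
```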
1 change: 1 addition & 0 deletions mkdocs.yml
@@ -52,6 +52,7 @@ nav:
- Guides:
- guides/colpali.md
- guides/images.md
- guides/video.md
- guides/semantic.md
- guides/adapters.md
- guides/onnx_models.md
2 changes: 2 additions & 0 deletions processors/Cargo.toml
@@ -30,9 +30,11 @@ pdf2image = "0.1.3"
image = "0.25.6"
thiserror = "2.0.12"
tempfile = "3.19.1"
# Video processing (uses external ffmpeg CLI)

[dev-dependencies]
tempdir = "0.3.7"

[features]
default = []
video = []
4 changes: 4 additions & 0 deletions processors/src/lib.rs
@@ -15,3 +15,7 @@ pub mod html_processor;

/// This module contains the file processor for DOCX files.
pub mod docx_processor;

/// This module contains the file processor for video files.
#[cfg(feature = "video")]
pub mod video_processor;
145 changes: 145 additions & 0 deletions processors/src/video_processor.rs
@@ -0,0 +1,145 @@
#![cfg(feature = "video")]

use anyhow::{anyhow, Result};
use std::env;
use std::path::{Path, PathBuf};
use std::process::Command;
use std::{fs, path};

#[derive(Debug, Clone, Copy)]
pub enum VideoFrameFormat {
Jpeg,
Png,
}

impl VideoFrameFormat {
fn extension(self) -> &'static str {
match self {
VideoFrameFormat::Jpeg => "jpg",
VideoFrameFormat::Png => "png",
}
}
}

#[derive(Debug, Clone)]
pub struct VideoFrame {
pub index: usize,
pub path: PathBuf,
}

#[derive(Debug, Clone)]
pub struct VideoProcessor {
frame_step: usize,
max_frames: Option<usize>,
output_format: VideoFrameFormat,
ffmpeg_bin: Option<PathBuf>,
}

impl VideoProcessor {
pub fn new(frame_step: usize) -> Self {
Self {
frame_step: frame_step.max(1),
max_frames: None,
output_format: VideoFrameFormat::Jpeg,
ffmpeg_bin: None,
}
}

pub fn with_max_frames(mut self, max_frames: usize) -> Self {
self.max_frames = Some(max_frames);
self
}

pub fn with_output_format(mut self, output_format: VideoFrameFormat) -> Self {
self.output_format = output_format;
self
}

pub fn with_ffmpeg_bin<P: AsRef<Path>>(mut self, ffmpeg_bin: P) -> Self {
self.ffmpeg_bin = Some(ffmpeg_bin.as_ref().to_path_buf());
self
}

fn resolve_ffmpeg_bin(&self) -> Result<PathBuf> {
if let Some(bin) = &self.ffmpeg_bin {
return Ok(bin.clone());
}
if let Ok(bin) = env::var("FFMPEG_BIN") {
return Ok(PathBuf::from(bin));
}
Ok(PathBuf::from("ffmpeg"))
}

pub fn extract_frames_to_dir<P: AsRef<Path>, Q: AsRef<Path>>(
&self,
video_path: P,
output_dir: Q,
) -> Result<Vec<VideoFrame>> {
let output_dir = output_dir.as_ref();
fs::create_dir_all(output_dir)?;

let ffmpeg_bin = self.resolve_ffmpeg_bin()?;
let frame_step = self.frame_step.max(1);
let filter = format!("select=not(mod(n\\,{}))", frame_step);
let output_pattern = output_dir.join(format!(
"frame_%06d.{}",
self.output_format.extension()
));

let mut command = Command::new(ffmpeg_bin);
command
.arg("-hide_banner")
.arg("-loglevel")
.arg("error")
.arg("-i")
.arg(video_path.as_ref())
.arg("-vf")
.arg(filter)
.arg("-vsync")
.arg("vfr");

if let Some(max_frames) = self.max_frames {
command.arg("-vframes").arg(max_frames.to_string());
}

let status = command.arg(output_pattern).status()?;
if !status.success() {
return Err(anyhow!("ffmpeg failed with exit code {:?}", status.code()));
}

let mut frame_paths = fs::read_dir(output_dir)?
.filter_map(|entry| entry.ok())
.filter(|entry| entry.file_type().map(|t| t.is_file()).unwrap_or(false))
.map(|entry| entry.path())
.filter(|path| {
path.extension()
.and_then(|ext| ext.to_str())
.map(|ext| ext.eq_ignore_ascii_case(self.output_format.extension()))
.unwrap_or(false)
})
.collect::<Vec<path::PathBuf>>();

frame_paths.sort();

if frame_paths.is_empty() {
return Err(anyhow!("No frames extracted from video"));
}

let frames = frame_paths
.into_iter()
.enumerate()
.map(|(index, path)| VideoFrame { index, path })
.collect();

Ok(frames)
}

pub fn extract_frames_to_temp_dir<P: AsRef<Path>>(
&self,
video_path: P,
) -> Result<(tempfile::TempDir, Vec<VideoFrame>)> {
let temp_dir = tempfile::TempDir::new()?;
let frames = self.extract_frames_to_dir(video_path, temp_dir.path())?;
Ok((temp_dir, frames))
}
}
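The `select=not(mod(n\,N))` filter built above keeps exactly the source frames whose 0-based index is a multiple of `frame_step`, and `-vframes` then caps the count. A tiny Python model of which indices survive, useful for reasoning about the Rust code (not part of the library):

```python
def selected_indices(total_frames: int, frame_step: int, max_frames=None):
    """Model of ffmpeg's select=not(mod(n, step)) plus the -vframes cap."""
    kept = [n for n in range(total_frames) if n % frame_step == 0]
    return kept if max_frames is None else kept[:max_frames]

print(selected_indices(10, 3))       # [0, 3, 6, 9]
print(selected_indices(100, 30, 2))  # [0, 30]
```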
2 changes: 1 addition & 1 deletion python/Cargo.toml
@@ -26,4 +26,4 @@ cudnn = ["embed_anything/cudnn"]
metal = ["embed_anything/metal"]
ort = ["embed_anything/ort"]
audio = ["embed_anything/audio"]
aws = ["embed_anything/aws"]
aws = ["embed_anything/aws"]