Welcome to Cut.py, my personal project where I'm building an intelligent video processing pipeline. I designed this tool to automatically generate engaging highlights from long-form videos using the power of local AI models.
I've created a robust, privacy-focused video editor that runs entirely on your local machine. No data leaves your server. Here's what makes it special:
- 🤖 Local AI Analysis: I integrated Llama 3 (via llama.cpp) to "watch" the video transcripts and understand context. It doesn't just cut randomly; it finds the best parts.
- 🗣️ User-Directed Cuts: I added a feature where you can tell the AI exactly what to look for. Want the "funniest moment" or "the part about the budget"? Just ask.
- 🐳 Dockerized FFmpeg: I solved the "it works on my machine" problem for video processing. All video cutting happens inside a Docker container, ensuring consistent results regardless of your host OS (and fixing those annoying Apple Silicon warnings).
- 📝 Precision Transcription: Using Faster Whisper to generate highly accurate subtitles and transcripts, which serve as the foundation for the AI's understanding.
- 🎬 Smart Scene Detection: Using PySceneDetect to ensure cuts happen at natural scene boundaries, avoiding jarring transitions. (A short sketch of the scene-detection and transcription steps follows this list.)
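To make those last two bullets concrete, here's a minimal sketch of what scene detection and transcription look like with these libraries. The file path and Whisper model size are placeholders I chose for illustration, not values from the project:

```python
# Illustrative sketch of the two analysis foundations; the file path and the
# Whisper model size ("small") are placeholder assumptions, not project values.
from faster_whisper import WhisperModel
from scenedetect import ContentDetector, detect

# Scene detection: find natural visual boundaries so cuts never land mid-shot.
scenes = detect("input.mp4", ContentDetector())
for start, end in scenes:
    print(f"Scene: {start.get_timecode()} -> {end.get_timecode()}")

# Transcription: timestamped segments that the LLM later reasons over.
model = WhisperModel("small", compute_type="int8")
segments, _info = model.transcribe("input.mp4")
for seg in segments:
    print(f"[{seg.start:6.1f}s - {seg.end:6.1f}s] {seg.text.strip()}")
```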
When you upload a video to my API, here's the journey it takes:
- Ingestion: The video is saved, and I generate a unique ID.
- Scene Detection: Scan the video for shot changes to map its visual structure.
- Transcription: Extract the audio and transcribe it using the Whisper model.
- AI Analysis: This is the cool part. I feed the transcript and scene data into a local LLM (Mistral/Llama).
  - Default Mode: It looks for the most engaging segment.
  - Prompt Mode: If you provided a prompt (e.g., "Find the demo"), it searches for that specific content.
- Intelligent Cutting: Once the best segment is identified, I calculate the exact timestamps.
- Processing: Spin up a Docker container to run FFmpeg and surgically extract the clip without re-encoding (stream copy) for blazing fast speed (see the sketch after this list).
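To show the shape of steps 4-6, here's a rough, hedged sketch of the pipeline's tail: a local LLM picks a segment, and FFmpeg stream-copies it inside the container. The GGUF path, prompt wording, mount paths, and the assumption that the image's entrypoint is ffmpeg are all illustrative, not the project's actual code.

```python
# Rough sketch of the AI-analysis and cutting steps. The GGUF path, prompt
# wording, mount paths, and ffmpeg entrypoint are illustrative assumptions.
import json
import subprocess

from llama_cpp import Llama

transcript = "[12.0s - 45.5s] We finally demo the feature...\n[46.0s - 90.0s] ..."
llm = Llama(model_path="models/llama-3-8b-instruct.Q4_K_M.gguf", n_ctx=8192)

# AI analysis: ask the LLM for the most engaging segment as JSON timestamps.
result = llm.create_chat_completion(
    messages=[
        {"role": "system",
         "content": 'Reply only with JSON like {"start": <seconds>, "end": <seconds>}.'},
        {"role": "user", "content": f"Find the most engaging segment:\n{transcript}"},
    ],
)
segment = json.loads(result["choices"][0]["message"]["content"])
start, end = float(segment["start"]), float(segment["end"])

# Processing: stream copy (-c copy) inside the FFmpeg container, so the clip
# is extracted without re-encoding.
subprocess.run(
    [
        "docker", "run", "--rm",
        "-v", "/path/to/videos:/work",   # host folder holding the source video (placeholder)
        "ffmpeg-container",              # image built in the setup step; assumed entrypoint: ffmpeg
        "-ss", str(start), "-i", "/work/input.mp4",
        "-t", str(end - start),
        "-c", "copy", "/work/highlight.mp4",
    ],
    check=True,
)
```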
Here's how you can set up my project on your own machine.
- Python 3.13+
- Docker (for the video processing container)
- uv (my preferred package manager)
git clone <repo-url>
cd cut_py
uv sync

I wrote a script to download the necessary GGUF models for the LLM.
./setup_models.sh

You need the FFmpeg container image for the video editing service to work.
docker build -t ffmpeg-container -f docker/ffmpeg-multiarch.Dockerfile .

Then start the server:

./start.sh

The API will be available at http://localhost:8000.
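Once it's running, you can sanity-check the server from Python. This assumes the default FastAPI schema route (/openapi.json) hasn't been disabled in the app:

```python
# Quick sanity check that the API is up. This relies on FastAPI's default
# behavior of serving the OpenAPI schema at /openapi.json; if that route is
# disabled, open http://localhost:8000/docs in a browser instead.
import urllib.request

with urllib.request.urlopen("http://localhost:8000/openapi.json") as resp:
    print(resp.status)  # 200 means the server is reachable
```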
I've made the API very simple to use. Here is an example of how to generate a highlight.
Let the AI decide what's best.
curl -X POST "http://localhost:8000/highlight/process" \
-F "video=@/path/to/my_video.mp4" \
-F "target_duration=30" \
--output highlight.mp4

Tell the AI what you want.
curl -X POST "http://localhost:8000/highlight/process" \
-F "video=@/path/to/podcast.mp4" \
-F "target_duration=60" \
-F "prompt=Find the segment where they discuss the release date" \
--output release_date_clip.mp4

(Prefer Python over curl? See the sketch after the tech stack list below.)

- Framework: FastAPI
- AI/LLM: Llama.cpp (Python bindings), Faster Whisper
- Video Processing: FFmpeg (Dockerized), PySceneDetect
- Package Manager: uv
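And if you'd rather script the API from Python, here's an equivalent of the prompted curl request above. The field names simply mirror the curl examples rather than the API's source code, so adjust them if the endpoint expects something different:

```python
# Python equivalent of the prompted curl example; field names mirror the curl
# calls above and are not taken from the API's source code.
import requests

with open("/path/to/podcast.mp4", "rb") as f:
    response = requests.post(
        "http://localhost:8000/highlight/process",
        files={"video": f},
        data={
            "target_duration": "60",
            "prompt": "Find the segment where they discuss the release date",
        },
        timeout=600,  # transcription + LLM analysis can take a while on long videos
    )

response.raise_for_status()
with open("release_date_clip.mp4", "wb") as out:
    out.write(response.content)
```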