ClipRank is an AI-assisted video analysis system that automatically identifies, ranks, and exports high-quality short-form clips (≈50–60 seconds) from longer video content.
It is designed to simulate how a human editor finds compelling moments — using transcription, structured segmentation, heuristic scoring, and diversity filtering to surface the most engaging segments of a video.
Modern content workflows rely heavily on repurposing long-form content into short-form clips for platforms like:
- YouTube Shorts
- TikTok
- Instagram Reels
- X (Twitter) video
ClipRank automates this process.
Instead of manually scrubbing through video timelines, ClipRank:
- analyzes spoken language
- identifies high-value segments
- ranks them using multiple signals
- outputs ready-to-use clips
The system operates as a multi-stage processing pipeline:
Video Input
↓
Transcription (faster-whisper)
↓
Timestamped Segments
↓
Candidate Window Generation
↓
Multi-Factor Scoring Engine
↓
Diversity Filtering (timeline-aware)
↓
Top Clip Selection
↓
FFmpeg Clip Export
↓
Report Generation
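The stage ordering above can be sketched as a simple orchestration function. Every function name below is an illustrative stub, not ClipRank's actual internals:

```python
# Illustrative sketch of the pipeline stage ordering; each function is a
# hypothetical stub standing in for a real ClipRank module.
def transcribe(video_path):          # faster-whisper would run here
    return [(0.0, 7.6, "Opening statement...")]

def generate_candidates(segments):   # windowing over transcript segments
    return [(0.0, 55.0)]

def score(candidates):               # multi-factor heuristic scoring
    return [(1.0, c) for c in candidates]

def diversify(scored):               # timeline-aware diversity filtering
    return scored

def export(clips, video_path):       # FFmpeg cutting would run here
    return [f"clip_{i:03d}.mp4" for i, _ in enumerate(clips, 1)]

def run_pipeline(video_path):
    segments = transcribe(video_path)
    candidates = generate_candidates(segments)
    scored = score(candidates)
    selected = diversify(scored)
    return export(selected, video_path)

print(run_pipeline("workspace/input/your_video.mp4"))  # ['clip_001.mp4']
```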
- Uses faster-whisper for speech-to-text
- Produces timestamped segments
- Detects language automatically
- Saves a human-readable transcript for review
- Saves structured transcript JSON for downstream use
- Exposes Whisper settings through `config.py`
Example:
[0.0s → 7.6s] Opening statement...
[7.6s → 13.2s] Follow-up commentary...
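Lines in that style are straightforward to render from timestamped segments. A minimal sketch, where the `TranscriptSegment` shape is an assumption rather than ClipRank's actual data model:

```python
from dataclasses import dataclass

@dataclass
class TranscriptSegment:
    # Assumed shape; ClipRank's real models live in models.py
    start: float
    end: float
    text: str

def format_segment(seg: TranscriptSegment) -> str:
    """Render one transcript line in the [start → end] style shown above."""
    return f"[{seg.start:.1f}s → {seg.end:.1f}s] {seg.text}"

print(format_segment(TranscriptSegment(0.0, 7.6, "Opening statement...")))
# [0.0s → 7.6s] Opening statement...
```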
- Builds candidate windows from adjacent transcript segments
- Prefers cleaner starts after pauses, sentence breaks, or stronger openers
- Scores likely end points before keeping windows
- Targets ~50–60 second clips while allowing a wider generation range
- Avoids flooding the scorer with near-duplicate candidates
Example run:
135 transcript segments
→ 46 candidate clip windows
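One way to grow windows from adjacent segments while capping duration might look like this. The parameter values are assumptions, not ClipRank's defaults, and this sketch omits the start-quality and near-duplicate checks described above:

```python
def build_windows(segments, min_s=40.0, max_s=70.0):
    """Grow candidate windows from each segment start until the max duration.

    segments: list of (start, end, text) tuples in timeline order.
    Returns (start, end) windows whose duration falls in [min_s, max_s].
    """
    windows = []
    for i in range(len(segments)):
        for j in range(i, len(segments)):
            duration = segments[j][1] - segments[i][0]
            if duration > max_s:
                break  # extending further only makes the window longer
            if duration >= min_s:
                windows.append((segments[i][0], segments[j][1]))
    return windows

# 20 synthetic 5-second segments
segs = [(float(t), t + 5.0, "...") for t in range(0, 100, 5)]
wins = build_windows(segs)
print(len(wins), "candidate windows")
```

A real implementation would then prune near-identical windows before scoring, which is how 135 segments reduce to a few dozen candidates rather than hundreds.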
Each clip is evaluated using multiple heuristics:
- Hook Strength: detects attention-grabbing language (questions, tension, contrast, strong openings)
- Emotional Intensity: emphatic, reactive, or emotionally charged wording
- Value Density: explanatory or insight-heavy language
- Pacing: words per second relative to clip duration
- Duration Fit: alignment with the ideal short-form length
total_score = hook + opening_hook + emotional + value + pacing + duration
Each clip also includes:
- scoring breakdown
- human-readable notes explaining why each clip ranked where it did
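Two of these signals are easy to illustrate. The cue list and shaping below are assumptions for demonstration, not ClipRank's actual heuristics:

```python
import re

# Hypothetical cue list; ClipRank's real patterns live in the scoring/ package
HOOK_CUES = [r"\bwhy\b", r"\bhow\b", r"\bnever\b", r"\bbut\b", r"\?"]

def hook_strength(text: str) -> float:
    """Fraction of hook cues present in the text, capped at 1.0."""
    t = text.lower()
    hits = sum(1 for cue in HOOK_CUES if re.search(cue, t))
    return min(1.0, hits / 3)

def duration_fit(seconds: float, target: float = 55.0, tolerance: float = 15.0) -> float:
    """1.0 at the target length, falling linearly to 0.0 at the tolerance edge."""
    return max(0.0, 1.0 - abs(seconds - target) / tolerance)

print(hook_strength("Why would an American president do that?"))  # 0.6666666666666666
print(duration_fit(55.0))  # 1.0
```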
Prevents redundant or overlapping clips.
- Limits timeline overlap ratio
- Enforces minimum start-time gap
- Checks lightweight transcript similarity
- Uses a stricter first pass with a fallback pass if too few clips survive
- Ensures clips are spread across the video timeline
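The overlap and spacing checks can be sketched as follows; the thresholds are assumptions, and ClipRank additionally applies transcript-similarity checks and a fallback pass that this sketch omits:

```python
def overlap_ratio(a, b):
    """Timeline overlap between two (start, end) spans, relative to the shorter one."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    shorter = min(a[1] - a[0], b[1] - b[0])
    return inter / shorter if shorter else 0.0

def diverse_select(clips, max_overlap=0.2, min_gap=30.0, top_n=5):
    """Greedily keep the best clips that don't crowd already-kept ones.

    clips: list of (score, start, end) tuples.
    """
    kept = []
    for score, s, e in sorted(clips, reverse=True):  # best score first
        ok = all(
            overlap_ratio((s, e), (ks, ke)) <= max_overlap and abs(s - ks) >= min_gap
            for _, ks, ke in kept
        )
        if ok:
            kept.append((score, s, e))
        if len(kept) == top_n:
            break
    return kept

clips = [(9.0, 0.0, 55.0), (8.0, 10.0, 65.0), (7.0, 120.0, 175.0)]
print(diverse_select(clips))  # the 8.0 clip is dropped for crowding the 9.0 clip
```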
This transforms raw ranking into usable output.
- Uses FFmpeg
- Extracts clips using timestamps
- Builds safer export filenames from source title + clip timing + clip id
- Outputs `.mp4` files to `workspace/runs/<source>_<timestamp>/clips/`
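Each export reduces to a single FFmpeg invocation per clip. A sketch of building that command (the helper name is hypothetical, though the flags are standard FFmpeg options):

```python
import subprocess

def build_ffmpeg_cmd(src: str, start: float, end: float, out_path: str) -> list[str]:
    """Assemble a stream-copy cut. -c copy avoids re-encoding but snaps to
    keyframes, which is one reason re-encoding is a roadmap item for cleaner cuts."""
    return [
        "ffmpeg", "-y",
        "-ss", f"{start:.2f}",        # fast input-side seek to the clip start
        "-i", src,
        "-t", f"{end - start:.2f}",   # clip duration
        "-c", "copy",                 # copy streams without re-encoding
        out_path,
    ]

def export_clip(src, start, end, out_path):
    subprocess.run(build_ffmpeg_cmd(src, start, end, out_path), check=True)

print(" ".join(build_ffmpeg_cmd("in.mp4", 43.0, 98.0, "clip_008.mp4")))
```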
Each run produces a detailed report:
workspace/runs/<source>_<timestamp>/reports/
Includes:
- timestamps
- full score breakdown
- transcript preview
- reasoning notes
- exported clip file path
The project is currently a working end-to-end local pipeline.
What is already implemented:
- local video file validation
- real Whisper-based transcription
- timestamped transcript segments
- saved transcript text and transcript JSON
- smarter candidate generation
- multi-factor heuristic scoring
- timeline-aware diversity filtering
- transcript-aware diversity filtering
- FFmpeg clip export
- text report generation with transcript previews
Recent validated run:
- Input: `workspace/input/MTG on Trump’s Iran war ‘Why would an American president do that’.mp4`
- Result: 135 transcript segments, 46 candidate clips, 5 selected clips, 5 exported files
Additional real-world validation:
- Input: `workspace/input/Weve won - Trump speaks on Iran, Straight of Hormuz, NATO, executions, Israel.mp4`
- Result: 112 transcript segments, 23 candidate clips, 5 selected clips, 5 exported files
Current conclusion:
- The system is robust enough to move on from heavy tuning
- The biggest remaining quality limiter is messy-source transcription accuracy
- A final stronger-model transcription pass was tested and rejected for this phase because the latency cost was too high relative to the gain
cliprank/
├── main.py # Pipeline entry point
├── streamlit_app.py # Streamlit demo interface
├── pipeline.py # Reusable engine runner for CLI/UI
├── profiles.py # Demo content profiles
├── run_demo.command # macOS demo launcher
├── run_demo.bat # Windows demo launcher
├── config.py # Global configuration
├── models.py # Data models
│
├── transcription/ # Speech-to-text logic
├── segmentation/ # Clip window generation
├── scoring/ # Heuristics + ranking + diversity
├── export/ # FFmpeg + reporting
├── ingest/ # Input validation
├── utils/ # Helpers
│
├── workspace/
│ ├── input/ # Source videos
│ ├── transcript/ # Generated transcripts
│ ├── reports/ # Analysis reports
│ ├── clips/ # Exported clips
│
└── docs/ # Documentation
Before running this project, ensure you have:
- Python 3.10+
- FFmpeg installed and available on your system `PATH`
Mac (Homebrew):

```
brew install ffmpeg
```

Ubuntu/Debian:

```
sudo apt update
sudo apt install ffmpeg
```

Windows: download from https://ffmpeg.org/download.html and add it to your PATH.
Clone the repository and install dependencies:
```
git clone https://github.com/Squawk7200/ClipRank.git
cd cliprank
python3 -m venv venv
./venv/bin/pip install -r requirements.txt
```

Runtime notes:
- `faster-whisper` model files are downloaded on first use
- the repo-local `venv` is the expected Python environment for running ClipRank
Run the CLI tool on a video file:
```
./venv/bin/python main.py "workspace/input/your_video.mp4"
```

Default validation run:

```
./venv/bin/python main.py "workspace/input/MTG on Trump’s Iran war ‘Why would an American president do that’.mp4"
```

Example with creator controls:

```
./venv/bin/python main.py "workspace/input/your_video.mp4" --top-clips 6 --min-seconds 40 --max-seconds 70 --target-seconds 55 --profile news --keywords "iran, nato, executions"
```

Testing notes:
- use the MTG file above as the default validation input going forward
- `workspace/input/test.mp4` is not a valid media file and should not be used for pipeline validation
ClipRank now also includes a Streamlit demo intended for easy portfolio use from a GitHub download.
What the demo supports:
- upload common video or audio formats
- choose how many clips to generate
- adjust minimum, maximum, and target clip length
- choose a content profile
- add custom keywords
- download clips, transcript files, and the report from the browser
Main demo file: `streamlit_app.py`

GitHub-friendly launcher files:
- `run_demo.command` for macOS
- `run_demo.bat` for Windows
Typical demo flow:
- Download the repository ZIP from GitHub and unzip it.
- Ensure Python and FFmpeg are installed.
- Double-click the launcher for your platform.
- Wait for the first-run dependency install and model initialization.
- The Streamlit app opens in your browser locally.
Manual launch commands:
Mac/Linux:

```
python3 -m venv venv
./venv/bin/pip install -r requirements.txt
./venv/bin/streamlit run streamlit_app.py
```

Windows:

```
py -3 -m venv venv
venv\Scripts\python -m pip install -r requirements.txt
venv\Scripts\streamlit run streamlit_app.py
```

Important notes:
- this is a good demo experience for GitHub and portfolio sharing
- it is not the same as a fully packaged desktop app yet
- average creators may still need Python installed for this demo version
- 📁 Per-run folder → `workspace/runs/<source>_<timestamp>/`
- 📄 Transcript text + JSON → `workspace/runs/<source>_<timestamp>/transcript/`
- 📊 Report → `workspace/runs/<source>_<timestamp>/reports/`
- 🎬 Clips → `workspace/runs/<source>_<timestamp>/clips/`
Transcript outputs now include:
- `*_transcript.txt` for human-readable review
- `*_transcript.json` for structured downstream use
Report output includes:
- selected clips
- score breakdown
- transcript preview
- exported file path
Exported clip filenames now look like:

```
mtg-on-trump-s-iran-war-why-would-an-american-president-do-that_0043s_0098s_clip_008.mp4
```
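A filename in that pattern can be produced with a small slug helper. This sketch reproduces the observed format, though ClipRank's actual implementation may differ:

```python
import re

def clip_filename(title: str, start: float, end: float, clip_id: int) -> str:
    """Slugify the source title, then append zero-padded clip timing and id."""
    # collapse every run of characters outside [a-z0-9] into a single hyphen
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return f"{slug}_{int(start):04d}s_{int(end):04d}s_clip_{clip_id:03d}.mp4"

print(clip_filename(
    "MTG on Trump’s Iran war ‘Why would an American president do that’",
    43, 98, 8,
))
# mtg-on-trump-s-iran-war-why-would-an-american-president-do-that_0043s_0098s_clip_008.mp4
```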
Edit `config.py`:

```
TOP_CLIP_COUNT = 5
```
Other tunable parameters:
- clip duration targets
- transcript paragraph formatting
- Whisper model settings
- diversity spacing
Current transcription-related settings include:
- `WHISPER_MODEL_SIZE`
- `WHISPER_COMPUTE_TYPE`
- `WHISPER_BEAM_SIZE`
- `WHISPER_VAD_FILTER`
- `WHISPER_WORD_TIMESTAMPS`
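In `config.py` these might look as follows; the values shown are placeholders, not ClipRank's shipped defaults, though the concepts (model size, compute type, beam size, VAD filtering, word timestamps) are real faster-whisper options:

```python
# Hypothetical config.py excerpt; values are illustrative, not the shipped defaults.
WHISPER_MODEL_SIZE = "base"       # faster-whisper model size, e.g. "base", "small"
WHISPER_COMPUTE_TYPE = "int8"     # quantization/compute type passed to faster-whisper
WHISPER_BEAM_SIZE = 5             # beam search width during decoding
WHISPER_VAD_FILTER = True         # skip non-speech via voice activity detection
WHISPER_WORD_TIMESTAMPS = False   # per-word timing (slower when enabled)
```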
CLI runtime options now include:
- `--top-clips`
- `--min-seconds`
- `--max-seconds`
- `--target-seconds`
- `--profile`
- `--keywords`
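ClipRank's CLI is built with typer; as a dependency-free sketch of the same flag surface, here is the equivalent argparse definition (defaults are assumptions):

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Mirrors the runtime options listed above; defaults here are assumptions.
    p = argparse.ArgumentParser(prog="cliprank")
    p.add_argument("video", help="path to the source video file")
    p.add_argument("--top-clips", type=int, default=5)
    p.add_argument("--min-seconds", type=float, default=40.0)
    p.add_argument("--max-seconds", type=float, default=70.0)
    p.add_argument("--target-seconds", type=float, default=55.0)
    p.add_argument("--profile", default="default")
    p.add_argument("--keywords", default="", help="comma-separated boost terms")
    return p

args = build_parser().parse_args(
    ["workspace/input/your_video.mp4", "--top-clips", "6", "--keywords", "iran, nato"]
)
print(args.top_clips, args.keywords)  # 6 iran, nato
```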
Sample pipeline output:

```
Captured 135 transcript segment(s)
Created 46 candidate clip(s)
Scored 46 clip(s)
Kept 5 diverse clip(s)
Exported 5 clip files
```
- Python
- faster-whisper (speech-to-text transcription)
- pydantic (data modeling and validation)
- streamlit (demo UI)
- typer (CLI interface)
- FFmpeg (video/audio processing)
- End-to-end automated pipeline
- Timestamp-aware transcription
- Human-readable transcript output
- Structured transcript JSON output
- Smarter transcript-aware candidate generation
- Multi-factor scoring system with opening-hook emphasis
- Timeline and transcript-aware diversity filtering
- Automated clip export with safer filenames
- Structured reporting with transcript previews and export paths
- Better FFmpeg error handling and possibly re-encoding for cleaner cuts
- Cleaner Streamlit UX and stronger progress/error messaging
- Save or reopen prior runs more easily from the demo
- Batch processing
- Packaged desktop releases for macOS and Windows
- Optional future experimentation with stronger transcription models if runtime budget allows
ClipRank was built to:
- demonstrate real-world system design
- model production-grade content workflows
- explore AI-assisted media tooling
- serve as a portfolio-ready project
Developed as part of an evolving portfolio in:
- Software Development
- AI-assisted systems
- Media automation pipelines
ClipRank is not just a script — it is a modular, extensible system that bridges:
- AI (speech + heuristics)
- backend engineering
- media processing
It reflects real-world thinking around:
- pipelines
- ranking systems
- content automation
- more to follow