
🎬 ClipRank

ClipRank is an AI-assisted video analysis system that automatically identifies, ranks, and exports high-quality short-form clips (≈50–60 seconds) from longer video content.

It is designed to simulate how a human editor finds compelling moments — using transcription, structured segmentation, heuristic scoring, and diversity filtering to surface the most engaging segments of a video.


🚀 Overview

Modern content workflows rely heavily on repurposing long-form content into short-form clips for platforms like:

  • YouTube Shorts
  • TikTok
  • Instagram Reels
  • X (Twitter) video

ClipRank automates this process.

Instead of manually scrubbing through video timelines, ClipRank:

  • analyzes spoken language
  • identifies high-value segments
  • ranks them using multiple signals
  • outputs ready-to-use clips

🧠 Core Pipeline

The system operates as a multi-stage processing pipeline:

Video Input
    ↓
Transcription (faster-whisper)
    ↓
Timestamped Segments
    ↓
Candidate Window Generation
    ↓
Multi-Factor Scoring Engine
    ↓
Diversity Filtering (timeline-aware)
    ↓
Top Clip Selection
    ↓
FFmpeg Clip Export
    ↓
Report Generation

⚙️ How It Works

1. Transcription

  • Uses faster-whisper for speech-to-text
  • Produces timestamped segments
  • Detects language automatically
  • Saves a human-readable transcript for review
  • Saves structured transcript JSON for downstream use
  • Exposes Whisper settings through config.py

Example:

[0.0s → 7.6s] Opening statement...
[7.6s → 13.2s] Follow-up commentary...
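As a minimal sketch, timestamped segments can be rendered into the transcript lines above like this (the `Segment` fields mirror what faster-whisper reports; the function name is hypothetical, not the project's actual API):

```python
from dataclasses import dataclass

@dataclass
class Segment:
    start: float  # seconds
    end: float    # seconds
    text: str

def format_transcript(segments: list[Segment]) -> str:
    """Render segments as human-readable '[start → end] text' lines."""
    return "\n".join(
        f"[{s.start:.1f}s → {s.end:.1f}s] {s.text}" for s in segments
    )

segments = [
    Segment(0.0, 7.6, "Opening statement..."),
    Segment(7.6, 13.2, "Follow-up commentary..."),
]
print(format_transcript(segments))
```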

2. Segmentation

  • Builds candidate windows from adjacent transcript segments
  • Prefers cleaner starts after pauses, sentence breaks, or stronger openers
  • Scores likely end points before keeping windows
  • Targets ~50–60 second clips while allowing a wider generation range
  • Avoids flooding the scorer with near-duplicate candidates

Example run:

135 transcript segments
→ 46 candidate clip windows
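The window-growing idea can be sketched as follows. The tuple representation and thresholds are illustrative only; the real segmentation module additionally scores start and end quality before keeping a window:

```python
def candidate_windows(segments, min_s=35.0, max_s=75.0):
    """Grow a window from each segment until it falls in the duration band.

    segments: list of (start, end, text) tuples, sorted by start time.
    Returns (start, end) windows whose duration is within [min_s, max_s].
    """
    windows = []
    for i in range(len(segments)):
        start = segments[i][0]
        for j in range(i, len(segments)):
            duration = segments[j][1] - start
            if duration > max_s:
                break  # any longer window from this start is also too long
            if duration >= min_s:
                windows.append((start, segments[j][1]))
    return windows

# Synthetic transcript: 14 segments, ~9.5 s each, starting every 10 s.
segs = [(k * 10.0, k * 10.0 + 9.5, f"segment {k}") for k in range(14)]
wins = candidate_windows(segs)
```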

3. Scoring Engine (v2)

Each clip is evaluated using multiple heuristics:

🎯 Signals Used

  • Hook Strength

    • Detects attention-grabbing language
    • Questions, tension, contrast, strong openings
  • Emotional Intensity

    • Emphatic, reactive, or emotionally charged wording
  • Value Density

    • Explanatory or insight-heavy language
  • Pacing

    • Words per second vs clip duration
  • Duration Fit

    • Alignment with ideal short-form length

🧮 Final Score

total_score = hook + opening_hook + emotional + value + pacing + duration

Each clip also includes:

  • scoring breakdown
  • human-readable notes explaining WHY it ranked
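As a hedged sketch, three of these signals and their summation could look like the following; the keyword set, weights, and target speech rate are illustrative assumptions, not the project's actual values:

```python
HOOK_WORDS = {"why", "how", "never", "secret", "but"}  # illustrative only

def hook_strength(text: str) -> float:
    """Count attention-grabbing words (a stand-in for hook detection)."""
    return sum(1.0 for w in text.lower().split() if w.strip("?,.") in HOOK_WORDS)

def pacing_score(word_count: int, duration_s: float) -> float:
    """Reward speech rates near a comfortable ~2.5 words/second."""
    wps = word_count / duration_s
    return max(0.0, 2.0 - abs(wps - 2.5))

def duration_fit(duration_s: float, target: float = 55.0) -> float:
    """1.0 at the ideal short-form length, falling off linearly."""
    return max(0.0, 1.0 - abs(duration_s - target) / target)

def total_score(text: str, duration_s: float) -> float:
    """Sum the per-signal scores, as in the formula above."""
    return (
        hook_strength(text)
        + pacing_score(len(text.split()), duration_s)
        + duration_fit(duration_s)
    )

s = total_score("Why would an American president do that?", 55.0)
```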

4. Diversity Filtering (v2)

Prevents redundant or overlapping clips.

Current rules:

  • Limits timeline overlap ratio
  • Enforces minimum start-time gap
  • Checks lightweight transcript similarity
  • Uses a stricter first pass with a fallback pass if too few clips survive
  • Ensures clips are spread across the video timeline

This transforms raw ranking into usable output.
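A simplified version of such a diversity pass greedily keeps the highest-scored clips whose timeline overlap and start-time gap stay within limits (thresholds here are illustrative; the transcript-similarity check and fallback pass are omitted):

```python
def overlap_ratio(a, b):
    """Overlap between (start, end) intervals, relative to the shorter one."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    shorter = min(a[1] - a[0], b[1] - b[0])
    return inter / shorter if shorter > 0 else 0.0

def diverse_top(clips, k=5, max_overlap=0.3, min_gap=20.0):
    """clips: list of (score, start, end). Keep up to k diverse clips."""
    kept = []
    for score, start, end in sorted(clips, reverse=True):
        ok = all(
            overlap_ratio((start, end), (s2, e2)) <= max_overlap
            and abs(start - s2) >= min_gap
            for _, s2, e2 in kept
        )
        if ok:
            kept.append((score, start, end))
        if len(kept) == k:
            break
    return kept

top = diverse_top([(5.0, 0.0, 55.0), (4.0, 10.0, 65.0), (3.0, 200.0, 255.0)])
```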


5. Clip Export

  • Uses FFmpeg
  • Extracts clips using timestamps
  • Builds safer export filenames from source title + clip timing + clip id
  • Outputs .mp4 files to:

workspace/runs/<source>_<timestamp>/clips/
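A stream-copy FFmpeg command for one clip could be assembled as below. The exact flags the export module uses are not shown here, so treat the `-c copy` (no re-encode) choice and the helper name as assumptions:

```python
import subprocess

def build_ffmpeg_cmd(src: str, start: float, end: float, out: str) -> list[str]:
    """Build an ffmpeg command that extracts [start, end) from src."""
    return [
        "ffmpeg", "-y",
        "-ss", f"{start:.2f}",        # seek to clip start (input option)
        "-i", src,
        "-t", f"{end - start:.2f}",   # clip duration in seconds
        "-c", "copy",                 # stream copy: fast, no re-encode
        out,
    ]

cmd = build_ffmpeg_cmd("input.mp4", 43.0, 98.0, "clip_008.mp4")
# subprocess.run(cmd, check=True)  # uncomment to actually export
```

Stream copy is fast but cuts on keyframes only; re-encoding gives frame-accurate cuts at the cost of speed, which is why the roadmap below mentions possible re-encoding.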

6. Report Generation

Each run produces a detailed report:

workspace/runs/<source>_<timestamp>/reports/

Includes:

  • timestamps
  • full score breakdown
  • transcript preview
  • reasoning notes
  • exported clip file path

✅ Current State

The project is currently a working end-to-end local pipeline.

What is already implemented:

  • local video file validation
  • real Whisper-based transcription
  • timestamped transcript segments
  • saved transcript text and transcript JSON
  • smarter candidate generation
  • multi-factor heuristic scoring
  • timeline-aware diversity filtering
  • transcript-aware diversity filtering
  • FFmpeg clip export
  • text report generation with transcript previews

Recent validated run:

  • Input: workspace/input/MTG on Trump’s Iran war ‘Why would an American president do that’.mp4
  • Result: 135 transcript segments, 46 candidate clips, 5 selected clips, 5 exported files

Additional real-world validation:

  • Input: workspace/input/Weve won - Trump speaks on Iran, Straight of Hormuz, NATO, executions, Israel.mp4
  • Result: 112 transcript segments, 23 candidate clips, 5 selected clips, 5 exported files

Current conclusion:

  • The system is robust enough to move on from heavy tuning
  • The biggest remaining quality limiter is messy-source transcription accuracy
  • A final stronger-model transcription pass was tested and rejected for this phase because the latency cost was too high relative to the gain

📂 Project Structure

cliprank/
├── main.py                 # Pipeline entry point
├── streamlit_app.py        # Streamlit demo interface
├── pipeline.py             # Reusable engine runner for CLI/UI
├── profiles.py             # Demo content profiles
├── run_demo.command        # macOS demo launcher
├── run_demo.bat            # Windows demo launcher
├── config.py               # Global configuration
├── models.py               # Data models
│
├── transcription/          # Speech-to-text logic
├── segmentation/           # Clip window generation
├── scoring/                # Heuristics + ranking + diversity
├── export/                 # FFmpeg + reporting
├── ingest/                 # Input validation
├── utils/                  # Helpers
│
├── workspace/
│   ├── input/              # Source videos
│   ├── transcript/         # Generated transcripts
│   ├── reports/            # Analysis reports
│   ├── clips/              # Exported clips
│
└── docs/                   # Documentation

✅ Requirements

Before running this project, ensure you have:

  • Python 3.10+
  • FFmpeg installed and available in your system PATH

Install FFmpeg

Mac (Homebrew):

brew install ffmpeg

Ubuntu/Debian:

sudo apt update
sudo apt install ffmpeg

Windows:

Download from https://ffmpeg.org/download.html and add it to your PATH.


📦 Installation

Clone the repository and install dependencies:

git clone https://github.com/Squawk7200/ClipRank.git
cd cliprank
python3 -m venv venv
./venv/bin/pip install -r requirements.txt

Runtime note:

  • faster-whisper model files are downloaded on first use
  • The repo-local venv is the expected Python environment for running ClipRank

▶️ Usage

Run the CLI tool on a video file:

./venv/bin/python main.py "workspace/input/your_video.mp4"

Default validation run:

./venv/bin/python main.py "workspace/input/MTG on Trump’s Iran war ‘Why would an American president do that’.mp4"

Example with creator controls:

./venv/bin/python main.py "workspace/input/your_video.mp4" --top-clips 6 --min-seconds 40 --max-seconds 70 --target-seconds 55 --profile news --keywords "iran, nato, executions"

Recommended testing note:

  • Use the MTG file above as the default validation input going forward
  • workspace/input/test.mp4 is not a valid media file and should not be used for pipeline validation

🌐 Streamlit Demo

ClipRank now also includes a Streamlit demo intended for easy portfolio use from a GitHub download.

What the demo supports:

  • upload common video or audio formats
  • choose how many clips to generate
  • adjust minimum, maximum, and target clip length
  • choose a content profile
  • add custom keywords
  • download clips, transcript files, and the report from the browser

Main demo file:

streamlit_app.py

GitHub-friendly launcher files:

  • run_demo.command for macOS
  • run_demo.bat for Windows

Typical demo flow:

  1. Download the repository ZIP from GitHub and unzip it.
  2. Ensure Python and FFmpeg are installed.
  3. Double-click the launcher for your platform.
  4. Wait for the first-run dependency install and model initialization.
  5. The Streamlit app opens in your browser locally.

Manual launch commands:

Mac/Linux:

python3 -m venv venv
./venv/bin/pip install -r requirements.txt
./venv/bin/streamlit run streamlit_app.py

Windows:

py -3 -m venv venv
venv\Scripts\python -m pip install -r requirements.txt
venv\Scripts\streamlit run streamlit_app.py

Important note:

  • this is a good demo experience for GitHub and portfolio sharing
  • it is not the same as a fully packaged desktop app yet
  • average creators may still need Python installed for this demo version

Output

  • 📁 Per-run folder → workspace/runs/<source>_<timestamp>/
  • 📄 Transcript text + JSON → workspace/runs/<source>_<timestamp>/transcript/
  • 📊 Report → workspace/runs/<source>_<timestamp>/reports/
  • 🎬 Clips → workspace/runs/<source>_<timestamp>/clips/

Transcript outputs now include:

  • *_transcript.txt for human-readable review
  • *_transcript.json for structured downstream use

Report output includes:

  • selected clips
  • score breakdown
  • transcript preview
  • exported file path

Exported clip filenames now look like:

mtg-on-trump-s-iran-war-why-would-an-american-president-do-that_0043s_0098s_clip_008.mp4
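The filename scheme above can be reproduced with a small slugification helper; this is a sketch, and the real export code may differ in details such as padding widths:

```python
import re

def clip_filename(title: str, start: float, end: float, clip_id: int) -> str:
    """Slugify the source title, then append clip timing and id."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return f"{slug}_{int(start):04d}s_{int(end):04d}s_clip_{clip_id:03d}.mp4"

name = clip_filename(
    "MTG on Trump’s Iran war ‘Why would an American president do that’",
    43.0, 98.0, 8,
)
```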

⚙️ Configuration

Edit config.py:

TOP_CLIP_COUNT = 5

Other tunable parameters:

  • clip duration targets
  • transcript paragraph formatting
  • Whisper model settings
  • diversity spacing

Current transcription-related settings include:

  • WHISPER_MODEL_SIZE
  • WHISPER_COMPUTE_TYPE
  • WHISPER_BEAM_SIZE
  • WHISPER_VAD_FILTER
  • WHISPER_WORD_TIMESTAMPS

CLI runtime options now include:

  • --top-clips
  • --min-seconds
  • --max-seconds
  • --target-seconds
  • --profile
  • --keywords
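The project's CLI is built with typer; purely as a stdlib illustration, the same options can be sketched with argparse (the defaults here are assumptions, not the project's configured values):

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Declare the runtime options listed above."""
    p = argparse.ArgumentParser(prog="cliprank")
    p.add_argument("video")
    p.add_argument("--top-clips", type=int, default=5)
    p.add_argument("--min-seconds", type=float, default=40.0)
    p.add_argument("--max-seconds", type=float, default=70.0)
    p.add_argument("--target-seconds", type=float, default=55.0)
    p.add_argument("--profile", default="default")
    p.add_argument("--keywords", default="")
    return p

args = build_parser().parse_args(
    ["in.mp4", "--top-clips", "6", "--keywords", "iran, nato"]
)
```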

🧪 Example Run

Captured 135 transcript segment(s)
Created 46 candidate clip(s)
Scored 46 clip(s)
Kept 5 diverse clip(s)

Exported 5 clip files

🛠 Tech Stack

  • Python
  • faster-whisper (speech-to-text transcription)
  • pydantic (data modeling and validation)
  • streamlit (demo UI)
  • typer (CLI interface)
  • FFmpeg (video/audio processing)

🔥 Current Capabilities

  • End-to-end automated pipeline
  • Timestamp-aware transcription
  • Human-readable transcript output
  • Structured transcript JSON output
  • Smarter transcript-aware candidate generation
  • Multi-factor scoring system with opening-hook emphasis
  • Timeline and transcript-aware diversity filtering
  • Automated clip export with safer filenames
  • Structured reporting with transcript previews and export paths

🚧 Future Improvements

  • Better FFmpeg error handling and possibly re-encoding for cleaner cuts
  • Cleaner Streamlit UX and stronger progress/error messaging
  • Save or reopen prior runs more easily from the demo
  • Batch processing
  • Packaged desktop releases for macOS and Windows
  • Optional future experimentation with stronger transcription models if runtime budget allows

🎯 Purpose

ClipRank was built to:

  • demonstrate real-world system design
  • model production-grade content workflows
  • explore AI-assisted media tooling
  • serve as a portfolio-ready project

👤 Author

Developed as part of an evolving portfolio in:

  • Software Development
  • AI-assisted systems
  • Media automation pipelines

⭐ Summary

ClipRank is not just a script — it is a modular, extensible system that bridges:

  • AI (speech + heuristics)
  • backend engineering
  • media processing

It reflects real-world thinking around:

  • pipelines
  • ranking systems
  • content automation
  • more to follow
