AI-assisted video editing pipeline — from reference analysis to rendered output, with full-stack orchestration, Shorts conversion, and one-click publishing.
AI Editor is a multi-stage media pipeline that accepts a reference video and a set of source clips, uses AI analysis to understand structure and style, builds a stage-based edit plan, renders a polished output via the Shotstack API, and optionally converts the result to a vertical Short and publishes to YouTube.
It is not a simple wrapper around an LLM. It integrates:
- computer-vision based video & scene analysis (EasyOCR, PaddleOCR, SceneDetect)
- AI-driven edit planning via Groq / conversational brief builder
- a structured multi-stage pipeline runner with per-job artifact storage
- a Shotstack rendering integration with timeline assembly logic
- a React frontend with job status tracking and Google Drive/YouTube OAuth
| Feature | Detail |
|---|---|
| 🎬 Reference video analysis | Scene detection, OCR extraction, structure parsing |
| 🤖 AI edit planning | Groq-powered conversational brief → structured edit plan |
| 🗂 Stage-based pipeline | Ordered stages with state persistence per job |
| 🎞 Shotstack rendering | Timeline assembly → cloud render → artifact storage |
| ✂️ Shorts conversion | 16:9 → 9:16 crop, reframe, and post-process |
| 📤 YouTube upload | OAuth 2.0 integration, metadata, direct publish |
| 🗄 Google Drive ingestion | Service-account or OAuth-based asset retrieval |
| 🧪 Unit tests | Coverage for normalization, overlay policy, text segments |
```mermaid
graph TD
    User(["User / Browser"])
    FE["React Frontend\nVite + REST"]
    API["FastAPI Backend\napp.py"]
    CHAT["Chatbot Interface\nGroq LLM"]
    BRIEF["Edit Brief JSON"]
    ANA["Analyzer\nEasyOCR · PaddleOCR · SceneDetect"]
    PLAN["Edit Plan JSON"]
    RUNNER["Pipeline Runner\npipeline/runner.py"]
    DL["Downloader\nyt-dlp · Google Drive"]
    EDITOR["Editor Builder\nShotstack Timeline"]
    OVERLAY["Overlay Planner"]
    SHORTS["Shorts Converter"]
    SHOTSTACK["Shotstack Render API"]
    ARTIFACTS["Artifact Storage\ntmp/jobs/job_id/"]
    UPLOAD["YouTube Uploader\nGoogle OAuth"]
    GDRIVE["Google Drive"]
    User -->|"chat brief + clips"| FE
    FE -->|"REST calls"| API
    API --> CHAT
    CHAT --> BRIEF
    API --> ANA
    ANA --> PLAN
    BRIEF --> RUNNER
    PLAN --> RUNNER
    API --> RUNNER
    RUNNER --> DL
    RUNNER --> EDITOR
    RUNNER --> OVERLAY
    RUNNER --> SHORTS
    DL --> GDRIVE
    EDITOR -->|"render job"| SHOTSTACK
    SHOTSTACK -->|"video URL"| ARTIFACTS
    SHORTS --> UPLOAD
    ARTIFACTS --> FE
    UPLOAD --> FE
```
- User submits a brief via the React chat interface → Groq LLM refines it into a structured edit plan.
- Reference video is analyzed — scenes are detected, text overlays are OCR-extracted, structure is mapped.
- Pipeline runner executes ordered stages: asset download → edit assembly → overlay planning → render submission.
- Shotstack renders the timeline; the backend polls for completion and stores the artifact.
- Optional post-processing converts the render to a 9:16 Short and uploads to YouTube.
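The stage sequence above can be sketched as a minimal runner loop with per-job state persistence. The stage names, state layout, and `tmp/jobs/` path here are illustrative only; the real `pipeline/runner.py` adds retry logic and much richer state handling:

```python
import json
from pathlib import Path

# Each stage takes and returns a job-state dict (toy implementations).
def download_assets(state):
    state["assets"] = [f"clip_{i}.mp4" for i in range(state["clip_count"])]
    return state

def assemble_edit(state):
    state["timeline"] = {"clips": state["assets"]}
    return state

def plan_overlays(state):
    state["overlays"] = []
    return state

def submit_render(state):
    state["render_status"] = "queued"
    return state

STAGES = [download_assets, assemble_edit, plan_overlays, submit_render]

def run_job(job_id, state, job_root="tmp/jobs"):
    """Run stages in order, persisting state to disk after each one."""
    job_dir = Path(job_root) / job_id
    job_dir.mkdir(parents=True, exist_ok=True)
    for stage in STAGES:
        state = stage(state)
        state["last_stage"] = stage.__name__
        (job_dir / "state.json").write_text(json.dumps(state, indent=2))
    return state
```

Persisting after every stage is what lets a crashed or restarted job report exactly where it stopped.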
See docs/architecture.md for a full module breakdown.
| Layer | Technology |
|---|---|
| Backend API | Python 3.10+, FastAPI, Uvicorn |
| AI / Analysis | EasyOCR, PaddleOCR, SceneDetect, OpenCV, Groq API |
| Edit Planning | Custom planner + LLM-assisted brief builder |
| Rendering | Shotstack SDK (cloud video rendering) |
| Asset Ingestion | yt-dlp, Google Drive API (service account + OAuth) |
| Export | YouTube Data API v3, Google Auth OAuthlib |
| Frontend | React + Vite |
| Tests | pytest |
| Containerization | Docker |
```text
AI_Editor/
├── app.py                     # FastAPI entrypoint — all HTTP routes
├── Dockerfile                 # Container build
├── requirements.txt           # Python dependencies
├── .env.example               # Environment variable reference
│
├── ai_editor/                 # Core AI & media logic
│   ├── analyzer.py            # Scene detection, OCR, video analysis
│   ├── chatbot_interface.py   # Groq-powered brief builder
│   ├── downloader.py          # yt-dlp + Google Drive asset fetching
│   ├── editor.py              # Shotstack timeline assembly
│   ├── overlay_planner.py     # Text/graphic overlay scheduling
│   ├── youtube_clipper.py     # Clip extraction and trimming
│   ├── youtube_uploader.py    # YouTube OAuth upload flow
│   └── google_auth.py         # Google credential management
│
├── pipeline/                  # Orchestration layer
│   ├── runner.py              # Stage runner (main orchestrator — ~60 KB)
│   ├── state.py               # Per-job state machine
│   ├── artifacts.py           # Artifact path resolution and storage
│   ├── plans/                 # Edit plan schemas and planners
│   └── storage/               # Job storage helpers
│
├── frontend/                  # React UI (Vite)
│
├── docs/                      # Documentation
│   ├── assets/                # Screenshots and demo GIF
│   ├── releases/              # Release note drafts
│   ├── API_EXAMPLES.md
│   ├── DEPLOYMENT.md
│   ├── PROJECT_STRUCTURE.md
│   ├── SETUP_GUIDE.md
│   ├── TROUBLESHOOTING.md
│   ├── architecture.md
│   └── pipeline_state.md
│
└── tests/
    ├── test_editor_normalization.py
    ├── test_overlay_policy.py
    └── test_text_segments.py
```
- Python 3.10+
- Node.js 18+
- A Shotstack API key (Stage key is free for development)
- Optionally: Google Cloud service account for Drive ingestion, Groq API key
```bash
git clone https://github.com/CarlAmine/AI_Editor.git
cd AI_Editor
python -m venv .venv && source .venv/bin/activate  # Windows: .venv\Scripts\activate
pip install -r requirements.txt
cp .env.example .env
# Edit .env with your API keys (see Configuration section below)
```

```bash
python app.py
# API available at http://localhost:8000
# Interactive docs at http://localhost:8000/docs
```

```bash
cd frontend
npm install
npm run dev
# UI available at http://localhost:5173
```

```bash
docker build -t ai-editor .
docker run -p 8000:8000 --env-file .env ai-editor
```

All configuration is via environment variables. Copy `.env.example` to `.env` and fill in:
| Variable | Required | Description |
|---|---|---|
| `SHOTSTACK_KEY` | ✅ | Shotstack API key (Stage or Production) |
| `GROQ` | ✅ | Groq API key for conversational brief builder |
| `GOOGLE_APPLICATION_CREDENTIALS` | Optional | Path to service account JSON for Drive access |
| `VIDEO_FOLDER` | Optional | Google Drive folder ID for source assets |
| `MUSIC_URL` | Optional | Default background music track URL |
| `DEEPSEEK_KEY` | Optional | Reserved for future LLM integration |
See docs/SETUP_GUIDE.md for full configuration details.
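A minimal loader for these variables might look like the following. This is a sketch of the fail-fast pattern, not the project's actual configuration code:

```python
import os

REQUIRED = ["SHOTSTACK_KEY", "GROQ"]
OPTIONAL = ["GOOGLE_APPLICATION_CREDENTIALS", "VIDEO_FOLDER", "MUSIC_URL", "DEEPSEEK_KEY"]

def load_config(env=None):
    """Read settings from the environment, failing fast on missing required keys."""
    env = os.environ if env is None else env
    missing = [k for k in REQUIRED if not env.get(k)]
    if missing:
        raise RuntimeError(f"Missing required env vars: {', '.join(missing)}")
    cfg = {k: env[k] for k in REQUIRED}
    cfg.update({k: env.get(k) for k in OPTIONAL})  # optional keys default to None
    return cfg
```

Failing at startup rather than mid-pipeline avoids wasting a render on a job that cannot complete.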
The FastAPI backend exposes a REST API. Interactive docs are at http://localhost:8000/docs.
```bash
# Start a new edit job
curl -X POST http://localhost:8000/jobs \
  -H 'Content-Type: application/json' \
  -d '{"reference_url": "https://...", "brief": "60s highlight reel, energetic style"}'

# Poll job status
curl http://localhost:8000/jobs/{job_id}/status

# Get rendered artifact
curl http://localhost:8000/jobs/{job_id}/artifact
```

See docs/API_EXAMPLES.md for full request/response examples.
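From a client, the status polling can be wrapped in a small helper with exponential back-off. This is a sketch: `fetch` stands in for an HTTP GET of `/jobs/{job_id}/status`, and the status strings are assumptions, not the API's documented values:

```python
import time

def poll_job(fetch, job_id, base_delay=1.0, max_delay=30.0, timeout=600.0):
    """Poll job status until it reaches a terminal state.

    `fetch(job_id)` should return a dict like {"status": "rendering"}.
    """
    deadline = time.monotonic() + timeout
    delay = base_delay
    while time.monotonic() < deadline:
        status = fetch(job_id)["status"]
        if status in ("done", "failed"):
            return status
        time.sleep(delay)
        delay = min(delay * 2, max_delay)  # exponential back-off, capped
    raise TimeoutError(f"job {job_id} did not finish within {timeout}s")
```

In practice `fetch` would be something like `lambda j: requests.get(f"{base}/jobs/{j}/status").json()`.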
```bash
# Run all tests
pytest tests/ -v

# Run a specific suite
pytest tests/test_editor_normalization.py -v
```

Test coverage:

- `test_editor_normalization.py` — timeline normalization and clip boundary logic
- `test_overlay_policy.py` — overlay scheduling and policy enforcement
- `test_text_segments.py` — text segment parsing and validation
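As an illustration of the kind of logic the normalization suite exercises, consider clamping, sorting, and merging clip segments on a timeline. The helper below is hypothetical, not the project's actual API:

```python
def normalize_segments(segments, duration):
    """Clamp (start, end) segments to [0, duration], drop empties,
    sort by start time, and merge overlapping segments."""
    clamped = sorted(
        (max(0.0, s), min(duration, e))
        for s, e in segments
        if min(duration, e) > max(0.0, s)
    )
    merged = []
    for start, end in clamped:
        if merged and start <= merged[-1][1]:
            # Overlaps the previous segment: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```

Edge cases like negative starts, ends past the video duration, and touching segments are exactly where off-by-one bugs hide, which is why this logic warrants dedicated tests.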
- Multi-stage pipeline orchestration — `pipeline/runner.py` coordinates ordered stages with state transitions, retry logic, and per-job artifact isolation.
- AI-assisted edit planning — Groq LLM powers the conversational brief builder; output is structured into a machine-readable edit plan JSON.
- Scene-aware video analysis — SceneDetect-based shot boundary detection combined with EasyOCR and PaddleOCR for text extraction from frames.
- Shotstack timeline assembly — `ai_editor/editor.py` programmatically constructs Shotstack render specs from clip lists, overlays, and timing metadata.
- Overlay planning layer — `overlay_planner.py` schedules text/graphic elements respecting duration constraints and scene boundaries.
- Shorts conversion flow — automatic 16:9 → 9:16 reframe and post-processing for vertical delivery.
- Full-stack architecture — FastAPI backend + React frontend, communicating over REST, with Docker support.
- Google ecosystem integration — OAuth 2.0 for YouTube upload, service account support for Drive ingestion.
- Unit test coverage — pytest suites covering normalization edge cases, overlay policy, and segment logic.
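The overlay-planning idea can be illustrated with a small scheduler that clips each overlay to the scene it starts in and enforces a minimum on-screen duration. The data shapes here are assumptions for the sketch, not `overlay_planner.py`'s real interface:

```python
import bisect

def schedule_overlays(overlays, scene_bounds, min_duration=0.5):
    """Place overlays so none crosses a scene boundary.

    overlays: list of (start, desired_duration, text)
    scene_bounds: sorted scene start times, e.g. [0.0, 4.2, 9.7]
    Returns (start, end, text) tuples, dropping overlays that would
    end up shorter than min_duration after clipping.
    """
    placed = []
    for start, dur, text in overlays:
        # Find the end of the scene this overlay starts in.
        i = bisect.bisect_right(scene_bounds, start)
        scene_end = scene_bounds[i] if i < len(scene_bounds) else float("inf")
        end = min(start + dur, scene_end)
        if end - start >= min_duration:
            placed.append((start, end, text))
    return placed
```

Cutting overlays at scene boundaries avoids text lingering over an unrelated shot, which is the main policy the real planner enforces.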
🚧 Benchmarking data to be added. Run the pipeline on representative inputs and open a PR to fill in the table.
| Metric | Value | Notes |
|---|---|---|
| Average job duration | — | End-to-end, reference → rendered artifact |
| Shotstack render turnaround | — | Dependent on clip count and resolution |
| OCR extraction latency | — | Per frame, GPU vs CPU |
| Scene detection latency | — | Per minute of video |
| Shorts conversion time | — | Post-render |
| Pipeline success rate | — | Under normal load |
- Shotstack rendering is asynchronous; long videos may require extended polling.
- PaddleOCR has a large install footprint; a lighter OCR backend is on the roadmap.
- Google Drive OAuth tokens require manual refresh in some environments.
- The frontend does not yet support drag-and-drop clip reordering.
- No built-in queue/worker system; concurrent jobs run as in-process threads.
See docs/TROUBLESHOOTING.md for workarounds.
**Short-term**
- Structured logging and per-stage timing metrics
- Shotstack polling with exponential back-off
- Asset validation before pipeline start
- Expand test coverage to pipeline runner stages
- CI workflow (GitHub Actions)
**Medium-term**
- Lighter OCR backend option
- Richer timeline editing UI (drag-and-drop, waveform preview)
- Additional rendering backends (Creatomate, Remotion)
- Smarter shot selection via visual similarity scoring
- Automated caption generation (Whisper)
- Task queue (Celery / RQ) for concurrent job isolation
See docs/DEPLOYMENT.md for Docker-based deployment, reverse proxy setup, and production key configuration.
See CONTRIBUTING.md for development setup, coding conventions, and how to submit changes.
MIT — see LICENSE.




