
AI Editor

AI-assisted video editing pipeline — from reference analysis to rendered output, with full-stack orchestration, Shorts conversion, and one-click publishing.



Demo


What It Does

AI Editor is a multi-stage media pipeline that accepts a reference video and a set of source clips, uses AI analysis to understand structure and style, builds a stage-based edit plan, renders a polished output via the Shotstack API, and optionally converts the result to a vertical Short and publishes to YouTube.

It is not a simple wrapper around an LLM. It integrates:

  • computer-vision based video & scene analysis (EasyOCR, PaddleOCR, SceneDetect)
  • AI-driven edit planning via Groq / conversational brief builder
  • a structured multi-stage pipeline runner with per-job artifact storage
  • a Shotstack rendering integration with timeline assembly logic
  • a React frontend with job status tracking and Google Drive/YouTube OAuth
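For illustration, the core idea behind content-based cut detection (the technique SceneDetect's `ContentDetector` applies) can be sketched in a few lines of plain Python — the function and its scoring input are illustrative, not the project's API:

```python
def detect_scene_cuts(diff_scores, threshold=27.0, min_scene_len=15):
    """Return frame indices where a new scene starts.

    diff_scores[i] is the content difference between frame i and frame i-1
    (SceneDetect's ContentDetector computes a similar HSV-based score).
    A cut is declared when the score spikes past the threshold, but never
    closer than min_scene_len frames to the previous cut.
    """
    cuts = [0]  # the video always starts a scene at frame 0
    for i, score in enumerate(diff_scores):
        if score >= threshold and i - cuts[-1] >= min_scene_len:
            cuts.append(i)
    return cuts

# Synthetic scores: spikes at frames 40 and 90 mark two hard cuts.
scores = [2.0] * 120
scores[40] = scores[90] = 55.0
print(detect_scene_cuts(scores))  # [0, 40, 90]
```

The real analyzer additionally feeds detected scenes into the OCR stage so text overlays can be attributed to specific shots.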

Key Features

| Feature | Detail |
| --- | --- |
| 🎬 Reference video analysis | Scene detection, OCR extraction, structure parsing |
| 🤖 AI edit planning | Groq-powered conversational brief → structured edit plan |
| 🗂 Stage-based pipeline | Ordered stages with state persistence per job |
| 🎞 Shotstack rendering | Timeline assembly → cloud render → artifact storage |
| ✂️ Shorts conversion | 16:9 → 9:16 crop, reframe, and post-process |
| 📤 YouTube upload | OAuth 2.0 integration, metadata, direct publish |
| 🗄 Google Drive ingestion | Service-account or OAuth-based asset retrieval |
| 🧪 Unit tests | Coverage for normalization, overlay policy, text segments |

Architecture

```mermaid
graph TD
    User(["User / Browser"])
    FE["React Frontend\nVite + REST"]
    API["FastAPI Backend\napp.py"]
    CHAT["Chatbot Interface\nGroq LLM"]
    BRIEF["Edit Brief JSON"]
    ANA["Analyzer\nEasyOCR · PaddleOCR · SceneDetect"]
    PLAN["Edit Plan JSON"]
    RUNNER["Pipeline Runner\npipeline/runner.py"]
    DL["Downloader\nyt-dlp · Google Drive"]
    EDITOR["Editor Builder\nShotstack Timeline"]
    OVERLAY["Overlay Planner"]
    SHORTS["Shorts Converter"]
    SHOTSTACK["Shotstack Render API"]
    ARTIFACTS["Artifact Storage\ntmp/jobs/job_id/"]
    UPLOAD["YouTube Uploader\nGoogle OAuth"]
    GDRIVE["Google Drive"]

    User -->|"chat brief + clips"| FE
    FE -->|"REST calls"| API
    API --> CHAT
    CHAT --> BRIEF
    API --> ANA
    ANA --> PLAN
    BRIEF --> RUNNER
    PLAN --> RUNNER
    API --> RUNNER
    RUNNER --> DL
    RUNNER --> EDITOR
    RUNNER --> OVERLAY
    RUNNER --> SHORTS
    DL --> GDRIVE
    EDITOR -->|"render job"| SHOTSTACK
    SHOTSTACK -->|"video URL"| ARTIFACTS
    SHORTS --> UPLOAD
    ARTIFACTS --> FE
    UPLOAD --> FE
```

Request Flow

  1. User submits a brief via the React chat interface → Groq LLM refines it into a structured edit plan.
  2. Reference video is analyzed — scenes are detected, text overlays are OCR-extracted, structure is mapped.
  3. Pipeline runner executes ordered stages: asset download → edit assembly → overlay planning → render submission.
  4. Shotstack renders the timeline; the backend polls for completion and stores the artifact.
  5. Optional post-processing converts the render to a 9:16 Short and uploads to YouTube.
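In miniature, the stage-runner pattern behind steps 3 and 4 looks like this. The stage bodies and names here are hypothetical stand-ins; the real orchestrator in pipeline/runner.py adds retries, error handling, and a much richer state machine:

```python
import json
import pathlib

# Hypothetical stage functions standing in for the real pipeline stages.
def download(ctx):
    ctx["assets"] = ["clip1.mp4"]
    return ctx

def assemble(ctx):
    ctx["timeline"] = {"clips": ctx["assets"]}
    return ctx

def render(ctx):
    ctx["artifact"] = "out.mp4"
    return ctx

STAGES = [("download", download), ("assemble", assemble), ("render", render)]

def run_job(job_id, workdir="tmp/jobs"):
    """Execute stages in order, persisting state after each one so a
    crashed job can be inspected (and, in principle, resumed)."""
    job_dir = pathlib.Path(workdir) / job_id
    job_dir.mkdir(parents=True, exist_ok=True)
    ctx = {"job_id": job_id}
    for name, stage in STAGES:
        ctx = stage(ctx)                          # run the stage
        state = {"last_stage": name, "ctx": ctx}  # checkpoint after it
        (job_dir / "state.json").write_text(json.dumps(state))
    return ctx
```

Persisting state between stages is what makes the per-job artifact directory (`tmp/jobs/<job_id>/`) the single source of truth for a job's progress.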

See docs/architecture.md for a full module breakdown.


Tech Stack

| Layer | Technology |
| --- | --- |
| Backend API | Python 3.10+, FastAPI, Uvicorn |
| AI / Analysis | EasyOCR, PaddleOCR, SceneDetect, OpenCV, Groq API |
| Edit Planning | Custom planner + LLM-assisted brief builder |
| Rendering | Shotstack SDK (cloud video rendering) |
| Asset Ingestion | yt-dlp, Google Drive API (service account + OAuth) |
| Export | YouTube Data API v3, Google Auth OAuthlib |
| Frontend | React + Vite |
| Tests | pytest |
| Containerization | Docker |

Repository Structure

```
AI_Editor/
├── app.py                    # FastAPI entrypoint — all HTTP routes
├── Dockerfile                # Container build
├── requirements.txt          # Python dependencies
├── .env.example              # Environment variable reference
│
├── ai_editor/                # Core AI & media logic
│   ├── analyzer.py           # Scene detection, OCR, video analysis
│   ├── chatbot_interface.py  # Groq-powered brief builder
│   ├── downloader.py         # yt-dlp + Google Drive asset fetching
│   ├── editor.py             # Shotstack timeline assembly
│   ├── overlay_planner.py    # Text/graphic overlay scheduling
│   ├── youtube_clipper.py    # Clip extraction and trimming
│   ├── youtube_uploader.py   # YouTube OAuth upload flow
│   └── google_auth.py        # Google credential management
│
├── pipeline/                 # Orchestration layer
│   ├── runner.py             # Stage runner (main orchestrator — ~60 KB)
│   ├── state.py              # Per-job state machine
│   ├── artifacts.py          # Artifact path resolution and storage
│   ├── plans/                # Edit plan schemas and planners
│   └── storage/              # Job storage helpers
│
├── frontend/                 # React UI (Vite)
│
├── docs/                     # Documentation
│   ├── assets/               # Screenshots and demo GIF
│   ├── releases/             # Release note drafts
│   ├── API_EXAMPLES.md
│   ├── DEPLOYMENT.md
│   ├── PROJECT_STRUCTURE.md
│   ├── SETUP_GUIDE.md
│   ├── TROUBLESHOOTING.md
│   ├── architecture.md
│   └── pipeline_state.md
│
└── tests/
    ├── test_editor_normalization.py
    ├── test_overlay_policy.py
    └── test_text_segments.py
```

Quick Start

Prerequisites

  • Python 3.10+
  • Node.js 18+
  • A Shotstack API key (Stage key is free for development)
  • Optionally: Google Cloud service account for Drive ingestion, Groq API key

1 — Clone and install

```bash
git clone https://github.com/CarlAmine/AI_Editor.git
cd AI_Editor
python -m venv .venv && source .venv/bin/activate  # Windows: .venv\Scripts\activate
pip install -r requirements.txt
```

2 — Configure environment

```bash
cp .env.example .env
# Edit .env with your API keys (see Configuration section below)
```

3 — Run the backend

```bash
python app.py
# API available at http://localhost:8000
# Interactive docs at http://localhost:8000/docs
```

4 — Run the frontend

```bash
cd frontend
npm install
npm run dev
# UI available at http://localhost:5173
```

Docker (optional)

```bash
docker build -t ai-editor .
docker run -p 8000:8000 --env-file .env ai-editor
```

Configuration

All configuration is via environment variables. Copy .env.example to .env and fill in:

| Variable | Required | Description |
| --- | --- | --- |
| SHOTSTACK_KEY | Yes | Shotstack API key (Stage or Production) |
| GROQ | Yes | Groq API key for conversational brief builder |
| GOOGLE_APPLICATION_CREDENTIALS | Optional | Path to service account JSON for Drive access |
| VIDEO_FOLDER | Optional | Google Drive folder ID for source assets |
| MUSIC_URL | Optional | Default background music track URL |
| DEEPSEEK_KEY | Optional | Reserved for future LLM integration |

See docs/SETUP_GUIDE.md for full configuration details.
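As a sketch, reading these variables at startup might look like the following. The helper name and the required/optional split are assumptions drawn from the table above, not the project's actual loader:

```python
import os

def load_config():
    """Read pipeline settings from the environment.

    Assumes .env has already been loaded into the process environment
    (e.g. via python-dotenv or `--env-file` in Docker).
    """
    cfg = {
        "shotstack_key": os.getenv("SHOTSTACK_KEY"),
        "groq_key": os.getenv("GROQ"),
        "drive_credentials": os.getenv("GOOGLE_APPLICATION_CREDENTIALS"),
        "video_folder": os.getenv("VIDEO_FOLDER"),
        "music_url": os.getenv("MUSIC_URL"),
    }
    # Fail fast on the key nothing can run without.
    if not cfg["shotstack_key"]:
        raise RuntimeError("SHOTSTACK_KEY is not set; rendering cannot run")
    return cfg
```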


API Examples

The FastAPI backend exposes a REST API. Interactive docs are at http://localhost:8000/docs.

```bash
# Start a new edit job
curl -X POST http://localhost:8000/jobs \
  -H 'Content-Type: application/json' \
  -d '{"reference_url": "https://...", "brief": "60s highlight reel, energetic style"}'

# Poll job status
curl http://localhost:8000/jobs/{job_id}/status

# Poll rendered artifact
curl http://localhost:8000/jobs/{job_id}/artifact
```
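The same polling flow in Python, as a sketch: the terminal status values and the response shape are assumptions here, so verify them against the interactive docs before relying on this:

```python
import json
import time
import urllib.request

# Assumed terminal states; check the API docs for the real set.
TERMINAL = {"done", "failed"}

def is_terminal(status: str) -> bool:
    return status.lower() in TERMINAL

def poll_job(base_url, job_id, interval=5.0, timeout=600.0):
    """Poll /jobs/{job_id}/status until a terminal state or timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        with urllib.request.urlopen(f"{base_url}/jobs/{job_id}/status") as resp:
            status = json.load(resp).get("status", "")
        if is_terminal(status):
            return status
        time.sleep(interval)
    raise TimeoutError(f"job {job_id} still running after {timeout}s")
```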

See docs/API_EXAMPLES.md for full request/response examples.


Screenshots

Chat Interface

Job Status

Timeline Plan

Render Flow


Testing

```bash
# Run all tests
pytest tests/ -v

# Run a specific suite
pytest tests/test_editor_normalization.py -v
```

Test coverage:

  • test_editor_normalization.py — timeline normalization and clip boundary logic
  • test_overlay_policy.py — overlay scheduling and policy enforcement
  • test_text_segments.py — text segment parsing and validation

Technical Highlights

  • Multi-stage pipeline orchestration — pipeline/runner.py coordinates ordered stages with state transitions, retry logic, and per-job artifact isolation.
  • AI-assisted edit planning — Groq LLM powers the conversational brief builder; output is structured into a machine-readable edit plan JSON.
  • Scene-aware video analysis — SceneDetect-based shot boundary detection combined with EasyOCR and PaddleOCR for text extraction from frames.
  • Shotstack timeline assembly — ai_editor/editor.py programmatically constructs Shotstack render specs from clip lists, overlays, and timing metadata.
  • Overlay planning layer — overlay_planner.py schedules text/graphic elements respecting duration constraints and scene boundaries.
  • Shorts conversion flow — automatic 16:9 → 9:16 reframe and post-processing for vertical delivery.
  • Full-stack architecture — FastAPI backend + React frontend, communicating over REST, with Docker support.
  • Google ecosystem integration — OAuth 2.0 for YouTube upload, service account support for Drive ingestion.
  • Unit test coverage — pytest suites covering normalization edge cases, overlay policy, and segment logic.
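As an illustration of the timeline-assembly idea, a minimal builder for a Shotstack-style render payload might look like this. The field names follow Shotstack's public JSON schema, but the helper itself is hypothetical; the real builder in ai_editor/editor.py also handles overlays, transitions, and timing metadata:

```python
def build_render_spec(clips, resolution="hd"):
    """Assemble a Shotstack-style render payload from (url, duration) pairs."""
    start, track_clips = 0.0, []
    for url, duration in clips:
        track_clips.append({
            "asset": {"type": "video", "src": url},
            "start": start,          # clips are laid back-to-back
            "length": duration,
        })
        start += duration
    return {
        "timeline": {"tracks": [{"clips": track_clips}]},
        "output": {"format": "mp4", "resolution": resolution},
    }

spec = build_render_spec([("https://cdn.example/a.mp4", 4.0),
                          ("https://cdn.example/b.mp4", 6.5)])
```

Laying clips back-to-back on one track is the simplest case; overlays typically go on additional tracks stacked above the base video.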

Performance & Benchmarks

🚧 Benchmarking data to be added. Run the pipeline on representative inputs and open a PR to fill in the table.

| Metric | Value | Notes |
| --- | --- | --- |
| Average job duration | TBD | End-to-end, reference → rendered artifact |
| Shotstack render turnaround | TBD | Dependent on clip count and resolution |
| OCR extraction latency | TBD | Per frame, GPU vs CPU |
| Scene detection latency | TBD | Per minute of video |
| Shorts conversion time | TBD | Post-render |
| Pipeline success rate | TBD | Under normal load |

Limitations & Known Issues

  • Shotstack rendering is asynchronous; long videos may require extended polling.
  • PaddleOCR has a large install footprint; a lighter OCR backend is on the roadmap.
  • Google Drive OAuth tokens require manual refresh in some environments.
  • The frontend does not yet support drag-and-drop clip reordering.
  • No built-in queue/worker system; concurrent jobs run as in-process threads.

See docs/TROUBLESHOOTING.md for workarounds.


Roadmap

Short-term

  • Structured logging and per-stage timing metrics
  • Shotstack polling with exponential back-off
  • Asset validation before pipeline start
  • Expand test coverage to pipeline runner stages
  • CI workflow (GitHub Actions)

Medium-term

  • Lighter OCR backend option
  • Richer timeline editing UI (drag-and-drop, waveform preview)
  • Additional rendering backends (Creatomate, Remotion)
  • Smarter shot selection via visual similarity scoring
  • Automated caption generation (Whisper)
  • Task queue (Celery / RQ) for concurrent job isolation

Deployment

See docs/DEPLOYMENT.md for Docker-based deployment, reverse proxy setup, and production key configuration.


Contributing

See CONTRIBUTING.md for development setup, coding conventions, and how to submit changes.


License

MIT — see LICENSE.
