Automatically generate viral-ready vertical short clips from long-form gameplay footage using AI-powered scene analysis, GPU-accelerated rendering, and optional AI voiceovers.
AutoShorts analyzes your gameplay videos to identify the most engaging moments: action sequences, funny fails, or highlight achievements. It then automatically crops, renders, and adds subtitles or AI voiceovers to create ready-to-upload short-form content.
Here are some shorts automatically generated from gameplay footage:
*(preview images: samples 1–4)*
AutoShorts automatically adapts its editing style, captions, and voiceover personality based on the content and target language. Here are some examples generated entirely by the pipeline:
| Content | Style | Language | Video |
|---|---|---|---|
| Fortnite | Story Roast | 🇺🇸 English | Watch Part 1 |
| Indiana Jones | GenZ Slang | 🇺🇸 English | Watch Part 1 |
| Battlefield 6 | Dramatic Story | 🇯🇵 Japanese | Watch Part 1 |
| Indiana Jones | Story News | 🇨🇳 Chinese | Watch Part 1 |
| Fortnite | Story Roast | 🇪🇸 Spanish | Watch Part 1 |
| Fortnite | Story Roast | 🇷🇺 Russian | Watch Part 1 |
| Indiana Jones | Auto Gameplay | 🇧🇷 Portuguese | Watch Part 1 |
- Multi-Provider Support: Choose between OpenAI (GPT-5-mini, GPT-4o) or Google Gemini for scene analysis, or run in `local` mode with heuristic scoring (no API needed)
- Gemini Deep Analysis Mode: Upload the full video to Gemini for context-aware scene detection; the AI sees the whole game, not just short clips
- 7 Semantic Types (all analyzed automatically):
  - `action`: Combat, kills, intense gameplay, close calls
  - `funny`: Fails, glitches, unexpected humor, comedic timing
  - `clutch`: 1vX situations, comebacks, last-second wins
  - `wtf`: Unexpected events, "wait what?" moments, random chaos
  - `epic_fail`: Embarrassing deaths, tragic blunders, game-losing mistakes
  - `hype`: Celebrations, "LET'S GO" energy, peak excitement
  - `skill`: Trick shots, IQ plays, advanced mechanics, impressive techniques
- Speech Mode: Uses OpenAI Whisper to transcribe voice/commentary
- AI Captions Mode: AI-generated contextual captions for gameplay without voice
- Caption Styles:
  - Classic: `gaming`, `dramatic`, `funny`, `minimal`
  - GenZ Mode: `genz`, slang-heavy reactions ("bruh", "no cap", "finna")
  - Story Modes: narrative-style captions
    - `story_news`: Professional esports broadcaster
    - `story_roast`: Sarcastic roasting commentary
    - `story_creepypasta`: Horror/tension narratives
    - `story_dramatic`: Epic cinematic narration
  - `auto`: Auto-match style to detected semantic type
- PyCaps Integration: Multiple visual templates including `hype`, `retro-gaming`, `neo-minimal`
- AI Enhancement: Semantic tagging and emoji suggestions (e.g., "HEADSHOT!")
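The `auto` style described above can be pictured as a lookup from the detected semantic type to a caption style. The mapping below is purely illustrative (the pipeline's actual table may differ):

```python
# Hypothetical mapping from detected semantic type to caption style.
# Illustrative only; the real pipeline's auto-matching may differ.
SEMANTIC_TO_STYLE = {
    "action": "gaming",
    "funny": "funny",
    "clutch": "dramatic",
    "wtf": "genz",
    "epic_fail": "story_roast",
    "hype": "genz",
    "skill": "story_news",
}

def resolve_caption_style(configured_style: str, semantic_type: str) -> str:
    """Return the configured style, or auto-match it from the semantic type."""
    if configured_style == "auto":
        return SEMANTIC_TO_STYLE.get(semantic_type, "gaming")
    return configured_style
```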
- Voice Design Engine: Powered by Qwen3-TTS 1.7B-VoiceDesign for creating unique voices from natural language descriptions
- Dynamic Voice Generation: AI automatically generates voice persona based on caption style + caption content
- Style-Adaptive Voices: Each caption style has a unique voice preset:
- GenZ → Casual energetic voice with modern slang
- Story News → Professional broadcaster
- Story Roast → Sarcastic playful narrator
- Story Creepypasta → Deep ominous voice with tension
- Story Dramatic → Epic movie-trailer narrator
- Natural Language Instructions: Define voice characteristics via text prompts without needing reference audio
- Ultra-Low Latency: Local inference with FlashAttention 2 optimization
- Multilingual Support: Native support for 10+ languages including English, Chinese, Japanese, Korean
- Smart Mixing: Automatic ducking of game audio when voiceover plays
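The smart-mixing step amounts to attenuating the game track wherever voiceover samples exist, then overlaying the voice. A minimal sketch over mono float samples, using the default volumes from the configuration (a simplification of whatever the pipeline actually does internally):

```python
def duck_and_mix(game, voice, game_volume=0.3, voice_volume=1.0):
    """Duck game audio to `game_volume` while the voiceover plays, then
    overlay the voiceover. Mono float samples in [-1, 1] assumed."""
    mixed = list(game)
    for i in range(min(len(mixed), len(voice))):
        # Attenuate the game track under the voice, then add the voice.
        mixed[i] = mixed[i] * game_volume + voice[i] * voice_volume
    # Clamp to the valid sample range to avoid clipping artifacts.
    return [max(-1.0, min(1.0, s)) for s in mixed]
```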
- Scene Detection: Custom implementation using `decord` + PyTorch on GPU
- Audio Analysis: `torchaudio` on GPU for fast RMS and spectral flux calculation
- Video Analysis: GPU streaming via `decord` for stable motion estimation
- Image Processing: `cupy` (CUDA-accelerated NumPy) for blur and transforms
- Rendering: PyTorch + NVENC hardware encoder for ultra-fast rendering
- Scenes ranked by combined action score (audio 0.6 + video 0.4 weights)
- Configurable aspect ratio (default 9:16 for TikTok/Shorts/Reels)
- Smart cropping with optional blurred background for non-vertical footage
- Retry logic during rendering to avoid spurious failures
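The ranking step above reduces to a weighted blend of the two heuristic signals:

```python
def combined_action_score(audio_score: float, video_score: float,
                          audio_weight: float = 0.6,
                          video_weight: float = 0.4) -> float:
    """Blend per-scene audio and video activity into one ranking score,
    using the default 0.6/0.4 weighting described above."""
    return audio_weight * audio_score + video_weight * video_score

# Scenes are then sorted by this score, highest first.
```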
AutoShorts is designed to keep working even when its preferred components are unavailable:
| Component | Primary | Fallback |
|---|---|---|
| Video Encoding | NVENC (GPU) | libx264 (CPU) |
| Subtitle Rendering | PyCaps (styled) | FFmpeg burn-in (basic) |
| AI Analysis | OpenAI/Gemini API | Heuristic scoring (local mode) |
| TTS Device | GPU (6GB+ VRAM) | CPU Fallback (slower) |
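The encoding fallback in the first row could be sketched as below. This is a hypothetical helper, not the project's actual code path (the real pipeline renders through PyTorch + NVENC with its own retry logic); the `run` parameter is injectable so the fallback order can be exercised without FFmpeg installed:

```python
import subprocess

def encode_with_fallback(input_path, output_path, run=None):
    """Try the NVENC hardware encoder first, then fall back to libx264.
    `run` executes a command and returns its exit code; the default
    shells out to ffmpeg."""
    if run is None:
        run = lambda cmd: subprocess.run(cmd, capture_output=True).returncode
    for codec in ("h264_nvenc", "libx264"):
        cmd = ["ffmpeg", "-y", "-i", input_path, "-c:v", codec, output_path]
        if run(cmd) == 0:
            return codec  # report which encoder succeeded
    raise RuntimeError("both NVENC and libx264 encoding failed")
```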
- NVIDIA GPU with CUDA support (6GB+ VRAM recommended for Qwen3-TTS 1.7B)
- NVIDIA Drivers and System RAM (16GB+ recommended)
- Python 3.10
- FFmpeg 4.4.2 (for Decord compatibility)
- CUDA Toolkit with `nvcc` (for building Decord from source)
- System libraries: `libgl1`, `libglib2.0-0`
The Makefile handles everything automatically: environment creation, dependency installation, and building Decord with CUDA support.
```bash
git clone https://github.com/divyaprakash0426/autoshorts.git
cd autoshorts

# Run the installer (uses conda/micromamba automatically)
make install

# Setup environment variables
cp .env.example .env
# Edit .env and add your API keys (Gemini/OpenAI)

# Activate the environment
overlay use .venv/bin/activate.nu  # For Nushell
# OR
source .venv/bin/activate          # For Bash/Zsh
```

The Makefile will:
- Download micromamba if conda/mamba is not found
- Create a Python 3.10 environment with FFmpeg 4.4.2
- Install NV Codec Headers for NVENC support
- Build Decord from source with CUDA enabled
- Install all pip requirements
Prerequisite: NVIDIA Container Toolkit must be installed.
```bash
# Build the image
docker build -t autoshorts .

# Run with GPU access
docker run --rm \
  --gpus all \
  -v $(pwd)/gameplay:/app/gameplay \
  -v $(pwd)/generated:/app/generated \
  --env-file .env \
  autoshorts
```

Note: The `--gpus all` flag is essential for NVENC and CUDA acceleration.
Copy `.env.example` to `.env` and configure:

```bash
cp .env.example .env
```

| Category | Variable | Description |
|---|---|---|
| AI Provider | `AI_PROVIDER` | `openai`, `gemini`, or `local` (heuristic-only, no API) |
| | `AI_ANALYSIS_ENABLED` | Enable/disable AI scene analysis |
| | `GEMINI_DEEP_ANALYSIS` | Gemini-only: upload full video for smarter scene detection (slower initial upload, better results) |
| | `OPENAI_MODEL` | Model for analysis (e.g., `gpt-5-mini`) |
| | `AI_SCORE_WEIGHT` | How much to weight AI vs heuristic (0.0-1.0) |
| Semantic Analysis | `SEMANTIC_TYPES` | All 7 types analyzed: `action`, `funny`, `clutch`, `wtf`, `epic_fail`, `hype`, `skill` |
| | `CANDIDATE_CLIP_COUNT` | Number of clips to analyze |
| Subtitles | `ENABLE_SUBTITLES` | Enable subtitle generation |
| | `SUBTITLE_MODE` | `speech` (Whisper), `ai_captions`, or `none` |
| | `CAPTION_STYLE` | `gaming`, `dramatic`, `funny`, `minimal`, `genz`, `story_news`, `story_roast`, `story_creepypasta`, `story_dramatic`, `auto` |
| | `PYCAPS_TEMPLATE` | Visual template for captions |
| TTS Voiceover | `ENABLE_TTS` | Enable Qwen3-TTS voiceover |
| | `TTS_LANGUAGE` | Language code (`en`, `zh`, `ja`, `ko`, `de`, `fr`, `ru`, `pt`, `es`, `it`) |
| | `TTS_VOICE_DESCRIPTION` | Natural language voice description (auto-generated if empty) |
| | `TTS_GAME_AUDIO_VOLUME` | Game audio volume when TTS plays (0.0-1.0, default 0.3) |
| | `TTS_VOICEOVER_VOLUME` | TTS voiceover volume (0.0-1.0, default 1.0) |
| Video Output | `TARGET_RATIO_W/H` | Aspect ratio (default 9:16) |
| | `SCENE_LIMIT` | Max clips per source video |
| | `MIN/MAX_SHORT_LENGTH` | Clip duration bounds (seconds) |

See `.env.example` for the complete list with detailed descriptions.
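As an illustration, a `.env` for Gemini-based analysis with GenZ captions and English TTS might look like the following. Values and boolean syntax are illustrative; check `.env.example` for the exact accepted forms:

```shell
AI_PROVIDER=gemini
GEMINI_DEEP_ANALYSIS=true
ENABLE_SUBTITLES=true
SUBTITLE_MODE=ai_captions
CAPTION_STYLE=genz
ENABLE_TTS=true
TTS_LANGUAGE=en
SCENE_LIMIT=3
```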
1. Place source videos in the `gameplay/` directory
2. Run the script:

   ```bash
   python run.py
   ```

3. Generated clips are saved to `generated/`
Launch the local dashboard to configure settings, start jobs, and preview clips:
```bash
streamlit run src/dashboard/About.py
```

*(dashboard screenshots: About, Generate, Browse, Features, Settings, Roadmap)*
```
generated/
├── video_name scene-0.mp4         # Rendered short clip
├── video_name scene-0_sub.json    # Subtitle data
├── video_name scene-0.ffmpeg.log  # Render log
├── video_name scene-1.mp4
└── ...
```
```bash
pip install ruff
ruff check .
```

```bash
pytest -q
```

Tests mock GPU availability and can run in standard CI environments.
For faster iteration during development, you can skip expensive steps using these environment variables in your .env:
| Variable | Description |
|---|---|
| `DEBUG_SKIP_ANALYSIS=1` | Skip AI scene analysis (uses cached/heuristic scores) |
| `DEBUG_SKIP_RENDER=1` | Skip video rendering (useful for testing analysis only) |
| `DEBUG_RENDERED_CLIPS="path1:category,path2"` | Test with specific pre-rendered clips |
Example workflow for testing subtitles only:
```bash
# In .env
DEBUG_SKIP_ANALYSIS=1
DEBUG_SKIP_RENDER=1
DEBUG_RENDERED_CLIPS="generated/test_clip.mp4:action"
```

| Issue | Solution |
|---|---|
| "CUDA not available" | Ensure `--gpus all` (Docker) or that the CUDA toolkit is installed |
| NVENC error | Falls back to libx264 automatically; check GPU driver |
| PyCaps fails | Falls back to FFmpeg burn-in subtitles automatically |
| Decord EOF hang | Increase `DECORD_EOF_RETRY_MAX` or set `DECORD_SKIP_TAIL_FRAMES=300` |
| API rate limits | Switch to `gpt-5-mini` (10M free tokens/day) or use the `local` provider |
We love contributions! Whether you're fixing a bug, adding a feature, or improving documentation:
- Check out our Contributing Guide to get started.
- See the Roadmap for our future plans (YOLO Auto-Zoom, Next-Gen TTS, etc.).
This project builds upon the excellent work of:
- artryazanov/shorts-maker-gpu: Heuristics-based shorts maker
- Binary-Bytes/Auto-YouTube-Shorts-Maker: Original concept and inspiration
This project is licensed under the MIT License.
Note: All donations go to charity.










