A sophisticated, end-to-end pipeline for generating engaging YouTube Shorts videos from text stories. This project automates the entire workflow from story summarization through final video production, leveraging AI and advanced multimedia processing.
BrainRot transforms text-based stories into polished YouTube Shorts-optimized videos with:
- Intelligent story summarization using Google's Gemini AI
- Dynamic script generation with engaging narration
- High-quality text-to-speech voice synthesis (Edge-TTS)
- Automatic subtitle generation with speech-to-text (Faster-Whisper)
- Professional video composition with background integration
- Metadata generation (titles, hashtags, descriptions)
Target Format: Vertical videos (1080×1920) optimized for YouTube Shorts (55-85 seconds)
```
BrainRot/
├── main.py                 # Primary orchestration pipeline
├── summarizer.py           # AI-powered story summarization
├── script_generator.py     # Dynamic narration script generation
├── voice_generator.py      # Text-to-speech voice synthesis
├── subtitles.py            # Automatic subtitle generation
├── video_ffmpeg.py         # Video composition & encoding
├── video_maker.py          # Alternative video generation (MoviePy)
├── models.py               # Pydantic data models
├── trial.py                # Voice testing utility
├── main_1.py               # Alternative TTS pipeline (archived)
├── backgrounds/
│   └── minecraft.mp4       # Sample background video
├── Input Stories/          # Source story files
│   └── Humble Pi/
│       ├── Chapter 1/
│       │   ├── 1-1.txt through 1-5.txt
│       │   └── ...
│       └── Chapter 2/
│           └── *.txt files
└── output/                 # Generated video outputs
    └── [Video Title]/
        ├── voice.mp3
        ├── subtitles.ass
        ├── [title].mp4
        ├── hashtags.txt
        └── description.txt
```
Uses Google's Gemini 2.5 Flash model to intelligently summarize input stories.
Key Features:
- Removes personal author information
- Preserves all proper nouns and scientific facts
- Retains engaging human expressions
- Optimizes for YouTube Shorts narration
Usage:
```python
from summarizer import summarize_story

summary = summarize_story("./Input Stories/path/to/story.txt")
```

Converts summaries into fast-paced, scientifically accurate narration scripts.
Output Structure (Pydantic Model):
- `script_text`: 55-85 second narration
- `script_title`: Catchy video title
- `hashtags`: 5-10 relevant hashtags
- `description`: YouTube Shorts description
Generation Rules:
- Opens with the scientific/mathematical concept
- Uses conversational, light-hearted tone
- Includes clear causality explanation
- Avoids filler and exaggeration
- No special symbols or formatting
Usage:
```python
from script_generator import generate_script

script_text, title, hashtags, description = generate_script(summary)
```

Generates natural-sounding narration using Microsoft Edge-TTS.
Features:
- Random voice selection (Ryan or Sonia)
- Configurable speed multiplier (default: 1.15x for Shorts pacing)
- Async TTS generation
- Optional speed enhancement via FFmpeg
- Output: MP3 format
Available Voices:
- `en-GB-RyanNeural` (Male)
- `en-GB-SoniaNeural` (Female)
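Edge-TTS expresses speaking rate as a signed percentage string rather than a multiplier. The sketch below shows one way the speed multiplier could be translated and passed to the library; the helper names are mine, not the project's, and `edge_tts` is imported lazily so the pure conversion works without the dependency installed.

```python
import asyncio

def multiplier_to_rate(multiplier: float) -> str:
    """Convert a speed multiplier (e.g. 1.15) to an edge-tts rate string (e.g. '+15%')."""
    pct = round((multiplier - 1.0) * 100)
    return f"{'+' if pct >= 0 else ''}{pct}%"

async def synthesize(text: str, voice: str, out_path: str,
                     multiplier: float = 1.15) -> None:
    # Lazy import: the pure helper above stays usable without edge-tts installed.
    import edge_tts
    communicate = edge_tts.Communicate(text, voice,
                                       rate=multiplier_to_rate(multiplier))
    await communicate.save(out_path)

# multiplier_to_rate(1.15) -> "+15%"; multiplier_to_rate(0.9) -> "-10%"
```

Run the coroutine with `asyncio.run(synthesize(...))`; the default 1.15 mirrors the Shorts pacing described above.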
Usage:
```python
from voice_generator import generate_voice

voice_path = generate_voice(script_text, "output/voice.mp3", speed_multiplier=1.15)
```

Generates ASS-format subtitles using Faster-Whisper, a fast reimplementation of OpenAI's Whisper.
Features:
- Word-level timestamp accuracy
- Yellow-to-white highlight animation
- ASS format (compatible with FFmpeg subtitle filter)
- Optimized for vertical video (1080×1920)
- CPU-based inference (INT8 quantization)
Output Format:
- Each word receives individual timing
- Karaoke-style highlighting effect
- ASS metadata optimized for Shorts
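To make the per-word timing concrete, here is a rough sketch of how word timestamps could be turned into ASS Dialogue events with karaoke (`\k`) tags. The style name and tag values are assumptions, and a real file also needs `[Script Info]` and `[V4+ Styles]` headers, which are omitted here.

```python
def ass_time(seconds: float) -> str:
    """Format seconds as the H:MM:SS.cc timestamp ASS expects."""
    h = int(seconds // 3600)
    m = int(seconds % 3600 // 60)
    s = seconds % 60
    return f"{h}:{m:02d}:{s:05.2f}"

def word_event(word: str, start: float, end: float) -> str:
    """One Dialogue line per word; \\k holds the highlight duration in centiseconds."""
    dur_cs = int(round((end - start) * 100))
    return (f"Dialogue: 0,{ass_time(start)},{ass_time(end)},Default,,0,0,0,,"
            f"{{\\k{dur_cs}}}{word}")

# word_event("Hello", 0.0, 0.5)
# -> "Dialogue: 0,0:00:00.00,0:00:00.50,Default,,0,0,0,,{\k50}Hello"
```

FFmpeg's subtitle filter renders these tags directly, which is what produces the yellow-to-white highlight effect described above.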
Usage:
```python
from subtitles import generate_subtitles

subs_path = generate_subtitles("voice.mp3", "output/subtitles.ass")
```

Produces the final video using FFmpeg with advanced filtering.
Features:
- Vertical crop optimization (9:16 aspect ratio)
- Intelligent background timing (random offset within duration)
- Audio-video synchronization
- Subtitle embedding
- H.264 video codec (high profile, level 4.2)
- AAC audio codec (192k bitrate)
Processing Pipeline:
- Random background offset calculation
- Vertical crop filter (9:16 aspect ratio)
- Scale to 1080×1920
- Subtitle overlay
- Audio sync to narration duration
- Encoding to H.264/AAC
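The steps above roughly correspond to a single FFmpeg invocation. As a sketch of how that argument list might be assembled (the exact flags and filter order are assumptions, not the project's verbatim command):

```python
def build_ffmpeg_cmd(background, audio, subtitles, output, offset=0.0):
    """Assemble an FFmpeg argv: seek into the background, crop/scale to 9:16,
    burn in ASS subtitles, and mux the narration audio."""
    vf = (
        "crop=ih*9/16:ih:(iw-ih*9/16)/2:0,"  # center-crop to 9:16
        "scale=1080:1920,"                    # Shorts resolution
        f"ass={subtitles}"                    # burn in subtitles
    )
    return [
        "ffmpeg", "-y",
        "-ss", str(offset), "-i", background,  # random offset into the background
        "-i", audio,
        "-vf", vf,
        "-map", "0:v:0", "-map", "1:a:0",
        "-c:v", "libx264", "-profile:v", "high", "-level", "4.2",
        "-c:a", "aac", "-b:a", "192k",
        "-shortest",                           # stop when the narration ends
        output,
    ]

cmd = build_ffmpeg_cmd("backgrounds/minecraft.mp4", "voice.mp3",
                       "subtitles.ass", "out.mp4", offset=12.5)
```

The list can then be run with `subprocess.run(cmd, check=True)`; `-shortest` is what keeps the video length synced to the narration.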
Usage:
```python
from video_ffmpeg import make_video_ffmpeg

output = make_video_ffmpeg(
    background="./backgrounds/minecraft.mp4",
    audio="voice.mp3",
    subtitles="subtitles.ass",
    output="output/final_video.mp4",
)
```

Pydantic models ensuring type safety and validation.
Models:
- `Summarizer`: Summary output validation
- `Script`: Structured script generation output
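`models.py` implements these with Pydantic; to illustrate the shape and checks involved, here is a stdlib-only dataclass sketch of the `Script` model (field names follow the output structure above, but the exact constraints are assumptions):

```python
from dataclasses import dataclass

@dataclass
class Script:
    script_text: str
    script_title: str
    hashtags: list[str]
    description: str

    def __post_init__(self) -> None:
        # Mirror the generation rules: 5-10 hashtags per video.
        if not 5 <= len(self.hashtags) <= 10:
            raise ValueError("expected 5-10 hashtags")
        if not self.script_text.strip():
            raise ValueError("script_text must not be empty")
```

With Pydantic v2, the same constraints would typically be expressed via `Field(min_length=5, max_length=10)` and a field validator instead of `__post_init__`.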
The complete end-to-end pipeline connecting all components.
Workflow:
- Read input story file
- Summarize using Gemini
- Generate script with metadata
- Create output directory structure
- Generate voice narration
- Generate subtitles
- Compose final video
- Save hashtags and description
- Clean up temporary files
Key Features:
- Filename sanitization for cross-platform compatibility
- Organized output directory per video
- Automatic directory creation
- Temporary subtitle handling for FFmpeg compatibility
- Comprehensive progress logging
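The filename sanitization mentioned above could look something like this; the regex and exact rules are assumptions, not the project's actual implementation:

```python
import re

def sanitize_title(title: str) -> str:
    """Strip characters that are illegal in Windows/macOS/Linux filenames,
    then collapse runs of whitespace."""
    cleaned = re.sub(r'[<>:"/\\|?*]', "", title)
    return " ".join(cleaned.split())

# sanitize_title('The Wobbly Bridge: When Physics Shook London')
# -> 'The Wobbly Bridge When Physics Shook London'
```

Dropping rather than replacing the illegal characters keeps titles readable as directory names, matching the example output folder shown below.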
Usage:
```
python main.py
```

Requirements:

- Python 3.10+
- FFmpeg and FFprobe (for video processing)
- Edge-TTS support libraries
```
pip install -r requirements.txt
```

Key Dependencies:

```
langchain
langchain-google-genai
google-generativeai
edge-tts
faster-whisper
moviepy
pydantic
python-dotenv
```
Create a `.env` file in the project root:

```
GEMINI_API_KEY=your_google_gemini_api_key_here
```

Verify that FFmpeg and FFprobe are installed:

```
ffmpeg -version
ffprobe -version
```

Place story files in:
```
./Input Stories/[Book Name]/[Chapter]/[Story].txt
```
Example:

```
./Input Stories/Humble Pi/Chapter 2/2.txt
```
- Update the input story path in `main.py`:

  ```python
  story_path = "./Input Stories/Humble Pi/Chapter 2/2.txt"
  ```

- Ensure a background video exists in `./backgrounds/`
- Run the pipeline:

  ```
  python main.py
  ```

After running, you'll find generated content in `./output/[Video Title]/`:
```
output/
└── The Wobbly Bridge When Physics Shook London/
    ├── voice.mp3           # Generated narration
    ├── subtitles.ass       # ASS-format subtitles
    ├── [title].mp4         # Final video
    ├── hashtags.txt        # Social media hashtags
    └── description.txt     # YouTube description
```
Change Voice:
Edit `voice_generator.py`:

```python
VOICES = [
    "en-GB-RyanNeural",   # Male
    "en-GB-SoniaNeural",  # Female
    # Add more voices as needed
]
```

Adjust Narration Speed:
```python
generate_voice(script_text, "voice.mp3", speed_multiplier=1.2)
```

Modify Video Resolution:
In `video_ffmpeg.py`, adjust the scale filter:

```python
vf_filter = "crop=ih*9/16:ih:(iw-ih*9/16)/2:0,scale=1440:2560,..."  # 2K vertical (9:16)
```

Change Background Video:
```python
make_video_ffmpeg(
    background="./backgrounds/your_video.mp4",
    ...
)
```

Pipeline flow:

```
Input Story
     ↓
[Summarizer]         → Summarize with Gemini
     ↓
[Script Generator]   → Create narration + metadata
     ↓
[Voice Generator]    → TTS with Edge-TTS
     ↓
[Subtitle Generator] → Whisper transcription
     ↓
[Video Composer]     → FFmpeg rendering
     ↓
Final Video + Metadata
```
Test different TTS voices interactively:
```
python trial.py
```

This utility:
- Cycles through available voices
- Plays each voice sample
- Allows manual selection
- Useful for audio quality testing
For faster processing:
- Whisper Model: Use "tiny" instead of "base" for speed (lower accuracy)
- Video Encoding: Change preset from "fast" to "ultrafast" (lower quality)
- FFmpeg Concurrency: Set threads appropriately for your CPU

For higher quality:
- Whisper Model: Use "small" or "medium" (slower)
- Video Encoding: Use "slow" preset (takes longer)
- Voice Speed: Lower `SPEED_MULTIPLIER` for clearer speech

For constrained memory:
- Process videos in batches with separate background videos
- Use CPU mode for Whisper on limited VRAM systems
- Consider streaming background video clips
Solution: Install FFmpeg from https://ffmpeg.org/download.html and add to PATH
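A quick preflight check can catch this before a run fails mid-pipeline; the function name here is mine, not part of the project:

```python
import shutil

def ffmpeg_available() -> bool:
    """True if both ffmpeg and ffprobe are on PATH."""
    return all(shutil.which(tool) is not None
               for tool in ("ffmpeg", "ffprobe"))

if not ffmpeg_available():
    print("FFmpeg/FFprobe not found; install them and add to PATH")
```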
Solution: Verify video dimensions in `video_ffmpeg.py`:

```
scale=1080:1920  # Must match output resolution
```

Solution: Reduce the speed multiplier or try different voices in the `VOICES` list
Solution: Ensure .env file exists with valid GEMINI_API_KEY
Solution:
- Reduce FFmpeg preset to "ultrafast"
- Use Whisper "tiny" model
- Process in parallel with multiple instances
| File | Purpose | Language |
|---|---|---|
| `main.py` | Main orchestration pipeline | Python |
| `summarizer.py` | Gemini-powered summarization | Python |
| `script_generator.py` | Dynamic script generation | Python |
| `voice_generator.py` | Edge-TTS integration | Python |
| `subtitles.py` | Whisper subtitle generation | Python |
| `video_ffmpeg.py` | FFmpeg video composition | Python |
| `models.py` | Pydantic validation models | Python |
| Package | Purpose | Version |
|---|---|---|
| `langchain` | LLM framework | Latest |
| `google-generativeai` | Gemini API client | Latest |
| `edge-tts` | Microsoft TTS | Latest |
| `faster-whisper` | Whisper speech recognition | Latest |
| `moviepy` | Video processing (alternative) | Latest |
| `pydantic` | Data validation | v2+ |
| `ffmpeg` | Video encoding (system) | 4.4+ |
The pipeline generates:
Video File: MP4 (H.264/AAC)
- Resolution: 1080×1920 (9:16 vertical)
- Duration: 55-85 seconds
- Bitrate: Optimized for streaming
Metadata Files:
- `hashtags.txt`: Social media ready
- `description.txt`: YouTube Shorts optimized
- `subtitles.ass`: Professional formatting
To extend this project:
- Add new story sources to `Input Stories/`
- Modify script generation prompts in `script_generator.py`
- Add background videos to `backgrounds/`
- Customize models in `models.py`
This project uses:
- Google Gemini API
- Microsoft Edge-TTS
- Faster-Whisper (reimplementation of OpenAI's Whisper)
- FFmpeg (LGPL)
For issues with:
- Video processing: Check FFmpeg installation
- API errors: Verify Gemini API key in `.env`
- Voice quality: Adjust speed multiplier and voice selection
- Subtitles: Check Whisper model compatibility
- Batch processing multiple stories
- Custom background video management
- Dynamic thumbnail generation
- Multi-language support
- Alternative LLM integration (Claude, GPT-4)
- Web UI dashboard
- Scheduled uploads
- Analytics tracking
Created: December 2025
Version: 1.0
Status: Production Ready