Automated pipeline that turns a narration recording into a published YouTube video — from a single command.
This project was initially created to help me with this playlist.
python3 publish.py my-reflection.flac
.flac ──► transcribe (Whisper) ──► generate music (Lyria 3 Pro) ──► generate thumbnail (Gemini Nano Banana 3) ──► render video (ffmpeg) ──► upload to YouTube (YouTube API)
- Transcribe: faster-whisper turns the audio into a .txt transcript and a .srt subtitle file.
- Generate music: Google Lyria 3 Pro composes an original ambient instrumental track tuned to the mood of the piece. Automatically retries with different style descriptions if the copyright filter triggers.
- Generate thumbnail: Gemini 3 Pro Image (Nano Banana) creates a 16:9 cinematic thumbnail from the transcript content.
- Render video: ffmpeg assembles word-wrapped text slides (1920×1080, dark background, Palatino), normalises voice to −16 LUFS (EBU R128), and mixes in the background music at a level measured automatically to sit below the voice.
- Upload: YouTube Data API v3 uploads the video, sets the thumbnail, adds it to the configured playlist, and writes a <stem>.published.json sidecar so re-runs are skipped.
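The voice normalisation in the render step could be expressed with ffmpeg's loudnorm filter. The sketch below only builds the command line and is not necessarily how make_video.py does it: the function name, the output file, and the true-peak and loudness-range values (−1.5 dBTP, 11 LU) are assumptions. Only the −16 LUFS target comes from the description above.

```python
from pathlib import Path


def loudnorm_command(src: Path, dst: Path, target_lufs: float = -16.0) -> list[str]:
    """Build an ffmpeg command that normalises `src` towards `target_lufs` (EBU R128)."""
    # TP (true peak) and LRA (loudness range) are assumed values, not project settings.
    audio_filter = f"loudnorm=I={target_lufs}:TP=-1.5:LRA=11"
    # -y overwrites an existing output file without prompting.
    return ["ffmpeg", "-y", "-i", str(src), "-af", audio_filter, str(dst)]


cmd = loudnorm_command(Path("my-reflection.flac"), Path("voice-normalised.wav"))
print(" ".join(cmd))
```

Passing the returned list to subprocess.run(cmd, check=True) would run the normalisation. Building the command as a list avoids shell quoting problems with file names that contain spaces.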
Every step is idempotent — if its output already exists, it is skipped. Use --force to re-run a specific step.
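The skip-or-run decision can be pictured as a small check per step. This is a minimal sketch under assumptions: the function name and most of the step names are hypothetical (this README only confirms "transcribe" and "all" as values for --force), and the real scripts may track outputs differently.

```python
from pathlib import Path

# Hypothetical step names; only "transcribe" and "all" are confirmed by this README.
STEPS = ("transcribe", "music", "thumbnail", "render", "upload")


def should_run(step: str, output: Path, forced: set[str]) -> bool:
    """Return True when a step must run: it was forced, or its output is missing."""
    if "all" in forced or step in forced:
        return True
    # Idempotency: an existing output file means the step is skipped.
    return not output.exists()
```

For the upload step, the output to test would be the <stem>.published.json sidecar, which is why a second run does not upload the same video again.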
- Python 3.9+
- ffmpeg (with ebur128 filter; any modern build)
brew install ffmpeg # macOS
# or: apt install ffmpeg # Debian/Ubuntu

pip install faster-whisper Pillow \
google-api-python-client google-auth-oauthlib google-auth-httplib2

| Key | Where to get it | Used by |
|---|---|---|
| GEMINI_API_KEY | aistudio.google.com/apikey | Lyria music + Gemini thumbnail |

export GEMINI_API_KEY="AIza..."

Note: Lyria 3 Pro is a paid-tier preview model. Check current pricing at Google AI Studio.
- Go to console.cloud.google.com and create a project.
- Enable YouTube Data API v3.
- OAuth consent screen → External → add your Gmail as a test user.
- Credentials → Create OAuth 2.0 Client ID → Desktop app → download as client_secret.json into this directory.
The first publish.py run (or youtube_upload.py) will open a browser for consent and cache credentials in .youtube_token.json. Subsequent runs auto-refresh.
Both client_secret.json and .youtube_token.json are in .gitignore; never commit them.
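The consent-then-cache behaviour can be sketched with google-auth-oauthlib. This is an illustration rather than the project's actual code: the function name and the two scopes are assumptions (uploading needs youtube.upload; adding to a playlist is assumed here to need the broader youtube scope), while the file names are the ones given above.

```python
from pathlib import Path

# Assumed scopes: upload videos, plus manage thumbnails and playlists.
SCOPES = [
    "https://www.googleapis.com/auth/youtube.upload",
    "https://www.googleapis.com/auth/youtube",
]


def load_credentials(secret=Path("client_secret.json"), token=Path(".youtube_token.json")):
    """Return YouTube credentials, opening a browser only when no usable token is cached."""
    # Imported lazily so this module loads even without the Google libraries installed.
    from google.auth.transport.requests import Request
    from google.oauth2.credentials import Credentials
    from google_auth_oauthlib.flow import InstalledAppFlow

    creds = None
    if token.exists():
        creds = Credentials.from_authorized_user_file(str(token), SCOPES)
    if creds and creds.expired and creds.refresh_token:
        creds.refresh(Request())  # later runs: silent refresh, no browser
    elif not creds or not creds.valid:
        flow = InstalledAppFlow.from_client_secrets_file(str(secret), SCOPES)
        creds = flow.run_local_server(port=0)  # first run: browser consent
    token.write_text(creds.to_json())  # cache for the next run
    return creds
```

The returned object can be passed to googleapiclient.discovery.build("youtube", "v3", credentials=creds) to obtain a client for the upload calls.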
python3 publish.py minha-reflexao.flac

Options:

| Flag | Default | Description |
|---|---|---|
| --model | small | Whisper model size (tiny, small, medium, large) |
| --language | pt | Transcript language code |
| --dry-run | off | Run steps 1–4 but skip YouTube upload |
| --force transcribe | none | Force re-run of a specific step (repeatable) |
| --force all | none | Force re-run of every step |
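For readers who want to see how such flags map onto argparse, here is a minimal sketch. It is not taken from publish.py: the parser layout is an assumption, and the step names other than "transcribe" and "all" are hypothetical.

```python
import argparse

# Hypothetical step names; the options table only confirms "transcribe" and "all".
FORCE_CHOICES = ["transcribe", "music", "thumbnail", "render", "upload", "all"]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="publish.py")
    parser.add_argument("audio", help="narration recording, e.g. a .flac file")
    parser.add_argument("--model", default="small",
                        choices=["tiny", "small", "medium", "large"])
    parser.add_argument("--language", default="pt")
    parser.add_argument("--dry-run", action="store_true")
    # action="append" makes the flag repeatable: --force transcribe --force render
    parser.add_argument("--force", action="append", default=[], choices=FORCE_CHOICES)
    return parser


args = build_parser().parse_args(["minha-reflexao.flac", "--force", "transcribe", "--dry-run"])
print(args.model, args.language, args.dry_run, args.force)
# small pt True ['transcribe']
```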
Each step can also be run standalone:
# Transcribe only
python3 make_video.py audio.flac --model small --language pt
# Generate music only
python3 make_music.py audio.flac --duration 180 --retries 5
# Generate thumbnail only
python3 make_thumbnail.py audio.flac
# Render video (requires transcript and music to exist)
python3 make_video.py audio.flac --with-music music/local_audio.mp3
# Upload only (requires .mp4 and optionally .thumbnail.jpg)
python3 youtube_upload.py audio.mp4 --playlist PLxxxxxx --privacy public

Add Suno song URLs to music_urls.txt (one per line), then:

python3 sync_music.py

Tracks are saved to music/<uuid>.mp3. Use make_video.py --with-music music/ to pick one at random.
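Accepting either a single file or a directory for --with-music might look like the sketch below. The function name and the optional rng parameter are hypothetical, and it assumes tracks are stored as .mp3 files directly inside the given directory.

```python
import random
from pathlib import Path
from typing import Optional


def pick_track(with_music: Path, rng: Optional[random.Random] = None) -> Path:
    """Return `with_music` itself if it is a file, else a random .mp3 inside it."""
    if with_music.is_file():
        return with_music
    # Sort first so a seeded rng gives a reproducible choice.
    tracks = sorted(with_music.glob("*.mp3"))
    if not tracks:
        raise FileNotFoundError(f"no .mp3 tracks found in {with_music}")
    return (rng or random).choice(tracks)
```

Passing a seeded random.Random makes the choice repeatable, which helps when re-rendering the same video.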
Edit these files to customise defaults without touching code:
| File | Purpose |
|---|---|
| footer.txt | Fixed text appended to every YouTube description |
| music_urls.txt | Suno track URLs for sync_music.py |
To change the target YouTube playlist, privacy setting, or language defaults, edit the constants at the top of youtube_upload.py and publish.py.
The default YouTube Data API quota is 10 000 units/day. A single videos.insert costs 1 600 units, so roughly 6 uploads fit within the free daily quota.
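The quota arithmetic can be checked in a few lines. The first figure uses only the numbers stated above. The second also counts the thumbnail and playlist calls, whose costs (50 units each) are assumptions to verify against the current YouTube quota table.

```python
DAILY_QUOTA = 10_000     # default units per day (stated above)
VIDEO_INSERT = 1_600     # videos.insert (stated above)
THUMBNAIL_SET = 50       # assumed cost of thumbnails.set
PLAYLIST_INSERT = 50     # assumed cost of playlistItems.insert

# Uploads per day when only videos.insert is counted.
upload_only = DAILY_QUOTA // VIDEO_INSERT
# Uploads per day when each publish also sets a thumbnail and adds to a playlist.
full_publish = DAILY_QUOTA // (VIDEO_INSERT + THUMBNAIL_SET + PLAYLIST_INSERT)

print(upload_only)   # 6
print(full_publish)  # 5
```

Under those assumed costs a full publish uses 1 700 units, so the practical limit would be 5 videos per day rather than 6.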
MIT