Turns raw audio/video into polished, magazine-quality interview transcripts with title/thumbnail suggestions and chapter timestamps — all in one command.
- ElevenLabs Scribe v2 transcribes the audio with speaker diarization
- Gemini 3 Pro listens to the original audio alongside the raw transcript and cleans it up — removing filler words, backchannel noise, and false starts while preserving what the speakers actually said
- Claude 4.6 Opus reads the finished transcript and generates YouTube title/thumbnail combos and chapter timestamps
You only need to do this once on your machine.
1. Install Python (if you don't already have it)
Open Terminal (search for "Terminal" in Spotlight on Mac) and type:
python3 --version
If you see a version number (like Python 3.11.5), you're good — skip to step 2. If not, install it:
- Mac: Go to https://www.python.org/downloads/ and download the latest version. Run the installer.
- After installing, close and reopen Terminal, then run python3 --version again.
2. Download or clone this project
If someone sent you the folder, just put it somewhere you can find it (like your Documents folder). In Terminal, navigate to it:
cd ~/Documents/transcripts
(Replace the path with wherever you put the folder.)
3. Set up the Python environment
Run these commands one at a time in Terminal:
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
4. Set up API keys
You need three API keys. Ask Dwarkesh if you don't have them.
- ElevenLabs API key — from https://elevenlabs.io
- Gemini API key — from https://aistudio.google.com/apikey
- Anthropic API key — from https://console.anthropic.com/settings/keys
Add them to your shell config so they're always available. Run this in Terminal (paste the whole block, replacing the placeholder values with your actual keys):
echo 'export ELEVENLABS_API_KEY="your-elevenlabs-key-here"' >> ~/.zshrc
echo 'export GEMINI_API_KEY="your-gemini-key-here"' >> ~/.zshrc
echo 'export ANTHROPIC_API_KEY="your-anthropic-key-here"' >> ~/.zshrc
source ~/.zshrc
To verify they're set:
echo $ELEVENLABS_API_KEY
echo $GEMINI_API_KEY
echo $ANTHROPIC_API_KEY
All three should print your keys.
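If you'd rather check from Python, here is a minimal sketch (a hypothetical helper, not part of transcribe.py) that reports which of the three keys are missing from the environment:

```python
import os

# The three keys the pipeline expects in the environment.
REQUIRED_KEYS = ["ELEVENLABS_API_KEY", "GEMINI_API_KEY", "ANTHROPIC_API_KEY"]

def missing_keys(env=os.environ):
    """Return the names of any required API keys that are unset or empty."""
    return [name for name in REQUIRED_KEYS if not env.get(name)]

if __name__ == "__main__":
    missing = missing_keys()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All three API keys are set.")
```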
Every time you open a new Terminal window, activate the environment first:
cd ~/Documents/transcripts
source venv/bin/activate
Then run:
python transcribe.py your-audio-file.mp3
This will:
- Upload the audio to ElevenLabs for transcription (a few minutes, depending on length)
- Upload the audio to Gemini and enhance each chunk (~5+ minutes for a long episode)
- Generate title/thumbnail suggestions and chapter timestamps with Claude (~1 minute)
- Save everything in a project folder
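The steps above can be sketched as a three-stage pipeline. This is a hypothetical outline, not the actual code in transcribe.py; the stage functions are injected as parameters here so the sketch stays self-contained, while the real script calls the ElevenLabs, Gemini, and Anthropic APIs directly:

```python
def run_pipeline(audio_path, transcribe, enhance, postproduce,
                 raw_only=False, do_postprod=True):
    """Orchestrate the three stages of the pipeline (illustrative sketch).

    transcribe(path)        -> diarized raw transcript   (ElevenLabs stage)
    enhance(path, raw)      -> cleaned transcript        (Gemini stage)
    postproduce(transcript) -> titles + chapters         (Claude stage)
    """
    raw = transcribe(audio_path)
    if raw_only:                       # --raw: stop after transcription
        return {"raw": raw}
    cleaned = enhance(audio_path, raw)
    result = {"raw": raw, "transcript": cleaned}
    if do_postprod:                    # --no-postprod skips this stage
        result["postprod"] = postproduce(cleaned)
    return result
```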
For an input file called episode.mp3, everything goes into projects/episode/:
projects/
episode/
transcript.md # The cleaned, polished transcript
raw.md # The raw transcript before cleanup (for comparison)
postprod.md # Title/thumbnail suggestions + chapter timestamps
.cache/ # Cached API results (hidden)
Each episode gets its own folder. The projects/ directory is gitignored.
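The mapping from input file to output paths is simple: the project folder is the input filename minus its extension. A small illustrative helper (hypothetical, shown only to make the layout concrete):

```python
from pathlib import Path

def project_dir(input_file, root="projects"):
    """Map an input file to its project folder: episode.mp3 -> projects/episode."""
    return Path(root) / Path(input_file).stem

def output_paths(input_file):
    """The files the pipeline writes for one episode."""
    base = project_dir(input_file)
    return {
        "transcript": base / "transcript.md",  # cleaned, polished transcript
        "raw": base / "raw.md",                # raw transcript for comparison
        "postprod": base / "postprod.md",      # titles, thumbnails, chapters
        "cache": base / ".cache",              # cached API results
    }
```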
Specify number of speakers (helps with diarization accuracy):
python transcribe.py episode.mp3 --speakers 2
Just get the raw transcript (skip cleanup and post-production — much faster):
python transcribe.py episode.mp3 --raw
Skip post-production (just transcript, no titles/timestamps):
python transcribe.py episode.mp3 --no-postprod
Save both raw and cleaned versions:
python transcribe.py episode.mp3 --save-raw
Force a fresh run (ignore all cached results):
python transcribe.py episode.mp3 --no-cache
Results are cached automatically inside each project's .cache/ folder. If you run the same file again, the pipeline skips any steps that already completed — no repeated API calls. If something fails partway through, just re-run the same command and it picks up where it left off. Use --no-cache to force a completely fresh run.
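The resume behavior boils down to a check-before-compute pattern. A minimal sketch of how per-step caching like this typically works (the real pipeline's cache format and keys may differ):

```python
import json
from pathlib import Path

def cached_step(cache_dir, name, compute, force=False):
    """Run compute() unless a cached result for this step already exists.

    force=True mirrors --no-cache: recompute and overwrite the cache.
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / f"{name}.json"
    if path.exists() and not force:      # step already completed: reuse it
        return json.loads(path.read_text())
    result = compute()                   # do the work (e.g. an API call)
    path.write_text(json.dumps(result))  # persist so a re-run can skip it
    return result
```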
Audio: .mp3, .wav, .m4a, .flac, .ogg
Video: .mp4, .mov, .avi, .mkv, .webm (audio is automatically extracted)
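Extension-based detection plus an ffmpeg extraction step is one common way to implement this. The sketch below assumes ffmpeg is installed; the actual pipeline may use a different tool or flags:

```python
import subprocess
from pathlib import Path

AUDIO_EXTS = {".mp3", ".wav", ".m4a", ".flac", ".ogg"}
VIDEO_EXTS = {".mp4", ".mov", ".avi", ".mkv", ".webm"}

def is_video(path):
    """True if the file looks like a video based on its extension."""
    return Path(path).suffix.lower() in VIDEO_EXTS

def extract_audio(video_path, out_path):
    """Pull the audio track out of a video with ffmpeg.

    -vn drops the video stream; libmp3lame encodes the audio as MP3.
    (Assumes ffmpeg is on PATH; illustrative, not the pipeline's exact command.)
    """
    subprocess.run(
        ["ffmpeg", "-y", "-i", str(video_path),
         "-vn", "-acodec", "libmp3lame", str(out_path)],
        check=True,
    )
```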
The pipeline splits the raw transcript into chunks of ~4000 tokens each and sends each chunk to Gemini along with the full audio file. Gemini listens to the audio to correct transcription errors and uses the editorial prompt to clean up the text — removing filler words, deleting backchannel-only turns ("Mm-hmm", "Yeah"), merging interrupted thoughts, adding paragraph breaks, and smoothing grammar. The goal is a transcript that reads like a written magazine interview while faithfully preserving what the speakers actually said.
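The chunking step amounts to greedily packing transcript segments (e.g. speaker turns) until the token budget is reached. A simplified sketch, with token counting approximated by whitespace words rather than the real tokenizer:

```python
def split_into_chunks(segments, max_tokens=4000,
                      count_tokens=lambda s: len(s.split())):
    """Greedily pack segments into chunks of roughly max_tokens each.

    count_tokens is a stand-in for a real tokenizer; here it just counts
    whitespace-separated words.
    """
    chunks, current, current_tokens = [], [], 0
    for seg in segments:
        n = count_tokens(seg)
        if current and current_tokens + n > max_tokens:
            chunks.append(current)           # budget exceeded: start a new chunk
            current, current_tokens = [], 0
        current.append(seg)
        current_tokens += n
    if current:
        chunks.append(current)
    return chunks
```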
After the transcript is finalized, it's sent to Claude 4.6 Opus for post-production: generating YouTube title/thumbnail combinations (5 titles with 3 thumbnail text ideas each) and chapter timestamps (spaced 8-15 minutes apart).
Each step is cached as it completes, so if the process is interrupted you can resume without re-doing work.