
dwarkeshsp/transcripts


Transcript Pipeline

Turns raw audio/video into polished, magazine-quality interview transcripts with title/thumbnail suggestions and chapter timestamps — all in one command.

  1. ElevenLabs Scribe v2 transcribes the audio with speaker diarization
  2. Gemini 3 Pro listens to the original audio alongside the raw transcript and cleans it up — removing filler words, backchannel noise, and false starts while preserving what the speakers actually said
  3. Claude 4.6 Opus reads the finished transcript and generates YouTube title/thumbnail combos and chapter timestamps

Quick Start (for non-technical editors)

One-time setup

You only need to do this once on your machine.

1. Install Python (if you don't already have it)

Open Terminal (search for "Terminal" in Spotlight on Mac) and type:

python3 --version

If you see a version number (like Python 3.11.5), you're good — skip to step 2. If not, install it:

  • Mac: Go to https://www.python.org/downloads/ and download the latest version. Run the installer.
  • After installing, close and reopen Terminal, then try python3 --version again.

2. Download or clone this project

If someone sent you the folder, just put it somewhere you can find it (like your Documents folder). In Terminal, navigate to it:

cd ~/Documents/transcripts

(Replace the path with wherever you put the folder.)

3. Set up the Python environment

Run these commands one at a time in Terminal:

python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

4. Set up API keys

You need three API keys. Ask Dwarkesh if you don't have them.

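Add them to your shell config so they're always available. The commands below append to ~/.zshrc, the config file for zsh, which is the default shell on macOS. Run this in Terminal (paste the whole block, replacing the placeholder values with your actual keys):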

echo 'export ELEVENLABS_API_KEY="your-elevenlabs-key-here"' >> ~/.zshrc
echo 'export GEMINI_API_KEY="your-gemini-key-here"' >> ~/.zshrc
echo 'export ANTHROPIC_API_KEY="your-anthropic-key-here"' >> ~/.zshrc
source ~/.zshrc

To verify they're set:

echo $ELEVENLABS_API_KEY
echo $GEMINI_API_KEY
echo $ANTHROPIC_API_KEY

All three should print your keys.
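If you'd rather check from Python (for example, inside the activated venv), the minimal sketch below reports which of the three variables are missing or empty. The file name check_keys.py and the function missing_keys are hypothetical helpers for illustration; they are not part of this repository.

```python
import os

# The three environment variables this README asks you to set.
REQUIRED_KEYS = ("ELEVENLABS_API_KEY", "GEMINI_API_KEY", "ANTHROPIC_API_KEY")


def missing_keys(env=None):
    """Return the names of required keys that are unset or blank.

    Hypothetical helper; defaults to the real process environment.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_KEYS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_keys()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All three API keys are set.")
```

Saved as check_keys.py, it would run with python check_keys.py. Note that it only sees keys exported in the current Terminal session, which is why source ~/.zshrc matters.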

Transcribing a file

Every time you open a new Terminal window, activate the environment first:

cd ~/Documents/transcripts
source venv/bin/activate

Then run:

python transcribe.py your-audio-file.mp3

This will:

  1. Upload the audio to ElevenLabs for transcription (on the order of minutes, depending on length)
  2. Upload the audio to Gemini and clean up the transcript chunk by chunk (5 minutes or more for a long episode)
  3. Generate title/thumbnail suggestions and chapter timestamps with Claude (about 1 minute)
  4. Save everything in a project folder

Output

For an input file called episode.mp3, everything goes into projects/episode/:

projects/
  episode/
    transcript.md     # The cleaned, polished transcript
    raw.md            # The raw transcript before cleanup (for comparison)
    postprod.md       # Title/thumbnail suggestions + chapter timestamps
    .cache/           # Cached API results (hidden)

Each episode gets its own folder. The projects/ directory is gitignored.
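The folder name follows the input file's name without its extension (episode.mp3 goes to projects/episode/). A minimal sketch of that mapping, assuming the file stem is used as-is; project_paths is a hypothetical name, and the real script may sanitize folder names differently.

```python
from pathlib import Path


def project_paths(input_file, root="projects"):
    """Map an input file to its project folder and output files.

    Assumes the folder is named after the file's stem, matching the
    episode.mp3 -> projects/episode/ example.
    """
    project = Path(root) / Path(input_file).stem
    return {
        "project": project,
        "transcript": project / "transcript.md",
        "raw": project / "raw.md",
        "postprod": project / "postprod.md",
        "cache": project / ".cache",
    }


paths = project_paths("~/Downloads/episode.mp3")
print(paths["transcript"].as_posix())  # projects/episode/transcript.md
```

Because only the stem is used, two inputs with the same name in different folders (or with different extensions) would share one project folder and its cache.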

Common options

Specify the number of speakers (this helps diarization accuracy):

python transcribe.py episode.mp3 --speakers 2

Get only the raw transcript (skips cleanup and post-production, so it's much faster):

python transcribe.py episode.mp3 --raw

Skip post-production (transcript only, no titles or timestamps):

python transcribe.py episode.mp3 --no-postprod

Save both raw and cleaned versions:

python transcribe.py episode.mp3 --save-raw

Force a fresh run (ignore all cached results):

python transcribe.py episode.mp3 --no-cache
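The options above map naturally onto Python's argparse. This is a sketch of how such a parser might look, not the script's actual code: only the flag names come from this README, while the defaults, help text, and the stages_to_run helper are assumptions. It also encodes what each flag skips: --raw drops both cleanup and post-production, and --no-postprod drops only post-production.

```python
import argparse


def build_parser():
    # Flag names match the README; defaults and help text are assumptions.
    parser = argparse.ArgumentParser(prog="transcribe.py")
    parser.add_argument("input", help="audio or video file to transcribe")
    parser.add_argument("--speakers", type=int, default=None,
                        help="expected number of speakers (aids diarization)")
    parser.add_argument("--raw", action="store_true",
                        help="raw transcript only; skip cleanup and post-production")
    parser.add_argument("--no-postprod", action="store_true",
                        help="skip title/thumbnail and chapter generation")
    parser.add_argument("--save-raw", action="store_true",
                        help="save the raw transcript alongside the cleaned one")
    parser.add_argument("--no-cache", action="store_true",
                        help="ignore cached results and rerun every step")
    return parser


def stages_to_run(args):
    """Return the pipeline stages implied by the flags (hypothetical helper)."""
    stages = ["transcribe"]              # ElevenLabs transcription always runs
    if not args.raw:
        stages.append("cleanup")         # Gemini editorial pass
        if not args.no_postprod:
            stages.append("postprod")    # Claude titles and chapters
    return stages


args = build_parser().parse_args(["episode.mp3", "--speakers", "2", "--no-postprod"])
print(args.speakers, stages_to_run(args))  # 2 ['transcribe', 'cleanup']
```

argparse turns dashed flags into underscored attributes (--no-postprod becomes args.no_postprod), which is why the helper reads args.no_postprod.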

Caching

Results are cached automatically inside each project's .cache/ folder. If you run the same file again, the pipeline skips any steps that already completed — no repeated API calls. If something fails partway through, just re-run the same command and it picks up where it left off. Use --no-cache to force a completely fresh run.
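A minimal sketch of this resume-from-cache pattern, assuming each step stores a JSON result in a file named after the step inside .cache/. The file layout and the cached_step helper are assumptions for illustration; the real cache format isn't documented here.

```python
import json
import tempfile
from pathlib import Path


def cached_step(cache_dir, name, compute, use_cache=True):
    """Run compute() once and persist its JSON-serializable result.

    Later runs return the stored result instead of calling compute()
    again, so an interrupted run resumes at the first missing step.
    use_cache=False mirrors --no-cache: recompute and overwrite.
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / f"{name}.json"
    if use_cache and path.exists():
        return json.loads(path.read_text())
    result = compute()
    # Write to a temp file, then rename, so a crash mid-write can't
    # leave a truncated cache entry that later runs would trust.
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(result))
    tmp.replace(path)
    return result


with tempfile.TemporaryDirectory() as tmp_dir:
    calls = []

    def fake_api_call():
        calls.append(1)  # count how often the "API" is actually hit
        return {"text": "hello"}

    first = cached_step(tmp_dir, "transcribe", fake_api_call)
    second = cached_step(tmp_dir, "transcribe", fake_api_call)
    print(first == second, len(calls))  # True 1
```

The write-then-rename step is a design choice worth keeping in any variant: without it, a run killed mid-write could poison the cache for every later run.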

Supported file formats

Audio: .mp3, .wav, .m4a, .flac, .ogg

Video: .mp4, .mov, .avi, .mkv, .webm (audio is automatically extracted)
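A sketch of routing an input by extension, using only the extensions listed above. For video it builds (but doesn't run) an ffmpeg command whose -vn option drops the video stream; relying on ffmpeg for extraction is an assumption, since this README doesn't say which tool the script uses. classify and extraction_command are hypothetical names.

```python
from pathlib import Path

AUDIO_EXTS = {".mp3", ".wav", ".m4a", ".flac", ".ogg"}
VIDEO_EXTS = {".mp4", ".mov", ".avi", ".mkv", ".webm"}


def classify(path):
    """Return 'audio' or 'video' for a supported file, else raise ValueError."""
    ext = Path(path).suffix.lower()  # treat EPISODE.MP4 like episode.mp4
    if ext in AUDIO_EXTS:
        return "audio"
    if ext in VIDEO_EXTS:
        return "video"
    raise ValueError(f"Unsupported file type: {ext or '(none)'}")


def extraction_command(video_path, out_dir):
    """Build an ffmpeg command that writes the audio track to a WAV file.

    Assumes ffmpeg is installed; -y overwrites, -vn drops the video stream.
    """
    src = Path(video_path)
    dst = Path(out_dir) / f"{src.stem}.wav"
    return ["ffmpeg", "-y", "-i", str(src), "-vn", str(dst)]


print(classify("interview.MOV"))  # video
```

The command list could then be passed to subprocess.run(cmd, check=True), which raises if ffmpeg exits with an error.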

How it works

The pipeline splits the raw transcript into chunks of ~4000 tokens each and sends each chunk to Gemini along with the full audio file. Gemini listens to the audio to correct transcription errors and uses the editorial prompt to clean up the text — removing filler words, deleting backchannel-only turns ("Mm-hmm", "Yeah"), merging interrupted thoughts, adding paragraph breaks, and smoothing grammar. The goal is a transcript that reads like a written magazine interview while faithfully preserving what the speakers actually said.
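A minimal sketch of the chunking step under two assumptions: tokens are estimated at roughly four characters each (a rule of thumb, not Gemini's actual tokenizer), and chunks break only between paragraphs, so no speaker turn is cut in half. chunk_transcript and estimate_tokens are hypothetical names; the real script may count tokens and choose boundaries differently.

```python
def estimate_tokens(text):
    # Rough heuristic: about 4 characters per token for English text.
    return max(1, len(text) // 4)


def chunk_transcript(text, max_tokens=4000):
    """Group blank-line-separated paragraphs into ~max_tokens chunks.

    A single paragraph longer than max_tokens becomes its own
    (oversized) chunk rather than being split mid-sentence.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks, current, size = [], [], 0
    for para in paragraphs:
        cost = estimate_tokens(para)
        if current and size + cost > max_tokens:
            chunks.append("\n\n".join(current))
            current, size = [], 0
        current.append(para)
        size += cost
    if current:
        chunks.append("\n\n".join(current))
    return chunks


# Ten turns of ~627 estimated tokens each: six fit under 4000, then four.
turns = [f"Speaker {i % 2}: " + "word " * 500 for i in range(10)]
chunks = chunk_transcript("\n\n".join(turns))
print(len(chunks))  # 2
```

Breaking on turn boundaries keeps each chunk self-contained for the editor model, at the cost of chunks that land somewhat under the target size.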

After the transcript is finalized, it's sent to Claude 4.6 Opus for post-production: generating YouTube title/thumbnail combinations (5 titles with 3 thumbnail text ideas each) and chapter timestamps (spaced 8-15 minutes apart).
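The 8-15 minute spacing is something a script can check mechanically. The sketch below parses MM:SS or H:MM:SS timestamps and reports gaps outside that window. The timestamp format and the spacing_problems helper are assumptions, since this README doesn't show the layout of postprod.md.

```python
def to_seconds(stamp):
    """Convert 'MM:SS' or 'H:MM:SS' to seconds."""
    seconds = 0
    for part in stamp.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def spacing_problems(stamps, min_minutes=8, max_minutes=15):
    """Return (earlier, later, gap_minutes) for each out-of-range gap."""
    times = [to_seconds(s) for s in stamps]
    problems = []
    for i in range(1, len(times)):
        gap = (times[i] - times[i - 1]) / 60
        if not min_minutes <= gap <= max_minutes:
            problems.append((stamps[i - 1], stamps[i], round(gap, 1)))
    return problems


chapters = ["00:00", "09:30", "21:10", "1:02:45", "1:14:00"]
print(spacing_problems(chapters))  # [('21:10', '1:02:45', 41.6)]
```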

Each step is cached as it completes, so if the process is interrupted you can resume without re-doing work.
