Convert a YouTube playlist into a structured, citation-backed technical PDF book.
- Topic-grouped chapters (not one chapter per video)
- LLM-written prose with citation verification
- Introduction, conclusion, glossary, references
- Final output: output/book.pdf
flowchart TD
A[Playlist URL] --> B[Stage 1: Fetch metadata/audio + reference URLs]
B --> C[Stage 2: Transcripts from YouTube Transcript API]
C --> D[Stage 3: Terminology correction]
D --> E[Stage 4: Group + order topics]
E --> F[Stage 5a: Write topic chapters]
F --> G[Stage 5b: Verify citations]
G --> H[Stage 5c: Polish prose]
H --> I[Stage 6: Assemble full book markdown]
I --> J[Stage 7: Render PDF]
J --> K[output/book.pdf]
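The stages in the flowchart run in order, and the run commands in this README select a range of them with `--from` and `--to`. A minimal sketch of how such range selection could work is shown here. The `STAGES` table and the `select_stages` and `parse_args` helpers are hypothetical names for illustration, not the actual contents of `run.py`, and folding 5a/5b/5c into a single stage 5 is an assumption.

```python
import argparse

# Stage numbers and labels follow the flowchart; the data structure is illustrative.
STAGES = [
    (1, "fetch metadata/audio + reference URLs"),
    (2, "transcripts"),
    (3, "terminology correction"),
    (4, "group + order topics"),
    (5, "write, verify, and polish topic chapters"),
    (6, "assemble full book markdown"),
    (7, "render PDF"),
]


def select_stages(first: int, last: int) -> list[tuple[int, str]]:
    """Return the stages numbered first..last inclusive, in pipeline order."""
    if first > last:
        raise ValueError(f"--from ({first}) must not exceed --to ({last})")
    return [stage for stage in STAGES if first <= stage[0] <= last]


def parse_args(argv: list[str]) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Stage range selection sketch")
    parser.add_argument("--playlist", help="playlist URL")
    # "from" is a Python keyword, so the value is stored under another name.
    parser.add_argument("--from", dest="from_stage", type=int, default=1)
    parser.add_argument("--to", dest="to_stage", type=int, default=7)
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args(["--from", "7", "--to", "7"])
    for number, label in select_stages(args.from_stage, args.to_stage):
        print(f"Stage {number}: {label}")
```

With no flags the sketch defaults to the full range 1 through 7, which matches the idea of a full run; a resume only narrows the lower bound.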
- Create venv and install:
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
- Add API key in .env:
GEMINI_API_KEY=your_key_here
- Configure provider/model in config.yaml (default is Gemini Flash).
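Before a full run it can help to confirm that the key in `.env` is actually readable. The following is a minimal sketch using only the standard library; `read_env_file` is a hypothetical helper, and the project itself may load the file differently (for example through a dotenv library).

```python
from pathlib import Path


def read_env_file(path: Path) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Surrounding quotes are optional in .env files, so drop them if present.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


if __name__ == "__main__":
    env_path = Path(".env")
    env = read_env_file(env_path) if env_path.exists() else {}
    if env.get("GEMINI_API_KEY"):
        print("GEMINI_API_KEY found")
    else:
        print("GEMINI_API_KEY is missing from .env")
```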
Full run:
DYLD_LIBRARY_PATH=/opt/homebrew/lib python run.py --playlist "https://www.youtube.com/playlist?list=YOUR_LIST_ID"
Resume from stage:
DYLD_LIBRARY_PATH=/opt/homebrew/lib python run.py --playlist "..." --from 3
Re-render PDF only:
DYLD_LIBRARY_PATH=/opt/homebrew/lib python run.py --from 7 --to 7
config.yaml:
llm:
  provider: gemini
  model: gemini-flash-latest
  temperature: 0.3
pipeline:
  batch_size: 4
  rate_limit_rpm: 6
  min_words_per_topic: 8000
Important directories:
- checkpoints/01_fetch
- checkpoints/01b_ref_content
- checkpoints/02_transcripts
- checkpoints/02b_corrected
- checkpoints/03_groups
- checkpoints/04_topics
- checkpoints/04b_verified
- checkpoints/04c_polished
- checkpoints/05_book
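Each stage writes into its own checkpoint directory, which is what makes resuming from a later stage possible. The sketch below shows one way a stage could skip work it has already saved. `run_with_checkpoint`, the JSON file format, and the one-file-per-item naming are assumptions for illustration and may differ from what `utils/checkpoint.py` does.

```python
import json
from pathlib import Path
from typing import Callable


def run_with_checkpoint(
    stage_dir: Path, item_id: str, compute: Callable[[], dict]
) -> dict:
    """Return the saved result for item_id, computing and saving it if absent."""
    stage_dir.mkdir(parents=True, exist_ok=True)
    path = stage_dir / f"{item_id}.json"
    if path.exists():
        # A previous run already produced this item, so reuse it.
        return json.loads(path.read_text(encoding="utf-8"))
    result = compute()
    path.write_text(json.dumps(result, ensure_ascii=False), encoding="utf-8")
    return result


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        stage_dir = Path(tmp) / "checkpoints" / "02_transcripts"
        first = run_with_checkpoint(stage_dir, "video_1", lambda: {"text": "hello"})
        # The second call reads the saved file and never invokes compute.
        second = run_with_checkpoint(stage_dir, "video_1", lambda: {"text": "changed"})
        print(first == second)
```

Under this design, forcing a stage to recompute an item means deleting its checkpoint file first.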
Audio cache is local-only and ignored by git:
checkpoints/audio/
- .claude/ is ignored and should not be committed.
- checkpoints/audio/ is removed/ignored.
- Checkpoints and generated book artifacts are versioned as required for reproducible runs.
Bookify/
├── run.py
├── config.yaml
├── requirements.txt
├── README.md
├── pipeline/
│ ├── fetcher.py
│ ├── transcriber.py
│ ├── terminology_corrector.py
│ ├── grouper.py
│ ├── topic_writer.py
│ ├── citation_verifier.py
│ ├── prose_polisher.py
│ ├── assembler.py
│ └── pdf_renderer.py
├── llm/
│ └── client.py
├── utils/
│ ├── checkpoint.py
│ ├── progress.py
│ ├── quality_report.py
│ └── url_filter.py
├── checkpoints/
└── output/
Aditya Chaurasiya
Indian Institute of Technology Bombay