A blueprint for building a content aggregation platform that transforms YouTube channels, podcasts, and blogs into a unified feed of AI-generated Twitter-style threads.
This repository contains documentation and code snippets for building a system that:
- Aggregates content from YouTube, podcasts (RSS), and blogs (Substack, WordPress, RSS)
- Transcribes audio/video using Whisper on Modal's serverless GPU infrastructure
- Generates personas using Claude Opus to capture each source's writing voice
- Creates threads using Claude Sonnet to produce Twitter-style summaries
- Serves everything as a paginated feed via a REST API
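
The flow above can be sketched as a small planning function. This is a minimal illustration, not code from this repository. The source-type labels and stage names are hypothetical. It also assumes that all YouTube and podcast content needs transcription, and that persona generation runs before thread generation.

```python
from typing import List

# Hypothetical source-type labels; the guide's actual identifiers may differ.
NEEDS_TRANSCRIPTION = {"youtube", "podcast"}
TEXT_SOURCES = {"blog"}


def plan_stages(source_type: str) -> List[str]:
    """Return the ordered pipeline stages for one piece of content."""
    if source_type not in NEEDS_TRANSCRIPTION | TEXT_SOURCES:
        raise ValueError(f"unknown source type: {source_type!r}")

    stages = ["scrape"]
    if source_type in NEEDS_TRANSCRIPTION:
        # Audio/video has no text yet, so transcription (Whisper) comes first.
        stages.append("transcribe")
    # Assumed ordering: persona (Opus) first, then thread (Sonnet), then the feed.
    stages += ["generate_persona", "generate_thread", "publish_to_feed"]
    return stages
```

Blogs skip the transcription stage because their text is available directly from the scrape.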
Read these in order to understand and build the system:
- 00 - Overview — System architecture, data flow, and tech stack
- 01 - Data Models — Core database models (Site, Link, SitePersona, Thread)
- 02 - Scraping YouTube — YouTube Data API integration
- 03 - Scraping Podcasts — RSS feed parsing with feedparser
- 04 - Scraping Webpages — Substack, WordPress, and generic RSS
- 05 - Transcription — Modal + Whisper audio transcription pipeline
- 06 - AI Personas — Claude Opus persona generation
- 07 - AI Threads — Claude Sonnet thread generation
- 08 - Background Tasks — Django-Q task orchestration
- 09 - API Patterns — REST API design with Django REST Framework
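
As a rough orientation before reading doc 01, the four models named above might relate to each other as follows. The sketch uses plain dataclasses instead of Django models so it runs on its own. Only the class names come from this guide; every field name and relationship is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Site:
    """A content source such as a YouTube channel, podcast, or blog."""
    name: str
    url: str
    kind: str  # assumed labels, e.g. "youtube", "podcast", "blog"


@dataclass
class Link:
    """One piece of content (video, episode, or post) from a Site."""
    site: Site
    url: str
    title: str
    text: Optional[str] = None  # article body or transcript, once available


@dataclass
class SitePersona:
    """A generated description of a Site's writing voice."""
    site: Site
    description: str


@dataclass
class Thread:
    """A generated Twitter-style thread summarizing one Link."""
    link: Link
    persona: SitePersona
    posts: List[str] = field(default_factory=list)
```

In the real system these would be Django models backed by PostgreSQL, with foreign keys in place of the object references used here.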
| Layer | Technology |
|---|---|
| Backend | Django, Django REST Framework, Django-Q |
| Database | PostgreSQL |
| Transcription | Modal (serverless GPU) + OpenAI Whisper |
| AI Generation | Anthropic Claude (Opus for personas, Sonnet for threads) |
| Frontend | React (not covered in this guide) |
- Read the Overview to understand the architecture
- Set up your Data Models
- Implement scrapers for your desired content sources (docs 02-04)
- Set up Transcription if handling audio/video
- Implement Persona and Thread generation
- Wire it together with Background Tasks
- Expose the feed via the REST API
- YOUTUBE_API_KEY: YouTube Data API v3
- ANTHROPIC_API_KEY: Claude API for persona and thread generation
- AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY: S3 for temporary audio storage
- Modal account: for serverless Whisper transcription
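
A startup check can catch missing credentials early. The following is a minimal sketch: the variable names come from the list above, while the helper name `missing_env_vars` is hypothetical. It does not verify the Modal account, which is set up outside these variables.

```python
import os
from typing import List, Mapping

# Names taken from the requirements list above.
REQUIRED_ENV_VARS = (
    "YOUTUBE_API_KEY",
    "ANTHROPIC_API_KEY",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
)


def missing_env_vars(environ: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variables that are unset or empty."""
    return [name for name in REQUIRED_ENV_VARS if not environ.get(name)]
```

One possible use is to call `missing_env_vars()` at process start and raise an error that names the missing variables, so the failure does not surface later inside a background task.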