Content Aggregation System - Build Guide

A blueprint for building a content aggregation platform that transforms YouTube channels, podcasts, and blogs into a unified feed of AI-generated Twitter-style threads.

What This Is

This repository contains documentation and code snippets for building a system that:

Aggregates content from YouTube, podcasts (RSS), and blogs (Substack, WordPress, RSS)
Transcribes audio/video using Whisper on Modal's serverless GPU infrastructure
Generates personas using Claude Opus to capture each source's writing voice
Creates threads using Claude Sonnet to produce Twitter-style summaries
Displays everything in a paginated feed via a REST API
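
As a rough illustration of how the stages above fit together, here is a minimal sketch in plain Python. It is not the repository's code: `Item`, `build_thread`, and the `transcribe` / `write_thread` callables are hypothetical names standing in for the Whisper and Claude steps described in the docs.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Item:
    """One piece of scraped content (hypothetical shape, for illustration only)."""
    source_type: str                  # "youtube", "podcast", or "blog"
    title: str
    text: Optional[str] = None        # blogs usually arrive with text already
    media_url: Optional[str] = None   # audio/video must be transcribed first


def build_thread(
    item: Item,
    persona: str,
    transcribe: Callable[[str], str],
    write_thread: Callable[[str, str], List[str]],
) -> List[str]:
    """Turn one item into a thread: transcribe if needed, then generate."""
    if item.text is not None:
        text = item.text
    elif item.media_url is not None:
        # Stand-in for the Whisper transcription step.
        text = transcribe(item.media_url)
    else:
        raise ValueError(f"{item.title!r} has neither text nor media to transcribe")
    # Stand-in for the Claude thread-generation step, styled by the persona.
    return write_thread(persona, text)
```

Passing the transcription and generation steps in as callables keeps the flow testable without GPU or API access; in the real system those slots would be filled by the pipelines covered in docs 05 to 07.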

Documentation

Read these in order to understand and build the system:

00 - Overview — System architecture, data flow, and tech stack
01 - Data Models — Core database models (Site, Link, SitePersona, Thread)
02 - Scraping YouTube — YouTube Data API integration
03 - Scraping Podcasts — RSS feed parsing with feedparser
04 - Scraping Webpages — Substack, WordPress, and generic RSS
05 - Transcription — Modal + Whisper audio transcription pipeline
06 - AI Personas — Claude Opus persona generation
07 - AI Threads — Claude Sonnet thread generation
08 - Background Tasks — Django-Q task orchestration
09 - API Patterns — REST API design with Django REST Framework

Tech Stack

Layer	Technology
Backend	Django, Django REST Framework, Django-Q
Database	PostgreSQL
Transcription	Modal (serverless GPU) + OpenAI Whisper
AI Generation	Anthropic Claude (Opus for personas, Sonnet for threads)
Frontend	React (not covered in this guide)

Getting Started

Read the Overview (doc 00) to understand the architecture
Set up your Data Models (doc 01)
Implement scrapers for your desired content sources (docs 02-04)
Set up Transcription (doc 05) if handling audio/video
Implement Persona and Thread generation (docs 06-07)
Wire it together with Background Tasks (doc 08)
Expose the feed via the REST API (doc 09)

Required API Keys

YOUTUBE_API_KEY — YouTube Data API v3
ANTHROPIC_API_KEY — Claude API for persona and thread generation
AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY — S3 for temporary audio storage
Modal account — For serverless Whisper transcription
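
A startup check can catch missing credentials before any scraper or task runs. The sketch below assumes the keys are supplied as environment variables under the names listed above; `REQUIRED_ENV_VARS` and `missing_env_vars` are hypothetical helpers, not part of the repository. The Modal account is left out because this guide does not name a variable for it.

```python
import os
from typing import List, Mapping, Optional

# Names taken from the list above; the Modal account is configured separately.
REQUIRED_ENV_VARS = (
    "YOUTUBE_API_KEY",
    "ANTHROPIC_API_KEY",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
)


def missing_env_vars(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_ENV_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        raise SystemExit("Missing environment variables: " + ", ".join(missing))
```

Accepting an optional mapping makes the check easy to exercise in tests without touching the real environment.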

MattSegal/feed-plan
