Digestr is an automated daily digest application designed to ingest content from various sources, primarily YouTube videos, summarize the key points, and generate a high-fidelity audio digest for you to listen to on the go.
The project uses Gemini for summarization and supports multiple text-to-speech backends for audio generation: Gemini TTS and Kokoro via kokoro-fastapi.
- YouTube Integration: Automatically fetches captions/transcripts from YouTube URLs.
- Intelligent Summarization: Uses Gemini models to create concise, insightful summaries of long-form content.
- Daily Digest Synthesis: Weaves multiple sources into a cohesive news-style podcast script.
- Pluggable TTS Providers: Generate audio with either Gemini TTS or Kokoro.
- Multi-Interface Support: Native support for CLI, a FastAPI Backend, and an MCP server for AI agents.
- Article Ingestion (Planned): Extract text from daily news articles.
The following diagram illustrates the end-to-end data flow of the Digestr application:
```mermaid
graph TD
    %% Input Sources
    In_YT[/YouTube URLs/] --> Ext_Vid[Extract Video IDs]
    Ext_Vid --> Fetch_YT[Fetch Transcripts]

    %% AI Processing
    Fetch_YT --> AI_Summ[Gemini Model: Summarization]
    AI_Summ --> Output_Text[News Script / Summary]
    Output_Text --> AI_TTS[TTS Provider: Gemini or Kokoro]
    AI_TTS --> Output_Audio[Audio Digest .mp3]
```
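The flow above can be sketched end to end in Python. Every function below is an illustrative stand-in, not Digestr's actual API — the real logic lives under `src/core/`:

```python
import re

def extract_video_id(url: str) -> str:
    """Pull the 11-character video ID out of a YouTube URL, or pass a bare ID through."""
    match = re.search(r"(?:v=|youtu\.be/)([\w-]{11})", url)
    return match.group(1) if match else url

def run_digest(urls: list[str]) -> str:
    """Mirror the diagram: extract IDs, fetch transcripts, summarize, synthesize audio."""
    video_ids = [extract_video_id(u) for u in urls]         # Extract Video IDs
    transcripts = [f"<transcript:{v}>" for v in video_ids]  # Fetch Transcripts (stub)
    script = " ".join(transcripts)                          # Gemini summarization (stub)
    return f"audio({script})"                               # TTS provider (stub)
```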
This project uses uv for lightning-fast dependency management.
- **Clone the repository and install dependencies.** Ensure you have uv installed, then run:

  ```bash
  uv sync
  ```

- **Configure environment variables.** Copy the example environment file:

  ```bash
  cp .env.example .env
  ```

  Edit `.env` and configure:

  - `GOOGLE_API_KEY` for Gemini summarization and Gemini TTS
  - `KOKORO_API_URL` if you want to use Kokoro TTS (defaults to `http://localhost:8880/v1/audio/speech`)

- **Optional: run Kokoro locally.**

  ```bash
  docker run -d -p 8880:8880 \
    --name kokoro-fastapi \
    ghcr.io/remsky/kokoro-fastapi-cpu:latest
  ```
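In code, these two environment keys might be read as follows — a minimal sketch assuming plain `os.environ` lookups, not Digestr's actual settings module:

```python
import os

def load_tts_config(env=os.environ) -> dict:
    """Read TTS configuration from the environment (keys from .env.example)."""
    return {
        # No sensible default exists for the API key, so absence becomes None.
        "google_api_key": env.get("GOOGLE_API_KEY"),
        # The fallback matches the documented Kokoro default URL.
        "kokoro_api_url": env.get(
            "KOKORO_API_URL", "http://localhost:8880/v1/audio/speech"
        ),
    }
```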
You can run the CLI directly from your terminal, passing one or more target URLs via `--urls`. Audio generation also supports `--tts-provider` and an optional `--voice` override.

Provider defaults:

- `gemini` uses `Puck`
- `kokoro` uses `af_bella`
```bash
uv run python -m src.interfaces.cli.main \
  --urls "https://www.youtube.com/watch?v=_Hsdazxi9SI" \
  --tts-provider gemini \
  --voice Charon
```

Pass multiple URLs separated by spaces. The AI will weave them together into a cohesive podcast episode!

```bash
uv run python -m src.interfaces.cli.main \
  --urls "dQw4w9WgXcQ" "BffWWGOgcWs"
```

```bash
uv run python -m src.interfaces.cli.main \
  --urls "dQw4w9WgXcQ" \
  --tts-provider kokoro
```

You can start the backend server using:

```bash
uv run uvicorn src.interfaces.api.main:app --reload
```

- `POST /digest/text`: Returns a JSON object with the generated script.
- `POST /digest/audio`: Returns a streaming MP3 file of the digest.
`tts_provider` also affects text output: Gemini-generated scripts include expressive bracketed tags for Gemini TTS, while Kokoro-generated scripts avoid those tags and rely on punctuation.
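The text endpoint can be exercised with a small client script. Below is a standard-library-only sketch: the base URL assumes uvicorn's default of `http://127.0.0.1:8000`, and the helper names are illustrative, not part of Digestr:

```python
import json
from urllib import request

BASE_URL = "http://127.0.0.1:8000"  # assumption: uvicorn's default bind address

def build_digest_request(urls, tts_provider="gemini", voice=None):
    """Build the JSON payload expected by POST /digest/text and /digest/audio."""
    payload = {"urls": list(urls), "tts_provider": tts_provider}
    if voice is not None:
        payload["voice"] = voice
    return payload

def fetch_text_digest(urls, tts_provider="gemini", voice=None):
    """POST to /digest/text and return the parsed JSON response."""
    body = json.dumps(build_digest_request(urls, tts_provider, voice)).encode()
    req = request.Request(
        f"{BASE_URL}/digest/text",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req) as resp:
        return json.load(resp)

# Example (requires the server to be running):
# print(fetch_text_digest(["https://www.youtube.com/watch?v=dQw4w9WgXcQ"]))
```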
Request Body (JSON):

```json
{
  "urls": ["https://youtube.com/watch?v=..."],
  "tts_provider": "gemini",
  "voice": "Puck"
}
```

Kokoro Example:

```json
{
  "urls": ["https://youtube.com/watch?v=..."],
  "tts_provider": "kokoro",
  "voice": "af_bella"
}
```

Digestr includes an MCP (Model Context Protocol) server built with FastMCP, allowing AI agents to natively access its capabilities.
To start the MCP server manually or test it:
```bash
uv run python -m src.interfaces.mcp.main
```

You can also use the FastMCP inspector:

```bash
uv run fastmcp dev src/interfaces/mcp/server.py:mcp
```

Add the following to your agent's MCP configuration (e.g., `mcp_config.json` or `claude_desktop_config.json`):
```json
{
  "mcpServers": {
    "digestr": {
      "command": "uv",
      "args": [
        "run",
        "python",
        "-m",
        "src.interfaces.mcp.main"
      ],
      "cwd": "/absolute/path/to/digestr",
      "env": {
        "GEMINI_API_KEY": "your-api-key",
        "KOKORO_API_URL": "http://localhost:8880/v1/audio/speech"
      }
    }
  }
}
```

- `get_transcript(url)`: Fetches a raw YouTube transcript.
- `summarize_video(url, provider)`: Generates a news summary for a single video.
- `create_news_program_script(urls, provider)`: Synthesizes a news broadcast script from multiple videos.
- `generate_audio_digest(urls, voice, provider)`: Executes the full audio pipeline and returns the absolute file path to the saved MP3.
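How these tools relate can be sketched with stand-ins. The composition below — the script tool reusing per-video summaries — is an assumption about the design, and every body here is a stub, not Digestr's implementation:

```python
def get_transcript(url: str) -> str:
    # Stub: the real tool fetches YouTube captions for the video at `url`.
    return f"transcript({url})"

def summarize_video(url: str, provider: str = "gemini") -> str:
    # Stub: the real tool runs the transcript through a summarization model.
    return f"summary[{provider}]({get_transcript(url)})"

def create_news_program_script(urls: list[str], provider: str = "gemini") -> str:
    # Assumed composition: weave per-video summaries into one broadcast script.
    return "\n\n".join(summarize_video(u, provider) for u in urls)
```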
The project follows a decoupled architecture to support multiple interfaces:
- `src/core/`: Interface-agnostic business logic for YouTube extraction, AI summarization, and TTS.
- `src/interfaces/cli/`: Command-line interface implementation.
- `src/interfaces/api/`: FastAPI backend implementation.
- `src/interfaces/mcp/`: Model Context Protocol server implementation.