Digestr is an automated daily digest application designed to ingest content from various sources, primarily YouTube videos, summarize the key points, and generate a high-fidelity audio digest for you to listen to on the go.
The project uses Gemini for summarization and supports multiple text-to-speech backends for audio generation: Gemini TTS and Kokoro via kokoro-fastapi.
- YouTube Integration: Automatically fetches captions/transcripts from YouTube URLs.
- Intelligent Summarization: Uses Gemini models to create concise, insightful summaries of long-form content.
- Daily Digest Synthesis: Weaves multiple sources into a cohesive news-style podcast script.
- Pluggable TTS Providers: Generate audio with either Gemini TTS or Kokoro.
- Multi-Interface Support: Native support for CLI, a FastAPI Backend, and an MCP server for AI agents.
- Article Ingestion (Planned): Extract text from daily news articles.
The following diagram illustrates the end-to-end data flow of the Digestr application:
```mermaid
graph TD
    %% Input Sources
    In_YT[/YouTube URLs/] --> Ext_Vid[Extract Video IDs]
    Ext_Vid --> Fetch_YT[Fetch Transcripts]

    %% AI Processing
    Fetch_YT --> AI_Summ[Gemini Model: Summarization]
    AI_Summ --> Output_Text[News Script / Summary]
    Output_Text --> AI_TTS[TTS Provider: Gemini or Kokoro]
    AI_TTS --> Output_Audio[Audio Digest .mp3]
```
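The flow above can be sketched end to end in Python. Every function below is an illustrative stand-in, not Digestr's actual API — the real logic lives under `src/core/`:

```python
import re

def extract_video_id(url: str) -> str:
    """Pull the 11-character video ID out of a YouTube URL, or pass a bare ID through."""
    match = re.search(r"(?:v=|youtu\.be/)([\w-]{11})", url)
    return match.group(1) if match else url

def run_digest(urls: list[str]) -> str:
    """Mirror the diagram: extract IDs, fetch transcripts, summarize, synthesize audio."""
    video_ids = [extract_video_id(u) for u in urls]         # Extract Video IDs
    transcripts = [f"<transcript:{v}>" for v in video_ids]  # Fetch Transcripts (stub)
    script = " ".join(transcripts)                          # Gemini summarization (stub)
    return f"audio({script})"                               # TTS provider (stub)
```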
This project uses uv for lightning-fast dependency management.
- **Clone the repository and install dependencies.** Ensure you have uv installed, then run:

  ```bash
  uv sync
  ```

- **Configure environment variables.** Copy the example environment file:

  ```bash
  cp .env.example .env
  ```

  Edit `.env` and configure:

  - `GOOGLE_API_KEY` for Gemini summarization and Gemini TTS
  - `KOKORO_API_URL` if you want to use Kokoro TTS (defaults to `http://localhost:8880/v1/audio/speech`)

- **Optional: run Kokoro locally.**

  ```bash
  docker run -d -p 8880:8880 \
    --name kokoro-fastapi \
    ghcr.io/remsky/kokoro-fastapi-cpu:latest
  ```
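In code, these two environment keys might be read as follows — a minimal sketch assuming plain `os.environ` lookups, not Digestr's actual settings module:

```python
import os

def load_tts_config(env=os.environ) -> dict:
    """Read TTS configuration from the environment (keys from .env.example)."""
    return {
        # No sensible default exists for the API key, so absence becomes None.
        "google_api_key": env.get("GOOGLE_API_KEY"),
        # The fallback matches the documented Kokoro default URL.
        "kokoro_api_url": env.get(
            "KOKORO_API_URL", "http://localhost:8880/v1/audio/speech"
        ),
    }
```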
You can run the CLI directly from your terminal, passing one or more target URLs via `--urls`. Audio generation also supports `--tts-provider` and an optional `--voice` override.

Provider defaults:

- `gemini` uses `Puck`
- `kokoro` uses `af_bella`
```bash
uv run python -m src.interfaces.cli.main \
  --urls "https://www.youtube.com/watch?v=_Hsdazxi9SI" \
  --tts-provider gemini \
  --voice Charon
```

Pass multiple URLs separated by spaces. The AI will weave them together into a cohesive podcast episode!

```bash
uv run python -m src.interfaces.cli.main \
  --urls "dQw4w9WgXcQ" "BffWWGOgcWs"
```

```bash
uv run python -m src.interfaces.cli.main \
  --urls "dQw4w9WgXcQ" \
  --tts-provider kokoro
```

You can start the backend server using:

```bash
uv run uvicorn src.interfaces.api.main:app --reload
```

- `POST /digest/text`: Returns a JSON object with the generated script.
- `POST /digest/audio`: Returns a streaming MP3 file of the digest.
`tts_provider` also affects text output: Gemini-generated scripts include expressive bracketed tags for Gemini TTS, while Kokoro-generated scripts avoid those tags and rely on punctuation.
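The text endpoint can be exercised with a small client script. Below is a standard-library-only sketch: the base URL assumes uvicorn's default of `http://127.0.0.1:8000`, and the helper names are illustrative, not part of Digestr:

```python
import json
from urllib import request

BASE_URL = "http://127.0.0.1:8000"  # assumption: uvicorn's default bind address

def build_digest_request(urls, tts_provider="gemini", voice=None):
    """Build the JSON payload expected by POST /digest/text and /digest/audio."""
    payload = {"urls": list(urls), "tts_provider": tts_provider}
    if voice is not None:
        payload["voice"] = voice
    return payload

def fetch_text_digest(urls, tts_provider="gemini", voice=None):
    """POST to /digest/text and return the parsed JSON response."""
    body = json.dumps(build_digest_request(urls, tts_provider, voice)).encode()
    req = request.Request(
        f"{BASE_URL}/digest/text",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req) as resp:
        return json.load(resp)

# Example (requires the server to be running):
# print(fetch_text_digest(["https://www.youtube.com/watch?v=dQw4w9WgXcQ"]))
```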
Request Body (JSON):

```json
{
  "urls": ["https://youtube.com/watch?v=..."],
  "tts_provider": "gemini",
  "voice": "Puck"
}
```

Kokoro Example:

```json
{
  "urls": ["https://youtube.com/watch?v=..."],
  "tts_provider": "kokoro",
  "voice": "af_bella"
}
```

Digestr includes an MCP (Model Context Protocol) server built with FastMCP, allowing AI agents to natively access its capabilities.
To start the MCP server manually or test it:
```bash
uv run python -m src.interfaces.mcp.main
```

You can also use the FastMCP inspector:

```bash
uv run fastmcp dev src/interfaces/mcp/server.py:mcp
```

Add the following to your agent's MCP configuration (e.g., `mcp_config.json` or `claude_desktop_config.json`):
```json
{
  "mcpServers": {
    "digestr": {
      "command": "uv",
      "args": [
        "run",
        "python",
        "-m",
        "src.interfaces.mcp.main"
      ],
      "cwd": "/absolute/path/to/digestr",
      "env": {
        "GEMINI_API_KEY": "your-api-key",
        "KOKORO_API_URL": "http://localhost:8880/v1/audio/speech"
      }
    }
  }
}
```

- `get_transcript(url)`: Fetches a raw YouTube transcript.
- `summarize_video(url, provider)`: Generates a news summary for a single video.
- `create_news_program_script(urls, provider)`: Synthesizes a news broadcast script from multiple videos.
- `generate_audio_digest(urls, voice, provider)`: Executes the full audio pipeline and returns the absolute file path to the saved MP3.
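How these tools relate can be sketched with stand-ins. The composition below — the script tool reusing per-video summaries — is an assumption about the design, and every body here is a stub, not Digestr's implementation:

```python
def get_transcript(url: str) -> str:
    # Stub: the real tool fetches YouTube captions for the video at `url`.
    return f"transcript({url})"

def summarize_video(url: str, provider: str = "gemini") -> str:
    # Stub: the real tool runs the transcript through a summarization model.
    return f"summary[{provider}]({get_transcript(url)})"

def create_news_program_script(urls: list[str], provider: str = "gemini") -> str:
    # Assumed composition: weave per-video summaries into one broadcast script.
    return "\n\n".join(summarize_video(u, provider) for u in urls)
```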
The project follows a decoupled architecture to support multiple interfaces:
- `src/core/`: Interface-agnostic business logic for YouTube extraction, AI summarization, and TTS.
- `src/interfaces/cli/`: Command-line interface implementation.
- `src/interfaces/api/`: FastAPI backend implementation.
- `src/interfaces/mcp/`: Model Context Protocol server implementation.