aronreisx/digestr

Digestr

Digestr is an automated daily digest application. It ingests content from various sources (primarily YouTube videos), summarizes the key points, and generates a high-fidelity audio digest you can listen to on the go.

The project uses Gemini for summarization and supports multiple text-to-speech backends for audio generation: Gemini TTS and Kokoro via kokoro-fastapi.

Features

  • YouTube Integration: Automatically fetches captions/transcripts from YouTube URLs.
  • Intelligent Summarization: Uses Gemini models to create concise, insightful summaries of long-form content.
  • Daily Digest Synthesis: Weaves multiple sources into a cohesive news-style podcast script.
  • Pluggable TTS Providers: Generate audio with either Gemini TTS or Kokoro.
  • Multi-Interface Support: Native support for CLI, a FastAPI Backend, and an MCP server for AI agents.
  • Article Ingestion (Planned): Extract text from daily news articles.

Workflow Diagram

The following diagram illustrates the end-to-end data flow of the Digestr application:

graph TD
    %% Input Sources
    In_YT[/YouTube URLs/] --> Ext_Vid[Extract Video IDs]
    Ext_Vid --> Fetch_YT[Fetch Transcripts]
    
    %% AI Processing
    Fetch_YT --> AI_Summ[Gemini Model: Summarization]
    AI_Summ --> Output_Text[News Script / Summary]
    
    Output_Text --> AI_TTS[TTS Provider: Gemini or Kokoro]
    AI_TTS --> Output_Audio[Audio Digest .mp3]
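The first step in the diagram, extracting video IDs, can be sketched with the standard library alone. This is a minimal illustration, not Digestr's actual implementation: the function name extract_video_id is hypothetical, and it assumes that inputs are either watch-style URLs, youtu.be short links, or bare video IDs.

```python
from urllib.parse import parse_qs, urlparse


def extract_video_id(url_or_id: str) -> str:
    """Return the video ID from a YouTube URL, or the input itself if it is a bare ID.

    Hypothetical helper for illustration; not taken from the Digestr source.
    """
    text = url_or_id.strip()
    parsed = urlparse(text)

    # No scheme means the caller passed something like "dQw4w9WgXcQ".
    if not parsed.scheme:
        return text

    host = (parsed.hostname or "").lower()

    # Short links carry the ID in the path: https://youtu.be/<id>
    if host == "youtu.be":
        return parsed.path.lstrip("/")

    # Watch URLs carry the ID in the "v" query parameter.
    if host == "youtube.com" or host.endswith(".youtube.com"):
        ids = parse_qs(parsed.query).get("v")
        if ids:
            return ids[0]

    raise ValueError(f"Could not find a video ID in {url_or_id!r}")
```

Accepting bare IDs as well as URLs matches the CLI examples in the Usage section, which pass both forms to --urls.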

Setup & Installation

This project uses uv for lightning-fast dependency management.

  1. Clone the repository and install dependencies: Ensure you have uv installed, then run:

    uv sync
  2. Configure Environment Variables: Copy the example environment file:

    cp .env.example .env

    Edit .env and configure:

    • GOOGLE_API_KEY for Gemini summarization and Gemini TTS
    • KOKORO_API_URL if you want to use Kokoro TTS (defaults to http://localhost:8880/v1/audio/speech)
  3. Optional: Run Kokoro locally:

    docker run -d -p 8880:8880 \
      --name kokoro-fastapi \
      ghcr.io/remsky/kokoro-fastapi-cpu:latest
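As a rough sketch of how the two variables above could be consumed, the snippet below reads them from the process environment and applies the documented Kokoro default. The Settings class and load_settings function are hypothetical names, and the snippet assumes the values from .env have already been exported into the environment; how Digestr itself loads them may differ.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Default documented in the setup steps above.
DEFAULT_KOKORO_API_URL = "http://localhost:8880/v1/audio/speech"


@dataclass(frozen=True)
class Settings:
    """Hypothetical settings container; not part of the Digestr source."""

    google_api_key: str
    kokoro_api_url: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from an environment mapping (os.environ by default)."""
    env = os.environ if env is None else env

    api_key = env.get("GOOGLE_API_KEY", "")
    if not api_key:
        # Fail early with a clear message instead of at the first Gemini call.
        raise RuntimeError("GOOGLE_API_KEY is not set; add it to .env")

    return Settings(
        google_api_key=api_key,
        kokoro_api_url=env.get("KOKORO_API_URL", DEFAULT_KOKORO_API_URL),
    )
```

Passing the environment as a parameter keeps the function easy to test without touching the real process environment.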

Usage

You can run the CLI directly from your terminal, passing one or more target URLs via --urls. Audio generation also supports --tts-provider and an optional --voice override.

Provider defaults:

  • gemini uses Puck
  • kokoro uses af_bella
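The flags and provider defaults above could be wired together roughly as follows. This is an illustrative argparse sketch, not the actual contents of src.interfaces.cli.main; in particular, treating gemini as the default provider is an assumption, since the README does not say which provider is used when --tts-provider is omitted.

```python
import argparse

# Default voice per provider, as listed above.
DEFAULT_VOICES = {"gemini": "Puck", "kokoro": "af_bella"}


def parse_args(argv=None):
    """Parse Digestr-style CLI flags (illustrative sketch only)."""
    parser = argparse.ArgumentParser(description="Build a daily audio digest.")
    parser.add_argument("--urls", nargs="+", required=True,
                        help="One or more YouTube URLs or video IDs")
    parser.add_argument("--tts-provider", choices=sorted(DEFAULT_VOICES),
                        default="gemini",  # assumption: default provider is not documented
                        help="TTS backend used for audio generation")
    parser.add_argument("--voice", default=None,
                        help="Optional voice override")
    args = parser.parse_args(argv)

    # Fall back to the provider's default voice when no override is given.
    if args.voice is None:
        args.voice = DEFAULT_VOICES[args.tts_provider]
    return args
```

Resolving the voice after parsing, rather than through a static argparse default, lets the fallback depend on whichever provider was selected.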

Single Video Summary

uv run python -m src.interfaces.cli.main \
  --urls "https://www.youtube.com/watch?v=_Hsdazxi9SI" \
  --tts-provider gemini \
  --voice Charon

Multi-Video News Program (Daily Digest)

Pass multiple URLs or, as in the example below, bare video IDs, separated by spaces. Gemini weaves them together into a single cohesive podcast episode.

uv run python -m src.interfaces.cli.main \
  --urls "dQw4w9WgXcQ" "BffWWGOgcWs"

Generate Audio With Kokoro

uv run python -m src.interfaces.cli.main \
  --urls "dQw4w9WgXcQ" \
  --tts-provider kokoro

API Usage

You can start the backend server using:

uv run uvicorn src.interfaces.api.main:app --reload

Endpoints

  • POST /digest/text: Returns a JSON object with the generated script.
  • POST /digest/audio: Returns a streaming MP3 file of the digest.

The tts_provider field also affects the text output: scripts generated for Gemini include expressive bracketed tags for Gemini TTS, while scripts generated for Kokoro omit those tags and rely on punctuation.
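If a Gemini-style script ever needs to be reused with a provider that does not understand bracketed tags, the tags can be removed with a small regular expression. The helper below is a hypothetical sketch: it assumes tags look like [excited] or [pause], which the README does not specify, and Digestr itself avoids the problem by generating a tag-free script for Kokoro.

```python
import re

# Matches a square-bracketed tag plus any whitespace directly before it.
# Assumption: tags never contain a closing bracket.
_TAG_PATTERN = re.compile(r"\s*\[[^\]]*\]")


def strip_bracketed_tags(script: str) -> str:
    """Remove [bracketed] tags from a script (hypothetical helper)."""
    return _TAG_PATTERN.sub("", script).strip()
```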

Request Body (JSON):

{
  "urls": ["https://youtube.com/watch?v=..."],
  "tts_provider": "gemini",
  "voice": "Puck"
}

Kokoro Example:

{
  "urls": ["https://youtube.com/watch?v=..."],
  "tts_provider": "kokoro",
  "voice": "af_bella"
}
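A client can build these requests with the Python standard library. The sketch below is illustrative: the function names are hypothetical, the base URL assumes uvicorn's default address of http://127.0.0.1:8000, and treating voice as optional in the request body is an assumption based on the CLI's optional --voice flag.

```python
import json
import urllib.request


def build_digest_request(base_url, endpoint, urls, tts_provider="gemini", voice=None):
    """Build a POST request for /digest/text or /digest/audio (illustrative)."""
    payload = {"urls": list(urls), "tts_provider": tts_provider}
    if voice is not None:
        # Assumption: the API falls back to the provider default when omitted.
        payload["voice"] = voice

    return urllib.request.Request(
        base_url.rstrip("/") + endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=300):
    """Send the request and return the raw response body as bytes."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


# Example usage (requires the backend to be running):
#   req = build_digest_request("http://127.0.0.1:8000", "/digest/audio",
#                              ["https://youtube.com/watch?v=..."], "kokoro", "af_bella")
#   with open("digest.mp3", "wb") as f:
#       f.write(send(req))
```

The generous timeout reflects that summarization and speech synthesis can take a while; adjust it to your setup.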

MCP Interface

Digestr includes an MCP (Model Context Protocol) server built with FastMCP, allowing AI agents to natively access its capabilities.

To start the MCP server manually or test it:

uv run python -m src.interfaces.mcp.main

You can also use the FastMCP inspector:

uv run fastmcp dev src/interfaces/mcp/server.py:mcp

Connecting an Agent

Add the following to your agent's MCP configuration (e.g., mcp_config.json or claude_desktop_config.json):

{
  "mcpServers": {
    "digestr": {
      "command": "uv",
      "args": [
        "run",
        "python",
        "-m",
        "src.interfaces.mcp.main"
      ],
      "cwd": "/absolute/path/to/digestr",
      "env": {
        "GEMINI_API_KEY": "your-api-key",
        "KOKORO_API_URL": "http://localhost:8880/v1/audio/speech"
      }
    }
  }
}
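Because cwd must be an absolute path, it can be convenient to generate this entry rather than type it by hand. The following sketch is one possible way to do that; build_mcp_config is a hypothetical helper, and the env mapping should contain whichever variables your setup actually uses.

```python
import json
from pathlib import Path


def build_mcp_config(repo_path, env):
    """Return an MCP config dict for Digestr with an absolute cwd (illustrative)."""
    # resolve() turns "." or "~/digestr" into an absolute path.
    cwd = Path(repo_path).expanduser().resolve()
    return {
        "mcpServers": {
            "digestr": {
                "command": "uv",
                "args": ["run", "python", "-m", "src.interfaces.mcp.main"],
                "cwd": str(cwd),
                "env": dict(env),
            }
        }
    }


# Example usage: print a config for the current directory.
#   print(json.dumps(build_mcp_config(".", {"KOKORO_API_URL": "http://localhost:8880/v1/audio/speech"}), indent=2))
```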

Exposed Tools

  • get_transcript(url): Fetches a raw YouTube transcript.
  • summarize_video(url, provider): Generates a news summary for a single video.
  • create_news_program_script(urls, provider): Synthesizes a news broadcast script from multiple videos.
  • generate_audio_digest(urls, voice, provider): Executes the full audio pipeline and returns the absolute file path to the saved MP3.

Architecture

The project follows a decoupled architecture to support multiple interfaces:

  • src/core/: Interface-agnostic business logic for YouTube extraction, AI summarization, and TTS.
  • src/interfaces/cli/: Command-line interface implementation.
  • src/interfaces/api/: FastAPI backend implementation.
  • src/interfaces/mcp/: Model Context Protocol server implementation.
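One way to picture this decoupling is a core function that knows nothing about the CLI, the API, or MCP and receives its collaborators as arguments. The sketch below is a simplified illustration under that assumption; the TTSProvider protocol, FakeTTS, and run_digest are hypothetical names and do not reflect the actual layout of src/core/.

```python
from typing import Callable, Iterable, List, Protocol


class TTSProvider(Protocol):
    """Anything that can turn a script into audio bytes."""

    def synthesize(self, script: str, voice: str) -> bytes: ...


class FakeTTS:
    """Stand-in provider so the sketch runs without network access."""

    def synthesize(self, script: str, voice: str) -> bytes:
        return f"{voice}:{script}".encode("utf-8")


def run_digest(
    urls: Iterable[str],
    fetch_transcript: Callable[[str], str],
    summarize: Callable[[List[str]], str],
    tts: TTSProvider,
    voice: str,
) -> bytes:
    """Interface-agnostic pipeline: transcripts -> script -> audio."""
    transcripts = [fetch_transcript(url) for url in urls]
    script = summarize(transcripts)
    return tts.synthesize(script, voice)
```

With this shape, each interface only parses its own input (command-line flags, a JSON body, or MCP tool arguments) and then calls the same core function, and a Gemini or Kokoro provider can be swapped in without touching the callers.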

About

An AI-powered daily digest creator that transforms YouTube videos into high-fidelity audio news programs using Gemini summarization and pluggable TTS backends (Gemini and Kokoro). Supports CLI, FastAPI, and MCP.