TG-Digest

TG-Digest is a self-hosted Telegram morning digest service. It reads selected Telegram chats through a user session, filters noise, summarizes with an OpenAI-compatible LLM, archives the full report in the built-in web UI, and can deliver a compact briefing through Telegram, ntfy, Bark, or email.

What Is Implemented

  • Telegram user-source adapter with incremental cursor support.
  • SQLModel repositories for groups, messages, cursors, reports, runs, usage, and UI settings.
  • Noise, deduplication, and link metadata filters.
  • Media enrichment for Telegram image posts: downloaded images are combined into a contact sheet and passed to the configured vision model as best-effort image context.
  • Bounded external link context fetching for accessible non-Telegram URLs.
  • Prompt registry with base + category + optional group override templates.
  • Category prompts include news, medical, group chat, generic, and military/security.
  • LiteLLM client wrapper, map/reduce summarizer, and per-run usage recording.
  • Markdown, HTML, and JSON report renderers.
  • Web, Telegram bot, ntfy, Bark, and email deliverers.
  • FastAPI management UI with HTMX, Alpine.js, Tailwind CDN, dark mode, local demo mode, report history, costs, run status, group configuration, and LLM/budget settings.
  • Split management pages for home, groups, settings, costs, and report history.
  • Telegram account identity detection for self-related message prioritization.
  • Like/dislike feedback on report items to bias future summaries.
  • APScheduler daily cron job with missed-run backfill.
  • Loguru console/file logging with secret redaction patterns.
  • Dockerfile and Docker Compose deployment skeleton.
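
The base + category + optional group override layering of the prompt registry could be composed roughly as follows. This is a minimal sketch, not the project's actual code: `build_prompt`, its arguments, and the fallback to a "generic" category are assumptions made for illustration.

```python
from typing import Dict, Optional


def build_prompt(
    base: str,
    category_prompts: Dict[str, str],
    category: str,
    group_override: Optional[str] = None,
) -> str:
    """Compose a prompt from base, category, and optional group layers (hypothetical helper)."""
    # Assumption: unknown categories fall back to the "generic" template.
    category_text = category_prompts.get(category, category_prompts.get("generic", ""))
    layers = [base, category_text]
    if group_override:
        layers.append(group_override)
    # Skip empty layers and separate the rest with a blank line.
    return "\n\n".join(layer.strip() for layer in layers if layer and layer.strip())
```

Keeping the group override last lets a single chat refine the category instructions without repeating them.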

Safety Model

Real credentials belong only in .env, which is ignored by Git. The repository includes .env.example with key names and empty placeholders. Runtime data, session files, logs, and virtual environments are also ignored.

Never put these values in source files, config YAML, tests, commit messages, or logs:

  • TG_API_ID, TG_API_HASH, TG_PHONE
  • Telegram login codes or session files
  • TG_BOT_TOKEN, TG_TARGET_CHAT_ID
  • LLM_API_KEY, LLM_BASE_URL
  • WEB_AUTH_USER, WEB_AUTH_PASS
  • NTFY_*, SMTP_*
  • BARK_*
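
As an illustration of keeping these values out of logs, a redaction pass over log text might look like the sketch below. It is not the project's Loguru configuration: the `redact` helper, the chosen subset of keys, and the KEY=value pattern are assumptions.

```python
import re

# Assumption: a subset of the key names listed above; extend it to match your .env.
SECRET_KEYS = ("TG_API_HASH", "TG_BOT_TOKEN", "LLM_API_KEY", "WEB_AUTH_PASS", "SMTP_PASS")

# Matches KEY=value (optionally with spaces around "=") for any known secret key.
_SECRET_PATTERN = re.compile(r"\b(" + "|".join(SECRET_KEYS) + r")\s*=\s*(\S+)")


def redact(message: str) -> str:
    """Replace the value of every known secret KEY=value pair with ***."""
    return _SECRET_PATTERN.sub(r"\1=***", message)
```

A function like this could be applied to each message before it reaches a log sink.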

Local Development

The current workspace uses the project-local virtual environment at venv/.

venv\Scripts\python.exe -m pip install -e ".[dev]"
venv\Scripts\python.exe -m pytest -q
venv\Scripts\python.exe -m ruff check src tests
venv\Scripts\python.exe -m src.main

Open the web UI in a browser at the host and port configured in config/settings.yaml.

Without .env, the UI still loads and can seed demo dialogs/reports for local validation.

Configuration

Copy .env.example to .env and fill only the values needed for the gate you are testing.

Copy-Item .env.example .env

Non-secret defaults live in config/settings.yaml:

  • cron schedule and digest time window
  • default models and concurrency
  • filter limits
  • renderers and enabled delivery channels
  • daily budget and warning ratio
  • web host/port

The management UI can persist runtime overrides for LLM and budget settings in the database. It also includes a write-only credential panel for .env: configured values are shown only as configured/missing, input fields stay blank, and empty submissions keep existing values.
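
The write-only behaviour described above can be sketched as two small functions. The names `merge_credentials` and `credential_status` are hypothetical and only illustrate the rules "empty submissions keep existing values" and "show only configured/missing".

```python
from typing import Dict, Iterable


def merge_credentials(existing: Dict[str, str], submitted: Dict[str, str]) -> Dict[str, str]:
    """Apply a form submission; blank fields leave the stored value untouched."""
    merged = dict(existing)
    for key, value in submitted.items():
        if value.strip():
            merged[key] = value.strip()
    return merged


def credential_status(values: Dict[str, str], keys: Iterable[str]) -> Dict[str, str]:
    """Report each key as configured or missing without exposing the value."""
    return {key: "configured" if values.get(key) else "missing" for key in keys}
```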

Proxy

For networks that require a local VPN/proxy, configure proxy URLs in .env.

PROXY_URL=
TG_PROXY_URL=socks://127.0.0.1:10808
LLM_PROXY_URL=http://127.0.0.1:10808

TG_PROXY_URL is used by Telethon. LLM_PROXY_URL is applied to LiteLLM/OpenAI-compatible HTTP calls and optional HTTP delivery clients. If a channel-specific value is empty, PROXY_URL is used as the fallback.
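
The fallback rule can be expressed in a few lines. This is a sketch under the assumption that settings are available as a plain mapping; `resolve_proxy` is a hypothetical helper, not a function from the repository.

```python
from typing import Mapping, Optional


def resolve_proxy(env: Mapping[str, str], channel_key: str) -> Optional[str]:
    """Return the channel-specific proxy URL, else PROXY_URL, else None."""
    specific = env.get(channel_key, "").strip()
    if specific:
        return specific
    shared = env.get("PROXY_URL", "").strip()
    return shared or None
```

For example, with only `PROXY_URL` set, both `resolve_proxy(env, "TG_PROXY_URL")` and `resolve_proxy(env, "LLM_PROXY_URL")` return the shared value.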

First Telegram Login

Telegram user collection requires a one-time MTProto session.

  1. Create Telegram API credentials at https://my.telegram.org.
  2. Put TG_API_ID, TG_API_HASH, and TG_PHONE in .env.
  3. Run:
venv\Scripts\python.exe -m src.ingest.login

Enter the Telegram verification code when prompted. If the account has 2FA, enter the 2FA password. The session is written under data/session/, which is ignored by Git.

If interactive stdin is unreliable on Windows terminals, use the two-step flow:

venv\Scripts\python.exe -m src.ingest.login --request-code

After Telegram sends the code:

"123456" | venv\Scripts\python.exe -m src.ingest.login --complete

For accounts with 2FA, pipe the code and password on separate lines.

The same request-code and complete-login flow is available in the web UI. Verification codes and 2FA passwords are one-time inputs only and are not written to .env, the database, or logs.

Digest Runs

The dashboard exposes two manual actions:

  • 立即生成 ("Generate now"): incremental mode. It uses each group's cursor, fetches only messages after that cursor, and advances the cursor after saving messages.
  • 根据过去 N 小时生成 ("Generate from the past N hours"): replay mode. It ignores cursors, re-summarizes the configured time window, and does not advance cursors. Use this for testing prompts or regenerating today's report.
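
The difference between the two modes comes down to how messages are selected and whether the cursor moves. The sketch below assumes the cursor is the highest message ID already saved; `Msg` and `select_messages` are hypothetical names used only for illustration.

```python
from datetime import datetime, timedelta
from typing import List, NamedTuple, Tuple


class Msg(NamedTuple):
    id: int
    sent_at: datetime


def select_messages(
    messages: List[Msg], cursor: int, mode: str, now: datetime, window_hours: int = 24
) -> Tuple[List[Msg], int]:
    """Return the messages to summarize and the cursor value to store afterwards."""
    if mode == "incremental":
        selected = [m for m in messages if m.id > cursor]
        # Advance the cursor to the newest message that was picked up.
        return selected, max((m.id for m in selected), default=cursor)
    if mode == "replay":
        since = now - timedelta(hours=window_hours)
        selected = [m for m in messages if m.sent_at >= since]
        # Replay never moves the cursor.
        return selected, cursor
    raise ValueError(f"unknown mode: {mode}")
```

Because replay leaves the cursor untouched, it can be run repeatedly without affecting the next incremental run.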

The scheduled daily job uses incremental mode when Telegram and LLM credentials are configured:

  • select groups where enabled = true
  • fetch messages newer than the per-group cursor and inside schedule.time_window_hours
  • cap each group by max_messages and prompt size by max_tokens
  • filter noise, deduplicate, map summaries per group, then reduce into one report
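
The steps above can be sketched as a small pipeline. This is a simplified illustration, not the project's summarizer: `run_digest` is hypothetical, the noise filter is reduced to dropping blank messages, the cap keeps the most recent messages (an assumption), and `summarize` stands in for the LLM call.

```python
from typing import Callable, Dict, List


def run_digest(
    groups: List[Dict],
    summarize: Callable[[List[str]], str],
    max_messages: int = 200,
) -> str:
    """Map each enabled group to a partial summary, then reduce into one report."""
    partials = []
    for group in groups:
        if not group["enabled"]:
            continue
        texts = group["messages"][-max_messages:]          # cap per group
        texts = [t.strip() for t in texts if t.strip()]    # stand-in noise filter
        texts = list(dict.fromkeys(texts))                 # deduplicate, keep order
        if texts:
            partials.append(f"{group['name']}: {summarize(texts)}")  # map step
    return summarize(partials) if partials else ""         # reduce step
```

Passing a trivial `summarize` such as `" | ".join` makes the flow easy to test without any LLM credentials.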

Unread counts are only displayed and used for UI sorting. They do not decide what gets summarized. The default schedule is 30 7 * * * in Asia/Shanghai, i.e. 07:30 every day. The cron expression, timezone, and time window can be changed from the settings page.
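
To make the default schedule concrete, the next 07:30 run can be computed as below. This sketch does not use APScheduler; `next_daily_run` is a hypothetical helper, and Asia/Shanghai is modelled as a fixed UTC+8 offset, which is a simplification that holds because the zone currently observes no daylight saving time.

```python
from datetime import datetime, time, timedelta, timezone

# Simplification: fixed UTC+8 offset instead of a full tz database lookup.
SHANGHAI = timezone(timedelta(hours=8), "Asia/Shanghai")


def next_daily_run(now: datetime, at: time = time(7, 30), tz: timezone = SHANGHAI) -> datetime:
    """Return the next occurrence of the daily run time strictly after `now`."""
    local_now = now.astimezone(tz)
    candidate = local_now.replace(hour=at.hour, minute=at.minute, second=0, microsecond=0)
    if candidate <= local_now:
        candidate += timedelta(days=1)
    return candidate
```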

When dialogs are synced, TG-Digest detects the logged-in Telegram account ID and username. Messages sent by the account, Telegram-marked mentions, or text mentioning @username are marked as self-related and get higher prompt priority. Self-related report items are rendered in a dedicated section instead of being mixed into the must-read list.
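
A check along these lines would cover the three self-related cases described above. It is a sketch with hypothetical names (`is_self_related` and its parameters); how the project actually reads sender IDs and mention flags from Telegram is not shown in this document.

```python
import re
from typing import Optional


def is_self_related(
    sender_id: int,
    text: str,
    telegram_mentioned: bool,
    account_id: int,
    username: Optional[str],
) -> bool:
    """True if the account sent the message, was mentioned, or @username appears in the text."""
    if sender_id == account_id or telegram_mentioned:
        return True
    if not username:
        return False
    # Match @username as a whole handle, case-insensitively, so that
    # @alice does not match @alice_bot.
    pattern = re.compile(r"(?<!\w)@" + re.escape(username) + r"(?!\w)", re.IGNORECASE)
    return bool(pattern.search(text))
```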

For image-heavy Telegram posts, TG-Digest downloads available images to data/media, builds a single contact sheet with message IDs, and sends that sheet to media.vision_model as optional vision context. The checked-in default is Qwen/Qwen3.6-35B-A3B. If the provider or model does not support image input, the run falls back to text, media metadata, and source links. Video posts are not transcoded; the report keeps the original Telegram message link so they can be opened in Telegram. When a summarized item is matched to downloaded images, the Web/HTML/Markdown report renders those images inline; Telegram bot delivery sends the matched images as photo messages after the text digest. Accessible non-Telegram links are fetched only for bounded title/description context.

Report items include like/dislike controls. Feedback is stored locally and summarized into future LLM prompts as preference memory; urgent, safety, medical, deadline, or direct-to-user items are not suppressed solely because of dislikes.

LLM Setup

Use any OpenAI-compatible provider supported by LiteLLM.

Set in .env:

LLM_API_KEY=
LLM_BASE_URL=

Set models in the UI or config/settings.yaml:

llm:
  litellm_provider: "openai"
  map_model: "provider/model"
  reduce_model: "provider/model"
media:
  vision_model: "Qwen/Qwen3.6-35B-A3B"

The checked-in default model split is:

  • deepseek-ai/DeepSeek-V4-Flash for map summaries
  • deepseek-ai/DeepSeek-V4-Pro for final reduce reports

Cost Estimates

The UI cost panel reads token usage from the usage table and estimates spend with the per-1k-token rates in config/settings.yaml. The checked-in defaults match the SiliconFlow USD prices for the default models as of 2026-06-20:

  • map / deepseek-ai/DeepSeek-V4-Flash: input $0.00013 per 1k tokens, output $0.00028 per 1k tokens
  • reduce / deepseek-ai/DeepSeek-V4-Pro: input $0.0016 per 1k tokens, output $0.003135 per 1k tokens

Existing historical runs are not rewritten when rates change. If an older usage row has tokens but stored cost is 0, the UI marks it with * and estimates it with the current rates.
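
The estimate is a per-1k-token multiplication. The sketch below shows the arithmetic and the `*` marker for rows whose stored cost is 0; `estimate_cost` and `display_cost` are hypothetical helpers, and the six-decimal output format is an assumption.

```python
from typing import Tuple


def estimate_cost(input_tokens: int, output_tokens: int, input_rate: float, output_rate: float) -> float:
    """Cost in USD given per-1k-token rates."""
    return (input_tokens / 1000) * input_rate + (output_tokens / 1000) * output_rate


def display_cost(stored_cost: float, input_tokens: int, output_tokens: int, rates: Tuple[float, float]) -> str:
    """Show the stored cost, or a current-rate estimate marked with * when none was stored."""
    if stored_cost == 0 and (input_tokens or output_tokens):
        estimated = estimate_cost(input_tokens, output_tokens, *rates)
        return f"${estimated:.6f}*"
    return f"${stored_cost:.6f}"
```

For example, 100,000 input tokens and 20,000 output tokens at the map rates above come to 100 × 0.00013 + 20 × 0.00028 = $0.0186.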

Delivery

Delivery channels are configured in config/settings.yaml:

deliver:
  enabled: ["web", "telegram", "bark"]

Credential requirements:

  • telegram: TG_BOT_TOKEN, TG_TARGET_CHAT_ID
  • ntfy: NTFY_URL
  • bark: BARK_URL, optional BARK_GROUP. Use a full Bark endpoint such as https://api.day.app/<device-key>.
  • email: SMTP_HOST, SMTP_FROM or SMTP_USER, SMTP_TO, optional SMTP_PASS
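
A pre-flight check for the requirements above might look like the following sketch. The `REQUIRED` table mirrors the list in this section, while `missing_credentials` and the any-of grouping for SMTP_FROM / SMTP_USER are illustrative assumptions rather than the project's validation code.

```python
from typing import Dict, List, Mapping, Tuple

# Each tuple is an any-of group: at least one key in the group must be set.
REQUIRED: Dict[str, List[Tuple[str, ...]]] = {
    "web": [],
    "telegram": [("TG_BOT_TOKEN",), ("TG_TARGET_CHAT_ID",)],
    "ntfy": [("NTFY_URL",)],
    "bark": [("BARK_URL",)],
    "email": [("SMTP_HOST",), ("SMTP_FROM", "SMTP_USER"), ("SMTP_TO",)],
}


def missing_credentials(enabled: List[str], env: Mapping[str, str]) -> Dict[str, List[str]]:
    """Return, per enabled channel, the requirement groups that are not satisfied."""
    missing: Dict[str, List[str]] = {}
    for channel in enabled:
        gaps = [
            " or ".join(group)
            for group in REQUIRED.get(channel, [])
            if not any(env.get(key) for key in group)
        ]
        if gaps:
            missing[channel] = gaps
    return missing
```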

Telegram bot delivery uses Bot API HTML formatting and splits long reports into multiple messages below the Telegram message length limit. Matched report images are delivered as Telegram photo messages. The web report remains the complete archive; Telegram is a readable mobile preview with source links and inline media.
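
Splitting a long report could be done on line boundaries, as in the sketch below. It assumes Telegram's 4096-character limit for text messages and deliberately ignores HTML tag balancing, which a real HTML-formatted splitter would also need to handle; `split_message` is a hypothetical helper.

```python
from typing import List

TELEGRAM_LIMIT = 4096  # maximum characters in one Telegram text message


def split_message(text: str, limit: int = TELEGRAM_LIMIT) -> List[str]:
    """Split text into chunks of at most `limit` characters, preferring line breaks."""
    chunks: List[str] = []
    current = ""
    for line in text.splitlines(keepends=True):
        # A single line longer than the limit is hard-split.
        while len(line) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(line[:limit])
            line = line[limit:]
        if len(current) + len(line) > limit:
            chunks.append(current)
            current = ""
        current += line
    if current:
        chunks.append(current)
    return chunks
```

Joining the chunks reproduces the original text, so no content is lost between messages.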

Docker

docker compose up --build

The compose file mounts:

  • data/session for Telethon session files
  • data/db for SQLite
  • data/reports for rendered reports
  • data/logs for rotating application logs
  • config for editable prompts and settings

The .env file is optional for booting the zero-secret UI, but required for real Telegram/LLM/delivery integration.

Current Integration Gate

Phase A is designed to run without real credentials. Phase B begins with Telegram user login and real network validation. At that point the required values are TG_API_ID, TG_API_HASH, and TG_PHONE in .env.
