Tools for monitoring open-source geospatial podcasts, generating structured episode notes, and publishing them to subscribers by email. The project has two cooperating pieces:
- Summarizer pipeline – pulls new podcast episodes, formats transcripts, writes Google Docs (optional), and emails the weekly digest.
- FastAPI mini-app – exposes subscribe/unsubscribe endpoints backed by a Mailgun mailing list so readers can manage their subscriptions securely.
- Chunked transcript formatting and summary synthesis using the configured OpenAI model.
- Opinionated prompt tuning for open-source geospatial software developers.
- Google Drive export hooks for sharing transcripts/summaries (optional).
- Mailgun List API integration with per-subscriber unsubscribe tokens and one-click headers.
- Double opt-in flow with confirmation links and expiring tokens.
- FastAPI server (`summary-api`) for subscription management, confirmation, and health checks.
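The chunked formatting step listed above could look roughly like the following sketch. It is not taken from the project's code: `chunk_transcript` and its `max_chars` default are hypothetical names and values, and the approach shown (packing whole paragraphs into chunks under a character budget so each chunk can be sent to the model separately) is only one plausible way to do it.

```python
def chunk_transcript(text: str, max_chars: int = 8000) -> list[str]:
    """Split a transcript into chunks of at most max_chars characters.

    Hypothetical helper for illustration; the real pipeline may chunk differently.
    Paragraphs (separated by blank lines) are kept whole where possible.
    """
    chunks: list[str] = []
    current = ""
    for paragraph in text.split("\n\n"):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        # Hard-split any single paragraph that exceeds the budget on its own.
        while len(paragraph) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(paragraph[:max_chars])
            paragraph = paragraph[max_chars:]
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if len(candidate) > max_chars:
            # Adding this paragraph would overflow: close the current chunk.
            chunks.append(current)
            current = paragraph
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

A character budget is used here only because it needs no tokenizer; a token-based budget would track model limits more closely.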
- Python 3.10+
- `ffmpeg` installed and reachable at the path in `FFMPEG_BIN`.
- Mailgun domain with an API key and mailing list (e.g., `geospatial_podcasts@mg.opendata.land`).
- OpenAI API key with access to the summarization model you choose.
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
# or: pip install -e .[gdrive]  # if you need Google Drive exports
- Copy the example environment file and edit the values:
cp .env.example .env
$EDITOR .env
- At minimum set:
  - `OPENAI_API_KEY`
  - `MAILGUN_DOMAIN`, `MAILGUN_API_KEY`, `MAILGUN_LIST_ADDRESS`, `MAIL_FROM`
  - `APP_BASE_URL` (public URL where the FastAPI app is hosted)
  - `APP_SIGNING_SECRET` (random string for unsubscribe token signing)
  - Optionally `SUBSCRIBE_FORM_TOKEN` to require a shared secret for subscription posts
- Optionally configure Google Drive (`GDRIVE_*`).
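One way to produce the random string for `APP_SIGNING_SECRET` is Python's standard `secrets` module. The 32-byte length below is an assumption (a common choice), not a requirement stated by the project.

```python
import secrets

# 32 random bytes, encoded as URL-safe base64 text.
signing_secret = secrets.token_urlsafe(32)

# Paste the printed line into .env.
print(f"APP_SIGNING_SECRET={signing_secret}")
```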
The application writes working files to `WORK_DIR` and state to `STATE_DIR`; both paths are ignored by git.
Use the existing CLI entry point (`summarize`) to process feeds. Example dry-run:
source .venv/bin/activate
summarize --dry-run

When `MAILGUN_LIST_ADDRESS` is configured, the pipeline sends each digest to that list alias. If `MAIL_TO` is populated it is treated as a one-time seed list (imported into Mailgun); otherwise you can leave it blank and rely solely on the subscribe/unsubscribe flow. Subscribers must confirm their email before they are added to the list.
Launch the API with the provided script entry:
source .venv/bin/activate
summary-api  # wraps uvicorn podcast_summarize.server:app

Deploy this behind HTTPS at `APP_BASE_URL`. Key endpoints:
- `GET /healthz` – simple health check.
- `POST /subscribe` – accepts JSON `{ "email": "...", "name": "optional" }` and sends a confirmation email. If `SUBSCRIBE_FORM_TOKEN` is set you must also provide `"token": "<value>"` or the `X-Subscribe-Token` header. A minimal HTML form is available at `GET /subscribe` for browser-based signups.
- `GET|POST /confirm/{token}` – handles double opt-in confirmation links. Tokens expire after `SUBSCRIBE_CONFIRM_TTL_HOURS` hours.
- `POST /unsubscribe` – accepts `{ "email": "..." }` to remove a subscriber.
- `GET|POST /unsubscribe/{token}` – one-click link used in outgoing emails.
- `GET /unsubscribe` – renders a simple email-only form for manual removals.
If you want the API in a self-contained container, a minimal compose file is included:
cp .env.example .env # ensure the API has its env vars
docker compose -f docker-compose.api.yaml up --build

This binds the app to http://localhost:8002, mounts `./state` and `./work` inside the container, and restarts the service automatically if it exits. Adjust the published port in `docker-compose.api.yaml` or set `API_PORT` in `.env` if you need a different binding.
| Variable | Description |
|---|---|
| `OPENAI_API_KEY` | API key used for summarization requests |
| `MODEL_SUMMARY` | OpenAI model name (default `gpt-4o-mini`) |
| `MAILGUN_DOMAIN` | Mailgun domain (e.g., mg.opendata.land) |
| `MAILGUN_API_KEY` | Mailgun REST API key |
| `MAILGUN_LIST_ADDRESS` | Mailing list alias for fan-out |
| `MAIL_FROM` | From header used for outbound mail |
| `MAIL_TO` | Optional seed addresses imported once into the list (leave empty otherwise) |
| `APP_BASE_URL` | Public HTTPS base URL where the FastAPI app is served |
| `APP_SIGNING_SECRET` | Secret used to sign unsubscribe tokens |
| `SUBSCRIBE_FORM_TOKEN` | Optional shared secret required for subscription requests |
| `APP_BRAND_NAME` | Label used in emails and forms (defaults to Geospatial Podcast Summaries) |
| `SUBSCRIBE_CONFIRM_TTL_HOURS` | Hours before confirmation links expire (default 48) |
| `WORK_DIR`, `STATE_DIR` | Paths for generated files and state cache |
| `FFMPEG_BIN` | Path to ffmpeg executable |
| `GDRIVE_*` | Optional Google Drive integration switches |
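To show how the required settings and documented defaults fit together, here is a minimal sketch of an environment loader. It is hypothetical and is not the contents of `config.py`: `load_settings` and `REQUIRED` are names invented for this example, and only the defaults stated in the table are filled in.

```python
import os
from typing import Mapping

# Settings the README lists under "At minimum set".
REQUIRED = (
    "OPENAI_API_KEY",
    "MAILGUN_DOMAIN",
    "MAILGUN_API_KEY",
    "MAILGUN_LIST_ADDRESS",
    "MAIL_FROM",
    "APP_BASE_URL",
    "APP_SIGNING_SECRET",
)


def load_settings(env: Mapping[str, str] = os.environ) -> dict[str, object]:
    """Validate required variables and apply the documented defaults."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("Missing required settings: " + ", ".join(missing))

    settings: dict[str, object] = {name: env[name] for name in REQUIRED}
    # Defaults below are the ones stated in the variable table.
    settings["MODEL_SUMMARY"] = env.get("MODEL_SUMMARY") or "gpt-4o-mini"
    settings["APP_BRAND_NAME"] = (
        env.get("APP_BRAND_NAME") or "Geospatial Podcast Summaries"
    )
    settings["SUBSCRIBE_CONFIRM_TTL_HOURS"] = int(
        env.get("SUBSCRIBE_CONFIRM_TTL_HOURS") or 48
    )
    return settings
```

Failing fast with the full list of missing names makes a misconfigured `.env` easier to fix than discovering each gap at the first API call.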
src/podcast_summarize/
├── EpisodeProcessor.py # pipeline for fetching, summarizing, emailing episodes
├── summarize.py # prompt configuration and transcript formatting helpers
├── emailer.py # Mailgun delivery logic (list-aware)
├── subscriptions.py # Subscriber store + Mailgun list adapters
├── server.py # FastAPI application
└── config.py # Environment configuration helpers
- The project uses `pyproject.toml` for dependencies and exposes script entry points.
- Run `python3 -m compileall src/podcast_summarize` before committing larger changes to catch syntax errors.
- Keep `.env` and other secret material out of version control; `.env.example` documents all required settings.
This repository is maintained by OpenDataLand. Choose and add a license file (LICENSE) if you plan to distribute the project publicly.