agent-ttp

Render an agent-authored YAML podcast script into a complete, listenable MP3 using OpenAI text-to-speech.

This is not a raw text-to-speech tool. The model is:

raw source → an agent rewrites it into a listening-first script → the CLI renders it to audio

The agent is the writer/producer. It reads source material and rewrites it into coherent spoken segments.
The CLI is the renderer/compiler. It validates the script, splits overlong segments (technical chunking), calls TTS, inserts pauses, peak-normalizes the audio, and stitches everything into a single file. It never tries to understand the source.

Install

Run the CLI directly with npx (no install needed):

npx @spicadust/agent-ttp --help

Install the agent skill (teaches a coding agent the author → validate → render workflow) with npx skills:

# project-local
npx skills add https://github.com/AirswitchAsa/agent-ttp/tree/main/skills/agent-ttp
# or global, scoped to Claude Code
npx skills add https://github.com/AirswitchAsa/agent-ttp/tree/main/skills/agent-ttp -g -a claude-code

Requires Node ≥ 20 and an OpenAI API key. No ffmpeg — audio is assembled in-process.

Provide your OpenAI API key

render needs a key; validate does not. The key is resolved in this order — the first one found wins:

npx @spicadust/agent-ttp render … --api-key sk-...   # 1. explicit flag
export OPENAI_API_KEY=sk-...              # 2. environment variable
echo 'OPENAI_API_KEY=sk-...' >> .env      # 3. .env in the current directory
npx @spicadust/agent-ttp api-key set                 # 4. stored in ~/.agent-ttp/config.json (prompts, hidden input)

Check what's active with npx @spicadust/agent-ttp api-key status.
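
The resolution order above amounts to a first-match lookup. The sketch below illustrates only that idea: `resolveApiKey` and the `KeySources` shape are hypothetical names, not the CLI's actual internals, and the four sources are passed in as plain values rather than read from the environment or from disk.

```typescript
// Hypothetical shape: each field holds the key found in that source, if any.
interface KeySources {
  flag?: string;   // --api-key
  env?: string;    // OPENAI_API_KEY environment variable
  dotenv?: string; // OPENAI_API_KEY from .env in the current directory
  config?: string; // ~/.agent-ttp/config.json
}

type ResolvedKey = { key: string; source: keyof KeySources };

// Return the first non-empty key in precedence order, or undefined if none is set.
function resolveApiKey(sources: KeySources): ResolvedKey | undefined {
  const order: (keyof KeySources)[] = ["flag", "env", "dotenv", "config"];
  for (const source of order) {
    const key = sources[source]?.trim();
    if (key) return { key, source };
  }
  return undefined;
}

// The environment variable wins over the stored config.
console.log(resolveApiKey({ env: "sk-from-env", config: "sk-from-config" }));
```

Returning the source alongside the key is a design choice for this sketch; it is the kind of information a status command would need to report which key is active.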

Quick start

npx @spicadust/agent-ttp validate script.yaml     # free, no API calls, no key required
npx @spicadust/agent-ttp render script.yaml -o episode.mp3

Worked examples live in skills/agent-ttp/examples/:

| File | What it shows |
| --- | --- |
| `script.yaml` | Two-voice dialogue introducing the tool |
| `en-article-briefing.yaml` | Single-narrator news briefing rewritten from an article |
| `zh-paper-summary.yaml` | Mandarin (`zh-CN`) explainer summarizing a paper |
| `bilingual-language-learning.yaml` | Per-segment `language` override: English instruction, Spanish examples |

Validate any of them without an API key: npx @spicadust/agent-ttp validate skills/agent-ttp/examples/zh-paper-summary.yaml.

Script format (YAML)

title: "Transformer Paper Walkthrough"
language: "zh-CN"                       # default language; each segment may override
style: "calm, dense, explanatory"
model: "gpt-4o-mini-tts-2025-12-15"    # latest gpt-4o-mini-tts snapshot
max_chars: 2000                         # technical-chunk threshold (≤ 4096)

voices:
  host:  { voice: cedar, instructions: "Calm, knowledge-focused Mandarin." }
  guest: { voice: marin, instructions: "Thoughtful podcast co-host." }

segments:
  - id: intro
    speaker: host
    intent: hook
    pause_after_ms: 700
    text: >
      Today we are going to explain what this paper actually solves.
  - id: question
    speaker: guest               # alternating speaker = dialogue
    instructions: "Ask as a genuine, curious question."
    text: >
      So the real question is which bottleneck it removes?

Parameter cascade (most-specific wins): a segment's model / instructions / language override the voice's, which override the script-level defaults. The speaker field binds a segment to a named voice — alternate speakers and you get a two-person dialogue for free.
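
The cascade can be pictured as a chain of fallbacks. The following is a minimal sketch, not the CLI's source: the type names and `resolveSegment` are hypothetical, it assumes an override replaces the less specific value outright (rather than being merged with it), and it leaves the script-level `style` field out.

```typescript
// Hypothetical types mirroring the YAML fields shown above.
interface Overrides { model?: string; instructions?: string; language?: string }
interface Voice extends Overrides { voice: string }
interface Segment extends Overrides { id: string; speaker: string; text: string }
interface Script extends Overrides { voices: Record<string, Voice>; segments: Segment[] }

// Resolve one segment: the segment's value, else the voice's, else the script default.
function resolveSegment(script: Script, segment: Segment) {
  const voice = script.voices[segment.speaker];
  if (!voice) {
    throw new Error(`segment "${segment.id}" uses unknown speaker "${segment.speaker}"`);
  }
  return {
    voice: voice.voice,
    model: segment.model ?? voice.model ?? script.model,
    instructions: segment.instructions ?? voice.instructions ?? script.instructions,
    language: segment.language ?? voice.language ?? script.language,
  };
}

// The example script from above, reduced to the fields this sketch uses.
const script: Script = {
  model: "gpt-4o-mini-tts-2025-12-15",
  language: "zh-CN",
  voices: {
    host: { voice: "cedar", instructions: "Calm, knowledge-focused Mandarin." },
    guest: { voice: "marin", instructions: "Thoughtful podcast co-host." },
  },
  segments: [
    { id: "intro", speaker: "host", text: "Today we are going to explain what this paper actually solves." },
    {
      id: "question",
      speaker: "guest",
      instructions: "Ask as a genuine, curious question.",
      text: "So the real question is which bottleneck it removes?",
    },
  ],
};

for (const segment of script.segments) {
  console.log(segment.id, resolveSegment(script, segment));
}
```

Under these assumptions, `intro` inherits the host voice's instructions, while `question` uses its own segment-level instructions; both fall back to the script-level model and language.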

instructions is the only delivery knob — natural-language direction for tone, accent, pace, emotion, and whispering. agent-ttp intentionally exposes no separate speed parameter: pacing is part of instructions (e.g. "speak slowly and clearly"), which keeps one delivery model instead of two competing ones. language is per-segment: the API has no language field, so the resolved language is carried through instructions as a natural-language clause (zh-CN → "Speak in Mandarin Chinese."), and a single episode can switch languages block to block — which is what makes language-learning content possible.
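
One way to picture how a language tag ends up inside `instructions` is a small lookup plus string concatenation. This is a sketch under stated assumptions: `LANGUAGE_CLAUSES` and `withLanguage` are hypothetical names, only the `zh-CN` clause comes from the text above, the other table entries are illustrative, and placing the clause before the delivery direction is a guess rather than documented behavior.

```typescript
// Hypothetical lookup. Only the zh-CN clause is taken from the README text;
// the "en" and "es" entries are illustrative assumptions.
const LANGUAGE_CLAUSES: Record<string, string> = {
  "zh-CN": "Speak in Mandarin Chinese.",
  en: "Speak in English.",
  es: "Speak in Spanish.",
};

// Combine the language clause (if one is known) with the delivery instructions.
function withLanguage(instructions: string | undefined, language: string | undefined): string {
  const clause = language ? LANGUAGE_CLAUSES[language] : undefined;
  return [clause, instructions]
    .filter((part): part is string => Boolean(part))
    .join(" ");
}

console.log(withLanguage("Calm, knowledge-focused Mandarin.", "zh-CN"));
// Speak in Mandarin Chinese. Calm, knowledge-focused Mandarin.
```

Because the clause is attached per segment, two adjacent segments can carry different language clauses, which matches the block-to-block switching described above.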

Two kinds of chunking, kept separate

Semantic chunking is the agent's editorial job: writing coherent spoken segments.
Technical chunking is the CLI's job: splitting a segment on sentence boundaries only when it exceeds max_chars, then stitching the audio back seamlessly.
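
Technical chunking can be sketched as greedy packing of whole sentences under a character limit. The function below is an illustration, not the CLI's splitter: `chunkText` is a hypothetical name, and the sentence rule (a run of text ending in `.`, `!`, `?` or the CJK marks `。！？`) is a naive assumption that will also split on decimals and abbreviations.

```typescript
// Split text into chunks of at most maxChars, breaking only between sentences.
// A single sentence longer than maxChars is kept whole rather than cut mid-sentence.
function chunkText(text: string, maxChars: number): string[] {
  const trimmed = text.trim();
  if (trimmed.length <= maxChars) return [trimmed];

  // Naive sentence rule (assumption): text up to and including terminal punctuation.
  const sentences = trimmed.match(/[^.!?。！？]+[.!?。！？]*\s*/g) ?? [trimmed];

  const chunks: string[] = [];
  let current = "";
  for (const sentence of sentences) {
    // Close the current chunk if adding this sentence would exceed the limit.
    if (current && (current + sentence).trimEnd().length > maxChars) {
      chunks.push(current.trimEnd());
      current = "";
    }
    current += sentence;
  }
  if (current.trim()) chunks.push(current.trimEnd());
  return chunks;
}

const text = "One two three. Four five six. Seven eight nine.";
console.log(chunkText(text, 30));
// [ 'One two three. Four five six.', 'Seven eight nine.' ]
```

Text at or under the limit comes back as a single chunk, which mirrors the "only when it exceeds max_chars" rule: well-sized segments written by the agent are never touched.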

Commands

Invoke via npx @spicadust/agent-ttp <command>, or as the bare agent-ttp <command> after a global install. The grammar below uses the short form.

agent-ttp validate <script.yaml> [--json]
agent-ttp render <script.yaml> -o <out.mp3|out.wav>
    [--model <id>] [--voice <name>] [--api-key <key>]
    [--cache <dir> | --no-cache] [--no-normalize] [--bitrate <kbps>]
agent-ttp api-key set | status | unset

Output format follows the -o extension: .mp3 (default, ~0.5 MB/min) or .wav (uncompressed, no encoding step).
The API key resolves from --api-key → OPENAI_API_KEY → .env → ~/.agent-ttp/config.json.
Generated audio is cached per segment (keyed on model + voice + instructions + text), so re-rendering after editing one segment only re-synthesizes that segment.
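
The caching rule is easy to see with a toy model. The sketch below is hypothetical throughout: `SegmentCache` keeps audio in memory rather than in a cache directory, the key is a plain JSON string of the four fields named above, and `fakeTts` is a stand-in that makes no API call.

```typescript
// The four fields the cache is keyed on, per the description above.
type SynthInput = { model: string; voice: string; instructions: string; text: string };

const cacheKey = (input: SynthInput): string =>
  JSON.stringify([input.model, input.voice, input.instructions, input.text]);

// Hypothetical in-memory cache; the real CLI caches to a directory.
class SegmentCache {
  private store = new Map<string, Uint8Array>();
  synthCalls = 0;

  // Return cached audio, or call the synthesizer and remember the result.
  get(input: SynthInput, synthesize: (input: SynthInput) => Uint8Array): Uint8Array {
    const key = cacheKey(input);
    const hit = this.store.get(key);
    if (hit) return hit;
    this.synthCalls += 1;
    const audio = synthesize(input);
    this.store.set(key, audio);
    return audio;
  }
}

// Stand-in synthesizer: returns dummy bytes instead of calling the TTS API.
const fakeTts = (input: SynthInput): Uint8Array => new Uint8Array(input.text.length);

const cache = new SegmentCache();
const base = { model: "gpt-4o-mini-tts-2025-12-15", voice: "cedar", instructions: "Calm." };

// First render: both segments are synthesized.
cache.get({ ...base, text: "First segment." }, fakeTts);
cache.get({ ...base, text: "Second segment." }, fakeTts);

// Re-render after editing only the second segment.
cache.get({ ...base, text: "First segment." }, fakeTts);          // unchanged: cache hit
cache.get({ ...base, text: "Second segment, edited." }, fakeTts); // edited: synthesized again

console.log(cache.synthCalls); // 3
```

Since `instructions` is part of the key, changing a voice's delivery direction would also invalidate every segment that resolves to it, not just segments whose text changed.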

How it works

PCM is the universal currency. Each segment is synthesized as raw 24 kHz/16-bit/mono PCM, concatenated with silence for pauses, peak-normalized, and encoded once at the end — WAV via a hand-written header, MP3 via the pure-JS lamejs encoder. No external binary is ever invoked.
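
The PCM stage of that pipeline can be sketched with typed arrays alone. This is an illustration under assumptions, not the project's code: all function names are hypothetical, the 0.95 normalization target is a guess, the header is the common 44-byte PCM WAV layout, and MP3 encoding is left out entirely.

```typescript
const SAMPLE_RATE = 24000; // 24 kHz, 16-bit, mono, as described above

// Silence of the given duration, as zeroed 16-bit samples.
function silence(ms: number): Int16Array {
  return new Int16Array(Math.round((SAMPLE_RATE * ms) / 1000));
}

// Concatenate PCM buffers in order.
function concat(parts: Int16Array[]): Int16Array {
  const out = new Int16Array(parts.reduce((n, p) => n + p.length, 0));
  let offset = 0;
  for (const part of parts) {
    out.set(part, offset);
    offset += part.length;
  }
  return out;
}

// Scale so the loudest sample reaches `target` of full scale (0.95 is an assumption).
function peakNormalize(pcm: Int16Array, target = 0.95): Int16Array {
  const peak = pcm.reduce((max, s) => Math.max(max, Math.abs(s)), 0);
  if (peak === 0) return pcm;
  const gain = (target * 32767) / peak;
  return pcm.map((s) => Math.round(s * gain));
}

// Wrap PCM in a 44-byte little-endian WAV header.
function toWav(pcm: Int16Array): Uint8Array {
  const dataBytes = pcm.length * 2;
  const buffer = new ArrayBuffer(44 + dataBytes);
  const view = new DataView(buffer);
  const ascii = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) view.setUint8(offset + i, text.charCodeAt(i));
  };
  ascii(0, "RIFF");
  view.setUint32(4, 36 + dataBytes, true);   // remaining file size
  ascii(8, "WAVE");
  ascii(12, "fmt ");
  view.setUint32(16, 16, true);              // fmt chunk size
  view.setUint16(20, 1, true);               // format 1 = linear PCM
  view.setUint16(22, 1, true);               // channels: mono
  view.setUint32(24, SAMPLE_RATE, true);     // sample rate
  view.setUint32(28, SAMPLE_RATE * 2, true); // byte rate
  view.setUint16(32, 2, true);               // block align
  view.setUint16(34, 16, true);              // bits per sample
  ascii(36, "data");
  view.setUint32(40, dataBytes, true);
  pcm.forEach((s, i) => view.setInt16(44 + i * 2, s, true));
  return new Uint8Array(buffer);
}

// Two tiny stand-in "segments" joined by a 700 ms pause.
const segA = Int16Array.from([1000, -2000, 1500]);
const segB = Int16Array.from([500, -500]);
const episode = peakNormalize(concat([segA, silence(700), segB]));
const wav = toWav(episode);
console.log(episode.length, wav.length); // 16805 33654
```

At this format, one second of audio is 48,000 bytes, so the 700 ms pause alone contributes 16,800 zero samples. Normalizing after concatenation, as here, applies one gain to the whole episode rather than a separate gain per segment.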

Agent skill

skills/agent-ttp/SKILL.md teaches a coding agent the full workflow: read source → rewrite into a listening-first script → validate → render → return the file.

License

MIT
