feat(tts): parallel streaming pipeline with prestart workers and overlap viz by WinstonLiyt · Pull Request #4 · LeiLiLab/TreeDebater

WinstonLiyt · 2026-05-04T19:37:50Z

Description

Overhauls the streaming TTS chunk pipeline so that a single chunk can be refined by multiple workers concurrently, and adds a richer overlap visualizer for inspecting what actually overlaps on the wall clock.

Highlights:

Shared candidate pool (_ChunkRefineContext): a normal worker and an optional prestart worker push TTS candidates into one pool; the first to confirm an in-range candidate via try_adopt() wins.
Ratio-based prestart: at iteration i, if chars(i+2)/chars(i+1) ≥ 2.0 or chars(i+2) ≥ 1000, kick off a prestart worker for chunk i+2 so that it gets two extra chunks' worth of playback time to refine. The existing last-chunk prestart is now expressed in the same framework (prestart_kind ∈ {"", "ratio", "last"}).
Early-cut + speed adjust: a chunk whose fastspeech duration estimate exceeds 1.25× its target is split mid-flight (gated by enable_early_cut, which now defaults to False), and a chosen candidate outside tolerance is re-TTSed at a speed clamped to [0.85, 1.15] if that closes the gap.
Word-based chunk merging: MIN_CHUNK_WORDS=30 (replaces MIN_CHUNK_CHARS=50); an undersized last chunk is folded into its predecessor.
EnvConfig flag: streaming_tts is now a real EnvConfig field instead of a free-floating flag.
Overlap viz: per-worker fs/llm/tts streams produce two parallel lanes (normal + prestart) instead of one, the playback bar is labelled with the prestart kind, and a stack_pngs.py helper plus overlap_viz.sh updates make multi-run comparison easier.
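A minimal sketch of the ratio-based prestart trigger, assuming chunks are plain strings indexed from 0 and that chars() is simply len(). The function name ratio_prestart_kind is hypothetical; only the 2.0 ratio threshold, the 1000-character threshold and the i+2 lookahead come from this PR. The "last" prestart kind is not modeled because its trigger is not described here.

```python
from __future__ import annotations

RATIO_THRESHOLD = 2.0       # chars(i+2) / chars(i+1) trigger, from the PR description
ABS_CHARS_THRESHOLD = 1000  # absolute chars(i+2) trigger, from the PR description


def ratio_prestart_kind(chunks: list[str], i: int) -> str:
    """Return "ratio" if chunk i+2 should get a prestart worker at iteration i, else "".

    Hypothetical helper: the real pipeline also has a "last" prestart kind,
    whose trigger is not modeled here.
    """
    target = i + 2
    if target >= len(chunks):
        return ""  # no chunk two iterations ahead
    nxt = len(chunks[i + 1])
    far = len(chunks[target])
    # Guard against an empty chunk i+1 instead of dividing by zero.
    ratio_hit = nxt > 0 and far / nxt >= RATIO_THRESHOLD
    if ratio_hit or far >= ABS_CHARS_THRESHOLD:
        return "ratio"
    return ""
```

Both triggers map to "ratio" in this sketch because prestart_kind, as listed above, has no separate value for the absolute-size case.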

Motivation and Context

Streaming TTS was previously serial within a chunk (refine → TTS), with only the very last chunk getting a head start. Long mid-stream chunks that needed many refinement passes could still overrun the playback time of the chunk before them, causing silence gaps. A shared candidate pool lets us start refining a long chunk two iterations early while the normal worker keeps running, and adopt whichever in-range candidate lands first. The visualizer changes were needed because the old single-lane plot could not show two workers racing against the same playback deadline.
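The race described above can be sketched as a lock-protected pool, assuming "in range" means the candidate's audio duration falls inside a [lo_s, hi_s] window. ChunkRefineContext, Candidate, run_worker and their fields are illustrative names, not the PR's actual _ChunkRefineContext API; only the rule that the first worker to confirm an in-range candidate via try_adopt() wins comes from the description.

```python
from __future__ import annotations

import threading
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    worker: str     # "normal" or "prestart" (labels are illustrative)
    audio_s: float  # duration of the synthesized audio, in seconds


class ChunkRefineContext:
    """Hypothetical stand-in for _ChunkRefineContext: one shared pool per chunk."""

    def __init__(self, lo_s: float, hi_s: float) -> None:
        self.lo_s, self.hi_s = lo_s, hi_s  # assumed acceptable duration window
        self._lock = threading.Lock()
        self._candidates: list[Candidate] = []
        self._adopted: Optional[Candidate] = None

    def push(self, cand: Candidate) -> None:
        with self._lock:
            self._candidates.append(cand)

    def try_adopt(self, cand: Candidate) -> bool:
        """Adopt cand if it is in range and nothing has been adopted yet."""
        with self._lock:
            if self._adopted is not None:
                return False  # another worker already won
            if not (self.lo_s <= cand.audio_s <= self.hi_s):
                return False  # out of range, keep refining
            self._adopted = cand
            return True

    def adopted(self) -> Optional[Candidate]:
        with self._lock:
            return self._adopted


def run_worker(ctx: ChunkRefineContext, name: str, durations: list[float]) -> None:
    """Simulate a worker that produces one TTS candidate per refine pass."""
    for d in durations:
        if ctx.adopted() is not None:
            return  # chunk already settled, stop refining
        cand = Candidate(name, d)
        ctx.push(cand)
        if ctx.try_adopt(cand):
            return
```

Holding the lock for the whole check-and-set in try_adopt makes adoption atomic, so two workers that confirm in-range candidates at nearly the same moment cannot both win.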

How Has This Been Tested?

Ran the streaming pipeline end-to-end on representative debate configs and verified the generated overlap timeline matches expectations (prestart lane visible, chosen candidate marked, no gaps where a prestart was adopted).

Types of changes

Fix bugs
Add new feature
Update documentation

…nk pre-start

Add early-cut splitting for oversized chunks, TTS speed adjustment post-processing, fast initial compression for short-budget rounds, pre-start of last chunk, and next-chunk context in LLM rewrites. Also fix overlap_viz.sh to accept multiple args.
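A minimal sketch of the early-cut check and the speed-adjust decision, assuming audio duration scales as 1 / speed and that "tolerance" is an absolute window of ± tol_s seconds around the target. The function names and tol_s are hypothetical; the 1.25× factor and the [0.85, 1.15] clamp come from this PR.

```python
from __future__ import annotations

from typing import Optional

EARLY_CUT_FACTOR = 1.25            # split when the estimate exceeds 1.25x the target (from the PR)
SPEED_MIN, SPEED_MAX = 0.85, 1.15  # clamp range for speed adjustment (from the PR)


def needs_early_cut(estimate_s: float, target_s: float) -> bool:
    """True if the fastspeech duration estimate is too long for the chunk's target."""
    return estimate_s > EARLY_CUT_FACTOR * target_s


def plan_speed_adjust(actual_s: float, target_s: float, tol_s: float) -> Optional[float]:
    """Return the speed to re-TTS with, or None if no re-TTS should happen.

    Assumes duration scales as 1 / speed and tolerance is +/- tol_s seconds
    around target_s (both hypothetical modeling choices).
    """
    if abs(actual_s - target_s) <= tol_s:
        return None  # already within tolerance, keep the candidate as-is
    speed = min(max(actual_s / target_s, SPEED_MIN), SPEED_MAX)
    if abs(actual_s / speed - target_s) <= tol_s:
        return speed  # the clamped speed closes the gap
    return None  # even the clamped speed cannot reach the target
```

Returning None when the clamped speed still misses keeps the re-TTS call from being spent on a candidate that would remain out of tolerance anyway.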

- remove long-chunk splitting and fast-initial-compress paths
- chunk 0 always uses sequential TTS without refinement
- track prep_start_lead_s on ChunkProfile for adopted pre-started chunks
- default enable_early_cut to False

CompareEnv now reads streaming_tts from its own config instead of inheriting from each debater's config, so baseline and test runs share the same streaming setting.
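A minimal sketch of the new precedence, assuming simple dataclass configs. DebaterConfig, the CompareEnv constructor signature and streaming_tts_for are hypothetical names, and the False default is an assumption; only the EnvConfig.streaming_tts field and the fact that CompareEnv ignores each debater's own setting come from this PR.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class DebaterConfig:
    name: str
    streaming_tts: bool = False  # per-debater value; no longer consulted by CompareEnv


@dataclass
class EnvConfig:
    streaming_tts: bool = False  # default value here is an assumption


class CompareEnv:
    """Hypothetical sketch: both sides take streaming_tts from the env config."""

    def __init__(self, config: EnvConfig, baseline: DebaterConfig, test: DebaterConfig) -> None:
        self.config = config
        self.debaters = [baseline, test]

    def streaming_tts_for(self, debater: DebaterConfig) -> bool:
        # Ignore debater.streaming_tts so baseline and test runs always match.
        return self.config.streaming_tts
```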

- overlap_viz_par.py renders pre-started last chunks on a separate lane using prep_start_lead_s, so they don't visually collide with the main thread bars of preceding chunks
- overlap_viz.sh stacks per-chunk-dir PNGs into a single combined overlap_timeline_combined.png when multiple dirs match
- add stack_pngs.py helper that vertically stacks PNGs
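A minimal sketch of vertical PNG stacking in the spirit of stack_pngs.py, assuming Pillow is available and images are left-aligned on a white canvas as wide as the widest input. The function names and layout choices are hypothetical, not the helper's actual code; the layout math is split out so it can be checked without any image files.

```python
from __future__ import annotations


def stack_layout(sizes: list[tuple[int, int]]) -> tuple[tuple[int, int], list[int]]:
    """Compute canvas size and the y offset of each image when stacked top to bottom."""
    if not sizes:
        raise ValueError("need at least one image")
    width = max(w for w, _ in sizes)
    offsets: list[int] = []
    y = 0
    for _, h in sizes:
        offsets.append(y)
        y += h
    return (width, y), offsets


def stack_pngs(paths: list[str], out_path: str) -> None:
    """Vertically stack PNGs into one image, left-aligned on a white canvas."""
    from PIL import Image  # Pillow is an assumed dependency, imported lazily

    images = [Image.open(p).convert("RGB") for p in paths]
    (width, height), offsets = stack_layout([im.size for im in images])
    canvas = Image.new("RGB", (width, height), "white")
    for im, y in zip(images, offsets):
        canvas.paste(im, (0, y))
    canvas.save(out_path)
```

For example, stack_pngs(["run_a/timeline.png", "run_b/timeline.png"], "overlap_timeline_combined.png") would write one combined image; the input paths here are placeholders.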

- ouragents: fall back to self.config.model when helper_model is None, not just when the attribute is missing
- utils/model: pass num_retries=3 to litellm.completion
- utils/tool: include traceback in extraction-retry warning
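The helper_model fix turns on the difference between a missing attribute and one set to None. A minimal sketch, assuming a config object with model and optional helper_model attributes; AgentConfig, the two function names and the model strings are hypothetical placeholders.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class AgentConfig:
    model: str
    helper_model: Optional[str] = None  # present on the config, but may be None


def helper_model_old(cfg) -> Optional[str]:
    # Falls back only when the attribute is missing; a None value leaks through.
    return getattr(cfg, "helper_model", cfg.model)


def helper_model_new(cfg) -> str:
    # Falls back when the attribute is missing or set to None.
    return getattr(cfg, "helper_model", None) or cfg.model
```

One side effect of `or`: an empty-string helper_model also falls back to config.model.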

Refactor TTS refinement around a shared _ChunkRefineContext where a prestart worker (kicked off two iters early on ratio or absolute size triggers, plus the existing last-chunk variant) and the normal worker contribute candidates to the same pool. First worker to confirm an in-range candidate wins via try_adopt(). Per-worker fs/llm/tts streams are recorded so overlap_viz_par renders both lanes in parallel with the chosen kind labelled on playback. Chunk merging now uses word count (MIN_CHUNK_WORDS=30) and also folds an undersized last chunk into its predecessor.
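A minimal sketch of word-based chunk merging, assuming a greedy left-to-right strategy and whitespace word splitting. merge_chunks and word_count are hypothetical names; the PR only states the MIN_CHUNK_WORDS=30 threshold and that an undersized last chunk is folded into its predecessor.

```python
from __future__ import annotations

MIN_CHUNK_WORDS = 30  # from the PR; replaces the old MIN_CHUNK_CHARS = 50


def word_count(text: str) -> int:
    return len(text.split())


def merge_chunks(chunks: list[str]) -> list[str]:
    """Greedily merge adjacent chunks until each has at least MIN_CHUNK_WORDS words.

    An undersized tail is folded into its predecessor instead of standing alone.
    """
    merged: list[str] = []
    pending: list[str] = []
    for chunk in chunks:
        pending.append(chunk)
        if word_count(" ".join(pending)) >= MIN_CHUNK_WORDS:
            merged.append(" ".join(pending))
            pending = []
    if pending:
        tail = " ".join(pending)
        if merged:
            merged[-1] = merged[-1] + " " + tail  # fold undersized last chunk into predecessor
        else:
            merged.append(tail)  # whole input is shorter than MIN_CHUNK_WORDS
    return merged
```

Counting words rather than characters keeps the threshold roughly proportional to speaking time, which is what the chunk budget is measured against.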

WinstonLiyt added 8 commits April 17, 2026 19:49

fix: fix a bug

0a9ce0c

Merge remote-tracking branch 'origin/main' into ytli_417

f6151d7

refactor: simplify tts streaming pipeline

9f533cd


refactor: move streaming_tts flag to EnvConfig

b12afa4


fix: small robustness fixes

a3fa57c


dqwang122 merged commit f56f18b into main May 15, 2026
3 checks passed

dqwang122 deleted the ytli_417 branch May 15, 2026 21:41


dqwang122 merged 8 commits into main from ytli_417

WinstonLiyt commented May 4, 2026


2 participants
