…nk pre-start
Add early-cut splitting for oversized chunks, TTS speed adjustment post-processing, fast initial compression for short-budget rounds, pre-start of last chunk, and next-chunk context in LLM rewrites. Also fix overlap_viz.sh to accept multiple args.
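The speed post-processing step (described in the PR as re-TTSing an out-of-tolerance candidate with a clamped `speed ∈ [0.85, 1.15]`) can be sketched as a small pure function. This is a hypothetical illustration, not the PR's code: `pick_speed` and the relative tolerance `tol` are assumptions, and it assumes audio duration scales as `1/speed` (speed above 1 shortens the clip).

```python
from typing import Optional

# Speed bounds quoted in the PR description.
MIN_SPEED = 0.85
MAX_SPEED = 1.15


def pick_speed(duration_s: float, target_s: float, tol: float = 0.1) -> Optional[float]:
    """Hypothetical helper: choose a TTS speed that brings duration_s within tol of target_s.

    Assumes duration scales as 1/speed. Returns 1.0 if already in range, the
    clamped speed if it closes the gap, or None if even the clamp is not enough.
    """
    if target_s <= 0:
        raise ValueError("target_s must be positive")
    if abs(duration_s - target_s) <= tol * target_s:
        return 1.0  # already within tolerance, no re-TTS needed
    wanted = duration_s / target_s  # speed that would hit the target exactly
    speed = min(MAX_SPEED, max(MIN_SPEED, wanted))
    if abs(duration_s / speed - target_s) <= tol * target_s:
        return speed
    return None  # gap too large for the allowed speed range
```

Returning `None` rather than the clamped value lets the caller skip a re-TTS that would still miss the target.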
- remove long-chunk splitting and fast-initial-compress paths
- chunk 0 always uses sequential TTS without refinement
- track prep_start_lead_s on ChunkProfile for adopted pre-started chunks
- default enable_early_cut to False
CompareEnv now reads streaming_tts from its own config instead of inheriting from each debater's config, so baseline and test runs share the same streaming setting.
- overlap_viz_par.py renders pre-started last chunks on a separate lane using prep_start_lead_s, so they don't visually collide with the main thread bars of preceding chunks
- overlap_viz.sh stacks per-chunk-dir PNGs into a single combined overlap_timeline_combined.png when multiple dirs match
- add stack_pngs.py helper that vertically stacks PNGs
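The vertical stacking done by `stack_pngs.py` can be sketched on in-memory pixel rows. This is a hypothetical stand-in (`stack_vertically` is not the helper's real API), and padding narrower images on the right with white is an assumption; with Pillow the same idea is an `Image.new` canvas of the widest width and summed height, filled by `Image.paste` at each running y offset.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]   # (r, g, b)
Image = List[List[Pixel]]      # rows of pixels, top to bottom
WHITE: Pixel = (255, 255, 255)


def stack_vertically(images: Sequence[Image], pad: Pixel = WHITE) -> Image:
    """Hypothetical sketch: stack images top to bottom, right-padding narrow rows."""
    width = max((len(row) for img in images for row in img), default=0)
    stacked: Image = []
    for img in images:
        for row in img:
            # Copy the row and fill the remaining columns with the pad colour.
            stacked.append(list(row) + [pad] * (width - len(row)))
    return stacked
```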
- ouragents: fall back to self.config.model when helper_model is None, not just when the attribute is missing
- utils/model: pass num_retries=3 to litellm.completion
- utils/tool: include traceback in extraction-retry warning
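The helper_model fix matters because `getattr(obj, "helper_model", default)` only uses the default when the attribute is missing; an attribute that exists but is `None` is returned as-is. A minimal sketch of the corrected fallback, where `Agent`, `AgentConfig` and `resolve_helper_model` are hypothetical names rather than the ouragents API:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AgentConfig:
    model: str


class Agent:
    def __init__(self, config: AgentConfig, helper_model: Optional[str] = None):
        self.config = config
        self.helper_model = helper_model

    def resolve_helper_model(self) -> str:
        # getattr(..., None) covers a missing attribute; the explicit None
        # check covers an attribute that exists but was left unset.
        helper = getattr(self, "helper_model", None)
        return helper if helper is not None else self.config.model
```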
Refactor TTS refinement around a shared _ChunkRefineContext where a prestart worker (kicked off two iters early on ratio or absolute size triggers, plus the existing last-chunk variant) and the normal worker contribute candidates to the same pool. First worker to confirm an in-range candidate wins via try_adopt(). Per-worker fs/llm/tts streams are recorded so overlap_viz_par renders both lanes in parallel with the chosen kind labelled on playback. Chunk merging now uses word count (MIN_CHUNK_WORDS=30) and also folds an undersized last chunk into its predecessor.
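Word-count merging with the last-chunk fold can be sketched as below. Only `MIN_CHUNK_WORDS=30` and the fold-into-predecessor rule come from the commit; `merge_chunks` is hypothetical, and merging an undersized mid-stream chunk forward into the next one is an assumption about the direction.

```python
from typing import List

MIN_CHUNK_WORDS = 30  # threshold named in the commit message


def word_count(text: str) -> int:
    return len(text.split())


def merge_chunks(chunks: List[str], min_words: int = MIN_CHUNK_WORDS) -> List[str]:
    """Hypothetical sketch: merge short chunks forward; fold a short tail backward."""
    merged: List[str] = []
    pending = ""
    for chunk in chunks:
        # Accumulate until the running chunk reaches the word threshold.
        pending = f"{pending} {chunk}".strip() if pending else chunk
        if word_count(pending) >= min_words:
            merged.append(pending)
            pending = ""
    if pending:
        if merged:
            merged[-1] = f"{merged[-1]} {pending}"  # undersized last chunk joins its predecessor
        else:
            merged.append(pending)  # the whole input is below the threshold
    return merged
```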
Description
Overhaul of the streaming TTS chunk pipeline so that one chunk can be refined by multiple workers concurrently, and add a richer overlap visualizer to inspect what actually overlaps on the wall clock.
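The "first in-range candidate wins" rule can be sketched as a lock-guarded pool. This is not the PR's `_ChunkRefineContext`: `RefineContext`, `Candidate`, `submit`, and the absolute tolerance `tol_s` are hypothetical, and only the `try_adopt` name and its first-wins semantics come from the description.

```python
import threading
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Candidate:
    worker: str        # e.g. "normal" or "prestart"
    text: str
    duration_s: float


@dataclass
class RefineContext:
    """Hypothetical stand-in for a shared per-chunk candidate pool."""
    target_s: float
    tol_s: float
    pool: List[Candidate] = field(default_factory=list)
    adopted: Optional[Candidate] = None
    _lock: threading.Lock = field(default_factory=threading.Lock, repr=False, compare=False)

    def in_range(self, c: Candidate) -> bool:
        return abs(c.duration_s - self.target_s) <= self.tol_s

    def submit(self, c: Candidate) -> None:
        # Every worker records every candidate, adopted or not.
        with self._lock:
            self.pool.append(c)

    def try_adopt(self, c: Candidate) -> bool:
        """Adopt c if it is in range and nothing was adopted yet; first caller wins."""
        if not self.in_range(c):
            return False
        with self._lock:
            if self.adopted is not None:
                return False
            self.adopted = c
            return True
```

A worker whose `try_adopt` returns `False` after another worker already won can stop refining that chunk.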
Highlights:
- Shared candidate pool (`_ChunkRefineContext`): a normal worker and an optional prestart worker push TTS candidates into one pool; the first to confirm an in-range candidate via `try_adopt()` wins.
- Ratio/size prestart: at chunk `i`, if `chars(i+2)/chars(i+1) ≥ 2.0` or `chars(i+2) ≥ 1000`, kick off a prestart worker for chunk `i+2` so it has two extra chunks of audio time to refine. The existing last-chunk prestart is now expressed in the same framework (`prestart_kind ∈ {"", "ratio", "last"}`).
- Early cut and speed adjustment: a chunk exceeding `1.25× target` is split mid-flight, and a chosen candidate outside tolerance is re-TTSed with clamped `speed ∈ [0.85, 1.15]` if that closes the gap.
- Word-based merging: `MIN_CHUNK_WORDS=30` (replaces `MIN_CHUNK_CHARS=50`); an undersized last chunk is folded into its predecessor.
- Config: `streaming_tts` is now a real `EnvConfig` field instead of a free-floating flag.
- Visualization: a `stack_pngs.py` helper plus `overlap_viz.sh` updates make multi-run comparison easier.

Motivation and Context
Streaming TTS was previously serial within a chunk: refine → TTS, with only the very last chunk getting a head start. Long mid-stream chunks that needed many refinement passes still overran the audio of the chunk playing before them, causing silence gaps. A shared candidate pool lets us start refining a long chunk two iterations early while the normal worker keeps running, and adopt whichever in-range candidate lands first. The visualizer changes were needed because the old single-lane plot could not show two workers racing against the same playback deadline.
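The "start two iterations early" trigger can be sketched from the thresholds in the Highlights (ratio ≥ 2.0 or ≥ 1000 characters). `should_prestart` is a hypothetical name; it assumes `chunk_chars[k]` holds the character count of chunk `k` and that the check runs once per iteration `i`. The last-chunk variant (`prestart_kind == "last"`) uses its own trigger and is not modelled here.

```python
from typing import Sequence

RATIO_TRIGGER = 2.0        # chars(i+2) / chars(i+1)
ABS_TRIGGER_CHARS = 1000   # chars(i+2)


def should_prestart(chunk_chars: Sequence[int], i: int) -> bool:
    """Hypothetical sketch: at iteration i, decide whether to prestart chunk i+2."""
    if i + 2 >= len(chunk_chars):
        return False  # no chunk i+2 to prestart
    nxt, after = chunk_chars[i + 1], chunk_chars[i + 2]
    ratio_hit = nxt > 0 and after / nxt >= RATIO_TRIGGER  # guard against division by zero
    return ratio_hit or after >= ABS_TRIGGER_CHARS
```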
How Has This Been Tested?
Types of changes