v5.3.0 - OmniVoice + native SRT duration targeting, Visual Tag Builder, Granite ASR improvements
Highlights
This release is tagged v5.3.0, but the headline feature is still the OmniVoice integration that landed in v5.2.0, now paired with the Granite ASR additions and fixes that are new in v5.3.0.
The biggest practical change is this:
OmniVoice is the first TTS engine in the suite where subtitle segment duration can be meaningfully guided at generation time.
That matters because the suite now has a model path that can aim for target SRT timing before fallback stretch/correction has to do the heavy lifting.
OmniVoice
OmniVoice is now integrated into the unified suite with:
- official OmniVoice model support
- text TTS and SRT workflows
- multilingual generation with broad upstream language coverage
- instruction-based voice design
- narrator cloning support with explicit reference text
- interruption support in unified generation flows
Native duration-aware SRT generation
This is the part worth paying attention to.
For TTS SRT, the suite can now send target segment duration directly into OmniVoice. In practice that means:
- generated segments can land much closer to subtitle timing targets
- stretch_to_fit has less corrective work to do
- timing adjustments can stay more natural
- precise subtitle dubbing / timing workflows become much more practical
This is not just speeding audio up or slowing it down after the fact. The model is actually being guided with its native duration control during generation.
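To make the timing idea concrete, here is a minimal sketch of the arithmetic involved. It is not the suite's actual code. It only shows how a per-segment target duration can be read from SRT timestamps, and how large a corrective stretch would still be needed if the generated audio missed that target. The names `Segment`, `parse_srt`, and `stretch_ratio` are hypothetical, and the way the suite hands the duration to OmniVoice is not shown here.

```python
import re
from dataclasses import dataclass

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:03,500".
TIMING_RE = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


@dataclass
class Segment:
    """One subtitle cue (hypothetical helper type, not a suite class)."""
    start: float  # seconds
    end: float    # seconds
    text: str

    @property
    def target_duration(self) -> float:
        # The duration a duration-aware model would be asked to aim for.
        return self.end - self.start


def _to_seconds(h: str, m: str, s: str, ms: str) -> float:
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_srt(srt_text: str) -> list[Segment]:
    """Parse SRT text into segments, skipping blocks without a timing line."""
    segments = []
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        for i, line in enumerate(lines):
            match = TIMING_RE.search(line)
            if match:
                g = match.groups()
                segments.append(Segment(
                    start=_to_seconds(*g[:4]),
                    end=_to_seconds(*g[4:]),
                    text=" ".join(lines[i + 1:]),
                ))
                break
    return segments


def stretch_ratio(generated_seconds: float, target_seconds: float) -> float:
    """Speed factor a post-hoc stretch would need; 1.0 means no correction."""
    if target_seconds <= 0:
        raise ValueError("target duration must be positive")
    return generated_seconds / target_seconds


if __name__ == "__main__":
    demo = "1\n00:00:01,000 --> 00:00:03,500\nHello there.\n"
    for seg in parse_srt(demo):
        print(f"{seg.text!r}: aim for {seg.target_duration:.3f}s")
```

The closer the generated audio lands to `target_duration`, the closer the ratio stays to 1.0, which is why guiding duration at generation time leaves less for a stretch step to fix afterwards.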
Visual Tag Builder
This release also introduces the new 📐 Visual Tag Builder.
It started as an OmniVoice helper, but it became a more general visual tag / attribute assembly node.
Current strengths:
- playful visual reordering of attributes
- built-in OmniVoice preset
- reusable custom presets
- saved column order
- workflow persistence for chosen preset / selections
I’ll add a short demo video showing the interaction separately.
Granite ASR updates in v5.3.0
- Granite ASR 4.1 diarization and timestamp improvements
- plus-model speaker diarization with suite-native [Speaker] output
- fixes for longer transcript cutoff in native timestamp mode
- clearer Granite model / diarization documentation