
v5.3.0 - OmniVoice + native SRT duration targeting, Visual Tag Builder, Granite ASR improvements

diodiogod released this on 24 Jun at 01:58

Highlights

This release is tagged v5.3.0, but its main feature is still the OmniVoice integration that landed in v5.2.0. It is now paired with the Granite ASR additions and fixes from v5.3.0.

The biggest practical change is this:

OmniVoice is the first TTS engine in the suite where subtitle segment duration can be meaningfully guided at generation time.

This matters because the suite now has a model path that aims for the target SRT timing during generation, instead of leaving fallback stretching and correction to do the heavy lifting afterward.

OmniVoice

OmniVoice is now integrated into the unified suite with:

  • official OmniVoice model support
  • text TTS and SRT workflows
  • multilingual generation with broad upstream language coverage
  • instruction-based voice design
  • narrator cloning support with explicit reference text
  • interruption support in unified generation flows

Native duration-aware SRT generation

This is the part worth paying attention to.

For TTS SRT, the suite can now send target segment duration directly into OmniVoice. In practice that means:

  • generated segments can land much closer to subtitle timing targets
  • stretch_to_fit has less corrective work to do
  • timing adjustments can stay more natural
  • precise subtitle dubbing / timing workflows become much more practical
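To make the "less corrective work" point concrete, here is a minimal sketch of the arithmetic involved, assuming standard SRT timing lines. It derives each segment's target duration from its timestamps and computes the residual time-scale factor a stretch pass would still need after generation. It does not call OmniVoice or the suite's code; `target_durations`, `residual_stretch` and `SAMPLE_SRT` are hypothetical names used only for illustration.

```python
import re

# Matches an SRT timing line such as "00:00:01,500 --> 00:00:04,000".
_TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d{2}):(\d{2}):(\d{2}),(\d{3})"
)

# Hypothetical sample subtitle file used for illustration.
SAMPLE_SRT = """1
00:00:01,500 --> 00:00:04,000
Hello there.

2
00:00:04,200 --> 00:00:06,000
Nice to meet you.
"""


def _to_ms(h: str, m: str, s: str, ms: str) -> int:
    """Convert one SRT timestamp into integer milliseconds (avoids float drift)."""
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def target_durations(srt_text: str) -> list[float]:
    """Target duration in seconds for every subtitle segment, in file order."""
    durations = []
    for match in _TIMING.finditer(srt_text):
        g = match.groups()
        start_ms, end_ms = _to_ms(*g[:4]), _to_ms(*g[4:])
        durations.append((end_ms - start_ms) / 1000)
    return durations


def residual_stretch(generated_s: float, target_s: float) -> float:
    """Time-scale factor still needed so the audio fits its slot.

    new_duration = generated_s * factor, so factor = target_s / generated_s.
    1.0 means the segment already fits; < 1.0 compresses, > 1.0 expands.
    """
    if generated_s <= 0:
        raise ValueError("generated duration must be positive")
    return target_s / generated_s


if __name__ == "__main__":
    for target in target_durations(SAMPLE_SRT):
        print(f"target {target:.3f}s")
```

The closer generation lands to each target, the closer this factor stays to 1.0, which is the sense in which stretch_to_fit has less to correct.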

This is not just a speed-up applied after generation. The model is guided by its own native duration control while it generates.

Visual Tag Builder

This release also introduces the new 📐 Visual Tag Builder.

It started as an OmniVoice helper, but it became a more general visual tag / attribute assembly node.

Current strengths:

  • interactive visual reordering of attributes
  • built-in OmniVoice preset
  • reusable custom presets
  • saved column order
  • the chosen preset and selections persist with the workflow

I’ll add a short demo video showing the interaction separately.

Granite ASR updates in v5.3.0

  • Granite ASR 4.1 diarization and timestamp improvements
  • plus-model speaker diarization with suite-native [Speaker] output
  • fixes for longer transcripts being cut off in native timestamp mode
  • clearer Granite model / diarization documentation
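As a rough illustration of what speaker-tagged output from diarization involves, the sketch below merges consecutive diarized segments from the same speaker into `[Speaker]`-prefixed lines. The `Segment` type and the exact tag layout are assumptions for illustration only; this is not Granite's output format or the suite's actual implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Segment:
    # Hypothetical diarized ASR segment; not Granite's real output type.
    start: float
    end: float
    speaker: str
    text: str


def to_speaker_tagged(segments: list[Segment]) -> str:
    """Join segments into lines like "[Alice] ...", merging same-speaker runs."""
    lines: list[str] = []
    current: Optional[str] = None
    buffer: list[str] = []
    for seg in sorted(segments, key=lambda s: s.start):
        if seg.speaker != current:
            # Speaker changed: flush the previous speaker's accumulated text.
            if buffer:
                lines.append(f"[{current}] " + " ".join(buffer))
            current, buffer = seg.speaker, []
        buffer.append(seg.text.strip())
    if buffer:
        lines.append(f"[{current}] " + " ".join(buffer))
    return "\n".join(lines)
```

Merging consecutive segments keeps one tag per speaker turn rather than one per ASR chunk.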