
v1.2.0 - Translation Update

@I5UCC I5UCC released this 07 Apr 21:01
4e0d909

🢃 Download Release

🢃 Download CPU Only Version

Discord Support Server

With default settings, this program has the following requirements:

  • Inference on CPU:
    • ~2GB of storage space.
    • ~400MB of available RAM.
  • Inference on GPU:
    • CUDA-enabled GPU (NVIDIA only); otherwise it will fall back to using the CPU.
    • ~5GB of storage space.
    • ~1GB of available RAM.
    • ~500MB of available VRAM.
  • SteamVR (if run in VR; no Oculus/Meta support as of now).

Depending on which settings are changed in the program, these requirements can change significantly.


v1.2.0 Changelog

  • Translation between languages, powered by M2M-100 using ctranslate2.
    • Translate between any of the ~100 languages supported.
    • Translation requires downloading the M2M-100 model into cache, which is another ~2GB.
    • Inference is done on the CPU by default. You can change this, but I would advise against it unless you have another 2GB of VRAM to spare.
  • Text timeout is now handled by TextboxSTT, for more consistency between KAT, Textbox and the SteamVR Overlay.
    • For example, the Textbox/KAT stays populated until either the text timeout is reached (30.0 seconds by default) or it is cleared manually. Setting that value to <=0.0 means the textbox is never cleared unless you clear it manually.
  • Changed the default "phrase_time_limit" from 2.0 to 1.0, for more "real time" transcriptions in the "once_continuous" and "realtime" modes.
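
The text timeout rule described above can be sketched roughly as follows. This is only an illustration of the stated behaviour (clear after the timeout, never auto-clear when the value is <=0.0), not TextboxSTT's actual code; the class and method names are hypothetical.

```python
from typing import Optional


class TextTimeout:
    """Hypothetical helper that decides when displayed text should be cleared."""

    def __init__(self, timeout_s: float = 30.0) -> None:
        # 30.0 seconds is the default mentioned in the changelog.
        self.timeout_s = timeout_s
        self._shown_at: Optional[float] = None

    def show(self, now: float) -> None:
        """Record the moment text was last written to the textbox."""
        self._shown_at = now

    def clear(self) -> None:
        """Manual clear: forget the current text."""
        self._shown_at = None

    def should_clear(self, now: float) -> bool:
        """Return True once the timeout has elapsed since the last show()."""
        if self._shown_at is None:
            return False  # nothing is displayed
        if self.timeout_s <= 0.0:
            return False  # <=0.0 disables automatic clearing
        return now - self._shown_at >= self.timeout_s
```

A caller would typically pass `time.monotonic()` as `now` and poll `should_clear` from its update loop, so that KAT, the Textbox and the overlay all follow the same clock.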

v1.1.3 Changelog

  • Fixed OBS not launching unless the program was reloaded.
  • Added a typewriter effect to the OBS source for better readability.

v1.1.2 Changelog

  • Fixed a context management issue with the audio source in the once_continuous and realtime modes.
  • Try to prevent the SteamVR Overlay from freezing by switching the application type to Overlay and reinitializing OVR when the error OverlayError_RequestFailed occurs.

v1.1.1 Changelog

  • The program now restarts automatically when needed.
  • Fixed the OBS browser source not launching.
  • Fixed Whisper transcribing random words when the input is only noise. (VAD may be used in the future to avoid this issue and to get generally better transcription results.)
  • Refactoring, logging changes and fixes.
  • Reverted some default value changes.

v1.1.0 Changelog

  • #2: Allow use of user fine-tuned models on Hugging Face.
    • Translation to English does not work with those models, at least in my testing.
    • In the model section of the settings, select "custom" and enter a path to a Hugging Face model, e.g. "openai/whisper-base". You can return to the selection by pressing Enter on an empty box.
  • Complete config revamp: the same (and more) config options, but better organized!
    • Sadly, for this version you cannot automatically take your old config with you. You can ask in the support Discord how to do that if you have a lot of word replacements and/or emotes set.
  • Fast reload feature: click the ⭯ button to quickly reload TextboxSTT.
  • Added audio settings: a gain slider and an individual toggle for each audio feedback step.

  • Transcription times are now shown in the main UI.
  • Better log management: the program creates up to 5 logs, and "latest.log" is the most recent. Logs are now saved in the "cache" folder.
  • Added a program icon, wowee.
  • Separate windows are now always positioned relative to the window they were opened from, not the main window.
  • Lots of refactoring and additional error logging.
  • Updated to faster-whisper 0.3.0.
  • Some smaller bugfixes.
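
As a rough illustration of the log management item above (up to 5 logs, with "latest.log" as the newest), here is a minimal rotation sketch. It is not the program's actual implementation: the function name and the "log_N.log" naming of older files are assumptions made for the example.

```python
from pathlib import Path


def rotate_logs(log_dir: Path, keep: int = 5) -> Path:
    """Shift older logs down one slot and return a fresh latest.log.

    Hypothetical layout: latest.log, log_1.log (previous run), ...,
    log_{keep-1}.log (oldest run that is still kept).
    """
    if keep < 2:
        raise ValueError("keep must be at least 2")
    log_dir.mkdir(parents=True, exist_ok=True)

    # Drop the oldest slot so the total never exceeds `keep` files.
    oldest = log_dir / f"log_{keep - 1}.log"
    if oldest.exists():
        oldest.unlink()

    # Move log_3 -> log_4, log_2 -> log_3, ... (descending, so no overwrite).
    for i in range(keep - 2, 0, -1):
        src = log_dir / f"log_{i}.log"
        if src.exists():
            src.rename(log_dir / f"log_{i + 1}.log")

    # The previous run's latest.log becomes log_1.log.
    latest = log_dir / "latest.log"
    if latest.exists():
        latest.rename(log_dir / "log_1.log")

    latest.touch()
    return latest
```

Calling `rotate_logs(Path("cache"))` once at startup would leave at most five files in the folder, with the current run always writing to "latest.log".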