Release v1.2.0 - Translation Update · I5UCC/VRCTextboxSTT

With default settings, this program has the following requirements:

Inference on CPU:
- ~2GB of storage space.
- ~400MB of available RAM.
Inference on GPU:
- CUDA-enabled GPU (NVIDIA only), otherwise it will fall back to using the CPU.
- ~5GB of storage space.
- ~1GB of available RAM.
- ~500MB of available VRAM.
SteamVR (if run in VR; no Oculus/Meta support as of now).

Depending on which settings you change in the program, these requirements can change significantly.

Translation between languages, powered by M2M-100 using ctranslate2.
- Translate between any of the ~100 languages supported.
- Translation requires downloading the M2M-100 model into cache, which is another ~2GB.
- Inference is done on CPU by default. You can change this, but I would advise against it unless you have another 2GB of VRAM to spare.
Text timeout is now handled by TextboxSTT, for more consistency between KAT, Textbox and the SteamVR Overlay.
- i.e. it will keep populating the Textbox/KAT until either the text timeout is reached (30.0 seconds by default) or it is cleared manually. Setting that value to <=0.0 means the textbox is never cleared unless it is cleared manually.
Changed the default "phrase_time_limit" from 2.0 to 1.0, for more "real time" transcriptions in modes "once_continuous" and "realtime"
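
The text timeout behavior described above (clear after 30.0 seconds by default, never clear automatically when the value is <=0.0) can be illustrated with a minimal sketch. This is not TextboxSTT's actual code: the class `TextTimeout` and its method names are hypothetical and only model the rule as stated in these notes.

```python
import time
from typing import Optional


class TextTimeout:
    """Hypothetical helper that models the text timeout rule from the notes."""

    def __init__(self, timeout_seconds: float = 30.0) -> None:
        self.timeout_seconds = timeout_seconds
        self._shown_at: Optional[float] = None  # None means nothing is shown

    def show(self, now: Optional[float] = None) -> None:
        """Record the moment new text was put into the textbox."""
        self._shown_at = time.monotonic() if now is None else now

    def clear(self) -> None:
        """Manual clear: always empties the textbox."""
        self._shown_at = None

    def is_visible(self, now: Optional[float] = None) -> bool:
        """Return True while the text should still be displayed."""
        if self._shown_at is None:
            return False
        if self.timeout_seconds <= 0.0:
            # A timeout of <=0.0 disables automatic clearing.
            return True
        now = time.monotonic() if now is None else now
        return (now - self._shown_at) < self.timeout_seconds
```

Passing `now` explicitly is only a convenience for trying the logic without waiting; in a real loop the monotonic clock would be used.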

Fixed context managing issue with audio source in mode once_continuous and realtime
Try to prevent the SteamVR Overlay from freezing by switching the Application type to Overlay and reinitializing OVR when the error OverlayError_RequestFailed occurs.

Automatically restarting the program when it is needed.
Fixed OBS browser source not launching.
Fixed whisper transcribing random words when it's only noise. (maybe use VAD in the future to avoid this issue and get generally better transcription results)
Refactor and logging changes and fixes.
Reverted some default value changes

#2 allow use of user fine-tuned models on Hugging Face
- translation to English does not work with those models, at least in my testing.
- In the model section of the settings, select "custom" and enter a path to a Hugging Face model, e.g. "openai/whisper-base". You can return to the selection by pressing enter on an empty box.


complete config revamp, same (and more) config options but more organized!
- sadly for this version you cannot automatically take your old config with you. you can ask in the support discord how to do that if you have a lot of word replacements and/or emotes set.
fast reload feature: click on the ⭯ button to quickly reload TextboxSTT
added audio settings: added a gain slider and an individual toggle for each audio feedback step.
Shows transcribe times in main UI now.
better log management, the program creates up to 5 logs, "latest.log" is the latest. logs are now saved in the "cache" folder.
added a program icon, wowee
Separate windows are now always positioned relative to the window they were opened from, not the main window.
lots of refactoring and additional error logging.
updated to faster-whisper 0.3.0
some smaller bugfixes
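
The log management change above (up to 5 logs, with "latest.log" as the newest, stored in the "cache" folder) could look roughly like the following sketch. This is an assumption about one way to do it, not the program's actual implementation: the function `rotate_logs` and the "previous_N.log" names for older logs are hypothetical.

```python
from pathlib import Path

MAX_LOGS = 5  # taken from the notes: the program keeps up to 5 logs


def rotate_logs(log_dir: Path, max_logs: int = MAX_LOGS) -> Path:
    """Shift older logs down one slot and return the path for the new latest.log.

    Older logs are named previous_1.log (newest) to previous_{max_logs - 1}.log
    (oldest); this naming scheme is hypothetical.
    """
    log_dir.mkdir(parents=True, exist_ok=True)

    # Drop the oldest log so the total never exceeds max_logs.
    oldest = log_dir / f"previous_{max_logs - 1}.log"
    if oldest.exists():
        oldest.unlink()

    # Shift the remaining older logs: previous_3 -> previous_4, and so on.
    for index in range(max_logs - 2, 0, -1):
        src = log_dir / f"previous_{index}.log"
        if src.exists():
            src.replace(log_dir / f"previous_{index + 1}.log")

    # The previous latest.log becomes previous_1.log.
    latest = log_dir / "latest.log"
    if latest.exists():
        latest.replace(log_dir / "previous_1.log")

    # The caller opens this path for the current run's log.
    return latest
```

Calling `rotate_logs(Path("cache"))` once at startup and then writing to the returned path would keep the newest run in "latest.log" under this scheme.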
