v1.0.1
RealtimeSTT v1.0.1
Major Features
- Kroko/Banafo ASR Support
  - Added the `kroko_onnx` transcription engine for Kroko/Banafo `.data` streaming models.
  - Added engine aliases for Kroko/Banafo usage through the transcription engine system.
  - Added realtime preview support for streaming engines using persistent transcription sessions.
  - Kroko realtime previews feed only newly recorded audio frames instead of repeatedly reprocessing the full buffer.
  - Kroko final transcription remains one-shot; the streaming path is used for realtime previews when the selected realtime engine supports it.
- Kroko-ONNX Installer Helper
  - Added `stt-install-kroko`, exposed through the `kroko-builder` extra.
  - The helper builds and installs Kroko-ONNX for the active Python environment.
  - On Windows, the current Kroko builder path targets CPython 3.12 x64 and uses Docker Desktop.
  - Public Kroko Community models with known filenames can be downloaded into the RealtimeSTT cache.
- Meta Omnilingual ASR Support
  - Added the `omnilingual_asr` transcription engine for Meta Omnilingual ASR.
  - Supports the published CTC and LLM model-card plumbing from the upstream `omnilingual-asr` package.
  - Uses `omniASR_CTC_1B_v2` as the default Omnilingual model when the recorder is still configured with a Whisper-style default model name.
  - In-memory audio is passed as predecoded waveform dictionaries to avoid upstream raw-array handling issues.
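The Kroko realtime preview item above describes feeding only newly recorded frames into a persistent session rather than reprocessing the whole buffer. The following is a minimal sketch of that idea, not RealtimeSTT's actual implementation: `PreviewSession` and `PreviewFeeder` are hypothetical names, and the "decoding" is replaced by simple counting so the cursor logic is easy to see.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PreviewSession:
    """Hypothetical persistent session: it only ever receives new frames."""
    frames_seen: int = 0
    feeds: List[int] = field(default_factory=list)

    def feed(self, new_frames: List[bytes]) -> None:
        # A real engine would decode incrementally here; this sketch only counts.
        self.frames_seen += len(new_frames)
        self.feeds.append(len(new_frames))


class PreviewFeeder:
    """Keeps a cursor into the recording buffer so each frame is fed once."""

    def __init__(self, session: PreviewSession) -> None:
        self.session = session
        self.cursor = 0  # index of the first frame not yet fed

    def update(self, buffer: List[bytes]) -> int:
        """Feed frames recorded since the last call; return how many were fed."""
        new_frames = buffer[self.cursor:]
        if new_frames:
            self.session.feed(new_frames)
            self.cursor = len(buffer)
        return len(new_frames)


buffer: List[bytes] = []
session = PreviewSession()
feeder = PreviewFeeder(session)

buffer.extend([b"\x00" * 320] * 3)
first = feeder.update(buffer)    # three frames are new
buffer.extend([b"\x00" * 320] * 2)
second = feeder.update(buffer)   # only the two appended frames are fed
third = feeder.update(buffer)    # nothing new, so nothing is fed
```

With a full-buffer fallback, the second call would have handed all five frames to the engine again; the cursor is what keeps the per-update cost proportional to the newly recorded audio.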
Improvements
- Streaming Engine Interface
  - Added a generic streaming transcription session interface.
  - Engines can now opt into incremental realtime decoding.
  - Existing engines keep the full-buffer fallback behavior.
- Kroko Realtime Behavior
  - Kroko model cadence is used to choose automatic finalization tail padding.
  - Added support for suppressing Kroko native stdout/stderr during recognizer calls with `suppress_native_output=True`.
  - Added `KROKO_ONNX_SUPPRESS_LICENSE_OUTPUT=1` handling for compatible Kroko builds.
- Install And Smoke-Test Guidance
  - Documented the recommended Kroko recorder install path: `pip install "RealtimeSTT[kroko-builder,silero-onnx-cpu]"`
  - Clarified that `kroko-builder` builds Kroko-ONNX, while `silero-onnx-cpu` provides the local VAD backend needed by recorder-based smoke tests and live microphone use.
  - Added a public Kroko smoke script: `tests/realtimestt_kroko_test.py`.
  - Added a public Omnilingual smoke script: `tests/realtimestt_omnilingual_test.py`.
- Docs
  - Added Kroko-ONNX engine documentation.
  - Added Omnilingual ASR engine documentation.
  - Added installation guidance for platform-sensitive extras.
  - Added `docs/licenses.md` with engine and model-family license notes.
  - Clarified that `example_fastapi_server` is a source-checkout reference server, not installed by the PyPI wheel.
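For readers wondering why native stdout/stderr needs special handling: output written by native code goes straight to the process file descriptors, so reassigning `sys.stdout` in Python does not hide it. One common way to silence it is to redirect descriptors 1 and 2 for the duration of a call. The sketch below illustrates that general technique only; `quiet_native_output` is a hypothetical helper, and nothing here is confirmed to be how RealtimeSTT implements `suppress_native_output=True`.

```python
import os
import sys
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def quiet_native_output(enabled: bool = True) -> Iterator[None]:
    """Temporarily point file descriptors 1 and 2 at the null device."""
    if not enabled:
        yield
        return
    # Flush Python-level buffers so earlier output is not lost or reordered.
    sys.stdout.flush()
    sys.stderr.flush()
    saved_out, saved_err = os.dup(1), os.dup(2)
    devnull = os.open(os.devnull, os.O_WRONLY)
    try:
        os.dup2(devnull, 1)
        os.dup2(devnull, 2)
        yield
    finally:
        # Restore the original descriptors even if the wrapped call raised.
        os.dup2(saved_out, 1)
        os.dup2(saved_err, 2)
        for fd in (saved_out, saved_err, devnull):
            os.close(fd)
```

A call such as `with quiet_native_output(): recognizer_call()` (where `recognizer_call` stands in for any native-backed function) would hide the native output and leave normal printing working again afterwards.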
Fixes
- Omnilingual Runtime Handling
  - Fixed Omnilingual audio handoff so in-memory audio is passed in the format expected by the upstream backend.
  - Added clearer dependency/platform guidance for the current Omnilingual stack.
- Kroko Smoke Path
  - Updated Kroko smoke-test guidance so clean installs include the needed local VAD backend.
  - The standalone Kroko smoke script now gives a direct hint when the recorder VAD dependency is missing.
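A "direct hint" for a missing dependency, as the Kroko smoke script fix above describes, can be produced by probing for the module before it is used. The sketch below shows one way to do that with the standard library. It is not the smoke script's actual code: `missing_dependency_hint` is a hypothetical helper, and which module the real script checks is not stated here, so the module name is left as a parameter.

```python
import importlib.util
from typing import Optional

# Install command taken from the release notes.
INSTALL_HINT = 'pip install "RealtimeSTT[kroko-builder,silero-onnx-cpu]"'


def missing_dependency_hint(module_name: str) -> Optional[str]:
    """Return an install hint if a top-level module is absent, else None."""
    if importlib.util.find_spec(module_name) is not None:
        return None
    return (
        f"Missing module '{module_name}'. "
        f"For Kroko recorder use, try: {INSTALL_HINT}"
    )
```

Checking up front lets a script print one actionable line instead of failing later with a bare `ModuleNotFoundError`.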
Compatibility Notes
- Existing default `faster_whisper` usage remains the compatibility path.
- Kroko recorder/live usage should install: `pip install "RealtimeSTT[kroko-builder,silero-onnx-cpu]"`
- The Kroko Windows builder currently requires CPython 3.12 x64 plus Docker Desktop with the Linux engine running.
- Meta Omnilingual ASR is intended for Linux/WSL2 with Python 3.11.x.
- Native Windows Omnilingual runtime is not supported because `fairseq2n` has no Windows wheel.
- Python 3.12.x is not currently a practical Omnilingual target because of upstream `omnilingual-asr` package metadata.
- Licensed Kroko Pro models require a Pro-capable Kroko wheel and a key supplied through configuration, CLI, or environment variables.
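The platform constraints above can be checked before choosing an engine. The sketch below encodes only the rules listed in these notes. `compatibility_warnings` is a hypothetical helper, not part of RealtimeSTT, and it does not check CPU architecture or whether Docker Desktop is running.

```python
from typing import List, Tuple


def compatibility_warnings(
    engine: str,
    system: str,
    python_version: Tuple[int, int],
    building_kroko: bool = False,
) -> List[str]:
    """Return warnings for the engine/platform combinations named in the notes."""
    warnings: List[str] = []
    if engine == "omnilingual_asr":
        if system != "Linux":
            warnings.append("Omnilingual ASR is intended for Linux/WSL2.")
        if python_version != (3, 11):
            warnings.append("Omnilingual ASR is intended for Python 3.11.x.")
    if engine == "kroko_onnx" and building_kroko and system == "Windows":
        if python_version != (3, 12):
            warnings.append(
                "The Kroko Windows builder requires CPython 3.12 x64 "
                "plus Docker Desktop."
            )
    return warnings
```

In practice the inputs could come from `platform.system()` and `sys.version_info[:2]`; under WSL2, `platform.system()` reports `"Linux"`, which matches the Linux/WSL2 guidance.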
Other
- Added focused Kroko and Omnilingual unit coverage.
- Added public manual smoke paths for Kroko and Omnilingual.
- Package version bumped to `1.0.1`.