Skip to content

v1.0.1

Choose a tag to compare

@KoljaB KoljaB released this 22 May 19:50
· 51 commits to master since this release

RealtimeSTT v1.0.1

Major Features

  • Kroko/Banafo ASR Support

    • Added the kroko_onnx transcription engine for Kroko/Banafo .data streaming models.
    • Added engine aliases for Kroko/Banafo usage through the transcription engine system.
    • Added realtime preview support for streaming engines using persistent transcription sessions.
    • Kroko realtime previews feed only newly recorded audio frames instead of repeatedly reprocessing the full buffer.
    • Kroko final transcription remains one-shot; the streaming path is used for realtime previews when the selected realtime engine supports it.
  • Kroko-ONNX Installer Helper

    • Added stt-install-kroko, exposed through the kroko-builder extra.
    • The helper builds and installs Kroko-ONNX for the active Python environment.
    • On Windows, the current Kroko builder path targets CPython 3.12 x64 and uses Docker Desktop.
    • Public Kroko Community models with known filenames can be downloaded into the RealtimeSTT cache.
  • Meta Omnilingual ASR Support

    • Added the omnilingual_asr transcription engine for Meta Omnilingual ASR.
    • Supports the published CTC and LLM model-card plumbing from the upstream omnilingual-asr package.
    • Uses omniASR_CTC_1B_v2 as the default Omnilingual model when the recorder is still configured with a Whisper-style default model name.
    • In-memory audio is passed as predecoded waveform dictionaries to avoid upstream raw-array handling issues.

Improvements

  • Streaming Engine Interface

    • Added a generic streaming transcription session interface.
    • Engines can now opt into incremental realtime decoding.
    • Existing engines keep the full-buffer fallback behavior.
  • Kroko Realtime Behavior

    • Kroko model cadence is used to choose automatic finalization tail padding.
    • Added support for suppressing Kroko native stdout/stderr during recognizer calls with suppress_native_output=True.
    • Added KROKO_ONNX_SUPPRESS_LICENSE_OUTPUT=1 handling for compatible Kroko builds.
  • Install And Smoke-Test Guidance

    • Documented the recommended Kroko recorder install path:
      pip install "RealtimeSTT[kroko-builder,silero-onnx-cpu]"
    • Clarified that kroko-builder builds Kroko-ONNX, while silero-onnx-cpu provides the local VAD backend needed by recorder-based smoke tests and live microphone use.
    • Added a public Kroko smoke script: tests/realtimestt_kroko_test.py.
    • Added a public Omnilingual smoke script: tests/realtimestt_omnilingual_test.py.
  • Docs

    • Added Kroko-ONNX engine documentation.
    • Added Omnilingual ASR engine documentation.
    • Added installation guidance for platform-sensitive extras.
    • Added docs/licenses.md with engine and model-family license notes.
    • Clarified that example_fastapi_server is a source-checkout reference server, not installed by the PyPI wheel.

Fixes

  • Omnilingual Runtime Handling

    • Fixed Omnilingual audio handoff so in-memory audio is passed in the format expected by the upstream backend.
    • Added clearer dependency/platform guidance for the current Omnilingual stack.
  • Kroko Smoke Path

    • Updated Kroko smoke-test guidance so clean installs include the needed local VAD backend.
    • The standalone Kroko smoke script now gives a direct hint when the recorder VAD dependency is missing.

Compatibility Notes

  • Existing default faster_whisper usage remains the compatibility path.
  • Kroko recorder/live usage should install:
    pip install "RealtimeSTT[kroko-builder,silero-onnx-cpu]"
  • The Kroko Windows builder currently requires CPython 3.12 x64 plus Docker Desktop with the Linux engine running.
  • Meta Omnilingual ASR is intended for Linux/WSL2 with Python 3.11.x.
  • Native Windows Omnilingual runtime is not supported because fairseq2n has no Windows wheel.
  • Python 3.12.x is not currently a practical Omnilingual target because of upstream omnilingual-asr package metadata.
  • Licensed Kroko Pro models require a Pro-capable Kroko wheel and a key supplied through configuration, CLI, or environment variables.

Other

  • Added focused Kroko and Omnilingual unit coverage.
  • Added public manual smoke paths for Kroko and Omnilingual.
  • Package version bumped to 1.0.1.