# scriptrs

**Work in progress.** scriptrs is early and intentionally narrow right now:

- macOS only
- Apple CoreML only
- Parakeet TDT v2 only
- no CUDA
- no non-macOS backend yet

Rust transcription with native CoreML Parakeet v2 inference.

The base crate exposes a single-chunk `TranscriptionPipeline`. Fast long-audio chunking lives behind the `long-form` feature via `LongFormTranscriptionPipeline`. VAD-backed speech region planning lives behind the `vad` feature, which also enables `long-form`.

## Current scope

- Base pipeline for short audio
- Optional fast long-form pipeline with overlap chunking
- Optional VAD-backed long-form region planning
- Native CoreML inference on macOS
- Hugging Face download support with optional local model loading

## What it does not do yet

- Linux or Windows support
- CUDA support
- Other ASR models
- Streaming transcription
- Stable public guarantees around model layout or long-form behavior

## Install

```toml
[dependencies]
scriptrs = "0.1.0"
```

For fast long-form transcription:

```toml
[dependencies]
scriptrs = { version = "0.1.0", features = ["long-form"] }
```

For VAD-backed long-form transcription (`vad` also enables `long-form`):

```toml
[dependencies]
scriptrs = { version = "0.1.0", features = ["vad"] }
```

## Model downloads

With the default `online` feature, scriptrs can resolve models automatically:

- it downloads the runtime bundle from `avencera/scriptrs-models`

You can override either side of that:

- `SCRIPTRS_MODELS_DIR=/path/to/models` forces a local bundle
- `SCRIPTRS_MODELS_REPO=owner/repo` forces a specific Hugging Face model repo layout
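If you prefer to set the override from code rather than from the shell, a process-local environment variable works too. This is a sketch under one assumption: scriptrs reads `SCRIPTRS_MODELS_DIR` when the pipeline is constructed, so the variable must be set before construction. The path used here is a placeholder.

```rust
fn main() {
    // "/path/to/models" is a placeholder; point this at your own bundle.
    // Set the override before constructing any pipeline so scriptrs sees it.
    std::env::set_var("SCRIPTRS_MODELS_DIR", "/path/to/models");

    println!("{}", std::env::var("SCRIPTRS_MODELS_DIR").unwrap());
}
```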

## Local model layout

If you want to use `from_dir(...)` or `SCRIPTRS_MODELS_DIR`, the local bundle should look like this:

```text
models/
  parakeet-v2/
    encoder.mlmodelc/
    decoder.mlmodelc/
    joint-decision.mlmodelc/
    vocab.txt
```

With `vad`, add:

```text
models/
  vad/
    silero-vad.mlmodelc/
```
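A quick way to catch a misassembled bundle before constructing a pipeline is to check for the paths in the tree above. The path names come straight from the documented layout; the helper itself is illustrative and not part of the scriptrs API.

```rust
use std::path::Path;

// Returns true when every entry of the documented parakeet-v2 bundle
// exists under `models_dir`. Purely a local-filesystem sanity check.
fn bundle_looks_complete(models_dir: &Path) -> bool {
    let parakeet = models_dir.join("parakeet-v2");
    [
        "encoder.mlmodelc",
        "decoder.mlmodelc",
        "joint-decision.mlmodelc",
        "vocab.txt",
    ]
    .iter()
    .all(|name| parakeet.join(name).exists())
}

fn main() {
    let ok = bundle_looks_complete(Path::new("models"));
    println!("bundle complete: {}", ok);
}
```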

## Usage

### Short audio

Use the base pipeline when your audio already fits in a single Parakeet chunk.

With the default `online` feature, `from_pretrained()` is the intended path:

```rust
use scriptrs::TranscriptionPipeline;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let audio: Vec<f32> = load_mono_16khz_audio();
    let pipeline = TranscriptionPipeline::from_pretrained()?;
    let result = pipeline.run(&audio)?;

    println!("{}", result.text);
    Ok(())
}

// Stub: supply mono 16 kHz f32 samples from your own audio source.
fn load_mono_16khz_audio() -> Vec<f32> {
    Vec::new()
}
```

If the input is too long for the base pipeline, it returns `AudioTooLong`.

If you want to use a local bundle instead:

```rust
use scriptrs::TranscriptionPipeline;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let audio: Vec<f32> = load_mono_16khz_audio();
    let pipeline = TranscriptionPipeline::from_dir("models")?;
    let result = pipeline.run(&audio)?;

    println!("{}", result.text);
    Ok(())
}

fn load_mono_16khz_audio() -> Vec<f32> {
    Vec::new()
}
```

### Long audio

Enable `long-form` if you want scriptrs to own long-audio chunking internally and you care most about speed on clean, dense speech.

```rust
use scriptrs::LongFormTranscriptionPipeline;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let audio: Vec<f32> = load_mono_16khz_audio();
    let pipeline = LongFormTranscriptionPipeline::from_pretrained()?;
    let result = pipeline.run(&audio)?;

    println!("{}", result.text);
    Ok(())
}

fn load_mono_16khz_audio() -> Vec<f32> {
    Vec::new()
}
```

`LongFormConfig` defaults to the fast overlap-chunking path with 4 workers. Tune the worker count when you want more or less parallelism:

```rust
use scriptrs::{LongFormConfig, LongFormTranscriptionPipeline};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let audio: Vec<f32> = load_mono_16khz_audio();
    let pipeline = LongFormTranscriptionPipeline::from_pretrained()?;
    let config = LongFormConfig {
        worker_count: 2,
        ..LongFormConfig::default()
    };
    let result = pipeline.run_with_config(&audio, &config)?;

    println!("{}", result.text);
    Ok(())
}

fn load_mono_16khz_audio() -> Vec<f32> {
    Vec::new()
}
```

Enable `vad` when you want VAD-backed speech region planning for sparse speech, long silences, or recordings with a lot of non-speech audio:

```rust
use scriptrs::{LongFormConfig, LongFormMode, LongFormTranscriptionPipeline};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let audio: Vec<f32> = load_mono_16khz_audio();
    let pipeline = LongFormTranscriptionPipeline::from_pretrained()?;
    let config = LongFormConfig {
        mode: LongFormMode::Vad,
        ..LongFormConfig::default()
    };
    let result = pipeline.run_with_config(&audio, &config)?;

    println!("{}", result.text);
    Ok(())
}

fn load_mono_16khz_audio() -> Vec<f32> {
    Vec::new()
}
```

## Example

A small WAV example is included:

```shell
cargo run --example transcribe_wav -- --audio /path/to/file.wav --pretrained
cargo run --example transcribe_wav -- --audio /path/to/file.wav --models-dir models
cargo run --example transcribe_wav --features long-form -- --audio /path/to/file.wav --pretrained --long-form
cargo run --example transcribe_wav --features long-form -- --audio /path/to/file.wav --models-dir models --long-form
cargo run --example transcribe_wav --features long-form -- --audio /path/to/file.wav --pretrained --long-form --long-form-workers 2
cargo run --example transcribe_wav --features vad -- --audio /path/to/file.wav --pretrained --long-form --vad-long-form
```

The example expects mono 16 kHz WAV input.
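The `load_mono_16khz_audio` stubs in the snippets above return an empty `Vec`. For readers who want real samples without extra dependencies, here is a minimal stdlib-only sketch that reads 16-bit PCM mono WAV into `Vec<f32>`. It assumes the file is already mono 16 kHz (no resampling or downmixing), and `"clip.wav"` is a placeholder path.

```rust
use std::fs::File;
use std::io::{Error, ErrorKind, Read, Result};

/// Minimal loader for 16-bit PCM mono WAV. A sketch: it does not resample
/// or downmix, so the file must already be mono 16 kHz.
fn load_mono_16khz_audio(path: &str) -> Result<Vec<f32>> {
    let mut bytes = Vec::new();
    File::open(path)?.read_to_end(&mut bytes)?;
    if bytes.len() < 12 || &bytes[0..4] != b"RIFF" || &bytes[8..12] != b"WAVE" {
        return Err(Error::new(ErrorKind::InvalidData, "not a RIFF/WAVE file"));
    }
    // Walk the RIFF chunks until we reach the "data" chunk.
    let mut pos = 12;
    while pos + 8 <= bytes.len() {
        let size = u32::from_le_bytes([
            bytes[pos + 4], bytes[pos + 5], bytes[pos + 6], bytes[pos + 7],
        ]) as usize;
        if &bytes[pos..pos + 4] == b"data" && pos + 8 + size <= bytes.len() {
            // Normalize little-endian i16 samples to [-1.0, 1.0).
            return Ok(bytes[pos + 8..pos + 8 + size]
                .chunks_exact(2)
                .map(|s| i16::from_le_bytes([s[0], s[1]]) as f32 / 32768.0)
                .collect());
        }
        pos += 8 + size + (size & 1); // chunks are 2-byte aligned
    }
    Err(Error::new(ErrorKind::InvalidData, "no data chunk found"))
}

fn main() -> Result<()> {
    // "clip.wav" is a placeholder; pass your own mono 16 kHz WAV file.
    let samples = load_mono_16khz_audio("clip.wav")?;
    println!("loaded {} samples", samples.len());
    Ok(())
}
```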

## Notes

- The public API is still moving
- scriptrs currently targets the exact file layout and model I/O shipped in `avencera/scriptrs-models`; if you swap in a different CoreML Parakeet export, you may need runtime code changes
- Use `long-form` for the fastest path on clean, dense speech
- Add `vad` when you need better robustness on sparse-speech or non-speech-heavy recordings
