# scriptrs

**Work in progress.** scriptrs is early and intentionally narrow right now:

- macOS only
- Apple CoreML only
- Parakeet TDT v2 only
- no CUDA
- no non-macOS backend yet

Rust transcription with native CoreML Parakeet v2 inference.

The base crate exposes a single-chunk `TranscriptionPipeline`. Fast long-audio chunking lives behind the `long-form` feature via `LongFormTranscriptionPipeline`. VAD-backed speech region planning lives behind the `vad` feature, which also enables `long-form`.

## Current scope

- Base pipeline for short audio
- Optional fast long-form pipeline with overlap chunking
- Optional VAD-backed long-form region planning
- Native CoreML inference on macOS
- Hugging Face download support with optional local model loading

## What it does not do yet

- Linux or Windows support
- CUDA support
- Other ASR models
- Streaming transcription
- Stable public guarantees around model layout or long-form behavior

## Install

```toml
[dependencies]
scriptrs = "0.1.0"
```

For fast long-form transcription:

```toml
[dependencies]
scriptrs = { version = "0.1.0", features = ["long-form"] }
```

For VAD-backed long-form transcription (`vad` also enables `long-form`):

```toml
[dependencies]
scriptrs = { version = "0.1.0", features = ["vad"] }
```

## Model downloads

With the default `online` feature, scriptrs can resolve models automatically:

- it downloads the runtime bundle from `avencera/scriptrs-models`

You can override either side of that:

- `SCRIPTRS_MODELS_DIR=/path/to/models` forces a local bundle
- `SCRIPTRS_MODELS_REPO=owner/repo` forces a specific Hugging Face model repo layout
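If you prefer to set the override from code rather than from the shell, a process-local environment variable works too. This is a sketch under one assumption: scriptrs reads `SCRIPTRS_MODELS_DIR` when the pipeline is constructed, so the variable must be set before construction. The path used here is a placeholder.

```rust
fn main() {
    // "/path/to/models" is a placeholder; point this at your own bundle.
    // Set the override before constructing any pipeline so scriptrs sees it.
    std::env::set_var("SCRIPTRS_MODELS_DIR", "/path/to/models");

    println!("{}", std::env::var("SCRIPTRS_MODELS_DIR").unwrap());
}
```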

## Local model layout

If you want to use `from_dir(...)` or `SCRIPTRS_MODELS_DIR`, the local bundle should look like this:

```text
models/
  parakeet-v2/
    encoder.mlmodelc/
    decoder.mlmodelc/
    joint-decision.mlmodelc/
    vocab.txt
```

With `vad`, add:

```text
models/
  vad/
    silero-vad.mlmodelc/
```
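A quick way to catch a misassembled bundle before constructing a pipeline is to check for the paths in the tree above. The path names come straight from the documented layout; the helper itself is illustrative and not part of the scriptrs API.

```rust
use std::path::Path;

// Returns true when every entry of the documented parakeet-v2 bundle
// exists under `models_dir`. Purely a local-filesystem sanity check.
fn bundle_looks_complete(models_dir: &Path) -> bool {
    let parakeet = models_dir.join("parakeet-v2");
    [
        "encoder.mlmodelc",
        "decoder.mlmodelc",
        "joint-decision.mlmodelc",
        "vocab.txt",
    ]
    .iter()
    .all(|name| parakeet.join(name).exists())
}

fn main() {
    let ok = bundle_looks_complete(Path::new("models"));
    println!("bundle complete: {}", ok);
}
```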

## Usage

### Short audio

Use the base pipeline when your audio already fits in a single Parakeet chunk.

With the default `online` feature, `from_pretrained()` is the intended path:

```rust
use scriptrs::TranscriptionPipeline;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let audio: Vec<f32> = load_mono_16khz_audio();
    let pipeline = TranscriptionPipeline::from_pretrained()?;
    let result = pipeline.run(&audio)?;

    println!("{}", result.text);
    Ok(())
}

// Stub: supply mono 16 kHz f32 samples from your own audio source.
fn load_mono_16khz_audio() -> Vec<f32> {
    Vec::new()
}
```

If the input is too long for the base pipeline, it returns `AudioTooLong`.

If you want to use a local bundle instead:

```rust
use scriptrs::TranscriptionPipeline;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let audio: Vec<f32> = load_mono_16khz_audio();
    let pipeline = TranscriptionPipeline::from_dir("models")?;
    let result = pipeline.run(&audio)?;

    println!("{}", result.text);
    Ok(())
}

fn load_mono_16khz_audio() -> Vec<f32> {
    Vec::new()
}
```

### Long audio

Enable `long-form` if you want scriptrs to own long-audio chunking internally and you care most about speed on clean, dense speech.

```rust
use scriptrs::LongFormTranscriptionPipeline;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let audio: Vec<f32> = load_mono_16khz_audio();
    let pipeline = LongFormTranscriptionPipeline::from_pretrained()?;
    let result = pipeline.run(&audio)?;

    println!("{}", result.text);
    Ok(())
}

fn load_mono_16khz_audio() -> Vec<f32> {
    Vec::new()
}
```

`LongFormConfig` defaults to the fast overlap-chunking path with 4 workers. Tune the worker count when you want more or less parallelism:

```rust
use scriptrs::{LongFormConfig, LongFormTranscriptionPipeline};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let audio: Vec<f32> = load_mono_16khz_audio();
    let pipeline = LongFormTranscriptionPipeline::from_pretrained()?;
    let config = LongFormConfig {
        worker_count: 2,
        ..LongFormConfig::default()
    };
    let result = pipeline.run_with_config(&audio, &config)?;

    println!("{}", result.text);
    Ok(())
}

fn load_mono_16khz_audio() -> Vec<f32> {
    Vec::new()
}
```

Enable `vad` when you want VAD-backed speech region planning for sparse speech, long silences, or recordings with a lot of non-speech audio:

```rust
use scriptrs::{LongFormConfig, LongFormMode, LongFormTranscriptionPipeline};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let audio: Vec<f32> = load_mono_16khz_audio();
    let pipeline = LongFormTranscriptionPipeline::from_pretrained()?;
    let config = LongFormConfig {
        mode: LongFormMode::Vad,
        ..LongFormConfig::default()
    };
    let result = pipeline.run_with_config(&audio, &config)?;

    println!("{}", result.text);
    Ok(())
}

fn load_mono_16khz_audio() -> Vec<f32> {
    Vec::new()
}
```

## Example

A small WAV example is included:

```shell
cargo run --example transcribe_wav -- --audio /path/to/file.wav --pretrained
cargo run --example transcribe_wav -- --audio /path/to/file.wav --models-dir models
cargo run --example transcribe_wav --features long-form -- --audio /path/to/file.wav --pretrained --long-form
cargo run --example transcribe_wav --features long-form -- --audio /path/to/file.wav --models-dir models --long-form
cargo run --example transcribe_wav --features long-form -- --audio /path/to/file.wav --pretrained --long-form --long-form-workers 2
cargo run --example transcribe_wav --features vad -- --audio /path/to/file.wav --pretrained --long-form --vad-long-form
```

The example expects mono 16 kHz WAV input.
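The `load_mono_16khz_audio` stubs in the snippets above return an empty `Vec`. For readers who want real samples without extra dependencies, here is a minimal stdlib-only sketch that reads 16-bit PCM mono WAV into `Vec<f32>`. It assumes the file is already mono 16 kHz (no resampling or downmixing), and `"clip.wav"` is a placeholder path.

```rust
use std::fs::File;
use std::io::{Error, ErrorKind, Read, Result};

/// Minimal loader for 16-bit PCM mono WAV. A sketch: it does not resample
/// or downmix, so the file must already be mono 16 kHz.
fn load_mono_16khz_audio(path: &str) -> Result<Vec<f32>> {
    let mut bytes = Vec::new();
    File::open(path)?.read_to_end(&mut bytes)?;
    if bytes.len() < 12 || &bytes[0..4] != b"RIFF" || &bytes[8..12] != b"WAVE" {
        return Err(Error::new(ErrorKind::InvalidData, "not a RIFF/WAVE file"));
    }
    // Walk the RIFF chunks until we reach the "data" chunk.
    let mut pos = 12;
    while pos + 8 <= bytes.len() {
        let size = u32::from_le_bytes([
            bytes[pos + 4], bytes[pos + 5], bytes[pos + 6], bytes[pos + 7],
        ]) as usize;
        if &bytes[pos..pos + 4] == b"data" && pos + 8 + size <= bytes.len() {
            // Normalize little-endian i16 samples to [-1.0, 1.0).
            return Ok(bytes[pos + 8..pos + 8 + size]
                .chunks_exact(2)
                .map(|s| i16::from_le_bytes([s[0], s[1]]) as f32 / 32768.0)
                .collect());
        }
        pos += 8 + size + (size & 1); // chunks are 2-byte aligned
    }
    Err(Error::new(ErrorKind::InvalidData, "no data chunk found"))
}

fn main() -> Result<()> {
    // "clip.wav" is a placeholder; pass your own mono 16 kHz WAV file.
    let samples = load_mono_16khz_audio("clip.wav")?;
    println!("loaded {} samples", samples.len());
    Ok(())
}
```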

## Notes

- The public API is still moving
- scriptrs currently targets the exact file layout and model I/O shipped in `avencera/scriptrs-models`; if you swap in a different CoreML Parakeet export, you may need runtime code changes
- Use `long-form` for the fastest path on clean, dense speech
- Add `vad` when you need better robustness on sparse-speech or non-speech-heavy recordings
