This project is a command-line tool for converting speech from audio files into text using the Vosk speech recognition engine. It processes large audio files in efficient chunks, converting raw audio data from u8 to i16 slices and leveraging the Vosk library to perform the speech-to-text conversion. The recognized text is then saved to an output text file.
- Programming Language: Rust
- Speech Recognition: Vosk Speech Recognition API (via the vosk crate)
- Core Dependencies:
  - Rust Standard Library: std::error::Error, std::fs::File, std::io::{Read, Write}, std::path::Path, std::slice, and std::time::Instant
  - Vosk Model: Requires a pre-downloaded speech recognition model from Vosk Models
- Rust & Cargo (latest stable version recommended)
- Vosk Model Data (download a model suitable for your language)
- The vosk Rust crate (add this to your Cargo.toml):
  [dependencies]
  vosk = "x.y.z" # Replace x.y.z with the appropriate version
- SpeechRecognizer Struct: Contains methods for initializing the recognizer, processing audio files, and converting byte slices.
  - new(model_path: &str, sample_rate: f32): Constructor to initialize the recognizer with the path to the Vosk model and the audio sample rate.
  - recognize_audio_file(audio_file_path: &str): Reads the audio file in predefined chunks, converts each chunk for processing, and uses the Vosk API to recognize speech. The final text output is written to a file with a derived name.
  - as_i16_slice(buffer: &[u8]): Converts a slice of bytes to a slice of 16-bit integers. This is used to prepare audio data for the recognition process.
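The byte-to-sample conversion described for as_i16_slice can be illustrated with a small sketch. This is not the project's implementation: the helper name bytes_to_i16_samples is hypothetical, it copies into a new Vec instead of returning a borrowed slice, and it assumes the input is little-endian 16-bit PCM (the usual layout of WAV sample data).

```rust
/// Hypothetical helper (not the project's `as_i16_slice`): decode
/// little-endian 16-bit PCM bytes into samples.
/// A trailing odd byte, if any, is ignored.
fn bytes_to_i16_samples(buffer: &[u8]) -> Vec<i16> {
    buffer
        .chunks_exact(2) // two bytes per 16-bit sample
        .map(|pair| i16::from_le_bytes([pair[0], pair[1]]))
        .collect()
}

fn main() {
    let bytes = [0x01, 0x00, 0xFF, 0xFF, 0x00, 0x80];
    let samples = bytes_to_i16_samples(&bytes);
    println!("{:?}", samples); // prints [1, -1, -32768]
}
```

Copying avoids the alignment and byte-order concerns that come with reinterpreting a byte buffer in place, at the cost of one extra allocation per chunk.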
Follow these steps to set up and run the project:
- Clone the Repository
  git clone https://github.com/dsddevs/speech-to-text-rust.git
  cd speech-to-text-rust
- Install Rust and Cargo
  If you haven't installed Rust already, follow the instructions on the official Rust website.
- Download the Vosk Model
  - Visit the Vosk Models page.
  - Download a model that suits your language requirements.
  - Extract the model to a directory on your machine.
- Configure the Environment
  Set the environment variable for the model path or update the application configuration to point to your model:
  export VOSK_MODEL_PATH="/path/to/your/model"
- Build the Project
  Use Cargo to build the project in release mode:
  cargo build --release
- Run the Application
  Execute the application by providing an audio file as an argument:
  cargo run --release -- /path/to/audio.wav
  The tool will process the provided audio file and create a corresponding text file containing the recognized speech.
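For orientation, here is one way a program could pick up the two inputs used in the steps above: the model path from VOSK_MODEL_PATH and the audio path from the first command-line argument. This is a sketch under assumptions. The helper names audio_path_from_args and model_path_from are hypothetical, and the project may resolve its configuration differently.

```rust
use std::env;

/// Hypothetical helper: take the audio path from the first CLI argument.
fn audio_path_from_args<I: IntoIterator<Item = String>>(args: I) -> Result<String, String> {
    args.into_iter()
        .nth(1) // index 0 is the program name
        .ok_or_else(|| "usage: <program> /path/to/audio.wav".to_string())
}

/// Hypothetical helper: accept the environment value only when it is set and non-empty.
fn model_path_from(value: Option<String>) -> Result<String, String> {
    match value {
        Some(path) if !path.trim().is_empty() => Ok(path),
        _ => Err("VOSK_MODEL_PATH is not set".to_string()),
    }
}

fn main() {
    let model = model_path_from(env::var("VOSK_MODEL_PATH").ok());
    let audio = audio_path_from_args(env::args());
    match (model, audio) {
        (Ok(m), Ok(a)) => println!("model: {}, audio: {}", m, a),
        (Err(e), _) | (_, Err(e)) => eprintln!("error: {}", e),
    }
}
```

Keeping the helpers free of direct calls to env::var and env::args makes them easy to exercise with fixed inputs.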
- File Processing: The audio file is read in chunks (using a chunk size defined in the code), ensuring efficient memory usage. Each chunk is converted from u8 data into i16 slices for feeding into the recognizer.
- Speech Recognition: The Vosk recognizer handles the data in streaming mode, processing each chunk and collecting partial results. Once the processing is complete, it finalizes the recognition to produce the complete text output.
- Output Generation: The final recognized text is saved to a text file whose name is derived from the original audio filename.
- Logging and Performance: The program logs processing progress and computes the total time taken for the conversion, providing insights into the performance of the tool.
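The flow above can be sketched with the standard library alone. Several things here are assumptions and not taken from the project: the CHUNK_SIZE value, the helper names process_in_chunks and output_path_for, the little-endian 16-bit PCM layout, and the rule that the output name is the audio name with a .txt extension. The feed closure stands in for the Vosk recognizer, which is left out so the example stays self-contained.

```rust
use std::io::{self, Read};
use std::path::{Path, PathBuf};
use std::time::Instant;

const CHUNK_SIZE: usize = 4096; // assumed value; the real chunk size is defined in the project code

/// Read `reader` in fixed-size chunks, decode little-endian i16 samples, and
/// pass each decoded chunk to `feed` (a stand-in for the recognizer).
/// Returns the total number of samples passed on.
fn process_in_chunks<R: Read, F: FnMut(&[i16])>(mut reader: R, mut feed: F) -> io::Result<usize> {
    let mut buffer = [0u8; CHUNK_SIZE];
    let mut carry: Option<u8> = None; // odd byte left over from the previous read
    let mut total = 0;
    loop {
        let n = reader.read(&mut buffer)?;
        if n == 0 {
            break; // end of input
        }
        let mut bytes = Vec::with_capacity(n + 1);
        bytes.extend(carry.take());
        bytes.extend_from_slice(&buffer[..n]);
        if bytes.len() % 2 == 1 {
            carry = bytes.pop(); // keep the half sample for the next round
        }
        let samples: Vec<i16> = bytes
            .chunks_exact(2)
            .map(|p| i16::from_le_bytes([p[0], p[1]]))
            .collect();
        total += samples.len();
        feed(&samples);
    }
    Ok(total)
}

/// Assumed naming rule: same path as the audio file, with a `.txt` extension.
fn output_path_for(audio: &Path) -> PathBuf {
    audio.with_extension("txt")
}

fn main() -> io::Result<()> {
    let started = Instant::now();
    let pcm: &[u8] = &[0x01, 0x00, 0x02, 0x00, 0x03, 0x00]; // in-memory stand-in for a file
    let mut chunks = 0;
    let total = process_in_chunks(pcm, |samples| {
        chunks += 1;
        let _ = samples; // a real recognizer would consume the samples here
    })?;
    println!("{} samples in {} chunk(s), {:?} elapsed", total, chunks, started.elapsed());
    println!("{}", output_path_for(Path::new("audio.wav")).display());
    Ok(())
}
```

Carrying a leftover byte between reads matters because a read may return an odd number of bytes, which would otherwise split a 16-bit sample across two chunks.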
- The application uses Rust's robust error handling (Result and Box<dyn Error>) to capture I/O errors, decoding issues, and other potential runtime problems.
- In case of errors during file read/write operations or recognition processing, appropriate error messages will be displayed.
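As an illustration of the Result and Box<dyn Error> pattern mentioned above, the following sketch propagates failures with the ? operator and reports them in main. The helper read_audio_bytes and its "empty file" check are hypothetical and only show how different error sources can share one return type.

```rust
use std::error::Error;
use std::fs::File;
use std::io::Read;

/// Hypothetical helper: read a whole file, propagating any failure with `?`.
fn read_audio_bytes(path: &str) -> Result<Vec<u8>, Box<dyn Error>> {
    let mut file = File::open(path)?; // io::Error is converted into Box<dyn Error>
    let mut bytes = Vec::new();
    file.read_to_end(&mut bytes)?;
    if bytes.is_empty() {
        // A plain string message also converts into Box<dyn Error>.
        return Err("audio file is empty".into());
    }
    Ok(bytes)
}

fn main() {
    match read_audio_bytes("/path/to/audio.wav") {
        Ok(bytes) => println!("read {} bytes", bytes.len()),
        Err(e) => eprintln!("error: {}", e),
    }
}
```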
- Improving error messages and adding more comprehensive error handling.
- Supporting real-time or streaming audio input.
- Extending functionality to handle various audio formats more gracefully.
- Adding configuration options for advanced tuning of the Vosk recognizer parameters.
Contributions are welcome! Please fork the repository, create a new branch for your changes, and submit a pull request. For major changes, please open an issue first to discuss what you would like to change.
This project is licensed under the Apache License 2.0.