This project implements an Orpheus text-to-speech (TTS) system that converts text input into natural-sounding speech. It combines neural networks with the ONNX Runtime for efficient inference.
- Text to speech conversion using neural networks
- ONNX model integration for efficient inference
- Support for 24kHz audio output
- Code generation and audio synthesis pipeline
- C++ and Python implementation
- Python 3.11+
- C++ compiler with C++11 support
- Required Python packages:
  - numpy
  - soundfile
  - onnxruntime
  - opencv-python
  - pillow
  - protobuf
  - scipy
  - sympy
- Clone the repository:

  ```shell
  git clone https://github.com/N-E-W-T-O-N/Orpheus-RKLLM.git
  ```

- Install Python dependencies. Make sure `uv` is already installed on your device:

  ```shell
  uv sync
  ```
- Build the C++ component

  The project needs `onnxruntime` to run ONNX models and the `libsndfile` library to convert waveforms (a list of floats) into audio files:

  ```shell
  sudo apt-get install libsndfile1-dev autoconf automake build-essential libasound2-dev libflac-dev libogg-dev libtool libvorbis-dev libopus-dev libmp3lame-dev libmpg123-dev pkg-config portaudio19-dev libportaudio2 libportaudiocpp0
  ```

  Now simply run the build script:

  ```shell
  bash build.sh -b
  ```

  or compile manually:

  ```shell
  g++ Inference.cpp input.cpp output.cpp -I onnxruntime/include -L onnxruntime/lib -lonnxruntime -lrkllmrt -lpthread -ldl -lm -lsndfile -o llm
  ```
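As an illustration of the waveform-to-audio step, the sketch below writes a list of floats to a 24 kHz WAV file using only Python's standard-library `wave` module. This is a hedged illustration: the project's own pipeline does this in C++ via libsndfile, and the sine-wave samples here are placeholder data, not model output.

```python
import math
import struct
import wave

SAMPLE_RATE = 24000  # the project outputs 24 kHz audio

def write_wav(path, samples, sample_rate=SAMPLE_RATE):
    """Write a list of floats in [-1.0, 1.0] to a 16-bit mono WAV file."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)         # mono
        wf.setsampwidth(2)         # 16-bit PCM
        wf.setframerate(sample_rate)
        # Clamp each float and scale it to the signed 16-bit range.
        pcm = b"".join(
            struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767))
            for s in samples
        )
        wf.writeframes(pcm)

# Example: 0.5 seconds of a 440 Hz tone (placeholder data, not model output).
tone = [0.5 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE)
        for n in range(SAMPLE_RATE // 2)]
write_wav("tone.wav", tone)
```

The same idea applies to the C++ side, where libsndfile's float-write API plays the role of `writeframes`.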
You can obtain the model in two ways:

- **Download from Hugging Face:** use the provided `Download.py` script to download a pre-trained model directly from Hugging Face.
- **Export your own model:** use the `Export.py` script to export and prepare your own model for inference.

Refer to the respective scripts for usage instructions.
Example usage:

```shell
uv run cli.py 1500 2000 "Hey there my name is EDISON, <giggles> and I'm a speech generation model that can sound like a person. I am a badass person"
```
- **NOTE:** Since Hugging Face does not provide a C++ tokenizer, `Input.py` is used to create the input IDs consumed by the RKLLM model.
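The Python-to-C++ handoff can be sketched as below. This is a toy illustration only: the vocabulary and the on-disk format are made up for the example, while the real `Input.py` uses the model's actual Hugging Face tokenizer and its own serialization.

```python
import struct

# Toy vocabulary standing in for the real Hugging Face tokenizer
# (assumption: the actual mapping comes from the model's tokenizer files).
VOCAB = {"<unk>": 0, "hello": 1, "world": 2}

def encode(text):
    """Map whitespace-separated words to integer IDs (toy tokenizer)."""
    return [VOCAB.get(word, VOCAB["<unk>"]) for word in text.lower().split()]

def write_ids(path, ids):
    """Serialize IDs as little-endian int32 so a C++ reader can load them.
    (Hypothetical format -- the real Input.py may use something different.)"""
    with open(path, "wb") as f:
        f.write(struct.pack(f"<{len(ids)}i", *ids))

ids = encode("Hello world foo")
write_ids("input_ids.bin", ids)
print(ids)  # -> [1, 2, 0]
```

On the C++ side, the binary would read the file back as an array of `int32_t` and pass it to the RKLLM runtime.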
Run the C++ binary:

```shell
export LD_LIBRARY_PATH=$(pwd)/onnxruntime/lib:$LD_LIBRARY_PATH # required so the ONNX Runtime shared library can be found at run time
./llm orpheus_3b_0.1_ft_w8a8_RK3588_GGUF_F16.rkllm 1000 2000 "Features of Good Design Before we proceed to the actual patterns, let's discuss the process of designing software architecture: things to aim for and things you'd better avoid. Code reuse Cost and time are two of the most valuable metrics when developing any software product. Less time in development means entering the market earlier than competitors. Lower development costs mean more money is left for marketing and a broader reach to potential customers."
```
To monitor the inference performance of RKLLM on the board, use the command:

```shell
export RKLLM_LOG_LEVEL=1
```
```
Process Finish..
I rkllm: --------------------------------------------------------------------------------------
I rkllm: Stage       Total Time (ms)   Tokens   Time per Token (ms)   Tokens per Second
I rkllm: --------------------------------------------------------------------------------------
I rkllm: Init        189073.94         /        /                     /
I rkllm: Prefill     1523.53           47       32.42                 30.85
I rkllm: Generate    308636.08         999      308.95                3.24
I rkllm: --------------------------------------------------------------------------------------
I rkllm: Memory Usage (GB)
I rkllm: 6.66
I rkllm: --------------------------------------------------------------------------------------
```
This displays the number of tokens processed and the inference time for both the Prefill and Generate stages after each inference, as in the sample output above. This detailed per-stage logging helps you evaluate how long each part of the inference process takes. To view even more detailed logs, such as the tokens produced when encoding the prompt, use:

```shell
export RKLLM_LOG_LEVEL=2
```
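The derived columns in the performance table follow directly from total time and token count; a quick check of the arithmetic from the sample log:

```python
# Generate stage from the sample log: 999 tokens in 308636.08 ms.
total_ms = 308636.08
tokens = 999

ms_per_token = total_ms / tokens                   # time per token
tokens_per_second = tokens / (total_ms / 1000.0)   # throughput

print(round(ms_per_token, 2))       # -> 308.95
print(round(tokens_per_second, 2))  # -> 3.24

# Prefill stage: 47 tokens in 1523.53 ms.
print(round(1523.53 / 47, 2))             # -> 32.42
print(round(47 / (1523.53 / 1000.0), 2))  # -> 30.85
```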
Available voices:

- tara
- leah
- jess
- leo
- dan
- mia
- zac
- zoe