Orpheus Neural Text-to-Speech Engine using RkLLM

Overview

This project implements an Orpheus text-to-speech (TTS) system that converts text input into natural-sounding speech. It uses neural network models together with ONNX Runtime for efficient inference.

Features

Text-to-speech conversion using neural networks
ONNX model integration for efficient inference
Support for 24kHz audio output
Code generation and audio synthesis pipeline
C++ and Python implementation
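
As a rough illustration of the final step of the pipeline, the sketch below writes a list of float samples to a 24 kHz mono WAV file. It uses Python's standard `wave` module rather than the `soundfile` package so that it runs without extra dependencies. The `write_wav` helper, the sine tone standing in for model output, and the output filename are all placeholders, not part of this project.

```python
import math
import struct
import wave

SAMPLE_RATE = 24_000  # 24 kHz, as listed in the features above


def write_wav(path, samples, sample_rate=SAMPLE_RATE):
    """Write mono float samples in [-1.0, 1.0] as 16-bit PCM (hypothetical helper)."""
    # Clip to the valid range, then scale to signed 16-bit integers.
    pcm = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)            # mono
        wav.setsampwidth(2)            # 2 bytes = 16 bits per sample
        wav.setframerate(sample_rate)
        wav.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))
    return len(pcm)


# Placeholder waveform: one second of a 440 Hz tone standing in for model output.
tone = [0.3 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE) for n in range(SAMPLE_RATE)]
frames_written = write_wav("sketch_output.wav", tone)
```

One second of audio at this sample rate is 24,000 frames, which is a quick sanity check on the length of any generated waveform.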

Prerequisites

Python 3.11+
C++ compiler with C++11 support
Required Python packages:
- numpy
- soundfile
- onnxruntime
- opencv-python
- pillow
- protobuf
- scipy
- sympy
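
Before running anything, it can help to confirm that the packages above are importable. The following is a minimal sketch, not a script shipped with this repository. The `missing_packages` helper is hypothetical, and the mapping from package name to import name (for example `opencv-python` to `cv2`) reflects the usual import names of those packages.

```python
import importlib.util

# Package name as installed -> module name to import (these differ for some packages).
REQUIRED = {
    "numpy": "numpy",
    "soundfile": "soundfile",
    "onnxruntime": "onnxruntime",
    "opencv-python": "cv2",
    "pillow": "PIL",
    "protobuf": "google.protobuf",
    "scipy": "scipy",
    "sympy": "sympy",
}


def missing_packages(required=REQUIRED):
    """Return the package names whose modules cannot be located."""
    missing = []
    for package, module in required.items():
        try:
            found = importlib.util.find_spec(module) is not None
        except (ImportError, ValueError):
            # find_spec raises if a parent package (e.g. "google") is absent.
            found = False
        if not found:
            missing.append(package)
    return missing


if __name__ == "__main__":
    print("Missing packages:", missing_packages() or "none")
```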

Installation

Clone the repository

git clone https://github.com/N-E-W-T-O-N/Orpheus-RKLLM.git

Install Python dependencies:

Make sure uv is already installed on your device.
```
uv sync
```

Build the C++ component

The project needs onnxruntime to run ONNX models and the libsndfile library to convert waveforms (a list of floats) into audio files.

sudo apt-get install libsndfile1-dev libasound-dev autoconf automake build-essential libasound2-dev libflac-dev libogg-dev libtool libvorbis-dev libopus-dev libmp3lame-dev libmpg123-dev pkg-config portaudio19-dev libportaudio2 libportaudiocpp0

Now simply run the following script:

bash build.sh -b

or

g++ Inference.cpp -lrkllmrt input.cpp output.cpp -I onnxruntime/include -L onnxruntime/lib -L onnxruntime/lib/libonnxruntime* -lpthread -ldl -lm -lsndfile -o llm

Usage

Model

You can obtain the model in two ways:

Download from Hugging Face:
Use the provided Download.py script to download a pre-trained model directly from Hugging Face.
Export your own model:
Use the Export.py script to export and prepare your own model for inference.

Refer to the respective scripts for usage instructions.

Python Interface

uv run cli.py 1500 2000 "Hey there my name is EDISON, <giggles> and I'm a speech generation model that can sound like a person.I Am a badass person"

Cpp Interface

NOTE: Hugging Face does not provide a C++ tokenizer, so I use Input.py to create the input_ids that the RKLLM model consumes.

export LD_LIBRARY_PATH=$(pwd)/onnxruntime/lib:$LD_LIBRARY_PATH  # Required so the ONNX Runtime shared library is found at run time

./llm orpheus_3b_0.1_ft_w8a8_RK3588_GGUF_F16.rkllm 1000 2000 "Features of Good Design Before we proceed to the actual patterns, let’s discuss the process of designing software architecture: things to aim for and things you’d better avoid.Code reuse Cost and time are two of the most valuable metrics when developing any software product. Less time in development means entering the market earlier than competitors. Lower development costs mean more money is left for marketing and a broader reach to potential customers."
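
If you prefer to launch the binary from Python, the sketch below shows one way to set `LD_LIBRARY_PATH` and build the argument list without going through a shell. This is an illustration only: `build_env`, `build_command` and `run_llm` are hypothetical helpers, and the two numeric arguments are passed through unchanged because this README does not describe their meaning.

```python
import os
import subprocess


def build_env(lib_dir, base_env=None):
    """Return a copy of the environment with lib_dir prepended to LD_LIBRARY_PATH."""
    env = dict(os.environ if base_env is None else base_env)
    lib_dir = os.path.abspath(lib_dir)
    existing = env.get("LD_LIBRARY_PATH", "")
    env["LD_LIBRARY_PATH"] = f"{lib_dir}:{existing}" if existing else lib_dir
    return env


def build_command(model_path, first_arg, second_arg, text, binary="./llm"):
    # Passing a list (no shell) keeps quotes and symbols in `text` intact.
    return [binary, model_path, str(first_arg), str(second_arg), text]


def run_llm(model_path, first_arg, second_arg, text):
    """Run the compiled binary; raises CalledProcessError on a non-zero exit."""
    cmd = build_command(model_path, first_arg, second_arg, text)
    return subprocess.run(cmd, env=build_env("onnxruntime/lib"), check=True)
```

Passing the text as a single list element avoids shell quoting problems with long prompts that contain apostrophes or angle-bracket tags.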

Monitoring

To monitor RKLLM inference performance on the board, set the log level before running the model:

export RKLLM_LOG_LEVEL=1

Process Finish..
I rkllm: --------------------------------------------------------------------------------------
I rkllm:  Stage         Total Time (ms)  Tokens    Time per Token (ms)      Tokens per Second      
I rkllm: --------------------------------------------------------------------------------------
I rkllm:  Init          189073.94        /         /                        /                      
I rkllm:  Prefill       1523.53          47        32.42                    30.85                  
I rkllm:  Generate      308636.08        999       308.95                   3.24                   
I rkllm: --------------------------------------------------------------------------------------
I rkllm:  Memory Usage (GB)
I rkllm:  6.66        
I rkllm: --------------------------------------------------------------------------------------

This displays the number of tokens processed and the inference time for the Prefill and Generate stages after each inference, as shown in the log above, so you can see how long each stage takes. To view more detailed logs, such as the tokens produced by encoding the prompt, use the following command:

export RKLLM_LOG_LEVEL=2
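
The summary table printed at log level 1 can be turned into numbers for comparison across runs. The sketch below assumes the row layout shown in the sample log above (stage, total time, tokens, time per token, tokens per second). The `parse_rkllm_stats` helper is hypothetical and not part of RKLLM.

```python
import re

# Matches rows such as: "I rkllm:  Prefill  1523.53  47  32.42  30.85"
ROW = re.compile(
    r"^I rkllm:\s+(?P<stage>Prefill|Generate)\s+(?P<total_ms>[\d.]+)\s+"
    r"(?P<tokens>\d+)\s+(?P<ms_per_token>[\d.]+)\s+(?P<tokens_per_s>[\d.]+)\s*$"
)


def parse_rkllm_stats(log_text):
    """Extract per-stage timing from an RKLLM level-1 log (Init row is skipped)."""
    stats = {}
    for line in log_text.splitlines():
        match = ROW.match(line.strip())
        if match:
            stats[match["stage"]] = {
                "total_ms": float(match["total_ms"]),
                "tokens": int(match["tokens"]),
                "ms_per_token": float(match["ms_per_token"]),
                "tokens_per_s": float(match["tokens_per_s"]),
            }
    return stats


sample = """\
I rkllm:  Init          189073.94        /         /                        /
I rkllm:  Prefill       1523.53          47        32.42                    30.85
I rkllm:  Generate      308636.08        999       308.95                   3.24
"""
stats = parse_rkllm_stats(sample)
```

The columns are consistent with each other: total time divided by token count gives the time per token (1523.53 / 47 ≈ 32.42 ms for Prefill), and 1000 divided by that value gives tokens per second.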

Voice

"tara",
"leah",
"jess",
"leo",
"dan",
"mia",
"zac",
"zoe"

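A small helper can catch typos in a voice name before a long inference run. This is a sketch only: `normalize_voice` is hypothetical, and this README does not state how a voice is selected, so the helper only validates a name against the list above.

```python
VOICES = ("tara", "leah", "jess", "leo", "dan", "mia", "zac", "zoe")


def normalize_voice(name):
    """Return the lower-cased voice name, or raise ValueError if it is unknown."""
    voice = name.strip().lower()
    if voice not in VOICES:
        raise ValueError(f"Unknown voice {name!r}; expected one of {', '.join(VOICES)}")
    return voice
```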