Unleash your digital characters with the magic of AI-driven voice, animation, and personality!
Persona Engine is your all-in-one toolkit for creating captivating, interactive avatars! It masterfully combines:
🎨 Live2D: For expressive, real-time character animation.
🧠 Large Language Models (LLMs): Giving your character a unique voice and personality.
🎤 Automatic Speech Recognition (ASR): To understand voice commands and conversation.
🗣️ Text-to-Speech (TTS): Enabling your character to speak naturally.
🎭 Real-time Voice Cloning (RVC - Optional): To mimic specific voice characteristics.
Perfectly suited for VTubing 🎬, dynamic streaming 🎮, and innovative virtual assistant applications 🤖.
Let's bring your character to life like never before! ✨

Witness the Persona Engine creating digital magic:

(Click the image above to watch the demo!)
And here's another little glimpse into what the friendly engine can do:
output.webm
- 🌸 Overview: What's Inside?
- 🚀 Getting Started: Installation Guide
- ✨ Features Galore!
- ⚙️ Architecture / How it Works
- 💡 Potential Use Cases
- 🤝 Contributing
- 🎭 Live2D Integration Guide
- 💬 Join Our Community!
- ❓ Support & Contact
Persona Engine listens to your voice 🎤, thinks using powerful AI language models 🧠 (guided by a personality you define!), speaks back with a synthesized voice 🔊 (which can optionally be cloned using RVC!), and animates a Live2D avatar 🎭 accordingly.
It integrates seamlessly into streaming software like OBS Studio using Spout for high-quality visual output. The included "Aria" model is specially rigged for optimal performance, but you can integrate your own Live2D models (see the Live2D Integration Guide).
Important
Persona Engine achieves the most natural and in-character interactions when used with a specially fine-tuned Large Language Model (LLM). This model is trained to understand the engine's specific communication format.
While you can use standard OpenAI-compatible models (e.g., from Ollama, Groq, OpenAI), doing so requires careful prompt engineering within the `personality.txt` file. We provide a template (`personality_example.txt`) in the repository to guide you. Detailed instructions for configuring `personality.txt` for standard models are crucial and can be found in the Installation Guide.
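As a loose illustration only — the real template is `personality_example.txt` in the repository, which should be your starting point — a `personality.txt` for a standard model typically needs to spell out the character and the expected output format explicitly:

```text
You are Aria, a cheerful virtual streamer.
- Stay in character at all times and keep replies short and conversational.
- You may prefix sentences with an emotion tag such as [EMOTION:happy]
  to drive the avatar's expressions.
- Never mention that you are a language model.
```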
👉 Want to try the fine-tuned model or see a live demo? Hop into our Discord! 😊
➡️ Please follow the detailed Installation and Setup Guide to install prerequisites, download models, configure, and run the engine. ⬅️
Key Requirements Covered:
- System: An NVIDIA GPU with CUDA support is required for core features (ASR, TTS, RVC).
- Software: .NET Runtime, espeak-ng.
- AI Models: Downloading Whisper ASR models.
- Live2D: Setting up your Live2D model (using Aria or your own).
- (Optional) RVC: Real-time Voice Cloning setup.
- LLM: Configuring access to your chosen LLM (API keys, endpoints).
- Streaming: Setting up Spout output for OBS/other software.
- Configuration: Understanding `appsettings.json`.
- Troubleshooting: Common issues and solutions.
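For orientation only — the authoritative schema is documented in the Installation Guide, and the key names below are illustrative rather than guaranteed to match the shipped file — the LLM portion of `appsettings.json` might look something like:

```json
{
  "Llm": {
    "Endpoint": "http://localhost:11434/v1",
    "ApiKey": "sk-...",
    "Model": "your-model-name"
  }
}
```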
- 🎭 Live2D Avatar Integration:
  - Loads and renders Live2D models (`.model3.json`).
  - Includes the specially rigged "Aria" model.
  - Supports emotion-driven animations (`[EMOTION:name]`) and VBridger-standard lip-sync parameters.
  - Dedicated services for Emotion, Idle, and Blinking animations.
  - See the detailed Live2D Integration & Rigging Guide for custom model requirements!
- 🧠 AI-Driven Conversation:
  - Connects to OpenAI-compatible Large Language Model (LLM) APIs (local or cloud).
  - Guided by your custom `personality.txt` file.
  - Features improved conversation context and session management for more robust interactions.
  - Optimized for the optional special fine-tuned model (see Overview).
- 🗣️ Voice Interaction (Requires NVIDIA GPU):
  - Listens via microphone (using `NAudio`/`PortAudio`).
  - Detects speech segments using Silero VAD.
  - Understands speech using Whisper ASR (via `Whisper.NET`).
  - Includes dedicated barge-in detection to handle user interruptions gracefully.
  - Uses a small, fast Whisper model for interruption detection and a larger, more accurate model for transcription.
- 🔊 Advanced Text-to-Speech (TTS) (Requires NVIDIA GPU):
  - Sophisticated pipeline: Text Normalization -> Sentence Segmentation -> Phonemization -> ONNX Synthesis.
  - Brings text to life using custom `kokoro` voice models.
  - Uses `espeak-ng` as a fallback for unknown words/symbols.
- 👤 Optional Real-time Voice Cloning (RVC) (Requires NVIDIA GPU):
  - Integrates RVC ONNX models.
  - Modifies the TTS voice output in real-time to sound like a specific target voice.
  - Can be disabled for performance.
- 📜 Customizable Subtitles:
  - Displays spoken text with configurable styling options via the UI.
- 💬 Control UI & Chat Viewer:
  - Dedicated UI window for monitoring engine status.
  - View latency metrics (LLM, TTS, Audio).
  - Live adjustment of TTS parameters (pitch, rate) and Roulette Wheel settings.
  - View and edit the conversation history.
- 👀 Screen Awareness (Experimental):
  - Optional Vision module allows the AI to "see" and read text from specified application windows.
- 🎡 Interactive Roulette Wheel (Experimental):
  - An optional, configurable on-screen roulette wheel for interactive fun.
- 📺 Streaming Output (Spout):
  - Sends visuals (Avatar, Subtitles, Roulette Wheel) directly to OBS Studio or other Spout-compatible software.
  - Uses separate, configurable Spout streams (no window capture needed!).
- 🎶 Audio Output:
  - Plays generated speech clearly via `PortAudio`.
- ⚙️ Configuration:
  - Primary setup via `appsettings.json` (details in Installation Guide).
  - Real-time adjustments for some settings via the Control UI.
- 🤬 Profanity Filtering:
  - Basic keyword list + optional Machine Learning (ML)-based filtering for LLM responses.
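Several of the features above revolve around the `[EMOTION:name]` tags embedded in LLM responses. As a minimal sketch — in Python for illustration only, since the engine itself is a .NET application and this helper is not part of its actual API — separating those tags from the text sent to TTS might look like:

```python
import re

# Hypothetical helper: splits an LLM reply into the text to speak
# and the emotion cues that would drive Live2D expressions.
EMOTION_TAG = re.compile(r"\[EMOTION:([^\]]+)\]")

def split_reply(reply: str):
    emotions = EMOTION_TAG.findall(reply)        # e.g. ["😊"]
    speech = EMOTION_TAG.sub("", reply).strip()  # tag-free text for TTS
    speech = re.sub(r"\s{2,}", " ", speech)      # collapse leftover gaps
    return speech, emotions

speech, emotions = split_reply("[EMOTION:😊] Hi chat! [EMOTION:wave] Ready to play?")
print(speech)    # Hi chat! Ready to play?
print(emotions)  # ['😊', 'wave']
```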
Persona Engine operates in a continuous loop, bringing your character to life through these steps:
1. Listen: 🎤
    - A microphone captures audio.
    - A Voice Activity Detector (VAD) identifies speech segments.
2. Understand: 👂
    - A fast Whisper model detects potential user interruptions.
    - Once speech ends, a more accurate Whisper model transcribes the full utterance.
3. Contextualize (Optional): 👀
    - If enabled, the Vision module captures text content from specified application windows.
4. Think: 🧠
    - Transcribed text, conversation history, optional screen context, and rules from `personality.txt` are sent to the configured Large Language Model (LLM).
5. Respond: 💬
    - The LLM generates a text response.
    - This response may include emotion tags (e.g., `[EMOTION:😊]`) or commands.
6. Filter (Optional): 🤬
    - The response is checked against profanity filters.
7. Speak: 🔊
    - The Text-to-Speech (TTS) system converts the filtered text into audio.
    - It primarily uses a `kokoro` voice model and falls back to `espeak-ng` for unknown elements.
8. Clone (Optional): 👤
    - If Real-time Voice Cloning (RVC) is enabled, it modifies the TTS audio in real-time, using an ONNX model to match the target voice profile.
9. Animate: 🎭
    - Phonemes extracted during TTS drive lip-sync parameters (VBridger standard).
    - Emotion tags in the LLM response trigger corresponding Live2D expressions or motions.
    - Idle animations play when the character is not speaking to maintain a natural look.
    - (See the Live2D Integration & Rigging Guide for details!)
10. Display:
    - 📜 Subtitles are generated from the spoken text.
    - 📺 The animated avatar, subtitles, and optional Roulette Wheel are sent via dedicated Spout streams to OBS or other software.
    - 🎶 The synthesized (and optionally cloned) audio is played through the selected output device.
11. Loop:
    - The engine returns to the listening state, ready for the next interaction.
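The loop above can be sketched in a few lines. This is a deliberately simplified illustration — the real engine is a .NET application, and every function name here is a stand-in for a whole subsystem, not its actual API:

```python
# Stand-in functions for each subsystem of the loop.
def listen():            return "hello engine"                    # mic + VAD + Whisper ASR
def think(text, hist):   return f"[EMOTION:😊] You said: {text}"  # LLM call
def filter_profanity(t): return t                                 # keyword/ML filter
def speak(text):         return f"<audio:{text}>"                 # TTS (+ optional RVC)

def run_once(history):
    heard = listen()                 # steps 1-2: Listen / Understand
    reply = think(heard, history)    # step 4: Think
    reply = filter_profanity(reply)  # step 6: Filter
    audio = speak(reply)             # steps 7-8: Speak / Clone
    history.append((heard, reply))   # conversation context for the next turn
    return audio                     # steps 9-10 would animate and display this

history = []
print(run_once(history))  # <audio:[EMOTION:😊] You said: hello engine>
```

A real implementation would run this continuously (step 11), with barge-in detection able to cut a turn short.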
- 🎬 VTubing & Live Streaming: Create an AI co-host, an interactive character responding to chat, or even a fully AI-driven VTuber persona.
- 🤖 Virtual Assistant: Build a personalized, animated desktop companion that talks back.
- 🏪 Interactive Kiosks: Develop engaging virtual guides for museums, trade shows, retail environments, or information points.
- 🎓 Educational Tools: Design an AI language practice partner, an interactive historical figure Q&A bot, or a dynamic tutor.
- 🎮 Gaming: Implement more dynamic and conversational Non-Player Characters (NPCs) or companion characters in games.
- 💬 Character Chatbots: Allow users to have immersive conversations with their favorite fictional characters brought to life.
Contributions are highly welcome! If you have improvements, bug fixes, or new features in mind, please follow these steps:
- Discuss (Optional but Recommended): For major changes, please open a GitHub Issue first to discuss your ideas.
- Fork: Fork the repository to your own GitHub account.
- Branch: Create a new feature branch for your changes (`git checkout -b feature/YourAmazingFeature`).
- Code: Make your changes. Please try to adhere to the existing coding style and add comments where necessary.
- Commit: Commit your changes with clear messages (`git commit -m 'Add some AmazingFeature'`).
- Push: Push your branch to your fork (`git push origin feature/YourAmazingFeature`).
- Pull Request: Open a Pull Request (PR) back to the `main` branch of the original repository. Describe your changes clearly in the PR.
Your help in making Persona Engine better is greatly appreciated! 😊
Need help getting started? Have questions or brilliant ideas? 💡 Want to see a live demo, test the special fine-tuned model, or chat directly with a Persona Engine character? Having trouble converting RVC models or rigging your own Live2D model? Come say hi on Discord! 👋

You can also report bugs or request features via GitHub Issues.
- Primary Support & Community: Please join our Discord Server for help, discussion, and demos.
- Bug Reports & Feature Requests: Please use GitHub Issues.
- Direct Contact: You can also reach out via Twitter/X.
Tip
Remember to consult the Live2D Integration & Rigging Guide for details on preparing custom avatars. For detailed setup steps, please refer to the Installation and Setup Guide.