Minimalistic Unity package that wires OpenAI GPT Realtime and ElevenLabs Voice Agents into a reusable voice-agent prefab. It is mostly intended for prototyping (I am building this for a university lecture to help students get started). The goal is a simple, open-source starting point installable through the Unity Package Manager via Git.
- Basic realtime voice AI integration for conversational agents via WebSockets
- Tool calls: annotate C# methods to register them as tools with the LLM
- Event annotations that broadcast in-game happenings back to the model as structured user messages
- Sample prefabs (sphere + educational cube) that showcase audio playback, tooling, and event-driven guidance for students
This is not intended to be a full library, but rather a minimalistic starting point for WebSocket voice AI integration. I will not really maintain this beyond the scope required for my students. Feel free to make feature suggestions or report bugs - just no promise I will resolve the issue.
If you are looking for a complete library for OpenAI, check out this well-maintained one: https://github.com/RageAgainstThePixel/com.openai.unity
For ElevenLabs voice generation (but not agents), rather look at: https://github.com/RageAgainstThePixel/com.rest.elevenlabs
If someone wants to maintain this, feel free to fork it and I will link to your repo.
- ✅ Repository scaffolding and planning documents
- ✅ Realtime OpenAI voice loop with large streaming buffer & server-driven interruption handling
- ✅ Attribute-based function calling: annotate any `MonoBehaviour` method with `[RealtimeTool]` to expose it as an OpenAI tool
- ✅ Sample prefab (`SarcasticSphereAgent`) demonstrating realtime playback, audio-reactive scaling, and tool-controlled movement
- ✅ ElevenLabs realtime controller (streaming audio playback + tool-call bridge)
- 🔄 Additional debugging tooling (see `plan.md` for the roadmap)
- Clone this repository and open the root Unity project (tested with `6000.2.9f1`).
- The package lives under `Packages/com.dfin.voiceagent`. The project manifest references it via a local path for rapid iteration.
- Install supporting dependencies via the Unity Package Manager:
  - `com.unity.nuget.newtonsoft-json` (official Json.NET fork, IL2CPP compatible).
  - `https://github.com/endel/NativeWebSocket.git#upm` (WebSocket layer that works on desktop, Android, iOS, Quest).
- Open `Voice Agent → Settings` to create `Assets/VoiceAgent/Resources/VoiceAgentSettings.asset`, enter development API keys, and adjust options. The OpenAI section stores realtime model defaults and VAD settings; the ElevenLabs section stores the `xi-api-key`, agent id, optional voice override, and expected output sample rate.
- Drop `OpenAiRealtimeController` on a GameObject (the required mic/audio components are added automatically). On play, the controller will create a fallback `AudioListener` if your scene does not already have one, then stream mic input and play back the model's audio responses in real time. If you need to stop playback, call `CancelActiveResponses()` manually.
- The built-in streaming queue holds roughly 30 minutes of PCM audio by default; adjust `StreamingAudioPlayer.MaxBufferedSeconds` if you want a different memory/latency trade-off (see the sketch after this list).
- The inspector exposes `Request Initial Response On Connect`; leave it checked if you want an automatic greeting (`response.create`) right after `session.update`, or disable it for a silent start.
- For an ElevenLabs-only prototype, add `ElevenLabsRealtimeController` alongside `MicrophoneCapture` and `StreamingAudioPlayer`. Set `Connect On Start`, paste a valid API key + agent id into the settings asset, and optionally enable `Log Events` to see transcripts/VAD scores in the console.
- Try the sample prefabs under `Assets/VoiceAgent/Prefabs/`: `SarcasticSphereAgent.prefab` wires in the realtime controller, an audio-reactive scaler, and a tool that lets the model move the sphere along the X-axis (clamped to `[-1, 1]`). `EducationalCubeAgent.prefab` swaps the sphere for a cube with three clickable mini-cubes. Each click raises a `[RealtimeEvent]`, interrupts playback, and lets the agent guide students through the right sequence while exposing a reset tool. Both prefabs ship with the initial-response toggle enabled so they greet you as soon as Play mode starts.
- Read `DEVELOPMENT.md` for coding standards and contribution workflow as they evolve.
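If you prefer wiring things up from code instead of the inspector, the sketch below mirrors the steps above. It is a minimal sketch, not part of the package: the namespace is borrowed from the tool-calling example further down, and `MaxBufferedSeconds` is assumed to be a writable instance property on `StreamingAudioPlayer` (check the package source if it differs).

```csharp
using DFIN.VoiceAgent.OpenAI; // namespace assumed from the tool-calling example below
using UnityEngine;

public class VoiceAgentBootstrap : MonoBehaviour
{
    private void Awake()
    {
        // The controller pulls in its required mic/audio components automatically.
        gameObject.AddComponent<OpenAiRealtimeController>();

        // Optional: tune the memory/latency trade-off. The default streaming
        // queue buffers roughly 30 minutes of PCM audio.
        var player = GetComponent<StreamingAudioPlayer>();
        if (player != null)
        {
            player.MaxBufferedSeconds = 300f; // buffer at most ~5 minutes
        }
    }
}
```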
Expose Unity methods to the OpenAI Realtime model with one attribute:
```csharp
using DFIN.VoiceAgent.OpenAI;
using UnityEngine;

public class MoveCube : MonoBehaviour
{
    [RealtimeTool("Moves the cube to an absolute X coordinate between -1 and 1.")]
    public void SetCubeX(
        [RealtimeToolParam("Absolute world X position (-1 .. 1).")] float x)
    {
        transform.position = new Vector3(Mathf.Clamp(x, -1f, 1f), transform.position.y, transform.position.z);
    }
}
```

- `[RealtimeTool]` marks the method; the optional `name` argument overrides the function name exposed to the model.
- `[RealtimeToolParam]` documents each argument and controls whether it is required (defaults to `true`).
- Supported parameter types: strings, booleans, numeric types, and enums. Optional parameters must be nullable or supply a default value (see the sketch below).
- `OpenAiRealtimeController` discovers tools automatically at runtime, advertises them in `session.update`, and invokes them when the model issues a `function_call`. Return values are serialized and streamed back; void methods send a default “Tool call handled.” message.
- Check `SphereMovementTool` for a concrete example included in the package.
- For more on the payload format and capabilities, see the official OpenAI Function Calling guide.
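As a second, hedged sketch (not shipped with the package): the tool below exercises the parameter rules above with an enum and an optional argument, and shows one plausible shape for the `[RealtimeEvent]` annotation used by the educational cube prefab. The `required:` argument name and the exact `[RealtimeEvent]` signature are assumptions; check the package attributes if they differ.

```csharp
using DFIN.VoiceAgent.OpenAI;
using UnityEngine;

public enum NudgeDirection { Left, Right }

public class CubeNudgeTool : MonoBehaviour
{
    // Enums and optional parameters are supported; optional parameters must be
    // nullable or carry a default value (here, step defaults to 0.25 units).
    [RealtimeTool("Nudges the cube left or right by an optional step size.")]
    public string Nudge(
        [RealtimeToolParam("Direction to move the cube.")] NudgeDirection direction,
        [RealtimeToolParam("Step size in world units.", required: false)] float step = 0.25f)
    {
        float dx = direction == NudgeDirection.Left ? -step : step;
        transform.position += new Vector3(dx, 0f, 0f);
        // Non-void return values are serialized and streamed back to the model.
        return $"Cube is now at x = {transform.position.x:F2}.";
    }

    // Assumption: [RealtimeEvent] marks a method whose invocation is broadcast
    // to the model as a structured user message (see EducationalCubeAgent).
    [RealtimeEvent("The player clicked a mini-cube.")]
    public void OnMiniCubeClicked(int cubeIndex)
    {
        Debug.Log($"Mini-cube {cubeIndex} clicked; relayed to the agent.");
    }
}
```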
- Unity Package Manager → `Add package from git URL…`
- Use `https://github.com/dfin/unity-voice-agent.git?path=/Packages/com.dfin.voiceagent` (UPM uses a `?path=` query parameter to target a package subfolder).
- `plan.md` – phased technical roadmap (kept up to date during development).
- `DEVELOPMENT.md` – contributor setup, coding patterns (stub for now).
- Future student-facing tutorials and sample explanations will live under `Packages/com.dfin.voiceagent/Documentation~/`.
- Extra API notes live under `docs/`; keep them in sync with runtime behavior.
MIT License (see LICENSE). Do as you please with this. If you want to use this for a game, PLEASE, PLEASE, PLEASE DO! I want to see amazing AI characters in games, and this has so much potential. Also, if anyone needs help with voice AI integration in games (e.g. in Unreal Engine), feel free to reach out. Happy to help.
The editor configuration stores API keys in serialized assets for ease of use. Treat them as development-only credentials and rotate them if the project is shared.