From Agent Intelligence to Interactive Intelligence. Give your AI agent a body.
A web-based 3D VRM avatar viewer with real-time animations, voice chat, and lip sync – built for OpenClaw.
Cute sakura UI, multiple background scenes, camera presets, emotion bar, and 162 animations. VRM model not included – bring your own!
```shell
git clone https://github.com/Dongping-Chen/Clawatar.git
cd Clawatar
npm install
npm run start
```

Open http://localhost:3000 and drop your .vrm model onto the page.
- 162 animations – wave, dance, think, laugh, shrug, and more (Mixamo VRMA)
- Facial expressions – happy, sad, angry, surprised, relaxed
- Idle behavior – the avatar looks around, stretches, and yawns while waiting
- Touch reactions – click the avatar for headpats, pokes, and silly reactions
- Sakura/anime theme – cute pink glassmorphism panels
- Background scenes – Sakura Garden, Night Sky, Cozy Café, Sunset
- Camera presets – Face, Portrait, Full Body, Cinematic, with smooth transitions
- Quick emotion bar – one-tap expression + animation combos
- Audio-driven lip sync – the mouth moves to the actual speech audio
- Voice input – speak via your browser's microphone
- Voice output – ElevenLabs TTS (optional, requires API key)
- AI conversation – powered by OpenClaw (optional)
- Multi-device routing policy – actions/expressions are broadcast to all paired devices, while reply text/audio is routed only to the device that triggered the turn
- 6 scenes – Cozy Bedroom, Izakaya, Café, Phone Booth, Sunset Balcony, Swimming Pool
- Blender procedural pipeline – Python scripts generate geometry + materials + lights → Cycles render → GLB export
- Emissive-only materials – all scenes use Emission shaders for reliable rendering in Three.js
- Auto emissive lights – the brightest emissive meshes automatically spawn PointLights
- Camera freedom – orbit ±135° inside scenes, with configurable per-scene camera + exposure
- Activity modes – Study, Exercise, Chill, with themed camera angles + animations
- Scene loader – `loadRoomGLB()` loads a single GLB as the entire environment, with character lighting
- Join Google Meet / Zoom – the avatar appears via OBS Virtual Camera
- Listen & respond – captures meeting audio via BlackHole → Whisper STT → OpenClaw AI → TTS
- Smart triggers – responds when called by name or asked a question
- Streaming pipeline (v3) – VAD + OpenClaw-orchestrated model + streaming ElevenLabs TTS
- No direct LLM calls – all AI routes through the OpenClaw Gateway (model selection, context, and persona are handled automatically)
- Rolling context – maintains a 2-minute transcript window for coherent responses
- Local WebSocket API – control everything programmatically on the same machine
- Drag & drop – load any VRM model
- Standalone mode – works without OpenClaw or ElevenLabs
- OpenClaw skill – install as an agent skill for AI-driven avatars
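The rolling-context behavior above could be implemented along these lines. This is an illustrative sketch, not the actual bridge code; the class and method names are assumptions.

```typescript
// Illustrative sketch of a rolling transcript window: entries older than
// the window are dropped, so the AI always sees roughly the last two
// minutes of meeting speech. Names are not from the Clawatar source.
type Utterance = { text: string; timestampMs: number };

class TranscriptWindow {
  private entries: Utterance[] = [];

  constructor(private windowMs: number = 2 * 60 * 1000) {}

  add(text: string, timestampMs: number): void {
    this.entries.push({ text, timestampMs });
  }

  // Return the concatenated transcript covering the last `windowMs`,
  // pruning anything older as a side effect.
  context(nowMs: number): string {
    this.entries = this.entries.filter(
      (u) => nowMs - u.timestampMs <= this.windowMs
    );
    return this.entries.map((u) => u.text).join(" ");
  }
}
```

Pruning inside `context()` keeps the window bounded without a separate timer.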
No VRM model is bundled. You can:
- Drag & drop a `.vrm` file onto the viewer
- Set a URL in `clawatar.config.json` → `model.url`
- Enter a URL in the Model panel in the UI
Edit `clawatar.config.json`:

```json
{
  "model": { "url": "", "autoLoad": true },
  "voice": {
    "elevenlabsVoiceId": "your-voice-id",
    "elevenlabsModel": "eleven_turbo_v2_5"
  },
  "server": { "vitePort": 3000, "wsPort": 8765, "audioPort": 8866 },
  "openclaw": { "gatewayPort": 18789, "sessionId": "vrm-chat" }
}
```

Commands sent over the local WebSocket:

```json
{"type": "play_action", "action_id": "161_Waving"}
{"type": "set_expression", "name": "happy", "weight": 0.8}
{"type": "speak", "text": "Hello!", "action_id": "161_Waving", "expression": "happy"}
{"type": "reset"}
```

Multi-device messages:

```json
{"type":"sync","category":"action","payload":{"actionId":"161_Waving","expression":"happy","expressionWeight":0.8}}
{"type":"speak_audio","text":"Hello!","audio_url":"https://...","audio_device":"<source_device>","target_device":"<source_device>","reply_device":"<source_device>"}
```

`sync`/`action` is broadcast to keep avatar motion synchronized across all devices. `speak_audio`/`audio_start`/`audio_chunk`/`audio_end` are reply-routed to the focused source device.
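A minimal sketch of helpers that build these command payloads. The message shapes are the ones documented above; the helper names are illustrative, not part of Clawatar's API.

```typescript
// Build the JSON command strings for the local WebSocket API.
// Message shapes follow the README; helper names are assumptions.
function playAction(actionId: string): string {
  return JSON.stringify({ type: "play_action", action_id: actionId });
}

function setExpression(name: string, weight = 1.0): string {
  return JSON.stringify({ type: "set_expression", name, weight });
}

function speak(text: string, actionId?: string, expression?: string): string {
  // JSON.stringify drops undefined fields, so optional args simply vanish.
  return JSON.stringify({ type: "speak", text, action_id: actionId, expression });
}

// Usage sketch (assumes the WS server from `npm run ws-server` is running):
// const ws = new WebSocket("ws://127.0.0.1:8765");
// ws.addEventListener("open", () =>
//   ws.send(speak("Hello!", "161_Waving", "happy"))
// );
```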
```
Browser (localhost:3000)
 ├── Three.js + @pixiv/three-vrm
 ├── VRMA animation playback
 ├── Audio-driven lip sync
 └── Chat UI + Emotion Bar
        │
        │ Local WebSocket (loopback ws://127.0.0.1:8765)
        ▼
WS Server (server/ws-server.ts)
 ├── Command relay & routing
 ├── ElevenLabs TTS
 ├── OpenClaw Gateway bridge (all AI routing)
 └── Meeting speech → Gateway API → orchestrated model
        │
        ▼
OpenClaw Gateway (localhost:18789)
 ├── Model orchestration (Opus/Sonnet/Codex)
 ├── Session & context management
 └── Persona & memory
```
- iPhone, iPad, and macOS clients use relay transport only (`/ws/client`).
- Simulator builds follow the same relay-only policy.
- Direct app WebSocket transport (`ws://127.0.0.1:8765`) is removed from Apple clients.
- `ws-server.ts` binds loopback (127.0.0.1) by default and also rejects non-loopback WS clients as defense in depth. Set `CLAWATAR_ALLOW_REMOTE_WS_CLIENTS=1` only for explicit LAN debugging.
- Pairing tokens are long-lived; add new devices with `/pair/add-device` instead of creating a new session.
- Model orchestration path is unchanged: relay bridge → `ws-server.ts` → OpenClaw gateway (:18789).
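The broadcast-vs-reply routing policy described above can be sketched as a single decision function. This is illustrative, not the actual `ws-server.ts` implementation.

```typescript
// Sketch of the multi-device routing policy: reply text/audio goes only
// to the device that triggered the turn; everything else (e.g. sync/action)
// is broadcast so avatar motion stays synchronized across paired devices.
type Msg = { type: string; [key: string]: unknown };

const REPLY_ROUTED = new Set([
  "speak_audio",
  "audio_start",
  "audio_chunk",
  "audio_end",
]);

// Return the device ids a message should be delivered to.
function route(msg: Msg, allDevices: string[], sourceDevice: string): string[] {
  if (REPLY_ROUTED.has(msg.type)) {
    return [sourceDevice]; // reply-routed to the focused source device
  }
  return allDevices; // broadcast (sync/action and friends)
}
```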
Clawatar includes an OpenClaw skill at `skill/SKILL.md`. Install it to let your AI agent control the avatar with animations, expressions, and voice.
| Command | Description |
|---|---|
| `npm run start` | Start dev server + WebSocket server |
| `npm run dev` | Vite dev server only |
| `npm run ws-server` | WebSocket server only |
| `npm run build` | Production build |
| `npm run catalog` | Regenerate animation catalog |
| `npm run meeting` | Virtual meeting bridge v2 (continuous listen + smart trigger) |
| `npm run meeting:v3` | Virtual meeting bridge v3 (streaming VAD + streaming TTS) |
Each scene is a Blender Python script that generates procedural geometry and exports a GLB.
```shell
# Build a scene
/Applications/Blender.app/Contents/MacOS/Blender --background --python blender/build_izakaya_v4.py

# Copy to public
cp /tmp/izakaya.glb public/scenes/izakaya.glb

# Load in viewer
open http://localhost:3000?room=izakaya
```

| Script | Scene | GLB Size |
|---|---|---|
| `build_room_v9.py` | Cozy Bedroom | 3.7 MB |
| `build_izakaya_v4.py` | Izakaya Bar | 5.9 MB |
| `build_cafe_v6.py` | Coffee Café | 4.6 MB |
| `build_phone_booth_v6.py` | Rainy Phone Booth | 1.6 MB |
| `build_balcony_v8.py` | Sunset Balcony | 7.7 MB |
| `build_pool_v8.py` | Swimming Pool | 7.1 MB |
- Keep all emission strengths ≥ 3.0 – anything below 1.0 gets baked dark by the glTF exporter
- Use the Emission shader only (not Principled BSDF) for reliable Three.js rendering
- Cycles renderer – 64 samples + denoiser
- Keep center stage clear – the character stands at the origin (0,0,0)
- Put background elements at Blender -Y – they end up behind the character in Three.js
- Keep GLBs under 8 MB – optimize mesh complexity
- See `SCENES.md` for detailed scene configs and review scores
- Install OBS Studio and BlackHole 2ch
- Create a Multi-Output Device (Audio MIDI Setup) with your speakers + BlackHole 2ch
- Set the system output to the Multi-Output Device
- OBS: add a Browser Source pointing at `http://localhost:3000?embed`, then Start Virtual Camera
- Start the avatar: `npm run start`
- Start the meeting bridge: `npm run meeting:v3`
- In Google Meet: select OBS Virtual Camera (video) and BlackHole 2ch (mic)
See `virtual-meeting/README.md` for detailed architecture docs.
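The "smart triggers" behavior (respond when called by name or asked a question) could look roughly like this. The heuristics are assumptions for illustration, not the actual bridge implementation.

```typescript
// Illustrative trigger check for the meeting bridge: fire when the avatar
// is addressed by name, or when the utterance reads as a question.
// Heuristics are assumptions, not the actual Clawatar logic.
function isTriggered(transcript: string, avatarName: string): boolean {
  const t = transcript.toLowerCase().trim();
  if (t.includes(avatarName.toLowerCase())) return true; // called by name
  return t.endsWith("?"); // asked a question
}
```

In practice a bridge would gate this on the STT output per utterance, only forwarding triggered turns to the AI to avoid responding to every remark in the meeting.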
- Animations: Mixamo – non-commercial use, credit required
- VRM rendering: `@pixiv/three-vrm`
- Inspired by: moeru-ai/airi
MIT – see LICENSE



