
Clawatar 🎭

From Agent Intelligence to Interactive Intelligence. Give your AI agent a body.

A web-based 3D VRM avatar viewer with real-time animations, voice chat, and lip sync – built for OpenClaw.

Screenshots

  • Default sakura theme
  • Night sky with face close-up
  • Sakura garden with petals
  • Sunset full body view

Cute sakura UI, multiple background scenes, camera presets, emotion bar, and 162 animations. VRM model not included – bring your own!

Quick Start

git clone https://github.com/Dongping-Chen/Clawatar.git
cd Clawatar
npm install
npm run start

Open http://localhost:3000 and drop your .vrm model onto the page.

Features

🎭 Avatar & Animation

  • 162 animations – wave, dance, think, laugh, shrug, and more (Mixamo VRMA)
  • Facial expressions – happy, sad, angry, surprised, relaxed
  • Idle behavior – the avatar looks around, stretches, and yawns while waiting
  • Touch reactions – click the avatar for headpats, pokes, and silly reactions ✨

🌸 Beautiful UI

  • Sakura/anime theme – cute pink glassmorphism panels
  • Background scenes – Sakura Garden 🌸, Night Sky 🌙, Cozy Café ☕, Sunset 🌅
  • Camera presets – Face, Portrait, Full Body, and Cinematic, with smooth transitions
  • Quick emotion bar – 😊😢😠😮😌💃 one-tap expression + animation combos

🎤 Voice & Chat

  • Audio-driven lip sync – the mouth moves to the actual speech audio
  • Voice input – speak via your browser's microphone
  • Voice output – ElevenLabs TTS (optional, requires an API key)
  • AI conversation – powered by OpenClaw (optional)
  • Multi-device routing policy – actions and expressions are broadcast to all paired devices, while reply text/audio is routed only to the device that triggered the turn

๐Ÿ  3D Scene System (Blender Pipeline)

  • 6 scenes – Cozy Bedroom 🛏️, Izakaya 🏮, Café ☕, Phone Booth 📞, Sunset Balcony 🌇, Swimming Pool 🏊
  • Blender procedural pipeline – Python scripts generate geometry + materials + lights → Cycles render → GLB export
  • Emissive-only materials – all scenes use Emission shaders for reliable rendering in Three.js
  • Auto emissive lights – the brightest emissive meshes automatically spawn PointLights
  • Camera freedom – orbit ±135° inside scenes, with configurable per-scene camera + exposure
  • Activity modes – Study, Exercise, and Chill, with themed camera angles + animations
  • Scene loader – loadRoomGLB() loads a single GLB as the entire environment, with character lighting
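The "auto emissive lights" rule above can be sketched as a pure selection step. This is an illustrative sketch over plain mesh descriptors rather than real Three.js objects; the function name, the `topN` cutoff, and the mesh shape are assumptions, not the project's actual implementation:

```typescript
// Illustrative: pick the brightest emissive meshes so a loader could
// spawn a PointLight at each one. The shape below is a stand-in for
// what traversing a loaded GLB would yield.
interface EmissiveMesh {
  name: string;
  emissiveStrength: number;              // Emission shader strength baked into the GLB
  position: [number, number, number];
}

function pickEmissiveLightSources(
  meshes: EmissiveMesh[],
  topN = 3,
  minStrength = 3.0,                     // mirrors the "emission >= 3.0" scene rule
): EmissiveMesh[] {
  return meshes
    .filter((m) => m.emissiveStrength >= minStrength)
    .sort((a, b) => b.emissiveStrength - a.emissiveStrength)
    .slice(0, topN);
}

// Example: a lantern and a window glow qualify; a dim screen does not.
const lights = pickEmissiveLightSources([
  { name: "lantern", emissiveStrength: 8, position: [1, 2, -1] },
  { name: "window", emissiveStrength: 5, position: [0, 1, -3] },
  { name: "screen", emissiveStrength: 1.5, position: [0, 1, 0] },
]);
```

In a real loader, each selected mesh's world position would seed a `THREE.PointLight` so the character picks up the scene's glow.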

📹 Virtual Meeting Avatar

  • Join Google Meet / Zoom – the avatar appears via OBS Virtual Camera
  • Listen & respond – captures meeting audio via BlackHole → Whisper STT → OpenClaw AI → TTS
  • Smart triggers – responds when called by name or asked a question
  • Streaming pipeline (v3) – VAD + OpenClaw-orchestrated model + streaming ElevenLabs TTS
  • No direct LLM calls – all AI routes through the OpenClaw Gateway (model selection, context, and persona are handled automatically)
  • Rolling context – maintains a 2-minute transcript window for coherent responses
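The rolling 2-minute context window can be sketched as a small buffer that drops lines older than the horizon on every insert. The class name and line shape are illustrative, not the bridge's actual types:

```typescript
// Sketch of a rolling transcript window (2-minute horizon, as in the
// feature list above). Old lines are pruned on each add so the prompt
// sent to the gateway stays bounded.
interface TranscriptLine {
  t: number;        // timestamp in ms
  speaker: string;
  text: string;
}

class RollingTranscript {
  private lines: TranscriptLine[] = [];
  constructor(private windowMs = 2 * 60 * 1000) {}

  add(line: TranscriptLine): void {
    this.lines.push(line);
    const cutoff = line.t - this.windowMs;
    // Keep only lines inside the window relative to the newest line.
    this.lines = this.lines.filter((l) => l.t >= cutoff);
  }

  context(): string {
    return this.lines.map((l) => `${l.speaker}: ${l.text}`).join("\n");
  }
}

const tr = new RollingTranscript();
tr.add({ t: 0, speaker: "alice", text: "hi avatar" });
tr.add({ t: 150_000, speaker: "bob", text: "what is the plan?" });
// alice's line is now older than 2 minutes relative to bob's and is dropped.
```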

🔌 Developer-Friendly

  • Local WebSocket API – control everything programmatically on the same machine
  • Drag & drop – load any VRM model
  • Standalone mode – works without OpenClaw or ElevenLabs
  • OpenClaw skill – install as an agent skill for AI-driven avatars

Bring Your Own Model

No VRM model is bundled. You can:

  1. Drag & drop a .vrm file onto the viewer
  2. Set a URL in clawatar.config.json → model.url
  3. Enter a URL in the Model panel in the UI
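The three options form a simple precedence chain. The order sketched here (most specific source wins: a dropped file, then the UI field, then the config file) is an assumption for illustration, not documented behavior, and the function name is hypothetical:

```typescript
// Illustrative precedence for choosing which VRM to load. The ordering
// (dropped file > UI field > config) is an assumption, not the viewer's
// documented resolution logic.
interface ModelSources {
  droppedFile?: string;   // object URL created from a drag & drop
  uiUrl?: string;         // URL typed into the Model panel
  configUrl?: string;     // model.url from clawatar.config.json
}

function resolveModelUrl(src: ModelSources): string | undefined {
  // `||` also skips empty strings, so an unset "" in the config falls through.
  return src.droppedFile || src.uiUrl || src.configUrl || undefined;
}
```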

Configuration

Edit clawatar.config.json:

{
  "model": { "url": "", "autoLoad": true },
  "voice": {
    "elevenlabsVoiceId": "your-voice-id",
    "elevenlabsModel": "eleven_turbo_v2_5"
  },
  "server": { "vitePort": 3000, "wsPort": 8765, "audioPort": 8866 },
  "openclaw": { "gatewayPort": 18789, "sessionId": "vrm-chat" }
}
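A config loader would typically merge the user's file over defaults so missing keys fall back sanely. The defaults below mirror the example config; the helper name and the exact merge behavior are illustrative, not a description of the project's loader:

```typescript
// Sketch: merge a user's clawatar.config.json over defaults, one level
// deep per section. Defaults mirror the example config above.
interface ClawatarConfig {
  model: { url: string; autoLoad: boolean };
  server: { vitePort: number; wsPort: number; audioPort: number };
}

type UserConfig = {
  model?: Partial<ClawatarConfig["model"]>;
  server?: Partial<ClawatarConfig["server"]>;
};

const DEFAULTS: ClawatarConfig = {
  model: { url: "", autoLoad: true },
  server: { vitePort: 3000, wsPort: 8765, audioPort: 8866 },
};

function loadConfig(user: UserConfig): ClawatarConfig {
  return {
    model: { ...DEFAULTS.model, ...user.model },
    server: { ...DEFAULTS.server, ...user.server },
  };
}

// Overriding only vitePort keeps the other server defaults intact.
const cfg = loadConfig({ server: { vitePort: 5173 } });
```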

WebSocket Protocol

{"type": "play_action", "action_id": "161_Waving"}
{"type": "set_expression", "name": "happy", "weight": 0.8}
{"type": "speak", "text": "Hello!", "action_id": "161_Waving", "expression": "happy"}
{"type": "reset"}

Multi-device message split

{"type":"sync","category":"action","payload":{"actionId":"161_Waving","expression":"happy","expressionWeight":0.8}}
{"type":"speak_audio","text":"Hello!","audio_url":"https://...","audio_device":"<source_device>","target_device":"<source_device>","reply_device":"<source_device>"}
  • sync/action is broadcast to keep avatar motion synchronized across all devices.
  • speak_audio / audio_start / audio_chunk / audio_end are reply-routed to the focused source device.
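The broadcast-vs-reply split can be expressed as a single routing decision: audio reply messages go only to the source device, everything else fans out to all paired devices. The types and function below are an illustrative sketch, not the actual `ws-server.ts` logic:

```typescript
// Sketch of the routing policy described above: sync/action messages
// are broadcast; audio reply messages are routed to the source device.
interface Envelope {
  type: string;
}

const REPLY_TYPES = new Set(["speak_audio", "audio_start", "audio_chunk", "audio_end"]);

function recipients(msg: Envelope, pairedDevices: string[], sourceDevice: string): string[] {
  if (REPLY_TYPES.has(msg.type)) {
    return [sourceDevice];        // reply-routed to the device that spoke
  }
  return pairedDevices;           // broadcast, e.g. "sync" action messages
}

const paired = ["phone", "laptop", "tablet"];
```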

Architecture

Browser (localhost:3000)
├── Three.js + @pixiv/three-vrm
├── VRMA animation playback
├── Audio-driven lip sync
└── Chat UI + Emotion Bar
    │
    │ Local WebSocket (loopback ws://127.0.0.1:8765)
    ▼
WS Server (server/ws-server.ts)
├── Command relay & routing
├── ElevenLabs TTS
├── OpenClaw Gateway bridge (all AI routing)
└── Meeting speech → Gateway API → orchestrated model
    │
    ▼
OpenClaw Gateway (localhost:18789)
├── Model orchestration (Opus/Sonnet/Codex)
├── Session & context management
└── Persona & memory

Apple App Transport Policy (Relay-only)

  • iPhone, iPad, and macOS clients use relay transport only (/ws/client).
  • Simulator builds follow the same relay-only policy.
  • Direct app WebSocket transport (ws://127.0.0.1:8765) is removed from Apple clients.
  • ws-server.ts binds loopback (127.0.0.1) by default and also rejects non-loopback WS clients as defense in depth. Set CLAWATAR_ALLOW_REMOTE_WS_CLIENTS=1 only for explicit LAN debugging.
  • Pairing tokens are long-lived; add new devices with /pair/add-device instead of creating a new session.
  • Model orchestration path is unchanged: relay bridge -> ws-server.ts -> OpenClaw gateway (:18789).
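The defense-in-depth check described above (loopback-only unless the debug flag is set) amounts to a small gate on the client's remote address. This is an illustrative sketch; `ws-server.ts`'s actual check may differ:

```typescript
// Sketch: accept a WS client only from loopback, unless the LAN-debug
// env var is explicitly set to "1". Function name is illustrative.
function allowWsClient(
  remoteAddress: string,
  env: Record<string, string | undefined>,
): boolean {
  // Cover IPv4 loopback, IPv6 loopback, and IPv4-mapped IPv6.
  const loopback = ["127.0.0.1", "::1", "::ffff:127.0.0.1"];
  if (loopback.includes(remoteAddress)) return true;
  return env.CLAWATAR_ALLOW_REMOTE_WS_CLIENTS === "1";
}
```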

OpenClaw Skill

Clawatar includes an OpenClaw skill at skill/SKILL.md. Install it to let your AI agent control the avatar with animations, expressions, and voice.

Scripts

  • npm run start – Start dev server + WebSocket server
  • npm run dev – Vite dev server only
  • npm run ws-server – WebSocket server only
  • npm run build – Production build
  • npm run catalog – Regenerate animation catalog
  • npm run meeting – Virtual meeting bridge v2 (continuous listen + smart trigger)
  • npm run meeting:v3 – Virtual meeting bridge v3 (streaming VAD + streaming TTS)

Building Scenes (Blender Pipeline)

Each scene is a Blender Python script that generates procedural geometry → exports GLB.

# Build a scene
/Applications/Blender.app/Contents/MacOS/Blender --background --python blender/build_izakaya_v4.py

# Copy to public
cp /tmp/izakaya.glb public/scenes/izakaya.glb

# Load in viewer
open http://localhost:3000?room=izakaya

Scene scripts (in blender/)

  • build_room_v9.py – Cozy Bedroom – 3.7 MB
  • build_izakaya_v4.py – Izakaya Bar – 5.9 MB
  • build_cafe_v6.py – Coffee Café – 4.6 MB
  • build_phone_booth_v6.py – Rainy Phone Booth – 1.6 MB
  • build_balcony_v8.py – Sunset Balcony – 7.7 MB
  • build_pool_v8.py – Swimming Pool – 7.1 MB

Key rules for scene scripts

  • All emission strengths ≥ 3.0 – sub-1.0 gets baked dark by the glTF exporter
  • Use the Emission shader only (not Principled BSDF) for reliable Three.js rendering
  • Cycles renderer – 64 samples + denoiser
  • Keep center stage clear – the character stands at the origin (0, 0, 0)
  • Background elements go at Blender -Y – they end up behind the character in Three.js
  • Keep the GLB under 8 MB – optimize mesh complexity
  • See SCENES.md for detailed scene configs and review scores
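The size and emission rules above lend themselves to a pre-flight check before copying a GLB into public/scenes. This is a hypothetical helper, not part of the pipeline; the report shape is assumed:

```typescript
// Sketch: validate an exported scene against the rules above
// (emission >= 3.0, GLB under 8 MB). Shapes are illustrative.
interface SceneReport {
  glbBytes: number;
  emissionStrengths: number[];   // one entry per emissive material
}

function sceneProblems(r: SceneReport): string[] {
  const problems: string[] = [];
  if (r.glbBytes >= 8 * 1024 * 1024) {
    problems.push("GLB is 8 MB or larger; reduce mesh complexity");
  }
  for (const s of r.emissionStrengths) {
    if (s < 3.0) problems.push(`emission strength ${s} is below 3.0 and may bake dark`);
  }
  return problems;
}

// A 4 MB scene with strong emitters passes; a dim emitter is flagged.
const ok = sceneProblems({ glbBytes: 4_000_000, emissionStrengths: [5, 3] });
const bad = sceneProblems({ glbBytes: 4_000_000, emissionStrengths: [0.8] });
```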

Virtual Meeting Setup

  1. Install OBS Studio and BlackHole 2ch
  2. Create a Multi-Output Device (Audio MIDI Setup) → your speakers + BlackHole 2ch
  3. Set system output to the Multi-Output Device
  4. OBS: Add Browser Source → http://localhost:3000?embed → Start Virtual Camera
  5. Start the avatar: npm run start
  6. Start the meeting bridge: npm run meeting:v3
  7. In Google Meet: select OBS Virtual Camera (video) and BlackHole 2ch (mic)

See virtual-meeting/README.md for detailed architecture docs.

Credits

License

MIT – see LICENSE
