clawcall

Give your OpenClaw / self-hosted AI agent inbound phone calls.

A self-hostable TypeScript service that bridges Twilio inbound voice calls to an OpenClaw gateway — routing every utterance through a full agent turn with native tool access.

Why this exists

OpenClaw's 176k-member Discord has a persistent demand: "I want my agent to answer my phone." There are now two architecturally distinct ways to do that, and clawcall is the second one.

The native plugin approach (@openclaw/voice-call realtime mode, PR #71272 / e2f13959d4): the realtime LLM (OpenAI or Gemini Live) handles the audio session end-to-end and can call a special openclaw_agent_consult tool when it needs to reach the gateway. Sub-second conversational latency; agent tools available via the consult hop.

clawcall's approach: skip the realtime LLM entirely. Every caller utterance goes straight to the OpenClaw gateway via chat.send, running the agent's complete tool-calling loop on the gateway turn — the same path as any other message to your agent. clawcall then synthesises the text reply to speech and delivers it to the caller.

The trade-offs are real and worth stating plainly:

| | clawcall | native realtime plugin |
|---|---|---|
| Latency | Higher (STT → full agent turn → TTS) | Lower (realtime LLM + async consult hop) |
| Agent tool access | Every turn, natively | Via `openclaw_agent_consult` tool call |
| STT/TTS provider | Pluggable (Deepgram / ElevenLabs / keyless dev mode) | OpenAI / ElevenLabs |
| Model control | Your gateway's configured model | OpenAI or Gemini Live |
| Self-host / keyless dev | Yes | No |

Choose clawcall when you want full agent-turn fidelity on every utterance, control over your STT and TTS providers, or a simpler mental model: the phone call is just another chat.send. Choose the native plugin when you need the lowest possible conversational latency and are comfortable with the realtime LLM + consult-hop model.

Architecture

              PSTN / VoIP
                  │
                  ▼
         ┌─────────────────┐
         │  Twilio Voice   │
         │  (inbound call) │
         └───────┬─────────┘
                 │ POST /twilio/voice
                 │ (TwiML: <Connect><Stream>)
                 ▼
         ┌─────────────────────────────────────────────┐
         │               clawcall server               │
         │  ┌──────────────────────────────────────┐   │
         │  │           CallSession                │   │
         │  │                                      │   │
         │  │  Twilio WS ──► STT ──► GatewayClient │   │
         │  │  (mulaw in)   (Deepgram)  (chat.send) │   │
         │  │                              │        │   │
         │  │                              ▼        │   │
         │  │                     OpenClaw Gateway  │   │
         │  │                     (agent turn —     │   │
         │  │                      tools run here)  │   │
         │  │                              │        │   │
         │  │  Twilio WS ◄── TTS ◄─────────┘        │   │
         │  │  (mulaw out) (ElevenLabs)             │   │
         │  └──────────────────────────────────────┘   │
         └─────────────────────────────────────────────┘
                 │
                 ▼
         OpenClaw Gateway
         ws://127.0.0.1:18789
         (tools: calendar, memory,
          send-message, web search…)
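
The "mulaw in" and "mulaw out" legs of the diagram carry G.711 μ-law audio: one byte per sample at 8 kHz. A provider that accepts μ-law directly needs no conversion, but a component that expects 16-bit linear PCM would need the bytes expanded first. The following is a minimal sketch of the standard G.711 μ-law expansion. The function names are illustrative and not taken from the clawcall source, and clawcall may not decode audio at all.

```typescript
// Standard G.711 mu-law expansion: one 8-bit code -> one 16-bit linear PCM sample.
// Function names are hypothetical, not clawcall's API.
const MULAW_BIAS = 0x84; // 132, added during encoding and removed again here

function decodeMuLawSample(code: number): number {
  const u = ~code & 0xff;            // mu-law bytes are transmitted bit-inverted
  const sign = u & 0x80;             // top bit set means a negative sample
  const exponent = (u >> 4) & 0x07;  // 3-bit segment number
  const mantissa = u & 0x0f;         // 4-bit step within the segment
  const magnitude = (((mantissa << 3) + MULAW_BIAS) << exponent) - MULAW_BIAS;
  return sign !== 0 ? -magnitude : magnitude;
}

// Expand a whole buffer of mu-law bytes into 16-bit PCM samples.
function decodeMuLaw(bytes: Uint8Array): Int16Array {
  return Int16Array.from(bytes, (b) => decodeMuLawSample(b));
}
```

Twilio Media Streams deliver the audio base64-encoded in each `media` message's `payload` field, so the payload has to be base64-decoded into bytes before a function like `decodeMuLaw` can be applied.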

Data flow per utterance:

1. Twilio streams μ-law 8 kHz audio over WebSocket to /twilio/stream.
2. Deepgram streaming STT transcribes it in real time.
3. On a final transcript, GatewayClient.agentTurn() sends a chat.send message to the OpenClaw gateway over the protocol-v4 WebSocket.
4. The gateway runs the agent's full tool-calling loop; text streams back as deltaText events.
5. ElevenLabs streaming TTS synthesises each chunk as it arrives.
6. μ-law audio is sent back to Twilio, which plays it to the caller.

Barge-in: if Deepgram detects speech while audio is playing, TTS is interrupted immediately and a new agent turn begins.
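
The barge-in rule implies some bookkeeping: audio synthesised for an interrupted turn must not reach the caller once the caller has started speaking again. One way to sketch that is a turn counter that marks older audio as stale. `TurnTracker` and its methods are hypothetical and are not taken from clawcall's `CallSession`; this README does not describe how the real implementation cancels TTS.

```typescript
// Hypothetical per-call turn bookkeeping; not clawcall's actual CallSession.
class TurnTracker {
  private currentTurn = 0;
  private speaking = false;

  /** A final transcript arrived: start a new agent turn and return its id. */
  beginTurn(): number {
    this.currentTurn += 1;
    this.speaking = false;
    return this.currentTurn;
  }

  /** TTS produced audio for `turnId`; true means it may be sent to the caller. */
  acceptAudio(turnId: number): boolean {
    if (turnId !== this.currentTurn) {
      return false; // audio belongs to a turn that was interrupted or replaced
    }
    this.speaking = true;
    return true;
  }

  /** STT detected caller speech; true means playback was interrupted. */
  onCallerSpeech(): boolean {
    if (!this.speaking) {
      return false; // nothing is playing, so there is nothing to interrupt
    }
    this.speaking = false;
    this.currentTurn += 1; // invalidate in-flight audio from the old turn
    return true;
  }
}

// Usage: audio for an interrupted turn is dropped, the next turn plays normally.
const tracker = new TurnTracker();
const firstTurn = tracker.beginTurn();
tracker.acceptAudio(firstTurn);   // true: agent starts speaking
tracker.onCallerSpeech();         // true: caller barged in
tracker.acceptAudio(firstTurn);   // false: stale chunk is discarded
```

On a live call, dropping stale chunks is only half of the job: audio already buffered on Twilio's side keeps playing unless it is flushed, which Media Streams support through a `clear` message on the same WebSocket.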

Quickstart

1. Prerequisites

- Node.js 20+
- A running OpenClaw gateway
- A Twilio account with a voice-capable phone number
- (Optional) Deepgram and ElevenLabs API keys

2. Clone and install

git clone https://github.com/CODEANDTRUST/clawcall.git
cd clawcall
npm install

3. Configure

cp .env.example .env
# Edit .env — minimum required fields:
#   PUBLIC_URL, TWILIO_ACCOUNT_SID, TWILIO_AUTH_TOKEN,
#   OPENCLAW_GATEWAY_TOKEN, ALLOW_FROM

See Provider setup below for API key details.
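
A missing required field is easiest to diagnose if the server refuses to start and names it. The following is a minimal sketch of such a startup check, assuming the five variables listed above are mandatory and that OPENCLAW_GATEWAY_URL falls back to ws://127.0.0.1:18789. `ClawcallConfig` and `loadConfig` are hypothetical names; clawcall's real config module may look different.

```typescript
// Hypothetical config loader. Variable names come from .env.example as listed
// above; the ClawcallConfig shape is an assumption made for this sketch.
interface ClawcallConfig {
  publicUrl: string;
  twilioAccountSid: string;
  twilioAuthToken: string;
  gatewayUrl: string;
  gatewayToken: string;
  allowFrom: string[];
}

const REQUIRED = [
  "PUBLIC_URL",
  "TWILIO_ACCOUNT_SID",
  "TWILIO_AUTH_TOKEN",
  "OPENCLAW_GATEWAY_TOKEN",
  "ALLOW_FROM",
] as const;

type Env = Record<string, string | undefined>;

function loadConfig(env: Env): ClawcallConfig {
  // Collect every missing name so one error message reports them all.
  const missing = REQUIRED.filter((name) => !env[name]?.trim());
  if (missing.length > 0) {
    throw new Error(`Missing required settings: ${missing.join(", ")}`);
  }
  const get = (name: (typeof REQUIRED)[number]): string => env[name]!.trim();
  return {
    publicUrl: get("PUBLIC_URL").replace(/\/+$/, ""), // drop trailing slashes
    twilioAccountSid: get("TWILIO_ACCOUNT_SID"),
    twilioAuthToken: get("TWILIO_AUTH_TOKEN"),
    gatewayUrl: env["OPENCLAW_GATEWAY_URL"]?.trim() || "ws://127.0.0.1:18789",
    gatewayToken: get("OPENCLAW_GATEWAY_TOKEN"),
    allowFrom: get("ALLOW_FROM")
      .split(",")
      .map((n) => n.trim())
      .filter((n) => n.length > 0),
  };
}
```

In a Node.js process this would typically be called once at startup as `loadConfig(process.env)`, after the .env file has been loaded.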

4. Expose your server

During development, use ngrok to get a public HTTPS URL:

ngrok http 3000
# Copy the https://... URL into PUBLIC_URL in your .env

5. Point your Twilio number at clawcall

In the Twilio Console, set the Voice webhook for your number to:

https://your-server.example.com/twilio/voice

HTTP method: POST
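
When Twilio posts to this webhook, the response is TwiML that connects the call to the media stream (the `<Connect><Stream>` step in the architecture diagram). The following is a minimal sketch of building that response from PUBLIC_URL, assuming the stream endpoint lives at /twilio/stream on the same host. `buildStreamTwiml` and `escapeXml` are hypothetical helper names, not clawcall's actual functions.

```typescript
// Hypothetical helpers: build the TwiML a POST /twilio/voice handler could return.
// <Response>, <Connect> and <Stream> are real TwiML elements; the rest is a sketch.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

function buildStreamTwiml(publicUrl: string): string {
  // https://host -> wss://host (and http://host -> ws://host for local tests)
  const wsBase = publicUrl.replace(/\/+$/, "").replace(/^http/i, "ws");
  const streamUrl = `${wsBase}/twilio/stream`;
  return (
    '<?xml version="1.0" encoding="UTF-8"?>' +
    `<Response><Connect><Stream url="${escapeXml(streamUrl)}" /></Connect></Response>`
  );
}
```

The handler would send this string with a `text/xml` content type. Twilio only connects media streams over `wss://`, which is one reason PUBLIC_URL needs to be an HTTPS address.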

6. Start the server

npm run build
npm start

# or for development with hot reload:
npm run dev

7. Call it

Call your Twilio number. The agent will answer.

Provider setup

Twilio (required)

1. Create an account at twilio.com
2. Buy a voice-capable phone number
3. Copy your Account SID and Auth Token from the Console dashboard
4. Set TWILIO_ACCOUNT_SID and TWILIO_AUTH_TOKEN in .env

OpenClaw gateway (required)

clawcall connects to your locally running OpenClaw gateway over WebSocket.

// In your openclaw.json:
{
  "gateway": {
    "auth": {
      "token": "your-secret-token"  // → OPENCLAW_GATEWAY_TOKEN
    }
  }
}

Set OPENCLAW_GATEWAY_URL (default: ws://127.0.0.1:18789) and OPENCLAW_GATEWAY_TOKEN in .env.

Deepgram STT (recommended)

1. Sign up at deepgram.com (free tier available)
2. Create an API key
3. Set STT_PROVIDER=deepgram and DEEPGRAM_API_KEY=... in .env

Keyless dev mode: set STT_PROVIDER=null. The server logs audio chunk counts and emits a canned transcript after 2 seconds of audio — enough to exercise the full pipeline without any API key.
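
To make the keyless behaviour concrete: μ-law at 8 kHz is one byte per sample, so 2 seconds of audio is 16,000 bytes. A stand-in provider only has to count bytes and fire a fixed transcript once that threshold is crossed. The sketch below is an assumption about how such a provider could look; `NullStt`, its callback, and the canned text are hypothetical, and whether clawcall emits the transcript once or repeatedly is not stated here.

```typescript
// Hypothetical stand-in for a keyless STT provider; names and the canned
// text are invented for this sketch.
const BYTES_PER_SECOND = 8000; // mu-law at 8 kHz: one byte per sample

class NullStt {
  private chunks = 0;
  private bytes = 0;
  private emitted = false;
  private readonly onFinalTranscript: (text: string) => void;
  private readonly triggerBytes: number;
  private readonly cannedText: string;

  constructor(
    onFinalTranscript: (text: string) => void,
    triggerSeconds = 2,
    cannedText = "Hello, this is a canned test transcript.",
  ) {
    this.onFinalTranscript = onFinalTranscript;
    this.triggerBytes = triggerSeconds * BYTES_PER_SECOND;
    this.cannedText = cannedText;
  }

  /** Feed one audio chunk; emits the canned transcript once enough audio arrived. */
  write(chunk: Uint8Array): void {
    this.chunks += 1;
    this.bytes += chunk.length;
    if (!this.emitted && this.bytes >= this.triggerBytes) {
      this.emitted = true; // this sketch emits only once per instance
      this.onFinalTranscript(this.cannedText);
    }
  }

  /** Number of chunks seen so far, e.g. for logging. */
  get chunkCount(): number {
    return this.chunks;
  }
}
```

With 160-byte chunks (20 ms of audio each), the callback fires on the 100th call to `write`.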

ElevenLabs TTS (recommended)

1. Sign up at elevenlabs.io (free tier available)
2. Create an API key
3. Set TTS_PROVIDER=elevenlabs, ELEVENLABS_API_KEY=..., and optionally ELEVENLABS_VOICE_ID=... in .env

Keyless dev mode: set TTS_PROVIDER=twilio-say. Twilio synthesises speech using its built-in <Say> verb — no ElevenLabs account needed. Latency is higher (no streaming) but works with zero external TTS API keys.

Inbound allowlist

By default (INBOUND_POLICY=allowlist), only numbers listed in ALLOW_FROM can reach your agent. This is the recommended production configuration.

# Allow specific numbers (comma-separated E.164):
ALLOW_FROM=+18432965626,+18005551234

# Open to all callers (dev/testing only):
INBOUND_POLICY=all

Important: The allowlist checks the caller's reported From number. The Twilio webhook signature proves only that the request came from Twilio; it does not cryptographically verify PSTN caller-ID ownership, and VoIP callers can spoof their From number. Use the allowlist as a first line of defence, not as the only security layer for sensitive agents.
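
The allowlist decision itself is small enough to sketch. The version below assumes the webhook signature has already been verified (Twilio's Node helper library provides request validation for that) and that `from` is the From parameter of the webhook. `parseAllowFrom` and `isCallerAllowed` are hypothetical names, not clawcall's API.

```typescript
// Hypothetical allowlist check; run it only after the Twilio signature is verified.
type InboundPolicy = "allowlist" | "all";

// E.164: a plus sign, then up to 15 digits, the first of which is not 0.
const E164 = /^\+[1-9]\d{1,14}$/;

function parseAllowFrom(raw: string): Set<string> {
  const numbers = raw
    .split(",")
    .map((n) => n.trim())
    .filter((n) => n.length > 0);
  for (const n of numbers) {
    if (!E164.test(n)) {
      throw new Error(`ALLOW_FROM entry is not E.164: ${n}`);
    }
  }
  return new Set(numbers);
}

function isCallerAllowed(
  policy: InboundPolicy,
  allowFrom: Set<string>,
  from: string | undefined,
): boolean {
  if (policy === "all") {
    return true; // dev/testing only: every caller reaches the agent
  }
  // A missing or withheld caller ID never matches the allowlist.
  return from !== undefined && allowFrom.has(from.trim());
}
```

Failing closed on a missing From value matters here: callers who withhold their number should be treated as unknown rather than waved through.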

Per-number persona

Map individual Twilio "To" numbers to a custom greeting and ElevenLabs voice:

PERSONA_MAP='[{"from":"+18005551234","greeting":"Welcome to Acme Corp, how can I help?","voiceId":"YOUR_VOICE_ID"}]'
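
Because PERSONA_MAP is JSON inside an environment variable, a quoting mistake is easy to make, so it helps to parse and validate it once at startup. The sketch below assumes the entry shape shown in the example (`from`, `greeting`, optional `voiceId`) and matches entries on the `from` field; which number of the call clawcall compares against that field is not settled by this sketch. `parsePersonaMap` and `findPersona` are hypothetical names.

```typescript
// Hypothetical PERSONA_MAP parser; the entry shape follows the example above.
interface Persona {
  from: string;
  greeting: string;
  voiceId?: string;
}

function parsePersonaMap(raw: string | undefined): Persona[] {
  if (!raw) {
    return []; // no personas configured
  }
  const parsed: unknown = JSON.parse(raw); // throws on malformed JSON
  if (!Array.isArray(parsed)) {
    throw new Error("PERSONA_MAP must be a JSON array");
  }
  return parsed.map((entry, i) => {
    const e = entry as Partial<Persona> | null;
    if (!e || typeof e.from !== "string" || typeof e.greeting !== "string") {
      throw new Error(`PERSONA_MAP[${i}] needs string "from" and "greeting" fields`);
    }
    const persona: Persona = { from: e.from, greeting: e.greeting };
    if (typeof e.voiceId === "string") {
      persona.voiceId = e.voiceId;
    }
    return persona;
  });
}

// Look up the persona whose `from` field equals the given number, if any.
function findPersona(personas: Persona[], phoneNumber: string): Persona | undefined {
  return personas.find((p) => p.from === phoneNumber);
}
```

A call with no matching entry gets `undefined` back, at which point the caller of `findPersona` would fall back to the default greeting and voice.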

Security notes

Prompt injection over voice: A caller can speak arbitrary text that becomes the agent's input. Configure your OpenClaw agent's system prompt to treat the voice channel as untrusted user input. Do not give the agent access to tools that execute code or send messages without confirmation unless you trust all callers in your allowlist.

Cost: Each call incurs Twilio per-minute charges, Deepgram streaming transcription costs, and ElevenLabs character costs. Keep INBOUND_POLICY=allowlist (the default) to limit unexpected charges from unknown callers.

Comparison

| | clawcall | `@openclaw/voice-call` | clawphone | deepclaw |
|---|---|---|---|---|
| Agent tools mid-call | Yes (every turn, gateway) | Via consult tool (#71272) | No (CLI spawn) | No (side LLM) |
| Streaming STT | Deepgram | OpenAI/ElevenLabs | Twilio built-in | Deepgram |
| Streaming TTS | ElevenLabs | ElevenLabs | Twilio built-in | ElevenLabs |
| Keyless dev mode | Yes | No | Yes | No |
| Inbound allowlist | Yes | Yes | No | No |
| Self-hostable | Yes | Plugin only | Yes | Yes |
| Barge-in | Yes | Partial | No | Yes |

Contributing

PRs welcome. Please open an issue first for significant changes.

npm test        # run the test suite
npm run build   # typecheck + compile
npm run lint    # ESLint

Built and maintained by Code and Trust

Code and Trust builds AI agent infrastructure for businesses.

Companion guide: Give your OpenClaw agent a phone number

MIT License — see LICENSE.
