Native WhatsApp voice calls in pure Go, straight from the browser. Built for native VoIP media, multi-account (multi-session) operation, and a modern browser client.
Overview · Architecture · Quick Start · API · Security
WaCalls pairs one or more WhatsApp accounts via QR code and lets you place and receive 1:1 voice calls from any browser on the LAN. The browser microphone is sent as raw 16 kHz PCM over a WebRTC data channel to the Go server, which encodes it with Meta's MLow codec and injects the media into WhatsApp's SRTP relay mesh — and the reverse path brings the peer's audio back to the browser.
The entire VoIP stack runs natively in pure Go: the MLow voice codec, RTP/SRTP
packetization, STUN, the WebRTC/SCTP relay transport and the <call> signaling,
integrated with whatsmeow and served to a
React 19 client. There is no cgo and no native DLL — the MLow codec is a vendored
pure-Go package, so a plain go build produces a self-contained binary with live audio.
Multiple WhatsApp accounts can be paired and operated side by side, each with its own pairing QR, connection status, and history. A single account can also run several concurrent 1:1 calls at once — one per browser operator — routed independently by call ID.
Status: stable. Outgoing and incoming 1:1 calls reach
ACTIVEwith bidirectional audio, and a single account can hold several of them concurrently. Sessions persist inwacalls.db(pure-Go SQLite).
┌──────────────────────────────────────────────────────────────────────────┐
│ BROWSER (React client) │
│ mic + speaker · WebRTC data channel (16 kHz PCM) · HTTP + SSE │
└───────────────────────────────┬──────────────────────────────────────────┘
│ POST /api/sessions/{sid}/calls/{id}/webrtc (SDP)
│ GET /api/events (SSE)
▼
┌──────────────────────────── GO SERVER (cmd/server) ────────────────────────┐
│ SessionManager registry of accounts (client + CallManager + bridge) │
│ Broker SSE hub (sessions, auth, call lifecycle fan-out) │
│ Bridge pion WebRTC bridge (16 kHz PCM data channel ⇄ call core) │
│ │
│ internal/wa VoipSocket adapter over whatsmeow │
│ internal/voip call · signaling · media · transport · core · wanode │
└───────────────┬──────────────────────────────────────┬────────────────────┘
│ <call> signaling (Signal/USync) │ SRTP media
▼ ▼
┌───────────────┐ ┌──────────────────────┐
│ WhatsApp WS │ │ WhatsApp relay │
│ (whatsmeow) │ │ (SRTP over SCTP/DC) │
└───────────────┘ └──────────────────────┘
| Path | Responsibility |
|---|---|
cmd/server |
HTTP/SSE broker, session manager + store, WebRTC bridge, process lifecycle |
internal/wa |
VoipSocket — sends/receives <call> stanzas via whatsmeow |
internal/voip/core |
Domain types, constants, the VoipSocket interface |
internal/voip/wanode |
Shared WhatsApp-node and JID helpers |
internal/voip/media |
MLow codec (vendored pure-Go mlow/), RTP, SRTP, SSRC, PCM helpers, key derivation |
internal/voip/transport |
SCTP relay, STUN, subscription encoding |
internal/voip/signaling |
<call> stanza build/parse, call-key crypto, relay-ack parsing |
internal/voip/call |
CallManager — orchestrates a single call end to end |
client/ |
React 19 + Vite + Tailwind v4 + shadcn/ui (dialer, call cards, sessions, history) |
The core is internal/voip/call.CallManager, which drives a call end to end. Outgoing
call sequence:
1. POST .../calls → CallManager.StartCall(peerJid)
generates a callID, builds the <call> offer, sends it
2. Browser opens WebRTC → POST .../calls/{id}/webrtc (SDP offer)
the bridge answers with an SDP answer (pion)
3. Peer accepts → events.CallAccept → HandleCallAccept
server receives <relay> + hop-by-hop keys
4. Relay transport → STUN binding/allocate on WhatsApp relays
ICE + DTLS + SCTP DataChannel connect (pion)
5. SRTP media flowing → state goes ACTIVE
├── uplink (you → peer): browser 16 kHz PCM (data channel) → MLow encode → SRTP → relay
└── downlink (peer → you): relay → SRTP → MLow decode → 16 kHz PCM (data channel) → browser
6. Teardown → DELETE .../calls/{id} or events.CallTerminate
CallManager.EndCall + bridge cleanup
Each protocol step (hop-by-hop SRTP key derivation, RTP packetization at PT=120/16 kHz,
STUN relay registration, relay-ack and <call> stanza parsing) is implemented and covered
by tests in internal/voip (go test ./...).
- Go 1.26+
- Node 22+ and npm (only to build/run the React client)
No C compiler, cgo, or native libraries are required — the MLow codec is vendored
pure Go (internal/voip/media/mlow).
# clone and enter the project
git clone <repo-url> wacalls-go
cd wacalls-go
# Go dependencies
go mod download
# React client dependencies
cd client && npm install && cd ..go run ./cmd/server -addr :8080 # add -debug for verbose logsLive audio works out of the box — the MLow codec is pure Go, so a plain build
includes it. No build tags, no CGO_ENABLED, no DLLs.
Open http://localhost:8080, click New session, and scan the QR shown in the browser
(it is also printed in the terminal) with WhatsApp → Linked devices. Add more accounts
the same way and switch between them in the sidebar.
cd client
npm run dev # Vite on :5173, proxies /api → http://localhost:8080For production, build the static client and serve it from the Go server:
cd client && npm run build && cd ..
go run ./cmd/server -static client/dist -addr :8080| Flag | Default | Meaning |
|---|---|---|
-addr |
:8080 |
HTTP listen address |
-db |
wacalls.db |
SQLite session database path |
-static |
client/dist |
Static client directory (optional) |
-debug |
false |
Verbose logging (includes whatsmeow's internal log) |
-max-calls-per-session |
8 |
Max concurrent calls per session (0 = unlimited) |
All routes are session-scoped. Events stream over a single SSE channel, tagged with the
originating sessionId.
| Method | Route | Purpose |
|---|---|---|
GET |
/api/sessions |
List accounts (id, name, jid, status, paired) |
POST |
/api/sessions |
Create an account and begin QR pairing |
DELETE |
/api/sessions/{sid} |
Log out and remove an account |
POST |
/api/sessions/{sid}/logout |
Disconnect an account (keep it for re-pairing) |
POST |
/api/sessions/{sid}/pair |
Re-pair an account (emit a fresh QR) |
POST |
/api/sessions/{sid}/calls |
Start an outgoing call ({ phone, duration_ms?, record? }) |
POST |
/api/sessions/{sid}/calls/{id}/webrtc |
Exchange the browser WebRTC SDP |
POST |
/api/sessions/{sid}/calls/{id}/accept |
Accept an incoming call |
POST |
/api/sessions/{sid}/calls/{id}/reject |
Reject an incoming call |
DELETE |
/api/sessions/{sid}/calls/{id} |
End an active call |
GET |
/api/sessions/{sid}/history |
Recent call history (up to 50 records) |
GET |
/api/events |
Server-sent events (sessions, auth, call lifecycle) |
go test ./... # media stack: SRTP, STUN, RTP, relay-ack, codec, state
cd client && npm run build # client type-check + production buildThe API has no authentication — anyone with HTTP access can create accounts, place
calls, and read history. Run it only on a trusted LAN. wacalls.db holds WhatsApp
session credentials (secrets): do not commit it and keep it protected.
This project builds on the work of:
- whatsmeow — Go WhatsApp Web protocol library
- pion/webrtc — pure-Go WebRTC stack (ICE + DTLS + SCTP)
- whatsapp-rust — reference MLow codec implementation (ported to the vendored pure-Go
internal/voip/media/mlow) - zapo — VoIP media-stack reference



