RTCstack

A production-ready, self-hosted WebRTC communication platform. Drop-in video conferencing, screen sharing, recording, RTMP/WHIP ingress, and live/post-call transcription — packaged as a single Docker Compose stack with a clean TypeScript SDK and three UI kits.

Built on LiveKit, Caddy, and OpenAI Whisper.

Architecture

RTCstack separates media from signalling. There are two independent data paths:

                        ┌─────────────────────────────────────────────────────────┐
                        │                    Docker Stack                         │
                        │                                                         │
 Browser ──── HTTPS ──▶ │ Caddy ──▶ /v1/*  ──▶  RTCstack API  ──▶  LiveKit SDK  │
                        │                                                         │
 Browser ──── WSS ────▶ │ Caddy ──▶ /livekit ──▶  LiveKit SFU                   │
 Browser ──── UDP ────▶ │ (direct, no proxy)                                     │
                        └─────────────────────────────────────────────────────────┘

Path 1 — Token handshake (HTTPS): Your app calls the RTCstack API to get a signed LiveKit JWT. The API is a thin Fastify server; no media passes through it.

Path 2 — Call (WebRTC): The browser connects directly to LiveKit via WebSocket + UDP. The API is not in the media path; all audio/video is end-to-end encrypted at the SFU layer.

Features

Feature	Status
Video & audio conferencing	✅
Screen sharing	✅
Room recording (composite + per-track)	✅
RTMP / WHIP ingress (stream from OBS, etc.)	✅
Text chat over data channel	✅
Reactions	✅
Live transcription (Whisper, per-room)	✅ optional
Post-call transcription (async, from recording)	✅ optional
Role-based access (host, moderator, participant, viewer)	✅
Device selection (mic, camera, speaker)	✅
Connection quality badges	✅
HMAC-signed API requests + replay protection	✅
Per-key rate limiting	✅
Webhook delivery with retries	✅
TLS via Caddy (automatic or manual cert)	✅
TURN/STUN relay via coturn	✅
S3-compatible recording storage via MinIO	✅
React, Vue 3, Vanilla JS UI kits	✅
Framework-agnostic TypeScript SDK	✅

Quick Start

Prerequisites

Docker 24+ with Compose v2
openssl (for generating secrets)

1. Configure

cd docker
cp .env.example .env

Edit .env — at minimum replace every REPLACE_WITH_* value. You can generate secrets with:

openssl rand -base64 32   # for API_KEY, API_SECRET
openssl rand -base64 32   # for REDIS_PASSWORD, LIVEKIT_API_SECRET, etc.

2. Start

docker compose up -d

This brings up: Caddy, LiveKit, Egress, coturn, the RTCstack API, Redis, and MinIO.

3. Verify

curl http://localhost:3246/v1/health

{
  "status": "ok",
  "version": "0.1.0",
  "livekit": "connected",
  "redis": "connected",
  "minio": "connected",
  "features": {
    "recording": true,
    "transcriptionLive": false,
    "transcriptionPost": false
  }
}

4. Get a token and connect

curl -X POST http://localhost:3246/v1/token \
  -H "Content-Type: application/json" \
  -H "X-Api-Key: <API_KEY>" \
  -H "X-RTCstack-Timestamp: $(date +%s)" \
  -H "X-RTCstack-Signature: sha256=<hmac>" \
  -d '{ "roomId": "my-room", "userId": "alice", "displayName": "Alice", "role": "host" }'

Use the returned token and url with the SDK to connect.

Project Structure

rtcStack/
├── apps/
│   ├── api/                    # Fastify REST API (Node.js / TypeScript)
│   ├── playground/             # Interactive dev playground
│   ├── website/                # VitePress documentation site
│   └── examples/
│       ├── react-example/      # React reference app
│       ├── vue-example/        # Vue 3 reference app
│       └── vanilla-example/    # Vanilla JS reference app
│
├── packages/
│   ├── sdk/                    # @rtcstack/sdk — framework-agnostic SDK
│   ├── ui-react/               # @rtcstack/ui-react — React component library
│   ├── ui-vue/                 # @rtcstack/ui-vue — Vue 3 component library
│   └── ui-vanilla/             # @rtcstack/ui-vanilla — Vanilla JS library
│
├── services/
│   ├── whisper/                # Whisper STT HTTP service (Python)
│   ├── stt-live/               # Live transcription agent (Python / LiveKit Agents)
│   └── stt-worker/             # Post-call transcription worker (Node.js)
│
├── docker/
│   ├── docker-compose.yml      # Main stack
│   ├── docker-compose.stt.yml  # Standalone STT stack (GPU machine)
│   ├── .env.example            # Environment template
│   ├── Caddyfile               # Reverse proxy config
│   ├── livekit.yaml            # LiveKit server config
│   ├── egress.yaml             # Recording config
│   └── turnserver.conf         # coturn TURN/STUN config
│
├── development/                # Architecture plans and standards docs
├── turbo.json                  # Turborepo task pipeline
└── pnpm-workspace.yaml

Packages

`@rtcstack/sdk`

Framework-agnostic TypeScript SDK. The only thing you need to build a custom UI.

import { createCall } from '@rtcstack/sdk'

const call = createCall({ token, url })

await call.connect()

call.on('participantJoined', (p) => console.log(p.name, 'joined'))
call.on('transcriptReceived', (seg) => console.log(seg.speaker, ':', seg.text))
call.on('speakingStarted', (id, name) => console.log(name, 'is speaking…'))

await call.toggleMic()
await call.startScreenShare()
await call.sendMessage('Hello!')
await call.disconnect()

Key APIs:

API	Description
`call.connect() / disconnect()`	Join / leave the room
`call.participants`	`Map<id, Participant>` — all remote participants
`call.localParticipant`	Local participant with media state
`call.activeSpeakers`	Current speakers sorted by audio level
`call.messages`	Chat history (capped at 500)
`call.devices`	Enumerated camera/mic/speaker devices
`call.toggleMic() / toggleCamera()`	Toggle local tracks
`call.startScreenShare() / stopScreenShare()`	Screen sharing
`call.sendMessage(text)`	Broadcast text chat
`call.sendReaction(emoji)`	Send floating emoji reaction
`call.switchDevice(kind, deviceId)`	Switch active device
`call.setLayout(layout)`	`'grid' \| 'speaker' \| 'spotlight'`
`call.pin(participantId)`	Pin a participant

Events:

Event	Payload
`connectionStateChanged`	`ConnectionState`
`participantJoined` / `participantLeft`	`Participant`
`participantUpdated`	`Participant`
`activeSpeakerChanged`	`Participant[]`
`screenShareStarted` / `screenShareStopped`	`Participant` / `id`
`messageReceived`	`Message`
`reactionReceived`	`(from: string, emoji: string)`
`transcriptReceived`	`TranscriptSegment`
`speakingStarted`	`(speakerId: string, speakerName: string)`
`speakingStopped`	`speakerId: string`
`recordingStarted` / `recordingStopped`	—
`reconnecting` / `reconnected`	—
`disconnected`	`reason?: string`
`devicesChanged`	`DeviceList`

For testing without a LiveKit server:

import { createMockCall } from '@rtcstack/sdk/mock'
const call = createMockCall()

`@rtcstack/ui-react`

React 18 component library. Batteries included — full conference room or individual composable pieces.

import { CallProvider, VideoConference } from '@rtcstack/ui-react'
import '@rtcstack/ui-react/dist/styles.css'

function App() {
  return (
    <CallProvider call={call}>
      <VideoConference />
    </CallProvider>
  )
}

Hooks:

const participants   = useParticipants()
const local          = useLocalParticipant()
const speakers       = useActiveSpeakers()
const messages       = useMessages()
const transcripts    = useTranscription()     // TranscriptSegment[]
const speaking       = useSpeakingIndicators() // Map<speakerId, speakerName>
const { toggleMic }  = useMediaControls()

`@rtcstack/ui-vue`

Vue 3 component library with Composition API composables.

<script setup>
import { VideoConference } from '@rtcstack/ui-vue'
import '@rtcstack/ui-vue/dist/styles.css'
</script>

<template>
  <VideoConference />
</template>

Composables mirror the React hooks: useParticipants(), useLocalParticipant(), useTranscription(), useSpeakingIndicators(), useMediaControls(), etc.

`@rtcstack/ui-vanilla`

No-framework JavaScript library. Mount a full conference room into any HTMLElement.

import { mountVideoConference } from '@rtcstack/ui-vanilla'

const { unmount } = mountVideoConference(document.getElementById('room'), { call })

// When done:
unmount()

API Reference

All endpoints live under /v1/. Every request (except /v1/health) must be authenticated — see Authentication.

Health

Method	Path	Auth	Description
GET	`/v1/health`	None	Status and feature flags

Tokens

Method	Path	Description
POST	`/v1/token`	Generate a LiveKit JWT for a user
POST	`/v1/token/refresh`	Refresh an expiring token

POST /v1/token body:

{
  "roomId": "string",
  "userId": "string",
  "displayName": "string",
  "role": "host | moderator | participant | viewer",
  "metadata": {}
}

Response:

{
  "token": "<jwt>",
  "url": "wss://your-domain/livekit",
  "expiresAt": 1714000000
}

Rooms

Method	Path	Description
POST	`/v1/rooms`	Create a room
GET	`/v1/rooms`	List active rooms
GET	`/v1/rooms/:roomId`	Room details
DELETE	`/v1/rooms/:roomId`	Close room (disconnects all participants)

Participants

Method	Path	Description
GET	`/v1/rooms/:roomId/participants`	List participants
DELETE	`/v1/rooms/:roomId/participants/:participantId`	Kick participant
POST	`/v1/rooms/:roomId/participants/:participantId/mute`	Server-side mic mute

Recording

Method	Path	Description
POST	`/v1/rooms/:roomId/recording/start`	Start composite recording
POST	`/v1/rooms/:roomId/recording/stop`	Stop recording
GET	`/v1/rooms/:roomId/recording`	Recording status and egress ID

Recordings are stored in MinIO under the rtcstack-recordings bucket.

Ingress (RTMP / WHIP)

Method	Path	Description
POST	`/v1/rooms/:roomId/ingress`	Create RTMP or WHIP ingress endpoint
GET	`/v1/rooms/:roomId/ingress`	List ingress endpoints
DELETE	`/v1/rooms/:roomId/ingress/:ingressId`	Stop ingress

Webhooks

Method	Path	Description
GET	`/v1/webhooks`	List webhooks
POST	`/v1/webhooks`	Create webhook (url, events[], secret)
GET	`/v1/webhooks/:id`	Webhook details
PATCH	`/v1/webhooks/:id`	Update webhook
DELETE	`/v1/webhooks/:id`	Delete webhook
POST	`/v1/webhooks/:id/test`	Send a test event

Webhook events: room_started, room_finished, participant_joined, participant_left, recording_started, recording_finished, transcription_completed.

Transcription (optional)

Requires TRANSCRIPTION_LIVE_ENABLED=true or TRANSCRIPTION_POST_ENABLED=true and the stt-live / stt-post Docker profile.

Method	Path	Description
POST	`/v1/rooms/:roomId/transcription/start`	Start live transcription (dispatches agent)
POST	`/v1/rooms/:roomId/transcription/stop`	Stop live transcription
GET	`/v1/rooms/:roomId/transcription`	Live transcript segments
PATCH	`/v1/rooms/:roomId/transcription/speakers`	Update speaker display names
POST	`/v1/recordings/:recordingId/transcribe`	Queue post-call transcription
GET	`/v1/transcriptions`	List all transcripts
GET	`/v1/transcriptions/:id`	Full transcript document
DELETE	`/v1/transcriptions/:id`	Delete transcript

Authentication

Every API request (except /v1/health) must include three headers:

X-Api-Key: <API_KEY>
X-RTCstack-Timestamp: <unix-seconds>
X-RTCstack-Signature: sha256=<hmac>

Signature calculation:

canonical_string = METHOD + "\n" + PATH + "\n" + TIMESTAMP + "\n" + SHA256(body || "")
signature = "sha256=" + HMAC-SHA256(API_SECRET, canonical_string).hex()

Note: the body hash is always computed over the raw request body before it is parsed. For GET/DELETE requests with no body, use the SHA-256 of an empty string.

Protections:

Timing-safe comparison — resistant to timing attacks
Replay window — requests with timestamps older than 5 minutes are rejected
Per-key rate limiting — configurable via RATE_LIMIT_MAX / RATE_LIMIT_WINDOW_MS

Node.js example:

import { createHmac, createHash } from 'crypto'

function sign(method: string, path: string, body: string, apiKey: string, apiSecret: string) {
  const ts = Math.floor(Date.now() / 1000).toString()
  const bodyHash = createHash('sha256').update(body).digest('hex')
  const canonical = `${method}\n${path}\n${ts}\n${bodyHash}`
  const sig = 'sha256=' + createHmac('sha256', apiSecret).update(canonical).digest('hex')
  return {
    'X-Api-Key': apiKey,
    'X-RTCstack-Timestamp': ts,
    'X-RTCstack-Signature': sig,
  }
}

Configuration

Copy docker/.env.example to docker/.env and fill in your values.

Core

Variable	Default	Description
`DOMAIN`	`localhost`	Public domain name
`CADDY_TLS_MODE`	`internal`	`internal` (self-signed, dev) or `auto` (Let's Encrypt)
`API_KEY`	—	Pre-shared API key (generate with `openssl rand -base64 32`)
`API_SECRET`	—	HMAC signing secret
`NODE_ENV`	`production`	`development` disables some security checks
`TOKEN_TTL_SECONDS`	`21600`	LiveKit JWT lifetime (6 hours)

LiveKit

Variable	Description
`LIVEKIT_API_KEY`	LiveKit server API key
`LIVEKIT_API_SECRET`	LiveKit server API secret
`LIVEKIT_WSS_URL`	Public WebSocket URL for clients (e.g. `wss://yourdomain/livekit`)
`LIVEKIT_RTC_EXTERNAL_IP`	Server's public IP (for ICE candidates)

Redis

Variable	Default	Description
`REDIS_PASSWORD`	—	Redis auth password
`REDIS_URL`	—	Full Redis URL (used by API and agents)

MinIO (object storage)

Variable	Description
`MINIO_ROOT_USER`	MinIO admin username
`MINIO_ROOT_PASSWORD`	MinIO admin password
`EGRESS_S3_BUCKET`	Bucket for recordings (default: `rtcstack-recordings`)

CORS & Rate Limiting

Variable	Default	Description
`CORS_ORIGINS`	`*`	Comma-separated allowed origins. Empty = allow all
`RATE_LIMIT_MAX`	`100`	Requests per window per API key
`RATE_LIMIT_WINDOW_MS`	`60000`	Rate limit window in milliseconds

Ports

All host ports are configurable. Defaults:

Variable	Default	Service
`PORT_HTTP`	`3240`	Caddy HTTP
`PORT_HTTPS`	`3241`	Caddy HTTPS
`PORT_API`	`3246`	RTCstack API
`PORT_LIVEKIT`	`3242`	LiveKit WS
`PORT_REDIS`	`3247`	Redis
`PORT_MINIO`	`3248`	MinIO API
`PORT_MINIO_CONSOLE`	`3249`	MinIO Console
`PORT_TURN`	`3244`	coturn STUN/TURN UDP
`PORT_WHISPER`	`3281`	Whisper HTTP (STT only)

Docker Profiles

The main stack runs without any profile flag. Optional services are gated behind profiles.

Base stack (no profile)

docker compose up -d

Starts: Caddy, LiveKit, Egress, coturn, API, Redis, MinIO.

Live transcription

docker compose --profile stt-live up -d

Adds: Whisper (STT HTTP engine) + stt-live-agent (LiveKit Agents worker that joins rooms and streams audio to Whisper in real time).

Requires in .env:

TRANSCRIPTION_LIVE_ENABLED=true

Post-call transcription

docker compose --profile stt-post up -d

Adds: Whisper + stt-worker (processes recordings after calls end; queued via Redis).

Requires in .env:

TRANSCRIPTION_POST_ENABLED=true

Both

docker compose --profile stt-live --profile stt-post up -d

GPU acceleration for Whisper

docker compose --profile stt-live \
  -f docker-compose.yml \
  -f docker-compose.stt.gpu.yml \
  up -d

Requires NVIDIA GPU + nvidia-container-toolkit.

Speech-to-Text Add-ons

Live transcription

When started, the live transcription agent:

Joins the LiveKit room as an invisible participant
Subscribes to every audio track
Uses RMS energy detection to identify speech boundaries
Flushes completed speech chunks to Whisper (default: 1.5s silence or 30s max)
Applies hallucination filtering and word deduplication
Broadcasts each segment to all room participants via LiveKit data channel
Stores segments in Redis (retrievable via GET /v1/rooms/:roomId/transcription)

The SDK emits transcriptReceived(segment) and speakingStarted(id, name) / speakingStopped(id) events so your UI can show real-time indicators.

STT tuning variables:

Variable	Default	Description
`PAUSE_THRESHOLD_SECONDS`	`1.5`	Silence gap that closes a speech chunk
`SHORT_PAUSE_SECONDS`	`0.3`	Flush early if last chunk ended a sentence
`MAX_CHUNK_SECONDS`	`30.0`	Hard ceiling per Whisper request
`SPEECH_RMS_THRESHOLD`	`200`	PCM energy threshold for speech detection
`WHISPER_MAX_CONCURRENT`	`2`	Max parallel Whisper requests per room
`STT_LANGUAGE`	`en`	ISO 639-1 language code or `auto`
`WHISPER_MODEL`	`base`	`tiny` / `base` / `small` / `medium` / `large-v3`
`WHISPER_DEVICE`	`cpu`	`cpu` or `cuda`
`WHISPER_FILTER_PHRASES`	—	Comma-separated phrases to drop (Whisper hallucinations)

Tuning tip: If transcription never triggers, lower SPEECH_RMS_THRESHOLD first (try 100). If Whisper produces garbled output, increase PAUSE_THRESHOLD_SECONDS to give it longer chunks.

Post-call transcription

After a recording finishes, call POST /v1/recordings/:recordingId/transcribe. The stt-worker downloads the recording from MinIO, submits it to Whisper in chunks, merges the per-speaker segments, and saves the result back to MinIO. A transcription_completed webhook fires when done.

Examples

Three ready-to-run reference apps are in apps/examples/.

React

cd apps/examples/react-example
pnpm install
pnpm dev

Vue 3

cd apps/examples/vue-example
pnpm install
pnpm dev

Vanilla JS

cd apps/examples/vanilla-example
pnpm install
pnpm dev

All examples expect the API to be running at http://localhost:3246. Edit src/config.ts (or equivalent) to point at a remote stack.

Development

Requirements

Tool	Version
Node.js	≥ 20
pnpm	≥ 9
Docker	≥ 24 with Compose v2
Python	≥ 3.11 (STT services only)

Install

pnpm install

Build all packages

pnpm build

Run tests

pnpm test

Type check

pnpm typecheck

Local dev — full stack

cd docker
docker compose up -d          # infrastructure

cd ../apps/api
pnpm dev                      # API in watch mode on :3246

Local dev — SDK + UI kits only (no Docker)

pnpm dev                      # watches all packages in parallel

The React and Vue UI kits have Storybook:

cd packages/ui-react
pnpm storybook                # http://localhost:6006

cd packages/ui-vue
pnpm storybook

Adding a package

RTCstack uses pnpm workspaces and Turborepo. New packages go in packages/ or apps/; add them to pnpm-workspace.yaml if not already covered by the glob.

Versioning

Packages use Changesets:

pnpm changeset         # describe your change
pnpm changeset version # bump versions
pnpm changeset publish # publish to npm

Infrastructure Services

Service	Image	Role
Caddy	`caddy:2.8-alpine`	TLS termination, reverse proxy, WSS forwarding
LiveKit	`livekit/livekit-server:v1.11`	WebRTC SFU — audio, video, data channels
Egress	`livekit/egress:v1.12`	Room recording engine
coturn	`coturn/coturn:4.6.2-alpine`	STUN/TURN relay for symmetric NAT traversal
RTCstack API	Custom Dockerfile	Token generation, room management REST API
Redis	`redis:7-alpine`	Rate limiting, session state, STT queues
MinIO	`minio/minio`	S3-compatible storage for recordings and transcripts
Whisper (optional)	Custom Dockerfile	OpenAI Whisper speech-to-text HTTP service
stt-live-agent (optional)	Custom Dockerfile	Live transcription agent (LiveKit Agents SDK)
stt-worker (optional)	Custom Dockerfile	Post-call transcription queue worker

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
apps		apps
assets/logo		assets/logo
docker		docker
packages		packages
services		services
.gitattributes		.gitattributes
.gitignore		.gitignore
.prettierrc		.prettierrc
README.md		README.md
eslint.config.js		eslint.config.js
package.json		package.json
pnpm-workspace.yaml		pnpm-workspace.yaml
tsconfig.base.json		tsconfig.base.json
turbo.json		turbo.json

Folders and files

Latest commit

History

Repository files navigation

RTCstack

Contents

Architecture

Features

Quick Start

Prerequisites

1. Configure

2. Start

3. Verify

4. Get a token and connect

Project Structure

Packages

@rtcstack/sdk

@rtcstack/ui-react

@rtcstack/ui-vue

@rtcstack/ui-vanilla

API Reference

Health

Tokens

Rooms

Participants

Recording

Ingress (RTMP / WHIP)

Webhooks

Transcription (optional)

Authentication

Configuration

Core

LiveKit

Redis

MinIO (object storage)

CORS & Rate Limiting

Ports

Docker Profiles

Base stack (no profile)

Live transcription

Post-call transcription

Both

GPU acceleration for Whisper

Speech-to-Text Add-ons

Live transcription

Post-call transcription

Examples

React

Vue 3

Vanilla JS

Development

Requirements

Install

Build all packages

Run tests

Type check

Local dev — full stack

Local dev — SDK + UI kits only (no Docker)

Adding a package

Versioning

Infrastructure Services

License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

`@rtcstack/sdk`

`@rtcstack/ui-react`

`@rtcstack/ui-vue`

`@rtcstack/ui-vanilla`

Packages