Local voice assistant that learns new abilities from auto-discovered n8n workflows, exposed as tools over MCP.

Built on LiveKit Agents with a fully local STT/TTS/LLM stack: Speaches (Faster-Whisper STT), Kokoro (TTS), and Ollama.
- Local Voice Pipeline: Speaches (Faster-Whisper STT) + Kokoro (TTS) + Ollama LLM
- Wake Word Detection: "Hey Cal" activation via Picovoice Porcupine
- n8n Integrations: Home Assistant, APIs, databases - anything n8n can connect to
- Web Search: DuckDuckGo integration for real-time information
- Webhook API: External triggers for announcements and tool reload
- Mobile App: Flutter client for Android and iOS (see `mobile/`)
```bash
# Clone and configure
git clone https://github.com/CoreWorxLab/caal.git
cd caal
cp .env.example .env
nano .env  # Set CAAL_HOST_IP, OLLAMA_HOST, N8N_MCP_URL, N8N_MCP_TOKEN

# Deploy
docker compose up -d
```

Open http://YOUR_SERVER_IP:3000 from any device on your network.
Requirements:
- Docker with NVIDIA Container Toolkit (for GPU acceleration)
- Ollama running on your network
- n8n with MCP enabled (Settings > MCP Access)
- 12GB+ VRAM recommended
CAAL supports three network configurations:
| Mode | Voice From | Access URL | Command |
|---|---|---|---|
| LAN HTTP | Host machine only | http://localhost:3000 | `docker compose up -d` |
| LAN HTTPS | Any LAN device | https://192.168.1.100 | `docker compose --profile https up -d` |
| Tailscale | Anywhere | https://your-machine.tailnet.ts.net | `docker compose --profile https up -d` |
Why the difference? Browsers block microphone access on HTTP except from localhost. HTTPS is required for voice from other devices.
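The same constraint extends to the LiveKit signaling connection: a page served over HTTPS may only open `wss://` sockets, which is why the frontend bakes the URL in at build time. A minimal sketch of the scheme choice (the helper name is illustrative, not from the codebase):

```python
def livekit_url(host: str, https: bool, port: int = 7880) -> str:
    """Pick the WebSocket scheme to match how the page is served.

    Browsers block mixed content: a page loaded over https:// may
    only open wss:// sockets, so the signaling URL must follow suit.
    """
    scheme = "wss" if https else "ws"
    return f"{scheme}://{host}:{port}"

# LAN HTTP mode: plain WebSocket, voice only works from localhost
print(livekit_url("localhost", https=False))      # ws://localhost:7880
# HTTPS profile: secure WebSocket, voice works from any device
print(livekit_url("192.168.1.100", https=True))   # wss://192.168.1.100:7880
```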
Simplest setup. Voice works from the host machine; other devices can use text chat:
```bash
# Set your LAN IP in .env
CAAL_HOST_IP=192.168.1.100

# Start
docker compose up -d
```

Full voice from any device on your LAN using locally-trusted certificates:
1. Install mkcert and generate certs:
```bash
# Install mkcert (Arch/Manjaro)
sudo pacman -S mkcert

# Install mkcert (Ubuntu/Debian)
sudo apt install mkcert

# Install mkcert (macOS)
brew install mkcert

# Install local CA (one-time, may need browser restart)
mkcert -install

# Generate cert for your LAN IP
mkcert 192.168.1.100

# Move to certs folder with standard names
mkdir -p certs
mv 192.168.1.100.pem certs/server.crt
mv 192.168.1.100-key.pem certs/server.key
```

2. Configure .env:

```bash
CAAL_HOST_IP=192.168.1.100
HTTPS_DOMAIN=192.168.1.100
```

3. Set key permissions and rebuild the frontend:

```bash
chmod 644 certs/server.key  # nginx needs read access

# Frontend bakes in the wss:// URL at build time - must rebuild
docker compose --profile https build frontend
```

4. Start with the HTTPS profile:

```bash
docker compose --profile https up -d
```

5. Access from any LAN device:

https://192.168.1.100
Note: Other devices on your LAN need the mkcert CA installed to avoid certificate warnings. Run `mkcert -CAROOT` to find the CA cert, then install it on those devices.
Access CAAL from anywhere with HTTPS via Tailscale:
1. Generate Tailscale certificates:
```bash
# Get your Tailscale hostname
tailscale status | head -1

# Generate certs (replace with your hostname)
tailscale cert your-machine.tailnet.ts.net

# Move certs to project with standard names
mkdir -p certs
mv your-machine.tailnet.ts.net.crt certs/server.crt
mv your-machine.tailnet.ts.net.key certs/server.key
```

2. Configure .env:

```bash
CAAL_HOST_IP=100.x.x.x                    # Your Tailscale IP (tailscale ip -4)
HTTPS_DOMAIN=your-machine.tailnet.ts.net  # Your Tailscale hostname
```

3. Rebuild the frontend and start:

```bash
# Frontend bakes in the wss:// URL at build time - must rebuild
docker compose --profile https build frontend

# Start all services
docker compose --profile https up -d
```

4. Access from any Tailscale device:

https://your-machine.tailnet.ts.net
```
┌───────────────────────────────────────────────────────────────────────┐
│                       Docker Compose Stack                            │
│                                                                       │
│  ┌────────────┐  ┌────────────┐  ┌────────────┐  ┌────────────┐       │
│  │  Frontend  │  │  LiveKit   │  │  Speaches  │  │   Kokoro   │       │
│  │ (Next.js)  │  │   Server   │  │ (STT, GPU) │  │ (TTS, GPU) │       │
│  │   :3000    │  │   :7880    │  │   :8000    │  │   :8880    │       │
│  └─────┬──────┘  └─────┬──────┘  └─────┬──────┘  └─────┬──────┘       │
│        │               │               │               │              │
│        │               └───────────────┼───────────────┘              │
│        └───────────────────────┐       │                              │
│                                │       │                              │
│                      ┌─────────┴───────┴─┐                            │
│                      │       Agent       │                            │
│                      │ (Voice Pipeline)  │                            │
│                      │ :8889 (webhooks)  │                            │
│                      └─────────┬─────────┘                            │
│                                │                                      │
└────────────────────────────────┼──────────────────────────────────────┘
                                 │
               ┌─────────────────┼─────────────────┐
               │                 │                 │
         ┌─────┴─────┐     ┌─────┴─────┐     ┌─────┴─────┐
         │  Ollama   │     │    n8n    │     │   Your    │
         │   (LLM)   │     │ Workflows │     │   APIs    │
         └───────────┘     └───────────┘     └───────────┘
              External Services (on your network)
```
| Variable | Description | Default |
|---|---|---|
| `CAAL_HOST_IP` | Your server's LAN IP (required for WebRTC) | - |
| `N8N_MCP_URL` | n8n MCP server URL (required) | - |
| `LIVEKIT_URL` | LiveKit server URL | `ws://localhost:7880` |
| `SPEACHES_URL` | Speaches STT server URL | `http://localhost:8000` |
| `KOKORO_URL` | Kokoro TTS server URL | `http://localhost:8880` |
| `WHISPER_MODEL` | Faster-Whisper model | `Systran/faster-whisper-medium` |
| `TTS_VOICE` | Kokoro voice name | `am_puck` |
| `OLLAMA_HOST` | Ollama server URL | `http://localhost:11434` |
| `OLLAMA_MODEL` | LLM model name | `ministral-3:8b` |
| `OLLAMA_THINK` | Enable thinking mode (slower) | `false` |
| `PORCUPINE_ACCESS_KEY` | Picovoice key for wake word | - |
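These variables are read from the environment at startup; a sketch of how the documented defaults might map onto a settings loader (illustrative, not the project's actual loader):

```python
import os

def load_settings() -> dict:
    """Read CAAL settings from the environment, falling back to the
    documented defaults. A sketch; the real loader may differ."""
    return {
        "livekit_url": os.getenv("LIVEKIT_URL", "ws://localhost:7880"),
        "speaches_url": os.getenv("SPEACHES_URL", "http://localhost:8000"),
        "kokoro_url": os.getenv("KOKORO_URL", "http://localhost:8880"),
        "whisper_model": os.getenv("WHISPER_MODEL", "Systran/faster-whisper-medium"),
        "tts_voice": os.getenv("TTS_VOICE", "am_puck"),
        "ollama_host": os.getenv("OLLAMA_HOST", "http://localhost:11434"),
        "ollama_model": os.getenv("OLLAMA_MODEL", "ministral-3:8b"),
        # OLLAMA_THINK is a boolean flag stored as a string
        "ollama_think": os.getenv("OLLAMA_THINK", "false").lower() == "true",
    }

settings = load_settings()
```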
CAAL discovers tools from n8n workflows via MCP. Each workflow with a webhook trigger becomes a voice command.
Example workflows are included in the n8n-workflows/ folder:
```bash
cd n8n-workflows
cp config.env.example config.env
nano config.env  # Set your n8n IP and API key
python setup.py  # Creates all workflows in n8n
```

- Enable MCP in n8n: Settings > MCP Access > Enable MCP
- Set the connection method to Access Token and copy the token
- Enable workflow access in each workflow's settings
- Set `N8N_MCP_URL` in `.env` to your n8n MCP endpoint (e.g., `http://192.168.1.100:5678/mcp-server/http`)
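Under MCP, tool discovery is a JSON-RPC 2.0 exchange: the client sends a `tools/list` request to the configured endpoint. A sketch of that request body, built with the standard library (the agent's MCP client handles this internally):

```python
import json

def tools_list_request(request_id: int = 1) -> str:
    """Build the JSON-RPC 2.0 body an MCP client sends to ask the
    server which tools it exposes."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/list",
    })

print(tools_list_request())
```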
| Workflow | Voice Command |
|---|---|
| `espn_get_nfl_scores` | "What are the NFL scores?" |
| `calendar_get_events` | "What's on my calendar today?" |
| `hass_control` | "Turn on the office lamp" |
| `radarr_search_movies` | "Do I have any Batman movies?" |
See n8n-workflows/README.md for full documentation.
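The `setup.py` deployment above boils down to POSTing each workflow definition to n8n's REST API. A standard-library sketch (the endpoint and header follow n8n's public API; the address, key, and workflow body are placeholders):

```python
import json
import urllib.request

def create_workflow_request(n8n_url: str, api_key: str,
                            workflow: dict) -> urllib.request.Request:
    """Build (but do not send) the n8n REST API call that creates a
    workflow. Pass the result to urllib.request.urlopen to send it."""
    return urllib.request.Request(
        url=f"{n8n_url}/api/v1/workflows",
        data=json.dumps(workflow).encode(),
        headers={
            "Content-Type": "application/json",
            "X-N8N-API-KEY": api_key,  # key from n8n's API settings
        },
        method="POST",
    )

req = create_workflow_request(
    "http://192.168.1.100:5678",  # placeholder n8n address
    "your_n8n_api_key",           # placeholder key
    {"name": "espn_get_nfl_scores", "nodes": [], "connections": {}},
)
```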
CAAL supports "Hey Cal" wake word detection using Picovoice Porcupine.
Setup:
- Get a free access key from the Picovoice Console
- Train a custom "Hey Cal" wake word and download the Web (WASM) model
- Place your custom wake word model in `frontend/public/` as `hey_cal.ppn` (the bundled file must be replaced with your own)
- Add to `.env`: `PORCUPINE_ACCESS_KEY=your_key_here`
- Rebuild the frontend: `docker compose build frontend && docker compose up -d`
Usage:
- Toggle wake word on/off with the ear icon in the control bar
- Say "Hey Cal" to activate - CAAL responds with a greeting
- Conversation continues until the agent finishes speaking
External systems can trigger CAAL actions via HTTP:
| Endpoint | Method | Description |
|---|---|---|
| `/announce` | POST | Make CAAL speak a message |
| `/wake` | POST | Trigger wake word greeting |
| `/reload-tools` | POST | Refresh MCP tool cache |
| `/health` | GET | Health check |
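The curl examples below can equally be driven from Python; a standard-library sketch that builds the request without sending it (port 8889 per the examples; uncomment the `urlopen` call to fire it):

```python
import json
import urllib.request

CAAL_WEBHOOK = "http://localhost:8889"  # agent webhook port

def announce_request(message: str) -> urllib.request.Request:
    """Build a POST to /announce; pass the result to urlopen to send."""
    return urllib.request.Request(
        url=f"{CAAL_WEBHOOK}/announce",
        data=json.dumps({"message": message}).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = announce_request("Package delivered at front door")
# urllib.request.urlopen(req)  # uncomment to actually send
```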
Example - Announce:

```bash
curl -X POST http://localhost:8889/announce \
  -H "Content-Type: application/json" \
  -d '{"message": "Package delivered at front door"}'
```

Example - Reload Tools:

```bash
curl -X POST http://localhost:8889/reload-tools \
  -H "Content-Type: application/json" \
  -d '{"tool_name": "calendar_create_event"}'
```

A Flutter mobile client is available in the mobile/ directory for Android and iOS.
```bash
cd mobile
cp .env.example .env
nano .env  # Set CAAL_SERVER_URL to your server
flutter pub get
flutter run
```

Note: Wake word support requires training separate mobile models in the Picovoice Console (the Web (WASM) models don't work on mobile).
See mobile/README.md for full documentation.
```bash
# Install dependencies
uv sync

# Start infrastructure (LiveKit + Speaches + Kokoro)
docker compose up -d livekit speaches kokoro

# Run agent locally
uv run voice_agent.py dev

# Run frontend locally
cd frontend && pnpm install && pnpm dev
```

Development commands:

```bash
uv run ruff check src/   # Lint
uv run mypy src/         # Type check
uv run pytest            # Test
```

```
caal/
├── voice_agent.py         # Main entry point
├── .env                   # Environment variables
├── docker-compose.yaml    # Docker deployment
├── prompt/
│   └── default.md         # System prompt template
├── frontend/              # Next.js web interface
│   ├── public/            # Wake word models go here
│   └── components/        # UI components
├── mobile/                # Flutter mobile app
│   ├── lib/               # Dart source code
│   ├── android/           # Android config
│   └── ios/               # iOS config
├── n8n-workflows/         # Example n8n workflows
│   ├── setup.py           # One-command deployment
│   ├── config.env.example # Configuration template
│   └── *.json             # Workflow definitions
└── src/caal/
    ├── integrations/      # n8n MCP, web search
    ├── llm/               # Ollama with think parameter
    ├── webhooks.py        # HTTP API endpoints
    └── utils/             # Formatting helpers
```
Symptom: Frontend loads but voice doesn't work

- Check `CAAL_HOST_IP` in `.env` - it must match your network mode:
  - LAN HTTP/HTTPS: your LAN IP (e.g., 192.168.1.100)
  - Tailscale: your Tailscale IP (`tailscale ip -4`)
- Check your firewall - these ports must be open:

| Port | Protocol | Purpose |
|---|---|---|
| 3000 | TCP | Web UI |
| 7880 | TCP | WebSocket signaling |
| 7881 | TCP/UDP | WebRTC fallback |
| 50000-50100 | UDP | WebRTC media |

- Check LiveKit logs:

```bash
docker compose logs livekit | grep -i "ice\|error"
```

```bash
# Check agent logs
docker compose logs -f agent

# Verify Speaches (STT) is healthy
curl http://localhost:8000/health

# Verify Kokoro (TTS) is healthy
curl http://localhost:8880/health

# Verify Ollama is reachable
curl http://YOUR_OLLAMA_IP:11434/api/tags
```

Symptom: Agent logs show "error connecting to Ollama"
Ollama defaults to localhost only. Start it with network binding:

```bash
OLLAMA_HOST=0.0.0.0 ollama serve
```

Or set it in your shell profile:

```bash
export OLLAMA_HOST=0.0.0.0
```

Symptom: Frontend times out waiting for the agent, especially on first connection

Ollama unloads models after 5 minutes by default. On slower drives (HDDs), reloading can take too long.

Option 1 - Keep the model loaded:

```bash
OLLAMA_HOST=0.0.0.0 OLLAMA_KEEP_ALIVE=24h ollama serve
```

Option 2 - Pre-load the model before connecting:

```bash
ollama run qwen3:8b  # or your configured model
```

- Verify `N8N_MCP_URL` and `N8N_MCP_TOKEN` in `.env`
- Check n8n has MCP enabled (Settings > MCP Access)
- Ensure workflows have webhook triggers and are active
Normal - models download on first run (~2-5 minutes). Watch with:

```bash
docker compose logs -f speaches kokoro
```

To regenerate LiveKit API keys:

```bash
# Generate new API keys
docker run --rm livekit/livekit-server generate-keys

# Update .env and livekit.yaml with the generated values
```

For HTTPS, see Network Modes. Options:
- LAN HTTPS (mkcert): Full voice from any device on your local network
- Tailscale: Full voice from anywhere via Tailscale network
Both use the same `--profile https` flag and nginx for TLS termination.
- No Streaming STT: Faster-Whisper uses batch processing (it waits for speech to end). This is a fundamental limitation of Whisper-based solutions.
- Wake Word Models: The Python `.ppn` models don't work in the browser - you need the Web (WASM) version from Picovoice.
- LiveKit Agents - Voice agent framework
- Speaches - Faster-Whisper STT server
- Kokoro-FastAPI - Kokoro TTS server
- Ollama - Local LLM server
- n8n - Workflow automation
- Picovoice Porcupine - Wake word engine
MIT License - see LICENSE for details.