AVA answers your calls when you can't, holds a natural conversation with a human-like persona, and keeps you in the loop via Signal. You can send live instructions mid-call from your phone.
```mermaid
graph TB
subgraph External["EXTERNAL SERVICES"]
Twilio["Twilio<br/>Voice / PSTN<br/>STT (Gather)<br/>Webhooks"]
OpenAI["OpenAI<br/>GPT-4o (conversation)<br/>TTS (fallback)"]
ElevenLabs["ElevenLabs<br/>TTS (primary voice)<br/>eleven_multilingual_v2"]
end
subgraph Docker["DOCKER HOST (your server)"]
subgraph Ingress["INGRESS (choose one)"]
Caddy["Caddy :443/:80<br/>Let's Encrypt<br/>auto HTTPS"]
Cloudflared["Cloudflare Tunnel<br/>outbound, no open ports"]
end
subgraph AVA["AVA (FastAPI :8000)"]
Main["main.py<br/>Call routing<br/>Twilio hooks<br/>Rate limiter<br/>Audio serve<br/>Diagnostics"]
Conv["conversation.py<br/>GPT-4o loop<br/>Streaming<br/>Meta parsing<br/>Summarizer"]
TTS["tts.py<br/>ElevenLabs → OpenAI<br/>→ Polly (fallback)<br/>Cache (MD5)<br/>Circuit breaker"]
Owner["owner_channel.py<br/>Signal notify<br/>Signal poll (3s)<br/>Slash commands<br/>Instructions"]
Contact["contact_lookup.py<br/>contacts.json<br/>Twilio CNAM<br/>E.164 normalize<br/>Lang from prefix"]
I18n["i18n.py<br/>8+ languages<br/>Signal templates<br/>Polly voices<br/>Twilio codes"]
end
SignalCLI["signal-cli :8080<br/>REST API<br/>Native mode<br/>Self-hosted"]
subgraph Volumes["Persistent Volumes"]
TTSCache["tts_cache (MP3s)"]
CallData["/data/calls/ (JSON)"]
Contacts["/data/contacts.json"]
SignalData["signal_data"]
end
end
OwnerPhone["Owner's Phone<br/>(Signal app)"]
Twilio -->|"HTTPS webhooks"| Caddy
Twilio -->|"HTTPS webhooks"| Cloudflared
Caddy -->|"ava-net"| Main
Cloudflared -->|"ava-net"| Main
Main <--> Conv
Main <--> TTS
Main <--> Owner
Main <--> Contact
Conv <--> I18n
Main <--> I18n
Conv -->|"HTTPS"| OpenAI
TTS -->|"HTTPS"| ElevenLabs
TTS -->|"HTTPS"| OpenAI
Owner -->|"HTTP (ava-net)"| SignalCLI
SignalCLI <-->|"Signal protocol"| OwnerPhone
TTS --> TTSCache
Main --> CallData
Contact --> Contacts
SignalCLI --> SignalData
style External fill:#f9f0ff,stroke:#7c3aed
style Docker fill:#f0f9ff,stroke:#2563eb
style AVA fill:#ecfdf5,stroke:#059669
style Ingress fill:#fef3c7,stroke:#d97706
style Volumes fill:#fef2f2,stroke:#dc2626
```
```mermaid
sequenceDiagram
participant Caller as Caller's Phone
participant Twilio as Twilio (PSTN + STT)
participant AVA as AVA Server
participant GPT as OpenAI GPT-4o
participant TTS as ElevenLabs / OpenAI TTS
participant Signal as Owner (Signal)
Caller->>Twilio: Dials owner (call forwarded)
Twilio->>AVA: POST /twilio/incoming<br/>(CallSid, From, To)
Note over AVA: Contact lookup (local/CNAM)<br/>Detect lang from phone prefix<br/>(+41 → de-CH, +48 → pl-PL)
AVA-->>Signal: Incoming call notification
AVA->>TTS: Generate greeting TTS
TTS-->>AVA: MP3 audio URL
AVA->>Twilio: TwiML: Gather + Play<br/>speech_timeout=2s<br/>language=de-CH, enhanced=true
Twilio->>Caller: Plays greeting audio
loop Max 10 exchanges
Caller->>Twilio: Speaks
Twilio->>AVA: POST /process_speech<br/>(SpeechResult, Confidence)
Note over AVA: langdetect on text<br/>Pop Signal instructions
opt Owner sent instruction
Signal-->>AVA: "tell him I'll call back"
Note over AVA: Inject [RELAY_TO_CALLER: ...]<br/>into GPT user message
end
AVA->>GPT: Stream GPT-4o (user text + instructions)
GPT-->>AVA: Sentence chunks (streaming)
Note over AVA: TTS pipeline: start TTS on<br/>1st sentence while GPT<br/>still generates the rest
AVA->>TTS: TTS sentence 1 (parallel)
TTS-->>AVA: MP3 URL
AVA->>TTS: TTS remaining sentences
Note over AVA: Parse meta JSON<br/>end_call, urgency, topic,<br/>caller_name, lang
opt GPT switched language
Note over AVA: Update STT language<br/>for next Gather<br/>e.g. de-CH → pl-PL
end
AVA->>Twilio: TwiML: Gather + Play<br/>(updated STT language)
Twilio->>Caller: Plays response audio
opt Every 4 transcript entries
AVA-->>Signal: Live update<br/>(topic, last 6 lines)
end
end
Note over AVA: end_call=true OR<br/>END_CALL_NOW from owner
AVA->>Twilio: TwiML: Play + Hangup
Twilio->>Caller: Goodbye + disconnect
Twilio->>AVA: POST /twilio/status<br/>CallStatus=completed
AVA->>GPT: Summarize full transcript
GPT-->>AVA: Summary text
AVA-->>Signal: Call summary + priority
AVA-->>Signal: Full transcript
Note over AVA: Save JSON to /data/calls/<br/>Cleanup after 90s delay
```
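The "TTS pipeline" step in the diagram starts speech synthesis on the first sentence while GPT-4o is still streaming the rest. That depends on cutting the token stream at sentence boundaries. Below is a minimal sketch. It assumes the stream arrives as a plain iterable of text deltas and uses a naive punctuation rule. `iter_sentences` is a hypothetical helper, not AVA's actual code, and it leaves stripping of the trailing `<meta>` block to the caller.

```python
import re
from typing import Iterable, Iterator

# Naive boundary: ".", "!" or "?" followed by whitespace ("e.g. " also splits).
_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def iter_sentences(deltas: Iterable[str]) -> Iterator[str]:
    """Yield complete sentences as soon as they appear in a stream of text deltas."""
    buffer = ""
    for delta in deltas:
        buffer += delta
        parts = _BOUNDARY.split(buffer)
        # Every part except the last is finished; the last one may still grow.
        for sentence in parts[:-1]:
            if sentence.strip():
                yield sentence.strip()
        buffer = parts[-1]
    if buffer.strip():
        yield buffer.strip()  # flush whatever remains when the stream ends
```

In a pipelined setup, each yielded sentence would be handed to the TTS chain right away, for example as its own asyncio task. The first audio URL can then be ready before GPT finishes the reply.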
| Parameter | Value | Location | Description |
|---|---|---|---|
| speech_timeout | 2 s | main.py (all 4 Gather calls) | Silence after speech ends before Twilio fires the callback |
| enhanced | true | main.py (Gather) | Use the enhanced STT model for better accuracy |
| GPT max_tokens | 350 | conversation.py | Max response length per turn |
| GPT temperature | 0.75 | conversation.py | Creativity level for responses |
| Summary max_tokens | 400 | conversation.py | Max summary length |
| Summary temperature | 0.2 | conversation.py | Low creativity for factual summaries |
| Context window | last 20 messages | conversation.py | Sliding window of conversation history |
| Hard turn limit | 10 exchanges | conversation.py | AVA wraps up after 10 user turns |
| Wrap-up warning | 8+ exchanges | conversation.py | System prompt warns AVA to end soon |
| ElevenLabs timeout | 15 s | tts.py (httpx) | HTTP timeout for the TTS API |
| ElevenLabs circuit breaker | 10 min | tts.py | Disabled after 401/403/429, resets automatically |
| Signal poll interval | 3 s | main.py / owner_channel.py | How often AVA checks for new Signal messages |
| Signal HTTP timeout | 10 s | owner_channel.py (httpx) | Timeout for Signal API calls |
| CNAM lookup timeout | 5 s | contact_lookup.py (httpx) | Twilio CNAM API timeout |
| Rate limiter | 30 req/min per IP | main.py | Sliding window, auto-cleanup every 5 min |
| Rate limiter cleanup | 5 min | main.py | Stale entry eviction interval |
| Call state cleanup | 90 s after end | main.py | Delayed cleanup of in-memory call state |
| TTS cache | no expiry | tts.py | Keyed by MD5(lang:text), persists in a Docker volume |
| Seen Signal timestamps | 500 entries | owner_channel.py | Deque for deduplication |
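The rate-limiter rows (30 requests per minute per IP, sliding window, stale entries evicted every 5 minutes) describe a sliding-window log. Below is a minimal sketch using those parameters. The class name and the injectable `clock` are illustrative assumptions, not taken from main.py.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SlidingWindowLimiter:
    """Allow at most `limit` requests per `window` seconds for each client IP."""

    def __init__(self, limit: int = 30, window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock
        self.hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, ip: str) -> bool:
        now = self.clock()
        q = self.hits[ip]
        # Drop timestamps that have slid out of the window.
        while q and now - q[0] >= self.window:
            q.popleft()
        if len(q) >= self.limit:
            return False  # the web layer would answer 429
        q.append(now)
        return True

    def cleanup(self) -> None:
        """Evict IPs with no hit inside the window (run e.g. every 5 minutes)."""
        now = self.clock()
        stale = [ip for ip, q in self.hits.items() if not q or now - q[-1] >= self.window]
        for ip in stale:
            del self.hits[ip]
```

Rejected requests are not recorded, so a client that keeps hammering is readmitted as soon as its oldest accepted hit ages out. Counting rejections too would be a stricter variant.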
```mermaid
flowchart TD
Start([CALL START]) --> Prefix["Phone prefix detection<br/>+41 → de-CH<br/>+48 → pl-PL<br/>+44 → en-GB<br/>(52 prefixes)"]
Prefix --> ContactCheck{Contact has<br/>lang override?}
ContactCheck -->|Yes| ContactLang["Use contact language<br/>contacts.json<br/>e.g. {lang: pl}"]
ContactCheck -->|No| PrefixLang["Use prefix language"]
ContactLang --> Gather
PrefixLang --> Gather
Gather["Twilio STT Gather<br/>language = detected locale<br/>speech_timeout = 2s<br/>enhanced = true"]
Gather --> Speech["SpeechResult (text)"]
Speech --> Detect["langdetect on text<br/>(if 3+ words)<br/>e.g. Dzień dobry → pl"]
Detect --> GPT["GPT-4o processes text<br/>Responds in caller's language<br/>Returns meta with lang: pl"]
GPT --> Switch{GPT lang ≠<br/>current STT?}
Switch -->|Yes| Update["Switch STT language<br/>for NEXT Gather<br/>e.g. de-CH → pl-PL"]
Switch -->|No| Keep["Keep current STT language"]
Update --> Gather
Keep --> Gather
style Start fill:#059669,color:#fff
style Gather fill:#2563eb,color:#fff
style GPT fill:#7c3aed,color:#fff
style Switch fill:#d97706,color:#fff
```
Important limitation: Twilio STT only supports one language per Gather. If the caller speaks Polish but STT is set to German, the transcript will be garbled. The language switch only takes effect on the next turn.
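The selection rules above are: contact override, then phone prefix, then the fallback, and a switch only when GPT's `lang` differs from the current STT locale. Together with the Gather parameters, they can be sketched as follows. The two lookup tables are hypothetical, tiny subsets (the real prefix table has 52 entries). The `/process_speech` action path and Gather attributes mirror the diagrams. The functions themselves are illustrative, not AVA's code.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical subsets of the real tables in contact_lookup.py / i18n.py.
PREFIX_TO_LOCALE = {"+41": "de-CH", "+48": "pl-PL", "+44": "en-GB", "+1": "en-US"}
LANG_TO_LOCALE = {"de": "de-DE", "pl": "pl-PL", "en": "en-US"}


def initial_locale(e164: str, contact_lang: Optional[str] = None,
                   default: str = "en-US") -> str:
    """Contact override first, then the longest matching prefix, then the default."""
    if contact_lang in LANG_TO_LOCALE:
        return LANG_TO_LOCALE[contact_lang]
    for prefix in sorted(PREFIX_TO_LOCALE, key=len, reverse=True):
        if e164.startswith(prefix):
            return PREFIX_TO_LOCALE[prefix]
    return default


def next_locale(current: str, meta_lang: Optional[str]) -> str:
    """Switch STT only if GPT's two-letter `lang` differs from the current base language."""
    if not meta_lang or current.split("-")[0] == meta_lang:
        return current  # e.g. keep de-CH when GPT reports "de"
    return LANG_TO_LOCALE.get(meta_lang, current)


def gather_twiml(audio_url: str, locale: str) -> str:
    """Play the reply inside a speech <Gather> that listens in `locale`."""
    response = ET.Element("Response")
    gather = ET.SubElement(response, "Gather", {
        "input": "speech",
        "action": "/process_speech",
        "speechTimeout": "2",
        "language": locale,
        "enhanced": "true",
    })
    ET.SubElement(gather, "Play").text = audio_url
    return ET.tostring(response, encoding="unicode")
```

Comparing only the base language keeps a regional locale such as de-CH when GPT reports plain "de". This avoids needlessly downgrading to a generic German model.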
```mermaid
flowchart TD
Input["Text to speak"] --> Cache{Disk cache hit?<br/>key = MD5 lang:text}
Cache -->|Yes| Serve["Return cached URL<br/>PUBLIC_URL/audio/hash.mp3"]
Cache -->|No| ELCheck{ElevenLabs<br/>available?<br/>API key set?<br/>Circuit breaker OK?}
ELCheck -->|Yes| EL["ElevenLabs API<br/>voice_id (env)<br/>model_id (env)<br/>timeout: 15s"]
ELCheck -->|No| OpenAI
EL -->|Success| Save["Save to cache<br/>Return URL"]
EL -->|Fail| OpenAI["OpenAI TTS<br/>model: tts-1<br/>voice: OPENAI_TTS_VOICE<br/>(default: nova)"]
OpenAI -->|Success| Save
OpenAI -->|Fail| Polly["Twilio Say (Polly)<br/>Last resort<br/>Built-in voice"]
EL -->|"401/403/429"| CB["Circuit Breaker<br/>Disable ElevenLabs<br/>for 10 minutes"]
CB --> OpenAI
Save --> Done([Audio URL returned])
Polly --> Done2([TwiML Say fallback])
style Input fill:#2563eb,color:#fff
style EL fill:#7c3aed,color:#fff
style OpenAI fill:#059669,color:#fff
style Polly fill:#dc2626,color:#fff
style CB fill:#d97706,color:#fff
style Done fill:#059669,color:#fff
```
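The flowchart shows four pieces: an MD5(lang:text) disk cache, ElevenLabs guarded by a 10-minute circuit breaker, OpenAI as the second provider, and Polly `<Say>` as the last resort. A minimal sketch of that chain follows. Providers are passed in as plain callables returning MP3 bytes. `ProviderError`, `synthesize` and the callable signatures are hypothetical stand-ins for the real HTTP clients in tts.py.

```python
import hashlib
import time
from pathlib import Path
from typing import Callable, Optional

Synth = Callable[[str, str], bytes]  # (lang, text) -> MP3 bytes


def cache_key(lang: str, text: str) -> str:
    """MD5 of "lang:text" as 32 hex chars; also the cached file's base name."""
    return hashlib.md5(f"{lang}:{text}".encode("utf-8")).hexdigest()


class ProviderError(Exception):
    """Hypothetical error type carrying the provider's HTTP status code."""

    def __init__(self, status: int) -> None:
        super().__init__(f"HTTP {status}")
        self.status = status


class CircuitBreaker:
    """Stays open for `cooldown` seconds after tripping, then resets on its own."""

    def __init__(self, cooldown: float = 600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.cooldown = cooldown
        self.clock = clock
        self.open_until = float("-inf")

    def available(self) -> bool:
        return self.clock() >= self.open_until

    def trip(self) -> None:
        self.open_until = self.clock() + self.cooldown


def synthesize(lang: str, text: str, cache_dir: Path, public_url: str,
               elevenlabs: Optional[Synth], openai: Synth,
               breaker: CircuitBreaker) -> Optional[str]:
    """Return a public audio URL, or None to tell the caller to emit TwiML <Say>."""
    name = f"{cache_key(lang, text)}.mp3"
    path = cache_dir / name
    if not path.exists():
        chain = []
        if elevenlabs is not None and breaker.available():  # None = no API key set
            chain.append(("elevenlabs", elevenlabs))
        chain.append(("openai", openai))
        for label, synth in chain:
            try:
                path.write_bytes(synth(lang, text))
                break
            except Exception as exc:  # any failure falls through to the next provider
                if label == "elevenlabs" and getattr(exc, "status", None) in (401, 403, 429):
                    breaker.trip()
        else:
            return None  # every provider failed: Polly via <Say> is the last resort
    return f"{public_url}/audio/{name}"
```

Because the key includes the language, the same sentence in two languages gets two cache entries. Because the voice is not part of the key, the cache has to be cleared after a voice change, as the troubleshooting notes below say.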
```mermaid
sequenceDiagram
participant Owner as Owner's Signal
participant CLI as signal-cli REST API
participant AVA as AVA Server
loop Every 3 seconds
AVA->>CLI: GET /v1/receive
CLI-->>AVA: [] (no messages)
end
Note over AVA: INCOMING CALL
AVA->>CLI: POST /v2/send
CLI->>Owner: Incoming call<br/>From: Jan (+48...)<br/>pl-PL
Owner->>CLI: "tell him I'll call back"
AVA->>CLI: GET /v1/receive
CLI-->>AVA: [message data]
Note over AVA: Queue instruction<br/>for active call
AVA->>CLI: POST /v2/send
CLI->>Owner: ✅ AVA will tell the caller
Note over AVA: Next speech turn:<br/>inject instruction<br/>into GPT context
Note over AVA: After 4 transcript entries
AVA->>CLI: POST /v2/send
CLI->>Owner: Call in progress<br/>Topic: invoice dispute<br/>Last 6 lines of transcript
Note over AVA: CALL ENDS
AVA->>CLI: POST /v2/send
CLI->>Owner: Call summary<br/>Priority + AI summary
AVA->>CLI: POST /v2/send
CLI->>Owner: Full transcript
```
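Each 3-second poll can return the same message more than once, and messages from anyone other than the owner must be ignored. Both filters fit in one pure function. The sketch below assumes each received item carries an `envelope` object with `source`/`sourceNumber`, `timestamp` and `dataMessage.message`, which is how signal-cli-rest-api typically shapes JSON receive results. Verify this against your version. The function name is hypothetical.

```python
from collections import deque
from typing import Any, Deque, Dict, Iterable, List


def new_owner_messages(envelopes: Iterable[Dict[str, Any]], owner: str,
                       seen: Deque[int]) -> List[str]:
    """Return text messages from `owner` that have not been processed yet."""
    fresh = []
    for item in envelopes:
        env = item.get("envelope", {})
        text = (env.get("dataMessage") or {}).get("message")
        ts = env.get("timestamp")
        if not text or ts is None:
            continue  # receipts, typing indicators, etc.
        if owner not in (env.get("source"), env.get("sourceNumber")):
            continue  # sender filter: only SIGNAL_RECIPIENT may steer AVA
        if ts in seen:
            continue  # already handled in an earlier poll
        seen.append(ts)  # with deque(maxlen=500) the oldest entry is evicted
        fresh.append(text.strip())
    return fresh
```

A bounded `deque(maxlen=500)` keeps memory constant. The trade-off is that a membership check scans up to 500 entries, which is negligible at one poll every 3 seconds.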
| Command | Description |
|---|---|
| /ping | Alive check + timestamp |
| /status | Uptime, active calls, public URL |
| /stats | Call count, memory, TTS cache size |
| /calls | Last 5 call records with topics |
| /restart | Restart AVA (requires sending /restart confirm) |
| /help | Command list |
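Slash commands map naturally onto a dispatch table. The one piece of state is the two-step /restart, which only acts on `/restart confirm`. Below is a minimal sketch covering a subset of the commands. The class, the reply texts and the `restart_requested` flag are hypothetical, and /stats and /calls are left out for brevity.

```python
import time
from typing import Callable, Dict, Optional


class SlashCommands:
    """Dispatch '/command [arg]' messages; /restart only acts on '/restart confirm'."""

    def __init__(self, started_at: float, active_calls: Callable[[], int]) -> None:
        self.started_at = started_at
        self.active_calls = active_calls
        self.restart_requested = False  # hypothetical flag a supervisor would act on
        self.handlers: Dict[str, Callable[[str], str]] = {
            "/ping": lambda arg: "pong " + time.strftime("%Y-%m-%d %H:%M:%S"),
            "/status": self._status,
            "/restart": self._restart,
            "/help": lambda arg: "Commands: " + " ".join(sorted(self.handlers)),
        }

    def handle(self, message: str) -> Optional[str]:
        """Return the reply to send, or None if the message is not a command."""
        message = message.strip()
        if not message.startswith("/"):
            return None  # plain text is treated as a call instruction instead
        cmd, _, arg = message.partition(" ")
        handler = self.handlers.get(cmd.lower())
        if handler is None:
            return f"Unknown command {cmd}. Try /help."
        return handler(arg.strip().lower())

    def _status(self, arg: str) -> str:
        uptime = int(time.time() - self.started_at)
        return f"Uptime: {uptime}s, active calls: {self.active_calls()}"

    def _restart(self, arg: str) -> str:
        if arg != "confirm":
            return "Send '/restart confirm' to restart AVA."
        self.restart_requested = True
        return "Restarting AVA..."
```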
```mermaid
flowchart LR
subgraph Signal["Owner sends via Signal"]
A["tell him I'll call at 3"]
B["ask for order number"]
C["be more formal"]
D["end"]
end
subgraph GPT["AVA injects into GPT context"]
A2["[RELAY_TO_CALLER: I'll call at 3]"]
B2["[ASK_CALLER: order number]"]
C2["[OWNER_INSTRUCTION: be more formal]"]
D2["END_CALL_NOW + force_end flag"]
end
A --> A2
B --> B2
C --> C2
D --> D2
GPT --> Response["GPT acts on markers<br/>naturally within response"]
style Signal fill:#f0f9ff,stroke:#2563eb
style GPT fill:#ecfdf5,stroke:#059669
```
Every GPT response ends with a metadata block that is parsed by AVA and never spoken to the caller:

```
Hello, I'm Maya, Jacek's assistant. How can I help you today?
<meta>{"end_call": false, "urgency": "low", "topic": "general inquiry",
"caller_name": "Jan", "lang": "en"}</meta>
```
| Field | Purpose |
|---|---|
| end_call | If true, AVA hangs up after this response |
| urgency | low / medium / high, shown as an emoji in the Signal summary |
| topic | Short English description for Signal notifications |
| caller_name | First name, if the caller mentions it |
| lang | Two-letter code (pl, en, de), used to switch the STT language |
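Separating the spoken text from the trailing `<meta>` block takes a regular expression plus `json.loads`. The sketch below tolerates the block being missing or malformed, so that a bad reply is still spoken instead of crashing the call. `split_meta` and the `DEFAULT_META` values are assumptions, not the parser in conversation.py.

```python
import json
import re
from typing import Any, Dict, Tuple

META_RE = re.compile(r"<meta>(.*?)</meta>\s*$", re.DOTALL)
# Assumed defaults used when GPT omits the block or a field.
DEFAULT_META: Dict[str, Any] = {
    "end_call": False, "urgency": "low", "topic": "", "caller_name": None, "lang": None,
}


def split_meta(reply: str) -> Tuple[str, Dict[str, Any]]:
    """Return (text to speak, metadata dict) for one GPT reply."""
    match = META_RE.search(reply)
    if not match:
        return reply.strip(), dict(DEFAULT_META)
    spoken = reply[: match.start()].strip()
    try:
        meta = {**DEFAULT_META, **json.loads(match.group(1))}
    except json.JSONDecodeError:
        meta = dict(DEFAULT_META)  # malformed JSON: still speak the text
    return spoken, meta
```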
```mermaid
graph LR
subgraph compose["docker-compose.yml"]
ava["ava<br/>FastAPI :8000<br/>Python 3.11"]
signal["signal-cli<br/>REST API :8080<br/>Native mode"]
caddy["caddy<br/>:80 / :443<br/>Let's Encrypt"]
tunnel["cloudflared<br/>Cloudflare Tunnel<br/>outbound only"]
end
ava -->|depends_on| signal
caddy -->|depends_on| ava
tunnel -->|depends_on| ava
caddy -.-|"profile: caddy"| note1["Open ports 80/443"]
tunnel -.-|"profile: tunnel"| note2["No open ports"]
style ava fill:#059669,color:#fff
style signal fill:#2563eb,color:#fff
style caddy fill:#d97706,color:#fff
style tunnel fill:#7c3aed,color:#fff
```
| Variable | Default | Description |
|---|---|---|
| Twilio | | |
| TWILIO_ACCOUNT_SID | (required) | Twilio account identifier |
| TWILIO_AUTH_TOKEN | (required) | Auth token; also used to validate webhook signatures |
| TWILIO_PHONE_NUMBER | (required) | Your Twilio virtual number |
| Signal | | |
| SIGNAL_CLI_URL | http://signal-cli:8080 | Internal signal-cli API address |
| SIGNAL_SENDER_NUMBER | (required) | Bot's Signal number |
| SIGNAL_RECIPIENT | (required) | Your personal Signal number |
| SIGNAL_LANG | en | Signal notification language (en / pl) |
| OpenAI | | |
| OPENAI_API_KEY | (required) | OpenAI API key |
| OPENAI_MODEL | gpt-4o | GPT model for conversation |
| ElevenLabs | | |
| ELEVENLABS_API_KEY | (empty) | Leave blank to skip ElevenLabs |
| ELEVENLABS_VOICE_ID | WAhoMTNdLdMoq1j3wf3I | Single multilingual voice ID |
| ELEVENLABS_MODEL | eleven_multilingual_v2 | TTS model |
| OpenAI TTS | | |
| OPENAI_TTS_VOICE | nova | Fallback voice (alloy/echo/fable/onyx/nova/shimmer) |
| Language | | |
| DEFAULT_STT_LANG | en-US | Fallback STT language |
| Persona | | |
| OWNER_CONTEXT | (built-in default) | Assistant persona + owner info (single line, private) |
| Infrastructure | | |
| PUBLIC_URL | (required) | HTTPS URL for Twilio to fetch TTS audio |
| COMPOSE_PROFILES | caddy | caddy or tunnel |
| DOMAIN | localhost | Domain for Caddy SSL (caddy profile only) |
| CLOUDFLARE_TUNNEL_TOKEN | (empty) | Tunnel token (tunnel profile only) |
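A settings loader can fail fast on missing required variables and fill in the documented defaults. Below is a minimal sketch using the names and defaults from the table. The function, the HTTPS check on PUBLIC_URL and the omission of OWNER_CONTEXT (whose built-in default lives in the app) are assumptions about how this could be wired, not AVA's actual startup code.

```python
import os
from typing import Dict, Mapping, Optional

REQUIRED = ("TWILIO_ACCOUNT_SID", "TWILIO_AUTH_TOKEN", "TWILIO_PHONE_NUMBER",
            "SIGNAL_SENDER_NUMBER", "SIGNAL_RECIPIENT", "OPENAI_API_KEY", "PUBLIC_URL")
DEFAULTS = {
    "SIGNAL_CLI_URL": "http://signal-cli:8080",
    "SIGNAL_LANG": "en",
    "OPENAI_MODEL": "gpt-4o",
    "ELEVENLABS_API_KEY": "",  # empty = skip ElevenLabs
    "ELEVENLABS_VOICE_ID": "WAhoMTNdLdMoq1j3wf3I",
    "ELEVENLABS_MODEL": "eleven_multilingual_v2",
    "OPENAI_TTS_VOICE": "nova",
    "DEFAULT_STT_LANG": "en-US",
    "COMPOSE_PROFILES": "caddy",
    "DOMAIN": "localhost",
    "CLOUDFLARE_TUNNEL_TOKEN": "",
}


def load_settings(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Merge the environment with the documented defaults; raise if anything required is missing."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("Missing required settings: " + ", ".join(missing))
    settings = {name: env.get(name, default) for name, default in DEFAULTS.items()}
    settings.update({name: env[name] for name in REQUIRED})
    if not settings["PUBLIC_URL"].startswith("https://"):
        raise RuntimeError("PUBLIC_URL must be an HTTPS URL Twilio can reach")
    return settings
```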
The OWNER_CONTEXT variable in .env defines the assistant's entire personality and behaviour. It is injected into the GPT-4o system prompt as the "WHO YOU ARE" section.
This stays in .env only and is never committed to git (.env is listed in .gitignore).
The value must be a single line (no newlines). Use natural text flow:
```
OWNER_CONTEXT=PERSONA: Name: Maya, 24yo, UK-raised, Polish-German parents. Warm, witty, tech-loving. NEVER admit being AI: you are a real human assistant. OWNER: Jacek, born 1984, Switzerland. Senior Linux SysEng. Work hours 9-17 CET. CALL TYPES: 1) Recruiters: ask salary/stack/remote, send to job@sysop.cat. 2) Doctors: get details, mark URGENT. 3) IT emergencies: HIGH PRIORITY. 4) Sales/scam: hang up.
```

For deep changes to the conversation rules (turn limits, meta format, etc.), edit SYSTEM_PROMPT in app/conversation.py.
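Injecting OWNER_CONTEXT as the "WHO YOU ARE" section can be sketched as a template fill with a single-line check. The template below is a hypothetical stand-in for SYSTEM_PROMPT. Only the section name comes from this README, and the rules text is passed in by the caller.

```python
from typing import Optional

# Hypothetical stand-in for SYSTEM_PROMPT in app/conversation.py.
SYSTEM_PROMPT_TEMPLATE = "WHO YOU ARE:\n{owner_context}\n\nRULES:\n{rules}"


def build_system_prompt(owner_context: Optional[str], rules: str,
                        default_context: str) -> str:
    """Insert OWNER_CONTEXT, falling back to a built-in default when it is unset."""
    context = (owner_context or "").strip() or default_context
    if "\n" in context or "\r" in context:
        raise ValueError("OWNER_CONTEXT must be a single line")
    # str.format only parses the template, so braces inside the persona are safe.
    return SYSTEM_PROMPT_TEMPLATE.format(owner_context=context, rules=rules)
```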
```
AVA/
├── app/
│   ├── main.py              # FastAPI app, call routing, webhooks, diagnostics
│   ├── conversation.py      # GPT-4o conversation loop, streaming, meta parsing
│   ├── tts.py               # TTS provider chain (ElevenLabs → OpenAI → Polly)
│   ├── owner_channel.py     # Signal notifications, polling, slash commands
│   ├── contact_lookup.py    # Contact book + Twilio CNAM + language from prefix
│   └── i18n.py              # Multilingual strings, voice maps, Signal templates
├── data/
│   ├── contacts.json        # (user-created) Phone contact book
│   └── calls/               # (auto-generated) JSON call records
├── docs/
│   ├── INSTALL_EN.md        # English installation guide
│   └── INSTALL_PL.md        # Polish installation guide
├── .env                     # (not in git) API keys, persona, configuration
├── .env.example             # Template with all variables documented
├── docker-compose.yml       # AVA + signal-cli + Caddy/Cloudflared
├── Dockerfile               # Python 3.11-slim, uvicorn
├── Caddyfile                # Caddy reverse proxy config
├── requirements.txt         # Python dependencies
└── README.md                # This file
```
| Mechanism | Description |
|---|---|
| Twilio signature validation | Every /twilio/* request must carry a valid X-Twilio-Signature. Invalid → 403. |
| Rate limiting | 30 requests/min per IP. Exceeding → 429. |
| Hidden app port | Port 8000 internal only. Traffic via Caddy HTTPS (:443) or Cloudflare Tunnel. |
| Signal sender filter | Only messages from SIGNAL_RECIPIENT are processed. Others are logged and ignored. |
| Audio file validation | Filenames must match [a-f0-9]{32}\.mp3. Path traversal blocked. |
| Security headers | Caddy adds HSTS, X-Frame-Options DENY, X-Content-Type-Options nosniff. |
| Disabled API docs | /docs, /redoc, /openapi.json endpoints are off. |
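Signature validation follows Twilio's documented request-signing scheme. Take the full request URL, append every POST parameter name and value sorted by name, compute HMAC-SHA1 keyed with the auth token, base64-encode the result, and compare it with X-Twilio-Signature. The from-scratch sketch below also shows the audio filename check. In practice, the official `twilio` package's `RequestValidator` does the signature part and handles edge cases such as repeated parameters. Behind Caddy or a Cloudflare Tunnel, the URL must be the public one Twilio actually called, not the internal :8000 address. The function names are hypothetical.

```python
import base64
import hashlib
import hmac
import re
from typing import Mapping, Optional


def twilio_signature(auth_token: str, url: str, params: Mapping[str, str]) -> str:
    """URL + name/value of each POST param sorted by name -> HMAC-SHA1 -> base64."""
    payload = url + "".join(name + params[name] for name in sorted(params))
    digest = hmac.new(auth_token.encode(), payload.encode(), hashlib.sha1).digest()
    return base64.b64encode(digest).decode()


def is_valid_twilio_request(auth_token: str, url: str, params: Mapping[str, str],
                            header_value: Optional[str]) -> bool:
    expected = twilio_signature(auth_token, url, params)
    # Constant-time comparison on bytes; a missing header never matches.
    return hmac.compare_digest(expected.encode(), (header_value or "").encode())


AUDIO_NAME = re.compile(r"[a-f0-9]{32}\.mp3")


def is_safe_audio_name(name: str) -> bool:
    """fullmatch rejects '../', slashes and anything except 32 hex chars + '.mp3'."""
    return AUDIO_NAME.fullmatch(name) is not None
```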
| Service | Rate | Typical 2-min call |
|---|---|---|
| Twilio Voice | $0.013/min | ~$0.03 |
| Twilio STT (enhanced) | $0.02/15s | ~$0.16 |
| OpenAI GPT-4o | ~$0.01/1k tokens | ~$0.005 |
| ElevenLabs | from $5/month | (30k chars free tier) |
| Twilio CNAM Lookup | $0.01/query | $0.01 (unknown numbers only) |
Typical call: ~$0.20-0.25
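The ~$0.20 lower bound follows directly from the table's own rates for a 2-minute call. This assumes STT is billed for the full call duration and leaves out the flat ElevenLabs subscription.

```python
minutes = 2
voice = 0.013 * minutes            # Twilio Voice: $0.013/min
stt = 0.02 * (minutes * 60 / 15)   # enhanced STT: $0.02 per 15 s -> 8 blocks
gpt = 0.005                        # GPT-4o estimate for the whole call, from the table
cnam = 0.01                        # CNAM, only for numbers not in contacts.json
total = voice + stt + gpt + cnam
print(f"${total:.3f}")
```

STT dominates the bill. For known contacts (no CNAM lookup), the same call comes to about $0.19.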
| Message | What happens |
|---|---|
| tell him I'll call back tomorrow at 10 | AVA naturally relays this to the caller |
| ask for the order number | AVA asks the caller |
| end / stop / koniec | AVA wraps up the call gracefully |
| status or ? | Confirms whether a call is active |
| Any other text | Forwarded as a generic instruction |
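The mapping from free-text Signal messages to the GPT markers shown in the instruction flowchart can be approximated with a few prefix rules. The sketch below is a hypothetical keyword heuristic that reproduces the examples in the flowchart and this table. AVA's own classifier may differ.

```python
import re
from typing import Tuple

END_WORDS = {"end", "stop", "koniec"}
STATUS_WORDS = {"status", "?"}


def classify_owner_message(text: str) -> Tuple[str, str]:
    """Return (kind, marker to inject into the next GPT user message)."""
    msg = text.strip()
    low = msg.lower()
    if low in END_WORDS:
        return "end", "END_CALL_NOW"  # plus a force_end flag on the call state
    if low in STATUS_WORDS:
        return "status", ""  # answered on Signal, nothing injected
    m = re.match(r"tell (?:him|her|them) (.+)", msg, re.IGNORECASE)
    if m:
        return "relay", f"[RELAY_TO_CALLER: {m.group(1)}]"
    m = re.match(r"ask (?:for |about )?(.+)", msg, re.IGNORECASE)
    if m:
        return "ask", f"[ASK_CALLER: {m.group(1)}]"
    return "instruction", f"[OWNER_INSTRUCTION: {msg}]"
```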
See the detailed installation guides:
- English: docs/INSTALL_EN.md
- Polish: docs/INSTALL_PL.md
```bash
cp .env.example .env
# Edit .env: fill in API keys, OWNER_CONTEXT, PUBLIC_URL
mkdir -p data/calls
docker compose up -d
curl https://your-domain.com/health
```

```bash
# Twilio can't reach the webhook?
curl -I https://your-domain.com/health

# TTS audio not playing?
docker compose logs ava | grep -i tts

# Signal not sending?
docker compose logs ava-signal-cli
curl http://localhost:8080/v1/accounts

# Check active calls
# Send "status" or "/status" to the Signal bot

# Clear TTS cache (after voice change)
docker exec ava sh -c 'rm -f /tmp/tts_cache/*.mp3'

# View recent call logs
ls -lt data/calls/ | head
```