13 changes: 13 additions & 0 deletions AGENTS.md
@@ -802,6 +802,19 @@ For the **complete schema** of all available properties on each resource type, c
- **Assistant names** use natural language: `Intake Assistant`, `Booking Assistant`
- **Structured output names** use `snake_case`: `customer_data`, `call_summary`

### Renaming an existing resource

The engine has a `name_mismatch` guard that auto-bootstraps state from the dashboard before applying changes. **Editing `.vapi-state.<org>.json` by hand to repoint a renamed file at the existing dashboard UUID does not work** — the bootstrap runs first, overwrites your manual edit, and the rename gets treated as "delete the old resource + create a new one."

What this means in practice for renames:

| Approach | What happens |
|---|---|
| Rename the file locally + `npm run push -- <org>` | New UUID is minted for the renamed file; the old UUID becomes orphaned in the dashboard. Run `npm run cleanup -- <org> --force` (or `npm run push -- <org> --force <file>`) to delete the orphan. |
| Rename in the dashboard first, then `npm run pull -- <org>` | UUID is preserved. The pulled file lands with the new name and the existing UUID suffix; no orphan. |

If preserving the UUID matters (e.g. it's referenced from a phone number, outbound campaign, or external integration), rename via the dashboard first and pull. Otherwise, accept the new UUID and clean up the orphan.
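
A minimal sketch of the two flows from the table above (the org slug `acme` is a placeholder):

```bash
# UUID-preserving rename: rename in the dashboard UI first, then pull.
# The local file picks up the new name with the same UUID suffix.
npm run pull -- acme

# If you renamed locally and already pushed, delete the orphaned old UUID:
npm run cleanup -- acme --force
```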

---

## Common Patterns
46 changes: 46 additions & 0 deletions docs/learnings/assistants.md
@@ -40,6 +40,38 @@ Priority order: top-level `inputPunctuationBoundaries` > `chunkPlan.punctuationB

When merging voice overrides in squads, `inputPunctuationBoundaries` arrays are **unioned** (combined), not replaced. This can lead to more chunk boundaries than expected.

### Cartesia-specific config gotchas

Cartesia voices share the `voice` schema with other providers but reject several fields and require a few non-obvious nesting paths. Pushes fail with confusing 400s if you carry over an ElevenLabs config wholesale.

| Field | Behavior on Cartesia | Workaround |
|---|---|---|
| `enableSsmlParsing` | **Rejected** — ElevenLabs-only field | Omit it on Cartesia voice config |
| Top-level `voice.speed` | **Rejected** — must be nested | Use `voice.generationConfig.speed: 0.95` |
| Top-level `voice.stability` / `voice.similarityBoost` | Ignored — ElevenLabs-only | Omit; Cartesia tunes consistency through `generationConfig` knobs |
| `pronunciationDictId` | Supported on `sonic-3` only | Confirm `model: sonic-3` before attaching a dict |
| `accentLocalization` | Nested under `generationConfig.experimental` | `voice.generationConfig.experimental.accentLocalization: 1` |

```yaml
voice:
provider: cartesia
model: sonic-3
voiceId: your-voice-id
pronunciationDictId: pdict_xxxxxxxxxxxxx
generationConfig:
speed: 0.95
experimental:
accentLocalization: 1
```
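
For contrast, a config ported wholesale from ElevenLabs that will fail or misbehave on Cartesia (field names taken from the table above):

```yaml
# WRONG on Cartesia: carried over from an ElevenLabs config
voice:
  provider: cartesia
  voiceId: your-voice-id
  enableSsmlParsing: true # rejected (ElevenLabs-only)
  speed: 0.95             # rejected at top level; nest under generationConfig
  stability: 0.5          # ignored (ElevenLabs-only)
  similarityBoost: 0.75   # ignored (ElevenLabs-only)
```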

### Cartesia Sonic-3 garbles em-dashes and SSML `<break>` tags

**What you might expect:** Em-dashes and `<break time='0.3s'/>` give you natural pauses, the same way they do on ElevenLabs.

**What actually happens:** Sonic-3's chunking pipeline mishandles both. Em-dashes can produce truncated or stitched audio (occasionally swallowing the next word), and explicit `<break>` tags inside Cartesia output sometimes mangle nearby phonemes. The failure mode is intermittent and shows up as "weird audio glitches" in QA.

**Recommendation:** When writing prompts for Cartesia Sonic-3, prefer commas, semicolons, and periods for pacing. If you're porting prompts from another TTS provider, search-and-replace `—` and `<break .../>` before pushing.
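
A hypothetical pre-push cleanup, assuming prompts live under `prompts/` (adjust the path to your repo layout):

```bash
# Replace em-dashes with commas and strip SSML <break> tags before pushing.
# GNU sed syntax; on macOS use `sed -i ''`.
sed -i -e 's/—/, /g' -e 's/<break[^>]*\/>/ /g' prompts/*.md
```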

---

## Transcriber Configuration
@@ -384,6 +416,20 @@ These are complementary, not alternatives.

`numWords: 2` means the user must speak 2 words before the assistant stops talking. Lower values make the assistant more interruptible.

### `numWords: 2` produces a 500–800ms TTS overlap window

**Why this matters for transcript quality, not just feel:** While the assistant waits for the second word to land before stopping, both speakers are talking simultaneously. That overlap window is typically **500–800ms** at conversational pace. STT confidence drops sharply during overlap, so the customer's first sentence after a barge-in often arrives garbled — wrong words, dropped clauses, or low-confidence transcripts that get filtered out (see `confidenceThreshold` above).

**Recommendation:** For barge-in-heavy use cases (objection handling, fast-paced dialogue), use `numWords: 1` and lean on Krisp denoising (`backgroundDenoisingEnabled: true`) to keep the assistant's own audio out of the customer's transcript. The trade-off is slightly more "false interrupts" on filler words like "um" or "yeah", which is usually preferable to garbled customer turns.

```yaml
stopSpeakingPlan:
  numWords: 1
  voiceSeconds: 0.2
  backoffSeconds: 1.0
backgroundDenoisingEnabled: true # top-level assistant field, not part of stopSpeakingPlan
```

---

## Analysis & Artifacts
14 changes: 14 additions & 0 deletions docs/learnings/multilingual.md
@@ -255,6 +255,20 @@ voice: { provider: eleven-labs, voiceId: your-spanish-voice }

---

## English-heavy `keyterm` array biases Deepgram `language: multi` toward English

**What you might expect:** `keyterm` is a vocabulary boost — terms get recognized more reliably regardless of which language the customer is speaking.

**What actually happens:** With `model: nova-3` and `language: multi`, the language ID step uses partial transcripts as a signal. A `keyterm` array dominated by English brand names, English product terms, or English acronyms tilts that signal toward English, especially on short utterances or code-switched turns. The result is Deepgram routing non-English speech through the English pipeline, producing low-confidence transcripts that may get filtered out entirely (see `confidenceThreshold` in `assistants.md`).

This is most visible when a Spanish-only customer is misrecognized as English on their first utterance, which then cascades — the assistant responds in English, the customer gets confused, and the loop continues.

**Recommendation for code-switching customers:** Use **Gladia Solaria** (`provider: gladia`, `languageBehaviour: automatic multiple languages`) instead of Deepgram `language: multi`. Solaria is built around code-switching as a first-class case and isn't biased by `keyterm` content the same way. See [Approach 1](#approach-1-single-static-agent) for the full transcriber comparison.

**If you must stay on Deepgram multi:** Keep `keyterm` short (under 20 entries), include the customer's expected non-English equivalents, and avoid English-only acronyms that have no foreign-language form.
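
A sketch of both options; the `keyterm` entries are hypothetical, and the Gladia block uses only the fields named above:

```yaml
# Option A: Gladia Solaria for code-switching customers
transcriber:
  provider: gladia
  languageBehaviour: automatic multiple languages
---
# Option B: staying on Deepgram multi, with a short bilingual keyterm list
transcriber:
  provider: deepgram
  model: nova-3
  language: multi
  keyterm:
    - Acme          # hypothetical brand name, same in both languages
    - facturación   # Spanish term customers actually say
    - cita          # "appointment"
```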

---

## Further Reading

- [Vapi Multilingual Documentation](https://docs.vapi.ai/customization/multilingual)
85 changes: 85 additions & 0 deletions docs/learnings/squads.md
@@ -124,6 +124,91 @@ assistantOverrides:

---

## Inline `model.messages` in `assistantOverrides` silently shadows the assistant `.md`

**What you might expect:** `assistantOverrides` is a deep merge — partial fields override partial fields, and the assistant's own `.md` system prompt is preserved if you don't touch `model.messages`.

**What actually happens:** If you add `model.messages` (or any `model.*` field that includes the system message) inside a squad member's `assistantOverrides`, that array fully replaces the assistant's compiled prompt at runtime. The `.md` body becomes dead code for that member — silently. There is no warning at push time, no diff in the dashboard that calls it out, and the only symptom is the assistant behaving differently in the squad than it does standalone.

This is especially insidious when the override is large (e.g. a multi-thousand-character prompt pasted inline), because the inline text drifts away from the `.md` source over time and no longer matches.

**Recommendation:**

- Treat the assistant `.md` file as the single source of truth for the system prompt.
- Use `assistantOverrides` for non-prompt knobs (`tools:append`, `temperature`, `firstMessage`, `firstMessageMode`, `voice`, `transcriber`).
- If you genuinely need a different prompt for a squad context, create a second assistant `.md` and reference it as a separate squad member instead of inlining the prompt.

```yaml
# WRONG — this silently replaces the assistant's .md prompt
members:
- assistantId: faq-specialist-a1b2c3d4
assistantOverrides:
model:
provider: openai
model: gpt-4.1
messages:
- role: system
content: |
You are an FAQ specialist. (...8000 chars of prompt drifting from the .md...)

# CORRECT — keep the .md as the only prompt source
members:
- assistantId: faq-specialist-a1b2c3d4
assistantOverrides:
model:
temperature: 0.3 # non-prompt overrides only
tools:append:
- type: handoff
# ...
```

---

## `firstMessage` replays on every handoff re-entry, not just call start

**What you might expect:** `firstMessage` is the assistant's opening line at the start of a call.

**What actually happens:** With `firstMessageMode` at its default (`assistant-speaks-first`), `firstMessage` fires **every time control hands back to that assistant** — not just on the initial call. In a squad with cyclical routing (e.g. Primary → FAQ → Primary, or Closeout → Primary on objection), the customer hears the intro line repeated on each re-entry, which sounds like a hard reset of the conversation.

**Recommendation:** For any squad member that can be re-entered after a handoff (i.e. any member except a strictly terminal one like Closeout), set:

```yaml
firstMessage: ""
firstMessageMode: assistant-speaks-first-with-model-generated-message
```

The LLM then synthesizes a contextual continuation line on re-entry rather than replaying the intro. Pair this with a "CALL-START vs HANDOFF-RE-ENTRY" block at the top of the system prompt so the model knows which behavior to use:

```
# RE-ENTRY PROTOCOL

If this is the first turn of the call (no prior conversation in your context),
greet the caller and begin the workflow.

If you are receiving control via a handoff (prior conversation present), do
NOT re-greet. Pick up from where the previous specialist left off.
```

The terminal member (Closeout, etc.) is the only place a static `firstMessage` is safe — and only because nothing should hand back to it.

---

## Two silence handlers fire at once when both are configured

**What you might expect:** `messagePlan.idleMessages` (per-assistant) and `customer.speech.timeout` hooks (per-assistant or via `membersOverrides.hooks`) are alternative ways to handle silence — pick one and the other is dormant.

**What actually happens:** Both fire independently on the same silence event. If a member has `idleMessages` AND the squad has `membersOverrides.hooks` with a `customer.speech.timeout` action, the customer hears the idle message **and** the hook's spoken action back-to-back, often within the same beat. It feels like the agent is interrupting itself.

**Recommendation:** Pick one mechanism per squad. Squad-level `customer.speech.timeout` hooks are usually preferable because:

- They apply uniformly to every member without per-assistant duplication.
- They support escalation patterns (`triggerMaxCount`, `triggerResetMode: onUserSpeech`) that idle messages don't.
- They can chain `say` + `endCall` for graceful timeout-based hangup.

If you choose hooks, leave `messagePlan` unset on each member (or set `idleMessages: []`). If you choose idle messages, omit the silence hook from `membersOverrides`. See [call-duration.md](call-duration.md) for the timeout-vs-hook distinction.
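
A sketch of the hook-only setup, with field shapes assumed from the knob names above (verify against the current squad schema before relying on it):

```yaml
membersOverrides:
  hooks:
    - on: customer.speech.timeout
      options:
        timeoutSeconds: 10            # assumed knob name; see call-duration.md
        triggerMaxCount: 3
        triggerResetMode: onUserSpeech
      do:
        - type: say
          exact: "Are you still there?"
        # a final escalation can chain say + endCall for a graceful hangup
# ...and leave messagePlan / idleMessages unset on each member
```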

---

## FAQ agent consolidation pattern

When a squad has multiple specialist agents that each carry one knowledge base tool, the LLM must correctly classify and route the question before it even reaches a KB. If the routing is wrong, the KB returns "I don't have enough information" — not because the knowledge doesn't exist, but because the wrong KB was queried.