From 2f204ccd28ec637bd83f092f88a98c3cfb2cc54a Mon Sep 17 00:00:00 2001
From: Ryan McWhorter
Date: Wed, 26 Mar 2025 16:35:01 -0700
Subject: [PATCH 1/2] docs for smartEndpointingPlan.waitFunction

---
 fern/customization/speech-configuration.mdx | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/fern/customization/speech-configuration.mdx b/fern/customization/speech-configuration.mdx
index 9a34243de..607112056 100644
--- a/fern/customization/speech-configuration.mdx
+++ b/fern/customization/speech-configuration.mdx
@@ -23,7 +23,16 @@ This plan defines the parameters for when the assistant begins speaking after th
 ![LiveKit Smart Endpointing Configuration](../static/images/advanced-tab/livekit-smart-endpointing.png)
- **Example:** In insurance claims, Vapi's smart endpointing helps avoid interruptions while customers think through complex responses. For instance, when the assistant asks "do you want a loan," the system can intelligently wait for the complete response rather than interrupting after the initial "yes" or "no." For responses requiring number sequences like "What's your account number?", the system can detect natural pauses between digits without prematurely ending the customer's turn to speak.
+ **LiveKit Smart Endpointing Configuration:**
+ When using LiveKit, you can customize the `waitFunction` parameter which determines how long the bot will wait to start speaking based on the likelihood that the user has finished speaking:
+
+ ```
+ waitFunction: "200 + 8000 * x"
+ ```
+
+ This function maps probabilities (0-1) to milliseconds of wait time. A probability of 0 means high confidence the caller has stopped speaking, while 1 means high confidence they're still speaking. The default function (`200 + 8000 * x`) creates a wait time between 200ms (when x=0) and 8200ms (when x=1). You can customize this with your own mathematical expression, such as `4000 * (1 - cos(pi * x))` for a different response curve.
+
+ **Example:** In insurance claims, smart endpointing helps avoid interruptions while customers think through complex responses. For instance, when the assistant asks "do you want a loan," the system can intelligently wait for the complete response rather than interrupting after the initial "yes" or "no." For responses requiring number sequences like "What's your account number?", the system can detect natural pauses between digits without prematurely ending the customer's turn to speak.
 - **Transcription-Based Detection**: Customize how the assistant determines that the customer has stopped speaking based on what they’re saying. This offers more control over the timing.
 **Example:** When a customer says, "My account number is 123456789, I want to transfer $500."
 - The system detects the number "123456789" and waits for 0.5 seconds (`WaitSeconds`) to ensure the customer isn't still speaking.

From 2b3a9f8ec2177909ba0f24c7c3c0755bfdcd8c9e Mon Sep 17 00:00:00 2001
From: Ryan McWhorter
Date: Wed, 26 Mar 2025 17:00:15 -0700
Subject: [PATCH 2/2] sesame docs

---
 fern/docs.yml                   |  2 ++
 fern/providers/voice/sesame.mdx | 32 ++++++++++++++++++++++++++++++++
 2 files changed, 34 insertions(+)
 create mode 100644 fern/providers/voice/sesame.mdx

diff --git a/fern/docs.yml b/fern/docs.yml
index 9d771ec44..2d4df5bdc 100644
--- a/fern/docs.yml
+++ b/fern/docs.yml
@@ -276,6 +276,8 @@ navigation:
   path: providers/voice/rimeai.mdx
 - page: Deepgram
   path: providers/voice/deepgram.mdx
+- page: Sesame
+  path: providers/voice/sesame.mdx
 - section: Video Models
   contents:
diff --git a/fern/providers/voice/sesame.mdx b/fern/providers/voice/sesame.mdx
new file mode 100644
index 000000000..1752c528e
--- /dev/null
+++ b/fern/providers/voice/sesame.mdx
@@ -0,0 +1,32 @@
+---
+title: Sesame
+subtitle: What is Sesame CSM-1B?
+slug: providers/voice/sesame
+---
+
+**What is Sesame CSM-1B?**
+
+Sesame CSM-1B is an open source text-to-speech (TTS) model that Vapi hosts for seamless integration into your voice applications. Currently in beta, this model delivers natural-sounding speech synthesis with a single default voice option.
+
+**Key Features:**
+
+- **Vapi-Hosted Solution**: Access this open source model directly through Vapi without managing your own infrastructure
+- **Single Default Voice**: Currently offers one voice option optimized for clarity and naturalness
+- **Beta Release**: Early access to this emerging TTS technology
+
+**Integration Benefits:**
+
+- Simplified setup with no need to self-host the model
+- Consistent performance through Vapi's optimized infrastructure
+- Seamless compatibility with all Vapi voice applications
+
+**Use Cases:**
+
+- Virtual assistants and conversational AI
+- Content narration and audio generation
+- Interactive voice applications
+- Prototyping voice-driven experiences
+
+**Current Limitations:**
+
+As this is a beta release, the model currently offers limited customization options with only one default voice available. Additional features and voice options may be introduced in future updates.
\ No newline at end of file
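
Review note on PATCH 1/2 (not part of the patch): the `waitFunction` paragraph describes a mapping from a probability `x` in the range 0-1 to a wait time in milliseconds. The Python sketch below only works through that arithmetic for the two expressions quoted in the docs. The helper names `default_wait_ms` and `cosine_wait_ms` are hypothetical, and nothing here is meant to reflect how Vapi or LiveKit actually parses or evaluates the expression string.

```python
import math


def default_wait_ms(x: float) -> float:
    """Default expression quoted in the docs: 200 + 8000 * x."""
    return 200 + 8000 * x


def cosine_wait_ms(x: float) -> float:
    """Alternative expression quoted in the docs: 4000 * (1 - cos(pi * x))."""
    return 4000 * (1 - math.cos(math.pi * x))


if __name__ == "__main__":
    # x is the probability described in the docs: 0 means the caller has
    # likely stopped speaking, 1 means they are likely still speaking.
    for x in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(
            f"x={x:.2f}  default={default_wait_ms(x):7.1f} ms  "
            f"cosine={cosine_wait_ms(x):7.1f} ms"
        )
```

Evaluated this way, the default expression spans 200 ms to 8200 ms as the docs state, while the cosine example starts at 0 ms for x=0 and reaches about 4000 ms at x=0.5 and 8000 ms at x=1. It may be worth mentioning in the docs that the alternative curve has no minimum wait.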